* [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation @ 2024-07-15 5:19 Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 1/8] hw/pci: Do not add ROM BAR for SR-IOV VF Akihiko Odaki ` (8 more replies) 0 siblings, 9 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") Introduction ------------ This series is based on the RFC series submitted by Yui Washizu[1]. See also [2] for the context. This series enables SR-IOV emulation for virtio-net. It is useful to test SR-IOV support on the guest, or to expose several vDPA devices in a VM. vDPA devices can also provide L2 switching feature for offloading though it is out of scope to allow the guest to configure such a feature. The PF side code resides in virtio-pci. The VF side code resides in the PCI common infrastructure, but it is restricted to work only for virtio-net-pci because of lack of validation. User Interface -------------- A user can configure a SR-IOV capable virtio-net device by adding virtio-net-pci functions to a bus. Below is a command line example: -netdev user,id=n -netdev user,id=o -netdev user,id=p -netdev user,id=q -device pcie-root-port,id=b -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f The VFs specify the paired PF with "sriov-pf" property. The PF must be added after all VFs. It is user's responsibility to ensure that VFs have function numbers larger than one of the PF, and the function numbers have a consistent stride. Keeping VF instances -------------------- A problem with SR-IOV emulation is that it needs to hotplug the VFs as the guest requests. Previously, this behavior was implemented by realizing and unrealizing VFs at runtime. However, this strategy does not work well for the proposed virtio-net emulation; in this proposal, device options passed in the command line must be maintained as VFs are hotplugged, but they are consumed when the machine starts and not available after that, which makes realizing VFs at runtime impossible. As an strategy alternative to runtime realization/unrealization, this series proposes to reuse the code to power down PCI Express devices. When a PCI Express device is powered down, it will be hidden from the guest but will be kept realized. This effectively implements the behavior we need for the SR-IOV emulation. Summary ------- Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for VFs. Patch 2 makes zero stride valid for 1 VF configuration. Patch 3 and 4 adds validations. Patch 5 adds user-created SR-IOV VF infrastructure. Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs. Patch 7 allows user to create SR-IOV VFs with virtio-net-pci. [1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/ [2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/ Co-developed-by: Yui Washizu <yui.washidu@gmail.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- Changes in v5: - Dropped the RFC tag. - Fixed device unrealization. - Rebased. - Link to v4: https://lore.kernel.org/r/20240428-sriov-v4-0-ac8ac6212982@daynix.com Changes in v4: - Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division by zero reported by Yui Washizu. - Rebased. - Link to v3: https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770372@daynix.com Changes in v3: - Rebased. - Link to v2: https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com Changes in v2: - Changed to keep VF instances. - Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com --- Akihiko Odaki (8): hw/pci: Do not add ROM BAR for SR-IOV VF hw/pci: Fix SR-IOV VF number calculation pcie_sriov: Ensure PF and VF are mutually exclusive pcie_sriov: Check PCI Express for SR-IOV PF pcie_sriov: Allow user to create SR-IOV device virtio-pci: Implement SR-IOV PF virtio-net: Implement SR-IOV VF docs: Document composable SR-IOV device MAINTAINERS | 1 + docs/system/index.rst | 1 + docs/system/sriov.rst | 36 +++++ include/hw/pci/pci_device.h | 6 +- include/hw/pci/pcie_sriov.h | 18 +++ include/hw/virtio/virtio-pci.h | 1 + hw/pci/pci.c | 76 +++++++---- hw/pci/pcie_sriov.c | 298 +++++++++++++++++++++++++++++++++-------- hw/virtio/virtio-net-pci.c | 1 + hw/virtio/virtio-pci.c | 20 ++- 10 files changed, 369 insertions(+), 89 deletions(-) --- base-commit: e7b8390b9167be9272c2c959919d9b397842da7f change-id: 20231202-sriov-9402fb262be8 Best regards, -- Akihiko Odaki <akihiko.odaki@daynix.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v5 1/8] hw/pci: Do not add ROM BAR for SR-IOV VF 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 2/8] hw/pci: Fix SR-IOV VF number calculation Akihiko Odaki ` (7 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki A SR-IOV VF cannot have a ROM BAR. Co-developed-by: Yui Washizu <yui.washidu@gmail.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pci.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index ca8fb5383765..497a057b79f9 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2412,6 +2412,14 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, return; } + if (pci_is_vf(pdev)) { + if (pdev->rom_bar == ON_OFF_AUTO_ON) { + error_setg(errp, "ROM BAR cannot be enabled for SR-IOV VF"); + } + + return; + } + if (load_file || pdev->romsize == UINT32_MAX) { path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile); if (path == NULL) { -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 2/8] hw/pci: Fix SR-IOV VF number calculation 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 1/8] hw/pci: Do not add ROM BAR for SR-IOV VF Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 3/8] pcie_sriov: Ensure PF and VF are mutually exclusive Akihiko Odaki ` (6 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs to be non-zero when there are multiple VFs, but the specification does not prohibit to make it zero when there is only one VF. Do not perform the division for the first VF to avoid division by zero. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 497a057b79f9..ae7137d70579 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -1490,7 +1490,11 @@ static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg, pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET); uint16_t vf_stride = pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE); - uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride; + uint32_t vf_num = d->devfn - (pf->devfn + vf_offset); + + if (vf_num) { + vf_num /= vf_stride; + } if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) { new_addr = pci_get_quad(pf->config + bar); -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 3/8] pcie_sriov: Ensure PF and VF are mutually exclusive 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 1/8] hw/pci: Do not add ROM BAR for SR-IOV VF Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 2/8] hw/pci: Fix SR-IOV VF number calculation Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 4/8] pcie_sriov: Check PCI Express for SR-IOV PF Akihiko Odaki ` (5 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki A device cannot be a SR-IOV PF and a VF at the same time. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pcie_sriov.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index 56523ab4e833..ec8fc0757b92 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, uint8_t *cfg = dev->config + offset; uint8_t *wmask; + if (pci_is_vf(dev)) { + error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time"); + return false; + } + if (total_vfs) { uint16_t ari_cap = pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI); uint16_t first_vf_devfn = dev->devfn + vf_offset; -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 4/8] pcie_sriov: Check PCI Express for SR-IOV PF 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (2 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 3/8] pcie_sriov: Ensure PF and VF are mutually exclusive Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 5/8] pcie_sriov: Allow user to create SR-IOV device Akihiko Odaki ` (4 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki SR-IOV requires PCI Express. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pcie_sriov.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index ec8fc0757b92..3af0cc7d560a 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, uint8_t *cfg = dev->config + offset; uint8_t *wmask; + if (!pci_is_express(dev)) { + error_setg(errp, "PCI Express is required for SR-IOV PF"); + return false; + } + if (pci_is_vf(dev)) { error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time"); return false; -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 5/8] pcie_sriov: Allow user to create SR-IOV device 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (3 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 4/8] pcie_sriov: Check PCI Express for SR-IOV PF Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 6/8] virtio-pci: Implement SR-IOV PF Akihiko Odaki ` (3 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki A user can create a SR-IOV device by specifying the PF with the sriov-pf property of the VFs. The VFs must be added before the PF. A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable set. Such a VF cannot refer to the PF because it is created before the PF. A PF that user-creatable VFs can be attached calls pcie_sriov_pf_init_from_user_created_vfs() during realization and pcie_sriov_pf_exit() when exiting. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- include/hw/pci/pci_device.h | 6 +- include/hw/pci/pcie_sriov.h | 18 +++ hw/pci/pci.c | 62 ++++++---- hw/pci/pcie_sriov.c | 288 +++++++++++++++++++++++++++++++++++--------- 4 files changed, 291 insertions(+), 83 deletions(-) diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index 49b341ce2e27..14d391333aff 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -37,6 +37,8 @@ struct PCIDeviceClass { uint16_t subsystem_id; /* only for header type = 0 */ const char *romfile; /* rom bar */ + + bool sriov_vf_user_creatable; }; enum PCIReqIDType { @@ -160,6 +162,8 @@ struct PCIDevice { /* ID of standby device in net_failover pair */ char *failover_pair_id; uint32_t acpi_index; + + char *sriov_pf; }; static inline int pci_intx(PCIDevice *pci_dev) @@ -192,7 +196,7 @@ static inline int pci_is_express_downstream_port(const PCIDevice *d) static inline int pci_is_vf(const PCIDevice *d) { - return d->exp.sriov_vf.pf != NULL; + return d->sriov_pf || d->exp.sriov_vf.pf != NULL; } static inline uint32_t pci_config_size(const PCIDevice *d) diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h index c5d2d318d330..f75b8f22ee92 100644 --- a/include/hw/pci/pcie_sriov.h +++ b/include/hw/pci/pcie_sriov.h @@ -18,6 +18,7 @@ typedef struct PCIESriovPF { uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */ PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */ + bool vf_user_created; /* If VFs are created by user */ } PCIESriovPF; typedef struct PCIESriovVF { @@ -40,6 +41,23 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num, MemoryRegion *memory); +/** + * pcie_sriov_pf_init_from_user_created_vfs() - Initialize PF with user-created + * VFs. + * @dev: A PCIe device being realized. + * @offset: The offset of the SR-IOV capability. + * @errp: pointer to Error*, to store an error if it happens. + * + * Return: The size of added capability. 0 if the user did not create VFs. + * -1 if failed. + */ +int16_t pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, + uint16_t offset, + Error **errp); + +bool pcie_sriov_register_device(PCIDevice *dev, Error **errp); +void pcie_sriov_unregister_device(PCIDevice *dev); + /* * Default (minimal) page size support values * as required by the SR/IOV standard: diff --git a/hw/pci/pci.c b/hw/pci/pci.c index ae7137d70579..0368e448992a 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -138,6 +138,7 @@ static Property pci_props[] = { QEMU_PCIE_ERR_UNC_MASK_BITNR, true), DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present, QEMU_PCIE_ARI_NEXTFN_1_BITNR, false), + DEFINE_PROP_STRING("sriov-pf", PCIDevice, sriov_pf), DEFINE_PROP_END_OF_LIST() }; @@ -1012,13 +1013,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp) dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION; } - /* - * With SR/IOV and ARI, a device at function 0 need not be a multifunction - * device, as it may just be a VF that ended up with function 0 in - * the legacy PCI interpretation. Avoid failing in such cases: - */ - if (pci_is_vf(dev) && - dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { + /* SR/IOV is not handled here. */ + if (pci_is_vf(dev)) { return; } @@ -1051,7 +1047,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp) } /* function 0 indicates single function, so function > 0 must be NULL */ for (func = 1; func < PCI_FUNC_MAX; ++func) { - if (bus->devices[PCI_DEVFN(slot, func)]) { + PCIDevice *device = bus->devices[PCI_DEVFN(slot, func)]; + if (device && !pci_is_vf(device)) { error_setg(errp, "PCI: %x.0 indicates single function, " "but %x.%x is already populated.", slot, slot, func); @@ -1336,6 +1333,7 @@ static void pci_qdev_unrealize(DeviceState *dev) pci_unregister_io_regions(pci_dev); pci_del_option_rom(pci_dev); + pcie_sriov_unregister_device(pci_dev); if (pc->exit) { pc->exit(pci_dev); @@ -1367,7 +1365,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num, pcibus_t size = memory_region_size(memory); uint8_t hdr_type; - assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */ assert(region_num >= 0); assert(region_num < PCI_NUM_REGIONS); assert(is_power_of_2(size)); @@ -1378,7 +1375,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num, assert(hdr_type != PCI_HEADER_TYPE_BRIDGE || region_num < 2); r = &pci_dev->io_regions[region_num]; - r->addr = PCI_BAR_UNMAPPED; r->size = size; r->type = type; r->memory = memory; @@ -1386,22 +1382,35 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num, ? pci_get_bus(pci_dev)->address_space_io : pci_get_bus(pci_dev)->address_space_mem; - wmask = ~(size - 1); - if (region_num == PCI_ROM_SLOT) { - /* ROM enable bit is writable */ - wmask |= PCI_ROM_ADDRESS_ENABLE; - } - - addr = pci_bar(pci_dev, region_num); - pci_set_long(pci_dev->config + addr, type); + if (pci_is_vf(pci_dev)) { + PCIDevice *pf = pci_dev->exp.sriov_vf.pf; + assert(!pf || type == pf->exp.sriov_pf.vf_bar_type[region_num]); - if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) && - r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) { - pci_set_quad(pci_dev->wmask + addr, wmask); - pci_set_quad(pci_dev->cmask + addr, ~0ULL); + r->addr = pci_bar_address(pci_dev, region_num, r->type, r->size); + if (r->addr != PCI_BAR_UNMAPPED) { + memory_region_add_subregion_overlap(r->address_space, + r->addr, r->memory, 1); + } } else { - pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff); - pci_set_long(pci_dev->cmask + addr, 0xffffffff); + r->addr = PCI_BAR_UNMAPPED; + + wmask = ~(size - 1); + if (region_num == PCI_ROM_SLOT) { + /* ROM enable bit is writable */ + wmask |= PCI_ROM_ADDRESS_ENABLE; + } + + addr = pci_bar(pci_dev, region_num); + pci_set_long(pci_dev->config + addr, type); + + if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) && + r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) { + pci_set_quad(pci_dev->wmask + addr, wmask); + pci_set_quad(pci_dev->cmask + addr, ~0ULL); + } else { + pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff); + pci_set_long(pci_dev->cmask + addr, 0xffffffff); + } } } @@ -2162,6 +2171,11 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) } } + if (!pcie_sriov_register_device(pci_dev, errp)) { + pci_qdev_unrealize(DEVICE(pci_dev)); + return; + } + /* * A PCIe Downstream Port that do not have ARI Forwarding enabled must * associate only Device 0 with the device attached to the bus diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index 3af0cc7d560a..0c875e61fe96 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -20,6 +20,8 @@ #include "qapi/error.h" #include "trace.h" +static GHashTable *pfs; + static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs) { for (uint16_t i = 0; i < total_vfs; i++) { @@ -31,14 +33,49 @@ static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs) dev->exp.sriov_pf.vf = NULL; } -bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, - const char *vfname, uint16_t vf_dev_id, - uint16_t init_vfs, uint16_t total_vfs, - uint16_t vf_offset, uint16_t vf_stride, - Error **errp) +static void clear_ctrl_vfe(PCIDevice *dev) +{ + uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL; + pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE); +} + +static void register_vfs(PCIDevice *dev) +{ + uint16_t num_vfs; + uint16_t i; + uint16_t sriov_cap = dev->exp.sriov_cap; + + assert(sriov_cap > 0); + num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) { + clear_ctrl_vfe(dev); + return; + } + + trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), num_vfs); + for (i = 0; i < num_vfs; i++) { + pci_set_enabled(dev->exp.sriov_pf.vf[i], true); + } +} + +static void unregister_vfs(PCIDevice *dev) +{ + uint16_t i; + uint8_t *cfg = dev->config + dev->exp.sriov_cap; + + trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn)); + for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) { + pci_set_enabled(dev->exp.sriov_pf.vf[i], false); + } +} + +static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset, + uint16_t vf_dev_id, uint16_t init_vfs, + uint16_t total_vfs, uint16_t vf_offset, + uint16_t vf_stride, Error **errp) { - BusState *bus = qdev_get_parent_bus(&dev->qdev); - int32_t devfn = dev->devfn + vf_offset; uint8_t *cfg = dev->config + offset; uint8_t *wmask; @@ -100,6 +137,28 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, qdev_prop_set_bit(&dev->qdev, "multifunction", true); + return true; +} + +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, + const char *vfname, uint16_t vf_dev_id, + uint16_t init_vfs, uint16_t total_vfs, + uint16_t vf_offset, uint16_t vf_stride, + Error **errp) +{ + BusState *bus = qdev_get_parent_bus(&dev->qdev); + int32_t devfn = dev->devfn + vf_offset; + + if (pfs && g_hash_table_contains(pfs, dev->qdev.id)) { + error_setg(errp, "attaching user-created SR-IOV VF unsupported"); + return false; + } + + if (!pcie_sriov_pf_init_common(dev, offset, vf_dev_id, init_vfs, + total_vfs, vf_offset, vf_stride, errp)) { + return false; + } + dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs); for (uint16_t i = 0; i < total_vfs; i++) { @@ -129,7 +188,22 @@ void pcie_sriov_pf_exit(PCIDevice *dev) { uint8_t *cfg = dev->config + dev->exp.sriov_cap; - unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)); + if (dev->exp.sriov_pf.vf_user_created) { + uint16_t ven_id = pci_get_word(dev->config + PCI_VENDOR_ID); + uint16_t total_vfs = pci_get_word(dev->config + PCI_SRIOV_TOTAL_VF); + uint16_t vf_dev_id = pci_get_word(dev->config + PCI_SRIOV_VF_DID); + + unregister_vfs(dev); + + for (uint16_t i = 0; i < total_vfs; i++) { + dev->exp.sriov_pf.vf[i]->exp.sriov_vf.pf = NULL; + + pci_config_set_vendor_id(dev->exp.sriov_pf.vf[i]->config, ven_id); + pci_config_set_device_id(dev->exp.sriov_pf.vf[i]->config, vf_dev_id); + } + } else { + unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)); + } } void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, @@ -162,74 +236,172 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num, MemoryRegion *memory) { - PCIIORegion *r; - PCIBus *bus = pci_get_bus(dev); uint8_t type; - pcibus_t size = memory_region_size(memory); - assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */ - assert(region_num >= 0); - assert(region_num < PCI_NUM_REGIONS); + assert(dev->exp.sriov_vf.pf); type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num]; - if (!is_power_of_2(size)) { - error_report("%s: PCI region size must be a power" - " of two - type=0x%x, size=0x%"FMT_PCIBUS, - __func__, type, size); - exit(1); - } - - r = &dev->io_regions[region_num]; - r->memory = memory; - r->address_space = - type & PCI_BASE_ADDRESS_SPACE_IO - ? bus->address_space_io - : bus->address_space_mem; - r->size = size; - r->type = type; - - r->addr = pci_bar_address(dev, region_num, r->type, r->size); - if (r->addr != PCI_BAR_UNMAPPED) { - memory_region_add_subregion_overlap(r->address_space, - r->addr, r->memory, 1); - } + return pci_register_bar(dev, region_num, type, memory); } -static void clear_ctrl_vfe(PCIDevice *dev) +static gint compare_vf_devfns(gconstpointer a, gconstpointer b) { - uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL; - pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE); + return (*(PCIDevice **)a)->devfn - (*(PCIDevice **)b)->devfn; } -static void register_vfs(PCIDevice *dev) +int16_t pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, + uint16_t offset, + Error **errp) { - uint16_t num_vfs; + GPtrArray *pf; + PCIDevice **vfs; + BusState *bus = qdev_get_parent_bus(DEVICE(dev)); + uint16_t ven_id = pci_get_word(dev->config + PCI_VENDOR_ID); + uint16_t vf_dev_id; + uint16_t vf_offset; + uint16_t vf_stride; uint16_t i; - uint16_t sriov_cap = dev->exp.sriov_cap; - assert(sriov_cap > 0); - num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); - if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) { - clear_ctrl_vfe(dev); - return; + if (!pfs || !dev->qdev.id) { + return 0; } - trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn), - PCI_FUNC(dev->devfn), num_vfs); - for (i = 0; i < num_vfs; i++) { - pci_set_enabled(dev->exp.sriov_pf.vf[i], true); + pf = g_hash_table_lookup(pfs, dev->qdev.id); + if (!pf) { + return 0; + } + + if (pf->len > UINT16_MAX) { + error_setg(errp, "too many VFs"); + return -1; + } + + g_ptr_array_sort(pf, compare_vf_devfns); + vfs = (void *)pf->pdata; + + if (vfs[0]->devfn <= dev->devfn) { + error_setg(errp, "a VF function number is less than the PF function number"); + return -1; + } + + vf_dev_id = pci_get_word(vfs[0]->config + PCI_DEVICE_ID); + vf_offset = vfs[0]->devfn - dev->devfn; + vf_stride = pf->len < 2 ? 0 : vfs[1]->devfn - vfs[0]->devfn; + + for (i = 0; i < pf->len; i++) { + if (bus != qdev_get_parent_bus(&vfs[i]->qdev)) { + error_setg(errp, "SR-IOV VF parent bus mismatches with PF"); + return -1; + } + + if (ven_id != pci_get_word(vfs[i]->config + PCI_VENDOR_ID)) { + error_setg(errp, "SR-IOV VF vendor ID mismatches with PF"); + return -1; + } + + if (vf_dev_id != pci_get_word(vfs[i]->config + PCI_DEVICE_ID)) { + error_setg(errp, "inconsistent SR-IOV VF device IDs"); + return -1; + } + + for (size_t j = 0; j < PCI_NUM_REGIONS; j++) { + if (vfs[i]->io_regions[j].size != vfs[0]->io_regions[j].size || + vfs[i]->io_regions[j].type != vfs[0]->io_regions[j].type) { + error_setg(errp, "inconsistent SR-IOV BARs"); + return -1; + } + } + + if (vfs[i]->devfn - vfs[0]->devfn != vf_stride * i) { + error_setg(errp, "inconsistent SR-IOV stride"); + return -1; + } + } + + if (!pcie_sriov_pf_init_common(dev, offset, vf_dev_id, pf->len, + pf->len, vf_offset, vf_stride, errp)) { + return -1; + } + + for (i = 0; i < pf->len; i++) { + vfs[i]->exp.sriov_vf.pf = dev; + vfs[i]->exp.sriov_vf.vf_number = i; + + /* set vid/did according to sr/iov spec - they are not used */ + pci_config_set_vendor_id(vfs[i]->config, 0xffff); + pci_config_set_device_id(vfs[i]->config, 0xffff); } + + dev->exp.sriov_pf.vf = vfs; + dev->exp.sriov_pf.vf_user_created = true; + + for (i = 0; i < PCI_NUM_REGIONS; i++) { + PCIIORegion *region = &vfs[0]->io_regions[i]; + + if (region->size) { + pcie_sriov_pf_init_vf_bar(dev, i, region->type, region->size); + } + } + + return PCI_EXT_CAP_SRIOV_SIZEOF; } -static void unregister_vfs(PCIDevice *dev) +bool pcie_sriov_register_device(PCIDevice *dev, Error **errp) { - uint16_t i; - uint8_t *cfg = dev->config + dev->exp.sriov_cap; + if (!dev->exp.sriov_pf.vf && dev->qdev.id && + pfs && g_hash_table_contains(pfs, dev->qdev.id)) { + error_setg(errp, "attaching user-created SR-IOV VF unsupported"); + return false; + } - trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn), - PCI_FUNC(dev->devfn)); - for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) { - pci_set_enabled(dev->exp.sriov_pf.vf[i], false); + if (dev->sriov_pf) { + PCIDevice *pci_pf; + GPtrArray *pf; + + if (!PCI_DEVICE_GET_CLASS(dev)->sriov_vf_user_creatable) { + error_setg(errp, "user cannot create SR-IOV VF with this device type"); + return false; + } + + if (!pci_is_express(dev)) { + error_setg(errp, "PCI Express is required for SR-IOV VF"); + return false; + } + + if (!pci_qdev_find_device(dev->sriov_pf, &pci_pf)) { + error_setg(errp, "PCI device specified as SR-IOV PF already exists"); + return false; + } + + if (!pfs) { + pfs = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, NULL); + } + + pf = g_hash_table_lookup(pfs, dev->sriov_pf); + if (!pf) { + pf = g_ptr_array_new(); + g_hash_table_insert(pfs, g_strdup(dev->sriov_pf), pf); + } + + g_ptr_array_add(pf, dev); + } + + return true; +} + +void pcie_sriov_unregister_device(PCIDevice *dev) +{ + if (dev->sriov_pf && pfs) { + GPtrArray *pf = g_hash_table_lookup(pfs, dev->sriov_pf); + + if (pf) { + g_ptr_array_remove_fast(pf, dev); + + if (!pf->len) { + g_hash_table_remove(pfs, dev->sriov_pf); + g_ptr_array_free(pf, FALSE); + } + } } } @@ -316,7 +488,7 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize) uint16_t pcie_sriov_vf_number(PCIDevice *dev) { - assert(pci_is_vf(dev)); + assert(dev->exp.sriov_vf.pf); return dev->exp.sriov_vf.vf_number; } -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 6/8] virtio-pci: Implement SR-IOV PF 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (4 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 5/8] pcie_sriov: Allow user to create SR-IOV device Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 7/8] virtio-net: Implement SR-IOV VF Akihiko Odaki ` (2 subsequent siblings) 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki Allow user to attach SR-IOV VF to a virtio-pci PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- include/hw/virtio/virtio-pci.h | 1 + hw/virtio/virtio-pci.c | 20 +++++++++++++++----- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h index 9e67ba38c748..34539f2f6722 100644 --- a/include/hw/virtio/virtio-pci.h +++ b/include/hw/virtio/virtio-pci.h @@ -152,6 +152,7 @@ struct VirtIOPCIProxy { uint32_t modern_io_bar_idx; uint32_t modern_mem_bar_idx; int config_cap; + uint16_t last_pcie_cap_offset; uint32_t flags; bool disable_modern; bool ignore_backend_features; diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 9534730bba19..0c8fcc5627d5 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -1955,6 +1955,7 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp) uint8_t *config; uint32_t size; VirtIODevice *vdev = virtio_bus_get_device(bus); + int16_t res; /* * Virtio capabilities present without @@ -2100,6 +2101,14 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp) pci_register_bar(&proxy->pci_dev, proxy->legacy_io_bar_idx, PCI_BASE_ADDRESS_SPACE_IO, &proxy->bar); } + + res = pcie_sriov_pf_init_from_user_created_vfs(&proxy->pci_dev, + proxy->last_pcie_cap_offset, + errp); + if (res > 0) { + proxy->last_pcie_cap_offset += res; + virtio_add_feature(&vdev->host_features, VIRTIO_F_SR_IOV); + } } static void virtio_pci_device_unplugged(DeviceState *d) @@ -2187,7 +2196,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) if (pcie_port && pci_is_express(pci_dev)) { int pos; - uint16_t last_pcie_cap_offset = PCI_CONFIG_SPACE_SIZE; + proxy->last_pcie_cap_offset = PCI_CONFIG_SPACE_SIZE; pos = pcie_endpoint_cap_init(pci_dev, 0); assert(pos > 0); @@ -2207,9 +2216,9 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3); if (proxy->flags & VIRTIO_PCI_FLAG_AER) { - pcie_aer_init(pci_dev, PCI_ERR_VER, last_pcie_cap_offset, + pcie_aer_init(pci_dev, PCI_ERR_VER, proxy->last_pcie_cap_offset, PCI_ERR_SIZEOF, NULL); - last_pcie_cap_offset += PCI_ERR_SIZEOF; + proxy->last_pcie_cap_offset += PCI_ERR_SIZEOF; } if (proxy->flags & VIRTIO_PCI_FLAG_INIT_DEVERR) { @@ -2234,9 +2243,9 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp) } if (proxy->flags & VIRTIO_PCI_FLAG_ATS) { - pcie_ats_init(pci_dev, last_pcie_cap_offset, + pcie_ats_init(pci_dev, proxy->last_pcie_cap_offset, proxy->flags & VIRTIO_PCI_FLAG_ATS_PAGE_ALIGNED); - last_pcie_cap_offset += PCI_EXT_CAP_ATS_SIZEOF; + proxy->last_pcie_cap_offset += PCI_EXT_CAP_ATS_SIZEOF; } if (proxy->flags & VIRTIO_PCI_FLAG_INIT_FLR) { @@ -2263,6 +2272,7 @@ static void virtio_pci_exit(PCIDevice *pci_dev) bool pcie_port = pci_bus_is_express(pci_get_bus(pci_dev)) && !pci_bus_is_root(pci_get_bus(pci_dev)); + pcie_sriov_pf_exit(&proxy->pci_dev); msix_uninit_exclusive_bar(pci_dev); if (proxy->flags & VIRTIO_PCI_FLAG_AER && pcie_port && pci_is_express(pci_dev)) { -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 7/8] virtio-net: Implement SR-IOV VF 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (5 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 6/8] virtio-pci: Implement SR-IOV PF Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 8/8] docs: Document composable SR-IOV device Akihiko Odaki 2024-07-30 11:37 ` [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Michael S. Tsirkin 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki A virtio-net device can be added as a SR-IOV VF to another virtio-pci device that will be the PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/virtio/virtio-net-pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c index e03543a70a75..dba4987d6e04 100644 --- a/hw/virtio/virtio-net-pci.c +++ b/hw/virtio/virtio-net-pci.c @@ -75,6 +75,7 @@ static void virtio_net_pci_class_init(ObjectClass *klass, void *data) k->device_id = PCI_DEVICE_ID_VIRTIO_NET; k->revision = VIRTIO_PCI_ABI_VERSION; k->class_id = PCI_CLASS_NETWORK_ETHERNET; + k->sriov_vf_user_creatable = true; set_bit(DEVICE_CATEGORY_NETWORK, dc->categories); device_class_set_props(dc, virtio_net_properties); vpciklass->realize = virtio_net_pci_realize; -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v5 8/8] docs: Document composable SR-IOV device 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (6 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 7/8] virtio-net: Implement SR-IOV VF Akihiko Odaki @ 2024-07-15 5:19 ` Akihiko Odaki 2024-07-30 11:37 ` [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Michael S. Tsirkin 8 siblings, 0 replies; 15+ messages in thread From: Akihiko Odaki @ 2024-07-15 5:19 UTC (permalink / raw) To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- MAINTAINERS | 1 + docs/system/index.rst | 1 + docs/system/sriov.rst | 36 ++++++++++++++++++++++++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 6725913c8b3a..ca0222e5528b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2008,6 +2008,7 @@ F: hw/pci-bridge/* F: qapi/pci.json F: docs/pci* F: docs/specs/*pci* +F: docs/system/sriov.rst PCIE DOE M: Huai-Cheng Kuo <hchkuo@avery-design.com.tw> diff --git a/docs/system/index.rst b/docs/system/index.rst index c21065e51932..718e9d3c56bb 100644 --- a/docs/system/index.rst +++ b/docs/system/index.rst @@ -39,3 +39,4 @@ or Hypervisor.Framework. multi-process confidential-guest-support vm-templating + sriov diff --git a/docs/system/sriov.rst b/docs/system/sriov.rst new file mode 100644 index 000000000000..a851a66a4b8b --- /dev/null +++ b/docs/system/sriov.rst @@ -0,0 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Compsable SR-IOV device +======================= + +SR-IOV (Single Root I/O Virtualization) is an optional extended capability of a +PCI Express device. It allows a single physical function (PF) to appear as +multiple virtual functions (VFs) for the main purpose of eliminating software +overhead in I/O from virtual machines. + +There are devices with predefined SR-IOV configurations, but it is also possible +to compose an SR-IOV device yourself. Composing an SR-IOV device is currently +only supported by virtio-net-pci. + +Users can configure an SR-IOV-capable virtio-net device by adding +virtio-net-pci functions to a bus. Below is a command line example: + +.. code-block:: shell + + -netdev user,id=n -netdev user,id=o + -netdev user,id=p -netdev user,id=q + -device pcie-root-port,id=b + -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f + +The VFs specify the paired PF with ``sriov-pf`` property. The PF must be +added after all VFs. It is the user's responsibility to ensure that VFs have +function numbers larger than one of the PF, and that the function numbers +have a consistent stride. + +You may also need to perform additional steps to activate the SR-IOV feature on +your guest. For Linux, refer to [1]_. + +.. [1] https://docs.kernel.org/PCI/pci-iov-howto.html -- 2.45.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki ` (7 preceding siblings ...) 2024-07-15 5:19 ` [PATCH v5 8/8] docs: Document composable SR-IOV device Akihiko Odaki @ 2024-07-30 11:37 ` Michael S. Tsirkin 2024-07-30 12:26 ` Akihiko Odaki 8 siblings, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2024-07-30 11:37 UTC (permalink / raw) To: Akihiko Odaki Cc: Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: > Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") OK I will revert this for now. We'll try again after the release, there will be time to address s390. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-07-30 11:37 ` [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Michael S. Tsirkin @ 2024-07-30 12:26 ` Akihiko Odaki 2024-07-30 17:56 ` Michael S. Tsirkin 0 siblings, 1 reply; 15+ messages in thread From: Akihiko Odaki @ 2024-07-30 12:26 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On 2024/07/30 20:37, Michael S. Tsirkin wrote: > On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: >> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> >> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") > > OK I will revert this for now. We'll try again after the release, > there will be time to address s390. > I'd like to know if anybody wants to use igb on a s390x machine configured with libvirt. Such a configuration is already kind of broken, and it is likely to require significant effort on both side of libvirt and QEMU to fix it. As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It does not suffer the same problem with igb thanks to its flexible configuration mechanism. Regards, Akihiko Odaki ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-07-30 12:26 ` Akihiko Odaki @ 2024-07-30 17:56 ` Michael S. Tsirkin 2024-07-31 8:58 ` Cédric Le Goater 0 siblings, 1 reply; 15+ messages in thread From: Michael S. Tsirkin @ 2024-07-30 17:56 UTC (permalink / raw) To: Akihiko Odaki Cc: Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote: > On 2024/07/30 20:37, Michael S. Tsirkin wrote: > > On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: > > > Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> > > > ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") > > > > OK I will revert this for now. We'll try again after the release, > > there will be time to address s390. > > > > I'd like to know if anybody wants to use igb on a s390x machine configured > with libvirt. Such a configuration is already kind of broken, and it is > likely to require significant effort on both side of libvirt and QEMU to fix > it. I assume Cédric wouldn't report it if was unused. > As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It > does not suffer the same problem with igb thanks to its flexible > configuration mechanism. > > Regards, > Akihiko Odaki Sounds like this needs more review time, anyway. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-07-30 17:56 ` Michael S. Tsirkin @ 2024-07-31 8:58 ` Cédric Le Goater 2024-08-01 7:13 ` Akihiko Odaki 0 siblings, 1 reply; 15+ messages in thread From: Cédric Le Goater @ 2024-07-31 8:58 UTC (permalink / raw) To: Michael S. Tsirkin, Akihiko Odaki Cc: Marcel Apfelbaum, Alex Williamson, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On 7/30/24 19:56, Michael S. Tsirkin wrote: > On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote: >> On 2024/07/30 20:37, Michael S. Tsirkin wrote: >>> On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: >>>> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> >>>> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") >>> >>> OK I will revert this for now. We'll try again after the release, >>> there will be time to address s390. >>> >> >> I'd like to know if anybody wants to use igb on a s390x machine configured >> with libvirt. Such a configuration is already kind of broken, and it is >> likely to require significant effort on both side of libvirt and QEMU to fix >> it. > > > I assume Cédric wouldn't report it if was unused. > > >> As an alternative, I'm also introducing SR-IOV support to virtio-net-pci. It >> does not suffer the same problem with igb thanks to its flexible >> configuration mechanism. >> >> Regards, >> Akihiko Odaki > > Sounds like this needs more review time, anyway. > > Using an emulated IGB device in a s390x VM is not a common scenario. The IGB device is not supported downstream (RHEL on Z). However, the change broke the s390x machines I use for upstream QEMU development, I removed the IGB device as a fix for now. The Z PCI device model has 'uid' and 'fid' properties which are set by the VMM, and in this case, they are auto-generated, hence the conflicting ids with libvirt. This is Z specific but, possibly there are subtle use cases on other platforms which could have similar consequences. Something to be aware of. Also, and this is why I thought it was important to report. Creating PCI devices at machine init time (with -nodefaults) in the back of the management layer is a no-no. libvirt requests to have full control on the machine topology and this change seems like a violation of this rule, even if VFs are kind of special. Daniel, did I understand correctly the above constraint on the machine definition ? I can't say if we should revert for 9.1. Again, this Z is specific. The changes were introduced to catch errors early when the PF device is realized. There is no doubt they are useful. Some rework is needed to avoid the conflicting id issue though. Thanks, C. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-07-31 8:58 ` Cédric Le Goater @ 2024-08-01 7:13 ` Akihiko Odaki 2024-08-01 7:51 ` Michael S. Tsirkin 0 siblings, 1 reply; 15+ messages in thread From: Akihiko Odaki @ 2024-08-01 7:13 UTC (permalink / raw) To: Cédric Le Goater, Michael S. Tsirkin Cc: Marcel Apfelbaum, Alex Williamson, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On 2024/07/31 17:58, Cédric Le Goater wrote: > On 7/30/24 19:56, Michael S. Tsirkin wrote: >> On Tue, Jul 30, 2024 at 09:26:20PM +0900, Akihiko Odaki wrote: >>> On 2024/07/30 20:37, Michael S. Tsirkin wrote: >>>> On Mon, Jul 15, 2024 at 02:19:06PM +0900, Akihiko Odaki wrote: >>>>> Based-on: <20240714-rombar-v2-0-af1504ef55de@daynix.com> >>>>> ("[PATCH v2 0/4] hw/pci: Convert rom_bar into OnOffAuto") >>>> >>>> OK I will revert this for now. We'll try again after the release, >>>> there will be time to address s390. >>>> >>> >>> I'd like to know if anybody wants to use igb on a s390x machine >>> configured >>> with libvirt. Such a configuration is already kind of broken, and it is >>> likely to require significant effort on both side of libvirt and QEMU >>> to fix >>> it. >> >> >> I assume Cédric wouldn't report it if was unused. >> >> >>> As an alternative, I'm also introducing SR-IOV support to >>> virtio-net-pci. It >>> does not suffer the same problem with igb thanks to its flexible >>> configuration mechanism. >>> >>> Regards, >>> Akihiko Odaki >> >> Sounds like this needs more review time, anyway. >> >> > > Using an emulated IGB device in a s390x VM is not a common scenario. > The IGB device is not supported downstream (RHEL on Z). However, the > change broke the s390x machines I use for upstream QEMU development, > I removed the IGB device as a fix for now. > > The Z PCI device model has 'uid' and 'fid' properties which are set > by the VMM, and in this case, they are auto-generated, hence the > conflicting ids with libvirt. This is Z specific but, possibly there > are subtle use cases on other platforms which could have similar > consequences. Something to be aware of. > > > Also, and this is why I thought it was important to report. Creating > PCI devices at machine init time (with -nodefaults) in the back of the > management layer is a no-no. libvirt requests to have full control > on the machine topology and this change seems like a violation of > this rule, even if VFs are kind of special. > > Daniel, did I understand correctly the above constraint on the machine > definition ? It is problematic even if it is not init time. QEMU shouldn't implicitly add a PCI device without letting the management layer know. For the virtio-net SR-IOV support which I have been preparing, I'm avoiding automatically adding PCI devices but instead letting the user create VFs. However, fixing existing SR-IOV devices (igb and nvme) will require more efforts and possible breaking changes of the command line. > > > I can't say if we should revert for 9.1. Again, this Z is specific. > The changes were introduced to catch errors early when the PF device > is realized. There is no doubt they are useful. Some rework is needed > to avoid the conflicting id issue though. I think it is a good idea to revert these patches for now, and add them back along with the virtio-net-pci SR-IOV support when the next merge window starts though I don't require to do so. If someone experiences this problem in the next merge window, we can advise to use virtio-net-pci, or to specify qemu:commandline to workaround it. We can start thinking of making a breaking change if neither of them satisfies the requirement. Regards, Akihiko Odaki ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation 2024-08-01 7:13 ` Akihiko Odaki @ 2024-08-01 7:51 ` Michael S. Tsirkin 0 siblings, 0 replies; 15+ messages in thread From: Michael S. Tsirkin @ 2024-08-01 7:51 UTC (permalink / raw) To: Akihiko Odaki Cc: Cédric Le Goater, Marcel Apfelbaum, Alex Williamson, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch, Klaus Jensen, qemu-devel, qemu-block, Yui Washizu On Thu, Aug 01, 2024 at 04:13:14PM +0900, Akihiko Odaki wrote: > I think it is a good idea to revert these patches for now OK I reverted the 2 patchsets. there were some bugfixes there but I had to revert them too due to the dependency. If appropriate, feel free to resubmit just the fixes. -- MST ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-08-01 7:51 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-07-15 5:19 [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 1/8] hw/pci: Do not add ROM BAR for SR-IOV VF Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 2/8] hw/pci: Fix SR-IOV VF number calculation Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 3/8] pcie_sriov: Ensure PF and VF are mutually exclusive Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 4/8] pcie_sriov: Check PCI Express for SR-IOV PF Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 5/8] pcie_sriov: Allow user to create SR-IOV device Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 6/8] virtio-pci: Implement SR-IOV PF Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 7/8] virtio-net: Implement SR-IOV VF Akihiko Odaki 2024-07-15 5:19 ` [PATCH v5 8/8] docs: Document composable SR-IOV device Akihiko Odaki 2024-07-30 11:37 ` [PATCH v5 0/8] virtio-net: add support for SR-IOV emulation Michael S. Tsirkin 2024-07-30 12:26 ` Akihiko Odaki 2024-07-30 17:56 ` Michael S. Tsirkin 2024-07-31 8:58 ` Cédric Le Goater 2024-08-01 7:13 ` Akihiko Odaki 2024-08-01 7:51 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).