* [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
@ 2024-04-28  9:05 Akihiko Odaki
  2024-04-28  9:05 ` [PATCH RFC v4 1/7] hw/pci: Do not add ROM BAR for SR-IOV VF Akihiko Odaki
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
Based-on: <20240315-reuse-v9-0-67aa69af4d53@daynix.com>
("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements")

Introduction
------------

This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful for
testing SR-IOV support in the guest, or for exposing several vDPA devices
in a VM. vDPA devices can also provide an L2 switching feature for
offloading, though allowing the guest to configure such a feature is
out of scope.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because other devices have not been validated.

User Interface
--------------

A user can configure an SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:

  -netdev user,id=n -netdev user,id=o
  -netdev user,id=p -netdev user,id=q
  -device pcie-root-port,id=b
  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with the "sriov-pf" property. The PF must
be added after all VFs. It is the user's responsibility to ensure that
the VFs have function numbers larger than that of the PF, and that the
function numbers have a consistent stride.
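
To make the function number constraints above concrete, here is a
minimal standalone sketch of such a layout check. It is not code from
this series: the function name and the flat array of function numbers
are made up for illustration, and the array is assumed to be sorted in
ascending order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this series): returns true if every VF
 * function number is larger than the PF's and the VFs are evenly spaced.
 * vf_devfns is assumed to be sorted in ascending order.
 */
bool vf_layout_is_consistent(uint8_t pf_devfn, const uint8_t *vf_devfns,
                             size_t len)
{
    int stride;

    if (len == 0) {
        return true;
    }

    /* The first (lowest) VF must sit above the PF. */
    if (vf_devfns[0] <= pf_devfn) {
        return false;
    }

    /* A single VF has no meaningful stride; use zero for it. */
    stride = len < 2 ? 0 : vf_devfns[1] - vf_devfns[0];
    if (len >= 2 && stride <= 0) {
        return false;
    }

    /* Each VF must sit exactly i strides above the first VF. */
    for (size_t i = 0; i < len; i++) {
        if (vf_devfns[i] - vf_devfns[0] != stride * (int)i) {
            return false;
        }
    }

    return true;
}
```

For the command line above, the PF is function 0 and the VFs are
functions 1, 2 and 3, so the offset is 1, the stride is 1, and the
check passes.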

Keeping VF instances
--------------------

A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation: in this proposal,
device options passed on the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and are
not available after that, which makes realizing VFs at runtime
impossible.

As an alternative strategy to runtime realization/unrealization, this
series proposes to reuse the code to power down PCI Express devices.
When a PCI Express device is powered down, it will be hidden from the
guest but will be kept realized. This effectively implements the
behavior we need for the SR-IOV emulation.
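
The idea can be illustrated with a toy model. This is only a sketch
under simplifying assumptions, not the code in this series: the struct
and function names are hypothetical, and a VF is reduced to a single
visibility flag. The point is that the VF objects are never created or
destroyed when the guest changes the number of VFs; only their
visibility is toggled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a realized VF; only visibility is modeled. */
struct toy_vf {
    bool enabled;   /* true when the VF is visible to the guest */
};

/*
 * Toy model of the guest requesting num_vfs VFs: every VF stays
 * allocated, and only its visibility changes. Returns false when the
 * request exceeds the number of VFs that exist; in that case all VFs
 * are left hidden (a simplification made for this sketch).
 */
bool toy_apply_num_vfs(struct toy_vf *vfs, uint16_t total_vfs,
                       uint16_t num_vfs)
{
    uint16_t i;

    /* Start from a clean state: hide every VF. */
    for (i = 0; i < total_vfs; i++) {
        vfs[i].enabled = false;
    }

    if (num_vfs > total_vfs) {
        return false;
    }

    /* Expose the first num_vfs VFs without realizing anything. */
    for (i = 0; i < num_vfs; i++) {
        vfs[i].enabled = true;
    }

    return true;
}
```

Because nothing is realized at runtime in this model, the options each
VF was created with never need to be parsed again.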

Summary
-------

Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for
VFs.
Patch 2 makes a zero stride valid for a single-VF configuration.
Patches 3 and 4 add validations.
Patch 5 adds user-created SR-IOV VF infrastructure.
Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs.
Patch 7 allows the user to create SR-IOV VFs with virtio-net-pci.

[1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/

Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v4:
- Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division
  by zero reported by Yui Washizu.
- Rebased.
- Link to v3: https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770372@daynix.com
Changes in v3:
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com
Changes in v2:
- Changed to keep VF instances.
- Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com
---
Akihiko Odaki (7):
      hw/pci: Do not add ROM BAR for SR-IOV VF
      hw/pci: Fix SR-IOV VF number calculation
      pcie_sriov: Ensure PF and VF are mutually exclusive
      pcie_sriov: Check PCI Express for SR-IOV PF
      pcie_sriov: Allow user to create SR-IOV device
      virtio-pci: Implement SR-IOV PF
      virtio-net: Implement SR-IOV VF
 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c                |  76 +++++++----
 hw/pci/pcie_sriov.c         | 298 +++++++++++++++++++++++++++++++++++---------
 hw/virtio/virtio-net-pci.c  |   1 +
 hw/virtio/virtio-pci.c      |   7 ++
 6 files changed, 323 insertions(+), 84 deletions(-)
---
base-commit: 2ac5458086ab61282f30c2f8bdf2ae9a0a06a75d
change-id: 20231202-sriov-9402fb262be8
Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>
* [PATCH RFC v4 1/7] hw/pci: Do not add ROM BAR for SR-IOV VF
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
An SR-IOV VF cannot have a ROM BAR.
Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pci.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index cb5ac46e9f27..201ff64e11cc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2359,6 +2359,14 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
         return;
     }
 
+    if (pci_is_vf(pdev)) {
+        if (pdev->rom_bar != UINT32_MAX) {
+            error_setg(errp, "ROM BAR cannot be enabled for SR-IOV VF");
+        }
+
+        return;
+    }
+
     if (load_file || pdev->romsize == UINT32_MAX) {
         path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
         if (path == NULL) {
-- 
2.44.0
* [PATCH RFC v4 2/7] hw/pci: Fix SR-IOV VF number calculation
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs
to be non-zero when there are multiple VFs, but the specification does
not prohibit it from being zero when there is only one VF.

Do not perform the division for the first VF to avoid division by zero.
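
As an illustration, the guarded calculation can be modeled in isolation
as below. This is a standalone sketch, not the code in hw/pci/pci.c:
the function name and the plain integer parameters are hypothetical
stand-ins for the values read from the PF's configuration space.

```c
#include <stdint.h>

/*
 * Hypothetical standalone model of the VF number calculation: the
 * distance from the first VF is divided by the stride only when it is
 * non-zero, so the first VF never triggers a division, even when the
 * stride is zero.
 */
uint32_t toy_vf_number(uint8_t vf_devfn, uint8_t pf_devfn,
                       uint16_t vf_offset, uint16_t vf_stride)
{
    /* Distance of this VF from the first VF, in function numbers. */
    uint32_t vf_num = vf_devfn - (pf_devfn + vf_offset);

    /* Only VFs after the first one need the division. */
    if (vf_num) {
        vf_num /= vf_stride;
    }

    return vf_num;
}
```

With a single VF at function 1, a PF at function 0, an offset of 1 and
a stride of 0, the distance is zero and the division is skipped.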
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pci.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 201ff64e11cc..dbecb3d4aa42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1437,7 +1437,11 @@ static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
             pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
         uint16_t vf_stride =
             pci_get_word(pf->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-        uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
+        uint32_t vf_num = d->devfn - (pf->devfn + vf_offset);
+
+        if (vf_num) {
+            vf_num /= vf_stride;
+        }
 
         if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
             new_addr = pci_get_quad(pf->config + bar);
-- 
2.44.0
* [PATCH RFC v4 3/7] pcie_sriov: Ensure PF and VF are mutually exclusive
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
A device cannot be an SR-IOV PF and a VF at the same time.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 56523ab4e833..ec8fc0757b92 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
+    if (pci_is_vf(dev)) {
+        error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time");
+        return false;
+    }
+
     if (total_vfs) {
         uint16_t ari_cap = pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI);
         uint16_t first_vf_devfn = dev->devfn + vf_offset;
-- 
2.44.0
* [PATCH RFC v4 4/7] pcie_sriov: Check PCI Express for SR-IOV PF
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
SR-IOV requires PCI Express.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index ec8fc0757b92..3af0cc7d560a 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -42,6 +42,11 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
+    if (!pci_is_express(dev)) {
+        error_setg(errp, "PCI Express is required for SR-IOV PF");
+        return false;
+    }
+
     if (pci_is_vf(dev)) {
         error_setg(errp, "a device cannot be a SR-IOV PF and a VF at the same time");
         return false;
-- 
2.44.0
* [PATCH RFC v4 5/7] pcie_sriov: Allow user to create SR-IOV device
From: Akihiko Odaki @ 2024-04-28  9:05 UTC
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
A user can create an SR-IOV device by specifying the PF with the
sriov-pf property of the VFs. The VFs must be added before the PF.

A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable
set. Such a VF cannot refer to the PF because it is created before the
PF.

A PF to which user-creatable VFs can be attached calls
pcie_sriov_pf_init_from_user_created_vfs() during realization and
pcie_sriov_pf_exit() when exiting.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pci_device.h |   6 +-
 include/hw/pci/pcie_sriov.h |  19 +++
 hw/pci/pci.c                |  62 ++++++----
 hw/pci/pcie_sriov.c         | 288 +++++++++++++++++++++++++++++++++++---------
 4 files changed, 292 insertions(+), 83 deletions(-)
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index 6be0f989ebe0..eefd9d9a7b5a 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -37,6 +37,8 @@ struct PCIDeviceClass {
     uint16_t subsystem_id;              /* only for header type = 0 */
 
     const char *romfile;                /* rom bar */
+
+    bool sriov_vf_user_creatable;
 };
 
 enum PCIReqIDType {
@@ -160,6 +162,8 @@ struct PCIDevice {
     /* ID of standby device in net_failover pair */
     char *failover_pair_id;
     uint32_t acpi_index;
+
+    char *sriov_pf;
 };
 
 static inline int pci_intx(PCIDevice *pci_dev)
@@ -192,7 +196,7 @@ static inline int pci_is_express_downstream_port(const PCIDevice *d)
 
 static inline int pci_is_vf(const PCIDevice *d)
 {
-    return d->exp.sriov_vf.pf != NULL;
+    return d->sriov_pf || d->exp.sriov_vf.pf != NULL;
 }
 
 static inline uint32_t pci_config_size(const PCIDevice *d)
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index d576a8c6be19..20626b5605c9 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,6 +18,7 @@
 struct PCIESriovPF {
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
+    bool vf_user_created; /* If VFs are created by user */
 };
 
 struct PCIESriovVF {
@@ -40,6 +41,24 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
 void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
                                 MemoryRegion *memory);
 
+/**
+ * pcie_sriov_pf_init_from_user_created_vfs() - Initialize PF with user-created
+ *                                              VFs.
+ * @dev: A PCIe device being realized.
+ * @offset: The offset of the SR-IOV capability.
+ * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Return:
+ * * true - @dev is initialized as a PCIe SR-IOV PF.
+ * * false - @dev is not initialized because there are no SR-IOV VFs or an error
+ *           occurred.
+ */
+bool pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, uint16_t offset,
+                                              Error **errp);
+
+bool pcie_sriov_register_device(PCIDevice *dev, Error **errp);
+void pcie_sriov_unregister_device(PCIDevice *dev);
+
 /*
  * Default (minimal) page size support values
  * as required by the SR/IOV standard:
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dbecb3d4aa42..e79bb8b6b6fa 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -85,6 +85,7 @@ static Property pci_props[] = {
                     QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
     DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
                     QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
+    DEFINE_PROP_STRING("sriov-pf", PCIDevice, sriov_pf),
     DEFINE_PROP_END_OF_LIST()
 };
 
@@ -959,13 +960,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp)
         dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
     }
 
-    /*
-     * With SR/IOV and ARI, a device at function 0 need not be a multifunction
-     * device, as it may just be a VF that ended up with function 0 in
-     * the legacy PCI interpretation. Avoid failing in such cases:
-     */
-    if (pci_is_vf(dev) &&
-        dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+    /* SR/IOV is not handled here. */
+    if (pci_is_vf(dev)) {
         return;
     }
 
@@ -998,7 +994,8 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp)
     }
     /* function 0 indicates single function, so function > 0 must be NULL */
     for (func = 1; func < PCI_FUNC_MAX; ++func) {
-        if (bus->devices[PCI_DEVFN(slot, func)]) {
+        PCIDevice *device = bus->devices[PCI_DEVFN(slot, func)];
+        if (device && !pci_is_vf(device)) {
             error_setg(errp, "PCI: %x.0 indicates single function, "
                        "but %x.%x is already populated.",
                        slot, slot, func);
@@ -1283,6 +1280,7 @@ static void pci_qdev_unrealize(DeviceState *dev)
 
     pci_unregister_io_regions(pci_dev);
     pci_del_option_rom(pci_dev);
+    pcie_sriov_unregister_device(pci_dev);
 
     if (pc->exit) {
         pc->exit(pci_dev);
@@ -1314,7 +1312,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     pcibus_t size = memory_region_size(memory);
     uint8_t hdr_type;
 
-    assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
     assert(region_num >= 0);
     assert(region_num < PCI_NUM_REGIONS);
     assert(is_power_of_2(size));
@@ -1325,7 +1322,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     assert(hdr_type != PCI_HEADER_TYPE_BRIDGE || region_num < 2);
 
     r = &pci_dev->io_regions[region_num];
-    r->addr = PCI_BAR_UNMAPPED;
     r->size = size;
     r->type = type;
     r->memory = memory;
@@ -1333,22 +1329,35 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
                         ? pci_get_bus(pci_dev)->address_space_io
                         : pci_get_bus(pci_dev)->address_space_mem;
 
-    wmask = ~(size - 1);
-    if (region_num == PCI_ROM_SLOT) {
-        /* ROM enable bit is writable */
-        wmask |= PCI_ROM_ADDRESS_ENABLE;
-    }
-
-    addr = pci_bar(pci_dev, region_num);
-    pci_set_long(pci_dev->config + addr, type);
+    if (pci_is_vf(pci_dev)) {
+        PCIDevice *pf = pci_dev->exp.sriov_vf.pf;
+        assert(!pf || type == pf->exp.sriov_pf.vf_bar_type[region_num]);
 
-    if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
-        r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
-        pci_set_quad(pci_dev->wmask + addr, wmask);
-        pci_set_quad(pci_dev->cmask + addr, ~0ULL);
+        r->addr = pci_bar_address(pci_dev, region_num, r->type, r->size);
+        if (r->addr != PCI_BAR_UNMAPPED) {
+            memory_region_add_subregion_overlap(r->address_space,
+                                                r->addr, r->memory, 1);
+        }
     } else {
-        pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff);
-        pci_set_long(pci_dev->cmask + addr, 0xffffffff);
+        r->addr = PCI_BAR_UNMAPPED;
+
+        wmask = ~(size - 1);
+        if (region_num == PCI_ROM_SLOT) {
+            /* ROM enable bit is writable */
+            wmask |= PCI_ROM_ADDRESS_ENABLE;
+        }
+
+        addr = pci_bar(pci_dev, region_num);
+        pci_set_long(pci_dev->config + addr, type);
+
+        if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
+            r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+            pci_set_quad(pci_dev->wmask + addr, wmask);
+            pci_set_quad(pci_dev->cmask + addr, ~0ULL);
+        } else {
+            pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff);
+            pci_set_long(pci_dev->cmask + addr, 0xffffffff);
+        }
     }
 }
 
@@ -2109,6 +2118,11 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         }
     }
 
+    if (!pcie_sriov_register_device(pci_dev, errp)) {
+        pci_qdev_unrealize(DEVICE(pci_dev));
+        return;
+    }
+
     /*
      * A PCIe Downstream Port that do not have ARI Forwarding enabled must
      * associate only Device 0 with the device attached to the bus
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 3af0cc7d560a..183d6f17d606 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -20,6 +20,8 @@
 #include "qapi/error.h"
 #include "trace.h"
 
+static GHashTable *pfs;
+
 static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
 {
     for (uint16_t i = 0; i < total_vfs; i++) {
@@ -31,14 +33,49 @@ static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
     dev->exp.sriov_pf.vf = NULL;
 }
 
-bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
-                        const char *vfname, uint16_t vf_dev_id,
-                        uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride,
-                        Error **errp)
+static void clear_ctrl_vfe(PCIDevice *dev)
+{
+    uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL;
+    pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE);
+}
+
+static void register_vfs(PCIDevice *dev)
+{
+    uint16_t num_vfs;
+    uint16_t i;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+
+    assert(sriov_cap > 0);
+    num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+    if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+        clear_ctrl_vfe(dev);
+        return;
+    }
+
+    trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
+                             PCI_FUNC(dev->devfn), num_vfs);
+    for (i = 0; i < num_vfs; i++) {
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
+    }
+}
+
+static void unregister_vfs(PCIDevice *dev)
+{
+    uint16_t i;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+
+    trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
+                               PCI_FUNC(dev->devfn));
+    for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
+    }
+}
+
+static bool pcie_sriov_pf_init_common(PCIDevice *dev, uint16_t offset,
+                                      uint16_t vf_dev_id, uint16_t init_vfs,
+                                      uint16_t total_vfs, uint16_t vf_offset,
+                                      uint16_t vf_stride, Error **errp)
 {
-    BusState *bus = qdev_get_parent_bus(&dev->qdev);
-    int32_t devfn = dev->devfn + vf_offset;
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
@@ -100,6 +137,28 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
 
+    return true;
+}
+
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+                        const char *vfname, uint16_t vf_dev_id,
+                        uint16_t init_vfs, uint16_t total_vfs,
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp)
+{
+    BusState *bus = qdev_get_parent_bus(&dev->qdev);
+    int32_t devfn = dev->devfn + vf_offset;
+
+    if (pfs && g_hash_table_contains(pfs, dev->qdev.id)) {
+        error_setg(errp, "attaching user-created SR-IOV VF unsupported");
+        return false;
+    }
+
+    if (!pcie_sriov_pf_init_common(dev, offset, vf_dev_id, init_vfs,
+                                   total_vfs, vf_offset, vf_stride, errp)) {
+        return false;
+    }
+
     dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
 
     for (uint16_t i = 0; i < total_vfs; i++) {
@@ -129,7 +188,22 @@ void pcie_sriov_pf_exit(PCIDevice *dev)
 {
     uint8_t *cfg = dev->config + dev->exp.sriov_cap;
 
-    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
+    if (dev->exp.sriov_pf.vf_user_created) {
+        uint16_t ven_id = pci_get_word(dev->config + PCI_VENDOR_ID);
+        uint16_t total_vfs = pci_get_word(cfg + PCI_SRIOV_TOTAL_VF);
+        uint16_t vf_dev_id = pci_get_word(cfg + PCI_SRIOV_VF_DID);
+
+        unregister_vfs(dev);
+
+        for (uint16_t i = 0; i < total_vfs; i++) {
+            dev->exp.sriov_pf.vf[i]->exp.sriov_vf.pf = NULL;
+
+            pci_config_set_vendor_id(dev->exp.sriov_pf.vf[i]->config, ven_id);
+            pci_config_set_device_id(dev->exp.sriov_pf.vf[i]->config, vf_dev_id);
+        }
+    } else {
+        unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
+    }
 }
 
 void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
@@ -162,74 +236,172 @@ void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
 void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
                                 MemoryRegion *memory)
 {
-    PCIIORegion *r;
-    PCIBus *bus = pci_get_bus(dev);
     uint8_t type;
-    pcibus_t size = memory_region_size(memory);
 
-    assert(pci_is_vf(dev)); /* PFs must use pci_register_bar */
-    assert(region_num >= 0);
-    assert(region_num < PCI_NUM_REGIONS);
+    assert(dev->exp.sriov_vf.pf);
     type = dev->exp.sriov_vf.pf->exp.sriov_pf.vf_bar_type[region_num];
 
-    if (!is_power_of_2(size)) {
-        error_report("%s: PCI region size must be a power"
-                     " of two - type=0x%x, size=0x%"FMT_PCIBUS,
-                     __func__, type, size);
-        exit(1);
-    }
-
-    r = &dev->io_regions[region_num];
-    r->memory = memory;
-    r->address_space =
-        type & PCI_BASE_ADDRESS_SPACE_IO
-        ? bus->address_space_io
-        : bus->address_space_mem;
-    r->size = size;
-    r->type = type;
-
-    r->addr = pci_bar_address(dev, region_num, r->type, r->size);
-    if (r->addr != PCI_BAR_UNMAPPED) {
-        memory_region_add_subregion_overlap(r->address_space,
-                                            r->addr, r->memory, 1);
-    }
+    return pci_register_bar(dev, region_num, type, memory);
 }
 
-static void clear_ctrl_vfe(PCIDevice *dev)
+static gint compare_vf_devfns(gconstpointer a, gconstpointer b)
 {
-    uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL;
-    pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE);
+    return (*(PCIDevice **)a)->devfn - (*(PCIDevice **)b)->devfn;
 }
 
-static void register_vfs(PCIDevice *dev)
+bool pcie_sriov_pf_init_from_user_created_vfs(PCIDevice *dev, uint16_t offset,
+                                              Error **errp)
 {
-    uint16_t num_vfs;
+    GPtrArray *pf;
+    PCIDevice **vfs;
+    BusState *bus = qdev_get_parent_bus(DEVICE(dev));
+    uint16_t ven_id = pci_get_word(dev->config + PCI_VENDOR_ID);
+    uint16_t vf_dev_id;
+    uint16_t vf_offset;
+    uint16_t vf_stride;
     uint16_t i;
-    uint16_t sriov_cap = dev->exp.sriov_cap;
 
-    assert(sriov_cap > 0);
-    num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
-    if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
-        clear_ctrl_vfe(dev);
-        return;
+    if (!pfs || !dev->qdev.id) {
+        return false;
     }
 
-    trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
-                             PCI_FUNC(dev->devfn), num_vfs);
-    for (i = 0; i < num_vfs; i++) {
-        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
+    pf = g_hash_table_lookup(pfs, dev->qdev.id);
+    if (!pf) {
+        return false;
+    }
+
+    if (pf->len > UINT16_MAX) {
+        error_setg(errp, "too many VFs");
+        return false;
+    }
+
+    g_ptr_array_sort(pf, compare_vf_devfns);
+    vfs = (void *)pf->pdata;
+
+    if (vfs[0]->devfn <= dev->devfn) {
+        error_setg(errp, "a VF function number is less than the PF function number");
+        return false;
     }
+
+    vf_dev_id = pci_get_word(vfs[0]->config + PCI_DEVICE_ID);
+    vf_offset = vfs[0]->devfn - dev->devfn;
+    vf_stride = pf->len < 2 ? 0 : vfs[1]->devfn - vfs[0]->devfn;
+
+    for (i = 0; i < pf->len; i++) {
+        if (bus != qdev_get_parent_bus(&vfs[i]->qdev)) {
+            error_setg(errp, "SR-IOV VF parent bus mismatches with PF");
+            return false;
+        }
+
+        if (ven_id != pci_get_word(vfs[i]->config + PCI_VENDOR_ID)) {
+            error_setg(errp, "SR-IOV VF vendor ID mismatches with PF");
+            return false;
+        }
+
+        if (vf_dev_id != pci_get_word(vfs[i]->config + PCI_DEVICE_ID)) {
+            error_setg(errp, "inconsistent SR-IOV VF device IDs");
+            return false;
+        }
+
+        for (size_t j = 0; j < PCI_NUM_REGIONS; j++) {
+            if (vfs[i]->io_regions[j].size != vfs[0]->io_regions[j].size ||
+                vfs[i]->io_regions[j].type != vfs[0]->io_regions[j].type) {
+                error_setg(errp, "inconsistent SR-IOV BARs");
+                return false;
+            }
+        }
+
+        if (vfs[i]->devfn - vfs[0]->devfn != vf_stride * i) {
+            error_setg(errp, "inconsistent SR-IOV stride");
+            return false;
+        }
+    }
+
+    if (!pcie_sriov_pf_init_common(dev, offset, vf_dev_id, pf->len,
+                                   pf->len, vf_offset, vf_stride, errp)) {
+        return false;
+    }
+
+    for (i = 0; i < pf->len; i++) {
+        vfs[i]->exp.sriov_vf.pf = dev;
+        vfs[i]->exp.sriov_vf.vf_number = i;
+
+        /* set vid/did according to sr/iov spec - they are not used */
+        pci_config_set_vendor_id(vfs[i]->config, 0xffff);
+        pci_config_set_device_id(vfs[i]->config, 0xffff);
+    }
+
+    dev->exp.sriov_pf.vf = vfs;
+    dev->exp.sriov_pf.vf_user_created = true;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        uint8_t type = vfs[0]->io_regions[i].type;
+        pcibus_t size = vfs[0]->io_regions[i].size;
+
+        if (size) {
+            pcie_sriov_pf_init_vf_bar(dev, i, type, size);
+        }
+    }
+
+    return true;
 }
 
-static void unregister_vfs(PCIDevice *dev)
+bool pcie_sriov_register_device(PCIDevice *dev, Error **errp)
 {
-    uint16_t i;
-    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+    if (!dev->exp.sriov_pf.vf && dev->qdev.id &&
+        pfs && g_hash_table_contains(pfs, dev->qdev.id)) {
+        error_setg(errp, "attaching user-created SR-IOV VF unsupported");
+        return false;
+    }
 
-    trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
-                               PCI_FUNC(dev->devfn));
-    for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
-        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
+    if (dev->sriov_pf) {
+        PCIDevice *pci_pf;
+        GPtrArray *pf;
+
+        if (!PCI_DEVICE_GET_CLASS(dev)->sriov_vf_user_creatable) {
+            error_setg(errp, "user cannot create SR-IOV VF with this device type");
+            return false;
+        }
+
+        if (!pci_is_express(dev)) {
+            error_setg(errp, "PCI Express is required for SR-IOV VF");
+            return false;
+        }
+
+        if (!pci_qdev_find_device(dev->sriov_pf, &pci_pf)) {
+            error_setg(errp, "PCI device specified as SR-IOV PF already exists");
+            return false;
+        }
+
+        if (!pfs) {
+            pfs = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, NULL);
+        }
+
+        pf = g_hash_table_lookup(pfs, dev->sriov_pf);
+        if (!pf) {
+            pf = g_ptr_array_new();
+            g_hash_table_insert(pfs, g_strdup(dev->sriov_pf), pf);
+        }
+
+        g_ptr_array_add(pf, dev);
+    }
+
+    return true;
+}
+
+void pcie_sriov_unregister_device(PCIDevice *dev)
+{
+    if (dev->sriov_pf && pfs) {
+        GPtrArray *pf = g_hash_table_lookup(pfs, dev->sriov_pf);
+
+        if (pf) {
+            g_ptr_array_remove_fast(pf, dev);
+
+            if (!pf->len) {
+                g_hash_table_remove(pfs, dev->sriov_pf);
+                g_ptr_array_free(pf, FALSE);
+            }
+        }
     }
 }
 
@@ -316,7 +488,7 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize)
 
 uint16_t pcie_sriov_vf_number(PCIDevice *dev)
 {
-    assert(pci_is_vf(dev));
+    assert(dev->exp.sriov_vf.pf);
     return dev->exp.sriov_vf.vf_number;
 }
 
-- 
2.44.0
* [PATCH RFC v4 6/7] virtio-pci: Implement SR-IOV PF
  2024-04-28  9:05 [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Akihiko Odaki
                   ` (4 preceding siblings ...)
  2024-04-28  9:05 ` [PATCH RFC v4 5/7] pcie_sriov: Allow user to create SR-IOV device Akihiko Odaki
@ 2024-04-28  9:05 ` Akihiko Odaki
  2024-04-28  9:05 ` [PATCH RFC v4 7/7] virtio-net: Implement SR-IOV VF Akihiko Odaki
  2024-05-16  2:00 ` [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Yui Washizu
  7 siblings, 0 replies; 13+ messages in thread
From: Akihiko Odaki @ 2024-04-28  9:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
Allow user to attach SR-IOV VF to a virtio-pci PF.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/virtio/virtio-pci.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402cfa..996bb2cbad20 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2245,6 +2245,12 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
         pci_register_bar(&proxy->pci_dev, proxy->legacy_io_bar_idx,
                          PCI_BASE_ADDRESS_SPACE_IO, &proxy->bar);
     }
+
+    if (pcie_sriov_pf_init_from_user_created_vfs(&proxy->pci_dev,
+                                                 PCI_CONFIG_SPACE_SIZE,
+                                                 errp)) {
+        virtio_add_feature(&vdev->host_features, VIRTIO_F_SR_IOV);
+    }
 }
 
 static void virtio_pci_device_unplugged(DeviceState *d)
@@ -2253,6 +2259,7 @@ static void virtio_pci_device_unplugged(DeviceState *d)
     bool modern = virtio_pci_modern(proxy);
     bool modern_pio = proxy->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
 
+    pcie_sriov_pf_exit(&proxy->pci_dev);
     virtio_pci_stop_ioeventfd(proxy);
 
     if (modern) {
-- 
2.44.0
* [PATCH RFC v4 7/7] virtio-net: Implement SR-IOV VF
  2024-04-28  9:05 [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Akihiko Odaki
                   ` (5 preceding siblings ...)
  2024-04-28  9:05 ` [PATCH RFC v4 6/7] virtio-pci: Implement SR-IOV PF Akihiko Odaki
@ 2024-04-28  9:05 ` Akihiko Odaki
  2024-05-16  2:00 ` [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Yui Washizu
  7 siblings, 0 replies; 13+ messages in thread
From: Akihiko Odaki @ 2024-04-28  9:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Yui Washizu, Akihiko Odaki
A virtio-net device can be added as an SR-IOV VF to another virtio-pci
device that will be the PF.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/virtio/virtio-net-pci.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c
index e03543a70a75..dba4987d6e04 100644
--- a/hw/virtio/virtio-net-pci.c
+++ b/hw/virtio/virtio-net-pci.c
@@ -75,6 +75,7 @@ static void virtio_net_pci_class_init(ObjectClass *klass, void *data)
     k->device_id = PCI_DEVICE_ID_VIRTIO_NET;
     k->revision = VIRTIO_PCI_ABI_VERSION;
     k->class_id = PCI_CLASS_NETWORK_ETHERNET;
+    k->sriov_vf_user_creatable = true;
     set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
     device_class_set_props(dc, virtio_net_properties);
     vpciklass->realize = virtio_net_pci_realize;
-- 
2.44.0
* Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
  2024-04-28  9:05 [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Akihiko Odaki
                   ` (6 preceding siblings ...)
  2024-04-28  9:05 ` [PATCH RFC v4 7/7] virtio-net: Implement SR-IOV VF Akihiko Odaki
@ 2024-05-16  2:00 ` Yui Washizu
  2024-07-15  5:15   ` Akihiko Odaki
  7 siblings, 1 reply; 13+ messages in thread
From: Yui Washizu @ 2024-05-16  2:00 UTC (permalink / raw)
  To: Akihiko Odaki, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Jason Wang,
	Sriram Yagnaraman, Keith Busch, Klaus Jensen
  Cc: qemu-devel, qemu-block
On 2024/04/28 18:05, Akihiko Odaki wrote:
> Based-on: <20240315-reuse-v9-0-67aa69af4d53@daynix.com>
> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and improvements")
>
> Introduction
> ------------
>
> This series is based on the RFC series submitted by Yui Washizu[1].
> See also [2] for the context.
>
> This series enables SR-IOV emulation for virtio-net. It is useful
> to test SR-IOV support on the guest, or to expose several vDPA devices
> in a VM. vDPA devices can also provide L2 switching feature for
> offloading though it is out of scope to allow the guest to configure
> such a feature.
>
> The PF side code resides in virtio-pci. The VF side code resides in
> the PCI common infrastructure, but it is restricted to work only for
> virtio-net-pci because of lack of validation.
>
> User Interface
> --------------
>
> A user can configure a SR-IOV capable virtio-net device by adding
> virtio-net-pci functions to a bus. Below is a command line example:
>    -netdev user,id=n -netdev user,id=o
>    -netdev user,id=p -netdev user,id=q
>    -device pcie-root-port,id=b
>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>
> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> added after all VFs. It is the user's responsibility to ensure that VFs
> have function numbers larger than that of the PF, and that the function
> numbers have a consistent stride.
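The constraint above can be sketched in C. This is only an illustration
and not code from the series: derive_vf_layout() is a hypothetical helper,
and it assumes the VF function numbers are passed in ascending order (the
command line above lists them in descending order, with the PF last).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU API: derive the First VF Offset and the
 * VF Stride from the devfn values of user-created VFs (ascending order).
 * Returns false if a VF does not come after the PF or if the spacing
 * between consecutive VFs is not constant.
 */
static bool derive_vf_layout(uint8_t pf_devfn, const uint8_t *vf_devfn,
                             size_t num_vfs, uint16_t *offset,
                             uint16_t *stride)
{
    size_t i;

    if (num_vfs == 0 || vf_devfn[0] <= pf_devfn) {
        return false;
    }

    *offset = vf_devfn[0] - pf_devfn;
    /* A single VF has no spacing to measure, so its stride is zero. */
    *stride = 0;

    for (i = 1; i < num_vfs; i++) {
        if (vf_devfn[i] <= vf_devfn[i - 1]) {
            return false;
        }
        if (i == 1) {
            *stride = vf_devfn[1] - vf_devfn[0];
        } else if (vf_devfn[i] - vf_devfn[i - 1] != *stride) {
            return false;
        }
    }

    return true;
}
```

With the PF at function 0 and VFs at functions 1, 2 and 3, the helper
yields offset 1 and stride 1. VFs at functions 1, 3 and 4 are rejected
because the spacing is uneven.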
I tried to start a VM with more than 8 VFs allocated using your patch,
but the following error occurred and QEMU failed to start:
VF function number overflows.
I think the cause of this error is that virtio-net-pci PFs don't have ARI
(pcie_ari_init is not called for virtio-net-pci when PFs are initialized).
I think it is possible to add it later,
but how about adding pcie_ari_init?
As a trial,
adding pcie_ari_init to virtio_pci_realize enabled the creation of more
than 8 VFs.
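For reference, the limit can be sketched as below. This is an
illustrative assumption about the check, not the code in the series:
vfs_fit() is a hypothetical helper. Without ARI, a function number has
only 3 bits, so the PF and all of its VFs must share one slot; with ARI
the whole 8-bit devfn can be used as the function number.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOT(devfn) (((devfn) >> 3) & 0x1f) /* upper 5 bits: device number */
#define DEVFN_MAX   256                     /* devfn is an 8-bit field */

/*
 * Hypothetical helper, not QEMU API: returns true if the last VF still
 * gets a valid function number for the given offset, stride and count.
 */
static bool vfs_fit(uint8_t pf_devfn, uint16_t vf_offset, uint16_t vf_stride,
                    uint16_t total_vfs, bool has_ari)
{
    uint32_t first, last;

    if (total_vfs == 0) {
        return true;
    }

    first = (uint32_t)pf_devfn + vf_offset;
    last = first + (uint32_t)vf_stride * (total_vfs - 1);

    if (last >= DEVFN_MAX) {
        return false; /* does not fit in devfn even with ARI */
    }
    if (!has_ari && SLOT(pf_devfn) != SLOT(last)) {
        return false; /* without ARI the VF would land in another slot */
    }

    return true;
}
```

With the PF at devfn 0, offset 1 and stride 1, vfs_fit() accepts 7 VFs
without ARI, rejects 8 VFs without ARI, and accepts 8 VFs once has_ari
is true.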
>
> Keeping VF instances
> --------------------
>
> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> the guest requests. Previously, this behavior was implemented by
> realizing and unrealizing VFs at runtime. However, this strategy does
> not work well for the proposed virtio-net emulation; in this proposal,
> device options passed in the command line must be maintained as VFs
> are hotplugged, but they are consumed when the machine starts and not
> available after that, which makes realizing VFs at runtime impossible.
>
> As an alternative strategy to runtime realization/unrealization, this
> series proposes to reuse the code to power down PCI Express devices.
> When a PCI Express device is powered down, it will be hidden from the
> guest but will be kept realized. This effectively implements the
> behavior we need for the SR-IOV emulation.
>
> Summary
> -------
>
> Patch 1 disables ROM BAR, which virtio-net-pci enables by default, for
> VFs.
> Patch 2 makes zero stride valid for 1 VF configuration.
> Patches 3 and 4 add validations.
> Patch 5 adds user-created SR-IOV VF infrastructure.
> Patch 6 makes virtio-pci work as SR-IOV PF for user-created VFs.
> Patch 7 allows user to create SR-IOV VFs with virtio-net-pci.
>
> [1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
> [2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/
>
> Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> Changes in v4:
> - Added patch "hw/pci: Fix SR-IOV VF number calculation" to fix division
>    by zero reported by Yui Washizu.
> - Rebased.
> - Link to v3: https://lore.kernel.org/r/20240305-sriov-v3-0-abdb75770372@daynix.com
>
> Changes in v3:
> - Rebased.
> - Link to v2: https://lore.kernel.org/r/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com
>
> Changes in v2:
> - Changed to keep VF instances.
> - Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com
>
> ---
> Akihiko Odaki (7):
>        hw/pci: Do not add ROM BAR for SR-IOV VF
>        hw/pci: Fix SR-IOV VF number calculation
>        pcie_sriov: Ensure PF and VF are mutually exclusive
>        pcie_sriov: Check PCI Express for SR-IOV PF
>        pcie_sriov: Allow user to create SR-IOV device
>        virtio-pci: Implement SR-IOV PF
>        virtio-net: Implement SR-IOV VF
>
>   include/hw/pci/pci_device.h |   6 +-
>   include/hw/pci/pcie_sriov.h |  19 +++
>   hw/pci/pci.c                |  76 +++++++----
>   hw/pci/pcie_sriov.c         | 298 +++++++++++++++++++++++++++++++++++---------
>   hw/virtio/virtio-net-pci.c  |   1 +
>   hw/virtio/virtio-pci.c      |   7 ++
>   6 files changed, 323 insertions(+), 84 deletions(-)
> ---
> base-commit: 2ac5458086ab61282f30c2f8bdf2ae9a0a06a75d
> change-id: 20231202-sriov-9402fb262be8
>
> Best regards,
* Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
  2024-05-16  2:00 ` [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation Yui Washizu
@ 2024-07-15  5:15   ` Akihiko Odaki
  2024-07-31  9:34     ` Yui Washizu
  0 siblings, 1 reply; 13+ messages in thread
From: Akihiko Odaki @ 2024-07-15  5:15 UTC (permalink / raw)
  To: Yui Washizu, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Jason Wang,
	Sriram Yagnaraman, Keith Busch, Klaus Jensen
  Cc: qemu-devel, qemu-block
On 2024/05/16 11:00, Yui Washizu wrote:
> 
> On 2024/04/28 18:05, Akihiko Odaki wrote:
>> Based-on: <20240315-reuse-v9-0-67aa69af4d53@daynix.com>
>> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and 
>> improvements")
>>
>> Introduction
>> ------------
>>
>> This series is based on the RFC series submitted by Yui Washizu[1].
>> See also [2] for the context.
>>
>> This series enables SR-IOV emulation for virtio-net. It is useful
>> to test SR-IOV support on the guest, or to expose several vDPA devices
>> in a VM. vDPA devices can also provide L2 switching feature for
>> offloading though it is out of scope to allow the guest to configure
>> such a feature.
>>
>> The PF side code resides in virtio-pci. The VF side code resides in
>> the PCI common infrastructure, but it is restricted to work only for
>> virtio-net-pci because of lack of validation.
>>
>> User Interface
>> --------------
>>
>> A user can configure a SR-IOV capable virtio-net device by adding
>> virtio-net-pci functions to a bus. Below is a command line example:
>>    -netdev user,id=n -netdev user,id=o
>>    -netdev user,id=p -netdev user,id=q
>>    -device pcie-root-port,id=b
>>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>
>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>> added after all VFs. It is the user's responsibility to ensure that VFs
>> have function numbers larger than that of the PF, and that the function
>> numbers have a consistent stride.
> 
> 
> I tried to start a VM with more than 8 VFs allocated using your patch,
> but the following error occurred and qemu didn't work:
> VF function number overflows.
> 
> I think the cause of this error is that virtio-net-pci PFs don't have ARI.
> (pcie_ari_init is not added to virtio-net-pci when PFs are initialized.)
> I think it is possible to add it later,
> but how about adding pcie_ari_init ?
> 
> As a trial,
> adding pcie_ari_init to virtio_pci_realize enabled the creation of more 
> than 8 VFs.
I have just looked into that possibility, but adding pcie_ari_init to 
virtio_pci_realize has some implications. Unconditionally calling 
pcie_ari_init will break the existing configuration of virtio-pci 
devices so we need to implement some logic to detect when ARI is needed. 
Preferably such logic should be implemented in the common PCI 
infrastructure instead of implementing it in virtio-pci so that other 
PCI multifunction devices can benefit from it.
While I don't think implementing this will be too complicated, I need to 
ensure that such a feature is really needed before doing so.
* Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
  2024-07-15  5:15   ` Akihiko Odaki
@ 2024-07-31  9:34     ` Yui Washizu
  2024-08-01  5:37       ` Akihiko Odaki
  0 siblings, 1 reply; 13+ messages in thread
From: Yui Washizu @ 2024-07-31  9:34 UTC (permalink / raw)
  To: Akihiko Odaki, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Jason Wang,
	Sriram Yagnaraman, Keith Busch, Klaus Jensen
  Cc: qemu-devel, qemu-block
On 2024/07/15 14:15, Akihiko Odaki wrote:
> On 2024/05/16 11:00, Yui Washizu wrote:
>>
>> On 2024/04/28 18:05, Akihiko Odaki wrote:
>>> Based-on: <20240315-reuse-v9-0-67aa69af4d53@daynix.com>
>>> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and 
>>> improvements")
>>>
>>> Introduction
>>> ------------
>>>
>>> This series is based on the RFC series submitted by Yui Washizu[1].
>>> See also [2] for the context.
>>>
>>> This series enables SR-IOV emulation for virtio-net. It is useful
>>> to test SR-IOV support on the guest, or to expose several vDPA devices
>>> in a VM. vDPA devices can also provide L2 switching feature for
>>> offloading though it is out of scope to allow the guest to configure
>>> such a feature.
>>>
>>> The PF side code resides in virtio-pci. The VF side code resides in
>>> the PCI common infrastructure, but it is restricted to work only for
>>> virtio-net-pci because of lack of validation.
>>>
>>> User Interface
>>> --------------
>>>
>>> A user can configure a SR-IOV capable virtio-net device by adding
>>> virtio-net-pci functions to a bus. Below is a command line example:
>>>    -netdev user,id=n -netdev user,id=o
>>>    -netdev user,id=p -netdev user,id=q
>>>    -device pcie-root-port,id=b
>>>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>>
>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>>> added after all VFs. It is the user's responsibility to ensure that VFs
>>> have function numbers larger than that of the PF, and that the function
>>> numbers have a consistent stride.
>>
>>
>> I tried to start a VM with more than 8 VFs allocated using your patch,
>> but the following error occurred and qemu didn't work:
>> VF function number overflows.
>>
>> I think the cause of this error is that virtio-net-pci PFs don't have 
>> ARI.
>> (pcie_ari_init is not added to virtio-net-pci when PFs are initialized.)
>> I think it is possible to add it later,
>> but how about adding pcie_ari_init ?
>>
>> As a trial,
>> adding pcie_ari_init to virtio_pci_realize enabled the creation of 
>> more than 8 VFs.
>
> I have just looked into that possibility, but adding pcie_ari_init to 
> virtio_pci_realize has some implications. Unconditionally calling 
> pcie_ari_init will break the existing configuration of virtio-pci 
> devices so we need to implement some logic to detect when ARI is 
> needed. Preferably such logic should be implemented in the common PCI 
> infrastructure instead of implementing it in virtio-pci so that other 
> PCI multifunction devices can benefit from it.
>
> While I don't think implementing this will be too complicated, I need 
> to ensure that such a feature is really needed before doing so.
OK.
I want to use this emulation for offloading the virtual network
in an environment where there are many containers in VMs,
so I consider the feature to be needed.
I think that 7 VFs are too few.
I'll keep thinking about the feature's necessity.
I'll add other comments to the RFC v5 patch.
Regards,
Yui Washizu
* Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
  2024-07-31  9:34     ` Yui Washizu
@ 2024-08-01  5:37       ` Akihiko Odaki
  2024-08-01  5:52         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Akihiko Odaki @ 2024-08-01  5:37 UTC (permalink / raw)
  To: Yui Washizu, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Jason Wang,
	Sriram Yagnaraman, Keith Busch, Klaus Jensen
  Cc: qemu-devel, qemu-block
On 2024/07/31 18:34, Yui Washizu wrote:
> 
> On 2024/07/15 14:15, Akihiko Odaki wrote:
>> On 2024/05/16 11:00, Yui Washizu wrote:
>>>
>>> On 2024/04/28 18:05, Akihiko Odaki wrote:
>>>> Based-on: <20240315-reuse-v9-0-67aa69af4d53@daynix.com>
>>>> ("[PATCH for 9.1 v9 00/11] hw/pci: SR-IOV related fixes and 
>>>> improvements")
>>>>
>>>> Introduction
>>>> ------------
>>>>
>>>> This series is based on the RFC series submitted by Yui Washizu[1].
>>>> See also [2] for the context.
>>>>
>>>> This series enables SR-IOV emulation for virtio-net. It is useful
>>>> to test SR-IOV support on the guest, or to expose several vDPA devices
>>>> in a VM. vDPA devices can also provide L2 switching feature for
>>>> offloading though it is out of scope to allow the guest to configure
>>>> such a feature.
>>>>
>>>> The PF side code resides in virtio-pci. The VF side code resides in
>>>> the PCI common infrastructure, but it is restricted to work only for
>>>> virtio-net-pci because of lack of validation.
>>>>
>>>> User Interface
>>>> --------------
>>>>
>>>> A user can configure a SR-IOV capable virtio-net device by adding
>>>> virtio-net-pci functions to a bus. Below is a command line example:
>>>>    -netdev user,id=n -netdev user,id=o
>>>>    -netdev user,id=p -netdev user,id=q
>>>>    -device pcie-root-port,id=b
>>>>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>>>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>>>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>>>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>>>
>>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>>>> added after all VFs. It is the user's responsibility to ensure that VFs
>>>> have function numbers larger than that of the PF, and that the function
>>>> numbers have a consistent stride.
>>>
>>>
>>> I tried to start a VM with more than 8 VFs allocated using your patch,
>>> but the following error occurred and qemu didn't work:
>>> VF function number overflows.
>>>
>>> I think the cause of this error is that virtio-net-pci PFs don't have 
>>> ARI.
>>> (pcie_ari_init is not added to virtio-net-pci when PFs are initialized.)
>>> I think it is possible to add it later,
>>> but how about adding pcie_ari_init ?
>>>
>>> As a trial,
>>> adding pcie_ari_init to virtio_pci_realize enabled the creation of 
>>> more than 8 VFs.
>>
>> I have just looked into that possibility, but adding pcie_ari_init to 
>> virtio_pci_realize has some implications. Unconditionally calling 
>> pcie_ari_init will break the existing configuration of virtio-pci 
>> devices so we need to implement some logic to detect when ARI is 
>> needed. Preferably such logic should be implemented in the common PCI 
>> infrastructure instead of implementing it in virtio-pci so that other 
>> PCI multifunction devices can benefit from it.
>>
>> While I don't think implementing this will be too complicated, I need 
>> to ensure that such a feature is really needed before doing so.
> 
> 
> OK.
> I want to use this emulation for offloading the virtual network
> in an environment where there are many containers in VMs,
> so I consider the feature to be needed.
> I think that 7 VFs are too few.
> I'll keep thinking about the feature's necessity.
I understand there could be many containers in VMs, but will a single
device deal with them all? If the virtio-net VFs are backed by the vDPA
capability of one physical device, the emulated device cannot have more
VFs than that physical device provides. A VM must then have several PFs,
each paired with its own VFs, to accommodate more containers.
I don't know much about vDPA-capable devices, but as a reference, igb
only has 8 VFs.
> 
> 
> I'll add other comments to RFC v5 patch.
The RFC tag has already been dropped.
Regards,
Akihiko Odaki
* Re: [PATCH RFC v4 0/7] virtio-net: add support for SR-IOV emulation
  2024-08-01  5:37       ` Akihiko Odaki
@ 2024-08-01  5:52         ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-08-01  5:52 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Yui Washizu, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Jason Wang, Sriram Yagnaraman, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block
On Thu, Aug 01, 2024 at 02:37:55PM +0900, Akihiko Odaki wrote:
> I don't know much about vDPA-capable device, but as a reference, igb only
> has 8 VFs.
Modern vDPA-capable devices have many more than 8 VFs; 8 is a very low
number.
-- 
MST