qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements
@ 2024-06-27  6:07 Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 01/12] hw/pci: Rename has_power to enabled Akihiko Odaki
                   ` (11 more replies)
  0 siblings, 12 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

I submitted a RFC series[1] to add support for SR-IOV emulation to
virtio-net-pci. During the development of the series, I fixed some
trivial bugs and made improvements that I think are independently
useful. This series extracts those fixes and improvements from the RFC
series.

[1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com/

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v10:
- Added patch "hw/ppc/spapr_pci: Do not reject VFs created after a PF".
- Added patch "hw/ppc/spapr_pci: Do not create DT for disabled PCI device".
- Added patch "hw/pci: Convert rom_bar into OnOffAuto".
- Dropped patch "hw/pci: Determine if rombar is explicitly enabled".
- Dropped patch "hw/qdev: Remove opts member".
- Link to v9: https://lore.kernel.org/r/20240315-reuse-v9-0-67aa69af4d53@daynix.com

Changes in v9:
- Rebased.
- Restored '#include "qapi/error.h"' (Michael S. Tsirkin)
- Added patch "pcie_sriov: Ensure VF function number does not overflow"
  to fix abortion with wrong PF addr.
- Link to v8: https://lore.kernel.org/r/20240228-reuse-v8-0-282660281e60@daynix.com

Changes in v8:
- Clarified that "hw/pci: Replace -1 with UINT32_MAX for romsize" is
  not a bug fix. (Markus Armbruster)
- Squashed patch "vfio: Avoid inspecting option QDict for rombar" into
  "hw/pci: Determine if rombar is explicitly enabled".
  (Markus Armbruster)
- Noted the minor semantics change for patch "hw/pci: Determine if
  rombar is explicitly enabled". (Markus Armbruster)
- Link to v7: https://lore.kernel.org/r/20240224-reuse-v7-0-29c14bcb952e@daynix.com

Changes in v7:
- Replaced -1 with UINT32_MAX when expressing uint32_t.
  (Markus Armbruster)
- Added patch "hw/pci: Replace -1 with UINT32_MAX for romsize".
- Link to v6: https://lore.kernel.org/r/20240220-reuse-v6-0-2e42a28b0cf2@daynix.com

Changes in v6:
- Fixed migration.
- Added patch "pcie_sriov: Do not manually unrealize".
- Restored patch "pcie_sriov: Release VFs failed to realize" that was
  missed in v5.
- Link to v5: https://lore.kernel.org/r/20240218-reuse-v5-0-e4fc1c19b5a9@daynix.com

Changes in v5:
- Added patch "hw/pci: Always call pcie_sriov_pf_reset()".
- Added patch "pcie_sriov: Reset SR-IOV extended capability".
- Removed a reference to PCI_SRIOV_CTRL_VFE in hw/nvme.
  (Michael S. Tsirkin)
- Noted the impact on the guest of patch "pcie_sriov: Do not reset
  NumVFs after unregistering VFs". (Michael S. Tsirkin)
- Changed to use pcie_sriov_num_vfs().
- Restored pci_set_power() and changed it to call pci_set_enabled() only
  for PFs with an expalanation. (Michael S. Tsirkin)
- Reordered patches.
- Link to v4: https://lore.kernel.org/r/20240214-reuse-v4-0-89ad093a07f4@daynix.com

Changes in v4:
- Reverted the change to pci_rom_bar_explicitly_enabled().
  (Michael S. Tsirkin)
- Added patch "pcie_sriov: Do not reset NumVFs after unregistering VFs".
- Added patch "hw/nvme: Refer to dev->exp.sriov_pf.num_vfs".
- Link to v3: https://lore.kernel.org/r/20240212-reuse-v3-0-8017b689ce7f@daynix.com

Changes in v3:
- Extracted patch "hw/pci: Use -1 as a default value for rombar" from
  patch "hw/pci: Determine if rombar is explicitly enabled"
  (Philippe Mathieu-Daudé)
- Added an audit result of PCIDevice::rom_bar to the message of patch
  "hw/pci: Use -1 as a default value for rombar"
  (Philippe Mathieu-Daudé)
- Link to v2: https://lore.kernel.org/r/20240210-reuse-v2-0-24ba2a502692@daynix.com

Changes in v2:
- Reset after enabling a function so that NVMe VF state gets updated.
- Link to v1: https://lore.kernel.org/r/20240203-reuse-v1-0-5be8c5ce6338@daynix.com

---
Akihiko Odaki (12):
      hw/pci: Rename has_power to enabled
      hw/ppc/spapr_pci: Do not create DT for disabled PCI device
      hw/ppc/spapr_pci: Do not reject VFs created after a PF
      pcie_sriov: Do not manually unrealize
      pcie_sriov: Ensure VF function number does not overflow
      pcie_sriov: Reuse SR-IOV VF device instances
      pcie_sriov: Release VFs failed to realize
      pcie_sriov: Remove num_vfs from PCIESriovPF
      pcie_sriov: Register VFs after migration
      hw/pci: Replace -1 with UINT32_MAX for romsize
      hw/pci: Convert rom_bar into OnOffAuto
      hw/qdev: Remove opts member

 docs/igd-assign.txt               |   2 +-
 docs/pcie_sriov.txt               |   8 +-
 include/hw/pci/pci.h              |   2 +-
 include/hw/pci/pci_device.h       |  19 ++++-
 include/hw/pci/pcie_sriov.h       |   9 ++-
 include/hw/qdev-core.h            |   4 -
 hw/core/qdev.c                    |   1 -
 hw/net/igb.c                      |  13 +++-
 hw/nvme/ctrl.c                    |  24 ++++--
 hw/pci/pci.c                      |  33 +++++----
 hw/pci/pci_host.c                 |   4 +-
 hw/pci/pcie_sriov.c               | 149 +++++++++++++++++++++-----------------
 hw/ppc/spapr_pci.c                |   8 +-
 hw/vfio/pci-quirks.c              |   2 +-
 hw/vfio/pci.c                     |  11 ++-
 hw/xen/xen_pt_load_rom.c          |   6 +-
 system/qdev-monitor.c             |  12 +--
 tests/qtest/virtio-net-failover.c |  32 ++++----
 hw/pci/trace-events               |   2 +-
 19 files changed, 200 insertions(+), 141 deletions(-)
---
base-commit: 74abb45dac6979e7ff76172b7f0a24e869405184
change-id: 20240129-reuse-faae22b11934

Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v10 01/12] hw/pci: Rename has_power to enabled
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 02/12] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Akihiko Odaki
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

The renamed state will not only represent powering state of PFs, but
also represent SR-IOV VF enablement in the future.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pci.h        |  7 ++++++-
 include/hw/pci/pci_device.h |  2 +-
 hw/pci/pci.c                | 14 +++++++-------
 hw/pci/pci_host.c           |  4 ++--
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eaa3fc99d884..6c92b2f70008 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -642,6 +642,11 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
-void pci_set_power(PCIDevice *pci_dev, bool state);
+void pci_set_enabled(PCIDevice *pci_dev, bool state);
+
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+    pci_set_enabled(pci_dev, state);
+}
 
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b273..d57f9ce83884 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
     DeviceState qdev;
     bool partially_hotplugged;
-    bool has_power;
+    bool enabled;
 
     /* PCI config space */
     uint8_t *config;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 324c1302d25f..295a32714a4a 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1525,7 +1525,7 @@ static void pci_update_mappings(PCIDevice *d)
             continue;
 
         new_addr = pci_bar_address(d, i, r->type, r->size);
-        if (!d->has_power) {
+        if (!d->enabled) {
             new_addr = PCI_BAR_UNMAPPED;
         }
 
@@ -1613,7 +1613,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
         pci_update_irq_disabled(d, was_irq_disabled);
         memory_region_set_enabled(&d->bus_master_enable_region,
                                   (pci_get_word(d->config + PCI_COMMAND)
-                                   & PCI_COMMAND_MASTER) && d->has_power);
+                                   & PCI_COMMAND_MASTER) && d->enabled);
     }
 
     msi_write_config(d, addr, val_in, l);
@@ -2811,18 +2811,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
     return msg;
 }
 
-void pci_set_power(PCIDevice *d, bool state)
+void pci_set_enabled(PCIDevice *d, bool state)
 {
-    if (d->has_power == state) {
+    if (d->enabled == state) {
         return;
     }
 
-    d->has_power = state;
+    d->enabled = state;
     pci_update_mappings(d);
     memory_region_set_enabled(&d->bus_master_enable_region,
                               (pci_get_word(d->config + PCI_COMMAND)
-                               & PCI_COMMAND_MASTER) && d->has_power);
-    if (!d->has_power) {
+                               & PCI_COMMAND_MASTER) && d->enabled);
+    if (!d->enabled) {
         pci_device_reset(d);
     }
 }
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index dfe6fe618401..0d82727cc9dd 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
      * allowing direct removal of unexposed functions.
      */
     if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-        !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+        !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
         return;
     }
 
@@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
      * allowing direct removal of unexposed functions.
      */
     if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-        !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+        !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
         return ~0x0;
     }
 

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 02/12] hw/ppc/spapr_pci: Do not create DT for disabled PCI device
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 01/12] hw/pci: Rename has_power to enabled Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 03/12] hw/ppc/spapr_pci: Do not reject VFs created after a PF Akihiko Odaki
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Disabled means it is a disabled SR-IOV VF or it is powered off, and
hidden from the guest.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/ppc/spapr_pci.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7cf9904c3546..f63182a03c41 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1296,6 +1296,10 @@ static void spapr_dt_pci_device_cb(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
+    if (!pdev->enabled) {
+        return;
+    }
+
     err = spapr_dt_pci_device(p->sphb, pdev, p->fdt, p->offset);
     if (err < 0) {
         p->err = err;

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 03/12] hw/ppc/spapr_pci: Do not reject VFs created after a PF
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 01/12] hw/pci: Rename has_power to enabled Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 02/12] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 04/12] pcie_sriov: Do not manually unrealize Akihiko Odaki
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

A PF may automatically create VFs and the PF may be function 0.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/ppc/spapr_pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f63182a03c41..ed4454bbf79e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1573,7 +1573,9 @@ static void spapr_pci_pre_plug(HotplugHandler *plug_handler,
      * hotplug, we do not allow functions to be hotplugged to a
      * slot that already has function 0 present
      */
-    if (plugged_dev->hotplugged && bus->devices[PCI_DEVFN(slotnr, 0)] &&
+    if (plugged_dev->hotplugged &&
+        !pci_is_vf(pdev) &&
+        bus->devices[PCI_DEVFN(slotnr, 0)] &&
         PCI_FUNC(pdev->devfn) != 0) {
         error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
                    " additional functions can no longer be exposed to guest.",

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 04/12] pcie_sriov: Do not manually unrealize
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (2 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 03/12] hw/ppc/spapr_pci: Do not reject VFs created after a PF Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 05/12] pcie_sriov: Ensure VF function number does not overflow Akihiko Odaki
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

A device gets automatically unrealized when being unparented.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index e9b23221d713..499becd5273f 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -204,11 +204,7 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        Error *err = NULL;
         PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
-            error_reportf_err(err, "Failed to unplug: ");
-        }
         object_unparent(OBJECT(vf));
         object_unref(OBJECT(vf));
     }

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 05/12] pcie_sriov: Ensure VF function number does not overflow
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (3 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 04/12] pcie_sriov: Do not manually unrealize Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

pci_new() aborts when creating a VF with a function number equals to or
is greater than PCI_DEVFN_MAX.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 docs/pcie_sriov.txt         |  8 +++++---
 include/hw/pci/pcie_sriov.h |  5 +++--
 hw/net/igb.c                | 13 ++++++++++---
 hw/nvme/ctrl.c              | 24 ++++++++++++++++--------
 hw/pci/pcie_sriov.c         | 19 +++++++++++++++++--
 5 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfab0..ab2142807f79 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
       ...
 
       /* Add and initialize the SR/IOV capability */
-      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
-                       vf_devid, initial_vfs, total_vfs,
-                       fun_offset, stride);
+      if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+                              vf_devid, initial_vfs, total_vfs,
+                              fun_offset, stride, errp)) {
+         return;
+      }
 
       /* Set up individual VF BARs (parameters as for normal BARs) */
       pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 450cbef6c201..aa704e8f9d9f 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -27,10 +27,11 @@ typedef struct PCIESriovVF {
     uint16_t vf_number; /* Logical VF number of this function */
 } PCIESriovVF;
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride);
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp);
 void pcie_sriov_pf_exit(PCIDevice *dev);
 
 /* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1ef6170465f1..4a19fca1c638 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     pcie_ari_init(pci_dev, 0x150);
 
-    pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
-        IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-        IGB_VF_OFFSET, IGB_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
+                            TYPE_IGBVF, IGB_82576_VF_DEV_ID,
+                            IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
+                            IGB_VF_OFFSET, IGB_VF_STRIDE,
+                            errp)) {
+        pcie_cap_exit(pci_dev);
+        igb_cleanup_msix(s);
+        msi_uninit(pci_dev);
+        return;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
         PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d238346..066389e391b6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8048,7 +8048,8 @@ out:
     return pow2ceil(bar_size);
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+                            Error **errp)
 {
     uint16_t vf_dev_id = n->params.use_intel_id ?
                          PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
@@ -8057,12 +8058,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
                                       le16_to_cpu(cap->vifrsm),
                                       NULL, NULL);
 
-    pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
-                       n->params.sriov_max_vfs, n->params.sriov_max_vfs,
-                       NVME_VF_OFFSET, NVME_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+                            n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+                            NVME_VF_OFFSET, NVME_VF_STRIDE,
+                            errp)) {
+        return false;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
                               PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+
+    return true;
 }
 
 static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
@@ -8155,6 +8161,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         return false;
     }
 
+    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
+        !nvme_init_sriov(n, pci_dev, 0x120, errp)) {
+        msix_uninit(pci_dev, &n->bar0, &n->bar0);
+        return false;
+    }
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     if (n->params.cmb_size_mb) {
@@ -8165,10 +8177,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         nvme_init_pmr(n, pci_dev);
     }
 
-    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
-        nvme_init_sriov(n, pci_dev, 0x120);
-    }
-
     return true;
 }
 
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 499becd5273f..f0bde0d3fc79 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -24,14 +24,27 @@ static PCIDevice *register_vf(PCIDevice *pf, int devfn,
                               const char *name, uint16_t vf_num);
 static void unregister_vfs(PCIDevice *dev);
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride)
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp)
 {
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
+    if (total_vfs) {
+        uint16_t ari_cap = pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI);
+        uint16_t first_vf_devfn = dev->devfn + vf_offset;
+        uint16_t last_vf_devfn = first_vf_devfn + vf_stride * (total_vfs - 1);
+
+        if ((!ari_cap && PCI_SLOT(dev->devfn) != PCI_SLOT(last_vf_devfn)) ||
+            last_vf_devfn >= PCI_DEVFN_MAX) {
+            error_setg(errp, "VF function number overflows");
+            return false;
+        }
+    }
+
     pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
@@ -69,6 +82,8 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+
+    return true;
 }
 
 void pcie_sriov_pf_exit(PCIDevice *dev)

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (4 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 05/12] pcie_sriov: Ensure VF function number does not overflow Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-07-10  6:37   ` Cédric Le Goater
  2024-06-27  6:07 ` [PATCH v10 07/12] pcie_sriov: Release VFs failed to realize Akihiko Odaki
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows to realize devices and report VF realization errors at PF
realization time.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pci.h        |  5 ---
 include/hw/pci/pci_device.h | 15 +++++++
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pci.c                |  2 +-
 hw/pci/pcie_sriov.c         | 95 +++++++++++++++++++--------------------------
 5 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6c92b2f70008..442017b4865d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
 void pci_set_enabled(PCIDevice *pci_dev, bool state);
 
-static inline void pci_set_power(PCIDevice *pci_dev, bool state)
-{
-    pci_set_enabled(pci_dev, state);
-}
-
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d57f9ce83884..ca151325085d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
     return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+    /*
+     * Don't change the enabled state of VFs when powering on/off the device.
+     *
+     * When powering on, VFs must not be enabled immediately but they must
+     * wait until the guest configures SR-IOV.
+     * When powering off, their corresponding PFs will be reset and disable
+     * VFs.
+     */
+    if (!pci_is_vf(pci_dev)) {
+        pci_set_enabled(pci_dev, state);
+    }
+}
+
 uint16_t pci_requester_id(PCIDevice *dev);
 
 /* DMA access functions */
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index aa704e8f9d9f..70649236c18a 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
 typedef struct PCIESriovPF {
     uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
-    const char *vfname; /* Reference to the device type used for the VFs */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 } PCIESriovPF;
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 295a32714a4a..c682c3dcb68e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
     memory_region_set_enabled(&d->bus_master_enable_region,
                               (pci_get_word(d->config + PCI_COMMAND)
                                & PCI_COMMAND_MASTER) && d->enabled);
-    if (!d->enabled) {
+    if (d->qdev.realized) {
         pci_device_reset(d);
     }
 }
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index f0bde0d3fc79..faadb0d2ea85 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -20,9 +20,16 @@
 #include "qapi/error.h"
 #include "trace.h"
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn,
-                              const char *name, uint16_t vf_num);
-static void unregister_vfs(PCIDevice *dev);
+static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
+{
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
+        object_unparent(OBJECT(vf));
+        object_unref(OBJECT(vf));
+    }
+    g_free(dev->exp.sriov_pf.vf);
+    dev->exp.sriov_pf.vf = NULL;
+}
 
 bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
@@ -30,6 +37,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         uint16_t vf_offset, uint16_t vf_stride,
                         Error **errp)
 {
+    BusState *bus = qdev_get_parent_bus(&dev->qdev);
+    int32_t devfn = dev->devfn + vf_offset;
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
@@ -49,7 +58,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
     dev->exp.sriov_pf.num_vfs = 0;
-    dev->exp.sriov_pf.vfname = g_strdup(vfname);
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -83,14 +91,34 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
 
+    dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
+
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = pci_new(devfn, vfname);
+        vf->exp.sriov_vf.pf = dev;
+        vf->exp.sriov_vf.vf_number = i;
+
+        if (!qdev_realize(&vf->qdev, bus, errp)) {
+            unparent_vfs(dev, i);
+            return false;
+        }
+
+        /* set vid/did according to sr/iov spec - they are not used */
+        pci_config_set_vendor_id(vf->config, 0xffff);
+        pci_config_set_device_id(vf->config, 0xffff);
+
+        dev->exp.sriov_pf.vf[i] = vf;
+        devfn += vf_stride;
+    }
+
     return true;
 }
 
 void pcie_sriov_pf_exit(PCIDevice *dev)
 {
-    unregister_vfs(dev);
-    g_free((char *)dev->exp.sriov_pf.vfname);
-    dev->exp.sriov_pf.vfname = NULL;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+
+    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
 }
 
 void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
@@ -156,38 +184,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
     }
 }
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
-                              uint16_t vf_num)
-{
-    PCIDevice *dev = pci_new(devfn, name);
-    dev->exp.sriov_vf.pf = pf;
-    dev->exp.sriov_vf.vf_number = vf_num;
-    PCIBus *bus = pci_get_bus(pf);
-    Error *local_err = NULL;
-
-    qdev_realize(&dev->qdev, &bus->qbus, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        return NULL;
-    }
-
-    /* set vid/did according to sr/iov spec - they are not used */
-    pci_config_set_vendor_id(dev->config, 0xffff);
-    pci_config_set_device_id(dev->config, 0xffff);
-
-    return dev;
-}
-
 static void register_vfs(PCIDevice *dev)
 {
     uint16_t num_vfs;
     uint16_t i;
     uint16_t sriov_cap = dev->exp.sriov_cap;
-    uint16_t vf_offset =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
-    uint16_t vf_stride =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-    int32_t devfn = dev->devfn + vf_offset;
 
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
@@ -195,18 +196,10 @@ static void register_vfs(PCIDevice *dev)
         return;
     }
 
-    dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
-
     trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
                              PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
-                                              dev->exp.sriov_pf.vfname, i);
-        if (!dev->exp.sriov_pf.vf[i]) {
-            num_vfs = i;
-            break;
-        }
-        devfn += vf_stride;
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
     dev->exp.sriov_pf.num_vfs = num_vfs;
 }
@@ -219,12 +212,8 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        object_unparent(OBJECT(vf));
-        object_unref(OBJECT(vf));
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    g_free(dev->exp.sriov_pf.vf);
-    dev->exp.sriov_pf.vf = NULL;
     dev->exp.sriov_pf.num_vfs = 0;
 }
 
@@ -246,14 +235,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              PCI_FUNC(dev->devfn), off, val, len);
 
     if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
-        if (dev->exp.sriov_pf.num_vfs) {
-            if (!(val & PCI_SRIOV_CTRL_VFE)) {
-                unregister_vfs(dev);
-            }
+        if (val & PCI_SRIOV_CTRL_VFE) {
+            register_vfs(dev);
         } else {
-            if (val & PCI_SRIOV_CTRL_VFE) {
-                register_vfs(dev);
-            }
+            unregister_vfs(dev);
         }
     }
 }

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 07/12] pcie_sriov: Release VFs failed to realize
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (5 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 08/12] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Release VFs failed to realize just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index faadb0d2ea85..9bd7f8acc3f4 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -99,6 +99,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
         vf->exp.sriov_vf.vf_number = i;
 
         if (!qdev_realize(&vf->qdev, bus, errp)) {
+            object_unparent(OBJECT(vf));
+            object_unref(vf);
             unparent_vfs(dev, i);
             return false;
         }

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 08/12] pcie_sriov: Remove num_vfs from PCIESriovPF
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (6 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 07/12] pcie_sriov: Release VFs failed to realize Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 09/12] pcie_sriov: Register VFs after migration Akihiko Odaki
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

num_vfs is not migrated so use PCI_SRIOV_CTRL_VFE and PCI_SRIOV_NUM_VF
instead.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pcie_sriov.c         | 28 ++++++++++++++++++++--------
 hw/pci/trace-events         |  2 +-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 70649236c18a..5148c5b77dd1 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -16,7 +16,6 @@
 #include "hw/pci/pci.h"
 
 typedef struct PCIESriovPF {
-    uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 } PCIESriovPF;
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 9bd7f8acc3f4..fae6acea4acb 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -57,7 +57,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
-    dev->exp.sriov_pf.num_vfs = 0;
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -186,6 +185,12 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
     }
 }
 
+static void clear_ctrl_vfe(PCIDevice *dev)
+{
+    uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL;
+    pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE);
+}
+
 static void register_vfs(PCIDevice *dev)
 {
     uint16_t num_vfs;
@@ -195,6 +200,7 @@ static void register_vfs(PCIDevice *dev)
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
     if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+        clear_ctrl_vfe(dev);
         return;
     }
 
@@ -203,20 +209,18 @@ static void register_vfs(PCIDevice *dev)
     for (i = 0; i < num_vfs; i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
-    dev->exp.sriov_pf.num_vfs = num_vfs;
 }
 
 static void unregister_vfs(PCIDevice *dev)
 {
-    uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
     uint16_t i;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
 
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
-                               PCI_FUNC(dev->devfn), num_vfs);
-    for (i = 0; i < num_vfs; i++) {
+                               PCI_FUNC(dev->devfn));
+    for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    dev->exp.sriov_pf.num_vfs = 0;
 }
 
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -242,6 +246,9 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
         } else {
             unregister_vfs(dev);
         }
+    } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
+        clear_ctrl_vfe(dev);
+        unregister_vfs(dev);
     }
 }
 
@@ -304,7 +311,7 @@ PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 {
     assert(!pci_is_vf(dev));
-    if (n < dev->exp.sriov_pf.num_vfs) {
+    if (n < pcie_sriov_num_vfs(dev)) {
         return dev->exp.sriov_pf.vf[n];
     }
     return NULL;
@@ -312,5 +319,10 @@ PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 
 uint16_t pcie_sriov_num_vfs(PCIDevice *dev)
 {
-    return dev->exp.sriov_pf.num_vfs;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+    uint8_t *cfg = dev->config + sriov_cap;
+
+    return sriov_cap &&
+           (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) ?
+           pci_get_word(cfg + PCI_SRIOV_NUM_VF) : 0;
 }
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index 19643aa8c6b0..e98f575a9d19 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -14,7 +14,7 @@ msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d mask
 
 # hw/pci/pcie_sriov.c
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
-sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
+sriov_unregister_vfs(const char *name, int slot, int function) "%s %02x:%x: Unregistering vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
 
 # pcie.c

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 09/12] pcie_sriov: Register VFs after migration
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (7 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 08/12] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:07 ` [PATCH v10 10/12] hw/pci: Replace -1 with UINT32_MAX for romsize Akihiko Odaki
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

pcie_sriov doesn't have code to restore its state after migration, but
igb, which uses pcie_sriov, naively claimed its migration capability.

Add code to register VFs after migration and fix igb migration.

Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pcie_sriov.h | 2 ++
 hw/pci/pci.c                | 7 +++++++
 hw/pci/pcie_sriov.c         | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 5148c5b77dd1..c5d2d318d330 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -57,6 +57,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              uint32_t val, int len);
 
+void pcie_sriov_pf_post_load(PCIDevice *dev);
+
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c682c3dcb68e..af1c743611af 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -733,10 +733,17 @@ static bool migrate_is_not_pcie(void *opaque, int version_id)
     return !pci_is_express((PCIDevice *)opaque);
 }
 
+static int pci_post_load(void *opaque, int version_id)
+{
+    pcie_sriov_pf_post_load(opaque);
+    return 0;
+}
+
 const VMStateDescription vmstate_pci_device = {
     .name = "PCIDevice",
     .version_id = 2,
     .minimum_version_id = 1,
+    .post_load = pci_post_load,
     .fields = (const VMStateField[]) {
         VMSTATE_INT32_POSITIVE_LE(version_id, PCIDevice),
         VMSTATE_BUFFER_UNSAFE_INFO_TEST(config, PCIDevice,
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index fae6acea4acb..56523ab4e833 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -252,6 +252,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
     }
 }
 
+void pcie_sriov_pf_post_load(PCIDevice *dev)
+{
+    if (dev->exp.sriov_cap) {
+        register_vfs(dev);
+    }
+}
+
 
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev)

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 10/12] hw/pci: Replace -1 with UINT32_MAX for romsize
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (8 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 09/12] pcie_sriov: Register VFs after migration Akihiko Odaki
@ 2024-06-27  6:07 ` Akihiko Odaki
  2024-06-27  6:08 ` [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto Akihiko Odaki
  2024-06-27  6:08 ` [PATCH v10 12/12] hw/qdev: Remove opts member Akihiko Odaki
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

romsize is an uint32_t variable. Specifying -1 as an uint32_t value is
obscure way to denote UINT32_MAX.

Worse, if int is wider than 32-bit, it will change the behavior of a
construct like the following:
romsize = -1;
if (romsize != -1) {
    ...
}

When -1 is assigned to romsize, -1 will be implicitly casted into
uint32_t, resulting in UINT32_MAX. On contrary, when evaluating
romsize != -1, romsize will be casted into int, and it will be a
comparison of UINT32_MAX and -1, and result in false.

Replace -1 with UINT32_MAX for statements involving the variable to
clarify the intent and prevent potential breakage.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 hw/pci/pci.c             | 8 ++++----
 hw/xen/xen_pt_load_rom.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index af1c743611af..1eb6abf534ca 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -70,7 +70,7 @@ static bool pcie_has_upstream_port(PCIDevice *dev);
 static Property pci_props[] = {
     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
     DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
-    DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
+    DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, UINT32_MAX),
     DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
     DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
                     QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
@@ -2073,7 +2073,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
                                  g_cmp_uint32, NULL);
     }
 
-    if (pci_dev->romsize != -1 && !is_power_of_2(pci_dev->romsize)) {
+    if (pci_dev->romsize != UINT32_MAX && !is_power_of_2(pci_dev->romsize)) {
         error_setg(errp, "ROM size %u is not a power of two", pci_dev->romsize);
         return;
     }
@@ -2359,7 +2359,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
         return;
     }
 
-    if (load_file || pdev->romsize == -1) {
+    if (load_file || pdev->romsize == UINT32_MAX) {
         path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
         if (path == NULL) {
             path = g_strdup(pdev->romfile);
@@ -2378,7 +2378,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
                        pdev->romfile);
             return;
         }
-        if (pdev->romsize != -1) {
+        if (pdev->romsize != UINT_MAX) {
             if (size > pdev->romsize) {
                 error_setg(errp, "romfile \"%s\" (%u bytes) "
                            "is too large for ROM size %u",
diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index 03422a8a7148..6bc64acd3352 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -53,7 +53,7 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
     }
     fseek(fp, 0, SEEK_SET);
 
-    if (dev->romsize != -1) {
+    if (dev->romsize != UINT_MAX) {
         if (st.st_size > dev->romsize) {
             error_report("ROM BAR \"%s\" (%ld bytes) is too large for ROM size %u",
                          rom_file, (long) st.st_size, dev->romsize);

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (9 preceding siblings ...)
  2024-06-27  6:07 ` [PATCH v10 10/12] hw/pci: Replace -1 with UINT32_MAX for romsize Akihiko Odaki
@ 2024-06-27  6:08 ` Akihiko Odaki
  2024-07-02 13:58   ` Michael S. Tsirkin
  2024-06-27  6:08 ` [PATCH v10 12/12] hw/qdev: Remove opts member Akihiko Odaki
  11 siblings, 1 reply; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

rom_bar is tristate but was defined as uint32_t so convert it into
OnOffAuto.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 docs/igd-assign.txt               |  2 +-
 include/hw/pci/pci_device.h       |  2 +-
 hw/pci/pci.c                      |  4 ++--
 hw/vfio/pci-quirks.c              |  2 +-
 hw/vfio/pci.c                     | 11 +++++------
 hw/xen/xen_pt_load_rom.c          |  4 ++--
 tests/qtest/virtio-net-failover.c | 32 ++++++++++++++++----------------
 7 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
index e17bb50789ad..35c6c8e28493 100644
--- a/docs/igd-assign.txt
+++ b/docs/igd-assign.txt
@@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
       ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
       PCI address 1f.0.
     * The IGD device must have a VGA ROM, either provided via the romfile
-      option or loaded automatically through vfio (standard).  rombar=0
+      option or loaded automatically through vfio (standard).  rombar=off
       will disable legacy mode support.
     * Hotplug of the IGD device is not supported.
     * The IGD device must be a SandyBridge or newer model device.
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index ca151325085d..49b341ce2e27 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -147,7 +147,7 @@ struct PCIDevice {
     uint32_t romsize;
     bool has_rom;
     MemoryRegion rom;
-    uint32_t rom_bar;
+    OnOffAuto rom_bar;
 
     /* INTx routing notifier */
     PCIINTxRoutingNotifier intx_routing_notifier;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1eb6abf534ca..901f5460d774 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -71,7 +71,7 @@ static Property pci_props[] = {
     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
     DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
     DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, UINT32_MAX),
-    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
+    DEFINE_PROP_ON_OFF_AUTO("rombar",  PCIDevice, rom_bar, ON_OFF_AUTO_AUTO),
     DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
                     QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
     DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,
@@ -2334,7 +2334,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
         return;
     }
 
-    if (!pdev->rom_bar) {
+    if (pdev->rom_bar == ON_OFF_AUTO_OFF) {
         /*
          * Load rom via fw_cfg instead of creating a rom bar,
          * for 0.11 compatibility.
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 39dae72497e0..0e920ed0691a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -33,7 +33,7 @@
  * execution as noticed with the BCM 57810 card for lack of a
  * more better way to handle such issues.
  * The  user can still override by specifying a romfile or
- * rombar=1.
+ * rombar=on.
  * Please see https://bugs.launchpad.net/qemu/+bug/1284874
  * for an analysis of the 57810 card hang. When adding
  * a new vendor id/device id combination below, please also add
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 74a79bdf61f9..4c4d9dc81efb 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -902,7 +902,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
         error_report("vfio-pci: Cannot read device rom at "
                     "%s", vdev->vbasedev.name);
         error_printf("Device option ROM contents are probably invalid "
-                    "(check dmesg).\nSkip option ROM probe with rombar=0, "
+                    "(check dmesg).\nSkip option ROM probe with rombar=off, "
                     "or load from file with romfile=\n");
         return;
     }
@@ -1012,11 +1012,10 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
-    DeviceState *dev = DEVICE(vdev);
     char *name;
     int fd = vdev->vbasedev.fd;
 
-    if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
+    if (vdev->pdev.romfile || vdev->pdev.rom_bar == ON_OFF_AUTO_OFF) {
         /* Since pci handles romfile, just print a message and return */
         if (vfio_opt_rom_in_denylist(vdev) && vdev->pdev.romfile) {
             warn_report("Device at %s is known to cause system instability"
@@ -1046,17 +1045,17 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
     }
 
     if (vfio_opt_rom_in_denylist(vdev)) {
-        if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
+        if (vdev->pdev.rom_bar == ON_OFF_AUTO_ON) {
             warn_report("Device at %s is known to cause system instability"
                         " issues during option rom execution",
                         vdev->vbasedev.name);
             error_printf("Proceeding anyway since user specified"
-                         " non zero value for rombar\n");
+                         " on for rombar\n");
         } else {
             warn_report("Rom loading for device at %s has been disabled"
                         " due to system instability issues",
                         vdev->vbasedev.name);
-            error_printf("Specify rombar=1 or romfile to force\n");
+            error_printf("Specify rombar=on or romfile to force\n");
             return;
         }
     }
diff --git a/hw/xen/xen_pt_load_rom.c b/hw/xen/xen_pt_load_rom.c
index 6bc64acd3352..025a6b25a916 100644
--- a/hw/xen/xen_pt_load_rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -26,7 +26,7 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
     Object *owner = OBJECT(dev);
 
     /* If loading ROM from file, pci handles it */
-    if (dev->romfile || !dev->rom_bar) {
+    if (dev->romfile || dev->rom_bar == ON_OFF_AUTO_OFF) {
         return NULL;
     }
 
@@ -71,7 +71,7 @@ void *pci_assign_dev_load_option_rom(PCIDevice *dev,
     if (!fread(ptr, 1, st.st_size, fp)) {
         error_report("pci-assign: Cannot read from host %s", rom_file);
         error_printf("Device option ROM contents are probably invalid "
-                     "(check dmesg).\nSkip option ROM probe with rombar=0, "
+                     "(check dmesg).\nSkip option ROM probe with rombar=off, "
                      "or load from file with romfile=\n");
         goto close_rom;
     }
diff --git a/tests/qtest/virtio-net-failover.c b/tests/qtest/virtio-net-failover.c
index 73dfabc2728b..f65b97683fb6 100644
--- a/tests/qtest/virtio-net-failover.c
+++ b/tests/qtest/virtio-net-failover.c
@@ -568,7 +568,7 @@ static void test_hotplug_2_reverse(void)
                          "{'bus': 'root0',"
                          "'failover': true,"
                          "'netdev': 'hs0',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_STANDBY0"'}");
 
@@ -655,7 +655,7 @@ static void test_migrate_out(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -765,7 +765,7 @@ static void test_migrate_in(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -819,7 +819,7 @@ static void test_off_migrate_out(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -887,7 +887,7 @@ static void test_off_migrate_in(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -938,7 +938,7 @@ static void test_guest_off_migrate_out(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1014,7 +1014,7 @@ static void test_guest_off_migrate_in(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1065,7 +1065,7 @@ static void test_migrate_guest_off_abort(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1170,7 +1170,7 @@ static void test_migrate_abort_wait_unplug(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1259,7 +1259,7 @@ static void test_migrate_abort_active(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1358,7 +1358,7 @@ static void test_migrate_off_abort(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1450,7 +1450,7 @@ static void test_migrate_abort_timeout(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1543,7 +1543,7 @@ static void test_multi_out(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1574,7 +1574,7 @@ static void test_multi_out(gconstpointer opaque)
                          "{'bus': 'root3',"
                          "'failover_pair_id': 'standby1',"
                          "'netdev': 'hs3',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY1"'}");
 
@@ -1713,7 +1713,7 @@ static void test_multi_in(gconstpointer opaque)
                          "{'bus': 'root1',"
                          "'failover_pair_id': 'standby0',"
                          "'netdev': 'hs1',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY0"'}");
 
@@ -1737,7 +1737,7 @@ static void test_multi_in(gconstpointer opaque)
                          "{'bus': 'root3',"
                          "'failover_pair_id': 'standby1',"
                          "'netdev': 'hs3',"
-                         "'rombar': 0,"
+                         "'rombar': 'off',"
                          "'romfile': '',"
                          "'mac': '"MAC_PRIMARY1"'}");
 

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v10 12/12] hw/qdev: Remove opts member
  2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (10 preceding siblings ...)
  2024-06-27  6:08 ` [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto Akihiko Odaki
@ 2024-06-27  6:08 ` Akihiko Odaki
  11 siblings, 0 replies; 23+ messages in thread
From: Akihiko Odaki @ 2024-06-27  6:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Akihiko Odaki

It is no longer used.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
---
 include/hw/qdev-core.h |  4 ----
 hw/core/qdev.c         |  1 -
 system/qdev-monitor.c  | 12 +++++++-----
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 5336728a23f6..40f2d185f17c 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -237,10 +237,6 @@ struct DeviceState {
      * @pending_deleted_expires_ms: optional timeout for deletion events
      */
     int64_t pending_deleted_expires_ms;
-    /**
-     * @opts: QDict of options for the device
-     */
-    QDict *opts;
     /**
      * @hotplugged: was device added after PHASE_MACHINE_READY?
      */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f3a996f57dee..2fc84699d432 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
         dev->canonical_path = NULL;
     }
 
-    qobject_unref(dev->opts);
     g_free(dev->id);
 }
 
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 6af6ef7d667f..3551989d5153 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -624,6 +624,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
     char *id;
     DeviceState *dev = NULL;
     BusState *bus = NULL;
+    QDict *properties;
 
     driver = qdict_get_try_str(opts, "driver");
     if (!driver) {
@@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
     }
 
     /* set properties */
-    dev->opts = qdict_clone_shallow(opts);
-    qdict_del(dev->opts, "driver");
-    qdict_del(dev->opts, "bus");
-    qdict_del(dev->opts, "id");
+    properties = qdict_clone_shallow(opts);
+    qdict_del(properties, "driver");
+    qdict_del(properties, "bus");
+    qdict_del(properties, "id");
 
-    object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json,
+    object_set_properties_from_keyval(&dev->parent_obj, properties, from_json,
                                       errp);
+    qobject_unref(properties);
     if (*errp) {
         goto err_del_dev;
     }

-- 
2.45.2



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-06-27  6:08 ` [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto Akihiko Odaki
@ 2024-07-02 13:58   ` Michael S. Tsirkin
  2024-07-03  2:15     ` BALATON Zoltan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-07-02 13:58 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, Markus Armbruster, qemu-devel, qemu-block

On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:
> rom_bar is tristate but was defined as uint32_t so convert it into
> OnOffAuto.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>

Commit log should explain why this is an improvement,
not just what's done.


> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> index e17bb50789ad..35c6c8e28493 100644
> --- a/docs/igd-assign.txt
> +++ b/docs/igd-assign.txt
> @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
>        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
>        PCI address 1f.0.
>      * The IGD device must have a VGA ROM, either provided via the romfile
> -      option or loaded automatically through vfio (standard).  rombar=0
> +      option or loaded automatically through vfio (standard).  rombar=off
>        will disable legacy mode support.
>      * Hotplug of the IGD device is not supported.
>      * The IGD device must be a SandyBridge or newer model device.

...

> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index 39dae72497e0..0e920ed0691a 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -33,7 +33,7 @@
>   * execution as noticed with the BCM 57810 card for lack of a
>   * more better way to handle such issues.
>   * The  user can still override by specifying a romfile or
> - * rombar=1.
> + * rombar=on.
>   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
>   * for an analysis of the 57810 card hang. When adding
>   * a new vendor id/device id combination below, please also add


So we are apparently breaking a bunch of users who followed
documentation to the dot. Why is this a good idea?


-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-07-02 13:58   ` Michael S. Tsirkin
@ 2024-07-03  2:15     ` BALATON Zoltan
  2024-07-03  6:00       ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: BALATON Zoltan @ 2024-07-03  2:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block

On Tue, 2 Jul 2024, Michael S. Tsirkin wrote:
> On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:
>> rom_bar is tristate but was defined as uint32_t so convert it into
>> OnOffAuto.
>>
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>
> Commit log should explain why this is an improvement,
> not just what's done.
>
>
>> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
>> index e17bb50789ad..35c6c8e28493 100644
>> --- a/docs/igd-assign.txt
>> +++ b/docs/igd-assign.txt
>> @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
>>        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
>>        PCI address 1f.0.
>>      * The IGD device must have a VGA ROM, either provided via the romfile
>> -      option or loaded automatically through vfio (standard).  rombar=0
>> +      option or loaded automatically through vfio (standard).  rombar=off
>>        will disable legacy mode support.
>>      * Hotplug of the IGD device is not supported.
>>      * The IGD device must be a SandyBridge or newer model device.
>
> ...
>
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index 39dae72497e0..0e920ed0691a 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -33,7 +33,7 @@
>>   * execution as noticed with the BCM 57810 card for lack of a
>>   * more better way to handle such issues.
>>   * The  user can still override by specifying a romfile or
>> - * rombar=1.
>> + * rombar=on.
>>   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
>>   * for an analysis of the 57810 card hang. When adding
>>   * a new vendor id/device id combination below, please also add
>
>
> So we are apparently breaking a bunch of users who followed
> documentation to the dot. Why is this a good idea?

On/off is clearer than 1/0. But isn't 1/0 a synonym for on/off so 
previous command lines would still work?

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-07-03  2:15     ` BALATON Zoltan
@ 2024-07-03  6:00       ` Michael S. Tsirkin
  2024-07-03 11:00         ` BALATON Zoltan
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-07-03  6:00 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block

On Wed, Jul 03, 2024 at 04:15:23AM +0200, BALATON Zoltan wrote:
> On Tue, 2 Jul 2024, Michael S. Tsirkin wrote:
> > On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:
> > > rom_bar is tristate but was defined as uint32_t so convert it into
> > > OnOffAuto.
> > > 
> > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > 
> > Commit log should explain why this is an improvement,
> > not just what's done.
> > 
> > 
> > > diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> > > index e17bb50789ad..35c6c8e28493 100644
> > > --- a/docs/igd-assign.txt
> > > +++ b/docs/igd-assign.txt
> > > @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
> > >        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> > >        PCI address 1f.0.
> > >      * The IGD device must have a VGA ROM, either provided via the romfile
> > > -      option or loaded automatically through vfio (standard).  rombar=0
> > > +      option or loaded automatically through vfio (standard).  rombar=off
> > >        will disable legacy mode support.
> > >      * Hotplug of the IGD device is not supported.
> > >      * The IGD device must be a SandyBridge or newer model device.
> > 
> > ...
> > 
> > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > index 39dae72497e0..0e920ed0691a 100644
> > > --- a/hw/vfio/pci-quirks.c
> > > +++ b/hw/vfio/pci-quirks.c
> > > @@ -33,7 +33,7 @@
> > >   * execution as noticed with the BCM 57810 card for lack of a
> > >   * more better way to handle such issues.
> > >   * The  user can still override by specifying a romfile or
> > > - * rombar=1.
> > > + * rombar=on.
> > >   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
> > >   * for an analysis of the 57810 card hang. When adding
> > >   * a new vendor id/device id combination below, please also add
> > 
> > 
> > So we are apparently breaking a bunch of users who followed
> > documentation to the dot. Why is this a good idea?
> 
> On/off is clearer than 1/0. But isn't 1/0 a synonym for on/off so previous
> command lines would still work?
> 
> Regards,
> BALATON Zoltan

I see nothing in code that would make it so:


const QEnumLookup OnOffAuto_lookup = {
    .array = (const char *const[]) {
        [ON_OFF_AUTO_AUTO] = "auto",
        [ON_OFF_AUTO_ON] = "on",
        [ON_OFF_AUTO_OFF] = "off",
    },
    .size = ON_OFF_AUTO__MAX
};

I also tried with an existing property:

$ ./qemu-system-x86_64 -device intel-hda,msi=0
qemu-system-x86_64: -device intel-hda,msi=0: Parameter 'msi' does not accept value '0'


-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-07-03  6:00       ` Michael S. Tsirkin
@ 2024-07-03 11:00         ` BALATON Zoltan
  2024-07-03 13:35           ` Michael S. Tsirkin
  2024-07-03 14:56           ` Alex Williamson
  0 siblings, 2 replies; 23+ messages in thread
From: BALATON Zoltan @ 2024-07-03 11:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block

On Wed, 3 Jul 2024, Michael S. Tsirkin wrote:
> On Wed, Jul 03, 2024 at 04:15:23AM +0200, BALATON Zoltan wrote:
>> On Tue, 2 Jul 2024, Michael S. Tsirkin wrote:
>>> On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:
>>>> rom_bar is tristate but was defined as uint32_t so convert it into
>>>> OnOffAuto.
>>>>
>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>
>>> Commit log should explain why this is an improvement,
>>> not just what's done.
>>>
>>>
>>>> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
>>>> index e17bb50789ad..35c6c8e28493 100644
>>>> --- a/docs/igd-assign.txt
>>>> +++ b/docs/igd-assign.txt
>>>> @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
>>>>        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
>>>>        PCI address 1f.0.
>>>>      * The IGD device must have a VGA ROM, either provided via the romfile
>>>> -      option or loaded automatically through vfio (standard).  rombar=0
>>>> +      option or loaded automatically through vfio (standard).  rombar=off
>>>>        will disable legacy mode support.
>>>>      * Hotplug of the IGD device is not supported.
>>>>      * The IGD device must be a SandyBridge or newer model device.
>>>
>>> ...
>>>
>>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>>> index 39dae72497e0..0e920ed0691a 100644
>>>> --- a/hw/vfio/pci-quirks.c
>>>> +++ b/hw/vfio/pci-quirks.c
>>>> @@ -33,7 +33,7 @@
>>>>   * execution as noticed with the BCM 57810 card for lack of a
>>>>   * more better way to handle such issues.
>>>>   * The  user can still override by specifying a romfile or
>>>> - * rombar=1.
>>>> + * rombar=on.
>>>>   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
>>>>   * for an analysis of the 57810 card hang. When adding
>>>>   * a new vendor id/device id combination below, please also add
>>>
>>>
>>> So we are apparently breaking a bunch of users who followed
>>> documentation to the dot. Why is this a good idea?
>>
>> On/off is clearer than 1/0. But isn't 1/0 a synonym for on/off so previous
>> command lines would still work?
>>
>> Regards,
>> BALATON Zoltan
>
> I see nothing in code that would make it so:
>
>
> const QEnumLookup OnOffAuto_lookup = {
>    .array = (const char *const[]) {
>        [ON_OFF_AUTO_AUTO] = "auto",
>        [ON_OFF_AUTO_ON] = "on",
>        [ON_OFF_AUTO_OFF] = "off",
>    },
>    .size = ON_OFF_AUTO__MAX
> };
>
> I also tried with an existing property:
>
> $ ./qemu-system-x86_64 -device intel-hda,msi=0
> qemu-system-x86_64: -device intel-hda,msi=0: Parameter 'msi' does not accept value '0'

Then it was probably bit properties that also accept 0/1, on/off, 
true/false. Maybe similar aliases could be added to on/off/auto?

In any case when I first saw rombar I thought it would set the BAR of the 
ROM so wondered why it's 1 and not 5 or 6 or an offset. So on/off is 
clearer in this case.

Regards,
BALATON Zoltan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-07-03 11:00         ` BALATON Zoltan
@ 2024-07-03 13:35           ` Michael S. Tsirkin
  2024-07-03 14:56           ` Alex Williamson
  1 sibling, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-07-03 13:35 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé, Marcel Apfelbaum,
	Alex Williamson, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block

On Wed, Jul 03, 2024 at 01:00:21PM +0200, BALATON Zoltan wrote:
> On Wed, 3 Jul 2024, Michael S. Tsirkin wrote:
> > On Wed, Jul 03, 2024 at 04:15:23AM +0200, BALATON Zoltan wrote:
> > > On Tue, 2 Jul 2024, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:
> > > > > rom_bar is tristate but was defined as uint32_t so convert it into
> > > > > OnOffAuto.
> > > > > 
> > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > 
> > > > Commit log should explain why this is an improvement,
> > > > not just what's done.
> > > > 
> > > > 
> > > > > diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> > > > > index e17bb50789ad..35c6c8e28493 100644
> > > > > --- a/docs/igd-assign.txt
> > > > > +++ b/docs/igd-assign.txt
> > > > > @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
> > > > >        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> > > > >        PCI address 1f.0.
> > > > >      * The IGD device must have a VGA ROM, either provided via the romfile
> > > > > -      option or loaded automatically through vfio (standard).  rombar=0
> > > > > +      option or loaded automatically through vfio (standard).  rombar=off
> > > > >        will disable legacy mode support.
> > > > >      * Hotplug of the IGD device is not supported.
> > > > >      * The IGD device must be a SandyBridge or newer model device.
> > > > 
> > > > ...
> > > > 
> > > > > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > > > > index 39dae72497e0..0e920ed0691a 100644
> > > > > --- a/hw/vfio/pci-quirks.c
> > > > > +++ b/hw/vfio/pci-quirks.c
> > > > > @@ -33,7 +33,7 @@
> > > > >   * execution as noticed with the BCM 57810 card for lack of a
> > > > >   * more better way to handle such issues.
> > > > >   * The  user can still override by specifying a romfile or
> > > > > - * rombar=1.
> > > > > + * rombar=on.
> > > > >   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
> > > > >   * for an analysis of the 57810 card hang. When adding
> > > > >   * a new vendor id/device id combination below, please also add
> > > > 
> > > > 
> > > > So we are apparently breaking a bunch of users who followed
> > > > documentation to the dot. Why is this a good idea?
> > > 
> > > On/off is clearer than 1/0. But isn't 1/0 a synonym for on/off so previous
> > > command lines would still work?
> > > 
> > > Regards,
> > > BALATON Zoltan
> > 
> > I see nothing in code that would make it so:
> > 
> > 
> > const QEnumLookup OnOffAuto_lookup = {
> >    .array = (const char *const[]) {
> >        [ON_OFF_AUTO_AUTO] = "auto",
> >        [ON_OFF_AUTO_ON] = "on",
> >        [ON_OFF_AUTO_OFF] = "off",
> >    },
> >    .size = ON_OFF_AUTO__MAX
> > };
> > 
> > I also tried with an existing property:
> > 
> > $ ./qemu-system-x86_64 -device intel-hda,msi=0
> > qemu-system-x86_64: -device intel-hda,msi=0: Parameter 'msi' does not accept value '0'
> 
> Then it was probably bit properties that also accept 0/1, on/off,
> true/false.

I mean, the code is open, why do you keep guessing?


No, these reuse the bool parsing logic:

static void prop_get_bit(Object *obj, Visitor *v, const char *name,
                         void *opaque, Error **errp)
{
    Property *prop = opaque;
    uint32_t *p = object_field_prop_ptr(obj, prop);
    bool value = (*p & qdev_get_prop_mask(prop)) != 0;

    visit_type_bool(v, name, &value, errp);
}

and that never accepted 0 or 1:


bool qapi_bool_parse(const char *name, const char *value, bool *obj, Error **errp)
{       
    if (g_str_equal(value, "on") ||
        g_str_equal(value, "yes") ||
        g_str_equal(value, "true") ||
        g_str_equal(value, "y")) {
        *obj = true; 
        return true;
    }
    if (g_str_equal(value, "off") ||
        g_str_equal(value, "no") ||
        g_str_equal(value, "false") ||
        g_str_equal(value, "n")) {
        *obj = false;
        return true;
    }
    
    error_setg(errp, QERR_INVALID_PARAMETER_VALUE, name,
               "'on' or 'off'");
    return false;
}



> Maybe similar aliases could be added to on/off/auto?

Could be, but even then switching to that would mean that user sets 1
but query returns "on".  Might or might not surprise some users.

Adding true/false yes/no y/n aliases to on/off/auto might make sense
though, for consistency. Donnu if QAPI guys will agree, though,
and not directly related to this patchset.

One other idea is to add a generic way to detect that a property is set
by user. This requirement comes up, once in a while.




> In any case when I first saw rombar I thought it would set the BAR of the
> ROM so wondered why it's 1 and not 5 or 6 or an offset. So on/off is clearer
> in this case.
> 
> Regards,
> BALATON Zoltan


I agree here, but it's been here for a long time.

-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto
  2024-07-03 11:00         ` BALATON Zoltan
  2024-07-03 13:35           ` Michael S. Tsirkin
@ 2024-07-03 14:56           ` Alex Williamson
  1 sibling, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2024-07-03 14:56 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Michael S. Tsirkin, Akihiko Odaki, Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Cédric Le Goater, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block

On Wed, 3 Jul 2024 13:00:21 +0200 (CEST)
BALATON Zoltan <balaton@eik.bme.hu> wrote:

> On Wed, 3 Jul 2024, Michael S. Tsirkin wrote:
> > On Wed, Jul 03, 2024 at 04:15:23AM +0200, BALATON Zoltan wrote:  
> >> On Tue, 2 Jul 2024, Michael S. Tsirkin wrote:  
> >>> On Thu, Jun 27, 2024 at 03:08:00PM +0900, Akihiko Odaki wrote:  
> >>>> rom_bar is tristate but was defined as uint32_t so convert it into
> >>>> OnOffAuto.
> >>>>
> >>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>  
> >>>
> >>> Commit log should explain why this is an improvement,
> >>> not just what's done.
> >>>
> >>>  
> >>>> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> >>>> index e17bb50789ad..35c6c8e28493 100644
> >>>> --- a/docs/igd-assign.txt
> >>>> +++ b/docs/igd-assign.txt
> >>>> @@ -35,7 +35,7 @@ IGD has two different modes for assignment using vfio-pci:
> >>>>        ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> >>>>        PCI address 1f.0.
> >>>>      * The IGD device must have a VGA ROM, either provided via the romfile
> >>>> -      option or loaded automatically through vfio (standard).  rombar=0
> >>>> +      option or loaded automatically through vfio (standard).  rombar=off
> >>>>        will disable legacy mode support.
> >>>>      * Hotplug of the IGD device is not supported.
> >>>>      * The IGD device must be a SandyBridge or newer model device.  
> >>>
> >>> ...
> >>>  
> >>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> >>>> index 39dae72497e0..0e920ed0691a 100644
> >>>> --- a/hw/vfio/pci-quirks.c
> >>>> +++ b/hw/vfio/pci-quirks.c
> >>>> @@ -33,7 +33,7 @@
> >>>>   * execution as noticed with the BCM 57810 card for lack of a
> >>>>   * more better way to handle such issues.
> >>>>   * The  user can still override by specifying a romfile or
> >>>> - * rombar=1.
> >>>> + * rombar=on.
> >>>>   * Please see https://bugs.launchpad.net/qemu/+bug/1284874
> >>>>   * for an analysis of the 57810 card hang. When adding
> >>>>   * a new vendor id/device id combination below, please also add  
> >>>
> >>>
> >>> So we are apparently breaking a bunch of users who followed
> >>> documentation to the dot. Why is this a good idea?  
> >>
> >> On/off is clearer than 1/0. But isn't 1/0 a synonym for on/off so previous
> >> command lines would still work?
> >>
> >> Regards,
> >> BALATON Zoltan  
> >
> > I see nothing in code that would make it so:
> >
> >
> > const QEnumLookup OnOffAuto_lookup = {
> >    .array = (const char *const[]) {
> >        [ON_OFF_AUTO_AUTO] = "auto",
> >        [ON_OFF_AUTO_ON] = "on",
> >        [ON_OFF_AUTO_OFF] = "off",
> >    },
> >    .size = ON_OFF_AUTO__MAX
> > };
> >
> > I also tried with an existing property:
> >
> > $ ./qemu-system-x86_64 -device intel-hda,msi=0
> > qemu-system-x86_64: -device intel-hda,msi=0: Parameter 'msi' does not accept value '0'  
> 
> Then it was probably bit properties that also accept 0/1, on/off, 
> true/false. Maybe similar aliases could be added to on/off/auto?
> 
> In any case when I first saw rombar I thought it would set the BAR of the 
> ROM so wondered why it's 1 and not 5 or 6 or an offset. So on/off is 
> clearer in this case.

There's only one PCI spec defined offset for the ROM BAR.  Yes, the
option could be more clear but relocating the ROM to a different
regular BAR offset is invalid.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances
  2024-06-27  6:07 ` [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
@ 2024-07-10  6:37   ` Cédric Le Goater
  2024-07-10 10:52     ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Cédric Le Goater @ 2024-07-10  6:37 UTC (permalink / raw)
  To: Akihiko Odaki, Philippe Mathieu-Daudé, Michael S. Tsirkin,
	Marcel Apfelbaum, Alex Williamson, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster
  Cc: qemu-devel, qemu-block, Thomas Huth

Hello,

This change introduced a regression on s390x. I could have spotted it
earlier. Sorry about that. Here is the scenario,

QEMU now creates automatically the PCI device objects representing the
VFs when the PF device is realized in pcie_sriov_pf_init(). This is
good to report errors early but it has an important drawback.

On s390x, PCI devices have a dual S390PCIBusDevice object. This device
model has 'uid' and 'fid' properties which can be either set by the VMM
or, if not, auto-generated by the S390PCIBusDevice realize handler. In
the VF case, these ids are auto-generated by QEMU and they can possibly
conflict with the uid number space of libvirt. The conflict is detected
when the machine is created and the start is aborted with a message :

   2024-07-08T12:51:42.876883Z qemu-system-s390x: -device {"driver":"zpci","uid":17,"fid":16,"target":"hostdev0","id":"zpci17"}: uid 17 already in use

This problem can occur today with a s390x VM using an IGB device.

It worked fine when the VFs were created at OS runtime because the initial
topology of the machine was in place. Adding VFs was more or less like
hotplug. AIUI, libvirt should have full control on the machine topology
and so, creating VFs in QEMU at init time in the back of libvirt seems
like a violation of this rule.

That said, the s390x case is specific and could perhaps be handled in a
special way.

Thanks,

C.



On 6/27/24 8:07 AM, Akihiko Odaki wrote:
> Disable SR-IOV VF devices by reusing code to power down PCI devices
> instead of removing them when the guest requests to disable VFs. This
> allows to realize devices and report VF realization errors at PF
> realization time.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>   include/hw/pci/pci.h        |  5 ---
>   include/hw/pci/pci_device.h | 15 +++++++
>   include/hw/pci/pcie_sriov.h |  1 -
>   hw/pci/pci.c                |  2 +-
>   hw/pci/pcie_sriov.c         | 95 +++++++++++++++++++--------------------------
>   5 files changed, 56 insertions(+), 62 deletions(-)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6c92b2f70008..442017b4865d 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
>   MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
>   void pci_set_enabled(PCIDevice *pci_dev, bool state);
>   
> -static inline void pci_set_power(PCIDevice *pci_dev, bool state)
> -{
> -    pci_set_enabled(pci_dev, state);
> -}
> -
>   #endif
> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
> index d57f9ce83884..ca151325085d 100644
> --- a/include/hw/pci/pci_device.h
> +++ b/include/hw/pci/pci_device.h
> @@ -205,6 +205,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
>       return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
>   }
>   
> +static inline void pci_set_power(PCIDevice *pci_dev, bool state)
> +{
> +    /*
> +     * Don't change the enabled state of VFs when powering on/off the device.
> +     *
> +     * When powering on, VFs must not be enabled immediately but they must
> +     * wait until the guest configures SR-IOV.
> +     * When powering off, their corresponding PFs will be reset and disable
> +     * VFs.
> +     */
> +    if (!pci_is_vf(pci_dev)) {
> +        pci_set_enabled(pci_dev, state);
> +    }
> +}
> +
>   uint16_t pci_requester_id(PCIDevice *dev);
>   
>   /* DMA access functions */
> diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
> index aa704e8f9d9f..70649236c18a 100644
> --- a/include/hw/pci/pcie_sriov.h
> +++ b/include/hw/pci/pcie_sriov.h
> @@ -18,7 +18,6 @@
>   typedef struct PCIESriovPF {
>       uint16_t num_vfs;   /* Number of virtual functions created */
>       uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
> -    const char *vfname; /* Reference to the device type used for the VFs */
>       PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
>   } PCIESriovPF;
>   
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 295a32714a4a..c682c3dcb68e 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
>       memory_region_set_enabled(&d->bus_master_enable_region,
>                                 (pci_get_word(d->config + PCI_COMMAND)
>                                  & PCI_COMMAND_MASTER) && d->enabled);
> -    if (!d->enabled) {
> +    if (d->qdev.realized) {
>           pci_device_reset(d);
>       }
>   }
> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> index f0bde0d3fc79..faadb0d2ea85 100644
> --- a/hw/pci/pcie_sriov.c
> +++ b/hw/pci/pcie_sriov.c
> @@ -20,9 +20,16 @@
>   #include "qapi/error.h"
>   #include "trace.h"
>   
> -static PCIDevice *register_vf(PCIDevice *pf, int devfn,
> -                              const char *name, uint16_t vf_num);
> -static void unregister_vfs(PCIDevice *dev);
> +static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
> +{
> +    for (uint16_t i = 0; i < total_vfs; i++) {
> +        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
> +        object_unparent(OBJECT(vf));
> +        object_unref(OBJECT(vf));
> +    }
> +    g_free(dev->exp.sriov_pf.vf);
> +    dev->exp.sriov_pf.vf = NULL;
> +}
>   
>   bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                           const char *vfname, uint16_t vf_dev_id,
> @@ -30,6 +37,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                           uint16_t vf_offset, uint16_t vf_stride,
>                           Error **errp)
>   {
> +    BusState *bus = qdev_get_parent_bus(&dev->qdev);
> +    int32_t devfn = dev->devfn + vf_offset;
>       uint8_t *cfg = dev->config + offset;
>       uint8_t *wmask;
>   
> @@ -49,7 +58,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                           offset, PCI_EXT_CAP_SRIOV_SIZEOF);
>       dev->exp.sriov_cap = offset;
>       dev->exp.sriov_pf.num_vfs = 0;
> -    dev->exp.sriov_pf.vfname = g_strdup(vfname);
>       dev->exp.sriov_pf.vf = NULL;
>   
>       pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
> @@ -83,14 +91,34 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>   
>       qdev_prop_set_bit(&dev->qdev, "multifunction", true);
>   
> +    dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
> +
> +    for (uint16_t i = 0; i < total_vfs; i++) {
> +        PCIDevice *vf = pci_new(devfn, vfname);
> +        vf->exp.sriov_vf.pf = dev;
> +        vf->exp.sriov_vf.vf_number = i;
> +
> +        if (!qdev_realize(&vf->qdev, bus, errp)) {
> +            unparent_vfs(dev, i);
> +            return false;
> +        }
> +
> +        /* set vid/did according to sr/iov spec - they are not used */
> +        pci_config_set_vendor_id(vf->config, 0xffff);
> +        pci_config_set_device_id(vf->config, 0xffff);
> +
> +        dev->exp.sriov_pf.vf[i] = vf;
> +        devfn += vf_stride;
> +    }
> +
>       return true;
>   }
>   
>   void pcie_sriov_pf_exit(PCIDevice *dev)
>   {
> -    unregister_vfs(dev);
> -    g_free((char *)dev->exp.sriov_pf.vfname);
> -    dev->exp.sriov_pf.vfname = NULL;
> +    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> +
> +    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
>   }
>   
>   void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
> @@ -156,38 +184,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
>       }
>   }
>   
> -static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
> -                              uint16_t vf_num)
> -{
> -    PCIDevice *dev = pci_new(devfn, name);
> -    dev->exp.sriov_vf.pf = pf;
> -    dev->exp.sriov_vf.vf_number = vf_num;
> -    PCIBus *bus = pci_get_bus(pf);
> -    Error *local_err = NULL;
> -
> -    qdev_realize(&dev->qdev, &bus->qbus, &local_err);
> -    if (local_err) {
> -        error_report_err(local_err);
> -        return NULL;
> -    }
> -
> -    /* set vid/did according to sr/iov spec - they are not used */
> -    pci_config_set_vendor_id(dev->config, 0xffff);
> -    pci_config_set_device_id(dev->config, 0xffff);
> -
> -    return dev;
> -}
> -
>   static void register_vfs(PCIDevice *dev)
>   {
>       uint16_t num_vfs;
>       uint16_t i;
>       uint16_t sriov_cap = dev->exp.sriov_cap;
> -    uint16_t vf_offset =
> -        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> -    uint16_t vf_stride =
> -        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> -    int32_t devfn = dev->devfn + vf_offset;
>   
>       assert(sriov_cap > 0);
>       num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> @@ -195,18 +196,10 @@ static void register_vfs(PCIDevice *dev)
>           return;
>       }
>   
> -    dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
> -
>       trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
>                                PCI_FUNC(dev->devfn), num_vfs);
>       for (i = 0; i < num_vfs; i++) {
> -        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
> -                                              dev->exp.sriov_pf.vfname, i);
> -        if (!dev->exp.sriov_pf.vf[i]) {
> -            num_vfs = i;
> -            break;
> -        }
> -        devfn += vf_stride;
> +        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
>       }
>       dev->exp.sriov_pf.num_vfs = num_vfs;
>   }
> @@ -219,12 +212,8 @@ static void unregister_vfs(PCIDevice *dev)
>       trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
>                                  PCI_FUNC(dev->devfn), num_vfs);
>       for (i = 0; i < num_vfs; i++) {
> -        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
> -        object_unparent(OBJECT(vf));
> -        object_unref(OBJECT(vf));
> +        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
>       }
> -    g_free(dev->exp.sriov_pf.vf);
> -    dev->exp.sriov_pf.vf = NULL;
>       dev->exp.sriov_pf.num_vfs = 0;
>   }
>   
> @@ -246,14 +235,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
>                                PCI_FUNC(dev->devfn), off, val, len);
>   
>       if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> -        if (dev->exp.sriov_pf.num_vfs) {
> -            if (!(val & PCI_SRIOV_CTRL_VFE)) {
> -                unregister_vfs(dev);
> -            }
> +        if (val & PCI_SRIOV_CTRL_VFE) {
> +            register_vfs(dev);
>           } else {
> -            if (val & PCI_SRIOV_CTRL_VFE) {
> -                register_vfs(dev);
> -            }
> +            unregister_vfs(dev);
>           }
>       }
>   }
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances
  2024-07-10  6:37   ` Cédric Le Goater
@ 2024-07-10 10:52     ` Michael S. Tsirkin
  2024-07-13 12:45       ` Akihiko Odaki
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-07-10 10:52 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé, Marcel Apfelbaum,
	Alex Williamson, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, Markus Armbruster, qemu-devel, qemu-block,
	Thomas Huth

On Wed, Jul 10, 2024 at 08:37:27AM +0200, Cédric Le Goater wrote:
> Hello,
> 
> This change introduced a regression on s390x. I could have spotted it
> earlier. Sorry about that. Here is the scenario,
> 
> QEMU now creates automatically the PCI device objects representing the
> VFs when the PF device is realized in pcie_sriov_pf_init(). This is
> good to report errors early but it has an important drawback.
> 
> On s390x, PCI devices have a dual S390PCIBusDevice object. This device
> model has 'uid' and 'fid' properties which can be either set by the VMM
> or, if not, auto-generated by the S390PCIBusDevice realize handler. In
> the VF case, these ids are auto-generated by QEMU and they can possibly
> conflict with the uid number space of libvirt. The conflict is detected
> when the machine is created and the start is aborted with a message :
> 
>   2024-07-08T12:51:42.876883Z qemu-system-s390x: -device {"driver":"zpci","uid":17,"fid":16,"target":"hostdev0","id":"zpci17"}: uid 17 already in use
> 
> This problem can occur today with a s390x VM using an IGB device.
> 
> It worked fine when the VFs were created at OS runtime because the initial
> topology of the machine was in place. Adding VFs was more or less like
> hotplug. AIUI, libvirt should have full control on the machine topology
> and so, creating VFs in QEMU at init time in the back of libvirt seems
> like a violation of this rule.
> 
> That said, the s390x case is specific and could perhaps be handled in a
> special way.
> 
> Thanks,
> 
> C.


Thanks for reporting this Cédric. Akihiko what's your
plan to handle this? Do you have the time to address this issue?


-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances
  2024-07-10 10:52     ` Michael S. Tsirkin
@ 2024-07-13 12:45       ` Akihiko Odaki
  2024-07-20 19:11         ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Akihiko Odaki @ 2024-07-13 12:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, Cédric Le Goater
  Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson,
	Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost,
	Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen,
	Markus Armbruster, qemu-devel, qemu-block, Thomas Huth

On 2024/07/10 19:52, Michael S. Tsirkin wrote:
> On Wed, Jul 10, 2024 at 08:37:27AM +0200, Cédric Le Goater wrote:
>> Hello,
>>
>> This change introduced a regression on s390x. I could have spotted it
>> earlier. Sorry about that. Here is the scenario,
>>
>> QEMU now creates automatically the PCI device objects representing the
>> VFs when the PF device is realized in pcie_sriov_pf_init(). This is
>> good to report errors early but it has an important drawback.
>>
>> On s390x, PCI devices have a dual S390PCIBusDevice object. This device
>> model has 'uid' and 'fid' properties which can be either set by the VMM
>> or, if not, auto-generated by the S390PCIBusDevice realize handler. In
>> the VF case, these ids are auto-generated by QEMU and they can possibly
>> conflict with the uid number space of libvirt. The conflict is detected
>> when the machine is created and the start is aborted with a message :
>>
>>    2024-07-08T12:51:42.876883Z qemu-system-s390x: -device {"driver":"zpci","uid":17,"fid":16,"target":"hostdev0","id":"zpci17"}: uid 17 already in use
>>
>> This problem can occur today with a s390x VM using an IGB device.
>>
>> It worked fine when the VFs were created at OS runtime because the initial
>> topology of the machine was in place. Adding VFs was more or less like
>> hotplug. AIUI, libvirt should have full control on the machine topology
>> and so, creating VFs in QEMU at init time in the back of libvirt seems
>> like a violation of this rule.
>>
>> That said, the s390x case is specific and could perhaps be handled in a
>> special way.
>>
>> Thanks,
>>
>> C.
> 
> 
> Thanks for reporting this Cédric. Akihiko what's your
> plan to handle this? Do you have the time to address this issue?

Creating VFs at initialization time only makes problems apparent early. 
Even without this change, hot-plugging another PCI device after 
realizing a VF results in a similar situation.

A proper way to handle this is to add new properties to igb and nvme to 
let libvirt specify the VF ids. However I wonder if it is a worthwhile 
addition (i.e., if igb and nvme's SR-IOV emulation will be used with 
s390x and libvirt).

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances
  2024-07-13 12:45       ` Akihiko Odaki
@ 2024-07-20 19:11         ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-07-20 19:11 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Cédric Le Goater, Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Alex Williamson, Paolo Bonzini,
	Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Markus Armbruster,
	qemu-devel, qemu-block, Thomas Huth

On Sat, Jul 13, 2024 at 09:45:07PM +0900, Akihiko Odaki wrote:
> On 2024/07/10 19:52, Michael S. Tsirkin wrote:
> > On Wed, Jul 10, 2024 at 08:37:27AM +0200, Cédric Le Goater wrote:
> > > Hello,
> > > 
> > > This change introduced a regression on s390x. I could have spotted it
> > > earlier. Sorry about that. Here is the scenario,
> > > 
> > > QEMU now creates automatically the PCI device objects representing the
> > > VFs when the PF device is realized in pcie_sriov_pf_init(). This is
> > > good to report errors early but it has an important drawback.
> > > 
> > > On s390x, PCI devices have a dual S390PCIBusDevice object. This device
> > > model has 'uid' and 'fid' properties which can be either set by the VMM
> > > or, if not, auto-generated by the S390PCIBusDevice realize handler. In
> > > the VF case, these ids are auto-generated by QEMU and they can possibly
> > > conflict with the uid number space of libvirt. The conflict is detected
> > > when the machine is created and the start is aborted with a message :
> > > 
> > >    2024-07-08T12:51:42.876883Z qemu-system-s390x: -device {"driver":"zpci","uid":17,"fid":16,"target":"hostdev0","id":"zpci17"}: uid 17 already in use
> > > 
> > > This problem can occur today with a s390x VM using an IGB device.
> > > 
> > > It worked fine when the VFs were created at OS runtime because the initial
> > > topology of the machine was in place. Adding VFs was more or less like
> > > hotplug. AIUI, libvirt should have full control on the machine topology
> > > and so, creating VFs in QEMU at init time in the back of libvirt seems
> > > like a violation of this rule.
> > > 
> > > That said, the s390x case is specific and could perhaps be handled in a
> > > special way.
> > > 
> > > Thanks,
> > > 
> > > C.
> > 
> > 
> > Thanks for reporting this Cédric. Akihiko what's your
> > plan to handle this? Do you have the time to address this issue?
> 
> Creating VFs at initialization time only makes problems apparent early. Even
> without this change, hot-plugging another PCI device after realizing a VF
> results in a similar situation.
> 
> A proper way to handle this is to add new properties to igb and nvme to let
> libvirt specify the VF ids. However I wonder if it is a worthwhile addition
> (i.e., if igb and nvme's SR-IOV emulation will be used with s390x and
> libvirt).
> 
> Regards,
> Akihiko Odaki


Well okay but libvirt will not update overnight.
If we need time to discuss the design, I can revert for the
release and reapply after we have a fix.

-- 
MST



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-07-20 19:12 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-27  6:07 [PATCH v10 00/12] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 01/12] hw/pci: Rename has_power to enabled Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 02/12] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 03/12] hw/ppc/spapr_pci: Do not reject VFs created after a PF Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 04/12] pcie_sriov: Do not manually unrealize Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 05/12] pcie_sriov: Ensure VF function number does not overflow Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 06/12] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
2024-07-10  6:37   ` Cédric Le Goater
2024-07-10 10:52     ` Michael S. Tsirkin
2024-07-13 12:45       ` Akihiko Odaki
2024-07-20 19:11         ` Michael S. Tsirkin
2024-06-27  6:07 ` [PATCH v10 07/12] pcie_sriov: Release VFs failed to realize Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 08/12] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 09/12] pcie_sriov: Register VFs after migration Akihiko Odaki
2024-06-27  6:07 ` [PATCH v10 10/12] hw/pci: Replace -1 with UINT32_MAX for romsize Akihiko Odaki
2024-06-27  6:08 ` [PATCH v10 11/12] hw/pci: Convert rom_bar into OnOffAuto Akihiko Odaki
2024-07-02 13:58   ` Michael S. Tsirkin
2024-07-03  2:15     ` BALATON Zoltan
2024-07-03  6:00       ` Michael S. Tsirkin
2024-07-03 11:00         ` BALATON Zoltan
2024-07-03 13:35           ` Michael S. Tsirkin
2024-07-03 14:56           ` Alex Williamson
2024-06-27  6:08 ` [PATCH v10 12/12] hw/qdev: Remove opts member Akihiko Odaki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).