[PULL 00/41] virtio,pc,pci: features, fixes, cleanups

* [PULL 00/41] virtio,pc,pci: features, fixes, cleanups
@ 2025-02-21 12:22 Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 01/41] docs/about: Change notes on x86 machine type deprecation into a general one Michael S. Tsirkin
                   ` (41 more replies)
  0 siblings, 42 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel; +Cc: Peter Maydell

The following changes since commit 4d5d933bbc7cc52f6cc6b9021f91fa06266222d5:

  Merge tag 'pull-xenfv-20250116' of git://git.infradead.org/users/dwmw2/qemu into staging (2025-01-16 09:03:43 -0500)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to dd6d545e8f2d9a0e8a8c287ec16469f03ef5c198:

  docs/devel/reset: Document reset expectations for DMA and IOMMU (2025-02-21 07:21:25 -0500)

----------------------------------------------------------------
virtio,pc,pci: features, fixes, cleanups

Features:

SR-IOV emulation for pci
virtio-mem-pci support for s390
interleave support for cxl
big endian support for vdpa svq
new QAPI events for vhost-user

Also vIOMMU reset order fixups are in.
Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Akihiko Odaki (12):
      hw/net: Fix NULL dereference with software RSS
      hw/ppc/spapr_pci: Do not create DT for disabled PCI device
      hw/ppc/spapr_pci: Do not reject VFs created after a PF
      s390x/pci: Avoid creating zpci for VFs
      s390x/pci: Allow plugging SR-IOV devices
      s390x/pci: Check for multifunction after device realization
      pcie_sriov: Do not manually unrealize
      pcie_sriov: Ensure VF addr does not overflow
      pcie_sriov: Reuse SR-IOV VF device instances
      pcie_sriov: Release VFs failed to realize
      pcie_sriov: Remove num_vfs from PCIESriovPF
      pcie_sriov: Register VFs after migration

Alexander Graf (1):
      hw/virtio/virtio-nsm: Respond with correct length

Daniel P. Berrangé (1):
      hw/virtio: reset virtio balloon stats on machine reset

Eric Auger (5):
      hw/virtio/virtio-iommu: Migrate to 3-phase reset
      hw/i386/intel-iommu: Migrate to 3-phase reset
      hw/arm/smmuv3: Move reset to exit phase
      hw/vfio/common: Add a trace point in vfio_reset_handler
      docs/devel/reset: Document reset expectations for DMA and IOMMU

Jonah Palmer (3):
      vhost-iova-tree: Implement an IOVA-only tree
      vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
      vhost-iova-tree: Update documentation

Konstantin Shkolnyy (1):
      vdpa: Fix endian bugs in shadow virtqueue

Laurent Vivier (1):
      net: vhost-user: add QAPI events to report connection state

Li Zhijian (4):
      hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration
      hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call
      hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
      hw/mem/cxl_type3: Ensure errp is set on realization failure

Matias Ezequiel Vara Larsen (1):
      vhost-user-snd: correct the calculation of config_size

Nicholas Piggin (3):
      qtest/libqos/pci: Do not write to PBA memory
      hw/pci/msix: Warn on PBA writes
      hw/pci: Assert a bar is not registered multiple times

Sairaj Kodilkar (2):
      amd_iommu: Use correct DTE field for interrupt passthrough
      amd_iommu: Use correct bitmask to set capability BAR

Stefano Garzarella (2):
      cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
      MAINTAINERS: add more files to `vhost`

Thomas Huth (4):
      docs/about: Change notes on x86 machine type deprecation into a general one
      hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines
      hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine
      tests/qtest/vhost-user-test: Use modern virtio for vhost-user tests

Yao Xingtao (1):
      mem/cxl_type3: support 3, 6, 12 and 16 interleave ways

 docs/pcie_sriov.txt                     |   8 +-
 qapi/net.json                           |  40 ++++++++
 hw/i386/amd_iommu.h                     |   2 +-
 hw/virtio/vhost-iova-tree.h             |   8 +-
 hw/virtio/vhost-shadow-virtqueue.h      |   5 +-
 include/hw/cxl/cxl_device.h             |   4 +-
 include/hw/pci/pcie_sriov.h             |   9 +-
 include/hw/virtio/virtio-balloon.h      |   4 +
 include/qemu/iova-tree.h                |  22 +++++
 backends/cryptodev-vhost.c              |   2 +-
 hw/arm/smmu-common.c                    |   9 +-
 hw/arm/smmuv3.c                         |  14 ++-
 hw/cxl/cxl-component-utils.c            |   9 +-
 hw/cxl/cxl-device-utils.c               |  12 +--
 hw/cxl/switch-mailbox-cci.c             |   4 +-
 hw/i386/amd_iommu.c                     |  10 +-
 hw/i386/intel_iommu.c                   |  12 ++-
 hw/i386/microvm.c                       |  66 ++++++-------
 hw/i386/pc.c                            |   6 +-
 hw/mem/cxl_type3.c                      |  45 ++++++---
 hw/net/igb.c                            |  10 +-
 hw/net/virtio-net.c                     |  45 ++++-----
 hw/nvme/ctrl.c                          |  22 +++--
 hw/pci/msix.c                           |   9 ++
 hw/pci/pci.c                            |  22 ++++-
 hw/pci/pcie_sriov.c                     | 159 +++++++++++++++++--------------
 hw/ppc/spapr_pci.c                      |  20 +++-
 hw/s390x/s390-pci-bus.c                 |  42 +++++++--
 hw/vfio/common.c                        |   1 +
 hw/virtio/vhost-iova-tree.c             | 115 +++++++++++++++++++----
 hw/virtio/vhost-shadow-virtqueue.c      |  73 ++++++++++-----
 hw/virtio/vhost-user-snd.c              |  18 +++-
 hw/virtio/vhost-vdpa.c                  |  40 +++++---
 hw/virtio/virtio-balloon.c              |  30 +++++-
 hw/virtio/virtio-iommu.c                |  14 ++-
 hw/virtio/virtio-nsm.c                  |   2 +-
 net/vhost-user.c                        |   3 +
 net/vhost-vdpa.c                        |  12 ++-
 tests/qtest/libqos/pci.c                |   2 -
 tests/qtest/vhost-user-test.c           |   3 +-
 util/iova-tree.c                        |  46 +++++++++
 MAINTAINERS                             |  11 ++-
 docs/about/deprecated.rst               |   7 --
 docs/about/removed-features.rst         |  11 +--
 docs/devel/reset.rst                    |   5 +
 hw/arm/trace-events                     |   1 +
 hw/i386/trace-events                    |   1 +
 hw/pci/trace-events                     |   2 +-
 hw/vfio/trace-events                    |   1 +
 hw/virtio/trace-events                  |   2 +-
 tests/functional/meson.build            |   2 +
 tests/functional/test_virtio_balloon.py | 161 ++++++++++++++++++++++++++++++++
 52 files changed, 882 insertions(+), 301 deletions(-)
 create mode 100755 tests/functional/test_virtio_balloon.py




* [PULL 01/41] docs/about: Change notes on x86 machine type deprecation into a general one
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 02/41] hw/net: Fix NULL dereference with software RSS Michael S. Tsirkin
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel; +Cc: Peter Maydell, Thomas Huth, Zhao Liu, devel

From: Thomas Huth <thuth@redhat.com>

We now have a general note about versioned machine types getting
deprecated and removed at the beginning of the deprecated.rst file,
so we should also have a general note about this in removed-features.rst
(which will also apply to versioned non-x86 machine types) instead of
listing individual old machine types in the document.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250116064644.65670-1-thuth@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/about/deprecated.rst       |  7 -------
 docs/about/removed-features.rst | 11 +++++------
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 4a3c302962..7b42d6eecc 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -236,13 +236,6 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
-``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-These old machine types are quite neglected nowadays and thus might have
-various pitfalls with regards to live migration. Use a newer machine type
-instead.
-
 PPC 405 ``ref405ep`` machine (since 9.1)
 ''''''''''''''''''''''''''''''''''''''''
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c6616ce05e..156c0c253c 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -972,6 +972,11 @@ from Linux in 2021, and is not supported anymore by QEMU either.
 System emulator machines
 ------------------------
 
+Note: Versioned machine types that have been introduced in a QEMU version
+that has initially been released more than 6 years before are considered
+obsolete and will be removed without further notice in this document.
+Please use newer machine types instead.
+
 ``s390-virtio`` (removed in 2.6)
 ''''''''''''''''''''''''''''''''
 
@@ -1006,12 +1011,6 @@ mips ``fulong2e`` machine alias (removed in 6.0)
 
 This machine has been renamed ``fuloong2e``.
 
-``pc-0.10`` up to ``pc-i440fx-2.3`` (removed in 4.0 up to 9.0)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-These machine types were very old and likely could not be used for live
-migration from old QEMU versions anymore. Use a newer machine type instead.
-
 Raspberry Pi ``raspi2`` and ``raspi3`` machines (removed in 6.2)
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
 
-- 
MST




* [PULL 02/41] hw/net: Fix NULL dereference with software RSS
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 01/41] docs/about: Change notes on x86 machine type deprecation into a general one Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-27  9:51   ` Michael Tokarev
  2025-02-21 12:22 ` [PULL 03/41] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Michael S. Tsirkin
                   ` (39 subsequent siblings)
  41 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel; +Cc: Peter Maydell, Akihiko Odaki, Jason Wang

From: Akihiko Odaki <akihiko.odaki@daynix.com>

When an eBPF program cannot be attached, virtio_net_load_ebpf() returns
false without setting an error. virtio_net_device_realize() then enters
its error-handling path and dereferences the NULL error.

Change virtio_net_load_ebpf() to return false only when a fatal error
occurred.

Fixes: b5900dff14e5 ("hw/net: report errors from failing to use eBPF RSS FDs")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-software-v1-1-9e5161b534d8@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/net/virtio-net.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)
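
The return convention described in the commit message above can be sketched
in isolation. This is a minimal stand-alone illustration, not QEMU code:
the `Error` struct, `set_error()` and `load_ebpf_sketch()` are hypothetical
stand-ins, and the three boolean parameters are assumptions that replace the
real attach and load calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error object; not the real type. */
typedef struct Error {
    const char *msg;
} Error;

static Error load_failed = { "cannot load user-supplied RSS FDs" };

/* Hypothetical helper: record an error if the caller asked for one. */
static void set_error(Error **errp, Error *err)
{
    if (errp) {
        *errp = err;
    }
}

/*
 * Returns false only for a fatal failure, and always sets *errp in that
 * case. Non-fatal conditions (nothing to attach to, or a best-effort
 * load that fails) return true and leave *errp untouched.
 */
bool load_ebpf_sketch(bool can_attach, bool user_gave_fds,
                      bool load_ok, Error **errp)
{
    if (!can_attach) {
        return true;        /* not an error: software RSS is used instead */
    }
    if (user_gave_fds) {
        if (!load_ok) {
            set_error(errp, &load_failed);
            return false;   /* fatal: an explicit request could not be met */
        }
        return true;
    }
    /* Best effort: a failed load is only worth a warning. */
    return true;
}
```

With this shape a caller can propagate `errp` unconditionally, because a
false return always comes with an error object attached.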

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 85e14b788c..d64941bf8e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1352,18 +1352,25 @@ exit:
 
 static bool virtio_net_load_ebpf(VirtIONet *n, Error **errp)
 {
-    bool ret = false;
-
-    if (virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
-        trace_virtio_net_rss_load(n, n->nr_ebpf_rss_fds, n->ebpf_rss_fds);
-        if (n->ebpf_rss_fds) {
-            ret = virtio_net_load_ebpf_fds(n, errp);
-        } else {
-            ret = ebpf_rss_load(&n->ebpf_rss, errp);
-        }
+    if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+        return true;
     }
 
-    return ret;
+    trace_virtio_net_rss_load(n, n->nr_ebpf_rss_fds, n->ebpf_rss_fds);
+
+    /*
+     * If user explicitly gave QEMU RSS FDs to use, then
+     * failing to use them must be considered a fatal
+     * error. If no RSS FDs were provided, QEMU is trying
+     * eBPF on a "best effort" basis only, so report a
+     * warning and allow fallback to software RSS.
+     */
+    if (n->ebpf_rss_fds) {
+        return virtio_net_load_ebpf_fds(n, errp);
+    }
+
+    ebpf_rss_load(&n->ebpf_rss, &error_warn);
+    return true;
 }
 
 static void virtio_net_unload_ebpf(VirtIONet *n)
@@ -3913,23 +3920,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     net_rx_pkt_init(&n->rx_pkt);
 
     if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
-        Error *err = NULL;
-        if (!virtio_net_load_ebpf(n, &err)) {
-            /*
-             * If user explicitly gave QEMU RSS FDs to use, then
-             * failing to use them must be considered a fatal
-             * error. If no RSS FDs were provided, QEMU is trying
-             * eBPF on a "best effort" basis only, so report a
-             * warning and allow fallback to software RSS.
-             */
-            if (n->ebpf_rss_fds) {
-                error_propagate(errp, err);
-            } else {
-                warn_report("unable to load eBPF RSS: %s",
-                            error_get_pretty(err));
-                error_free(err);
-            }
-        }
+        virtio_net_load_ebpf(n, errp);
     }
 }
 
-- 
MST




* Re: [PULL 02/41] hw/net: Fix NULL dereference with software RSS
  2025-02-21 12:22 ` [PULL 02/41] hw/net: Fix NULL dereference with software RSS Michael S. Tsirkin
@ 2025-02-27  9:51   ` Michael Tokarev
  0 siblings, 0 replies; 44+ messages in thread
From: Michael Tokarev @ 2025-02-27  9:51 UTC
  To: Michael S. Tsirkin, qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Jason Wang, qemu-stable

21.02.2025 15:22, Michael S. Tsirkin wrote:
> From: Akihiko Odaki <akihiko.odaki@daynix.com>
> 
> When an eBPF program cannot be attached, virtio_net_load_ebpf() returns
> false, and virtio_net_device_realize() enters the code path to handle
> errors because of this, but it causes NULL dereference because no error
> is generated.
> 
> Change virtio_net_load_ebpf() to return false only when a fatal error
> occurred.
> 
> Fixes: b5900dff14e5 ("hw/net: report errors from failing to use eBPF RSS FDs")
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Message-Id: <20250116-software-v1-1-9e5161b534d8@daynix.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

This looks like qemu-stable material (9.2), though a bit large.
Please let me know if it is not.

Thanks,

/mjt



* [PULL 03/41] hw/ppc/spapr_pci: Do not create DT for disabled PCI device
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 01/41] docs/about: Change notes on x86 machine type deprecation into a general one Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 02/41] hw/net: Fix NULL dereference with software RSS Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 04/41] hw/ppc/spapr_pci: Do not reject VFs created after a PF Michael S. Tsirkin
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Shivaprasad G Bhat, Nicholas Piggin,
	Daniel Henrique Barboza, Harsh Prateek Bora, qemu-ppc

From: Akihiko Odaki <akihiko.odaki@daynix.com>

A disabled PCI device is a disabled SR-IOV VF that is hidden from the guest.
Do not create a DT node for it when starting the system, and do not link it
to a DRC, which would generate a DT node in case of hotplug.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <20250116-reuse-v20-1-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/ppc/spapr_pci.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 904227d9aa..b94e4ba131 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1283,8 +1283,7 @@ static void spapr_dt_pci_device_cb(PCIBus *bus, PCIDevice *pdev,
     PciWalkFdt *p = opaque;
     int err;
 
-    if (p->err) {
-        /* Something's already broken, don't keep going */
+    if (p->err || !pdev->enabled) {
         return;
     }
 
@@ -1572,6 +1571,14 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
     SpaprDrc *drc = drc_from_dev(phb, pdev);
     uint32_t slotnr = PCI_SLOT(pdev->devfn);
 
+    /*
+     * If DR or the PCI device is disabled we don't need to do anything
+     * in the case of hotplug or coldplug callbacks.
+     */
+    if (!pdev->enabled) {
+        return;
+    }
+
     g_assert(drc);
 
     if (IS_PCI_BRIDGE(plugged_dev)) {
@@ -1647,6 +1654,11 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
     SpaprDrc *drc = drc_from_dev(phb, pdev);
 
     g_assert(drc);
+
+    if (!drc->dev) {
+        return;
+    }
+
     g_assert(drc->dev == plugged_dev);
 
     if (!spapr_drc_unplug_requested(drc)) {
-- 
MST




* [PULL 04/41] hw/ppc/spapr_pci: Do not reject VFs created after a PF
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 03/41] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 05/41] s390x/pci: Avoid creating zpci for VFs Michael S. Tsirkin
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Shivaprasad G Bhat, Nicholas Piggin,
	Daniel Henrique Barboza, Harsh Prateek Bora, qemu-ppc

From: Akihiko Odaki <akihiko.odaki@daynix.com>

A PF may automatically create VFs and the PF may be function 0.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Message-Id: <20250116-reuse-v20-2-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/ppc/spapr_pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
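
The pre-plug condition changed by this patch can be read as a single
predicate. The sketch below restates it outside QEMU: `SKETCH_DEVFN`,
`SKETCH_FUNC` and `reject_hotplug_sketch()` are hypothetical names, the
macros assume the usual devfn layout (slot in bits 7:3, function in bits
2:0), and the boolean parameters stand in for the device and bus lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed devfn layout: slot in bits 7:3, function in bits 2:0. */
#define SKETCH_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SKETCH_FUNC(devfn)       ((devfn) & 0x07)

/*
 * True when a hotplugged, non-zero function must be rejected because
 * function 0 of its slot is already present. VFs are exempt: a PF may
 * create them automatically, and that PF may itself be function 0.
 */
bool reject_hotplug_sketch(bool hotplugged, bool is_vf,
                           bool func0_present, uint8_t devfn)
{
    return hotplugged &&
           !is_vf &&
           func0_present &&
           SKETCH_FUNC(devfn) != 0;
}
```

Only the `is_vf` term is new relative to the old condition; the other three
terms correspond to the checks that were already there.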

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index b94e4ba131..e0a9d50edc 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1549,7 +1549,9 @@ static void spapr_pci_pre_plug(HotplugHandler *plug_handler,
      * hotplug, we do not allow functions to be hotplugged to a
      * slot that already has function 0 present
      */
-    if (plugged_dev->hotplugged && bus->devices[PCI_DEVFN(slotnr, 0)] &&
+    if (plugged_dev->hotplugged &&
+        !pci_is_vf(pdev) &&
+        bus->devices[PCI_DEVFN(slotnr, 0)] &&
         PCI_FUNC(pdev->devfn) != 0) {
         error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
                    " additional functions can no longer be exposed to guest.",
-- 
MST




* [PULL 05/41] s390x/pci: Avoid creating zpci for VFs
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 04/41] hw/ppc/spapr_pci: Do not reject VFs created after a PF Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 06/41] s390x/pci: Allow plugging SR-IOV devices Michael S. Tsirkin
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Matthew Rosato, Eric Farman,
	Thomas Huth, Halil Pasic, Christian Borntraeger,
	Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
	qemu-s390x

From: Akihiko Odaki <akihiko.odaki@daynix.com>

VFs are automatically created by PF, and creating zpci for them will
result in unexpected usage of fids. Currently QEMU does not support
multifunction for s390x so we don't need zpci for VFs anyway.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-3-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index eead269cc2..8c5eb69f7d 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1080,6 +1080,16 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 
         pbdev = s390_pci_find_dev_by_target(s, dev->id);
         if (!pbdev) {
+            /*
+             * VFs are automatically created by PF, and creating zpci for them
+             * will result in unexpected usage of fids. Currently QEMU does not
+             * support multifunction for s390x so we don't need zpci for VFs
+             * anyway.
+             */
+            if (pci_is_vf(pdev)) {
+                return;
+            }
+
             pbdev = s390_pci_device_new(s, dev->id, errp);
             if (!pbdev) {
                 return;
@@ -1167,7 +1177,10 @@ static void s390_pcihost_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
         int32_t devfn;
 
         pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
-        g_assert(pbdev);
+        if (!pbdev) {
+            g_assert(pci_is_vf(pci_dev));
+            return;
+        }
 
         s390_pci_generate_plug_event(HP_EVENT_STANDBY_TO_RESERVED,
                                      pbdev->fh, pbdev->fid);
@@ -1206,7 +1219,11 @@ static void s390_pcihost_unplug_request(HotplugHandler *hotplug_dev,
          * we've checked the PCI device already (to prevent endless recursion).
          */
         pbdev = s390_pci_find_dev_by_pci(s, PCI_DEVICE(dev));
-        g_assert(pbdev);
+        if (!pbdev) {
+            g_assert(pci_is_vf(PCI_DEVICE(dev)));
+            return;
+        }
+
         pbdev->pci_unplug_request_processed = true;
         qdev_unplug(DEVICE(pbdev), errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
-- 
MST




* [PULL 06/41] s390x/pci: Allow plugging SR-IOV devices
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 05/41] s390x/pci: Avoid creating zpci for VFs Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 07/41] s390x/pci: Check for multifunction after device realization Michael S. Tsirkin
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Matthew Rosato, Eric Farman,
	Thomas Huth, Richard Henderson, David Hildenbrand,
	Ilya Leoshkevich, Halil Pasic, Christian Borntraeger, qemu-s390x

From: Akihiko Odaki <akihiko.odaki@daynix.com>

The guest cannot use VFs due to the lack of multifunction support but
can use PFs.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-4-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index 8c5eb69f7d..c396d55c72 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -974,7 +974,14 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         PCIDevice *pdev = PCI_DEVICE(dev);
 
-        if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+        /*
+         * Multifunction is not supported due to the lack of CLP. However,
+         * do not check for multifunction capability for SR-IOV devices because
+         * SR-IOV devices automatically add the multifunction capability whether
+         * the user intends to use the functions other than the PF.
+         */
+        if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION &&
+            !pdev->exp.sriov_cap) {
             error_setg(errp, "multifunction not supported in s390");
             return;
         }
-- 
MST




* [PULL 07/41] s390x/pci: Check for multifunction after device realization
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 06/41] s390x/pci: Allow plugging SR-IOV devices Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:22 ` [PULL 08/41] pcie_sriov: Do not manually unrealize Michael S. Tsirkin
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Matthew Rosato, Eric Farman,
	Thomas Huth, Richard Henderson, David Hildenbrand,
	Ilya Leoshkevich, Halil Pasic, Christian Borntraeger, qemu-s390x

From: Akihiko Odaki <akihiko.odaki@daynix.com>

SR-IOV PFs set the multifunction bit during device realization, so check
for multifunction after realization. There is no functional change because
we explicitly ignore the multifunction bit for SR-IOV devices.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-5-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index c396d55c72..913d72cc74 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -971,21 +971,7 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                     "this device");
     }
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
-        PCIDevice *pdev = PCI_DEVICE(dev);
-
-        /*
-         * Multifunction is not supported due to the lack of CLP. However,
-         * do not check for multifunction capability for SR-IOV devices because
-         * SR-IOV devices automatically add the multifunction capability whether
-         * the user intends to use the functions other than the PF.
-         */
-        if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION &&
-            !pdev->exp.sriov_cap) {
-            error_setg(errp, "multifunction not supported in s390");
-            return;
-        }
-    } else if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_S390_PCI_DEVICE)) {
         S390PCIBusDevice *pbdev = S390_PCI_DEVICE(dev);
 
         if (!s390_pci_alloc_idx(s, pbdev)) {
@@ -1076,6 +1062,18 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         pdev = PCI_DEVICE(dev);
 
+        /*
+         * Multifunction is not supported due to the lack of CLP. However,
+         * do not check for multifunction capability for SR-IOV devices because
+         * SR-IOV devices automatically add the multifunction capability whether
+         * the user intends to use the functions other than the PF.
+         */
+        if (pdev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION &&
+            !pdev->exp.sriov_cap) {
+            error_setg(errp, "multifunction not supported in s390");
+            return;
+        }
+
         if (!dev->id) {
             /* In the case the PCI device does not define an id */
             /* we generate one based on the PCI address         */
-- 
MST




* [PULL 08/41] pcie_sriov: Do not manually unrealize
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 07/41] s390x/pci: Check for multifunction after device realization Michael S. Tsirkin
@ 2025-02-21 12:22 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 09/41] pcie_sriov: Ensure VF addr does not overflow Michael S. Tsirkin
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:22 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Philippe Mathieu-Daudé,
	Marcel Apfelbaum

From: Akihiko Odaki <akihiko.odaki@daynix.com>

A device gets automatically unrealized when being unparented.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-6-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 hw/pci/pcie_sriov.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index e9b23221d7..499becd527 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -204,11 +204,7 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        Error *err = NULL;
         PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
-            error_reportf_err(err, "Failed to unplug: ");
-        }
         object_unparent(OBJECT(vf));
         object_unref(OBJECT(vf));
     }
-- 
MST




* [PULL 09/41] pcie_sriov: Ensure VF addr does not overflow
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2025-02-21 12:22 ` [PULL 08/41] pcie_sriov: Do not manually unrealize Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 10/41] pcie_sriov: Reuse SR-IOV VF device instances Michael S. Tsirkin
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC
  To: qemu-devel
  Cc: Peter Maydell, Akihiko Odaki, Marcel Apfelbaum, Sriram Yagnaraman,
	Jason Wang, Keith Busch, Klaus Jensen, Jesper Devantier,
	qemu-block

From: Akihiko Odaki <akihiko.odaki@daynix.com>

pci_new() aborts when creating a VF with addr >= PCI_DEVFN_MAX.
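
The bound being enforced can be sketched in isolation as follows. This is
an illustration only, not the QEMU code: the sketch_ names are hypothetical,
and the limit of 256 is an assumption standing in for PCI_DEVFN_MAX (32
slots of 8 functions each).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 256 devfn values per bus, standing in for PCI_DEVFN_MAX. */
#define SKETCH_PCI_DEVFN_MAX 256

/*
 * Return true when the address of every VF stays below the limit.
 * The last VF sits at pf_devfn + vf_offset + (total_vfs - 1) * vf_stride.
 * All operands are at most 16 bits wide, so the 32-bit sum cannot wrap.
 */
static bool sketch_vf_addrs_fit(uint8_t pf_devfn, uint16_t vf_offset,
                                uint16_t vf_stride, uint16_t total_vfs)
{
    uint32_t first;
    uint32_t last;

    if (total_vfs == 0) {
        return true; /* no VFs, nothing can overflow */
    }

    first = (uint32_t)pf_devfn + vf_offset;
    last = first + (uint32_t)(total_vfs - 1) * vf_stride;
    return last < SKETCH_PCI_DEVFN_MAX;
}
```

With an offset of 128 and a stride of 2, 64 VFs end at address 254 and fit,
while a 65th VF would land on 256 and be rejected.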

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-7-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/pcie_sriov.txt         |  8 +++++---
 include/hw/pci/pcie_sriov.h |  5 +++--
 hw/net/igb.c                | 10 +++++++---
 hw/nvme/ctrl.c              | 22 ++++++++++++++--------
 hw/pci/pcie_sriov.c         | 14 ++++++++++++--
 5 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfa..ab2142807f 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
       ...
 
       /* Add and initialize the SR/IOV capability */
-      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
-                       vf_devid, initial_vfs, total_vfs,
-                       fun_offset, stride);
+      if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+                              vf_devid, initial_vfs, total_vfs,
+                              fun_offset, stride, errp)) {
+         return;
+      }
 
       /* Set up individual VF BARs (parameters as for normal BARs) */
       pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 450cbef6c2..aa704e8f9d 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -27,10 +27,11 @@ typedef struct PCIESriovVF {
     uint16_t vf_number; /* Logical VF number of this function */
 } PCIESriovVF;
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride);
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp);
 void pcie_sriov_pf_exit(PCIDevice *dev);
 
 /* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 4d93ce629f..c965fc2fb6 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -446,9 +446,13 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     pcie_ari_init(pci_dev, 0x150);
 
-    pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
-        IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-        IGB_VF_OFFSET, IGB_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
+                            IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS,
+                            IGB_MAX_VF_FUNCTIONS, IGB_VF_OFFSET, IGB_VF_STRIDE,
+                            errp)) {
+        igb_cleanup_msix(s);
+        return;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
         PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 68903d1d70..8175751518 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8481,7 +8481,8 @@ out:
     return pow2ceil(bar_size);
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+                            Error **errp)
 {
     uint16_t vf_dev_id = n->params.use_intel_id ?
                          PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
@@ -8490,12 +8491,16 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
                                       le16_to_cpu(cap->vifrsm),
                                       NULL, NULL);
 
-    pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
-                       n->params.sriov_max_vfs, n->params.sriov_max_vfs,
-                       NVME_VF_OFFSET, NVME_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+                            n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+                            NVME_VF_OFFSET, NVME_VF_STRIDE, errp)) {
+        return false;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
                               PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+
+    return true;
 }
 
 static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
@@ -8620,6 +8625,11 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         return false;
     }
 
+    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
+        !nvme_init_sriov(n, pci_dev, 0x120, errp)) {
+        return false;
+    }
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     pcie_cap_deverr_init(pci_dev);
@@ -8649,10 +8659,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         nvme_init_pmr(n, pci_dev);
     }
 
-    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
-        nvme_init_sriov(n, pci_dev, 0x120);
-    }
-
     return true;
 }
 
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 499becd527..91c64c988e 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -24,14 +24,22 @@ static PCIDevice *register_vf(PCIDevice *pf, int devfn,
                               const char *name, uint16_t vf_num);
 static void unregister_vfs(PCIDevice *dev);
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride)
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp)
 {
+    int32_t devfn = dev->devfn + vf_offset;
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
+    if (total_vfs &&
+        (uint32_t)devfn + (uint32_t)(total_vfs - 1) * vf_stride >= PCI_DEVFN_MAX) {
+        error_setg(errp, "VF addr overflows");
+        return false;
+    }
+
     pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
@@ -69,6 +77,8 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+
+    return true;
 }
 
 void pcie_sriov_pf_exit(PCIDevice *dev)
-- 
MST




* [PULL 10/41] pcie_sriov: Reuse SR-IOV VF device instances
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 09/41] pcie_sriov: Ensure VF addr does not overflow Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 11/41] pcie_sriov: Release VFs failed to realize Michael S. Tsirkin
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Akihiko Odaki, Marcel Apfelbaum

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Disable SR-IOV VF devices by reusing the code that powers down PCI devices
instead of removing them when the guest requests that VFs be disabled. This
allows realizing the devices, and reporting VF realization errors, at PF
realization time.
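
The idea can be sketched with a toy model, assuming a fixed pool of VFs
that is created once and only toggled afterwards. All sketch_ names are
hypothetical and none of this is the QEMU API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a small fixed pool size chosen only for illustration. */
#define SKETCH_TOTAL_VFS 4

typedef struct {
    bool enabled;
} SketchVF;

typedef struct {
    /* All VFs exist for the lifetime of the PF; they are never freed here. */
    SketchVF vf[SKETCH_TOTAL_VFS];
} SketchPF;

/* Guest sets VF Enable: switch on the first num_vfs instances. */
static void sketch_enable_vfs(SketchPF *pf, uint16_t num_vfs)
{
    for (uint16_t i = 0; i < num_vfs && i < SKETCH_TOTAL_VFS; i++) {
        pf->vf[i].enabled = true;
    }
}

/* Guest clears VF Enable: switch every instance off, but keep it around. */
static void sketch_disable_vfs(SketchPF *pf)
{
    for (uint16_t i = 0; i < SKETCH_TOTAL_VFS; i++) {
        pf->vf[i].enabled = false;
    }
}

static uint16_t sketch_count_enabled(const SketchPF *pf)
{
    uint16_t n = 0;

    for (uint16_t i = 0; i < SKETCH_TOTAL_VFS; i++) {
        n += pf->vf[i].enabled ? 1 : 0;
    }
    return n;
}
```

Because nothing is allocated on the enable path in this model, the only
place a failure can surface is the one-time setup of the pool.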

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-8-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pci.c                | 14 +++++-
 hw/pci/pcie_sriov.c         | 94 +++++++++++++++----------------------
 3 files changed, 51 insertions(+), 58 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index aa704e8f9d..70649236c1 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
 typedef struct PCIESriovPF {
     uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
-    const char *vfname; /* Reference to the device type used for the VFs */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 } PCIESriovPF;
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2afa423925..3e29b30d55 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2963,7 +2963,17 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
 
 void pci_set_power(PCIDevice *d, bool state)
 {
-    pci_set_enabled(d, state);
+    /*
+     * Don't change the enabled state of VFs when powering on/off the device.
+     *
+     * When powering on, VFs must not be enabled immediately but they must
+     * wait until the guest configures SR-IOV.
+     * When powering off, their corresponding PFs will be reset and disable
+     * VFs.
+     */
+    if (!pci_is_vf(d)) {
+        pci_set_enabled(d, state);
+    }
 }
 
 void pci_set_enabled(PCIDevice *d, bool state)
@@ -2977,7 +2987,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
     memory_region_set_enabled(&d->bus_master_enable_region,
                               (pci_get_word(d->config + PCI_COMMAND)
                                & PCI_COMMAND_MASTER) && d->enabled);
-    if (!d->enabled) {
+    if (qdev_is_realized(&d->qdev)) {
         pci_device_reset(d);
     }
 }
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 91c64c988e..f1993bc553 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -20,9 +20,16 @@
 #include "qapi/error.h"
 #include "trace.h"
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn,
-                              const char *name, uint16_t vf_num);
-static void unregister_vfs(PCIDevice *dev);
+static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
+{
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
+        object_unparent(OBJECT(vf));
+        object_unref(OBJECT(vf));
+    }
+    g_free(dev->exp.sriov_pf.vf);
+    dev->exp.sriov_pf.vf = NULL;
+}
 
 bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
@@ -30,6 +37,7 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         uint16_t vf_offset, uint16_t vf_stride,
                         Error **errp)
 {
+    BusState *bus = qdev_get_parent_bus(&dev->qdev);
     int32_t devfn = dev->devfn + vf_offset;
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
@@ -44,7 +52,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
     dev->exp.sriov_pf.num_vfs = 0;
-    dev->exp.sriov_pf.vfname = g_strdup(vfname);
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -78,14 +85,34 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
 
+    dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
+
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = pci_new(devfn, vfname);
+        vf->exp.sriov_vf.pf = dev;
+        vf->exp.sriov_vf.vf_number = i;
+
+        if (!qdev_realize(&vf->qdev, bus, errp)) {
+            unparent_vfs(dev, i);
+            return false;
+        }
+
+        /* set vid/did according to sr/iov spec - they are not used */
+        pci_config_set_vendor_id(vf->config, 0xffff);
+        pci_config_set_device_id(vf->config, 0xffff);
+
+        dev->exp.sriov_pf.vf[i] = vf;
+        devfn += vf_stride;
+    }
+
     return true;
 }
 
 void pcie_sriov_pf_exit(PCIDevice *dev)
 {
-    unregister_vfs(dev);
-    g_free((char *)dev->exp.sriov_pf.vfname);
-    dev->exp.sriov_pf.vfname = NULL;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+
+    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
 }
 
 void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
@@ -151,38 +178,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
     }
 }
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
-                              uint16_t vf_num)
-{
-    PCIDevice *dev = pci_new(devfn, name);
-    dev->exp.sriov_vf.pf = pf;
-    dev->exp.sriov_vf.vf_number = vf_num;
-    PCIBus *bus = pci_get_bus(pf);
-    Error *local_err = NULL;
-
-    qdev_realize(&dev->qdev, &bus->qbus, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        return NULL;
-    }
-
-    /* set vid/did according to sr/iov spec - they are not used */
-    pci_config_set_vendor_id(dev->config, 0xffff);
-    pci_config_set_device_id(dev->config, 0xffff);
-
-    return dev;
-}
-
 static void register_vfs(PCIDevice *dev)
 {
     uint16_t num_vfs;
     uint16_t i;
     uint16_t sriov_cap = dev->exp.sriov_cap;
-    uint16_t vf_offset =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
-    uint16_t vf_stride =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-    int32_t devfn = dev->devfn + vf_offset;
 
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
@@ -190,18 +190,10 @@ static void register_vfs(PCIDevice *dev)
         return;
     }
 
-    dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
-
     trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
                              PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
-                                              dev->exp.sriov_pf.vfname, i);
-        if (!dev->exp.sriov_pf.vf[i]) {
-            num_vfs = i;
-            break;
-        }
-        devfn += vf_stride;
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
     dev->exp.sriov_pf.num_vfs = num_vfs;
 }
@@ -214,12 +206,8 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        object_unparent(OBJECT(vf));
-        object_unref(OBJECT(vf));
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    g_free(dev->exp.sriov_pf.vf);
-    dev->exp.sriov_pf.vf = NULL;
     dev->exp.sriov_pf.num_vfs = 0;
 }
 
@@ -241,14 +229,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              PCI_FUNC(dev->devfn), off, val, len);
 
     if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
-        if (dev->exp.sriov_pf.num_vfs) {
-            if (!(val & PCI_SRIOV_CTRL_VFE)) {
-                unregister_vfs(dev);
-            }
+        if (val & PCI_SRIOV_CTRL_VFE) {
+            register_vfs(dev);
         } else {
-            if (val & PCI_SRIOV_CTRL_VFE) {
-                register_vfs(dev);
-            }
+            unregister_vfs(dev);
         }
     }
 }
-- 
MST




* [PULL 11/41] pcie_sriov: Release VFs failed to realize
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 10/41] pcie_sriov: Reuse SR-IOV VF device instances Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 12/41] pcie_sriov: Remove num_vfs from PCIESriovPF Michael S. Tsirkin
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Akihiko Odaki, Marcel Apfelbaum

From: Akihiko Odaki <akihiko.odaki@daynix.com>

Release VFs that failed to realize, just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-9-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index f1993bc553..db087bb933 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -93,6 +93,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
         vf->exp.sriov_vf.vf_number = i;
 
         if (!qdev_realize(&vf->qdev, bus, errp)) {
+            object_unparent(OBJECT(vf));
+            object_unref(vf);
             unparent_vfs(dev, i);
             return false;
         }
-- 
MST




* [PULL 12/41] pcie_sriov: Remove num_vfs from PCIESriovPF
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 11/41] pcie_sriov: Release VFs failed to realize Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 13/41] pcie_sriov: Register VFs after migration Michael S. Tsirkin
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Akihiko Odaki, Marcel Apfelbaum

From: Akihiko Odaki <akihiko.odaki@daynix.com>

num_vfs is not migrated, so derive the number of enabled VFs from
PCI_SRIOV_CTRL_VFE and PCI_SRIOV_NUM_VF instead.
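
Deriving the count from config space can be sketched as below. This is a
standalone illustration, not the QEMU helper: the sketch_ names are
hypothetical, and the register offsets (control at 0x08, NumVFs at 0x10,
VF Enable as bit 0) are assumed to match the standard SR-IOV capability
layout.

```c
#include <stdint.h>

/* Assumed offsets within the SR-IOV extended capability. */
#define SKETCH_SRIOV_CTRL      0x08
#define SKETCH_SRIOV_CTRL_VFE  0x0001
#define SKETCH_SRIOV_NUM_VF    0x10

/* Read a little-endian 16-bit word, as PCI config space stores it. */
static uint16_t sketch_get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * cap points at the start of the SR-IOV capability.
 * NumVFs only counts while VF Enable is set; otherwise no VF is active.
 */
static uint16_t sketch_num_vfs(const uint8_t *cap)
{
    if (!(sketch_get_word(cap + SKETCH_SRIOV_CTRL) & SKETCH_SRIOV_CTRL_VFE)) {
        return 0;
    }
    return sketch_get_word(cap + SKETCH_SRIOV_NUM_VF);
}
```

Since config space is part of the migration stream, a value computed this
way is the same on the source and the destination.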

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-10-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pcie_sriov.c         | 38 ++++++++++++++++++++++++++-----------
 hw/pci/trace-events         |  2 +-
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 70649236c1..5148c5b77d 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -16,7 +16,6 @@
 #include "hw/pci/pci.h"
 
 typedef struct PCIESriovPF {
-    uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 } PCIESriovPF;
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index db087bb933..69609c112e 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -51,7 +51,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
-    dev->exp.sriov_pf.num_vfs = 0;
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -188,29 +187,28 @@ static void register_vfs(PCIDevice *dev)
 
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
-    if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
-        return;
-    }
 
     trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
                              PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
-    dev->exp.sriov_pf.num_vfs = num_vfs;
+
+    pci_set_word(dev->wmask + sriov_cap + PCI_SRIOV_NUM_VF, 0);
 }
 
 static void unregister_vfs(PCIDevice *dev)
 {
-    uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
     uint16_t i;
 
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
-                               PCI_FUNC(dev->devfn), num_vfs);
-    for (i = 0; i < num_vfs; i++) {
+                               PCI_FUNC(dev->devfn));
+    for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    dev->exp.sriov_pf.num_vfs = 0;
+
+    pci_set_word(dev->wmask + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0xffff);
 }
 
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -236,6 +234,17 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
         } else {
             unregister_vfs(dev);
         }
+    } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
+        uint8_t *cfg = dev->config + sriov_cap;
+        uint8_t *wmask = dev->wmask + sriov_cap;
+        uint16_t num_vfs = pci_get_word(cfg + PCI_SRIOV_NUM_VF);
+        uint16_t wmask_val = PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI;
+
+        if (num_vfs <= pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)) {
+            wmask_val |= PCI_SRIOV_CTRL_VFE;
+        }
+
+        pci_set_word(wmask + PCI_SRIOV_CTRL, wmask_val);
     }
 }
 
@@ -252,6 +261,8 @@ void pcie_sriov_pf_reset(PCIDevice *dev)
     unregister_vfs(dev);
 
     pci_set_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF, 0);
+    pci_set_word(dev->wmask + sriov_cap + PCI_SRIOV_CTRL,
+                 PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE | PCI_SRIOV_CTRL_ARI);
 
     /*
      * Default is to use 4K pages, software can modify it
@@ -298,7 +309,7 @@ PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 {
     assert(!pci_is_vf(dev));
-    if (n < dev->exp.sriov_pf.num_vfs) {
+    if (n < pcie_sriov_num_vfs(dev)) {
         return dev->exp.sriov_pf.vf[n];
     }
     return NULL;
@@ -306,5 +317,10 @@ PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 
 uint16_t pcie_sriov_num_vfs(PCIDevice *dev)
 {
-    return dev->exp.sriov_pf.num_vfs;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+    uint8_t *cfg = dev->config + sriov_cap;
+
+    return sriov_cap &&
+           (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) ?
+           pci_get_word(cfg + PCI_SRIOV_NUM_VF) : 0;
 }
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index 19643aa8c6..e98f575a9d 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -14,7 +14,7 @@ msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d mask
 
 # hw/pci/pcie_sriov.c
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
-sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
+sriov_unregister_vfs(const char *name, int slot, int function) "%s %02x:%x: Unregistering vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
 
 # pcie.c
-- 
MST




* [PULL 13/41] pcie_sriov: Register VFs after migration
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (11 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 12/41] pcie_sriov: Remove num_vfs from PCIESriovPF Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 14/41] qtest/libqos/pci: Do not write to PBA memory Michael S. Tsirkin
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Akihiko Odaki, Marcel Apfelbaum

From: Akihiko Odaki <akihiko.odaki@daynix.com>

pcie_sriov doesn't have code to restore its state after migration, but
igb, which uses pcie_sriov, naively claimed its migration capability.

Add code to register VFs after migration and fix igb migration.

Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20250116-reuse-v20-11-7cb370606368@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/pci/pcie_sriov.h | 2 ++
 hw/pci/pci.c                | 7 +++++++
 hw/pci/pcie_sriov.c         | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 5148c5b77d..c5d2d318d3 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -57,6 +57,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              uint32_t val, int len);
 
+void pcie_sriov_pf_post_load(PCIDevice *dev);
+
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3e29b30d55..69a1b8c298 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -803,10 +803,17 @@ static bool migrate_is_not_pcie(void *opaque, int version_id)
     return !pci_is_express((PCIDevice *)opaque);
 }
 
+static int pci_post_load(void *opaque, int version_id)
+{
+    pcie_sriov_pf_post_load(opaque);
+    return 0;
+}
+
 const VMStateDescription vmstate_pci_device = {
     .name = "PCIDevice",
     .version_id = 2,
     .minimum_version_id = 1,
+    .post_load = pci_post_load,
     .fields = (const VMStateField[]) {
         VMSTATE_INT32_POSITIVE_LE(version_id, PCIDevice),
         VMSTATE_BUFFER_UNSAFE_INFO_TEST(config, PCIDevice,
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 69609c112e..1eb4358256 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -248,6 +248,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
     }
 }
 
+void pcie_sriov_pf_post_load(PCIDevice *dev)
+{
+    if (dev->exp.sriov_cap) {
+        register_vfs(dev);
+    }
+}
+
 
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev)
-- 
MST




* [PULL 14/41] qtest/libqos/pci: Do not write to PBA memory
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (12 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 13/41] pcie_sriov: Register VFs after migration Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 15/41] hw/pci/msix: Warn on PBA writes Michael S. Tsirkin
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Nicholas Piggin, Akihiko Odaki, Fabiano Rosas,
	Laurent Vivier, Paolo Bonzini

From: Nicholas Piggin <npiggin@gmail.com>

The PCI Local Bus Specification says the result of writes to MSI-X
PBA memory is undefined. QEMU implements them as no-ops, so remove
the pointless write from qpci_msix_pending().
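
A read-only pending check can be sketched as below, assuming the PBA is
accessed as 32-bit words with one bit per vector. The sketch_ names are
hypothetical and this is not the libqos code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset, from the start of the PBA, of the 32-bit word for a vector. */
static uint32_t sketch_pba_word_offset(uint16_t entry)
{
    return (uint32_t)(entry / 32) * 4;
}

/* Test the pending bit in a word that was already read; nothing is written. */
static bool sketch_pba_bit_pending(uint32_t pba_word, uint16_t entry)
{
    return ((pba_word >> (entry % 32)) & 1u) != 0;
}
```

Vector 33, for example, lives in the word at byte offset 4 and is tracked
by bit 1 of that word.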

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20250117172244.406206-2-npiggin@gmail.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/libqos/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tests/qtest/libqos/pci.c b/tests/qtest/libqos/pci.c
index b23d72346b..a59197b992 100644
--- a/tests/qtest/libqos/pci.c
+++ b/tests/qtest/libqos/pci.c
@@ -328,8 +328,6 @@ bool qpci_msix_pending(QPCIDevice *dev, uint16_t entry)
 
     g_assert(dev->msix_enabled);
     pba_entry = qpci_io_readl(dev, dev->msix_pba_bar, dev->msix_pba_off + off);
-    qpci_io_writel(dev, dev->msix_pba_bar, dev->msix_pba_off + off,
-                   pba_entry & ~(1 << bit_n));
     return (pba_entry & (1 << bit_n)) != 0;
 }
 
-- 
MST




* [PULL 15/41] hw/pci/msix: Warn on PBA writes
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (13 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 14/41] qtest/libqos/pci: Do not write to PBA memory Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 16/41] hw/pci: Assert a bar is not registered multiple times Michael S. Tsirkin
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Nicholas Piggin, Marcel Apfelbaum, Dmitry Fleytman,
	Sriram Yagnaraman, Phil Dennis-Jordan,
	Philippe Mathieu-Daudé, Akihiko Odaki

From: Nicholas Piggin <npiggin@gmail.com>

Of the MSI-X PBA pending bits, the PCI Local Bus Specification says:

  Software should never write, and should only read
  Pending Bits. If software writes to Pending Bits, the
  result is undefined.

Log a GUEST_ERROR message if the PBA is written to by software.

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Cc: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20250117172842.406338-2-npiggin@gmail.com>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/msix.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 57ec7084a4..66f27b9d71 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -15,6 +15,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci.h"
@@ -260,6 +261,14 @@ static uint64_t msix_pba_mmio_read(void *opaque, hwaddr addr,
 static void msix_pba_mmio_write(void *opaque, hwaddr addr,
                                 uint64_t val, unsigned size)
 {
+    PCIDevice *dev = opaque;
+
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "PCI [%s:%02x:%02x.%x] attempt to write to MSI-X "
+                  "PBA at 0x%" FMT_PCIBUS ", ignoring.\n",
+                  pci_root_bus_path(dev), pci_dev_bus_num(dev),
+                  PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn),
+                  addr);
 }
 
 static const MemoryRegionOps msix_pba_mmio_ops = {
-- 
MST




* [PULL 16/41] hw/pci: Assert a bar is not registered multiple times
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (14 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 15/41] hw/pci/msix: Warn on PBA writes Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 17/41] hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines Michael S. Tsirkin
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Nicholas Piggin, Phil Dennis-Jordan, Akihiko Odaki,
	Marcel Apfelbaum

From: Nicholas Piggin <npiggin@gmail.com>

Nothing should be doing this, but it doesn't get caught by
pci_register_bar(). Add an assertion to prevent misuse.
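
The guard can be sketched with a toy region table, assuming (as in the
check added here) that a nonzero size marks a slot as already registered.
The sketch_ and Sketch names are hypothetical and the region count is
chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a small number of region slots per device. */
#define SKETCH_NUM_REGIONS 7

typedef struct {
    uint64_t size;
    uint8_t type;
} SketchRegion;

typedef struct {
    SketchRegion regions[SKETCH_NUM_REGIONS];
} SketchDevice;

/* Register a BAR once; a second call for the same slot trips the assert. */
static void sketch_register_bar(SketchDevice *d, int region_num,
                                uint8_t type, uint64_t size)
{
    SketchRegion *r;

    assert(region_num >= 0 && region_num < SKETCH_NUM_REGIONS);
    r = &d->regions[region_num];
    assert(!r->size); /* nonzero size: this slot was registered before */
    r->size = size;
    r->type = type;
}
```

A side effect of keying on the size is that a zero-sized registration is
not detected as a prior registration in this model.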

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20250117172842.406338-3-npiggin@gmail.com>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/pci/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 69a1b8c298..1d42847ef0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1398,6 +1398,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     assert(hdr_type != PCI_HEADER_TYPE_BRIDGE || region_num < 2);
 
     r = &pci_dev->io_regions[region_num];
+    assert(!r->size);
     r->addr = PCI_BAR_UNMAPPED;
     r->size = size;
     r->type = type;
-- 
MST




* [PULL 17/41] hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (15 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 16/41] hw/pci: Assert a bar is not registered multiple times Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 18/41] hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine Michael S. Tsirkin
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, Akihiko Odaki, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum

From: Thomas Huth <thuth@redhat.com>

QEMU currently crashes when you try to inspect the machines based on
TYPE_PC_MACHINE for their properties:

 $ echo '{ "execute": "qmp_capabilities" }
         { "execute": "qom-list-properties","arguments":
                      { "typename": "pc-q35-10.0-machine"}}' \
   | ./qemu-system-x86_64 -M pc -qmp stdio
 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9},
  "package": "v9.2.0-1070-g87e115c122-dirty"}, "capabilities": ["oob"]}}
 {"return": {}}
 Segmentation fault (core dumped)

This happens because TYPE_PC_MACHINE machines add a
machine_init_done notifier in their instance_init function. However,
instance_init is called not only for machines that are realized, but
also for machines that are merely introspected, so in this case the
listener is added for a q35 machine that is never realized. Since
there is already a running pc machine, the listener function is
triggered immediately and crashes, because it runs against a machine
it was never meant for.

Such listener functions must never be installed from an instance_init
function. Let's do it from pc_basic_device_init() instead: this
function is called from the MachineClass->init() function, i.e. it is
guaranteed to be called only once in the lifetime of a QEMU process.
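
The failure mode can be modelled with a toy notifier list. This is a
simplified sketch, not the QEMU implementation: it only assumes the
behaviour described above, namely that a listener added after machine
init has finished is triggered immediately. All example_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct ExampleNotifier ExampleNotifier;
struct ExampleNotifier {
    void (*notify)(ExampleNotifier *n);
    ExampleNotifier *next;
};

static ExampleNotifier *example_notifiers;
static bool example_machine_init_done;
static int example_calls;

/* Trivial callback that only counts how often it ran. */
static void example_count(ExampleNotifier *n)
{
    (void)n;
    example_calls++;
}

/* Queue the notifier; if init already finished, run it straight away. */
static void example_add_init_done_notifier(ExampleNotifier *n)
{
    n->next = example_notifiers;
    example_notifiers = n;
    if (example_machine_init_done) {
        n->notify(n);
    }
}

/* Mark init as finished and run every queued notifier. */
static void example_machine_init_finished(void)
{
    ExampleNotifier *n;

    example_machine_init_done = true;
    for (n = example_notifiers; n; n = n->next) {
        n->notify(n);
    }
}
```

In this model, an object created only for introspection after
example_machine_init_finished() has run still gets its callback invoked
as soon as it registers, which mirrors the crash path described above.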

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2779
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250117192106.471029-1-thuth@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/pc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b46975c8a4..85b8a76455 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1241,6 +1241,9 @@ void pc_basic_device_init(struct PCMachineState *pcms,
     /* Super I/O */
     pc_superio_init(isa_bus, create_fdctrl, pcms->i8042_enabled,
                     pcms->vmport != ON_OFF_AUTO_ON, &error_fatal);
+
+    pcms->machine_done.notify = pc_machine_done;
+    qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }
 
 void pc_nic_init(PCMachineClass *pcmc, ISABus *isa_bus, PCIBus *pci_bus)
@@ -1714,9 +1717,6 @@ static void pc_machine_initfn(Object *obj)
     if (pcmc->pci_enabled) {
         cxl_machine_init(obj, &pcms->cxl_devices_state);
     }
-
-    pcms->machine_done.notify = pc_machine_done;
-    qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }
 
 static void pc_machine_reset(MachineState *machine, ResetType type)
-- 
MST




* [PULL 18/41] hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (16 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 17/41] hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 19/41] tests/qtest/vhost-user-test: Use modern virtio for vhost-user tests Michael S. Tsirkin
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, Sergio Lopez, Akihiko Odaki,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	Marcel Apfelbaum

From: Thomas Huth <thuth@redhat.com>

QEMU currently crashes when you try to inspect the properties of the
microvm machine:

 $ echo '{ "execute": "qmp_capabilities" }
         { "execute": "qom-list-properties","arguments":
           { "typename": "microvm-machine"}}' | \
   ./qemu-system-x86_64 -qmp stdio
 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 9},
  "package": "v9.2.0-1072-g60af367187-dirty"}, "capabilities": ["oob"]}}
 {"return": {}}
 qemu-system-x86_64: ../qemu/hw/i386/acpi-microvm.c:250:
  void acpi_setup_microvm(MicrovmMachineState *):
   Assertion `x86ms->fw_cfg' failed.
 Aborted (core dumped)

This happens because the microvm machine adds a machine_done (and a
powerdown_req) notifier in its instance_init function. However,
instance_init is called not only for machines that are realized, but
also for machines that are merely introspected, so in this case the
listener is added for a microvm machine that is never realized. And
since there is already a running machine, the listener function is
triggered immediately and crashes, because it runs against a machine
it was never meant for.

Such listener functions must never be installed from an instance_init
function. Let's do it from microvm_machine_state_init() instead: this
function is the MachineClass->init() function, i.e. it is guaranteed
to be called only once in the lifetime of a QEMU process.

Since microvm_machine_done() and microvm_powerdown_req() were
defined quite late in the microvm.c file, we also have to move them
earlier, so that microvm_machine_state_init() can take their function
pointers without needing separate prototypes for those functions.

Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250123204708.1560305-1-thuth@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/microvm.c | 66 +++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index a8d354aabe..d0a236c74f 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -451,11 +451,44 @@ static HotplugHandler *microvm_get_hotplug_handler(MachineState *machine,
     return NULL;
 }
 
+static void microvm_machine_done(Notifier *notifier, void *data)
+{
+    MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
+                                            machine_done);
+    X86MachineState *x86ms = X86_MACHINE(mms);
+
+    acpi_setup_microvm(mms);
+    dt_setup_microvm(mms);
+    fw_cfg_add_e820(x86ms->fw_cfg);
+}
+
+static void microvm_powerdown_req(Notifier *notifier, void *data)
+{
+    MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
+                                            powerdown_req);
+    X86MachineState *x86ms = X86_MACHINE(mms);
+
+    if (x86ms->acpi_dev) {
+        Object *obj = OBJECT(x86ms->acpi_dev);
+        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
+        adevc->send_event(ACPI_DEVICE_IF(x86ms->acpi_dev),
+                          ACPI_POWER_DOWN_STATUS);
+    }
+}
+
 static void microvm_machine_state_init(MachineState *machine)
 {
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     X86MachineState *x86ms = X86_MACHINE(machine);
 
+    /* State */
+    mms->kernel_cmdline_fixed = false;
+
+    mms->machine_done.notify = microvm_machine_done;
+    qemu_add_machine_init_done_notifier(&mms->machine_done);
+    mms->powerdown_req.notify = microvm_powerdown_req;
+    qemu_register_powerdown_notifier(&mms->powerdown_req);
+
     microvm_memory_init(mms);
 
     x86_cpus_init(x86ms, CPU_VERSION_LATEST);
@@ -581,31 +614,6 @@ static void microvm_machine_set_auto_kernel_cmdline(Object *obj, bool value,
     mms->auto_kernel_cmdline = value;
 }
 
-static void microvm_machine_done(Notifier *notifier, void *data)
-{
-    MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
-                                            machine_done);
-    X86MachineState *x86ms = X86_MACHINE(mms);
-
-    acpi_setup_microvm(mms);
-    dt_setup_microvm(mms);
-    fw_cfg_add_e820(x86ms->fw_cfg);
-}
-
-static void microvm_powerdown_req(Notifier *notifier, void *data)
-{
-    MicrovmMachineState *mms = container_of(notifier, MicrovmMachineState,
-                                            powerdown_req);
-    X86MachineState *x86ms = X86_MACHINE(mms);
-
-    if (x86ms->acpi_dev) {
-        Object *obj = OBJECT(x86ms->acpi_dev);
-        AcpiDeviceIfClass *adevc = ACPI_DEVICE_IF_GET_CLASS(obj);
-        adevc->send_event(ACPI_DEVICE_IF(x86ms->acpi_dev),
-                          ACPI_POWER_DOWN_STATUS);
-    }
-}
-
 static void microvm_machine_initfn(Object *obj)
 {
     MicrovmMachineState *mms = MICROVM_MACHINE(obj);
@@ -617,14 +625,6 @@ static void microvm_machine_initfn(Object *obj)
     mms->isa_serial = true;
     mms->option_roms = true;
     mms->auto_kernel_cmdline = true;
-
-    /* State */
-    mms->kernel_cmdline_fixed = false;
-
-    mms->machine_done.notify = microvm_machine_done;
-    qemu_add_machine_init_done_notifier(&mms->machine_done);
-    mms->powerdown_req.notify = microvm_powerdown_req;
-    qemu_register_powerdown_notifier(&mms->powerdown_req);
 }
 
 GlobalProperty microvm_properties[] = {
-- 
MST




* [PULL 19/41] tests/qtest/vhost-user-test: Use modern virtio for vhost-user tests
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 18/41] hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 20/41] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration Michael S. Tsirkin
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Huth, Fabiano Rosas, Stefano Garzarella,
	Laurent Vivier, Paolo Bonzini

From: Thomas Huth <thuth@redhat.com>

All other vhost-user tests here use modern virtio, too, so let's
adjust the vhost-user-net test accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20250203124346.169607-1-thuth@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/vhost-user-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 76d142a158..bd977ef28d 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -1043,7 +1043,8 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 
 static uint64_t vu_net_get_features(TestServer *s)
 {
-    uint64_t features = 0x1ULL << VHOST_F_LOG_ALL |
+    uint64_t features = 0x1ULL << VIRTIO_F_VERSION_1 |
+        0x1ULL << VHOST_F_LOG_ALL |
         0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
 
     if (s->queues > 1) {
-- 
MST




* [PULL 20/41] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 19/41] tests/qtest/vhost-user-test: Use modern virtio for vhost-user tests Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 21/41] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call Michael S. Tsirkin
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Li Zhijian, Jonathan Cameron, Fan Ni

From: Li Zhijian <lizhijian@fujitsu.com>

Introduce the `CXL_T3_MSIX_VECTOR` enumeration to specify MSIX vector
assignments specific to the Type 3 (T3) CXL device.

The primary goal of this change is to encapsulate the MSIX vector uses
that are unique to the T3 device within an enumeration, improving code
readability and maintenance by avoiding magic numbers. This organizational
change allows for more explicit references to each vector’s role, thereby
reducing the potential for misconfiguration.

It also modifies `mailbox_reg_init_common` to accept an `msi_n` parameter,
reflecting the new MSIX vector setup.

The enumeration is private to the T3 device: other endpoints should not
use it, even where it is visible to them.
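
As a rough illustration of the pattern, the sketch below derives the
mailbox vector and the vector count from the enumeration itself. The
EXAMPLE_* names are hypothetical, and EXAMPLE_EVENT_TYPE_MAX is an
assumed value chosen for the example, not one taken from this patch.

```c
enum {
    EXAMPLE_EVENT_TYPE_MAX = 5      /* assumed number of event logs */
};

enum ExampleT3MsixVector {
    EXAMPLE_T3_MSIX_PCIE_DOE_TABLE_ACCESS = 0,
    EXAMPLE_T3_MSIX_EVENT_START = 2,
    /* The mailbox vector follows the block reserved for event logs. */
    EXAMPLE_T3_MSIX_MBOX = EXAMPLE_T3_MSIX_EVENT_START +
                           EXAMPLE_EVENT_TYPE_MAX,
    /* One past the last vector, so it doubles as the vector count. */
    EXAMPLE_T3_MSIX_VECTOR_NR
};
```

With this layout, adding an event log type shifts the mailbox vector and
the total count automatically, without touching any literal.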

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |  4 ++--
 hw/cxl/cxl-device-utils.c   | 12 +++++-------
 hw/cxl/switch-mailbox-cci.c |  4 +++-
 hw/mem/cxl_type3.c          | 20 ++++++++++++++------
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 561b375dc8..3a0ee7e8e7 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -264,8 +264,8 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *dev,
 typedef struct CXLType3Dev CXLType3Dev;
 typedef struct CSWMBCCIDev CSWMBCCIDev;
 /* Set up default values for the register block */
-void cxl_device_register_init_t3(CXLType3Dev *ct3d);
-void cxl_device_register_init_swcci(CSWMBCCIDev *sw);
+void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n);
+void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n);
 
 /*
  * CXL r3.1 Section 8.2.8.1: CXL Device Capabilities Array Register
diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 035d034f6d..52ad1e4c3f 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -352,10 +352,8 @@ static void device_reg_init_common(CXLDeviceState *cxl_dstate)
     }
 }
 
-static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
+static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate, int msi_n)
 {
-    const uint8_t msi_n = 9;
-
     /* 2048 payload size */
     ARRAY_FIELD_DP32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CAP,
                      PAYLOAD_SIZE, CXL_MAILBOX_PAYLOAD_SHIFT);
@@ -382,7 +380,7 @@ static void memdev_reg_init_common(CXLDeviceState *cxl_dstate)
     cxl_dstate->memdev_status = memdev_status_reg;
 }
 
-void cxl_device_register_init_t3(CXLType3Dev *ct3d)
+void cxl_device_register_init_t3(CXLType3Dev *ct3d, int msi_n)
 {
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
     uint64_t *cap_h = cxl_dstate->caps_reg_state64;
@@ -398,7 +396,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d)
     device_reg_init_common(cxl_dstate);
 
     cxl_device_cap_init(cxl_dstate, MAILBOX, 2, CXL_DEV_MAILBOX_VERSION);
-    mailbox_reg_init_common(cxl_dstate);
+    mailbox_reg_init_common(cxl_dstate, msi_n);
 
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000,
         CXL_MEM_DEV_STATUS_VERSION);
@@ -408,7 +406,7 @@ void cxl_device_register_init_t3(CXLType3Dev *ct3d)
                               CXL_MAILBOX_MAX_PAYLOAD_SIZE);
 }
 
-void cxl_device_register_init_swcci(CSWMBCCIDev *sw)
+void cxl_device_register_init_swcci(CSWMBCCIDev *sw, int msi_n)
 {
     CXLDeviceState *cxl_dstate = &sw->cxl_dstate;
     uint64_t *cap_h = cxl_dstate->caps_reg_state64;
@@ -423,7 +421,7 @@ void cxl_device_register_init_swcci(CSWMBCCIDev *sw)
     device_reg_init_common(cxl_dstate);
 
     cxl_device_cap_init(cxl_dstate, MAILBOX, 2, 1);
-    mailbox_reg_init_common(cxl_dstate);
+    mailbox_reg_init_common(cxl_dstate, msi_n);
 
     cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000, 1);
     memdev_reg_init_common(cxl_dstate);
diff --git a/hw/cxl/switch-mailbox-cci.c b/hw/cxl/switch-mailbox-cci.c
index 65cdac6cc1..833b824619 100644
--- a/hw/cxl/switch-mailbox-cci.c
+++ b/hw/cxl/switch-mailbox-cci.c
@@ -17,10 +17,12 @@
 #include "hw/qdev-properties.h"
 #include "hw/cxl/cxl.h"
 
+#define CXL_SWCCI_MSIX_MBOX 3
+
 static void cswmbcci_reset(DeviceState *dev)
 {
     CSWMBCCIDev *cswmb = CXL_SWITCH_MAILBOX_CCI(dev);
-    cxl_device_register_init_swcci(cswmb);
+    cxl_device_register_init_swcci(cswmb, CXL_SWCCI_MSIX_MBOX);
 }
 
 static void cswbcci_realize(PCIDevice *pci_dev, Error **errp)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0ae1704a34..ebc0ec536e 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,14 @@
 #include "hw/cxl/cxl.h"
 #include "hw/pci/msix.h"
 
+/* type3 device private */
+enum CXL_T3_MSIX_VECTOR {
+    CXL_T3_MSIX_PCIE_DOE_TABLE_ACCESS = 0,
+    CXL_T3_MSIX_EVENT_START = 2,
+    CXL_T3_MSIX_MBOX = CXL_T3_MSIX_EVENT_START + CXL_EVENT_TYPE_MAX,
+    CXL_T3_MSIX_VECTOR_NR
+};
+
 #define DWORD_BYTE 4
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
@@ -843,7 +851,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     ComponentRegisters *regs = &cxl_cstate->crb;
     MemoryRegion *mr = &regs->component_registers;
     uint8_t *pci_conf = pci_dev->config;
-    unsigned short msix_num = 10;
     int i, rc;
     uint16_t count;
 
@@ -884,16 +891,17 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
                      &ct3d->cxl_dstate.device_registers);
 
     /* MSI(-X) Initialization */
-    rc = msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+    rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL);
     if (rc) {
         goto err_address_space_free;
     }
-    for (i = 0; i < msix_num; i++) {
+    for (i = 0; i < CXL_T3_MSIX_VECTOR_NR; i++) {
         msix_vector_use(pci_dev, i);
     }
 
     /* DOE Initialization */
-    pcie_doe_init(pci_dev, &ct3d->doe_cdat, 0x190, doe_cdat_prot, true, 0);
+    pcie_doe_init(pci_dev, &ct3d->doe_cdat, 0x190, doe_cdat_prot, true,
+                  CXL_T3_MSIX_PCIE_DOE_TABLE_ACCESS);
 
     cxl_cstate->cdat.build_cdat_table = ct3_build_cdat_table;
     cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
@@ -908,7 +916,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     if (rc) {
         goto err_release_cdat;
     }
-    cxl_event_init(&ct3d->cxl_dstate, 2);
+    cxl_event_init(&ct3d->cxl_dstate, CXL_T3_MSIX_EVENT_START);
 
     /* Set default value for patrol scrub attributes */
     ct3d->patrol_scrub_attrs.scrub_cycle_cap =
@@ -1202,7 +1210,7 @@ static void ct3d_reset(DeviceState *dev)
 
     pcie_cap_fill_link_ep_usp(PCI_DEVICE(dev), ct3d->width, ct3d->speed);
     cxl_component_register_init_common(reg_state, write_msk, CXL2_TYPE3_DEVICE);
-    cxl_device_register_init_t3(ct3d);
+    cxl_device_register_init_t3(ct3d, CXL_T3_MSIX_MBOX);
 
     /*
      * Bring up an endpoint to target with MCTP over VDM.
-- 
MST




* [PULL 21/41] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 20/41] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 22/41] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure Michael S. Tsirkin
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Li Zhijian, Jonathan Cameron, Fan Ni

From: Li Zhijian <lizhijian@fujitsu.com>

msix_uninit_exclusive_bar() should be paired with msix_init_exclusive_bar().

Ensure proper resource cleanup by adding the missing
`msix_uninit_exclusive_bar()` call for the Type3 CXL device.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ebc0ec536e..4775aab0d6 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -944,6 +944,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 err_release_cdat:
     cxl_doe_cdat_release(cxl_cstate);
 err_free_special_ops:
+    msix_uninit_exclusive_bar(pci_dev);
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
@@ -967,6 +968,7 @@ static void ct3_exit(PCIDevice *pci_dev)
 
     pcie_aer_exit(pci_dev);
     cxl_doe_cdat_release(cxl_cstate);
+    msix_uninit_exclusive_bar(pci_dev);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
         cxl_destroy_dc_regions(ct3d);
-- 
MST




* [PULL 22/41] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (20 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 21/41] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 23/41] hw/mem/cxl_type3: Ensure errp is set on realization failure Michael S. Tsirkin
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Li Zhijian, Jonathan Cameron,
	Philippe Mathieu-Daudé, Fan Ni

From: Li Zhijian <lizhijian@fujitsu.com>

Address a memory leak issue by ensuring `regs->special_ops` is freed when
`msix_init_exclusive_bar()` encounters an error during CXL Type3 device
initialization.

Additionally, this patch introduces an err_msix_uninit label ahead of
err_free_special_ops and drops err_address_space_free, for better
clarity and logical flow.
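
The label ordering follows the usual goto-unwind idiom: each failure
jumps to the label that undoes only what has already been set up, and
the labels fall through in reverse order of initialization. A minimal
sketch, with a hypothetical ExampleDev standing in for the real device
state and a fail_step argument added purely to exercise each path:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    void *special_ops;   /* stands in for regs->special_ops */
    bool msix_ready;     /* stands in for the MSI-X exclusive BAR */
} ExampleDev;

/*
 * fail_step: 0 = succeed, 1 = MSI-X init fails, 2 = a later step fails.
 * Returns 0 on success, -1 on failure with everything released.
 */
static int example_realize(ExampleDev *d, int fail_step)
{
    d->special_ops = malloc(16);
    if (!d->special_ops) {
        return -1;
    }

    if (fail_step == 1) {
        goto err_free_special_ops;   /* MSI-X never came up: skip its teardown */
    }
    d->msix_ready = true;

    if (fail_step == 2) {
        goto err_msix_uninit;        /* undo MSI-X, then fall through */
    }
    return 0;

err_msix_uninit:
    d->msix_ready = false;
err_free_special_ops:
    free(d->special_ops);
    d->special_ops = NULL;
    return -1;
}
```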

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 4775aab0d6..ff6861889b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -893,7 +893,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     /* MSI(-X) Initialization */
     rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL);
     if (rc) {
-        goto err_address_space_free;
+        goto err_free_special_ops;
     }
     for (i = 0; i < CXL_T3_MSIX_VECTOR_NR; i++) {
         msix_vector_use(pci_dev, i);
@@ -907,7 +907,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
     cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
     cxl_cstate->cdat.private = ct3d;
     if (!cxl_doe_cdat_init(cxl_cstate, errp)) {
-        goto err_free_special_ops;
+        goto err_msix_uninit;
     }
 
     pcie_cap_deverr_init(pci_dev);
@@ -943,10 +943,10 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 
 err_release_cdat:
     cxl_doe_cdat_release(cxl_cstate);
-err_free_special_ops:
+err_msix_uninit:
     msix_uninit_exclusive_bar(pci_dev);
+err_free_special_ops:
     g_free(regs->special_ops);
-err_address_space_free:
     if (ct3d->dc.host_dc) {
         cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
-- 
MST




* [PULL 23/41] hw/mem/cxl_type3: Ensure errp is set on realization failure
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (21 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 22/41] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 24/41] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways Michael S. Tsirkin
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Li Zhijian, Jonathan Cameron,
	Philippe Mathieu-Daudé, Fan Ni

From: Li Zhijian <lizhijian@fujitsu.com>

Pass errp to msix_init_exclusive_bar() and pcie_aer_init() instead of
NULL, so that the callee sets errp when it fails. This improves error
reporting for CXL Type 3 device realization.

Previously, failing to set `errp` could result in errors being overlooked,
causing the system to mistakenly treat failure scenarios as successful and
potentially leading to redundant cleanup operations in ct3_exit().

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-5-Jonathan.Cameron@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index ff6861889b..d8b45f9bd1 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -891,7 +891,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
                      &ct3d->cxl_dstate.device_registers);
 
     /* MSI(-X) Initialization */
-    rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, NULL);
+    rc = msix_init_exclusive_bar(pci_dev, CXL_T3_MSIX_VECTOR_NR, 4, errp);
     if (rc) {
         goto err_free_special_ops;
     }
@@ -912,7 +912,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 
     pcie_cap_deverr_init(pci_dev);
     /* Leave a bit of room for expansion */
-    rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, NULL);
+    rc = pcie_aer_init(pci_dev, PCI_ERR_VER, 0x200, PCI_ERR_SIZEOF, errp);
     if (rc) {
         goto err_release_cdat;
     }
-- 
MST




* [PULL 24/41] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (22 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 23/41] hw/mem/cxl_type3: Ensure errp is set on realization failure Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:23 ` [PULL 25/41] hw/virtio: reset virtio balloon stats on machine reset Michael S. Tsirkin
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Yao Xingtao, Jonathan Cameron, Fan Ni

From: Yao Xingtao <yaoxt.fnst@fujitsu.com>

Since the kernel does not check the interleave capability, a
3-way, 6-way, 12-way or 16-way region can be created normally.

Applications can access the memory of a 16-way region normally because
QEMU converts HPA to DPA correctly for power-of-2 interleave ways.
Once the kernel implements the check, this kind of region can no
longer be created.

For non-power-of-2 interleave ways, applications cannot access the
memory normally and may hit unexpected behavior, such as a
segmentation fault.

So implementing this feature is needed.
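
The address arithmetic can be sketched standalone as below. This mirrors
the expression in the diff of this patch rather than replacing it. It
assumes the HDM decoder encodings as I understand them: ig encodes a
granularity of 256 << ig bytes, and iw encodes 1/2/4/8/16 ways as 0..4
and 3/6/12 ways as 8..10. The example_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [shift, shift + length) set; length must be in 1..64. */
static uint64_t example_mask64(unsigned shift, unsigned length)
{
    return (~0ULL >> (64 - length)) << shift;
}

/*
 * ig: encoded granularity, one chunk is 256 << ig bytes (assumption).
 * iw: encoded ways, 0..4 for 1/2/4/8/16, 8..10 for 3/6/12 (assumption).
 */
static uint64_t example_hpa_offset_to_dpa(uint64_t dpa_base,
                                          uint64_t hpa_offset,
                                          unsigned ig, unsigned iw)
{
    uint64_t low, high;

    assert(ig <= 6 && (iw <= 4 || (iw >= 8 && iw <= 10)));

    /* The offset inside one chunk is kept unchanged. */
    low = example_mask64(0, 8 + ig) & hpa_offset;
    if (iw < 8) {
        /* Power of 2: drop the iw bits that select the target device. */
        high = (example_mask64(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset)
               >> iw;
    } else {
        /*
         * 3, 6 or 12 ways: shift out the chunk offset and the
         * power-of-2 factor, divide the rest by 3, then shift back.
         */
        high = (((example_mask64(ig + iw, 64 - ig - iw) & hpa_offset)
                 >> (ig + iw)) / 3) << (ig + 8);
    }
    return dpa_base + (low | high);
}
```

For example, with 256-byte granularity (ig = 0) and 3 ways (iw = 8), host
offset 768 is the fourth chunk overall and therefore the second chunk on
its device, so it maps to device offset 256.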

Link: https://lore.kernel.org/linux-cxl/3e84b919-7631-d1db-3e1d-33000f3f3868@fujitsu.com/
Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20250203161908.145406-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/cxl/cxl-component-utils.c |  9 +++++++--
 hw/mem/cxl_type3.c           | 15 +++++++++++----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index cd116c0401..473895948b 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -243,8 +243,13 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t *write_msk,
     ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, INTERLEAVE_4K, 1);
     ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY,
                      POISON_ON_ERR_CAP, 0);
-    ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0);
-    ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0);
+    if (type == CXL2_TYPE3_DEVICE) {
+        ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 1);
+        ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 1);
+    } else {
+        ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 3_6_12_WAY, 0);
+        ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 16_WAY, 0);
+    }
     ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, UIO, 0);
     ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY,
                      UIO_DECODER_COUNT, 0);
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index d8b45f9bd1..6fffa21ead 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1100,10 +1100,17 @@ static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
             continue;
         }
 
-        *dpa = dpa_base +
-            ((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
-             ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset)
-              >> iw));
+        if (iw < 8) {
+            *dpa = dpa_base +
+                ((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+                 ((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset)
+                  >> iw));
+        } else {
+            *dpa = dpa_base +
+                ((MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+                 ((((MAKE_64BIT_MASK(ig + iw, 64 - ig - iw) & hpa_offset)
+                   >> (ig + iw)) / 3) << (ig + 8)));
+        }
 
         return true;
     }
-- 
MST




* [PULL 25/41] hw/virtio: reset virtio balloon stats on machine reset
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (23 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 24/41] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways Michael S. Tsirkin
@ 2025-02-21 12:23 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 26/41] amd_iommu: Use correct DTE field for interrupt passthrough Michael S. Tsirkin
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Daniel P. Berrangé, David Hildenbrand

From: Daniel P. Berrangé <berrange@redhat.com>

When a machine is first booted, all virtio balloon stats are initialized
to their default value -1 (18446744073709551615 when represented as
unsigned).

They remain that way while the firmware is loading and during the early
phase of guest OS boot, until the virtio-balloon driver is activated.
Thereafter the reported stats reflect the guest OS activity.

When a machine reset is performed, however, the virtio-balloon stats are
left unchanged by QEMU, despite the guest OS no longer updating them,
nor indeed even still existing.

IOW, the mgmt app keeps getting stale stats until the guest OS starts
once more and loads the virtio-balloon driver (if ever). At that point
the app will see a discontinuity in the reported values as they suddenly
jump from the stale value to the new value. This jump is
indistinguishable from a valid data update.

While there is a "last-update" field to report on the freshness of
the stats, that does not unambiguously tell the mgmt app whether the
stats are still conceptually relevant to the current running workload.

It is more conceptually useful to reset the stats to their default
values on machine reset, given that the previous guest workload the
stats reflect no longer exists. The mgmt app can now clearly identify
that there is no stats information available from the currently
executing workload.

The 'last-update' time is also reset back to 0.

IOW, on every machine reset, the virtio stats are in the same clean
state they were in when the machine first powered on.
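
As an illustration of what this gives a management application, here is
a minimal sketch of a check for "no stats from the current workload".
The function name stats_available() and the two sample dictionaries are
made up for this example; the layout (a 'last-update' timestamp plus a
'stats' map) follows what the functional test in this patch reads back
through qom-get.

```python
# -1 as seen through an unsigned 64-bit integer
UNSET_STATS_VALUE = (1 << 64) - 1


def stats_available(guest_stats):
    """Return True if the current guest has reported at least one stat."""
    if guest_stats.get("last-update", 0) == 0:
        return False
    return any(val != UNSET_STATS_VALUE
               for val in guest_stats.get("stats", {}).values())


# Hypothetical sample data: state right after a reset, and a running guest.
after_reset = {"last-update": 0,
               "stats": {"stat-free-memory": UNSET_STATS_VALUE}}
running = {"last-update": 1740140000,
           "stats": {"stat-free-memory": 1500000000,
                     "stat-swap-in": 0}}
```

Here stats_available(after_reset) is False and stats_available(running)
is True, so the application no longer has to guess whether old values
still describe the guest that is currently running.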

A functional test is added to validate this behaviour with a real
world guest OS.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20250204094202.2183262-1-berrange@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-balloon.h      |   4 +
 hw/virtio/virtio-balloon.c              |  30 ++++-
 MAINTAINERS                             |   1 +
 tests/functional/meson.build            |   2 +
 tests/functional/test_virtio_balloon.py | 161 ++++++++++++++++++++++++
 5 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100755 tests/functional/test_virtio_balloon.py

diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index b12c18a43b..0456c211c6 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -16,6 +16,7 @@
 #define QEMU_VIRTIO_BALLOON_H
 
 #include "standard-headers/linux/virtio_balloon.h"
+#include "hw/resettable.h"
 #include "hw/virtio/virtio.h"
 #include "system/iothread.h"
 #include "qom/object.h"
@@ -71,6 +72,9 @@ struct VirtIOBalloon {
 
     bool qemu_4_0_config_size;
     uint32_t poison_val;
+
+    /* State of the resettable container */
+    ResettableState reset_state;
 };
 
 #endif
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index ad05768ded..2eb5a14fa2 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -31,7 +31,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/misc.h"
-
+#include "system/reset.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 
@@ -910,6 +910,8 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     }
 
     reset_stats(s);
+    s->stats_last_update = 0;
+    qemu_register_resettable(OBJECT(dev));
 }
 
 static void virtio_balloon_device_unrealize(DeviceState *dev)
@@ -917,6 +919,7 @@ static void virtio_balloon_device_unrealize(DeviceState *dev)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
 
+    qemu_unregister_resettable(OBJECT(dev));
     if (s->free_page_bh) {
         qemu_bh_delete(s->free_page_bh);
         object_unref(OBJECT(s->iothread));
@@ -987,6 +990,27 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
     }
 }
 
+static ResettableState *virtio_balloon_get_reset_state(Object *obj)
+{
+    VirtIOBalloon *s = VIRTIO_BALLOON(obj);
+    return &s->reset_state;
+}
+
+static void virtio_balloon_reset_enter(Object *obj, ResetType type)
+{
+    VirtIOBalloon *s = VIRTIO_BALLOON(obj);
+
+    /*
+     * When waking up from standby/suspend-to-ram, do not reset stats.
+     */
+    if (type == RESET_TYPE_WAKEUP) {
+        return;
+    }
+
+    reset_stats(s);
+    s->stats_last_update = 0;
+}
+
 static void virtio_balloon_instance_init(Object *obj)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(obj);
@@ -1038,6 +1062,7 @@ static void virtio_balloon_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
 
     device_class_set_props(dc, virtio_balloon_properties);
     dc->vmsd = &vmstate_virtio_balloon;
@@ -1050,6 +1075,9 @@ static void virtio_balloon_class_init(ObjectClass *klass, void *data)
     vdc->get_features = virtio_balloon_get_features;
     vdc->set_status = virtio_balloon_set_status;
     vdc->vmsd = &vmstate_virtio_balloon_device;
+
+    rc->get_state = virtio_balloon_get_reset_state;
+    rc->phases.enter = virtio_balloon_reset_enter;
 }
 
 static const TypeInfo virtio_balloon_info = {
diff --git a/MAINTAINERS b/MAINTAINERS
index a928ce3e41..013a57d5bf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2246,6 +2246,7 @@ F: include/hw/virtio/virtio-balloon.h
 F: system/balloon.c
 F: include/system/balloon.h
 F: tests/qtest/virtio-balloon-test.c
+F: tests/functional/test_virtio_balloon.py
 
 virtio-9p
 M: Greg Kurz <groug@kaod.org>
diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index cf80924ddc..2d399cc464 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -44,6 +44,7 @@ test_timeouts = {
   'riscv64_tuxrun' : 120,
   's390x_ccw_virtio' : 420,
   'sh4_tuxrun' : 240,
+  'virtio_balloon': 120,
 }
 
 tests_generic_system = [
@@ -242,6 +243,7 @@ tests_x86_64_system_thorough = [
   'linux_initrd',
   'multiprocess',
   'netdev_ethtool',
+  'virtio_balloon',
   'virtio_gpu',
   'x86_64_hotplug_cpu',
   'x86_64_tuxrun',
diff --git a/tests/functional/test_virtio_balloon.py b/tests/functional/test_virtio_balloon.py
new file mode 100755
index 0000000000..67b48e1b4e
--- /dev/null
+++ b/tests/functional/test_virtio_balloon.py
@@ -0,0 +1,161 @@
+#!/usr/bin/env python3
+#
+# virtio-balloon tests
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+import time
+
+from qemu_test import QemuSystemTest, Asset
+from qemu_test import wait_for_console_pattern
+from qemu_test import exec_command_and_wait_for_pattern
+
+UNSET_STATS_VALUE = 18446744073709551615
+
+
+class VirtioBalloonx86(QemuSystemTest):
+
+    ASSET_KERNEL = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Server/x86_64/os/images/pxeboot/vmlinuz'),
+        'd4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129')
+
+    ASSET_INITRD = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Server/x86_64/os/images/pxeboot/initrd.img'),
+        '277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b')
+
+    ASSET_DISKIMAGE = Asset(
+        ('https://archives.fedoraproject.org/pub/archive/fedora/linux/releases'
+         '/31/Cloud/x86_64/images/Fedora-Cloud-Base-31-1.9.x86_64.qcow2'),
+        'e3c1b309d9203604922d6e255c2c5d098a309c2d46215d8fc026954f3c5c27a0')
+
+    DEFAULT_KERNEL_PARAMS = ('root=/dev/vda1 console=ttyS0 net.ifnames=0 '
+                             'rd.rescue')
+
+    def wait_for_console_pattern(self, success_message, vm=None):
+        wait_for_console_pattern(
+            self,
+            success_message,
+            failure_message="Kernel panic - not syncing",
+            vm=vm,
+        )
+
+    def mount_root(self):
+        self.wait_for_console_pattern('Entering emergency mode.')
+        prompt = '# '
+        self.wait_for_console_pattern(prompt)
+
+        exec_command_and_wait_for_pattern(self, 'mount /dev/vda1 /sysroot',
+                                          prompt)
+        exec_command_and_wait_for_pattern(self, 'chroot /sysroot',
+                                          prompt)
+        exec_command_and_wait_for_pattern(self, "modprobe virtio-balloon",
+                                          prompt)
+
+    def assert_initial_stats(self):
+        ret = self.vm.qmp('qom-get',
+                          {'path': '/machine/peripheral/balloon',
+                           'property': 'guest-stats'})['return']
+        when = ret.get('last-update')
+        assert when == 0
+        stats = ret.get('stats')
+        for name, val in stats.items():
+            assert val == UNSET_STATS_VALUE
+
+    def assert_running_stats(self, then):
+        ret = self.vm.qmp('qom-get',
+                          {'path': '/machine/peripheral/balloon',
+                           'property': 'guest-stats'})['return']
+        when = ret.get('last-update')
+        now = time.time()
+
+        assert when > then and when < now
+        stats = ret.get('stats')
+        # Stat we expect this particular Kernel to have set
+        expectData = [
+            "stat-available-memory",
+            "stat-disk-caches",
+            "stat-free-memory",
+            "stat-htlb-pgalloc",
+            "stat-htlb-pgfail",
+            "stat-major-faults",
+            "stat-minor-faults",
+            "stat-swap-in",
+            "stat-swap-out",
+            "stat-total-memory",
+        ]
+        for name, val in stats.items():
+            if name in expectData:
+                assert val != UNSET_STATS_VALUE
+            else:
+                assert val == UNSET_STATS_VALUE
+
+    def test_virtio_balloon_stats(self):
+        self.set_machine('q35')
+        kernel_path = self.ASSET_KERNEL.fetch()
+        initrd_path = self.ASSET_INITRD.fetch()
+        diskimage_path = self.ASSET_DISKIMAGE.fetch()
+
+        self.vm.set_console()
+        self.vm.add_args("-S")
+        self.vm.add_args("-cpu", "max")
+        self.vm.add_args("-m", "2G")
+        # Slow down BIOS phase with boot menu, so that after a system
+        # reset, we can reliably catch the clean stats again in BIOS
+        # phase before the guest OS launches
+        self.vm.add_args("-boot", "menu=on")
+        self.vm.add_args("-machine", "q35,accel=kvm:tcg")
+        self.vm.add_args("-device", "virtio-balloon,id=balloon")
+        self.vm.add_args('-drive',
+                         f'file={diskimage_path},if=none,id=drv0,snapshot=on')
+        self.vm.add_args('-device', 'virtio-blk-pci,bus=pcie.0,' +
+                         'drive=drv0,id=virtio-disk0,bootindex=1')
+
+        self.vm.add_args(
+            "-kernel",
+            kernel_path,
+            "-initrd",
+            initrd_path,
+            "-append",
+            self.DEFAULT_KERNEL_PARAMS
+        )
+        self.vm.launch()
+
+        # Poll stats at 100ms
+        self.vm.qmp('qom-set',
+                    {'path': '/machine/peripheral/balloon',
+                     'property': 'guest-stats-polling-interval',
+                     'value': 100 })
+
+        # We've not run any guest code yet, neither BIOS or guest,
+        # so stats should be all default values
+        self.assert_initial_stats()
+
+        self.vm.qmp('cont')
+
+        then = time.time()
+        self.mount_root()
+        self.assert_running_stats(then)
+
+        # Race window between these two commands, where we
+        # rely on '-boot menu=on' to (hopefully) ensure we're
+        # still executing the BIOS when QEMU processes the
+        # 'stop', and thus have not loaded the virtio-balloon
+        # driver in the guest
+        self.vm.qmp('system_reset')
+        self.vm.qmp('stop')
+
+        # If the above assumption held, we're in BIOS now and
+        # stats should be all back at their default values
+        self.assert_initial_stats()
+        self.vm.qmp('cont')
+
+        then = time.time()
+        self.mount_root()
+        self.assert_running_stats(then)
+
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
-- 
MST




* [PULL 26/41] amd_iommu: Use correct DTE field for interrupt passthrough
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (24 preceding siblings ...)
  2025-02-21 12:23 ` [PULL 25/41] hw/virtio: reset virtio balloon stats on machine reset Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 27/41] amd_iommu: Use correct bitmask to set capability BAR Michael S. Tsirkin
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sairaj Kodilkar, Vasant Hegde, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum

From: Sairaj Kodilkar <sarunkod@amd.com>

Interrupt passthrough is determined by bits 191, 190 and 187-184.
These bits are part of the 3rd quad word (i.e. index 2) in the DTE.
Hence replace dte[3] by dte[2].
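
The index arithmetic can be double-checked with a few lines of Python.
This is only a sketch: dte_bit_location() is a hypothetical helper, and
it assumes the DTE is viewed as an array of 64-bit quad words with bit 0
in dte[0].

```python
QWORD_BITS = 64


def dte_bit_location(bit):
    """Split a DTE bit number into (quad word index, bit within quad word)."""
    return divmod(bit, QWORD_BITS)


# Bits named in the commit message: 191, 190 and 187 down to 184
passthrough_bits = [191, 190, 187, 186, 185, 184]
locations = {bit: dte_bit_location(bit) for bit in passthrough_bits}
```

Every entry in locations has quad word index 2, for example bit 191 is
bit 63 of dte[2] and bit 184 is bit 56 of dte[2]; dte[3] would only
start at bit 192.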

Fixes: b44159fe0 ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-2-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/amd_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 6b13ce894b..98f1209a38 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1309,15 +1309,15 @@ static int amdvi_int_remap_msi(AMDVIState *iommu,
         ret = -AMDVI_IR_ERR;
         break;
     case AMDVI_IOAPIC_INT_TYPE_NMI:
-        pass = dte[3] & AMDVI_DEV_NMI_PASS_MASK;
+        pass = dte[2] & AMDVI_DEV_NMI_PASS_MASK;
         trace_amdvi_ir_delivery_mode("nmi");
         break;
     case AMDVI_IOAPIC_INT_TYPE_INIT:
-        pass = dte[3] & AMDVI_DEV_INT_PASS_MASK;
+        pass = dte[2] & AMDVI_DEV_INT_PASS_MASK;
         trace_amdvi_ir_delivery_mode("init");
         break;
     case AMDVI_IOAPIC_INT_TYPE_EINT:
-        pass = dte[3] & AMDVI_DEV_EINT_PASS_MASK;
+        pass = dte[2] & AMDVI_DEV_EINT_PASS_MASK;
         trace_amdvi_ir_delivery_mode("eint");
         break;
     default:
-- 
MST




* [PULL 27/41] amd_iommu: Use correct bitmask to set capability BAR
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (25 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 26/41] amd_iommu: Use correct DTE field for interrupt passthrough Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 28/41] vhost-iova-tree: Implement an IOVA-only tree Michael S. Tsirkin
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Sairaj Kodilkar, Vasant Hegde, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum

From: Sairaj Kodilkar <sarunkod@amd.com>

AMD IOMMU provides the base address of control registers through the
IVRS table and a PCI capability. Since this base address is 64 bits
wide, use 32-bit masks (instead of 16-bit ones) to set BAR low and high.
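
To see the effect on the default base address, the old and new
expressions from the diff below can be evaluated side by side. This is a
sketch in Python rather than C: make_64bit_mask() is a hypothetical
stand-in taking the same (shift, length) arguments as MAKE_64BIT_MASK,
and plain Python integers give the same results as C here because no
intermediate value exceeds 32 bits.

```python
AMDVI_BASE_ADDR = 0xFED80000


def make_64bit_mask(shift, length):
    """Mask with `length` set bits starting at bit `shift` (hypothetical helper)."""
    return (((1 << length) - 1) << shift) & ((1 << 64) - 1)


# Expressions from the removed lines
old_low = AMDVI_BASE_ADDR & ~0xFFFF0000
old_high = (AMDVI_BASE_ADDR & ~0xFFFF) >> 16

# Expressions from the added lines
new_low = AMDVI_BASE_ADDR & make_64bit_mask(14, 18)
new_high = AMDVI_BASE_ADDR >> 32
```

The old code yields old_low == 0x0 and old_high == 0xfed8, i.e. the
upper half of the 32-bit address ended up in BAR high. The new code
yields new_low == 0xfed80000 and new_high == 0x0.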

Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-3-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/amd_iommu.h | 2 +-
 hw/i386/amd_iommu.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index e0dac4d9a9..28125130c6 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -187,7 +187,7 @@
         AMDVI_CAPAB_FLAG_HTTUNNEL |  AMDVI_CAPAB_EFR_SUP)
 
 /* AMDVI default address */
-#define AMDVI_BASE_ADDR 0xfed80000
+#define AMDVI_BASE_ADDR 0xfed80000ULL
 
 /* page management constants */
 #define AMDVI_PAGE_SHIFT 12
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 98f1209a38..044fe43256 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1593,9 +1593,9 @@ static void amdvi_pci_realize(PCIDevice *pdev, Error **errp)
     /* reset AMDVI specific capabilities, all r/o */
     pci_set_long(pdev->config + s->capab_offset, AMDVI_CAPAB_FEATURES);
     pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_BAR_LOW,
-                 AMDVI_BASE_ADDR & ~(0xffff0000));
+                 AMDVI_BASE_ADDR & MAKE_64BIT_MASK(14, 18));
     pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_BAR_HIGH,
-                (AMDVI_BASE_ADDR & ~(0xffff)) >> 16);
+                AMDVI_BASE_ADDR >> 32);
     pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_RANGE,
                  0xff000000);
     pci_set_long(pdev->config + s->capab_offset + AMDVI_CAPAB_MISC, 0);
-- 
MST




* [PULL 28/41] vhost-iova-tree: Implement an IOVA-only tree
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (26 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 27/41] amd_iommu: Use correct bitmask to set capability BAR Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 29/41] vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees Michael S. Tsirkin
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonah Palmer, Eugenio Pérez, Lei Yang,
	Stefano Garzarella, Jason Wang

From: Jonah Palmer <jonah.palmer@oracle.com>

Creates and supports an IOVA-only tree to support a SVQ IOVA->HVA and
GPA->IOVA tree for host-only and guest-backed memory, respectively, in
the next patch.

The IOVA allocator still allocates an IOVA range but now adds this range
to the IOVA-only tree as well as to the full IOVA->HVA tree.

In the next patch, the full IOVA->HVA tree will be split into a partial
SVQ IOVA->HVA tree and a GPA->IOVA tree. The motivation behind having an
IOVA-only tree was to have a single tree that would keep track of all
allocated IOVA ranges between the partial SVQ IOVA->HVA and GPA->IOVA
trees.
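
The two-step flow (allocate in the IOVA-only tree, then record the
translation) can be pictured with a toy model. This is an illustrative
sketch only: the class IovaAllocatorModel and its methods are
hypothetical, a sorted list and a dict stand in for the real trees, and
sizes are inclusive (length - 1) as in DMAMap. It does not model a
failure of the second step, which the callers in this patch handle by
removing the entry from the IOVA-only tree again.

```python
class IovaAllocatorModel:
    """Toy model: one tree of allocated IOVA ranges, one IOVA->HVA tree."""

    def __init__(self, iova_first, iova_last):
        self.iova_first = iova_first
        self.iova_last = iova_last
        self.iova_map = []        # sorted (start, end) ranges, end inclusive
        self.iova_taddr_map = {}  # IOVA start -> (IOVA end, HVA)

    def _find_hole(self, size):
        # First-fit search for a free range of size + 1 bytes.
        start = self.iova_first
        for used_start, used_end in self.iova_map:
            if start + size < used_start:
                break
            start = max(start, used_end + 1)
        if start + size > self.iova_last:
            return None
        return start

    def map_alloc(self, size, taddr):
        # Step 1: allocate a range in the IOVA-only tree.
        iova = self._find_hole(size)
        if iova is None:
            return None
        self.iova_map.append((iova, iova + size))
        self.iova_map.sort()
        # Step 2: record the translation in the IOVA->HVA tree.
        self.iova_taddr_map[iova] = (iova + size, taddr)
        return iova

    def remove(self, iova):
        # Removal has to touch both trees, as vhost_iova_tree_remove() does.
        self.iova_taddr_map.pop(iova, None)
        self.iova_map = [r for r in self.iova_map if r[0] != iova]
```

Because every allocation goes through iova_map first, any further tree
added later can share the same allocator without handing out
overlapping IOVA ranges.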

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-2-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-iova-tree.h |  3 ++-
 hw/virtio/vhost-iova-tree.c | 26 ++++++++++++++++++++------
 hw/virtio/vhost-vdpa.c      | 29 +++++++++++++++++++++--------
 net/vhost-vdpa.c            | 10 ++++++++--
 4 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 4adfd79ff0..525ce72b1d 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -21,7 +21,8 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
 
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
                                         const DMAMap *map);
-int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map,
+                              hwaddr taddr);
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map);
 
 #endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 3d03395a77..216885aa3c 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,6 +28,9 @@ struct VhostIOVATree {
 
     /* IOVA address to qemu memory maps. */
     IOVATree *iova_taddr_map;
+
+    /* Allocated IOVA addresses */
+    IOVATree *iova_map;
 };
 
 /**
@@ -44,6 +47,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
     tree->iova_last = iova_last;
 
     tree->iova_taddr_map = iova_tree_new();
+    tree->iova_map = iova_tree_new();
     return tree;
 }
 
@@ -53,6 +57,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 {
     iova_tree_destroy(iova_tree->iova_taddr_map);
+    iova_tree_destroy(iova_tree->iova_map);
     g_free(iova_tree);
 }
 
@@ -75,6 +80,7 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
  *
  * @tree: The iova tree
  * @map: The iova map
+ * @taddr: The translated address (HVA)
  *
  * Returns:
  * - IOVA_OK if the map fits in the container
@@ -83,19 +89,26 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
  *
  * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
  */
-int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
+int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map, hwaddr taddr)
 {
+    int ret;
+
     /* Some vhost devices do not like addr 0. Skip first page */
     hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size();
 
-    if (map->translated_addr + map->size < map->translated_addr ||
-        map->perm == IOMMU_NONE) {
+    if (taddr + map->size < taddr || map->perm == IOMMU_NONE) {
         return IOVA_ERR_INVALID;
     }
 
-    /* Allocate a node in IOVA address */
-    return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first,
-                               tree->iova_last);
+    /* Allocate a node in the IOVA-only tree */
+    ret = iova_tree_alloc_map(tree->iova_map, map, iova_first, tree->iova_last);
+    if (unlikely(ret != IOVA_OK)) {
+        return ret;
+    }
+
+    /* Insert a node in the IOVA->HVA tree */
+    map->translated_addr = taddr;
+    return iova_tree_insert(tree->iova_taddr_map, map);
 }
 
 /**
@@ -107,4 +120,5 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
 {
     iova_tree_remove(iova_tree->iova_taddr_map, map);
+    iova_tree_remove(iova_tree->iova_map, map);
 }
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3cdaa12ed5..703dcfc929 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -360,14 +360,20 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     llsize = int128_sub(llend, int128_make64(iova));
     if (s->shadow_data) {
         int r;
+        hwaddr hw_vaddr = (hwaddr)(uintptr_t)vaddr;
 
-        mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr,
         mem_region.size = int128_get64(llsize) - 1,
         mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
 
-        r = vhost_iova_tree_map_alloc(s->iova_tree, &mem_region);
+        r = vhost_iova_tree_map_alloc(s->iova_tree, &mem_region, hw_vaddr);
         if (unlikely(r != IOVA_OK)) {
             error_report("Can't allocate a mapping (%d)", r);
+
+            if (mem_region.translated_addr == hw_vaddr) {
+                error_report("Insertion to IOVA->HVA tree failed");
+                /* Remove the mapping from the IOVA-only tree */
+                goto fail_map;
+            }
             goto fail;
         }
 
@@ -1142,16 +1148,23 @@ static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
  *
  * @v: Vhost-vdpa device
  * @needle: The area to search iova
+ * @taddr: The translated address (HVA)
  * @errorp: Error pointer
  */
 static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap *needle,
-                                    Error **errp)
+                                    hwaddr taddr, Error **errp)
 {
     int r;
 
-    r = vhost_iova_tree_map_alloc(v->shared->iova_tree, needle);
+    r = vhost_iova_tree_map_alloc(v->shared->iova_tree, needle, taddr);
     if (unlikely(r != IOVA_OK)) {
         error_setg(errp, "Cannot allocate iova (%d)", r);
+
+        if (needle->translated_addr == taddr) {
+            error_append_hint(errp, "Insertion to IOVA->HVA tree failed");
+            /* Remove the mapping from the IOVA-only tree */
+            vhost_iova_tree_remove(v->shared->iova_tree, *needle);
+        }
         return false;
     }
 
@@ -1192,11 +1205,11 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
     vhost_svq_get_vring_addr(svq, &svq_addr);
 
     driver_region = (DMAMap) {
-        .translated_addr = svq_addr.desc_user_addr,
         .size = driver_size - 1,
         .perm = IOMMU_RO,
     };
-    ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
+    ok = vhost_vdpa_svq_map_ring(v, &driver_region, svq_addr.desc_user_addr,
+                                 errp);
     if (unlikely(!ok)) {
         error_prepend(errp, "Cannot create vq driver region: ");
         return false;
@@ -1206,11 +1219,11 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
     addr->avail_user_addr = driver_region.iova + avail_offset;
 
     device_region = (DMAMap) {
-        .translated_addr = svq_addr.used_user_addr,
         .size = device_size - 1,
         .perm = IOMMU_RW,
     };
-    ok = vhost_vdpa_svq_map_ring(v, &device_region, errp);
+    ok = vhost_vdpa_svq_map_ring(v, &device_region, svq_addr.used_user_addr,
+                                 errp);
     if (unlikely(!ok)) {
         error_prepend(errp, "Cannot create vq device region: ");
         vhost_vdpa_svq_unmap_ring(v, driver_region.translated_addr);
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 231b45246c..5a3a57203d 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -510,14 +510,20 @@ static int vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, void *buf, size_t size,
                                   bool write)
 {
     DMAMap map = {};
+    hwaddr taddr = (hwaddr)(uintptr_t)buf;
     int r;
 
-    map.translated_addr = (hwaddr)(uintptr_t)buf;
     map.size = size - 1;
     map.perm = write ? IOMMU_RW : IOMMU_RO,
-    r = vhost_iova_tree_map_alloc(v->shared->iova_tree, &map);
+    r = vhost_iova_tree_map_alloc(v->shared->iova_tree, &map, taddr);
     if (unlikely(r != IOVA_OK)) {
         error_report("Cannot map injected element");
+
+        if (map.translated_addr == taddr) {
+            error_report("Insertion to IOVA->HVA tree failed");
+            /* Remove the mapping from the IOVA-only tree */
+            goto dma_map_err;
+        }
         return r;
     }
 
-- 
MST




* [PULL 29/41] vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (27 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 28/41] vhost-iova-tree: Implement an IOVA-only tree Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 30/41] vhost-iova-tree: Update documentation Michael S. Tsirkin
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonah Palmer, Eugenio Pérez,
	Stefano Garzarella, Peter Xu, Jason Wang

From: Jonah Palmer <jonah.palmer@oracle.com>

Creates and supports a GPA->IOVA tree and a partial IOVA->HVA tree by
splitting up guest-backed memory maps and host-only memory maps from the
full IOVA->HVA tree. That is, any guest-backed memory maps are now
stored in the GPA->IOVA tree and host-only memory maps stay in the
IOVA->HVA tree.

Also propagates the GPAs (in_addr/out_addr) of a VirtQueueElement to
vhost_svq_translate_addr() to translate GPAs to IOVAs via the GPA->IOVA
tree (when descriptors are backed by guest memory). For descriptors
backed by host-only memory, the existing partial SVQ IOVA->HVA tree is
used.

GPAs are unique in the guest's address space, ensuring unambiguous IOVA
translations. This avoids the issue where different GPAs map to the same
HVA, causing the original HVA->IOVA translation to potentially return an
IOVA associated with the wrong intended GPA.
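
A small sketch of the ambiguity, using made-up addresses: two guest
physical ranges are backed by the same host virtual memory but were
given different IOVAs. The table layout and the translate() helper are
hypothetical and only model the lookup order, not the real tree code.

```python
# Two GPA ranges aliasing the same HVA, each with its own IOVA (made-up values)
mappings = [
    {"gpa": 0x0000, "hva": 0x7F000000, "len": 0x1000, "iova": 0x1000},
    {"gpa": 0x8000, "hva": 0x7F000000, "len": 0x1000, "iova": 0x2000},
]


def translate(key, addr):
    """Return the IOVA for addr from the first mapping whose `key` range holds it."""
    for m in mappings:
        if m[key] <= addr < m[key] + m["len"]:
            return m["iova"] + (addr - m[key])
    return None


by_hva = translate("hva", 0x7F000010)  # same answer for both GPA aliases
by_gpa = translate("gpa", 0x8010)      # picks the mapping of this GPA
```

A lookup keyed by HVA always lands on the first mapping (IOVA 0x1010
here), even when the descriptor actually referred to GPA 0x8010, whereas
the lookup keyed by GPA returns 0x2010.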

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20250217144936.3589907-3-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-iova-tree.h        |  5 +++
 hw/virtio/vhost-shadow-virtqueue.h |  5 ++-
 include/qemu/iova-tree.h           | 22 ++++++++++
 hw/virtio/vhost-iova-tree.c        | 67 ++++++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.c | 55 ++++++++++++++++--------
 hw/virtio/vhost-vdpa.c             | 19 ++++-----
 net/vhost-vdpa.c                   |  2 +-
 util/iova-tree.c                   | 46 ++++++++++++++++++++
 8 files changed, 190 insertions(+), 31 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 525ce72b1d..0c4ba5abd5 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -24,5 +24,10 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map,
                               hwaddr taddr);
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map);
+const DMAMap *vhost_iova_tree_find_gpa(const VhostIOVATree *iova_tree,
+                                       const DMAMap *map);
+int vhost_iova_tree_map_alloc_gpa(VhostIOVATree *iova_tree, DMAMap *map,
+                                  hwaddr taddr);
+void vhost_iova_tree_remove_gpa(VhostIOVATree *iova_tree, DMAMap map);
 
 #endif
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 19c842a15b..9c273739d6 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -118,8 +118,9 @@ uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq);
 void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
                          const VirtQueueElement *elem, uint32_t len);
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
-                  size_t out_num, const struct iovec *in_sg, size_t in_num,
-                  VirtQueueElement *elem);
+                  size_t out_num, const hwaddr *out_addr,
+                  const struct iovec *in_sg, size_t in_num,
+                  const hwaddr *in_addr, VirtQueueElement *elem);
 size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num);
 
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 44a45931d5..16d354a814 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -40,6 +40,28 @@ typedef struct DMAMap {
 } QEMU_PACKED DMAMap;
 typedef gboolean (*iova_tree_iterator)(DMAMap *map);
 
+/**
+ * gpa_tree_new:
+ *
+ * Create a new GPA->IOVA tree.
+ *
+ * Returns: the tree pointer on success, or NULL otherwise.
+ */
+IOVATree *gpa_tree_new(void);
+
+/**
+ * gpa_tree_insert:
+ *
+ * @tree: The GPA->IOVA tree we're inserting the mapping to
+ * @map: The GPA->IOVA mapping to insert
+ *
+ * Inserts a GPA range to the GPA->IOVA tree. If there are overlapped
+ * ranges, IOVA_ERR_OVERLAP will be returned.
+ *
+ * Return: 0 if successful, < 0 otherwise.
+ */
+int gpa_tree_insert(IOVATree *tree, const DMAMap *map);
+
 /**
  * iova_tree_new:
  *
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 216885aa3c..9d2d6a7af2 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -31,6 +31,9 @@ struct VhostIOVATree {
 
     /* Allocated IOVA addresses */
     IOVATree *iova_map;
+
+    /* GPA->IOVA address memory maps */
+    IOVATree *gpa_iova_map;
 };
 
 /**
@@ -48,6 +51,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 
     tree->iova_taddr_map = iova_tree_new();
     tree->iova_map = iova_tree_new();
+    tree->gpa_iova_map = gpa_tree_new();
     return tree;
 }
 
@@ -58,6 +62,7 @@ void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 {
     iova_tree_destroy(iova_tree->iova_taddr_map);
     iova_tree_destroy(iova_tree->iova_map);
+    iova_tree_destroy(iova_tree->gpa_iova_map);
     g_free(iova_tree);
 }
 
@@ -122,3 +127,65 @@ void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
     iova_tree_remove(iova_tree->iova_taddr_map, map);
     iova_tree_remove(iova_tree->iova_map, map);
 }
+
+/**
+ * Find the IOVA address stored from a guest memory address (GPA)
+ *
+ * @tree: The VhostIOVATree
+ * @map: The map with the guest memory address
+ *
+ * Returns the stored GPA->IOVA mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_gpa(const VhostIOVATree *tree,
+                                       const DMAMap *map)
+{
+    return iova_tree_find_iova(tree->gpa_iova_map, map);
+}
+
+/**
+ * Allocate a new IOVA range and add the mapping to the GPA->IOVA tree
+ *
+ * @tree: The VhostIOVATree
+ * @map: The IOVA mapping
+ * @taddr: The translated address (GPA)
+ *
+ * Returns:
+ * - IOVA_OK if the map fits both containers
+ * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
+ * - IOVA_ERR_NOMEM if the IOVA-only tree cannot allocate more space
+ *
+ * It returns an assigned IOVA in map->iova if the return value is IOVA_OK.
+ */
+int vhost_iova_tree_map_alloc_gpa(VhostIOVATree *tree, DMAMap *map, hwaddr taddr)
+{
+    int ret;
+
+    /* Some vhost devices don't like addr 0. Skip first page */
+    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size();
+
+    if (taddr + map->size < taddr || map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* Allocate a node in the IOVA-only tree */
+    ret = iova_tree_alloc_map(tree->iova_map, map, iova_first, tree->iova_last);
+    if (unlikely(ret != IOVA_OK)) {
+        return ret;
+    }
+
+    /* Insert a node in the GPA->IOVA tree */
+    map->translated_addr = taddr;
+    return gpa_tree_insert(tree->gpa_iova_map, map);
+}
+
+/**
+ * Remove existing mappings from the IOVA-only and GPA->IOVA trees
+ *
+ * @tree: The VhostIOVATree
+ * @map: The map to remove
+ */
+void vhost_iova_tree_remove_gpa(VhostIOVATree *iova_tree, DMAMap map)
+{
+    iova_tree_remove(iova_tree->gpa_iova_map, map);
+    iova_tree_remove(iova_tree->iova_map, map);
+}
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 37aca8b431..30ba565f03 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -78,24 +78,39 @@ uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
  * @vaddr: Translated IOVA addresses
  * @iovec: Source qemu's VA addresses
  * @num: Length of iovec and minimum length of vaddr
+ * @gpas: Descriptors' GPAs, if backed by guest memory
  */
 static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
                                      hwaddr *addrs, const struct iovec *iovec,
-                                     size_t num)
+                                     size_t num, const hwaddr *gpas)
 {
     if (num == 0) {
         return true;
     }
 
     for (size_t i = 0; i < num; ++i) {
-        DMAMap needle = {
-            .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
-            .size = iovec[i].iov_len,
-        };
         Int128 needle_last, map_last;
         size_t off;
+        const DMAMap *map;
+        DMAMap needle;
+
+        /* Check if the descriptor is backed by guest memory  */
+        if (gpas) {
+            /* Search the GPA->IOVA tree */
+            needle = (DMAMap) {
+                .translated_addr = gpas[i],
+                .size = iovec[i].iov_len,
+            };
+            map = vhost_iova_tree_find_gpa(svq->iova_tree, &needle);
+        } else {
+            /* Search the IOVA->HVA tree */
+            needle = (DMAMap) {
+                .translated_addr = (hwaddr)(uintptr_t)iovec[i].iov_base,
+                .size = iovec[i].iov_len,
+            };
+            map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+        }
 
-        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
         /*
          * Map cannot be NULL since iova map contains all guest space and
          * qemu already has a physical address mapped
@@ -130,6 +145,7 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
  * @sg: Cache for hwaddr
  * @iovec: The iovec from the guest
  * @num: iovec length
+ * @addr: Descriptors' GPAs, if backed by guest memory
  * @more_descs: True if more descriptors come in the chain
  * @write: True if they are writeable descriptors
  *
@@ -137,7 +153,8 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
  */
 static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
                                         const struct iovec *iovec, size_t num,
-                                        bool more_descs, bool write)
+                                        const hwaddr *addr, bool more_descs,
+                                        bool write)
 {
     uint16_t i = svq->free_head, last = svq->free_head;
     unsigned n;
@@ -149,7 +166,7 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
         return true;
     }
 
-    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+    ok = vhost_svq_translate_addr(svq, sg, iovec, num, addr);
     if (unlikely(!ok)) {
         return false;
     }
@@ -174,8 +191,9 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
                                 const struct iovec *out_sg, size_t out_num,
+                                const hwaddr *out_addr,
                                 const struct iovec *in_sg, size_t in_num,
-                                unsigned *head)
+                                const hwaddr *in_addr, unsigned *head)
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
@@ -191,13 +209,14 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
-                                     false);
+    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, out_addr,
+                                     in_num > 0, false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
+    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, in_addr, false,
+                                     true);
     if (unlikely(!ok)) {
         return false;
     }
@@ -247,8 +266,9 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
 int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
-                  size_t out_num, const struct iovec *in_sg, size_t in_num,
-                  VirtQueueElement *elem)
+                  size_t out_num, const hwaddr *out_addr,
+                  const struct iovec *in_sg, size_t in_num,
+                  const hwaddr *in_addr, VirtQueueElement *elem)
 {
     unsigned qemu_head;
     unsigned ndescs = in_num + out_num;
@@ -258,7 +278,8 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return -ENOSPC;
     }
 
-    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
+    ok = vhost_svq_add_split(svq, out_sg, out_num, out_addr, in_sg, in_num,
+                             in_addr, &qemu_head);
     if (unlikely(!ok)) {
         return -EINVAL;
     }
@@ -274,8 +295,8 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
 static int vhost_svq_add_element(VhostShadowVirtqueue *svq,
                                  VirtQueueElement *elem)
 {
-    return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
-                         elem->in_num, elem);
+    return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->out_addr,
+                         elem->in_sg, elem->in_num, elem->in_addr, elem);
 }
 
 /**
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 703dcfc929..7efbde3d4c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -360,17 +360,17 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
     llsize = int128_sub(llend, int128_make64(iova));
     if (s->shadow_data) {
         int r;
-        hwaddr hw_vaddr = (hwaddr)(uintptr_t)vaddr;
+        hwaddr gpa = section->offset_within_address_space;
 
         mem_region.size = int128_get64(llsize) - 1,
         mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly),
 
-        r = vhost_iova_tree_map_alloc(s->iova_tree, &mem_region, hw_vaddr);
+        r = vhost_iova_tree_map_alloc_gpa(s->iova_tree, &mem_region, gpa);
         if (unlikely(r != IOVA_OK)) {
             error_report("Can't allocate a mapping (%d)", r);
 
-            if (mem_region.translated_addr == hw_vaddr) {
-                error_report("Insertion to IOVA->HVA tree failed");
+            if (mem_region.translated_addr == gpa) {
+                error_report("Insertion to GPA->IOVA tree failed");
                 /* Remove the mapping from the IOVA-only tree */
                 goto fail_map;
             }
@@ -392,7 +392,7 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
 
 fail_map:
     if (s->shadow_data) {
-        vhost_iova_tree_remove(s->iova_tree, mem_region);
+        vhost_iova_tree_remove_gpa(s->iova_tree, mem_region);
     }
 
 fail:
@@ -446,21 +446,18 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
 
     if (s->shadow_data) {
         const DMAMap *result;
-        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
-            section->offset_within_region +
-            (iova - section->offset_within_address_space);
         DMAMap mem_region = {
-            .translated_addr = (hwaddr)(uintptr_t)vaddr,
+            .translated_addr = section->offset_within_address_space,
             .size = int128_get64(llsize) - 1,
         };
 
-        result = vhost_iova_tree_find_iova(s->iova_tree, &mem_region);
+        result = vhost_iova_tree_find_gpa(s->iova_tree, &mem_region);
         if (!result) {
             /* The memory listener map wasn't mapped */
             return;
         }
         iova = result->iova;
-        vhost_iova_tree_remove(s->iova_tree, *result);
+        vhost_iova_tree_remove_gpa(s->iova_tree, *result);
     }
     vhost_vdpa_iotlb_batch_begin_once(s);
     /*
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 5a3a57203d..bd01866878 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -649,7 +649,7 @@ static ssize_t vhost_vdpa_net_cvq_add(VhostVDPAState *s,
     VhostShadowVirtqueue *svq = g_ptr_array_index(s->vhost_vdpa.shadow_vqs, 0);
     int r;
 
-    r = vhost_svq_add(svq, out_sg, out_num, in_sg, in_num, NULL);
+    r = vhost_svq_add(svq, out_sg, out_num, NULL, in_sg, in_num, NULL, NULL);
     if (unlikely(r != 0)) {
         if (unlikely(r == -ENOSPC)) {
             qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 06295e2755..5b0c95ff15 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -257,3 +257,49 @@ void iova_tree_destroy(IOVATree *tree)
     g_tree_destroy(tree->tree);
     g_free(tree);
 }
+
+static int gpa_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+    const DMAMap *m1 = a, *m2 = b;
+
+    if (m1->translated_addr > m2->translated_addr + m2->size) {
+        return 1;
+    }
+
+    if (m1->translated_addr + m1->size < m2->translated_addr) {
+        return -1;
+    }
+
+    /* Overlapped */
+    return 0;
+}
+
+IOVATree *gpa_tree_new(void)
+{
+    IOVATree *gpa_tree = g_new0(IOVATree, 1);
+
+    gpa_tree->tree = g_tree_new_full(gpa_tree_compare, NULL, g_free, NULL);
+
+    return gpa_tree;
+}
+
+int gpa_tree_insert(IOVATree *tree, const DMAMap *map)
+{
+    DMAMap *new;
+
+    if (map->translated_addr + map->size < map->translated_addr ||
+        map->perm == IOMMU_NONE) {
+        return IOVA_ERR_INVALID;
+    }
+
+    /* We don't allow inserting ranges that overlap with existing ones */
+    if (iova_tree_find(tree, map)) {
+        return IOVA_ERR_OVERLAP;
+    }
+
+    new = g_new0(DMAMap, 1);
+    memcpy(new, map, sizeof(*new));
+    iova_tree_insert_internal(tree->tree, new);
+
+    return IOVA_OK;
+}
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 30/41] vhost-iova-tree: Update documentation
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (28 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 29/41] vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 31/41] cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0() Michael S. Tsirkin
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonah Palmer, Eugenio Pérez, Lei Yang,
	Stefano Garzarella

From: Jonah Palmer <jonah.palmer@oracle.com>

Minor update to some of the documentation / comments in
hw/virtio/vhost-iova-tree.c.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20250217144936.3589907-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-iova-tree.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 9d2d6a7af2..fa4147b773 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -37,9 +37,9 @@ struct VhostIOVATree {
 };
 
 /**
- * Create a new IOVA tree
+ * Create a new VhostIOVATree
  *
- * Returns the new IOVA tree
+ * Returns the new VhostIOVATree.
  */
 VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 {
@@ -56,7 +56,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 }
 
 /**
- * Delete an iova tree
+ * Delete a VhostIOVATree
  */
 void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 {
@@ -69,10 +69,10 @@ void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 /**
  * Find the IOVA address stored from a memory address
  *
- * @tree: The iova tree
+ * @tree: The VhostIOVATree
  * @map: The map with the memory address
  *
- * Return the stored mapping, or NULL if not found.
+ * Returns the stored IOVA->HVA mapping, or NULL if not found.
  */
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
                                         const DMAMap *map)
@@ -81,10 +81,10 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
 }
 
 /**
- * Allocate a new mapping
+ * Allocate a new IOVA range and add the mapping to the IOVA->HVA tree
  *
- * @tree: The iova tree
- * @map: The iova map
+ * @tree: The VhostIOVATree
+ * @map: The IOVA mapping
  * @taddr: The translated address (HVA)
  *
  * Returns:
@@ -92,7 +92,7 @@ const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
  * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
  * - IOVA_ERR_NOMEM if tree cannot allocate more space.
  *
- * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
+ * It returns an assigned IOVA in map->iova if the return value is IOVA_OK.
  */
 int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map, hwaddr taddr)
 {
@@ -117,9 +117,9 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map, hwaddr taddr)
 }
 
 /**
- * Remove existing mappings from iova tree
+ * Remove existing mappings from the IOVA-only and IOVA->HVA trees
  *
- * @iova_tree: The vhost iova tree
+ * @iova_tree: The VhostIOVATree
  * @map: The map to remove
  */
 void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map)
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 31/41] cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (29 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 30/41] vhost-iova-tree: Update documentation Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 32/41] MAINTAINERS: add more files to `vhost` Michael S. Tsirkin
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Stefano Garzarella, qemu-stable, myluo24,
	Gonglei (Arei), Zhenwei Pi

From: Stefano Garzarella <sgarzare@redhat.com>

The function `vhost_dev_init()` expects the `struct vhost_dev`
(passed as a parameter) to be fully initialized. This is important
because some parts of the code check whether `vhost_dev->config_ops`
is NULL to determine if it has been set (e.g. later via
`vhost_dev_set_config_notifier`).

To ensure this initialization, it’s better to allocate the entire
`CryptoDevBackendVhost` structure (which includes `vhost_dev`) using
`g_new0()`, following the same approach used for other vhost devices,
such as in `vhost_net_init()`.
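
A minimal sketch of why the zero-filling allocator matters, assuming plain
calloc() as a stand-in for g_new0(). The demo_dev and demo_backend types and
both helper functions are hypothetical and are not QEMU code. A "NULL means
not set" check is only reliable if the memory was zeroed at allocation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vhost_dev / CryptoDevBackendVhost. */
struct demo_dev {
    const void *config_ops;   /* NULL means "not set yet" */
    int nvqs;
};

struct demo_backend {
    struct demo_dev dev;
};

/*
 * calloc() zero-fills the allocation, as g_new0() does. With malloc()
 * (or g_new()) the config_ops member would hold an indeterminate value.
 */
static struct demo_backend *demo_backend_new(void)
{
    struct demo_backend *b = calloc(1, sizeof(*b));

    if (b) {
        b->dev.nvqs = 1;
    }
    return b;
}

/* Only meaningful because the allocation above was zeroed. */
static bool demo_config_ops_set(const struct demo_backend *b)
{
    assert(b != NULL);
    return b->dev.config_ops != NULL;
}
```

With this sketch, a freshly created backend reports config_ops as unset
until some later code assigns it.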

Fixes: 042cea274c ("cryptodev: add vhost-user as a new cryptodev backend")
Cc: qemu-stable@nongnu.org
Reported-by: myluo24@m.fudan.edu.cn
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250211135523.101203-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 backends/cryptodev-vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/backends/cryptodev-vhost.c b/backends/cryptodev-vhost.c
index 8718c97326..943680a23a 100644
--- a/backends/cryptodev-vhost.c
+++ b/backends/cryptodev-vhost.c
@@ -53,7 +53,7 @@ cryptodev_vhost_init(
     CryptoDevBackendVhost *crypto;
     Error *local_err = NULL;
 
-    crypto = g_new(CryptoDevBackendVhost, 1);
+    crypto = g_new0(CryptoDevBackendVhost, 1);
     crypto->dev.max_queues = 1;
     crypto->dev.nvqs = 1;
     crypto->dev.vqs = crypto->vqs;
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 32/41] MAINTAINERS: add more files to `vhost`
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (30 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 31/41] cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0() Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 33/41] vdpa: Fix endian bugs in shadow virtqueue Michael S. Tsirkin
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Stefano Garzarella, Philippe Mathieu-Daudé,
	Thomas Huth, Richard Henderson

From: Stefano Garzarella <sgarzare@redhat.com>

While sending a patch for backends/cryptodev-vhost.c I noticed that
Michael wasn't in CC, so I took a look at the files listed under `vhost`
and tried to fix it by increasing the coverage with additional files.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20250211144259.117910-1-sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 MAINTAINERS | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 013a57d5bf..2d07d72933 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2212,12 +2212,16 @@ M: Michael S. Tsirkin <mst@redhat.com>
 R: Stefano Garzarella <sgarzare@redhat.com>
 S: Supported
 F: hw/*/*vhost*
-F: docs/interop/vhost-user.json
-F: docs/interop/vhost-user.rst
+F: docs/interop/vhost-user*
+F: docs/system/devices/vhost-user*
 F: contrib/vhost-user-*/
-F: backends/vhost-user.c
+F: backends/*vhost*
 F: include/system/vhost-user-backend.h
+F: include/hw/virtio/vhost*
+F: include/*/vhost*
 F: subprojects/libvhost-user/
+F: block/export/vhost-user*
+F: util/vhost-user-server.c
 
 vhost-shadow-virtqueue
 R: Eugenio Pérez <eperezma@redhat.com>
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 33/41] vdpa: Fix endian bugs in shadow virtqueue
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (31 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 32/41] MAINTAINERS: add more files to `vhost` Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 34/41] hw/virtio/virtio-nsm: Respond with correct length Michael S. Tsirkin
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Konstantin Shkolnyy, Eugenio Pérez, Lei Yang,
	Stefano Garzarella

From: Konstantin Shkolnyy <kshk@linux.ibm.com>

VDPA didn't work on a big-endian machine due to missing/incorrect
CPU<->LE data format conversions.
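
As a rough illustration of the conversions involved, the sketch below reads
and writes a 16-bit little-endian field byte by byte, so it behaves the same
on big-endian and little-endian hosts. The demo_* helpers are hypothetical
and are not the QEMU le16_to_cpu()/cpu_to_le16() macros. Fields shared with
the device need this treatment, while host-private bookkeeping should stay in
CPU byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit field, independent of host byte order. */
static uint16_t demo_load_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a CPU-order value into a little-endian 16-bit field. */
static void demo_store_le16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}
```

Storing 0x1234 produces the bytes 0x34, 0x12 in memory on any host, and
loading them back yields 0x1234.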

Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Message-Id: <20250212164923.1971538-1-kshk@linux.ibm.com>
Fixes: 10857ec0ad ("vhost: Add VhostShadowVirtqueue")
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 30ba565f03..2481d49345 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -182,10 +182,10 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
         descs[i].len = cpu_to_le32(iovec[n].iov_len);
 
         last = i;
-        i = cpu_to_le16(svq->desc_next[i]);
+        i = svq->desc_next[i];
     }
 
-    svq->free_head = le16_to_cpu(svq->desc_next[last]);
+    svq->free_head = svq->desc_next[last];
     return true;
 }
 
@@ -247,10 +247,12 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
     smp_mb();
 
     if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
-        uint16_t avail_event = *(uint16_t *)(&svq->vring.used->ring[svq->vring.num]);
+        uint16_t avail_event = le16_to_cpu(
+                *(uint16_t *)(&svq->vring.used->ring[svq->vring.num]));
         needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx, svq->shadow_avail_idx - 1);
     } else {
-        needs_kick = !(svq->vring.used->flags & VRING_USED_F_NO_NOTIFY);
+        needs_kick =
+                !(svq->vring.used->flags & cpu_to_le16(VRING_USED_F_NO_NOTIFY));
     }
 
     if (!needs_kick) {
@@ -386,7 +388,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
         return true;
     }
 
-    svq->shadow_used_idx = cpu_to_le16(*(volatile uint16_t *)used_idx);
+    svq->shadow_used_idx = le16_to_cpu(*(volatile uint16_t *)used_idx);
 
     return svq->last_used_idx != svq->shadow_used_idx;
 }
@@ -404,7 +406,7 @@ static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
 {
     if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
         uint16_t *used_event = (uint16_t *)&svq->vring.avail->ring[svq->vring.num];
-        *used_event = svq->shadow_used_idx;
+        *used_event = cpu_to_le16(svq->shadow_used_idx);
     } else {
         svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
     }
@@ -429,7 +431,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
                                              uint16_t num, uint16_t i)
 {
     for (uint16_t j = 0; j < (num - 1); ++j) {
-        i = le16_to_cpu(svq->desc_next[i]);
+        i = svq->desc_next[i];
     }
 
     return i;
@@ -704,7 +706,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     svq->desc_state = g_new0(SVQDescState, svq->vring.num);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
-        svq->desc_next[i] = cpu_to_le16(i + 1);
+        svq->desc_next[i] = i + 1;
     }
 }
 
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 34/41] hw/virtio/virtio-nsm: Respond with correct length
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (32 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 33/41] vdpa: Fix endian bugs in shadow virtqueue Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 35/41] net: vhost-user: add QAPI events to report connection state Michael S. Tsirkin
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Alexander Graf, Vikrant Garg, Dorjoy Chowdhury,
	Philippe Mathieu-Daudé

From: Alexander Graf <graf@amazon.com>

When we return a response packet from NSM, we need to indicate its
length according to the content of the response. Prior to this patch, we
returned the length of the source buffer, which may confuse guest code
that relies on the response size.

Fix it by returning the response payload size instead.

Fixes: bb154e3e0cc715 ("device/virtio-nsm: Support for Nitro Secure Module device")
Reported-by: Vikrant Garg <vikrant1garg@gmail.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20250213114541.67515-1-graf@amazon.com>
Reviewed-by: Dorjoy Chowdhury <dorjoychy111@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Vikrant Garg <vikrant1garg@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-nsm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-nsm.c b/hw/virtio/virtio-nsm.c
index 098e1aeac6..b22aa74e34 100644
--- a/hw/virtio/virtio-nsm.c
+++ b/hw/virtio/virtio-nsm.c
@@ -1596,7 +1596,7 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
     g_free(req.iov_base);
     g_free(res.iov_base);
     virtqueue_push(vq, out_elem, 0);
-    virtqueue_push(vq, in_elem, in_elem->in_sg->iov_len);
+    virtqueue_push(vq, in_elem, sz);
     virtio_notify(vdev, vq);
     return;
 
-- 
MST



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PULL 35/41] net: vhost-user: add QAPI events to report connection state
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (33 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 34/41] hw/virtio/virtio-nsm: Respond with correct length Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 36/41] vhost-user-snd: correct the calculation of config_size Michael S. Tsirkin
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Laurent Vivier, Stefano Brivio, Markus Armbruster,
	Jason Wang, Eric Blake

From: Laurent Vivier <lvivier@redhat.com>

The netdev reports a NETDEV_VHOST_USER_CONNECTED event when
the chardev is connected, and a NETDEV_VHOST_USER_DISCONNECTED
event when it is disconnected.

The NETDEV_VHOST_USER_CONNECTED event includes the chardev id.

This allows a system manager like libvirt to detect when the server
fails.

For instance with passt:

{ 'execute': 'qmp_capabilities' }
{ "return": { } }

[killing passt here]

{ "timestamp": { "seconds": 1739538634, "microseconds": 920450 },
  "event": "NETDEV_VHOST_USER_DISCONNECTED",
  "data": { "netdev-id": "netdev0" } }

[automatic reconnection with reconnect-ms]

{ "timestamp": { "seconds": 1739538638, "microseconds": 354181 },
  "event": "NETDEV_VHOST_USER_CONNECTED",
  "data": { "netdev-id": "netdev0", "chardev-id": "chr0" } }
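
A management client could track the backend state from these two events
roughly as sketched below. This assumes the client has already extracted the
"event" string from the QMP message. The demo_link_state enum and
demo_state_from_event() are hypothetical names, and only the two event names
come from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_link_state {
    DEMO_LINK_UNCHANGED = -1,  /* some other event, state not affected */
    DEMO_LINK_DOWN = 0,
    DEMO_LINK_UP = 1,
};

/* Map a QMP event name to the vhost-user connection state it implies. */
static enum demo_link_state demo_state_from_event(const char *event)
{
    assert(event != NULL);

    if (strcmp(event, "NETDEV_VHOST_USER_CONNECTED") == 0) {
        return DEMO_LINK_UP;
    }
    if (strcmp(event, "NETDEV_VHOST_USER_DISCONNECTED") == 0) {
        return DEMO_LINK_DOWN;
    }
    return DEMO_LINK_UNCHANGED;
}
```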

Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20250217092550.1172055-1-lvivier@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 qapi/net.json    | 40 ++++++++++++++++++++++++++++++++++++++++
 net/vhost-user.c |  3 +++
 2 files changed, 43 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 2739a2f423..310cc4fd19 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -1031,3 +1031,43 @@
 ##
 { 'event': 'NETDEV_STREAM_DISCONNECTED',
   'data': { 'netdev-id': 'str' } }
+
+##
+# @NETDEV_VHOST_USER_CONNECTED:
+#
+# Emitted when the vhost-user chardev is connected
+#
+# @netdev-id: QEMU netdev id that is connected
+#
+# @chardev-id: The character device id used by the QEMU netdev
+#
+# Since: 10.0
+#
+# .. qmp-example::
+#
+#     <- { "timestamp": {"seconds": 1739538638, "microseconds": 354181 },
+#          "event": "NETDEV_VHOST_USER_CONNECTED",
+#          "data": { "netdev-id": "netdev0", "chardev-id": "chr0" } }
+#
+##
+{ 'event': 'NETDEV_VHOST_USER_CONNECTED',
+  'data': { 'netdev-id': 'str', 'chardev-id': 'str' } }
+
+##
+# @NETDEV_VHOST_USER_DISCONNECTED:
+#
+# Emitted when the vhost-user chardev is disconnected
+#
+# @netdev-id: QEMU netdev id that is disconnected
+#
+# Since: 10.0
+#
+# .. qmp-example::
+#
+#     <- { "timestamp": { "seconds": 1739538634, "microseconds": 920450 },
+#          "event": "NETDEV_VHOST_USER_DISCONNECTED",
+#          "data": { "netdev-id": "netdev0" } }
+#
+##
+{ 'event': 'NETDEV_VHOST_USER_DISCONNECTED',
+  'data': { 'netdev-id': 'str' } }
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 12555518e8..0b235e50c6 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -16,6 +16,7 @@
 #include "chardev/char-fe.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-net.h"
+#include "qapi/qapi-events-net.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qemu/option.h"
@@ -271,6 +272,7 @@ static void chr_closed_bh(void *opaque)
     if (err) {
         error_report_err(err);
     }
+    qapi_event_send_netdev_vhost_user_disconnected(name);
 }
 
 static void net_vhost_user_event(void *opaque, QEMUChrEvent event)
@@ -300,6 +302,7 @@ static void net_vhost_user_event(void *opaque, QEMUChrEvent event)
                                          net_vhost_user_watch, s);
         qmp_set_link(name, true, &err);
         s->started = true;
+        qapi_event_send_netdev_vhost_user_connected(name, chr->label);
         break;
     case CHR_EVENT_CLOSED:
         /* a close event may happen during a read/write, but vhost
-- 
MST




* [PULL 36/41] vhost-user-snd: correct the calculation of config_size
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (34 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 35/41] net: vhost-user: add QAPI events to report connection state Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 37/41] hw/virtio/virtio-iommu: Migrate to 3-phase reset Michael S. Tsirkin
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Matias Ezequiel Vara Larsen,
	Philippe Mathieu-Daudé, Stefano Garzarella, qemu-stable,
	Dorinda Bassey, Alex Bennée, Manos Pitsidianakis

From: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>

Use virtio_get_config_size() rather than sizeof(struct
virtio_snd_config) for the config_size in the vhost-user-snd frontend.
The frontend shall rely on device features for the size of the device
configuration space. The presence of `controls` in the config space
depends on VIRTIO_SND_F_CTLS according to the specification (v1.3):
`
5.14.4 Device Configuration Layout
...

controls
(driver-read-only) indicates a total number of all available control
elements if VIRTIO_SND_F_CTLS has been negotiated.
`
This fixes an issue introduced by commit ab0c7fb2 ("linux-headers:
update to current kvm/next") in which the optional field `controls` is
added to the virtio_snd_config structure. This breaks vhost-user-device
backends that do not implement the `controls` field.

Fixes: ab0c7fb22b ("linux-headers: update to current kvm/next")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2805
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
Message-Id: <20250217131255.829892-1-mvaralar@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Dorinda Bassey <dbassey@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
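Note for readers: the sizing rule described in the commit message can be
tried outside QEMU. The sketch below is a simplified stand-in, not the
QEMU implementation: `struct snd_config`, `ENDOF`, `config_size_for()`
and the feature bit number are assumptions made for illustration, loosely
modelled on the table-driven approach the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past `member` (illustrative macro). */
#define ENDOF(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Stand-in config layout: three mandatory fields, one optional. */
struct snd_config {
    uint32_t jacks;
    uint32_t streams;
    uint32_t chmaps;
    uint32_t controls; /* present only if the CTLS feature is offered */
};

#define SND_F_CTLS 0 /* assumed feature bit number for this sketch */

struct feature_size {
    uint64_t flags; /* feature bits that make the field present */
    size_t end;     /* config size needed when any of them is set */
};

struct config_size_params {
    size_t min_size;
    size_t max_size;
    const struct feature_size *feature_sizes; /* ends with flags == 0 */
};

/* Start from the mandatory size and grow it for each offered feature. */
size_t config_size_for(const struct config_size_params *p,
                       uint64_t host_features)
{
    size_t size = p->min_size;

    assert(p->min_size <= p->max_size);
    for (const struct feature_size *f = p->feature_sizes;
         f->flags != 0; f++) {
        if ((host_features & f->flags) && f->end > size) {
            size = f->end;
        }
    }
    assert(size <= p->max_size);
    return size;
}

const struct feature_size snd_feature_sizes[] = {
    { .flags = 1ULL << SND_F_CTLS,
      .end = ENDOF(struct snd_config, controls) },
    { 0 }
};

const struct config_size_params snd_params = {
    .min_size = ENDOF(struct snd_config, chmaps),
    .max_size = sizeof(struct snd_config),
    .feature_sizes = snd_feature_sizes,
};
```

With no features offered the result stops at the end of `chmaps`; offering
the CTLS bit extends it to cover `controls`. That is why a backend without
`controls` keeps working as long as the new property stays off.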
 hw/virtio/vhost-user-snd.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-snd.c b/hw/virtio/vhost-user-snd.c
index 8610370af8..b414c75c06 100644
--- a/hw/virtio/vhost-user-snd.c
+++ b/hw/virtio/vhost-user-snd.c
@@ -16,6 +16,18 @@
 #include "standard-headers/linux/virtio_ids.h"
 #include "standard-headers/linux/virtio_snd.h"
 
+static const VirtIOFeature feature_sizes[] = {
+    {.flags = 1ULL << VIRTIO_SND_F_CTLS,
+    .end = endof(struct virtio_snd_config, controls)},
+    {}
+};
+
+static const VirtIOConfigSizeParams cfg_size_params = {
+    .min_size = endof(struct virtio_snd_config, chmaps),
+    .max_size = sizeof(struct virtio_snd_config),
+    .feature_sizes = feature_sizes
+};
+
 static const VMStateDescription vu_snd_vmstate = {
     .name = "vhost-user-snd",
     .unmigratable = 1,
@@ -23,16 +35,20 @@ static const VMStateDescription vu_snd_vmstate = {
 
 static const Property vsnd_properties[] = {
     DEFINE_PROP_CHR("chardev", VHostUserBase, chardev),
+    DEFINE_PROP_BIT64("controls", VHostUserBase,
+                      parent_obj.host_features, VIRTIO_SND_F_CTLS, false),
 };
 
 static void vu_snd_base_realize(DeviceState *dev, Error **errp)
 {
     VHostUserBase *vub = VHOST_USER_BASE(dev);
     VHostUserBaseClass *vubs = VHOST_USER_BASE_GET_CLASS(dev);
+    VirtIODevice *vdev = &vub->parent_obj;
 
     vub->virtio_id = VIRTIO_ID_SOUND;
     vub->num_vqs = 4;
-    vub->config_size = sizeof(struct virtio_snd_config);
+    vub->config_size = virtio_get_config_size(&cfg_size_params,
+                                              vdev->host_features);
     vub->vq_size = 64;
 
     vubs->parent_realize(dev, errp);
-- 
MST




* [PULL 37/41] hw/virtio/virtio-iommu: Migrate to 3-phase reset
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (35 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 36/41] vhost-user-snd: correct the calculation of config_size Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 38/41] hw/i386/intel-iommu: " Michael S. Tsirkin
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eric Auger, Zhenzhong Duan, Jason Wang, Peter Xu

From: Eric Auger <eric.auger@redhat.com>

Currently the IOMMU may be reset before the devices
it protects. For example, this happens with virtio-net.

Let's use the 3-phase reset mechanism and reset the IOMMU in the
'exit' phase, after all DMA-capable devices have been
reset during the 'enter' or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>

Message-Id: <20250218182737.76722-2-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
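Note for readers: the ordering argument in the commit message can be shown
with a toy model. Everything below is hypothetical (`struct machine`,
`reset_all()`, `faults_after_reset()`); it does not use QEMU's Resettable
API and only assumes that each phase is run across all objects before the
next phase starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared toy state: one DMA-capable device behind one IOMMU. */
struct machine {
    bool dma_active;     /* the device still has DMA in flight */
    bool mappings_valid; /* the IOMMU still holds translations */
    int faults;          /* DMA that lost its translation mid-flight */
};

typedef void (*phase_fn)(struct machine *m);

/* Per-object handlers; NULL means nothing to do in that phase. */
struct phases {
    phase_fn enter;
    phase_fn hold;
    phase_fn exit;
};

/* Device: quiesce DMA while reset is held. */
void device_reset_hold(struct machine *m)
{
    m->dma_active = false;
}

/* IOMMU: drop all translations; DMA still in flight would fault. */
void iommu_reset(struct machine *m)
{
    if (m->dma_active) {
        m->faults++;
    }
    m->mappings_valid = false;
}

/* Run each phase over every object before starting the next phase. */
void reset_all(struct machine *m, const struct phases *objs, size_t n)
{
    assert(m != NULL && objs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (objs[i].enter) {
            objs[i].enter(m);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].hold) {
            objs[i].hold(m);
        }
    }
    for (size_t i = 0; i < n; i++) {
        if (objs[i].exit) {
            objs[i].exit(m);
        }
    }
}

/* The IOMMU is deliberately listed first, i.e. the worst-case order. */
int faults_after_reset(bool iommu_uses_exit)
{
    struct machine m = { true, true, 0 };
    struct phases objs[2] = {
        { NULL, iommu_uses_exit ? NULL : iommu_reset,
                iommu_uses_exit ? iommu_reset : NULL },
        { NULL, device_reset_hold, NULL },
    };

    reset_all(&m, objs, 2);
    return m.faults;
}
```

In this model, resetting the IOMMU in 'hold' can fault whenever it happens
to be visited before the device, while resetting it in 'exit' cannot,
whatever the visiting order is.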
 hw/virtio/virtio-iommu.c | 14 ++++++++++----
 hw/virtio/trace-events   |  2 +-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f41104a952..b6e7e01ef7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1504,11 +1504,11 @@ static void virtio_iommu_device_unrealize(DeviceState *dev)
     virtio_cleanup(vdev);
 }
 
-static void virtio_iommu_device_reset(VirtIODevice *vdev)
+static void virtio_iommu_device_reset_exit(Object *obj, ResetType type)
 {
-    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
 
-    trace_virtio_iommu_device_reset();
+    trace_virtio_iommu_device_reset_exit();
 
     if (s->domains) {
         g_tree_destroy(s->domains);
@@ -1668,6 +1668,7 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
 
     device_class_set_props(dc, virtio_iommu_properties);
     dc->vmsd = &vmstate_virtio_iommu;
@@ -1675,7 +1676,12 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     vdc->realize = virtio_iommu_device_realize;
     vdc->unrealize = virtio_iommu_device_unrealize;
-    vdc->reset = virtio_iommu_device_reset;
+
+    /*
+     * Use 'exit' reset phase to make sure all DMA requests
+     * have been quiesced during 'enter' or 'hold' phase
+     */
+    rc->phases.exit = virtio_iommu_device_reset_exit;
     vdc->get_config = virtio_iommu_get_config;
     vdc->set_config = virtio_iommu_set_config;
     vdc->get_features = virtio_iommu_get_features;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 04e36ae047..76f0d458b2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -108,7 +108,7 @@ virtio_pci_notify_write(uint64_t addr, uint64_t val, unsigned int size) "0x%" PR
 virtio_pci_notify_write_pio(uint64_t addr, uint64_t val, unsigned int size) "0x%" PRIx64" = 0x%" PRIx64 " (%d)"
 
 # hw/virtio/virtio-iommu.c
-virtio_iommu_device_reset(void) "reset!"
+virtio_iommu_device_reset_exit(void) "reset!"
 virtio_iommu_system_reset(void) "system reset!"
 virtio_iommu_get_features(uint64_t features) "device supports features=0x%"PRIx64
 virtio_iommu_device_status(uint8_t status) "driver status = %d"
-- 
MST




* [PULL 38/41] hw/i386/intel-iommu: Migrate to 3-phase reset
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (36 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 37/41] hw/virtio/virtio-iommu: Migrate to 3-phase reset Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 39/41] hw/arm/smmuv3: Move reset to exit phase Michael S. Tsirkin
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eric Auger, Jason Wang, Peter Xu, Yi Liu,
	Clément Mathieu--Drif, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, Marcel Apfelbaum

From: Eric Auger <eric.auger@redhat.com>

Currently the IOMMU may be reset before the devices
it protects. For example, this happens with virtio devices
but also with VFIO devices. In the latter case this
produces spurious translation faults on the host.

Let's use the 3-phase reset mechanism and reset the IOMMU in the
'exit' phase, after all DMA-capable devices have been reset
in the 'enter' or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Message-Id: <20250218182737.76722-3-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/intel_iommu.c | 12 +++++++++---
 hw/i386/trace-events  |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f366c223d0..a5cf2d0e81 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4697,10 +4697,11 @@ static void vtd_init(IntelIOMMUState *s)
 /* Should not reset address_spaces when reset because devices will still use
  * the address space they got at first (won't ask the bus again).
  */
-static void vtd_reset(DeviceState *dev)
+static void vtd_reset_exit(Object *obj, ResetType type)
 {
-    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
 
+    trace_vtd_reset_exit();
     vtd_init(s);
     vtd_address_space_refresh_all(s);
 }
@@ -4864,8 +4865,13 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
 
-    device_class_set_legacy_reset(dc, vtd_reset);
+    /*
+     * Use 'exit' reset phase to make sure all DMA requests
+     * have been quiesced during 'enter' or 'hold' phase
+     */
+    rc->phases.exit = vtd_reset_exit;
     dc->vmsd = &vtd_vmstate;
     device_class_set_props(dc, vtd_properties);
     dc->hotpluggable = false;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 53c02d7ac8..ac9e1a10aa 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -68,6 +68,7 @@ vtd_frr_new(int index, uint64_t hi, uint64_t lo) "index %d high 0x%"PRIx64" low
 vtd_warn_invalid_qi_tail(uint16_t tail) "tail 0x%"PRIx16
 vtd_warn_ir_vector(uint16_t sid, int index, int vec, int target) "sid 0x%"PRIx16" index %d vec %d (should be: %d)"
 vtd_warn_ir_trigger(uint16_t sid, int index, int trig, int target) "sid 0x%"PRIx16" index %d trigger %d (should be: %d)"
+vtd_reset_exit(void) ""
 
 # amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
-- 
MST




* [PULL 39/41] hw/arm/smmuv3: Move reset to exit phase
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (37 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 38/41] hw/i386/intel-iommu: " Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 40/41] hw/vfio/common: Add a trace point in vfio_reset_handler Michael S. Tsirkin
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Eric Auger, Zhenzhong Duan, Peter Xu, qemu-arm

From: Eric Auger <eric.auger@redhat.com>

Currently the IOMMU may be reset before the devices
it protects. For example, this happens with virtio-scsi-pci
when system_reset is issued from the QMP monitor: spurious
"virtio: zero sized buffers are not allowed" warnings can
be observed. This happens because outstanding DMA requests
are still in flight while the SMMU gets reset.

This can also happen with VFIO devices. In that case
spurious DMA translation faults can be observed on the host.

Make sure the SMMU is reset in the 'exit' phase, after
all DMA-capable devices have been reset during the 'enter'
or 'hold' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Message-Id: <20250218182737.76722-4-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
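Note for readers: this patch moves a subclass handler that chains to its
parent from one phase to another. The sketch below models that chaining
in plain C. The names (`struct smmu_model`, `set_parent_phases()`,
`run_exit_phase()`) are hypothetical; the helper only assumes the
"save the parent's handlers, then override the non-NULL ones" behaviour
that the smmuv3 hunk relies on.

```c
#include <assert.h>
#include <stddef.h>

struct smmu_model {
    int base_state_cleared; /* set by the base-class handler */
    int regs_reinitialised; /* set by the subclass handler */
    int base_ran_first;     /* did the subclass see the base work done? */
};

typedef void (*phase_fn)(struct smmu_model *s);

struct phases {
    phase_fn enter;
    phase_fn hold;
    phase_fn exit;
};

/* Saved copy of the base-class handlers, like a parent_phases field. */
struct phases parent_phases;

/* Save the current handlers in *saved, then install non-NULL overrides. */
void set_parent_phases(struct phases *cls, phase_fn enter, phase_fn hold,
                       phase_fn exit_fn, struct phases *saved)
{
    assert(cls != NULL && saved != NULL);
    *saved = *cls;
    if (enter) {
        cls->enter = enter;
    }
    if (hold) {
        cls->hold = hold;
    }
    if (exit_fn) {
        cls->exit = exit_fn;
    }
}

void base_reset_exit(struct smmu_model *s)
{
    s->base_state_cleared = 1;
}

void sub_reset_exit(struct smmu_model *s)
{
    /* Chain to the parent's handler for the same phase first. */
    if (parent_phases.exit) {
        parent_phases.exit(s);
    }
    s->base_ran_first = s->base_state_cleared;
    s->regs_reinitialised = 1;
}

/* Build the class as the two class_init steps would, then run 'exit'. */
struct smmu_model run_exit_phase(struct phases *cls_out)
{
    struct phases cls = { NULL, NULL, NULL };
    struct smmu_model s = { 0, 0, 0 };

    cls.exit = base_reset_exit; /* base class registers 'exit' */
    set_parent_phases(&cls, NULL, NULL, sub_reset_exit, &parent_phases);
    if (cls.exit) {
        cls.exit(&s);
    }
    if (cls_out) {
        *cls_out = cls;
    }
    return s;
}
```

The point of the model is that base class and subclass must agree on the
phase: if the base registered 'exit' while the subclass still chained
through `parent_phases.hold`, the saved pointer would be NULL and the base
work would be skipped silently.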
 hw/arm/smmu-common.c |  9 +++++++--
 hw/arm/smmuv3.c      | 14 ++++++++++----
 hw/arm/trace-events  |  1 +
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index dd74c2e558..8c1b407b82 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -924,7 +924,12 @@ static void smmu_base_realize(DeviceState *dev, Error **errp)
     }
 }
 
-static void smmu_base_reset_hold(Object *obj, ResetType type)
+/*
+ * Make sure the IOMMU is reset in 'exit' phase after
+ * all outstanding DMA requests have been quiesced during
+ * the 'enter' or 'hold' reset phases
+ */
+static void smmu_base_reset_exit(Object *obj, ResetType type)
 {
     SMMUState *s = ARM_SMMU(obj);
 
@@ -949,7 +954,7 @@ static void smmu_base_class_init(ObjectClass *klass, void *data)
     device_class_set_props(dc, smmu_dev_properties);
     device_class_set_parent_realize(dc, smmu_base_realize,
                                     &sbc->parent_realize);
-    rc->phases.hold = smmu_base_reset_hold;
+    rc->phases.exit = smmu_base_reset_exit;
 }
 
 static const TypeInfo smmu_base_info = {
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index c0cf5df0f6..b49a59b64c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1870,13 +1870,19 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
     }
 }
 
-static void smmu_reset_hold(Object *obj, ResetType type)
+/*
+ * Make sure the IOMMU is reset in 'exit' phase after
+ * all outstanding DMA requests have been quiesced during
+ * the 'enter' or 'hold' reset phases
+ */
+static void smmu_reset_exit(Object *obj, ResetType type)
 {
     SMMUv3State *s = ARM_SMMUV3(obj);
     SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
 
-    if (c->parent_phases.hold) {
-        c->parent_phases.hold(obj, type);
+    trace_smmu_reset_exit();
+    if (c->parent_phases.exit) {
+        c->parent_phases.exit(obj, type);
     }
 
     smmuv3_init_regs(s);
@@ -1999,7 +2005,7 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
     SMMUv3Class *c = ARM_SMMUV3_CLASS(klass);
 
     dc->vmsd = &vmstate_smmuv3;
-    resettable_class_set_parent_phases(rc, NULL, smmu_reset_hold, NULL,
+    resettable_class_set_parent_phases(rc, NULL, NULL, smmu_reset_exit,
                                        &c->parent_phases);
     device_class_set_parent_realize(dc, smmu_realize,
                                     &c->parent_realize);
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index c64ad344bd..7790db780e 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -56,6 +56,7 @@ smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
+smmu_reset_exit(void) ""
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
-- 
MST




* [PULL 40/41] hw/vfio/common: Add a trace point in vfio_reset_handler
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (38 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 39/41] hw/arm/smmuv3: Move reset to exit phase Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 12:24 ` [PULL 41/41] docs/devel/reset: Document reset expectations for DMA and IOMMU Michael S. Tsirkin
  2025-02-21 23:17 ` [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Stefan Hajnoczi
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eric Auger, Cédric Le Goater, Zhenzhong Duan,
	Peter Xu, Alex Williamson

From: Eric Auger <eric.auger@redhat.com>

To ease debugging of the reset sequence, add a trace point
in vfio_reset_handler().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20250218182737.76722-5-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vfio/common.c     | 1 +
 hw/vfio/trace-events | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7499a9b74..173fb3a997 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1386,6 +1386,7 @@ void vfio_reset_handler(void *opaque)
 {
     VFIODevice *vbasedev;
 
+    trace_vfio_reset_handler();
     QLIST_FOREACH(vbasedev, &vfio_device_list, global_next) {
         if (vbasedev->dev->realized) {
             vbasedev->ops->vfio_compute_needs_reset(vbasedev);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index cab1cf1de0..c5385e1a4f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -120,6 +120,7 @@ vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype
 vfio_legacy_dma_unmap_overflow_workaround(void) ""
 vfio_get_dirty_bitmap(uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
+vfio_reset_handler(void) ""
 
 # platform.c
 vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
-- 
MST




* [PULL 41/41] docs/devel/reset: Document reset expectations for DMA and IOMMU
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (39 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 40/41] hw/vfio/common: Add a trace point in vfio_reset_handler Michael S. Tsirkin
@ 2025-02-21 12:24 ` Michael S. Tsirkin
  2025-02-21 23:17 ` [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Stefan Hajnoczi
  41 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2025-02-21 12:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eric Auger, Zhenzhong Duan, Peter Xu,
	Richard Henderson, Philippe Mathieu-Daudé, Luc Michel

From: Eric Auger <eric.auger@redhat.com>

To avoid any translation faults, the IOMMUs are expected to be
reset after the devices they protect. Document that we expect
DMA requests to be stopped during the 'enter' or 'hold' phase
while IOMMUs should be reset during the 'exit' phase.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20250218182737.76722-6-eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/devel/reset.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index adefd59ef9..0b8b2fa5f4 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -143,6 +143,11 @@ The *exit* phase is executed only when the last reset operation ends. Therefore
 the object does not need to care how many of reset controllers it has and how
 many of them have started a reset.
 
+DMA capable devices are expected to cancel all outstanding DMA operations
+during either 'enter' or 'hold' phases. IOMMUs are expected to reset during
+the 'exit' phase and this sequencing makes sure no outstanding DMA request
+will fault.
+
 
 Handling reset in a resettable object
 -------------------------------------
-- 
MST




* Re: [PULL 00/41] virtio,pc,pci: features, fixes, cleanups
  2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
                   ` (40 preceding siblings ...)
  2025-02-21 12:24 ` [PULL 41/41] docs/devel/reset: Document reset expectations for DMA and IOMMU Michael S. Tsirkin
@ 2025-02-21 23:17 ` Stefan Hajnoczi
  41 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2025-02-21 23:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Peter Maydell


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/10.0 for any user-visible changes.



Thread overview: 44+ messages
2025-02-21 12:22 [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 01/41] docs/about: Change notes on x86 machine type deprecation into a general one Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 02/41] hw/net: Fix NULL dereference with software RSS Michael S. Tsirkin
2025-02-27  9:51   ` Michael Tokarev
2025-02-21 12:22 ` [PULL 03/41] hw/ppc/spapr_pci: Do not create DT for disabled PCI device Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 04/41] hw/ppc/spapr_pci: Do not reject VFs created after a PF Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 05/41] s390x/pci: Avoid creating zpci for VFs Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 06/41] s390x/pci: Allow plugging SR-IOV devices Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 07/41] s390x/pci: Check for multifunction after device realization Michael S. Tsirkin
2025-02-21 12:22 ` [PULL 08/41] pcie_sriov: Do not manually unrealize Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 09/41] pcie_sriov: Ensure VF addr does not overflow Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 10/41] pcie_sriov: Reuse SR-IOV VF device instances Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 11/41] pcie_sriov: Release VFs failed to realize Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 12/41] pcie_sriov: Remove num_vfs from PCIESriovPF Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 13/41] pcie_sriov: Register VFs after migration Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 14/41] qtest/libqos/pci: Do not write to PBA memory Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 15/41] hw/pci/msix: Warn on PBA writes Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 16/41] hw/pci: Assert a bar is not registered multiple times Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 17/41] hw/i386/pc: Fix crash that occurs when introspecting TYPE_PC_MACHINE machines Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 18/41] hw/i386/microvm: Fix crash that occurs when introspecting the microvm machine Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 19/41] tests/qtest/vhost-user-test: Use modern virtio for vhost-user tests Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 20/41] hw/cxl: Introduce CXL_T3_MSIX_VECTOR enumeration Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 21/41] hw/mem/cxl_type3: Add paired msix_uninit_exclusive_bar() call Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 22/41] hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 23/41] hw/mem/cxl_type3: Ensure errp is set on realization failure Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 24/41] mem/cxl_type3: support 3, 6, 12 and 16 interleave ways Michael S. Tsirkin
2025-02-21 12:23 ` [PULL 25/41] hw/virtio: reset virtio balloon stats on machine reset Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 26/41] amd_iommu: Use correct DTE field for interrupt passthrough Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 27/41] amd_iommu: Use correct bitmask to set capability BAR Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 28/41] vhost-iova-tree: Implement an IOVA-only tree Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 29/41] vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 30/41] vhost-iova-tree: Update documentation Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 31/41] cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0() Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 32/41] MAINTAINERS: add more files to `vhost` Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 33/41] vdpa: Fix endian bugs in shadow virtqueue Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 34/41] hw/virtio/virtio-nsm: Respond with correct length Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 35/41] net: vhost-user: add QAPI events to report connection state Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 36/41] vhost-user-snd: correct the calculation of config_size Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 37/41] hw/virtio/virtio-iommu: Migrate to 3-phase reset Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 38/41] hw/i386/intel-iommu: " Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 39/41] hw/arm/smmuv3: Move reset to exit phase Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 40/41] hw/vfio/common: Add a trace point in vfio_reset_handler Michael S. Tsirkin
2025-02-21 12:24 ` [PULL 41/41] docs/devel/reset: Document reset expectations for DMA and IOMMU Michael S. Tsirkin
2025-02-21 23:17 ` [PULL 00/41] virtio,pc,pci: features, fixes, cleanups Stefan Hajnoczi
