qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PULL 00/46] virtio: features,fixes
@ 2024-06-04 19:05 Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 01/46] vhost: dirty log should be per backend type Michael S. Tsirkin
                   ` (47 more replies)
  0 siblings, 48 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

The following changes since commit 60b54b67c63d8f076152e0f7dccf39854dfc6a77:

  Merge tag 'pull-lu-20240526' of https://gitlab.com/rth7680/qemu into staging (2024-05-26 17:51:00 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to bfcacf81d63a3d95f128bce3faf3564e7f98ea8b:

  hw/cxl: Fix read from bogus memory (2024-06-04 15:05:03 -0400)

----------------------------------------------------------------
virtio: features,fixes

A bunch of improvements:
- vhost dirty log is now only scanned once, not once per device
- virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
- cxl gained DCD emulation support
- pvpanic gained shutdown support
- acpi now supports Generic Port Affinity Structure
- new tests
- bugfixes

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Alejandro Jimenez (1):
      pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal

Christian Pötzsch (1):
      Fix vhost user assertion when sending more than one fd

Cindy Lu (2):
      virtio-pci: Fix the use of an uninitialized irqfd.
      virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()

Fan Ni (12):
      hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
      hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
      include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
      hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
      hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
      hw/mem/cxl_type3: Add host backend and address space handling for DC regions
      hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
      hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
      hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
      hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
      hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
      hw/mem/cxl_type3: Allow to release extent superset in QMP interface

Gregory Price (2):
      hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
      hw/cxl/mailbox: interface to add CCI commands to an existing CCI

Halil Pasic (1):
      vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits

Ira Weiny (1):
      hw/cxl: Fix read from bogus memory

Jiqian Chen (1):
      virtio-pci: only reset pm state during resetting

Jonah Palmer (5):
      virtio/virtio-pci: Handle extra notification data
      virtio: Prevent creation of device using notification-data with ioeventfd
      virtio-mmio: Handle extra notification data
      virtio-ccw: Handle extra notification data
      vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

Jonathan Cameron (6):
      hw/acpi/GI: Fix trivial parameter alignment issue.
      hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator
      hw/acpi: Generic Port Affinity Structure support
      bios-tables-test: Allow for new acpihmat-generic-x test data.
      bios-tables-test: Add complex SRAT / HMAT test for GI GP
      bios-tables-test: Add data for complex numa test (GI, GP etc)

Li Feng (2):
      Revert "vhost-user: fix lost reconnect"
      vhost-user: fix lost reconnect again

Marc-André Lureau (1):
      vhost-user-gpu: fix import of DMABUF

Si-Wei Liu (2):
      vhost: dirty log should be per backend type
      vhost: Perform memory section dirty scans once per iteration

Stefano Garzarella (1):
      vhost-vdpa: check vhost_vdpa_set_vring_ready() return value

Thomas Weißschuh (7):
      scripts/update-linux-headers: Copy setup_data.h to correct directory
      linux-headers: update to 6.10-rc1
      hw/misc/pvpanic: centralize definition of supported events
      tests/qtest/pvpanic: use centralized definition of supported events
      hw/misc/pvpanic: add support for normal shutdowns
      tests/qtest/pvpanic: add tests for pvshutdown event
      Revert "docs/specs/pvpanic: mark shutdown event as not implemented"

Wafer (1):
      hw/virtio: Fix obtain the buffer id from the last descriptor

 qapi/cxl.json                               | 143 ++++++
 qapi/qom.json                               |  35 ++
 qapi/run-state.json                         |  14 +
 include/hw/acpi/acpi_generic_initiator.h    |  33 +-
 include/hw/cxl/cxl_device.h                 |  85 +++-
 include/hw/cxl/cxl_events.h                 |  18 +
 include/hw/misc/pvpanic.h                   |   6 +
 include/hw/pci/pci_bridge.h                 |   1 +
 include/hw/virtio/vhost-user.h              |   3 +-
 include/hw/virtio/vhost.h                   |   1 +
 include/hw/virtio/virtio.h                  |   2 +
 include/standard-headers/linux/ethtool.h    |  55 +++
 include/standard-headers/linux/pci_regs.h   |   6 +
 include/standard-headers/linux/pvpanic.h    |   7 +-
 include/standard-headers/linux/virtio_bt.h  |   1 -
 include/standard-headers/linux/virtio_mem.h |   2 +
 include/standard-headers/linux/virtio_net.h | 143 ++++++
 include/sysemu/runstate.h                   |   1 +
 linux-headers/asm-generic/unistd.h          |   5 +-
 linux-headers/asm-loongarch/kvm.h           |   4 +
 linux-headers/asm-mips/unistd_n32.h         |   1 +
 linux-headers/asm-mips/unistd_n64.h         |   1 +
 linux-headers/asm-mips/unistd_o32.h         |   1 +
 linux-headers/asm-powerpc/unistd_32.h       |   1 +
 linux-headers/asm-powerpc/unistd_64.h       |   1 +
 linux-headers/asm-riscv/kvm.h               |   1 +
 linux-headers/asm-s390/unistd_32.h          |   1 +
 linux-headers/asm-s390/unistd_64.h          |   1 +
 linux-headers/asm-x86/kvm.h                 |   4 +-
 linux-headers/asm-x86/unistd_32.h           |   1 +
 linux-headers/asm-x86/unistd_64.h           |   1 +
 linux-headers/asm-x86/unistd_x32.h          |   2 +
 linux-headers/linux/kvm.h                   |   4 +-
 linux-headers/linux/stddef.h                |   8 +
 linux-headers/linux/vhost.h                 |  15 +-
 hw/acpi/acpi_generic_initiator.c            | 209 ++++++---
 hw/block/vhost-user-blk.c                   |   6 +-
 hw/cxl/cxl-mailbox-utils.c                  | 658 +++++++++++++++++++++++++++-
 hw/display/vhost-user-gpu.c                 |   5 +-
 hw/mem/cxl_type3.c                          | 637 +++++++++++++++++++++++++--
 hw/mem/cxl_type3_stubs.c                    |  25 ++
 hw/misc/pvpanic-isa.c                       |   3 +-
 hw/misc/pvpanic-pci.c                       |   3 +-
 hw/misc/pvpanic.c                           |   8 +-
 hw/net/vhost_net.c                          |   2 +
 hw/pci-bridge/pci_expander_bridge.c         |   1 -
 hw/s390x/s390-virtio-ccw.c                  |  17 +-
 hw/scsi/vhost-scsi.c                        |   1 +
 hw/scsi/vhost-user-scsi.c                   |   7 +-
 hw/virtio/vhost-user-base.c                 |   5 +-
 hw/virtio/vhost-user-fs.c                   |   2 +-
 hw/virtio/vhost-user-vsock.c                |   1 +
 hw/virtio/vhost-user.c                      |  18 +-
 hw/virtio/vhost-vsock-common.c              |   1 +
 hw/virtio/vhost.c                           | 112 ++++-
 hw/virtio/virtio-mmio.c                     |  11 +-
 hw/virtio/virtio-pci.c                      |  45 +-
 hw/virtio/virtio.c                          |  45 ++
 net/vhost-vdpa.c                            |  16 +-
 subprojects/libvhost-user/libvhost-user.c   |   2 +-
 system/runstate.c                           |   6 +
 tests/qtest/bios-tables-test.c              |  92 ++++
 tests/qtest/pvpanic-pci-test.c              |  44 +-
 tests/qtest/pvpanic-test.c                  |  34 +-
 docs/specs/pvpanic.rst                      |   2 +-
 scripts/update-linux-headers.sh             |   2 +-
 tests/data/acpi/q35/APIC.acpihmat-generic-x | Bin 0 -> 136 bytes
 tests/data/acpi/q35/CEDT.acpihmat-generic-x | Bin 0 -> 68 bytes
 tests/data/acpi/q35/DSDT.acpihmat-generic-x | Bin 0 -> 10400 bytes
 tests/data/acpi/q35/HMAT.acpihmat-generic-x | Bin 0 -> 360 bytes
 tests/data/acpi/q35/SRAT.acpihmat-generic-x | Bin 0 -> 520 bytes
 71 files changed, 2407 insertions(+), 221 deletions(-)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/CEDT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-generic-x



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PULL 01/46] vhost: dirty log should be per backend type
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 02/46] vhost: Perform memory section dirty scans once per iteration Michael S. Tsirkin
                   ` (46 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Si-Wei Liu

From: Si-Wei Liu <si-wei.liu@oracle.com>

There could be a mix of both vhost-user and vhost-kernel clients
in the same QEMU process, where separate vhost loggers for the
specific vhost type have to be used. Make the vhost logger per
backend type, and have them properly reference counted.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1710448055-11709-1-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost.c | 45 +++++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 4acd77e890..a1e8b79e1a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -43,8 +43,8 @@
     do { } while (0)
 #endif
 
-static struct vhost_log *vhost_log;
-static struct vhost_log *vhost_log_shm;
+static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
+static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -287,6 +287,10 @@ static int vhost_set_backend_type(struct vhost_dev *dev,
         r = -1;
     }
 
+    if (r == 0) {
+        assert(dev->vhost_ops->backend_type == backend_type);
+    }
+
     return r;
 }
 
@@ -319,16 +323,22 @@ static struct vhost_log *vhost_log_alloc(uint64_t size, bool share)
     return log;
 }
 
-static struct vhost_log *vhost_log_get(uint64_t size, bool share)
+static struct vhost_log *vhost_log_get(VhostBackendType backend_type,
+                                       uint64_t size, bool share)
 {
-    struct vhost_log *log = share ? vhost_log_shm : vhost_log;
+    struct vhost_log *log;
+
+    assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    log = share ? vhost_log_shm[backend_type] : vhost_log[backend_type];
 
     if (!log || log->size != size) {
         log = vhost_log_alloc(size, share);
         if (share) {
-            vhost_log_shm = log;
+            vhost_log_shm[backend_type] = log;
         } else {
-            vhost_log = log;
+            vhost_log[backend_type] = log;
         }
     } else {
         ++log->refcnt;
@@ -340,11 +350,20 @@ static struct vhost_log *vhost_log_get(uint64_t size, bool share)
 static void vhost_log_put(struct vhost_dev *dev, bool sync)
 {
     struct vhost_log *log = dev->log;
+    VhostBackendType backend_type;
 
     if (!log) {
         return;
     }
 
+    assert(dev->vhost_ops);
+    backend_type = dev->vhost_ops->backend_type;
+
+    if (backend_type == VHOST_BACKEND_TYPE_NONE ||
+        backend_type >= VHOST_BACKEND_TYPE_MAX) {
+        return;
+    }
+
     --log->refcnt;
     if (log->refcnt == 0) {
         /* Sync only the range covered by the old log */
@@ -352,13 +371,13 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
             vhost_log_sync_range(dev, 0, dev->log_size * VHOST_LOG_CHUNK - 1);
         }
 
-        if (vhost_log == log) {
+        if (vhost_log[backend_type] == log) {
             g_free(log->log);
-            vhost_log = NULL;
-        } else if (vhost_log_shm == log) {
+            vhost_log[backend_type] = NULL;
+        } else if (vhost_log_shm[backend_type] == log) {
             qemu_memfd_free(log->log, log->size * sizeof(*(log->log)),
                             log->fd);
-            vhost_log_shm = NULL;
+            vhost_log_shm[backend_type] = NULL;
         }
 
         g_free(log);
@@ -376,7 +395,8 @@ static bool vhost_dev_log_is_shared(struct vhost_dev *dev)
 
 static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
 {
-    struct vhost_log *log = vhost_log_get(size, vhost_dev_log_is_shared(dev));
+    struct vhost_log *log = vhost_log_get(dev->vhost_ops->backend_type,
+                                          size, vhost_dev_log_is_shared(dev));
     uint64_t log_base = (uintptr_t)log->log;
     int r;
 
@@ -2044,7 +2064,8 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
         uint64_t log_base;
 
         hdev->log_size = vhost_get_log_size(hdev);
-        hdev->log = vhost_log_get(hdev->log_size,
+        hdev->log = vhost_log_get(hdev->vhost_ops->backend_type,
+                                  hdev->log_size,
                                   vhost_dev_log_is_shared(hdev));
         log_base = (uintptr_t)hdev->log->log;
         r = hdev->vhost_ops->vhost_set_log_base(hdev,
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 02/46] vhost: Perform memory section dirty scans once per iteration
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 01/46] vhost: dirty log should be per backend type Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 03/46] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value Michael S. Tsirkin
                   ` (45 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Si-Wei Liu, Joao Martins, Jason Wang

From: Si-Wei Liu <si-wei.liu@oracle.com>

On setups with one or more virtio-net devices with vhost on,
dirty tracking iteration increases cost the bigger the number
amount of queues are set up e.g. on idle guests migration the
following is observed with virtio-net with vhost=on:

48 queues -> 78.11%  [.] vhost_dev_sync_region.isra.13
8 queues -> 40.50%   [.] vhost_dev_sync_region.isra.13
1 queue -> 6.89%     [.] vhost_dev_sync_region.isra.13
2 devices, 1 queue -> 18.60%  [.] vhost_dev_sync_region.isra.14

With high memory rates the symptom is lack of convergence as soon
as it has a vhost device with a sufficiently high number of queues,
the sufficient number of vhost devices.

On every migration iteration (every 100msecs) it will redundantly
query the *shared log* the number of queues configured with vhost
that exist in the guest. For the virtqueue data, this is necessary,
but not for the memory sections which are the same. So essentially
we end up scanning the dirty log too often.

To fix that, select a vhost device responsible for scanning the
log with regards to memory sections dirty tracking. It is selected
when we enable the logger (during migration) and cleared when we
disable the logger. If the vhost logger device goes away for some
reason, the logger will be re-selected from the rest of vhost
devices.

After making mem-section logger a singleton instance, constant cost
of 7%-9% (like the 1 queue report) will be seen, no matter how many
queues or how many vhost devices are configured:

48 queues -> 8.71%    [.] vhost_dev_sync_region.isra.13
2 devices, 8 queues -> 7.97%   [.] vhost_dev_sync_region.isra.14

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-Id: <1710448055-11709-2-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 include/hw/virtio/vhost.h |  1 +
 hw/virtio/vhost.c         | 67 +++++++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 02477788df..d75faf46e9 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -129,6 +129,7 @@ struct vhost_dev {
     void *opaque;
     struct vhost_log *log;
     QLIST_ENTRY(vhost_dev) entry;
+    QLIST_ENTRY(vhost_dev) logdev_entry;
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
     const VhostDevConfigOps *config_ops;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a1e8b79e1a..06fc71746e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -45,6 +45,7 @@
 
 static struct vhost_log *vhost_log[VHOST_BACKEND_TYPE_MAX];
 static struct vhost_log *vhost_log_shm[VHOST_BACKEND_TYPE_MAX];
+static QLIST_HEAD(, vhost_dev) vhost_log_devs[VHOST_BACKEND_TYPE_MAX];
 
 /* Memslots used by backends that support private memslots (without an fd). */
 static unsigned int used_memslots;
@@ -149,6 +150,47 @@ bool vhost_dev_has_iommu(struct vhost_dev *dev)
     }
 }
 
+static inline bool vhost_dev_should_log(struct vhost_dev *dev)
+{
+    assert(dev->vhost_ops);
+    assert(dev->vhost_ops->backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(dev->vhost_ops->backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    return dev == QLIST_FIRST(&vhost_log_devs[dev->vhost_ops->backend_type]);
+}
+
+static inline void vhost_dev_elect_mem_logger(struct vhost_dev *hdev, bool add)
+{
+    VhostBackendType backend_type;
+
+    assert(hdev->vhost_ops);
+
+    backend_type = hdev->vhost_ops->backend_type;
+    assert(backend_type > VHOST_BACKEND_TYPE_NONE);
+    assert(backend_type < VHOST_BACKEND_TYPE_MAX);
+
+    if (add && !QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        if (QLIST_EMPTY(&vhost_log_devs[backend_type])) {
+            QLIST_INSERT_HEAD(&vhost_log_devs[backend_type],
+                              hdev, logdev_entry);
+        } else {
+            /*
+             * The first vhost_device in the list is selected as the shared
+             * logger to scan memory sections. Put new entry next to the head
+             * to avoid inadvertent change to the underlying logger device.
+             * This is done in order to get better cache locality and to avoid
+             * performance churn on the hot path for log scanning. Even when
+             * new devices come and go quickly, it wouldn't end up changing
+             * the active leading logger device at all.
+             */
+            QLIST_INSERT_AFTER(QLIST_FIRST(&vhost_log_devs[backend_type]),
+                               hdev, logdev_entry);
+        }
+    } else if (!add && QLIST_IS_INSERTED(hdev, logdev_entry)) {
+        QLIST_REMOVE(hdev, logdev_entry);
+    }
+}
+
 static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
                                    MemoryRegionSection *section,
                                    hwaddr first,
@@ -166,12 +208,14 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
     start_addr = MAX(first, start_addr);
     end_addr = MIN(last, end_addr);
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        vhost_dev_sync_region(dev, section, start_addr, end_addr,
-                              reg->guest_phys_addr,
-                              range_get_last(reg->guest_phys_addr,
-                                             reg->memory_size));
+    if (vhost_dev_should_log(dev)) {
+        for (i = 0; i < dev->mem->nregions; ++i) {
+            struct vhost_memory_region *reg = dev->mem->regions + i;
+            vhost_dev_sync_region(dev, section, start_addr, end_addr,
+                                  reg->guest_phys_addr,
+                                  range_get_last(reg->guest_phys_addr,
+                                                 reg->memory_size));
+        }
     }
     for (i = 0; i < dev->nvqs; ++i) {
         struct vhost_virtqueue *vq = dev->vqs + i;
@@ -383,6 +427,7 @@ static void vhost_log_put(struct vhost_dev *dev, bool sync)
         g_free(log);
     }
 
+    vhost_dev_elect_mem_logger(dev, false);
     dev->log = NULL;
     dev->log_size = 0;
 }
@@ -998,6 +1043,15 @@ static int vhost_dev_set_log(struct vhost_dev *dev, bool enable_log)
             goto err_vq;
         }
     }
+
+    /*
+     * At log start we select our vhost_device logger that will scan the
+     * memory sections and skip for the others. This is possible because
+     * the log is shared amongst all vhost devices for a given type of
+     * backend.
+     */
+    vhost_dev_elect_mem_logger(dev, enable_log);
+
     return 0;
 err_vq:
     for (; i >= 0; --i) {
@@ -2075,6 +2129,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
             VHOST_OPS_DEBUG(r, "vhost_set_log_base failed");
             goto fail_log;
         }
+        vhost_dev_elect_mem_logger(hdev, true);
     }
     if (vrings) {
         r = vhost_dev_set_vring_enable(hdev, true);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 03/46] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 01/46] vhost: dirty log should be per backend type Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 02/46] vhost: Perform memory section dirty scans once per iteration Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd Michael S. Tsirkin
                   ` (44 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Stefano Garzarella, Eugenio Pérez, Jason Wang,
	Philippe Mathieu-Daudé

From: Stefano Garzarella <sgarzare@redhat.com>

vhost_vdpa_set_vring_ready() could already fail, but if Linux's
patch [1] will be merged, it may fail with more chance if
userspace does not activate virtqueues before DRIVER_OK when
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.

So better check its return value anyway.

[1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarzare@redhat.com/T/#u

Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20240322092315.31885-1-sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/vhost-vdpa.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..eda714d1a4 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -399,7 +399,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
     }
 
     for (int i = 0; i < v->dev->nvqs; ++i) {
-        vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
+        int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
+        if (ret < 0) {
+            return ret;
+        }
     }
     return 0;
 }
@@ -1238,7 +1241,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
 
     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 
-    vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
+    r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
+    if (unlikely(r < 0)) {
+        return r;
+    }
 
     if (v->shadow_vqs_enabled) {
         n = VIRTIO_NET(v->dev->vdev);
@@ -1277,7 +1283,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
     }
 
     for (int i = 0; i < v->dev->vq_index; ++i) {
-        vhost_vdpa_set_vring_ready(v, i);
+        r = vhost_vdpa_set_vring_ready(v, i);
+        if (unlikely(r < 0)) {
+            return r;
+        }
     }
 
     return 0;
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd.
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 03/46] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-05  7:27   ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 05/46] virtio/virtio-pci: Handle extra notification data Michael S. Tsirkin
                   ` (43 subsequent siblings)
  47 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Cindy Lu, qemu-stable, Jason Wang

From: Cindy Lu <lulu@redhat.com>

The crash was reported in MAC OS and NixOS, here is the link for this bug
https://gitlab.com/qemu-project/qemu/-/issues/2334
https://gitlab.com/qemu-project/qemu/-/issues/2321

The root cause is that the function virtio_pci_set_guest_notifiers() only
initializes the irqfd when the use_guest_notifier_mask and guest_notifier_mask
are set.
However, this check is missing in virtio_pci_set_vector().
So the fix is to add this check.

This fix is verified in vyatta,MacOS,NixOS,fedora system.

The bt tree for this bug is:
Thread 6 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7c817be006c0 (LWP 1269146)]
kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
817	    if (irqfd->users == 0) {
(gdb) thread apply all bt
...
Thread 6 (Thread 0x7c817be006c0 (LWP 1269146) "CPU 0/KVM"):
0  kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
1  kvm_virtio_pci_vector_use_one () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:893
2  0x00005983657045e2 in memory_region_write_accessor () at ../qemu-9.0.0/system/memory.c:497
3  0x0000598365704ba6 in access_with_adjusted_size () at ../qemu-9.0.0/system/memory.c:573
4  0x0000598365705059 in memory_region_dispatch_write () at ../qemu-9.0.0/system/memory.c:1528
5  0x00005983659b8e1f in flatview_write_continue_step.isra.0 () at ../qemu-9.0.0/system/physmem.c:2713
6  0x000059836570ba7d in flatview_write_continue () at ../qemu-9.0.0/system/physmem.c:2743
7  flatview_write () at ../qemu-9.0.0/system/physmem.c:2774
8  0x000059836570bb76 in address_space_write () at ../qemu-9.0.0/system/physmem.c:2894
9  0x0000598365763afe in address_space_rw () at ../qemu-9.0.0/system/physmem.c:2904
10 kvm_cpu_exec () at ../qemu-9.0.0/accel/kvm/kvm-all.c:2917
11 0x000059836576656e in kvm_vcpu_thread_fn () at ../qemu-9.0.0/accel/kvm/kvm-accel-ops.c:50
12 0x0000598365926ca8 in qemu_thread_start () at ../qemu-9.0.0/util/qemu-thread-posix.c:541
13 0x00007c8185bcd1cf in ??? () at /usr/lib/libc.so.6
14 0x00007c8185c4e504 in clone () at /usr/lib/libc.so.6

Fixes: 2ce6cff94d ("virtio-pci: fix use of a released vector")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20240522051042.985825-1-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-pci.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b1d02f4b3d..a7faee5b33 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1431,6 +1431,7 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
 {
     bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
         msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
+    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
 
     if (new_vector == old_vector) {
         return;
@@ -1441,7 +1442,8 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
      * set, we need to release the old vector and set up the new one.
      * Otherwise just need to set the new vector on the device.
      */
-    if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR) {
+    if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR &&
+        vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
         kvm_virtio_pci_vector_release_one(proxy, queue_no);
     }
     /* Set the new vector on the device. */
@@ -1451,7 +1453,8 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
         virtio_queue_set_vector(vdev, queue_no, new_vector);
     }
     /* If the new vector changed need to set it up. */
-    if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR) {
+    if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR &&
+        vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
         kvm_virtio_pci_vector_use_one(proxy, queue_no);
     }
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 05/46] virtio/virtio-pci: Handle extra notification data
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 06/46] virtio: Prevent creation of device using notification-data with ioeventfd Michael S. Tsirkin
                   ` (42 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonah Palmer, Eugenio Pérez

From: Jonah Palmer <jonah.palmer@oracle.com>

Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
 - upper 16 bits: shadow_avail_idx
 - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
 - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
 - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240315165557.26942-2-jonah.palmer@oracle.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio.h |  2 ++
 hw/virtio/virtio-pci.c     | 12 +++++++++---
 hw/virtio/virtio.c         | 18 ++++++++++++++++++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..1451926a13 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -307,6 +307,8 @@ int virtio_queue_ready(VirtQueue *vq);
 
 int virtio_queue_empty(VirtQueue *vq);
 
+void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t idx);
+
 /* Host binding interface.  */
 
 uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index a7faee5b33..ad0035952f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-    uint16_t vector;
+    uint16_t vector, vq_idx;
     hwaddr pa;
 
     switch (addr) {
@@ -408,8 +408,14 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             vdev->queue_sel = val;
         break;
     case VIRTIO_PCI_QUEUE_NOTIFY:
-        if (val < VIRTIO_QUEUE_MAX) {
-            virtio_queue_notify(vdev, val);
+        vq_idx = val;
+        if (vq_idx < VIRTIO_QUEUE_MAX && virtio_queue_get_num(vdev, vq_idx)) {
+            if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+                VirtQueue *vq = virtio_get_queue(vdev, vq_idx);
+
+                virtio_queue_set_shadow_avail_idx(vq, val >> 16);
+            }
+            virtio_queue_notify(vdev, vq_idx);
         }
         break;
     case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..f7c99e3a96 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2264,6 +2264,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
     }
 }
 
+void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t shadow_avail_idx)
+{
+    if (!vq->vring.desc) {
+        return;
+    }
+
+    /*
+     * 16-bit data for packed VQs include 1-bit wrap counter and
+     * 15-bit shadow_avail_idx.
+     */
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        vq->shadow_avail_wrap_counter = (shadow_avail_idx >> 15) & 0x1;
+        vq->shadow_avail_idx = shadow_avail_idx & 0x7FFF;
+    } else {
+        vq->shadow_avail_idx = shadow_avail_idx;
+    }
+}
+
 static void virtio_queue_notify_vq(VirtQueue *vq)
 {
     if (vq->vring.desc && vq->handle_output) {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 06/46] virtio: Prevent creation of device using notification-data with ioeventfd
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 05/46] virtio/virtio-pci: Handle extra notification data Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 07/46] virtio-mmio: Handle extra notification data Michael S. Tsirkin
                   ` (41 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonah Palmer

From: Jonah Palmer <jonah.palmer@oracle.com>

Prevent the realization of a virtio device that attempts to use the
VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
ioeventfd.

Due to ioeventfd not being able to carry the extra data associated with
this feature, having both enabled is a functional mismatch and therefore
Qemu should not continue the device's realization process.

Although the device does not yet know if the feature will be
successfully negotiated, many devices using this feature wont actually
work without this extra data and would fail FEATURES_OK anyway.

If ioeventfd is able to work with the extra notification data in the
future, this compatibility check can be removed.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240315165557.26942-3-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f7c99e3a96..28cd406e16 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2980,6 +2980,20 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
     return ret;
 }
 
+static void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+                                                           Error **errp)
+{
+    VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+    DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+    if (virtio_host_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
+        k->ioeventfd_enabled(proxy)) {
+        error_setg(errp,
+                   "notification_data=on without ioeventfd=off is not supported");
+    }
+}
+
 size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
                               uint64_t host_features)
 {
@@ -3740,6 +3754,14 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
         }
     }
 
+    /* Devices should not use both ioeventfd and notification data feature */
+    virtio_device_check_notification_compatibility(vdev, &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        vdc->unrealize(dev);
+        return;
+    }
+
     virtio_bus_device_plugged(vdev, &err);
     if (err != NULL) {
         error_propagate(errp, err);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 07/46] virtio-mmio: Handle extra notification data
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 06/46] virtio: Prevent creation of device using notification-data with ioeventfd Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 08/46] virtio-ccw: " Michael S. Tsirkin
                   ` (40 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonah Palmer

From: Jonah Palmer <jonah.palmer@oracle.com>

Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240315165557.26942-4-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-mmio.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..320428ac0d 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
 {
     VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
     VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+    uint16_t vq_idx;
 
     trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,14 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
         }
         break;
     case VIRTIO_MMIO_QUEUE_NOTIFY:
-        if (value < VIRTIO_QUEUE_MAX) {
-            virtio_queue_notify(vdev, value);
+        vq_idx = value;
+        if (vq_idx < VIRTIO_QUEUE_MAX && virtio_queue_get_num(vdev, vq_idx)) {
+            if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+                VirtQueue *vq = virtio_get_queue(vdev, vq_idx);
+
+                virtio_queue_set_shadow_avail_idx(vq, (value >> 16) & 0xFFFF);
+            }
+            virtio_queue_notify(vdev, vq_idx);
         }
         break;
     case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 08/46] virtio-ccw: Handle extra notification data
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 07/46] virtio-mmio: Handle extra notification data Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 09/46] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits Michael S. Tsirkin
                   ` (39 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonah Palmer, Richard Henderson, David Hildenbrand,
	Ilya Leoshkevich, Thomas Huth, Halil Pasic, Christian Borntraeger,
	Eric Farman, qemu-s390x

From: Jonah Palmer <jonah.palmer@oracle.com>

Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

That data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240315165557.26942-5-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/s390x/s390-virtio-ccw.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3d0bc3e7f2..956ef7b98d 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -125,9 +125,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
     uint64_t subch_id = args[0];
-    uint64_t queue = args[1];
+    uint64_t data = args[1];
     SubchDev *sch;
+    VirtIODevice *vdev;
     int cssid, ssid, schid, m;
+    uint16_t vq_idx = data;
 
     if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
         return -EINVAL;
@@ -136,12 +138,19 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
     if (!sch || !css_subch_visible(sch)) {
         return -EINVAL;
     }
-    if (queue >= VIRTIO_QUEUE_MAX) {
+
+    vdev = virtio_ccw_get_vdev(sch);
+    if (vq_idx >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, vq_idx)) {
         return -EINVAL;
     }
-    virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
-    return 0;
 
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+        virtio_queue_set_shadow_avail_idx(virtio_get_queue(vdev, vq_idx),
+                                          (data >> 16) & 0xFFFF);
+    }
+
+    virtio_queue_notify(vdev, vq_idx);
+    return 0;
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 09/46] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 08/46] virtio-ccw: " Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 10/46] Fix vhost user assertion when sending more than one fd Michael S. Tsirkin
                   ` (38 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonah Palmer, Lei Yang, Eugenio Pérez,
	Srujana Challa, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi, qemu-block,
	virtio-fs

From: Jonah Palmer <jonah.palmer@oracle.com>

Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.

Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240315165557.26942-6-jonah.palmer@oracle.com>
Acked-by: Srujana Challa <schalla@marvell.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/block/vhost-user-blk.c    | 1 +
 hw/net/vhost_net.c           | 2 ++
 hw/scsi/vhost-scsi.c         | 1 +
 hw/scsi/vhost-user-scsi.c    | 1 +
 hw/virtio/vhost-user-fs.c    | 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c             | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..bc2677dbef 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_NOTIFICATION_DATA,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..18898afe81 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_NOTIFICATION_DATA,
     VIRTIO_NET_F_HASH_REPORT,
     VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
     VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_F_NOTIFICATION_DATA,
     VIRTIO_RING_F_INDIRECT_DESC,
     VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..3d5fe0994d 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_NOTIFICATION_DATA,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_SCSI_F_HOTPLUG,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_NOTIFICATION_DATA,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_RESET,
-
+    VIRTIO_F_NOTIFICATION_DATA,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
     VIRTIO_RING_F_INDIRECT_DESC,
     VIRTIO_RING_F_EVENT_IDX,
     VIRTIO_F_NOTIFY_ON_EMPTY,
+    VIRTIO_F_NOTIFICATION_DATA,
     VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index eda714d1a4..daa38428c5 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
     VIRTIO_F_RING_PACKED,
     VIRTIO_F_RING_RESET,
     VIRTIO_F_VERSION_1,
+    VIRTIO_F_NOTIFICATION_DATA,
     VIRTIO_NET_F_CSUM,
     VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
     VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 10/46] Fix vhost user assertion when sending more than one fd
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 09/46] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 11/46] vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits Michael S. Tsirkin
                   ` (37 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Christian Pötzsch

From: Christian Pötzsch <christian.poetzsch@kernkonzept.com>

If the client sends more than one region this assert triggers. The
reason is that two fd's are 8 bytes and VHOST_MEMORY_BASELINE_NREGIONS
is exactly 8.

The assert is wrong because it should not test for the size of the fd
array, but for the numbers of regions.

Signed-off-by: Christian Pötzsch <christian.poetzsch@kernkonzept.com>
Message-Id: <20240426083313.3081272-1-christian.poetzsch@kernkonzept.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 subprojects/libvhost-user/libvhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..8adb277d54 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -568,7 +568,7 @@ vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
         if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
             fd_size = cmsg->cmsg_len - CMSG_LEN(0);
             vmsg->fd_num = fd_size / sizeof(int);
-            assert(fd_size < VHOST_MEMORY_BASELINE_NREGIONS);
+            assert(vmsg->fd_num <= VHOST_MEMORY_BASELINE_NREGIONS);
             memcpy(vmsg->fds, CMSG_DATA(cmsg), fd_size);
             break;
         }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 11/46] vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 10/46] Fix vhost user assertion when sending more than one fd Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 12/46] hw/virtio: Fix obtain the buffer id from the last descriptor Michael S. Tsirkin
                   ` (36 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Halil Pasic, Marc Hartmayer

From: Halil Pasic <pasic@linux.ibm.com>

Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
but the in QEMU device is configured to try to use the packed layout
(the virtio property "packed" is on).

As of today, the  Linux kernel vhost-vsock device does not support the
packed queue layout (as vhost does not support packed), and does not
offer VIRTIO_F_RING_PACKED. Thus when for example a vhost-vsock-ccw is
used with packed=on, VIRTIO_F_RING_PACKED ends up being negotiated,
despite the fact that the device does not actually support it, and
one gets to keep the pieces.

Fixes: 74b3e46630 ("virtio: add property to enable packed virtqueue")
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20240429113334.2454197-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-vsock-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index 12ea87d7a7..fd88df2560 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -22,6 +22,7 @@
 const int feature_bits[] = {
     VIRTIO_VSOCK_F_SEQPACKET,
     VIRTIO_F_RING_RESET,
+    VIRTIO_F_RING_PACKED,
     VHOST_INVALID_FEATURE_BIT
 };
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 12/46] hw/virtio: Fix obtain the buffer id from the last descriptor
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 11/46] vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 13/46] virtio-pci: only reset pm state during resetting Michael S. Tsirkin
                   ` (35 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Wafer, Jason Wang, Eugenio Pérez

From: Wafer <wafer@jaguarmicro.com>

The virtio-1.3 specification
<https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html> writes:
2.8.6 Next Flag: Descriptor Chaining
      Buffer ID is included in the last descriptor in the list.

If the feature (_F_INDIRECT_DESC) has been negotiated, install only
one descriptor in the virtqueue.
Therefor the buffer id should be obtained from the first descriptor.

In descriptor chaining scenarios, the buffer id should be obtained
from the last descriptor.

Fixes: 86044b24e8 ("virtio: basic packed virtqueue support")

Signed-off-by: Wafer <wafer@jaguarmicro.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20240510072753.26158-2-wafer@jaguarmicro.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 28cd406e16..3678ec2f88 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1745,6 +1745,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
                                              &indirect_desc_cache);
     } while (rc == VIRTQUEUE_READ_DESC_MORE);
 
+    if (desc_cache != &indirect_desc_cache) {
+        /* Buffer ID is included in the last descriptor in the list. */
+        id = desc.id;
+    }
+
     /* Now copy what we have collected and mapped */
     elem = virtqueue_alloc_element(sz, out_num, in_num);
     for (i = 0; i < out_num; i++) {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 13/46] virtio-pci: only reset pm state during resetting
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (11 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 12/46] hw/virtio: Fix obtain the buffer id from the last descriptor Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 14/46] vhost-user-gpu: fix import of DMABUF Michael S. Tsirkin
                   ` (34 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jiqian Chen

From: Jiqian Chen <Jiqian.Chen@amd.com>

Fix bug imported by 27ce0f3afc9dd ("fix Power Management Control Register for PCI Express virtio devices"
After this change, observe that QEMU may erroneously clear the power status of the device,
or may erroneously clear non writable registers, such as NO_SOFT_RESET, etc.

Only state of PM_CTRL is writable.
Only when flag VIRTIO_PCI_FLAG_INIT_PM is set, need to reset state.

Fixes: 27ce0f3afc9dd ("fix Power Management Control Register for PCI Express virtio devices"
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Message-Id: <20240515073526.17297-2-Jiqian.Chen@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ad0035952f..b6757ffce2 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2309,10 +2309,16 @@ static void virtio_pci_bus_reset_hold(Object *obj, ResetType type)
     virtio_pci_reset(qdev);
 
     if (pci_is_express(dev)) {
+        VirtIOPCIProxy *proxy = VIRTIO_PCI(dev);
+
         pcie_cap_deverr_reset(dev);
         pcie_cap_lnkctl_reset(dev);
 
-        pci_set_word(dev->config + dev->exp.pm_cap + PCI_PM_CTRL, 0);
+        if (proxy->flags & VIRTIO_PCI_FLAG_INIT_PM) {
+            pci_word_test_and_clear_mask(
+                dev->config + dev->exp.pm_cap + PCI_PM_CTRL,
+                PCI_PM_CTRL_STATE_MASK);
+        }
     }
 }
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 14/46] vhost-user-gpu: fix import of DMABUF
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (12 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 13/46] virtio-pci: only reset pm state during resetting Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 15/46] Revert "vhost-user: fix lost reconnect" Michael S. Tsirkin
                   ` (33 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Marc-André Lureau, Gerd Hoffmann

From: Marc-André Lureau <marcandre.lureau@redhat.com>

When using vhost-user-gpu with GL, qemu -display gtk doesn't show output
and prints: qemu: eglCreateImageKHR failed

Since commit 9ac06df8b ("virtio-gpu-udmabuf: correct naming of
QemuDmaBuf size properties"), egl_dmabuf_import_texture() uses
backing_{width,height} for the texture dimension.

Fixes: 9ac06df8b ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties")
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20240515105237.1074116-1-marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/display/vhost-user-gpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index e4b398d26c..63c64ddde6 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -281,8 +281,9 @@ vhost_user_gpu_handle_display(VhostUserGPU *g, VhostUserGpuMsg *msg)
             modifier = m2->modifier;
         }
 
-        dmabuf = qemu_dmabuf_new(m->fd_width, m->fd_height,
-                                 m->fd_stride, 0, 0, 0, 0,
+        dmabuf = qemu_dmabuf_new(m->width, m->height,
+                                 m->fd_stride, 0, 0,
+                                 m->fd_width, m->fd_height,
                                  m->fd_drm_fourcc, modifier,
                                  fd, false, m->fd_flags &
                                  VIRTIO_GPU_RESOURCE_FLAG_Y_0_TOP);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 15/46] Revert "vhost-user: fix lost reconnect"
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (13 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 14/46] vhost-user-gpu: fix import of DMABUF Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 16/46] vhost-user: fix lost reconnect again Michael S. Tsirkin
                   ` (32 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Li Feng, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Paolo Bonzini, Fam Zheng, Alex Bennée, qemu-block

From: Li Feng <fengli@smartx.com>

This reverts commit f02a4b8e6431598612466f76aac64ab492849abf.

Since the current patch cannot completely fix the lost reconnect
problem, there is a scenario that is not considered:
- When the virtio-blk driver is removed from the guest os,
  s->connected has no chance to be set to false, resulting in
  subsequent reconnection not being executed.

The next patch will completely fix this issue with a better approach.

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20240516025753.130171-2-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/vhost-user.h |  3 +--
 hw/block/vhost-user-blk.c      |  2 +-
 hw/scsi/vhost-user-scsi.c      |  3 +--
 hw/virtio/vhost-user-base.c    |  2 +-
 hw/virtio/vhost-user.c         | 10 ++--------
 5 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index d7c09ffd34..324cd8663a 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -108,7 +108,6 @@ typedef void (*vu_async_close_fn)(DeviceState *cb);
 
 void vhost_user_async_close(DeviceState *d,
                             CharBackend *chardev, struct vhost_dev *vhost,
-                            vu_async_close_fn cb,
-                            IOEventHandler *event_cb);
+                            vu_async_close_fn cb);
 
 #endif
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index bc2677dbef..15cc24d017 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -385,7 +385,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
     case CHR_EVENT_CLOSED:
         /* defer close until later to avoid circular close */
         vhost_user_async_close(dev, &s->chardev, &s->dev,
-                               vhost_user_blk_disconnect, vhost_user_blk_event);
+                               vhost_user_blk_disconnect);
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 0b050805a8..421cd654f8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -215,8 +215,7 @@ static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event)
     case CHR_EVENT_CLOSED:
         /* defer close until later to avoid circular close */
         vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev,
-                               vhost_user_scsi_disconnect,
-                               vhost_user_scsi_event);
+                               vhost_user_scsi_disconnect);
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
index a83167191e..4b54255682 100644
--- a/hw/virtio/vhost-user-base.c
+++ b/hw/virtio/vhost-user-base.c
@@ -254,7 +254,7 @@ static void vub_event(void *opaque, QEMUChrEvent event)
     case CHR_EVENT_CLOSED:
         /* defer close until later to avoid circular close */
         vhost_user_async_close(dev, &vub->chardev, &vub->vhost_dev,
-                               vub_disconnect, vub_event);
+                               vub_disconnect);
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index cdf9af4a4b..c929097e87 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2776,7 +2776,6 @@ typedef struct {
     DeviceState *dev;
     CharBackend *cd;
     struct vhost_dev *vhost;
-    IOEventHandler *event_cb;
 } VhostAsyncCallback;
 
 static void vhost_user_async_close_bh(void *opaque)
@@ -2791,10 +2790,7 @@ static void vhost_user_async_close_bh(void *opaque)
      */
     if (vhost->vdev) {
         data->cb(data->dev);
-    } else if (data->event_cb) {
-        qemu_chr_fe_set_handlers(data->cd, NULL, NULL, data->event_cb,
-                                 NULL, data->dev, NULL, true);
-   }
+    }
 
     g_free(data);
 }
@@ -2806,8 +2802,7 @@ static void vhost_user_async_close_bh(void *opaque)
  */
 void vhost_user_async_close(DeviceState *d,
                             CharBackend *chardev, struct vhost_dev *vhost,
-                            vu_async_close_fn cb,
-                            IOEventHandler *event_cb)
+                            vu_async_close_fn cb)
 {
     if (!runstate_check(RUN_STATE_SHUTDOWN)) {
         /*
@@ -2823,7 +2818,6 @@ void vhost_user_async_close(DeviceState *d,
         data->dev = d;
         data->cd = chardev;
         data->vhost = vhost;
-        data->event_cb = event_cb;
 
         /* Disable any further notifications on the chardev */
         qemu_chr_fe_set_handlers(chardev,
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 16/46] vhost-user: fix lost reconnect again
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (14 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 15/46] Revert "vhost-user: fix lost reconnect" Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 17/46] hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference Michael S. Tsirkin
                   ` (31 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Li Feng, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Paolo Bonzini, Fam Zheng, Alex Bennée, qemu-block

From: Li Feng <fengli@smartx.com>

When the vhost-user is reconnecting to the backend, and if the vhost-user fails
at the get_features in vhost_dev_init(), then the reconnect will fail
and it will not be retriggered forever.

The reason is:
When the vhost-user fail at get_features, the vhost_dev_cleanup will be called
immediately.

vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.

The reconnect path is:
vhost_user_blk_event
   vhost_user_async_close(.. vhost_user_blk_disconnect ..)
     qemu_chr_fe_set_handlers <----- clear the notifier callback
       schedule vhost_user_async_close_bh

The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
called, then the event fd callback will not be reinstalled.

We need to ensure that even if vhost_dev_init initialization fails, the event
handler still needs to be reinstalled when s->connected is false.

All vhost-user devices have this issue, including vhost-user-blk/scsi.

Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")

Signed-off-by: Li Feng <fengli@smartx.com>
Message-Id: <20240516025753.130171-3-fengli@smartx.com>
Reviewed-by: Raphael Norwitz <raphael@enfabrica.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/block/vhost-user-blk.c   |  3 ++-
 hw/scsi/vhost-user-scsi.c   |  3 ++-
 hw/virtio/vhost-user-base.c |  3 ++-
 hw/virtio/vhost-user.c      | 10 +---------
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 15cc24d017..fdbc30b9ce 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -354,7 +354,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
 
     if (!s->connected) {
-        return;
+        goto done;
     }
     s->connected = false;
 
@@ -362,6 +362,7 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
 
     vhost_dev_cleanup(&s->dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
                              NULL, dev, NULL, true);
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 421cd654f8..cc91ade525 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -182,7 +182,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
 
     if (!s->connected) {
-        return;
+        goto done;
     }
     s->connected = false;
 
@@ -190,6 +190,7 @@ static void vhost_user_scsi_disconnect(DeviceState *dev)
 
     vhost_dev_cleanup(&vsc->dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL,
                              vhost_user_scsi_event, NULL, dev, NULL, true);
diff --git a/hw/virtio/vhost-user-base.c b/hw/virtio/vhost-user-base.c
index 4b54255682..11e72b1e3b 100644
--- a/hw/virtio/vhost-user-base.c
+++ b/hw/virtio/vhost-user-base.c
@@ -225,13 +225,14 @@ static void vub_disconnect(DeviceState *dev)
     VHostUserBase *vub = VHOST_USER_BASE(vdev);
 
     if (!vub->connected) {
-        return;
+        goto done;
     }
     vub->connected = false;
 
     vub_stop(vdev);
     vhost_dev_cleanup(&vub->vhost_dev);
 
+done:
     /* Re-instate the event handler for new connections */
     qemu_chr_fe_set_handlers(&vub->chardev,
                              NULL, NULL, vub_event,
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c929097e87..c407ea8939 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2781,16 +2781,8 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
     VhostAsyncCallback *data = opaque;
-    struct vhost_dev *vhost = data->vhost;
 
-    /*
-     * If the vhost_dev has been cleared in the meantime there is
-     * nothing left to do as some other path has completed the
-     * cleanup.
-     */
-    if (vhost->vdev) {
-        data->cb(data->dev);
-    }
+    data->cb(data->dev);
 
     g_free(data);
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 17/46] hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (15 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 16/46] vhost-user: fix lost reconnect again Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:06 ` [PULL 18/46] hw/cxl/mailbox: interface to add CCI commands to an existing CCI Michael S. Tsirkin
                   ` (30 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Gregory Price, Gregory Price, Jonathan Cameron,
	Fan Ni

From: Gregory Price <gourry.memverge@gmail.com>

This allows devices to have fully customized CCIs, along with complex
devices where wrapper devices can override or add additional CCI
commands without having to replicate full command structures or
pollute a base device with every command that might ever be used.

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-2-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |  2 +-
 hw/cxl/cxl-mailbox-utils.c  | 19 +++++++++++++++----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 279b276bda..ccc4611875 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -164,7 +164,7 @@ typedef struct CXLEventLog {
 } CXLEventLog;
 
 typedef struct CXLCCI {
-    const struct cxl_cmd (*cxl_cmd_set)[256];
+    struct cxl_cmd cxl_cmd_set[256][256];
     struct cel_log {
         uint16_t opcode;
         uint16_t effect;
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index e5eb97cb91..2c9f50f0f9 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1447,10 +1447,21 @@ void cxl_init_cci(CXLCCI *cci, size_t payload_max)
                                  bg_timercb, cci);
 }
 
+static void cxl_copy_cci_commands(CXLCCI *cci, const struct cxl_cmd (*cxl_cmds)[256])
+{
+    for (int set = 0; set < 256; set++) {
+        for (int cmd = 0; cmd < 256; cmd++) {
+            if (cxl_cmds[set][cmd].handler) {
+                cci->cxl_cmd_set[set][cmd] = cxl_cmds[set][cmd];
+            }
+        }
+    }
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
                                   DeviceState *d, size_t payload_max)
 {
-    cci->cxl_cmd_set = cxl_cmd_set_sw;
+    cxl_copy_cci_commands(cci, cxl_cmd_set_sw);
     cci->d = d;
     cci->intf = intf;
     cxl_init_cci(cci, payload_max);
@@ -1458,7 +1469,7 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
-    cci->cxl_cmd_set = cxl_cmd_set;
+    cxl_copy_cci_commands(cci, cxl_cmd_set);
     cci->d = d;
 
     /* No separation for PCI MB as protocol handled in PCI device */
@@ -1476,7 +1487,7 @@ static const struct cxl_cmd cxl_cmd_set_t3_ld[256][256] = {
 void cxl_initialize_t3_ld_cci(CXLCCI *cci, DeviceState *d, DeviceState *intf,
                                size_t payload_max)
 {
-    cci->cxl_cmd_set = cxl_cmd_set_t3_ld;
+    cxl_copy_cci_commands(cci, cxl_cmd_set_t3_ld);
     cci->d = d;
     cci->intf = intf;
     cxl_init_cci(cci, payload_max);
@@ -1496,7 +1507,7 @@ void cxl_initialize_t3_fm_owned_ld_mctpcci(CXLCCI *cci, DeviceState *d,
                                            DeviceState *intf,
                                            size_t payload_max)
 {
-    cci->cxl_cmd_set = cxl_cmd_set_t3_fm_owned_ld_mctp;
+    cxl_copy_cci_commands(cci, cxl_cmd_set_t3_fm_owned_ld_mctp);
     cci->d = d;
     cci->intf = intf;
     cxl_init_cci(cci, payload_max);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 18/46] hw/cxl/mailbox: interface to add CCI commands to an existing CCI
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (16 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 17/46] hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference Michael S. Tsirkin
@ 2024-06-04 19:06 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 19/46] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command Michael S. Tsirkin
                   ` (29 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Gregory Price, Gregory Price, Jonathan Cameron,
	Fan Ni

From: Gregory Price <gourry.memverge@gmail.com>

This enables wrapper devices to customize the base device's CCI
(for example, with custom commands outside the specification)
without the need to change the base device.

The also enabled the base device to dispatch those commands without
requiring additional driver support.

Heavily edited by Jonathan Cameron to increase code reuse

Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-3-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |  2 ++
 hw/cxl/cxl-mailbox-utils.c  | 19 +++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index ccc4611875..a5f8e25020 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -301,6 +301,8 @@ void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max);
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
                                   DeviceState *d, size_t payload_max);
 void cxl_init_cci(CXLCCI *cci, size_t payload_max);
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd (*cxl_cmd_set)[256],
+                          size_t payload_max);
 int cxl_process_cci_message(CXLCCI *cci, uint8_t set, uint8_t cmd,
                             size_t len_in, uint8_t *pl_in,
                             size_t *len_out, uint8_t *pl_out,
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2c9f50f0f9..2a64c58e2f 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1424,9 +1424,9 @@ static void bg_timercb(void *opaque)
     }
 }
 
-void cxl_init_cci(CXLCCI *cci, size_t payload_max)
+static void cxl_rebuild_cel(CXLCCI *cci)
 {
-    cci->payload_max = payload_max;
+    cci->cel_size = 0; /* Reset for a fresh build */
     for (int set = 0; set < 256; set++) {
         for (int cmd = 0; cmd < 256; cmd++) {
             if (cci->cxl_cmd_set[set][cmd].handler) {
@@ -1440,6 +1440,13 @@ void cxl_init_cci(CXLCCI *cci, size_t payload_max)
             }
         }
     }
+}
+
+void cxl_init_cci(CXLCCI *cci, size_t payload_max)
+{
+    cci->payload_max = payload_max;
+    cxl_rebuild_cel(cci);
+
     cci->bg.complete_pct = 0;
     cci->bg.starttime = 0;
     cci->bg.runtime = 0;
@@ -1458,6 +1465,14 @@ static void cxl_copy_cci_commands(CXLCCI *cci, const struct cxl_cmd (*cxl_cmds)[
     }
 }
 
+void cxl_add_cci_commands(CXLCCI *cci, const struct cxl_cmd (*cxl_cmd_set)[256],
+                                 size_t payload_max)
+{
+    cci->payload_max = MAX(payload_max, cci->payload_max);
+    cxl_copy_cci_commands(cci, cxl_cmd_set);
+    cxl_rebuild_cel(cci);
+}
+
 void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
                                   DeviceState *d, size_t payload_max)
 {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 19/46] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2024-06-04 19:06 ` [PULL 18/46] hw/cxl/mailbox: interface to add CCI commands to an existing CCI Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 20/46] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support Michael S. Tsirkin
                   ` (28 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Based on CXL spec r3.1 Table 8-127 (Identify Memory Device Output
Payload), dynamic capacity event log size should be part of
output of the Identify command.
Add dc_event_log_size to the output payload for the host to get the info.

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-4-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/cxl/cxl-mailbox-utils.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2a64c58e2f..626acc1d0d 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -21,6 +21,7 @@
 #include "sysemu/hostmem.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+#define CXL_DC_EVENT_LOG_SIZE 8
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -780,8 +781,9 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
         uint16_t inject_poison_limit;
         uint8_t poison_caps;
         uint8_t qos_telemetry_caps;
+        uint16_t dc_event_log_size;
     } QEMU_PACKED *id;
-    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+    QEMU_BUILD_BUG_ON(sizeof(*id) != 0x45);
     CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
     CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
@@ -807,6 +809,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     st24_le_p(id->poison_list_max_mer, 256);
     /* No limit - so limited by main poison record limit */
     stw_le_p(&id->inject_poison_limit, 0);
+    stw_le_p(&id->dc_event_log_size, CXL_DC_EVENT_LOG_SIZE);
 
     *len_out = sizeof(*id);
     return CXL_MBOX_SUCCESS;
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 20/46] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 19/46] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 21/46] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices Michael S. Tsirkin
                   ` (27 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Per cxl spec r3.1, add dynamic capacity (DC) region representative based on
Table 8-165 and extend the cxl type3 device definition to include DC region
information. Also, based on info in 8.2.9.9.9.1, add 'Get Dynamic Capacity
Configuration' mailbox support.

Note: we store region decode length as byte-wise length on the device, which
should be divided by 256 * MiB before being returned to the host
for "Get Dynamic Capacity Configuration" mailbox command per
specification.

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-5-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h | 16 +++++++
 hw/cxl/cxl-mailbox-utils.c  | 96 +++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index a5f8e25020..e839370266 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -422,6 +422,17 @@ typedef struct CXLPoison {
 typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 #define CXL_POISON_LIST_LIMIT 256
 
+#define DCD_MAX_NUM_REGION 8
+
+typedef struct CXLDCRegion {
+    uint64_t base;       /* aligned to 256*MiB */
+    uint64_t decode_len; /* aligned to 256*MiB */
+    uint64_t len;
+    uint64_t block_size;
+    uint32_t dsmadhandle;
+    uint8_t flags;
+} CXLDCRegion;
+
 struct CXLType3Dev {
     /* Private */
     PCIDevice parent_obj;
@@ -454,6 +465,11 @@ struct CXLType3Dev {
     unsigned int poison_list_cnt;
     bool poison_list_overflowed;
     uint64_t poison_list_overflow_ts;
+
+    struct dynamic_capacity {
+        uint8_t num_regions; /* 0-8 regions */
+        CXLDCRegion regions[DCD_MAX_NUM_REGION];
+    } dc;
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 626acc1d0d..bede28e3c8 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -22,6 +22,8 @@
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
+#define CXL_NUM_EXTENTS_SUPPORTED 512
+#define CXL_NUM_TAGS_SUPPORTED 0
 
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
@@ -80,6 +82,8 @@ enum {
         #define GET_POISON_LIST        0x0
         #define INJECT_POISON          0x1
         #define CLEAR_POISON           0x2
+    DCD_CONFIG  = 0x48,
+        #define GET_DC_CONFIG          0x0
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1238,6 +1242,88 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.1: Get Dynamic Capacity Configuration
+ * (Opcode: 4800h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
+                                             uint8_t *payload_in,
+                                             size_t len_in,
+                                             uint8_t *payload_out,
+                                             size_t *len_out,
+                                             CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint8_t region_cnt;
+        uint8_t start_rid;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint8_t num_regions;
+        uint8_t regions_returned;
+        uint8_t rsvd1[6];
+        struct {
+            uint64_t base;
+            uint64_t decode_len;
+            uint64_t region_len;
+            uint64_t block_size;
+            uint32_t dsmadhandle;
+            uint8_t flags;
+            uint8_t rsvd2[3];
+        } QEMU_PACKED records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    struct {
+        uint32_t num_extents_supported;
+        uint32_t num_extents_available;
+        uint32_t num_tags_supported;
+        uint32_t num_tags_available;
+    } QEMU_PACKED *extra_out;
+    uint16_t record_count;
+    uint16_t i;
+    uint16_t out_pl_len;
+    uint8_t start_rid;
+
+    start_rid = in->start_rid;
+    if (start_rid >= ct3d->dc.num_regions) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(ct3d->dc.num_regions - in->start_rid, in->region_cnt);
+
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+    extra_out = (void *)(payload_out + out_pl_len);
+    out_pl_len += sizeof(*extra_out);
+    assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
+
+    out->num_regions = ct3d->dc.num_regions;
+    out->regions_returned = record_count;
+    for (i = 0; i < record_count; i++) {
+        stq_le_p(&out->records[i].base,
+                 ct3d->dc.regions[start_rid + i].base);
+        stq_le_p(&out->records[i].decode_len,
+                 ct3d->dc.regions[start_rid + i].decode_len /
+                 CXL_CAPACITY_MULTIPLIER);
+        stq_le_p(&out->records[i].region_len,
+                 ct3d->dc.regions[start_rid + i].len);
+        stq_le_p(&out->records[i].block_size,
+                 ct3d->dc.regions[start_rid + i].block_size);
+        stl_le_p(&out->records[i].dsmadhandle,
+                 ct3d->dc.regions[start_rid + i].dsmadhandle);
+        out->records[i].flags = ct3d->dc.regions[start_rid + i].flags;
+    }
+    /*
+     * TODO: Assign values once extents and tags are introduced
+     * to use.
+     */
+    stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
+    stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1282,6 +1368,11 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
         cmd_media_clear_poison, 72, 0 },
 };
 
+static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
+    [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
+        cmd_dcd_get_dyn_cap_config, 2, 0 },
+};
+
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
     [INFOSTAT][IS_IDENTIFY] = { "IDENTIFY", cmd_infostat_identify, 0, 0 },
     [INFOSTAT][BACKGROUND_OPERATION_STATUS] = { "BACKGROUND_OPERATION_STATUS",
@@ -1487,7 +1578,12 @@ void cxl_initialize_mailbox_swcci(CXLCCI *cci, DeviceState *intf,
 
 void cxl_initialize_mailbox_t3(CXLCCI *cci, DeviceState *d, size_t payload_max)
 {
+    CXLType3Dev *ct3d = CXL_TYPE3(d);
+
     cxl_copy_cci_commands(cci, cxl_cmd_set);
+    if (ct3d->dc.num_regions) {
+        cxl_copy_cci_commands(cci, cxl_cmd_set_dcd);
+    }
     cci->d = d;
 
     /* No separation for PCI MB as protocol handled in PCI device */
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 21/46] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 20/46] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 22/46] hw/mem/cxl_type3: Add support to create DC regions to " Michael S. Tsirkin
                   ` (26 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Rename mem_size as static_mem_size for type3 memdev to cover static RAM and
pmem capacity, preparing for the introduction of dynamic capacity to support
dynamic capacity devices.

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-6-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h | 2 +-
 hw/cxl/cxl-mailbox-utils.c  | 4 ++--
 hw/mem/cxl_type3.c          | 8 ++++----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index e839370266..f7f56b44e3 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -234,7 +234,7 @@ typedef struct cxl_device_state {
     } timestamp;
 
     /* memory region size, HDM */
-    uint64_t mem_size;
+    uint64_t static_mem_size;
     uint64_t pmem_size;
     uint64_t vmem_size;
 
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index bede28e3c8..b592473587 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -803,7 +803,7 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
     stq_le_p(&id->total_capacity,
-             cxl_dstate->mem_size / CXL_CAPACITY_MULTIPLIER);
+             cxl_dstate->static_mem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->persistent_capacity,
              cxl_dstate->pmem_size / CXL_CAPACITY_MULTIPLIER);
     stq_le_p(&id->volatile_capacity,
@@ -1179,7 +1179,7 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 3e42490b6c..7194c8f902 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -608,7 +608,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostvmem_as, vmr, v_name);
         ct3d->cxl_dstate.vmem_size = memory_region_size(vmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(vmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(vmr);
         g_free(v_name);
     }
 
@@ -631,7 +631,7 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         }
         address_space_init(&ct3d->hostpmem_as, pmr, p_name);
         ct3d->cxl_dstate.pmem_size = memory_region_size(pmr);
-        ct3d->cxl_dstate.mem_size += memory_region_size(pmr);
+        ct3d->cxl_dstate.static_mem_size += memory_region_size(pmr);
         g_free(p_name);
     }
 
@@ -837,7 +837,7 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.mem_size) {
+    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
         return -EINVAL;
     }
 
@@ -1010,7 +1010,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
         return false;
     }
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 22/46] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (20 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 21/46] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 23/46] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument Michael S. Tsirkin
                   ` (25 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron,
	Li Zhijian

From: Fan Ni <fan.ni@samsung.com>

With the change, when setting up memory for type3 memory device, we can
create DC regions.
A property 'num-dc-regions' is added to ct3_props to allow users to pass the
number of DC regions to create. To make it easier, other region parameters
like region base, length, and block size are hard coded. If needed,
these parameters can be added easily.

With the change, we can create DC regions with proper kernel side
support like below:

region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region
echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
echo 1 > /sys/bus/cxl/devices/$region/interleave_ways

echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size

echo 0x40000000 > /sys/bus/cxl/devices/$region/size
echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
echo 1 > /sys/bus/cxl/devices/$region/commit
echo $region > /sys/bus/cxl/drivers/cxl_region/bind

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-7-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
---
 hw/mem/cxl_type3.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 7194c8f902..06c6f9bb78 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -30,6 +30,7 @@
 #include "hw/pci/msix.h"
 
 #define DWORD_BYTE 4
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 
 /* Default CDAT entries for a memory region */
 enum {
@@ -567,6 +568,50 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
     }
 }
 
+/*
+ * TODO: dc region configuration will be updated once host backend and address
+ * space support is added for DCD.
+ */
+static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
+{
+    int i;
+    uint64_t region_base = 0;
+    uint64_t region_len =  2 * GiB;
+    uint64_t decode_len = 2 * GiB;
+    uint64_t blk_size = 2 * MiB;
+    CXLDCRegion *region;
+    MemoryRegion *mr;
+
+    if (ct3d->hostvmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostvmem);
+        region_base += memory_region_size(mr);
+    }
+    if (ct3d->hostpmem) {
+        mr = host_memory_backend_get_memory(ct3d->hostpmem);
+        region_base += memory_region_size(mr);
+    }
+    if (region_base % CXL_CAPACITY_MULTIPLIER != 0) {
+        error_setg(errp, "DC region base not aligned to 0x%lx",
+                   CXL_CAPACITY_MULTIPLIER);
+        return false;
+    }
+
+    for (i = 0, region = &ct3d->dc.regions[0];
+         i < ct3d->dc.num_regions;
+         i++, region++, region_base += region_len) {
+        *region = (CXLDCRegion) {
+            .base = region_base,
+            .decode_len = decode_len,
+            .len = region_len,
+            .block_size = blk_size,
+            /* dsmad_handle set when creating CDAT table entries */
+            .flags = 0,
+        };
+    }
+
+    return true;
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -635,6 +680,13 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    if (ct3d->dc.num_regions > 0) {
+        if (!cxl_create_dc_regions(ct3d, errp)) {
+            error_append_hint(errp, "setup DC regions failed");
+            return false;
+        }
+    }
+
     return true;
 }
 
@@ -930,6 +982,7 @@ static Property ct3_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
+    DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 23/46] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (21 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 22/46] hw/mem/cxl_type3: Add support to create DC regions to " Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 24/46] hw/mem/cxl_type3: Add host backend and address space handling for DC regions Michael S. Tsirkin
                   ` (24 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

The function ct3_build_cdat_entries_for_mr only uses size of the passed
memory region argument, refactor the function definition to make the passed
arguments more specific.

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-8-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 06c6f9bb78..51be50ce87 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -44,7 +44,7 @@ enum {
 };
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
-                                          int dsmad_handle, MemoryRegion *mr,
+                                          int dsmad_handle, uint64_t size,
                                           bool is_pmem, uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
@@ -63,7 +63,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
         .DSMADhandle = dsmad_handle,
         .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
         .DPA_base = dpa_base,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* For now, no memory side cache, plausiblish numbers */
@@ -132,7 +132,7 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
          */
         .EFI_memory_type_attr = is_pmem ? 2 : 1,
         .DPA_offset = 0,
-        .DPA_length = memory_region_size(mr),
+        .DPA_length = size,
     };
 
     /* Header always at start of structure */
@@ -149,6 +149,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
@@ -163,6 +164,7 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        vmr_size = memory_region_size(volatile_mr);
     }
 
     if (ct3d->hostpmem) {
@@ -171,21 +173,22 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
             return -EINVAL;
         }
         len += CT3_CDAT_NUM_ENTRIES;
+        pmr_size = memory_region_size(nonvolatile_mr);
     }
 
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
-        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, volatile_mr,
+        ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
                                       false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
-        uint64_t base = volatile_mr ? memory_region_size(volatile_mr) : 0;
+        uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      nonvolatile_mr, true, base);
+                                      pmr_size, true, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
     assert(len == cur_ent);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 24/46] hw/mem/cxl_type3: Add host backend and address space handling for DC regions
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (22 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 23/46] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 25/46] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support Michael S. Tsirkin
                   ` (23 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Gregory Price, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Add (file/memory backed) host backend for DCD. All the dynamic capacity
regions will share a single, large enough host backend. Set up address
space for DC regions to support read/write operations to dynamic capacity
for DCD.

With the change, the following support is added:
1. Add a new property to type3 device "volatile-dc-memdev" to point to host
   memory backend for dynamic capacity. Currently, all DC regions share one
   host backend;
2. Add namespace for dynamic capacity for read/write support;
3. Create cdat entries for each dynamic capacity region.

Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-9-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |   8 ++
 hw/cxl/cxl-mailbox-utils.c  |  16 +++-
 hw/mem/cxl_type3.c          | 177 +++++++++++++++++++++++++++++-------
 3 files changed, 164 insertions(+), 37 deletions(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index f7f56b44e3..c2c3df0d2a 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -467,6 +467,14 @@ struct CXLType3Dev {
     uint64_t poison_list_overflow_ts;
 
     struct dynamic_capacity {
+        HostMemoryBackend *host_dc;
+        AddressSpace host_dc_as;
+        /*
+         * total_capacity is equivalent to the dynamic capability
+         * memory region size.
+         */
+        uint64_t total_capacity; /* 256M aligned */
+
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
     } dc;
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index b592473587..6ad227f112 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -622,7 +622,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
                                                size_t *len_out,
                                                CXLCCI *cci)
 {
-    CXLDeviceState *cxl_dstate = &CXL_TYPE3(cci->d)->cxl_dstate;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
     struct {
         uint8_t slots_supported;
         uint8_t slot_info;
@@ -636,7 +637,8 @@ static CXLRetCode cmd_firmware_update_get_info(const struct cxl_cmd *cmd,
     QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
     if ((cxl_dstate->vmem_size < CXL_CAPACITY_MULTIPLIER) ||
-        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER)) {
+        (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) ||
+        (ct3d->dc.total_capacity < CXL_CAPACITY_MULTIPLIER)) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -793,7 +795,8 @@ static CXLRetCode cmd_identify_memory_device(const struct cxl_cmd *cmd,
     CXLDeviceState *cxl_dstate = &ct3d->cxl_dstate;
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -835,9 +838,11 @@ static CXLRetCode cmd_ccls_get_partition_info(const struct cxl_cmd *cmd,
         uint64_t next_pmem;
     } QEMU_PACKED *part_info = (void *)payload_out;
     QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+    CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
 
     if ((!QEMU_IS_ALIGNED(cxl_dstate->vmem_size, CXL_CAPACITY_MULTIPLIER)) ||
-        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER))) {
+        (!QEMU_IS_ALIGNED(cxl_dstate->pmem_size, CXL_CAPACITY_MULTIPLIER)) ||
+        (!QEMU_IS_ALIGNED(ct3d->dc.total_capacity, CXL_CAPACITY_MULTIPLIER))) {
         return CXL_MBOX_INTERNAL_ERROR;
     }
 
@@ -1179,7 +1184,8 @@ static CXLRetCode cmd_media_clear_poison(const struct cxl_cmd *cmd,
     struct clear_poison_pl *in = (void *)payload_in;
 
     dpa = ldq_le_p(&in->dpa);
-    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size) {
+    if (dpa + CXL_CACHE_LINE_SIZE > cxl_dstate->static_mem_size +
+        ct3d->dc.total_capacity) {
         return CXL_MBOX_INVALID_PA;
     }
 
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 51be50ce87..658570aa1a 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -45,7 +45,8 @@ enum {
 
 static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
                                           int dsmad_handle, uint64_t size,
-                                          bool is_pmem, uint64_t dpa_base)
+                                          bool is_pmem, bool is_dynamic,
+                                          uint64_t dpa_base)
 {
     CDATDsmas *dsmas;
     CDATDslbis *dslbis0;
@@ -61,7 +62,8 @@ static void ct3_build_cdat_entries_for_mr(CDATSubHeader **cdat_table,
             .length = sizeof(*dsmas),
         },
         .DSMADhandle = dsmad_handle,
-        .flags = is_pmem ? CDAT_DSMAS_FLAG_NV : 0,
+        .flags = (is_pmem ? CDAT_DSMAS_FLAG_NV : 0) |
+                 (is_dynamic ? CDAT_DSMAS_FLAG_DYNAMIC_CAP : 0),
         .DPA_base = dpa_base,
         .DPA_length = size,
     };
@@ -149,12 +151,13 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
     g_autofree CDATSubHeader **table = NULL;
     CXLType3Dev *ct3d = priv;
     MemoryRegion *volatile_mr = NULL, *nonvolatile_mr = NULL;
+    MemoryRegion *dc_mr = NULL;
     uint64_t vmr_size = 0, pmr_size = 0;
     int dsmad_handle = 0;
     int cur_ent = 0;
     int len = 0;
 
-    if (!ct3d->hostpmem && !ct3d->hostvmem) {
+    if (!ct3d->hostpmem && !ct3d->hostvmem && !ct3d->dc.num_regions) {
         return 0;
     }
 
@@ -176,21 +179,54 @@ static int ct3_build_cdat_table(CDATSubHeader ***cdat_table, void *priv)
         pmr_size = memory_region_size(nonvolatile_mr);
     }
 
+    if (ct3d->dc.num_regions) {
+        if (!ct3d->dc.host_dc) {
+            return -EINVAL;
+        }
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            return -EINVAL;
+        }
+        len += CT3_CDAT_NUM_ENTRIES * ct3d->dc.num_regions;
+    }
+
     table = g_malloc0(len * sizeof(*table));
 
     /* Now fill them in */
     if (volatile_mr) {
         ct3_build_cdat_entries_for_mr(table, dsmad_handle++, vmr_size,
-                                      false, 0);
+                                      false, false, 0);
         cur_ent = CT3_CDAT_NUM_ENTRIES;
     }
 
     if (nonvolatile_mr) {
         uint64_t base = vmr_size;
         ct3_build_cdat_entries_for_mr(&(table[cur_ent]), dsmad_handle++,
-                                      pmr_size, true, base);
+                                      pmr_size, true, false, base);
         cur_ent += CT3_CDAT_NUM_ENTRIES;
     }
+
+    if (dc_mr) {
+        int i;
+        uint64_t region_base = vmr_size + pmr_size;
+
+        /*
+         * We assume the dynamic capacity to be volatile for now.
+         * Non-volatile dynamic capacity will be added if needed in the
+         * future.
+         */
+        for (i = 0; i < ct3d->dc.num_regions; i++) {
+            ct3_build_cdat_entries_for_mr(&(table[cur_ent]),
+                                          dsmad_handle++,
+                                          ct3d->dc.regions[i].len,
+                                          false, true, region_base);
+            ct3d->dc.regions[i].dsmadhandle = dsmad_handle - 1;
+
+            cur_ent += CT3_CDAT_NUM_ENTRIES;
+            region_base += ct3d->dc.regions[i].len;
+        }
+    }
+
     assert(len == cur_ent);
 
     *cdat_table = g_steal_pointer(&table);
@@ -301,10 +337,17 @@ static void build_dvsecs(CXLType3Dev *ct3d)
             range2_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                              (ct3d->hostpmem->size & 0xF0000000);
         }
-    } else {
+    } else if (ct3d->hostpmem) {
         range1_size_hi = ct3d->hostpmem->size >> 32;
         range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
                          (ct3d->hostpmem->size & 0xF0000000);
+    } else {
+        /*
+         * For DCD with no static memory, set memory active, memory class bits.
+         * No range is set.
+         */
+        range1_size_hi = 0;
+        range1_size_lo = (2 << 5) | (2 << 2) | 0x3;
     }
 
     dvsec = (uint8_t *)&(CXLDVSECDevice){
@@ -579,11 +622,29 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 {
     int i;
     uint64_t region_base = 0;
-    uint64_t region_len =  2 * GiB;
-    uint64_t decode_len = 2 * GiB;
+    uint64_t region_len;
+    uint64_t decode_len;
     uint64_t blk_size = 2 * MiB;
     CXLDCRegion *region;
     MemoryRegion *mr;
+    uint64_t dc_size;
+
+    mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+    dc_size = memory_region_size(mr);
+    region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions);
+
+    if (dc_size % (ct3d->dc.num_regions * CXL_CAPACITY_MULTIPLIER) != 0) {
+        error_setg(errp,
+                   "backend size is not multiple of region len: 0x%" PRIx64,
+                   region_len);
+        return false;
+    }
+    if (region_len % CXL_CAPACITY_MULTIPLIER != 0) {
+        error_setg(errp, "DC region size is unaligned to 0x%" PRIx64,
+                   CXL_CAPACITY_MULTIPLIER);
+        return false;
+    }
+    decode_len = region_len;
 
     if (ct3d->hostvmem) {
         mr = host_memory_backend_get_memory(ct3d->hostvmem);
@@ -594,7 +655,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         region_base += memory_region_size(mr);
     }
     if (region_base % CXL_CAPACITY_MULTIPLIER != 0) {
-        error_setg(errp, "DC region base not aligned to 0x%lx",
+        error_setg(errp, "DC region base not aligned to 0x%" PRIx64,
                    CXL_CAPACITY_MULTIPLIER);
         return false;
     }
@@ -610,6 +671,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             /* dsmad_handle set when creating CDAT table entries */
             .flags = 0,
         };
+        ct3d->dc.total_capacity += region->len;
     }
 
     return true;
@@ -619,7 +681,8 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
 
-    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem) {
+    if (!ct3d->hostmem && !ct3d->hostvmem && !ct3d->hostpmem
+        && !ct3d->dc.num_regions) {
         error_setg(errp, "at least one memdev property must be set");
         return false;
     } else if (ct3d->hostmem && ct3d->hostpmem) {
@@ -683,7 +746,37 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
         g_free(p_name);
     }
 
+    ct3d->dc.total_capacity = 0;
     if (ct3d->dc.num_regions > 0) {
+        MemoryRegion *dc_mr;
+        char *dc_name;
+
+        if (!ct3d->dc.host_dc) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        if (!dc_mr) {
+            error_setg(errp, "dynamic capacity must have a backing device");
+            return false;
+        }
+
+        /*
+         * Set DC regions as volatile for now, non-volatile support can
+         * be added in the future if needed.
+         */
+        memory_region_set_nonvolatile(dc_mr, false);
+        memory_region_set_enabled(dc_mr, true);
+        host_memory_backend_set_mapped(ct3d->dc.host_dc, true);
+        if (ds->id) {
+            dc_name = g_strdup_printf("cxl-dcd-dpa-dc-space:%s", ds->id);
+        } else {
+            dc_name = g_strdup("cxl-dcd-dpa-dc-space");
+        }
+        address_space_init(&ct3d->dc.host_dc_as, dc_mr, dc_name);
+        g_free(dc_name);
+
         if (!cxl_create_dc_regions(ct3d, errp)) {
             error_append_hint(errp, "setup DC regions failed");
             return false;
@@ -779,6 +872,9 @@ err_release_cdat:
 err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -797,6 +893,9 @@ static void ct3_exit(PCIDevice *pci_dev)
     pcie_aer_exit(pci_dev);
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
+    if (ct3d->dc.host_dc) {
+        address_space_destroy(&ct3d->dc.host_dc_as);
+    }
     if (ct3d->hostpmem) {
         address_space_destroy(&ct3d->hostpmem_as);
     }
@@ -875,16 +974,23 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
                                        AddressSpace **as,
                                        uint64_t *dpa_offset)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
+    }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return -ENODEV;
     }
 
@@ -892,19 +998,18 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         return -EINVAL;
     }
 
-    if (*dpa_offset > ct3d->cxl_dstate.static_mem_size) {
+    if (*dpa_offset >= vmr_size + pmr_size + dc_size) {
         return -EINVAL;
     }
 
-    if (vmr) {
-        if (*dpa_offset < memory_region_size(vmr)) {
-            *as = &ct3d->hostvmem_as;
-        } else {
-            *as = &ct3d->hostpmem_as;
-            *dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (*dpa_offset < vmr_size) {
+        *as = &ct3d->hostvmem_as;
+    } else if (*dpa_offset < vmr_size + pmr_size) {
         *as = &ct3d->hostpmem_as;
+        *dpa_offset -= vmr_size;
+    } else {
+        *as = &ct3d->dc.host_dc_as;
+        *dpa_offset -= (vmr_size + pmr_size);
     }
 
     return 0;
@@ -986,6 +1091,8 @@ static Property ct3_props[] = {
     DEFINE_PROP_UINT64("sn", CXLType3Dev, sn, UI64_NULL),
     DEFINE_PROP_STRING("cdat", CXLType3Dev, cxl_cstate.cdat.filename),
     DEFINE_PROP_UINT8("num-dc-regions", CXLType3Dev, dc.num_regions, 0),
+    DEFINE_PROP_LINK("volatile-dc-memdev", CXLType3Dev, dc.host_dc,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1052,33 +1159,39 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, uint64_t size,
 
 static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
 {
-    MemoryRegion *vmr = NULL, *pmr = NULL;
+    MemoryRegion *vmr = NULL, *pmr = NULL, *dc_mr = NULL;
     AddressSpace *as;
+    uint64_t vmr_size = 0, pmr_size = 0, dc_size = 0;
 
     if (ct3d->hostvmem) {
         vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+        vmr_size = memory_region_size(vmr);
     }
     if (ct3d->hostpmem) {
         pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+        pmr_size = memory_region_size(pmr);
     }
+    if (ct3d->dc.host_dc) {
+        dc_mr = host_memory_backend_get_memory(ct3d->dc.host_dc);
+        dc_size = memory_region_size(dc_mr);
+     }
 
-    if (!vmr && !pmr) {
+    if (!vmr && !pmr && !dc_mr) {
         return false;
     }
 
-    if (dpa_offset + CXL_CACHE_LINE_SIZE > ct3d->cxl_dstate.static_mem_size) {
+    if (dpa_offset + CXL_CACHE_LINE_SIZE > vmr_size + pmr_size + dc_size) {
         return false;
     }
 
-    if (vmr) {
-        if (dpa_offset < memory_region_size(vmr)) {
-            as = &ct3d->hostvmem_as;
-        } else {
-            as = &ct3d->hostpmem_as;
-            dpa_offset -= memory_region_size(vmr);
-        }
-    } else {
+    if (dpa_offset < vmr_size) {
+        as = &ct3d->hostvmem_as;
+    } else if (dpa_offset < vmr_size + pmr_size) {
         as = &ct3d->hostpmem_as;
+        dpa_offset -= vmr_size;
+    } else {
+        as = &ct3d->dc.host_dc_as;
+        dpa_offset -= (vmr_size + pmr_size);
     }
 
     address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 25/46] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (23 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 24/46] hw/mem/cxl_type3: Add host backend and address space handling for DC regions Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 26/46] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response Michael S. Tsirkin
                   ` (22 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Fan Ni, Svetly Todorov, Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Add dynamic capacity extent list representative to the definition of
CXLType3Dev and implement get DC extent list mailbox command per
CXL.spec.3.1:.8.2.9.9.9.2.

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-10-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h | 22 +++++++++++
 hw/cxl/cxl-mailbox-utils.c  | 73 ++++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3.c          |  1 +
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c2c3df0d2a..6aec6ac983 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -424,6 +424,25 @@ typedef QLIST_HEAD(, CXLPoison) CXLPoisonList;
 
 #define DCD_MAX_NUM_REGION 8
 
+typedef struct CXLDCExtentRaw {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+} QEMU_PACKED CXLDCExtentRaw;
+
+typedef struct CXLDCExtent {
+    uint64_t start_dpa;
+    uint64_t len;
+    uint8_t tag[0x10];
+    uint16_t shared_seq;
+    uint8_t rsvd[0x6];
+
+    QTAILQ_ENTRY(CXLDCExtent) node;
+} CXLDCExtent;
+typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -474,6 +493,9 @@ struct CXLType3Dev {
          * memory region size.
          */
         uint64_t total_capacity; /* 256M aligned */
+        CXLDCExtentList extents;
+        uint32_t total_extent_count;
+        uint32_t ext_list_gen_seq;
 
         uint8_t num_regions; /* 0-8 regions */
         CXLDCRegion regions[DCD_MAX_NUM_REGION];
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 6ad227f112..7872d2f3e6 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -84,6 +84,7 @@ enum {
         #define CLEAR_POISON           0x2
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
+        #define GET_DYN_CAP_EXT_LIST   0x1
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1322,7 +1323,8 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
      * to use.
      */
     stl_le_p(&extra_out->num_extents_supported, CXL_NUM_EXTENTS_SUPPORTED);
-    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED);
+    stl_le_p(&extra_out->num_extents_available, CXL_NUM_EXTENTS_SUPPORTED -
+             ct3d->dc.total_extent_count);
     stl_le_p(&extra_out->num_tags_supported, CXL_NUM_TAGS_SUPPORTED);
     stl_le_p(&extra_out->num_tags_available, CXL_NUM_TAGS_SUPPORTED);
 
@@ -1330,6 +1332,72 @@ static CXLRetCode cmd_dcd_get_dyn_cap_config(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * CXL r3.1 section 8.2.9.9.9.2:
+ * Get Dynamic Capacity Extent List (Opcode 4801h)
+ */
+static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
+                                               uint8_t *payload_in,
+                                               size_t len_in,
+                                               uint8_t *payload_out,
+                                               size_t *len_out,
+                                               CXLCCI *cci)
+{
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    struct {
+        uint32_t extent_cnt;
+        uint32_t start_extent_id;
+    } QEMU_PACKED *in = (void *)payload_in;
+    struct {
+        uint32_t count;
+        uint32_t total_extents;
+        uint32_t generation_num;
+        uint8_t rsvd[4];
+        CXLDCExtentRaw records[];
+    } QEMU_PACKED *out = (void *)payload_out;
+    uint32_t start_extent_id = in->start_extent_id;
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint16_t record_count = 0, i = 0, record_done = 0;
+    uint16_t out_pl_len, size;
+    CXLDCExtent *ent;
+
+    if (start_extent_id > ct3d->dc.total_extent_count) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    record_count = MIN(in->extent_cnt,
+                       ct3d->dc.total_extent_count - start_extent_id);
+    size = CXL_MAILBOX_MAX_PAYLOAD_SIZE - sizeof(*out);
+    record_count = MIN(record_count, size / sizeof(out->records[0]));
+    out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
+
+    stl_le_p(&out->count, record_count);
+    stl_le_p(&out->total_extents, ct3d->dc.total_extent_count);
+    stl_le_p(&out->generation_num, ct3d->dc.ext_list_gen_seq);
+
+    if (record_count > 0) {
+        CXLDCExtentRaw *out_rec = &out->records[record_done];
+
+        QTAILQ_FOREACH(ent, extent_list, node) {
+            if (i++ < start_extent_id) {
+                continue;
+            }
+            stq_le_p(&out_rec->start_dpa, ent->start_dpa);
+            stq_le_p(&out_rec->len, ent->len);
+            memcpy(&out_rec->tag, ent->tag, 0x10);
+            stw_le_p(&out_rec->shared_seq, ent->shared_seq);
+
+            record_done++;
+            if (record_done == record_count) {
+                break;
+            }
+        }
+    }
+
+    *len_out = out_pl_len;
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1377,6 +1445,9 @@ static const struct cxl_cmd cxl_cmd_set[256][256] = {
 static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DC_CONFIG] = { "DCD_GET_DC_CONFIG",
         cmd_dcd_get_dyn_cap_config, 2, 0 },
+    [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
+        "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
+        8, 0 },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 658570aa1a..2075846b1b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -673,6 +673,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         };
         ct3d->dc.total_capacity += region->len;
     }
+    QTAILQ_INIT(&ct3d->dc.extents);
 
     return true;
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 26/46] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (24 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 25/46] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 27/46] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents Michael S. Tsirkin
                   ` (21 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Svetly Todorov, Gregory Price,
	Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Per CXL spec 3.1, two mailbox commands are implemented:
Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and
Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4.

For the process of the above two commands, we use two-pass approach.
Pass 1: Check whether the input payload is valid or not; if not, skip
        Pass 2 and return mailbox process error.
Pass 2: Do the real work--add or release extents, respectively.

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-11-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |   4 +
 hw/cxl/cxl-mailbox-utils.c  | 394 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  11 +
 3 files changed, 409 insertions(+)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 6aec6ac983..df3511e91b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -551,4 +551,8 @@ void cxl_event_irq_assert(CXLType3Dev *ct3d);
 
 void cxl_set_poison_list_overflowed(CXLType3Dev *ct3d);
 
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent);
 #endif
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 7872d2f3e6..e322407fb3 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -19,6 +19,7 @@
 #include "qemu/units.h"
 #include "qemu/uuid.h"
 #include "sysemu/hostmem.h"
+#include "qemu/range.h"
 
 #define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
 #define CXL_DC_EVENT_LOG_SIZE 8
@@ -85,6 +86,8 @@ enum {
     DCD_CONFIG  = 0x48,
         #define GET_DC_CONFIG          0x0
         #define GET_DYN_CAP_EXT_LIST   0x1
+        #define ADD_DYN_CAP_RSP        0x2
+        #define RELEASE_DYN_CAP        0x3
     PHYSICAL_SWITCH = 0x51,
         #define IDENTIFY_SWITCH_DEVICE      0x0
         #define GET_PHYSICAL_PORT_STATE     0x1
@@ -1398,6 +1401,391 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
     return CXL_MBOX_SUCCESS;
 }
 
+/*
+ * Check whether any bit between addr[nr, nr+size) is set,
+ * return true if any bit is set, otherwise return false
+ */
+static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                              unsigned long size)
+{
+    unsigned long res = find_next_bit(addr, size + nr, nr);
+
+    return res < nr + size;
+}
+
+CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
+{
+    int i;
+    CXLDCRegion *region = &ct3d->dc.regions[0];
+
+    if (dpa < region->base ||
+        dpa >= region->base + ct3d->dc.total_capacity) {
+        return NULL;
+    }
+
+    /*
+     * CXL r3.1 section 9.13.3: Dynamic Capacity Device (DCD)
+     *
+     * Regions are used in increasing-DPA order, with Region 0 being used for
+     * the lowest DPA of Dynamic Capacity and Region 7 for the highest DPA.
+     * So check from the last region to find where the dpa belongs. Extents that
+     * cross multiple regions are not allowed.
+     */
+    for (i = ct3d->dc.num_regions - 1; i >= 0; i--) {
+        region = &ct3d->dc.regions[i];
+        if (dpa >= region->base) {
+            if (dpa + len > region->base + region->len) {
+                return NULL;
+            }
+            return region;
+        }
+    }
+
+    return NULL;
+}
+
+static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+                                             uint64_t dpa,
+                                             uint64_t len,
+                                             uint8_t *tag,
+                                             uint16_t shared_seq)
+{
+    CXLDCExtent *extent;
+
+    extent = g_new0(CXLDCExtent, 1);
+    extent->start_dpa = dpa;
+    extent->len = len;
+    if (tag) {
+        memcpy(extent->tag, tag, 0x10);
+    }
+    extent->shared_seq = shared_seq;
+
+    QTAILQ_INSERT_TAIL(list, extent, node);
+}
+
+void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
+                                        CXLDCExtent *extent)
+{
+    QTAILQ_REMOVE(list, extent, node);
+    g_free(extent);
+}
+
+/*
+ * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
+ * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
+ */
+typedef struct CXLUpdateDCExtentListInPl {
+    uint32_t num_entries_updated;
+    uint8_t flags;
+    uint8_t rsvd[3];
+    /* CXL r3.1 Table 8-169: Updated Extent */
+    struct {
+        uint64_t start_dpa;
+        uint64_t len;
+        uint8_t rsvd[8];
+    } QEMU_PACKED updated_entries[];
+} QEMU_PACKED CXLUpdateDCExtentListInPl;
+
+/*
+ * For the extents in the extent list to operate, check whether they are valid
+ * 1. The extent should be in the range of a valid DC region;
+ * 2. The extent should not cross multiple regions;
+ * 3. The start DPA and the length of the extent should align with the block
+ * size of the region;
+ * 4. The address range of multiple extents in the list should not overlap.
+ */
+static CXLRetCode cxl_detect_malformed_extent_list(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint64_t min_block_size = UINT64_MAX;
+    CXLDCRegion *region;
+    CXLDCRegion *lastregion = &ct3d->dc.regions[ct3d->dc.num_regions - 1];
+    g_autofree unsigned long *blk_bitmap = NULL;
+    uint64_t dpa, len;
+    uint32_t i;
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        min_block_size = MIN(min_block_size, region->block_size);
+    }
+
+    blk_bitmap = bitmap_new((lastregion->base + lastregion->len -
+                             ct3d->dc.regions[0].base) / min_block_size);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        region = cxl_find_dc_region(ct3d, dpa, len);
+        if (!region) {
+            return CXL_MBOX_INVALID_PA;
+        }
+
+        dpa -= ct3d->dc.regions[0].base;
+        if (dpa % region->block_size || len % region->block_size) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        /* the dpa range already covered by some other extents in the list */
+        if (test_any_bits_set(blk_bitmap, dpa / min_block_size,
+            len / min_block_size)) {
+            return CXL_MBOX_INVALID_EXTENT_LIST;
+        }
+        bitmap_set(blk_bitmap, dpa / min_block_size, len / min_block_size);
+   }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in)
+{
+    uint32_t i;
+    CXLDCExtent *ent;
+    uint64_t dpa, len;
+    Range range1, range2;
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        range_init_nofail(&range1, dpa, len);
+
+        /*
+         * TODO: once the pending extent list is added, check against
+         * the list will be added here.
+         */
+
+        /* to-be-added range should not overlap with range already accepted */
+        QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
+            range_init_nofail(&range2, ent->start_dpa, ent->len);
+            if (range_overlaps_range(&range1, &range2)) {
+                return CXL_MBOX_INVALID_PA;
+            }
+        }
+    }
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.3: Add Dynamic Capacity Response (Opcode 4802h)
+ * An extent is added to the extent list and becomes usable only after the
+ * response is processed successfully.
+ */
+static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList *extent_list = &ct3d->dc.extents;
+    uint32_t i;
+    uint64_t dpa, len;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        /*
+         * TODO: once the pending list is introduced, extents in the beginning
+         * will get wiped out.
+         */
+        return CXL_MBOX_SUCCESS;
+    }
+
+    /* Adding extents causes exceeding device's extent tracking ability. */
+    if (in->num_entries_updated + ct3d->dc.total_extent_count >
+        CXL_NUM_EXTENTS_SUPPORTED) {
+        return CXL_MBOX_RESOURCES_EXHAUSTED;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dcd_add_dyn_cap_rsp_dry_run(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
+        ct3d->dc.total_extent_count += 1;
+        /*
+         * TODO: we will add a pending extent list based on event log record
+         * and process the list accordingly here.
+         */
+    }
+
+    return CXL_MBOX_SUCCESS;
+}
+
+/*
+ * Copy extent list from src to dst
+ * Return value: number of extents copied
+ */
+static uint32_t copy_extent_list(CXLDCExtentList *dst,
+                                 const CXLDCExtentList *src)
+{
+    uint32_t cnt = 0;
+    CXLDCExtent *ent;
+
+    if (!dst || !src) {
+        return 0;
+    }
+
+    QTAILQ_FOREACH(ent, src, node) {
+        cxl_insert_extent_to_extent_list(dst, ent->start_dpa, ent->len,
+                                         ent->tag, ent->shared_seq);
+        cnt++;
+    }
+    return cnt;
+}
+
+static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
+        const CXLUpdateDCExtentListInPl *in, CXLDCExtentList *updated_list,
+        uint32_t *updated_list_size)
+{
+    CXLDCExtent *ent, *ent_next;
+    uint64_t dpa, len;
+    uint32_t i;
+    int cnt_delta = 0;
+    CXLRetCode ret = CXL_MBOX_SUCCESS;
+
+    QTAILQ_INIT(updated_list);
+    copy_extent_list(updated_list, &ct3d->dc.extents);
+
+    for (i = 0; i < in->num_entries_updated; i++) {
+        Range range;
+
+        dpa = in->updated_entries[i].start_dpa;
+        len = in->updated_entries[i].len;
+
+        while (len > 0) {
+            QTAILQ_FOREACH(ent, updated_list, node) {
+                range_init_nofail(&range, ent->start_dpa, ent->len);
+
+                if (range_contains(&range, dpa)) {
+                    uint64_t len1, len2 = 0, len_done = 0;
+                    uint64_t ent_start_dpa = ent->start_dpa;
+                    uint64_t ent_len = ent->len;
+
+                    len1 = dpa - ent->start_dpa;
+                    /* Found the extent or the subset of an existing extent */
+                    if (range_contains(&range, dpa + len - 1)) {
+                        len2 = ent_start_dpa + ent_len - dpa - len;
+                    } else {
+                        /*
+                         * TODO: we reject the attempt to remove an extent
+                         * that overlaps with multiple extents in the device
+                         * for now. We will allow it once superset release
+                         * support is added.
+                         */
+                        ret = CXL_MBOX_INVALID_PA;
+                        goto free_and_exit;
+                    }
+                    len_done = ent_len - len1 - len2;
+
+                    cxl_remove_extent_from_extent_list(updated_list, ent);
+                    cnt_delta--;
+
+                    if (len1) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         ent_start_dpa,
+                                                         len1, NULL, 0);
+                        cnt_delta++;
+                    }
+                    if (len2) {
+                        cxl_insert_extent_to_extent_list(updated_list,
+                                                         dpa + len,
+                                                         len2, NULL, 0);
+                        cnt_delta++;
+                    }
+
+                    if (cnt_delta + ct3d->dc.total_extent_count >
+                            CXL_NUM_EXTENTS_SUPPORTED) {
+                        ret = CXL_MBOX_RESOURCES_EXHAUSTED;
+                        goto free_and_exit;
+                    }
+
+                    len -= len_done;
+                    /* len == 0 here until superset release is added */
+                    break;
+                }
+            }
+            if (len) {
+                ret = CXL_MBOX_INVALID_PA;
+                goto free_and_exit;
+            }
+        }
+    }
+free_and_exit:
+    if (ret != CXL_MBOX_SUCCESS) {
+        QTAILQ_FOREACH_SAFE(ent, updated_list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(updated_list, ent);
+        }
+        *updated_list_size = 0;
+    } else {
+        *updated_list_size = ct3d->dc.total_extent_count + cnt_delta;
+    }
+
+    return ret;
+}
+
+/*
+ * CXL r3.1 section 8.2.9.9.9.4: Release Dynamic Capacity (Opcode 4803h)
+ */
+static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
+                                          uint8_t *payload_in,
+                                          size_t len_in,
+                                          uint8_t *payload_out,
+                                          size_t *len_out,
+                                          CXLCCI *cci)
+{
+    CXLUpdateDCExtentListInPl *in = (void *)payload_in;
+    CXLType3Dev *ct3d = CXL_TYPE3(cci->d);
+    CXLDCExtentList updated_list;
+    CXLDCExtent *ent, *ent_next;
+    uint32_t updated_list_size;
+    CXLRetCode ret;
+
+    if (in->num_entries_updated == 0) {
+        return CXL_MBOX_INVALID_INPUT;
+    }
+
+    ret = cxl_detect_malformed_extent_list(ct3d, in);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    ret = cxl_dc_extent_release_dry_run(ct3d, in, &updated_list,
+                                        &updated_list_size);
+    if (ret != CXL_MBOX_SUCCESS) {
+        return ret;
+    }
+
+    /*
+     * If the dry run release passes, the returned updated_list will
+     * be the updated extent list and we just need to clear the extents
+     * in the accepted list and copy extents in the updated_list to accepted
+     * list and update the extent count;
+     */
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+    copy_extent_list(&ct3d->dc.extents, &updated_list);
+    QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&updated_list, ent);
+    }
+    ct3d->dc.total_extent_count = updated_list_size;
+
+    return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
@@ -1448,6 +1836,12 @@ static const struct cxl_cmd cxl_cmd_set_dcd[256][256] = {
     [DCD_CONFIG][GET_DYN_CAP_EXT_LIST] = {
         "DCD_GET_DYNAMIC_CAPACITY_EXTENT_LIST", cmd_dcd_get_dyn_cap_ext_list,
         8, 0 },
+    [DCD_CONFIG][ADD_DYN_CAP_RSP] = {
+        "DCD_ADD_DYNAMIC_CAPACITY_RESPONSE", cmd_dcd_add_dyn_cap_rsp,
+        ~0, IMMEDIATE_DATA_CHANGE },
+    [DCD_CONFIG][RELEASE_DYN_CAP] = {
+        "DCD_RELEASE_DYNAMIC_CAPACITY", cmd_dcd_release_dyn_cap,
+        ~0, IMMEDIATE_DATA_CHANGE },
 };
 
 static const struct cxl_cmd cxl_cmd_set_sw[256][256] = {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 2075846b1b..db5191b3b7 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -678,6 +678,15 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
     return true;
 }
 
+static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
+{
+    CXLDCExtent *ent, *ent_next;
+
+    QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
+    }
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
     DeviceState *ds = DEVICE(ct3d);
@@ -874,6 +883,7 @@ err_free_special_ops:
     g_free(regs->special_ops);
 err_address_space_free:
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
@@ -895,6 +905,7 @@ static void ct3_exit(PCIDevice *pci_dev)
     cxl_doe_cdat_release(cxl_cstate);
     g_free(regs->special_ops);
     if (ct3d->dc.host_dc) {
+        cxl_destroy_dc_regions(ct3d);
         address_space_destroy(&ct3d->dc.host_dc_as);
     }
     if (ct3d->hostpmem) {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 27/46] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (25 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 26/46] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 28/46] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions Michael S. Tsirkin
                   ` (20 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Svetly Todorov, Gregory Price,
	Jonathan Cameron, Eric Blake, Markus Armbruster

From: Fan Ni <fan.ni@samsung.com>

To simulate FM functionalities for initiating Dynamic Capacity Add
(Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
add/release dynamic capacity extents requests.

With the change, we allow to release an extent only when its DPA range
is contained by a single accepted extent in the device. That is to say,
extent superset release is not supported yet.

1. Add dynamic capacity extents:

For example, the command to add two continuous extents (each 128MiB long)
to region 0 (starting at DPA offset 0) looks like below:

{ "execute": "qmp_capabilities" }

{ "execute": "cxl-add-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "selection-policy": "prescriptive",
      "region": 0,
      "extents": [
      {
          "offset": 0,
          "len": 134217728
      },
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

2. Release dynamic capacity extents:

For example, the command to release an extent of size 128MiB from region 0
(DPA offset 128MiB) looks like below:

{ "execute": "cxl-release-dynamic-capacity",
  "arguments": {
      "path": "/machine/peripheral/cxl-dcd0",
      "host-id": 0,
      "removal-policy":"prescriptive",
      "region": 0,
      "extents": [
      {
          "offset": 134217728,
          "len": 134217728
      }
      ]
  }
}

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-12-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 qapi/cxl.json               | 143 +++++++++++++++++
 include/hw/cxl/cxl_device.h |  22 +++
 include/hw/cxl/cxl_events.h |  18 +++
 hw/cxl/cxl-mailbox-utils.c  |  62 ++++++--
 hw/mem/cxl_type3.c          | 306 +++++++++++++++++++++++++++++++++++-
 hw/mem/cxl_type3_stubs.c    |  25 +++
 6 files changed, 563 insertions(+), 13 deletions(-)

diff --git a/qapi/cxl.json b/qapi/cxl.json
index 4281726dec..57d9f82014 100644
--- a/qapi/cxl.json
+++ b/qapi/cxl.json
@@ -361,3 +361,146 @@
 ##
 {'command': 'cxl-inject-correctable-error',
  'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
+
+##
+# @CXLDynamicCapacityExtent:
+#
+# A single dynamic capacity extent
+#
+# @offset: The offset (in bytes) to the start of the region
+#     where the extent belongs to.
+#
+# @len: The length of the extent in bytes.
+#
+# Since: 9.1
+##
+{ 'struct': 'CXLDynamicCapacityExtent',
+  'data': {
+      'offset':'uint64',
+      'len': 'uint64'
+  }
+}
+
+##
+# @CXLExtSelPolicy:
+#
+# The policy to use for selecting which extents comprise the added
+# capacity, as defined in cxl spec r3.1 Table 7-70.
+#
+# @free: 0h = Free
+#
+# @contiguous: 1h = Continuous
+#
+# @prescriptive: 2h = Prescriptive
+#
+# @enable-shared-access: 3h = Enable Shared Access
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtSelPolicy',
+  'data': ['free',
+           'contiguous',
+           'prescriptive',
+           'enable-shared-access']
+}
+
+##
+# @cxl-add-dynamic-capacity:
+#
+# Command to initiate to add dynamic capacity extents to a host.  It
+# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# @selection-policy: The "Selection Policy" bits as defined in
+#     cxl spec r3.1 Table 7-70.  It specifies the policy to use for
+#     selecting which extents comprise the added capacity.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-70.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-70.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-add-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'selection-policy': 'CXLExtSelPolicy',
+            'region': 'uint8',
+            '*tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
+
+##
+# @CXLExtRemovalPolicy:
+#
+# The policy to use for selecting which extents comprise the released
+# capacity, defined in the "Flags" field in cxl spec r3.1 Table 7-71.
+#
+# @tag-based: value = 0h.  Extents are selected by the device based
+#     on tag, with no requirement for contiguous extents.
+#
+# @prescriptive: value = 1h.  Extent list of capacity to release is
+#     included in the request payload.
+#
+# Since: 9.1
+##
+{ 'enum': 'CXLExtRemovalPolicy',
+  'data': ['tag-based',
+           'prescriptive']
+}
+
+##
+# @cxl-release-dynamic-capacity:
+#
+# Command to initiate to release dynamic capacity extents from a
+# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
+#
+# @path: CXL DCD canonical QOM path.
+#
+# @host-id: The "Host ID" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# @removal-policy: Bit[3:0] of the "Flags" field as defined in cxl
+#     spec r3.1 Table 7-71.
+#
+# @forced-removal: Bit[4] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device does not wait for a Release
+#     Dynamic Capacity command from the host.  Host immediately
+#     loses access to released capacity.
+#
+# @sanitize-on-release: Bit[5] of the "Flags" field in cxl spec r3.1
+#     Table 7-71.  When set, device should sanitize all released
+#     capacity as a result of this request.
+#
+# @region: The "Region Number" field as defined in cxl spec r3.1
+#     Table 7-71.  The dynamic capacity region where the capacity
+#     is being added.  Valid range is from 0-7.
+#
+# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-71.
+#
+# @extents: The "Extent List" field as defined in cxl spec r3.1
+#     Table 7-71.
+#
+# Since : 9.1
+##
+{ 'command': 'cxl-release-dynamic-capacity',
+  'data': { 'path': 'str',
+            'host-id': 'uint16',
+            'removal-policy': 'CXLExtRemovalPolicy',
+            '*forced-removal': 'bool',
+            '*sanitize-on-release': 'bool',
+            'region': 'uint8',
+            '*tag': 'str',
+            'extents': [ 'CXLDynamicCapacityExtent' ]
+           }
+}
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index df3511e91b..c69ff6b5de 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -443,6 +443,12 @@ typedef struct CXLDCExtent {
 } CXLDCExtent;
 typedef QTAILQ_HEAD(, CXLDCExtent) CXLDCExtentList;
 
+typedef struct CXLDCExtentGroup {
+    CXLDCExtentList list;
+    QTAILQ_ENTRY(CXLDCExtentGroup) node;
+} CXLDCExtentGroup;
+typedef QTAILQ_HEAD(, CXLDCExtentGroup) CXLDCExtentGroupList;
+
 typedef struct CXLDCRegion {
     uint64_t base;       /* aligned to 256*MiB */
     uint64_t decode_len; /* aligned to 256*MiB */
@@ -494,6 +500,7 @@ struct CXLType3Dev {
          */
         uint64_t total_capacity; /* 256M aligned */
         CXLDCExtentList extents;
+        CXLDCExtentGroupList extents_pending;
         uint32_t total_extent_count;
         uint32_t ext_list_gen_seq;
 
@@ -555,4 +562,19 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len);
 
 void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
                                         CXLDCExtent *extent);
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list, uint64_t dpa,
+                                      uint64_t len, uint8_t *tag,
+                                      uint16_t shared_seq);
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+                       unsigned long size);
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len);
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq);
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group);
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
 #endif
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 5170b8dbf8..38cadaa0f3 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -166,4 +166,22 @@ typedef struct CXLEventMemoryModule {
     uint8_t reserved[0x3d];
 } QEMU_PACKED CXLEventMemoryModule;
 
+/*
+ * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
+ * All fields little endian.
+ */
+typedef struct CXLEventDynamicCapacity {
+    CXLEventRecordHdr hdr;
+    uint8_t type;
+    uint8_t validity_flags;
+    uint16_t host_id;
+    uint8_t updated_region_id;
+    uint8_t flags;
+    uint8_t reserved2[2];
+    uint8_t dynamic_capacity_extent[0x28]; /* defined in cxl_device.h */
+    uint8_t reserved[0x18];
+    uint32_t extents_avail;
+    uint32_t tags_avail;
+} QEMU_PACKED CXLEventDynamicCapacity;
+
 #endif /* CXL_EVENTS_H */
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index e322407fb3..64387f34ce 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1405,7 +1405,7 @@ static CXLRetCode cmd_dcd_get_dyn_cap_ext_list(const struct cxl_cmd *cmd,
  * Check whether any bit between addr[nr, nr+size) is set,
  * return true if any bit is set, otherwise return false
  */
-static bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
+bool test_any_bits_set(const unsigned long *addr, unsigned long nr,
                               unsigned long size)
 {
     unsigned long res = find_next_bit(addr, size + nr, nr);
@@ -1444,7 +1444,7 @@ CXLDCRegion *cxl_find_dc_region(CXLType3Dev *ct3d, uint64_t dpa, uint64_t len)
     return NULL;
 }
 
-static void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
+void cxl_insert_extent_to_extent_list(CXLDCExtentList *list,
                                              uint64_t dpa,
                                              uint64_t len,
                                              uint8_t *tag,
@@ -1470,6 +1470,44 @@ void cxl_remove_extent_from_extent_list(CXLDCExtentList *list,
     g_free(extent);
 }
 
+/*
+ * Add a new extent to the extent "group" if group exists;
+ * otherwise, create a new group
+ * Return value: the extent group where the extent is inserted.
+ */
+CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
+                                                    uint64_t dpa,
+                                                    uint64_t len,
+                                                    uint8_t *tag,
+                                                    uint16_t shared_seq)
+{
+    if (!group) {
+        group = g_new0(CXLDCExtentGroup, 1);
+        QTAILQ_INIT(&group->list);
+    }
+    cxl_insert_extent_to_extent_list(&group->list, dpa, len,
+                                     tag, shared_seq);
+    return group;
+}
+
+void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
+                                       CXLDCExtentGroup *group)
+{
+    QTAILQ_INSERT_TAIL(list, group, node);
+}
+
+void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list)
+{
+    CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group = QTAILQ_FIRST(list);
+
+    QTAILQ_REMOVE(list, group, node);
+    QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+        cxl_remove_extent_from_extent_list(&group->list, ent);
+    }
+    g_free(group);
+}
+
 /*
  * CXL r3.1 Table 8-168: Add Dynamic Capacity Response Input Payload
  * CXL r3.1 Table 8-170: Release Dynamic Capacity Input Payload
@@ -1541,6 +1579,7 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
 {
     uint32_t i;
     CXLDCExtent *ent;
+    CXLDCExtentGroup *ext_group;
     uint64_t dpa, len;
     Range range1, range2;
 
@@ -1551,9 +1590,13 @@ static CXLRetCode cxl_dcd_add_dyn_cap_rsp_dry_run(CXLType3Dev *ct3d,
         range_init_nofail(&range1, dpa, len);
 
         /*
-         * TODO: once the pending extent list is added, check against
-         * the list will be added here.
+         * The host-accepted DPA range must be contained by the first extent
+         * group in the pending list
          */
+        ext_group = QTAILQ_FIRST(&ct3d->dc.extents_pending);
+        if (!cxl_extents_contains_dpa_range(&ext_group->list, dpa, len)) {
+            return CXL_MBOX_INVALID_PA;
+        }
 
         /* to-be-added range should not overlap with range already accepted */
         QTAILQ_FOREACH(ent, &ct3d->dc.extents, node) {
@@ -1586,10 +1629,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
     CXLRetCode ret;
 
     if (in->num_entries_updated == 0) {
-        /*
-         * TODO: once the pending list is introduced, extents in the beginning
-         * will get wiped out.
-         */
+        cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
         return CXL_MBOX_SUCCESS;
     }
 
@@ -1615,11 +1655,9 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
-        /*
-         * TODO: we will add a pending extent list based on event log record
-         * and process the list accordingly here.
-         */
     }
+    /* Remove the first extent group in the pending list */
+    cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
 
     return CXL_MBOX_SUCCESS;
 }
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index db5191b3b7..f53bcca6d3 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -674,6 +674,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
         ct3d->dc.total_capacity += region->len;
     }
     QTAILQ_INIT(&ct3d->dc.extents);
+    QTAILQ_INIT(&ct3d->dc.extents_pending);
 
     return true;
 }
@@ -681,10 +682,19 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
 static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
+    CXLDCExtentGroup *group, *group_next;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
+
+    QTAILQ_FOREACH_SAFE(group, &ct3d->dc.extents_pending, node, group_next) {
+        QTAILQ_REMOVE(&ct3d->dc.extents_pending, group, node);
+        QTAILQ_FOREACH_SAFE(ent, &group->list, node, ent_next) {
+            cxl_remove_extent_from_extent_list(&group->list, ent);
+        }
+        g_free(group);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -1449,7 +1459,6 @@ static int ct3d_qmp_cxl_event_log_enc(CxlEventLog log)
         return CXL_EVENT_TYPE_FAIL;
     case CXL_EVENT_LOG_FATAL:
         return CXL_EVENT_TYPE_FATAL;
-/* DCD not yet supported */
     default:
         return -EINVAL;
     }
@@ -1700,6 +1709,301 @@ void qmp_cxl_inject_memory_module_event(const char *path, CxlEventLog log,
     }
 }
 
+/* CXL r3.1 Table 8-50: Dynamic Capacity Event Record */
+static const QemuUUID dynamic_capacity_uuid = {
+    .data = UUID(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f,
+                 0x95, 0x26, 0x8e, 0x10, 0x1a, 0x2a),
+};
+
+typedef enum CXLDCEventType {
+    DC_EVENT_ADD_CAPACITY = 0x0,
+    DC_EVENT_RELEASE_CAPACITY = 0x1,
+    DC_EVENT_FORCED_RELEASE_CAPACITY = 0x2,
+    DC_EVENT_REGION_CONFIG_UPDATED = 0x3,
+    DC_EVENT_ADD_CAPACITY_RSP = 0x4,
+    DC_EVENT_CAPACITY_RELEASED = 0x5,
+} CXLDCEventType;
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] has overlaps with extents in
+ * the list.
+ * Return value: return true if has overlaps; otherwise, return false
+ */
+static bool cxl_extents_overlaps_dpa_range(CXLDCExtentList *list,
+                                           uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_overlaps_range(&range1, &range2)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Check whether the range [dpa, dpa + len - 1] is contained by extents in
+ * the list.
+ * Will check multiple extents containment once superset release is added.
+ * Return value: return true if range is contained; otherwise, return false
+ */
+bool cxl_extents_contains_dpa_range(CXLDCExtentList *list,
+                                    uint64_t dpa, uint64_t len)
+{
+    CXLDCExtent *ent;
+    Range range1, range2;
+
+    if (!list) {
+        return false;
+    }
+
+    range_init_nofail(&range1, dpa, len);
+    QTAILQ_FOREACH(ent, list, node) {
+        range_init_nofail(&range2, ent->start_dpa, ent->len);
+        if (range_contains_range(&range2, &range1)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static bool cxl_extent_groups_overlaps_dpa_range(CXLDCExtentGroupList *list,
+                                                 uint64_t dpa, uint64_t len)
+{
+    CXLDCExtentGroup *group;
+
+    if (!list) {
+        return false;
+    }
+
+    QTAILQ_FOREACH(group, list, node) {
+        if (cxl_extents_overlaps_dpa_range(&group->list, dpa, len)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * The main function to process dynamic capacity event with extent list.
+ * Currently DC extents add/release requests are processed.
+ */
+static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
+        uint16_t hid, CXLDCEventType type, uint8_t rid,
+        CXLDynamicCapacityExtentList *records, Error **errp)
+{
+    Object *obj;
+    CXLEventDynamicCapacity dCap = {};
+    CXLEventRecordHdr *hdr = &dCap.hdr;
+    CXLType3Dev *dcd;
+    uint8_t flags = 1 << CXL_EVENT_TYPE_INFO;
+    uint32_t num_extents = 0;
+    CXLDynamicCapacityExtentList *list;
+    CXLDCExtentGroup *group = NULL;
+    g_autofree CXLDCExtentRaw *extents = NULL;
+    uint8_t enc_log = CXL_EVENT_TYPE_DYNAMIC_CAP;
+    uint64_t dpa, offset, len, block_size;
+    g_autofree unsigned long *blk_bitmap = NULL;
+    int i;
+
+    obj = object_resolve_path_type(path, TYPE_CXL_TYPE3, NULL);
+    if (!obj) {
+        error_setg(errp, "Unable to resolve CXL type 3 device");
+        return;
+    }
+
+    dcd = CXL_TYPE3(obj);
+    if (!dcd->dc.num_regions) {
+        error_setg(errp, "No dynamic capacity support from the device");
+        return;
+    }
+
+
+    if (rid >= dcd->dc.num_regions) {
+        error_setg(errp, "region id is too large");
+        return;
+    }
+    block_size = dcd->dc.regions[rid].block_size;
+    blk_bitmap = bitmap_new(dcd->dc.regions[rid].len / block_size);
+
+    /* Sanity check and count the extents */
+    list = records;
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = offset + dcd->dc.regions[rid].base;
+
+        if (len == 0) {
+            error_setg(errp, "extent with 0 length is not allowed");
+            return;
+        }
+
+        if (offset % block_size || len % block_size) {
+            error_setg(errp, "dpa or len is not aligned to region block size");
+            return;
+        }
+
+        if (offset + len > dcd->dc.regions[rid].len) {
+            error_setg(errp, "extent range is beyond the region end");
+            return;
+        }
+
+        /* No duplicate or overlapped extents are allowed */
+        if (test_any_bits_set(blk_bitmap, offset / block_size,
+                              len / block_size)) {
+            error_setg(errp, "duplicate or overlapped extents are detected");
+            return;
+        }
+        bitmap_set(blk_bitmap, offset / block_size, len / block_size);
+
+        if (type == DC_EVENT_RELEASE_CAPACITY) {
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with pending DPA range");
+                return;
+            }
+            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot release extent with non-existing DPA range");
+                return;
+            }
+        } else if (type == DC_EVENT_ADD_CAPACITY) {
+            if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA already accessible to the same LD");
+                return;
+            }
+            if (cxl_extent_groups_overlaps_dpa_range(&dcd->dc.extents_pending,
+                                                     dpa, len)) {
+                error_setg(errp,
+                           "cannot add DPA again while still pending");
+                return;
+            }
+        }
+        list = list->next;
+        num_extents++;
+    }
+
+    /* Create extent list for event being passed to host */
+    i = 0;
+    list = records;
+    extents = g_new0(CXLDCExtentRaw, num_extents);
+    while (list) {
+        offset = list->value->offset;
+        len = list->value->len;
+        dpa = dcd->dc.regions[rid].base + offset;
+
+        extents[i].start_dpa = dpa;
+        extents[i].len = len;
+        memset(extents[i].tag, 0, 0x10);
+        extents[i].shared_seq = 0;
+        if (type == DC_EVENT_ADD_CAPACITY) {
+            group = cxl_insert_extent_to_extent_group(group,
+                                                      extents[i].start_dpa,
+                                                      extents[i].len,
+                                                      extents[i].tag,
+                                                      extents[i].shared_seq);
+        }
+
+        list = list->next;
+        i++;
+    }
+    if (group) {
+        cxl_extent_group_list_insert_tail(&dcd->dc.extents_pending, group);
+    }
+
+    /*
+     * CXL r3.1 section 8.2.9.2.1.6: Dynamic Capacity Event Record
+     *
+     * All Dynamic Capacity event records shall set the Event Record Severity
+     * field in the Common Event Record Format to Informational Event. All
+     * Dynamic Capacity related events shall be logged in the Dynamic Capacity
+     * Event Log.
+     */
+    cxl_assign_event_header(hdr, &dynamic_capacity_uuid, flags, sizeof(dCap),
+                            cxl_device_get_timestamp(&dcd->cxl_dstate));
+
+    dCap.type = type;
+    /* FIXME: for now, validity flag is cleared */
+    dCap.validity_flags = 0;
+    stw_le_p(&dCap.host_id, hid);
+    /* only valid for DC_REGION_CONFIG_UPDATED event */
+    dCap.updated_region_id = 0;
+    dCap.flags = 0;
+    for (i = 0; i < num_extents; i++) {
+        memcpy(&dCap.dynamic_capacity_extent, &extents[i],
+               sizeof(CXLDCExtentRaw));
+
+        if (i < num_extents - 1) {
+            /* Set "More" flag */
+            dCap.flags |= BIT(0);
+        }
+
+        if (cxl_event_insert(&dcd->cxl_dstate, enc_log,
+                             (CXLEventRecordRaw *)&dCap)) {
+            cxl_event_irq_assert(dcd);
+        }
+    }
+}
+
+void qmp_cxl_add_dynamic_capacity(const char *path, uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy, uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList  *extents,
+                                  Error **errp)
+{
+    switch (sel_policy) {
+    case CXL_EXT_SEL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id,
+                                                      DC_EVENT_ADD_CAPACITY,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Selection policy not supported");
+        return;
+    }
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool has_forced_removal,
+                                      bool forced_removal,
+                                      bool has_sanitize_on_release,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList  *extents,
+                                      Error **errp)
+{
+    CXLDCEventType type = DC_EVENT_RELEASE_CAPACITY;
+
+    if (has_forced_removal && forced_removal) {
+        /* TODO: enable forced removal in the future */
+        type = DC_EVENT_FORCED_RELEASE_CAPACITY;
+        error_setg(errp, "Forced removal not supported yet");
+        return;
+    }
+
+    switch (removal_policy) {
+    case CXL_EXT_REMOVAL_POLICY_PRESCRIPTIVE:
+        qmp_cxl_process_dynamic_capacity_prescriptive(path, host_id, type,
+                                                      region, extents, errp);
+        return;
+    default:
+        error_setg(errp, "Removal policy not supported");
+        return;
+    }
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
diff --git a/hw/mem/cxl_type3_stubs.c b/hw/mem/cxl_type3_stubs.c
index 3e1851e32b..45419bbefe 100644
--- a/hw/mem/cxl_type3_stubs.c
+++ b/hw/mem/cxl_type3_stubs.c
@@ -67,3 +67,28 @@ void qmp_cxl_inject_correctable_error(const char *path, CxlCorErrorType type,
 {
     error_setg(errp, "CXL Type 3 support is not compiled in");
 }
+
+void qmp_cxl_add_dynamic_capacity(const char *path,
+                                  uint16_t host_id,
+                                  CXLExtSelPolicy sel_policy,
+                                  uint8_t region,
+                                  const char *tag,
+                                  CXLDynamicCapacityExtentList *extents,
+                                  Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
+
+void qmp_cxl_release_dynamic_capacity(const char *path, uint16_t host_id,
+                                      CXLExtRemovalPolicy removal_policy,
+                                      bool has_forced_removal,
+                                      bool forced_removal,
+                                      bool has_sanitize_on_release,
+                                      bool sanitize_on_release,
+                                      uint8_t region,
+                                      const char *tag,
+                                      CXLDynamicCapacityExtentList *extents,
+                                      Error **errp)
+{
+    error_setg(errp, "CXL Type 3 support is not compiled in");
+}
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 28/46] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (26 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 27/46] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 29/46] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support Michael S. Tsirkin
                   ` (19 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Svetly Todorov, Gregory Price,
	Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

All DPA ranges in the DC regions are invalid to access until an extent
covering the range has been successfully accepted by the host. A bitmap
is added to each region to record whether a DC block in the region has
been backed by a DC extent. Each bit in the bitmap represents a DC block.
When a DC extent is accepted, all the bits representing the blocks in the
extent are set, which will be cleared when the extent is released.

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-13-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/cxl/cxl_device.h |  7 ++++
 hw/cxl/cxl-mailbox-utils.c  |  3 ++
 hw/mem/cxl_type3.c          | 76 +++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index c69ff6b5de..0a4fcb2800 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -456,6 +456,7 @@ typedef struct CXLDCRegion {
     uint64_t block_size;
     uint32_t dsmadhandle;
     uint8_t flags;
+    unsigned long *blk_bitmap;
 } CXLDCRegion;
 
 struct CXLType3Dev {
@@ -577,4 +578,10 @@ CXLDCExtentGroup *cxl_insert_extent_to_extent_group(CXLDCExtentGroup *group,
 void cxl_extent_group_list_insert_tail(CXLDCExtentGroupList *list,
                                        CXLDCExtentGroup *group);
 void cxl_extent_group_list_delete_front(CXLDCExtentGroupList *list);
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len);
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len);
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len);
 #endif
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 64387f34ce..c4852112fe 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1655,6 +1655,7 @@ static CXLRetCode cmd_dcd_add_dyn_cap_rsp(const struct cxl_cmd *cmd,
 
         cxl_insert_extent_to_extent_list(extent_list, dpa, len, NULL, 0);
         ct3d->dc.total_extent_count += 1;
+        ct3_set_region_block_backed(ct3d, dpa, len);
     }
     /* Remove the first extent group in the pending list */
     cxl_extent_group_list_delete_front(&ct3d->dc.extents_pending);
@@ -1813,10 +1814,12 @@ static CXLRetCode cmd_dcd_release_dyn_cap(const struct cxl_cmd *cmd,
      * list and update the extent count;
      */
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
+        ct3_clear_region_block_backed(ct3d, ent->start_dpa, ent->len);
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
     }
     copy_extent_list(&ct3d->dc.extents, &updated_list);
     QTAILQ_FOREACH_SAFE(ent, &updated_list, node, ent_next) {
+        ct3_set_region_block_backed(ct3d, ent->start_dpa, ent->len);
         cxl_remove_extent_from_extent_list(&updated_list, ent);
     }
     ct3d->dc.total_extent_count = updated_list_size;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index f53bcca6d3..0d18259ec0 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -672,6 +672,7 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
             .flags = 0,
         };
         ct3d->dc.total_capacity += region->len;
+        region->blk_bitmap = bitmap_new(region->len / region->block_size);
     }
     QTAILQ_INIT(&ct3d->dc.extents);
     QTAILQ_INIT(&ct3d->dc.extents_pending);
@@ -683,6 +684,8 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
 {
     CXLDCExtent *ent, *ent_next;
     CXLDCExtentGroup *group, *group_next;
+    int i;
+    CXLDCRegion *region;
 
     QTAILQ_FOREACH_SAFE(ent, &ct3d->dc.extents, node, ent_next) {
         cxl_remove_extent_from_extent_list(&ct3d->dc.extents, ent);
@@ -695,6 +698,11 @@ static void cxl_destroy_dc_regions(CXLType3Dev *ct3d)
         }
         g_free(group);
     }
+
+    for (i = 0; i < ct3d->dc.num_regions; i++) {
+        region = &ct3d->dc.regions[i];
+        g_free(region->blk_bitmap);
+    }
 }
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
@@ -926,6 +934,70 @@ static void ct3_exit(PCIDevice *pci_dev)
     }
 }
 
+/*
+ * Mark the DPA range [dpa, dap + len - 1] to be backed and accessible. This
+ * happens when a DC extent is added and accepted by the host.
+ */
+void ct3_set_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                 uint64_t len)
+{
+    CXLDCRegion *region;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    bitmap_set(region->blk_bitmap, (dpa - region->base) / region->block_size,
+               len / region->block_size);
+}
+
+/*
+ * Check whether the DPA range [dpa, dpa + len - 1] is backed with DC extents.
+ * Used when validating read/write to dc regions
+ */
+bool ct3_test_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                  uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return false;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = DIV_ROUND_UP(len, region->block_size);
+    /*
+     * if bits between [dpa, dpa + len) are all 1s, meaning the DPA range is
+     * backed with DC extents, return true; else return false.
+     */
+    return find_next_zero_bit(region->blk_bitmap, nr + nbits, nr) == nr + nbits;
+}
+
+/*
+ * Mark the DPA range [dpa, dap + len - 1] to be unbacked and inaccessible.
+ * This happens when a dc extent is released by the host.
+ */
+void ct3_clear_region_block_backed(CXLType3Dev *ct3d, uint64_t dpa,
+                                   uint64_t len)
+{
+    CXLDCRegion *region;
+    uint64_t nbits;
+    long nr;
+
+    region = cxl_find_dc_region(ct3d, dpa, len);
+    if (!region) {
+        return;
+    }
+
+    nr = (dpa - region->base) / region->block_size;
+    nbits = len / region->block_size;
+    bitmap_clear(region->blk_bitmap, nr, nbits);
+}
+
 static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
 {
     int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
@@ -1030,6 +1102,10 @@ static int cxl_type3_hpa_to_as_and_dpa(CXLType3Dev *ct3d,
         *as = &ct3d->hostpmem_as;
         *dpa_offset -= vmr_size;
     } else {
+        if (!ct3_test_region_block_backed(ct3d, *dpa_offset, size)) {
+            return -ENODEV;
+        }
+
         *as = &ct3d->dc.host_dc_as;
         *dpa_offset -= (vmr_size + pmr_size);
     }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 29/46] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (27 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 28/46] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 30/46] hw/mem/cxl_type3: Allow to release extent superset in QMP interface Michael S. Tsirkin
                   ` (18 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Svetly Todorov, Gregory Price,
	Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

With the change, we extend the extent release mailbox command processing
to allow more flexible release. As long as the DPA range of the extent to
release is covered by accepted extent(s) in the device, the release can be
performed.

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-14-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/cxl/cxl-mailbox-utils.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c4852112fe..74eeb6fde7 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -1704,6 +1704,13 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
         dpa = in->updated_entries[i].start_dpa;
         len = in->updated_entries[i].len;
 
+        /* Check if the DPA range is not fully backed with valid extents */
+        if (!ct3_test_region_block_backed(ct3d, dpa, len)) {
+            ret = CXL_MBOX_INVALID_PA;
+            goto free_and_exit;
+        }
+
+        /* After this point, extent overflow is the only error can happen */
         while (len > 0) {
             QTAILQ_FOREACH(ent, updated_list, node) {
                 range_init_nofail(&range, ent->start_dpa, ent->len);
@@ -1718,14 +1725,7 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
                     if (range_contains(&range, dpa + len - 1)) {
                         len2 = ent_start_dpa + ent_len - dpa - len;
                     } else {
-                        /*
-                         * TODO: we reject the attempt to remove an extent
-                         * that overlaps with multiple extents in the device
-                         * for now. We will allow it once superset release
-                         * support is added.
-                         */
-                        ret = CXL_MBOX_INVALID_PA;
-                        goto free_and_exit;
+                        dpa = ent_start_dpa + ent_len;
                     }
                     len_done = ent_len - len1 - len2;
 
@@ -1752,14 +1752,9 @@ static CXLRetCode cxl_dc_extent_release_dry_run(CXLType3Dev *ct3d,
                     }
 
                     len -= len_done;
-                    /* len == 0 here until superset release is added */
                     break;
                 }
             }
-            if (len) {
-                ret = CXL_MBOX_INVALID_PA;
-                goto free_and_exit;
-            }
         }
     }
 free_and_exit:
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 30/46] hw/mem/cxl_type3: Allow to release extent superset in QMP interface
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (28 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 29/46] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 31/46] hw/acpi/GI: Fix trivial parameter alignment issue Michael S. Tsirkin
                   ` (17 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Fan Ni, Svetly Todorov, Gregory Price,
	Jonathan Cameron

From: Fan Ni <fan.ni@samsung.com>

Before the change, the QMP interface used for add/release DC extents
only allows to release an extent whose DPA range is contained by a single
accepted extent in the device.

With the change, we relax the constraints.  As long as the DPA range of
the extent is covered by accepted extents, we allow the release.

Tested-by: Svetly Todorov <svetly.todorov@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Fan Ni <fan.ni@samsung.com>
Message-Id: <20240523174651.1089554-15-nifan.cxl@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 0d18259ec0..5d4a1276be 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1947,7 +1947,7 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
                            "cannot release extent with pending DPA range");
                 return;
             }
-            if (!cxl_extents_contains_dpa_range(&dcd->dc.extents, dpa, len)) {
+            if (!ct3_test_region_block_backed(dcd, dpa, len)) {
                 error_setg(errp,
                            "cannot release extent with non-existing DPA range");
                 return;
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 31/46] hw/acpi/GI: Fix trivial parameter alignment issue.
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (29 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 30/46] hw/mem/cxl_type3: Allow to release extent superset in QMP interface Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 32/46] hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator Michael S. Tsirkin
                   ` (16 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonathan Cameron, Ankit Agrawal, Igor Mammedov,
	Ani Sinha, Michael Tokarev, Laurent Vivier, qemu-trivial

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Before making additional modification, tidy up this misleading indentation.

Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/acpi/acpi_generic_initiator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index 17b9a052f5..18a939b0e5 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -132,7 +132,7 @@ static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
 
     dev_handle.segment = 0;
     dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
-                                               pci_dev->devfn);
+                                   pci_dev->devfn);
 
     build_srat_generic_pci_initiator_affinity(table_data,
                                               gi->node, &dev_handle);
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 32/46] hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (30 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 31/46] hw/acpi/GI: Fix trivial parameter alignment issue Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:07 ` [PULL 33/46] hw/acpi: Generic Port Affinity Structure support Michael S. Tsirkin
                   ` (15 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

This will simplify reuse when adding acpi-generic-port.
Note that some error_printf() messages will now print acpi-generic-node
whereas others will move to type specific cases in next patch so
are left alone for now.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/acpi_generic_initiator.h | 15 ++++-
 hw/acpi/acpi_generic_initiator.c         | 80 +++++++++++++++---------
 2 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index a304bad73e..dd4be19c8f 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -8,15 +8,26 @@
 
 #include "qom/object_interfaces.h"
 
-#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
+/*
+ * Abstract type to be used as base for
+ * - acpi-generic-initiator
+ * - acpi-generic-port
+ */
+#define TYPE_ACPI_GENERIC_NODE "acpi-generic-node"
 
-typedef struct AcpiGenericInitiator {
+typedef struct AcpiGenericNode {
     /* private */
     Object parent;
 
     /* public */
     char *pci_dev;
     uint16_t node;
+} AcpiGenericNode;
+
+#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
+
+typedef struct AcpiGenericInitiator {
+    AcpiGenericNode parent;
 } AcpiGenericInitiator;
 
 /*
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index 18a939b0e5..c054e0e27d 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -10,45 +10,61 @@
 #include "hw/pci/pci_device.h"
 #include "qemu/error-report.h"
 
-typedef struct AcpiGenericInitiatorClass {
+typedef struct AcpiGenericNodeClass {
     ObjectClass parent_class;
+} AcpiGenericNodeClass;
+
+typedef struct AcpiGenericInitiatorClass {
+     AcpiGenericNodeClass parent_class;
 } AcpiGenericInitiatorClass;
 
+OBJECT_DEFINE_ABSTRACT_TYPE(AcpiGenericNode, acpi_generic_node,
+                            ACPI_GENERIC_NODE, OBJECT)
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericNode, ACPI_GENERIC_NODE)
+
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
-                   ACPI_GENERIC_INITIATOR, OBJECT,
+                   ACPI_GENERIC_INITIATOR, ACPI_GENERIC_NODE,
                    { TYPE_USER_CREATABLE },
                    { NULL })
 
 OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
 
+static void acpi_generic_node_init(Object *obj)
+{
+    AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
+
+    gn->node = MAX_NODES;
+    gn->pci_dev = NULL;
+}
+
 static void acpi_generic_initiator_init(Object *obj)
 {
-    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+}
 
-    gi->node = MAX_NODES;
-    gi->pci_dev = NULL;
+static void acpi_generic_node_finalize(Object *obj)
+{
+    AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
+
+    g_free(gn->pci_dev);
 }
 
 static void acpi_generic_initiator_finalize(Object *obj)
 {
-    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
-
-    g_free(gi->pci_dev);
 }
 
-static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
-                                                  Error **errp)
+static void acpi_generic_node_set_pci_device(Object *obj, const char *val,
+                                             Error **errp)
 {
-    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+    AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
 
-    gi->pci_dev = g_strdup(val);
+    gn->pci_dev = g_strdup(val);
 }
-
-static void acpi_generic_initiator_set_node(Object *obj, Visitor *v,
-                                            const char *name, void *opaque,
-                                            Error **errp)
+static void acpi_generic_node_set_node(Object *obj, Visitor *v,
+                                       const char *name, void *opaque,
+                                       Error **errp)
 {
-    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+    AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
     MachineState *ms = MACHINE(qdev_get_machine());
     uint32_t value;
 
@@ -58,20 +74,24 @@ static void acpi_generic_initiator_set_node(Object *obj, Visitor *v,
 
     if (value >= MAX_NODES) {
         error_printf("%s: Invalid NUMA node specified\n",
-                     TYPE_ACPI_GENERIC_INITIATOR);
+                     TYPE_ACPI_GENERIC_NODE);
         exit(1);
     }
 
-    gi->node = value;
-    ms->numa_state->nodes[gi->node].has_gi = true;
+    gn->node = value;
+    ms->numa_state->nodes[gn->node].has_gi = true;
+}
+
+static void acpi_generic_node_class_init(ObjectClass *oc, void *data)
+{
+    object_class_property_add_str(oc, "pci-dev", NULL,
+        acpi_generic_node_set_pci_device);
+    object_class_property_add(oc, "node", "int", NULL,
+        acpi_generic_node_set_node, NULL, NULL);
 }
 
 static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
 {
-    object_class_property_add_str(oc, "pci-dev", NULL,
-        acpi_generic_initiator_set_pci_device);
-    object_class_property_add(oc, "node", "int", NULL,
-        acpi_generic_initiator_set_node, NULL, NULL);
 }
 
 /*
@@ -104,9 +124,9 @@ build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
 static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
 {
     MachineState *ms = MACHINE(qdev_get_machine());
-    AcpiGenericInitiator *gi;
     GArray *table_data = opaque;
     PCIDeviceHandle dev_handle;
+    AcpiGenericNode *gn;
     PCIDevice *pci_dev;
     Object *o;
 
@@ -114,14 +134,14 @@ static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
         return 0;
     }
 
-    gi = ACPI_GENERIC_INITIATOR(obj);
-    if (gi->node >= ms->numa_state->num_nodes) {
+    gn = ACPI_GENERIC_NODE(obj);
+    if (gn->node >= ms->numa_state->num_nodes) {
         error_printf("%s: Specified node %d is invalid.\n",
-                     TYPE_ACPI_GENERIC_INITIATOR, gi->node);
+                     TYPE_ACPI_GENERIC_INITIATOR, gn->node);
         exit(1);
     }
 
-    o = object_resolve_path_type(gi->pci_dev, TYPE_PCI_DEVICE, NULL);
+    o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_DEVICE, NULL);
     if (!o) {
         error_printf("%s: Specified device must be a PCI device.\n",
                      TYPE_ACPI_GENERIC_INITIATOR);
@@ -135,7 +155,7 @@ static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
                                    pci_dev->devfn);
 
     build_srat_generic_pci_initiator_affinity(table_data,
-                                              gi->node, &dev_handle);
+                                              gn->node, &dev_handle);
 
     return 0;
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 33/46] hw/acpi: Generic Port Affinity Structure support
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (31 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 32/46] hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator Michael S. Tsirkin
@ 2024-06-04 19:07 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 34/46] bios-tables-test: Allow for new acpihmat-generic-x test data Michael S. Tsirkin
                   ` (14 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:07 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha,
	Marcel Apfelbaum, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Eric Blake, Markus Armbruster

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

These are very similar to the recently added Generic Initiators
but instead of representing an initiator of memory traffic they
represent an edge point beyond which may lie either targets or
initiators.  Here we add these ports such that they may
be targets of hmat_lb records to describe the latency and
bandwidth from host side initiators to the port.  A descoverable
mechanism such as UEFI CDAT read from CXL devices and switches
is used to discover the remainder fo the path and the OS can build
up full latency and bandwidth numbers as need for work and data
placement decisions.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 qapi/qom.json                            |  35 ++++++
 include/hw/acpi/acpi_generic_initiator.h |  18 ++-
 include/hw/pci/pci_bridge.h              |   1 +
 hw/acpi/acpi_generic_initiator.c         | 145 +++++++++++++++++------
 hw/pci-bridge/pci_expander_bridge.c      |   1 -
 5 files changed, 160 insertions(+), 40 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..9d1d86bdad 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -826,6 +826,39 @@
   'data': { 'pci-dev': 'str',
             'node': 'uint32' } }
 
+
+##
+# @AcpiGenericPortProperties:
+#
+# Properties for acpi-generic-port objects.
+#
+# @pci-bus: QOM path of the PCI bus of the hostbridge associated with
+#     this SRAT Generic Port Affinity Structure.  This is the same as
+#     the bus parameter for the root ports attached to this host bridge.
+#     The resulting SRAT Generic Port Affinity Structure will refer to
+#     the ACPI object in DSDT that represents the host bridge (e.g.
+#     ACPI0016 for CXL host bridges.) See ACPI 6.5 Section 5.2.16.7 for
+#     more information.
+#
+# @node: Similar to a NUMA node ID, but instead of providing a reference
+#     point used for defining NUMA distances and access characteristics
+#     to memory or from an initiator (e.g. CPU), this node defines the
+#     boundary point between non-discoverable system buses which must be
+#     described by firmware, and a discoverable bus.  NUMA distances
+#     and access characteristics are defined to and from that point.
+#     For system software to establish full initiator to target
+#     characteristics this information must be combined with information
+#     retrieved from the discoverable part of the path.  An example would
+#     use CDAT (see UEFI.org) information read from devices and switches
+#     in conjunction with link characteristics read from PCIe
+#     Configuration space.
+#
+# Since: 9.1
+##
+{ 'struct': 'AcpiGenericPortProperties',
+  'data': { 'pci-bus': 'str',
+            'node': 'uint32' } }
+
 ##
 # @RngProperties:
 #
@@ -953,6 +986,7 @@
 { 'enum': 'ObjectType',
   'data': [
     'acpi-generic-initiator',
+    'acpi-generic-port',
     'authz-list',
     'authz-listfile',
     'authz-pam',
@@ -1025,6 +1059,7 @@
   'discriminator': 'qom-type',
   'data': {
       'acpi-generic-initiator':     'AcpiGenericInitiatorProperties',
+      'acpi-generic-port':          'AcpiGenericPortProperties',
       'authz-list':                 'AuthZListProperties',
       'authz-listfile':             'AuthZListFileProperties',
       'authz-pam':                  'AuthZPAMProperties',
diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index dd4be19c8f..1a899af30f 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -30,6 +30,12 @@ typedef struct AcpiGenericInitiator {
     AcpiGenericNode parent;
 } AcpiGenericInitiator;
 
+#define TYPE_ACPI_GENERIC_PORT "acpi-generic-port"
+
+typedef struct AcpiGenericPort {
+    AcpiGenericInitiator parent;
+} AcpiGenericPort;
+
 /*
  * ACPI 6.3:
  * Table 5-81 Flags – Generic Initiator Affinity Structure
@@ -49,8 +55,16 @@ typedef enum {
  * Table 5-80 Device Handle - PCI
  */
 typedef struct PCIDeviceHandle {
-    uint16_t segment;
-    uint16_t bdf;
+    union {
+        struct {
+            uint16_t segment;
+            uint16_t bdf;
+        };
+        struct {
+            uint64_t hid;
+            uint32_t uid;
+        };
+    };
 } PCIDeviceHandle;
 
 void build_srat_generic_pci_initiator(GArray *table_data);
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index 5cd452115a..5456e24883 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -102,6 +102,7 @@ typedef struct PXBPCIEDev {
     PXBDev parent_obj;
 } PXBPCIEDev;
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
 #define TYPE_PXB_DEV "pxb"
 OBJECT_DECLARE_SIMPLE_TYPE(PXBDev, PXB_DEV)
 
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index c054e0e27d..78b80dcf08 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -7,6 +7,7 @@
 #include "hw/acpi/acpi_generic_initiator.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/boards.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/pci/pci_device.h"
 #include "qemu/error-report.h"
 
@@ -18,6 +19,10 @@ typedef struct AcpiGenericInitiatorClass {
      AcpiGenericNodeClass parent_class;
 } AcpiGenericInitiatorClass;
 
+typedef struct AcpiGenericPortClass {
+    AcpiGenericInitiatorClass parent;
+} AcpiGenericPortClass;
+
 OBJECT_DEFINE_ABSTRACT_TYPE(AcpiGenericNode, acpi_generic_node,
                             ACPI_GENERIC_NODE, OBJECT)
 
@@ -30,6 +35,13 @@ OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
 
 OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
 
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericPort, acpi_generic_port,
+                   ACPI_GENERIC_PORT, ACPI_GENERIC_NODE,
+                   { TYPE_USER_CREATABLE },
+                   { NULL })
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericPort, ACPI_GENERIC_PORT)
+
 static void acpi_generic_node_init(Object *obj)
 {
     AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
@@ -53,6 +65,14 @@ static void acpi_generic_initiator_finalize(Object *obj)
 {
 }
 
+static void acpi_generic_port_init(Object *obj)
+{
+}
+
+static void acpi_generic_port_finalize(Object *obj)
+{
+}
+
 static void acpi_generic_node_set_pci_device(Object *obj, const char *val,
                                              Error **errp)
 {
@@ -79,42 +99,61 @@ static void acpi_generic_node_set_node(Object *obj, Visitor *v,
     }
 
     gn->node = value;
-    ms->numa_state->nodes[gn->node].has_gi = true;
+    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+        ms->numa_state->nodes[gn->node].has_gi = true;
+    }
 }
 
 static void acpi_generic_node_class_init(ObjectClass *oc, void *data)
 {
-    object_class_property_add_str(oc, "pci-dev", NULL,
-        acpi_generic_node_set_pci_device);
     object_class_property_add(oc, "node", "int", NULL,
         acpi_generic_node_set_node, NULL, NULL);
 }
 
 static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
 {
+    object_class_property_add_str(oc, "pci-dev", NULL,
+        acpi_generic_node_set_pci_device);
+}
+
+static void acpi_generic_port_class_init(ObjectClass *oc, void *data)
+{
+    /*
+     * Despite the ID representing a root bridge bus, same storage
+     * can be used.
+     */
+    object_class_property_add_str(oc, "pci-bus", NULL,
+        acpi_generic_node_set_pci_device);
 }
 
 /*
  * ACPI 6.3:
  * Table 5-78 Generic Initiator Affinity Structure
+ * ACPI 6.5:
+ * Table 5-67 Generic Port Affinity Structure
  */
 static void
-build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
-                                          PCIDeviceHandle *handle)
+build_srat_generic_node_affinity(GArray *table_data, int node,
+                                 PCIDeviceHandle *handle, bool gp, bool pci)
 {
-    uint8_t index;
-
-    build_append_int_noprefix(table_data, 5, 1);  /* Type */
+    build_append_int_noprefix(table_data, gp ? 6 : 5, 1);  /* Type */
     build_append_int_noprefix(table_data, 32, 1); /* Length */
     build_append_int_noprefix(table_data, 0, 1);  /* Reserved */
-    build_append_int_noprefix(table_data, 1, 1);  /* Device Handle Type: PCI */
+    /* Device Handle Type: PCI / ACPI */
+    build_append_int_noprefix(table_data, pci ? 1 : 0, 1);
     build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
 
     /* Device Handle - PCI */
-    build_append_int_noprefix(table_data, handle->segment, 2);
-    build_append_int_noprefix(table_data, handle->bdf, 2);
-    for (index = 0; index < 12; index++) {
-        build_append_int_noprefix(table_data, 0, 1);
+    if (pci) {
+        /* Device Handle - PCI */
+        build_append_int_noprefix(table_data, handle->segment, 2);
+        build_append_int_noprefix(table_data, handle->bdf, 2);
+        build_append_int_noprefix(table_data, 0, 12);
+    } else {
+        /* Device Handle - ACPI */
+        build_append_int_noprefix(table_data, handle->hid, 8);
+        build_append_int_noprefix(table_data, handle->uid, 4);
+        build_append_int_noprefix(table_data, 0, 4);
     }
 
     build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
@@ -127,37 +166,69 @@ static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
     GArray *table_data = opaque;
     PCIDeviceHandle dev_handle;
     AcpiGenericNode *gn;
-    PCIDevice *pci_dev;
     Object *o;
 
-    if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+    if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_NODE)) {
         return 0;
     }
 
     gn = ACPI_GENERIC_NODE(obj);
-    if (gn->node >= ms->numa_state->num_nodes) {
-        error_printf("%s: Specified node %d is invalid.\n",
-                     TYPE_ACPI_GENERIC_INITIATOR, gn->node);
-        exit(1);
+
+    if (object_dynamic_cast(OBJECT(gn), TYPE_ACPI_GENERIC_INITIATOR)) {
+        PCIDevice *pci_dev;
+
+        if (gn->node >= ms->numa_state->num_nodes) {
+            error_printf("%s: node %d is invalid.\n",
+                         TYPE_ACPI_GENERIC_INITIATOR, gn->node);
+            exit(1);
+        }
+
+        o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_DEVICE, NULL);
+        if (!o) {
+            error_printf("%s: device must be a PCI device.\n",
+                         TYPE_ACPI_GENERIC_INITIATOR);
+            exit(1);
+        }
+        pci_dev = PCI_DEVICE(o);
+
+        dev_handle.segment = 0;
+        dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                       pci_dev->devfn);
+        build_srat_generic_node_affinity(table_data,
+                                         gn->node, &dev_handle, false, true);
+
+        return 0;
+    } else { /* TYPE_ACPI_GENERIC_PORT */
+        PCIBus *bus;
+        const char *hid = "ACPI0016";
+
+        if (gn->node >= ms->numa_state->num_nodes) {
+            error_printf("%s: node %d is invalid.\n",
+                         TYPE_ACPI_GENERIC_PORT, gn->node);
+            exit(1);
+        }
+
+        o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_BUS, NULL);
+        if (!o) {
+            error_printf("%s: device must be a PCI host bridge.\n",
+                         TYPE_ACPI_GENERIC_PORT);
+            exit(1);
+        }
+        bus = PCI_BUS(o);
+        /* Need to know if this is a PXB Bus so below an expander bridge */
+        if (!object_dynamic_cast(OBJECT(bus), TYPE_PXB_CXL_BUS)) {
+            error_printf("%s: device is not a bus below a host bridge.\n",
+                         TYPE_ACPI_GENERIC_PORT);
+            exit(1);
+        }
+        /* Copy without trailing NULL */
+        memcpy(&dev_handle.hid, hid, sizeof(dev_handle.hid));
+        dev_handle.uid = pci_bus_num(bus);
+        build_srat_generic_node_affinity(table_data,
+                                         gn->node, &dev_handle, true, false);
+
+        return 0;
     }
-
-    o = object_resolve_path_type(gn->pci_dev, TYPE_PCI_DEVICE, NULL);
-    if (!o) {
-        error_printf("%s: Specified device must be a PCI device.\n",
-                     TYPE_ACPI_GENERIC_INITIATOR);
-        exit(1);
-    }
-
-    pci_dev = PCI_DEVICE(o);
-
-    dev_handle.segment = 0;
-    dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
-                                   pci_dev->devfn);
-
-    build_srat_generic_pci_initiator_affinity(table_data,
-                                              gn->node, &dev_handle);
-
-    return 0;
 }
 
 void build_srat_generic_pci_initiator(GArray *table_data)
diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index 0411ad31ea..f5431443b9 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -38,7 +38,6 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
                          TYPE_PXB_PCIE_BUS)
 
-#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
                          TYPE_PXB_CXL_BUS)
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 34/46] bios-tables-test: Allow for new acpihmat-generic-x test data.
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (32 preceding siblings ...)
  2024-06-04 19:07 ` [PULL 33/46] hw/acpi: Generic Port Affinity Structure support Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 35/46] bios-tables-test: Add complex SRAT / HMAT test for GI GP Michael S. Tsirkin
                   ` (13 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

The test to be added exercises many corners of the SRAT and HMAT
table generation.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-5-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 5 +++++
 tests/data/acpi/q35/APIC.acpihmat-generic-x | 0
 tests/data/acpi/q35/CEDT.acpihmat-generic-x | 0
 tests/data/acpi/q35/DSDT.acpihmat-generic-x | 0
 tests/data/acpi/q35/HMAT.acpihmat-generic-x | 0
 tests/data/acpi/q35/SRAT.acpihmat-generic-x | 0
 6 files changed, 5 insertions(+)
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/CEDT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-generic-x
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-generic-x

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..a5aa801c99 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,6 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/APIC.acpihmat-generic-x",
+"tests/data/acpi/q35/CEDT.acpihmat-generic-x",
+"tests/data/acpi/q35/DSDT.acpihmat-generic-x",
+"tests/data/acpi/q35/HMAT.acpihmat-generic-x",
+"tests/data/acpi/q35/SRAT.acpihmat-generic-x",
diff --git a/tests/data/acpi/q35/APIC.acpihmat-generic-x b/tests/data/acpi/q35/APIC.acpihmat-generic-x
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/CEDT.acpihmat-generic-x b/tests/data/acpi/q35/CEDT.acpihmat-generic-x
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.acpihmat-generic-x b/tests/data/acpi/q35/DSDT.acpihmat-generic-x
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/HMAT.acpihmat-generic-x b/tests/data/acpi/q35/HMAT.acpihmat-generic-x
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/tests/data/acpi/q35/SRAT.acpihmat-generic-x b/tests/data/acpi/q35/SRAT.acpihmat-generic-x
new file mode 100644
index 0000000000..e69de29bb2
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 35/46] bios-tables-test: Add complex SRAT / HMAT test for GI GP
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (33 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 34/46] bios-tables-test: Allow for new acpihmat-generic-x test data Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc) Michael S. Tsirkin
                   ` (12 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Add a test with 6 nodes to exercise most interesting corner cases
of SRAT and HMAT generation including the new Generic Initiator
and Generic Port Affinity structures.  More details of the
set up in the following patch adding the table data.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/bios-tables-test.c | 92 ++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index d1ff4db7a2..1651d06b7b 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1862,6 +1862,96 @@ static void test_acpi_q35_tcg_acpi_hmat_noinitiator(void)
     free_test_data(&data);
 }
 
+/* Test intended to hit corner cases of SRAT and HMAT */
+static void test_acpi_q35_tcg_acpi_hmat_generic_x(void)
+{
+    test_data data = {};
+
+    data.machine = MACHINE_Q35;
+    data.variant = ".acpihmat-generic-x";
+    test_acpi_one(" -machine hmat=on,cxl=on"
+                  " -smp 3,sockets=3"
+                  " -m 128M,maxmem=384M,slots=2"
+                  " -device virtio-rng-pci,id=gidev"
+                  " -device pxb-cxl,bus_nr=64,bus=pcie.0,id=cxl.1"
+                  " -object memory-backend-ram,size=64M,id=ram0"
+                  " -object memory-backend-ram,size=64M,id=ram1"
+                  " -numa node,nodeid=0,cpus=0,memdev=ram0"
+                  " -numa node,nodeid=1"
+                  " -object acpi-generic-initiator,id=gi0,pci-dev=gidev,node=1"
+                  " -numa node,nodeid=2"
+                  " -object acpi-generic-port,id=gp0,pci-bus=cxl.1,node=2"
+                  " -numa node,nodeid=3,cpus=1"
+                  " -numa node,nodeid=4,memdev=ram1"
+                  " -numa node,nodeid=5,cpus=2"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=10"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=800M"
+                  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+                  "data-type=access-latency,latency=100"
+                  " -numa hmat-lb,initiator=0,target=2,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=0,target=4,hierarchy=memory,"
+                  "data-type=access-latency,latency=100"
+                  " -numa hmat-lb,initiator=0,target=4,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=0,target=5,hierarchy=memory,"
+                  "data-type=access-latency,latency=200"
+                  " -numa hmat-lb,initiator=0,target=5,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=400M"
+                  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=500"
+                  " -numa hmat-lb,initiator=1,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-lb,initiator=1,target=2,hierarchy=memory,"
+                  "data-type=access-latency,latency=50"
+                  " -numa hmat-lb,initiator=1,target=2,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=400M"
+                  " -numa hmat-lb,initiator=1,target=4,hierarchy=memory,"
+                  "data-type=access-latency,latency=50"
+                  " -numa hmat-lb,initiator=1,target=4,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=800M"
+                  " -numa hmat-lb,initiator=1,target=5,hierarchy=memory,"
+                  "data-type=access-latency,latency=500"
+                  " -numa hmat-lb,initiator=1,target=5,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-lb,initiator=3,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=20"
+                  " -numa hmat-lb,initiator=3,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=400M"
+                  " -numa hmat-lb,initiator=3,target=2,hierarchy=memory,"
+                  "data-type=access-latency,latency=80"
+                  " -numa hmat-lb,initiator=3,target=2,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=3,target=4,hierarchy=memory,"
+                  "data-type=access-latency,latency=80"
+                  " -numa hmat-lb,initiator=3,target=4,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=3,target=5,hierarchy=memory,"
+                  "data-type=access-latency,latency=20"
+                  " -numa hmat-lb,initiator=3,target=5,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=400M"
+                  " -numa hmat-lb,initiator=5,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=20"
+                  " -numa hmat-lb,initiator=5,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=400M"
+                  " -numa hmat-lb,initiator=5,target=2,hierarchy=memory,"
+                  "data-type=access-latency,latency=80"
+                  " -numa hmat-lb,initiator=5,target=4,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=5,target=4,hierarchy=memory,"
+                  "data-type=access-latency,latency=80"
+                  " -numa hmat-lb,initiator=5,target=2,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=200M"
+                  " -numa hmat-lb,initiator=5,target=5,hierarchy=memory,"
+                  "data-type=access-latency,latency=10"
+                  " -numa hmat-lb,initiator=5,target=5,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=800M",
+                  &data);
+    free_test_data(&data);
+}
+
 #ifdef CONFIG_POSIX
 static void test_acpi_erst(const char *machine)
 {
@@ -2304,6 +2394,8 @@ int main(int argc, char *argv[])
             qtest_add_func("acpi/q35/nohpet", test_acpi_q35_tcg_nohpet);
             qtest_add_func("acpi/q35/acpihmat-noinitiator",
                            test_acpi_q35_tcg_acpi_hmat_noinitiator);
+            qtest_add_func("acpi/q35/acpihmat-genericx",
+                           test_acpi_q35_tcg_acpi_hmat_generic_x);
 
             /* i386 does not support memory hotplug */
             if (strcmp(arch, "i386")) {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (34 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 35/46] bios-tables-test: Add complex SRAT / HMAT test for GI GP Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-05 14:39   ` Richard Henderson
  2024-06-04 19:08 ` [PULL 37/46] scripts/update-linux-headers: Copy setup_data.h to correct directory Michael S. Tsirkin
                   ` (11 subsequent siblings)
  47 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Given this is a new configuration, there are affects on APIC, CEDT
and DSDT, but the key elements are in SRAT (plus related data in
HMAT).  The configuration has node to exercise many different combinations.

0) CPUs + Memory
1) GI only
2) GP only
3) CPUS only
4) Memory only
5) CPUs + HP memory

GI node, GP Node, Memory only node, hotplug memory
only node, latency and bandwidth such that in Linux Access0
(any initiator) and Access1 (CPU initiators only) given different
answers.  Following cropped to remove details of each entry.

[000h 0000 004h]                   Signature : "SRAT"    [System Resource Affinity Table]

[030h 0048 001h]               Subtable Type : 00 [Processor Local APIC/SAPIC Affinity]
[032h 0050 001h]     Proximity Domain Low(8) : 00
[033h 0051 001h]                     Apic ID : 00

[040h 0064 001h]               Subtable Type : 00 [Processor Local APIC/SAPIC Affinity]
[042h 0066 001h]     Proximity Domain Low(8) : 03                                                                                                                                                                                                                                      [043h 0067 001h]                     Apic ID : 01

[050h 0080 001h]               Subtable Type : 00 [Processor Local APIC/SAPIC Affinity]
[052h 0082 001h]     Proximity Domain Low(8) : 05
[053h 0083 001h]                     Apic ID : 02

[060h 0096 001h]               Subtable Type : 01 [Memory Affinity]
[062h 0098 004h]            Proximity Domain : 00000000
[068h 0104 008h]                Base Address : 0000000000000000
[070h 0112 008h]              Address Length : 00000000000A0000

[088h 0136 001h]               Subtable Type : 01 [Memory Affinity]
[08Ah 0138 004h]            Proximity Domain : 00000000
[090h 0144 008h]                Base Address : 0000000000100000
[098h 0152 008h]              Address Length : 0000000003F00000
[0A8h 0168 008h]                   Reserved3 : 0000000000000000

[0B0h 0176 001h]               Subtable Type : 01 [Memory Affinity]
[0B2h 0178 004h]            Proximity Domain : 00000004
[0B8h 0184 008h]                Base Address : 0000000004000000
[0C0h 0192 008h]              Address Length : 0000000004000000

//Comment in hw/i386/aml-build.c on why these exist - not part of
//ACPI requirements.
[0D8h 0216 001h]               Subtable Type : 01 [Memory Affinity]
[0DAh 0218 004h]            Proximity Domain : 00000000
[0E0h 0224 008h]                Base Address : 0000000000000000
[0E8h 0232 008h]              Address Length : 0000000000000000

[100h 0256 001h]               Subtable Type : 01 [Memory Affinity]
[102h 0258 004h]            Proximity Domain : 00000000
[108h 0264 008h]                Base Address : 0000000000000000
[110h 0272 008h]              Address Length : 0000000000000000

[128h 0296 001h]               Subtable Type : 01 [Memory Affinity]
[12Ah 0298 004h]            Proximity Domain : 00000000
[130h 0304 008h]                Base Address : 0000000000000000
[138h 0312 008h]              Address Length : 0000000000000000

[150h 0336 001h]               Subtable Type : 01 [Memory Affinity]
[152h 0338 004h]            Proximity Domain : 00000000
[158h 0344 008h]                Base Address : 0000000000000000
[160h 0352 008h]              Address Length : 0000000000000000

[178h 0376 001h]               Subtable Type : 01 [Memory Affinity]
[17Ah 0378 004h]            Proximity Domain : 00000000
[180h 0384 008h]                Base Address : 0000000000000000
[188h 0392 008h]              Address Length : 0000000000000000
// End of strange empty Memory Affinity structures.

[1A0h 0416 001h]               Subtable Type : 05 [Generic Initiator Affinity]
[1A3h 0419 001h]          Device Handle Type : 01
[1A4h 0420 004h]            Proximity Domain : 00000001
[1A8h 0424 010h]               Device Handle : 00 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00

[1C0h 0448 001h]               Subtable Type : 06 [Generic Port Affinity]
[1C3h 0451 001h]          Device Handle Type : 00
[1C4h 0452 004h]            Proximity Domain : 00000002
[1C8h 0456 010h]               Device Handle : 41 43 50 49 30 30 31 36 40 00 00 00 00 00 00 00

[1E0h 0480 001h]               Subtable Type : 01 [Memory Affinity]
[1E2h 0482 004h]            Proximity Domain : 00000005
[1E8h 0488 008h]                Base Address : 0000000100000000
[1F0h 0496 008h]              Address Length : 0000000090000000
[1FCh 0508 004h]       Flags (decoded below) : 00000003
                                     Enabled : 1
                               Hot Pluggable : 1
                                Non-Volatile : 0

Example block from HMAT:
[0F0h 0240 002h]              Structure Type : 0001 [System Locality Latency and Bandwidth Information]                                                                                                                                                                                 [0F2h 0242 002h]                    Reserved : 0000                                                                                                                                                                                                                                     [0F4h 0244 004h]                      Length : 00000078                                                                                                                                                                                                                                 [0F8h 0248 001h]       Flags (decoded below) : 00                                                                                                                                                                                                                                                                   Memory Hierarchy : 0                                                                                                                                                                                                                                                           Use Minimum Transfer Size : 0                                                                                                                                                                                                                                                            Non-sequential Transfers : 0
[0F9h 0249 001h]                   Data Type : 03
[0FAh 0250 001h]       Minimum Transfer Size : 00
[0FBh 0251 001h]                   Reserved1 : 00
[0FCh 0252 004h] Initiator Proximity Domains # : 00000004
[100h 0256 004h]  Target Proximity Domains # : 00000006
[104h 0260 004h]                   Reserved2 : 00000000
[108h 0264 008h]             Entry Base Unit : 0000000000000004
[110h 0272 004h] Initiator Proximity Domain List : 00000000
[114h 0276 004h] Initiator Proximity Domain List : 00000001
[118h 0280 004h] Initiator Proximity Domain List : 00000003
[11Ch 0284 004h] Initiator Proximity Domain List : 00000005
[120h 0288 004h] Target Proximity Domain List : 00000000
[124h 0292 004h] Target Proximity Domain List : 00000001
[128h 0296 004h] Target Proximity Domain List : 00000002
[12Ch 0300 004h] Target Proximity Domain List : 00000003
[130h 0304 004h] Target Proximity Domain List : 00000004
[134h 0308 004h] Target Proximity Domain List : 00000005
[138h 0312 002h]                       Entry : 00C8
[13Ah 0314 002h]                       Entry : 0000
[13Ch 0316 002h]                       Entry : 0032
[13Eh 0318 002h]                       Entry : 0000
[140h 0320 002h]                       Entry : 0032
[142h 0322 002h]                       Entry : 0064
[144h 0324 002h]                       Entry : 0019
[146h 0326 002h]                       Entry : 0000
[148h 0328 002h]                       Entry : 0064
[14Ah 0330 002h]                       Entry : 0000
[14Ch 0332 002h]                       Entry : 00C8
[14Eh 0334 002h]                       Entry : 0019
[150h 0336 002h]                       Entry : 0064
[152h 0338 002h]                       Entry : 0000
[154h 0340 002h]                       Entry : 0032
[156h 0342 002h]                       Entry : 0000
[158h 0344 002h]                       Entry : 0032
[15Ah 0346 002h]                       Entry : 0064
[15Ch 0348 002h]                       Entry : 0064
[15Eh 0350 002h]                       Entry : 0000
[160h 0352 002h]                       Entry : 0032
[162h 0354 002h]                       Entry : 0000
[164h 0356 002h]                       Entry : 0032
[166h 0358 002h]                       Entry : 00C8

Note the zeros represent entries where the target node has no
memory.  These could be surpressed but it isn't 'wrong' to provide
them and it is (probably) permissible under ACPI to hotplug memory
into these nodes later.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240524100507.32106-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h |   5 -----
 tests/data/acpi/q35/APIC.acpihmat-generic-x | Bin 0 -> 136 bytes
 tests/data/acpi/q35/CEDT.acpihmat-generic-x | Bin 0 -> 68 bytes
 tests/data/acpi/q35/DSDT.acpihmat-generic-x | Bin 0 -> 10400 bytes
 tests/data/acpi/q35/HMAT.acpihmat-generic-x | Bin 0 -> 360 bytes
 tests/data/acpi/q35/SRAT.acpihmat-generic-x | Bin 0 -> 520 bytes
 6 files changed, 5 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index a5aa801c99..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,6 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/q35/APIC.acpihmat-generic-x",
-"tests/data/acpi/q35/CEDT.acpihmat-generic-x",
-"tests/data/acpi/q35/DSDT.acpihmat-generic-x",
-"tests/data/acpi/q35/HMAT.acpihmat-generic-x",
-"tests/data/acpi/q35/SRAT.acpihmat-generic-x",
diff --git a/tests/data/acpi/q35/APIC.acpihmat-generic-x b/tests/data/acpi/q35/APIC.acpihmat-generic-x
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..317ddb3fbed94e4f49a87976fdc7f23b1a6c3fdc 100644
GIT binary patch
literal 136
zcmXxb%Mn0O3`XITcnlXkq!sRl6*D%L%2Ae5RD#JhHu=utPrpp@0J43U<G9+eEz!(O
p0B;wrJ6XY}$fv3+t#8iTuLjWcqk*CTI<LC^D}=wACRJWOATL<W3;_TD

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/CEDT.acpihmat-generic-x b/tests/data/acpi/q35/CEDT.acpihmat-generic-x
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..31c9011663639b4a0f4816f3b2b06398f94682f7 100644
GIT binary patch
literal 68
zcmZ>EbqR4{U|?Xhb@F%i2v%^42yj+VP*7lGU|;~TK{Nw{0)qoc4VVoE6CiAe2mn)-
B2LS*8

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/DSDT.acpihmat-generic-x b/tests/data/acpi/q35/DSDT.acpihmat-generic-x
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..a04451374d1b95764d34fbda1ec791a89ffdcc46 100644
GIT binary patch
literal 10400
zcmcJVO>7%UcE?}wLuwXVQj=<ZSe7lqGn1L@Ze}R?V-`s^W3rnfDY2<Go3b@%18jXb
zk?aYk;~58|F$_pDu&dP}$qd(v<VFv;o4sugL2f||B!}c?AP6!*fB-q<vd1ifb%0<P
zyy|YM9z_A`Lv*03Ue*7-`c-$=>yI)Wv)OqEfYpm@M#ae$H*6!9%jFmVU>W~w)x=$J
z56oh@#nQ0^a}U#^Y!=H!MgNXjTraaf-}Qd%d7u8kW1Bu~eWg9y+Vr0uds~2=9&P!Y
zV?kS;Y&pf2dsLmUjcQ(LRjQ_1c*zx|Wdx<T(ps;Ypw~uQYmSjPc>soaXR%XHws$j=
zTyw2|yZWnz`I|q!^J(G5_x|<wPag1uVcuCfdB7FJymQsZAMdfpFMA#B%l007f^yNe
z*qQvYEt&(HhAqB_mgj0<LRX932l{HVWIEPZ=SKTJ_?b8ZuQT^$Fo2iu|I>du*FU$u
znE8cu;qMKr>vcw-?eSRG^ZDVi2hjC8^X>b5)XyekFb1Cg^!OuyO@A|JbdSDkHVTFy
zAG7Or-pDV4(lQ!OW`WQDU<N?NDaXIR>h*em7nZes&}#a-IC(g6?tyJMO_qM4a>YH=
zuwvLobv<VrRWs)vIHi>qOaG(F6YgOeD@HM2TFJQw#k$pC>6d{FD@L)tZsqK%xsh{O
zv1Y7-{+4Z&R$}fOSnKutUF_lSvpql`?%QbYg8uD7)!Z1?3w`<c<Ar@ay_mAae3~P#
zHKDNSV-PWcPme!hu+^FS@=1FO+FNR`*XzNPO}6=DbD#0i(+TdpMP`4%fxi_T2F0!#
z%b=?*yJ|3h7i-$lZHDu!8NU$crrl`T+0{=Dvgt&+*MnMpwFE(#)#|IukX6ynW3~Ef
zIjY2K_0<Yw#}aC<#{f0U{TiyukR1=&gsyU0MP(`M8xzWMzlO@FYb;QfLS+nuGKvxx
z%4h^}p^Uo51Lbn4j0egnN<t{35hR2%>Y50Y(clw-GK!KUoJqo&q@2k>8ATZ(oFjyD
zgmR8h&QZcSN;pR;=P2b=2&Y0g70Rhl&J^KH5zZ9lOi@lwI62|ul#^4=G~rAW&NSsr
zQ%;p|s)SReoGRrUBb;M|bBuD1QO<F~IZimoDd#xloFJSNgmZ#&PEbybaB758qnsM$
z%n;5D;mlCZ4CTxc&Me`~QqC;poFtr+gmaQ|PEt;taO#9pr<^+FoFbf4gma2=PEpQj
z!Z}Sirzz(&<(wg$GlX-7a?ViBS;9F>IA<y6Eaki)l!b8BydadtaIw1(D2>x9uD52`
zcaA9Mh;oi9=Yp7#@*+`QB+83ad66pTiE^GO=c#g@DlZY`C8E4Um6xdUvQS2=?`5Hk
z*7(bTGFsy=2g+!TUm!6TNX!Kqb0La}3A7+3(6UqQC(yD}Jx!oxr+S(|%T6^4$^bP{
zYGZ(^XnYJf>Y}X3fTJ=(iwR|v+G0W(b;Sf4bQPHx6Q~p=E|k#-;zB7W(4ebOVge0D
zB+`9CD5DW1gfg03OrX->F@Z{)Ny3>VoR~l*PE4Q@=Lq2(A)J^%B~DDB66Yx493`BX
zKqXF0pc1D-I2FQ)2~^_51S)Z+2xp3LVgi*oF@Z{)oN#i&i3wEV!~`mFrU_@7aAE?L
zI5B}roGRf|2`45{i4zm3#5qPd#|S4TP>B;0sKhx=IL8SmCQykJ6R5;FK{zJ}Cnivd
z6BDS!sS!?%aAE?LI5B}roEgHIA)J^%B~DDB5@(ihW(g-IP>B;0sKhx*I421wCQykJ
z6R5<g6Hc9QVgi*oF@Z{)Q-pJhaAE?LI5B}roYRDJns8zQl{hhhN}Mx<bB1tY0+l#1
zfl8dSgmac~Vgi*oF@Z{)7lcwupk=2jC(yD}l@n;$sYYpVjwt6u%t(m|R8nFBm6R8W
z@*+`U0+p1QKqcioQO*-3CQwO<2~<*EBFaldi3wCvVgi+vmxVG~<1Y(kw8mosmF5-`
zs5G|=B<2E%i3wDSi3wE1)aP1uqX`|gEgn|8_(1x61pgKE$?7eq$vUaW{9qTlJk{lC
z{aW>w(MTzFqmj>y^Yjl#ASGTU?86D#c+ze(s{Z3$J^@(jqR~C7jn?X`h3HL;IuX2o
zQCfDx$xQRfAC5yRo&*360BHc&2YsUDGz-whA=r&u1wGR;s}4NDGCvTO34Eb}J&cB^
z&$OI|!%lX?F<^MKhtZ1z(b;Q6FG|siB6@Kkdhwj-Mfb=;_TXKNT8P+r>@~t0Qg}mz
zHwMBR=Y%)H@a6zKz5)BS=iZdUn<Bh95Z*i|ycvcs4Tev=M);Bxz9hnz2Ev!l3114s
zFAs)KzDD?EDg3eszdR6r`JC{}VfgZ3_>tELUzWm`MfmbS`0_d7%VGG+VEED32w#!H
zS48;AK={fz;VbTul_)#a!K<EYLHKC#(8T2)yqu&qeMDUH`iQu$LwAFhG0#wUnIY~1
z+fa9BuFEWSmmT6Runl#0<~p0C?k0!03v5H(ow;sw>P{cxF0c)Ccjh{nqVA@KxC?AU
z-JLnFr>VQ?A?^a(P<Lm}(;4b+W{A7MM%`(a61@H|DHXfajLv=*zVvqwDt6NX72xGJ
zX-K$-8rCOom+EHDZkVl{enVs)A1$7brp?biu6Z5xi?#=CFWu{Pp)>wPdlP=NsltE!
zcd`pzJ`tS389x5~6uwS>`3)p6k7;_VYzN12tYFyHd_I2zwAxBA#0tOrUOal=ZZ}F!
zG+JCE)~n{2VXx;ZAAjn9^ym+A5AS{Q@uP>^_dWra8Fp<oylBQaFSD=JuT`e*gL_~%
z@SXkFDp%dZcd$}`oMD&h%&68KW;CqZ%qXqo@n@Mq+%POtaSyZDGj2GgS`KoCS!m{T
zwN<TG6(7*(9<6+ES9};S?Db%fA<beQhY@^MN$T$y&E*ec9p2uB?;H7ieh-Tbi;H`m
zbbFV_u)MUVS4Iq{SxC8W+<h-0^CK624j42$rJ9c)NZNx9Naphi&wcw$OKr95=~fbH
zUoQ|FbX8i(5B0B?rwzxbE9YP+M!j?=8|(S7?X#V)O4;@f^I^wlo)4Q}S?Zs)J@#yy
z$9?ep?ZKFWc+hBAW@qNvpw%oiw|~+}wKsXsXFEUfhqZU!JTrB8KKR?mILJ+SwCOYN
z`TL#gXS#_qrn&qf4)n~-G_2b`^B!&E=eSZL_#TH2w_sann7{2V4u?DAt6Fy+Z4IAi
zD`l)XP4%349)7x<@o!kBzc*xVd>$LLV%PhFMzhcyIy(31@8P)~d_>b#Ts@R7p5CCe
ztU@!B<_ahPc$=(`x7M6y8ew2}@4nKqH!N_U))IJy1fS$&2TXFnm~fD8g%@w0NzPd0
zjK|%lYv^q$CdQ+Woel@@Is4;@#e_Sx?~cdxiT>bttnV<XU%dM+u3@cGehI?m3ReL<
z^W7sWDXm~u#SSu{_~=;3-+*8hv+n)j#}D^&yJr@%V9^vS7;xY)iz`^g#?Obq`K)_X
zC<V)oRk5YdnWvu3s%7PN?)~u|E{lp~)*C$L9=<CbIvf*vj9ZOjEoU`~m7LY6VX0Qh
zS@uRN=N^7%NdE%d<W|XPF$=E~3$FqTKX|%_Zw~1lw{YpS)=IZ?)|zAF^huFKI;l2}
z2{1#XU1o_-pM??2G3rvA51#u5w)J{n{q8dl6~2cKJCp4odUN?hkNILK&!;<jI~WdH
z@uI}qLyvjx+vudr^xv{D2}o}Jzt{!gFx?hR!HOw*dwOBkG8#@Q8bZ`pu^Kn7Wv=r4
z$+rR2c=(OD|Mv}k9L4XW;a^zo?ZXKQ`M6cB*Mef!s@5w(@i%Iv$oj2XDY8ymg^jT9
zoK@JU_;{F5goPS9^6_w?01GvA@(|5fKrL28GpalKGqhp*L##7l%m^d<v~R&u4clrR
zbF__SP^`w)rT!=KJ~E;rB_ayEt4o|))q3j$`}1+Dun`ugYo&V2$H>_1i(wQtT0V{w
zo5CZ6vVQT#HT_M?-muu<0r2tcj1H=&zhkXA&BU2bdwA0M_wlR?T<TY>(n>ydrlUB0
zgD&ooHDcNMd~kK<Kg(kh4|YkY2hsgKo)mZw5<Q9IIThA%)#~DyMLmPdMjS_n)!n`O
z`h-|ja2Uc8hFk21!_5f)f8q3byfVVO5X21cLM%JKxF@Y<SEJPo7Z<!FwYAH4gZr}O
z%SP>5Fg^UH(kd3~_^(p!-}T_iW>n3r8WTH9@F%V~>$VMw(QMQJ%Z4Z$px1EKm$sRx
zUku-85?Is!@?X1q|8Qdc?e~5(r+x6x|9kH4lZ9V%_X%!Y!J&z5D>%AfKT5|;oEuOz
z%sbak9*FHL*HL#3d9ZNZXf_Po$a0<GPxry|ak{PLW=gzJ^0To6ahp>geYVLD{LeRe
z47U8o4>z}f$6>ppwYR~~rh+X2ChVYr^$i8s>15j5Jl3}qLrc7GF<{4kvcJVOz^xO<
zI{^OUO&*8i9R~i+v4=M`T`x#6Q#^Y-1~C-?Zo*B#Y`hQJS~57?<)pDuv&aE{AM})1
z!ttfo`z_P4)J{H_ub=+Uy<h(OU;fd}pWgp#HNEyHjZUh4AG(=1W1aE#eb&t?F_ull
zfN9oLkiCL!NVxrGCC&dP8VJA#VmM{?N`0j=<#-HOu;~0s1$d?Q>tka1`K8#m6D~>|
zdidEv{KbIm=uvvT-p|9PpOKaqy<YE6!<Lt4R`nNnLx8)1#Pf^vr-%nH_sB?;?NYFv
zY9PO(U3xrGaN>AcO76Ll+z8Y5!97|@*cH1GOf~LII!)!aQ62KAbJo`j^@Yl2_d)y3
zyV!cQl^sdoy^+##uC=mCg2#AF)UIO-a%jJEdKlcJcbD%N!ISXVt&+8#&*zthMA4F3
z5+A6v@+Gk*XlwIgPt;M`$>&>2t6(<3J^GfD|87uG@s!e)ma}L=R}_|_!c{D|M-$Fs
zC{zJrf#Fmf$GOp+<x|T7KFn1cw_@Dms%9jDrRngu-;3iwuqU=Me8s8Qjax<hG4RJ!
zAZIsjm9V^wKV|%>DDKg<)uP#omfm<gnA`Blb(T++nFk&)R%Sb}1B_jX0Uok9t+n;O
zYGRV86rQ|-`^Ma(rJCJbjC?188$U++YJ9BBK8t@ASClIU!t;``X7x2mCi24&&8zsU
zI4D;%p?Ry(H2R}Shj);|wGTcp<q9wy&V;jE2#-t*rxFdt&=7|O&{Vvq)kj*T{384x
DTa5vi

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/HMAT.acpihmat-generic-x b/tests/data/acpi/q35/HMAT.acpihmat-generic-x
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0e5765f6ee4c07638c70647ae145e968718b67cd 100644
GIT binary patch
literal 360
zcma)$u?oU46h%)`E5*sdrCT<g^9Qy|7pb6wAJD;1aP;%?S#r}?8U(=$A@?0_a^G+{
z-=7Zr*p2;g3*F<|hY*4T<aIAP0p<Kl%1LivWByzE=Veftt@-_NO)66XwIR*knBIts
t?eaMgjn%}QYk&q{c(?Xe^KMITx#vH<336X#q6H=((dJuwh>OiW@d0Cl4*>uG

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/SRAT.acpihmat-generic-x b/tests/data/acpi/q35/SRAT.acpihmat-generic-x
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..cbf45a31ba6d2c06f2bb0e203d4b40100522ac08 100644
GIT binary patch
literal 520
zcmWFzatz^MVqjoQbMklg2v%^42yj+VP*7lGU|;~TK{N=%fdD$6nGsc<l?j>8r~%gr
z1za!&in$1N0#Nx6%rJ$h=CQzpVGJ0JrVgeIKS0=v9}JW_Rs{xV_`<>k0$^dnroh0!
i#K6Gd=p5i_U|?wGfF{qV!3y^nRL=yM06c&h7#IL5atr|g

literal 0
HcmV?d00001

-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 37/46] scripts/update-linux-headers: Copy setup_data.h to correct directory
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (35 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc) Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 38/46] linux-headers: update to 6.10-rc1 Michael S. Tsirkin
                   ` (10 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Thomas Huth, Cornelia Huck,
	Paolo Bonzini

From: Thomas Weißschuh <thomas@t-8ch.de>

Add the missing "include/" path component, so the files ends up in the
correct place like the other headers.

Fixes: 66210a1a30f2 ("scripts/update-linux-headers: Add setup_data.h to import list")
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-1-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 8963c39189..a148793bd5 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -158,7 +158,7 @@ for arch in $ARCHLIST; do
         cp_portable "$hdrdir/bootparam.h" \
                     "$output/include/standard-headers/asm-$arch"
         cp_portable "$hdrdir/include/asm/setup_data.h" \
-                    "$output/standard-headers/asm-x86"
+                    "$output/include/standard-headers/asm-x86"
     fi
     if [ $arch = riscv ]; then
         cp "$hdrdir/include/asm/ptrace.h" "$output/linux-headers/asm-riscv/"
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 38/46] linux-headers: update to 6.10-rc1
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (36 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 37/46] scripts/update-linux-headers: Copy setup_data.h to correct directory Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 39/46] hw/misc/pvpanic: centralize definition of supported events Michael S. Tsirkin
                   ` (9 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Cornelia Huck,
	Paolo Bonzini, kvm

From: Thomas Weißschuh <thomas@t-8ch.de>

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-2-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/standard-headers/linux/ethtool.h    |  55 ++++++++
 include/standard-headers/linux/pci_regs.h   |   6 +
 include/standard-headers/linux/pvpanic.h    |   7 +-
 include/standard-headers/linux/virtio_bt.h  |   1 -
 include/standard-headers/linux/virtio_mem.h |   2 +
 include/standard-headers/linux/virtio_net.h | 143 ++++++++++++++++++++
 linux-headers/asm-generic/unistd.h          |   5 +-
 linux-headers/asm-loongarch/kvm.h           |   4 +
 linux-headers/asm-mips/unistd_n32.h         |   1 +
 linux-headers/asm-mips/unistd_n64.h         |   1 +
 linux-headers/asm-mips/unistd_o32.h         |   1 +
 linux-headers/asm-powerpc/unistd_32.h       |   1 +
 linux-headers/asm-powerpc/unistd_64.h       |   1 +
 linux-headers/asm-riscv/kvm.h               |   1 +
 linux-headers/asm-s390/unistd_32.h          |   1 +
 linux-headers/asm-s390/unistd_64.h          |   1 +
 linux-headers/asm-x86/kvm.h                 |   4 +-
 linux-headers/asm-x86/unistd_32.h           |   1 +
 linux-headers/asm-x86/unistd_64.h           |   1 +
 linux-headers/asm-x86/unistd_x32.h          |   2 +
 linux-headers/linux/kvm.h                   |   4 +-
 linux-headers/linux/stddef.h                |   8 ++
 linux-headers/linux/vhost.h                 |  15 +-
 23 files changed, 252 insertions(+), 14 deletions(-)

diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 01503784d2..b0b4b68410 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -752,6 +752,61 @@ enum ethtool_module_power_mode {
 	ETHTOOL_MODULE_POWER_MODE_HIGH,
 };
 
+/**
+ * enum ethtool_pse_types - Types of PSE controller.
+ * @ETHTOOL_PSE_UNKNOWN: Type of PSE controller is unknown
+ * @ETHTOOL_PSE_PODL: PSE controller which support PoDL
+ * @ETHTOOL_PSE_C33: PSE controller which support Clause 33 (PoE)
+ */
+enum ethtool_pse_types {
+	ETHTOOL_PSE_UNKNOWN =	1 << 0,
+	ETHTOOL_PSE_PODL =	1 << 1,
+	ETHTOOL_PSE_C33 =	1 << 2,
+};
+
+/**
+ * enum ethtool_c33_pse_admin_state - operational state of the PoDL PSE
+ *	functions. IEEE 802.3-2022 30.9.1.1.2 aPSEAdminState
+ * @ETHTOOL_C33_PSE_ADMIN_STATE_UNKNOWN: state of PSE functions is unknown
+ * @ETHTOOL_C33_PSE_ADMIN_STATE_DISABLED: PSE functions are disabled
+ * @ETHTOOL_C33_PSE_ADMIN_STATE_ENABLED: PSE functions are enabled
+ */
+enum ethtool_c33_pse_admin_state {
+	ETHTOOL_C33_PSE_ADMIN_STATE_UNKNOWN = 1,
+	ETHTOOL_C33_PSE_ADMIN_STATE_DISABLED,
+	ETHTOOL_C33_PSE_ADMIN_STATE_ENABLED,
+};
+
+/**
+ * enum ethtool_c33_pse_pw_d_status - power detection status of the PSE.
+ *	IEEE 802.3-2022 30.9.1.1.3 aPoDLPSEPowerDetectionStatus:
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_UNKNOWN: PSE status is unknown
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_DISABLED: The enumeration "disabled"
+ *	indicates that the PSE State diagram is in the state DISABLED.
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_SEARCHING: The enumeration "searching"
+ *	indicates the PSE State diagram is in a state other than those
+ *	listed.
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_DELIVERING: The enumeration
+ *	"deliveringPower" indicates that the PSE State diagram is in the
+ *	state POWER_ON.
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_TEST: The enumeration "test" indicates that
+ *	the PSE State diagram is in the state TEST_MODE.
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_FAULT: The enumeration "fault" indicates that
+ *	the PSE State diagram is in the state TEST_ERROR.
+ * @ETHTOOL_C33_PSE_PW_D_STATUS_OTHERFAULT: The enumeration "otherFault"
+ *	indicates that the PSE State diagram is in the state IDLE due to
+ *	the variable error_condition = true.
+ */
+enum ethtool_c33_pse_pw_d_status {
+	ETHTOOL_C33_PSE_PW_D_STATUS_UNKNOWN = 1,
+	ETHTOOL_C33_PSE_PW_D_STATUS_DISABLED,
+	ETHTOOL_C33_PSE_PW_D_STATUS_SEARCHING,
+	ETHTOOL_C33_PSE_PW_D_STATUS_DELIVERING,
+	ETHTOOL_C33_PSE_PW_D_STATUS_TEST,
+	ETHTOOL_C33_PSE_PW_D_STATUS_FAULT,
+	ETHTOOL_C33_PSE_PW_D_STATUS_OTHERFAULT,
+};
+
 /**
  * enum ethtool_podl_pse_admin_state - operational state of the PoDL PSE
  *	functions. IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index a39193213f..94c00996e6 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -1144,8 +1144,14 @@
 #define PCI_DOE_DATA_OBJECT_HEADER_2_LENGTH		0x0003ffff
 
 #define PCI_DOE_DATA_OBJECT_DISC_REQ_3_INDEX		0x000000ff
+#define PCI_DOE_DATA_OBJECT_DISC_REQ_3_VER		0x0000ff00
 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_VID		0x0000ffff
 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_PROTOCOL		0x00ff0000
 #define PCI_DOE_DATA_OBJECT_DISC_RSP_3_NEXT_INDEX	0xff000000
 
+/* Compute Express Link (CXL r3.1, sec 8.1.5) */
+#define PCI_DVSEC_CXL_PORT				3
+#define PCI_DVSEC_CXL_PORT_CTL				0x0c
+#define PCI_DVSEC_CXL_PORT_CTL_UNMASK_SBR		0x00000001
+
 #endif /* LINUX_PCI_REGS_H */
diff --git a/include/standard-headers/linux/pvpanic.h b/include/standard-headers/linux/pvpanic.h
index 54b7485390..b115094431 100644
--- a/include/standard-headers/linux/pvpanic.h
+++ b/include/standard-headers/linux/pvpanic.h
@@ -3,7 +3,10 @@
 #ifndef __PVPANIC_H__
 #define __PVPANIC_H__
 
-#define PVPANIC_PANICKED	(1 << 0)
-#define PVPANIC_CRASH_LOADED	(1 << 1)
+#include "standard-headers/linux/const.h"
+
+#define PVPANIC_PANICKED	_BITUL(0)
+#define PVPANIC_CRASH_LOADED	_BITUL(1)
+#define PVPANIC_SHUTDOWN	_BITUL(2)
 
 #endif /* __PVPANIC_H__ */
diff --git a/include/standard-headers/linux/virtio_bt.h b/include/standard-headers/linux/virtio_bt.h
index a11ecc3f92..6f0dee7e32 100644
--- a/include/standard-headers/linux/virtio_bt.h
+++ b/include/standard-headers/linux/virtio_bt.h
@@ -13,7 +13,6 @@
 
 enum virtio_bt_config_type {
 	VIRTIO_BT_CONFIG_TYPE_PRIMARY	= 0,
-	VIRTIO_BT_CONFIG_TYPE_AMP	= 1,
 };
 
 enum virtio_bt_config_vendor {
diff --git a/include/standard-headers/linux/virtio_mem.h b/include/standard-headers/linux/virtio_mem.h
index 18c74c527c..6bfa41bd8b 100644
--- a/include/standard-headers/linux/virtio_mem.h
+++ b/include/standard-headers/linux/virtio_mem.h
@@ -90,6 +90,8 @@
 #define VIRTIO_MEM_F_ACPI_PXM		0
 /* unplugged memory must not be accessed */
 #define VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE	1
+/* plugged memory will remain plugged when suspending+resuming */
+#define VIRTIO_MEM_F_PERSISTENT_SUSPEND		2
 
 
 /* --- virtio-mem: guest -> host requests --- */
diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index 0f88417742..fc594fe5fc 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -56,6 +56,7 @@
 #define VIRTIO_NET_F_MQ	22	/* Device supports Receive Flow
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
+#define VIRTIO_NET_F_DEVICE_STATS 50	/* Device can provide device-level statistics. */
 #define VIRTIO_NET_F_VQ_NOTF_COAL 52	/* Device supports virtqueue notification coalescing */
 #define VIRTIO_NET_F_NOTF_COAL	53	/* Device supports notifications coalescing */
 #define VIRTIO_NET_F_GUEST_USO4	54	/* Guest can handle USOv4 in. */
@@ -406,4 +407,146 @@ struct  virtio_net_ctrl_coal_vq {
 	struct virtio_net_ctrl_coal coal;
 };
 
+/*
+ * Device Statistics
+ */
+#define VIRTIO_NET_CTRL_STATS         8
+#define VIRTIO_NET_CTRL_STATS_QUERY   0
+#define VIRTIO_NET_CTRL_STATS_GET     1
+
+struct virtio_net_stats_capabilities {
+
+#define VIRTIO_NET_STATS_TYPE_CVQ       (1ULL << 32)
+
+#define VIRTIO_NET_STATS_TYPE_RX_BASIC  (1ULL << 0)
+#define VIRTIO_NET_STATS_TYPE_RX_CSUM   (1ULL << 1)
+#define VIRTIO_NET_STATS_TYPE_RX_GSO    (1ULL << 2)
+#define VIRTIO_NET_STATS_TYPE_RX_SPEED  (1ULL << 3)
+
+#define VIRTIO_NET_STATS_TYPE_TX_BASIC  (1ULL << 16)
+#define VIRTIO_NET_STATS_TYPE_TX_CSUM   (1ULL << 17)
+#define VIRTIO_NET_STATS_TYPE_TX_GSO    (1ULL << 18)
+#define VIRTIO_NET_STATS_TYPE_TX_SPEED  (1ULL << 19)
+
+	uint64_t supported_stats_types[1];
+};
+
+struct virtio_net_ctrl_queue_stats {
+	struct {
+		uint16_t vq_index;
+		uint16_t reserved[3];
+		uint64_t types_bitmap[1];
+	} stats[1];
+};
+
+struct virtio_net_stats_reply_hdr {
+#define VIRTIO_NET_STATS_TYPE_REPLY_CVQ       32
+
+#define VIRTIO_NET_STATS_TYPE_REPLY_RX_BASIC  0
+#define VIRTIO_NET_STATS_TYPE_REPLY_RX_CSUM   1
+#define VIRTIO_NET_STATS_TYPE_REPLY_RX_GSO    2
+#define VIRTIO_NET_STATS_TYPE_REPLY_RX_SPEED  3
+
+#define VIRTIO_NET_STATS_TYPE_REPLY_TX_BASIC  16
+#define VIRTIO_NET_STATS_TYPE_REPLY_TX_CSUM   17
+#define VIRTIO_NET_STATS_TYPE_REPLY_TX_GSO    18
+#define VIRTIO_NET_STATS_TYPE_REPLY_TX_SPEED  19
+	uint8_t type;
+	uint8_t reserved;
+	uint16_t vq_index;
+	uint16_t reserved1;
+	uint16_t size;
+};
+
+struct virtio_net_stats_cvq {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t command_num;
+	uint64_t ok_num;
+};
+
+struct virtio_net_stats_rx_basic {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t rx_notifications;
+
+	uint64_t rx_packets;
+	uint64_t rx_bytes;
+
+	uint64_t rx_interrupts;
+
+	uint64_t rx_drops;
+	uint64_t rx_drop_overruns;
+};
+
+struct virtio_net_stats_tx_basic {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t tx_notifications;
+
+	uint64_t tx_packets;
+	uint64_t tx_bytes;
+
+	uint64_t tx_interrupts;
+
+	uint64_t tx_drops;
+	uint64_t tx_drop_malformed;
+};
+
+struct virtio_net_stats_rx_csum {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t rx_csum_valid;
+	uint64_t rx_needs_csum;
+	uint64_t rx_csum_none;
+	uint64_t rx_csum_bad;
+};
+
+struct virtio_net_stats_tx_csum {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t tx_csum_none;
+	uint64_t tx_needs_csum;
+};
+
+struct virtio_net_stats_rx_gso {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t rx_gso_packets;
+	uint64_t rx_gso_bytes;
+	uint64_t rx_gso_packets_coalesced;
+	uint64_t rx_gso_bytes_coalesced;
+};
+
+struct virtio_net_stats_tx_gso {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	uint64_t tx_gso_packets;
+	uint64_t tx_gso_bytes;
+	uint64_t tx_gso_segments;
+	uint64_t tx_gso_segments_bytes;
+	uint64_t tx_gso_packets_noseg;
+	uint64_t tx_gso_bytes_noseg;
+};
+
+struct virtio_net_stats_rx_speed {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	/* rx_{packets,bytes}_allowance_exceeded are too long. So rename to
+	 * short name.
+	 */
+	uint64_t rx_ratelimit_packets;
+	uint64_t rx_ratelimit_bytes;
+};
+
+struct virtio_net_stats_tx_speed {
+	struct virtio_net_stats_reply_hdr hdr;
+
+	/* tx_{packets,bytes}_allowance_exceeded are too long. So rename to
+	 * short name.
+	 */
+	uint64_t tx_ratelimit_packets;
+	uint64_t tx_ratelimit_bytes;
+};
+
 #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/linux-headers/asm-generic/unistd.h b/linux-headers/asm-generic/unistd.h
index 75f00965ab..d983c48a3b 100644
--- a/linux-headers/asm-generic/unistd.h
+++ b/linux-headers/asm-generic/unistd.h
@@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
 #define __NR_lsm_list_modules 461
 __SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
 
+#define __NR_mseal 462
+__SYSCALL(__NR_mseal, sys_mseal)
+
 #undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
 
 /*
  * 32 bit systems traditionally used different
diff --git a/linux-headers/asm-loongarch/kvm.h b/linux-headers/asm-loongarch/kvm.h
index 109785922c..f9abef3823 100644
--- a/linux-headers/asm-loongarch/kvm.h
+++ b/linux-headers/asm-loongarch/kvm.h
@@ -17,6 +17,8 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET	1
 #define KVM_DIRTY_LOG_PAGE_OFFSET	64
 
+#define KVM_GUESTDBG_USE_SW_BP		0x00010000
+
 /*
  * for KVM_GET_REGS and KVM_SET_REGS
  */
@@ -72,6 +74,8 @@ struct kvm_fpu {
 
 #define KVM_REG_LOONGARCH_COUNTER	(KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 1)
 #define KVM_REG_LOONGARCH_VCPU_RESET	(KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 2)
+/* Debugging: Special instruction for software breakpoint */
+#define KVM_REG_LOONGARCH_DEBUG_INST	(KVM_REG_LOONGARCH_KVM | KVM_REG_SIZE_U64 | 3)
 
 #define LOONGARCH_REG_SHIFT		3
 #define LOONGARCH_REG_64(TYPE, REG)	(TYPE | KVM_REG_SIZE_U64 | (REG << LOONGARCH_REG_SHIFT))
diff --git a/linux-headers/asm-mips/unistd_n32.h b/linux-headers/asm-mips/unistd_n32.h
index ce2e050a9b..fc93b3be30 100644
--- a/linux-headers/asm-mips/unistd_n32.h
+++ b/linux-headers/asm-mips/unistd_n32.h
@@ -390,5 +390,6 @@
 #define __NR_lsm_get_self_attr (__NR_Linux + 459)
 #define __NR_lsm_set_self_attr (__NR_Linux + 460)
 #define __NR_lsm_list_modules (__NR_Linux + 461)
+#define __NR_mseal (__NR_Linux + 462)
 
 #endif /* _ASM_UNISTD_N32_H */
diff --git a/linux-headers/asm-mips/unistd_n64.h b/linux-headers/asm-mips/unistd_n64.h
index 5bfb3733ff..e72a3eb2c9 100644
--- a/linux-headers/asm-mips/unistd_n64.h
+++ b/linux-headers/asm-mips/unistd_n64.h
@@ -366,5 +366,6 @@
 #define __NR_lsm_get_self_attr (__NR_Linux + 459)
 #define __NR_lsm_set_self_attr (__NR_Linux + 460)
 #define __NR_lsm_list_modules (__NR_Linux + 461)
+#define __NR_mseal (__NR_Linux + 462)
 
 #endif /* _ASM_UNISTD_N64_H */
diff --git a/linux-headers/asm-mips/unistd_o32.h b/linux-headers/asm-mips/unistd_o32.h
index 02eaecd020..b86eb0786c 100644
--- a/linux-headers/asm-mips/unistd_o32.h
+++ b/linux-headers/asm-mips/unistd_o32.h
@@ -436,5 +436,6 @@
 #define __NR_lsm_get_self_attr (__NR_Linux + 459)
 #define __NR_lsm_set_self_attr (__NR_Linux + 460)
 #define __NR_lsm_list_modules (__NR_Linux + 461)
+#define __NR_mseal (__NR_Linux + 462)
 
 #endif /* _ASM_UNISTD_O32_H */
diff --git a/linux-headers/asm-powerpc/unistd_32.h b/linux-headers/asm-powerpc/unistd_32.h
index bbab08d6ec..28627b6546 100644
--- a/linux-headers/asm-powerpc/unistd_32.h
+++ b/linux-headers/asm-powerpc/unistd_32.h
@@ -443,6 +443,7 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-powerpc/unistd_64.h b/linux-headers/asm-powerpc/unistd_64.h
index af34cde70f..1fc42a8300 100644
--- a/linux-headers/asm-powerpc/unistd_64.h
+++ b/linux-headers/asm-powerpc/unistd_64.h
@@ -415,6 +415,7 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-riscv/kvm.h b/linux-headers/asm-riscv/kvm.h
index b1c503c295..e878e7cc39 100644
--- a/linux-headers/asm-riscv/kvm.h
+++ b/linux-headers/asm-riscv/kvm.h
@@ -167,6 +167,7 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZFA,
 	KVM_RISCV_ISA_EXT_ZTSO,
 	KVM_RISCV_ISA_EXT_ZACAS,
+	KVM_RISCV_ISA_EXT_SSCOFPMF,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
diff --git a/linux-headers/asm-s390/unistd_32.h b/linux-headers/asm-s390/unistd_32.h
index a3ece69d82..7706c21b87 100644
--- a/linux-headers/asm-s390/unistd_32.h
+++ b/linux-headers/asm-s390/unistd_32.h
@@ -434,5 +434,6 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 #endif /* _ASM_S390_UNISTD_32_H */
diff --git a/linux-headers/asm-s390/unistd_64.h b/linux-headers/asm-s390/unistd_64.h
index 8c5fd93495..62082d592d 100644
--- a/linux-headers/asm-s390/unistd_64.h
+++ b/linux-headers/asm-s390/unistd_64.h
@@ -382,5 +382,6 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 #endif /* _ASM_S390_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 31c95c2dfe..b4e2e8d220 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -709,7 +709,9 @@ struct kvm_sev_cmd {
 struct kvm_sev_init {
 	__u64 vmsa_features;
 	__u32 flags;
-	__u32 pad[9];
+	__u16 ghcb_version;
+	__u16 pad1;
+	__u32 pad2[8];
 };
 
 struct kvm_sev_launch_start {
diff --git a/linux-headers/asm-x86/unistd_32.h b/linux-headers/asm-x86/unistd_32.h
index 5c9c329e93..fb7b8b169b 100644
--- a/linux-headers/asm-x86/unistd_32.h
+++ b/linux-headers/asm-x86/unistd_32.h
@@ -452,6 +452,7 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 
 #endif /* _ASM_UNISTD_32_H */
diff --git a/linux-headers/asm-x86/unistd_64.h b/linux-headers/asm-x86/unistd_64.h
index d9aab7ae87..da439afee1 100644
--- a/linux-headers/asm-x86/unistd_64.h
+++ b/linux-headers/asm-x86/unistd_64.h
@@ -374,6 +374,7 @@
 #define __NR_lsm_get_self_attr 459
 #define __NR_lsm_set_self_attr 460
 #define __NR_lsm_list_modules 461
+#define __NR_mseal 462
 
 
 #endif /* _ASM_UNISTD_64_H */
diff --git a/linux-headers/asm-x86/unistd_x32.h b/linux-headers/asm-x86/unistd_x32.h
index 63cdd1ee43..4fcb607c72 100644
--- a/linux-headers/asm-x86/unistd_x32.h
+++ b/linux-headers/asm-x86/unistd_x32.h
@@ -318,6 +318,7 @@
 #define __NR_set_mempolicy_home_node (__X32_SYSCALL_BIT + 450)
 #define __NR_cachestat (__X32_SYSCALL_BIT + 451)
 #define __NR_fchmodat2 (__X32_SYSCALL_BIT + 452)
+#define __NR_map_shadow_stack (__X32_SYSCALL_BIT + 453)
 #define __NR_futex_wake (__X32_SYSCALL_BIT + 454)
 #define __NR_futex_wait (__X32_SYSCALL_BIT + 455)
 #define __NR_futex_requeue (__X32_SYSCALL_BIT + 456)
@@ -326,6 +327,7 @@
 #define __NR_lsm_get_self_attr (__X32_SYSCALL_BIT + 459)
 #define __NR_lsm_set_self_attr (__X32_SYSCALL_BIT + 460)
 #define __NR_lsm_list_modules (__X32_SYSCALL_BIT + 461)
+#define __NR_mseal (__X32_SYSCALL_BIT + 462)
 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512)
 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513)
 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 038731cdef..c93876ca0b 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1217,9 +1217,9 @@ struct kvm_vfio_spapr_tce {
 /* Available with KVM_CAP_SPAPR_RESIZE_HPT */
 #define KVM_PPC_RESIZE_HPT_PREPARE _IOR(KVMIO, 0xad, struct kvm_ppc_resize_hpt)
 #define KVM_PPC_RESIZE_HPT_COMMIT  _IOR(KVMIO, 0xae, struct kvm_ppc_resize_hpt)
-/* Available with KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3 */
+/* Available with KVM_CAP_PPC_MMU_RADIX or KVM_CAP_PPC_MMU_HASH_V3 */
 #define KVM_PPC_CONFIGURE_V3_MMU  _IOW(KVMIO,  0xaf, struct kvm_ppc_mmuv3_cfg)
-/* Available with KVM_CAP_PPC_RADIX_MMU */
+/* Available with KVM_CAP_PPC_MMU_RADIX */
 #define KVM_PPC_GET_RMMU_INFO	  _IOW(KVMIO,  0xb0, struct kvm_ppc_rmmu_info)
 /* Available with KVM_CAP_PPC_GET_CPU_CHAR */
 #define KVM_PPC_GET_CPU_CHAR	  _IOR(KVMIO,  0xb1, struct kvm_ppc_cpu_char)
diff --git a/linux-headers/linux/stddef.h b/linux-headers/linux/stddef.h
index bf9749dd14..96aa341942 100644
--- a/linux-headers/linux/stddef.h
+++ b/linux-headers/linux/stddef.h
@@ -55,4 +55,12 @@
 #define __counted_by(m)
 #endif
 
+#ifndef __counted_by_le
+#define __counted_by_le(m)
+#endif
+
+#ifndef __counted_by_be
+#define __counted_by_be(m)
+#endif
+
 #endif /* _LINUX_STDDEF_H */
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index bea6973906..b95dd84eef 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -179,12 +179,6 @@
 /* Get the config size */
 #define VHOST_VDPA_GET_CONFIG_SIZE	_IOR(VHOST_VIRTIO, 0x79, __u32)
 
-/* Get the count of all virtqueues */
-#define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
-
-/* Get the number of virtqueue groups. */
-#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x81, __u32)
-
 /* Get the number of address spaces. */
 #define VHOST_VDPA_GET_AS_NUM		_IOR(VHOST_VIRTIO, 0x7A, unsigned int)
 
@@ -228,10 +222,17 @@
 #define VHOST_VDPA_GET_VRING_DESC_GROUP	_IOWR(VHOST_VIRTIO, 0x7F,	\
 					      struct vhost_vring_state)
 
+
+/* Get the count of all virtqueues */
+#define VHOST_VDPA_GET_VQS_COUNT	_IOR(VHOST_VIRTIO, 0x80, __u32)
+
+/* Get the number of virtqueue groups. */
+#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x81, __u32)
+
 /* Get the queue size of a specific virtqueue.
  * userspace set the vring index in vhost_vring_state.index
  * kernel set the queue size in vhost_vring_state.num
  */
-#define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x80,	\
+#define VHOST_VDPA_GET_VRING_SIZE	_IOWR(VHOST_VIRTIO, 0x82,	\
 					      struct vhost_vring_state)
 #endif
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 39/46] hw/misc/pvpanic: centralize definition of supported events
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (37 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 38/46] linux-headers: update to 6.10-rc1 Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 40/46] tests/qtest/pvpanic: use centralized " Michael S. Tsirkin
                   ` (8 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Thomas Huth, Cornelia Huck,
	Richard Henderson, Zhao Liu

From: Thomas Weißschuh <thomas@t-8ch.de>

The different components of pvpanic duplicate the list of supported
events. Move it to the shared header file to minimize changes when new
events are added.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-3-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/misc/pvpanic.h | 4 ++++
 hw/misc/pvpanic-isa.c     | 3 +--
 hw/misc/pvpanic-pci.c     | 3 +--
 hw/misc/pvpanic.c         | 3 +--
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index fab94165d0..947468b81b 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -18,6 +18,10 @@
 #include "exec/memory.h"
 #include "qom/object.h"
 
+#include "standard-headers/linux/pvpanic.h"
+
+#define PVPANIC_EVENTS (PVPANIC_PANICKED | PVPANIC_CRASH_LOADED)
+
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
 
diff --git a/hw/misc/pvpanic-isa.c b/hw/misc/pvpanic-isa.c
index ccec50f61b..9a923b7869 100644
--- a/hw/misc/pvpanic-isa.c
+++ b/hw/misc/pvpanic-isa.c
@@ -21,7 +21,6 @@
 #include "hw/misc/pvpanic.h"
 #include "qom/object.h"
 #include "hw/isa/isa.h"
-#include "standard-headers/linux/pvpanic.h"
 #include "hw/acpi/acpi_aml_interface.h"
 
 OBJECT_DECLARE_SIMPLE_TYPE(PVPanicISAState, PVPANIC_ISA_DEVICE)
@@ -102,7 +101,7 @@ static void build_pvpanic_isa_aml(AcpiDevAmlIf *adev, Aml *scope)
 static Property pvpanic_isa_properties[] = {
     DEFINE_PROP_UINT16(PVPANIC_IOPORT_PROP, PVPanicISAState, ioport, 0x505),
     DEFINE_PROP_UINT8("events", PVPanicISAState, pvpanic.events,
-                      PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+                      PVPANIC_EVENTS),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/misc/pvpanic-pci.c b/hw/misc/pvpanic-pci.c
index 83be95d0d2..603c5c7600 100644
--- a/hw/misc/pvpanic-pci.c
+++ b/hw/misc/pvpanic-pci.c
@@ -21,7 +21,6 @@
 #include "hw/misc/pvpanic.h"
 #include "qom/object.h"
 #include "hw/pci/pci_device.h"
-#include "standard-headers/linux/pvpanic.h"
 
 OBJECT_DECLARE_SIMPLE_TYPE(PVPanicPCIState, PVPANIC_PCI_DEVICE)
 
@@ -55,7 +54,7 @@ static void pvpanic_pci_realizefn(PCIDevice *dev, Error **errp)
 
 static Property pvpanic_pci_properties[] = {
     DEFINE_PROP_UINT8("events", PVPanicPCIState, pvpanic.events,
-                      PVPANIC_PANICKED | PVPANIC_CRASH_LOADED),
+                      PVPANIC_EVENTS),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index 1540e9091a..a4982cc592 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -21,13 +21,12 @@
 #include "hw/qdev-properties.h"
 #include "hw/misc/pvpanic.h"
 #include "qom/object.h"
-#include "standard-headers/linux/pvpanic.h"
 
 static void handle_event(int event)
 {
     static bool logged;
 
-    if (event & ~(PVPANIC_PANICKED | PVPANIC_CRASH_LOADED) && !logged) {
+    if (event & ~PVPANIC_EVENTS && !logged) {
         qemu_log_mask(LOG_GUEST_ERROR, "pvpanic: unknown event %#x.\n", event);
         logged = true;
     }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 40/46] tests/qtest/pvpanic: use centralized definition of supported events
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (38 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 39/46] hw/misc/pvpanic: centralize definition of supported events Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 41/46] hw/misc/pvpanic: add support for normal shutdowns Michael S. Tsirkin
                   ` (7 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Thomas Huth, Cornelia Huck,
	Laurent Vivier, Paolo Bonzini

From: Thomas Weißschuh <thomas@t-8ch.de>

Avoid the necessity to update all tests when new events are added
to the device.

Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-4-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/pvpanic-pci-test.c | 5 +++--
 tests/qtest/pvpanic-test.c     | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/pvpanic-pci-test.c b/tests/qtest/pvpanic-pci-test.c
index 2c05b376ba..b372caf41d 100644
--- a/tests/qtest/pvpanic-pci-test.c
+++ b/tests/qtest/pvpanic-pci-test.c
@@ -16,6 +16,7 @@
 #include "qapi/qmp/qdict.h"
 #include "libqos/pci.h"
 #include "libqos/pci-pc.h"
+#include "hw/misc/pvpanic.h"
 #include "hw/pci/pci_regs.h"
 
 static void test_panic_nopause(void)
@@ -34,7 +35,7 @@ static void test_panic_nopause(void)
     bar = qpci_iomap(dev, 0, NULL);
 
     qpci_memread(dev, bar, 0, &val, sizeof(val));
-    g_assert_cmpuint(val, ==, 3);
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
 
     val = 1;
     qpci_memwrite(dev, bar, 0, &val, sizeof(val));
@@ -67,7 +68,7 @@ static void test_panic(void)
     bar = qpci_iomap(dev, 0, NULL);
 
     qpci_memread(dev, bar, 0, &val, sizeof(val));
-    g_assert_cmpuint(val, ==, 3);
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
 
     val = 1;
     qpci_memwrite(dev, bar, 0, &val, sizeof(val));
diff --git a/tests/qtest/pvpanic-test.c b/tests/qtest/pvpanic-test.c
index 78f1cf8186..ccc603472f 100644
--- a/tests/qtest/pvpanic-test.c
+++ b/tests/qtest/pvpanic-test.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "libqtest.h"
 #include "qapi/qmp/qdict.h"
+#include "hw/misc/pvpanic.h"
 
 static void test_panic_nopause(void)
 {
@@ -20,7 +21,7 @@ static void test_panic_nopause(void)
     qts = qtest_init("-device pvpanic -action panic=none");
 
     val = qtest_inb(qts, 0x505);
-    g_assert_cmpuint(val, ==, 3);
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
 
     qtest_outb(qts, 0x505, 0x1);
 
@@ -43,7 +44,7 @@ static void test_panic(void)
     qts = qtest_init("-device pvpanic -action panic=pause");
 
     val = qtest_inb(qts, 0x505);
-    g_assert_cmpuint(val, ==, 3);
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
 
     qtest_outb(qts, 0x505, 0x1);
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 41/46] hw/misc/pvpanic: add support for normal shutdowns
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (39 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 40/46] tests/qtest/pvpanic: use centralized " Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 42/46] pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal Michael S. Tsirkin
                   ` (6 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Cornelia Huck,
	Paolo Bonzini

From: Thomas Weißschuh <thomas@t-8ch.de>

Shutdown requests are normally hardware dependent.
By extending pvpanic to also handle shutdown requests, guests can
submit such requests with an easily implementable and cross-platform
mechanism.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-5-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/misc/pvpanic.h | 4 +++-
 include/sysemu/runstate.h | 1 +
 hw/misc/pvpanic.c         | 5 +++++
 system/runstate.c         | 5 +++++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
index 947468b81b..04986537af 100644
--- a/include/hw/misc/pvpanic.h
+++ b/include/hw/misc/pvpanic.h
@@ -20,7 +20,9 @@
 
 #include "standard-headers/linux/pvpanic.h"
 
-#define PVPANIC_EVENTS (PVPANIC_PANICKED | PVPANIC_CRASH_LOADED)
+#define PVPANIC_EVENTS (PVPANIC_PANICKED | \
+                        PVPANIC_CRASH_LOADED | \
+                        PVPANIC_SHUTDOWN)
 
 #define TYPE_PVPANIC_ISA_DEVICE "pvpanic"
 #define TYPE_PVPANIC_PCI_DEVICE "pvpanic-pci"
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..e210a37abf 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -104,6 +104,7 @@ void qemu_system_killed(int signal, pid_t pid);
 void qemu_system_reset(ShutdownCause reason);
 void qemu_system_guest_panicked(GuestPanicInformation *info);
 void qemu_system_guest_crashloaded(GuestPanicInformation *info);
+void qemu_system_guest_pvshutdown(void);
 bool qemu_system_dump_in_progress(void);
 
 #endif
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index a4982cc592..0e9505451a 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -40,6 +40,11 @@ static void handle_event(int event)
         qemu_system_guest_crashloaded(NULL);
         return;
     }
+
+    if (event & PVPANIC_SHUTDOWN) {
+        qemu_system_guest_pvshutdown();
+        return;
+    }
 }
 
 /* return supported events on read */
diff --git a/system/runstate.c b/system/runstate.c
index cb4905a40f..83055f3278 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -585,6 +585,11 @@ void qemu_system_guest_crashloaded(GuestPanicInformation *info)
     qapi_free_GuestPanicInformation(info);
 }
 
+void qemu_system_guest_pvshutdown(void)
+{
+    qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+}
+
 void qemu_system_reset_request(ShutdownCause reason)
 {
     if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 42/46] pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (40 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 41/46] hw/misc/pvpanic: add support for normal shutdowns Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 43/46] tests/qtest/pvpanic: add tests for pvshutdown event Michael S. Tsirkin
                   ` (5 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Alejandro Jimenez, Eric Blake, Markus Armbruster,
	Paolo Bonzini

From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

Emit a QMP event on receiving a PVPANIC_SHUTDOWN event. Even though a typical
SHUTDOWN event will be sent, it will be indistinguishable from a shutdown
originating from other cases (e.g. KVM exit due to KVM_SYSTEM_EVENT_SHUTDOWN)
that also issue the guest-shutdown cause.
A management layer application can detect the new GUEST_PVSHUTDOWN event to
determine if the guest is using the pvpanic interface to request shutdowns.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Message-Id: <20240527-pvpanic-shutdown-v8-6-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 qapi/run-state.json | 14 ++++++++++++++
 system/runstate.c   |  1 +
 2 files changed, 15 insertions(+)

diff --git a/qapi/run-state.json b/qapi/run-state.json
index f8773f23b2..5ac0fec852 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -462,6 +462,20 @@
 { 'event': 'GUEST_CRASHLOADED',
   'data': { 'action': 'GuestPanicAction', '*info': 'GuestPanicInformation' } }
 
+##
+# @GUEST_PVSHUTDOWN:
+#
+# Emitted when guest submits a shutdown request via pvpanic interface
+#
+# Since: 9.1
+#
+# Example:
+#
+#     <- { "event": "GUEST_PVSHUTDOWN",
+#          "timestamp": { "seconds": 1648245259, "microseconds": 893771 } }
+##
+{ 'event': 'GUEST_PVSHUTDOWN' }
+
 ##
 # @GuestPanicAction:
 #
diff --git a/system/runstate.c b/system/runstate.c
index 83055f3278..c149b1ab4b 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -587,6 +587,7 @@ void qemu_system_guest_crashloaded(GuestPanicInformation *info)
 
 void qemu_system_guest_pvshutdown(void)
 {
+    qapi_event_send_guest_pvshutdown();
     qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 }
 
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 43/46] tests/qtest/pvpanic: add tests for pvshutdown event
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (41 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 42/46] pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 44/46] Revert "docs/specs/pvpanic: mark shutdown event as not implemented" Michael S. Tsirkin
                   ` (4 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Thomas Huth, Laurent Vivier,
	Paolo Bonzini

From: Thomas Weißschuh <thomas@t-8ch.de>

Validate that a shutdown via the pvpanic device emits the correct
QMP events.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20240527-pvpanic-shutdown-v8-7-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/qtest/pvpanic-pci-test.c | 39 ++++++++++++++++++++++++++++++++++
 tests/qtest/pvpanic-test.c     | 29 +++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/tests/qtest/pvpanic-pci-test.c b/tests/qtest/pvpanic-pci-test.c
index b372caf41d..dc021c2fdf 100644
--- a/tests/qtest/pvpanic-pci-test.c
+++ b/tests/qtest/pvpanic-pci-test.c
@@ -85,11 +85,50 @@ static void test_panic(void)
     qtest_quit(qts);
 }
 
+static void test_pvshutdown(void)
+{
+    uint8_t val;
+    QDict *response, *data;
+    QTestState *qts;
+    QPCIBus *pcibus;
+    QPCIDevice *dev;
+    QPCIBar bar;
+
+    qts = qtest_init("-device pvpanic-pci,addr=04.0");
+    pcibus = qpci_new_pc(qts, NULL);
+    dev = qpci_device_find(pcibus, QPCI_DEVFN(0x4, 0x0));
+    qpci_device_enable(dev);
+    bar = qpci_iomap(dev, 0, NULL);
+
+    qpci_memread(dev, bar, 0, &val, sizeof(val));
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
+
+    val = PVPANIC_SHUTDOWN;
+    qpci_memwrite(dev, bar, 0, &val, sizeof(val));
+
+    response = qtest_qmp_eventwait_ref(qts, "GUEST_PVSHUTDOWN");
+    qobject_unref(response);
+
+    response = qtest_qmp_eventwait_ref(qts, "SHUTDOWN");
+    g_assert(qdict_haskey(response, "data"));
+    data = qdict_get_qdict(response, "data");
+    g_assert(qdict_haskey(data, "guest"));
+    g_assert(qdict_get_bool(data, "guest"));
+    g_assert(qdict_haskey(data, "reason"));
+    g_assert_cmpstr(qdict_get_str(data, "reason"), ==, "guest-shutdown");
+    qobject_unref(response);
+
+    g_free(dev);
+    qpci_free_pc(pcibus);
+    qtest_quit(qts);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
     qtest_add_func("/pvpanic-pci/panic", test_panic);
     qtest_add_func("/pvpanic-pci/panic-nopause", test_panic_nopause);
+    qtest_add_func("/pvpanic-pci/pvshutdown", test_pvshutdown);
 
     return g_test_run();
 }
diff --git a/tests/qtest/pvpanic-test.c b/tests/qtest/pvpanic-test.c
index ccc603472f..d49d2ba931 100644
--- a/tests/qtest/pvpanic-test.c
+++ b/tests/qtest/pvpanic-test.c
@@ -58,11 +58,40 @@ static void test_panic(void)
     qtest_quit(qts);
 }
 
+static void test_pvshutdown(void)
+{
+    uint8_t val;
+    QDict *response, *data;
+    QTestState *qts;
+
+    qts = qtest_init("-device pvpanic");
+
+    val = qtest_inb(qts, 0x505);
+    g_assert_cmpuint(val, ==, PVPANIC_EVENTS);
+
+    qtest_outb(qts, 0x505, PVPANIC_SHUTDOWN);
+
+    response = qtest_qmp_eventwait_ref(qts, "GUEST_PVSHUTDOWN");
+    qobject_unref(response);
+
+    response = qtest_qmp_eventwait_ref(qts, "SHUTDOWN");
+    g_assert(qdict_haskey(response, "data"));
+    data = qdict_get_qdict(response, "data");
+    g_assert(qdict_haskey(data, "guest"));
+    g_assert(qdict_get_bool(data, "guest"));
+    g_assert(qdict_haskey(data, "reason"));
+    g_assert_cmpstr(qdict_get_str(data, "reason"), ==, "guest-shutdown");
+    qobject_unref(response);
+
+    qtest_quit(qts);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
     qtest_add_func("/pvpanic/panic", test_panic);
     qtest_add_func("/pvpanic/panic-nopause", test_panic_nopause);
+    qtest_add_func("/pvpanic/pvshutdown", test_pvshutdown);
 
     return g_test_run();
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 44/46] Revert "docs/specs/pvpanic: mark shutdown event as not implemented"
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (42 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 43/46] tests/qtest/pvpanic: add tests for pvshutdown event Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 45/46] virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one() Michael S. Tsirkin
                   ` (3 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Thomas Weißschuh, Philippe Mathieu-Daudé

From: Thomas Weißschuh <thomas@t-8ch.de>

The missing functionality has been implemented now.

This reverts commit e739d1935c461d0668057e9dbba9d06f728d29ec.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Message-Id: <20240527-pvpanic-shutdown-v8-8-5a28ec02558b@t-8ch.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/specs/pvpanic.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/specs/pvpanic.rst b/docs/specs/pvpanic.rst
index b0f27860ec..61a80480ed 100644
--- a/docs/specs/pvpanic.rst
+++ b/docs/specs/pvpanic.rst
@@ -29,7 +29,7 @@ bit 1
   a guest panic has happened and will be handled by the guest;
   the host should record it or report it, but should not affect
   the execution of the guest.
-bit 2 (to be implemented)
+bit 2
   a regular guest shutdown has happened and should be processed by the host
 
 PCI Interface
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 45/46] virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (43 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 44/46] Revert "docs/specs/pvpanic: mark shutdown event as not implemented" Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-04 19:08 ` [PULL 46/46] hw/cxl: Fix read from bogus memory Michael S. Tsirkin
                   ` (2 subsequent siblings)
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Cindy Lu, qemu-stable

From: Cindy Lu <lulu@redhat.com>

In function kvm_virtio_pci_vector_use_one(), the function will only use
the irqfd/vector for itself. Therefore, in the undo label, the failing
process is incorrect.
To fix this, we can just remove this label.

Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
Cc: qemu-stable@nongnu.org
Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20240528084840.194538-1-lulu@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-pci.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b6757ffce2..349619f57a 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -898,7 +898,7 @@ static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no)
     }
     ret = kvm_virtio_pci_vq_vector_use(proxy, vector);
     if (ret < 0) {
-        goto undo;
+        return ret;
     }
     /*
      * If guest supports masking, set up irqfd now.
@@ -908,25 +908,11 @@ static int kvm_virtio_pci_vector_use_one(VirtIOPCIProxy *proxy, int queue_no)
         ret = kvm_virtio_pci_irqfd_use(proxy, n, vector);
         if (ret < 0) {
             kvm_virtio_pci_vq_vector_release(proxy, vector);
-            goto undo;
+            return ret;
         }
     }
 
     return 0;
-undo:
-
-    vector = virtio_queue_vector(vdev, queue_no);
-    if (vector >= msix_nr_vectors_allocated(dev)) {
-        return ret;
-    }
-    if (vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
-        ret = virtio_pci_get_notifier(proxy, queue_no, &n, &vector);
-        if (ret < 0) {
-            return ret;
-        }
-        kvm_virtio_pci_irqfd_release(proxy, n, vector);
-    }
-    return ret;
 }
 static int kvm_virtio_pci_vector_vq_use(VirtIOPCIProxy *proxy, int nvqs)
 {
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PULL 46/46] hw/cxl: Fix read from bogus memory
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (44 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 45/46] virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one() Michael S. Tsirkin
@ 2024-06-04 19:08 ` Michael S. Tsirkin
  2024-06-05  7:27 ` [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
  2024-06-25 13:06 ` Peter Maydell
  47 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-04 19:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Ira Weiny, Jonathan Cameron, Fan Ni

From: Ira Weiny <ira.weiny@intel.com>

Peter and coverity report:

	We've passed '&data' to address_space_write(), which means "read
	from the address on the stack where the function argument 'data'
	lives", so instead of writing 64 bytes of data to the guest ,
	we'll write 64 bytes which start with a host pointer value and
	then continue with whatever happens to be on the host stack
	after that.

Indeed the intention was to write 64 bytes of data at the address given.

Fix the parameter to address_space_write().

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Link: https://lore.kernel.org/all/CAFEAcA-u4sytGwTKsb__Y+_+0O2-WwARntm3x8WNhvL1WfHOBg@mail.gmail.com/
Fixes: 6bda41a69bdc ("hw/cxl: Add clear poison mailbox command support.")
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Message-Id: <20240531-fix-poison-set-cacheline-v1-1-e3bc7e8f1158@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/mem/cxl_type3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5d4a1276be..3274e5dcbb 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -1292,7 +1292,7 @@ static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t *data)
         dpa_offset -= (vmr_size + pmr_size);
     }
 
-    address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, &data,
+    address_space_write(as, dpa_offset, MEMTXATTRS_UNSPECIFIED, data,
                         CXL_CACHE_LINE_SIZE);
     return true;
 }
-- 
MST



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PULL 00/46] virtio: features,fixes
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (45 preceding siblings ...)
  2024-06-04 19:08 ` [PULL 46/46] hw/cxl: Fix read from bogus memory Michael S. Tsirkin
@ 2024-06-05  7:27 ` Michael S. Tsirkin
  2024-06-05 14:44   ` Richard Henderson
  2024-06-25 13:06 ` Peter Maydell
  47 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-05  7:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Cindy Lu, qemu-stable, Jason Wang

On Tue, Jun 04, 2024 at 03:06:01PM -0400, Michael S. Tsirkin wrote:
> The following changes since commit 60b54b67c63d8f076152e0f7dccf39854dfc6a77:
> 
>   Merge tag 'pull-lu-20240526' of https://gitlab.com/rth7680/qemu into staging (2024-05-26 17:51:00 -0700)
> 
> are available in the Git repository at:
> 
>   https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> for you to fetch changes up to bfcacf81d63a3d95f128bce3faf3564e7f98ea8b:

Dropped a patch from this pull at Author's request.
New head a2da15a164ddd798227262b58507b46ad5ab0ca9
Sorry about the noise - ok like this?
Don't want to spam the list posting v2 just for this.

> 
>   hw/cxl: Fix read from bogus memory (2024-06-04 15:05:03 -0400)
> 
> ----------------------------------------------------------------
> virtio: features,fixes
> 
> A bunch of improvements:
> - vhost dirty log is now only scanned once, not once per device
> - virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
> - cxl gained DCD emulation support
> - pvpanic gained shutdown support
> - acpi now supports Generic Port Affinity Structure
> - new tests
> - bugfixes
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ----------------------------------------------------------------
> Alejandro Jimenez (1):
>       pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal
> 
> Christian Pötzsch (1):
>       Fix vhost user assertion when sending more than one fd
> 
> Cindy Lu (2):
>       virtio-pci: Fix the use of an uninitialized irqfd.

This is the patch that I dropped.


>       virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()
> 
> Fan Ni (12):
>       hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command
>       hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support
>       include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices
>       hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices
>       hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument
>       hw/mem/cxl_type3: Add host backend and address space handling for DC regions
>       hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support
>       hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response
>       hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
>       hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions
>       hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support
>       hw/mem/cxl_type3: Allow to release extent superset in QMP interface
> 
> Gregory Price (2):
>       hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference
>       hw/cxl/mailbox: interface to add CCI commands to an existing CCI
> 
> Halil Pasic (1):
>       vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits
> 
> Ira Weiny (1):
>       hw/cxl: Fix read from bogus memory
> 
> Jiqian Chen (1):
>       virtio-pci: only reset pm state during resetting
> 
> Jonah Palmer (5):
>       virtio/virtio-pci: Handle extra notification data
>       virtio: Prevent creation of device using notification-data with ioeventfd
>       virtio-mmio: Handle extra notification data
>       virtio-ccw: Handle extra notification data
>       vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
> 
> Jonathan Cameron (6):
>       hw/acpi/GI: Fix trivial parameter alignment issue.
>       hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator
>       hw/acpi: Generic Port Affinity Structure support
>       bios-tables-test: Allow for new acpihmat-generic-x test data.
>       bios-tables-test: Add complex SRAT / HMAT test for GI GP
>       bios-tables-test: Add data for complex numa test (GI, GP etc)
> 
> Li Feng (2):
>       Revert "vhost-user: fix lost reconnect"
>       vhost-user: fix lost reconnect again
> 
> Marc-André Lureau (1):
>       vhost-user-gpu: fix import of DMABUF
> 
> Si-Wei Liu (2):
>       vhost: dirty log should be per backend type
>       vhost: Perform memory section dirty scans once per iteration
> 
> Stefano Garzarella (1):
>       vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
> 
> Thomas Weißschuh (7):
>       scripts/update-linux-headers: Copy setup_data.h to correct directory
>       linux-headers: update to 6.10-rc1
>       hw/misc/pvpanic: centralize definition of supported events
>       tests/qtest/pvpanic: use centralized definition of supported events
>       hw/misc/pvpanic: add support for normal shutdowns
>       tests/qtest/pvpanic: add tests for pvshutdown event
>       Revert "docs/specs/pvpanic: mark shutdown event as not implemented"
> 
> Wafer (1):
>       hw/virtio: Fix obtain the buffer id from the last descriptor
> 
>  qapi/cxl.json                               | 143 ++++++
>  qapi/qom.json                               |  35 ++
>  qapi/run-state.json                         |  14 +
>  include/hw/acpi/acpi_generic_initiator.h    |  33 +-
>  include/hw/cxl/cxl_device.h                 |  85 +++-
>  include/hw/cxl/cxl_events.h                 |  18 +
>  include/hw/misc/pvpanic.h                   |   6 +
>  include/hw/pci/pci_bridge.h                 |   1 +
>  include/hw/virtio/vhost-user.h              |   3 +-
>  include/hw/virtio/vhost.h                   |   1 +
>  include/hw/virtio/virtio.h                  |   2 +
>  include/standard-headers/linux/ethtool.h    |  55 +++
>  include/standard-headers/linux/pci_regs.h   |   6 +
>  include/standard-headers/linux/pvpanic.h    |   7 +-
>  include/standard-headers/linux/virtio_bt.h  |   1 -
>  include/standard-headers/linux/virtio_mem.h |   2 +
>  include/standard-headers/linux/virtio_net.h | 143 ++++++
>  include/sysemu/runstate.h                   |   1 +
>  linux-headers/asm-generic/unistd.h          |   5 +-
>  linux-headers/asm-loongarch/kvm.h           |   4 +
>  linux-headers/asm-mips/unistd_n32.h         |   1 +
>  linux-headers/asm-mips/unistd_n64.h         |   1 +
>  linux-headers/asm-mips/unistd_o32.h         |   1 +
>  linux-headers/asm-powerpc/unistd_32.h       |   1 +
>  linux-headers/asm-powerpc/unistd_64.h       |   1 +
>  linux-headers/asm-riscv/kvm.h               |   1 +
>  linux-headers/asm-s390/unistd_32.h          |   1 +
>  linux-headers/asm-s390/unistd_64.h          |   1 +
>  linux-headers/asm-x86/kvm.h                 |   4 +-
>  linux-headers/asm-x86/unistd_32.h           |   1 +
>  linux-headers/asm-x86/unistd_64.h           |   1 +
>  linux-headers/asm-x86/unistd_x32.h          |   2 +
>  linux-headers/linux/kvm.h                   |   4 +-
>  linux-headers/linux/stddef.h                |   8 +
>  linux-headers/linux/vhost.h                 |  15 +-
>  hw/acpi/acpi_generic_initiator.c            | 209 ++++++---
>  hw/block/vhost-user-blk.c                   |   6 +-
>  hw/cxl/cxl-mailbox-utils.c                  | 658 +++++++++++++++++++++++++++-
>  hw/display/vhost-user-gpu.c                 |   5 +-
>  hw/mem/cxl_type3.c                          | 637 +++++++++++++++++++++++++--
>  hw/mem/cxl_type3_stubs.c                    |  25 ++
>  hw/misc/pvpanic-isa.c                       |   3 +-
>  hw/misc/pvpanic-pci.c                       |   3 +-
>  hw/misc/pvpanic.c                           |   8 +-
>  hw/net/vhost_net.c                          |   2 +
>  hw/pci-bridge/pci_expander_bridge.c         |   1 -
>  hw/s390x/s390-virtio-ccw.c                  |  17 +-
>  hw/scsi/vhost-scsi.c                        |   1 +
>  hw/scsi/vhost-user-scsi.c                   |   7 +-
>  hw/virtio/vhost-user-base.c                 |   5 +-
>  hw/virtio/vhost-user-fs.c                   |   2 +-
>  hw/virtio/vhost-user-vsock.c                |   1 +
>  hw/virtio/vhost-user.c                      |  18 +-
>  hw/virtio/vhost-vsock-common.c              |   1 +
>  hw/virtio/vhost.c                           | 112 ++++-
>  hw/virtio/virtio-mmio.c                     |  11 +-
>  hw/virtio/virtio-pci.c                      |  45 +-
>  hw/virtio/virtio.c                          |  45 ++
>  net/vhost-vdpa.c                            |  16 +-
>  subprojects/libvhost-user/libvhost-user.c   |   2 +-
>  system/runstate.c                           |   6 +
>  tests/qtest/bios-tables-test.c              |  92 ++++
>  tests/qtest/pvpanic-pci-test.c              |  44 +-
>  tests/qtest/pvpanic-test.c                  |  34 +-
>  docs/specs/pvpanic.rst                      |   2 +-
>  scripts/update-linux-headers.sh             |   2 +-
>  tests/data/acpi/q35/APIC.acpihmat-generic-x | Bin 0 -> 136 bytes
>  tests/data/acpi/q35/CEDT.acpihmat-generic-x | Bin 0 -> 68 bytes
>  tests/data/acpi/q35/DSDT.acpihmat-generic-x | Bin 0 -> 10400 bytes
>  tests/data/acpi/q35/HMAT.acpihmat-generic-x | Bin 0 -> 360 bytes
>  tests/data/acpi/q35/SRAT.acpihmat-generic-x | Bin 0 -> 520 bytes
>  71 files changed, 2407 insertions(+), 221 deletions(-)
>  create mode 100644 tests/data/acpi/q35/APIC.acpihmat-generic-x
>  create mode 100644 tests/data/acpi/q35/CEDT.acpihmat-generic-x
>  create mode 100644 tests/data/acpi/q35/DSDT.acpihmat-generic-x
>  create mode 100644 tests/data/acpi/q35/HMAT.acpihmat-generic-x
>  create mode 100644 tests/data/acpi/q35/SRAT.acpihmat-generic-x
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd.
  2024-06-04 19:06 ` [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd Michael S. Tsirkin
@ 2024-06-05  7:27   ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-05  7:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Cindy Lu, qemu-stable, Jason Wang

On Tue, Jun 04, 2024 at 03:06:15PM -0400, Michael S. Tsirkin wrote:
> From: Cindy Lu <lulu@redhat.com>
> 
> The crash was reported in MAC OS and NixOS, here is the link for this bug
> https://gitlab.com/qemu-project/qemu/-/issues/2334
> https://gitlab.com/qemu-project/qemu/-/issues/2321
> 
> The root cause is that the function virtio_pci_set_guest_notifiers() only
> initializes the irqfd when the use_guest_notifier_mask and guest_notifier_mask
> are set.
> However, this check is missing in virtio_pci_set_vector().
> So the fix is to add this check.
> 
> This fix is verified in vyatta,MacOS,NixOS,fedora system.
> 
> The bt tree for this bug is:
> Thread 6 "CPU 0/KVM" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7c817be006c0 (LWP 1269146)]
> kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
> 817	    if (irqfd->users == 0) {
> (gdb) thread apply all bt
> ...
> Thread 6 (Thread 0x7c817be006c0 (LWP 1269146) "CPU 0/KVM"):
> 0  kvm_virtio_pci_vq_vector_use () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:817
> 1  kvm_virtio_pci_vector_use_one () at ../qemu-9.0.0/hw/virtio/virtio-pci.c:893
> 2  0x00005983657045e2 in memory_region_write_accessor () at ../qemu-9.0.0/system/memory.c:497
> 3  0x0000598365704ba6 in access_with_adjusted_size () at ../qemu-9.0.0/system/memory.c:573
> 4  0x0000598365705059 in memory_region_dispatch_write () at ../qemu-9.0.0/system/memory.c:1528
> 5  0x00005983659b8e1f in flatview_write_continue_step.isra.0 () at ../qemu-9.0.0/system/physmem.c:2713
> 6  0x000059836570ba7d in flatview_write_continue () at ../qemu-9.0.0/system/physmem.c:2743
> 7  flatview_write () at ../qemu-9.0.0/system/physmem.c:2774
> 8  0x000059836570bb76 in address_space_write () at ../qemu-9.0.0/system/physmem.c:2894
> 9  0x0000598365763afe in address_space_rw () at ../qemu-9.0.0/system/physmem.c:2904
> 10 kvm_cpu_exec () at ../qemu-9.0.0/accel/kvm/kvm-all.c:2917
> 11 0x000059836576656e in kvm_vcpu_thread_fn () at ../qemu-9.0.0/accel/kvm/kvm-accel-ops.c:50
> 12 0x0000598365926ca8 in qemu_thread_start () at ../qemu-9.0.0/util/qemu-thread-posix.c:541
> 13 0x00007c8185bcd1cf in ??? () at /usr/lib/libc.so.6
> 14 0x00007c8185c4e504 in clone () at /usr/lib/libc.so.6
> 
> Fixes: 2ce6cff94d ("virtio-pci: fix use of a released vector")
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Cindy Lu <lulu@redhat.com>
> Message-Id: <20240522051042.985825-1-lulu@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


Dropped now at author's request.

> ---
>  hw/virtio/virtio-pci.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index b1d02f4b3d..a7faee5b33 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1431,6 +1431,7 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
>  {
>      bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>          msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
> +    VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
>  
>      if (new_vector == old_vector) {
>          return;
> @@ -1441,7 +1442,8 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
>       * set, we need to release the old vector and set up the new one.
>       * Otherwise just need to set the new vector on the device.
>       */
> -    if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR) {
> +    if (kvm_irqfd && old_vector != VIRTIO_NO_VECTOR &&
> +        vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
>          kvm_virtio_pci_vector_release_one(proxy, queue_no);
>      }
>      /* Set the new vector on the device. */
> @@ -1451,7 +1453,8 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
>          virtio_queue_set_vector(vdev, queue_no, new_vector);
>      }
>      /* If the new vector changed need to set it up. */
> -    if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR) {
> +    if (kvm_irqfd && new_vector != VIRTIO_NO_VECTOR &&
> +        vdev->use_guest_notifier_mask && k->guest_notifier_mask) {
>          kvm_virtio_pci_vector_use_one(proxy, queue_no);
>      }
>  }
> -- 
> MST
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-04 19:08 ` [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc) Michael S. Tsirkin
@ 2024-06-05 14:39   ` Richard Henderson
  2024-06-05 15:27     ` Jonathan Cameron via
  0 siblings, 1 reply; 60+ messages in thread
From: Richard Henderson @ 2024-06-05 14:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Peter Maydell, Jonathan Cameron, Igor Mammedov, Ani Sinha

On 6/4/24 14:08, Michael S. Tsirkin wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Given this is a new configuration, there are affects on APIC, CEDT
> and DSDT, but the key elements are in SRAT (plus related data in
> HMAT).  The configuration has node to exercise many different combinations.
> 
> 0) CPUs + Memory
> 1) GI only
> 2) GP only
> 3) CPUS only
> 4) Memory only
> 5) CPUs + HP memory
> 
> GI node, GP Node, Memory only node, hotplug memory
> only node, latency and bandwidth such that in Linux Access0
> (any initiator) and Access1 (CPU initiators only) given different
> answers.  Following cropped to remove details of each entry.


This fails testing:

https://gitlab.com/qemu-project/qemu/-/jobs/7021105504

acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected 
[aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
See source file tests/qtest/bios-tables-test.c for instructions on how to update expected 
files.
to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and 
re-run tests with V=1 environment variable set**
ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
(all_tables_match)
Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
(all_tables_match)
Aborted (core dumped)


r~


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 00/46] virtio: features,fixes
  2024-06-05  7:27 ` [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
@ 2024-06-05 14:44   ` Richard Henderson
  0 siblings, 0 replies; 60+ messages in thread
From: Richard Henderson @ 2024-06-05 14:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Peter Maydell, Cindy Lu, qemu-stable, Jason Wang

On 6/5/24 02:27, Michael S. Tsirkin wrote:
> On Tue, Jun 04, 2024 at 03:06:01PM -0400, Michael S. Tsirkin wrote:
>> The following changes since commit 60b54b67c63d8f076152e0f7dccf39854dfc6a77:
>>
>>    Merge tag 'pull-lu-20240526' of https://gitlab.com/rth7680/qemu into staging (2024-05-26 17:51:00 -0700)
>>
>> are available in the Git repository at:
>>
>>    https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>> for you to fetch changes up to bfcacf81d63a3d95f128bce3faf3564e7f98ea8b:
> 
> Dropped a patch from this pull at Author's request.
> New head a2da15a164ddd798227262b58507b46ad5ab0ca9
> Sorry about the noise - ok like this?
> Don't want to spam the list posting v2 just for this.

When you do this in future, send a new v2 cover.
However, the bios-tables-test still fails, so I won't merge this either.


r~


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 14:39   ` Richard Henderson
@ 2024-06-05 15:27     ` Jonathan Cameron via
  2024-06-05 15:49       ` Jonathan Cameron via
  2024-06-05 16:01       ` Richard Henderson
  0 siblings, 2 replies; 60+ messages in thread
From: Jonathan Cameron via @ 2024-06-05 15:27 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

On Wed, 5 Jun 2024 07:39:53 -0700
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 6/4/24 14:08, Michael S. Tsirkin wrote:
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > Given this is a new configuration, there are affects on APIC, CEDT
> > and DSDT, but the key elements are in SRAT (plus related data in
> > HMAT).  The configuration has node to exercise many different combinations.
> > 
> > 0) CPUs + Memory
> > 1) GI only
> > 2) GP only
> > 3) CPUS only
> > 4) Memory only
> > 5) CPUs + HP memory
> > 
> > GI node, GP Node, Memory only node, hotplug memory
> > only node, latency and bandwidth such that in Linux Access0
> > (any initiator) and Access1 (CPU initiators only) given different
> > answers.  Following cropped to remove details of each entry.  
> 
> 
> This fails testing:
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/7021105504
> 
> acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected 
> [aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
> See source file tests/qtest/bios-tables-test.c for instructions on how to update expected 
> files.
> to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and 
> re-run tests with V=1 environment variable set**
> ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
> (all_tables_match)
> Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
> (all_tables_match)
> Aborted (core dumped)
> 

s390 and passes on an x86 host, so I guess an endian bug - any chance of a table dump
from someone with access to an s390 host?

This test covers some stuff that was previously missing tests
so may be non trivial to spot.

I'll play guess in the meantime. So far I'm not seeing anything that differs
from existing ACPI table building code.

Jonathan


> 
> r~



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 15:27     ` Jonathan Cameron via
@ 2024-06-05 15:49       ` Jonathan Cameron via
  2024-06-05 16:01       ` Richard Henderson
  1 sibling, 0 replies; 60+ messages in thread
From: Jonathan Cameron via @ 2024-06-05 15:49 UTC (permalink / raw)
  To: Jonathan Cameron via
  Cc: Jonathan Cameron, Richard Henderson, Michael S. Tsirkin,
	Peter Maydell, Igor Mammedov, Ani Sinha

On Wed, 5 Jun 2024 16:27:33 +0100
Jonathan Cameron via <qemu-devel@nongnu.org> wrote:

> On Wed, 5 Jun 2024 07:39:53 -0700
> Richard Henderson <richard.henderson@linaro.org> wrote:
> 
> > On 6/4/24 14:08, Michael S. Tsirkin wrote:  
> > > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > 
> > > Given this is a new configuration, there are affects on APIC, CEDT
> > > and DSDT, but the key elements are in SRAT (plus related data in
> > > HMAT).  The configuration has node to exercise many different combinations.
> > > 
> > > 0) CPUs + Memory
> > > 1) GI only
> > > 2) GP only
> > > 3) CPUS only
> > > 4) Memory only
> > > 5) CPUs + HP memory
> > > 
> > > GI node, GP Node, Memory only node, hotplug memory
> > > only node, latency and bandwidth such that in Linux Access0
> > > (any initiator) and Access1 (CPU initiators only) given different
> > > answers.  Following cropped to remove details of each entry.    
> > 
> > 
> > This fails testing:
> > 
> > https://gitlab.com/qemu-project/qemu/-/jobs/7021105504
> > 
> > acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected 
> > [aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
> > See source file tests/qtest/bios-tables-test.c for instructions on how to update expected 
> > files.
> > to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and 
> > re-run tests with V=1 environment variable set**
> > ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
> > (all_tables_match)
> > Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed: 
> > (all_tables_match)
> > Aborted (core dumped)
> >   
> 
> s390 and passes on an x86 host, so I guess an endian bug - any chance of a table dump
> from someone with access to an s390 host?
> 
> This test covers some stuff that was previously missing tests
> so may be non trivial to spot.
> 
> I'll play guess in the meantime. So far I'm not seeing anything that differs
> from existing ACPI table building code.

Ok. I think it a bug earlier in this patch set. A table dump would confirm.

One item that looks dodgy is the filling in of the ACPI HID.

It's messy in the ACPI spec as it's defined as both an integer
and a string.
 
Need to take a 4 character string and get it into hid of

struct {
   uint64_t hid;
   uint32_t uid;
}
which id done with an 8 byte memcpy.
that is then written to SRAT wtih
build_append_int_noprefix(table_data, handle->hid, 8);

So I need to byte swap it if host doesn't match guest endian.

I'll put together a patch, but perhaps best plan is if Michael
drops generic ports and we go around again :(

At least I don't have much backed up behind this one.

Jonathan

> 
> Jonathan
> 
> 
> > 
> > r~  
> 
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 15:27     ` Jonathan Cameron via
  2024-06-05 15:49       ` Jonathan Cameron via
@ 2024-06-05 16:01       ` Richard Henderson
  2024-06-05 16:08         ` Jonathan Cameron via
  1 sibling, 1 reply; 60+ messages in thread
From: Richard Henderson @ 2024-06-05 16:01 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

On 6/5/24 10:27, Jonathan Cameron wrote:
>> This fails testing:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/7021105504
>>
>> acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected
>> [aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
>> See source file tests/qtest/bios-tables-test.c for instructions on how to update expected
>> files.
>> to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and
>> re-run tests with V=1 environment variable set**
>> ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
>> (all_tables_match)
>> Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
>> (all_tables_match)
>> Aborted (core dumped)
>>
> 
> s390 and passes on an x86 host, so I guess an endian bug - any chance of a table dump
> from someone with access to an s390 host?

Sure.  By what incantation do I produce a dump?


r~


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 16:01       ` Richard Henderson
@ 2024-06-05 16:08         ` Jonathan Cameron via
  2024-06-05 16:11           ` Jonathan Cameron via
  0 siblings, 1 reply; 60+ messages in thread
From: Jonathan Cameron via @ 2024-06-05 16:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

On Wed, 5 Jun 2024 09:01:37 -0700
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 6/5/24 10:27, Jonathan Cameron wrote:
> >> This fails testing:
> >>
> >> https://gitlab.com/qemu-project/qemu/-/jobs/7021105504
> >>
> >> acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected
> >> [aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
> >> See source file tests/qtest/bios-tables-test.c for instructions on how to update expected
> >> files.
> >> to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and
> >> re-run tests with V=1 environment variable set**
> >> ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
> >> (all_tables_match)
> >> Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
> >> (all_tables_match)
> >> Aborted (core dumped)
> >>  
> > 
> > s390 and passes on an x86 host, so I guess an endian bug - any chance of a table dump
> > from someone with access to an s390 host?  
> 
> Sure.  By what incantation do I produce a dump?

If you still have the /mnt/aml-GHR602 above then either upload that somewhere or
iasl -d /mnt/aml-GHR602
should generate you a suitable text file.  However generic ports are fairly recent
so you may need a newer iasl from acpica-tools to decode.
It will moan if it doesn't understand the content.

Jonathan

> 
> 
> r~



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 16:08         ` Jonathan Cameron via
@ 2024-06-05 16:11           ` Jonathan Cameron via
  2024-06-05 16:54             ` Richard Henderson
  0 siblings, 1 reply; 60+ messages in thread
From: Jonathan Cameron via @ 2024-06-05 16:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

On Wed, 5 Jun 2024 17:08:45 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 5 Jun 2024 09:01:37 -0700
> Richard Henderson <richard.henderson@linaro.org> wrote:
> 
> > On 6/5/24 10:27, Jonathan Cameron wrote:  
> > >> This fails testing:
> > >>
> > >> https://gitlab.com/qemu-project/qemu/-/jobs/7021105504
> > >>
> > >> acpi-test: Warning! SRAT binary file mismatch. Actual [aml:/tmp/aml-GHR6O2], Expected
> > >> [aml:tests/data/acpi/q35/SRAT.acpihmat-generic-x].
> > >> See source file tests/qtest/bios-tables-test.c for instructions on how to update expected
> > >> files.
> > >> to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and
> > >> re-run tests with V=1 environment variable set**
> > >> ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
> > >> (all_tables_match)
> > >> Bail out! ERROR:../alt/tests/qtest/bios-tables-test.c:550:test_acpi_asl: assertion failed:
> > >> (all_tables_match)
> > >> Aborted (core dumped)
> > >>    
> > > 
> > > s390 and passes on an x86 host, so I guess an endian bug - any chance of a table dump
> > > from someone with access to an s390 host?    
> > 
> > Sure.  By what incantation do I produce a dump?  
> 
> If you still have the /mnt/aml-GHR602 above then either upload that somewhere or
> iasl -d /mnt/aml-GHR602
> should generate you a suitable text file.  However generic ports are fairly recent
> so you may need a newer iasl from acpica-tools to decode.
> It will moan if it doesn't understand the content.
> 
make check-qtest-x86_64 is how I get the test to run in the first place.

Alternatively, this 'might' be sufficient if my guess for the problem
is correct. Thanks!

From 956df037f024783f19b6b00e5e280484380227a0 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Date: Wed, 5 Jun 2024 17:01:36 +0100
Subject: [PATCH] hw/acpi: Fix big endian host creation of Generic Port
 Affinity Structures

Treating the HID as an integer caused it to get bit reversed
on big endian hosts running little endian guests.  Treat it
as a character array instead.

Fixes hw/acpi: Generic Port Affinity Structure Support
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---
Maybe this is the only problem?  I don't have a setup to test
so any help would be appreciated.
---
 include/hw/acpi/acpi_generic_initiator.h | 2 +-
 hw/acpi/acpi_generic_initiator.c         | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index 1a899af30f..5baefda33a 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -61,7 +61,7 @@ typedef struct PCIDeviceHandle {
             uint16_t bdf;
         };
         struct {
-            uint64_t hid;
+            char hid[8];
             uint32_t uid;
         };
     };
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index 78b80dcf08..c2fb5ab2ef 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -150,8 +150,12 @@ build_srat_generic_node_affinity(GArray *table_data, int node,
         build_append_int_noprefix(table_data, handle->bdf, 2);
         build_append_int_noprefix(table_data, 0, 12);
     } else {
+        int i;
+
         /* Device Handle - ACPI */
-        build_append_int_noprefix(table_data, handle->hid, 8);
+        for (i = 0; i < sizeof(handle->hid); i++) {
+            build_append_int_noprefix(table_data, handle->hid[i], 1);
+        }
         build_append_int_noprefix(table_data, handle->uid, 4);
         build_append_int_noprefix(table_data, 0, 4);
     }
-- 
2.39.2


> Jonathan
> 
> > 
> > 
> > r~  
> 



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 16:11           ` Jonathan Cameron via
@ 2024-06-05 16:54             ` Richard Henderson
  2024-06-05 17:19               ` Jonathan Cameron via
  0 siblings, 1 reply; 60+ messages in thread
From: Richard Henderson @ 2024-06-05 16:54 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

[-- Attachment #1: Type: text/plain, Size: 2874 bytes --]

On 6/5/24 11:11, Jonathan Cameron wrote:
>>> Sure.  By what incantation do I produce a dump?
>>
>> If you still have the /mnt/aml-GHR602 above then either upload that somewhere or
>> iasl -d /mnt/aml-GHR602
>> should generate you a suitable text file.  However generic ports are fairly recent
>> so you may need a newer iasl from acpica-tools to decode.
>> It will moan if it doesn't understand the content.
>>
> make check-qtest-x86_64 is how I get the test to run in the first place.

Ah, I see, the file is not cleaned up on abort.
For the record, I attach the dump.

> Alternatively, this 'might' be sufficient if my guess for the problem
> is correct. Thanks!
> 
>  From 956df037f024783f19b6b00e5e280484380227a0 Mon Sep 17 00:00:00 2001
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Date: Wed, 5 Jun 2024 17:01:36 +0100
> Subject: [PATCH] hw/acpi: Fix big endian host creation of Generic Port
>   Affinity Structures
> 
> Treating the HID as an integer caused it to get bit reversed
> on big endian hosts running little endian guests.  Treat it
> as a character array instead.
> 
> Fixes hw/acpi: Generic Port Affinity Structure Support
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> ---
> Maybe this is the only problem?  I don't have a setup to test
> so any help would be appreciated.
> ---
>   include/hw/acpi/acpi_generic_initiator.h | 2 +-
>   hw/acpi/acpi_generic_initiator.c         | 6 +++++-
>   2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
> index 1a899af30f..5baefda33a 100644
> --- a/include/hw/acpi/acpi_generic_initiator.h
> +++ b/include/hw/acpi/acpi_generic_initiator.h
> @@ -61,7 +61,7 @@ typedef struct PCIDeviceHandle {
>               uint16_t bdf;
>           };
>           struct {
> -            uint64_t hid;
> +            char hid[8];
>               uint32_t uid;
>           };
>       };
> diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
> index 78b80dcf08..c2fb5ab2ef 100644
> --- a/hw/acpi/acpi_generic_initiator.c
> +++ b/hw/acpi/acpi_generic_initiator.c
> @@ -150,8 +150,12 @@ build_srat_generic_node_affinity(GArray *table_data, int node,
>           build_append_int_noprefix(table_data, handle->bdf, 2);
>           build_append_int_noprefix(table_data, 0, 12);
>       } else {
> +        int i;
> +
>           /* Device Handle - ACPI */
> -        build_append_int_noprefix(table_data, handle->hid, 8);
> +        for (i = 0; i < sizeof(handle->hid); i++) {
> +            build_append_int_noprefix(table_data, handle->hid[i], 1);
> +        }
>           build_append_int_noprefix(table_data, handle->uid, 4);
>           build_append_int_noprefix(table_data, 0, 4);
>       }

Yes, this fixes the abort.
Merge the declaration of i into the for loop?


r~

[-- Attachment #2: aml-6SC2O2.gz --]
[-- Type: application/gzip, Size: 144 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc)
  2024-06-05 16:54             ` Richard Henderson
@ 2024-06-05 17:19               ` Jonathan Cameron via
  0 siblings, 0 replies; 60+ messages in thread
From: Jonathan Cameron via @ 2024-06-05 17:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Igor Mammedov,
	Ani Sinha

On Wed, 5 Jun 2024 09:54:16 -0700
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 6/5/24 11:11, Jonathan Cameron wrote:
> >>> Sure.  By what incantation do I produce a dump?  
> >>
> >> If you still have the /mnt/aml-GHR602 above then either upload that somewhere or
> >> iasl -d /mnt/aml-GHR602
> >> should generate you a suitable text file.  However generic ports are fairly recent
> >> so you may need a newer iasl from acpica-tools to decode.
> >> It will moan if it doesn't understand the content.
> >>  
> > make check-qtest-x86_64 is how I get the test to run in the first place.  
> 
> Ah, I see, the file is not cleaned up on abort.
Yes. End up with a lot of those after a while!

> For the record, I attach the dump.
Thanks!
> 
> > Alternatively, this 'might' be sufficient if my guess for the problem
> > is correct. Thanks!
> > 
> >  From 956df037f024783f19b6b00e5e280484380227a0 Mon Sep 17 00:00:00 2001
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Date: Wed, 5 Jun 2024 17:01:36 +0100
> > Subject: [PATCH] hw/acpi: Fix big endian host creation of Generic Port
> >   Affinity Structures
> > 
> > Treating the HID as an integer caused it to get bit reversed
> > on big endian hosts running little endian guests.  Treat it
> > as a character array instead.
> > 
> > Fixes hw/acpi: Generic Port Affinity Structure Support
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > ---
> > Maybe this is the only problem?  I don't have a setup to test
> > so any help would be appreciated.
> > ---
> >   include/hw/acpi/acpi_generic_initiator.h | 2 +-
> >   hw/acpi/acpi_generic_initiator.c         | 6 +++++-
> >   2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
> > index 1a899af30f..5baefda33a 100644
> > --- a/include/hw/acpi/acpi_generic_initiator.h
> > +++ b/include/hw/acpi/acpi_generic_initiator.h
> > @@ -61,7 +61,7 @@ typedef struct PCIDeviceHandle {
> >               uint16_t bdf;
> >           };
> >           struct {
> > -            uint64_t hid;
> > +            char hid[8];
> >               uint32_t uid;
> >           };
> >       };
> > diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
> > index 78b80dcf08..c2fb5ab2ef 100644
> > --- a/hw/acpi/acpi_generic_initiator.c
> > +++ b/hw/acpi/acpi_generic_initiator.c
> > @@ -150,8 +150,12 @@ build_srat_generic_node_affinity(GArray *table_data, int node,
> >           build_append_int_noprefix(table_data, handle->bdf, 2);
> >           build_append_int_noprefix(table_data, 0, 12);
> >       } else {
> > +        int i;
> > +
> >           /* Device Handle - ACPI */
> > -        build_append_int_noprefix(table_data, handle->hid, 8);
> > +        for (i = 0; i < sizeof(handle->hid); i++) {
> > +            build_append_int_noprefix(table_data, handle->hid[i], 1);
> > +        }
> >           build_append_int_noprefix(table_data, handle->uid, 4);
> >           build_append_int_noprefix(table_data, 0, 4);
> >       }  
> 
> Yes, this fixes the abort.
> Merge the declaration of i into the for loop?

sure.

Thanks for testing. Seems it's time I tried to get an emulated s390
setup running if that's practical as two of these in the last year :(

Jonathan


> 
> 
> r~



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 00/46] virtio: features,fixes
  2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
                   ` (46 preceding siblings ...)
  2024-06-05  7:27 ` [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
@ 2024-06-25 13:06 ` Peter Maydell
  2024-06-25 14:01   ` Michael S. Tsirkin
  47 siblings, 1 reply; 60+ messages in thread
From: Peter Maydell @ 2024-06-25 13:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On Tue, 4 Jun 2024 at 20:06, Michael S. Tsirkin <mst@redhat.com> wrote:
> ----------------------------------------------------------------
> virtio: features,fixes
>
> A bunch of improvements:
> - vhost dirty log is now only scanned once, not once per device
> - virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
> - cxl gained DCD emulation support
> - pvpanic gained shutdown support
> - acpi now supports Generic Port Affinity Structure
> - new tests
> - bugfixes
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Cindy Lu (2):
>       virtio-pci: Fix the use of an uninitialized irqfd.
>       virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()

Hi -- I was wondering what happened to this virtio-pci patch,
which I reviewed. It seems like it got into this pullreq
which didn't get merged because of a bios-tables-test failure.

Michael, do you plan to respin this pullreq soon ?

-- PMM


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PULL 00/46] virtio: features,fixes
  2024-06-25 13:06 ` Peter Maydell
@ 2024-06-25 14:01   ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2024-06-25 14:01 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

On Tue, Jun 25, 2024 at 02:06:04PM +0100, Peter Maydell wrote:
> On Tue, 4 Jun 2024 at 20:06, Michael S. Tsirkin <mst@redhat.com> wrote:
> > ----------------------------------------------------------------
> > virtio: features,fixes
> >
> > A bunch of improvements:
> > - vhost dirty log is now only scanned once, not once per device
> > - virtio and vhost now support VIRTIO_F_NOTIFICATION_DATA
> > - cxl gained DCD emulation support
> > - pvpanic gained shutdown support
> > - acpi now supports Generic Port Affinity Structure
> > - new tests
> > - bugfixes
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Cindy Lu (2):
> >       virtio-pci: Fix the use of an uninitialized irqfd.
> >       virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one()
> 
> Hi -- I was wondering what happened to this virtio-pci patch,
> which I reviewed. It seems like it got into this pullreq
> which didn't get merged because of a bios-tables-test failure.
> 
> Michael, do you plan to respin this pullreq soon ?
> 
> -- PMM

For some reason I was sure it got merged.
Whatevs I will do a new pull req.



^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2024-06-25 14:02 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-04 19:05 [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 01/46] vhost: dirty log should be per backend type Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 02/46] vhost: Perform memory section dirty scans once per iteration Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 03/46] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 04/46] virtio-pci: Fix the use of an uninitialized irqfd Michael S. Tsirkin
2024-06-05  7:27   ` Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 05/46] virtio/virtio-pci: Handle extra notification data Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 06/46] virtio: Prevent creation of device using notification-data with ioeventfd Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 07/46] virtio-mmio: Handle extra notification data Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 08/46] virtio-ccw: " Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 09/46] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 10/46] Fix vhost user assertion when sending more than one fd Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 11/46] vhost-vsock: add VIRTIO_F_RING_PACKED to feature_bits Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 12/46] hw/virtio: Fix obtain the buffer id from the last descriptor Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 13/46] virtio-pci: only reset pm state during resetting Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 14/46] vhost-user-gpu: fix import of DMABUF Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 15/46] Revert "vhost-user: fix lost reconnect" Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 16/46] vhost-user: fix lost reconnect again Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 17/46] hw/cxl/mailbox: change CCI cmd set structure to be a member, not a reference Michael S. Tsirkin
2024-06-04 19:06 ` [PULL 18/46] hw/cxl/mailbox: interface to add CCI commands to an existing CCI Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 19/46] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 20/46] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 21/46] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 22/46] hw/mem/cxl_type3: Add support to create DC regions to " Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 23/46] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 24/46] hw/mem/cxl_type3: Add host backend and address space handling for DC regions Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 25/46] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 26/46] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 27/46] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 28/46] hw/mem/cxl_type3: Add DPA range validation for accesses to DC regions Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 29/46] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 30/46] hw/mem/cxl_type3: Allow to release extent superset in QMP interface Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 31/46] hw/acpi/GI: Fix trivial parameter alignment issue Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 32/46] hw/acpi: Insert an acpi-generic-node base under acpi-generic-initiator Michael S. Tsirkin
2024-06-04 19:07 ` [PULL 33/46] hw/acpi: Generic Port Affinity Structure support Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 34/46] bios-tables-test: Allow for new acpihmat-generic-x test data Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 35/46] bios-tables-test: Add complex SRAT / HMAT test for GI GP Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 36/46] bios-tables-test: Add data for complex numa test (GI, GP etc) Michael S. Tsirkin
2024-06-05 14:39   ` Richard Henderson
2024-06-05 15:27     ` Jonathan Cameron via
2024-06-05 15:49       ` Jonathan Cameron via
2024-06-05 16:01       ` Richard Henderson
2024-06-05 16:08         ` Jonathan Cameron via
2024-06-05 16:11           ` Jonathan Cameron via
2024-06-05 16:54             ` Richard Henderson
2024-06-05 17:19               ` Jonathan Cameron via
2024-06-04 19:08 ` [PULL 37/46] scripts/update-linux-headers: Copy setup_data.h to correct directory Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 38/46] linux-headers: update to 6.10-rc1 Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 39/46] hw/misc/pvpanic: centralize definition of supported events Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 40/46] tests/qtest/pvpanic: use centralized " Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 41/46] hw/misc/pvpanic: add support for normal shutdowns Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 42/46] pvpanic: Emit GUEST_PVSHUTDOWN QMP event on pvpanic shutdown signal Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 43/46] tests/qtest/pvpanic: add tests for pvshutdown event Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 44/46] Revert "docs/specs/pvpanic: mark shutdown event as not implemented" Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 45/46] virtio-pci: Fix the failure process in kvm_virtio_pci_vector_use_one() Michael S. Tsirkin
2024-06-04 19:08 ` [PULL 46/46] hw/cxl: Fix read from bogus memory Michael S. Tsirkin
2024-06-05  7:27 ` [PULL 00/46] virtio: features,fixes Michael S. Tsirkin
2024-06-05 14:44   ` Richard Henderson
2024-06-25 13:06 ` Peter Maydell
2024-06-25 14:01   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).