* [PATCH v3 1/7] vdpa: check for iova tree initialized at net_client_start
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 2/7] vdpa: reorder vhost_vdpa_set_backend_cap Jonah Palmer
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
To map the guest memory while it is being migrated we need to create the
iova_tree early, as long as the destination uses x-svq=on. Check that it
is not already initialized so we do not override it.
The function vhost_vdpa_net_client_stop clears it if the device is
stopped. If the guest starts the device again, the iova tree is
recreated by vhost_vdpa_net_data_start_first or vhost_vdpa_net_cvq_start
if needed, so the old behavior is kept.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
net/vhost-vdpa.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index f7a54f46aa..5bc945d3e0 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -354,7 +354,9 @@ static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
migration_add_notifier(&s->migration_state,
vdpa_net_migration_state_notifier);
- if (v->shadow_vqs_enabled) {
+
+ /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
+ if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
v->shared->iova_range.last);
}
--
2.43.5
* [PATCH v3 2/7] vdpa: reorder vhost_vdpa_set_backend_cap
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 1/7] vdpa: check for iova tree initialized at net_client_start Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 3/7] vdpa: set backend capabilities at vhost_vdpa_init Jonah Palmer
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
It will be used directly by vhost_vdpa_init.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
hw/virtio/vhost-vdpa.c | 60 +++++++++++++++++++++---------------------
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7efbde3d4c..79224d18d8 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -596,6 +596,36 @@ static void vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v)
v->shadow_vqs = g_steal_pointer(&shadow_vqs);
}
+static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
+{
+ struct vhost_vdpa *v = dev->opaque;
+
+ uint64_t features;
+ uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
+ 0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+ 0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+ 0x1ULL << VHOST_BACKEND_F_SUSPEND;
+ int r;
+
+ if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
+ return -EFAULT;
+ }
+
+ features &= f;
+
+ if (vhost_vdpa_first_dev(dev)) {
+ r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
+ if (r) {
+ return -EFAULT;
+ }
+ }
+
+ dev->backend_cap = features;
+ v->shared->backend_cap = features;
+
+ return 0;
+}
+
static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
{
struct vhost_vdpa *v = opaque;
@@ -843,36 +873,6 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
}
-static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
-{
- struct vhost_vdpa *v = dev->opaque;
-
- uint64_t features;
- uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
- 0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
- 0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
- 0x1ULL << VHOST_BACKEND_F_SUSPEND;
- int r;
-
- if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
- return -EFAULT;
- }
-
- features &= f;
-
- if (vhost_vdpa_first_dev(dev)) {
- r = vhost_vdpa_call(dev, VHOST_SET_BACKEND_FEATURES, &features);
- if (r) {
- return -EFAULT;
- }
- }
-
- dev->backend_cap = features;
- v->shared->backend_cap = features;
-
- return 0;
-}
-
static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
uint32_t *device_id)
{
--
2.43.5
* [PATCH v3 3/7] vdpa: set backend capabilities at vhost_vdpa_init
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 1/7] vdpa: check for iova tree initialized at net_client_start Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 2/7] vdpa: reorder vhost_vdpa_set_backend_cap Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 4/7] vdpa: add listener_registered Jonah Palmer
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
The backend does not reset them until the vdpa file descriptor is closed,
so there is no harm in setting them only once.
This allows the destination of a live migration to premap memory in
batches, using VHOST_BACKEND_F_IOTLB_BATCH.
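For illustration only (not part of this patch), a minimal sketch of what
premapping in batches looks like at the vhost-vdpa uAPI level once
VHOST_BACKEND_F_IOTLB_BATCH has been negotiated. The message layout follows
linux/vhost_types.h; device_fd, iova, size and uaddr are placeholders and
error handling is omitted:
/*
 * Illustration only: batched IOTLB updates over the vhost-vdpa char device
 * once VHOST_BACKEND_F_IOTLB_BATCH is negotiated.
 */
#include <linux/vhost_types.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
static void vdpa_send_iotlb(int device_fd, struct vhost_iotlb_msg iotlb)
{
    struct vhost_msg_v2 msg;
    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb = iotlb;
    write(device_fd, &msg, sizeof(msg));
}
static void premap_in_batch(int device_fd, uint64_t iova, uint64_t size,
                            uint64_t uaddr)
{
    /* begin the batch, send any number of updates, end the batch */
    vdpa_send_iotlb(device_fd, (struct vhost_iotlb_msg) {
        .type = VHOST_IOTLB_BATCH_BEGIN,
    });
    vdpa_send_iotlb(device_fd, (struct vhost_iotlb_msg) {
        .type = VHOST_IOTLB_UPDATE,
        .iova = iova,
        .size = size,
        .uaddr = uaddr,
        .perm = VHOST_ACCESS_RW,
    });
    vdpa_send_iotlb(device_fd, (struct vhost_iotlb_msg) {
        .type = VHOST_IOTLB_BATCH_END,
    });
}
In QEMU the batched updates themselves are issued by the existing memory
listener / DMA map path; this patch only makes the capability available
earlier.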
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
hw/virtio/vhost-vdpa.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 79224d18d8..939a5a28a1 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -636,6 +636,12 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
v->dev = dev;
dev->opaque = opaque ;
v->shared->listener = vhost_vdpa_memory_listener;
+
+ ret = vhost_vdpa_set_backend_cap(dev);
+ if (unlikely(ret != 0)) {
+ return ret;
+ }
+
vhost_vdpa_init_svq(dev, v);
error_propagate(&dev->migration_blocker, v->migration_blocker);
@@ -1565,7 +1571,6 @@ const VhostOps vdpa_ops = {
.vhost_set_vring_kick = vhost_vdpa_set_vring_kick,
.vhost_set_vring_call = vhost_vdpa_set_vring_call,
.vhost_get_features = vhost_vdpa_get_features,
- .vhost_set_backend_cap = vhost_vdpa_set_backend_cap,
.vhost_set_owner = vhost_vdpa_set_owner,
.vhost_set_vring_endian = NULL,
.vhost_backend_memslots_limit = vhost_vdpa_memslots_limit,
--
2.43.5
* [PATCH v3 4/7] vdpa: add listener_registered
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
` (2 preceding siblings ...)
2025-03-14 13:01 ` [PATCH v3 3/7] vdpa: set backend capabilities at vhost_vdpa_init Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 5/7] vdpa: reorder listener assignment Jonah Palmer
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
Track whether the listener has been registered, so we know whether it
needs to be registered again at start.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
hw/virtio/vhost-vdpa.c | 7 ++++++-
include/hw/virtio/vhost-vdpa.h | 6 ++++++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 939a5a28a1..61a0e8fdbd 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1381,7 +1381,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
"IOMMU and try again");
return -1;
}
- memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
+ if (!v->shared->listener_registered) {
+ memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
+ v->shared->listener_registered = true;
+ }
return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
}
@@ -1401,6 +1404,8 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev)
vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
memory_listener_unregister(&v->shared->listener);
+ v->shared->listener_registered = false;
+
}
static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 0a9575b469..221840987e 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -51,6 +51,12 @@ typedef struct vhost_vdpa_shared {
bool iotlb_batch_begin_sent;
+ /*
+ * The memory listener has been registered, so DMA maps have been sent to
+ * the device.
+ */
+ bool listener_registered;
+
/* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
bool shadow_data;
--
2.43.5
* [PATCH v3 5/7] vdpa: reorder listener assignment
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
` (3 preceding siblings ...)
2025-03-14 13:01 ` [PATCH v3 4/7] vdpa: add listener_registered Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init Jonah Palmer
` (2 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
Since commit f6fe3e333f ("vdpa: move memory listener to
vhost_vdpa_shared") this piece of code repeatedly assigns the
shared->listener members. This was not a problem, as the listener was
not used until device start.
However, the next patches move the listener registration to this
vhost_vdpa_init function. Once the listener is registered it is linked
into an embedded list, so overwriting its members again would corrupt
the list node.
Do the right thing and only set it for the first vdpa device.
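For illustration only (not QEMU code), a small standalone sketch of why
overwriting a struct with an embedded list link after it has been
registered corrupts the list. It uses the BSD <sys/queue.h> TAILQ macros,
which QEMU's QTAILQ mirrors; "struct listener" is a placeholder type, not
the real MemoryListener:
/*
 * Illustration only: overwriting a struct that embeds a list link after it
 * has been inserted clobbers the link pointers.
 */
#include <sys/queue.h>
struct listener {
    const char *name;
    TAILQ_ENTRY(listener) link;   /* embedded list node */
};
TAILQ_HEAD(listener_list, listener);
int main(void)
{
    struct listener_list all;
    struct listener shared = { .name = "shared listener" };
    TAILQ_INIT(&all);
    /* "register": link the node into the global list */
    TAILQ_INSERT_TAIL(&all, &shared, link);
    /*
     * Re-assigning the whole struct (as the old code did with
     * shared->listener on every vhost_vdpa_init call) zeroes
     * link.tqe_next/tqe_prev.  The list head still points at the node,
     * but the node no longer knows its own position.
     */
    shared = (struct listener) { .name = "overwritten" };
    /* "unregister": removal now dereferences a NULL tqe_prev and crashes */
    TAILQ_REMOVE(&all, &shared, link);
    return 0;
}
The same applies to the listener's embedded links once
memory_listener_register() has run, which is why the listener must be
fully assigned before it is ever registered and never overwritten
afterwards.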
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
hw/virtio/vhost-vdpa.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 61a0e8fdbd..eb5b5208b7 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -635,7 +635,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
v->dev = dev;
dev->opaque = opaque ;
- v->shared->listener = vhost_vdpa_memory_listener;
ret = vhost_vdpa_set_backend_cap(dev);
if (unlikely(ret != 0)) {
@@ -677,6 +676,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
+ v->shared->listener = vhost_vdpa_memory_listener;
return 0;
}
--
2.43.5
* [PATCH v3 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
` (4 preceding siblings ...)
2025-03-14 13:01 ` [PATCH v3 5/7] vdpa: reorder listener assignment Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-14 13:01 ` [PATCH v3 7/7] vdpa: move memory listener register to vhost_vdpa_init Jonah Palmer
2025-03-18 1:53 ` [PATCH v3 0/7] Move " Lei Yang
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
As we are moving to keep the mapping through the whole vdpa device life
instead of resetting it at VirtIO reset, we need to move all of its
dependencies to the initialization too. In particular, devices with
x-svq=on need a valid iova_tree from the beginning.
Simplify the code by also consolidating the two creation points: the
first data vq start when SVQ is active, and CVQ start when only CVQ uses it.
Suggested-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
include/hw/virtio/vhost-vdpa.h | 16 ++++++++++++++-
net/vhost-vdpa.c | 36 +++-------------------------------
2 files changed, 18 insertions(+), 34 deletions(-)
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 221840987e..449bf5c840 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -43,7 +43,21 @@ typedef struct vhost_vdpa_shared {
struct vhost_vdpa_iova_range iova_range;
QLIST_HEAD(, vdpa_iommu) iommu_list;
- /* IOVA mapping used by the Shadow Virtqueue */
+ /*
+ * IOVA mapping used by the Shadow Virtqueue
+ *
+ * It is shared among all ASID for simplicity, whether CVQ shares ASID with
+ * guest or not:
+ * - Memory listener need access to guest's memory addresses allocated in
+ * the IOVA tree.
+ * - There should be plenty of IOVA address space for both ASID not to
+ * worry about collisions between them. Guest's translations are still
+ * validated with virtio virtqueue_pop so there is no risk for the guest
+ * to access memory that it shouldn't.
+ *
+ * To allocate a iova tree per ASID is doable but it complicates the code
+ * and it is not worth it for the moment.
+ */
VhostIOVATree *iova_tree;
/* Copy of backend features */
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 5bc945d3e0..4254ca7c36 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -235,6 +235,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
return;
}
qemu_close(s->vhost_vdpa.shared->device_fd);
+ g_clear_pointer(&s->vhost_vdpa.shared->iova_tree, vhost_iova_tree_delete);
g_free(s->vhost_vdpa.shared);
}
@@ -350,16 +351,8 @@ static int vdpa_net_migration_state_notifier(NotifierWithReturn *notifier,
static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
{
- struct vhost_vdpa *v = &s->vhost_vdpa;
-
migration_add_notifier(&s->migration_state,
vdpa_net_migration_state_notifier);
-
- /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
- if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
- v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
- v->shared->iova_range.last);
- }
}
static int vhost_vdpa_net_data_start(NetClientState *nc)
@@ -406,19 +399,12 @@ static int vhost_vdpa_net_data_load(NetClientState *nc)
static void vhost_vdpa_net_client_stop(NetClientState *nc)
{
VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
- struct vhost_dev *dev;
assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
if (s->vhost_vdpa.index == 0) {
migration_remove_notifier(&s->migration_state);
}
-
- dev = s->vhost_vdpa.dev;
- if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
- g_clear_pointer(&s->vhost_vdpa.shared->iova_tree,
- vhost_iova_tree_delete);
- }
}
static NetClientInfo net_vhost_vdpa_info = {
@@ -589,24 +575,6 @@ out:
return 0;
}
- /*
- * If other vhost_vdpa already have an iova_tree, reuse it for simplicity,
- * whether CVQ shares ASID with guest or not, because:
- * - Memory listener need access to guest's memory addresses allocated in
- * the IOVA tree.
- * - There should be plenty of IOVA address space for both ASID not to
- * worry about collisions between them. Guest's translations are still
- * validated with virtio virtqueue_pop so there is no risk for the guest
- * to access memory that it shouldn't.
- *
- * To allocate a iova tree per ASID is doable but it complicates the code
- * and it is not worth it for the moment.
- */
- if (!v->shared->iova_tree) {
- v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
- v->shared->iova_range.last);
- }
-
r = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer,
vhost_vdpa_net_cvq_cmd_page_len(), false);
if (unlikely(r < 0)) {
@@ -1715,6 +1683,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
s->vhost_vdpa.shared->device_fd = vdpa_device_fd;
s->vhost_vdpa.shared->iova_range = iova_range;
s->vhost_vdpa.shared->shadow_data = svq;
+ s->vhost_vdpa.shared->iova_tree = vhost_iova_tree_new(iova_range.first,
+ iova_range.last);
} else if (!is_datapath) {
s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
PROT_READ | PROT_WRITE,
--
2.43.5
* [PATCH v3 7/7] vdpa: move memory listener register to vhost_vdpa_init
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
` (5 preceding siblings ...)
2025-03-14 13:01 ` [PATCH v3 6/7] vdpa: move iova_tree allocation to net_vhost_vdpa_init Jonah Palmer
@ 2025-03-14 13:01 ` Jonah Palmer
2025-03-18 1:53 ` [PATCH v3 0/7] Move " Lei Yang
7 siblings, 0 replies; 16+ messages in thread
From: Jonah Palmer @ 2025-03-14 13:01 UTC (permalink / raw)
To: qemu-devel
Cc: eperezma, peterx, mst, jasowant, lvivier, dtatulea, leiyan, parav,
sgarzare, si-wei.liu, lingshan.zhu, boris.ostrovsky, jonah.palmer
From: Eugenio Pérez <eperezma@redhat.com>
Current memory operations like pinning may take a lot of time at the
destination. Currently they are done after the source of the migration is
stopped, and before the workload is resumed at the destination. This is a
period where neither traffic can flow, nor the VM workload can continue
(downtime).
We can do better, as we know the memory layout of the guest RAM at the
destination from the moment that all devices are initialized. Moving that
operation earlier allows QEMU to communicate the maps to the kernel while
the workload is still running in the source, so Linux can start mapping
them.
As a small drawback, there is a time in the initialization where QEMU
cannot respond to QMP etc. By some testing, this time is about
0.2 seconds. This may be further reduced (or increased) depending on the
vdpa driver and the platform hardware, and it is dominated by the cost
of memory pinning.
This matches the time that we move out of the so-called downtime window.
The downtime is measured by checking the trace timestamps from the moment
the source suspends the device to the moment the destination starts the
eighth and last virtqueue pair. For a 39G guest, it goes from ~2.2526
secs to ~2.0949 secs.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
--
v2:
Move the memory listener registration to the vhost_vdpa_set_owner function.
When hot plugging the vdpa device, the memory is already set up, and
leaving the memory listener register call in the init function made maps
occur before the set owner call.
To be 100% safe, let's put it right after the set_owner call.
Reported-by: Lei Yang <leiyang@redhat.com>
---
hw/virtio/vhost-vdpa.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index eb5b5208b7..afed991253 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1381,6 +1381,11 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
"IOMMU and try again");
return -1;
}
+ if (v->shared->listener_registered &&
+ dev->vdev->dma_as != v->shared->listener.address_space) {
+ memory_listener_unregister(&v->shared->listener);
+ v->shared->listener_registered = false;
+ }
if (!v->shared->listener_registered) {
memory_listener_register(&v->shared->listener, dev->vdev->dma_as);
v->shared->listener_registered = true;
@@ -1539,12 +1544,27 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
static int vhost_vdpa_set_owner(struct vhost_dev *dev)
{
+ int r;
+ struct vhost_vdpa *v;
+
if (!vhost_vdpa_first_dev(dev)) {
return 0;
}
trace_vhost_vdpa_set_owner(dev);
- return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+ r = vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
+ if (unlikely(r < 0)) {
+ return r;
+ }
+
+ /*
+ * Being optimistic and listening address space memory. If the device
+ * uses vIOMMU, it is changed at vhost_vdpa_dev_start.
+ */
+ v = dev->opaque;
+ memory_listener_register(&v->shared->listener, &address_space_memory);
+ v->shared->listener_registered = true;
+ return 0;
}
static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
--
2.43.5
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-14 13:01 [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init Jonah Palmer
` (6 preceding siblings ...)
2025-03-14 13:01 ` [PATCH v3 7/7] vdpa: move memory listener register to vhost_vdpa_init Jonah Palmer
@ 2025-03-18 1:53 ` Lei Yang
2025-03-18 2:14 ` Jason Wang
7 siblings, 1 reply; 16+ messages in thread
From: Lei Yang @ 2025-03-18 1:53 UTC (permalink / raw)
To: Jonah Palmer
Cc: qemu-devel, eperezma, peterx, mst, jasowant, lvivier, dtatulea,
leiyan, parav, sgarzare, si-wei.liu, lingshan.zhu,
boris.ostrovsky
[-- Attachment #1: Type: text/plain, Size: 3528 bytes --]
Hi Jonah
I tested this series with a vhost_vdpa device based on a Mellanox
ConnectX-6 DX NIC and hit a host kernel crash. This problem is easier to
reproduce in the hotplug/unplug device scenario.
For the core dump messages, please review the attachment.
FW version:
# flint -d 0000:0d:00.0 q |grep Version
FW Version: 22.44.1036
Product Version: 22.44.1036
Best Regards
Lei
On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination. Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination. This is a
> period where neither traffic can flow, nor the VM workload can continue
> (downtime).
>
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment that all devices are initialized. So
> moving that operation allows QEMU to communicate the maps to the kernel
> while the workload is still running in the source, so Linux can start
> mapping them.
>
> As a small drawback, there is a time in the initialization where QEMU
> cannot respond to QMP etc. By some testing, this time is about
> 0.2 seconds. This may be further reduced (or increased) depending on the
> vdpa driver and the platform hardware, and it is dominated by the cost
> of memory pinning.
>
> This matches the time that we move out of the so-called downtime window.
> The downtime is measured by checking the trace timestamps from the moment
> the source suspends the device to the moment the destination starts the
> eighth and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> secs to ~2.0949 secs.
>
> Future directions on top of this series may include to move more things ahead
> of the migration time, like set DRIVER_OK or perform actual iterative migration
> of virtio-net devices.
>
> Comments are welcome.
>
> This series is a different approach of series [1]. As the title does not
> reflect the changes anymore, please refer to the previous one to know the
> series history.
>
> This series is based on [2], it must be applied after it.
>
> [Jonah Palmer]
> This series was rebased after [3] was pulled in, as [3] was a prerequisite
> fix for this series.
>
> v3:
> ---
> * Rebase
>
> v2:
> ---
> * Move the memory listener registration to vhost_vdpa_set_owner function.
> * Move the iova_tree allocation to net_vhost_vdpa_init.
>
> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
>
> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
>
> Eugenio Pérez (7):
> vdpa: check for iova tree initialized at net_client_start
> vdpa: reorder vhost_vdpa_set_backend_cap
> vdpa: set backend capabilities at vhost_vdpa_init
> vdpa: add listener_registered
> vdpa: reorder listener assignment
> vdpa: move iova_tree allocation to net_vhost_vdpa_init
> vdpa: move memory listener register to vhost_vdpa_init
>
> hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> net/vhost-vdpa.c | 34 ++----------
> 3 files changed, 88 insertions(+), 66 deletions(-)
>
> --
> 2.43.5
>
>
[-- Attachment #2: vmcore-dmesg.txt --]
[-- Type: text/plain, Size: 48756 bytes --]
[ 257.598060] openvswitch: Open vSwitch switching datapath
[ 257.826760] mlx5_core 0000:0d:00.0: E-Switch: Enable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[ 257.928288] pci 0000:0d:00.2: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 257.928363] pci 0000:0d:00.2: enabling Extended Tags
[ 257.931674] mlx5_core 0000:0d:00.2: enabling device (0000 -> 0002)
[ 257.931747] mlx5_core 0000:0d:00.2: PTM is not supported by PCIe
[ 257.931766] mlx5_core 0000:0d:00.2: firmware version: 22.44.1036
[ 258.127431] mlx5_core 0000:0d:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 258.139089] mlx5_core 0000:0d:00.2: Assigned random MAC address 76:34:f6:43:68:d7
[ 258.282267] mlx5_core 0000:0d:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 258.286453] mlx5_core 0000:0d:00.2 ens2f0v0: renamed from eth0
[ 258.317591] pci 0000:0d:00.3: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 258.317669] pci 0000:0d:00.3: enabling Extended Tags
[ 258.321033] mlx5_core 0000:0d:00.3: enabling device (0000 -> 0002)
[ 258.321110] mlx5_core 0000:0d:00.3: PTM is not supported by PCIe
[ 258.321128] mlx5_core 0000:0d:00.3: firmware version: 22.44.1036
[ 258.442412] mlx5_core 0000:0d:00.2 ens2f0v0: Link up
[ 258.521068] mlx5_core 0000:0d:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 258.532963] mlx5_core 0000:0d:00.3: Assigned random MAC address 0a:b5:ca:d1:b3:e8
[ 258.674658] mlx5_core 0000:0d:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 258.678230] mlx5_core 0000:0d:00.3 ens2f0v1: renamed from eth0
[ 258.708606] pci 0000:0d:00.4: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 258.708684] pci 0000:0d:00.4: enabling Extended Tags
[ 258.711905] mlx5_core 0000:0d:00.4: enabling device (0000 -> 0002)
[ 258.711974] mlx5_core 0000:0d:00.4: PTM is not supported by PCIe
[ 258.711991] mlx5_core 0000:0d:00.4: firmware version: 22.44.1036
[ 258.832721] mlx5_core 0000:0d:00.3 ens2f0v1: Link up
[ 258.909375] mlx5_core 0000:0d:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 258.921060] mlx5_core 0000:0d:00.4: Assigned random MAC address ca:a1:82:3c:f5:0e
[ 259.060439] mlx5_core 0000:0d:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 259.063961] mlx5_core 0000:0d:00.4 ens2f0v2: renamed from eth0
[ 259.094975] pci 0000:0d:00.5: [15b3:101e] type 00 class 0x020000 PCIe Endpoint
[ 259.095052] pci 0000:0d:00.5: enabling Extended Tags
[ 259.098268] mlx5_core 0000:0d:00.5: enabling device (0000 -> 0002)
[ 259.098327] mlx5_core 0000:0d:00.5: PTM is not supported by PCIe
[ 259.098344] mlx5_core 0000:0d:00.5: firmware version: 22.44.1036
[ 259.217430] mlx5_core 0000:0d:00.4 ens2f0v2: Link up
[ 259.294213] mlx5_core 0000:0d:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 259.305970] mlx5_core 0000:0d:00.5: Assigned random MAC address 42:22:11:90:2f:3b
[ 259.446987] mlx5_core 0000:0d:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 259.450475] mlx5_core 0000:0d:00.5 ens2f0v3: renamed from eth0
[ 259.590880] mlx5_core 0000:0d:00.5 ens2f0v3: Link up
[ 262.238175] mlx5_core 0000:0d:00.0: E-Switch: Disable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[ 263.576242] mlx5_core 0000:0d:00.0: E-Switch: Supported tc chains and prios offload
[ 264.016364] mlx5_core 0000:0d:00.0 ens2f0np0: Link up
[ 264.017221] mlx5_core 0000:0d:00.0 ens2f0np0: Dropping C-tag vlan stripping offload due to S-tag vlan
[ 264.017223] mlx5_core 0000:0d:00.0 ens2f0np0: Disabling HW_VLAN CTAG FILTERING, not supported in switchdev mode
[ 264.136922] mlx5_core 0000:0d:00.0 ens2f0npf0vf0: renamed from eth0
[ 264.151402] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[ 264.214572] mlx5_core 0000:0d:00.0 ens2f0npf0vf1: renamed from eth0
[ 264.226278] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[ 264.292884] mlx5_core 0000:0d:00.0 ens2f0npf0vf2: renamed from eth0
[ 264.303506] debugfs: Directory 'nic' with parent '0000:0d:00.0' already present!
[ 264.377461] mlx5_core 0000:0d:00.0: E-Switch: Enable: mode(OFFLOADS), nvfs(4), necvfs(0), active vports(4)
[ 264.380931] mlx5_core 0000:0d:00.0 ens2f0npf0vf3: renamed from eth0
[ 269.425503] mlx5_core 0000:0d:00.2: enabling device (0000 -> 0002)
[ 269.425569] mlx5_core 0000:0d:00.2: PTM is not supported by PCIe
[ 269.425588] mlx5_core 0000:0d:00.2: firmware version: 22.44.1036
[ 269.620194] mlx5_core 0000:0d:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 269.631907] mlx5_core 0000:0d:00.2: Assigned random MAC address b2:0a:5e:e7:f9:eb
[ 269.772359] mlx5_core 0000:0d:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 269.776165] mlx5_core 0000:0d:00.2 ens2f0v0: renamed from eth0
[ 269.915517] mlx5_core 0000:0d:00.2 ens2f0v0: Link up
[ 272.845232] mlx5_core 0000:0d:00.3: enabling device (0000 -> 0002)
[ 272.845299] mlx5_core 0000:0d:00.3: PTM is not supported by PCIe
[ 272.845319] mlx5_core 0000:0d:00.3: firmware version: 22.44.1036
[ 273.039306] mlx5_core 0000:0d:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 273.051151] mlx5_core 0000:0d:00.3: Assigned random MAC address 6a:6f:da:6b:50:70
[ 273.193736] mlx5_core 0000:0d:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 273.197343] mlx5_core 0000:0d:00.3 ens2f0v1: renamed from eth0
[ 273.338087] mlx5_core 0000:0d:00.3 ens2f0v1: Link up
[ 276.254430] mlx5_core 0000:0d:00.4: enabling device (0000 -> 0002)
[ 276.254497] mlx5_core 0000:0d:00.4: PTM is not supported by PCIe
[ 276.254515] mlx5_core 0000:0d:00.4: firmware version: 22.44.1036
[ 276.448550] mlx5_core 0000:0d:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 276.460439] mlx5_core 0000:0d:00.4: Assigned random MAC address f6:c5:69:e8:30:10
[ 276.601405] mlx5_core 0000:0d:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 276.604867] mlx5_core 0000:0d:00.4 ens2f0v2: renamed from eth0
[ 276.752492] mlx5_core 0000:0d:00.4 ens2f0v2: Link up
[ 279.669193] mlx5_core 0000:0d:00.5: enabling device (0000 -> 0002)
[ 279.669265] mlx5_core 0000:0d:00.5: PTM is not supported by PCIe
[ 279.669282] mlx5_core 0000:0d:00.5: firmware version: 22.44.1036
[ 279.863461] mlx5_core 0000:0d:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 279.875360] mlx5_core 0000:0d:00.5: Assigned random MAC address be:56:90:1d:b5:e0
[ 280.017839] mlx5_core 0000:0d:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
[ 280.021391] mlx5_core 0000:0d:00.5 ens2f0v3: renamed from eth0
[ 280.162052] mlx5_core 0000:0d:00.5 ens2f0v3: Link up
[ 283.258060] ovs-system: entered promiscuous mode
[ 283.298028] GACT probability on
[ 283.302732] Timeout policy base is empty
[ 283.357058] ens2f0np0_br: entered promiscuous mode
[ 283.369803] mlx5_core 0000:0d:00.0 ens2f0np0: entered promiscuous mode
[ 283.406604] mlx5_core 0000:0d:00.0 ens2f0npf0vf0: entered promiscuous mode
[ 283.441464] mlx5_core 0000:0d:00.0 ens2f0npf0vf1: entered promiscuous mode
[ 283.478269] mlx5_core 0000:0d:00.0 ens2f0npf0vf2: entered promiscuous mode
[ 283.517622] mlx5_core 0000:0d:00.0 ens2f0npf0vf3: entered promiscuous mode
[ 283.533791] Mirror/redirect action on
[ 345.179996] FS-Cache: Loaded
[ 345.267362] Key type dns_resolver registered
[ 345.435186] NFS: Registering the id_resolver key type
[ 345.435195] Key type id_resolver registered
[ 345.435196] Key type id_legacy registered
[ 345.576167] systemd-rc-local-generator[3537]: /etc/rc.d/rc.local is not marked executable, skipping.
[ 376.131613] Bluetooth: Core ver 2.22
[ 376.131636] NET: Registered PF_BLUETOOTH protocol family
[ 376.131638] Bluetooth: HCI device and connection manager initialized
[ 376.131642] Bluetooth: HCI socket layer initialized
[ 376.131643] Bluetooth: L2CAP socket layer initialized
[ 376.131647] Bluetooth: SCO socket layer initialized
[ 423.869555] No such timeout policy "ovs_test_tp"
[ 429.662364] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 429.663567] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 430.945688] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 437.634039] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[ 445.937186] mlx5_core 0000:0d:00.0: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 445.937212] mlx5_core 0000:0d:00.0: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[ 445.937221] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[ 445.937228] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[ 445.937234] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[ 445.937240] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[ 445.937247] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[ 445.937253] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[ 445.937259] mlx5_core 0000:0d:00.0: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[ 445.937265] mlx5_core 0000:0d:00.0: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[ 445.937280] mlx5_core 0000:0d:00.0: print_health_info:440:(pid 0): fw_ver 22.44.1036
[ 445.937286] mlx5_core 0000:0d:00.0: print_health_info:442:(pid 0): time 1742220438
[ 445.937294] mlx5_core 0000:0d:00.0: print_health_info:443:(pid 0): hw_id 0x00000212
[ 445.937296] mlx5_core 0000:0d:00.0: print_health_info:444:(pid 0): rfr 0
[ 445.937297] mlx5_core 0000:0d:00.0: print_health_info:445:(pid 0): severity 3 (ERROR)
[ 445.937303] mlx5_core 0000:0d:00.0: print_health_info:446:(pid 0): irisc_index 3
[ 445.937314] mlx5_core 0000:0d:00.0: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[ 445.937320] mlx5_core 0000:0d:00.0: print_health_info:449:(pid 0): ext_synd 0x8f7a
[ 445.937327] mlx5_core 0000:0d:00.0: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[ 446.257192] mlx5_core 0000:0d:00.2: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 446.513190] mlx5_core 0000:0d:00.3: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 446.577190] mlx5_core 0000:0d:00.4: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 447.473192] mlx5_core 0000:0d:00.1: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 447.473215] mlx5_core 0000:0d:00.1: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[ 447.473221] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[ 447.473228] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[ 447.473234] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[ 447.473240] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[ 447.473246] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[ 447.473252] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[ 447.473259] mlx5_core 0000:0d:00.1: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[ 447.473265] mlx5_core 0000:0d:00.1: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[ 447.473279] mlx5_core 0000:0d:00.1: print_health_info:440:(pid 0): fw_ver 22.44.1036
[ 447.473286] mlx5_core 0000:0d:00.1: print_health_info:442:(pid 0): time 1742220438
[ 447.473292] mlx5_core 0000:0d:00.1: print_health_info:443:(pid 0): hw_id 0x00000212
[ 447.473293] mlx5_core 0000:0d:00.1: print_health_info:444:(pid 0): rfr 0
[ 447.473295] mlx5_core 0000:0d:00.1: print_health_info:445:(pid 0): severity 3 (ERROR)
[ 447.473300] mlx5_core 0000:0d:00.1: print_health_info:446:(pid 0): irisc_index 3
[ 447.473311] mlx5_core 0000:0d:00.1: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[ 447.473317] mlx5_core 0000:0d:00.1: print_health_info:449:(pid 0): ext_synd 0x8f7a
[ 447.473323] mlx5_core 0000:0d:00.1: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[ 447.729198] mlx5_core 0000:0d:00.5: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 456.169156] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4416): suspending device
[ 456.171590] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[ 456.305726] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4416): performing device reset
[ 457.305137] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4428): performing device reset
[ 495.742404] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[ 495.843610] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 495.991695] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 496.020164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 496.035602] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 496.049646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[ 832.265070] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[ 832.372650] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 853.399862] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 4417): suspending device
[ 853.529509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[ 853.673868] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4417): performing device reset
[ 854.395224] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 884.925816] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 4410): suspending device
[ 885.042204] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 885.244260] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 4410): performing device reset
[ 896.300680] No such timeout policy "ovs_test_tp"
[ 900.357954] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 900.359110] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 902.100249] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 910.628854] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 937.895823] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[ 937.899325] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 938.085016] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 939.055817] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6077): performing device reset
[ 976.568586] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 976.662853] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 976.810406] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 976.834590] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 976.850151] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 976.864017] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1311.108310] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1311.220778] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1332.256657] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 1332.386466] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1332.529703] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1333.241475] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1356.795129] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1356.917443] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.064267] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.087461] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.102482] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1357.118749] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1413.013745] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1413.129036] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1434.138735] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 1434.269223] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1434.412250] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1435.137768] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.692556] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1458.804336] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.950704] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.973833] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1458.988729] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1459.004483] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 1514.899786] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1515.005398] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1536.017815] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1536.147266] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1536.290797] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1537.013159] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.562567] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1560.672042] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.822567] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.845755] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.860899] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1560.874076] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1616.775191] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1616.882382] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1637.909014] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1638.038584] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1638.182394] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1638.899863] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.455170] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1662.571479] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.718829] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.742060] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.757116] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1662.770550] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1718.669137] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1718.778560] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1739.797035] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6067): suspending device
[ 1739.928191] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1740.071945] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6067): performing device reset
[ 1740.793808] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1744.056362] ------------[ cut here ]------------
[ 1744.056364] WARNING: CPU: 46 PID: 0 at kernel/time/timer.c:1685 __run_timers.part.0+0x253/0x280
[ 1744.056375] Modules linked in: act_skbedit bluetooth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs act_mirred cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa bridge stp llc qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_ib kvm_intel ib_uverbs cdc_ether kvm macsec usbnet ib_core mii iTCO_wdt pmt_telemetry dell_smbios pmt_class rapl iTCO_vendor_support dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif acpi_power_meter joydev acpi_ipmi intel_sdsi isst_if_mmio isst_if_mbox_pci isst_if_common intel_vsec intel_cstate i2c_i801 mei_me mei ipmi_si i2c_smbus intel_uncore ipmi_devintf pcspkr i2c_ismt ipmi_msghandler xfs libcrc32c sd_mod mgag200 i2c_algo_bit sg drm_shmem_helper mlx5_core
[ 1744.056420] drm_kms_helper mlxfw ahci psample libahci crct10dif_pclmul iaa_crypto crc32_pclmul drm bnxt_en libata megaraid_sas crc32c_intel idxd tls ghash_clmulni_intel tg3 idxd_bus pci_hyperv_intf wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1744.056432] CPU: 46 PID: 0 Comm: swapper/46 Kdump: loaded Not tainted 5.14.0-570.1.1.el9_6.x86_64 #1
[ 1744.056435] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS 1.3.2 03/28/2023
[ 1744.056436] RIP: 0010:__run_timers.part.0+0x253/0x280
[ 1744.056439] Code: 6e ff ff ff 0f 1f 44 00 00 e9 64 ff ff ff 49 8b 44 24 10 83 6c 24 04 01 8b 4c 24 04 83 f9 ff 0f 85 f8 fe ff ff e9 9a fe ff ff <0f> 0b e9 22 ff ff ff 41 80 7c 24 26 00 0f 84 75 fe ff ff 0f 0b e9
[ 1744.056440] RSP: 0018:ff367707c7118ef0 EFLAGS: 00010046
[ 1744.056442] RAX: 0000000000000000 RBX: 0000000100160000 RCX: 0000000000000200
[ 1744.056443] RDX: ff367707c7118f00 RSI: ff119e88e01e1540 RDI: ff119e88e01e1568
[ 1744.056444] RBP: ff119e818706f658 R08: 0000000000000009 R09: 0000000000000001
[ 1744.056445] R10: ffffffff9e8060c0 R11: ff367707c7110082 R12: ff119e88e01e1540
[ 1744.056445] R13: dead000000000122 R14: 0000000000000000 R15: ff367707c7118f00
[ 1744.056446] FS: 0000000000000000(0000) GS:ff119e88e01c0000(0000) knlGS:0000000000000000
[ 1744.056448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1744.056448] CR2: 00007f37a222ffe8 CR3: 000000089a8f2001 CR4: 0000000000773ef0
[ 1744.056449] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1744.056450] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 1744.056451] PKRU: 55555554
[ 1744.056452] Call Trace:
[ 1744.056453] <IRQ>
[ 1744.056455] ? show_trace_log_lvl+0x1c4/0x2df
[ 1744.056460] ? show_trace_log_lvl+0x1c4/0x2df
[ 1744.056461] ? run_timer_softirq+0x26/0x50
[ 1744.056463] ? __run_timers.part.0+0x253/0x280
[ 1744.056465] ? __warn+0x7e/0xd0
[ 1744.056470] ? __run_timers.part.0+0x253/0x280
[ 1744.056471] ? report_bug+0x100/0x140
[ 1744.056475] ? handle_bug+0x3c/0x70
[ 1744.056478] ? exc_invalid_op+0x14/0x70
[ 1744.056480] ? asm_exc_invalid_op+0x16/0x20
[ 1744.056486] ? __run_timers.part.0+0x253/0x280
[ 1744.056488] ? tick_nohz_highres_handler+0x6d/0x90
[ 1744.056491] ? __hrtimer_run_queues+0x121/0x2b0
[ 1744.056494] ? sched_clock+0xc/0x30
[ 1744.056498] run_timer_softirq+0x26/0x50
[ 1744.056500] handle_softirqs+0xce/0x270
[ 1744.056504] __irq_exit_rcu+0xa3/0xc0
[ 1744.056507] sysvec_apic_timer_interrupt+0x72/0x90
[ 1744.056510] </IRQ>
[ 1744.056511] <TASK>
[ 1744.056511] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1744.056513] RIP: 0010:cpuidle_enter_state+0xbc/0x420
[ 1744.056515] Code: e6 01 00 00 e8 75 52 46 ff e8 90 ed ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 91 1c 45 ff 45 84 ff 0f 85 3f 01 00 00 fb 45 85 f6 <0f> 88 a0 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49
[ 1744.056516] RSP: 0018:ff367707c67afe80 EFLAGS: 00000202
[ 1744.056517] RAX: ff119e88e01f38c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 1744.056518] RDX: 0000000000000000 RSI: 0000000040000000 RDI: 0000000000000000
[ 1744.056519] RBP: ff119e88e01ff1f0 R08: 0000019611dc062f R09: 0000000000000001
[ 1744.056520] R10: 000000000000afc8 R11: ff119e88e01f1b24 R12: ffffffff9ecd0a80
[ 1744.056520] R13: 0000019611dc062f R14: 0000000000000002 R15: 0000000000000000
[ 1744.056523] cpuidle_enter+0x29/0x40
[ 1744.056527] cpuidle_idle_call+0xfa/0x160
[ 1744.056532] do_idle+0x7b/0xe0
[ 1744.056533] cpu_startup_entry+0x26/0x30
[ 1744.056535] start_secondary+0x115/0x140
[ 1744.056539] secondary_startup_64_no_verify+0x187/0x18b
[ 1744.056544] </TASK>
[ 1744.056545] ---[ end trace 0000000000000000 ]---
[ 1764.350097] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1764.461299] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.610195] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.633164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.651522] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1764.666709] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1820.564757] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1820.669447] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1841.689173] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6074): suspending device
[ 1841.820848] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1841.964521] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6074): performing device reset
[ 1842.686383] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.244482] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1866.356684] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.504013] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.527119] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.542271] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1866.557025] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1922.453849] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1922.566288] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1943.577231] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6069): suspending device
[ 1943.705482] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1943.849078] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6069): performing device reset
[ 1944.575623] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.131649] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 1968.244956] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.393767] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.416840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.432204] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 1968.445394] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2023.748063] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2023.859118] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2044.773289] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6066): suspending device
[ 2044.902992] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2045.046642] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2045.765646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.321650] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2069.432046] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.579077] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.602194] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.617295] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2069.630408] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6066): performing device reset
[ 2125.529218] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2125.642232] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2146.660349] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2146.789723] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2146.934212] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2147.651131] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.206931] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2171.317257] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.463890] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.486887] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.502178] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2171.515624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2227.925133] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2228.039644] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2249.064863] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2249.196469] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2249.342621] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2250.062035] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.618035] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2273.728952] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.875403] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.903320] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.918752] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2273.935470] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2329.733715] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2329.845466] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2350.757001] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6068): suspending device
[ 2350.888854] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2351.032363] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2351.754748] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.311251] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2375.423505] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.575649] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.598606] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.613576] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2375.626555] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6068): performing device reset
[ 2431.431256] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2431.548474] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2452.563063] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6082): suspending device
[ 2452.692186] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2452.835890] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2453.558500] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.107406] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2477.222543] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.369468] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.392708] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.408000] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2477.421005] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6082): performing device reset
[ 2532.820704] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2532.929522] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2553.944287] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6083): suspending device
[ 2554.073386] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2554.217339] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2554.939721] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.496858] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2578.608906] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.755717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.778646] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.793624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2578.806434] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6083): performing device reset
[ 2635.208023] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2635.319509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2656.349622] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[ 2656.481365] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2656.626034] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2657.344870] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2680.900938] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2681.012904] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.162658] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.190490] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.209509] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2681.222689] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2736.520697] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2736.629074] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2757.549668] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6064): suspending device
[ 2757.677681] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2757.821586] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6064): performing device reset
[ 2758.545571] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.101880] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2782.212478] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.362330] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.391840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.410782] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2782.424201] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2837.821875] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2837.934527] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2858.857829] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 2858.987470] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2859.131164] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2859.856901] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.416052] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2883.529299] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.678733] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.702133] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.720613] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2883.736648] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2939.534333] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2939.650840] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2960.666945] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6065): suspending device
[ 2960.795641] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2960.939717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6065): performing device reset
[ 2961.662744] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.218288] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 2985.330631] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.478083] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.501059] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.516039] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 2985.532442] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3051.941145] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3052.052588] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3061.926205] mlx5_core 0000:0d:00.0: mlx5_fw_tracer_handle_traces:741:(pid 12): FWTracer: Events were lost
[ 3073.011616] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6073): suspending device
[ 3073.145816] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3073.292354] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3074.007125] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.566078] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3097.681695] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.841924] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.865284] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.880624] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3097.895416] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3153.793814] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3153.904742] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3174.840368] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6073): suspending device
[ 3174.976699] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3175.120717] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3175.831356] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.386363] mlx5_core 0000:0d:00.2: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3199.505991] mlx5_core 0000:0d:00.2: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.653190] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.676267] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.691369] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3199.709429] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6073): performing device reset
[ 3255.502913] mlx5_core 0000:0d:00.3: mlx5_vdpa_suspend:2949:(pid 6058): suspending device
[ 3255.626444] mlx5_core 0000:0d:00.3: mlx5_vdpa_compat_reset:2621:(pid 6058): performing device reset
[ 3256.256680] general protection fault, probably for non-canonical address 0x16919e70800880c0: 0000 [#1] PREEMPT SMP NOPTI
[ 3256.256684] CPU: 38 PID: 0 Comm: swapper/38 Kdump: loaded Tainted: G W ------- --- 5.14.0-570.1.1.el9_6.x86_64 #1
[ 3256.256687] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS 1.3.2 03/28/2023
[ 3256.256687] RIP: 0010:__build_skb_around+0x8c/0xf0
[ 3256.256695] Code: 24 d0 00 00 00 66 41 89 94 24 ba 00 00 00 66 41 89 8c 24 b6 00 00 00 65 8b 15 1c 09 bd 62 66 41 89 94 24 a0 00 00 00 48 01 d8 <48> c7 00 00 00 00 00 48 c7 40 08 00 00 00 00 48 c7 40 10 00 00 00
[ 3256.256696] RSP: 0018:ff367707c6f78cf0 EFLAGS: 00010206
[ 3256.256698] RAX: 16919e70800880c0 RBX: 16919e7080088000 RCX: 00000000ffffffff
[ 3256.256699] RDX: 0000000000000026 RSI: 16919e7080088000 RDI: ff119e81b3397000
[ 3256.256700] RBP: 0000000000000300 R08: ff367707c6f78cc0 R09: 0000000000000040
[ 3256.256701] R10: 000000000000005a R11: 16919e7080088040 R12: ff119e81b3397000
[ 3256.256702] R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000040
[ 3256.256703] FS: 0000000000000000(0000) GS:ff119e88e00c0000(0000) knlGS:0000000000000000
[ 3256.256704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3256.256705] CR2: 00007f420c0205b8 CR3: 000000089a8f2006 CR4: 0000000000773ef0
[ 3256.256706] PKRU: 55555554
[ 3256.256707] Call Trace:
[ 3256.256708] <IRQ>
[ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256715] ? __build_skb+0x4a/0x60
[ 3256.256719] ? __die_body.cold+0x8/0xd
[ 3256.256720] ? die_addr+0x39/0x60
[ 3256.256725] ? exc_general_protection+0x1ec/0x420
[ 3256.256729] ? asm_exc_general_protection+0x22/0x30
[ 3256.256736] ? __build_skb_around+0x8c/0xf0
[ 3256.256738] __build_skb+0x4a/0x60
[ 3256.256740] build_skb+0x11/0xa0
[ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
[ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
[ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
[ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
[ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
[ 3256.257226] __napi_poll+0x29/0x170
[ 3256.257229] net_rx_action+0x29c/0x370
[ 3256.257231] handle_softirqs+0xce/0x270
[ 3256.257236] __irq_exit_rcu+0xa3/0xc0
[ 3256.257238] common_interrupt+0x80/0xa0
[ 3256.257241] </IRQ>
[ 3256.257241] <TASK>
[ 3256.257242] asm_common_interrupt+0x22/0x40
[ 3256.257244] RIP: 0010:cpuidle_enter_state+0xbc/0x420
[ 3256.257246] Code: e6 01 00 00 e8 75 52 46 ff e8 90 ed ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 91 1c 45 ff 45 84 ff 0f 85 3f 01 00 00 fb 45 85 f6 <0f> 88 a0 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d 04 82 49
[ 3256.257247] RSP: 0018:ff367707c676fe80 EFLAGS: 00000202
[ 3256.257249] RAX: ff119e88e00f38c0 RBX: 0000000000000002 RCX: 000000000000001f
[ 3256.257250] RDX: 0000000000000000 RSI: 0000000040000000 RDI: 0000000000000000
[ 3256.257251] RBP: ff119e88e00ff1f0 R08: 000002f62805a71f R09: 0000000000000000
[ 3256.257251] R10: 00000000000003e1 R11: ff119e88e00f1b24 R12: ffffffff9ecd0a80
[ 3256.257252] R13: 000002f62805a71f R14: 0000000000000002 R15: 0000000000000000
[ 3256.257254] cpuidle_enter+0x29/0x40
[ 3256.257259] cpuidle_idle_call+0xfa/0x160
[ 3256.257262] do_idle+0x7b/0xe0
[ 3256.257264] cpu_startup_entry+0x26/0x30
[ 3256.257266] start_secondary+0x115/0x140
[ 3256.257270] secondary_startup_64_no_verify+0x187/0x18b
[ 3256.257274] </TASK>
[ 3256.257275] Modules linked in: act_skbedit bluetooth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs act_mirred cls_matchall nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 mlx5_vdpa vringh vhost_vdpa vhost vhost_iotlb vdpa bridge stp llc qrtr rfkill sunrpc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_ib kvm_intel ib_uverbs cdc_ether kvm macsec usbnet ib_core mii iTCO_wdt pmt_telemetry dell_smbios pmt_class rapl iTCO_vendor_support dcdbas wmi_bmof dell_wmi_descriptor ipmi_ssif acpi_power_meter joydev acpi_ipmi intel_sdsi isst_if_mmio isst_if_mbox_pci isst_if_common intel_vsec intel_cstate i2c_i801 mei_me mei ipmi_si i2c_smbus intel_uncore ipmi_devintf pcspkr i2c_ismt ipmi_msghandler xfs libcrc32c sd_mod mgag200 i2c_algo_bit sg drm_shmem_helper mlx5_core
[ 3256.257321] drm_kms_helper mlxfw ahci psample libahci crct10dif_pclmul iaa_crypto crc32_pclmul drm bnxt_en libata megaraid_sas crc32c_intel idxd tls ghash_clmulni_intel tg3 idxd_bus pci_hyperv_intf wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-18 1:53 ` [PATCH v3 0/7] Move " Lei Yang
@ 2025-03-18 2:14 ` Jason Wang
2025-03-18 14:06 ` Lei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Jason Wang @ 2025-03-18 2:14 UTC (permalink / raw)
To: Lei Yang
Cc: Jonah Palmer, qemu-devel, eperezma, peterx, mst, jasowant,
lvivier, dtatulea, leiyan, parav, sgarzare, si-wei.liu,
lingshan.zhu, boris.ostrovsky
On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
>
> Hi Jonah
>
> I tested this series with the vhost_vdpa device based on mellanox
> ConnectX-6 DX nic and hit the host kernel crash. This problem can be
> easier to reproduce under the hotplug/unplug device scenario.
> For the core dump messages please review the attachment.
> FW version:
> # flint -d 0000:0d:00.0 q |grep Version
> FW Version: 22.44.1036
> Product Version: 22.44.1036
The trace looks more like a mlx5e driver bug rather than a vDPA one?
[ 3256.256707] Call Trace:
[ 3256.256708] <IRQ>
[ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
[ 3256.256715] ? __build_skb+0x4a/0x60
[ 3256.256719] ? __die_body.cold+0x8/0xd
[ 3256.256720] ? die_addr+0x39/0x60
[ 3256.256725] ? exc_general_protection+0x1ec/0x420
[ 3256.256729] ? asm_exc_general_protection+0x22/0x30
[ 3256.256736] ? __build_skb_around+0x8c/0xf0
[ 3256.256738] __build_skb+0x4a/0x60
[ 3256.256740] build_skb+0x11/0xa0
[ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
[ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
[ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
[ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
[ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
[ 3256.257226] __napi_poll+0x29/0x170
[ 3256.257229] net_rx_action+0x29c/0x370
[ 3256.257231] handle_softirqs+0xce/0x270
[ 3256.257236] __irq_exit_rcu+0xa3/0xc0
[ 3256.257238] common_interrupt+0x80/0xa0
Which kernel tree did you use? Can you please try net.git?
Thanks
>
> Best Regards
> Lei
>
> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >
> > Current memory operations like pinning may take a lot of time at the
> > destination. Currently they are done after the source of the migration is
> > stopped, and before the workload is resumed at the destination. This is a
> > period where neigher traffic can flow, nor the VM workload can continue
> > (downtime).
> >
> > We can do better as we know the memory layout of the guest RAM at the
> > destination from the moment that all devices are initializaed. So
> > moving that operation allows QEMU to communicate the kernel the maps
> > while the workload is still running in the source, so Linux can start
> > mapping them.
> >
> > As a small drawback, there is a time in the initialization where QEMU
> > cannot respond to QMP etc. By some testing, this time is about
> > 0.2seconds. This may be further reduced (or increased) depending on the
> > vdpa driver and the platform hardware, and it is dominated by the cost
> > of memory pinning.
> >
> > This matches the time that we move out of the called downtime window.
> > The downtime is measured as checking the trace timestamp from the moment
> > the source suspend the device to the moment the destination starts the
> > eight and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> > secs to 2.0949.
> >
> > Future directions on top of this series may include to move more things ahead
> > of the migration time, like set DRIVER_OK or perform actual iterative migration
> > of virtio-net devices.
> >
> > Comments are welcome.
> >
> > This series is a different approach of series [1]. As the title does not
> > reflect the changes anymore, please refer to the previous one to know the
> > series history.
> >
> > This series is based on [2], it must be applied after it.
> >
> > [Jonah Palmer]
> > This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > fix for this series.
> >
> > v3:
> > ---
> > * Rebase
> >
> > v2:
> > ---
> > * Move the memory listener registration to vhost_vdpa_set_owner function.
> > * Move the iova_tree allocation to net_vhost_vdpa_init.
> >
> > v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> >
> > [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> >
> > Eugenio Pérez (7):
> > vdpa: check for iova tree initialized at net_client_start
> > vdpa: reorder vhost_vdpa_set_backend_cap
> > vdpa: set backend capabilities at vhost_vdpa_init
> > vdpa: add listener_registered
> > vdpa: reorder listener assignment
> > vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > vdpa: move memory listener register to vhost_vdpa_init
> >
> > hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> > include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > net/vhost-vdpa.c | 34 ++----------
> > 3 files changed, 88 insertions(+), 66 deletions(-)
> >
> > --
> > 2.43.5
> >
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-18 2:14 ` Jason Wang
@ 2025-03-18 14:06 ` Lei Yang
2025-03-18 18:09 ` Dragos Tatulea
2025-03-19 0:14 ` Si-Wei Liu
0 siblings, 2 replies; 16+ messages in thread
From: Lei Yang @ 2025-03-18 14:06 UTC (permalink / raw)
To: Jason Wang
Cc: Jonah Palmer, qemu-devel, eperezma, peterx, mst, jasowant,
lvivier, dtatulea, leiyan, parav, sgarzare, si-wei.liu,
lingshan.zhu, boris.ostrovsky
On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> >
> > Hi Jonah
> >
> > I tested this series with the vhost_vdpa device based on mellanox
> > ConnectX-6 DX nic and hit the host kernel crash. This problem can be
> > easier to reproduce under the hotplug/unplug device scenario.
> > For the core dump messages please review the attachment.
> > FW version:
> > # flint -d 0000:0d:00.0 q |grep Version
> > FW Version: 22.44.1036
> > Product Version: 22.44.1036
>
> The trace looks more like a mlx5e driver bug other than vDPA?
>
> [ 3256.256707] Call Trace:
> [ 3256.256708] <IRQ>
> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> [ 3256.256715] ? __build_skb+0x4a/0x60
> [ 3256.256719] ? __die_body.cold+0x8/0xd
> [ 3256.256720] ? die_addr+0x39/0x60
> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> [ 3256.256738] __build_skb+0x4a/0x60
> [ 3256.256740] build_skb+0x11/0xa0
> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> [ 3256.257226] __napi_poll+0x29/0x170
> [ 3256.257229] net_rx_action+0x29c/0x370
> [ 3256.257231] handle_softirqs+0xce/0x270
> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> [ 3256.257238] common_interrupt+0x80/0xa0
>
Hi Jason
> Which kernel tree did you use? Can you please try net.git?
I used the latest 9.6 downstream kernel and upstream QEMU (with this
series of patches applied) to test this scenario.
First, based on my test results, this bug is related to this series of
patches. The conclusion is based on the following results (all of them
with the above-mentioned NIC driver):
Case 1: downstream kernel + downstream qemu-kvm - pass
Case 2: downstream kernel + upstream QEMU (without this series of
patches) - pass
Case 3: downstream kernel + upstream QEMU (with this series of
patches) - failed, 100% reproducible
Then I also tried to test with the net.git tree, but the host hits a
kernel panic on reboot after building and installing that kernel. For
the call trace info please review the following messages:
[ 9.902851] No filesystem could mount root, tried:
[ 9.902851]
[ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
[ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
[ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
1.3.2 03/28/2023
[ 9.935876] Call Trace:
[ 9.938332] <TASK>
[ 9.940436] panic+0x356/0x380
[ 9.943513] mount_root_generic+0x2e7/0x300
[ 9.947717] prepare_namespace+0x65/0x270
[ 9.951731] kernel_init_freeable+0x2e2/0x310
[ 9.956105] ? __pfx_kernel_init+0x10/0x10
[ 9.960221] kernel_init+0x16/0x1d0
[ 9.963715] ret_from_fork+0x2d/0x50
[ 9.967303] ? __pfx_kernel_init+0x10/0x10
[ 9.971404] ret_from_fork_asm+0x1a/0x30
[ 9.975348] </TASK>
[ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
unknown-block(0,0) ]---
# git log -1
commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
origin/main, origin/HEAD)
Merge: 8f7617f45009 2409fa66e29a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Mar 13 07:58:48 2025 -1000
Merge tag 'net-6.14-rc7' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Thanks
Lei
>
> Thanks
>
> >
> > Best Regards
> > Lei
> >
> > On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> > >
> > > Current memory operations like pinning may take a lot of time at the
> > > destination. Currently they are done after the source of the migration is
> > > stopped, and before the workload is resumed at the destination. This is a
> > > period where neigher traffic can flow, nor the VM workload can continue
> > > (downtime).
> > >
> > > We can do better as we know the memory layout of the guest RAM at the
> > > destination from the moment that all devices are initializaed. So
> > > moving that operation allows QEMU to communicate the kernel the maps
> > > while the workload is still running in the source, so Linux can start
> > > mapping them.
> > >
> > > As a small drawback, there is a time in the initialization where QEMU
> > > cannot respond to QMP etc. By some testing, this time is about
> > > 0.2seconds. This may be further reduced (or increased) depending on the
> > > vdpa driver and the platform hardware, and it is dominated by the cost
> > > of memory pinning.
> > >
> > > This matches the time that we move out of the called downtime window.
> > > The downtime is measured as checking the trace timestamp from the moment
> > > the source suspend the device to the moment the destination starts the
> > > eight and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> > > secs to 2.0949.
> > >
> > > Future directions on top of this series may include to move more things ahead
> > > of the migration time, like set DRIVER_OK or perform actual iterative migration
> > > of virtio-net devices.
> > >
> > > Comments are welcome.
> > >
> > > This series is a different approach of series [1]. As the title does not
> > > reflect the changes anymore, please refer to the previous one to know the
> > > series history.
> > >
> > > This series is based on [2], it must be applied after it.
> > >
> > > [Jonah Palmer]
> > > This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > > fix for this series.
> > >
> > > v3:
> > > ---
> > > * Rebase
> > >
> > > v2:
> > > ---
> > > * Move the memory listener registration to vhost_vdpa_set_owner function.
> > > * Move the iova_tree allocation to net_vhost_vdpa_init.
> > >
> > > v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > >
> > > [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > > [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > > [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> > >
> > > Eugenio Pérez (7):
> > > vdpa: check for iova tree initialized at net_client_start
> > > vdpa: reorder vhost_vdpa_set_backend_cap
> > > vdpa: set backend capabilities at vhost_vdpa_init
> > > vdpa: add listener_registered
> > > vdpa: reorder listener assignment
> > > vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > > vdpa: move memory listener register to vhost_vdpa_init
> > >
> > > hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> > > include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > > net/vhost-vdpa.c | 34 ++----------
> > > 3 files changed, 88 insertions(+), 66 deletions(-)
> > >
> > > --
> > > 2.43.5
> > >
> > >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-18 14:06 ` Lei Yang
@ 2025-03-18 18:09 ` Dragos Tatulea
2025-03-19 0:14 ` Si-Wei Liu
1 sibling, 0 replies; 16+ messages in thread
From: Dragos Tatulea @ 2025-03-18 18:09 UTC (permalink / raw)
To: Lei Yang, Jason Wang
Cc: Jonah Palmer, qemu-devel, eperezma, peterx, mst, jasowant,
lvivier, leiyan, parav, sgarzare, si-wei.liu, lingshan.zhu,
boris.ostrovsky
Hi,
On 03/18, Lei Yang wrote:
> On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> > >
> > > Hi Jonah
> > >
> > > I tested this series with the vhost_vdpa device based on mellanox
> > > ConnectX-6 DX nic and hit the host kernel crash. This problem can be
> > > easier to reproduce under the hotplug/unplug device scenario.
> > > For the core dump messages please review the attachment.
> > > FW version:
> > > # flint -d 0000:0d:00.0 q |grep Version
> > > FW Version: 22.44.1036
> > > Product Version: 22.44.1036
> >
> > The trace looks more like a mlx5e driver bug other than vDPA?
> >
> > [ 3256.256707] Call Trace:
> > [ 3256.256708] <IRQ>
> > [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> > [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> > [ 3256.256715] ? __build_skb+0x4a/0x60
> > [ 3256.256719] ? __die_body.cold+0x8/0xd
> > [ 3256.256720] ? die_addr+0x39/0x60
> > [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> > [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> > [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> > [ 3256.256738] __build_skb+0x4a/0x60
> > [ 3256.256740] build_skb+0x11/0xa0
> > [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > [ 3256.257226] __napi_poll+0x29/0x170
> > [ 3256.257229] net_rx_action+0x29c/0x370
> > [ 3256.257231] handle_softirqs+0xce/0x270
> > [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> > [ 3256.257238] common_interrupt+0x80/0xa0
> >
The logs indicate that the mlx5_vdpa device is already in a bad FW state
before this crash:
[ 445.937186] mlx5_core 0000:0d:00.0: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 445.937212] mlx5_core 0000:0d:00.0: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[ 445.937221] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[ 445.937228] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[ 445.937234] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[ 445.937240] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[ 445.937247] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[ 445.937253] mlx5_core 0000:0d:00.0: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[ 445.937259] mlx5_core 0000:0d:00.0: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[ 445.937265] mlx5_core 0000:0d:00.0: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[ 445.937280] mlx5_core 0000:0d:00.0: print_health_info:440:(pid 0): fw_ver 22.44.1036
[ 445.937286] mlx5_core 0000:0d:00.0: print_health_info:442:(pid 0): time 1742220438
[ 445.937294] mlx5_core 0000:0d:00.0: print_health_info:443:(pid 0): hw_id 0x00000212
[ 445.937296] mlx5_core 0000:0d:00.0: print_health_info:444:(pid 0): rfr 0
[ 445.937297] mlx5_core 0000:0d:00.0: print_health_info:445:(pid 0): severity 3 (ERROR)
[ 445.937303] mlx5_core 0000:0d:00.0: print_health_info:446:(pid 0): irisc_index 3
[ 445.937314] mlx5_core 0000:0d:00.0: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[ 445.937320] mlx5_core 0000:0d:00.0: print_health_info:449:(pid 0): ext_synd 0x8f7a
[ 445.937327] mlx5_core 0000:0d:00.0: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[ 446.257192] mlx5_core 0000:0d:00.2: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 446.513190] mlx5_core 0000:0d:00.3: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 446.577190] mlx5_core 0000:0d:00.4: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 447.473192] mlx5_core 0000:0d:00.1: poll_health:801:(pid 0): device's health compromised - reached miss count
[ 447.473215] mlx5_core 0000:0d:00.1: print_health_info:431:(pid 0): Health issue observed, firmware internal error, severity(3) ERROR:
[ 447.473221] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[0] 0x0521945b
[ 447.473228] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[1] 0x00000000
[ 447.473234] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[2] 0x00000000
[ 447.473240] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[3] 0x00000000
[ 447.473246] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[4] 0x00000000
[ 447.473252] mlx5_core 0000:0d:00.1: print_health_info:435:(pid 0): assert_var[5] 0x00000000
[ 447.473259] mlx5_core 0000:0d:00.1: print_health_info:438:(pid 0): assert_exit_ptr 0x21492f38
[ 447.473265] mlx5_core 0000:0d:00.1: print_health_info:439:(pid 0): assert_callra 0x2102d5f0
[ 447.473279] mlx5_core 0000:0d:00.1: print_health_info:440:(pid 0): fw_ver 22.44.1036
[ 447.473286] mlx5_core 0000:0d:00.1: print_health_info:442:(pid 0): time 1742220438
[ 447.473292] mlx5_core 0000:0d:00.1: print_health_info:443:(pid 0): hw_id 0x00000212
[ 447.473293] mlx5_core 0000:0d:00.1: print_health_info:444:(pid 0): rfr 0
[ 447.473295] mlx5_core 0000:0d:00.1: print_health_info:445:(pid 0): severity 3 (ERROR)
[ 447.473300] mlx5_core 0000:0d:00.1: print_health_info:446:(pid 0): irisc_index 3
[ 447.473311] mlx5_core 0000:0d:00.1: print_health_info:447:(pid 0): synd 0x1: firmware internal error
[ 447.473317] mlx5_core 0000:0d:00.1: print_health_info:449:(pid 0): ext_synd 0x8f7a
[ 447.473323] mlx5_core 0000:0d:00.1: print_health_info:450:(pid 0): raw fw_ver 0x162c040c
[ 447.729198] mlx5_core 0000:0d:00.5: poll_health:801:(pid 0): device's health compromised - reached miss count
This is related to a ring translation error on the FW side.
Si-Wei has some relevant fixes in the latest kernel [0][1], and there is
an upcoming fix [2] which is pending merge. These might help (a rough
sketch of applying them on a downstream test branch follows the list
below). Otherwise, something may be off with the mapping.
[0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
[1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
[2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
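As an illustration only, a minimal sketch of pulling those three fixes onto a downstream test branch (the remote and branch names are placeholders; [2] was still pending merge at the time, so it may have to be applied from the posted patch with git am instead):

# Make the upstream commit IDs listed above reachable in the downstream tree.
git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch linus
# Apply the fixes on top of the downstream kernel branch under test.
git checkout -b vdpa-iotlb-fixes
git cherry-pick 35025963326e 29ce8b8a4fa7 a6097e0a54a5   # resolve conflicts if any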
Thanks,
Dragos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-18 14:06 ` Lei Yang
2025-03-18 18:09 ` Dragos Tatulea
@ 2025-03-19 0:14 ` Si-Wei Liu
2025-03-20 15:07 ` Lei Yang
1 sibling, 1 reply; 16+ messages in thread
From: Si-Wei Liu @ 2025-03-19 0:14 UTC (permalink / raw)
To: Lei Yang, Jason Wang
Cc: Jonah Palmer, qemu-devel, eperezma, peterx, mst, jasowant,
lvivier, dtatulea, leiyan, parav, sgarzare, lingshan.zhu,
boris.ostrovsky
Hi Lei,
On 3/18/2025 7:06 AM, Lei Yang wrote:
> On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
>>> Hi Jonah
>>>
>>> I tested this series with the vhost_vdpa device based on mellanox
>>> ConnectX-6 DX nic and hit the host kernel crash. This problem can be
>>> easier to reproduce under the hotplug/unplug device scenario.
>>> For the core dump messages please review the attachment.
>>> FW version:
>>> # flint -d 0000:0d:00.0 q |grep Version
>>> FW Version: 22.44.1036
>>> Product Version: 22.44.1036
>> The trace looks more like a mlx5e driver bug other than vDPA?
>>
>> [ 3256.256707] Call Trace:
>> [ 3256.256708] <IRQ>
>> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
>> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
>> [ 3256.256715] ? __build_skb+0x4a/0x60
>> [ 3256.256719] ? __die_body.cold+0x8/0xd
>> [ 3256.256720] ? die_addr+0x39/0x60
>> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
>> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
>> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
>> [ 3256.256738] __build_skb+0x4a/0x60
>> [ 3256.256740] build_skb+0x11/0xa0
>> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
>> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
>> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
>> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
>> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
>> [ 3256.257226] __napi_poll+0x29/0x170
>> [ 3256.257229] net_rx_action+0x29c/0x370
>> [ 3256.257231] handle_softirqs+0xce/0x270
>> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
>> [ 3256.257238] common_interrupt+0x80/0xa0
>>
> Hi Jason
>
>> Which kernel tree did you use? Can you please try net.git?
> I used the latest 9.6 downstream kernel and upstream qemu (applied
> this series of patches) to test this scenario.
> First based on my test result this bug is related to this series of
> patches, the conclusions are based on the following test results(All
> test results are based on the above mentioned nic driver):
> Case 1: downstream kernel + downstream qemu-kvm - pass
> Case 2: downstream kernel + upstream qemu (doesn't included this
> series of patches) - pass
> Case 3: downstream kernel + upstream qemu (included this series of
> patches) - failed, reproduce ratio 100%
Just as Dragos replied earlier, the firmware was already in a bogus
state before the panic, and I also suspect it has something to do with
various bugs in the downstream kernel. You have to apply the 3 patches
to the downstream kernel before you kick off the relevant tests again.
Please pay special attention to which specific command or step triggers
the unhealthy report from the firmware, and let us know if you still run
into any of them.
In addition, as you seem to be testing the device hotplug and unplug
use cases, note that the latest QEMU should have the related fixes
below [1][2]. In case they were somehow missed, that might also leave
the firmware in a bad state to some extent. Just FYI (a rough
hotplug/unplug monitor sketch follows the commit list below).
[1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
[2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
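For reference, a rough sketch of one hotplug/unplug cycle via the HMP monitor; the ids and the /dev/vhost-vdpa-0 path are only placeholders for your actual setup:

(qemu) netdev_add vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0
(qemu) device_add virtio-net-pci,netdev=vdpa0,id=net1
... run the workload in the guest ...
(qemu) device_del net1
(qemu) netdev_del vdpa0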
Thanks,
-Siwei
>
> Then I also tried to test it with the net.git tree, but it will hit
> the host kernel panic after compiling when rebooting the host. For the
> call trace info please review following messages:
> [ 9.902851] No filesystem could mount root, tried:
> [ 9.902851]
> [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> 1.3.2 03/28/2023
> [ 9.935876] Call Trace:
> [ 9.938332] <TASK>
> [ 9.940436] panic+0x356/0x380
> [ 9.943513] mount_root_generic+0x2e7/0x300
> [ 9.947717] prepare_namespace+0x65/0x270
> [ 9.951731] kernel_init_freeable+0x2e2/0x310
> [ 9.956105] ? __pfx_kernel_init+0x10/0x10
> [ 9.960221] kernel_init+0x16/0x1d0
> [ 9.963715] ret_from_fork+0x2d/0x50
> [ 9.967303] ? __pfx_kernel_init+0x10/0x10
> [ 9.971404] ret_from_fork_asm+0x1a/0x30
> [ 9.975348] </TASK>
> [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> unknown-block(0,0) ]---
>
> # git log -1
> commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> origin/main, origin/HEAD)
> Merge: 8f7617f45009 2409fa66e29a
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Thu Mar 13 07:58:48 2025 -1000
>
> Merge tag 'net-6.14-rc7' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>
>
> Thanks
>
> Lei
>> Thanks
>>
>>> Best Regards
>>> Lei
>>>
>>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>> Current memory operations like pinning may take a lot of time at the
>>>> destination. Currently they are done after the source of the migration is
>>>> stopped, and before the workload is resumed at the destination. This is a
>>>> period where neigher traffic can flow, nor the VM workload can continue
>>>> (downtime).
>>>>
>>>> We can do better as we know the memory layout of the guest RAM at the
>>>> destination from the moment that all devices are initializaed. So
>>>> moving that operation allows QEMU to communicate the kernel the maps
>>>> while the workload is still running in the source, so Linux can start
>>>> mapping them.
>>>>
>>>> As a small drawback, there is a time in the initialization where QEMU
>>>> cannot respond to QMP etc. By some testing, this time is about
>>>> 0.2seconds. This may be further reduced (or increased) depending on the
>>>> vdpa driver and the platform hardware, and it is dominated by the cost
>>>> of memory pinning.
>>>>
>>>> This matches the time that we move out of the called downtime window.
>>>> The downtime is measured as checking the trace timestamp from the moment
>>>> the source suspend the device to the moment the destination starts the
>>>> eight and last virtqueue pair. For a 39G guest, it goes from ~2.2526
>>>> secs to 2.0949.
>>>>
>>>> Future directions on top of this series may include to move more things ahead
>>>> of the migration time, like set DRIVER_OK or perform actual iterative migration
>>>> of virtio-net devices.
>>>>
>>>> Comments are welcome.
>>>>
>>>> This series is a different approach of series [1]. As the title does not
>>>> reflect the changes anymore, please refer to the previous one to know the
>>>> series history.
>>>>
>>>> This series is based on [2], it must be applied after it.
>>>>
>>>> [Jonah Palmer]
>>>> This series was rebased after [3] was pulled in, as [3] was a prerequisite
>>>> fix for this series.
>>>>
>>>> v3:
>>>> ---
>>>> * Rebase
>>>>
>>>> v2:
>>>> ---
>>>> * Move the memory listener registration to vhost_vdpa_set_owner function.
>>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
>>>>
>>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
>>>>
>>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
>>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
>>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
>>>>
>>>> Eugenio Pérez (7):
>>>> vdpa: check for iova tree initialized at net_client_start
>>>> vdpa: reorder vhost_vdpa_set_backend_cap
>>>> vdpa: set backend capabilities at vhost_vdpa_init
>>>> vdpa: add listener_registered
>>>> vdpa: reorder listener assignment
>>>> vdpa: move iova_tree allocation to net_vhost_vdpa_init
>>>> vdpa: move memory listener register to vhost_vdpa_init
>>>>
>>>> hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
>>>> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
>>>> net/vhost-vdpa.c | 34 ++----------
>>>> 3 files changed, 88 insertions(+), 66 deletions(-)
>>>>
>>>> --
>>>> 2.43.5
>>>>
>>>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-19 0:14 ` Si-Wei Liu
@ 2025-03-20 15:07 ` Lei Yang
2025-03-20 15:48 ` Dragos Tatulea
0 siblings, 1 reply; 16+ messages in thread
From: Lei Yang @ 2025-03-20 15:07 UTC (permalink / raw)
To: Si-Wei Liu, dtatulea
Cc: Jason Wang, Jonah Palmer, qemu-devel, eperezma, peterx, mst,
jasowant, lvivier, leiyan, parav, sgarzare, lingshan.zhu,
boris.ostrovsky
Hi Dragos, Si-Wei
1. I applied [0] [1] [2] to the downstream kernel and then tested
hotplug/unplug; this bug still exists.
[0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
[1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
[2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
2. The two QEMU patches Si-Wei mentioned, [1] and [2], have already been
merged into the QEMU master branch, so based on the test result they do
not help fix this bug.
[1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
[2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
3. The step that triggers the unhealthy report from the firmware is
simply booting the guest with the QEMU that includes this series of
patches. The host dmesg prints the unhealthy info immediately after the
guest boots up.
Thanks
Lei
On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> Hi Lei,
>
> On 3/18/2025 7:06 AM, Lei Yang wrote:
> > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> >>> Hi Jonah
> >>>
> >>> I tested this series with the vhost_vdpa device based on mellanox
> >>> ConnectX-6 DX nic and hit the host kernel crash. This problem can be
> >>> easier to reproduce under the hotplug/unplug device scenario.
> >>> For the core dump messages please review the attachment.
> >>> FW version:
> >>> # flint -d 0000:0d:00.0 q |grep Version
> >>> FW Version: 22.44.1036
> >>> Product Version: 22.44.1036
> >> The trace looks more like a mlx5e driver bug other than vDPA?
> >>
> >> [ 3256.256707] Call Trace:
> >> [ 3256.256708] <IRQ>
> >> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> >> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> >> [ 3256.256715] ? __build_skb+0x4a/0x60
> >> [ 3256.256719] ? __die_body.cold+0x8/0xd
> >> [ 3256.256720] ? die_addr+0x39/0x60
> >> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> >> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> >> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> >> [ 3256.256738] __build_skb+0x4a/0x60
> >> [ 3256.256740] build_skb+0x11/0xa0
> >> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> >> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> >> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> >> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> >> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> >> [ 3256.257226] __napi_poll+0x29/0x170
> >> [ 3256.257229] net_rx_action+0x29c/0x370
> >> [ 3256.257231] handle_softirqs+0xce/0x270
> >> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> >> [ 3256.257238] common_interrupt+0x80/0xa0
> >>
> > Hi Jason
> >
> >> Which kernel tree did you use? Can you please try net.git?
> > I used the latest 9.6 downstream kernel and upstream qemu (applied
> > this series of patches) to test this scenario.
> > First based on my test result this bug is related to this series of
> > patches, the conclusions are based on the following test results(All
> > test results are based on the above mentioned nic driver):
> > Case 1: downstream kernel + downstream qemu-kvm - pass
> > Case 2: downstream kernel + upstream qemu (doesn't included this
> > series of patches) - pass
> > Case 3: downstream kernel + upstream qemu (included this series of
> > patches) - failed, reproduce ratio 100%
> Just as Dragos replied earlier, the firmware was already in a bogus
> state before the panic that I also suspect it has something to do with
> various bugs in the downstream kernel. You have to apply the 3 patches
> to the downstream kernel before you may kick of the relevant tests
> again. Please pay special attention to which specific command or step
> that triggers the unhealthy report from firmware, and let us know if you
> still run into any of them.
>
> In addition, as you seem to be testing the device hot plug and unplug
> use cases, for which the latest qemu should have related fixes
> below[1][2], but in case they are missed somehow it might also end up
> with bad firmware state to some extend. Just fyi.
>
> [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
>
> Thanks,
> -Siwei
> >
> > Then I also tried to test it with the net.git tree, but it will hit
> > the host kernel panic after compiling when rebooting the host. For the
> > call trace info please review following messages:
> > [ 9.902851] No filesystem could mount root, tried:
> > [ 9.902851]
> > [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> > [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > 1.3.2 03/28/2023
> > [ 9.935876] Call Trace:
> > [ 9.938332] <TASK>
> > [ 9.940436] panic+0x356/0x380
> > [ 9.943513] mount_root_generic+0x2e7/0x300
> > [ 9.947717] prepare_namespace+0x65/0x270
> > [ 9.951731] kernel_init_freeable+0x2e2/0x310
> > [ 9.956105] ? __pfx_kernel_init+0x10/0x10
> > [ 9.960221] kernel_init+0x16/0x1d0
> > [ 9.963715] ret_from_fork+0x2d/0x50
> > [ 9.967303] ? __pfx_kernel_init+0x10/0x10
> > [ 9.971404] ret_from_fork_asm+0x1a/0x30
> > [ 9.975348] </TASK>
> > [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > unknown-block(0,0) ]---
> >
> > # git log -1
> > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > origin/main, origin/HEAD)
> > Merge: 8f7617f45009 2409fa66e29a
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Thu Mar 13 07:58:48 2025 -1000
> >
> > Merge tag 'net-6.14-rc7' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> >
> >
> > Thanks
> >
> > Lei
> >> Thanks
> >>
> >>> Best Regards
> >>> Lei
> >>>
> >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>> Current memory operations like pinning may take a lot of time at the
> >>>> destination. Currently they are done after the source of the migration is
> >>>> stopped, and before the workload is resumed at the destination. This is a
> >>>> period where neigher traffic can flow, nor the VM workload can continue
> >>>> (downtime).
> >>>>
> >>>> We can do better as we know the memory layout of the guest RAM at the
> >>>> destination from the moment that all devices are initializaed. So
> >>>> moving that operation allows QEMU to communicate the kernel the maps
> >>>> while the workload is still running in the source, so Linux can start
> >>>> mapping them.
> >>>>
> >>>> As a small drawback, there is a time in the initialization where QEMU
> >>>> cannot respond to QMP etc. By some testing, this time is about
> >>>> 0.2seconds. This may be further reduced (or increased) depending on the
> >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> >>>> of memory pinning.
> >>>>
> >>>> This matches the time that we move out of the called downtime window.
> >>>> The downtime is measured as checking the trace timestamp from the moment
> >>>> the source suspend the device to the moment the destination starts the
> >>>> eight and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> >>>> secs to 2.0949.
> >>>>
> >>>> Future directions on top of this series may include to move more things ahead
> >>>> of the migration time, like set DRIVER_OK or perform actual iterative migration
> >>>> of virtio-net devices.
> >>>>
> >>>> Comments are welcome.
> >>>>
> >>>> This series is a different approach of series [1]. As the title does not
> >>>> reflect the changes anymore, please refer to the previous one to know the
> >>>> series history.
> >>>>
> >>>> This series is based on [2], it must be applied after it.
> >>>>
> >>>> [Jonah Palmer]
> >>>> This series was rebased after [3] was pulled in, as [3] was a prerequisite
> >>>> fix for this series.
> >>>>
> >>>> v3:
> >>>> ---
> >>>> * Rebase
> >>>>
> >>>> v2:
> >>>> ---
> >>>> * Move the memory listener registration to vhost_vdpa_set_owner function.
> >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> >>>>
> >>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> >>>>
> >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> >>>>
> >>>> Eugenio Pérez (7):
> >>>> vdpa: check for iova tree initialized at net_client_start
> >>>> vdpa: reorder vhost_vdpa_set_backend_cap
> >>>> vdpa: set backend capabilities at vhost_vdpa_init
> >>>> vdpa: add listener_registered
> >>>> vdpa: reorder listener assignment
> >>>> vdpa: move iova_tree allocation to net_vhost_vdpa_init
> >>>> vdpa: move memory listener register to vhost_vdpa_init
> >>>>
> >>>> hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> >>>> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> >>>> net/vhost-vdpa.c | 34 ++----------
> >>>> 3 files changed, 88 insertions(+), 66 deletions(-)
> >>>>
> >>>> --
> >>>> 2.43.5
> >>>>
> >>>>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-20 15:07 ` Lei Yang
@ 2025-03-20 15:48 ` Dragos Tatulea
2025-03-21 6:45 ` Lei Yang
0 siblings, 1 reply; 16+ messages in thread
From: Dragos Tatulea @ 2025-03-20 15:48 UTC (permalink / raw)
To: Lei Yang, Si-Wei Liu
Cc: Jason Wang, Jonah Palmer, qemu-devel, eperezma, peterx, mst,
jasowant, lvivier, leiyan, parav, sgarzare, lingshan.zhu,
boris.ostrovsky
Hi Lei,
On 03/20, Lei Yang wrote:
> Hi Dragos, Si-Wei
>
> 1. I applied [0] [1] [2] to the downstream kernel then tested
> hotplug/unplug, this bug still exists.
>
> [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
> [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
>
> 2. Si-Wei mentioned two patches [1] [2] have been merged into qemu
> master branch, so based on the test result it can not help fix this
> bug.
> [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
>
> 3. I found triggers for the unhealthy report from firmware step is
> just boot up guest when using the current patches qemu. The host dmesg
> will print unhealthy info immediately after booting up the guest.
>
Did you set the locked memory limit to unlimited beforehand
(ulimit -l unlimited)? This could also be the cause of the FW issue.
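For example, a minimal sketch in the shell that launches QEMU (the QEMU command line is only a placeholder):

ulimit -l                # show the current max locked memory (kbytes)
ulimit -l unlimited      # raise it for this shell and its children
qemu-system-x86_64 ... \
    -netdev vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0 \
    -device virtio-net-pci,netdev=vdpa0 ...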
Thanks,
Dragos
> Thanks
> Lei
>
>
> On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> >
> > Hi Lei,
> >
> > On 3/18/2025 7:06 AM, Lei Yang wrote:
> > > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> > >>> Hi Jonah
> > >>>
> > >>> I tested this series with the vhost_vdpa device based on a Mellanox
> > >>> ConnectX-6 DX nic and hit a host kernel crash. This problem is
> > >>> easier to reproduce under the hotplug/unplug device scenario.
> > >>> For the core dump messages please review the attachment.
> > >>> FW version:
> > >>> # flint -d 0000:0d:00.0 q |grep Version
> > >>> FW Version: 22.44.1036
> > >>> Product Version: 22.44.1036
> > >> The trace looks more like a mlx5e driver bug than a vDPA one?
> > >>
> > >> [ 3256.256707] Call Trace:
> > >> [ 3256.256708] <IRQ>
> > >> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> > >> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> > >> [ 3256.256715] ? __build_skb+0x4a/0x60
> > >> [ 3256.256719] ? __die_body.cold+0x8/0xd
> > >> [ 3256.256720] ? die_addr+0x39/0x60
> > >> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> > >> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> > >> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> > >> [ 3256.256738] __build_skb+0x4a/0x60
> > >> [ 3256.256740] build_skb+0x11/0xa0
> > >> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > >> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > >> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > >> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > >> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > >> [ 3256.257226] __napi_poll+0x29/0x170
> > >> [ 3256.257229] net_rx_action+0x29c/0x370
> > >> [ 3256.257231] handle_softirqs+0xce/0x270
> > >> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> > >> [ 3256.257238] common_interrupt+0x80/0xa0
> > >>
> > > Hi Jason
> > >
> > >> Which kernel tree did you use? Can you please try net.git?
> > > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > > series of patches applied) to test this scenario.
> > > First, based on my test results this bug is related to this series of
> > > patches; the conclusions are based on the following test results (all
> > > of them with the above-mentioned nic driver):
> > > Case 1: downstream kernel + downstream qemu-kvm - pass
> > > Case 2: downstream kernel + upstream qemu (without this series of
> > > patches) - pass
> > > Case 3: downstream kernel + upstream qemu (with this series of
> > > patches) - failed, reproduction ratio 100%
> > > Just as Dragos replied earlier, the firmware was already in a bogus
> > > state before the panic, and I also suspect it has something to do with
> > > various bugs in the downstream kernel. You have to apply the 3 patches
> > > to the downstream kernel before you kick off the relevant tests
> > > again. Please pay special attention to which specific command or step
> > > triggers the unhealthy report from firmware, and let us know if you
> > > still run into any of them.
> >
> > > In addition, as you seem to be testing the device hotplug and unplug
> > > use cases, the latest qemu should have the related fixes
> > > below [1][2]; in case they are missed somehow, it might also end up
> > > with a bad firmware state to some extent. Just fyi.
> >
> > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> >
> > Thanks,
> > -Siwei
> > >
> > > Then I also tried to test it with the net.git tree, but the host hits
> > > a kernel panic when rebooting after compiling it. For the
> > > call trace info please review the following messages:
> > > [ 9.902851] No filesystem could mount root, tried:
> > > [ 9.902851]
> > > [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > > [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> > > [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > > 1.3.2 03/28/2023
> > > [ 9.935876] Call Trace:
> > > [ 9.938332] <TASK>
> > > [ 9.940436] panic+0x356/0x380
> > > [ 9.943513] mount_root_generic+0x2e7/0x300
> > > [ 9.947717] prepare_namespace+0x65/0x270
> > > [ 9.951731] kernel_init_freeable+0x2e2/0x310
> > > [ 9.956105] ? __pfx_kernel_init+0x10/0x10
> > > [ 9.960221] kernel_init+0x16/0x1d0
> > > [ 9.963715] ret_from_fork+0x2d/0x50
> > > [ 9.967303] ? __pfx_kernel_init+0x10/0x10
> > > [ 9.971404] ret_from_fork_asm+0x1a/0x30
> > > [ 9.975348] </TASK>
> > > [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > > unknown-block(0,0) ]---
> > >
> > > # git log -1
> > > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > > origin/main, origin/HEAD)
> > > Merge: 8f7617f45009 2409fa66e29a
> > > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > > Date: Thu Mar 13 07:58:48 2025 -1000
> > >
> > > Merge tag 'net-6.14-rc7' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > >
> > >
> > > Thanks
> > >
> > > Lei
> > >> Thanks
> > >>
> > >>> Best Regards
> > >>> Lei
> > >>>
> > >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> > >>>> Current memory operations like pinning may take a lot of time at the
> > >>>> destination. Currently they are done after the source of the migration is
> > >>>> stopped, and before the workload is resumed at the destination. This is a
> > >>>> period where neither traffic can flow nor the VM workload can continue
> > >>>> (downtime).
> > >>>>
> > >>>> We can do better as we know the memory layout of the guest RAM at the
> > >>>> destination from the moment that all devices are initialized. So
> > >>>> moving that operation allows QEMU to communicate the maps to the kernel
> > >>>> while the workload is still running in the source, so Linux can start
> > >>>> mapping them.
> > >>>>
> > >>>> As a small drawback, there is a time in the initialization where QEMU
> > >>>> cannot respond to QMP etc. By some testing, this time is about
> > >>>> 0.2 seconds. This may be further reduced (or increased) depending on the
> > >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> > >>>> of memory pinning.
> > >>>>
> > >>>> This matches the time that we move out of the so-called downtime window.
> > >>>> The downtime is measured by checking the trace timestamps from the moment
> > >>>> the source suspends the device to the moment the destination starts the
> > >>>> eighth and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> > >>>> secs to 2.0949.
> > >>>>
> > >>>> Future directions on top of this series may include to move more things ahead
> > >>>> of the migration time, like set DRIVER_OK or perform actual iterative migration
> > >>>> of virtio-net devices.
> > >>>>
> > >>>> Comments are welcome.
> > >>>>
> > >>>> This series is a different approach from series [1]. As the title no
> > >>>> longer reflects the changes, please refer to the previous one for the
> > >>>> series history.
> > >>>>
> > >>>> This series is based on [2] and must be applied after it.
> > >>>>
> > >>>> [Jonah Palmer]
> > >>>> This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > >>>> fix for this series.
> > >>>>
> > >>>> v3:
> > >>>> ---
> > >>>> * Rebase
> > >>>>
> > >>>> v2:
> > >>>> ---
> > >>>> * Move the memory listener registration to vhost_vdpa_set_owner function.
> > >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> > >>>>
> > >>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > >>>>
> > >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> > >>>>
> > >>>> Eugenio Pérez (7):
> > >>>> vdpa: check for iova tree initialized at net_client_start
> > >>>> vdpa: reorder vhost_vdpa_set_backend_cap
> > >>>> vdpa: set backend capabilities at vhost_vdpa_init
> > >>>> vdpa: add listener_registered
> > >>>> vdpa: reorder listener assignment
> > >>>> vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > >>>> vdpa: move memory listener register to vhost_vdpa_init
> > >>>>
> > >>>> hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> > >>>> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > >>>> net/vhost-vdpa.c | 34 ++----------
> > >>>> 3 files changed, 88 insertions(+), 66 deletions(-)
> > >>>>
> > >>>> --
> > >>>> 2.43.5
> > >>>>
> > >>>>
> >
>
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/7] Move memory listener register to vhost_vdpa_init
2025-03-20 15:48 ` Dragos Tatulea
@ 2025-03-21 6:45 ` Lei Yang
0 siblings, 0 replies; 16+ messages in thread
From: Lei Yang @ 2025-03-21 6:45 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Si-Wei Liu, Jason Wang, Jonah Palmer, qemu-devel, eperezma,
peterx, mst, jasowant, lvivier, leiyan, parav, sgarzare,
lingshan.zhu, boris.ostrovsky
On Thu, Mar 20, 2025 at 11:48 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Hi Lei,
>
> On 03/20, Lei Yang wrote:
> > Hi Dragos, Si-Wei
> >
> > 1. I applied [0] [1] [2] to the downstream kernel and then tested
> > hotplug/unplug; this bug still exists.
> >
> > [0] 35025963326e ("vdpa/mlx5: Fix suboptimal range on iotlb iteration")
> > [1] 29ce8b8a4fa7 ("vdpa/mlx5: Fix PA offset with unaligned starting iotlb map")
> > [2] a6097e0a54a5 ("vdpa/mlx5: Fix oversized null mkey longer than 32bit")
> >
> > 2. Si-Wei mentioned that two patches [1] [2] have been merged into the
> > qemu master branch, but based on the test result they do not help fix
> > this bug.
> > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> >
> > 3. I found that the step that triggers the unhealthy report from firmware
> > is simply booting up the guest with the current patched qemu. The host
> > dmesg prints the unhealthy info immediately after the guest boots.
> >
Hi Dragos
> Did you set the locked memory limit to unlimited beforehand (ulimit -l unlimited)?
> This could also be the cause of the FW issue.
Yes, I did. I executed it (ulimit -l unlimited) before I booted up the guest.
Thanks
Lei
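
As a side note, one way to confirm that the limit actually took effect on the
running QEMU process (assuming a single qemu-system-x86_64 process on the
host) is:

  grep "Max locked memory" /proc/$(pidof qemu-system-x86_64)/limits

If the guest is managed by libvirt, the effective limit may instead come from
the domain's <memtune> hard_limit setting rather than the launching shell's
ulimit.
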
>
> Thanks,
> Dragos
>
> > Thanks
> > Lei
> >
> >
> > On Wed, Mar 19, 2025 at 8:14 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
> > >
> > > Hi Lei,
> > >
> > > On 3/18/2025 7:06 AM, Lei Yang wrote:
> > > > On Tue, Mar 18, 2025 at 10:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >> On Tue, Mar 18, 2025 at 9:55 AM Lei Yang <leiyang@redhat.com> wrote:
> > > >>> Hi Jonah
> > > >>>
> > > >>> I tested this series with the vhost_vdpa device based on a Mellanox
> > > >>> ConnectX-6 DX nic and hit a host kernel crash. This problem is
> > > >>> easier to reproduce under the hotplug/unplug device scenario.
> > > >>> For the core dump messages please review the attachment.
> > > >>> FW version:
> > > >>> # flint -d 0000:0d:00.0 q |grep Version
> > > >>> FW Version: 22.44.1036
> > > >>> Product Version: 22.44.1036
> > > >> The trace looks more like a mlx5e driver bug than a vDPA one?
> > > >>
> > > >> [ 3256.256707] Call Trace:
> > > >> [ 3256.256708] <IRQ>
> > > >> [ 3256.256709] ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256714] ? show_trace_log_lvl+0x1c4/0x2df
> > > >> [ 3256.256715] ? __build_skb+0x4a/0x60
> > > >> [ 3256.256719] ? __die_body.cold+0x8/0xd
> > > >> [ 3256.256720] ? die_addr+0x39/0x60
> > > >> [ 3256.256725] ? exc_general_protection+0x1ec/0x420
> > > >> [ 3256.256729] ? asm_exc_general_protection+0x22/0x30
> > > >> [ 3256.256736] ? __build_skb_around+0x8c/0xf0
> > > >> [ 3256.256738] __build_skb+0x4a/0x60
> > > >> [ 3256.256740] build_skb+0x11/0xa0
> > > >> [ 3256.256743] mlx5e_skb_from_cqe_mpwrq_linear+0x156/0x280 [mlx5_core]
> > > >> [ 3256.256872] mlx5e_handle_rx_cqe_mpwrq_rep+0xcb/0x1e0 [mlx5_core]
> > > >> [ 3256.256964] mlx5e_rx_cq_process_basic_cqe_comp+0x39f/0x3c0 [mlx5_core]
> > > >> [ 3256.257053] mlx5e_poll_rx_cq+0x3a/0xc0 [mlx5_core]
> > > >> [ 3256.257139] mlx5e_napi_poll+0xe2/0x710 [mlx5_core]
> > > >> [ 3256.257226] __napi_poll+0x29/0x170
> > > >> [ 3256.257229] net_rx_action+0x29c/0x370
> > > >> [ 3256.257231] handle_softirqs+0xce/0x270
> > > >> [ 3256.257236] __irq_exit_rcu+0xa3/0xc0
> > > >> [ 3256.257238] common_interrupt+0x80/0xa0
> > > >>
> > > > Hi Jason
> > > >
> > > >> Which kernel tree did you use? Can you please try net.git?
> > > > I used the latest 9.6 downstream kernel and upstream qemu (with this
> > > > series of patches applied) to test this scenario.
> > > > First, based on my test results this bug is related to this series of
> > > > patches; the conclusions are based on the following test results (all
> > > > of them with the above-mentioned nic driver):
> > > > Case 1: downstream kernel + downstream qemu-kvm - pass
> > > > Case 2: downstream kernel + upstream qemu (without this series of
> > > > patches) - pass
> > > > Case 3: downstream kernel + upstream qemu (with this series of
> > > > patches) - failed, reproduction ratio 100%
> > > > Just as Dragos replied earlier, the firmware was already in a bogus
> > > > state before the panic, and I also suspect it has something to do with
> > > > various bugs in the downstream kernel. You have to apply the 3 patches
> > > > to the downstream kernel before you kick off the relevant tests
> > > > again. Please pay special attention to which specific command or step
> > > > triggers the unhealthy report from firmware, and let us know if you
> > > > still run into any of them.
> > >
> > > > In addition, as you seem to be testing the device hotplug and unplug
> > > > use cases, the latest qemu should have the related fixes
> > > > below [1][2]; in case they are missed somehow, it might also end up
> > > > with a bad firmware state to some extent. Just fyi.
> > >
> > > [1] db0d4017f9b9 ("net: parameterize the removing client from nc list")
> > > [2] e7891c575fb2 ("net: move backend cleanup to NIC cleanup")
> > >
> > > Thanks,
> > > -Siwei
> > > >
> > > > Then I also tried to test it with the net.git tree, but the host hits
> > > > a kernel panic when rebooting after compiling it. For the
> > > > call trace info please review the following messages:
> > > > [ 9.902851] No filesystem could mount root, tried:
> > > > [ 9.902851]
> > > > [ 9.909248] Kernel panic - not syncing: VFS: Unable to mount root
> > > > fs on "/dev/mapper/rhel_dell--per760--12-root" or unknown-block(0,0)
> > > > [ 9.921335] CPU: 16 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6+ #3
> > > > [ 9.928398] Hardware name: Dell Inc. PowerEdge R760/0NH8MJ, BIOS
> > > > 1.3.2 03/28/2023
> > > > [ 9.935876] Call Trace:
> > > > [ 9.938332] <TASK>
> > > > [ 9.940436] panic+0x356/0x380
> > > > [ 9.943513] mount_root_generic+0x2e7/0x300
> > > > [ 9.947717] prepare_namespace+0x65/0x270
> > > > [ 9.951731] kernel_init_freeable+0x2e2/0x310
> > > > [ 9.956105] ? __pfx_kernel_init+0x10/0x10
> > > > [ 9.960221] kernel_init+0x16/0x1d0
> > > > [ 9.963715] ret_from_fork+0x2d/0x50
> > > > [ 9.967303] ? __pfx_kernel_init+0x10/0x10
> > > > [ 9.971404] ret_from_fork_asm+0x1a/0x30
> > > > [ 9.975348] </TASK>
> > > > [ 9.977555] Kernel Offset: 0xc00000 from 0xffffffff81000000
> > > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [ 10.101881] ---[ end Kernel panic - not syncing: VFS: Unable to
> > > > mount root fs on "/dev/mapper/rhel_dell--per760--12-root" or
> > > > unknown-block(0,0) ]---
> > > >
> > > > # git log -1
> > > > commit 4003c9e78778e93188a09d6043a74f7154449d43 (HEAD -> main,
> > > > origin/main, origin/HEAD)
> > > > Merge: 8f7617f45009 2409fa66e29a
> > > > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Date: Thu Mar 13 07:58:48 2025 -1000
> > > >
> > > > Merge tag 'net-6.14-rc7' of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > > >
> > > >
> > > > Thanks
> > > >
> > > > Lei
> > > >> Thanks
> > > >>
> > > >>> Best Regards
> > > >>> Lei
> > > >>>
> > > >>> On Fri, Mar 14, 2025 at 9:04 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> > > >>>> Current memory operations like pinning may take a lot of time at the
> > > >>>> destination. Currently they are done after the source of the migration is
> > > >>>> stopped, and before the workload is resumed at the destination. This is a
> > > >>>> period where neither traffic can flow nor the VM workload can continue
> > > >>>> (downtime).
> > > >>>>
> > > >>>> We can do better as we know the memory layout of the guest RAM at the
> > > >>>> destination from the moment that all devices are initialized. So
> > > >>>> moving that operation allows QEMU to communicate the maps to the kernel
> > > >>>> while the workload is still running in the source, so Linux can start
> > > >>>> mapping them.
> > > >>>>
> > > >>>> As a small drawback, there is a time in the initialization where QEMU
> > > >>>> cannot respond to QMP etc. By some testing, this time is about
> > > >>>> 0.2 seconds. This may be further reduced (or increased) depending on the
> > > >>>> vdpa driver and the platform hardware, and it is dominated by the cost
> > > >>>> of memory pinning.
> > > >>>>
> > > >>>> This matches the time that we move out of the so-called downtime window.
> > > >>>> The downtime is measured by checking the trace timestamps from the moment
> > > >>>> the source suspends the device to the moment the destination starts the
> > > >>>> eighth and last virtqueue pair. For a 39G guest, it goes from ~2.2526
> > > >>>> secs to 2.0949.
> > > >>>>
> > > >>>> Future directions on top of this series may include to move more things ahead
> > > >>>> of the migration time, like set DRIVER_OK or perform actual iterative migration
> > > >>>> of virtio-net devices.
> > > >>>>
> > > >>>> Comments are welcome.
> > > >>>>
> > > >>>> This series is a different approach from series [1]. As the title no
> > > >>>> longer reflects the changes, please refer to the previous one for the
> > > >>>> series history.
> > > >>>>
> > > >>>> This series is based on [2] and must be applied after it.
> > > >>>>
> > > >>>> [Jonah Palmer]
> > > >>>> This series was rebased after [3] was pulled in, as [3] was a prerequisite
> > > >>>> fix for this series.
> > > >>>>
> > > >>>> v3:
> > > >>>> ---
> > > >>>> * Rebase
> > > >>>>
> > > >>>> v2:
> > > >>>> ---
> > > >>>> * Move the memory listener registration to vhost_vdpa_set_owner function.
> > > >>>> * Move the iova_tree allocation to net_vhost_vdpa_init.
> > > >>>>
> > > >>>> v1 at https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg02136.html.
> > > >>>>
> > > >>>> [1] https://patchwork.kernel.org/project/qemu-devel/cover/20231215172830.2540987-1-eperezma@redhat.com/
> > > >>>> [2] https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg05910.html
> > > >>>> [3] https://lore.kernel.org/qemu-devel/20250217144936.3589907-1-jonah.palmer@oracle.com/
> > > >>>>
> > > >>>> Eugenio Pérez (7):
> > > >>>> vdpa: check for iova tree initialized at net_client_start
> > > >>>> vdpa: reorder vhost_vdpa_set_backend_cap
> > > >>>> vdpa: set backend capabilities at vhost_vdpa_init
> > > >>>> vdpa: add listener_registered
> > > >>>> vdpa: reorder listener assignment
> > > >>>> vdpa: move iova_tree allocation to net_vhost_vdpa_init
> > > >>>> vdpa: move memory listener register to vhost_vdpa_init
> > > >>>>
> > > >>>> hw/virtio/vhost-vdpa.c | 98 ++++++++++++++++++++++------------
> > > >>>> include/hw/virtio/vhost-vdpa.h | 22 +++++++-
> > > >>>> net/vhost-vdpa.c | 34 ++----------
> > > >>>> 3 files changed, 88 insertions(+), 66 deletions(-)
> > > >>>>
> > > >>>> --
> > > >>>> 2.43.5
> > > >>>>
> > > >>>>
> > >
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread