public inbox for virtio-fs@lists.linux.dev
* [PATCH v5 0/5] support inflight migration
@ 2026-01-12 11:44 Alexandr Moshkov
  2026-01-12 11:44 ` [PATCH v5 1/5] vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE Alexandr Moshkov
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

v5:
Use a protocol feature flag instead of a GET_VRING_BASE message parameter,
so the changes to other devices are no longer needed.
The back-end may now advertise this feature to QEMU. The feature must be
used together with the in-flight migration parameter in vhost-user-blk.

v4:
While testing in-flight migration, I noticed a problem: GET_VRING_BASE
is needed during migration so that the back-end stops dirtying pages and
synchronizes the `last_avail` counter with QEMU. As a result, after
migration the in-flight I/O requests look as if they were resubmitted on
the destination VM.

However, with the new logic we no longer need to wait for in-flight
requests to complete on the GET_VRING_BASE message. So add a new
`should_drain` parameter to GET_VRING_BASE that lets the back-end stop
vrings immediately, without waiting for in-flight I/O requests to complete.

Also:
- update vhost-user.rst
- refactor vhost-user-blk.c; `should_drain` is now derived from the
  device parameter `inflight-migration`

v3:
- use pre_load_errp instead of pre_load in vhost.c
- change vhost-user-blk property to
  "skip-get-vring-base-inflight-migration"
- refactor vhost-user-blk.c, by moving vhost_user_blk_inflight_needed() higher

v2:
- rewrite migration using VMSD instead of qemufile API
- add vhost-user-blk parameter instead of migration capability

I am not sure whether VMSD is used cleanly in the migration implementation,
so comments are welcome.

Based on Vladimir's work:
[PATCH v2 00/25] vhost-user-blk: live-backend local migration
  which was based on:
    - [PATCH v4 0/7] chardev: postpone connect
      (which in turn is based on [PATCH 0/2] remove deprecated 'reconnect' options)
    - [PATCH v3 00/23] vhost refactoring and fixes
    - [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler

Based-on: <20250924133309.334631-1-vsementsov@yandex-team.ru>
Based-on: <20251015212051.1156334-1-vsementsov@yandex-team.ru>
Based-on: <20251015145808.1112843-1-vsementsov@yandex-team.ru>
Based-on: <20251015132136.1083972-15-vsementsov@yandex-team.ru>
Based-on: <20251016114104.1384675-1-vsementsov@yandex-team.ru>

---

Hi!

During inter-host migration, waiting for disk requests to be drained
in the vhost-user backend can incur significant downtime.

This can be avoided if QEMU migrates the inflight region in
vhost-user-blk.
Thus, during QEMU migration, with the protocol feature flag negotiated, the
vhost-user back-end can stop vrings immediately, so all in-flight requests
are migrated to the other host.
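
As a rough illustration of the difference between the two stop modes, here
is a minimal, self-contained sketch. It is not taken from this series or
from any real back-end: `struct vring_sim`, `vring_stop()` and the
`skip_drain` argument are hypothetical names, and draining is simulated by
counting requests down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a back-end's per-vring state. */
struct vring_sim {
    unsigned inflight;  /* requests submitted but not yet completed */
    bool stopped;
};

/*
 * Stop the vring, as a back-end would on GET_VRING_BASE.
 * Returns the number of requests still in flight, i.e. the ones that
 * would have to travel to the destination in the inflight region.
 */
static unsigned vring_stop(struct vring_sim *vq, bool skip_drain)
{
    assert(vq != NULL);

    if (!skip_drain) {
        /* Default behaviour: wait until every request has completed. */
        while (vq->inflight > 0) {
            vq->inflight--;  /* stand-in for one request completing */
        }
    }

    vq->stopped = true;
    return vq->inflight;
}
```

With `skip_drain` false the function returns 0 after the simulated drain;
with `skip_drain` true it returns immediately and reports the requests that
remain to be migrated.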

At first, I tried to implement migration for all vhost-user devices that
support inflight at once, but this would require a lot of changes both in
vhost-user-blk (to move it onto the base class) and in the vhost-user-base
base class (inflight implementation and remodeling, plus a large refactor).

Therefore, I decided to leave this idea for later and first implement
migration of the inflight region for vhost-user-blk only.

Alexandr Moshkov (5):
  vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
  vhost-user: introduce protocol feature for skip drain on
    GET_VRING_BASE
  vmstate: introduce VMSTATE_VBUFFER_UINT64
  vhost: add vmstate for inflight region with inner buffer
  vhost-user-blk: support inter-host inflight migration

 docs/interop/vhost-user.rst        | 49 ++++++++++++++++--------------
 hw/block/vhost-user-blk.c          | 28 +++++++++++++++++
 hw/virtio/vhost.c                  | 42 +++++++++++++++++++++++++
 include/hw/virtio/vhost-user-blk.h |  1 +
 include/hw/virtio/vhost-user.h     |  1 +
 include/hw/virtio/vhost.h          |  6 ++++
 include/migration/vmstate.h        | 10 ++++++
 7 files changed, 115 insertions(+), 22 deletions(-)

-- 
2.34.1



* [PATCH v5 1/5] vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
  2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
@ 2026-01-12 11:44 ` Alexandr Moshkov
  2026-01-12 11:45 ` [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain " Alexandr Moshkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

By default, we assume that the server needs to wait for all in-flight I/O
on GET_VRING_BASE. However, this is not recorded anywhere in the
documentation.
So document it in vhost-user.rst.

Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
---
 docs/interop/vhost-user.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 2e50f2ddfa..02908b48fa 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1243,7 +1243,8 @@ Front-end message types
 
   When and as long as all of a device's vrings are stopped, it is
   *suspended*, see :ref:`Suspended device state
-  <suspended_device_state>`.
+  <suspended_device_state>`. The back-end must complete all inflight I/O
+  requests for the specified vring before stopping it.
 
   The request payload's *num* field is currently reserved and must be
   set to 0.
-- 
2.34.1



* [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain on GET_VRING_BASE
  2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
  2026-01-12 11:44 ` [PATCH v5 1/5] vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE Alexandr Moshkov
@ 2026-01-12 11:45 ` Alexandr Moshkov
  2026-01-12 18:08   ` Stefan Hajnoczi
  2026-01-12 11:45 ` [PATCH v5 3/5] vmstate: introduce VMSTATE_VBUFFER_UINT64 Alexandr Moshkov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

Add vhost-user protocol feature
VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT

Now on GET_VRING_BASE this feature can control whether to wait for
in-flight requests to complete or not.

It will be helpful in the future for migrating in-flight requests in
vhost-user devices.

Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
---
 docs/interop/vhost-user.rst    | 52 ++++++++++++++++++----------------
 include/hw/virtio/vhost-user.h |  1 +
 2 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 02908b48fa..80c80aada5 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -1033,26 +1033,27 @@ Protocol features
 
 .. code:: c
 
-  #define VHOST_USER_PROTOCOL_F_MQ                    0
-  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
-  #define VHOST_USER_PROTOCOL_F_RARP                  2
-  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
-  #define VHOST_USER_PROTOCOL_F_MTU                   4
-  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
-  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
-  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
-  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
-  #define VHOST_USER_PROTOCOL_F_CONFIG                9
-  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
-  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
-  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
-  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
-  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
-  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
-  #define VHOST_USER_PROTOCOL_F_STATUS               16
-  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
-  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
-  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19
+  #define VHOST_USER_PROTOCOL_F_MQ                       0
+  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD                1
+  #define VHOST_USER_PROTOCOL_F_RARP                     2
+  #define VHOST_USER_PROTOCOL_F_REPLY_ACK                3
+  #define VHOST_USER_PROTOCOL_F_MTU                      4
+  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ              5
+  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN             6
+  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION           7
+  #define VHOST_USER_PROTOCOL_F_PAGEFAULT                8
+  #define VHOST_USER_PROTOCOL_F_CONFIG                   9
+  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD         10
+  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER           11
+  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD          12
+  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE            13
+  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS    14
+  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS     15
+  #define VHOST_USER_PROTOCOL_F_STATUS                  16
+  #define VHOST_USER_PROTOCOL_F_XEN_MMAP                17
+  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT           18
+  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE            19
+  #define VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT 20
 
 Front-end message types
 -----------------------
@@ -1243,11 +1244,14 @@ Front-end message types
 
   When and as long as all of a device's vrings are stopped, it is
   *suspended*, see :ref:`Suspended device state
-  <suspended_device_state>`. The back-end must complete all inflight I/O
-  requests for the specified vring before stopping it.
+  <suspended_device_state>`.
 
-  The request payload's *num* field is currently reserved and must be
-  set to 0.
+  By default, the back-end must complete all inflight I/O requests for the
+  specified vring before stopping it.
+
+  If the ``VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT`` protocol feature
+  has been negotiated, the back-end may stop the vring immediately without
+  waiting for inflight I/O requests to complete.
 
 ``VHOST_USER_SET_VRING_KICK``
   :id: 12
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index fb89268de2..493fbce8b1 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -33,6 +33,7 @@ enum VhostUserProtocolFeature {
     /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
     VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
     VHOST_USER_PROTOCOL_F_DEVICE_STATE = 19,
+    VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT = 20,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
-- 
2.34.1



* [PATCH v5 3/5] vmstate: introduce VMSTATE_VBUFFER_UINT64
  2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
  2026-01-12 11:44 ` [PATCH v5 1/5] vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE Alexandr Moshkov
  2026-01-12 11:45 ` [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain " Alexandr Moshkov
@ 2026-01-12 11:45 ` Alexandr Moshkov
  2026-01-12 11:45 ` [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer Alexandr Moshkov
  2026-01-12 11:45 ` [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration Alexandr Moshkov
  4 siblings, 0 replies; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

This is an analog of the VMSTATE_VBUFFER_UINT32 macro, but with a uint64_t
size field.

Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
---
 include/migration/vmstate.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 7f1f1c166a..4c9e212d58 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -707,6 +707,16 @@ extern const VMStateInfo vmstate_info_qlist;
     .offset       = offsetof(_state, _field),                        \
 }
 
+#define VMSTATE_VBUFFER_UINT64(_field, _state, _version, _test, _field_size) { \
+    .name         = (stringify(_field)),                             \
+    .version_id   = (_version),                                      \
+    .field_exists = (_test),                                         \
+    .size_offset  = vmstate_offset_value(_state, _field_size, uint64_t),\
+    .info         = &vmstate_info_buffer,                            \
+    .flags        = VMS_VBUFFER | VMS_POINTER,                       \
+    .offset       = offsetof(_state, _field),                        \
+}
+
 #define VMSTATE_VBUFFER_ALLOC_UINT32(_field, _state, _version,       \
                                      _test, _field_size) {           \
     .name         = (stringify(_field)),                             \
-- 
2.34.1



* [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer
  2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
                   ` (2 preceding siblings ...)
  2026-01-12 11:45 ` [PATCH v5 3/5] vmstate: introduce VMSTATE_VBUFFER_UINT64 Alexandr Moshkov
@ 2026-01-12 11:45 ` Alexandr Moshkov
  2026-01-12 18:22   ` Stefan Hajnoczi
  2026-01-12 11:45 ` [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration Alexandr Moshkov
  4 siblings, 1 reply; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

Prepare for future inflight region migration for vhost-user-blk.
We need to migrate size, queue_size, and inner buffer.

So first migrate the size and queue_size fields, then allocate memory for
the buffer using the migrated size, then migrate the inner buffer itself.

Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
---
 hw/virtio/vhost.c         | 42 +++++++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost.h |  6 ++++++
 2 files changed, 48 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index c46203eb9c..9a746c9861 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2028,6 +2028,48 @@ const VMStateDescription vmstate_backend_transfer_vhost_inflight = {
     }
 };
 
+static int vhost_inflight_buffer_pre_load(void *opaque, Error **errp)
+{
+    info_report("vhost_inflight_region_buffer_pre_load");
+    struct vhost_inflight *inflight = opaque;
+
+    int fd = -1;
+    void *addr = qemu_memfd_alloc("vhost-inflight", inflight->size,
+                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
+                                  &fd, errp);
+    if (!addr) {
+        return -ENOMEM;
+    }
+
+    inflight->offset = 0;
+    inflight->addr = addr;
+    inflight->fd = fd;
+
+    return 0;
+}
+
+const VMStateDescription vmstate_vhost_inflight_region_buffer = {
+    .name = "vhost-inflight-region/buffer",
+    .pre_load_errp = vhost_inflight_buffer_pre_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_VBUFFER_UINT64(addr, struct vhost_inflight, 0, NULL, size),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+const VMStateDescription vmstate_vhost_inflight_region = {
+    .name = "vhost-inflight-region",
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(size, struct vhost_inflight),
+        VMSTATE_UINT16(queue_size, struct vhost_inflight),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * const []) {
+        &vmstate_vhost_inflight_region_buffer,
+        NULL
+    }
+};
+
 const VMStateDescription vmstate_vhost_virtqueue = {
     .name = "vhost-virtqueue",
     .fields = (const VMStateField[]) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 13ca2c319f..dd552de91f 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -596,6 +596,12 @@ extern const VMStateDescription vmstate_backend_transfer_vhost_inflight;
                            vmstate_backend_transfer_vhost_inflight, \
                            struct vhost_inflight)
 
+extern const VMStateDescription vmstate_vhost_inflight_region;
+#define VMSTATE_VHOST_INFLIGHT_REGION(_field, _state) \
+    VMSTATE_STRUCT_POINTER(_field, _state, \
+                           vmstate_vhost_inflight_region, \
+                           struct vhost_inflight)
+
 extern const VMStateDescription vmstate_vhost_dev;
 #define VMSTATE_BACKEND_TRANSFER_VHOST(_field, _state) \
     VMSTATE_STRUCT(_field, _state, 0, vmstate_vhost_dev, struct vhost_dev)
-- 
2.34.1



* [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration
  2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
                   ` (3 preceding siblings ...)
  2026-01-12 11:45 ` [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer Alexandr Moshkov
@ 2026-01-12 11:45 ` Alexandr Moshkov
  2026-01-12 18:19   ` Stefan Hajnoczi
  4 siblings, 1 reply; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-12 11:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée,
	Stefan Hajnoczi, mzamazal, Peter Xu, Fabiano Rosas, qemu-block,
	virtio-fs, yc-core@yandex-team.ru, Eric Blake, Markus Armbruster,
	Alexandr Moshkov

During inter-host migration, waiting for disk requests to be drained
in the vhost-user backend can incur significant downtime.

This can be avoided if QEMU migrates the inflight region in
vhost-user-blk.
Thus, during QEMU migration, with the feature flag negotiated, the vhost-user
back-end can stop vrings immediately, so all in-flight requests are
migrated to the other host.

Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
---
 hw/block/vhost-user-blk.c          | 28 ++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user-blk.h |  1 +
 2 files changed, 29 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index a8fd90480a..5e44f6253c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -656,6 +656,28 @@ static struct vhost_dev *vhost_user_blk_get_vhost(VirtIODevice *vdev)
     return &s->dev;
 }
 
+static bool vhost_user_blk_inflight_needed(void *opaque)
+{
+    struct VHostUserBlk *s = opaque;
+
+    bool inflight_drain = vhost_dev_has_feature(&s->dev,
+                        VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT);
+
+    return s->inflight_migration &&
+           inflight_drain &&
+           !migrate_local_vhost_user_blk();
+}
+
+static const VMStateDescription vmstate_vhost_user_blk_inflight = {
+    .name = "vhost-user-blk/inflight",
+    .version_id = 1,
+    .needed = vhost_user_blk_inflight_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_VHOST_INFLIGHT_REGION(inflight, VHostUserBlk),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
 {
     VHostUserBlk *s = VHOST_USER_BLK(opaque);
@@ -678,6 +700,10 @@ static const VMStateDescription vmstate_vhost_user_blk = {
         VMSTATE_VIRTIO_DEVICE,
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription * const []) {
+        &vmstate_vhost_user_blk_inflight,
+        NULL
+    }
 };
 
 static bool vhost_user_needed(void *opaque)
@@ -751,6 +777,8 @@ static const Property vhost_user_blk_properties[] = {
                       VIRTIO_BLK_F_WRITE_ZEROES, true),
     DEFINE_PROP_BOOL("skip-get-vring-base-on-force-shutdown", VHostUserBlk,
                      skip_get_vring_base_on_force_shutdown, false),
+    DEFINE_PROP_BOOL("inflight-migration", VHostUserBlk,
+                     inflight_migration, false),
 };
 
 static void vhost_user_blk_class_init(ObjectClass *klass, const void *data)
diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
index b06f55fd6f..e1466e5cf6 100644
--- a/include/hw/virtio/vhost-user-blk.h
+++ b/include/hw/virtio/vhost-user-blk.h
@@ -52,6 +52,7 @@ struct VHostUserBlk {
     bool started_vu;
 
     bool skip_get_vring_base_on_force_shutdown;
+    bool inflight_migration;
 
     bool incoming_backend;
 };
-- 
2.34.1



* Re: [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain on GET_VRING_BASE
  2026-01-12 11:45 ` [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain " Alexandr Moshkov
@ 2026-01-12 18:08   ` Stefan Hajnoczi
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2026-01-12 18:08 UTC (permalink / raw)
  To: Alexandr Moshkov
  Cc: qemu-devel, Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée, mzamazal,
	Peter Xu, Fabiano Rosas, qemu-block, virtio-fs,
	yc-core@yandex-team.ru, Eric Blake, Markus Armbruster

On Mon, Jan 12, 2026 at 04:45:00PM +0500, Alexandr Moshkov wrote:
> Add vhost-user protocol feature
> VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT
> 
> Now on GET_VRING_BASE this feature can control whether to wait for
> in-flight requests to complete or not.
> 
> It will be helpful in the future for migrating in-flight requests in
> vhost-user devices.
> 
> Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
> ---
>  docs/interop/vhost-user.rst    | 52 ++++++++++++++++++----------------
>  include/hw/virtio/vhost-user.h |  1 +
>  2 files changed, 29 insertions(+), 24 deletions(-)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 02908b48fa..80c80aada5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1033,26 +1033,27 @@ Protocol features
>  
>  .. code:: c
>  
> -  #define VHOST_USER_PROTOCOL_F_MQ                    0
> -  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1
> -  #define VHOST_USER_PROTOCOL_F_RARP                  2
> -  #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3
> -  #define VHOST_USER_PROTOCOL_F_MTU                   4
> -  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5
> -  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6
> -  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7
> -  #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8
> -  #define VHOST_USER_PROTOCOL_F_CONFIG                9
> -  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10
> -  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11
> -  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12
> -  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13
> -  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14
> -  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15
> -  #define VHOST_USER_PROTOCOL_F_STATUS               16
> -  #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17
> -  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18
> -  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19
> +  #define VHOST_USER_PROTOCOL_F_MQ                       0
> +  #define VHOST_USER_PROTOCOL_F_LOG_SHMFD                1
> +  #define VHOST_USER_PROTOCOL_F_RARP                     2
> +  #define VHOST_USER_PROTOCOL_F_REPLY_ACK                3
> +  #define VHOST_USER_PROTOCOL_F_MTU                      4
> +  #define VHOST_USER_PROTOCOL_F_BACKEND_REQ              5
> +  #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN             6
> +  #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION           7
> +  #define VHOST_USER_PROTOCOL_F_PAGEFAULT                8
> +  #define VHOST_USER_PROTOCOL_F_CONFIG                   9
> +  #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD         10
> +  #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER           11
> +  #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD          12
> +  #define VHOST_USER_PROTOCOL_F_RESET_DEVICE            13
> +  #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS    14
> +  #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS     15
> +  #define VHOST_USER_PROTOCOL_F_STATUS                  16
> +  #define VHOST_USER_PROTOCOL_F_XEN_MMAP                17
> +  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT           18
> +  #define VHOST_USER_PROTOCOL_F_DEVICE_STATE            19
> +  #define VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT 20
>  
>  Front-end message types
>  -----------------------
> @@ -1243,11 +1244,14 @@ Front-end message types
>  
>    When and as long as all of a device's vrings are stopped, it is
>    *suspended*, see :ref:`Suspended device state
> -  <suspended_device_state>`. The back-end must complete all inflight I/O
> -  requests for the specified vring before stopping it.
> +  <suspended_device_state>`.
>  
> -  The request payload's *num* field is currently reserved and must be
> -  set to 0.
> +  By default, the back-end must complete all inflight I/O requests for the
> +  specified vring before stopping it.
> +
> +  If the ``VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT`` protocol feature
> +  has been negotiated, the back-end may stop the vring immediately without
> +  waiting for inflight I/O requests to complete.

This paragraph is not specific enough. It gives the impression that I/O
requests can be left running, but that's not the case. They need to be
quiesced and recorded in the Inflight I/O Tracking
(VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD) shared memory data structure.

I suggest rewording it as follows:

  If the ``VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT`` protocol
  feature has been negotiated, the back-end may suspend in-flight I/O
  requests and record them as described in :ref:`inflight-io-tracking`
  instead of completing them before stopping the vring. How to suspend
  an in-flight request depends on the implementation of the back-end but
  it typically can be done by aborting or cancelling the underlying I/O
  request. The ``VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT``
  protocol feature must only be negotiated if
  ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` is also negotiated.

(A _inflight-io-tracking label needs to be added in order to reference
the "Inflight I/O tracking" section.)
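
The dependency stated at the end of the suggested wording can be sketched as
a mask operation. This is only an illustration under stated assumptions: the
bit numbers come from the feature list in this patch, while
`enforce_inflight_dependency()` is a hypothetical helper and not an existing
QEMU or libvhost-user function.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers as listed in the protocol feature table of this patch. */
#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD          12
#define VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT 20

/*
 * Hypothetical helper: clear GET_VRING_BASE_INFLIGHT from a feature mask
 * unless INFLIGHT_SHMFD is present in the same mask.
 */
static uint64_t enforce_inflight_dependency(uint64_t features)
{
    const uint64_t shmfd =
        UINT64_C(1) << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD;
    const uint64_t skip_drain =
        UINT64_C(1) << VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT;

    if (!(features & shmfd)) {
        features &= ~skip_drain;
    }

    /* Postcondition: skip-drain is never set without inflight tracking. */
    assert(!(features & skip_drain) || (features & shmfd));
    return features;
}
```

Either side of the connection could apply such a filter before acknowledging
the feature set; which side should do so is not settled by this sketch.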


* Re: [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration
  2026-01-12 11:45 ` [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration Alexandr Moshkov
@ 2026-01-12 18:19   ` Stefan Hajnoczi
  2026-01-13  6:49     ` Alexandr Moshkov
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Hajnoczi @ 2026-01-12 18:19 UTC (permalink / raw)
  To: Alexandr Moshkov
  Cc: qemu-devel, Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée, mzamazal,
	Peter Xu, Fabiano Rosas, qemu-block, virtio-fs,
	yc-core@yandex-team.ru, Eric Blake, Markus Armbruster

On Mon, Jan 12, 2026 at 04:45:03PM +0500, Alexandr Moshkov wrote:
> During inter-host migration, waiting for disk requests to be drained
> in the vhost-user backend can incur significant downtime.
> 
> This can be avoided if QEMU migrates the inflight region in
> vhost-user-blk.
> Thus, during QEMU migration, with the feature flag negotiated, the vhost-user
> back-end can stop vrings immediately, so all in-flight requests are
> migrated to the other host.
> 
> Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
> ---
>  hw/block/vhost-user-blk.c          | 28 ++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user-blk.h |  1 +
>  2 files changed, 29 insertions(+)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index a8fd90480a..5e44f6253c 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -656,6 +656,28 @@ static struct vhost_dev *vhost_user_blk_get_vhost(VirtIODevice *vdev)
>      return &s->dev;
>  }
>  
> +static bool vhost_user_blk_inflight_needed(void *opaque)
> +{
> +    struct VHostUserBlk *s = opaque;
> +
> +    bool inflight_drain = vhost_dev_has_feature(&s->dev,
> +                        VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT);

VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT must only be negotiated
when inflight_migration is enabled. Otherwise the backend will use this
feature even though vhost_user_blk_inflight_needed() skips migrating the
in-flight region.


* Re: [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer
  2026-01-12 11:45 ` [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer Alexandr Moshkov
@ 2026-01-12 18:22   ` Stefan Hajnoczi
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2026-01-12 18:22 UTC (permalink / raw)
  To: Alexandr Moshkov
  Cc: qemu-devel, Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée, mzamazal,
	Peter Xu, Fabiano Rosas, qemu-block, virtio-fs,
	yc-core@yandex-team.ru, Eric Blake, Markus Armbruster

On Mon, Jan 12, 2026 at 04:45:02PM +0500, Alexandr Moshkov wrote:
> Prepare for future inflight region migration for vhost-user-blk.
> We need to migrate size, queue_size, and inner buffer.
> 
> So first migrate the size and queue_size fields, then allocate memory for
> the buffer using the migrated size, then migrate the inner buffer itself.
> 
> Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
> ---
>  hw/virtio/vhost.c         | 42 +++++++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost.h |  6 ++++++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index c46203eb9c..9a746c9861 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2028,6 +2028,48 @@ const VMStateDescription vmstate_backend_transfer_vhost_inflight = {
>      }
>  };
>  
> +static int vhost_inflight_buffer_pre_load(void *opaque, Error **errp)
> +{
> +    info_report("vhost_inflight_region_buffer_pre_load");
> +    struct vhost_inflight *inflight = opaque;
> +
> +    int fd = -1;
> +    void *addr = qemu_memfd_alloc("vhost-inflight", inflight->size,
> +                                  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
> +                                  &fd, errp);
> +    if (!addr) {
> +        return -ENOMEM;
> +    }
> +
> +    inflight->offset = 0;
> +    inflight->addr = addr;
> +    inflight->fd = fd;
> +
> +    return 0;
> +}
> +
> +const VMStateDescription vmstate_vhost_inflight_region_buffer = {
> +    .name = "vhost-inflight-region/buffer",
> +    .pre_load_errp = vhost_inflight_buffer_pre_load,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_VBUFFER_UINT64(addr, struct vhost_inflight, 0, NULL, size),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +const VMStateDescription vmstate_vhost_inflight_region = {
> +    .name = "vhost-inflight-region",
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT64(size, struct vhost_inflight),
> +        VMSTATE_UINT16(queue_size, struct vhost_inflight),
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (const VMStateDescription * const []) {
> +        &vmstate_vhost_inflight_region_buffer,
> +        NULL
> +    }
> +};

The subsection trick is neat - it allows the size to be loaded first and
then the memfd is allocated. However, it introduces a weird case: if the
source QEMU does not send the subsection, then the destination QEMU
loads successfully but with inflight partially uninitialized.

It's not obvious to me that the destination QEMU will fail in a safe way
when this happens. The source QEMU must not be able to trigger undefined
behavior. Can you add an explicit check somewhere to fail when this
required subsection is missing?
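
One possible shape for such a check, as a standalone C sketch rather than
actual QEMU code: struct sketch_inflight is a simplified stand-in for
struct vhost_inflight, and sketch_inflight_post_load() is a hypothetical
name. It assumes the check runs after the subsections have been processed
and that addr is NULL before loading starts.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the struct vhost_inflight fields used here. */
struct sketch_inflight {
    void *addr;          /* set only when the buffer subsection is loaded */
    uint64_t size;       /* loaded from the parent section */
    uint16_t queue_size; /* loaded from the parent section */
};

/*
 * Hypothetical check run once the parent section and its subsections
 * have been loaded: a non-zero size with no buffer means the stream
 * did not contain the buffer subsection, so the load must fail.
 */
static int sketch_inflight_post_load(const struct sketch_inflight *inflight)
{
    if (inflight->size != 0 && inflight->addr == NULL) {
        return -EINVAL;
    }
    return 0;
}
```

Failing the load this way would reject a stream that carries size and
queue_size without the buffer, instead of continuing with a partially
initialized region.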

Thanks,
Stefan

* Re: [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration
  2026-01-12 18:19   ` Stefan Hajnoczi
@ 2026-01-13  6:49     ` Alexandr Moshkov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexandr Moshkov @ 2026-01-13  6:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Gonglei (Arei), Zhenwei Pi, Michael S. Tsirkin,
	Stefano Garzarella, Raphael Norwitz, Kevin Wolf, Hanna Reitz,
	Jason Wang, Paolo Bonzini, Fam Zheng, Alex Bennée, mzamazal,
	Peter Xu, Fabiano Rosas, qemu-block, virtio-fs,
	yc-core@yandex-team.ru, Eric Blake, Markus Armbruster


On 1/12/26 23:19, Stefan Hajnoczi wrote:
> On Mon, Jan 12, 2026 at 04:45:03PM +0500, Alexandr Moshkov wrote:
>> During inter-host migration, waiting for disk requests to be drained
>> in the vhost-user backend can incur significant downtime.
>>
>> This can be avoided if QEMU migrates the inflight region in
>> vhost-user-blk.
>> Thus, during QEMU migration, with the feature flag set the vhost-user
>> back-end can stop vrings immediately, so all in-flight requests are
>> migrated to the other host.
>>
>> Signed-off-by: Alexandr Moshkov <dtalexundeer@yandex-team.ru>
>> ---
>>   hw/block/vhost-user-blk.c          | 28 ++++++++++++++++++++++++++++
>>   include/hw/virtio/vhost-user-blk.h |  1 +
>>   2 files changed, 29 insertions(+)
>>
>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>> index a8fd90480a..5e44f6253c 100644
>> --- a/hw/block/vhost-user-blk.c
>> +++ b/hw/block/vhost-user-blk.c
>> @@ -656,6 +656,28 @@ static struct vhost_dev *vhost_user_blk_get_vhost(VirtIODevice *vdev)
>>       return &s->dev;
>>   }
>>   
>> +static bool vhost_user_blk_inflight_needed(void *opaque)
>> +{
>> +    struct VHostUserBlk *s = opaque;
>> +
>> +    bool inflight_drain = vhost_dev_has_feature(&s->dev,
>> +                        VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT);
> VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT must only be negotiated
> when inflight_migration is enabled. Otherwise the backend will use this
> feature even though vhost_user_blk_inflight_needed() skips migrating the
> in-flight region.
Oh, I understand now. I'll fix this too, thanks!

Thread overview: 10+ messages
2026-01-12 11:44 [PATCH v5 0/5] support inflight migration Alexandr Moshkov
2026-01-12 11:44 ` [PATCH v5 1/5] vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE Alexandr Moshkov
2026-01-12 11:45 ` [PATCH v5 2/5] vhost-user: introduce protocol feature for skip drain " Alexandr Moshkov
2026-01-12 18:08   ` Stefan Hajnoczi
2026-01-12 11:45 ` [PATCH v5 3/5] vmstate: introduce VMSTATE_VBUFFER_UINT64 Alexandr Moshkov
2026-01-12 11:45 ` [PATCH v5 4/5] vhost: add vmstate for inflight region with inner buffer Alexandr Moshkov
2026-01-12 18:22   ` Stefan Hajnoczi
2026-01-12 11:45 ` [PATCH v5 5/5] vhost-user-blk: support inter-host inflight migration Alexandr Moshkov
2026-01-12 18:19   ` Stefan Hajnoczi
2026-01-13  6:49     ` Alexandr Moshkov
