* [PATCH v3 0/5] vhost-user: Back-end state migration
@ 2023-09-15 10:25 Hanna Czenczek
2023-09-15 10:25 ` [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
RFC:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
v1:
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
v2:
https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
Hi,
I’ve decided not to work on vhost-user SUSPEND/RESUME for now – it is
not technically required for virtio-fs migration, which is the actual
priority for me now. While we do want to have SUSPEND/RESUME at some
point, the only practically existing reason for it is to be able to
implement vhost-level resetting in virtiofsd, but that is not related to
migration.
So one of the changes in v3 is that it no longer depends on the
vhost-user SUSPEND/RESUME series, and describes the migration protocol
without requiring the device to be suspended at any point; only the
vrings are stopped.
Other changes include:
- Patch 1:
- Rephrased a lot
- Added a description for the VHOST_USER_SET_DEVICE_STATE_FD
parameters
- Renamed VHOST_USER_PROTOCOL_F_MIGRATORY_STATE to
VHOST_USER_PROTOCOL_F_DEVICE_STATE
- enum variants changed in value due to dropping the SUSPEND/RESUME
dependency
- Patch 2:
- Pulled in, was a stand-alone patch before
- Dropped a sentence about ring state before feature negotiations, as
the rings are not to be used during that period anyway
- Bit of rephrasing
- Patch 3:
- Renamed “migratory state” to “device state”
- enum variants changed in value due to dropping the SUSPEND/RESUME
dependency
- Patch 4:
- Changed `f` to @f (referencing parameter “f”) in comments
- Use g_autofree for the transfer buffer
- Note SUSPEND state as a future feature, not currently existing
- Wrap read() and write() in RETRY_ON_EINTR()
- Patch 5:
- Renamed “migratory state” to “device state”
- (kept R-b still)
Hanna Czenczek (5):
vhost-user.rst: Migrating back-end-internal state
vhost-user.rst: Clarify enabling/disabling vrings
vhost-user: Interface for migration state transfer
vhost: Add high-level state save/load functions
vhost-user-fs: Implement internal migration
docs/interop/vhost-user.rst | 188 ++++++++++++++++++++++-
include/hw/virtio/vhost-backend.h | 24 +++
include/hw/virtio/vhost.h | 114 ++++++++++++++
hw/virtio/vhost-user-fs.c | 101 ++++++++++++-
hw/virtio/vhost-user.c | 148 ++++++++++++++++++
hw/virtio/vhost.c | 241 ++++++++++++++++++++++++++++++
6 files changed, 810 insertions(+), 6 deletions(-)
--
2.41.0
* [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
@ 2023-09-15 10:25 ` Hanna Czenczek
2023-09-25 19:04 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
For vhost-user devices, qemu can migrate the virtio state, but not the
back-end's internal state. To do so, we need to be able to transfer
this internal state between front-end (qemu) and back-end.
At this point, this new feature is added for the purpose of virtio-fs
migration. Because virtiofsd's internal state will not be too large, we
believe it is best to transfer it as a single binary blob after the
streaming phase.
These are the additions to the protocol:
- New vhost-user protocol feature VHOST_USER_PROTOCOL_F_DEVICE_STATE
- SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a file
descriptor over which to transfer the state.
- CHECK_DEVICE_STATE: After the state has been transferred through the
file descriptor, the front-end invokes this function to verify
success. There is no in-band way (through the file descriptor) to
indicate failure, so we need to check explicitly.
Once the transfer FD has been established via SET_DEVICE_STATE_FD
(which includes establishing the direction of transfer and migration
phase), the sending side writes its data into it, and the reading side
reads it until it sees an EOF. Then, the front-end will check for
success via CHECK_DEVICE_STATE, which on the destination side includes
checking for integrity (i.e. errors during deserialization).
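To illustrate the transfer-channel semantics described above (the
sending side writes all of its data and closes its end; the reading
side reads until EOF), here is a minimal sketch in C. This is not QEMU
code; the function names are invented for this example, and a real
implementation would additionally handle EINTR and partial-read corner
cases:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Back-end (sending) side for SAVE: write the whole state blob,
 * then close the FD, which signals the end of the transfer. */
static void backend_save_state(int wfd, const void *state, size_t len)
{
    const char *p = state;
    while (len > 0) {
        ssize_t n = write(wfd, p, len);
        if (n < 0) {
            break; /* a real implementation would retry on EINTR */
        }
        p += n;
        len -= n;
    }
    close(wfd); /* closing is the in-band end-of-data signal */
}

/* Front-end (reading) side for SAVE: read until EOF, i.e. until the
 * sender has closed its end.  Returns the number of bytes read. */
static ssize_t frontend_read_state(int rfd, void *buf, size_t cap)
{
    size_t total = 0;
    for (;;) {
        ssize_t n = read(rfd, (char *)buf + total, cap - total);
        if (n == 0) {
            break; /* EOF: transfer complete */
        }
        if (n < 0) {
            return -1;
        }
        total += n;
    }
    close(rfd);
    return total;
}
```

Note that success of the transfer still has to be confirmed
out-of-band afterwards (CHECK_DEVICE_STATE), because closing the FD
cannot distinguish "done" from "aborted".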
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
docs/interop/vhost-user.rst | 170 ++++++++++++++++++++++++++++++++++++
1 file changed, 170 insertions(+)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5a070adbc1..bb4dd0fd60 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -275,6 +275,31 @@ Inflight description
:queue size: a 16-bit size of virtqueues
+Device state transfer parameters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
++--------------------+-----------------+
+| transfer direction | migration phase |
++--------------------+-----------------+
+
+:transfer direction: a 32-bit enum, describing the direction in which
+ the state is transferred:
+
+ - 0: Save: Transfer the state from the back-end to the front-end,
+ which happens on the source side of migration
+ - 1: Load: Transfer the state from the front-end to the back-end,
+ which happens on the destination side of migration
+
+:migration phase: a 32-bit enum, describing the state in which the VM
+ guest and devices are:
+
+ - 0: Stopped (in the period after the transfer of memory-mapped
+ regions before switch-over to the destination): The VM guest and all
+ of the vhost-user device's rings are stopped.
+
+ In the future, additional phases might be added e.g. to allow
+ iterative migration while the device is running.
+
C structure
-----------
@@ -334,6 +359,7 @@ in the ancillary data:
* ``VHOST_USER_SET_VRING_ERR``
* ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``)
* ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
+* ``VHOST_USER_SET_DEVICE_STATE_FD``
If *front-end* is unable to send the full message or receives a wrong
reply it will close the connection. An optional reconnection mechanism
@@ -492,6 +518,79 @@ it performs WAKE ioctl's on the userfaultfd to wake the stalled
back-end. The front-end indicates support for this via the
``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature.
+.. _migrating_backend_state:
+
+Migrating back-end state
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Migrating device state involves transferring the state from one
+back-end, called the source, to another back-end, called the
+destination. After migration, the destination transparently resumes
+operation without requiring the driver to re-initialize the device at
+the VIRTIO level. If the migration fails, then the source can
+transparently resume operation until another migration attempt is made.
+
+Generally, the front-end is connected to a virtual machine guest (which
+contains the driver), which has its own state to transfer between source
+and destination, and therefore will have an implementation-specific
+mechanism to do so. The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature
+provides functionality to have the front-end include the back-end's
+state in this transfer operation so the back-end does not need to
+implement its own mechanism, and so the virtual machine may have its
+complete state, including vhost-user devices' states, contained within a
+single stream of data.
+
+To do this, the back-end state is transferred from back-end to front-end
+on the source side, and vice versa on the destination side. This
+transfer happens over a channel that is negotiated using the
+``VHOST_USER_SET_DEVICE_STATE_FD`` message. This message has two
+parameters:
+
+* Direction of transfer: On the source, the data is saved, transferring
+ it from the back-end to the front-end. On the destination, the data
+ is loaded, transferring it from the front-end to the back-end.
+
+* Migration phase: Currently, the only supported phase is the period
+ after the transfer of memory-mapped regions before switch-over to the
+ destination, when all of the device's rings are stopped. In the
+ future, additional phases might be supported to allow iterative
+ migration while the device is running.
+
+The nature of the channel is implementation-defined, but it must
+generally behave like a pipe: The writing end will write all the data it
+has into it, signalling the end of data by closing its end. The reading
+end must read all of this data (until encountering the end of file) and
+process it.
+
+* When saving, the writing end is the source back-end, and the reading
+ end is the source front-end. After reading the state data from the
+ channel, the source front-end must transfer it to the destination
+ front-end through an implementation-defined mechanism.
+
+* When loading, the writing end is the destination front-end, and the
+ reading end is the destination back-end. After reading the state data
+ from the channel, the destination back-end must deserialize its
+ internal state from that data and set itself up to allow the driver to
+ seamlessly resume operation on the VIRTIO level.
+
+Seamlessly resuming operation means that the migration must be
+transparent to the guest driver, which operates on the VIRTIO level.
+This driver will not perform any re-initialization steps, but continue
+to use the device as if no migration had occurred. The vhost-user
+front-end, however, will re-initialize the vhost state on the
+destination, following the usual protocol for establishing a connection
+to a vhost-user back-end: This includes, for example, setting up memory
+mappings and kick and call FDs as necessary, negotiating protocol
+features, or setting the initial vring base indices (to the same value
+as on the source side, so that operation can resume).
+
+Both on the source and on the destination side, after the respective
+front-end has seen all data transferred (when the transfer FD has been
+closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to
+verify that data transfer was successful in the back-end, too. The
+back-end responds once it knows whether the transfer and processing were
+successful or not.
+
Memory access
-------------
@@ -885,6 +984,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
#define VHOST_USER_PROTOCOL_F_STATUS 16
#define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
+ #define VHOST_USER_PROTOCOL_F_DEVICE_STATE 18
Front-end message types
-----------------------
@@ -1440,6 +1540,76 @@ Front-end message types
query the back-end for its device status as defined in the Virtio
specification.
+``VHOST_USER_SET_DEVICE_STATE_FD``
+ :id: 41
+ :equivalent ioctl: N/A
+ :request payload: device state transfer parameters
+ :reply payload: ``u64``
+
+ Front-end and back-end negotiate a channel over which to transfer the
+ back-end’s internal state during migration. Either side (front-end or
+ back-end) may create the channel. The nature of this channel is not
+ restricted or defined in this document, but whichever side creates it
+ must create a file descriptor that is provided to the respectively
+ other side, allowing access to the channel. This FD must behave as
+ follows:
+
+ * For the writing end, it must allow writing the whole back-end state
+ sequentially. Closing the file descriptor signals the end of
+ transfer.
+
+ * For the reading end, it must allow reading the whole back-end state
+ sequentially. The end of file signals the end of the transfer.
+
+ For example, the channel may be a pipe, in which case the two ends of
+ the pipe fulfill these requirements respectively.
+
+ Initially, the front-end creates a channel along with such an FD. It
+ passes the FD to the back-end as ancillary data of a
+ ``VHOST_USER_SET_DEVICE_STATE_FD`` message. The back-end may create a
+ different transfer channel, passing the respective FD back to the
+ front-end as ancillary data of the reply. If so, the front-end must
+ then discard its channel and use the one provided by the back-end.
+
+  Whether the back-end should use its own channel is decided
+ based on efficiency: If the channel is a pipe, both ends will most
+ likely need to copy data into and out of it. Any channel that allows
+ for more efficient processing on at least one end, e.g. through
+ zero-copy, is considered more efficient and thus preferred. If the
+ back-end can provide such a channel, it should decide to use it.
+
+ The request payload contains parameters for the subsequent data
+ transfer, as described in the :ref:`Migrating back-end state
+ <migrating_backend_state>` section.
+
+ The value returned is both an indication for success, and whether a
+ file descriptor for a back-end-provided channel is returned: Bits 0–7
+ are 0 on success, and non-zero on error. Bit 8 is the invalid FD
+ flag; this flag is set when there is no file descriptor returned.
+ When this flag is not set, the front-end must use the returned file
+ descriptor as its end of the transfer channel. The back-end must not
+ both indicate an error and return a file descriptor.
+
+ Using this function requires prior negotiation of the
+ ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
+
+``VHOST_USER_CHECK_DEVICE_STATE``
+ :id: 42
+ :equivalent ioctl: N/A
+ :request payload: N/A
+ :reply payload: ``u64``
+
+ After transferring the back-end’s internal state during migration (see
+ the :ref:`Migrating back-end state <migrating_backend_state>`
+ section), check whether the back-end was able to successfully fully
+ process the state.
+
+ The value returned indicates success or error; 0 is success, any
+ non-zero value is an error.
+
+ Using this function requires prior negotiation of the
+ ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature.
+
Back-end message types
----------------------
--
2.41.0
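As a quick illustration of the reply-value encoding specified for
VHOST_USER_SET_DEVICE_STATE_FD above (bits 0-7 carry the status, bit 8
is the invalid-FD flag), a hypothetical helper might decode the u64
like this. These names are invented for the example and are not part
of QEMU's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 8: set when the back-end returns no FD of its own. */
#define DSFD_INVALID_FD_FLAG (0x1u << 8)

/* Bits 0-7 are 0 on success, non-zero on error. */
static bool dsfd_reply_is_success(uint64_t reply)
{
    return (reply & 0xff) == 0;
}

/* When the invalid-FD flag is NOT set, a back-end-provided channel FD
 * accompanies the reply as ancillary data and must be used by the
 * front-end instead of its own channel. */
static bool dsfd_reply_has_fd(uint64_t reply)
{
    return !(reply & DSFD_INVALID_FD_FLAG);
}
```

Per the specification, a back-end must never both indicate an error
and return a file descriptor.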
* [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
2023-09-15 10:25 ` [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
@ 2023-09-15 10:25 ` Hanna Czenczek
2023-09-25 19:15 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 3/5] vhost-user: Interface for migration state transfer Hanna Czenczek
` (3 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
Currently, the vhost-user documentation says that rings are to be
initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is
negotiated. However, by the time of feature negotiation, all rings have
already been initialized, so it is not entirely clear what this means.
At least the vhost-user-backend Rust crate's implementation interpreted
it to mean that whenever this feature is negotiated, all rings are to be
put into a disabled state, which means that every SET_FEATURES call
would disable all rings, effectively halting the device. This is
problematic because the VHOST_F_LOG_ALL feature is also set or cleared
this way, which happens during migration. Doing so should not halt the
device.
Other implementations have interpreted this to mean that the device is
to be initialized with all rings disabled, and a subsequent SET_FEATURES
call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of
them. Here, SET_FEATURES will never disable any ring.
This interpretation does not suffer the problem of unintentionally
halting the device whenever features are set or cleared, so it seems
better and more reasonable.
We should clarify this in the documentation.
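The preferred interpretation can be modeled with a toy back-end like
the one below. This is only an illustrative sketch, not code from any
real vhost-user implementation; the structure and function names are
invented, though VHOST_USER_F_PROTOCOL_FEATURES really is feature
bit 30:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_F_PROTOCOL_FEATURES_BIT 30
#define NUM_RINGS 2

struct toy_backend {
    bool ring_enabled[NUM_RINGS];
};

/* The device is initialized with all rings disabled. */
static void toy_backend_init(struct toy_backend *b)
{
    for (int i = 0; i < NUM_RINGS; i++) {
        b->ring_enabled[i] = false;
    }
}

/* SET_FEATURES may only *enable* rings (when protocol features are
 * not negotiated); it never disables any ring, so toggling e.g.
 * VHOST_F_LOG_ALL during migration cannot halt the device. */
static void toy_backend_set_features(struct toy_backend *b,
                                     uint64_t features)
{
    if (!(features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES_BIT))) {
        for (int i = 0; i < NUM_RINGS; i++) {
            b->ring_enabled[i] = true;
        }
    }
}

/* With protocol features negotiated, enabling and disabling happens
 * exclusively through SET_VRING_ENABLE. */
static void toy_backend_set_vring_enable(struct toy_backend *b, int idx,
                                         bool enable)
{
    b->ring_enabled[idx] = enable;
}
```

Under the problematic interpretation, toy_backend_set_features() would
instead reset every ring to disabled whenever protocol features are
negotiated, which is exactly the migration-breaking behavior described
above.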
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
docs/interop/vhost-user.rst | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index bb4dd0fd60..9b9b802c60 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -409,12 +409,20 @@ and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
-If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
-ring starts directly in the enabled state.
-
-If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
-initialized in a disabled state and is enabled by
-``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
+If ``VHOST_USER_SET_FEATURES`` does not negotiate
+``VHOST_USER_F_PROTOCOL_FEATURES``, rings are enabled immediately when
+started.
+
+If ``VHOST_USER_SET_FEATURES`` does negotiate
+``VHOST_USER_F_PROTOCOL_FEATURES``, each ring will remain in the disabled
+state until ``VHOST_USER_SET_VRING_ENABLE`` enables it with parameter 1.
+
+Back-end implementations that support ``VHOST_USER_F_PROTOCOL_FEATURES``
+should implement this by initializing each ring in a disabled state, and
+enabling them when ``VHOST_USER_SET_FEATURES`` is used without
+negotiating ``VHOST_USER_F_PROTOCOL_FEATURES``. Other than that, rings
+should only be enabled and disabled through
+``VHOST_USER_SET_VRING_ENABLE``.
While processing the rings (whether they are enabled or not), the back-end
must support changing some configuration aspects on the fly.
--
2.41.0
* [PATCH v3 3/5] vhost-user: Interface for migration state transfer
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
2023-09-15 10:25 ` [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-09-15 10:25 ` [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
@ 2023-09-15 10:25 ` Hanna Czenczek
2023-09-25 20:18 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 4/5] vhost: Add high-level state save/load functions Hanna Czenczek
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
Add the interface for transferring the back-end's state during migration
as defined previously in vhost-user.rst.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
include/hw/virtio/vhost-backend.h | 24 +++++
include/hw/virtio/vhost.h | 79 ++++++++++++++++
hw/virtio/vhost-user.c | 148 ++++++++++++++++++++++++++++++
hw/virtio/vhost.c | 37 ++++++++
4 files changed, 288 insertions(+)
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 31a251a9f5..b6eee7e9fd 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
} VhostSetConfigType;
+typedef enum VhostDeviceStateDirection {
+ /* Transfer state from back-end (device) to front-end */
+ VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
+ /* Transfer state from front-end to back-end (device) */
+ VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
+} VhostDeviceStateDirection;
+
+typedef enum VhostDeviceStatePhase {
+ /* The device (and all its vrings) is stopped */
+ VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
+} VhostDeviceStatePhase;
+
struct vhost_inflight;
struct vhost_dev;
struct vhost_log;
@@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
+typedef bool (*vhost_supports_device_state_op)(struct vhost_dev *dev);
+typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
+ VhostDeviceStateDirection direction,
+ VhostDeviceStatePhase phase,
+ int fd,
+ int *reply_fd,
+ Error **errp);
+typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
+
typedef struct VhostOps {
VhostBackendType backend_type;
vhost_backend_init vhost_backend_init;
@@ -181,6 +202,9 @@ typedef struct VhostOps {
vhost_force_iommu_op vhost_force_iommu;
vhost_set_config_call_op vhost_set_config_call;
vhost_reset_status_op vhost_reset_status;
+ vhost_supports_device_state_op vhost_supports_device_state;
+ vhost_set_device_state_fd_op vhost_set_device_state_fd;
+ vhost_check_device_state_op vhost_check_device_state;
} VhostOps;
int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a173cb9fa..f4e19aae8b 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -338,4 +338,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
struct vhost_inflight *inflight);
bool vhost_dev_has_iommu(struct vhost_dev *dev);
+
+/**
+ * vhost_supports_device_state(): Checks whether the back-end supports
+ * transferring internal device state for the purpose of migration.
+ * Support for this feature is required for vhost_set_device_state_fd()
+ * and vhost_check_device_state().
+ *
+ * @dev: The vhost device
+ *
+ * Returns true if the device supports these commands, and false if it
+ * does not.
+ */
+bool vhost_supports_device_state(struct vhost_dev *dev);
+
+/**
+ * vhost_set_device_state_fd(): Begin transfer of internal state from/to
+ * the back-end for the purpose of migration. Data is to be transferred
+ * over a pipe according to @direction and @phase. The sending end must
+ * only write to the pipe, and the receiving end must only read from it.
+ * Once the sending end is done, it closes its FD. The receiving end
+ * must take this as the end-of-transfer signal and close its FD, too.
+ *
+ * @fd is the back-end's end of the pipe: The write FD for SAVE, and the
+ * read FD for LOAD. This function transfers ownership of @fd to the
+ * back-end, i.e. closes it in the front-end.
+ *
+ * The back-end may optionally reply with an FD of its own, if this
+ * improves efficiency on its end. In this case, the returned FD is
+ * stored in *reply_fd. The back-end will discard the FD sent to it,
+ * and the front-end must use *reply_fd for transferring state to/from
+ * the back-end.
+ *
+ * @dev: The vhost device
+ * @direction: The direction in which the state is to be transferred.
+ * For outgoing migrations, this is SAVE, and data is read
+ * from the back-end and stored by the front-end in the
+ * migration stream.
+ * For incoming migrations, this is LOAD, and data is read
+ * by the front-end from the migration stream and sent to
+ * the back-end to restore the saved state.
+ * @phase: Which migration phase we are in. Currently, there is only
+ * STOPPED (device and all vrings are stopped), in the future,
+ * more phases such as PRE_COPY or POST_COPY may be added.
+ * @fd: Back-end's end of the pipe through which to transfer state; note
+ * that ownership is transferred to the back-end, so this function
+ * closes @fd in the front-end.
+ * @reply_fd: If the back-end wishes to use a different pipe for state
+ * transfer, this will contain an FD for the front-end to
+ * use. Otherwise, -1 is stored here.
+ * @errp: Potential error description
+ *
+ * Returns 0 on success, and -errno on failure.
+ */
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+ VhostDeviceStateDirection direction,
+ VhostDeviceStatePhase phase,
+ int fd,
+ int *reply_fd,
+ Error **errp);
+
+/**
+ * vhost_check_device_state(): After transferring state from/to the
+ * back-end via vhost_set_device_state_fd(), i.e. once the sending end
+ * has closed the pipe, query the back-end to report any potential
+ * errors that have occurred on its side.  This allows detecting errors
+ * like:
+ * - During outgoing migration, when the source side had already started
+ * to produce its state, something went wrong and it failed to finish
+ * - During incoming migration, when the received state is somehow
+ * invalid and cannot be processed by the back-end
+ *
+ * @dev: The vhost device
+ * @errp: Potential error description
+ *
+ * Returns 0 when the back-end reports successful state transfer and
+ * processing, and -errno when an error occurred somewhere.
+ */
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
+
#endif
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..3cdde543fa 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -74,6 +74,8 @@ enum VhostUserProtocolFeature {
/* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
VHOST_USER_PROTOCOL_F_STATUS = 16,
+ /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+ VHOST_USER_PROTOCOL_F_DEVICE_STATE = 18,
VHOST_USER_PROTOCOL_F_MAX
};
@@ -121,6 +123,8 @@ typedef enum VhostUserRequest {
VHOST_USER_REM_MEM_REG = 38,
VHOST_USER_SET_STATUS = 39,
VHOST_USER_GET_STATUS = 40,
+ VHOST_USER_SET_DEVICE_STATE_FD = 41,
+ VHOST_USER_CHECK_DEVICE_STATE = 42,
VHOST_USER_MAX
} VhostUserRequest;
@@ -212,6 +216,12 @@ typedef struct {
uint32_t size; /* the following payload size */
} QEMU_PACKED VhostUserHeader;
+/* Request payload of VHOST_USER_SET_DEVICE_STATE_FD */
+typedef struct VhostUserTransferDeviceState {
+ uint32_t direction;
+ uint32_t phase;
+} VhostUserTransferDeviceState;
+
typedef union {
#define VHOST_USER_VRING_IDX_MASK (0xff)
#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)
@@ -226,6 +236,7 @@ typedef union {
VhostUserCryptoSession session;
VhostUserVringArea area;
VhostUserInflight inflight;
+ VhostUserTransferDeviceState transfer_state;
} VhostUserPayload;
typedef struct VhostUserMsg {
@@ -2741,6 +2752,140 @@ static void vhost_user_reset_status(struct vhost_dev *dev)
}
}
+static bool vhost_user_supports_device_state(struct vhost_dev *dev)
+{
+ return virtio_has_feature(dev->protocol_features,
+ VHOST_USER_PROTOCOL_F_DEVICE_STATE);
+}
+
+static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
+ VhostDeviceStateDirection direction,
+ VhostDeviceStatePhase phase,
+ int fd,
+ int *reply_fd,
+ Error **errp)
+{
+ int ret;
+ struct vhost_user *vu = dev->opaque;
+ VhostUserMsg msg = {
+ .hdr = {
+ .request = VHOST_USER_SET_DEVICE_STATE_FD,
+ .flags = VHOST_USER_VERSION,
+ .size = sizeof(msg.payload.transfer_state),
+ },
+ .payload.transfer_state = {
+ .direction = direction,
+ .phase = phase,
+ },
+ };
+
+ *reply_fd = -1;
+
+ if (!vhost_user_supports_device_state(dev)) {
+ close(fd);
+ error_setg(errp, "Back-end does not support migration state transfer");
+ return -ENOTSUP;
+ }
+
+ ret = vhost_user_write(dev, &msg, &fd, 1);
+ close(fd);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to send SET_DEVICE_STATE_FD message");
+ return ret;
+ }
+
+ ret = vhost_user_read(dev, &msg);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to receive SET_DEVICE_STATE_FD reply");
+ return ret;
+ }
+
+ if (msg.hdr.request != VHOST_USER_SET_DEVICE_STATE_FD) {
+ error_setg(errp,
+ "Received unexpected message type, expected %d, received %d",
+ VHOST_USER_SET_DEVICE_STATE_FD, msg.hdr.request);
+ return -EPROTO;
+ }
+
+ if (msg.hdr.size != sizeof(msg.payload.u64)) {
+ error_setg(errp,
+ "Received bad message size, expected %zu, received %" PRIu32,
+ sizeof(msg.payload.u64), msg.hdr.size);
+ return -EPROTO;
+ }
+
+ if ((msg.payload.u64 & 0xff) != 0) {
+ error_setg(errp, "Back-end did not accept migration state transfer");
+ return -EIO;
+ }
+
+ if (!(msg.payload.u64 & VHOST_USER_VRING_NOFD_MASK)) {
+ *reply_fd = qemu_chr_fe_get_msgfd(vu->user->chr);
+ if (*reply_fd < 0) {
+ error_setg(errp,
+ "Failed to get back-end-provided transfer pipe FD");
+ *reply_fd = -1;
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+ int ret;
+ VhostUserMsg msg = {
+ .hdr = {
+ .request = VHOST_USER_CHECK_DEVICE_STATE,
+ .flags = VHOST_USER_VERSION,
+ .size = 0,
+ },
+ };
+
+ if (!vhost_user_supports_device_state(dev)) {
+ error_setg(errp, "Back-end does not support migration state transfer");
+ return -ENOTSUP;
+ }
+
+ ret = vhost_user_write(dev, &msg, NULL, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to send CHECK_DEVICE_STATE message");
+ return ret;
+ }
+
+ ret = vhost_user_read(dev, &msg);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to receive CHECK_DEVICE_STATE reply");
+ return ret;
+ }
+
+ if (msg.hdr.request != VHOST_USER_CHECK_DEVICE_STATE) {
+ error_setg(errp,
+ "Received unexpected message type, expected %d, received %d",
+ VHOST_USER_CHECK_DEVICE_STATE, msg.hdr.request);
+ return -EPROTO;
+ }
+
+ if (msg.hdr.size != sizeof(msg.payload.u64)) {
+ error_setg(errp,
+ "Received bad message size, expected %zu, received %" PRIu32,
+ sizeof(msg.payload.u64), msg.hdr.size);
+ return -EPROTO;
+ }
+
+ if (msg.payload.u64 != 0) {
+ error_setg(errp, "Back-end failed to process its internal state");
+ return -EIO;
+ }
+
+ return 0;
+}
+
const VhostOps user_ops = {
.backend_type = VHOST_BACKEND_TYPE_USER,
.vhost_backend_init = vhost_user_backend_init,
@@ -2777,4 +2922,7 @@ const VhostOps user_ops = {
.vhost_set_inflight_fd = vhost_user_set_inflight_fd,
.vhost_dev_start = vhost_user_dev_start,
.vhost_reset_status = vhost_user_reset_status,
+ .vhost_supports_device_state = vhost_user_supports_device_state,
+ .vhost_set_device_state_fd = vhost_user_set_device_state_fd,
+ .vhost_check_device_state = vhost_user_check_device_state,
};
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..6d34abec96 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2087,3 +2087,40 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
return -ENOSYS;
}
+
+bool vhost_supports_device_state(struct vhost_dev *dev)
+{
+ if (dev->vhost_ops->vhost_supports_device_state) {
+ return dev->vhost_ops->vhost_supports_device_state(dev);
+ }
+
+ return false;
+}
+
+int vhost_set_device_state_fd(struct vhost_dev *dev,
+ VhostDeviceStateDirection direction,
+ VhostDeviceStatePhase phase,
+ int fd,
+ int *reply_fd,
+ Error **errp)
+{
+ if (dev->vhost_ops->vhost_set_device_state_fd) {
+ return dev->vhost_ops->vhost_set_device_state_fd(dev, direction, phase,
+ fd, reply_fd, errp);
+ }
+
+ error_setg(errp,
+ "vhost transport does not support migration state transfer");
+ return -ENOSYS;
+}
+
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
+{
+ if (dev->vhost_ops->vhost_check_device_state) {
+ return dev->vhost_ops->vhost_check_device_state(dev, errp);
+ }
+
+ error_setg(errp,
+ "vhost transport does not support migration state transfer");
+ return -ENOSYS;
+}
--
2.41.0
* [PATCH v3 4/5] vhost: Add high-level state save/load functions
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
` (2 preceding siblings ...)
2023-09-15 10:25 ` [PATCH v3 3/5] vhost-user: Interface for migration state transfer Hanna Czenczek
@ 2023-09-15 10:25 ` Hanna Czenczek
2023-09-25 20:23 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 5/5] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-09-25 20:48 ` [PATCH v3 0/5] vhost-user: Back-end state migration Stefan Hajnoczi
5 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.
Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length. EOF is indicated by a 0-length chunk.
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
include/hw/virtio/vhost.h | 35 +++++++
hw/virtio/vhost.c | 204 ++++++++++++++++++++++++++++++++++++++
2 files changed, 239 insertions(+)
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index f4e19aae8b..28066ceda1 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -417,4 +417,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
*/
int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
+/**
+ * vhost_save_backend_state(): High-level function to receive a vhost
+ * back-end's state, and save it in @f. Uses
+ * `vhost_set_device_state_fd()` to get the data from the back-end, and
+ * stores it in consecutive chunks that are each prefixed by their
+ * respective length (be32). The end is marked by a 0-length chunk.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device from which to save the state
+ * @f: Migration stream in which to save the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
+/**
+ * vhost_load_backend_state(): High-level function to load a vhost
+ * back-end's state from @f, and send it over to the back-end. Reads
+ * the data from @f in the format used by `vhost_save_backend_state()`, and uses
+ * `vhost_set_device_state_fd()` to transfer it to the back-end.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device to which to send the state
+ * @f: Migration stream from which to load the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
#endif
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6d34abec96..164bd930d3 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2124,3 +2124,207 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
"vhost transport does not support migration state transfer");
return -ENOSYS;
}
+
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+ /* Maximum chunk size in which to transfer the state */
+ const size_t chunk_size = 1 * 1024 * 1024;
+ g_autofree void *transfer_buf = NULL;
+ g_autoptr(GError) g_err = NULL;
+ int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+ int ret;
+
+ /* [0] for reading (our end), [1] for writing (back-end's end) */
+ if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+ error_setg(errp, "Failed to set up state transfer pipe: %s",
+ g_err->message);
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ read_fd = pipe_fds[0];
+ write_fd = pipe_fds[1];
+
+ /*
+ * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+ * Ideally, it is suspended, but SUSPEND/RESUME currently do not exist for
+ * vhost-user, so just check that it is stopped at all.
+ */
+ assert(!dev->started);
+
+ /* Transfer ownership of write_fd to the back-end */
+ ret = vhost_set_device_state_fd(dev,
+ VHOST_TRANSFER_STATE_DIRECTION_SAVE,
+ VHOST_TRANSFER_STATE_PHASE_STOPPED,
+ write_fd,
+ &reply_fd,
+ errp);
+ if (ret < 0) {
+ error_prepend(errp, "Failed to initiate state transfer: ");
+ goto fail;
+ }
+
+ /* If the back-end wishes to use a different pipe, switch over */
+ if (reply_fd >= 0) {
+ close(read_fd);
+ read_fd = reply_fd;
+ }
+
+ transfer_buf = g_malloc(chunk_size);
+
+ while (true) {
+ ssize_t read_ret;
+
+ read_ret = RETRY_ON_EINTR(read(read_fd, transfer_buf, chunk_size));
+ if (read_ret < 0) {
+ ret = -errno;
+ error_setg_errno(errp, -ret, "Failed to receive state");
+ goto fail;
+ }
+
+ assert(read_ret <= chunk_size);
+ qemu_put_be32(f, read_ret);
+
+ if (read_ret == 0) {
+ /* EOF */
+ break;
+ }
+
+ qemu_put_buffer(f, transfer_buf, read_ret);
+ }
+
+ /*
+ * Back-end will not really care, but be clean and close our end of the pipe
+ * before inquiring the back-end about whether transfer was successful
+ */
+ close(read_fd);
+ read_fd = -1;
+
+ /* Also, verify that the device is still stopped */
+ assert(!dev->started);
+
+ ret = vhost_check_device_state(dev, errp);
+ if (ret < 0) {
+ goto fail;
+ }
+
+ ret = 0;
+fail:
+ if (read_fd >= 0) {
+ close(read_fd);
+ }
+
+ return ret;
+}
+
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+ size_t transfer_buf_size = 0;
+ g_autofree void *transfer_buf = NULL;
+ g_autoptr(GError) g_err = NULL;
+ int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+ int ret;
+
+ /* [0] for reading (back-end's end), [1] for writing (our end) */
+ if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+ error_setg(errp, "Failed to set up state transfer pipe: %s",
+ g_err->message);
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ read_fd = pipe_fds[0];
+ write_fd = pipe_fds[1];
+
+ /*
+ * VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped.
+ * Ideally, it is suspended, but SUSPEND/RESUME currently do not exist for
+ * vhost-user, so just check that it is stopped at all.
+ */
+ assert(!dev->started);
+
+ /* Transfer ownership of read_fd to the back-end */
+ ret = vhost_set_device_state_fd(dev,
+ VHOST_TRANSFER_STATE_DIRECTION_LOAD,
+ VHOST_TRANSFER_STATE_PHASE_STOPPED,
+ read_fd,
+ &reply_fd,
+ errp);
+ if (ret < 0) {
+ error_prepend(errp, "Failed to initiate state transfer: ");
+ goto fail;
+ }
+
+ /* If the back-end wishes to use a different pipe, switch over */
+ if (reply_fd >= 0) {
+ close(write_fd);
+ write_fd = reply_fd;
+ }
+
+ while (true) {
+ size_t this_chunk_size = qemu_get_be32(f);
+ ssize_t write_ret;
+ const uint8_t *transfer_pointer;
+
+ if (this_chunk_size == 0) {
+ /* End of state */
+ break;
+ }
+
+ if (transfer_buf_size < this_chunk_size) {
+ transfer_buf = g_realloc(transfer_buf, this_chunk_size);
+ transfer_buf_size = this_chunk_size;
+ }
+
+ if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
+ this_chunk_size)
+ {
+ error_setg(errp, "Failed to read state");
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ transfer_pointer = transfer_buf;
+ while (this_chunk_size > 0) {
+ write_ret = RETRY_ON_EINTR(
+ write(write_fd, transfer_pointer, this_chunk_size)
+ );
+ if (write_ret < 0) {
+ ret = -errno;
+ error_setg_errno(errp, -ret, "Failed to send state");
+ goto fail;
+ } else if (write_ret == 0) {
+ error_setg(errp, "Failed to send state: Connection is closed");
+ ret = -ECONNRESET;
+ goto fail;
+ }
+
+ assert(write_ret <= this_chunk_size);
+ this_chunk_size -= write_ret;
+ transfer_pointer += write_ret;
+ }
+ }
+
+ /*
+ * Close our end, thus ending transfer, before inquiring the back-end about
+ * whether transfer was successful
+ */
+ close(write_fd);
+ write_fd = -1;
+
+ /* Also, verify that the device is still stopped */
+ assert(!dev->started);
+
+ ret = vhost_check_device_state(dev, errp);
+ if (ret < 0) {
+ goto fail;
+ }
+
+ ret = 0;
+fail:
+ if (write_fd >= 0) {
+ close(write_fd);
+ }
+
+ return ret;
+}
--
2.41.0
* [PATCH v3 5/5] vhost-user-fs: Implement internal migration
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
` (3 preceding siblings ...)
2023-09-15 10:25 ` [PATCH v3 4/5] vhost: Add high-level state save/load functions Hanna Czenczek
@ 2023-09-15 10:25 ` Hanna Czenczek
2023-09-25 20:26 ` Stefan Hajnoczi
2023-09-25 20:48 ` [PATCH v3 0/5] vhost-user: Back-end state migration Stefan Hajnoczi
5 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-15 10:25 UTC (permalink / raw)
To: qemu-devel, virtio-fs
Cc: Hanna Czenczek, Stefan Hajnoczi, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
A virtio-fs device's VM state consists of:
- the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
- the back-end's (virtiofsd's) internal state
We get/set the latter via the new vhost operations for transferring the
device state. That state goes into its own dedicated subsection, so
that it can be disabled for external migration.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
1 file changed, 100 insertions(+), 1 deletion(-)
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index 49d699ffc2..eb91723855 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -298,9 +298,108 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
return &fs->vhost_dev;
}
+/**
+ * Fetch the internal state from virtiofsd and save it to `f`.
+ */
+static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field, JSONWriter *vmdesc)
+{
+ VirtIODevice *vdev = pv;
+ VHostUserFS *fs = VHOST_USER_FS(vdev);
+ Error *local_error = NULL;
+ int ret;
+
+ ret = vhost_save_backend_state(&fs->vhost_dev, f, &local_error);
+ if (ret < 0) {
+ error_reportf_err(local_error,
+ "Error saving back-end state of %s device %s "
+ "(tag: \"%s\"): ",
+ vdev->name, vdev->parent_obj.canonical_path,
+ fs->conf.tag ?: "<none>");
+ return ret;
+ }
+
+ return 0;
+}
+
+/**
+ * Load virtiofsd's internal state from `f` and send it over to virtiofsd.
+ */
+static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
+ const VMStateField *field)
+{
+ VirtIODevice *vdev = pv;
+ VHostUserFS *fs = VHOST_USER_FS(vdev);
+ Error *local_error = NULL;
+ int ret;
+
+ ret = vhost_load_backend_state(&fs->vhost_dev, f, &local_error);
+ if (ret < 0) {
+ error_reportf_err(local_error,
+ "Error loading back-end state of %s device %s "
+ "(tag: \"%s\"): ",
+ vdev->name, vdev->parent_obj.canonical_path,
+ fs->conf.tag ?: "<none>");
+ return ret;
+ }
+
+ return 0;
+}
+
+static bool vuf_is_internal_migration(void *opaque)
+{
+ /* TODO: Return false when an external migration is requested */
+ return true;
+}
+
+static int vuf_check_migration_support(void *opaque)
+{
+ VirtIODevice *vdev = opaque;
+ VHostUserFS *fs = VHOST_USER_FS(vdev);
+
+ if (!vhost_supports_device_state(&fs->vhost_dev)) {
+ error_report("Back-end of %s device %s (tag: \"%s\") does not support "
+ "migration through qemu",
+ vdev->name, vdev->parent_obj.canonical_path,
+ fs->conf.tag ?: "<none>");
+ return -ENOTSUP;
+ }
+
+ return 0;
+}
+
+static const VMStateDescription vuf_backend_vmstate;
+
static const VMStateDescription vuf_vmstate = {
.name = "vhost-user-fs",
- .unmigratable = 1,
+ .version_id = 0,
+ .fields = (VMStateField[]) {
+ VMSTATE_VIRTIO_DEVICE,
+ VMSTATE_END_OF_LIST()
+ },
+ .subsections = (const VMStateDescription * []) {
+ &vuf_backend_vmstate,
+ NULL,
+ }
+};
+
+static const VMStateDescription vuf_backend_vmstate = {
+ .name = "vhost-user-fs-backend",
+ .version_id = 0,
+ .needed = vuf_is_internal_migration,
+ .pre_load = vuf_check_migration_support,
+ .pre_save = vuf_check_migration_support,
+ .fields = (VMStateField[]) {
+ {
+ .name = "back-end",
+ .info = &(const VMStateInfo) {
+ .name = "virtio-fs back-end state",
+ .get = vuf_load_state,
+ .put = vuf_save_state,
+ },
+ },
+ VMSTATE_END_OF_LIST()
+ },
};
static Property vuf_properties[] = {
--
2.41.0
* Re: [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state
2023-09-15 10:25 ` [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
@ 2023-09-25 19:04 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 19:04 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On Fri, Sep 15, 2023 at 12:25:26PM +0200, Hanna Czenczek wrote:
> For vhost-user devices, qemu can migrate the virtio state, but not the
> back-end's internal state. To do so, we need to be able to transfer
> this internal state between front-end (qemu) and back-end.
>
> At this point, this new feature is added for the purpose of virtio-fs
> migration. Because virtiofsd's internal state will not be too large, we
> believe it is best to transfer it as a single binary blob after the
> streaming phase.
>
> These are the additions to the protocol:
> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_DEVICE_STATE
> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a file
> descriptor over which to transfer the state.
> - CHECK_DEVICE_STATE: After the state has been transferred through the
> file descriptor, the front-end invokes this function to verify
> success. There is no in-band way (through the file descriptor) to
> indicate failure, so we need to check explicitly.
>
> Once the transfer FD has been established via SET_DEVICE_STATE_FD
> (which includes establishing the direction of transfer and migration
> phase), the sending side writes its data into it, and the reading side
> reads it until it sees an EOF. Then, the front-end will check for
> success via CHECK_DEVICE_STATE, which on the destination side includes
> checking for integrity (i.e. errors during deserialization).
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> docs/interop/vhost-user.rst | 170 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 170 insertions(+)
Great documentation!
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings
2023-09-15 10:25 ` [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
@ 2023-09-25 19:15 ` Stefan Hajnoczi
2023-09-26 13:54 ` Hanna Czenczek
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 19:15 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On Fri, Sep 15, 2023 at 12:25:27PM +0200, Hanna Czenczek wrote:
> Currently, the vhost-user documentation says that rings are to be
> initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is
> negotiated. However, by the time of feature negotiation, all rings have
> already been initialized, so it is not entirely clear what this means.
>
> At least the vhost-user-backend Rust crate's implementation interpreted
> it to mean that whenever this feature is negotiated, all rings are to
> put into a disabled state, which means that every SET_FEATURES call
> would disable all rings, effectively halting the device. This is
> problematic because the VHOST_F_LOG_ALL feature is also set or cleared
> this way, which happens during migration. Doing so should not halt the
> device.
>
> Other implementations have interpreted this to mean that the device is
> to be initialized with all rings disabled, and a subsequent SET_FEATURES
> call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of
> them. Here, SET_FEATURES will never disable any ring.
>
> This interpretation does not suffer the problem of unintentionally
> halting the device whenever features are set or cleared, so it seems
> better and more reasonable.
>
> We should clarify this in the documentation.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> docs/interop/vhost-user.rst | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index bb4dd0fd60..9b9b802c60 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -409,12 +409,20 @@ and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
>
> Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
>
> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
> -ring starts directly in the enabled state.
> -
> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
> -initialized in a disabled state and is enabled by
> -``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
> +If ``VHOST_USER_SET_FEATURES`` does not negotiate
> +``VHOST_USER_F_PROTOCOL_FEATURES``, rings are enabled immediately when
> +started.
> +
> +If ``VHOST_USER_SET_FEATURES`` does negotiate
> +``VHOST_USER_F_PROTOCOL_FEATURES``, each ring will remain in the disabled
> +state until ``VHOST_USER_SET_VRING_ENABLE`` enables it with parameter 1.
> +
> +Back-end implementations that support ``VHOST_USER_F_PROTOCOL_FEATURES``
> +should implement this by initializing each ring in a disabled state, and
> +enabling them when ``VHOST_USER_SET_FEATURES`` is used without
> +negotiating ``VHOST_USER_F_PROTOCOL_FEATURES``. Other than that, rings
> +should only be enabled and disabled through
> +``VHOST_USER_SET_VRING_ENABLE``.
The "Ring states" section starts by saying there are three states:
"stopped", "started but disabled", and "started and enabled". But this
patch talks about a "disabled state". Can you rephrase this patch to use
the exact state names defined earlier in the spec?
>
> While processing the rings (whether they are enabled or not), the back-end
> must support changing some configuration aspects on the fly.
> --
> 2.41.0
>
* Re: [PATCH v3 3/5] vhost-user: Interface for migration state transfer
2023-09-15 10:25 ` [PATCH v3 3/5] vhost-user: Interface for migration state transfer Hanna Czenczek
@ 2023-09-25 20:18 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 20:18 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On Fri, Sep 15, 2023 at 12:25:28PM +0200, Hanna Czenczek wrote:
> Add the interface for transferring the back-end's state during migration
> as defined previously in vhost-user.rst.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> include/hw/virtio/vhost-backend.h | 24 +++++
> include/hw/virtio/vhost.h | 79 ++++++++++++++++
> hw/virtio/vhost-user.c | 148 ++++++++++++++++++++++++++++++
> hw/virtio/vhost.c | 37 ++++++++
> 4 files changed, 288 insertions(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v3 4/5] vhost: Add high-level state save/load functions
2023-09-15 10:25 ` [PATCH v3 4/5] vhost: Add high-level state save/load functions Hanna Czenczek
@ 2023-09-25 20:23 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 20:23 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On Fri, Sep 15, 2023 at 12:25:29PM +0200, Hanna Czenczek wrote:
> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> vhost front-ends to easily save and load the back-end's state to/from
> the migration stream.
>
> Because we do not know the full state size ahead of time,
> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> writes each chunk consecutively into the migration stream, prefixed by
> its length. EOF is indicated by a 0-length chunk.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> include/hw/virtio/vhost.h | 35 +++++++
> hw/virtio/vhost.c | 204 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 239 insertions(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v3 5/5] vhost-user-fs: Implement internal migration
2023-09-15 10:25 ` [PATCH v3 5/5] vhost-user-fs: Implement internal migration Hanna Czenczek
@ 2023-09-25 20:26 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 20:26 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione, Anton Kuchin
On Fri, Sep 15, 2023 at 12:25:30PM +0200, Hanna Czenczek wrote:
> A virtio-fs device's VM state consists of:
> - the virtio device (vring) state (VMSTATE_VIRTIO_DEVICE)
> - the back-end's (virtiofsd's) internal state
>
> We get/set the latter via the new vhost operations to transfer migratory
> state. It is its own dedicated subsection, so that for external
> migration, it can be disabled.
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> hw/virtio/vhost-user-fs.c | 101 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 100 insertions(+), 1 deletion(-)
CCing Anton so he is aware that internal migration is being added.
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index 49d699ffc2..eb91723855 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -298,9 +298,108 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> return &fs->vhost_dev;
> }
>
> +/**
> + * Fetch the internal state from virtiofsd and save it to `f`.
> + */
> +static int vuf_save_state(QEMUFile *f, void *pv, size_t size,
> + const VMStateField *field, JSONWriter *vmdesc)
> +{
> + VirtIODevice *vdev = pv;
> + VHostUserFS *fs = VHOST_USER_FS(vdev);
> + Error *local_error = NULL;
> + int ret;
> +
> + ret = vhost_save_backend_state(&fs->vhost_dev, f, &local_error);
> + if (ret < 0) {
> + error_reportf_err(local_error,
> + "Error saving back-end state of %s device %s "
> + "(tag: \"%s\"): ",
> + vdev->name, vdev->parent_obj.canonical_path,
> + fs->conf.tag ?: "<none>");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * Load virtiofsd's internal state from `f` and send it over to virtiofsd.
> + */
> +static int vuf_load_state(QEMUFile *f, void *pv, size_t size,
> + const VMStateField *field)
> +{
> + VirtIODevice *vdev = pv;
> + VHostUserFS *fs = VHOST_USER_FS(vdev);
> + Error *local_error = NULL;
> + int ret;
> +
> + ret = vhost_load_backend_state(&fs->vhost_dev, f, &local_error);
> + if (ret < 0) {
> + error_reportf_err(local_error,
> + "Error loading back-end state of %s device %s "
> + "(tag: \"%s\"): ",
> + vdev->name, vdev->parent_obj.canonical_path,
> + fs->conf.tag ?: "<none>");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static bool vuf_is_internal_migration(void *opaque)
> +{
> + /* TODO: Return false when an external migration is requested */
> + return true;
> +}
> +
> +static int vuf_check_migration_support(void *opaque)
> +{
> + VirtIODevice *vdev = opaque;
> + VHostUserFS *fs = VHOST_USER_FS(vdev);
> +
> + if (!vhost_supports_device_state(&fs->vhost_dev)) {
> + error_report("Back-end of %s device %s (tag: \"%s\") does not support "
> + "migration through qemu",
> + vdev->name, vdev->parent_obj.canonical_path,
> + fs->conf.tag ?: "<none>");
> + return -ENOTSUP;
> + }
> +
> + return 0;
> +}
> +
> +static const VMStateDescription vuf_backend_vmstate;
> +
> static const VMStateDescription vuf_vmstate = {
> .name = "vhost-user-fs",
> - .unmigratable = 1,
> + .version_id = 0,
> + .fields = (VMStateField[]) {
> + VMSTATE_VIRTIO_DEVICE,
> + VMSTATE_END_OF_LIST()
> + },
> + .subsections = (const VMStateDescription * []) {
> + &vuf_backend_vmstate,
> + NULL,
> + }
> +};
> +
> +static const VMStateDescription vuf_backend_vmstate = {
> + .name = "vhost-user-fs-backend",
> + .version_id = 0,
> + .needed = vuf_is_internal_migration,
> + .pre_load = vuf_check_migration_support,
> + .pre_save = vuf_check_migration_support,
> + .fields = (VMStateField[]) {
> + {
> + .name = "back-end",
> + .info = &(const VMStateInfo) {
> + .name = "virtio-fs back-end state",
> + .get = vuf_load_state,
> + .put = vuf_save_state,
> + },
> + },
> + VMSTATE_END_OF_LIST()
> + },
> };
>
> static Property vuf_properties[] = {
> --
> 2.41.0
>
* Re: [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
` (4 preceding siblings ...)
2023-09-15 10:25 ` [PATCH v3 5/5] vhost-user-fs: Implement internal migration Hanna Czenczek
@ 2023-09-25 20:48 ` Stefan Hajnoczi
2023-09-26 13:32 ` [Virtio-fs] " Hanna Czenczek
5 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-25 20:48 UTC (permalink / raw)
To: Hanna Czenczek
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On Fri, Sep 15, 2023 at 12:25:25PM +0200, Hanna Czenczek wrote:
> RFC:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
>
> v1:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
>
> v2:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
>
> Hi,
>
> I’ve decided not to work on vhost-user SUSPEND/RESUME for now – it is
> not technically required for virtio-fs migration, which is the actual
> priority for me now. While we do want to have SUSPEND/RESUME at some
> point, the only practically existing reason for it is to be able to
> implement vhost-level resetting in virtiofsd, but that is not related to
> migration.
QEMU sends VHOST_USER_SET_STATUS 0 in vhost_dev_stop(). Are you assuming
that virtiofs back-ends do not reset the device upon receiving this
message?
> So one of the changes in v3 is that it no longer depends on the
> vhost-user SUSPEND/RESUME series, and describes the migration protocol
> without the device being suspended at any point, but merely that the
> vrings are stopped.
>
> Other changes include:
>
> - Patch 1:
> - Rephrased a lot
> - Added a description for the VHOST_USER_SET_DEVICE_STATE_FD
> parameters
> - Renamed VHOST_USER_PROTOCOL_F_MIGRATORY_STATE to
> VHOST_USER_PROTOCOL_F_DEVICE_STATE
> - enum variants changed in value due to dropping the SUSPEND/RESUME
> dependency
>
> - Patch 2:
> - Pulled in, was a stand-alone patch before
> - Dropped a sentence about ring state before feature negotiations, as
> the rings are not to be used during that period anyway
> - Bit of rephrasing
>
> - Patch 3:
> - Renamed “migratory state” to “device state”
> - enum variants changed in value due to dropping the SUSPEND/RESUME
> dependency
>
> - Patch 4:
> - Changed `f` to @f (referencing parameter “f”) in comments
> - Use g_autofree for the transfer buffer
> - Note SUSPEND state as a future feature, not currently existing
> - Wrap read() and write() in RETRY_ON_EINTR()
>
> - Patch 5:
> - Renamed “migratory state” to “device state”
> - (kept R-b still)
>
>
> Hanna Czenczek (5):
> vhost-user.rst: Migrating back-end-internal state
> vhost-user.rst: Clarify enabling/disabling vrings
> vhost-user: Interface for migration state transfer
> vhost: Add high-level state save/load functions
> vhost-user-fs: Implement internal migration
>
> docs/interop/vhost-user.rst | 188 ++++++++++++++++++++++-
> include/hw/virtio/vhost-backend.h | 24 +++
> include/hw/virtio/vhost.h | 114 ++++++++++++++
> hw/virtio/vhost-user-fs.c | 101 ++++++++++++-
> hw/virtio/vhost-user.c | 148 ++++++++++++++++++
> hw/virtio/vhost.c | 241 ++++++++++++++++++++++++++++++
> 6 files changed, 810 insertions(+), 6 deletions(-)
>
> --
> 2.41.0
>
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-25 20:48 ` [PATCH v3 0/5] vhost-user: Back-end state migration Stefan Hajnoczi
@ 2023-09-26 13:32 ` Hanna Czenczek
2023-09-26 19:20 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-26 13:32 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-fs, Eugenio Pérez, qemu-devel, Michael S . Tsirkin
On 25.09.23 22:48, Stefan Hajnoczi wrote:
> On Fri, Sep 15, 2023 at 12:25:25PM +0200, Hanna Czenczek wrote:
>> RFC:
>> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
>>
>> v1:
>> https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
>>
>> v2:
>> https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
>>
>> Hi,
>>
>> I’ve decided not to work on vhost-user SUSPEND/RESUME for now – it is
>> not technically required for virtio-fs migration, which is the actual
>> priority for me now. While we do want to have SUSPEND/RESUME at some
>> point, the only practically existing reason for it is to be able to
>> implement vhost-level resetting in virtiofsd, but that is not related to
>> migration.
> QEMU sends VHOST_USER_SET_STATUS 0 in vhost_dev_stop(). Are you assuming
> that virtiofs back-ends do not reset the device upon receiving this
> message?
Absolutely. vhost_dev_stop() is not in the migration-specific path, but
is called whenever the vCPUs are stopped, which indeed happens to be
part of migration, but is also used in other cases, like QMP stop. We
have identified that it is wrong that we reset the back-end just because
the vCPUs are stopped (e.g. on migration), but it is what we do right
now when the VM is paused (e.g. through QMP stop).
Therefore, stateful back-ends cannot implement reset, lest stop/cont
break the device. I don’t think anybody really cares whether a
vhost-user back-end actually resets its internal state (if there is any)
when the guest driver asks for a reset on the virtio level, as long as
the guest driver is then able to initialize the device afterwards. I do
think people care that stop/cont works, so it follows to me that no
stateful back-end will reset its internal state when receiving a
virtio/vhost reset. If they do, stop/cont breaks, which is a
user-visible bug that needs to be addressed – either properly by
implementing SUSPEND/RESUME in both qemu and the affected back-ends, or
by using a similar work-around to virtiofsd, which is to ignore reset
commands.
It’s hard for me to imagine that people don’t care that stop/cont breaks
some vhost-user back-end, but suddenly would start to care that
migration doesn’t work – especially given that first of all someone will
need to manually implement any migration support in that back-end even
with this series, which means that really, the only back-end we are
talking about here is our virtiofsd. To this day I’m not even aware of
any other back-end that has internal state.
Hanna
* Re: [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings
2023-09-25 19:15 ` Stefan Hajnoczi
@ 2023-09-26 13:54 ` Hanna Czenczek
2023-09-26 19:30 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-26 13:54 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, virtio-fs, Michael S . Tsirkin, Eugenio Pérez,
German Maglione
On 25.09.23 21:15, Stefan Hajnoczi wrote:
> On Fri, Sep 15, 2023 at 12:25:27PM +0200, Hanna Czenczek wrote:
>> Currently, the vhost-user documentation says that rings are to be
>> initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is
>> negotiated. However, by the time of feature negotiation, all rings have
>> already been initialized, so it is not entirely clear what this means.
>>
>> At least the vhost-user-backend Rust crate's implementation interpreted
>> it to mean that whenever this feature is negotiated, all rings are to be
>> put into a disabled state, which means that every SET_FEATURES call
>> would disable all rings, effectively halting the device. This is
>> problematic because the VHOST_F_LOG_ALL feature is also set or cleared
>> this way, which happens during migration. Doing so should not halt the
>> device.
>>
>> Other implementations have interpreted this to mean that the device is
>> to be initialized with all rings disabled, and a subsequent SET_FEATURES
>> call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of
>> them. Here, SET_FEATURES will never disable any ring.
>>
>> This interpretation does not suffer the problem of unintentionally
>> halting the device whenever features are set or cleared, so it seems
>> better and more reasonable.
>>
>> We should clarify this in the documentation.
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>> docs/interop/vhost-user.rst | 20 ++++++++++++++------
>> 1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
>> index bb4dd0fd60..9b9b802c60 100644
>> --- a/docs/interop/vhost-user.rst
>> +++ b/docs/interop/vhost-user.rst
>> @@ -409,12 +409,20 @@ and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
>>
>> Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
>>
>> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
>> -ring starts directly in the enabled state.
>> -
>> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
>> -initialized in a disabled state and is enabled by
>> -``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
>> +If ``VHOST_USER_SET_FEATURES`` does not negotiate
>> +``VHOST_USER_F_PROTOCOL_FEATURES``, rings are enabled immediately when
>> +started.
>> +
>> +If ``VHOST_USER_SET_FEATURES`` does negotiate
>> +``VHOST_USER_F_PROTOCOL_FEATURES``, each ring will remain in the disabled
>> +state until ``VHOST_USER_SET_VRING_ENABLE`` enables it with parameter 1.
>> +
>> +Back-end implementations that support ``VHOST_USER_F_PROTOCOL_FEATURES``
>> +should implement this by initializing each ring in a disabled state, and
>> +enabling them when ``VHOST_USER_SET_FEATURES`` is used without
>> +negotiating ``VHOST_USER_F_PROTOCOL_FEATURES``. Other than that, rings
>> +should only be enabled and disabled through
>> +``VHOST_USER_SET_VRING_ENABLE``.
> The "Ring states" section starts by saying there are three states:
> "stopped", "started but disabled", and "started and enabled". But this
> patch talks about a "disabled state". Can you rephrase this patch to use
> the exact state names defined earlier in the spec?
I would not want to do that. We had the exact problem that the spec
wanted to remain high-level, and was not reflecting exactly what
existing implementations did, which resulted in confusion (at least for
me and the vhost Rust crates authors).
Notably, the existing implementations I’m aware of track
enabled/disabled even before the ring is started, exactly as formulated
here.
If we changed this to read something like “If VHOST_USER_SET_FEATURES is
ever called without negotiating VHOST_USER_F_PROTOCOL_FEATURES, the ring
must be enabled immediately when it is started; otherwise, when the ring
is started and VHOST_USER_F_PROTOCOL_FEATURES has always been set in
every VHOST_USER_SET_FEATURES call, the ring should be disabled when
started.” then this would conflict with the existing implementations:
We never disallow VHOST_USER_SET_VRING_ENABLE when the ring is stopped.
Existing implementations track enabled/disabled before the rings are
started, and they initialize this state to “disabled”, setting it to
“enabled” on receiving VHOST_USER_SET_FEATURES without
VHOST_USER_F_PROTOCOL_FEATURES, as described above. Therefore, if you
call VHOST_USER_SET_VRING_ENABLE 1 before the ring is started, the ring
will start enabled even with VHOST_USER_F_PROTOCOL_FEATURES. This is
not possible if you only have three states.
Maybe we should rather clarify that enabled/disabled is tracked even
while the ring is stopped.
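To make the semantics concrete, here is a minimal sketch of that bookkeeping (illustrative Rust only; this is not taken from the vhost crates or any real back-end, and all names are made up):

```rust
// Sketch of the per-ring bookkeeping described above: enabled/disabled
// is tracked even while a ring is stopped. Purely illustrative.

const VHOST_USER_F_PROTOCOL_FEATURES: u64 = 1 << 30;

#[derive(Default)]
struct Ring {
    started: bool,
    enabled: bool, // tracked even while the ring is stopped
}

struct Backend {
    rings: Vec<Ring>,
}

impl Backend {
    fn new(num_queues: usize) -> Self {
        // Rings start out stopped and disabled.
        Backend {
            rings: (0..num_queues).map(|_| Ring::default()).collect(),
        }
    }

    // VHOST_USER_SET_FEATURES: never disables a ring; enables all rings
    // only when VHOST_USER_F_PROTOCOL_FEATURES is not negotiated.
    fn set_features(&mut self, features: u64) {
        if features & VHOST_USER_F_PROTOCOL_FEATURES == 0 {
            for ring in &mut self.rings {
                ring.enabled = true;
            }
        }
    }

    // VHOST_USER_SET_VRING_ENABLE: valid even while the ring is stopped.
    fn set_vring_enable(&mut self, index: usize, enable: bool) {
        self.rings[index].enabled = enable;
    }

    // Starting a ring (e.g. on receiving a kick fd) preserves the
    // previously tracked enabled state.
    fn start_ring(&mut self, index: usize) {
        self.rings[index].started = true;
    }
}
```

With this model, VHOST_USER_SET_VRING_ENABLE 1 before the ring is started leads to the ring starting enabled, even with VHOST_USER_F_PROTOCOL_FEATURES negotiated, which matches what the implementations I know of do.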
Hanna
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-26 13:32 ` [Virtio-fs] " Hanna Czenczek
@ 2023-09-26 19:20 ` Stefan Hajnoczi
2023-09-26 20:10 ` Stefan Hajnoczi
2023-09-27 8:13 ` Hanna Czenczek
0 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-26 19:20 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, open list:virtiofs, Eugenio Pérez,
qemu-devel, Michael S . Tsirkin
On Tue, Sep 26, 2023, 09:33 Hanna Czenczek <hreitz@redhat.com> wrote:
> On 25.09.23 22:48, Stefan Hajnoczi wrote:
> > On Fri, Sep 15, 2023 at 12:25:25PM +0200, Hanna Czenczek wrote:
> >> RFC:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
> >>
> >> v1:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
> >>
> >> v2:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
> >>
> >> Hi,
> >>
> >> I’ve decided not to work on vhost-user SUSPEND/RESUME for now – it is
> >> not technically required for virtio-fs migration, which is the actual
> >> priority for me now. While we do want to have SUSPEND/RESUME at some
> >> point, the only practically existing reason for it is to be able to
> >> implement vhost-level resetting in virtiofsd, but that is not related to
> >> migration.
> > QEMU sends VHOST_USER_SET_STATUS 0 in vhost_dev_stop(). Are you assuming
> > that virtiofs back-ends do not reset the device upon receiving this
> > message?
>
> Absolutely. vhost_dev_stop() is not in the migration-specific path, but
> is called whenever the vCPUs are stopped, which indeed happens to be
> part of migration, but is also used in other cases, like QMP stop. We
> have identified that it is wrong that we reset the back-end just because
> the vCPUs are stopped (e.g. on migration), but it is what we do right
> now when the VM is paused (e.g. through QMP stop).
>
> Therefore, stateful back-ends cannot implement reset lest stop/cont
> breaks the device. I don’t think anybody really cares whether a
> vhost-user back-end actually resets its internal state (if there is any)
> when the guest driver asks for a reset on the virtio level, as long as
> the guest driver is then able to initialize the device afterwards.
Some devices send configuration change notifications. For example,
virtio-net speed and duplex changes.
Imagine a network boot ROM runs and the firmware resets the virtio-net
device when transferring control to the loaded kernel. Before the kernel
driver initializes the device again, the vhost-user-net back-end reports a
speed or duplex change and sends a Configuration Change Notification to the
guest. The guest receives a spurious interrupt because the vhost-user-net
device wasn't actually reset.
I'm concerned that ignoring reset matters (admittedly in corner cases) and
think that stateful device functionality shouldn't be added to the
vhost-user protocol without a solution for reset. This patch series changes
the vhost-user protocol, which is used by many different devices, not just
virtiofs. The solution should work for vhost-user devices of any type and
not be based on what we can get away with when running the current QEMU +
virtiofsd.
> I do
> think people care that stop/cont works, so it follows to me that no
> stateful back-end will reset its internal state when receiving a
> virtio/vhost reset. If they do, stop/cont breaks, which is a
> user-visible bug that needs to be addressed – either properly by
> implementing SUSPEND/RESUME in both qemu and the affected back-ends, or
> by using a similar work-around to virtiofsd, which is to ignore reset
> commands.
>
Properly, please.
> It’s hard for me to imagine that people don’t care that stop/cont breaks
> some vhost-user back-end, but suddenly would start to care that
> migration doesn’t work – especially given that first of all someone will
> need to manually implement any migration support in that back-end even
> with this series, which means that really, the only back-end we are
> talking about here is our virtiofsd. To this day I’m not even aware of
> any other back-end that has internal state.
>
Another one I can think of is vhost-user-gpu.
Why did you give up on implementing SUSPEND/RESUME?
Stefan
> Hanna
* Re: [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings
2023-09-26 13:54 ` Hanna Czenczek
@ 2023-09-26 19:30 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-26 19:30 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, qemu-devel, virtio-fs, Michael S . Tsirkin,
Eugenio Pérez, German Maglione
On Tue, 26 Sept 2023 at 09:55, Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 25.09.23 21:15, Stefan Hajnoczi wrote:
> > On Fri, Sep 15, 2023 at 12:25:27PM +0200, Hanna Czenczek wrote:
> >> Currently, the vhost-user documentation says that rings are to be
> >> initialized in a disabled state when VHOST_USER_F_PROTOCOL_FEATURES is
> >> negotiated. However, by the time of feature negotiation, all rings have
> >> already been initialized, so it is not entirely clear what this means.
> >>
> >> At least the vhost-user-backend Rust crate's implementation interpreted
> >> it to mean that whenever this feature is negotiated, all rings are to be
> >> put into a disabled state, which means that every SET_FEATURES call
> >> would disable all rings, effectively halting the device. This is
> >> problematic because the VHOST_F_LOG_ALL feature is also set or cleared
> >> this way, which happens during migration. Doing so should not halt the
> >> device.
> >>
> >> Other implementations have interpreted this to mean that the device is
> >> to be initialized with all rings disabled, and a subsequent SET_FEATURES
> >> call that does not set VHOST_USER_F_PROTOCOL_FEATURES will enable all of
> >> them. Here, SET_FEATURES will never disable any ring.
> >>
> >> This interpretation does not suffer the problem of unintentionally
> >> halting the device whenever features are set or cleared, so it seems
> >> better and more reasonable.
> >>
> >> We should clarify this in the documentation.
> >>
> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> >> ---
> >> docs/interop/vhost-user.rst | 20 ++++++++++++++------
> >> 1 file changed, 14 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> index bb4dd0fd60..9b9b802c60 100644
> >> --- a/docs/interop/vhost-user.rst
> >> +++ b/docs/interop/vhost-user.rst
> >> @@ -409,12 +409,20 @@ and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
> >>
> >> Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
> >>
> >> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
> >> -ring starts directly in the enabled state.
> >> -
> >> -If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
> >> -initialized in a disabled state and is enabled by
> >> -``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.
> >> +If ``VHOST_USER_SET_FEATURES`` does not negotiate
> >> +``VHOST_USER_F_PROTOCOL_FEATURES``, rings are enabled immediately when
> >> +started.
> >> +
> >> +If ``VHOST_USER_SET_FEATURES`` does negotiate
> >> +``VHOST_USER_F_PROTOCOL_FEATURES``, each ring will remain in the disabled
> >> +state until ``VHOST_USER_SET_VRING_ENABLE`` enables it with parameter 1.
> >> +
> >> +Back-end implementations that support ``VHOST_USER_F_PROTOCOL_FEATURES``
> >> +should implement this by initializing each ring in a disabled state, and
> >> +enabling them when ``VHOST_USER_SET_FEATURES`` is used without
> >> +negotiating ``VHOST_USER_F_PROTOCOL_FEATURES``. Other than that, rings
> >> +should only be enabled and disabled through
> >> +``VHOST_USER_SET_VRING_ENABLE``.
> > The "Ring states" section starts by saying there are three states:
> > "stopped", "started but disabled", and "started and enabled". But this
> > patch talks about a "disabled state". Can you rephrase this patch to use
> > the exact state names defined earlier in the spec?
>
> I would not want to do that. We had the exact problem that the spec
> wanted to remain high-level, and was not reflecting exactly what
> existing implementations did, which resulted in confusion (at least for
> me and the vhost Rust crates authors).
>
> Notably, the existing implementations I’m aware of track
> enabled/disabled even before the ring is started, exactly as formulated
> here.
>
> If we changed this to read something like “If VHOST_USER_SET_FEATURES is
> ever called without negotiating VHOST_USER_F_PROTOCOL_FEATURES, the ring
> must be enabled immediately when it is started; otherwise, when the ring
> is started and VHOST_USER_F_PROTOCOL_FEATURES has always been set in
> every VHOST_USER_SET_FEATURES call, the ring should be disabled when
> started.” then this would conflict with the existing implementations:
>
> We never disallow VHOST_USER_SET_VRING_ENABLE when the ring is stopped.
> Existing implementations track enabled/disabled before the rings are
> started, and they initialize this state to “disabled”, setting it to
> “enabled” on receiving VHOST_USER_SET_FEATURES without
> VHOST_USER_F_PROTOCOL_FEATURES, as described above. Therefore, if you
> call VHOST_USER_SET_VRING_ENABLE 1 before the ring is started, the ring
> will start enabled even with VHOST_USER_F_PROTOCOL_FEATURES. This is
> not possible if you only have three states.
>
> Maybe we should rather clarify that enabled/disabled is tracked even
> while the ring is stopped.
Yes, please. The beginning of the "Ring states" section is confusing
because it claims there are three states but later text doesn't use
those states. Instead it treats started/stopped and enabled/disabled
as independent.
If the entire "Ring states" section consistently talks about
started/stopped and enabled/disabled as independent, then that would
make it easier to understand.
Stefan
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-26 19:20 ` Stefan Hajnoczi
@ 2023-09-26 20:10 ` Stefan Hajnoczi
2023-09-27 8:32 ` Hanna Czenczek
2023-09-27 8:13 ` Hanna Czenczek
1 sibling, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-26 20:10 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, open list:virtiofs, Eugenio Pérez,
qemu-devel, Michael S . Tsirkin
Hi Hanna,
I was thinking about how this could work without SUSPEND/RESUME. What
do you think of the following?
1. The front-end sends VHOST_USER_RESET_DEVICE (or
VHOST_USER_RESET_OWNER, when necessary) when the guest driver resets
the device but not on vhost_dev_start()/vhost_dev_stop().
2. Suspend the device when all virtqueues are stopped via
VHOST_USER_GET_VRING_BASE. Resume the device after at least one
virtqueue is started and enabled.
3. Ignore VHOST_USER_SET_STATUS.
Reset would work. The device would suspend and resume without losing
state. Existing vhost-user backends already behave like this in
practice (they often don't implement RESET_DEVICE).
It's close enough to what you're proposing that it doesn't require
much additional work, but I think it covers the cases.
Two concerns:
1. It's specific to vhost-user and diverges from vDPA.
2. VHOST_USER_SET_STATUS might be needed in the future even though
it's not useful today.
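A rough sketch of how a back-end could derive rules 2 and 3 from vring state alone (illustrative Rust; the structure and names are assumptions, not an existing implementation):

```rust
// Sketch: suspend/resume derived purely from vring state, per the
// proposal above. Illustrative only.

#[derive(Clone, Copy, Default)]
struct Vring {
    started: bool,
    enabled: bool,
}

struct Device {
    vrings: Vec<Vring>,
    suspended: bool,
}

impl Device {
    fn new(n: usize) -> Self {
        Device { vrings: vec![Vring::default(); n], suspended: true }
    }

    fn update_suspend_state(&mut self) {
        if self.vrings.iter().all(|v| !v.started) {
            // All virtqueues stopped: suspend (internal state is kept).
            self.suspended = true;
        } else if self.vrings.iter().any(|v| v.started && v.enabled) {
            // At least one virtqueue started and enabled: resume.
            self.suspended = false;
        }
    }

    // VHOST_USER_GET_VRING_BASE stops the ring.
    fn get_vring_base(&mut self, i: usize) {
        self.vrings[i].started = false;
        self.update_suspend_state();
    }

    fn start_vring(&mut self, i: usize) {
        self.vrings[i].started = true;
        self.update_suspend_state();
    }

    fn set_vring_enable(&mut self, i: usize, enable: bool) {
        self.vrings[i].enabled = enable;
        self.update_suspend_state();
    }

    // Rule 3: VHOST_USER_SET_STATUS is ignored entirely.
    fn set_status(&mut self, _status: u8) {}
}
```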
Stefan
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-26 19:20 ` Stefan Hajnoczi
2023-09-26 20:10 ` Stefan Hajnoczi
@ 2023-09-27 8:13 ` Hanna Czenczek
1 sibling, 0 replies; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-27 8:13 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, open list:virtiofs, Eugenio Pérez,
qemu-devel, Michael S . Tsirkin
On 26.09.23 21:20, Stefan Hajnoczi wrote:
>
>
> On Tue, Sep 26, 2023, 09:33 Hanna Czenczek <hreitz@redhat.com> wrote:
>
> On 25.09.23 22:48, Stefan Hajnoczi wrote:
> > On Fri, Sep 15, 2023 at 12:25:25PM +0200, Hanna Czenczek wrote:
> >> RFC:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
> >>
> >> v1:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html
> >>
> >> v2:
> >> https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02604.html
> >>
> >> Hi,
> >>
> >> I’ve decided not to work on vhost-user SUSPEND/RESUME for now – it is
> >> not technically required for virtio-fs migration, which is the actual
> >> priority for me now. While we do want to have SUSPEND/RESUME at some
> >> point, the only practically existing reason for it is to be able to
> >> implement vhost-level resetting in virtiofsd, but that is not related to
> >> migration.
> > QEMU sends VHOST_USER_SET_STATUS 0 in vhost_dev_stop(). Are you assuming
> > that virtiofs back-ends do not reset the device upon receiving this
> > message?
>
>
> Absolutely. vhost_dev_stop() is not in the migration-specific path, but
> is called whenever the vCPUs are stopped, which indeed happens to be
> part of migration, but is also used in other cases, like QMP stop. We
> have identified that it is wrong that we reset the back-end just because
> the vCPUs are stopped (e.g. on migration), but it is what we do right
> now when the VM is paused (e.g. through QMP stop).
>
> Therefore, stateful back-ends cannot implement reset lest stop/cont
> breaks the device. I don’t think anybody really cares whether a
> vhost-user back-end actually resets its internal state (if there is any)
> when the guest driver asks for a reset on the virtio level, as long as
> the guest driver is then able to initialize the device afterwards.
> the guest driver is then able to initialize the device afterwards.
>
>
> Some devices send configuration change notifications. For example,
> virtio-net speed and duplex changes.
>
> Imagine a network boot ROM runs and the firmware resets the virtio-net
> device when transferring control to the loaded kernel. Before the
> kernel driver initializes the device again, the vhost-user-net
> back-end reports a speed or duplex change and sends a Configuration
> Change Notification to the guest. The guest receives a spurious
> interrupt because the vhost-user-net device wasn't actually reset.
I don’t see how this relates to my argument that no stateful back-end
can implement a full reset because doing so would break stop/cont.
If vhost-user-net were stateful (which it isn’t, AFAIK), it could choose
to implement a work-around such that it would stop sending notifications
on reset, but not reset its internal state. Then, when qemu restores
vring state in vhost_dev_start(), it would resume sending
notifications. But again, I fail to see how this is not already an
issue for stop/cont.
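For illustration, such a work-around might look roughly like this (entirely hypothetical sketch of a stateful back-end; all names are made up):

```rust
// Hypothetical sketch of the work-around described above: a reset
// suppresses notifications but does not clear internal state.

struct NetBackend {
    internal_state: Vec<u8>, // stands in for device-specific state
    notifications_allowed: bool,
}

impl NetBackend {
    // On a virtio-level reset request: do NOT clear internal_state
    // (that would break stop/cont); only stop sending notifications.
    fn reset(&mut self) {
        self.notifications_allowed = false;
    }

    // Called once the front-end has restored vring state in
    // vhost_dev_start(): notifications may flow again.
    fn vrings_restored(&mut self) {
        self.notifications_allowed = true;
    }

    // A configuration change notification is delivered only if allowed,
    // avoiding the spurious interrupt between reset and re-initialization.
    fn notify_config_change(&self) -> bool {
        self.notifications_allowed
    }
}
```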
> I'm concerned that ignoring reset matters (admittedly in corner cases)
> and think that stateful device functionality shouldn't be added to the
> vhost-user protocol without a solution for reset.
I disagree. We have a stateful device already, whether we add
functionality acknowledging this to the protocol or not. The problem
exists. It is independent of migration. If there’s a problem because
of this with migration, there’s a problem with stop/cont, too, that must
already have been worked around.
> This patch series changes the vhost-user protocol, which is used by
> many different devices, not just virtiofs. The solution should work
> for vhost-user devices of any type and not be based on what we can get
> away with when running the current QEMU + virtiofsd.
My argument was generic. Any existing stateful device implementation
must implement reset in such a way that it won’t break stop/cont, i.e.,
it must not reset its internal state.
>
> I do
> think people care that stop/cont works, so it follows to me that no
> stateful back-end will reset its internal state when receiving a
> virtio/vhost reset. If they do, stop/cont breaks, which is a
> user-visible bug that needs to be addressed – either properly by
> implementing SUSPEND/RESUME in both qemu and the affected back-ends, or
> by using a similar work-around to virtiofsd, which is to ignore reset
> commands.
>
>
> Properly, please.
You misunderstand me. I’m not presenting the choice I have. I’m
presenting the choices existing implementations *have had until this
point*. *None* chose to do it properly. I don’t know of stateful
implementations besides virtiofsd, but virtiofsd chose to be content
with not implementing reset and thus having things “just work”.
The work-arounds must exist already.
>
>
> It’s hard for me to imagine that people don’t care that stop/cont breaks
> some vhost-user back-end, but suddenly would start to care that
> migration doesn’t work – especially given that first of all someone will
> need to manually implement any migration support in that back-end even
> with this series, which means that really, the only back-end we are
> talking about here is our virtiofsd. To this day I’m not even aware of
> any other back-end that has internal state.
>
>
> Another one I can think of is vhost-user-gpu.
I sure hope stop/cont works for them.
> Why did you give up on implementing SUSPEND/RESUME?
Because I think it’s unnecessary for implementing migration, and
migration is what’s on my priority list. None of these issues are new:
they have always existed with stop/cont, and work-arounds must already be
in place to make stop/cont work. Because I don’t see a difference between
how stop/cont is used outside of migration and how it is used during
migration, I assume those work-arounds must work for migration as well.
Implementing virtio-fs migration is a chain of dependencies. We need at
least the specification to be in qemu before we can start sending merge
requests to the vhost Rust crates to implement support there. We need
that support there before we can make the changes to virtiofsd.
Adding SUSPEND/RESUME adds another hard dependency to the whole
discussion (it would have to go in before this series), and the past
months have shown quite clearly that it is a very complex, finicky issue
that would still take a ton of time. I can’t justify that for myself,
given that I don’t see any practically existing problem.
PS: As far as I remember, vhost-user doesn’t even have a working reset
today. vhost_dev_stop() calls vhost_reset_status(), which is a no-op
unless the back-end supports SET_STATUS. The only back-end
implementation we found (while discussing SUSPEND/RESUME) to support
SET_STATUS was dpdk, but while it logs SET_STATUS 0 as a reset, it
doesn’t do a reset, i.e. doesn’t call reset_device(), which it would do
on RESET_OWNER.
Hanna
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-26 20:10 ` Stefan Hajnoczi
@ 2023-09-27 8:32 ` Hanna Czenczek
2023-09-27 20:19 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Hanna Czenczek @ 2023-09-27 8:32 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, open list:virtiofs, Eugenio Pérez,
qemu-devel, Michael S . Tsirkin
On 26.09.23 22:10, Stefan Hajnoczi wrote:
> Hi Hanna,
> I was thinking about how this could work without SUSPEND/RESUME. What
> do you think of the following?
>
> 1. The front-end sends VHOST_USER_RESET_DEVICE (or
> VHOST_USER_RESET_OWNER, when necessary) when the guest driver resets
> the device but not on vhost_dev_start()/vhost_dev_stop().
This is half the work of SUSPEND/RESUME. It isn’t easy to do.
> 2. Suspend the device when all virtqueues are stopped via
> VHOST_USER_GET_VRING_BASE. Resume the device after at least one
> virtqueue is started and enabled.
> 3. Ignore VHOST_USER_SET_STATUS.
>
> Reset would work. The device would suspend and resume without losing
> state. Existing vhost-user backends already behave like this in
> practice (they often don't implement RESET_DEVICE).
I don’t understand the point, though. Today, reset in practice is a
no-op anyway, precisely because we only send SET_STATUS 0, don’t fall
back to RESET_OWNER/RESET_DEVICE, and no back-end implements SET_STATUS
0 as a
reset. By sending RESET_* in case of a guest-initiated reset and
nothing in case of stop/cont, we effectively don’t change anything about
the latter (which is what SUSPEND/RESUME would be for), but only fix the
former case. While I agree that it’s wrong that we don’t really reset
the back-end in case of a guest-initiated reset, this is the first time
in this whole discussion that that part has been presented as a problem
that needs fixing now.
So the proposal effectively changes nothing for the
vhost_dev_stop()/start() case where we’d want to make use of
SUSPEND/RESUME, but only for the case where we would not use it.
Hanna
* Re: [Virtio-fs] [PATCH v3 0/5] vhost-user: Back-end state migration
2023-09-27 8:32 ` Hanna Czenczek
@ 2023-09-27 20:19 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-27 20:19 UTC (permalink / raw)
To: Hanna Czenczek
Cc: Stefan Hajnoczi, open list:virtiofs, Eugenio Pérez,
qemu-devel, Michael S . Tsirkin
On Wed, Sep 27, 2023 at 10:32:14AM +0200, Hanna Czenczek wrote:
> On 26.09.23 22:10, Stefan Hajnoczi wrote:
> > Hi Hanna,
> > I was thinking about how this could work without SUSPEND/RESUME. What
> > do you think of the following?
> >
> > 1. The front-end sends VHOST_USER_RESET_DEVICE (or
> > VHOST_USER_RESET_OWNER, when necessary) when the guest driver resets
> > the device but not on vhost_dev_start()/vhost_dev_stop().
>
> This is half the work of SUSPEND/RESUME. It isn’t easy to do.
I sent a patch series to bring VHOST_USER_RESET_DEVICE to all vhost-user
devices:
https://lore.kernel.org/qemu-devel/20230927192737.528280-1-stefanha@redhat.com/T/#t
>
> > 2. Suspend the device when all virtqueues are stopped via
> > VHOST_USER_GET_VRING_BASE. Resume the device after at least one
> > virtqueue is started and enabled.
> > 3. Ignore VHOST_USER_SET_STATUS.
> >
> > Reset would work. The device would suspend and resume without losing
> > state. Existing vhost-user backends already behave like this in
> > practice (they often don't implement RESET_DEVICE).
>
> I don’t understand the point, though. Today, reset in practice is a no-op
> anyway, precisely because we only send SET_STATUS 0, don’t fall back to
> RESET_OWNER/RESET_DEVICE, and no back-end implements SET_STATUS 0 as a
> reset. By sending RESET_* in case of a guest-initiated reset and nothing in
> case of stop/cont, we effectively don’t change anything about the latter
> (which is what SUSPEND/RESUME would be for), but only fix the former case.
> While I agree that it’s wrong that we don’t really reset the back-end in
> case of a guest-initiated reset, this is the first time in this whole
> discussion that that part has been presented as a problem that needs fixing
> now.
>
> So the proposal effectively changes nothing for the vhost_dev_stop()/start()
> case where we’d want to make use of SUSPEND/RESUME, but only for the case
> where we would not use it.
We discussed this on a call today. 2 & 3 are additions to the spec that
Hanna has agreed to work on.
Stefan
end of thread, other threads:[~2023-09-27 20:20 UTC | newest]
Thread overview: 20+ messages
2023-09-15 10:25 [PATCH v3 0/5] vhost-user: Back-end state migration Hanna Czenczek
2023-09-15 10:25 ` [PATCH v3 1/5] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-09-25 19:04 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 2/5] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
2023-09-25 19:15 ` Stefan Hajnoczi
2023-09-26 13:54 ` Hanna Czenczek
2023-09-26 19:30 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 3/5] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-09-25 20:18 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 4/5] vhost: Add high-level state save/load functions Hanna Czenczek
2023-09-25 20:23 ` Stefan Hajnoczi
2023-09-15 10:25 ` [PATCH v3 5/5] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-09-25 20:26 ` Stefan Hajnoczi
2023-09-25 20:48 ` [PATCH v3 0/5] vhost-user: Back-end state migration Stefan Hajnoczi
2023-09-26 13:32 ` [Virtio-fs] " Hanna Czenczek
2023-09-26 19:20 ` Stefan Hajnoczi
2023-09-26 20:10 ` Stefan Hajnoczi
2023-09-27 8:32 ` Hanna Czenczek
2023-09-27 20:19 ` Stefan Hajnoczi
2023-09-27 8:13 ` Hanna Czenczek