From: Stefan Hajnoczi <stefanha@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, virtio-fs@redhat.com,
German Maglione <gmaglione@redhat.com>,
Anton Kuchin <antonkuchin@yandex-team.ru>,
Juan Quintela <quintela@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [PATCH 2/4] vhost-user: Interface for migration state transfer
Date: Mon, 17 Apr 2023 13:14:05 -0400 [thread overview]
Message-ID: <20230417171405.GA3890522@fedora> (raw)
In-Reply-To: <CAJaqyWfm=g_hr9=WpsnwJ4hdpVb7K7p5rirWjvx=PxKYUp8trA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7698 bytes --]
On Thu, Apr 13, 2023 at 12:14:24PM +0200, Eugenio Perez Martin wrote:
> On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > > So-called "internal" virtio-fs migration refers to transporting the
> > > back-end's (virtiofsd's) state through qemu's migration stream. To do
> > > this, we need to be able to transfer virtiofsd's internal state to and
> > > from virtiofsd.
> > >
> > > Because virtiofsd's internal state will not be too large, we believe it
> > > is best to transfer it as a single binary blob after the streaming
> > > phase. Because this method should be useful to other vhost-user
> > > implementations, too, it is introduced as a general-purpose addition to
> > > the protocol, not limited to vhost-user-fs.
> > >
> > > These are the additions to the protocol:
> > > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> > > This feature signals support for transferring state, and is added so
> > > that migration can fail early when the back-end has no support.
> > >
> > > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> > > over which to transfer the state. The front-end sends an FD to the
> > > back-end into/from which it can write/read its state, and the back-end
> > > can decide to either use it, or reply with a different FD for the
> > > front-end to override the front-end's choice.
> > > The front-end creates a simple pipe to transfer the state, but maybe
> > > the back-end already has an FD into/from which it has to write/read
> > > its state, in which case it will want to override the simple pipe.
> > > Conversely, maybe in the future we find a way to have the front-end
> > > get an immediate FD for the migration stream (in some cases), in which
> > > case we will want to send this to the back-end instead of creating a
> > > pipe.
> > > Hence the negotiation: If one side has a better idea than a plain
> > > pipe, we will want to use that.
> > >
> > > - CHECK_DEVICE_STATE: After the state has been transferred through the
> > > pipe (the end indicated by EOF), the front-end invokes this function
> > > to verify success. There is no in-band way (through the pipe) to
> > > indicate failure, so we need to check explicitly.
> > >
> > > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > > (which includes establishing the direction of transfer and migration
> > > phase), the sending side writes its data into the pipe, and the reading
> > > side reads it until it sees an EOF. Then, the front-end will check for
> > > success via CHECK_DEVICE_STATE, which on the destination side includes
> > > checking for integrity (i.e. errors during deserialization).
> > >
> > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > > ---
> > > include/hw/virtio/vhost-backend.h | 24 +++++
> > > include/hw/virtio/vhost.h | 79 ++++++++++++++++
> > > hw/virtio/vhost-user.c | 147 ++++++++++++++++++++++++++++++
> > > hw/virtio/vhost.c | 37 ++++++++
> > > 4 files changed, 287 insertions(+)
> > >
> > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> > > index ec3fbae58d..5935b32fe3 100644
> > > --- a/include/hw/virtio/vhost-backend.h
> > > +++ b/include/hw/virtio/vhost-backend.h
> > > @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
> > > VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> > > } VhostSetConfigType;
> > >
> > > +typedef enum VhostDeviceStateDirection {
> > > + /* Transfer state from back-end (device) to front-end */
> > > + VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> > > + /* Transfer state from front-end to back-end (device) */
> > > + VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
> > > +} VhostDeviceStateDirection;
> > > +
> > > +typedef enum VhostDeviceStatePhase {
> > > + /* The device (and all its vrings) is stopped */
> > > + VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
> > > +} VhostDeviceStatePhase;
> >
> > vDPA has:
> >
> > /* Suspend a device so it does not process virtqueue requests anymore
> > *
> > * After the return of ioctl the device must preserve all the necessary state
> > * (the virtqueue vring base plus the possible device specific states) that is
> > * required for restoring in the future. The device must not change its
> > * configuration after that point.
> > */
> > #define VHOST_VDPA_SUSPEND _IO(VHOST_VIRTIO, 0x7D)
> >
> > /* Resume a device so it can resume processing virtqueue requests
> > *
> > * After the return of this ioctl the device will have restored all the
> > * necessary states and it is fully operational to continue processing the
> > * virtqueue descriptors.
> > */
> > #define VHOST_VDPA_RESUME _IO(VHOST_VIRTIO, 0x7E)
> >
> > I wonder if it makes sense to import these into vhost-user so that the
> > difference between kernel vhost and vhost-user is minimized. It's okay
> > if one of them is ahead of the other, but it would be nice to avoid
> > overlapping/duplicated functionality.
> >
>
> That's what I had in mind in the first versions. I proposed VHOST_STOP
> instead of VHOST_VDPA_STOP for this very reason. Later it did change
> to SUSPEND.
>
> Generally it is better if we make the interface less parametrized and
> we trust in the messages and its semantics in my opinion. In other
> words, instead of
> vhost_set_device_state_fd_op(VHOST_TRANSFER_STATE_PHASE_STOPPED), send
> individually the equivalent of VHOST_VDPA_SUSPEND vhost-user command.
>
> Another way to apply this is with the "direction" parameter. Maybe it
> is better to split it into "set_state_fd" and "get_state_fd"?
>
> In that case, reusing the ioctls as vhost-user messages would be ok.
> But that puts this proposal further from the VFIO code, which uses
> "migration_set_state(state)", and maybe it is better when the number
> of states is high.
Hi Eugenio,
Another question about vDPA suspend/resume:
/* Host notifiers must be enabled at this point. */
void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
{
int i;
/* should only be called after backend is connected */
assert(hdev->vhost_ops);
event_notifier_test_and_clear(
&hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
event_notifier_test_and_clear(&vdev->config_notifier);
trace_vhost_dev_stop(hdev, vdev->name, vrings);
if (hdev->vhost_ops->vhost_dev_start) {
hdev->vhost_ops->vhost_dev_start(hdev, false);
^^^ SUSPEND ^^^
}
if (vrings) {
vhost_dev_set_vring_enable(hdev, false);
}
for (i = 0; i < hdev->nvqs; ++i) {
vhost_virtqueue_stop(hdev,
vdev,
hdev->vqs + i,
hdev->vq_index + i);
^^^ fetch virtqueue state from kernel ^^^
}
if (hdev->vhost_ops->vhost_reset_status) {
hdev->vhost_ops->vhost_reset_status(hdev);
^^^ reset device^^^
I noticed the QEMU vDPA code resets the device in vhost_dev_stop() ->
vhost_reset_status(). The device's migration code runs after
vhost_dev_stop() and the state will have been lost.
It looks like vDPA changes are necessary in order to support stateful
devices even though QEMU already uses SUSPEND. Is my understanding
correct?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2023-04-17 17:15 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-11 15:05 [PATCH 0/4] vhost-user-fs: Internal migration Hanna Czenczek
2023-04-11 15:05 ` [PATCH 1/4] vhost: Re-enable vrings after setting features Hanna Czenczek
2023-04-12 10:55 ` German Maglione
2023-04-12 12:18 ` Hanna Czenczek
2023-04-12 20:51 ` Stefan Hajnoczi
2023-04-13 7:17 ` Maxime Coquelin
2023-04-13 8:19 ` Hanna Czenczek
2023-04-13 11:03 ` Stefan Hajnoczi
2023-04-13 14:24 ` Anton Kuchin
2023-04-13 15:48 ` Michael S. Tsirkin
2023-04-13 11:03 ` Stefan Hajnoczi
2023-04-13 17:32 ` Hanna Czenczek
2023-04-13 13:19 ` Michael S. Tsirkin
2023-04-11 15:05 ` [PATCH 2/4] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-04-12 21:06 ` Stefan Hajnoczi
2023-04-13 9:24 ` Hanna Czenczek
2023-04-13 11:38 ` Stefan Hajnoczi
2023-04-13 17:55 ` Hanna Czenczek
2023-04-13 20:42 ` Stefan Hajnoczi
2023-04-14 15:17 ` Eugenio Perez Martin
2023-04-17 15:18 ` Stefan Hajnoczi
2023-04-17 18:55 ` Eugenio Perez Martin
2023-04-17 19:08 ` Stefan Hajnoczi
2023-04-17 19:11 ` Eugenio Perez Martin
2023-04-17 19:46 ` Stefan Hajnoczi
2023-04-18 10:09 ` Eugenio Perez Martin
2023-04-19 10:45 ` Hanna Czenczek
2023-04-19 10:57 ` Stefan Hajnoczi
2023-04-13 10:14 ` Eugenio Perez Martin
2023-04-13 11:07 ` Stefan Hajnoczi
2023-04-13 17:31 ` Hanna Czenczek
2023-04-17 15:12 ` Stefan Hajnoczi
2023-04-19 10:47 ` Hanna Czenczek
2023-04-17 18:37 ` Eugenio Perez Martin
2023-04-17 15:38 ` Stefan Hajnoczi
2023-04-17 19:09 ` Eugenio Perez Martin
2023-04-17 19:33 ` Stefan Hajnoczi
2023-04-18 8:09 ` Eugenio Perez Martin
2023-04-18 17:59 ` Stefan Hajnoczi
2023-04-18 18:31 ` Eugenio Perez Martin
2023-04-18 20:40 ` Stefan Hajnoczi
2023-04-20 13:27 ` Eugenio Pérez
2023-05-08 19:12 ` Stefan Hajnoczi
2023-05-09 6:31 ` Eugenio Perez Martin
2023-05-09 9:01 ` Hanna Czenczek
2023-05-09 15:26 ` Eugenio Perez Martin
2023-04-19 10:57 ` [Virtio-fs] " Hanna Czenczek
2023-04-19 11:10 ` Stefan Hajnoczi
2023-04-19 11:15 ` Hanna Czenczek
2023-04-19 11:24 ` Stefan Hajnoczi
2023-04-17 17:14 ` Stefan Hajnoczi [this message]
2023-04-17 19:06 ` Eugenio Perez Martin
2023-04-17 19:20 ` Stefan Hajnoczi
2023-04-18 7:54 ` Eugenio Perez Martin
2023-04-19 11:10 ` Hanna Czenczek
2023-04-19 11:21 ` Stefan Hajnoczi
2023-04-19 11:24 ` Hanna Czenczek
2023-04-20 13:29 ` Eugenio Pérez
2023-05-08 20:10 ` Stefan Hajnoczi
2023-05-09 6:45 ` Eugenio Perez Martin
2023-05-09 15:09 ` Stefan Hajnoczi
2023-05-09 15:35 ` Eugenio Perez Martin
2023-05-09 17:33 ` Stefan Hajnoczi
2023-04-20 10:44 ` Eugenio Pérez
2023-04-13 8:50 ` Eugenio Perez Martin
2023-04-13 9:25 ` Hanna Czenczek
2023-04-11 15:05 ` [PATCH 3/4] vhost: Add high-level state save/load functions Hanna Czenczek
2023-04-12 21:14 ` Stefan Hajnoczi
2023-04-13 9:04 ` Hanna Czenczek
2023-04-13 11:22 ` Stefan Hajnoczi
2023-04-11 15:05 ` [PATCH 4/4] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-04-12 21:00 ` [PATCH 0/4] vhost-user-fs: Internal migration Stefan Hajnoczi
2023-04-13 8:20 ` Hanna Czenczek
2023-04-13 16:11 ` Michael S. Tsirkin
2023-04-13 17:53 ` [Virtio-fs] " Hanna Czenczek
2023-05-04 16:05 ` Hanna Czenczek
2023-05-04 21:14 ` Stefan Hajnoczi
2023-05-05 9:03 ` Hanna Czenczek
2023-05-05 9:51 ` Hanna Czenczek
2023-05-05 14:26 ` Eugenio Perez Martin
2023-05-05 14:37 ` Hanna Czenczek
2023-05-08 17:00 ` Hanna Czenczek
2023-05-08 17:51 ` Eugenio Perez Martin
2023-05-08 19:31 ` Eugenio Perez Martin
2023-05-09 8:59 ` Hanna Czenczek
2023-05-09 15:30 ` Stefan Hajnoczi
2023-05-09 15:43 ` Eugenio Perez Martin
2023-05-05 9:53 ` Eugenio Perez Martin
2023-05-05 12:51 ` Hanna Czenczek
2023-05-08 21:10 ` Stefan Hajnoczi
2023-05-09 8:53 ` Hanna Czenczek
2023-05-09 14:53 ` Stefan Hajnoczi
2023-05-09 15:41 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230417171405.GA3890522@fedora \
--to=stefanha@redhat.com \
--cc=antonkuchin@yandex-team.ru \
--cc=eperezma@redhat.com \
--cc=gmaglione@redhat.com \
--cc=hreitz@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=sgarzare@redhat.com \
--cc=virtio-fs@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).