From: Hao Xu <hao.xu@linux.dev>
To: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, virtio-fs@redhat.com
Cc: "Eugenio Pérez" <eperezma@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
Date: Thu, 20 Jul 2023 23:05:24 +0800
Message-ID: <d3d992ff-3d3d-1282-cdbc-f7ec1ad175cb@linux.dev>
In-Reply-To: <14677814-9707-6a08-7b07-4532fad5f7a1@redhat.com>
On 7/20/23 21:20, Hanna Czenczek wrote:
> On 20.07.23 14:13, Hao Xu wrote:
>>
>> On 7/12/23 19:17, Hanna Czenczek wrote:
>>> Add the interface for transferring the back-end's state during
>>> migration
>>> as defined previously in vhost-user.rst.
>>>
>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>>> ---
>>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>>   hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>>>   hw/virtio/vhost.c                 |  37 ++++++++
>>>   4 files changed, 287 insertions(+)
>>>
>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
>>> index 31a251a9f5..e59d0b53f8 100644
>>> --- a/include/hw/virtio/vhost-backend.h
>>> +++ b/include/hw/virtio/vhost-backend.h
>>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>> VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>> } VhostSetConfigType;
>>> +typedef enum VhostDeviceStateDirection {
>>> + /* Transfer state from back-end (device) to front-end */
>>> + VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>>> + /* Transfer state from front-end to back-end (device) */
>>> + VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>>> +} VhostDeviceStateDirection;
>>> +
>>> +typedef enum VhostDeviceStatePhase {
>>> + /* The device (and all its vrings) is stopped */
>>> + VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>>> +} VhostDeviceStatePhase;
>>> +
>>> struct vhost_inflight;
>>> struct vhost_dev;
>>> struct vhost_log;
>>> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>>> typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>> +typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
>>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>>> + VhostDeviceStateDirection direction,
>>> + VhostDeviceStatePhase phase,
>>> + int fd,
>>> + int *reply_fd,
>>> + Error **errp);
>>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
>>> +
>>> typedef struct VhostOps {
>>> VhostBackendType backend_type;
>>> vhost_backend_init vhost_backend_init;
>>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>> vhost_force_iommu_op vhost_force_iommu;
>>> vhost_set_config_call_op vhost_set_config_call;
>>> vhost_reset_status_op vhost_reset_status;
>>> + vhost_supports_migratory_state_op vhost_supports_migratory_state;
>>> + vhost_set_device_state_fd_op vhost_set_device_state_fd;
>>> + vhost_check_device_state_op vhost_check_device_state;
>>> } VhostOps;
>>> int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>> index 69bf59d630..d8877496e5 100644
>>> --- a/include/hw/virtio/vhost.h
>>> +++ b/include/hw/virtio/vhost.h
>>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>>> int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
>>> struct vhost_inflight *inflight);
>>> bool vhost_dev_has_iommu(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_supports_migratory_state(): Checks whether the back-end
>>> + * supports transferring internal state for the purpose of migration.
>>> + * Support for this feature is required for vhost_set_device_state_fd()
>>> + * and vhost_check_device_state().
>>> + *
>>> + * @dev: The vhost device
>>> + *
>>> + * Returns true if the device supports these commands, and false if it
>>> + * does not.
>>> + */
>>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_set_device_state_fd(): Begin transfer of internal state
>>> + * from/to the back-end for the purpose of migration.  Data is to be
>>> + * transferred over a pipe according to @direction and @phase.  The
>>> + * sending end must only write to the pipe, and the receiving end
>>> + * must only read from it.  Once the sending end is done, it closes
>>> + * its FD.  The receiving end must take this as the end-of-transfer
>>> + * signal and close its FD, too.
>>> + *
>>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, and
>>> + * the read FD for LOAD.  This function transfers ownership of @fd to
>>> + * the back-end, i.e. closes it in the front-end.
>>> + *
>>> + * The back-end may optionally reply with an FD of its own, if this
>>> + * improves efficiency on its end.  In this case, the returned FD is
>>
>>
>> Hi Hanna,
>>
>> In what case/situation, the back-end will have a more efficient fd?
>
> Hi Hao,
>
> There is no example yet.
>
>> Here my understanding of this "FD of its own" is as same type as
>>
>> the given fd(e.g. both pipe files), why the fd from back-end makes
>>
>> difference? Do I miss anything here?
>
> Maybe it makes more sense in the context of how we came up with the
> idea: Specifically, Stefan and I were asking which end should provide
> the FD. In the context of vhost-user, it makes sense to have it be
> the front-end, because it controls vhost-user communication, but
> that’s just the natural protocol choice, not necessarily the most
> efficient one.
>
> It is imaginable that the front-end (e.g. qemu) could create a file
> descriptor whose data is directly spliced (automatically) into the
> migration stream, and then hand this FD to the back-end. In practice,
> this doesn’t work for qemu (at this point), because it doesn’t use
> simple read/write into the migration stream, but has an abstraction
> layer for that. It might be possible to make this work in some cases,
> depending on what is used as a transport, but (1) not generally, and
> (2) not now. But this would be efficient.
I'm thinking about one thing: we already have a channel (a Unix domain
socket) between the front-end and back-end, so why not deliver the state
file (as an FD) to/from the back-end directly, rather than negotiating
a new pipe as the data channel? (Since we already assume they can share
a pipe's FD, why not a regular file's FD?)
>
> The model we’d implement in qemu with this series is comparatively not
> efficient, because it manually copies data from the FD (which by
> default is a pipe) into the migration stream.
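To picture that default "pipe plus copying" model, a sketch of my own (not
the code from this series; drain_transfer_fd and the sink callback are
made-up names standing in for whatever writes into the migration stream):

```c
/* Illustration of the default copying model: read from the transfer FD
 * until EOF (the back-end closing its end), handing each chunk to a
 * caller-supplied sink that stands in for the migration stream. */
#include <assert.h>
#include <string.h>
#include <unistd.h>

typedef int (*chunk_sink)(const char *buf, size_t len, void *opaque);

static int drain_transfer_fd(int fd, chunk_sink sink, void *opaque)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        if (sink(buf, (size_t)n, opaque) < 0) {
            return -1;
        }
    }
    return n == 0 ? 0 : -1;  /* 0 == EOF, i.e. the sender closed its end */
}

/* Demo sink: append chunks into a flat buffer. */
struct collect { char data[64]; size_t len; };

static int collect_sink(const char *buf, size_t len, void *opaque)
{
    struct collect *c = opaque;
    if (c->len + len > sizeof(c->data)) {
        return -1;
    }
    memcpy(c->data + c->len, buf, len);
    c->len += len;
    return 0;
}

int demo(void)
{
    int p[2];
    if (pipe(p) < 0) {
        return -1;
    }
    if (write(p[1], "device-state", 12) != 12) {
        return -1;
    }
    close(p[1]);                     /* sender done: this produces EOF */
    struct collect c = { .len = 0 };
    int ret = drain_transfer_fd(p[0], collect_sink, &c);
    close(p[0]);
    if (ret < 0 || c.len != 12 || memcmp(c.data, "device-state", 12)) {
        return -1;
    }
    return 0;
}
```

Every byte is copied at least once on each side here, which is exactly the
inefficiency a zero-copy FD from either end would avoid.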
>
> But it is possible that the front-end can provide a zero-copy FD into
> the migration stream, and for that reason we want to allow the
> front-end to provide the transfer FD.
>
> In contrast, the back-end might have a more efficient implementation
> on its side, too, though. It is difficult to imagine, but it may be
> possible that it already has an FD where the data needs to be written
> to/read from, e.g. because it’s connected to a physical device that
> wants to get/send its state this way. Admittedly, we have absolutely
> no concrete example for such a back-end at this point, but it’s hard
> to rule out that it is possible that there will be back-ends that
> could make use of zero-copy if only they are allowed to dictate the
> transfer FD.
>
> So because we in qemu can’t (at least not generally) provide an
> efficient (zero-copy) implementation, we don’t want to rule out that
> the back-end might be able to, so we also want to allow it to provide
> the transfer FD.
>
> In the end, we decided that we don’t want to preclude either side from
> providing the FD. If one side knows it can do better than a plain
> pipe with copying on both ends, it should provide the FD. That doesn’t
> complicate the implementation much.
>
> (So, notably, we measure “improves efficiency” based on “is it better
> than a plain pipe with copying on both ends”. A pipe with copying is
> the default implementation, but as Stefan has pointed out in his
> review, it doesn’t need to be a pipe. More efficient FDs, like the
> back-end can provide in its reply, would actually likely not be pipes.)
Yes, but the vhost-user protocol in this patch set defines these FDs as
the data transfer channel, not the data itself. What about the latter?
>
> Hanna
>
Thread overview: 21+ messages
2023-07-12 11:16 [PATCH v2 0/4] vhost-user: Back-end state migration Hanna Czenczek
2023-07-12 11:16 ` [PATCH v2 1/4] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-07-18 15:57 ` Stefan Hajnoczi
2023-07-19 16:33 ` Hanna Czenczek
2023-07-20 11:43 ` Stefan Hajnoczi
2023-07-18 16:12 ` Stefan Hajnoczi
2023-07-20 11:32 ` [Virtio-fs] " Hanna Czenczek
2023-07-12 11:17 ` [PATCH v2 2/4] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-07-18 18:32 ` Stefan Hajnoczi
2023-07-20 12:13 ` [Virtio-fs] " Hao Xu
2023-07-20 13:20 ` Hanna Czenczek
2023-07-20 15:05 ` Hao Xu [this message]
2023-07-21 8:07 ` Hanna Czenczek
2023-07-21 8:23 ` Hao Xu
2023-07-12 11:17 ` [PATCH v2 3/4] vhost: Add high-level state save/load functions Hanna Czenczek
2023-07-18 18:42 ` Stefan Hajnoczi
2023-07-20 11:42 ` Hanna Czenczek
2023-07-21 15:18 ` Eugenio Perez Martin
2023-07-21 16:09 ` Hanna Czenczek
2023-07-12 11:17 ` [PATCH v2 4/4] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-07-18 18:44 ` Stefan Hajnoczi