From: Hao Xu <hao.xu@linux.dev>
Date: Thu, 20 Jul 2023 23:05:24 +0800
Subject: Re: [Virtio-fs] [PATCH v2 2/4] vhost-user: Interface for migration state transfer
To: Hanna Czenczek, qemu-devel@nongnu.org, virtio-fs@redhat.com
Cc: Eugenio Pérez, Stefan Hajnoczi, Michael S. Tsirkin
In-Reply-To: <14677814-9707-6a08-7b07-4532fad5f7a1@redhat.com>
References: <20230712111703.28031-1-hreitz@redhat.com> <20230712111703.28031-3-hreitz@redhat.com> <14677814-9707-6a08-7b07-4532fad5f7a1@redhat.com>

On 7/20/23 21:20, Hanna Czenczek wrote:
> On 20.07.23 14:13, Hao Xu wrote:
>>
>> On 7/12/23 19:17, Hanna Czenczek wrote:
>>> Add the interface for transferring the back-end's state during
>>> migration as defined previously in vhost-user.rst.
>>>
>>> Signed-off-by: Hanna Czenczek
>>> ---
>>>   include/hw/virtio/vhost-backend.h |  24 +++++
>>>   include/hw/virtio/vhost.h         |  79 ++++++++++++++++
>>>   hw/virtio/vhost-user.c            | 147 ++++++++++++++++++++++++++++++
>>>   hw/virtio/vhost.c                 |  37 ++++++++
>>>   4 files changed, 287 insertions(+)
>>>
>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
>>> index 31a251a9f5..e59d0b53f8 100644
>>> --- a/include/hw/virtio/vhost-backend.h
>>> +++ b/include/hw/virtio/vhost-backend.h
>>> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>>>       VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>>>   } VhostSetConfigType;
>>>
>>> +typedef enum VhostDeviceStateDirection {
>>> +    /* Transfer state from back-end (device) to front-end */
>>> +    VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
>>> +    /* Transfer state from front-end to back-end (device) */
>>> +    VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
>>> +} VhostDeviceStateDirection;
>>> +
>>> +typedef enum VhostDeviceStatePhase {
>>> +    /* The device (and all its vrings) is stopped */
>>> +    VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
>>> +} VhostDeviceStatePhase;
>>> +
>>>   struct vhost_inflight;
>>>   struct vhost_dev;
>>>   struct vhost_log;
>>>
>>> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>>>
>>>   typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>>
>>> +typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
>>> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
>>> +                                            VhostDeviceStateDirection direction,
>>> +                                            VhostDeviceStatePhase phase,
>>> +                                            int fd,
>>> +                                            int *reply_fd,
>>> +                                            Error **errp);
>>> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error **errp);
>>> +
>>>   typedef struct VhostOps {
>>>       VhostBackendType backend_type;
>>>       vhost_backend_init vhost_backend_init;
>>>
>>> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>>>       vhost_force_iommu_op vhost_force_iommu;
>>>       vhost_set_config_call_op vhost_set_config_call;
>>>       vhost_reset_status_op vhost_reset_status;
>>> +    vhost_supports_migratory_state_op vhost_supports_migratory_state;
>>> +    vhost_set_device_state_fd_op vhost_set_device_state_fd;
>>> +    vhost_check_device_state_op vhost_check_device_state;
>>>   } VhostOps;
>>>
>>>   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>>
>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>> index 69bf59d630..d8877496e5 100644
>>> --- a/include/hw/virtio/vhost.h
>>> +++ b/include/hw/virtio/vhost.h
>>> @@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
>>>   int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
>>>                              struct vhost_inflight *inflight);
>>>   bool vhost_dev_has_iommu(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_supports_migratory_state(): Checks whether the back-end
>>> + * supports transferring internal state for the purpose of migration.
>>> + * Support for this feature is required for vhost_set_device_state_fd()
>>> + * and vhost_check_device_state().
>>> + *
>>> + * @dev: The vhost device
>>> + *
>>> + * Returns true if the device supports these commands, and false if it
>>> + * does not.
>>> + */
>>> +bool vhost_supports_migratory_state(struct vhost_dev *dev);
>>> +
>>> +/**
>>> + * vhost_set_device_state_fd(): Begin transfer of internal state from/to
>>> + * the back-end for the purpose of migration.  Data is to be transferred
>>> + * over a pipe according to @direction and @phase.  The sending end must
>>> + * only write to the pipe, and the receiving end must only read from it.
>>> + * Once the sending end is done, it closes its FD.
>>> + * The receiving end must take this as the end-of-transfer signal
>>> + * and close its FD, too.
>>> + *
>>> + * @fd is the back-end's end of the pipe: The write FD for SAVE, and the
>>> + * read FD for LOAD.  This function transfers ownership of @fd to the
>>> + * back-end, i.e. closes it in the front-end.
>>> + *
>>> + * The back-end may optionally reply with an FD of its own, if this
>>> + * improves efficiency on its end.  In this case, the returned FD is
>>
>> Hi Hanna,
>>
>> In what case/situation will the back-end have a more efficient FD?
>
> Hi Hao,
>
> There is no example yet.
>
>> My understanding of this "FD of its own" is that it is of the same type
>> as the given FD (e.g. both pipe files), so why does the FD from the
>> back-end make a difference?  Am I missing anything here?
>
> Maybe it makes more sense in the context of how we came up with the
> idea: Specifically, Stefan and I were asking which end should provide
> the FD.  In the context of vhost-user, it makes sense to have it be
> the front-end, because it controls vhost-user communication, but
> that’s just the natural protocol choice, not necessarily the most
> efficient one.
>
> It is imaginable that the front-end (e.g. qemu) could create a file
> descriptor whose data is directly spliced (automatically) into the
> migration stream, and then hand this FD to the back-end.  In practice,
> this doesn’t work for qemu (at this point), because it doesn’t use
> simple read/write into the migration stream, but has an abstraction
> layer for that.  It might be possible to make this work in some cases,
> depending on what is used as a transport, but (1) not generally, and
> (2) not now.  But this would be efficient.
I'm thinking about one thing: we already have a channel (a Unix domain
socket) between the front-end and the back-end, so why not deliver the
state file (as an FD) to/from the back-end directly, rather than
negotiating a new pipe as the data channel?  (Since we already assume
they can share the FD of a pipe, why not the FD of a normal file?)

> The model we’d implement in qemu with this series is comparatively not
> efficient, because it manually copies data from the FD (which by
> default is a pipe) into the migration stream.
>
> But it is possible that the front-end can provide a zero-copy FD into
> the migration stream, and for that reason we want to allow the
> front-end to provide the transfer FD.
>
> In contrast, the back-end might have a more efficient implementation
> on its side, too.  It is difficult to imagine, but it may be possible
> that it already has an FD where the data needs to be written to or
> read from, e.g. because it’s connected to a physical device that wants
> to get/send its state this way.  Admittedly, we have absolutely no
> concrete example for such a back-end at this point, but it’s hard to
> rule out that there will be back-ends that could make use of zero-copy
> if only they are allowed to dictate the transfer FD.
>
> So because we in qemu can’t (at least not generally) provide an
> efficient (zero-copy) implementation, we don’t want to rule out that
> the back-end might be able to, so we also want to allow it to provide
> the transfer FD.
>
> In the end, we decided that we don’t want to preclude either side from
> providing the FD.  If one side knows it can do better than a plain
> pipe with copying on both ends, it should provide the FD.  That
> doesn’t complicate the implementation much.
>
> (So, notably, we measure “improves efficiency” based on “is it better
> than a plain pipe with copying on both ends”.  A pipe with copying is
> the default implementation, but as Stefan has pointed out in his
> review, it doesn’t need to be a pipe.
> More efficient FDs, like the ones the back-end can provide in its
> reply, would actually likely not be pipes.)

Yeah, but the vhost-user protocol in this patch set defines these FDs
as the data transfer channel, not the data itself.  What about the
latter?

>
> Hanna
>