From: Si-Wei Liu <si-wei.liu@oracle.com>
To: "Eugenio Pérez" <eperezma@redhat.com>, qemu-devel@nongnu.org
Cc: Shannon <shannon.nelson@amd.com>,
	Parav Pandit <parav@mellanox.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	yin31149@gmail.com, Jason Wang <jasowang@redhat.com>,
	Yajun Wu <yajunw@nvidia.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>,
	Lei Yang <leiyang@redhat.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Juan Quintela <quintela@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Gautam Dawar <gdawar@xilinx.com>
Subject: Re: [RFC PATCH 00/18] Map memory at destination .load_setup in vDPA-net migration
Date: Thu, 2 Nov 2023 03:12:42 -0700
Message-ID: <1e565b27-2897-4063-8eff-e42cfd5934c2@oracle.com>
In-Reply-To: <20231019143455.2377694-1-eperezma@redhat.com>



On 10/19/2023 7:34 AM, Eugenio Pérez wrote:
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neither traffic can flow, nor the VM workload can continue
> (downtime).
>
> We can do better, as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation
> allows QEMU to communicate the maps to the kernel while the workload is
> still running in the source, so Linux can start mapping them.  Ideally,
> all of the IOMMU is configured, but even if the vDPA parent driver uses
> an on-chip IOMMU and .set_map we're still saving all the pinning time.
I get what you want to say, though I'm not sure how pinning is relevant to
the on-chip IOMMU and .set_map here: essentially, pinning is required for
all parent vDPA drivers that perform DMA, since they don't want VM pages to
move around.
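
For reference, this is roughly the map request QEMU issues per guest memory
section; as far as I understand, the kernel pins the backing pages while
handling it, regardless of whether the parent implements .dma_map (platform
IOMMU) or .set_map (on-chip IOMMU).  A minimal sketch only -- QEMU's
vhost_vdpa_dma_map() is the real thing, and the helper name here is made up:

/* Sketch: one VHOST_IOTLB_UPDATE per guest memory section. */
#include <linux/vhost.h>
#include <linux/vhost_types.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static int vdpa_map_one(int device_fd, uint64_t iova, uint64_t size,
                        void *vaddr)
{
    struct vhost_msg_v2 msg;

    memset(&msg, 0, sizeof(msg));
    msg.type = VHOST_IOTLB_MSG_V2;
    msg.iotlb.iova = iova;
    msg.iotlb.size = size;
    msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
    msg.iotlb.perm = VHOST_ACCESS_RW;
    msg.iotlb.type = VHOST_IOTLB_UPDATE;

    /* The write returns only after the kernel has pinned and mapped
     * the range, which is where the time goes. */
    return write(device_fd, &msg, sizeof(msg)) == sizeof(msg) ? 0 : -1;
}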
>
> Note that further device setup at the end of the migration may alter the
> guest memory layout.  But, as with the previous point, many operations are
> still done incrementally, like memory pinning, so we're saving time anyway.
>
> The first bunch of patches just reorganizes the code, so memory-related
> operation parameters are shared between all vhost_vdpa devices.  This is
> because the destination does not know which vhost_vdpa struct will have the
> registered listener member, so it is easier to place them in a shared struct
> than to keep them in the vhost_vdpa struct.  Future versions may squash or
> omit these patches.
It looks like this VhostVDPAShared facility (patches 1-13) is also what I
need in my SVQ descriptor group series [*], for which I've built a similar
construct. If possible, please try to merge this in ASAP; I'll rework my
series on top of it.

[*] https://github.com/siwliu-kernel/qemu/commit/813518354af5ee8a6e867b2bf7dff3d6004fbcd5
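
To make the overlap concrete, this is roughly the shape of shared state I
have in mind, with field names just following the patch subjects of this
series (the actual definition in patch 01 may well differ):

/* Sketch only: each per-device vhost_vdpa would hold a pointer to one of
 * these, so maps/pins can be issued before we know which device ends up
 * owning the memory listener. */
#include <stdbool.h>
#include <stdint.h>

typedef struct VhostIOVATree VhostIOVATree;    /* SVQ IOVA allocator (opaque here) */
typedef struct MemoryListener MemoryListener;  /* QEMU memory listener (opaque here) */

typedef struct VhostVDPAShared {
    int device_fd;                               /* patch 06: shared /dev/vhost-vdpa fd */
    struct { uint64_t first, last; } iova_range; /* patch 03 */
    VhostIOVATree *iova_tree;                    /* patch 02: GPA/HVA -> IOVA translations */
    bool shadow_data;                            /* patch 04: SVQ enabled for data vqs */
    bool iotlb_batch_begin_sent;                 /* patch 07 */
    uint64_t backend_cap;                        /* patch 08 */
    void *iommu_list;                            /* patch 10: vIOMMU regions */
    MemoryListener *listener;                    /* patch 13: one listener for all devs */
} VhostVDPAShared;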

>
> Only tested with vdpa_sim.  I'm sending this before the full benchmark, as
> some work like [1] can be based on it, and Si-Wei agreed to benchmark this
> series with his experience.
I haven't done the full benchmark against pre-mapping at the destination
yet, though one observation is that the destination QEMU very easily gets
stuck for a long time in the middle of pinning pages. During this period,
any client issuing a read-only QMP query or running an HMP info command is
frozen indefinitely (for as long as it takes to pin the memory, which
depends on its size). Is it possible to unblock those QMP requests or HMP
commands (at least the read-only ones) while the migration is in progress?
Yield from the load_setup coroutine and spawn another thread, along the
lines of the sketch below?
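
Something like this is what I mean, using plain pthreads just for
illustration (in QEMU proper this would presumably be qemu_thread_create()
or a bottom half, and vdpa_map_guest_memory() is a made-up stand-in for the
loop that sends the map requests):

/* Illustration only: offload the expensive map/pin calls to a worker
 * thread from .load_setup so the main loop can keep serving QMP/HMP. */
#include <pthread.h>
#include <stdio.h>

struct map_job {
    int device_fd;   /* /dev/vhost-vdpa-N */
    int ret;         /* result reported back to the migration code */
};

static int vdpa_map_guest_memory(int device_fd)
{
    /* Placeholder for the real work: walk the guest memory layout and
     * issue one map request per section; the kernel pins pages here. */
    (void)device_fd;
    return 0;
}

static void *map_worker(void *opaque)
{
    struct map_job *job = opaque;
    job->ret = vdpa_map_guest_memory(job->device_fd);
    return NULL;
}

int main(void)
{
    struct map_job job = { .device_fd = -1, .ret = -1 };
    pthread_t tid;

    /* .load_setup equivalent: kick off the mapping and return at once. */
    pthread_create(&tid, NULL, map_worker, &job);

    /* ... the main loop keeps dispatching QMP/HMP here ... */

    /* Join before the device is started (e.g. before DRIVER_OK). */
    pthread_join(tid, NULL);
    printf("map result: %d\n", job.ret);
    return 0;
}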

That said, I'm not sure .load_setup is a good fit for what we want to do.
Looking at all current users of .load_setup, either the job is done
instantly or the task is time-bound without trapping into the kernel for
too long. Maybe pinning is too special a use case here...

-Siwei
>
> Future directions on top of this series may include:
>
> * Iterative migration of virtio-net devices, as it may reduce downtime per [1].
>   vhost-vdpa net can apply the configuration through CVQ in the destination
>   while the source is still migrating.
> * Move more things ahead of migration time, like DRIVER_OK.
> * Check that the devices of the destination are valid, and cancel the migration
>   in case they are not.
>
> [1] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/
>
> Eugenio Pérez (18):
>   vdpa: add VhostVDPAShared
>   vdpa: move iova tree to the shared struct
>   vdpa: move iova_range to vhost_vdpa_shared
>   vdpa: move shadow_data to vhost_vdpa_shared
>   vdpa: use vdpa shared for tracing
>   vdpa: move file descriptor to vhost_vdpa_shared
>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
>   vdpa: move backend_cap to vhost_vdpa_shared
>   vdpa: remove msg type of vhost_vdpa
>   vdpa: move iommu_list to vhost_vdpa_shared
>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
>   vdpa: use dev_shared in vdpa_iommu
>   vdpa: move memory listener to vhost_vdpa_shared
>   vdpa: do not set virtio status bits if unneeded
>   vdpa: add vhost_vdpa_load_setup
>   vdpa: add vhost_vdpa_net_load_setup NetClient callback
>   vdpa: use shadow_data instead of first device v->shadow_vqs_enabled
>   virtio_net: register incremental migration handlers
>
>  include/hw/virtio/vhost-vdpa.h |  43 +++++---
>  include/net/net.h              |   4 +
>  hw/net/virtio-net.c            |  23 +++++
>  hw/virtio/vdpa-dev.c           |   7 +-
>  hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++---------------
>  net/vhost-vdpa.c               | 127 ++++++++++++-----------
>  hw/virtio/trace-events         |  14 +--
>  7 files changed, 239 insertions(+), 162 deletions(-)



