From: Eugenio Perez Martin <eperezma@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>, qemu-devel@nongnu.org,
	"Michael S. Tsirkin" <mst@redhat.com>, si-wei.liu@oracle.com,
	Lei Yang <leiyang@redhat.com>, Dragos Tatulea <dtatulea@nvidia.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>, Parav Pandit <parav@mellanox.com>,
	Stefano Garzarella <sgarzare@redhat.com>, Laurent Vivier <lvivier@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH for 9.0 08/12] vdpa: add vhost_vdpa_load_setup
Date: Wed, 3 Jan 2024 12:11:19 +0100
Message-ID: <CAJaqyWfMEeg6FVhyFTVEest1eZXEwMiyib47Z8+BUGCaWkfH3w@mail.gmail.com>
In-Reply-To: <ZZT7wuq-_IhfN_wR@x1n>

On Wed, Jan 3, 2024 at 7:16 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Jan 02, 2024 at 12:28:48PM +0100, Eugenio Perez Martin wrote:
> > On Tue, Jan 2, 2024 at 6:33 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Jason, Eugenio,
> > >
> > > Apologies for a late reply; just back from the long holiday.
> > >
> > > On Thu, Dec 21, 2023 at 09:20:40AM +0100, Eugenio Perez Martin wrote:
> > > > Si-Wei did the actual profiling as he is the one with the 128G guests,
> > > > but most of the time was spent in the memory pinning. Si-Wei, please
> > > > correct me if I'm wrong.
> > >
> > > IIUC we're talking about no-vIOMMU use case.  The pinning should indeed
> > > take a lot of time if it's similar to what VFIO does.
> > >
> > > >
> > > > I didn't check VFIO, but I think it just maps at realize phase with
> > > > vfio_realize -> vfio_attach_device -> vfio_connect_container(). In
> > > > previous tests, this delayed the VM initialization by a lot, as
> > > > we're moving that 20s of blocking to every VM start.
> > > >
> > > > Investigating a way to do it only in the case of being the destination
> > > > of a live migration, I think the right place is the .load_setup
> > > > migration handler. But I'm ok with moving it, for sure.
> > >
> > > If it's destined to map the 128G, it does sound sensible to me to do it
> > > when VM starts, rather than anytime afterwards.
> > >
> >
> > Just for completeness: it is not 100% certain that the driver will
> > start the device, but it is very likely.
>
> My understanding is that vDPA is still a quite special device, assuming
> only targeting advanced users, and should not appear in a default config
> for anyone.  It means the user should hopefully remove the device if the
> guest is not using it, instead of worrying about a slow boot.
>
> >
> > > Could anyone help to explain what's the problem if vDPA maps 128G at VM
> > > init just like what VFIO does?
> > >
> >
> > The main problem was the delay of VM start. In the master branch, the
> > pinning is done when the driver starts the device. Although that path
> > holds the BQL, the rest of the vCPUs can make progress while the host
> > is pinning, so the impact is not so evident.
> >
> > Moving the pinning to initialization time made the delay very
> > noticeable. To make things worse, QEMU stopped responding to QMP
> > commands and the like. That's why it was done only if the VM was the
> > destination of a live migration.
>
> Is that a major issue for us?

To me it is a regression, but I'm ok with it.

>  IIUC then VFIO shares the same condition.
> If it's a real problem, do we want to have a solution that works for both
> (or, is it possible)?
>

I would not consider it a regression for VFIO, since I think it has
behaved that way from the beginning. But yes, I'm all for finding a
common solution.
> >
> > However, we've added the memory map thread in this version, so this
> > might not be a problem anymore. We could move the spawn of the thread
> > to initialization time.
> >
> > But how to undo this pinning in the case the guest does not start the
> > device? In this series, this is done at the destination with
> > vhost_vdpa_load_cleanup. Or is it ok to just keep the memory mapped as
> > long as QEMU has the vDPA device?
>
> I think even if vDPA decides to use a thread, we should keep the same
> behavior before/after the migration.  Having asymmetric behavior over DMA
> from the assigned HWs might have unpredictable implications.
>
> What I worry about is that we may over-optimize / over-engineer the case
> where the user will specify the vDPA device but not use it, as I mentioned
> above.
>

I agree with all of the above. If it is ok to keep the memory mapped
while the guest has not started, I think we can move the spawn of the
thread, or even just the map operation itself, to the vdpa init. A
rough sketch of the thread idea is below.
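
To make that concrete, here is a minimal sketch of the idea in plain C
with pthreads (not the actual QEMU code; struct vdpa_dev,
vdpa_map_all() and the init/start hooks are hypothetical names for
illustration): the expensive map+pin work is kicked off from init, and
only joined when the device actually starts (build with -pthread):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device state; the real vhost-vdpa structs differ. */
struct vdpa_dev {
    pthread_t map_thread;
    bool map_done;
};

/* Stand-in for the expensive DMA map + pin of all guest memory. */
static void vdpa_map_all(struct vdpa_dev *dev)
{
    (void)dev;
    printf("mapping and pinning guest memory...\n");
}

static void *map_worker(void *opaque)
{
    struct vdpa_dev *dev = opaque;

    vdpa_map_all(dev);  /* the slow part runs off the main thread */
    __atomic_store_n(&dev->map_done, true, __ATOMIC_RELEASE);
    return NULL;
}

/* Called from device init instead of from the driver-start path,
 * so vCPUs and QMP keep making progress while pinning runs. */
static void vdpa_init_start_mapping(struct vdpa_dev *dev)
{
    pthread_create(&dev->map_thread, NULL, map_worker, dev);
}

/* Device start (or the .load_setup switchover) joins the thread. */
static void vdpa_wait_mapping(struct vdpa_dev *dev)
{
    pthread_join(dev->map_thread, NULL);
}

int main(void)
{
    struct vdpa_dev dev = { .map_done = false };

    vdpa_init_start_mapping(&dev);
    /* ... the rest of VM initialization continues concurrently ... */
    vdpa_wait_mapping(&dev);
    return 0;
}

The open question still applies, though: if the guest never starts the
device, either the mappings stay around for as long as QEMU owns the
vDPA device, or something like vhost_vdpa_load_cleanup has to join the
thread and unmap.
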
> For the long term, maybe there's a chance to optimize DMA pinning for both
> vdpa/vfio use cases, then we can always pin them during VM start?  Assuming
> that issue only exists for large VMs, they should normally be good
> candidates for huge pages already.  Then, it means maybe one folio/page can
> cover a large range (e.g. 1G on x86_64) in one pin, and physical contiguity
> also provides the possibility of IOMMU large-page mappings.  I didn't check
> at which stage we are for VFIO on this, Alex may know better.

Sounds interesting, and I think it should be implemented. Thanks for
the pointer!
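
For a rough sense of the scale involved (a back-of-the-envelope sketch
only, assuming the 128G guest mentioned earlier; plain C just to show
the arithmetic):

#include <stdio.h>

int main(void)
{
    const unsigned long long guest_bytes = 128ULL << 30;  /* 128G guest */

    /* Number of pin operations as a function of backing page size. */
    printf("4K pages: %llu pins\n", guest_bytes / (4ULL << 10));  /* 33554432 */
    printf("2M pages: %llu pins\n", guest_bytes / (2ULL << 20));  /* 65536 */
    printf("1G pages: %llu pins\n", guest_bytes / (1ULL << 30));  /* 128 */
    return 0;
}

So one pin per 1G folio would cut the number of pin operations by
roughly five orders of magnitude, on top of enabling IOMMU large-page
mappings.
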
> I'm copying Alex
> anyway since the problem seems to be a common one already, so maybe he has
> some thoughts.
>

Appreciated :).

Thanks!