From: Raphael Norwitz <raphael.s.norwitz@gmail.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Cc: mst@redhat.com, peterx@redhat.com, farosas@suse.de,
	raphael@enfabrica.net,  sgarzare@redhat.com,
	marcandre.lureau@redhat.com, pbonzini@redhat.com,
	 kwolf@redhat.com, hreitz@redhat.com, berrange@redhat.com,
	eblake@redhat.com,  armbru@redhat.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org,  steven.sistare@oracle.com,
	den-plotnikov@yandex-team.ru
Subject: Re: [PATCH 00/33] vhost-user-blk: live-backend local migration
Date: Thu, 9 Oct 2025 19:28:48 -0400
Message-ID: <CAFubqFsANpc_v8Dvd4P=Swmei2P_Nt33+0eXn4UdC8-dcJCPfA@mail.gmail.com>
In-Reply-To: <aabfa3db-e434-4dde-b01e-b0195eb4adee@yandex-team.ru>

Thanks for the detailed response here, it does clear up the intent.

I agree it's much better to keep the management layer from having to
make API calls back and forth to the backend so that the migration
looks like a reconnect from the backend's perspective. I'm not totally
clear on the fundamental reason why the management layer would have to
call out to the backend, as opposed to having the vhost-user code in
the backend figure out that it's a local migration when the new
destination QEMU tries to connect and respond accordingly.

That said, I haven't followed the work here all that closely. If MST
or other maintainers have blessed this as the right way, I'm OK with
it.

On Thu, Oct 9, 2025 at 6:43 PM Vladimir Sementsov-Ogievskiy
<vsementsov@yandex-team.ru> wrote:
>
> On 09.10.25 22:16, Raphael Norwitz wrote:
> > My apologies for the late review here. I appreciate the need to work
> > around these issues but I do feel the approach complicates Qemu
> > significantly and it may be possible to achieve similar results
> > managing state inside the backend. More comments inline.
> >
> > I like a lot of the cleanups here - maybe consider breaking out a
> > series with some of the cleanups?
>
> Of course, I thought about that too.
>
> >
> > On Wed, Aug 13, 2025 at 12:56 PM Vladimir Sementsov-Ogievskiy
> > <vsementsov@yandex-team.ru> wrote:
> >>
> >> Hi all!
> >>
> >> Local migration of vhost-user-blk requires non-trivial actions
> >> from the management layer: it must provide a new connection for the
> >> new QEMU process and handle moving disk operations from one
> >> connection to another.
> >>
> >> Such switching, including reinitialization of the vhost-user
> >> connection, draining disk requests, etc., adds significantly to
> >> local migration downtime.
> >
> > I see how draining IO requests adds downtime and is impactful. That
> > said, we need to start-stop the device anyways
>
> No, with this series and the new feature enabled we don't have this
> drain, see
>
>      if (dev->backend_transfer) {
>          return 0;
>      }
>
> at the start of do_vhost_virtqueue_stop().
>
> > so I'm not convinced
> > that setting up mappings and sending messages back and forth are
> > impactful enough to warrant adding a whole new migration mode. Am I
> > missing anything here?
>
> In the management layer we have to manage two endpoints for the remote
> disk and coordinate a safe switch from one to the other. That's a
> complicated and often long procedure, which contributes an
> average delay of 0.6 seconds and, worse, ~2.4 seconds
> at p99.
>
> Of course, you may say "just rewrite your management layer to
> work better" :) But that's not simple, and we came to the idea that
> we can do the whole local migration on the QEMU side, not touching
> the backend at all.
>
> The main benefit: fewer participants. We don't rely on the management
> layer and the vhost-user server to do the proper things for migration.
> The backend doesn't even know that QEMU is updated. This makes the
> whole process simpler and therefore safer.
>
> The disk service may also be temporarily down at some point, which of
> course has a bad effect on live migration and its freeze-time. We avoid
> this issue with my series (as we don't communicate with the backend in
> any way during migration, and the disk service does not have to manage
> any endpoint switching).
>
> Note also that my series is not without precedent in QEMU, and not a
> totally new mode.
>
> Steve Sistare has worked on the idea of passing backends through a UNIX
> socket; it is now merged as the cpr-transfer and cpr-exec migration
> modes, and supports VFIO devices.
>
> So, my work applies this existing concept to vhost-user-blk and
> virtio-net, and may be used as part of cpr-transfer / cpr-exec, or
> separately.
>
> >
> >>
> >> This all leads to an idea: why not just pass all we need from the
> >> old QEMU process to the new one (including open file descriptors),
> >> and not touch the backend at all? This way, the vhost-user backend
> >> server will not even know that the QEMU process has changed, as the
> >> live vhost-user connection is migrated.
> >
> > Alternatively, if it really is about avoiding IO draining, what if
> > Qemu advertised a new vhost-user protocol feature which would query
> > whether the backend already has state for the device? Then, if the
> > backend indicates that it does, Qemu and the backend can take a
> > different path in vhost-user, exchanging relevant information,
> > including the descriptor indexes for the VQs such that draining can be
> > avoided. I expect that could be implemented to cut down a lot of the
> > other vhost-user overhead anyways (i.e. you could skip setting the
> > memory table). If nothing else it would probably help other device
> > types take advantage of this without adding more options to Qemu.
> >
>
> Hmm, if we are talking only about draining, then as I understand it the
> only thing we need is support for migrating the "inflight region". This
> is done in the series, and we are also preparing a separate feature to
> support migrating the inflight region for remote migration.
>
> But for local migration we want more: to remove the disk service from
> the process entirely, to have a guaranteed small downtime for live
> updates, independent of any problems which may occur on the disk
> service side.
>
> Why is freeze-time more sensitive for live updates than for remote
> migration? Because we have to run a lot of live-update operations:
> simply updating all the VMs in the cloud to a new version. Remote
> migration happens much less frequently: when we need to move all
> VMs off a physical server to reboot it (or repair it, service it, etc.).
>
> So, I still believe that migrating backend state through the QEMU
> migration stream makes sense in general, and for vhost-user-blk it
> works well too.
>
>
> --
> Best regards,
> Vladimir


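
The idea quoted above of passing open file descriptors from the old QEMU
process to the new one relies on the standard SCM_RIGHTS mechanism of UNIX
domain sockets. The following is a minimal sketch of that mechanism only,
not of the series itself: send_fd(), recv_fd() and demo_fd_handover() are
hypothetical names, a socketpair() stands in for the migration channel, and
a pipe stands in for the live backend connection.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor over a connected UNIX socket. */
int send_fd(int sock, int fd)
{
    char byte = 'F'; /* one byte of ordinary data carries the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Single-process demo: hand the write end of a pipe across a socketpair,
 * close the original, and check the peer still sees data written through
 * the received descriptor. Returns 0 on success, -1 on any failure. */
int demo_fd_handover(void)
{
    int chan[2], backend[2], moved;
    char out = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0 || pipe(backend) < 0) {
        return -1;
    }
    if (send_fd(chan[0], backend[1]) < 0) {
        return -1;
    }
    moved = recv_fd(chan[1]);
    if (moved < 0) {
        return -1;
    }
    close(backend[1]); /* the "old" side drops its copy */
    if (write(moved, "x", 1) != 1 || read(backend[0], &out, 1) != 1) {
        return -1;
    }
    close(moved);
    close(backend[0]);
    close(chan[0]);
    close(chan[1]);
    return out == 'x' ? 0 : -1;
}
```

Calling demo_fd_handover() from a main() and checking for a zero return
(for example with assert()) exercises the round trip. The point of the demo
is that the reader on the other end of the pipe never observes the handover,
which mirrors the claim that the backend need not know the QEMU process has
changed.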

