From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,
"Dr . David Alan Gilbert" <dave@treblig.org>,
"Kevin Wolf" <kwolf@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Hailiang Zhang" <zhanghailiang@xfusion.com>,
"Yury Kotov" <yury-kotov@yandex-team.ru>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Prasad Pandit" <ppandit@redhat.com>,
"Zhang Chen" <zhangckid@gmail.com>,
"Li Zhijian" <lizhijian@fujitsu.com>,
"Juraj Marcin" <jmarcin@redhat.com>
Subject: Re: [PATCH RFC 0/9] migration: Threadify loadvm process
Date: Thu, 9 Oct 2025 12:58:15 -0400
Message-ID: <aOfpp41s5rFt-Qzk@x1.local>
In-Reply-To: <87zfau13sk.fsf@suse.de>
On Tue, Sep 16, 2025 at 06:32:59PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > [this is an early RFC, not for merge, but to collect initial feedbacks]
> >
> > Background
> > ==========
> >
> > Nowadays, live migration heavily depends on threads: most of the major
> > features in use today (multifd, postcopy, mapped-ram, vfio, etc.) work
> > with threads internally.
> >
> > But still, from time to time, we'll see some coroutines floating around
> > the migration context. The major one is precopy's loadvm, which runs
> > internally as a coroutine and remains a critical path that every live
> > migration depends on.
> >
>
> I always wanted to be an archaeologist:
>
> https://lists.gnu.org/archive/html/qemu-devel//2012-08/msg01136.html
>
> I was expecting to find some complicated chain of events leading to the
> choice of using a coroutine, but no.
I actually hadn't seen that before... I'll add this link to the commit
message of the major patch, to make future archaeology easier.
>
> > Mixing coroutines and threads is prone to issues. For examples, see
> > commit e65cec5e5d ("migration/ram: Yield periodically to the main
> > loop") or commit 7afbdada7e ("migration/postcopy: ensure preempt
> > channel is ready before loading states").
> >
> > Overview
> > ========
> >
> > This series tries to move migration further into the thread-based model, by
> > allowing the loadvm process to happen in a thread rather than in the main
> > thread with a coroutine.
> >
> > Luckily, since the qio channel code is always ready for both cases, IO
> > paths should all be fine.
> >
> > Note that loadvm for postcopy already happens in a separate ram load
> > thread. However, RAM is just the simple case here: even though it has
> > its own challenges (atomically updating the page tables), that
> > complexity lies in the kernel.
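For context, the atomicity there comes from userfaultfd's UFFDIO_COPY
ioctl. A minimal sketch of the call (uffd_fd, host_addr, page_buf and
page_size are assumed context, not names from this series):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /* The kernel copies the page and updates the page tables
     * atomically, waking any vCPU thread blocked on the fault. */
    static int place_page(int uffd_fd, void *host_addr,
                          void *page_buf, size_t page_size)
    {
        struct uffdio_copy copy = {
            .dst = (uintptr_t)host_addr,
            .src = (uintptr_t)page_buf,
            .len = page_size,
            .mode = 0,
        };

        return ioctl(uffd_fd, UFFDIO_COPY, &copy);
    }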
> >
> > For precopy, loadvm has quite a few operations that need the BQL. The
> > problem is that we can't take the BQL for the whole loadvm process,
> > because that would block the main thread (e.g. QMP would hang). The
> > finer-grained we can make the BQL sections, the better. This series so
> > far chose somewhere in the middle, taking the BQL mainly in these two
> > places:
> >
> > - CPU synchronizations
> > - Device START/FULL sections
> >
> > After this series is applied, most of the remaining loadvm path will
> > run without the BQL. There is a more detailed discussion / todo in the
> > commit message of patch "migration: Thread-ify precopy vmstate load
> > process" explaining how to further split the BQL critical sections.
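To illustrate the scoping (a sketch only; the wrapper below is
illustrative and not the series' actual code, although bql_lock(),
bql_unlock() and vmstate_load() are the real QEMU primitives):

    /* Device START/FULL sections touch device state and the memory
     * API, so they are loaded with the BQL held... */
    static int load_device_section(QEMUFile *f, SaveStateEntry *se)
    {
        int ret;

        bql_lock();
        ret = vmstate_load(f, se);
        bql_unlock();

        /* ...while iterable data (e.g. RAM pages) is loaded from the
         * incoming thread without it, keeping the main loop (QMP,
         * monitor) responsive. */
        return ret;
    }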
> >
> > I tried to split the patches into smaller ones where possible, but
> > it's still quite challenging, so there's one major patch that does the
> > bulk of the work.
> >
> > After the series is applied, the only leftover pieces in migration/
> > that still use a coroutine are the snapshot save/load/delete jobs.
> >
>
> Which are then fine because the work itself runs on the main loop,
> right? So the bottom-half scheduling could be left as a coroutine.
Correct, the iochannel code works for both cases.

For coroutines, it can properly register the fd and yield like before for
snapshot save/load. It used to do the same for live loadvm, but now,
after moving to a thread, it will use qio_channel_wait() instead.

I think we could also switch the incoming side of live migration back to
blocking mode after making it a thread; blocking directly in recvmsg()
might be slightly more efficient than returning and polling. But that is
trivial compared to the "move to a thread" change, and it can be done
later even if things already work.
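For reference, the retry loop has roughly this shape (a sketch modelled
on the qio_channel_readv_full_all()-style loops in io/channel.c, not the
exact code):

    /* When a read would block, pick the wait primitive that matches
     * the current execution context. */
    static ssize_t read_retry(QIOChannel *ioc, char *buf, size_t len,
                              Error **errp)
    {
        ssize_t ret;

        for (;;) {
            ret = qio_channel_read(ioc, buf, len, errp);
            if (ret != QIO_CHANNEL_ERR_BLOCK) {
                return ret;
            }
            if (qemu_in_coroutine()) {
                /* Coroutine (snapshot save/load): register the fd
                 * with the main loop and yield until readable. */
                qio_channel_yield(ioc, G_IO_IN);
            } else {
                /* Thread (the new loadvm thread): sleep in poll()
                 * via qio_channel_wait() instead. */
                qio_channel_wait(ioc, G_IO_IN);
            }
        }
    }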
>
> > Tests
> > =====
> >
> > Default CI passes.
> >
> > RDMA unit tests pass as usual. I also tried out cancellation / failure
> > tests over RDMA channels, making sure nothing is stuck.
> >
> > I also roughly measured how long it takes to run the whole 80+ migration
> > qtest suite, and saw no measurable difference before / after this series.
> >
> > Risks
> > =====
> >
> > This series has the risk of breaking things. I would be surprised if
> > it didn't...
> >
> > I confess I didn't test anything on COLO; I only checked it through
> > code observation and analysis. COLO maintainers: could you add some
> > unit tests to QEMU's qtests?
> >
> > The current way of taking the BQL during FULL section load may cause
> > issues: when IO is unstable, we could be waiting for IO (in the new
> > migration incoming thread) with the BQL held. This is a low-probability
> > case, though; it only happens when the network halts while flushing the
> > device states. Still, it is possible. One solution is to further break
> > the BQL critical sections down into smaller ones, as mentioned in the
> > TODO.
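One shape the finer breakdown could take (purely hypothetical;
read_section_payload() and apply_device_state() are made-up helpers for
illustration, not code from the series):

    /* Buffer the section payload without the BQL, so a stalled
     * network blocks only the incoming thread, then take the BQL
     * just to apply the device state. */
    static int load_full_section_finer(QEMUFile *f, SaveStateEntry *se,
                                       Error **errp)
    {
        g_autofree uint8_t *buf = NULL;
        size_t len;
        int ret;

        /* IO happens here, BQL not held. */
        ret = read_section_payload(f, &buf, &len, errp);
        if (ret < 0) {
            return ret;
        }

        /* Only the state application holds the BQL. */
        bql_lock();
        ret = apply_device_state(se, buf, len, errp);
        bql_unlock();
        return ret;
    }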
> >
> > Anything is more than welcome: suggestions, questions, objections,
> > tests...
> >
> > Todo
> > ====
> >
> > - Test COLO?
> > - Finer grained BQL breakdown
> > - More..
> >
> > Thanks,
> >
> > Peter Xu (9):
> > migration/vfio: Remove BQL implication in
> > vfio_multifd_switchover_start()
> > migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
> > migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
> > migration/rdma: Change io_create_watch() to return immediately
> > migration: Thread-ify precopy vmstate load process
> > migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
> > migration/postcopy: Remove workaround on wait preempt channel
> > migration/ram: Remove workaround on ram yield during load
> > migration/rdma: Remove rdma_cm_poll_handler
> >
> > include/migration/colo.h | 6 +-
> > migration/migration.h | 52 +++++++--
> > migration/savevm.h | 5 +-
> > hw/vfio/migration-multifd.c | 9 +-
> > migration/channel.c | 7 +-
> > migration/colo-stubs.c | 2 +-
> > migration/colo.c | 23 +---
> > migration/migration.c | 62 ++++++++---
> > migration/ram.c | 13 +--
> > migration/rdma.c | 206 ++++++++----------------------------
> > migration/savevm.c | 85 +++++++--------
> > migration/trace-events | 4 +-
> > 12 files changed, 196 insertions(+), 278 deletions(-)
>
--
Peter Xu
Thread overview: 45+ messages
2025-08-27 20:59 [PATCH RFC 0/9] migration: Threadify loadvm process Peter Xu
2025-08-27 20:59 ` [PATCH RFC 1/9] migration/vfio: Remove BQL implication in vfio_multifd_switchover_start() Peter Xu
2025-08-28 18:05 ` Maciej S. Szmigiero
2025-09-16 21:34 ` Fabiano Rosas
2025-08-27 20:59 ` [PATCH RFC 2/9] migration/rdma: Fix wrong context in qio_channel_rdma_shutdown() Peter Xu
2025-09-16 21:41 ` Fabiano Rosas
2025-09-26 1:01 ` Zhijian Li (Fujitsu)
2025-08-27 20:59 ` [PATCH RFC 3/9] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread Peter Xu
2025-09-16 21:50 ` Fabiano Rosas
2025-09-26 1:02 ` Zhijian Li (Fujitsu)
2025-08-27 20:59 ` [PATCH RFC 4/9] migration/rdma: Change io_create_watch() to return immediately Peter Xu
2025-09-16 22:35 ` Fabiano Rosas
2025-10-08 20:34 ` Peter Xu
2025-09-26 2:39 ` Zhijian Li (Fujitsu)
2025-10-08 20:42 ` Peter Xu
2025-08-27 20:59 ` [PATCH RFC 5/9] migration: Thread-ify precopy vmstate load process Peter Xu
2025-08-27 23:51 ` Dr. David Alan Gilbert
2025-08-29 16:37 ` Peter Xu
2025-09-04 1:38 ` Dr. David Alan Gilbert
2025-10-08 21:02 ` Peter Xu
2025-08-29 8:29 ` Vladimir Sementsov-Ogievskiy
2025-08-29 17:17 ` Peter Xu
2025-09-01 9:35 ` Vladimir Sementsov-Ogievskiy
2025-09-17 18:23 ` Fabiano Rosas
2025-10-09 21:41 ` Peter Xu
2025-09-26 3:41 ` Zhijian Li (Fujitsu)
2025-10-08 21:10 ` Peter Xu
2025-08-27 20:59 ` [PATCH RFC 6/9] migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel Peter Xu
2025-09-16 22:39 ` Fabiano Rosas
2025-10-08 21:18 ` Peter Xu
2025-09-26 2:44 ` Zhijian Li (Fujitsu)
2025-08-27 20:59 ` [PATCH RFC 7/9] migration/postcopy: Remove workaround on wait preempt channel Peter Xu
2025-09-17 18:30 ` Fabiano Rosas
2025-08-27 20:59 ` [PATCH RFC 8/9] migration/ram: Remove workaround on ram yield during load Peter Xu
2025-09-17 18:31 ` Fabiano Rosas
2025-08-27 20:59 ` [PATCH RFC 9/9] migration/rdma: Remove rdma_cm_poll_handler Peter Xu
2025-09-17 18:38 ` Fabiano Rosas
2025-10-08 21:22 ` Peter Xu
2025-09-26 3:38 ` Zhijian Li (Fujitsu)
2025-08-29 8:29 ` [PATCH RFC 0/9] migration: Threadify loadvm process Vladimir Sementsov-Ogievskiy
2025-08-29 17:18 ` Peter Xu
2025-09-04 8:27 ` Zhang Chen
2025-10-08 21:26 ` Peter Xu
2025-09-16 21:32 ` Fabiano Rosas
2025-10-09 16:58 ` Peter Xu [this message]