* [PATCH 01/13] io: Add qio_channel_wait_cond() helper
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-23 13:02 ` Vladimir Sementsov-Ogievskiy
2025-10-24 12:00 ` Daniel P. Berrangé
2025-10-22 19:26 ` [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages Peter Xu
` (12 subsequent siblings)
13 siblings, 2 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
Add the helper to wait for QIO channel's IO availability in any
context (coroutine, or non-coroutine). Use it tree-wide for three
occurrences.
Cc: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/io/channel.h | 15 +++++++++++++++
io/channel.c | 21 +++++++++++----------
migration/qemu-file.c | 6 +-----
3 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/include/io/channel.h b/include/io/channel.h
index 0f25ae0069..d6d5bf2b5f 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -871,6 +871,21 @@ void qio_channel_wake_read(QIOChannel *ioc);
void qio_channel_wait(QIOChannel *ioc,
GIOCondition condition);
+/**
+ * qio_channel_wait_cond:
+ * @ioc: the channel object
+ * @condition: the I/O condition to wait for
+ *
+ * Block execution from the current thread until
+ * the condition indicated by @condition becomes
+ * available.
+ *
+ * This will work with or without a coroutine context, by automatically
+ * selecting the proper API to wait with.
+ */
+void qio_channel_wait_cond(QIOChannel *ioc,
+ GIOCondition condition);
+
/**
* qio_channel_set_aio_fd_handler:
* @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index 852e684938..b18fc346ff 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -159,11 +159,7 @@ int coroutine_mixed_fn qio_channel_readv_full_all_eof(QIOChannel *ioc,
len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
local_nfds, flags, errp);
if (len == QIO_CHANNEL_ERR_BLOCK) {
- if (qemu_in_coroutine()) {
- qio_channel_yield(ioc, G_IO_IN);
- } else {
- qio_channel_wait(ioc, G_IO_IN);
- }
+ qio_channel_wait_cond(ioc, G_IO_IN);
continue;
}
@@ -268,11 +264,7 @@ int coroutine_mixed_fn qio_channel_writev_full_all(QIOChannel *ioc,
nfds, flags, errp);
if (len == QIO_CHANNEL_ERR_BLOCK) {
- if (qemu_in_coroutine()) {
- qio_channel_yield(ioc, G_IO_OUT);
- } else {
- qio_channel_wait(ioc, G_IO_OUT);
- }
+ qio_channel_wait_cond(ioc, G_IO_OUT);
continue;
}
if (len < 0) {
@@ -774,6 +766,15 @@ void qio_channel_wait(QIOChannel *ioc,
g_main_context_unref(ctxt);
}
+void qio_channel_wait_cond(QIOChannel *ioc,
+ GIOCondition condition)
+{
+ if (qemu_in_coroutine()) {
+ qio_channel_yield(ioc, condition);
+ } else {
+ qio_channel_wait(ioc, condition);
+ }
+}
static void qio_channel_finalize(Object *obj)
{
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d4ce174a5..4b5a409a80 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -343,11 +343,7 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
QIO_CHANNEL_READ_FLAG_FD_PRESERVE_BLOCKING,
&local_error);
if (len == QIO_CHANNEL_ERR_BLOCK) {
- if (qemu_in_coroutine()) {
- qio_channel_yield(f->ioc, G_IO_IN);
- } else {
- qio_channel_wait(f->ioc, G_IO_IN);
- }
+ qio_channel_wait_cond(f->ioc, G_IO_IN);
}
} while (len == QIO_CHANNEL_ERR_BLOCK);
--
2.50.1
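[Editorial note: the following illustration is not part of the patch. The helper centralizes a dispatch-on-context pattern that patch 01 removes from three call sites. A minimal sketch of that shape in plain C, using hypothetical stand-ins for qemu_in_coroutine(), qio_channel_yield() and qio_channel_wait() rather than the real QEMU symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the real QEMU symbols. */
static bool in_coroutine;
static const char *last_wait = "";

static void channel_yield(int cond) { (void)cond; last_wait = "yield"; }
static void channel_block(int cond) { (void)cond; last_wait = "block"; }

/*
 * The shape of qio_channel_wait_cond(): one entry point that picks the
 * context-appropriate wait primitive, so callers no longer need the
 * qemu_in_coroutine() branch at every call site.
 */
static void channel_wait_cond(int cond)
{
    if (in_coroutine) {
        channel_yield(cond);    /* coroutine: yield until the fd is ready */
    } else {
        channel_block(cond);    /* plain thread: block in a nested loop */
    }
}
```

With this in place, each of the three call sites collapses from a five-line if/else into a single call, as the diff above shows.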
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 01/13] io: Add qio_channel_wait_cond() helper
2025-10-22 19:26 ` [PATCH 01/13] io: Add qio_channel_wait_cond() helper Peter Xu
@ 2025-10-23 13:02 ` Vladimir Sementsov-Ogievskiy
2025-10-24 12:00 ` Daniel P. Berrangé
1 sibling, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-23 13:02 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On 22.10.25 22:26, Peter Xu wrote:
> Add the helper to wait for QIO channel's IO availability in any
> context (coroutine, or non-coroutine). Use it tree-wide for three
> occurrences.
>
> Cc: Daniel P. Berrangé<berrange@redhat.com>
> Signed-off-by: Peter Xu<peterx@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 01/13] io: Add qio_channel_wait_cond() helper
2025-10-22 19:26 ` [PATCH 01/13] io: Add qio_channel_wait_cond() helper Peter Xu
2025-10-23 13:02 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-24 12:00 ` Daniel P. Berrangé
1 sibling, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2025-10-24 12:00 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Zhang Chen,
Dr . David Alan Gilbert, Prasad Pandit, Paolo Bonzini, Yury Kotov,
Juraj Marcin
On Wed, Oct 22, 2025 at 03:26:00PM -0400, Peter Xu wrote:
> Add the helper to wait for QIO channel's IO availability in any
> context (coroutine, or non-coroutine). Use it tree-wide for three
> occurrences.
>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/io/channel.h | 15 +++++++++++++++
> io/channel.c | 21 +++++++++++----------
> migration/qemu-file.c | 6 +-----
> 3 files changed, 27 insertions(+), 15 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
2025-10-22 19:26 ` [PATCH 01/13] io: Add qio_channel_wait_cond() helper Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-23 13:07 ` Vladimir Sementsov-Ogievskiy
2025-10-24 12:02 ` Daniel P. Berrangé
2025-10-22 19:26 ` [PATCH 03/13] migration/rdma: Fix wrong context in qio_channel_rdma_shutdown() Peter Xu
` (11 subsequent siblings)
13 siblings, 2 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
migration_channel_read_peek() used to do explicit waits of a short period
when peeking a message needs a retry. Replace them with explicit polls on the
IO channel, exactly like what qemu_fill_buffer() does.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/channel.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/migration/channel.c b/migration/channel.c
index a547b1fbfe..462cc183e1 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -135,12 +135,7 @@ int migration_channel_read_peek(QIOChannel *ioc,
break;
}
- /* 1ms sleep. */
- if (qemu_in_coroutine()) {
- qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000);
- } else {
- g_usleep(1000);
- }
+ qio_channel_wait_cond(ioc, G_IO_IN);
}
return 0;
--
2.50.1
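[Editorial note: illustration, not part of the patch. The change above replaces a fixed 1ms sleep per retry with a wait for actual fd readiness. The difference can be sketched with plain POSIX poll(), which is what a G_IO_IN wait boils down to; wait_readable() is a hypothetical helper, not a QEMU function:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Instead of sleeping a fixed interval and re-trying the peek, wait until
 * the fd is actually readable (the G_IO_IN equivalent). The caller wakes
 * exactly when data arrives rather than on a timer.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    return poll(&pfd, 1, timeout_ms); /* 1: readable, 0: timed out */
}
```

The sleep-based loop could burn up to 1ms of latency per retry; readiness-based waiting removes that and also plays well with coroutine yielding, since qio_channel_wait_cond() picks the right primitive per context.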
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages
2025-10-22 19:26 ` [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages Peter Xu
@ 2025-10-23 13:07 ` Vladimir Sementsov-Ogievskiy
2025-10-24 12:02 ` Daniel P. Berrangé
1 sibling, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-23 13:07 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On 22.10.25 22:26, Peter Xu wrote:
> migration_channel_read_peek() used to do explicit waits of a short period
> when peeking a message needs a retry. Replace them with explicit polls on the
> IO channel, exactly like what qemu_fill_buffer() does.
>
> Signed-off-by: Peter Xu<peterx@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages
2025-10-22 19:26 ` [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages Peter Xu
2025-10-23 13:07 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-24 12:02 ` Daniel P. Berrangé
2025-10-28 18:16 ` Peter Xu
1 sibling, 1 reply; 23+ messages in thread
From: Daniel P. Berrangé @ 2025-10-24 12:02 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Zhang Chen,
Dr . David Alan Gilbert, Prasad Pandit, Paolo Bonzini, Yury Kotov,
Juraj Marcin
On Wed, Oct 22, 2025 at 03:26:01PM -0400, Peter Xu wrote:
> migration_channel_read_peek() used to do explicit waits of a short period
> when peeking a message needs a retry. Replace them with explicit polls on the
> IO channel, exactly like what qemu_fill_buffer() does.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/channel.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages
2025-10-24 12:02 ` Daniel P. Berrangé
@ 2025-10-28 18:16 ` Peter Xu
0 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-28 18:16 UTC (permalink / raw)
To: Daniel P. Berrangé, Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, Fabiano Rosas, Zhang Chen,
Dr . David Alan Gilbert, Prasad Pandit, Paolo Bonzini, Yury Kotov,
Juraj Marcin
On Fri, Oct 24, 2025 at 01:02:06PM +0100, Daniel P. Berrangé wrote:
> On Wed, Oct 22, 2025 at 03:26:01PM -0400, Peter Xu wrote:
> > migration_channel_read_peek() used to do explicit waits of a short period
> > when peeking a message needs a retry. Replace them with explicit polls on the
> > IO channel, exactly like what qemu_fill_buffer() does.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > migration/channel.c | 7 +------
> > 1 file changed, 1 insertion(+), 6 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Thanks both for the fast reviews. I queued patches 1-2 in advance for 10.2.
--
Peter Xu
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 03/13] migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
2025-10-22 19:26 ` [PATCH 01/13] io: Add qio_channel_wait_cond() helper Peter Xu
2025-10-22 19:26 ` [PATCH 02/13] migration: Properly wait on G_IO_IN when peeking messages Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 04/13] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread Peter Xu
` (10 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin, Lidong Chen
The rdmaout should be a cache of rioc->rdmaout, not rioc->rdmain.
Cc: Lidong Chen <jemmy858585@gmail.com>
Fixes: 54db882f07 ("migration: implement the shutdown for RDMA QIOChannel")
Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 2d839fce6c..e6837184c8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2986,7 +2986,7 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
RCU_READ_LOCK_GUARD();
rdmain = qatomic_rcu_read(&rioc->rdmain);
- rdmaout = qatomic_rcu_read(&rioc->rdmain);
+ rdmaout = qatomic_rcu_read(&rioc->rdmaout);
switch (how) {
case QIO_CHANNEL_SHUTDOWN_READ:
--
2.50.1
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* [PATCH 04/13] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (2 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 03/13] migration/rdma: Fix wrong context in qio_channel_rdma_shutdown() Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-23 13:41 ` Vladimir Sementsov-Ogievskiy
2025-10-22 19:26 ` [PATCH 05/13] migration/rdma: Change io_create_watch() to return immediately Peter Xu
` (9 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
It's almost there, except that currently it relies on a global flag showing
that it's in incoming migration.
Change it to detect coroutine instead.
Then we achieved two things in one shot:
- Drop migration_started_on_destination, which is not needed anymore, and
- It starts to work in a thread when loadvm using RDMA
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/rdma.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index e6837184c8..13dd391c14 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -357,13 +357,6 @@ typedef struct RDMAContext {
/* Index of the next RAMBlock received during block registration */
unsigned int next_src_index;
- /*
- * Migration on *destination* started.
- * Then use coroutine yield function.
- * Source runs in a thread, so we don't care.
- */
- int migration_started_on_destination;
-
int total_registrations;
int total_writes;
@@ -1352,12 +1345,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
{
struct rdma_cm_event *cm_event;
- /*
- * Coroutine doesn't start until migration_fd_process_incoming()
- * so don't yield unless we know we're running inside of a coroutine.
- */
- if (rdma->migration_started_on_destination &&
- migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
+ if (qemu_in_coroutine()) {
yield_until_fd_readable(comp_channel->fd);
} else {
/* This is the source side, we're in a separate thread
@@ -3884,7 +3872,6 @@ static void rdma_accept_incoming_migration(void *opaque)
return;
}
- rdma->migration_started_on_destination = 1;
migration_fd_process_incoming(f);
}
--
2.50.1
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 04/13] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
2025-10-22 19:26 ` [PATCH 04/13] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread Peter Xu
@ 2025-10-23 13:41 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-23 13:41 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On 22.10.25 22:26, Peter Xu wrote:
> It's almost there, except that currently it relies on a global flag showing
> that it's in incoming migration.
>
> Change it to detect coroutine instead.
>
> Then we achieved two things in one shot:
>
> - Drop migration_started_on_destination, which is not needed anymore, and
> - It starts to work in a thread when loadvm using RDMA
>
> Signed-off-by: Peter Xu<peterx@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 05/13] migration/rdma: Change io_create_watch() to return immediately
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (3 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 04/13] migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 06/13] migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED() Peter Xu
` (8 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
RDMA's old io_create_watch() isn't really doing much work anyway. For
G_IO_OUT, it already returns immediately. For G_IO_IN, it will try to
detect some RDMA context length; however, normally nobody will be able to set
it at all.
Simplify the code so that RDMA iochannels simply always rely on synchronous
reads and writes. This is highly likely what 6ddd2d76ca6f86f was talking
about: the async model isn't really working well.
To be explicit, incoming migration should always have marked the iochannel
to be nonblocking. For non-RDMA channels, what happens with current master
branch is when we have nothing to read, QEMU yields the coroutine at
qemu_fill_buffer(). For RDMA, what I see is it always polls on its own and
it yields at qemu_rdma_wait_comp_channel(). A sample stack:
#0 qemu_coroutine_yield
#1 0x0000562e46e51f77 in yield_until_fd_readable
#2 0x0000562e46927823 in qemu_rdma_wait_comp_channel
#3 0x0000562e46927b35 in qemu_rdma_block_for_wrid
#4 0x0000562e46927e6f in qemu_rdma_post_send_control
#5 0x0000562e4692857f in qemu_rdma_exchange_recv
#6 0x0000562e4692ab5e in qio_channel_rdma_readv
#7 0x0000562e46c1f2d7 in qio_channel_readv_full
#8 0x0000562e46c13a6e in qemu_fill_buffer
#9 0x0000562e46c14ba8 in qemu_peek_byte
#10 0x0000562e46c14c09 in qemu_get_byte
#11 0x0000562e46c14e2a in qemu_get_be32
#12 0x0000562e46c14e8a in qemu_get_be64
#13 0x0000562e46913f08 in ram_load_precopy
#14 0x0000562e46914448 in ram_load
#15 0x0000562e469186e3 in vmstate_load
#16 0x0000562e4691ce6d in qemu_loadvm_section_part_end
#17 0x0000562e4691d99b in qemu_loadvm_state_main
#18 0x0000562e4691db87 in qemu_loadvm_state
#19 0x0000562e468f2e87 in process_incoming_migration_co
This patch may or may not help in reality; the whole IO watch may or may
not be working at all for RDMA iochannels. In all cases, this patch makes
sure the above will be the only place where RDMA can poll on IOs.
Tested-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/rdma.c | 69 +++---------------------------------------------
1 file changed, 3 insertions(+), 66 deletions(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 13dd391c14..0e5e02cdca 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2776,56 +2776,14 @@ static gboolean
qio_channel_rdma_source_prepare(GSource *source,
gint *timeout)
{
- QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
- RDMAContext *rdma;
- GIOCondition cond = 0;
*timeout = -1;
-
- RCU_READ_LOCK_GUARD();
- if (rsource->condition == G_IO_IN) {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
- } else {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
- }
-
- if (!rdma) {
- error_report("RDMAContext is NULL when prepare Gsource");
- return FALSE;
- }
-
- if (rdma->wr_data[0].control_len) {
- cond |= G_IO_IN;
- }
- cond |= G_IO_OUT;
-
- return cond & rsource->condition;
+ return TRUE;
}
static gboolean
qio_channel_rdma_source_check(GSource *source)
{
- QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
- RDMAContext *rdma;
- GIOCondition cond = 0;
-
- RCU_READ_LOCK_GUARD();
- if (rsource->condition == G_IO_IN) {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
- } else {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
- }
-
- if (!rdma) {
- error_report("RDMAContext is NULL when check Gsource");
- return FALSE;
- }
-
- if (rdma->wr_data[0].control_len) {
- cond |= G_IO_IN;
- }
- cond |= G_IO_OUT;
-
- return cond & rsource->condition;
+ return TRUE;
}
static gboolean
@@ -2835,29 +2793,8 @@ qio_channel_rdma_source_dispatch(GSource *source,
{
QIOChannelFunc func = (QIOChannelFunc)callback;
QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
- RDMAContext *rdma;
- GIOCondition cond = 0;
-
- RCU_READ_LOCK_GUARD();
- if (rsource->condition == G_IO_IN) {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmain);
- } else {
- rdma = qatomic_rcu_read(&rsource->rioc->rdmaout);
- }
-
- if (!rdma) {
- error_report("RDMAContext is NULL when dispatch Gsource");
- return FALSE;
- }
-
- if (rdma->wr_data[0].control_len) {
- cond |= G_IO_IN;
- }
- cond |= G_IO_OUT;
- return (*func)(QIO_CHANNEL(rsource->rioc),
- (cond & rsource->condition),
- user_data);
+ return (*func)(QIO_CHANNEL(rsource->rioc), rsource->condition, user_data);
}
static void
--
2.50.1
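[Editorial note: illustration, not part of the patch. After this change the RDMA GSource reports itself unconditionally ready, deferring all real waiting to the synchronous read/write path. A self-contained sketch of the resulting main-loop behavior, using stand-in functions rather than the GLib GSourceFuncs API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * prepare and check unconditionally report readiness, as in the diff
 * above, so dispatch fires on every main-loop iteration.
 */
static bool source_prepare(int *timeout)
{
    *timeout = -1;      /* no poll timeout needed: always ready */
    return true;
}

static bool source_check(void)
{
    return true;
}

/* One simplified loop iteration: count how often dispatch is reached */
static int loop_iterate(int *dispatched)
{
    int timeout = 0;

    if (source_prepare(&timeout) || source_check()) {
        (*dispatched)++;    /* dispatch reached unconditionally */
    }
    return timeout;
}
```

This is why the commit message stresses that qemu_rdma_wait_comp_channel() becomes the single place where RDMA polls: the watch itself never blocks anything.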
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* [PATCH 06/13] migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (4 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 05/13] migration/rdma: Change io_create_watch() to return immediately Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-28 13:27 ` Vladimir Sementsov-Ogievskiy
2025-10-22 19:26 ` [PATCH 07/13] migration: Pass in bql_held information from qemu_loadvm_state() Peter Xu
` (7 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
Introduce the helpers to conditionally take or release BQL for a process.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.h | 95 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..e1c5029110 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -20,6 +20,7 @@
#include "qobject/json-writer.h"
#include "qemu/thread.h"
#include "qemu/coroutine.h"
+#include "qemu/main-loop.h"
#include "io/channel.h"
#include "io/channel-buffer.h"
#include "net/announce.h"
@@ -42,6 +43,100 @@
#define MIGRATION_THREAD_DST_LISTEN "mig/dst/listen"
#define MIGRATION_THREAD_DST_PREEMPT "mig/dst/preempt"
+struct WithBqlHeldAuto;
+typedef struct WithBqlHeldAuto WithBqlHeldAuto;
+
+static inline WithBqlHeldAuto *
+with_bql_held_auto_lock(bool bql_held, const char *file, int line)
+{
+ assert(bql_held == bql_locked());
+ if (!bql_held) {
+ bql_lock_impl(file, line);
+ return (WithBqlHeldAuto *)1;
+ }
+ return (WithBqlHeldAuto *)2;
+}
+
+static inline void
+with_bql_held_auto_unlock(WithBqlHeldAuto *v)
+{
+ assert(bql_locked());
+ if (v == (WithBqlHeldAuto *)1) {
+ bql_unlock();
+ }
+}
+
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(WithBqlHeldAuto, with_bql_held_auto_unlock);
+
+#define _WITH_BQL_HELD(bql_held, var) \
+ for (g_autoptr(WithBqlHeldAuto) var = \
+ with_bql_held_auto_lock(bql_held, __FILE__, __LINE__); \
+ var; \
+ with_bql_held_auto_unlock(var), var = NULL)
+
+/**
+ * WITH_BQL_HELD(): Run a block of code, making sure BQL is held
+ * @bql_held: Whether BQL is already held
+ *
+ * Example use case:
+ *
+ * WITH_BQL_HELD(bql_held) {
+ * // BQL is guaranteed to be held within this block.
+ * // If bql_held==false, bql will be released when the block finishes.
+ * }
+ */
+#define WITH_BQL_HELD(bql_held) \
+ _WITH_BQL_HELD(bql_held, glue(with_bql_held_var, __COUNTER__))
+
+struct WithBqlReleaseAuto;
+typedef struct WithBqlReleaseAuto WithBqlReleaseAuto;
+
+static inline WithBqlReleaseAuto *
+with_bql_release_auto_unlock(bool bql_held)
+{
+ assert(bql_held == bql_locked());
+ if (bql_held) {
+ bql_unlock();
+ return (WithBqlReleaseAuto *)1;
+ }
+ return (WithBqlReleaseAuto *)2;
+}
+
+static inline void
+with_bql_release_auto_lock(WithBqlReleaseAuto *v)
+{
+ assert(!bql_locked());
+ if (v == (WithBqlReleaseAuto *)1) {
+ /*
+ * NOTE: cleanup function cannot take more than 1 argument. Keep
+ * it simple here by not passing __FILE__/__LINE__ from the caller.
+ */
+ bql_lock_impl(__FILE__, __LINE__);
+ }
+}
+
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(WithBqlReleaseAuto, with_bql_release_auto_lock);
+
+#define _WITH_BQL_RELEASED(bql_held, var) \
+ for (g_autoptr(WithBqlReleaseAuto) var = \
+ with_bql_release_auto_unlock(bql_held); \
+ var; \
+ with_bql_release_auto_lock(var), var = NULL)
+
+/**
+ * WITH_BQL_RELEASED(): Run a block of code, making sure BQL is released
+ * @bql_held: Whether BQL is already held
+ *
+ * Example use case:
+ *
+ * WITH_BQL_RELEASED(bql_held) {
+ * // BQL is guaranteed to be released within this block.
+ * // If bql_held==true, bql will be re-taken when the block finishes.
+ * }
+ */
+#define WITH_BQL_RELEASED(bql_held) \
+ _WITH_BQL_RELEASED(bql_held, glue(with_bql_release_var, __COUNTER__))
+
struct PostcopyBlocktimeContext;
typedef struct ThreadPool ThreadPool;
--
2.50.1
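[Editorial note: illustration, not part of the patch. WITH_BQL_HELD() combines GNU C's cleanup attribute (which g_autoptr() expands to) with a one-shot for loop, so the unlock fires even on break or goto out of the block. The same pattern, self-contained, with a pthread mutex standing in for the BQL; WITH_LOCK_HELD and its helpers are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static bool locked;                     /* tracks macro-acquired state */

/* Returns 1 if we took the lock ourselves, 2 if the caller already held it */
static int lock_enter(bool held)
{
    if (!held) {
        pthread_mutex_lock(&mutex);
        locked = true;
        return 1;
    }
    return 2;
}

/* Cleanup hook: only drop the lock if we were the ones who took it */
static void lock_exit(int *state)
{
    if (*state == 1) {
        locked = false;
        pthread_mutex_unlock(&mutex);
    }
}

/*
 * One-shot for loop: the body runs exactly once; the cleanup attribute
 * makes lock_exit() fire when _s leaves scope, including on early exit.
 * The explicit lock_exit() in the increment handles the normal path and
 * zeroes _s so the cleanup call becomes a no-op.
 */
#define WITH_LOCK_HELD(held)                                              \
    for (int _s __attribute__((cleanup(lock_exit))) = lock_enter(held);   \
         _s; lock_exit(&_s), _s = 0)
```

The (WithBqlHeldAuto *)1 vs (WithBqlHeldAuto *)2 encoding in the patch plays the same role as the 1/2 return values here: distinguishing "we locked it, so we must unlock" from "the caller held it, leave it alone".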
^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: [PATCH 06/13] migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
2025-10-22 19:26 ` [PATCH 06/13] migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED() Peter Xu
@ 2025-10-28 13:27 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-28 13:27 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On 22.10.25 22:26, Peter Xu wrote:
> Introduce the helpers to conditionally take or release BQL for a process.
>
> Signed-off-by: Peter Xu<peterx@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 07/13] migration: Pass in bql_held information from qemu_loadvm_state()
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (5 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 06/13] migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED() Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-28 14:22 ` Vladimir Sementsov-Ogievskiy
2025-10-22 19:26 ` [PATCH 08/13] migration: Thread-ify precopy vmstate load process Peter Xu
` (6 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
Teach qemu_loadvm_state() and some of the internal functions to know
whether we're holding BQL or not.
So far, all the callers still always pass in TRUE, hence no functional
change expected. But it may change in the near future.
To reviewers: even if this is not a functional change yet, it'll be the major
core functional change after we switch to threadified loadvm soon. Please
treat it as one that adds explicit code to mark out which parts of incoming
live migration need to always be executed with the BQL, and which need to
always run without it.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.h | 4 +--
migration/colo.c | 2 +-
migration/migration.c | 2 +-
migration/savevm.c | 72 ++++++++++++++++++++++++++++++-------------
4 files changed, 55 insertions(+), 25 deletions(-)
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..a04dee4166 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -64,10 +64,10 @@ void qemu_savevm_send_colo_enable(QEMUFile *f);
void qemu_savevm_live_state(QEMUFile *f);
int qemu_save_device_state(QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f, Error **errp);
+int qemu_loadvm_state(QEMUFile *f, bool bql_held, Error **errp);
void qemu_loadvm_state_cleanup(MigrationIncomingState *mis);
int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
- Error **errp);
+ bool bql_held, Error **errp);
int qemu_load_device_state(QEMUFile *f, Error **errp);
int qemu_loadvm_approve_switchover(void);
int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
diff --git a/migration/colo.c b/migration/colo.c
index db783f6fa7..4fd586951a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -686,7 +686,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
bql_lock();
cpu_synchronize_all_states();
- ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
+ ret = qemu_loadvm_state_main(mis->from_src_file, mis, true, errp);
bql_unlock();
if (ret < 0) {
diff --git a/migration/migration.c b/migration/migration.c
index 4ed2a2e881..38a584afae 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -878,7 +878,7 @@ process_incoming_migration_co(void *opaque)
MIGRATION_STATUS_ACTIVE);
mis->loadvm_co = qemu_coroutine_self();
- ret = qemu_loadvm_state(mis->from_src_file, &local_err);
+ ret = qemu_loadvm_state(mis->from_src_file, true, &local_err);
mis->loadvm_co = NULL;
trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
diff --git a/migration/savevm.c b/migration/savevm.c
index 232cae090b..44aadc2f51 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -154,11 +154,12 @@ static void qemu_loadvm_thread_pool_destroy(MigrationIncomingState *mis)
}
static bool qemu_loadvm_thread_pool_wait(MigrationState *s,
- MigrationIncomingState *mis)
+ MigrationIncomingState *mis,
+ bool bql_held)
{
- bql_unlock(); /* Let load threads do work requiring BQL */
- thread_pool_wait(mis->load_threads);
- bql_lock();
+ WITH_BQL_RELEASED(bql_held) {
+ thread_pool_wait(mis->load_threads);
+ }
return !migrate_has_error(s);
}
@@ -2117,7 +2118,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
qemu_file_set_blocking(f, true, &error_fatal);
/* TODO: sanity check that only postcopiable data will be loaded here */
- load_res = qemu_loadvm_state_main(f, mis, &local_err);
+ load_res = qemu_loadvm_state_main(f, mis, true, &local_err);
/*
* This is tricky, but, mis->from_src_file can change after it
@@ -2420,7 +2421,8 @@ static void loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
* Returns: Negative values on error
*
*/
-static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis, Error **errp)
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+ bool bql_held, Error **errp)
{
int ret;
size_t length;
@@ -2471,7 +2473,7 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis, Error **errp)
qemu_coroutine_yield();
} while (1);
- ret = qemu_loadvm_state_main(packf, mis, errp);
+ ret = qemu_loadvm_state_main(packf, mis, bql_held, errp);
trace_loadvm_handle_cmd_packaged_main(ret);
qemu_fclose(packf);
object_unref(OBJECT(bioc));
@@ -2571,7 +2573,7 @@ static int loadvm_postcopy_handle_switchover_start(Error **errp)
* LOADVM_QUIT All good, but exit the loop
* <0 Error
*/
-static int loadvm_process_command(QEMUFile *f, Error **errp)
+static int loadvm_process_command(QEMUFile *f, bool bql_held, Error **errp)
{
MigrationIncomingState *mis = migration_incoming_get_current();
uint16_t cmd;
@@ -2641,7 +2643,8 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
break;
case MIG_CMD_PACKAGED:
- return loadvm_handle_cmd_packaged(mis, errp);
+ /* PACKAGED may have bql dependency internally */
+ return loadvm_handle_cmd_packaged(mis, bql_held, errp);
case MIG_CMD_POSTCOPY_ADVISE:
return loadvm_postcopy_handle_advise(mis, len, errp);
@@ -2666,7 +2669,11 @@ static int loadvm_process_command(QEMUFile *f, Error **errp)
return loadvm_process_enable_colo(mis, errp);
case MIG_CMD_SWITCHOVER_START:
- return loadvm_postcopy_handle_switchover_start(errp);
+ WITH_BQL_HELD(bql_held) {
+ /* TODO: drop the BQL dependency */
+ ret = loadvm_postcopy_handle_switchover_start(errp);
+ }
+ return ret;
}
return 0;
@@ -2882,6 +2889,10 @@ static int qemu_loadvm_state_header(QEMUFile *f, Error **errp)
return -EINVAL;
}
+ /*
+ * NOTE: this can be invoked with/without BQL. It's safe because
+ * vmstate_configuration (and its hooks) doesn't rely on BQL status.
+ */
ret = vmstate_load_state(f, &vmstate_configuration, &savevm_state, 0,
errp);
if (ret) {
@@ -3072,7 +3083,7 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
}
int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
- Error **errp)
+ bool bql_held, Error **errp)
{
ERRP_GUARD();
uint8_t section_type;
@@ -3094,20 +3105,33 @@ retry:
switch (section_type) {
case QEMU_VM_SECTION_START:
case QEMU_VM_SECTION_FULL:
- ret = qemu_loadvm_section_start_full(f, section_type, errp);
+ /*
+ * FULL should normally require BQL, e.g. during post_load()
+ * there can be memory region updates. START may or may not
+ * require it, but just to keep it simple to always hold BQL
+ * for now.
+ */
+ WITH_BQL_HELD(bql_held) {
+ ret = qemu_loadvm_section_start_full(f, section_type, errp);
+ }
if (ret < 0) {
goto out;
}
break;
case QEMU_VM_SECTION_PART:
case QEMU_VM_SECTION_END:
+ /* PART / END may be loaded without BQL */
ret = qemu_loadvm_section_part_end(f, section_type, errp);
if (ret < 0) {
goto out;
}
break;
case QEMU_VM_COMMAND:
- ret = loadvm_process_command(f, errp);
+ /*
+ * Be careful; QEMU_VM_COMMAND can embed FULL sections, so it
+ * may internally need BQL.
+ */
+ ret = loadvm_process_command(f, bql_held, errp);
trace_qemu_loadvm_state_section_command(ret);
if ((ret < 0) || (ret == LOADVM_QUIT)) {
goto out;
@@ -3152,7 +3176,7 @@ out:
return ret;
}
-int qemu_loadvm_state(QEMUFile *f, Error **errp)
+int qemu_loadvm_state(QEMUFile *f, bool bql_held, Error **errp)
{
MigrationState *s = migrate_get_current();
MigrationIncomingState *mis = migration_incoming_get_current();
@@ -3177,9 +3201,12 @@ int qemu_loadvm_state(QEMUFile *f, Error **errp)
qemu_loadvm_state_switchover_ack_needed(mis);
}
- cpu_synchronize_all_pre_loadvm();
+ /* run_on_cpu() requires BQL */
+ WITH_BQL_HELD(bql_held) {
+ cpu_synchronize_all_pre_loadvm();
+ }
- ret = qemu_loadvm_state_main(f, mis, errp);
+ ret = qemu_loadvm_state_main(f, mis, bql_held, errp);
qemu_event_set(&mis->main_thread_load_event);
trace_qemu_loadvm_state_post_main(ret);
@@ -3195,7 +3222,7 @@ int qemu_loadvm_state(QEMUFile *f, Error **errp)
/* When reaching here, it must be precopy */
if (ret == 0) {
if (migrate_has_error(migrate_get_current()) ||
- !qemu_loadvm_thread_pool_wait(s, mis)) {
+ !qemu_loadvm_thread_pool_wait(s, mis, bql_held)) {
ret = -EINVAL;
error_setg(errp,
"Error while loading vmstate");
@@ -3249,7 +3276,10 @@ int qemu_loadvm_state(QEMUFile *f, Error **errp)
}
}
- cpu_synchronize_all_post_init();
+ /* run_on_cpu() requires BQL */
+ WITH_BQL_HELD(bql_held) {
+ cpu_synchronize_all_post_init();
+ }
return ret;
}
@@ -3260,7 +3290,7 @@ int qemu_load_device_state(QEMUFile *f, Error **errp)
int ret;
/* Load QEMU_VM_SECTION_FULL section */
- ret = qemu_loadvm_state_main(f, mis, errp);
+ ret = qemu_loadvm_state_main(f, mis, true, errp);
if (ret < 0) {
return ret;
}
@@ -3495,7 +3525,7 @@ void qmp_xen_load_devices_state(const char *filename, Error **errp)
f = qemu_file_new_input(QIO_CHANNEL(ioc));
object_unref(OBJECT(ioc));
- ret = qemu_loadvm_state(f, errp);
+ ret = qemu_loadvm_state(f, true, errp);
qemu_fclose(f);
if (ret < 0) {
error_prepend(errp, "loading Xen device state failed: ");
@@ -3573,7 +3603,7 @@ bool load_snapshot(const char *name, const char *vmstate,
ret = -EINVAL;
goto err_drain;
}
- ret = qemu_loadvm_state(f, errp);
+ ret = qemu_loadvm_state(f, true, errp);
migration_incoming_state_destroy();
bdrv_drain_all_end();
--
2.50.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 07/13] migration: Pass in bql_held information from qemu_loadvm_state()
2025-10-22 19:26 ` [PATCH 07/13] migration: Pass in bql_held information from qemu_loadvm_state() Peter Xu
@ 2025-10-28 14:22 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-28 14:22 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On 22.10.25 22:26, Peter Xu wrote:
> Teach qemu_loadvm_state() and some of the internal functions to know
> whether we're holding BQL or not.
Actually, this commit does more than pass the bql_held information:
it also introduces some WITH_BQL_HELD() sections.
IMHO it could be split:
1. only add bql_held parameters, which are used only to pass through to
called functions. That's just to simplify further commits and make the
information available. In this commit, we only need to check that the
passed information is correct (is it really held? maybe add some
comments/assertions to make it obvious)
2. one or more commits which prepare different functions to be called
in a thread: support bql_held=false by introducing WITH_BQL_HELD() sections.
In each such commit, we should look through the whole function and check
that it is actually prepared to be called from a thread.
Hmm, or without [1.], there may be several commits to prepare different
functions. Or maybe even one commit as it is, but with the commit subject
and message changed somehow to reflect all the changes.
>
> So far, all the callers still always pass in TRUE, hence no functional
> change expected. But it may change in the near future.
>
> To reviewers: even if this is not functional change yet, it'll be the major
> core functional change after we switch to threadified loadvm soon. Please
> treat it as one to add explicit code to mark out which part of incoming
> live migration would need to be executed always with the BQL, or would need
> to be run always without BQL.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
[..]
> diff --git a/migration/colo.c b/migration/colo.c
> index db783f6fa7..4fd586951a 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -686,7 +686,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>
> bql_lock();
> cpu_synchronize_all_states();
> - ret = qemu_loadvm_state_main(mis->from_src_file, mis, errp);
> + ret = qemu_loadvm_state_main(mis->from_src_file, mis, true, errp);
That one is obvious..
> bql_unlock();
>
> if (ret < 0) {
> diff --git a/migration/migration.c b/migration/migration.c
> index 4ed2a2e881..38a584afae 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -878,7 +878,7 @@ process_incoming_migration_co(void *opaque)
> MIGRATION_STATUS_ACTIVE);
>
> mis->loadvm_co = qemu_coroutine_self();
> - ret = qemu_loadvm_state(mis->from_src_file, &local_err);
> + ret = qemu_loadvm_state(mis->from_src_file, true, &local_err);
Here, why are we sure? Are coroutines triggered by a QMP command always run under the BQL?
Maybe it's worth an assertion.
> mis->loadvm_co = NULL;
>
> trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 232cae090b..44aadc2f51 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -154,11 +154,12 @@ static void qemu_loadvm_thread_pool_destroy(MigrationIncomingState *mis)
> }
>
> static bool qemu_loadvm_thread_pool_wait(MigrationState *s,
> - MigrationIncomingState *mis)
> + MigrationIncomingState *mis,
> + bool bql_held)
> {
> - bql_unlock(); /* Let load threads do work requiring BQL */
> - thread_pool_wait(mis->load_threads);
> - bql_lock();
> + WITH_BQL_RELEASED(bql_held) {
> + thread_pool_wait(mis->load_threads);
> + }
>
> return !migrate_has_error(s);
The function is now prepared to be called from a thread, as migrate_has_error() has its own mutex.
> }
> @@ -2117,7 +2118,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
> qemu_file_set_blocking(f, true, &error_fatal);
>
> /* TODO: sanity check that only postcopiable data will be loaded here */
> - load_res = qemu_loadvm_state_main(f, mis, &local_err);
> + load_res = qemu_loadvm_state_main(f, mis, true, &local_err);
Is this correct? I see that a bit later postcopy_ram_listen_thread() does
bql_lock();
migration_incoming_state_destroy();
bql_unlock();
so I assume that before this, when we call qemu_loadvm_state_main(), the BQL is not actually locked?
>
> /*
> * This is tricky, but, mis->from_src_file can change after it
> @@ -2420,7 +2421,8 @@ static void loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
> * Returns: Negative values on error
> *
> */
> -static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis, Error **errp)
> +static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
> + bool bql_held, Error **errp)
> {
> int ret;
[..]
--
Best regards,
Vladimir
* [PATCH 08/13] migration: Thread-ify precopy vmstate load process
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (6 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 07/13] migration: Pass in bql_held information from qemu_loadvm_state() Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 09/13] migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel Peter Xu
` (5 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
The migration module has been around for 10+ years. Initially, it was in
most cases based on coroutines. As more features were added to the
framework, like postcopy, multifd, etc., it became a mixture of threads
and coroutines. I'm guessing coroutines just can't fix all the issues
that migration wants to resolve.
After all these years, migration is now heavily based on a threaded model.
There is still one major part of the migration framework that is not
thread-based: precopy load. Postcopy has done the load in a separate
thread since the first day it was introduced; however, that requires a
separate state transition after precopy first loads all the devices, which
still happens in a coroutine on the main thread.
This patch moves the migration incoming side to run inside a separate
thread (mig/dst/main), just like the src side (mig/src/main). The entry
point is migration_incoming_thread().
Quite a few things are needed to make it fly.. One note here is that we
need to change all these things in one patch so as not to break anything.
The other way to do this is to add code making all the paths this patch
touches ready for either a coroutine or a thread; that may cause confusion
in another way. So, reviewers, please accept my sincere apology for the
difficulty of reviewing this patch: it covers a few modules at the same
time, and with some risky changes.
BQL Analysis
============
Firstly, moving the load over to the thread means the thread cannot take
the BQL during the whole loading process anymore, because otherwise it
could block the main thread from using the BQL for all kinds of other
concurrent tasks (for example, processing QMP / HMP commands).
Here the first question to ask is: what needs the BQL during precopy load,
and what doesn't?
Most of the load process shouldn't need the BQL, especially when it's
about RAM. After all, RAM is still the major chunk of data to move for a
live migration. VFIO has started to change that, though; still, VFIO is
per-device, so that shouldn't need the BQL either in most cases.
Generic device loads will need the BQL, likely not when receiving VMSDs,
but when applying them. One example is that any post_load() could
potentially inject memory regions, causing memory transactions to happen.
Those will need to update the global address spaces, hence require the
BQL. The other one is CPU sync operations: even if the sync alone may not
need the BQL (which is still to be further justified), run_on_cpu() will
need it.
For that, the qemu_loadvm_state() and qemu_loadvm_state_main() functions
now need to take a "bql_held" parameter saying whether the BQL is held.
We could use things like BQL_LOCK_GUARD(), but this patch goes with
explicit locking rather than relying on the bql_locked TLS variable. In
the case of migration, we always know whether the BQL is held in each
context, as long as we can keep passing that information downwards.
COLO
====
COLO assumed the dest VM load happens in a coroutine. After this patch,
it no longer does. Adapt by invoking colo_incoming_co() directly from
migration_incoming_thread().
The name (colo_incoming_co()) isn't proper anymore. Change it to
colo_incoming_wait(), removing the coroutine annotation alongside.
Remove all the bql_lock() implications in COLO; e.g., colo_incoming_co()
used to release the lock for a short period while joining. Now that's not
needed. Instead, take the BQL only when needed (colo_release_ram_cache()).
Meanwhile, there is the colo_incoming_co variable, which used to store the
COLO incoming coroutine, only to be kicked when a secondary failover
happens.
To recap, here is what should happen for such a failover (taking the
example of the QMP command x-colo-lost-heartbeat triggering on the dest
QEMU):
- The QMP command will kick off both the coroutine and the COLO
thread (colo_process_incoming_thread()), with something like:
/* Notify COLO incoming thread that failover work is finished */
qemu_event_set(&mis->colo_incoming_event);
qemu_coroutine_enter(mis->colo_incoming_co);
- The coroutine, which yielded itself before, now resumes after enter(),
then it'll wait for the join():
mis->colo_incoming_co = qemu_coroutine_self();
qemu_coroutine_yield();
mis->colo_incoming_co = NULL;
/* Wait checkpoint incoming thread exit before free resource */
qemu_thread_join(&th);
Here, when switching to a thread model, it should be fine to remove the
colo_incoming_co variable completely, because then the incoming thread
will (instead of yielding the coroutine) wait at qemu_thread_join() until
the COLO thread completes execution (after receiving colo_incoming_event).
RDMA
====
With the prior patch making sure io_watch won't block for RDMA iochannels,
RDMA threads should only block in their io_readv/io_writev functions.
When a disconnection is detected (as in rdma_cm_poll_handler()), the
update to the "errored" field will be immediately reflected in the
migration incoming thread. Hence the coroutine is no longer needed to
kick the RDMA thread out.
With the thread in place, we also can't have rdma_cm_poll_handler() keep
polling the fd and operating on it in the main thread. Drop it
completely; that should be fine because qemu_rdma_wait_comp_channel()
should also be monitoring it.
This almost reverts commit 923709896b1b01fb982c93492ad01b233e6b6023.
Unfortunately, we need to make this change in the same patch that
introduces the thread; otherwise we risk racing.
TODO
====
Currently the BQL is taken while loading a START|FULL section. When the
IO hangs (e.g. due to a network issue) during this process, it could
potentially block others, like the monitor servers. One solution is
breaking the BQL into smaller granularity and leaving the IO always
BQL-free. That'll need more justification.
For example, there are at least four things that need some closer
attention:
- SaveVMHandlers' load_state(): these likely DO NOT need the BQL, but we
need to justify all of them (not to mention, some of them look prone to
being rewritten as VMSDs..)
- VMSD's pre_load(): in most cases, this DOES NOT really need the BQL, but
sometimes maybe it will! Double-checking on this will be needed.
- VMSD's post_load(): in many cases, this DOES need the BQL, for example
for address space operations. Likely we should just take it for any
post_load().
- A VMSD field's get(): this is tricky! It could internally be anything,
even if it is only one field. E.g. there can be users who use a SINGLE
field to load a whole VMSD, which can introduce even more
possibilities.
In general, QEMUFile IO should not need the BQL, that is, when receiving
the VMSD data and waiting for e.g. the socket buffer to be refilled. But
that's the easy part.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/migration/colo.h | 6 ++--
migration/migration.h | 14 +++-----
migration/colo-stubs.c | 2 +-
migration/colo.c | 24 ++++---------
migration/migration.c | 77 +++++++++++++++++++++++++---------------
migration/rdma.c | 34 +-----------------
migration/savevm.c | 8 ++---
migration/trace-events | 4 +--
8 files changed, 69 insertions(+), 100 deletions(-)
diff --git a/include/migration/colo.h b/include/migration/colo.h
index d4fe422e4d..5de7d715a7 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -44,12 +44,10 @@ void colo_do_failover(void);
void colo_checkpoint_delay_set(void);
/*
- * Starts COLO incoming process. Called from process_incoming_migration_co()
+ * Starts COLO incoming process. Called from migration_incoming_thread()
* after loading the state.
- *
- * Called with BQL locked, may temporary release BQL.
*/
-void coroutine_fn colo_incoming_co(void);
+void colo_incoming_wait(void);
void colo_shutdown(void);
#endif
diff --git a/migration/migration.h b/migration/migration.h
index e1c5029110..0d22dc8cc2 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -214,6 +214,10 @@ struct MigrationIncomingState {
bool have_listen_thread;
QemuThread listen_thread;
+ /* Migration main recv thread */
+ bool have_recv_thread;
+ QemuThread recv_thread;
+
/* For the kernel to send us notifications */
int userfault_fd;
/* To notify the fault_thread to wake, e.g., when need to quit */
@@ -272,15 +276,7 @@ struct MigrationIncomingState {
MigrationStatus state;
- /*
- * The incoming migration coroutine, non-NULL during qemu_loadvm_state().
- * Used to wake the migration incoming coroutine from rdma code. How much is
- * it safe - it's a question.
- */
- Coroutine *loadvm_co;
-
- /* The coroutine we should enter (back) after failover */
- Coroutine *colo_incoming_co;
+ /* Notify secondary VM to move on */
QemuEvent colo_incoming_event;
/* Optional load threads pool and its thread exit request flag */
diff --git a/migration/colo-stubs.c b/migration/colo-stubs.c
index e22ce65234..ef77d1ab4b 100644
--- a/migration/colo-stubs.c
+++ b/migration/colo-stubs.c
@@ -9,7 +9,7 @@ void colo_shutdown(void)
{
}
-void coroutine_fn colo_incoming_co(void)
+void colo_incoming_wait(void)
{
}
diff --git a/migration/colo.c b/migration/colo.c
index 4fd586951a..81276a3e65 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -147,11 +147,6 @@ static void secondary_vm_do_failover(void)
}
/* Notify COLO incoming thread that failover work is finished */
qemu_event_set(&mis->colo_incoming_event);
-
- /* For Secondary VM, jump to incoming co */
- if (mis->colo_incoming_co) {
- qemu_coroutine_enter(mis->colo_incoming_co);
- }
}
static void primary_vm_do_failover(void)
@@ -848,10 +843,8 @@ static void *colo_process_incoming_thread(void *opaque)
mis->to_src_file = qemu_file_get_return_path(mis->from_src_file);
/*
- * Note: the communication between Primary side and Secondary side
- * should be sequential, we set the fd to unblocked in migration incoming
- * coroutine, and here we are in the COLO incoming thread, so it is ok to
- * set the fd back to blocked.
+ * Here we are in the COLO incoming thread, so it is ok to set the fd
+ * to blocking.
*/
if (!qemu_file_set_blocking(mis->from_src_file, true, &local_err)) {
error_report_err(local_err);
@@ -927,27 +920,22 @@ out:
return NULL;
}
-void coroutine_fn colo_incoming_co(void)
+/* Wait for failover */
+void colo_incoming_wait(void)
{
MigrationIncomingState *mis = migration_incoming_get_current();
QemuThread th;
- assert(bql_locked());
assert(migration_incoming_colo_enabled());
qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO,
colo_process_incoming_thread,
mis, QEMU_THREAD_JOINABLE);
- mis->colo_incoming_co = qemu_coroutine_self();
- qemu_coroutine_yield();
- mis->colo_incoming_co = NULL;
-
- bql_unlock();
/* Wait checkpoint incoming thread exit before free resource */
qemu_thread_join(&th);
- bql_lock();
- /* We hold the global BQL, so it is safe here */
+ bql_lock();
colo_release_ram_cache();
+ bql_unlock();
}
diff --git a/migration/migration.c b/migration/migration.c
index 38a584afae..728d02dbee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -491,6 +491,11 @@ void migration_incoming_state_destroy(void)
mis->postcopy_qemufile_dst = NULL;
}
+ if (mis->have_recv_thread) {
+ qemu_thread_join(&mis->recv_thread);
+ mis->have_recv_thread = false;
+ }
+
cpr_set_incoming_mode(MIG_MODE_NONE);
yank_unregister_instance(MIGRATION_YANK_INSTANCE);
}
@@ -861,30 +866,46 @@ static void process_incoming_migration_bh(void *opaque)
migration_incoming_state_destroy();
}
-static void coroutine_fn
-process_incoming_migration_co(void *opaque)
+static void migration_incoming_state_destroy_bh(void *opaque)
+{
+ struct MigrationIncomingState *mis = opaque;
+
+ migration_incoming_state_destroy();
+
+ if (mis->exit_on_error) {
+ /*
+ * NOTE: this exit() should better happen in the main thread, as
+ * the exit notifier may require BQL which can deadlock. See
+ * commit e7bc0204e57836 for example.
+ */
+ exit(EXIT_FAILURE);
+ }
+}
+
+static void *migration_incoming_thread(void *opaque)
{
MigrationState *s = migrate_get_current();
- MigrationIncomingState *mis = migration_incoming_get_current();
+ MigrationIncomingState *mis = opaque;
PostcopyState ps;
int ret;
Error *local_err = NULL;
+ rcu_register_thread();
+
assert(mis->from_src_file);
+ assert(!bql_locked());
mis->largest_page_size = qemu_ram_pagesize_largest();
postcopy_state_set(POSTCOPY_INCOMING_NONE);
migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_ACTIVE);
- mis->loadvm_co = qemu_coroutine_self();
- ret = qemu_loadvm_state(mis->from_src_file, true, &local_err);
- mis->loadvm_co = NULL;
+ ret = qemu_loadvm_state(mis->from_src_file, false, &local_err);
trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
ps = postcopy_state_get();
- trace_process_incoming_migration_co_end(ret, ps);
+ trace_process_incoming_migration_end(ret, ps);
if (ps != POSTCOPY_INCOMING_NONE) {
if (ps == POSTCOPY_INCOMING_ADVISE) {
/*
@@ -898,7 +919,7 @@ process_incoming_migration_co(void *opaque)
* Postcopy was started, cleanup should happen at the end of the
* postcopy thread.
*/
- trace_process_incoming_migration_co_postcopy_end_main();
+ trace_process_incoming_migration_postcopy_end_main();
goto out;
}
/* Else if something went wrong then just fall out of the normal exit */
@@ -911,8 +932,8 @@ process_incoming_migration_co(void *opaque)
}
if (migration_incoming_colo_enabled()) {
- /* yield until COLO exit */
- colo_incoming_co();
+ /* wait until COLO exits */
+ colo_incoming_wait();
}
migration_bh_schedule(process_incoming_migration_bh, mis);
@@ -924,28 +945,22 @@ fail:
migrate_set_error(s, local_err);
error_free(local_err);
- migration_incoming_state_destroy();
+ WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+ error_report_err(s->error);
+ s->error = NULL;
+ }
- if (mis->exit_on_error) {
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
+ /*
+ * There's some step of the destroy process that will need to happen in
+ * the main thread (e.g. joining this thread itself). Leave to a BH.
+ */
+ migration_bh_schedule(migration_incoming_state_destroy_bh, (void *)mis);
- exit(EXIT_FAILURE);
- } else {
- /*
- * Report the error here in case that QEMU abruptly exits
- * when postcopy is enabled.
- */
- WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
- error_report_err(s->error);
- s->error = NULL;
- }
- }
out:
/* Pairs with the refcount taken in qmp_migrate_incoming() */
migrate_incoming_unref_outgoing_state();
+ rcu_unregister_thread();
+ return NULL;
}
/**
@@ -963,8 +978,12 @@ static void migration_incoming_setup(QEMUFile *f)
void migration_incoming_process(void)
{
- Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, NULL);
- qemu_coroutine_enter(co);
+ MigrationIncomingState *mis = migration_incoming_get_current();
+
+ mis->have_recv_thread = true;
+ qemu_thread_create(&mis->recv_thread, "mig/dst/main",
+ migration_incoming_thread, mis,
+ QEMU_THREAD_JOINABLE);
}
/* Returns true if recovered from a paused migration, otherwise false */
diff --git a/migration/rdma.c b/migration/rdma.c
index 0e5e02cdca..3389f6448b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3051,37 +3051,6 @@ int rdma_control_save_page(QEMUFile *f, ram_addr_t block_offset,
static void rdma_accept_incoming_migration(void *opaque);
-static void rdma_cm_poll_handler(void *opaque)
-{
- RDMAContext *rdma = opaque;
- struct rdma_cm_event *cm_event;
- MigrationIncomingState *mis = migration_incoming_get_current();
-
- if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
- error_report("get_cm_event failed %d", errno);
- return;
- }
-
- if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
- cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
- if (!rdma->errored &&
- migration_incoming_get_current()->state !=
- MIGRATION_STATUS_COMPLETED) {
- error_report("receive cm event, cm event is %d", cm_event->event);
- rdma->errored = true;
- if (rdma->return_path) {
- rdma->return_path->errored = true;
- }
- }
- rdma_ack_cm_event(cm_event);
- if (mis->loadvm_co) {
- qemu_coroutine_enter(mis->loadvm_co);
- }
- return;
- }
- rdma_ack_cm_event(cm_event);
-}
-
static int qemu_rdma_accept(RDMAContext *rdma)
{
Error *err = NULL;
@@ -3199,8 +3168,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
NULL,
(void *)(intptr_t)rdma->return_path);
} else {
- qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
- NULL, rdma);
+ qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
}
ret = rdma_accept(rdma->cm_id, &conn_param);
diff --git a/migration/savevm.c b/migration/savevm.c
index 44aadc2f51..991f46593c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2118,7 +2118,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
qemu_file_set_blocking(f, true, &error_fatal);
/* TODO: sanity check that only postcopiable data will be loaded here */
- load_res = qemu_loadvm_state_main(f, mis, true, &local_err);
+ load_res = qemu_loadvm_state_main(f, mis, false, &local_err);
/*
* This is tricky, but, mis->from_src_file can change after it
@@ -2415,11 +2415,11 @@ static void loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
* Immediately following this command is a blob of data containing an embedded
* chunk of migration stream; read it and load it.
*
- * @mis: Incoming state
- * @length: Length of packaged data to read
+ * @mis: Incoming state
+ * @bql_held: Whether BQL is held already
+ * @errp: The Error** to set when returning failures.
*
* Returns: Negative values on error
- *
*/
static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
bool bql_held, Error **errp)
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..2b7b522e73 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,8 +193,8 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
source_return_path_thread_switchover_acked(void) ""
migration_thread_low_pending(uint64_t pending) "%" PRIu64
migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
-process_incoming_migration_co_postcopy_end_main(void) ""
+process_incoming_migration_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_postcopy_end_main(void) ""
postcopy_preempt_enabled(bool value) "%d"
migration_precopy_complete(void) ""
--
2.50.1
* [PATCH 09/13] migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (7 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 08/13] migration: Thread-ify precopy vmstate load process Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 10/13] migration/postcopy: Remove workaround on wait preempt channel Peter Xu
` (4 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
Now that the dest VM load during precopy has been thread-ified, we will
always be in a thread context rather than within a coroutine. We can
remove this path now.
Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/rdma.c | 90 ++++++++++++++++++++++--------------------------
1 file changed, 41 insertions(+), 49 deletions(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 3389f6448b..67119634d7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -29,7 +29,6 @@
#include "qemu/rcu.h"
#include "qemu/sockets.h"
#include "qemu/bitmap.h"
-#include "qemu/coroutine.h"
#include "system/memory.h"
#include <sys/socket.h>
#include <netdb.h>
@@ -1345,61 +1344,54 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
{
struct rdma_cm_event *cm_event;
- if (qemu_in_coroutine()) {
- yield_until_fd_readable(comp_channel->fd);
- } else {
- /* This is the source side, we're in a separate thread
- * or destination prior to migration_fd_process_incoming()
- * after postcopy, the destination also in a separate thread.
- * we can't yield; so we have to poll the fd.
- * But we need to be able to handle 'cancel' or an error
- * without hanging forever.
- */
- while (!rdma->errored && !rdma->received_error) {
- GPollFD pfds[2];
- pfds[0].fd = comp_channel->fd;
- pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
- pfds[0].revents = 0;
-
- pfds[1].fd = rdma->channel->fd;
- pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
- pfds[1].revents = 0;
-
- /* 0.1s timeout, should be fine for a 'cancel' */
- switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
- case 2:
- case 1: /* fd active */
- if (pfds[0].revents) {
- return 0;
- }
+ /*
+ * This is the source or dest side, either during precopy or
+ * postcopy. We're always in a separate thread when reaching here.
+ * Poll the fd. We need to be able to handle 'cancel' or an error
+ * without hanging forever.
+ */
+ while (!rdma->errored && !rdma->received_error) {
+ GPollFD pfds[2];
+ pfds[0].fd = comp_channel->fd;
+ pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+ pfds[0].revents = 0;
+
+ pfds[1].fd = rdma->channel->fd;
+ pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+ pfds[1].revents = 0;
+
+ /* 0.1s timeout, should be fine for a 'cancel' */
+ switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
+ case 2:
+ case 1: /* fd active */
+ if (pfds[0].revents) {
+ return 0;
+ }
- if (pfds[1].revents) {
- if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
- return -1;
- }
+ if (pfds[1].revents) {
+ if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) {
+ return -1;
+ }
- if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
- cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
- rdma_ack_cm_event(cm_event);
- return -1;
- }
+ if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
+ cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
rdma_ack_cm_event(cm_event);
+ return -1;
}
- break;
+ rdma_ack_cm_event(cm_event);
+ }
+ break;
- case 0: /* Timeout, go around again */
- break;
+ case 0: /* Timeout, go around again */
+ break;
- default: /* Error of some type -
- * I don't trust errno from qemu_poll_ns
- */
- return -1;
- }
+ default: /* Error of some type - don't trust errno from qemu_poll_ns */
+ return -1;
+ }
- if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
- /* Bail out and let the cancellation happen */
- return -1;
- }
+ if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
+ /* Bail out and let the cancellation happen */
+ return -1;
}
}
--
2.50.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/13] migration/postcopy: Remove workaround on wait preempt channel
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (8 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 09/13] migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 11/13] migration/ram: Remove workaround on ram yield during load Peter Xu
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
This reverts commit 7afbdada7effbc2b97281bfbce0c6df351a3cf88.
Now that the loadvm process has been moved into its own thread, the main
thread should be able to accept() even if loading the package causes a
page fault on the userfaultfd path.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.c | 21 ---------------------
1 file changed, 21 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 991f46593c..16fae635c0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2452,27 +2452,6 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
- /*
- * Before loading the guest states, ensure that the preempt channel has
- * been ready to use, as some of the states (e.g. via virtio_load) might
- * trigger page faults that will be handled through the preempt channel.
- * So yield to the main thread in the case that the channel create event
- * hasn't been dispatched.
- *
- * TODO: if we can move migration loadvm out of main thread, then we
- * won't block main thread from polling the accept() fds. We can drop
- * this as a whole when that is done.
- */
- do {
- if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
- mis->postcopy_qemufile_dst) {
- break;
- }
-
- aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
- qemu_coroutine_yield();
- } while (1);
-
ret = qemu_loadvm_state_main(packf, mis, bql_held, errp);
trace_loadvm_handle_cmd_packaged_main(ret);
qemu_fclose(packf);
--
2.50.1
* [PATCH 11/13] migration/ram: Remove workaround on ram yield during load
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (9 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 10/13] migration/postcopy: Remove workaround on wait preempt channel Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 12/13] migration: Allow blocking mode for incoming live migration Peter Xu
` (2 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
This reverts commit e65cec5e5d97927d22b39167d3e8edeffc771788.
The RAM load path used to have a hack that explicitly yielded to the main
coroutine when RAM load was spinning in a tight loop. It is no longer
needed because precopy RAM load now happens outside the main thread.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 29f016cb25..16e81cf3bb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4304,7 +4304,7 @@ static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
static int ram_load_precopy(QEMUFile *f)
{
MigrationIncomingState *mis = migration_incoming_get_current();
- int flags = 0, ret = 0, invalid_flags = 0, i = 0;
+ int flags = 0, ret = 0, invalid_flags = 0;
if (migrate_mapped_ram()) {
invalid_flags |= (RAM_SAVE_FLAG_HOOK | RAM_SAVE_FLAG_MULTIFD_FLUSH |
@@ -4317,17 +4317,6 @@ static int ram_load_precopy(QEMUFile *f)
void *host = NULL, *host_bak = NULL;
uint8_t ch;
- /*
- * Yield periodically to let main loop run, but an iteration of
- * the main loop is expensive, so do it each some iterations
- */
- if ((i & 32767) == 0 && qemu_in_coroutine()) {
- aio_co_schedule(qemu_get_current_aio_context(),
- qemu_coroutine_self());
- qemu_coroutine_yield();
- }
- i++;
-
addr = qemu_get_be64(f);
ret = qemu_file_get_error(f);
if (ret) {
--
2.50.1
* [PATCH 12/13] migration: Allow blocking mode for incoming live migration
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (10 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 11/13] migration/ram: Remove workaround on ram yield during load Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:26 ` [PATCH 13/13] migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START Peter Xu
2025-10-22 19:29 ` [PATCH 00/13] migration: Threadify loadvm process Peter Xu
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
We used to set nonblocking mode for incoming live migration. It was
required because, while running inside a coroutine, we cannot block in a
syscall. Instead, when a syscall would block, we need to register for the
iochannel's GIO events and proactively yield the coroutine so as to make
sure other things can keep running on the QEMU main thread. When the
migration channel has pending data to read, the QEMU main thread will
re-enter the blocked incoming migration coroutine.
Now, incoming migration runs completely inside a thread. We can safely
switch to blocking mode for the main incoming channel.
It's still slightly more efficient to use blocking mode than nonblocking,
because QEMU can avoid creating temporary main loops and the extra poll()
syscall. It is now safe to block directly in the iochannel's syscalls
(e.g. io_readv()).
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/migration/migration.c b/migration/migration.c
index 728d02dbee..abd76801c2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -973,7 +973,9 @@ static void migration_incoming_setup(QEMUFile *f)
assert(!mis->from_src_file);
mis->from_src_file = f;
- qemu_file_set_blocking(f, false, &error_abort);
+
+ /* Incoming migration runs in a thread now, nonblocking is not needed */
+ qemu_file_set_blocking(f, true, &error_abort);
}
void migration_incoming_process(void)
--
2.50.1
* [PATCH 13/13] migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (11 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 12/13] migration: Allow blocking mode for incoming live migration Peter Xu
@ 2025-10-22 19:26 ` Peter Xu
2025-10-22 19:29 ` [PATCH 00/13] migration: Threadify loadvm process Peter Xu
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:26 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, peterx, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin, Maciej S. Szmigiero,
Cédric Le Goater
That was there only because we used to do loadvm in a coroutine on the
main thread with the BQL held, which created a locking order issue.
Now make the API not depend on the BQL anymore; instead, make sure we
don't take the BQL when invoking the switchover_start() hooks.
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
hw/vfio/migration-multifd.c | 3 ---
migration/savevm.c | 3 +--
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index e4785031a7..c824f30fb0 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -766,13 +766,10 @@ int vfio_multifd_switchover_start(VFIODevice *vbasedev)
assert(multifd);
- /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
- bql_unlock();
WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
assert(!multifd->load_bufs_thread_running);
multifd->load_bufs_thread_running = true;
}
- bql_lock();
qemu_loadvm_start_load_thread(vfio_load_bufs_thread, vbasedev);
diff --git a/migration/savevm.c b/migration/savevm.c
index 16fae635c0..f67b97c9ab 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2648,8 +2648,7 @@ static int loadvm_process_command(QEMUFile *f, bool bql_held, Error **errp)
return loadvm_process_enable_colo(mis, errp);
case MIG_CMD_SWITCHOVER_START:
- WITH_BQL_HELD(bql_held) {
- /* TODO: drop the BQL dependency */
+ WITH_BQL_RELEASED(bql_held) {
ret = loadvm_postcopy_handle_switchover_start(errp);
}
return ret;
--
2.50.1
* Re: [PATCH 00/13] migration: Threadify loadvm process
2025-10-22 19:25 [PATCH 00/13] migration: Threadify loadvm process Peter Xu
` (12 preceding siblings ...)
2025-10-22 19:26 ` [PATCH 13/13] migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START Peter Xu
@ 2025-10-22 19:29 ` Peter Xu
13 siblings, 0 replies; 23+ messages in thread
From: Peter Xu @ 2025-10-22 19:29 UTC (permalink / raw)
To: qemu-devel
Cc: Li Zhijian, Hailiang Zhang, Kevin Wolf,
Vladimir Sementsov-Ogievskiy, Daniel P . Berrangé,
Fabiano Rosas, Zhang Chen, Dr . David Alan Gilbert, Prasad Pandit,
Paolo Bonzini, Yury Kotov, Juraj Marcin
On Wed, Oct 22, 2025 at 03:25:59PM -0400, Peter Xu wrote:
> This is v1, however not 10.2 material. The earliest I see fit would still
> be 11.0+ even if everything goes extremely smooth.
>
> Removal of RFC is only about that I'm more confident this should be able to
> land without breaking something too easily, as I smoked it slightly more
> cross-archs this time. AFAIU the best (and possibly only..) way to prove
> it solid is to merge it.. likely in the early phase of a dev cycle.
>
> The plan is we'll try to get to more device setups too soon, before it
> could land.
I forgot to attach a changelog, sorry, here it is:
rfc->v1:
- Collected tags
- Try to split the major patch 5 to smaller one [Dave]
- One with WITH_BQL_*() macros, rewritten to allow internal "return" or
nesting [Vladimir]
- One patch "migration: Pass in bql_held information from
qemu_loadvm_state()"; the patch itself contains no functional change,
but it collects all the changes relevant to the BQL, which should help
review.
- One patch containing the rest that needs to stay in a single patch.
- Refine commit message for patch "migration/rdma: Change io_create_watch()
to return immediately" [Fabiano]
- Pre-requisite patch to rework migration_channel_read_peek(), removing the
sleep functions [Fabiano]
- Added patch "io: Add qio_channel_wait_cond() helper"
- Added patch "migration: Properly wait on G_IO_IN when peeking messages"
- Squashed patch "migration/rdma: Remove rdma_cm_poll_handler" into patch
"migration: Thread-ify precopy vmstate load process", otherwise the patch
when split can be racy.
- Changed the vfio patch to be at the end, instead of adding dead code
directly remove the bql lock/unlock, making sure that API is invoked
without BQL instead.
- Added one patch "migration: Allow blocking mode for incoming live
migration", so that after the threadify work switching incoming main
channel to be in blocking mode. Reason in commit message.
It may not be complete; there can be small touch-ups here and there that
are not mentioned.
--
Peter Xu