* [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery
@ 2018-06-27 10:51 Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 1/5] migration: delay postcopy paused state Peter Xu
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
v2:
- break the first patch into several
- fix a QEMUFile leak
Please review. Thanks,
Peter Xu (5):
migration: delay postcopy paused state
migration: move income process out of multifd
migration: do explicit incoming setup for rdma
migration: unbreak postcopy recovery
migration: unify incoming processing
migration/ram.h | 2 +-
migration/exec.c | 3 ---
migration/fd.c | 3 ---
migration/migration.c | 34 +++++++++++++++++++++++++++-------
migration/ram.c | 11 +++++------
migration/rdma.c | 3 ++-
migration/savevm.c | 6 +++---
migration/socket.c | 5 -----
8 files changed, 38 insertions(+), 29 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v2 1/5] migration: delay postcopy paused state
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
@ 2018-06-27 10:51 ` Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 2/5] migration: move income process out of multifd Peter Xu
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
Before this patch we firstly setup the postcopy-paused state then we
clean up the QEMUFile handles. That can be racy if there is a very fast
"migrate-recover" command running in parallel. Fix that up.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index c2f34ffc7c..851d74e8b6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2194,9 +2194,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
/* Clear the triggered bit to allow one recovery */
mis->postcopy_recover_triggered = false;
- migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
- MIGRATION_STATUS_POSTCOPY_PAUSED);
-
assert(mis->from_src_file);
qemu_file_shutdown(mis->from_src_file);
qemu_fclose(mis->from_src_file);
@@ -2209,6 +2206,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
mis->to_src_file = NULL;
qemu_mutex_unlock(&mis->rp_mutex);
+ migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+ MIGRATION_STATUS_POSTCOPY_PAUSED);
+
/* Notify the fault thread for the invalidated file handle */
postcopy_fault_thread_notify(mis);
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v2 2/5] migration: move income process out of multifd
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 1/5] migration: delay postcopy paused state Peter Xu
@ 2018-06-27 10:51 ` Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma Peter Xu
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
Move the call to migration_incoming_process() out of multifd code. It's
a bit strange that we can migration generic calls in multifd code.
Instead, let multifd_recv_new_channel() return a boolean showing whether
it's ready to continue the incoming migration.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.h | 2 +-
migration/migration.c | 5 ++++-
migration/ram.c | 11 +++++------
3 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/migration/ram.h b/migration/ram.h
index d386f4d641..457bf54b8c 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -46,7 +46,7 @@ int multifd_save_cleanup(Error **errp);
int multifd_load_setup(void);
int multifd_load_cleanup(Error **errp);
bool multifd_recv_all_channels_created(void);
-void multifd_recv_new_channel(QIOChannel *ioc);
+bool multifd_recv_new_channel(QIOChannel *ioc);
uint64_t ram_pagesize_summary(void);
int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
diff --git a/migration/migration.c b/migration/migration.c
index e1eaa97df4..6ecea2de30 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -507,7 +507,10 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
migration_incoming_setup(f);
return;
}
- multifd_recv_new_channel(ioc);
+
+ if (multifd_recv_new_channel(ioc)) {
+ migration_incoming_process();
+ }
}
/**
diff --git a/migration/ram.c b/migration/ram.c
index cd5f55117d..52167d5142 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -873,7 +873,8 @@ bool multifd_recv_all_channels_created(void)
return thread_count == atomic_read(&multifd_recv_state->count);
}
-void multifd_recv_new_channel(QIOChannel *ioc)
+/* Return true if multifd is ready for the migration, otherwise false */
+bool multifd_recv_new_channel(QIOChannel *ioc)
{
MultiFDRecvParams *p;
Error *local_err = NULL;
@@ -882,7 +883,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
id = multifd_recv_initial_packet(ioc, &local_err);
if (id < 0) {
multifd_recv_terminate_threads(local_err);
- return;
+ return false;
}
p = &multifd_recv_state->params[id];
@@ -890,7 +891,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
error_setg(&local_err, "multifd: received id '%d' already setup'",
id);
multifd_recv_terminate_threads(local_err);
- return;
+ return false;
}
p->c = ioc;
object_ref(OBJECT(ioc));
@@ -899,9 +900,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
QEMU_THREAD_JOINABLE);
atomic_inc(&multifd_recv_state->count);
- if (multifd_recv_state->count == migrate_multifd_channels()) {
- migration_incoming_process();
- }
+ return multifd_recv_state->count == migrate_multifd_channels();
}
/**
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 1/5] migration: delay postcopy paused state Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 2/5] migration: move income process out of multifd Peter Xu
@ 2018-06-27 10:51 ` Peter Xu
2018-06-27 12:25 ` Dr. David Alan Gilbert
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 4/5] migration: unbreak postcopy recovery Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 5/5] migration: unify incoming processing Peter Xu
4 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
RDMA does not support postcopy recovery, so no need to go into such
logic. Instead of calling migration_fd_process_incoming(), let's call
the two functions that setup the incoming migration. There should have
no functional change at all.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/rdma.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 8bd7159059..217cdd7dd8 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3687,7 +3687,8 @@ static void rdma_accept_incoming_migration(void *opaque)
}
rdma->migration_started_on_destination = 1;
- migration_fd_process_incoming(f);
+ migration_incoming_setup(f);
+ migration_incoming_process();
}
void rdma_start_incoming_migration(const char *host_port, Error **errp)
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v2 4/5] migration: unbreak postcopy recovery
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
` (2 preceding siblings ...)
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma Peter Xu
@ 2018-06-27 10:51 ` Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 5/5] migration: unify incoming processing Peter Xu
4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
The whole postcopy recovery logic was accidentally broken. We need to
fix it in two steps.
This is the first step that we should do the recovery when needed. It
was bypassed before after commit 36c2f8be2c.
Since at it, rename the function migration_fd_process_incoming() into
postcopy_try_recover(), and make sure it only contain the recovery
logic (we don't need the removed lines any more since that's only used
by RDMA before and now RDMA code has got rid of it).
Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6ecea2de30..5e13b529fd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -466,7 +466,8 @@ void migration_incoming_process(void)
qemu_coroutine_enter(co);
}
-void migration_fd_process_incoming(QEMUFile *f)
+/* Returns true if recovered from a paused migration, otherwise false */
+static bool postcopy_try_recover(QEMUFile *f)
{
MigrationIncomingState *mis = migration_incoming_get_current();
@@ -491,11 +492,10 @@ void migration_fd_process_incoming(QEMUFile *f)
* that source is ready to reply to page requests.
*/
qemu_sem_post(&mis->postcopy_pause_sem_dst);
- } else {
- /* New incoming migration */
- migration_incoming_setup(f);
- migration_incoming_process();
+ return true;
}
+
+ return false;
}
void migration_ioc_process_incoming(QIOChannel *ioc)
@@ -504,6 +504,9 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
if (!mis->from_src_file) {
QEMUFile *f = qemu_fopen_channel_input(ioc);
+ if (postcopy_try_recover(f)) {
+ return;
+ }
migration_incoming_setup(f);
return;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [Qemu-devel] [PATCH v2 5/5] migration: unify incoming processing
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
` (3 preceding siblings ...)
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 4/5] migration: unbreak postcopy recovery Peter Xu
@ 2018-06-27 10:51 ` Peter Xu
4 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-27 10:51 UTC (permalink / raw)
To: qemu-devel; +Cc: Juan Quintela, Dr . David Alan Gilbert, peterx
This is the 2nd patch to unbreak postcopy recovery.
Let's unify the migration_incoming_process() call at a single place
rather than calling it in connection setup codes. This fixes a problem
that we will go into incoming migration procedure even if we are trying
to recovery from a paused postcopy migration.
Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/exec.c | 3 ---
migration/fd.c | 3 ---
migration/migration.c | 18 ++++++++++++++++--
migration/socket.c | 5 -----
4 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/migration/exec.c b/migration/exec.c
index 0bbeb63c97..375d2e1b54 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -49,9 +49,6 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
{
migration_channel_process_incoming(ioc);
object_unref(OBJECT(ioc));
- if (!migrate_use_multifd()) {
- migration_incoming_process();
- }
return G_SOURCE_REMOVE;
}
diff --git a/migration/fd.c b/migration/fd.c
index fee34ffdc0..a7c13df4ad 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,9 +49,6 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
{
migration_channel_process_incoming(ioc);
object_unref(OBJECT(ioc));
- if (!migrate_use_multifd()) {
- migration_incoming_process();
- }
return G_SOURCE_REMOVE;
}
diff --git a/migration/migration.c b/migration/migration.c
index 5e13b529fd..d6a18c85d7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -501,17 +501,31 @@ static bool postcopy_try_recover(QEMUFile *f)
void migration_ioc_process_incoming(QIOChannel *ioc)
{
MigrationIncomingState *mis = migration_incoming_get_current();
+ bool start_migration;
if (!mis->from_src_file) {
+ /* The first connection (multifd may have multiple) */
QEMUFile *f = qemu_fopen_channel_input(ioc);
+
+ /* If it's a recovery, we're done */
if (postcopy_try_recover(f)) {
return;
}
+
migration_incoming_setup(f);
- return;
+
+ /*
+ * Common migration only needs one channel, so we can start
+ * right now. Multifd needs more than one channel, we wait.
+ */
+ start_migration = !migrate_use_multifd();
+ } else {
+ /* Multiple connections */
+ assert(migrate_use_multifd());
+ start_migration = multifd_recv_new_channel(ioc);
}
- if (multifd_recv_new_channel(ioc)) {
+ if (start_migration) {
migration_incoming_process();
}
}
diff --git a/migration/socket.c b/migration/socket.c
index 3456eb76e9..f4c8174400 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -168,12 +168,7 @@ static void socket_accept_incoming_migration(QIONetListener *listener,
if (migration_has_all_channels()) {
/* Close listening socket as its no longer needed */
qio_net_listener_disconnect(listener);
-
object_unref(OBJECT(listener));
-
- if (!migrate_use_multifd()) {
- migration_incoming_process();
- }
}
}
--
2.17.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma Peter Xu
@ 2018-06-27 12:25 ` Dr. David Alan Gilbert
2018-06-27 13:03 ` Peter Xu
0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-27 12:25 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juan Quintela
* Peter Xu (peterx@redhat.com) wrote:
> RDMA does not support postcopy recovery, so no need to go into such
> logic. Instead of calling migration_fd_process_incoming(), let's call
> the two functions that setup the incoming migration. There should have
> no functional change at all.
Can you explain why we need to though? The reason I ask is that there
is Lidong Chen's work that gets postcopy partially working with RDMA, so
then the next question is bound to be recovery.
Dave
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/rdma.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 8bd7159059..217cdd7dd8 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3687,7 +3687,8 @@ static void rdma_accept_incoming_migration(void *opaque)
> }
>
> rdma->migration_started_on_destination = 1;
> - migration_fd_process_incoming(f);
> + migration_incoming_setup(f);
> + migration_incoming_process();
> }
>
> void rdma_start_incoming_migration(const char *host_port, Error **errp)
> --
> 2.17.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma
2018-06-27 12:25 ` Dr. David Alan Gilbert
@ 2018-06-27 13:03 ` Peter Xu
2018-06-27 14:13 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2018-06-27 13:03 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela
On Wed, Jun 27, 2018 at 01:25:45PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > RDMA does not support postcopy recovery, so no need to go into such
> > logic. Instead of calling migration_fd_process_incoming(), let's call
> > the two functions that setup the incoming migration. There should have
> > no functional change at all.
>
> Can you explain why we need to though? The reason I ask is that there
> is Lidong Chen's work that gets postcopy partially working with RDMA, so
> then the next question is bound to be recovery.
Ah if so this patch needs to change. Could you paste me the message
id of the work? Or link?
After all I'll need to keep this bit, but I am just curious about what
is "partially" mean here. :)
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma
2018-06-27 13:03 ` Peter Xu
@ 2018-06-27 14:13 ` Dr. David Alan Gilbert
2018-06-29 6:58 ` Peter Xu
0 siblings, 1 reply; 10+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-27 14:13 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juan Quintela
* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jun 27, 2018 at 01:25:45PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > RDMA does not support postcopy recovery, so no need to go into such
> > > logic. Instead of calling migration_fd_process_incoming(), let's call
> > > the two functions that setup the incoming migration. There should have
> > > no functional change at all.
> >
> > Can you explain why we need to though? The reason I ask is that there
> > is Lidong Chen's work that gets postcopy partially working with RDMA, so
> > then the next question is bound to be recovery.
>
> Ah if so this patch needs to change. Could you paste me the message
> id of the work? Or link?
>
> After all I'll need to keep this bit, but I am just curious about what
> is "partially" mean here. :)
https://patchwork.ozlabs.org/cover/925525/
It enables postcopy with RDMA, but once it switches into postcopy mode
it doesn't use the fast features of RDMA, it still shuffles the data
as a stream (using some more basic RDMA features).
Dave
> Thanks,
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma
2018-06-27 14:13 ` Dr. David Alan Gilbert
@ 2018-06-29 6:58 ` Peter Xu
0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2018-06-29 6:58 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel, Juan Quintela
On Wed, Jun 27, 2018 at 03:13:04PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Wed, Jun 27, 2018 at 01:25:45PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > RDMA does not support postcopy recovery, so no need to go into such
> > > > logic. Instead of calling migration_fd_process_incoming(), let's call
> > > > the two functions that setup the incoming migration. There should have
> > > > no functional change at all.
> > >
> > > Can you explain why we need to though? The reason I ask is that there
> > > is Lidong Chen's work that gets postcopy partially working with RDMA, so
> > > then the next question is bound to be recovery.
> >
> > Ah if so this patch needs to change. Could you paste me the message
> > id of the work? Or link?
> >
> > After all I'll need to keep this bit, but I am just curious about what
> > is "partially" mean here. :)
>
> https://patchwork.ozlabs.org/cover/925525/
>
> It enables postcopy with RDMA, but once it switches into postcopy mode
> it doesn't use the fast features of RDMA, it still shuffles the data
> as a stream (using some more basic RDMA features).
Ah so it's not really partial but relatively complete, though it seems
that there can be further performance work on top. That's an
interesting work, thanks for the pointer.
--
Peter Xu
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-06-29 6:58 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-27 10:51 [Qemu-devel] [PATCH v2 0/5] migation: unbreak postcopy recovery Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 1/5] migration: delay postcopy paused state Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 2/5] migration: move income process out of multifd Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 3/5] migration: do explicit incoming setup for rdma Peter Xu
2018-06-27 12:25 ` Dr. David Alan Gilbert
2018-06-27 13:03 ` Peter Xu
2018-06-27 14:13 ` Dr. David Alan Gilbert
2018-06-29 6:58 ` Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 4/5] migration: unbreak postcopy recovery Peter Xu
2018-06-27 10:51 ` [Qemu-devel] [PATCH v2 5/5] migration: unify incoming processing Peter Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).