* [PATCH 01/17] migration: Fix logic of channels and transport compatibility check
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-26 3:09 ` Peter Xu
2024-01-25 16:25 ` [PATCH 02/17] migration: Move local_err check in migration_ioc_process_incoming() Avihai Horon
` (16 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
The commit in the fixes line mistakenly modified the channels and
transport compatibility check logic so it now checks multi-channel
support only for socket transport type.
Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
(qemu) migrate_set_capability multifd on
(qemu) migrate -d "exec:cat > /tmp/vm_state"
Segmentation fault (core dumped)
Fix it by checking multi-channel compatibility for all transport types.
Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 219447dea1..6fc544711a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -128,11 +128,17 @@ static bool migration_needs_multiple_sockets(void)
return migrate_multifd() || migrate_postcopy_preempt();
}
-static bool transport_supports_multi_channels(SocketAddress *saddr)
+static bool transport_supports_multi_channels(MigrationAddress *addr)
{
- return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
- saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
- saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
+ if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
+ SocketAddress *saddr = &addr->u.socket;
+
+ return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
+ saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
+ saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
+ }
+
+ return false;
}
static bool
@@ -140,8 +146,7 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
Error **errp)
{
if (migration_needs_multiple_sockets() &&
- (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) &&
- !transport_supports_multi_channels(&addr->u.socket)) {
+ !transport_supports_multi_channels(addr)) {
error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
return false;
}
--
2.26.3
* Re: [PATCH 01/17] migration: Fix logic of channels and transport compatibility check
2024-01-25 16:25 ` [PATCH 01/17] migration: Fix logic of channels and transport compatibility check Avihai Horon
@ 2024-01-26 3:09 ` Peter Xu
0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2024-01-26 3:09 UTC (permalink / raw)
To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas
On Thu, Jan 25, 2024 at 06:25:12PM +0200, Avihai Horon wrote:
> The commit in the fixes line mistakenly modified the channels and
> transport compatibility check logic so it now checks multi-channel
> support only for socket transport type.
>
> Thus, running multifd migration using a transport other than socket that
> is incompatible with multi-channels (such as "exec") would lead to a
> segmentation fault instead of an error message.
> For example:
> (qemu) migrate_set_capability multifd on
> (qemu) migrate -d "exec:cat > /tmp/vm_state"
> Segmentation fault (core dumped)
>
> Fix it by checking multi-channel compatibility for all transport types.
>
> Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* [PATCH 02/17] migration: Move local_err check in migration_ioc_process_incoming()
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
2024-01-25 16:25 ` [PATCH 01/17] migration: Fix logic of channels and transport compatibility check Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-26 3:10 ` Peter Xu
2024-01-25 16:25 ` [PATCH 03/17] migration: Rename default_channel to main_channel Avihai Horon
` (15 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
local_err in migration_ioc_process_incoming() is used only in the
multifd if branch. Move the local_err check under the multifd branch,
where it is actually used.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6fc544711a..ccd497ca21 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -863,15 +863,15 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
assert(migration_needs_multiple_sockets());
if (migrate_multifd()) {
multifd_recv_new_channel(ioc, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ return;
+ }
} else {
assert(migrate_postcopy_preempt());
f = qemu_file_new_input(ioc);
postcopy_preempt_new_channel(mis, f);
}
- if (local_err) {
- error_propagate(errp, local_err);
- return;
- }
}
if (migration_should_start_incoming(default_channel)) {
--
2.26.3
* [PATCH 03/17] migration: Rename default_channel to main_channel
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
2024-01-25 16:25 ` [PATCH 01/17] migration: Fix logic of channels and transport compatibility check Avihai Horon
2024-01-25 16:25 ` [PATCH 02/17] migration: Move local_err check in migration_ioc_process_incoming() Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-26 3:11 ` Peter Xu
2024-01-25 16:25 ` [PATCH 04/17] migration/multifd: Set p->running = true in the right place Avihai Horon
` (14 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
migration_ioc_process_incoming() uses the term "default_channel" to
describe the main migration channel. Rename it to the more commonly used
and informative term "main_channel".
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index ccd497ca21..9c769a1ecd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -823,7 +823,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
MigrationIncomingState *mis = migration_incoming_get_current();
Error *local_err = NULL;
QEMUFile *f;
- bool default_channel = true;
+ bool main_channel = true;
uint32_t channel_magic = 0;
int ret = 0;
@@ -846,16 +846,16 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
return;
}
- default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+ main_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
} else {
- default_channel = !mis->from_src_file;
+ main_channel = !mis->from_src_file;
}
if (multifd_load_setup(errp) != 0) {
return;
}
- if (default_channel) {
+ if (main_channel) {
f = qemu_file_new_input(ioc);
migration_incoming_setup(f);
} else {
@@ -874,7 +874,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
}
}
- if (migration_should_start_incoming(default_channel)) {
+ if (migration_should_start_incoming(main_channel)) {
/* If it's a recovery, we're done */
if (postcopy_try_recover()) {
return;
--
2.26.3
* [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (2 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 03/17] migration: Rename default_channel to main_channel Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 20:57 ` Fabiano Rosas
2024-01-25 16:25 ` [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding Avihai Horon
` (13 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
The commit in the fixes line moved multifd thread creation to a
different location, but forgot to move the p->running = true assignment
as well. Thus, p->running is set to true before multifd thread is
actually created.
p->running is used in multifd_save_cleanup() to decide whether to join
the multifd thread or not.
With TLS, an error in multifd_tls_channel_connect() can lead to a
segmentation fault because p->running is true but p->thread is never
initialized, so multifd_save_cleanup() tries to join an uninitialized
thread.
Fix it by moving p->running = true assignment right after multifd thread
creation. Also move qio_channel_set_delay() to there, as this is where
it used to be originally.
Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/multifd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index 25cbc6dc6b..564e911b6c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -850,11 +850,13 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
return multifd_tls_channel_connect(p, ioc, errp);
}
+ qio_channel_set_delay(ioc, false);
migration_ioc_register_yank(ioc);
p->registered_yank = true;
p->c = ioc;
qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
QEMU_THREAD_JOINABLE);
+ p->running = true;
return true;
}
@@ -883,8 +885,6 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
trace_multifd_new_send_channel_async(p->id);
if (!qio_task_propagate_error(task, &local_err)) {
- qio_channel_set_delay(ioc, false);
- p->running = true;
if (multifd_channel_connect(p, ioc, &local_err)) {
return;
}
--
2.26.3
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-25 16:25 ` [PATCH 04/17] migration/multifd: Set p->running = true in the right place Avihai Horon
@ 2024-01-25 20:57 ` Fabiano Rosas
2024-01-28 15:43 ` Avihai Horon
0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2024-01-25 20:57 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Avihai Horon
Avihai Horon <avihaih@nvidia.com> writes:
> The commit in the fixes line moved multifd thread creation to a
> different location, but forgot to move the p->running = true assignment
> as well. Thus, p->running is set to true before multifd thread is
> actually created.
>
> p->running is used in multifd_save_cleanup() to decide whether to join
> the multifd thread or not.
>
> With TLS, an error in multifd_tls_channel_connect() can lead to a
> segmentation fault because p->running is true but p->thread is never
> initialized, so multifd_save_cleanup() tries to join an uninitialized
> thread.
>
> Fix it by moving p->running = true assignment right after multifd thread
> creation. Also move qio_channel_set_delay() to there, as this is where
> it used to be originally.
>
> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Just for context, I haven't looked at this patch yet, but we were
planning to remove p->running altogether:
https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-25 20:57 ` Fabiano Rosas
@ 2024-01-28 15:43 ` Avihai Horon
2024-01-29 4:17 ` Peter Xu
0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-28 15:43 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel; +Cc: Peter Xu
On 25/01/2024 22:57, Fabiano Rosas wrote:
> External email: Use caution opening links or attachments
>
>
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> The commit in the fixes line moved multifd thread creation to a
>> different location, but forgot to move the p->running = true assignment
>> as well. Thus, p->running is set to true before multifd thread is
>> actually created.
>>
>> p->running is used in multifd_save_cleanup() to decide whether to join
>> the multifd thread or not.
>>
>> With TLS, an error in multifd_tls_channel_connect() can lead to a
>> segmentation fault because p->running is true but p->thread is never
>> initialized, so multifd_save_cleanup() tries to join an uninitialized
>> thread.
>>
>> Fix it by moving p->running = true assignment right after multifd thread
>> creation. Also move qio_channel_set_delay() to there, as this is where
>> it used to be originally.
>>
>> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Just for context, I haven't looked at this patch yet, but we were
> planning to remove p->running altogether:
>
> https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
Thanks for putting me in the picture.
I see that there has been a discussion about the multifd
creation/teardown flow.
In light of this discussion, I can already see a few problems in my
series that I didn't notice before (such as the TLS handshake thread leak).
The thread you mentioned here and some of my patches point out some
problems in multifd creation/treardown. I guess we can discuss it and
see what's the best way to solve them.
Regarding this patch, your solution indeed solves the bug that this
patch addresses, so maybe this could be dropped (or only noted in your
patch).
Maybe I should also put you (and Peter) in context for this whole series
-- I am writing it as preparation for adding a separate migration
channel for VFIO device migration, so VFIO devices could be migrated in
parallel.
So this series tries to lay down some foundations to facilitate it.
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-28 15:43 ` Avihai Horon
@ 2024-01-29 4:17 ` Peter Xu
2024-01-29 12:20 ` Avihai Horon
2024-01-29 12:23 ` Fabiano Rosas
0 siblings, 2 replies; 37+ messages in thread
From: Peter Xu @ 2024-01-29 4:17 UTC (permalink / raw)
To: Avihai Horon; +Cc: Fabiano Rosas, qemu-devel
On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>
> On 25/01/2024 22:57, Fabiano Rosas wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > Avihai Horon <avihaih@nvidia.com> writes:
> >
> > > The commit in the fixes line moved multifd thread creation to a
> > > different location, but forgot to move the p->running = true assignment
> > > as well. Thus, p->running is set to true before multifd thread is
> > > actually created.
> > >
> > > p->running is used in multifd_save_cleanup() to decide whether to join
> > > the multifd thread or not.
> > >
> > > With TLS, an error in multifd_tls_channel_connect() can lead to a
> > > segmentation fault because p->running is true but p->thread is never
> > > initialized, so multifd_save_cleanup() tries to join an uninitialized
> > > thread.
> > >
> > > Fix it by moving p->running = true assignment right after multifd thread
> > > creation. Also move qio_channel_set_delay() to there, as this is where
> > > it used to be originally.
> > >
> > > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > Just for context, I haven't looked at this patch yet, but we were
> > planning to remove p->running altogether:
> >
> > https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>
> Thanks for putting me in the picture.
> I see that there has been a discussion about the multifd creation/treadown
> flow.
> In light of this discussion, I can already see a few problems in my series
> that I didn't notice before (such as the TLS handshake thread leak).
> The thread you mentioned here and some of my patches point out some problems
> in multifd creation/treardown. I guess we can discuss it and see what's the
> best way to solve them.
>
> Regarding this patch, your solution indeed solves the bug that this patch
> addresses, so maybe this could be dropped (or only noted in your patch).
>
> Maybe I should also put you (and Peter) in context for this whole series --
> I am writing it as preparation for adding a separate migration channel for
> VFIO device migration, so VFIO devices could be migrated in parallel.
> So this series tries to lay down some foundations to facilitate it.
Avihai, is the throughput the only reason that VFIO would like to have a
separate channel?
I'm wondering if we can also use multifd threads to send vfio data at some
point. Now multifd indeed is closely bound to ram pages but maybe it'll
change in the near future to take any load?
Multifd is for solving the throughput issue already. If vfio has the same
goal, IMHO it'll be good to keep them using the same thread model, instead
of managing different threads in different places. With that, any user
setting (for example, multifd-n-threads) will naturally apply to all
components, rather than relying on yet-another vfio-migration-threads-num
parameter.
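For the sake of illustration, a very rough sketch of what letting multifd
carry arbitrary payloads could look like (hypothetical types and names,
not the current multifd code):

    #include <stddef.h>
    #include <stdbool.h>

    /*
     * Hypothetical: a producer of migration data (RAM, a VFIO device, ...)
     * registers callbacks, and the existing multifd send threads pull
     * packets from whichever producers have data pending.
     */
    typedef struct {
        const char *name;                            /* e.g. "ram", "vfio" */
        /* Fill up to 'len' bytes into 'buf'; return bytes written, 0 if idle. */
        size_t (*fill_packet)(void *opaque, void *buf, size_t len);
        /* Apply one received packet on the destination side. */
        bool (*load_packet)(void *opaque, const void *buf, size_t len);
    } MultiFDPayloadOps;

    void multifd_register_payload(const MultiFDPayloadOps *ops, void *opaque);

With something like this, the number of threads would stay a single multifd
setting shared by all producers.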
--
Peter Xu
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-29 4:17 ` Peter Xu
@ 2024-01-29 12:20 ` Avihai Horon
2024-01-30 5:57 ` Peter Xu
2024-01-29 12:23 ` Fabiano Rosas
1 sibling, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-29 12:20 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel
On 29/01/2024 6:17, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>> On 25/01/2024 22:57, Fabiano Rosas wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Avihai Horon <avihaih@nvidia.com> writes:
>>>
>>>> The commit in the fixes line moved multifd thread creation to a
>>>> different location, but forgot to move the p->running = true assignment
>>>> as well. Thus, p->running is set to true before multifd thread is
>>>> actually created.
>>>>
>>>> p->running is used in multifd_save_cleanup() to decide whether to join
>>>> the multifd thread or not.
>>>>
>>>> With TLS, an error in multifd_tls_channel_connect() can lead to a
>>>> segmentation fault because p->running is true but p->thread is never
>>>> initialized, so multifd_save_cleanup() tries to join an uninitialized
>>>> thread.
>>>>
>>>> Fix it by moving p->running = true assignment right after multifd thread
>>>> creation. Also move qio_channel_set_delay() to there, as this is where
>>>> it used to be originally.
>>>>
>>>> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> Just for context, I haven't looked at this patch yet, but we were
>>> planning to remove p->running altogether:
>>>
>>> https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>> Thanks for putting me in the picture.
>> I see that there has been a discussion about the multifd creation/treadown
>> flow.
>> In light of this discussion, I can already see a few problems in my series
>> that I didn't notice before (such as the TLS handshake thread leak).
>> The thread you mentioned here and some of my patches point out some problems
>> in multifd creation/treardown. I guess we can discuss it and see what's the
>> best way to solve them.
>>
>> Regarding this patch, your solution indeed solves the bug that this patch
>> addresses, so maybe this could be dropped (or only noted in your patch).
>>
>> Maybe I should also put you (and Peter) in context for this whole series --
>> I am writing it as preparation for adding a separate migration channel for
>> VFIO device migration, so VFIO devices could be migrated in parallel.
>> So this series tries to lay down some foundations to facilitate it.
> Avihai, is the throughput the only reason that VFIO would like to have a
> separate channel?
Actually, the main reason is to be able to send and load multiple VFIO
devices' data in parallel.
For example, today if we have three VFIO devices, they are migrated
sequentially one after another.
This particularly hurts during the complete pre-copy phase (downtime),
as loading the VFIO data at the destination involves FW interaction and
resource allocation, which takes time and simply blocks the other
devices from sending and loading their data.
Providing a separate channel and thread for each VFIO device solves this
problem and ideally reduces the VFIO contribution to downtime from
sum{VFIO device #1, ..., VFIO device #N} to max{VFIO device #1, ...,
VFIO device #N}.
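As a rough worked example of that sum-vs-max point (made-up numbers, purely
illustrative): with three devices whose stop-copy load times are 100 ms,
150 ms and 300 ms,

    sequential: 100 ms + 150 ms + 300 ms = 550 ms of VFIO downtime contribution
    parallel:   max(100 ms, 150 ms, 300 ms) = 300 ms (plus channel overhead)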
>
> I'm wondering if we can also use multifd threads to send vfio data at some
> point. Now multifd indeed is closely bound to ram pages but maybe it'll
> change in the near future to take any load?
>
> Multifd is for solving the throughput issue already. If vfio has the same
> goal, IMHO it'll be good to keep them using the same thread model, instead
> of managing different threads in different places. With that, any user
> setting (for example, multifd-n-threads) will naturally apply to all
> components, rather than relying on yet-another vfio-migration-threads-num
> parameter.
Frankly, I didn't really pay much attention to the throughput factor,
and my plan is to introduce only a single thread per device.
VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
mlx5 VFs can have a few GBs of data.
So what you are saying here is interesting, although I haven't tested such
a scenario to see the actual benefit.
I am trying to think if/how this could work and I have a few concerns:
1. RAM is made of fixed-positioned pages that can be randomly
read/written, so sending these pages over multiple channels and loading
them in the destination can work pretty naturally without much overhead.
VFIO device data, on the other hand, is just an opaque stream of
bytes from QEMU's point of view. This means that if we break this data
into "packets" and send them over multiple channels, we must preserve the
order in which the data was originally read from the device and write it
in the same order to the destination device (see the rough sketch at the
end of this mail).
I am wondering if the overhead of maintaining such order may hurt
performance, making it not worthwhile.
2. As I mentioned above, the main motivation for separate VFIO migration
channel/thread, as I see today, is to allow parallel migration of VFIO
devices.
AFAIU multifd, as it is today, doesn't provide such parallelism
(i.e., in the complete pre-copy phase each device, be it RAM or VFIO,
will fully send its data over the multifd threads and only after
finishing will the next device send its data).
This is just what came to mind. Maybe we can think this through more thoroughly
and see if it could work somehow, now or in the future.
However, I think making the multifd threads generic so they can send any
kind of data is a good thing in general, regardless of VFIO.
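To make concern 1 above concrete, here is a minimal sketch of what ordered
chunking of an opaque device-data stream could look like (hypothetical
structure, not an existing QEMU or VFIO interface):

    #include <stdint.h>

    /* Hypothetical wire header for one chunk of opaque VFIO device data. */
    typedef struct {
        uint32_t device_id;  /* which VFIO device this chunk belongs to */
        uint64_t seq;        /* position of the chunk in that device's stream */
        uint32_t len;        /* payload length in bytes; data follows the header */
    } VFIODataChunkHdr;

    /*
     * The destination would have to buffer out-of-order chunks and only
     * write a chunk to the device's migration fd after all chunks with a
     * smaller seq have been written -- that reordering is the overhead
     * in question.
     */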
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-29 12:20 ` Avihai Horon
@ 2024-01-30 5:57 ` Peter Xu
2024-01-30 18:44 ` Avihai Horon
0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2024-01-30 5:57 UTC (permalink / raw)
To: Avihai Horon; +Cc: Fabiano Rosas, qemu-devel
On Mon, Jan 29, 2024 at 02:20:35PM +0200, Avihai Horon wrote:
>
> On 29/01/2024 6:17, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
> > > On 25/01/2024 22:57, Fabiano Rosas wrote:
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > Avihai Horon <avihaih@nvidia.com> writes:
> > > >
> > > > > The commit in the fixes line moved multifd thread creation to a
> > > > > different location, but forgot to move the p->running = true assignment
> > > > > as well. Thus, p->running is set to true before multifd thread is
> > > > > actually created.
> > > > >
> > > > > p->running is used in multifd_save_cleanup() to decide whether to join
> > > > > the multifd thread or not.
> > > > >
> > > > > With TLS, an error in multifd_tls_channel_connect() can lead to a
> > > > > segmentation fault because p->running is true but p->thread is never
> > > > > initialized, so multifd_save_cleanup() tries to join an uninitialized
> > > > > thread.
> > > > >
> > > > > Fix it by moving p->running = true assignment right after multifd thread
> > > > > creation. Also move qio_channel_set_delay() to there, as this is where
> > > > > it used to be originally.
> > > > >
> > > > > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
> > > > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > > > Just for context, I haven't looked at this patch yet, but we were
> > > > planning to remove p->running altogether:
> > > >
> > > > https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
> > > Thanks for putting me in the picture.
> > > I see that there has been a discussion about the multifd creation/treadown
> > > flow.
> > > In light of this discussion, I can already see a few problems in my series
> > > that I didn't notice before (such as the TLS handshake thread leak).
> > > The thread you mentioned here and some of my patches point out some problems
> > > in multifd creation/treardown. I guess we can discuss it and see what's the
> > > best way to solve them.
> > >
> > > Regarding this patch, your solution indeed solves the bug that this patch
> > > addresses, so maybe this could be dropped (or only noted in your patch).
> > >
> > > Maybe I should also put you (and Peter) in context for this whole series --
> > > I am writing it as preparation for adding a separate migration channel for
> > > VFIO device migration, so VFIO devices could be migrated in parallel.
> > > So this series tries to lay down some foundations to facilitate it.
> > Avihai, is the throughput the only reason that VFIO would like to have a
> > separate channel?
>
> Actually, the main reason is to be able to send and load multiple VFIO
> devices data in parallel.
> For example, today if we have three VFIO devices, they are migrated
> sequentially one after another.
> This particularly hurts during the complete pre-copy phase (downtime), as
> loading the VFIO data in destination involves FW interaction and resource
> allocation, which takes time and simply blocks the other devices from
> sending and loading their data.
> Providing a separate channel and thread for each VIFO device solves this
> problem and ideally reduces the VFIO contribution to downtime from sum{VFIO
> device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
I see.
>
> >
> > I'm wondering if we can also use multifd threads to send vfio data at some
> > point. Now multifd indeed is closely bound to ram pages but maybe it'll
> > change in the near future to take any load?
> >
> > Multifd is for solving the throughput issue already. If vfio has the same
> > goal, IMHO it'll be good to keep them using the same thread model, instead
> > of managing different threads in different places. With that, any user
> > setting (for example, multifd-n-threads) will naturally apply to all
> > components, rather than relying on yet-another vfio-migration-threads-num
> > parameter.
>
> Frankly, I didn't really put much attention to the throughput factor, and my
> plan is to introduce only a single thread per device.
> VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
> mlx5 VFs can have a few GBs of data.
> So what you are saying here is interesting, although I didn't test such
> scenario to see the actual benefit.
>
> I am trying to think if/how this could work and I have a few concerns:
> 1. RAM is made of fixed-positioned pages that can be randomly read/written,
> so sending these pages over multiple channels and loading them in the
> destination can work pretty naturally without much overhead.
> VFIO device data, on the other hand, is just an opaque stream of bytes
> from QEMU point of view. This means that if we break this data to "packets"
> and send them over multiple channels, we must preserve the order by which
> this data was
> originally read from the device and write the data in the same order to
> the destination device.
> I am wondering if the overhead of maintaining such order may hurt
> performance, making it not worthwhile.
Indeed, it seems to me VFIO migration is based on a streaming model where
there's no easy way to index a chunk of data.
Is there any background on how that kernel interface was designed? It
seems pretty unfriendly to concurrency already: even if multiple devices
can migrate concurrently, each single device can already contain GBs of
data as you said, which is pretty common here. I'm a bit surprised to see
that the kernel interface is designed in this way for such a device.
I was wondering the possibility of whether VFIO can provide data chunks
with indexes just like RAM (which is represented in ramblock offsets).
>
> 2. As I mentioned above, the main motivation for separate VFIO migration
> channel/thread, as I see today, is to allow parallel migration of VFIO
> devices.
> AFAIU multifd, as it is today, doesn't provide such parallelism (i.e., in
> the complete pre-copy phase each device, be it RAM or VFIO, will fully send
> its data over the multifd threads and only after finishing will the next
> device send its data).
Indeed. That's actually an issue not only for VFIO but for migration in
general: we can't migrate device states concurrently, and multifd is out of
the picture here so far, but maybe there's a chance.
Consider huge VMs where there can already be ~500 vCPUs, each needing its
own get()/put() of CPU state from/to KVM. It'll be nice if we can do this
in concurrent threads too. Here VFIO is one of the devices that will also
benefit from such a design, and greatly.
I added a todo in our wiki for this, which I see as a general improvement,
and hopefully someone can look into this:
https://wiki.qemu.org/ToDo/LiveMigration#Device_state_concurrency
I hope VFIO can consider resolving this as a generic issue, rather than
providing its own solution.
>
> This is just what came to mind. Maybe we can think this more thoroughly and
> see if it could work somehow, now or in the future.
> However, I think making the multifd threads generic so they can send any
> kind of data is a good thing in general, regardless of VFIO.
Right.
In general, having a separate VFIO channel may solve the immediate issue,
but it may still not solve everything; meanwhile, it may introduce the first
example of using a completely separate channel that migration won't easily
manage itself, which IMHO can make migration much harder to maintain in
the future.
It may also, in the future, become technical debt that VFIO will need to
take on even if such a solution is merged, because VFIO could have its own
model of handling a few similar problems that migration has.
I hope there's some way out where we can work together to improve the
framework, providing a clean approach and considering the long term.
Thanks,
--
Peter Xu
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-30 5:57 ` Peter Xu
@ 2024-01-30 18:44 ` Avihai Horon
2024-02-06 10:25 ` Peter Xu
0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-30 18:44 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel
On 30/01/2024 7:57, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Mon, Jan 29, 2024 at 02:20:35PM +0200, Avihai Horon wrote:
>> On 29/01/2024 6:17, Peter Xu wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>>>> On 25/01/2024 22:57, Fabiano Rosas wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> Avihai Horon <avihaih@nvidia.com> writes:
>>>>>
>>>>>> The commit in the fixes line moved multifd thread creation to a
>>>>>> different location, but forgot to move the p->running = true assignment
>>>>>> as well. Thus, p->running is set to true before multifd thread is
>>>>>> actually created.
>>>>>>
>>>>>> p->running is used in multifd_save_cleanup() to decide whether to join
>>>>>> the multifd thread or not.
>>>>>>
>>>>>> With TLS, an error in multifd_tls_channel_connect() can lead to a
>>>>>> segmentation fault because p->running is true but p->thread is never
>>>>>> initialized, so multifd_save_cleanup() tries to join an uninitialized
>>>>>> thread.
>>>>>>
>>>>>> Fix it by moving p->running = true assignment right after multifd thread
>>>>>> creation. Also move qio_channel_set_delay() to there, as this is where
>>>>>> it used to be originally.
>>>>>>
>>>>>> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
>>>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>>> Just for context, I haven't looked at this patch yet, but we were
>>>>> planning to remove p->running altogether:
>>>>>
>>>>> https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>>>> Thanks for putting me in the picture.
>>>> I see that there has been a discussion about the multifd creation/treadown
>>>> flow.
>>>> In light of this discussion, I can already see a few problems in my series
>>>> that I didn't notice before (such as the TLS handshake thread leak).
>>>> The thread you mentioned here and some of my patches point out some problems
>>>> in multifd creation/treardown. I guess we can discuss it and see what's the
>>>> best way to solve them.
>>>>
>>>> Regarding this patch, your solution indeed solves the bug that this patch
>>>> addresses, so maybe this could be dropped (or only noted in your patch).
>>>>
>>>> Maybe I should also put you (and Peter) in context for this whole series --
>>>> I am writing it as preparation for adding a separate migration channel for
>>>> VFIO device migration, so VFIO devices could be migrated in parallel.
>>>> So this series tries to lay down some foundations to facilitate it.
>>> Avihai, is the throughput the only reason that VFIO would like to have a
>>> separate channel?
>> Actually, the main reason is to be able to send and load multiple VFIO
>> devices data in parallel.
>> For example, today if we have three VFIO devices, they are migrated
>> sequentially one after another.
>> This particularly hurts during the complete pre-copy phase (downtime), as
>> loading the VFIO data in destination involves FW interaction and resource
>> allocation, which takes time and simply blocks the other devices from
>> sending and loading their data.
>> Providing a separate channel and thread for each VIFO device solves this
>> problem and ideally reduces the VFIO contribution to downtime from sum{VFIO
>> device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
> I see.
>
>>> I'm wondering if we can also use multifd threads to send vfio data at some
>>> point. Now multifd indeed is closely bound to ram pages but maybe it'll
>>> change in the near future to take any load?
>>>
>>> Multifd is for solving the throughput issue already. If vfio has the same
>>> goal, IMHO it'll be good to keep them using the same thread model, instead
>>> of managing different threads in different places. With that, any user
>>> setting (for example, multifd-n-threads) will naturally apply to all
>>> components, rather than relying on yet-another vfio-migration-threads-num
>>> parameter.
>> Frankly, I didn't really put much attention to the throughput factor, and my
>> plan is to introduce only a single thread per device.
>> VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
>> mlx5 VFs can have a few GBs of data.
>> So what you are saying here is interesting, although I didn't test such
>> scenario to see the actual benefit.
>>
>> I am trying to think if/how this could work and I have a few concerns:
>> 1. RAM is made of fixed-positioned pages that can be randomly read/written,
>> so sending these pages over multiple channels and loading them in the
>> destination can work pretty naturally without much overhead.
>> VFIO device data, on the other hand, is just an opaque stream of bytes
>> from QEMU point of view. This means that if we break this data to "packets"
>> and send them over multiple channels, we must preserve the order by which
>> this data was
>> originally read from the device and write the data in the same order to
>> the destination device.
>> I am wondering if the overhead of maintaining such order may hurt
>> performance, making it not worthwhile.
> Indeed, it seems to me VFIO migration is based on a streaming model where
> there's no easy way to index a chunk of data.
Yes, you can see it here as well:
https://elixir.bootlin.com/linux/v6.8-rc2/source/include/uapi/linux/vfio.h#L1039
>
> Is there any background on how that kernel interface was designed? It
> seems pretty unfriendly to concurrency already: even if multiple devices
> can migrate concurrently, each single device can already contain GBs of
> data as you said, which is pretty common here. I'm a bit surprised to see
> that the kernel interface is designed in this way for such a device.
Not that I know of. There was a pretty big discussion about the
uAPI back when it was introduced, but not something formal.
However, I don't think having a few GBs of device data is the common
case for VFIO devices; I just pointed out the extreme cases.
> I was wondering the possibility of whether VFIO can provide data chunks
> with indexes just like RAM (which is represented in ramblock offsets).
I am not sure this would be feasible, as it would involve major changes
to the uAPI and to the devices.
But if that's something you wish to explore, I can ask around and see if
it's a hard no-go.
>> 2. As I mentioned above, the main motivation for separate VFIO migration
>> channel/thread, as I see today, is to allow parallel migration of VFIO
>> devices.
>> AFAIU multifd, as it is today, doesn't provide such parallelism (i.e., in
>> the complete pre-copy phase each device, be it RAM or VFIO, will fully send
>> its data over the multifd threads and only after finishing will the next
>> device send its data).
> Indeed. That's actually an issue not only to VFIO but also to migration in
> general: we can't migrate device states concurrently, and multifd is out of
> the picture here so far, but maybe there's chance.
>
> Consider huge VMs where there can be already ~500 vCPUs, each need their
> own get()/put() of CPU states from/to KVM. It'll be nice if we can do this
> in concurrent threads too. Here VFIO is one of the devices that will also
> benefit from such a design, and greatly.
Interesting, do you know how much time migrating these vCPUs takes?
> I added a todo in our wiki for this, which I see it a general improvement,
> and hopefully someone can look into this:
>
> https://wiki.qemu.org/ToDo/LiveMigration#Device_state_concurrency
>
> I hope VFIO can consider resolving this as a generic issue, rather than
> providing its own solution.
>
>> This is just what came to mind. Maybe we can think this more thoroughly and
>> see if it could work somehow, now or in the future.
>> However, I think making the multifd threads generic so they can send any
>> kind of data is a good thing in general, regardless of VFIO.
> Right.
>
> In general, having a VFIO separate channel may solve the immediate issue,
> but it may still won't solve all, meanwhile it may introduce the first
> example of using completely separate channel that migration won't easily
> manage itself, which IMHO can cause migration much harder to maintain in
> the future.
>
> It may also in the future become some technical debt that VFIO will need to
> take even if such a solution merged, because VFIO could have its own model
> of handling a few similar problems that migration has.
>
> I hope there's some way out that we can work together to improve the
> framework, providing a clean approach and consider for the long terms.
I understand your concern, and I hope as well we can work together
towards a proper and maintainable solution.
Even if we don't get VFIO to use the multifd framework directly, maybe
adding common APIs would be good enough.
For example, take this series, which adds a common API to create migration
channels.
I also saw you and Fabiano have been talking about a thread-pool model
to replace the multifd threads. Maybe we could use such a scheme also for
VFIO and even for the vCPU case you mentioned above, with each component
stating how many threads it needs and one big pool created for all
migration clients.
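To sketch what such a shared pool could look like in terms of API
(hypothetical names, just to illustrate the idea):

    /*
     * Hypothetical: clients (RAM, each VFIO device, vCPU state, ...) reserve
     * threads and submit save/load jobs to one pool owned by migration.
     */
    typedef void (*MigJobFn)(void *opaque);

    void migration_thread_pool_reserve(unsigned int nthreads);
    void migration_thread_pool_submit(MigJobFn fn, void *opaque);
    void migration_thread_pool_wait(void);   /* wait for all submitted jobs */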
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-30 18:44 ` Avihai Horon
@ 2024-02-06 10:25 ` Peter Xu
2024-02-08 15:31 ` Avihai Horon
0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2024-02-06 10:25 UTC (permalink / raw)
To: Avihai Horon; +Cc: Fabiano Rosas, qemu-devel
On Tue, Jan 30, 2024 at 08:44:19PM +0200, Avihai Horon wrote:
>
> On 30/01/2024 7:57, Peter Xu wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Mon, Jan 29, 2024 at 02:20:35PM +0200, Avihai Horon wrote:
> > > On 29/01/2024 6:17, Peter Xu wrote:
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
> > > > > On 25/01/2024 22:57, Fabiano Rosas wrote:
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > Avihai Horon <avihaih@nvidia.com> writes:
> > > > > >
> > > > > > > The commit in the fixes line moved multifd thread creation to a
> > > > > > > different location, but forgot to move the p->running = true assignment
> > > > > > > as well. Thus, p->running is set to true before multifd thread is
> > > > > > > actually created.
> > > > > > >
> > > > > > > p->running is used in multifd_save_cleanup() to decide whether to join
> > > > > > > the multifd thread or not.
> > > > > > >
> > > > > > > With TLS, an error in multifd_tls_channel_connect() can lead to a
> > > > > > > segmentation fault because p->running is true but p->thread is never
> > > > > > > initialized, so multifd_save_cleanup() tries to join an uninitialized
> > > > > > > thread.
> > > > > > >
> > > > > > > Fix it by moving p->running = true assignment right after multifd thread
> > > > > > > creation. Also move qio_channel_set_delay() to there, as this is where
> > > > > > > it used to be originally.
> > > > > > >
> > > > > > > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
> > > > > > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > > > > > Just for context, I haven't looked at this patch yet, but we were
> > > > > > planning to remove p->running altogether:
> > > > > >
> > > > > > https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
> > > > > Thanks for putting me in the picture.
> > > > > I see that there has been a discussion about the multifd creation/treadown
> > > > > flow.
> > > > > In light of this discussion, I can already see a few problems in my series
> > > > > that I didn't notice before (such as the TLS handshake thread leak).
> > > > > The thread you mentioned here and some of my patches point out some problems
> > > > > in multifd creation/treardown. I guess we can discuss it and see what's the
> > > > > best way to solve them.
> > > > >
> > > > > Regarding this patch, your solution indeed solves the bug that this patch
> > > > > addresses, so maybe this could be dropped (or only noted in your patch).
> > > > >
> > > > > Maybe I should also put you (and Peter) in context for this whole series --
> > > > > I am writing it as preparation for adding a separate migration channel for
> > > > > VFIO device migration, so VFIO devices could be migrated in parallel.
> > > > > So this series tries to lay down some foundations to facilitate it.
> > > > Avihai, is the throughput the only reason that VFIO would like to have a
> > > > separate channel?
> > > Actually, the main reason is to be able to send and load multiple VFIO
> > > devices data in parallel.
> > > For example, today if we have three VFIO devices, they are migrated
> > > sequentially one after another.
> > > This particularly hurts during the complete pre-copy phase (downtime), as
> > > loading the VFIO data in destination involves FW interaction and resource
> > > allocation, which takes time and simply blocks the other devices from
> > > sending and loading their data.
> > > Providing a separate channel and thread for each VIFO device solves this
> > > problem and ideally reduces the VFIO contribution to downtime from sum{VFIO
> > > device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
> > I see.
> >
> > > > I'm wondering if we can also use multifd threads to send vfio data at some
> > > > point. Now multifd indeed is closely bound to ram pages but maybe it'll
> > > > change in the near future to take any load?
> > > >
> > > > Multifd is for solving the throughput issue already. If vfio has the same
> > > > goal, IMHO it'll be good to keep them using the same thread model, instead
> > > > of managing different threads in different places. With that, any user
> > > > setting (for example, multifd-n-threads) will naturally apply to all
> > > > components, rather than relying on yet-another vfio-migration-threads-num
> > > > parameter.
> > > Frankly, I didn't really put much attention to the throughput factor, and my
> > > plan is to introduce only a single thread per device.
> > > VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
> > > mlx5 VFs can have a few GBs of data.
> > > So what you are saying here is interesting, although I didn't test such
> > > scenario to see the actual benefit.
> > >
> > > I am trying to think if/how this could work and I have a few concerns:
> > > 1. RAM is made of fixed-positioned pages that can be randomly read/written,
> > > so sending these pages over multiple channels and loading them in the
> > > destination can work pretty naturally without much overhead.
> > > VFIO device data, on the other hand, is just an opaque stream of bytes
> > > from QEMU point of view. This means that if we break this data to "packets"
> > > and send them over multiple channels, we must preserve the order by which
> > > this data was
> > > originally read from the device and write the data in the same order to
> > > the destination device.
> > > I am wondering if the overhead of maintaining such order may hurt
> > > performance, making it not worthwhile.
> > Indeed, it seems to me VFIO migration is based on a streaming model where
> > there's no easy way to index a chunk of data.
>
> Yes, you can see it here as well: https://elixir.bootlin.com/linux/v6.8-rc2/source/include/uapi/linux/vfio.h#L1039
>
> >
> > Is there any background on how that kernel interface was designed? It
> > seems pretty unfriendly to concurrency already: even if multiple devices
> > can migrate concurrently, each single device can already contain GBs of
> > data as you said, which is pretty common here. I'm a bit surprised to see
> > that the kernel interface is designed in this way for such a device.
>
> Not that I know of. There has been a pretty big discussion about the uAPI
> back then when it was introduced, but not something formal.
> However, I don't think having a few GBs of device data is the common case
> for VFIO devices, I just pointed out the extreme cases.
I had that impression possibly because our QE team normally tests VFIO with
vGPU, and my memory is 1Q ~= 1GB device data, where NQ ~= N GB (mostly vRAM
per my understanding).
>
> > I was wondering the possibility of whether VFIO can provide data chunks
> > with indexes just like RAM (which is represented in ramblock offsets).
>
> I am not sure this would be feasible, as it will involve major changes to
> the uAPI and for the devices.
> But if that's something you wish to explore I can ask around and see if it's
> a hard no go.
The thing is, if VFIO always relies on a single fd read() then it's impossible
to scale. Maybe there's a chance in the future to provide other interfaces so
that at least data can be concurrently read or updated, even if it cannot
directly be addressed by offset.
>
> > > 2. As I mentioned above, the main motivation for separate VFIO migration
> > > channel/thread, as I see today, is to allow parallel migration of VFIO
> > > devices.
> > > AFAIU multifd, as it is today, doesn't provide such parallelism (i.e., in
> > > the complete pre-copy phase each device, be it RAM or VFIO, will fully send
> > > its data over the multifd threads and only after finishing will the next
> > > device send its data).
> > Indeed. That's actually an issue not only to VFIO but also to migration in
> > general: we can't migrate device states concurrently, and multifd is out of
> > the picture here so far, but maybe there's chance.
> >
> > Consider huge VMs where there can be already ~500 vCPUs, each need their
> > own get()/put() of CPU states from/to KVM. It'll be nice if we can do this
> > in concurrent threads too. Here VFIO is one of the devices that will also
> > benefit from such a design, and greatly.
>
> Interesting, do you know how much time migrating these vCPUs take?
Most of the vCPUs took only a fraction of a millisecond, but some took a
long time, like 100+ ms. We still haven't looked into why some vCPUs are
special and what caused that slowness in save(); however, consider that >1
vCPUs each taking 100ms will add up and contribute to the total downtime.
That's definitely unwanted.
If we can somehow concurrently submit device states save() jobs to multiple
threads, then it could be potentially useful.
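As a rough illustration with made-up numbers: if 4 of those slow vCPUs each
take ~100 ms in save(),

    sequential: 4 * 100 ms = 400 ms added to downtime
    4 concurrent threads: ~100 ms ideally, plus threading overhead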
>
> > I added a todo in our wiki for this, which I see it a general improvement,
> > and hopefully someone can look into this:
> >
> > https://wiki.qemu.org/ToDo/LiveMigration#Device_state_concurrency
> >
> > I hope VFIO can consider resolving this as a generic issue, rather than
> > providing its own solution.
> >
> > > This is just what came to mind. Maybe we can think this more thoroughly and
> > > see if it could work somehow, now or in the future.
> > > However, I think making the multifd threads generic so they can send any
> > > kind of data is a good thing in general, regardless of VFIO.
> > Right.
> >
> > In general, having a VFIO separate channel may solve the immediate issue,
> > but it may still won't solve all, meanwhile it may introduce the first
> > example of using completely separate channel that migration won't easily
> > manage itself, which IMHO can cause migration much harder to maintain in
> > the future.
> >
> > It may also in the future become some technical debt that VFIO will need to
> > take even if such a solution merged, because VFIO could have its own model
> > of handling a few similar problems that migration has.
> >
> > I hope there's some way out that we can work together to improve the
> > framework, providing a clean approach and consider for the long terms.
>
> I understand your concern, and I hope as well we can work together towards a
> proper and maintainable solution.
> Even if we don't get VFIO to use the multifd framework directly, maybe
> adding common APIs would be good enough.
Yes. Note that I haven't looked closely at your patchsets right now; as I
mentioned, they may not apply due to the recent fixes. However, please feel
free to repost whatever you think would still be worthwhile.
> For example, take this series of adding a common API to create migration
> channels.
> I also saw you and Fabiano have been talking about a thread-pool model to
> replace the multifd threads. Maybe we can use such scheme also for VFIO and
> even for the vCPUs case you mentioned above, each component stating how many
> threads it needs and creating a big pool for all migration clients.
We can keep discussing that. So far I still think it'll be more valuable
if you can propose something that will apply not only to VFIO but also to
other devices for concurrent save/load, but I don't think I have thought
everything through thoroughly. So feel free to go with what you think is
proper, and we can keep the discussion in your new thread if you're going
to prepare something.
Thanks,
--
Peter Xu
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-02-06 10:25 ` Peter Xu
@ 2024-02-08 15:31 ` Avihai Horon
0 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-02-08 15:31 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, Maciej Szmigiero, Joao Martins
On 06/02/2024 12:25, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, Jan 30, 2024 at 08:44:19PM +0200, Avihai Horon wrote:
>> On 30/01/2024 7:57, Peter Xu wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Mon, Jan 29, 2024 at 02:20:35PM +0200, Avihai Horon wrote:
>>>> On 29/01/2024 6:17, Peter Xu wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>>>>>> On 25/01/2024 22:57, Fabiano Rosas wrote:
>>>>>>> External email: Use caution opening links or attachments
>>>>>>>
>>>>>>>
>>>>>>> Avihai Horon <avihaih@nvidia.com> writes:
>>>>>>>
>>>>>>>> The commit in the fixes line moved multifd thread creation to a
>>>>>>>> different location, but forgot to move the p->running = true assignment
>>>>>>>> as well. Thus, p->running is set to true before multifd thread is
>>>>>>>> actually created.
>>>>>>>>
>>>>>>>> p->running is used in multifd_save_cleanup() to decide whether to join
>>>>>>>> the multifd thread or not.
>>>>>>>>
>>>>>>>> With TLS, an error in multifd_tls_channel_connect() can lead to a
>>>>>>>> segmentation fault because p->running is true but p->thread is never
>>>>>>>> initialized, so multifd_save_cleanup() tries to join an uninitialized
>>>>>>>> thread.
>>>>>>>>
>>>>>>>> Fix it by moving p->running = true assignment right after multifd thread
>>>>>>>> creation. Also move qio_channel_set_delay() to there, as this is where
>>>>>>>> it used to be originally.
>>>>>>>>
>>>>>>>> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
>>>>>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>>>>> Just for context, I haven't looked at this patch yet, but we were
>>>>>>> planning to remove p->running altogether:
>>>>>>>
>>>>>>> https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>>>>>> Thanks for putting me in the picture.
>>>>>> I see that there has been a discussion about the multifd creation/treadown
>>>>>> flow.
>>>>>> In light of this discussion, I can already see a few problems in my series
>>>>>> that I didn't notice before (such as the TLS handshake thread leak).
>>>>>> The thread you mentioned here and some of my patches point out some problems
>>>>>> in multifd creation/treardown. I guess we can discuss it and see what's the
>>>>>> best way to solve them.
>>>>>>
>>>>>> Regarding this patch, your solution indeed solves the bug that this patch
>>>>>> addresses, so maybe this could be dropped (or only noted in your patch).
>>>>>>
>>>>>> Maybe I should also put you (and Peter) in context for this whole series --
>>>>>> I am writing it as preparation for adding a separate migration channel for
>>>>>> VFIO device migration, so VFIO devices could be migrated in parallel.
>>>>>> So this series tries to lay down some foundations to facilitate it.
>>>>> Avihai, is the throughput the only reason that VFIO would like to have a
>>>>> separate channel?
>>>> Actually, the main reason is to be able to send and load multiple VFIO
>>>> devices data in parallel.
>>>> For example, today if we have three VFIO devices, they are migrated
>>>> sequentially one after another.
>>>> This particularly hurts during the complete pre-copy phase (downtime), as
>>>> loading the VFIO data in destination involves FW interaction and resource
>>>> allocation, which takes time and simply blocks the other devices from
>>>> sending and loading their data.
>>>> Providing a separate channel and thread for each VFIO device solves this
>>>> problem and ideally reduces the VFIO contribution to downtime from sum{VFIO
>>>> device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
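(To put purely illustrative numbers on that: with three devices whose loads
take 300 ms, 200 ms and 100 ms, sequential loading contributes roughly 600 ms
to downtime, while loading them in parallel contributes only about 300 ms,
i.e. the slowest device. Hypothetical figures, not measurements.)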
>>> I see.
>>>
>>>>> I'm wondering if we can also use multifd threads to send vfio data at some
>>>>> point. Now multifd indeed is closely bound to ram pages but maybe it'll
>>>>> change in the near future to take any load?
>>>>>
>>>>> Multifd is for solving the throughput issue already. If vfio has the same
>>>>> goal, IMHO it'll be good to keep them using the same thread model, instead
>>>>> of managing different threads in different places. With that, any user
>>>>> setting (for example, multifd-n-threads) will naturally apply to all
>>>>> components, rather than relying on yet-another vfio-migration-threads-num
>>>>> parameter.
>>>> Frankly, I didn't really put much attention to the throughput factor, and my
>>>> plan is to introduce only a single thread per device.
>>>> VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
>>>> mlx5 VFs can have a few GBs of data.
>>>> So what you are saying here is interesting, although I didn't test such
>>>> scenario to see the actual benefit.
>>>>
>>>> I am trying to think if/how this could work and I have a few concerns:
>>>> 1. RAM is made of fixed-positioned pages that can be randomly read/written,
>>>> so sending these pages over multiple channels and loading them in the
>>>> destination can work pretty naturally without much overhead.
>>>> VFIO device data, on the other hand, is just an opaque stream of bytes
>>>> from QEMU's point of view. This means that if we break this data into
>>>> "packets" and send them over multiple channels, we must preserve the order
>>>> by which this data was originally read from the device and write the data
>>>> in the same order to the destination device.
>>>> I am wondering if the overhead of maintaining such order may hurt
>>>> performance, making it not worthwhile.
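To illustrate what preserving that order could look like, here is a minimal
sketch -- the types and helper functions below are hypothetical, not existing
QEMU or kernel code. Each chunk carries a sequence number, and the destination
buffers out-of-order chunks until it can write them back in stream order:

#include <stdint.h>

/* Hypothetical wire header for one chunk of opaque VFIO device data. */
typedef struct {
    uint32_t device_id;   /* which VFIO device the chunk belongs to */
    uint64_t seq;         /* chunk index within that device's byte stream */
    uint64_t size;        /* payload bytes that follow this header */
} VFIODataChunkHdr;

/*
 * Destination side: chunks may arrive on any channel, but must be written
 * to the device in 'seq' order.  'dev->next_seq' tracks the next expected
 * chunk; anything newer is buffered until the gap is filled.
 */
static void vfio_load_chunk(VFIOLoadState *dev, const VFIODataChunkHdr *hdr,
                            const void *buf)
{
    if (hdr->seq != dev->next_seq) {
        vfio_buffer_chunk(dev, hdr, buf);        /* hypothetical helper */
        return;
    }
    vfio_write_to_device(dev, buf, hdr->size);   /* hypothetical helper */
    dev->next_seq++;
    vfio_flush_buffered_chunks(dev);             /* hypothetical helper */
}

The bookkeeping itself is cheap; the open question is how much the buffering
and the serialized writes to the device fd would eat into the gain.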
>>> Indeed, it seems to me VFIO migration is based on a streaming model where
>>> there's no easy way to index a chunk of data.
>> Yes, you can see it here as well: https://elixir.bootlin.com/linux/v6.8-rc2/source/include/uapi/linux/vfio.h#L1039
>>
>>> Is there any background on how that kernel interface was designed? It
>>> seems pretty unfriendly to concurrency already: even if multiple devices
>>> can migrate concurrently, each single device can already contain GBs of
>>> data as you said, which is pretty common here. I'm a bit surprised to see
>>> that the kernel interface is designed in this way for such a device.
>> Not that I know of. There has been a pretty big discussion about the uAPI
>> back then when it was introduced, but not something formal.
>> However, I don't think having a few GBs of device data is the common case
>> for VFIO devices, I just pointed out the extreme cases.
> I had that impression possibly because our QE team tests VFIO normally with
> vGPU, and my memory is 1Q ~= 1GB device data, where NQ ~= N GB (mostly vRAM
> per my understanding).
>
>>> I was wondering the possibility of whether VFIO can provide data chunks
>>> with indexes just like RAM (which is represented in ramblock offsets).
>> I am not sure this would be feasible, as it will involve major changes to
>> the uAPI and for the devices.
>> But if that's something you wish to explore I can ask around and see if it's
>> a hard no go.
> The thing is if VFIO always relies on 1 fd read() then it's impossible to
> scale. Maybe there's a chance in the future to provide other interfaces so
> that at least data can be concurrently read or updated, even if they cannot
> be directly addressed by offset.
Yes, this could be some future work if it turns out to be a serious
bottleneck.
>
>>>> 2. As I mentioned above, the main motivation for separate VFIO migration
>>>> channel/thread, as I see today, is to allow parallel migration of VFIO
>>>> devices.
>>>> AFAIU multifd, as it is today, doesn't provide such parallelism (i.e., in
>>>> the complete pre-copy phase each device, be it RAM or VFIO, will fully send
>>>> its data over the multifd threads and only after finishing will the next
>>>> device send its data).
>>> Indeed. That's actually an issue not only to VFIO but also to migration in
>>> general: we can't migrate device states concurrently, and multifd is out of
>>> the picture here so far, but maybe there's chance.
>>>
>>> Consider huge VMs where there can be already ~500 vCPUs, each need their
>>> own get()/put() of CPU states from/to KVM. It'll be nice if we can do this
>>> in concurrent threads too. Here VFIO is one of the devices that will also
>>> benefit from such a design, and greatly.
>> Interesting, do you know how much time migrating these vCPUs take?
> Most of the vCPUs took only milliseconds, but some took a long time, like
> 100+ ms. We still haven't looked into why some vCPUs are special and what
> causes that slowness during save(); however, with more than one vCPU taking
> 100 ms, those delays add up and contribute to the total downtime. That's
> definitely unwanted.
>
> If we can somehow concurrently submit device states save() jobs to multiple
> threads, then it could be potentially useful.
Yes, I agree.
>
>>> I added a todo in our wiki for this, which I see it a general improvement,
>>> and hopefully someone can look into this:
>>>
>>> https://wiki.qemu.org/ToDo/LiveMigration#Device_state_concurrency
>>>
>>> I hope VFIO can consider resolving this as a generic issue, rather than
>>> providing its own solution.
>>>
>>>> This is just what came to mind. Maybe we can think this more thoroughly and
>>>> see if it could work somehow, now or in the future.
>>>> However, I think making the multifd threads generic so they can send any
>>>> kind of data is a good thing in general, regardless of VFIO.
>>> Right.
>>>
>>> In general, having a separate VFIO channel may solve the immediate issue,
>>> but it may still not solve everything; meanwhile, it may introduce the first
>>> example of using a completely separate channel that migration won't easily
>>> manage itself, which IMHO can make migration much harder to maintain in
>>> the future.
>>>
>>> It may also in the future become some technical debt that VFIO will need to
>>> take on even if such a solution is merged, because VFIO could have its own model
>>> of handling a few similar problems that migration has.
>>>
>>> I hope there's some way out that we can work together to improve the
>>> framework, providing a clean approach and consider for the long terms.
>> I understand your concern, and I hope as well we can work together towards a
>> proper and maintainable solution.
>> Even if we don't get VFIO to use the multifd framework directly, maybe
>> adding common APIs would be good enough.
> Yes. Note that I haven't looked closely at your patchsets right now; as I
> mentioned, they may not apply due to the recent fixes. However, please feel
> free to repost whatever you think would still be worthwhile.
Sure.
>
>> For example, take this series of adding a common API to create migration
>> channels.
>> I also saw you and Fabiano have been talking about a thread-pool model to
>> replace the multifd threads. Maybe we can use such scheme also for VFIO and
>> even for the vCPUs case you mentioned above, each component stating how many
>> threads it needs and creating a big pool for all migration clients.
> We can keep discussing that. So far I still think it'll be more valuable
> if you can propose something that will apply not only to VFIO but also
> other devices on concurrent save/loads, but I don't think I have everything
> thought thoroughly. So feel free to go with what you think proper, and we
> can keep the discussion in your new thread if you're going to prepare
> something.
Well, it turns out that Oracle has been coding a PoC for VFIO parallel
migration over the multifd framework.
It looks promising, so I will drop my efforts in the current approach,
giving the lead to Maciej from Oracle (CCed him here) who coded this
PoC. AFAIK, he is currently working on an initial series based on his
PoC and he will post it when ready. I think in the meanwhile it would be
good to CC him in other multifd work that could be related to VFIO
parallel migration.
Thanks!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place
2024-01-29 4:17 ` Peter Xu
2024-01-29 12:20 ` Avihai Horon
@ 2024-01-29 12:23 ` Fabiano Rosas
1 sibling, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2024-01-29 12:23 UTC (permalink / raw)
To: Peter Xu, Avihai Horon; +Cc: qemu-devel
Peter Xu <peterx@redhat.com> writes:
> On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
>>
>> On 25/01/2024 22:57, Fabiano Rosas wrote:
>> >
>> >
>> > Avihai Horon <avihaih@nvidia.com> writes:
>> >
>> > > The commit in the fixes line moved multifd thread creation to a
>> > > different location, but forgot to move the p->running = true assignment
>> > > as well. Thus, p->running is set to true before multifd thread is
>> > > actually created.
>> > >
>> > > p->running is used in multifd_save_cleanup() to decide whether to join
>> > > the multifd thread or not.
>> > >
>> > > With TLS, an error in multifd_tls_channel_connect() can lead to a
>> > > segmentation fault because p->running is true but p->thread is never
>> > > initialized, so multifd_save_cleanup() tries to join an uninitialized
>> > > thread.
>> > >
>> > > Fix it by moving p->running = true assignment right after multifd thread
>> > > creation. Also move qio_channel_set_delay() to there, as this is where
>> > > it used to be originally.
>> > >
>> > > Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
>> > > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> > Just for context, I haven't looked at this patch yet, but we were
>> > planning to remove p->running altogether:
>> >
>> > https://lore.kernel.org/r/20231110200241.20679-1-farosas@suse.de
>>
>> Thanks for putting me in the picture.
>> I see that there has been a discussion about the multifd creation/teardown
>> flow.
>> In light of this discussion, I can already see a few problems in my series
>> that I didn't notice before (such as the TLS handshake thread leak).
>> The thread you mentioned here and some of my patches point out some problems
>> in multifd creation/teardown. I guess we can discuss it and see what's the
>> best way to solve them.
>>
>> Regarding this patch, your solution indeed solves the bug that this patch
>> addresses, so maybe this could be dropped (or only noted in your patch).
>>
>> Maybe I should also put you (and Peter) in context for this whole series --
>> I am writing it as preparation for adding a separate migration channel for
>> VFIO device migration, so VFIO devices could be migrated in parallel.
>> So this series tries to lay down some foundations to facilitate it.
>
> Avihai, is the throughput the only reason that VFIO would like to have a
> separate channel?
>
> I'm wondering if we can also use multifd threads to send vfio data at some
> point. Now multifd indeed is closely bound to ram pages but maybe it'll
> change in the near future to take any load?
We're not far away from it IMO. We just need to gradually kick the pages
concept out of multifd.
Again, for context, I have played with this recently:
https://gitlab.com/farosas/qemu/-/commits/multifd-packet-cleanups?ref_type=heads
I'd be happy with any solution that turns that p->pages into something
opaque that multifd has no knowledge about.
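A very rough sketch of that direction (names invented here just to illustrate
the shape, not an actual proposal): the send path would deal with an opaque
payload plus a small ops table, and only the RAM code would know that the
payload happens to be pages:

#include <sys/uio.h>

/* Hypothetical ops table so multifd never looks inside the payload. */
typedef struct MultiFDPayloadOps {
    /* Turn the payload into iovs ready to be written to the channel. */
    void (*prepare)(void *payload, struct iovec *iov, unsigned *iovcnt);
    /* Drop whatever references the payload holds once it has been sent. */
    void (*release)(void *payload);
} MultiFDPayloadOps;

/* What a send thread would pick up instead of MultiFDPages_t directly. */
typedef struct MultiFDSendJob {
    const MultiFDPayloadOps *ops;
    void *payload;   /* RAM pages today, device state or anything else later */
} MultiFDSendJob;

RAM would then register one MultiFDPayloadOps implementation and a future
device-state producer another, without the send thread changing.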
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (3 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 04/17] migration/multifd: Set p->running = true in the right place Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-29 14:34 ` Fabiano Rosas
2024-01-25 16:25 ` [PATCH 06/17] migration/tls: Rename main migration channel TLS functions Avihai Horon
` (12 subsequent siblings)
17 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Currently, multifd channels are created asynchronously without waiting
for their creation -- migration simply proceeds and may wait in
multifd_send_sync_main(), which is called by ram_save_setup(). This
hides some race conditions which can cause unexpected behavior if
creation of some channels fails.
For example, the following scenario of multifd migration with two
channels, where the first channel creation fails, will end in a
segmentation fault (time advances from top to bottom):
Thread | Code execution
------------------------------------------------------------------------
Multifd 1 |
| multifd_new_send_channel_async (errors and quits)
| multifd_new_send_channel_cleanup
|
Migration thread |
| qemu_savevm_state_setup
| ram_save_setup
| multifd_send_sync_main
| (detects Multifd 1 error and quits)
| [...]
| migration_iteration_finish
| migrate_fd_cleanup_schedule
|
Main thread |
| migrate_fd_cleanup
| multifd_save_cleanup (destroys Multifd 2 resources)
|
Multifd 2 |
| multifd_new_send_channel_async
| (accesses destroyed resources, segfault)
In another scenario, migration can hang indefinitely:
1. Main migration thread reaches multifd_send_sync_main() and waits on
the semaphores.
2. Then, all multifd channels creation fails, so they post the
semaphores and quit.
3. Main migration channel will not identify the error, proceed to send
pages and will hang.
Fix it by waiting for all multifd channels to be created before
proceeding with migration.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/multifd.h | 3 +++
migration/migration.c | 1 +
migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
migration/ram.c | 7 +++++++
4 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index 35d11f103c..87a64e0a87 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
void multifd_recv_sync_main(void);
int multifd_send_sync_main(void);
int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
+int multifd_send_channels_created(void);
/* Multifd Compression flags */
#define MULTIFD_FLAG_SYNC (1 << 0)
@@ -86,6 +87,8 @@ typedef struct {
/* multifd flags for sending ram */
int write_flags;
+ /* Syncs channel creation and migration thread */
+ QemuSemaphore create_sem;
/* sem where to wait for more work */
QemuSemaphore sem;
/* syncs main thread and channels */
diff --git a/migration/migration.c b/migration/migration.c
index 9c769a1ecd..d81d96eaa5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
error_report_err(local_err);
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
+ multifd_send_channels_created();
migrate_fd_cleanup(s);
return;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index 564e911b6c..f0c216f4f9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
multifd_send_channel_destroy(p->c);
p->c = NULL;
qemu_mutex_destroy(&p->mutex);
+ qemu_sem_destroy(&p->create_sem);
qemu_sem_destroy(&p->sem);
qemu_sem_destroy(&p->sem_sync);
g_free(p->name);
@@ -766,6 +767,29 @@ out:
return NULL;
}
+int multifd_send_channels_created(void)
+{
+ int ret = 0;
+
+ if (!migrate_multifd()) {
+ return 0;
+ }
+
+ for (int i = 0; i < migrate_multifd_channels(); i++) {
+ MultiFDSendParams *p = &multifd_send_state->params[i];
+
+ qemu_sem_wait(&p->create_sem);
+ WITH_QEMU_LOCK_GUARD(&p->mutex) {
+ if (p->quit) {
+ error_report("%s: channel %d has already quit", __func__, i);
+ ret = -1;
+ }
+ }
+ }
+
+ return ret;
+}
+
static bool multifd_channel_connect(MultiFDSendParams *p,
QIOChannel *ioc,
Error **errp);
@@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
p->quit = true;
qemu_sem_post(&multifd_send_state->channels_ready);
qemu_sem_post(&p->sem_sync);
+ qemu_sem_post(&p->create_sem);
error_free(err);
}
@@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
QEMU_THREAD_JOINABLE);
p->running = true;
+ qemu_sem_post(&p->create_sem);
return true;
}
@@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
QIOChannel *ioc, Error *err)
{
migrate_set_error(migrate_get_current(), err);
- /* Error happen, we need to tell who pay attention to me */
- qemu_sem_post(&multifd_send_state->channels_ready);
- qemu_sem_post(&p->sem_sync);
/*
+ * Error happen, we need to tell who pay attention to me.
* Although multifd_send_thread is not created, but main migration
* thread need to judge whether it is running, so we need to mark
* its status.
*/
p->quit = true;
+ qemu_sem_post(&multifd_send_state->channels_ready);
+ qemu_sem_post(&p->sem_sync);
+ qemu_sem_post(&p->create_sem);
object_unref(OBJECT(ioc));
error_free(err);
}
@@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
MultiFDSendParams *p = &multifd_send_state->params[i];
qemu_mutex_init(&p->mutex);
+ qemu_sem_init(&p->create_sem, 0);
qemu_sem_init(&p->sem, 0);
qemu_sem_init(&p->sem_sync, 0);
p->quit = false;
diff --git a/migration/ram.c b/migration/ram.c
index c0cdcccb75..b3e864a22b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
RAMBlock *block;
int ret;
+ bql_unlock();
+ ret = multifd_send_channels_created();
+ bql_lock();
+ if (ret < 0) {
+ return ret;
+ }
+
if (compress_threads_save_setup()) {
return -1;
}
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-25 16:25 ` [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding Avihai Horon
@ 2024-01-29 14:34 ` Fabiano Rosas
2024-01-30 18:32 ` Avihai Horon
0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2024-01-29 14:34 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu, Avihai Horon
Avihai Horon <avihaih@nvidia.com> writes:
> Currently, multifd channels are created asynchronously without waiting
> for their creation -- migration simply proceeds and may wait in
> multifd_send_sync_main(), which is called by ram_save_setup(). This
> hides some race conditions which can cause unexpected behavior if
> creation of some channels fails.
>
> For example, the following scenario of multifd migration with two
> channels, where the first channel creation fails, will end in a
> segmentation fault (time advances from top to bottom):
Is this reproducible? Or just observable at least.
I acknowledge the situation you describe, but with multifd there's
usually an issue in cleanup paths. Let's make sure we flushed those out
before adding this new semaphore.
This is similar to an issue Peter was addressing where we missed calling
multifd_send_terminate_threads() in the multifd_channel_connect() path:
patch 4 in this
https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
>
> Thread | Code execution
> ------------------------------------------------------------------------
> Multifd 1 |
> | multifd_new_send_channel_async (errors and quits)
> | multifd_new_send_channel_cleanup
> |
> Migration thread |
> | qemu_savevm_state_setup
> | ram_save_setup
> | multifd_send_sync_main
> | (detects Multifd 1 error and quits)
> | [...]
> | migration_iteration_finish
> | migrate_fd_cleanup_schedule
> |
> Main thread |
> | migrate_fd_cleanup
> | multifd_save_cleanup (destroys Multifd 2 resources)
> |
> Multifd 2 |
> | multifd_new_send_channel_async
> | (accesses destroyed resources, segfault)
>
> In another scenario, migration can hang indefinitely:
> 1. Main migration thread reaches multifd_send_sync_main() and waits on
> the semaphores.
> 2. Then, all multifd channels creation fails, so they post the
> semaphores and quit.
> 3. Main migration channel will not identify the error, proceed to send
> pages and will hang.
>
> Fix it by waiting for all multifd channels to be created before
> proceeding with migration.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> migration/multifd.h | 3 +++
> migration/migration.c | 1 +
> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
> migration/ram.c | 7 +++++++
> 4 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 35d11f103c..87a64e0a87 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
> void multifd_recv_sync_main(void);
> int multifd_send_sync_main(void);
> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
> +int multifd_send_channels_created(void);
>
> /* Multifd Compression flags */
> #define MULTIFD_FLAG_SYNC (1 << 0)
> @@ -86,6 +87,8 @@ typedef struct {
> /* multifd flags for sending ram */
> int write_flags;
>
> + /* Syncs channel creation and migration thread */
> + QemuSemaphore create_sem;
> /* sem where to wait for more work */
> QemuSemaphore sem;
> /* syncs main thread and channels */
> diff --git a/migration/migration.c b/migration/migration.c
> index 9c769a1ecd..d81d96eaa5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> error_report_err(local_err);
> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> MIGRATION_STATUS_FAILED);
> + multifd_send_channels_created();
> migrate_fd_cleanup(s);
> return;
> }
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 564e911b6c..f0c216f4f9 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
> multifd_send_channel_destroy(p->c);
> p->c = NULL;
> qemu_mutex_destroy(&p->mutex);
> + qemu_sem_destroy(&p->create_sem);
> qemu_sem_destroy(&p->sem);
> qemu_sem_destroy(&p->sem_sync);
> g_free(p->name);
> @@ -766,6 +767,29 @@ out:
> return NULL;
> }
>
> +int multifd_send_channels_created(void)
> +{
> + int ret = 0;
> +
> + if (!migrate_multifd()) {
> + return 0;
> + }
> +
> + for (int i = 0; i < migrate_multifd_channels(); i++) {
> + MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> + qemu_sem_wait(&p->create_sem);
> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
> + if (p->quit) {
> + error_report("%s: channel %d has already quit", __func__, i);
> + ret = -1;
> + }
> + }
There are races here when a channel fails at
multifd_send_initial_packet(). If p->quit can be set after post to
create_sem, then this function could always return true and we'd run
into a broken channel. Possibly even the same bug you're trying to fix.
I think that's one of the reasons we have channels_ready.
> + }
> +
> + return ret;
> +}
> +
> static bool multifd_channel_connect(MultiFDSendParams *p,
> QIOChannel *ioc,
> Error **errp);
> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
> p->quit = true;
> qemu_sem_post(&multifd_send_state->channels_ready);
> qemu_sem_post(&p->sem_sync);
> + qemu_sem_post(&p->create_sem);
> error_free(err);
> }
>
> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> QEMU_THREAD_JOINABLE);
> p->running = true;
> + qemu_sem_post(&p->create_sem);
> return true;
> }
>
> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
> QIOChannel *ioc, Error *err)
> {
> migrate_set_error(migrate_get_current(), err);
> - /* Error happen, we need to tell who pay attention to me */
> - qemu_sem_post(&multifd_send_state->channels_ready);
> - qemu_sem_post(&p->sem_sync);
> /*
> + * Error happen, we need to tell who pay attention to me.
> * Although multifd_send_thread is not created, but main migration
> * thread need to judge whether it is running, so we need to mark
> * its status.
> */
> p->quit = true;
> + qemu_sem_post(&multifd_send_state->channels_ready);
> + qemu_sem_post(&p->sem_sync);
> + qemu_sem_post(&p->create_sem);
Do we still need channels_ready and sem_sync here? The migration thread
shouldn't have gone past create_sem at this point.
> object_unref(OBJECT(ioc));
> error_free(err);
> }
> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
> MultiFDSendParams *p = &multifd_send_state->params[i];
>
> qemu_mutex_init(&p->mutex);
> + qemu_sem_init(&p->create_sem, 0);
> qemu_sem_init(&p->sem, 0);
> qemu_sem_init(&p->sem_sync, 0);
> p->quit = false;
> diff --git a/migration/ram.c b/migration/ram.c
> index c0cdcccb75..b3e864a22b 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> RAMBlock *block;
> int ret;
>
> + bql_unlock();
> + ret = multifd_send_channels_created();
> + bql_lock();
> + if (ret < 0) {
> + return ret;
> + }
> +
> if (compress_threads_save_setup()) {
> return -1;
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-29 14:34 ` Fabiano Rosas
@ 2024-01-30 18:32 ` Avihai Horon
2024-01-30 21:32 ` Fabiano Rosas
0 siblings, 1 reply; 37+ messages in thread
From: Avihai Horon @ 2024-01-30 18:32 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel; +Cc: Peter Xu
On 29/01/2024 16:34, Fabiano Rosas wrote:
>
>
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> Currently, multifd channels are created asynchronously without waiting
>> for their creation -- migration simply proceeds and may wait in
>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>> hides some race conditions which can cause unexpected behavior if
>> creation of some channels fails.
>>
>> For example, the following scenario of multifd migration with two
>> channels, where the first channel creation fails, will end in a
>> segmentation fault (time advances from top to bottom):
> Is this reproducible? Or just observable at least.
Yes, though I had to engineer it a bit:
1. Run migration with two multifd channels and fail creation of the two
channels (e.g., by changing the address they are connecting to).
2. Add sleep(3) in multifd_send_sync_main() before we loop through the
channels and check p->quit.
3. Add sleep(5) only for the second multifd channel connect thread so
its connection is delayed and runs last.
> I acknowledge the situation you describe, but with multifd there's
> usually an issue in cleanup paths. Let's make sure we flushed those out
> before adding this new semaphore.
Indeed, I was not keen on adding yet another semaphore either.
I think there are multiple bugs here, some of them overlap and some don't.
There is also your and Peter's previous work that I was not aware of to
fix those and to clean up the code.
Maybe we can take it one step at a time, pushing your series first,
cleaning the code and fixing some bugs.
Then we can see what bugs are left (if any) and fix them. It might even
be easier to fix after the cleanups.
> This is similar to an issue Peter was addressing where we missed calling
>> multifd_send_terminate_threads() in the multifd_channel_connect() path:
>
> patch 4 in this
> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
What issue are you referring to here? Can you elaborate?
The main issue I am trying to fix in my patch is that we don't wait for
all multifd channels to be created/error out before tearing down
multifd resources in multifd_save_cleanup().
>> Thread | Code execution
>> ------------------------------------------------------------------------
>> Multifd 1 |
>> | multifd_new_send_channel_async (errors and quits)
>> | multifd_new_send_channel_cleanup
>> |
>> Migration thread |
>> | qemu_savevm_state_setup
>> | ram_save_setup
>> | multifd_send_sync_main
>> | (detects Multifd 1 error and quits)
>> | [...]
>> | migration_iteration_finish
>> | migrate_fd_cleanup_schedule
>> |
>> Main thread |
>> | migrate_fd_cleanup
>> | multifd_save_cleanup (destroys Multifd 2 resources)
>> |
>> Multifd 2 |
>> | multifd_new_send_channel_async
>> | (accesses destroyed resources, segfault)
>>
>> In another scenario, migration can hang indefinitely:
>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>> the semaphores.
>> 2. Then, all multifd channels creation fails, so they post the
>> semaphores and quit.
>> 3. Main migration channel will not identify the error, proceed to send
>> pages and will hang.
>>
>> Fix it by waiting for all multifd channels to be created before
>> proceeding with migration.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>> migration/multifd.h | 3 +++
>> migration/migration.c | 1 +
>> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
>> migration/ram.c | 7 +++++++
>> 4 files changed, 42 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index 35d11f103c..87a64e0a87 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>> void multifd_recv_sync_main(void);
>> int multifd_send_sync_main(void);
>> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>> +int multifd_send_channels_created(void);
>>
>> /* Multifd Compression flags */
>> #define MULTIFD_FLAG_SYNC (1 << 0)
>> @@ -86,6 +87,8 @@ typedef struct {
>> /* multifd flags for sending ram */
>> int write_flags;
>>
>> + /* Syncs channel creation and migration thread */
>> + QemuSemaphore create_sem;
>> /* sem where to wait for more work */
>> QemuSemaphore sem;
>> /* syncs main thread and channels */
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 9c769a1ecd..d81d96eaa5 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>> error_report_err(local_err);
>> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>> MIGRATION_STATUS_FAILED);
>> + multifd_send_channels_created();
>> migrate_fd_cleanup(s);
>> return;
>> }
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 564e911b6c..f0c216f4f9 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>> multifd_send_channel_destroy(p->c);
>> p->c = NULL;
>> qemu_mutex_destroy(&p->mutex);
>> + qemu_sem_destroy(&p->create_sem);
>> qemu_sem_destroy(&p->sem);
>> qemu_sem_destroy(&p->sem_sync);
>> g_free(p->name);
>> @@ -766,6 +767,29 @@ out:
>> return NULL;
>> }
>>
>> +int multifd_send_channels_created(void)
>> +{
>> + int ret = 0;
>> +
>> + if (!migrate_multifd()) {
>> + return 0;
>> + }
>> +
>> + for (int i = 0; i < migrate_multifd_channels(); i++) {
>> + MultiFDSendParams *p = &multifd_send_state->params[i];
>> +
>> + qemu_sem_wait(&p->create_sem);
>> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
>> + if (p->quit) {
>> + error_report("%s: channel %d has already quit", __func__, i);
>> + ret = -1;
>> + }
>> + }
> There are races here when a channel fails at
> multifd_send_initial_packet(). If p->quit can be set after post to
> create_sem, then this function could always return true and we'd run
> into a broken channel. Possibly even the same bug you're trying to fix.
>
> I think that's one of the reasons we have channels_ready.
I am not sure exactly what bug you are describing here, can you elaborate?
>
>> + }
>> +
>> + return ret;
>> +}
>> +
>> static bool multifd_channel_connect(MultiFDSendParams *p,
>> QIOChannel *ioc,
>> Error **errp);
>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>> p->quit = true;
>> qemu_sem_post(&multifd_send_state->channels_ready);
>> qemu_sem_post(&p->sem_sync);
>> + qemu_sem_post(&p->create_sem);
>> error_free(err);
>> }
>>
>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>> QEMU_THREAD_JOINABLE);
>> p->running = true;
>> + qemu_sem_post(&p->create_sem);
>> return true;
>> }
>>
>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>> QIOChannel *ioc, Error *err)
>> {
>> migrate_set_error(migrate_get_current(), err);
>> - /* Error happen, we need to tell who pay attention to me */
>> - qemu_sem_post(&multifd_send_state->channels_ready);
>> - qemu_sem_post(&p->sem_sync);
>> /*
>> + * Error happen, we need to tell who pay attention to me.
>> * Although multifd_send_thread is not created, but main migration
>> * thread need to judge whether it is running, so we need to mark
>> * its status.
>> */
>> p->quit = true;
>> + qemu_sem_post(&multifd_send_state->channels_ready);
>> + qemu_sem_post(&p->sem_sync);
>> + qemu_sem_post(&p->create_sem);
> Do we still need channels_ready and sem_sync here? The migration thread
> shouldn't have gone past create_sem at this point.
I think you are right, we can drop channels_ready and sem_sync here.
>
>> object_unref(OBJECT(ioc));
>> error_free(err);
>> }
>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
>> MultiFDSendParams *p = &multifd_send_state->params[i];
>>
>> qemu_mutex_init(&p->mutex);
>> + qemu_sem_init(&p->create_sem, 0);
>> qemu_sem_init(&p->sem, 0);
>> qemu_sem_init(&p->sem_sync, 0);
>> p->quit = false;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index c0cdcccb75..b3e864a22b 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>> RAMBlock *block;
>> int ret;
>>
>> + bql_unlock();
>> + ret = multifd_send_channels_created();
>> + bql_lock();
>> + if (ret < 0) {
>> + return ret;
>> + }
>> +
>> if (compress_threads_save_setup()) {
>> return -1;
>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-30 18:32 ` Avihai Horon
@ 2024-01-30 21:32 ` Fabiano Rosas
2024-01-31 4:49 ` Peter Xu
2024-01-31 10:39 ` Avihai Horon
0 siblings, 2 replies; 37+ messages in thread
From: Fabiano Rosas @ 2024-01-30 21:32 UTC (permalink / raw)
To: Avihai Horon, qemu-devel; +Cc: Peter Xu
Avihai Horon <avihaih@nvidia.com> writes:
> On 29/01/2024 16:34, Fabiano Rosas wrote:
>>
>>
>> Avihai Horon <avihaih@nvidia.com> writes:
>>
>>> Currently, multifd channels are created asynchronously without waiting
>>> for their creation -- migration simply proceeds and may wait in
>>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>>> hides some race conditions which can cause unexpected behavior if
>>> creation of some channels fails.
>>>
>>> For example, the following scenario of multifd migration with two
>>> channels, where the first channel creation fails, will end in a
>>> segmentation fault (time advances from top to bottom):
>> Is this reproducible? Or just observable at least.
>
> Yes, though I had to engineer it a bit:
> 1. Run migration with two multifd channels and fail creation of the two
> channels (e.g., by changing the address they are connecting to).
> 2. Add sleep(3) in multifd_send_sync_main() before we loop through the
> channels and check p->quit.
> 3. Add sleep(5) only for the second multifd channel connect thread so
> its connection is delayed and runs last.
Ok, well, that's something at least. I'll try to reproduce it so we can
keep track of it.
>> I acknowledge the situation you describe, but with multifd there's
>> usually an issue in cleanup paths. Let's make sure we flushed those out
>> before adding this new semaphore.
>
> Indeed, I was not keen on adding yet another semaphore either.
> I think there are multiple bugs here, some of them overlap and some don't.
> There is also your and Peter's previous work that I was not aware of to
> fix those and to clean up the code.
>
> Maybe we can take it one step at a time, pushing your series first,
> cleaning the code and fixing some bugs.
> Then we can see what bugs are left (if any) and fix them. It might even
> be easier to fix after the cleanups.
>
>> This is similar to an issue Peter was addressing where we missed calling
>> multifd_send_terminate_threads() in the multifd_channel_connect() path:
>>
>> patch 4 in this
>> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
>
> What issue are you referring to here? Can you elaborate?
Oh, I just realised that series doesn't address any particular bug. But
my point is that including a call to multifd_send_terminate_threads() at
new_send_channel_cleanup might be all that's needed because that has
code to cause the channels and the migration thread to end.
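Roughly something like this (untested sketch; it assumes
multifd_send_terminate_threads() still takes the error as an argument):

static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
                                             QIOChannel *ioc, Error *err)
{
    migrate_set_error(migrate_get_current(), err);
    /*
     * Tear down all channels and wake up whoever is waiting, rather than
     * only marking this one channel with p->quit.
     */
    multifd_send_terminate_threads(err);
    object_unref(OBJECT(ioc));
    error_free(err);
}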
> The main issue I am trying to fix in my patch is that we don't wait for
> all multifd channels to be created/error out before tearing down
> multifd resources in multifd_save_cleanup().
Ok, let me take a step back and ask why this is not solved by
multifd_save_cleanup() -> qemu_thread_join()? I see you moved
p->running=true to *after* the thread creation in patch 4. That will
always leave a gap where p->running == false but the thread is already
running.
>
>>> Thread | Code execution
>>> ------------------------------------------------------------------------
>>> Multifd 1 |
>>> | multifd_new_send_channel_async (errors and quits)
>>> | multifd_new_send_channel_cleanup
>>> |
>>> Migration thread |
>>> | qemu_savevm_state_setup
>>> | ram_save_setup
>>> | multifd_send_sync_main
>>> | (detects Multifd 1 error and quits)
>>> | [...]
>>> | migration_iteration_finish
>>> | migrate_fd_cleanup_schedule
>>> |
>>> Main thread |
>>> | migrate_fd_cleanup
>>> | multifd_save_cleanup (destroys Multifd 2 resources)
>>> |
>>> Multifd 2 |
>>> | multifd_new_send_channel_async
>>> | (accesses destroyed resources, segfault)
>>>
>>> In another scenario, migration can hang indefinitely:
>>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>>> the semaphores.
>>> 2. Then, all multifd channels creation fails, so they post the
>>> semaphores and quit.
>>> 3. Main migration channel will not identify the error, proceed to send
>>> pages and will hang.
>>>
>>> Fix it by waiting for all multifd channels to be created before
>>> proceeding with migration.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> migration/multifd.h | 3 +++
>>> migration/migration.c | 1 +
>>> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
>>> migration/ram.c | 7 +++++++
>>> 4 files changed, 42 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>> index 35d11f103c..87a64e0a87 100644
>>> --- a/migration/multifd.h
>>> +++ b/migration/multifd.h
>>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>> void multifd_recv_sync_main(void);
>>> int multifd_send_sync_main(void);
>>> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>>> +int multifd_send_channels_created(void);
>>>
>>> /* Multifd Compression flags */
>>> #define MULTIFD_FLAG_SYNC (1 << 0)
>>> @@ -86,6 +87,8 @@ typedef struct {
>>> /* multifd flags for sending ram */
>>> int write_flags;
>>>
>>> + /* Syncs channel creation and migration thread */
>>> + QemuSemaphore create_sem;
>>> /* sem where to wait for more work */
>>> QemuSemaphore sem;
>>> /* syncs main thread and channels */
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9c769a1ecd..d81d96eaa5 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>> error_report_err(local_err);
>>> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> MIGRATION_STATUS_FAILED);
>>> + multifd_send_channels_created();
>>> migrate_fd_cleanup(s);
>>> return;
>>> }
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index 564e911b6c..f0c216f4f9 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>>> multifd_send_channel_destroy(p->c);
>>> p->c = NULL;
>>> qemu_mutex_destroy(&p->mutex);
>>> + qemu_sem_destroy(&p->create_sem);
>>> qemu_sem_destroy(&p->sem);
>>> qemu_sem_destroy(&p->sem_sync);
>>> g_free(p->name);
>>> @@ -766,6 +767,29 @@ out:
>>> return NULL;
>>> }
>>>
>>> +int multifd_send_channels_created(void)
>>> +{
>>> + int ret = 0;
>>> +
>>> + if (!migrate_multifd()) {
>>> + return 0;
>>> + }
>>> +
>>> + for (int i = 0; i < migrate_multifd_channels(); i++) {
>>> + MultiFDSendParams *p = &multifd_send_state->params[i];
>>> +
>>> + qemu_sem_wait(&p->create_sem);
>>> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
>>> + if (p->quit) {
>>> + error_report("%s: channel %d has already quit", __func__, i);
>>> + ret = -1;
>>> + }
>>> + }
>> There are races here when a channel fails at
>> multifd_send_initial_packet(). If p->quit can be set after post to
>> create_sem, then this function could always return true and we'd run
>> into a broken channel. Possibly even the same bug you're trying to fix.
>>
>> I think that's one of the reasons we have channels_ready.
>
> I am not sure exactly what bug you are describing here, can you elaborate?
>
This looks like it could be a preexisting issue actually, but in the
current code, what stops the multifd channel from running ahead is
p->sem. Except that the channel does some work at
multifd_send_initial_packet() before checking p->sem and that work could
fail.
This means that right after checking for p->quit above, the multifd
thread could already be exiting due to an error and
multifd_send_channels_created() would miss it. So there's still a chance
that this function effectively behaves just like the previous code.
Thread | Code execution
------------------------------------------------------------------------
Migration |
| ram_save_setup()
| multifd_send_channels_created()
| qemu_sem_wait(&p->create_sem);
Main |
| multifd_channel_connect()
| qemu_thread_create()
| qemu_sem_post(&p->create_sem)
Multifd 1 |
| multifd_send_initial_packet() *errors*
| goto out
| multifd_send_terminate_threads()
Migration |
| still at multifd_send_channels_created
| qemu_mutex_lock(&p->mutex);
| p->quit == false <--- !!!
| qemu_mutex_unlock(&p->mutex);
| return true from multifd_send_channels_created()
From here onwards, it's the same as not having checked
multifd_send_channels_created() at all. One way this could go is:
| runs unimpeded until multifd_send_sync_main()
| multifd_send_pages()
| *misses the exiting flag*
| qemu_sem_wait(&multifd_send_state->channels_ready);
Multifd 1 |
| still at multifd_send_terminate_threads
| qemu_mutex_lock(&p->mutex);
| p->quit = true;
| qemu_mutex_unlock(&p->mutex);
| qemu_sem_post(&p->sem_sync);
| qemu_sem_post(&multifd_send_state->channels_ready);
Migration |
| qemu_mutex_lock(&p->mutex);
| p->quit == true; <--- correct now
| qemu_mutex_unlock(&p->mutex);
| return -1;
| *all good again*
It seems that in order to avoid this kind of race we'd need the
synchronization point to be at the multifd thread instead. But then,
that's what channels_ready does.
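For comparison, moving the sync point into the thread would look roughly like
this (sketch only, reusing the names from your patch, not something I've
tested): the connect path would stop posting create_sem, and the thread would
post it only once multifd_send_initial_packet() has either succeeded or marked
p->quit, so the waiter can trust what it reads:

static void *multifd_send_thread(void *opaque)
{
    MultiFDSendParams *p = opaque;
    Error *local_err = NULL;

    if (multifd_send_initial_packet(p, &local_err) < 0) {
        qemu_mutex_lock(&p->mutex);
        p->quit = true;                 /* visible before the waiter wakes */
        qemu_mutex_unlock(&p->mutex);
        qemu_sem_post(&p->create_sem);  /* waiter now reliably sees quit */
        goto out;
    }
    qemu_sem_post(&p->create_sem);      /* channel is fully up */

    /* ... normal send loop continues as in the existing code ... */
out:
    /* ... existing error reporting and cleanup path ... */
    return NULL;
}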
>>
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> static bool multifd_channel_connect(MultiFDSendParams *p,
>>> QIOChannel *ioc,
>>> Error **errp);
>>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>>> p->quit = true;
>>> qemu_sem_post(&multifd_send_state->channels_ready);
>>> qemu_sem_post(&p->sem_sync);
>>> + qemu_sem_post(&p->create_sem);
>>> error_free(err);
>>> }
>>>
>>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>> QEMU_THREAD_JOINABLE);
>>> p->running = true;
>>> + qemu_sem_post(&p->create_sem);
>>> return true;
>>> }
>>>
>>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>> QIOChannel *ioc, Error *err)
>>> {
>>> migrate_set_error(migrate_get_current(), err);
>>> - /* Error happen, we need to tell who pay attention to me */
>>> - qemu_sem_post(&multifd_send_state->channels_ready);
>>> - qemu_sem_post(&p->sem_sync);
>>> /*
>>> + * Error happen, we need to tell who pay attention to me.
>>> * Although multifd_send_thread is not created, but main migration
>>> * thread need to judge whether it is running, so we need to mark
>>> * its status.
>>> */
>>> p->quit = true;
>>> + qemu_sem_post(&multifd_send_state->channels_ready);
>>> + qemu_sem_post(&p->sem_sync);
>>> + qemu_sem_post(&p->create_sem);
>> Do we still need channels_ready and sem_sync here? The migration thread
>> shouldn't have gone past create_sem at this point.
>
> I think you are right, we can drop channels_ready and sem_sync here.
>
>>
>>> object_unref(OBJECT(ioc));
>>> error_free(err);
>>> }
>>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
>>> MultiFDSendParams *p = &multifd_send_state->params[i];
>>>
>>> qemu_mutex_init(&p->mutex);
>>> + qemu_sem_init(&p->create_sem, 0);
>>> qemu_sem_init(&p->sem, 0);
>>> qemu_sem_init(&p->sem_sync, 0);
>>> p->quit = false;
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index c0cdcccb75..b3e864a22b 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>> RAMBlock *block;
>>> int ret;
>>>
>>> + bql_unlock();
>>> + ret = multifd_send_channels_created();
>>> + bql_lock();
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> +
>>> if (compress_threads_save_setup()) {
>>> return -1;
>>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-30 21:32 ` Fabiano Rosas
@ 2024-01-31 4:49 ` Peter Xu
2024-01-31 10:39 ` Avihai Horon
1 sibling, 0 replies; 37+ messages in thread
From: Peter Xu @ 2024-01-31 4:49 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: Avihai Horon, qemu-devel
On Tue, Jan 30, 2024 at 06:32:21PM -0300, Fabiano Rosas wrote:
> Avihai Horon <avihaih@nvidia.com> writes:
>
> > On 29/01/2024 16:34, Fabiano Rosas wrote:
> >>
> >>
> >> Avihai Horon <avihaih@nvidia.com> writes:
> >>
> >>> Currently, multifd channels are created asynchronously without waiting
> >>> for their creation -- migration simply proceeds and may wait in
> >>> multifd_send_sync_main(), which is called by ram_save_setup(). This
> >>> hides some race conditions which can cause unexpected behavior if
> >>> creation of some channels fails.
> >>>
> >>> For example, the following scenario of multifd migration with two
> >>> channels, where the first channel creation fails, will end in a
> >>> segmentation fault (time advances from top to bottom):
> >> Is this reproducible? Or just observable at least.
> >
> > Yes, though I had to engineer it a bit:
> > 1. Run migration with two multifd channels and fail creation of the two
> > channels (e.g., by changing the address they are connecting to).
> > 2. Add sleep(3) in multifd_send_sync_main() before we loop through the
> > channels and check p->quit.
> > 3. Add sleep(5) only for the second multifd channel connect thread so
> > its connection is delayed and runs last.
>
> Ok, well, that's something at least. I'll try to reproduce it so we can
> keep track of it.
>
> >> I acknowledge the situation you describe, but with multifd there's
> >> usually an issue in cleanup paths. Let's make sure we flushed those out
> >> before adding this new semaphore.
> >
> > Indeed, I was not keen on adding yet another semaphore either.
> > I think there are multiple bugs here, some of them overlap and some don't.
> > There is also your and Peter's previous work that I was not aware of to
> > fix those and to clean up the code.
> >
> > Maybe we can take it one step at a time, pushing your series first,
> > cleaning the code and fixing some bugs.
> > Then we can see what bugs are left (if any) and fix them. It might even
> > be easier to fix after the cleanups.
> >
> >> This is similar to an issue Peter was addressing where we missed calling
> >> multifd_send_terminate_threads() in the multifd_channel_connect() path:
> >>
> >> patch 4 in this
> >> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
> >
> > What issue are you referring to here? Can you elaborate?
>
> Oh, I just realised that series doesn't address any particular bug. But
> my point is that including a call to multifd_send_terminate_threads() at
> new_send_channel_cleanup might be all that's needed because that has
> code to cause the channels and the migration thread to end.
It seems so to me.
One other issue is that I hope we can get rid of p->quit before adding more
code that operates on it.
I'll see whether I can respin that series soon this week. I'll consider
dropping the last ones, but pick the initial ones that may already help. I
just noticed patch 2 is already merged with Avihai's similar patch;
obviously I completely forgot about that series.
>
> > The main issue I am trying to fix in my patch is that we don't wait for
> > all multifd channels to be created/error out before tearing down
> > multifd resources in multifd_save_cleanup().
>
> Ok, let me take a step back and ask why this is not solved by
> multifd_save_cleanup() -> qemu_thread_join()? I see you moved
> p->running=true to *after* the thread creation in patch 4. That will
> always leave a gap where p->running == false but the thread is already
> running.
The whole threading in multifd is, IMHO, currently just a mess. We keep
creating threads but never cared about how that goes, or how to sync with
the threads.
I do have a plan to track every TID that migration creates (including the
ones of qio tasks, maybe?), so we can always manage the threads and make
sure all of them are freed when migrate_fd_cleanup() finishes, by join()ing
each of them. I suppose we may also need things like pthread_cancel(),
considering that a thread could get blocked somewhere while the admin
invokes a "migrate-cancel" request.
With any dangling thread around, we always face weird risks: either some
migration code will be scheduled even after migration failed (like this
one), or, worse, that thread only gets scheduled after the 2nd migration has
started following cancellation of the 1st - we need to be prepared to see
some extra threads running, with no idea where they came from, and those
bugs will be hard to debug.
I haven't yet started looking into that; it'd be good if anyone would like
to explore that direction for a full resolution of multifd threading.
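As a sketch of what such tracking could look like (invented names; locking
and the qio task threads are glossed over):

typedef struct MigThread {
    QemuThread thread;
    QLIST_ENTRY(MigThread) node;
} MigThread;

static QLIST_HEAD(, MigThread) mig_threads =
    QLIST_HEAD_INITIALIZER(mig_threads);

/* Wrapper migration code would use instead of bare qemu_thread_create(). */
static void migration_thread_create(const char *name,
                                    void *(*fn)(void *), void *arg)
{
    MigThread *t = g_new0(MigThread, 1);

    qemu_thread_create(&t->thread, name, fn, arg, QEMU_THREAD_JOINABLE);
    QLIST_INSERT_HEAD(&mig_threads, t, node);
}

/* Called at the end of migrate_fd_cleanup(): no migration thread survives. */
static void migration_threads_join_all(void)
{
    MigThread *t, *next;

    QLIST_FOREACH_SAFE(t, &mig_threads, node, next) {
        QLIST_REMOVE(t, node);
        qemu_thread_join(&t->thread);
        g_free(t);
    }
}

pthread_cancel(), or a per-thread cooperative shutdown flag, could then be
layered on top of the same list for migrate-cancel.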
>
> >
> >>> Thread | Code execution
> >>> ------------------------------------------------------------------------
> >>> Multifd 1 |
> >>> | multifd_new_send_channel_async (errors and quits)
> >>> | multifd_new_send_channel_cleanup
> >>> |
> >>> Migration thread |
> >>> | qemu_savevm_state_setup
> >>> | ram_save_setup
> >>> | multifd_send_sync_main
> >>> | (detects Multifd 1 error and quits)
> >>> | [...]
> >>> | migration_iteration_finish
> >>> | migrate_fd_cleanup_schedule
> >>> |
> >>> Main thread |
> >>> | migrate_fd_cleanup
> >>> | multifd_save_cleanup (destroys Multifd 2 resources)
> >>> |
> >>> Multifd 2 |
> >>> | multifd_new_send_channel_async
> >>> | (accesses destroyed resources, segfault)
> >>>
> >>> In another scenario, migration can hang indefinitely:
> >>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
> >>> the semaphores.
> >>> 2. Then, all multifd channels creation fails, so they post the
> >>> semaphores and quit.
> >>> 3. Main migration channel will not identify the error, proceed to send
> >>> pages and will hang.
> >>>
> >>> Fix it by waiting for all multifd channels to be created before
> >>> proceeding with migration.
> >>>
> >>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> >>> ---
> >>> migration/multifd.h | 3 +++
> >>> migration/migration.c | 1 +
> >>> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
> >>> migration/ram.c | 7 +++++++
> >>> 4 files changed, 42 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/migration/multifd.h b/migration/multifd.h
> >>> index 35d11f103c..87a64e0a87 100644
> >>> --- a/migration/multifd.h
> >>> +++ b/migration/multifd.h
> >>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
> >>> void multifd_recv_sync_main(void);
> >>> int multifd_send_sync_main(void);
> >>> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
> >>> +int multifd_send_channels_created(void);
> >>>
> >>> /* Multifd Compression flags */
> >>> #define MULTIFD_FLAG_SYNC (1 << 0)
> >>> @@ -86,6 +87,8 @@ typedef struct {
> >>> /* multifd flags for sending ram */
> >>> int write_flags;
> >>>
> >>> + /* Syncs channel creation and migration thread */
> >>> + QemuSemaphore create_sem;
> >>> /* sem where to wait for more work */
> >>> QemuSemaphore sem;
> >>> /* syncs main thread and channels */
> >>> diff --git a/migration/migration.c b/migration/migration.c
> >>> index 9c769a1ecd..d81d96eaa5 100644
> >>> --- a/migration/migration.c
> >>> +++ b/migration/migration.c
> >>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> >>> error_report_err(local_err);
> >>> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> >>> MIGRATION_STATUS_FAILED);
> >>> + multifd_send_channels_created();
> >>> migrate_fd_cleanup(s);
> >>> return;
> >>> }
> >>> diff --git a/migration/multifd.c b/migration/multifd.c
> >>> index 564e911b6c..f0c216f4f9 100644
> >>> --- a/migration/multifd.c
> >>> +++ b/migration/multifd.c
> >>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
> >>> multifd_send_channel_destroy(p->c);
> >>> p->c = NULL;
> >>> qemu_mutex_destroy(&p->mutex);
> >>> + qemu_sem_destroy(&p->create_sem);
> >>> qemu_sem_destroy(&p->sem);
> >>> qemu_sem_destroy(&p->sem_sync);
> >>> g_free(p->name);
> >>> @@ -766,6 +767,29 @@ out:
> >>> return NULL;
> >>> }
> >>>
> >>> +int multifd_send_channels_created(void)
> >>> +{
> >>> + int ret = 0;
> >>> +
> >>> + if (!migrate_multifd()) {
> >>> + return 0;
> >>> + }
> >>> +
> >>> + for (int i = 0; i < migrate_multifd_channels(); i++) {
> >>> + MultiFDSendParams *p = &multifd_send_state->params[i];
> >>> +
> >>> + qemu_sem_wait(&p->create_sem);
> >>> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
> >>> + if (p->quit) {
> >>> + error_report("%s: channel %d has already quit", __func__, i);
> >>> + ret = -1;
> >>> + }
> >>> + }
> >> There are races here when a channel fails at
> >> multifd_send_initial_packet(). If p->quit can be set after post to
> >> create_sem, then this function could always return true and we'd run
> >> into a broken channel. Possibly even the same bug you're trying to fix.
> >>
> >> I think that's one of the reasons we have channels_ready.
> >
> > I am not sure exactly what bug you are describing here, can you elaborate?
> >
>
> This looks like it could be a preexisting issue actually, but in the
> current code, what stops the multifd channel from running ahead is
> p->sem. Except that the channel does some work at
> multifd_send_initial_packet() before checking p->sem and that work could
> fail.
>
> This means that right after checking for p->quit above, the multifd
> thread could already be exiting due to an error and
> multifd_send_channels_created() would miss it. So there's still a chance
> that this function effectively behaves just like the previous code.
>
> Thread | Code execution
> ------------------------------------------------------------------------
> Migration |
> | ram_save_setup()
> | multifd_send_channels_created()
> | qemu_sem_wait(&p->create_sem);
> Main |
> | multifd_channel_connect()
> | qemu_thread_create()
> | qemu_sem_post(&p->create_sem)
> Multifd 1 |
> | multifd_send_initial_packet() *errors*
> | goto out
> | multifd_send_terminate_threads()
> Migration |
> | still at multifd_send_channels_created
> | qemu_mutex_lock(&p->mutex);
> | p->quit == false <--- !!!
> | qemu_mutex_unlock(&p->mutex);
> | return true from multifd_send_channels_created()
>
> From here onwards, it's the same as not having checked
> multifd_send_channels_created() at all. One way this could go is:
>
> | runs unimpeded until multifd_send_sync_main()
> | multifd_send_pages()
> | *misses the exiting flag*
> | qemu_sem_wait(&multifd_send_state->channels_ready);
> Multifd 1 |
> | still at multifd_send_terminate_threads
> | qemu_mutex_lock(&p->mutex);
> | p->quit = true;
> | qemu_mutex_unlock(&p->mutex);
> | qemu_sem_post(&p->sem_sync);
> | qemu_sem_post(&multifd_send_state->channels_ready);
> Migration |
> | qemu_mutex_lock(&p->mutex);
> | p->quit == true; <--- correct now
> | qemu_mutex_unlock(&p->mutex);
> | return -1;
> | *all good again*
>
> It seems that in order to avoid this kind of race we'd need the
> synchronization point to be at the multifd thread instead. But then,
> that's what channels_ready does.
>
> >>
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> static bool multifd_channel_connect(MultiFDSendParams *p,
> >>> QIOChannel *ioc,
> >>> Error **errp);
> >>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
> >>> p->quit = true;
> >>> qemu_sem_post(&multifd_send_state->channels_ready);
> >>> qemu_sem_post(&p->sem_sync);
> >>> + qemu_sem_post(&p->create_sem);
> >>> error_free(err);
> >>> }
> >>>
> >>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
> >>> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
> >>> QEMU_THREAD_JOINABLE);
> >>> p->running = true;
> >>> + qemu_sem_post(&p->create_sem);
> >>> return true;
> >>> }
> >>>
> >>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
> >>> QIOChannel *ioc, Error *err)
> >>> {
> >>> migrate_set_error(migrate_get_current(), err);
> >>> - /* Error happen, we need to tell who pay attention to me */
> >>> - qemu_sem_post(&multifd_send_state->channels_ready);
> >>> - qemu_sem_post(&p->sem_sync);
> >>> /*
> >>> + * Error happen, we need to tell who pay attention to me.
> >>> * Although multifd_send_thread is not created, but main migration
> >>> * thread need to judge whether it is running, so we need to mark
> >>> * its status.
> >>> */
> >>> p->quit = true;
> >>> + qemu_sem_post(&multifd_send_state->channels_ready);
> >>> + qemu_sem_post(&p->sem_sync);
> >>> + qemu_sem_post(&p->create_sem);
> >> Do we still need channels_ready and sem_sync here? The migration thread
> >> shouldn't have gone past create_sem at this point.
> >
> > I think you are right, we can drop channels_ready and sem_sync here.
> >
> >>
> >>> object_unref(OBJECT(ioc));
> >>> error_free(err);
> >>> }
> >>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
> >>> MultiFDSendParams *p = &multifd_send_state->params[i];
> >>>
> >>> qemu_mutex_init(&p->mutex);
> >>> + qemu_sem_init(&p->create_sem, 0);
> >>> qemu_sem_init(&p->sem, 0);
> >>> qemu_sem_init(&p->sem_sync, 0);
> >>> p->quit = false;
> >>> diff --git a/migration/ram.c b/migration/ram.c
> >>> index c0cdcccb75..b3e864a22b 100644
> >>> --- a/migration/ram.c
> >>> +++ b/migration/ram.c
> >>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >>> RAMBlock *block;
> >>> int ret;
> >>>
> >>> + bql_unlock();
> >>> + ret = multifd_send_channels_created();
> >>> + bql_lock();
> >>> + if (ret < 0) {
> >>> + return ret;
> >>> + }
> >>> +
> >>> if (compress_threads_save_setup()) {
> >>> return -1;
> >>> }
>
--
Peter Xu
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
2024-01-30 21:32 ` Fabiano Rosas
2024-01-31 4:49 ` Peter Xu
@ 2024-01-31 10:39 ` Avihai Horon
1 sibling, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-31 10:39 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel; +Cc: Peter Xu
On 30/01/2024 23:32, Fabiano Rosas wrote:
>
>
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> On 29/01/2024 16:34, Fabiano Rosas wrote:
>>>
>>>
>>> Avihai Horon <avihaih@nvidia.com> writes:
>>>
>>>> Currently, multifd channels are created asynchronously without waiting
>>>> for their creation -- migration simply proceeds and may wait in
>>>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>>>> hides in it some race conditions which can cause an unexpected behavior
>>>> if some channels creation fail.
>>>>
>>>> For example, the following scenario of multifd migration with two
>>>> channels, where the first channel creation fails, will end in a
>>>> segmentation fault (time advances from top to bottom):
>>> Is this reproducible? Or just observable at least.
>> Yes, though I had to engineer it a bit:
>> 1. Run migration with two multifd channels and fail creation of the two
>> channels (e.g., by changing the address they are connecting to).
>> 2. Add sleep(3) in multifd_send_sync_main() before we loop through the
>> channels and check p->quit.
>> 3. Add sleep(5) only for the second multifd channel connect thread so
>> its connection is delayed and runs last.
> Ok, well, that's something at least. I'll try to reproduce it so we can
> keep track of it.
>
>>> I acknowledge the situation you describe, but with multifd there's
>>> usually an issue in cleanup paths. Let's make sure we flushed those out
>>> before adding this new semaphore.
>> Indeed, I was not keen on adding yet another semaphore either.
>> I think there are multiple bugs here, some of them overlap and some don't.
>> There is also your and Peter's previous work that I was not aware of to
>> fix those and to clean up the code.
>>
>> Maybe we can take it one step at a time, pushing your series first,
>> cleaning the code and fixing some bugs.
>> Then we can see what bugs are left (if any) and fix them. It might even
>> be easier to fix after the cleanups.
>>
>>> This is similar to an issue Peter was addressing where we missed calling
>>> multifd_send_termiante_threads() in the multifd_channel_connect() path:
>>>
>>> patch 4 in this
>>> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
>> What issue are you referring here? Can you elaborate?
> Oh, I just realised that series doesn't address any particular bug. But
> my point is that including a call to multifd_send_terminate_threads() at
> new_send_channel_cleanup might be all that's needed because that has
> code to cause the channels and the migration thread to end.
Yes, that's good and it solves the other bug I see where migration hangs.
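Just to make sure we are talking about the same change, a rough sketch
(untested) on top of the current code:
@@ multifd_new_send_channel_cleanup()
     migrate_set_error(migrate_get_current(), err);
+    /* Kick all channels and the migration thread so nobody waits forever */
+    multifd_send_terminate_threads(err);
     p->quit = true;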
>> The main issue I am trying to fix in my patch is that we don't wait for
>> all multifd channels to be created/error out before tearing down
>> multifd resources in mulitfd_save_cleanup().
> Ok, let me take a step back and ask why is this not solved by
> multifd_save_cleanup() -> qemu_thread_join()?
Because when we get to multifd_save_cleanup() there is no guarantee that
the other channels have either been successfully established or errored out.
So multifd_1 can error out and trigger multifd_save_cleanup(), which
destroys all resources, and only then does multifd_2 get to
multifd_new_send_channel_async(), but by that time the resources have
already been destroyed.
In this case, qemu_thread_join() for multifd_2 is simply "skipped".
Actually, while testing Peter's patch I encountered such an error with
less "engineering": I only caused the multifd threads to fail by changing
their connect address.
Maybe the error flow now takes more time, so this bug is more observable.
It doesn't reproduce consistently, though.
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555bbc6c2 in multifd_send_terminate_threads
(err=0x7ffea8001310) at ../migration/multifd.c:485
485 if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
(gdb) bt
#0 0x0000555555bbc6c2 in multifd_send_terminate_threads
(err=0x7ffea8001310) at ../migration/multifd.c:485
#1 0x0000555555bbd6d0 in multifd_new_send_channel_async
(task=0x555557f19730, opaque=0x5555571e0a38) at ../migration/multifd.c:862
#2 0x0000555555e6f010 in qio_task_complete (task=0x555557f19730) at
../io/task.c:197
#3 0x0000555555e6eca3 in qio_task_thread_result (opaque=0x555557f19730)
at ../io/task.c:112
#4 0x00007ffff780145b in g_idle_dispatch () at /lib64/libglib-2.0.so.0
#5 0x00007ffff780578f in g_main_context_dispatch () at
/lib64/libglib-2.0.so.0
#6 0x0000555556138d30 in glib_pollfds_poll () at ../util/main-loop.c:287
#7 0x0000555556138daa in os_host_main_loop_wait (timeout=0) at
../util/main-loop.c:310
#8 0x0000555556138eaf in main_loop_wait (nonblocking=0) at
../util/main-loop.c:589
#9 0x0000555555b88129 in qemu_main_loop () at ../system/runstate.c:783
#10 0x0000555555e48e97 in qemu_default_main () at ../system/main.c:37
#11 0x0000555555e48ed2 in main (argc=20, argv=0x7fffffffe218) at
../system/main.c:48
(gdb) print multifd_send_state
$1 = (struct {...} *) 0x0
> I see you moved
> p->running=true to *after* the thread creation in patch 4. That will
> always leave a gap where p->running == false but the thread is already
> running.
Yes, Patch #4 addressed a specific bug with TLS where we can get a segfault.
It didn't try to solve this gap.
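To spell the gap out (roughly, from memory of the current code paths):
    /* multifd_channel_connect() */
    qemu_thread_create(&p->thread, ...);  /* thread may already be running */
    p->running = true;
    /* multifd_save_cleanup() */
    if (p->running) {                     /* still false during the gap */
        qemu_thread_join(&p->thread);     /* so the join can be skipped */
    }
So a cleanup that races with channel creation can miss a live thread.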
>
>>>> Thread | Code execution
>>>> ------------------------------------------------------------------------
>>>> Multifd 1 |
>>>> | multifd_new_send_channel_async (errors and quits)
>>>> | multifd_new_send_channel_cleanup
>>>> |
>>>> Migration thread |
>>>> | qemu_savevm_state_setup
>>>> | ram_save_setup
>>>> | multifd_send_sync_main
>>>> | (detects Multifd 1 error and quits)
>>>> | [...]
>>>> | migration_iteration_finish
>>>> | migrate_fd_cleanup_schedule
>>>> |
>>>> Main thread |
>>>> | migrate_fd_cleanup
>>>> | multifd_save_cleanup (destroys Multifd 2 resources)
>>>> |
>>>> Multifd 2 |
>>>> | multifd_new_send_channel_async
>>>> | (accesses destroyed resources, segfault)
>>>>
>>>> In another scenario, migration can hang indefinitely:
>>>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>>>> the semaphores.
>>>> 2. Then, all multifd channels creation fails, so they post the
>>>> semaphores and quit.
>>>> 3. Main migration channel will not identify the error, proceed to send
>>>> pages and will hang.
>>>>
>>>> Fix it by waiting for all multifd channels to be created before
>>>> proceeding with migration.
>>>>
>>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>>> ---
>>>> migration/multifd.h | 3 +++
>>>> migration/migration.c | 1 +
>>>> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
>>>> migration/ram.c | 7 +++++++
>>>> 4 files changed, 42 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>>> index 35d11f103c..87a64e0a87 100644
>>>> --- a/migration/multifd.h
>>>> +++ b/migration/multifd.h
>>>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>>> void multifd_recv_sync_main(void);
>>>> int multifd_send_sync_main(void);
>>>> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>>>> +int multifd_send_channels_created(void);
>>>>
>>>> /* Multifd Compression flags */
>>>> #define MULTIFD_FLAG_SYNC (1 << 0)
>>>> @@ -86,6 +87,8 @@ typedef struct {
>>>> /* multifd flags for sending ram */
>>>> int write_flags;
>>>>
>>>> + /* Syncs channel creation and migration thread */
>>>> + QemuSemaphore create_sem;
>>>> /* sem where to wait for more work */
>>>> QemuSemaphore sem;
>>>> /* syncs main thread and channels */
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 9c769a1ecd..d81d96eaa5 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>> error_report_err(local_err);
>>>> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>> MIGRATION_STATUS_FAILED);
>>>> + multifd_send_channels_created();
>>>> migrate_fd_cleanup(s);
>>>> return;
>>>> }
>>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>>> index 564e911b6c..f0c216f4f9 100644
>>>> --- a/migration/multifd.c
>>>> +++ b/migration/multifd.c
>>>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>>>> multifd_send_channel_destroy(p->c);
>>>> p->c = NULL;
>>>> qemu_mutex_destroy(&p->mutex);
>>>> + qemu_sem_destroy(&p->create_sem);
>>>> qemu_sem_destroy(&p->sem);
>>>> qemu_sem_destroy(&p->sem_sync);
>>>> g_free(p->name);
>>>> @@ -766,6 +767,29 @@ out:
>>>> return NULL;
>>>> }
>>>>
>>>> +int multifd_send_channels_created(void)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + if (!migrate_multifd()) {
>>>> + return 0;
>>>> + }
>>>> +
>>>> + for (int i = 0; i < migrate_multifd_channels(); i++) {
>>>> + MultiFDSendParams *p = &multifd_send_state->params[i];
>>>> +
>>>> + qemu_sem_wait(&p->create_sem);
>>>> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
>>>> + if (p->quit) {
>>>> + error_report("%s: channel %d has already quit", __func__, i);
>>>> + ret = -1;
>>>> + }
>>>> + }
>>> There are races here when a channel fails at
>>> multifd_send_initial_packet(). If p->quit can be set after post to
>>> create_sem, then this function could always return true and we'd run
>>> into a broken channel. Possibly even the same bug you're trying to fix.
>>>
>>> I think that's one of the reasons we have channels_ready.
>> I am not sure exactly what bug you are describing here, can you elaborate?
>>
> This looks like it could be a preexisting issue actually, but in the
> current code, what stops the multifd channel from running ahead is
> p->sem. Except that the channel does some work at
> multifd_send_initial_packet() before checking p->sem and that work could
> fail.
>
> This means that right after checking for p->quit above, the multifd
> thread could already be exiting due to an error and
> multifd_send_channels_created() would miss it.
And it's fine, because the purpose of multifd_send_channels_created() is
only to make sure that when it returns, all multifd channels have either
been established or errored out (so we don't have leftover threads
running after multifd_save_cleanup()).
Maybe adding this p->quit check is confusing and redundant.
Thinking about it again, maybe the location of
multifd_send_channels_created() inside ram_save_setup() is not the best
-- if a fast migrate_cancel happens or some other .save_setup() handler
errors before ram_save_setup(), multifd_send_channels_created() will be
skipped and we could end up in the same situation.
Putting multifd_send_channels_created() at the beginning of
multifd_save_cleanup() will also cover the above cases.
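I.e., something like this (sketch only, rest of the function unchanged):
    void multifd_save_cleanup(void)
    {
        /*
         * Make sure every channel has either been established or errored
         * out, so no leftover thread can touch the resources freed below.
         */
        multifd_send_channels_created();
        ...
    }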
> So there's still a chance
> that this function effectively behaves just like the previous code.
>
> Thread | Code execution
> ------------------------------------------------------------------------
> Migration |
> | ram_save_setup()
> | multifd_send_channels_created()
> | qemu_sem_wait(&p->create_sem);
> Main |
> | multifd_channel_connect()
> | qemu_thread_create()
> | qemu_sem_post(&p->create_sem)
> Multifd 1 |
> | multifd_send_initial_packet() *errors*
> | goto out
> | multifd_send_terminate_threads()
> Migration |
> | still at multifd_send_channels_created
> | qemu_mutex_lock(&p->mutex);
> | p->quit == false <--- !!!
> | qemu_mutex_unlock(&p->mutex);
> | return true from multifd_send_channels_created()
>
> From here onwards, it's the same as not having checked
> multifd_send_channels_created() at all. One way this could go is:
>
> | runs unimpeded until multifd_send_sync_main()
> | multifd_send_pages()
> | *misses the exiting flag*
> | qemu_sem_wait(&multifd_send_state->channels_ready);
> Multifd 1 |
> | still at multifd_send_terminate_threads
> | qemu_mutex_lock(&p->mutex);
> | p->quit = true;
> | qemu_mutex_unlock(&p->mutex);
> | qemu_sem_post(&p->sem_sync);
> | qemu_sem_post(&multifd_send_state->channels_ready);
> Migration |
> | qemu_mutex_lock(&p->mutex);
> | p->quit == true; <--- correct now
> | qemu_mutex_unlock(&p->mutex);
> | return -1;
> | *all good again*
>
> It seems that in order to avoid this kind of race we'd need the
> synchronization point to be at the multifd thread instead. But then,
> that's what channels_ready does.
>
>>>> + }
>>>> +
>>>> + return ret;
>>>> +}
>>>> +
>>>> static bool multifd_channel_connect(MultiFDSendParams *p,
>>>> QIOChannel *ioc,
>>>> Error **errp);
>>>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>>>> p->quit = true;
>>>> qemu_sem_post(&multifd_send_state->channels_ready);
>>>> qemu_sem_post(&p->sem_sync);
>>>> + qemu_sem_post(&p->create_sem);
>>>> error_free(err);
>>>> }
>>>>
>>>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>>> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>>> QEMU_THREAD_JOINABLE);
>>>> p->running = true;
>>>> + qemu_sem_post(&p->create_sem);
>>>> return true;
>>>> }
>>>>
>>>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>>> QIOChannel *ioc, Error *err)
>>>> {
>>>> migrate_set_error(migrate_get_current(), err);
>>>> - /* Error happen, we need to tell who pay attention to me */
>>>> - qemu_sem_post(&multifd_send_state->channels_ready);
>>>> - qemu_sem_post(&p->sem_sync);
>>>> /*
>>>> + * Error happen, we need to tell who pay attention to me.
>>>> * Although multifd_send_thread is not created, but main migration
>>>> * thread need to judge whether it is running, so we need to mark
>>>> * its status.
>>>> */
>>>> p->quit = true;
>>>> + qemu_sem_post(&multifd_send_state->channels_ready);
>>>> + qemu_sem_post(&p->sem_sync);
>>>> + qemu_sem_post(&p->create_sem);
>>> Do we still need channels_ready and sem_sync here? The migration thread
>>> shouldn't have gone past create_sem at this point.
>> I think you are right, we can drop channels_ready and sem_sync here.
>>
>>>> object_unref(OBJECT(ioc));
>>>> error_free(err);
>>>> }
>>>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
>>>> MultiFDSendParams *p = &multifd_send_state->params[i];
>>>>
>>>> qemu_mutex_init(&p->mutex);
>>>> + qemu_sem_init(&p->create_sem, 0);
>>>> qemu_sem_init(&p->sem, 0);
>>>> qemu_sem_init(&p->sem_sync, 0);
>>>> p->quit = false;
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index c0cdcccb75..b3e864a22b 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>> RAMBlock *block;
>>>> int ret;
>>>>
>>>> + bql_unlock();
>>>> + ret = multifd_send_channels_created();
>>>> + bql_lock();
>>>> + if (ret < 0) {
>>>> + return ret;
>>>> + }
>>>> +
>>>> if (compress_threads_save_setup()) {
>>>> return -1;
>>>> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 06/17] migration/tls: Rename main migration channel TLS functions
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (4 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 07/17] migration/tls: Add new migration channel TLS upgrade API Avihai Horon
` (11 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Rename the functions related to the main migration channel TLS upgrade.
This is done in preparation for the next patch, which will add a new
migration channel TLS upgrade API that reuses the original function name.
No functional changes intended.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/tls.h | 6 ++----
migration/channel.c | 2 +-
migration/tls.c | 24 ++++++++++--------------
migration/trace-events | 6 +++---
4 files changed, 16 insertions(+), 22 deletions(-)
diff --git a/migration/tls.h b/migration/tls.h
index 5797d153cb..5435dd4867 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -32,10 +32,8 @@ QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
const char *hostname,
Error **errp);
-void migration_tls_channel_connect(MigrationState *s,
- QIOChannel *ioc,
- const char *hostname,
- Error **errp);
+void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
+ const char *hostname, Error **errp);
/* Whether the QIO channel requires further TLS handshake? */
bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
diff --git a/migration/channel.c b/migration/channel.c
index f9de064f3b..041a63eb21 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -69,7 +69,7 @@ void migration_channel_connect(MigrationState *s,
if (!error) {
if (migrate_channel_requires_tls_upgrade(ioc)) {
- migration_tls_channel_connect(s, ioc, hostname, &error);
+ migration_tls_channel_connect_main(s, ioc, hostname, &error);
if (!error) {
/* tls_channel_connect will call back to this
diff --git a/migration/tls.c b/migration/tls.c
index fa03d9136c..803cb54c8b 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -98,17 +98,18 @@ void migration_tls_channel_process_incoming(MigrationState *s,
}
-static void migration_tls_outgoing_handshake(QIOTask *task,
- gpointer opaque)
+static void migration_tls_outgoing_handshake_main(QIOTask *task,
+ gpointer opaque)
{
MigrationState *s = opaque;
QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
Error *err = NULL;
if (qio_task_propagate_error(task, &err)) {
- trace_migration_tls_outgoing_handshake_error(error_get_pretty(err));
+ trace_migration_tls_outgoing_handshake_main_error(
+ error_get_pretty(err));
} else {
- trace_migration_tls_outgoing_handshake_complete();
+ trace_migration_tls_outgoing_handshake_main_complete();
}
migration_channel_connect(s, ioc, NULL, err);
object_unref(OBJECT(ioc));
@@ -133,10 +134,8 @@ QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
return qio_channel_tls_new_client(ioc, creds, hostname, errp);
}
-void migration_tls_channel_connect(MigrationState *s,
- QIOChannel *ioc,
- const char *hostname,
- Error **errp)
+void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
+ const char *hostname, Error **errp)
{
QIOChannelTLS *tioc;
@@ -147,13 +146,10 @@ void migration_tls_channel_connect(MigrationState *s,
/* Save hostname into MigrationState for handshake */
s->hostname = g_strdup(hostname);
- trace_migration_tls_outgoing_handshake_start(hostname);
+ trace_migration_tls_outgoing_handshake_main_start(hostname);
qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-outgoing");
- qio_channel_tls_handshake(tioc,
- migration_tls_outgoing_handshake,
- s,
- NULL,
- NULL);
+ qio_channel_tls_handshake(tioc, migration_tls_outgoing_handshake_main, s,
+ NULL, NULL);
}
bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
diff --git a/migration/trace-events b/migration/trace-events
index de4a743c8a..9448b5cedf 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -325,9 +325,9 @@ migration_socket_outgoing_connected(const char *hostname) "hostname=%s"
migration_socket_outgoing_error(const char *err) "error=%s"
# tls.c
-migration_tls_outgoing_handshake_start(const char *hostname) "hostname=%s"
-migration_tls_outgoing_handshake_error(const char *err) "err=%s"
-migration_tls_outgoing_handshake_complete(void) ""
+migration_tls_outgoing_handshake_main_start(const char *hostname) "hostname=%s"
+migration_tls_outgoing_handshake_main_error(const char *err) "err=%s"
+migration_tls_outgoing_handshake_main_complete(void) ""
migration_tls_incoming_handshake_start(void) ""
migration_tls_incoming_handshake_error(const char *err) "err=%s"
migration_tls_incoming_handshake_complete(void) ""
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 07/17] migration/tls: Add new migration channel TLS upgrade API
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (5 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 06/17] migration/tls: Rename main migration channel TLS functions Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 08/17] migration: Use the new TLS upgrade API for main channel Avihai Horon
` (10 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
The main migration channel, the multifd channels and the postcopy preempt
channel use the QIOChannelTLS API to upgrade their channels to TLS when
needed. Each of them has its own code to create a QIOChannelTLS and to
perform the TLS handshake, and some of this code is duplicated.
Add a new API to TLS-upgrade migration channels. This will make the code
clearer and avoid duplicating the TLS handshake, trace handling and
threading code.
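For example, a caller of the new API would look roughly like this
(illustrative sketch; the real callers are converted in the following
patches):
    static void foo_handshake_done(QIOChannel *ioc, void *opaque, Error *err)
    {
        /* err != NULL means the TLS handshake failed */
    }
    ...
    if (!migration_tls_channel_connect(ioc, "foo", s->hostname,
                                       foo_handshake_done, opaque,
                                       false, errp)) {
        /* creating the TLS channel failed and errp is set */
    }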
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/tls.h | 27 ++++++++++++++++
migration/tls.c | 72 ++++++++++++++++++++++++++++++++++++++++++
migration/trace-events | 3 ++
3 files changed, 102 insertions(+)
diff --git a/migration/tls.h b/migration/tls.h
index 5435dd4867..514529ff38 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -35,6 +35,33 @@ QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
const char *hostname, Error **errp);
+typedef void (*MigTLSConCallback)(QIOChannel *ioc, void *opaque, Error *err);
+
+/**
+ * migration_tls_channel_connect:
+ * @ioc: The underlying channel object
+ * @name: The name of the channel
+ * @hostname: The user specified server hostname
+ * @callback: The callback to invoke when completed
+ * @opaque: Opaque data to pass to @callback
+ * @run_in_thread: Whether to run TLS handshake in new thread or not
+ * @errp: Pointer to a NULL-initialized error object pointer
+ *
+ * Establishes a TLS connection on top of the provided QIOChannel @ioc. If this
+ * function succeeds, @callback will be invoked upon completion and
+ * success/failure will be reported to it via the Error object argument.
+ * In case multiple channels are TLS upgraded in parallel, @run_in_thread
+ * should be set to true so the TLS handshake will be performed in a new
+ * thread, to avoid a potential risk of migration hang.
+ *
+ * Returns: True on successful initiation of TLS upgrade process, or false on
+ * failure.
+ */
+bool migration_tls_channel_connect(QIOChannel *ioc, const char *name,
+ const char *hostname,
+ MigTLSConCallback callback, void *opaque,
+ bool run_in_thread, Error **errp);
+
/* Whether the QIO channel requires further TLS handshake? */
bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
diff --git a/migration/tls.c b/migration/tls.c
index 803cb54c8b..e6a0349bd1 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -152,6 +152,78 @@ void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
NULL, NULL);
}
+typedef struct {
+ QIOChannelTLS *tioc;
+ MigTLSConCallback callback;
+ void *opaque;
+ char *name;
+ QemuThread thread;
+} MigTLSConData;
+
+static void migration_tls_outgoing_handshake(QIOTask *task, void *opaque)
+{
+ QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+ MigTLSConData *data = opaque;
+ Error *err = NULL;
+
+ if (qio_task_propagate_error(task, &err)) {
+ trace_migration_tls_outgoing_handshake_error(data->name,
+ error_get_pretty(err));
+ } else {
+ trace_migration_tls_outgoing_handshake_complete(data->name);
+ }
+
+ data->callback(ioc, data->opaque, err);
+ g_free(data->name);
+ g_free(data);
+}
+
+static void *migration_tls_channel_connect_thread(void *opaque)
+{
+ MigTLSConData *data = opaque;
+
+ qio_channel_tls_handshake(data->tioc, migration_tls_outgoing_handshake,
+ data, NULL, NULL);
+ return NULL;
+}
+
+bool migration_tls_channel_connect(QIOChannel *ioc, const char *name,
+ const char *hostname,
+ MigTLSConCallback callback, void *opaque,
+ bool run_in_thread, Error **errp)
+{
+ QIOChannelTLS *tioc;
+ MigTLSConData *data;
+ g_autofree char *channel_name = NULL;
+ g_autofree char *thread_name = NULL;
+
+ tioc = migration_tls_client_create(ioc, hostname, errp);
+ if (!tioc) {
+ return false;
+ }
+
+ data = g_new0(MigTLSConData, 1);
+ data->tioc = tioc;
+ data->callback = callback;
+ data->opaque = opaque;
+ data->name = g_strdup(name);
+
+ trace_migration_tls_outgoing_handshake_start(hostname, name);
+ channel_name = g_strdup_printf("migration-tls-outgoing-%s", name);
+ qio_channel_set_name(QIO_CHANNEL(tioc), channel_name);
+ if (!run_in_thread) {
+ qio_channel_tls_handshake(tioc, migration_tls_outgoing_handshake, data,
+ NULL, NULL);
+ return true;
+ }
+
+ thread_name = g_strdup_printf("migration-tls-outgoing-worker-%s", name);
+ qemu_thread_create(&data->thread, thread_name,
+ migration_tls_channel_connect_thread, data,
+ QEMU_THREAD_JOINABLE);
+ return true;
+}
+
bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
{
if (!migrate_tls()) {
diff --git a/migration/trace-events b/migration/trace-events
index 9448b5cedf..09dd342d37 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -328,6 +328,9 @@ migration_socket_outgoing_error(const char *err) "error=%s"
migration_tls_outgoing_handshake_main_start(const char *hostname) "hostname=%s"
migration_tls_outgoing_handshake_main_error(const char *err) "err=%s"
migration_tls_outgoing_handshake_main_complete(void) ""
+migration_tls_outgoing_handshake_start(const char *hostname, const char *name) "hostname=%s, name=%s"
+migration_tls_outgoing_handshake_error(const char *name, const char *err) "name=%s, err=%s"
+migration_tls_outgoing_handshake_complete(const char *name) "name=%s"
migration_tls_incoming_handshake_start(void) ""
migration_tls_incoming_handshake_error(const char *err) "err=%s"
migration_tls_incoming_handshake_complete(void) ""
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 08/17] migration: Use the new TLS upgrade API for main channel
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (6 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 07/17] migration/tls: Add new migration channel TLS upgrade API Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 09/17] migration/multifd: Use the new TLS upgrade API for multifd channels Avihai Horon
` (9 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Use the new TLS upgrade API for the main migration channel and remove the
old TLS code.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/tls.h | 3 ---
migration/channel.c | 24 +++++++++++++++++-------
migration/tls.c | 36 ------------------------------------
migration/trace-events | 3 ---
4 files changed, 17 insertions(+), 49 deletions(-)
diff --git a/migration/tls.h b/migration/tls.h
index 514529ff38..a6babbfa14 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -32,9 +32,6 @@ QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
const char *hostname,
Error **errp);
-void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
- const char *hostname, Error **errp);
-
typedef void (*MigTLSConCallback)(QIOChannel *ioc, void *opaque, Error *err);
/**
diff --git a/migration/channel.c b/migration/channel.c
index 041a63eb21..4022b2c9b8 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -50,6 +50,14 @@ void migration_channel_process_incoming(QIOChannel *ioc)
}
}
+static void migration_channel_tls_handshake_main(QIOChannel *ioc, void *opaque,
+ Error *err)
+{
+ MigrationState *s = opaque;
+
+ migration_channel_connect(s, ioc, NULL, err);
+ object_unref(OBJECT(ioc));
+}
/**
* @migration_channel_connect - Create new outgoing migration channel
@@ -69,14 +77,16 @@ void migration_channel_connect(MigrationState *s,
if (!error) {
if (migrate_channel_requires_tls_upgrade(ioc)) {
- migration_tls_channel_connect_main(s, ioc, hostname, &error);
-
- if (!error) {
- /* tls_channel_connect will call back to this
- * function after the TLS handshake,
- * so we mustn't call migrate_fd_connect until then
+ /* Save hostname into MigrationState for handshake */
+ s->hostname = g_strdup(hostname);
+ if (migration_tls_channel_connect(
+ ioc, "main", hostname, migration_channel_tls_handshake_main,
+ s, false, &error)) {
+ /*
+ * migration_channel_tls_handshake_main will call back to this
+ * function after the TLS handshake, so we mustn't call
+ * migrate_fd_connect until then.
*/
-
return;
}
} else {
diff --git a/migration/tls.c b/migration/tls.c
index e6a0349bd1..99c71e4fb6 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -97,24 +97,6 @@ void migration_tls_channel_process_incoming(MigrationState *s,
NULL);
}
-
-static void migration_tls_outgoing_handshake_main(QIOTask *task,
- gpointer opaque)
-{
- MigrationState *s = opaque;
- QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
- Error *err = NULL;
-
- if (qio_task_propagate_error(task, &err)) {
- trace_migration_tls_outgoing_handshake_main_error(
- error_get_pretty(err));
- } else {
- trace_migration_tls_outgoing_handshake_main_complete();
- }
- migration_channel_connect(s, ioc, NULL, err);
- object_unref(OBJECT(ioc));
-}
-
QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
const char *hostname,
Error **errp)
@@ -134,24 +116,6 @@ QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
return qio_channel_tls_new_client(ioc, creds, hostname, errp);
}
-void migration_tls_channel_connect_main(MigrationState *s, QIOChannel *ioc,
- const char *hostname, Error **errp)
-{
- QIOChannelTLS *tioc;
-
- tioc = migration_tls_client_create(ioc, hostname, errp);
- if (!tioc) {
- return;
- }
-
- /* Save hostname into MigrationState for handshake */
- s->hostname = g_strdup(hostname);
- trace_migration_tls_outgoing_handshake_main_start(hostname);
- qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-outgoing");
- qio_channel_tls_handshake(tioc, migration_tls_outgoing_handshake_main, s,
- NULL, NULL);
-}
-
typedef struct {
QIOChannelTLS *tioc;
MigTLSConCallback callback;
diff --git a/migration/trace-events b/migration/trace-events
index 09dd342d37..80c3c20faa 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -325,9 +325,6 @@ migration_socket_outgoing_connected(const char *hostname) "hostname=%s"
migration_socket_outgoing_error(const char *err) "error=%s"
# tls.c
-migration_tls_outgoing_handshake_main_start(const char *hostname) "hostname=%s"
-migration_tls_outgoing_handshake_main_error(const char *err) "err=%s"
-migration_tls_outgoing_handshake_main_complete(void) ""
migration_tls_outgoing_handshake_start(const char *hostname, const char *name) "hostname=%s, name=%s"
migration_tls_outgoing_handshake_error(const char *name, const char *err) "name=%s, err=%s"
migration_tls_outgoing_handshake_complete(const char *name) "name=%s"
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 09/17] migration/multifd: Use the new TLS upgrade API for multifd channels
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (7 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 08/17] migration: Use the new TLS upgrade API for main channel Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 10/17] migration/postcopy: Use the new TLS upgrade API for preempt channel Avihai Horon
` (8 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Use the new TLS upgrade API for multifd channels and remove the old TLS
code.
Note that p->c is now set only after a successful TLS connection, so we
need to call object_unref() on the QIOChannelTLS in case of an error in
multifd_tls_outgoing_handshake() (previously this was done in
multifd_save_cleanup()).
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/multifd.c | 66 ++++++++++--------------------------------
migration/trace-events | 3 --
2 files changed, 16 insertions(+), 53 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index f0c216f4f9..f4d8cd0023 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -794,22 +794,17 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
QIOChannel *ioc,
Error **errp);
-static void multifd_tls_outgoing_handshake(QIOTask *task,
- gpointer opaque)
+static void multifd_tls_outgoing_handshake(QIOChannel *ioc, gpointer opaque,
+ Error *err)
{
MultiFDSendParams *p = opaque;
- QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
- Error *err = NULL;
- if (!qio_task_propagate_error(task, &err)) {
- trace_multifd_tls_outgoing_handshake_complete(ioc);
+ if (!err) {
if (multifd_channel_connect(p, ioc, &err)) {
return;
}
}
- trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
-
migrate_set_error(migrate_get_current(), err);
/*
* Error happen, mark multifd_send_thread status as 'quit' although it
@@ -820,59 +815,30 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
qemu_sem_post(&p->sem_sync);
qemu_sem_post(&p->create_sem);
error_free(err);
-}
-
-static void *multifd_tls_handshake_thread(void *opaque)
-{
- MultiFDSendParams *p = opaque;
- QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
-
- qio_channel_tls_handshake(tioc,
- multifd_tls_outgoing_handshake,
- p,
- NULL,
- NULL);
- return NULL;
-}
-
-static bool multifd_tls_channel_connect(MultiFDSendParams *p,
- QIOChannel *ioc,
- Error **errp)
-{
- MigrationState *s = migrate_get_current();
- const char *hostname = s->hostname;
- QIOChannelTLS *tioc;
-
- tioc = migration_tls_client_create(ioc, hostname, errp);
- if (!tioc) {
- return false;
- }
-
object_unref(OBJECT(ioc));
- trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
- qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
- p->c = QIO_CHANNEL(tioc);
- qemu_thread_create(&p->thread, "multifd-tls-handshake-worker",
- multifd_tls_handshake_thread, p,
- QEMU_THREAD_JOINABLE);
- return true;
}
static bool multifd_channel_connect(MultiFDSendParams *p,
QIOChannel *ioc,
Error **errp)
{
- trace_multifd_set_outgoing_channel(
- ioc, object_get_typename(OBJECT(ioc)),
- migrate_get_current()->hostname);
+ MigrationState *s = migrate_get_current();
+
+ trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
+ s->hostname);
if (migrate_channel_requires_tls_upgrade(ioc)) {
/*
- * tls_channel_connect will call back to this
- * function after the TLS handshake,
- * so we mustn't call multifd_send_thread until then
+ * multifd_tls_outgoing_handshake will call back to this function after
+ * the TLS handshake, so we mustn't call multifd_send_thread until then.
*/
- return multifd_tls_channel_connect(p, ioc, errp);
+ if (migration_tls_channel_connect(ioc, p->name, s->hostname,
+ multifd_tls_outgoing_handshake, p,
+ true, errp)) {
+ object_unref(OBJECT(ioc));
+ return true;
+ }
+ return false;
}
qio_channel_set_delay(ioc, false);
diff --git a/migration/trace-events b/migration/trace-events
index 80c3c20faa..2c328326e8 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -144,9 +144,6 @@ multifd_send_sync_main_wait(uint8_t id) "channel %u"
multifd_send_terminate_threads(bool error) "error %d"
multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64
multifd_send_thread_start(uint8_t id) "%u"
-multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
-multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
-multifd_tls_outgoing_handshake_complete(void *ioc) "ioc=%p"
multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
# migration.c
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 10/17] migration/postcopy: Use the new TLS upgrade API for preempt channel
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (8 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 09/17] migration/multifd: Use the new TLS upgrade API for multifd channels Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 11/17] migration/tls: Make migration_tls_client_create() static Avihai Horon
` (7 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Use the new TLS upgrade API for the postcopy preempt channel and remove the
old TLS code.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/postcopy-ram.c | 20 +++++++-------------
migration/trace-events | 1 -
2 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5408e028c6..3df937e7da 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1644,15 +1644,13 @@ postcopy_preempt_send_channel_done(MigrationState *s,
qemu_sem_post(&s->postcopy_qemufile_src_sem);
}
-static void
-postcopy_preempt_tls_handshake(QIOTask *task, gpointer opaque)
+static void postcopy_preempt_tls_handshake(QIOChannel *ioc, gpointer opaque,
+ Error *err)
{
- g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
MigrationState *s = opaque;
- Error *local_err = NULL;
- qio_task_propagate_error(task, &local_err);
- postcopy_preempt_send_channel_done(s, ioc, local_err);
+ postcopy_preempt_send_channel_done(s, ioc, err);
+ object_unref(ioc);
}
static void
@@ -1660,7 +1658,6 @@ postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
{
g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
MigrationState *s = opaque;
- QIOChannelTLS *tioc;
Error *local_err = NULL;
if (qio_task_propagate_error(task, &local_err)) {
@@ -1668,14 +1665,11 @@ postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
}
if (migrate_channel_requires_tls_upgrade(ioc)) {
- tioc = migration_tls_client_create(ioc, s->hostname, &local_err);
- if (!tioc) {
+ if (!migration_tls_channel_connect(ioc, "preempt", s->hostname,
+ postcopy_preempt_tls_handshake, s,
+ false, &local_err)) {
goto out;
}
- trace_postcopy_preempt_tls_handshake();
- qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-preempt");
- qio_channel_tls_handshake(tioc, postcopy_preempt_tls_handshake,
- s, NULL, NULL);
/* Setup the channel until TLS handshake finished */
return;
}
diff --git a/migration/trace-events b/migration/trace-events
index 2c328326e8..9a8ec67115 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -297,7 +297,6 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
-postcopy_preempt_tls_handshake(void) ""
postcopy_preempt_new_channel(void) ""
postcopy_preempt_thread_entry(void) ""
postcopy_preempt_thread_exit(void) ""
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 11/17] migration/tls: Make migration_tls_client_create() static
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (9 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 10/17] migration/postcopy: Use the new TLS upgrade API for preempt channel Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 12/17] migration/multifd: Consolidate TLS/non-TLS multifd channel error flow Avihai Horon
` (6 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
migration_tls_client_create() is not used externally by anyone.
Make it static.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/tls.h | 4 ----
migration/tls.c | 2 +-
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/migration/tls.h b/migration/tls.h
index a6babbfa14..87a79b4102 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -28,10 +28,6 @@ void migration_tls_channel_process_incoming(MigrationState *s,
QIOChannel *ioc,
Error **errp);
-QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
- const char *hostname,
- Error **errp);
-
typedef void (*MigTLSConCallback)(QIOChannel *ioc, void *opaque, Error *err);
/**
diff --git a/migration/tls.c b/migration/tls.c
index 99c71e4fb6..8f427ecc98 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -97,7 +97,7 @@ void migration_tls_channel_process_incoming(MigrationState *s,
NULL);
}
-QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
+static QIOChannelTLS *migration_tls_client_create(QIOChannel *ioc,
const char *hostname,
Error **errp)
{
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 12/17] migration/multifd: Consolidate TLS/non-TLS multifd channel error flow
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (10 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 11/17] migration/tls: Make migration_tls_client_create() static Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 13/17] migration: Store MigrationAddress in MigrationState Avihai Horon
` (5 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
The error flows of TLS and non-TLS multifd channel creation are similar,
yet they don't share code. Consolidate the flows by using
multifd_new_send_channel_cleanup() in the TLS error flow as well.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/multifd.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index f4d8cd0023..cc9a1182fa 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -794,6 +794,9 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
QIOChannel *ioc,
Error **errp);
+static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
+ QIOChannel *ioc, Error *err);
+
static void multifd_tls_outgoing_handshake(QIOChannel *ioc, gpointer opaque,
Error *err)
{
@@ -805,17 +808,7 @@ static void multifd_tls_outgoing_handshake(QIOChannel *ioc, gpointer opaque,
}
}
- migrate_set_error(migrate_get_current(), err);
- /*
- * Error happen, mark multifd_send_thread status as 'quit' although it
- * is not created, and then tell who pay attention to me.
- */
- p->quit = true;
- qemu_sem_post(&multifd_send_state->channels_ready);
- qemu_sem_post(&p->sem_sync);
- qemu_sem_post(&p->create_sem);
- error_free(err);
- object_unref(OBJECT(ioc));
+ multifd_new_send_channel_cleanup(p, ioc, err);
}
static bool multifd_channel_connect(MultiFDSendParams *p,
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 13/17] migration: Store MigrationAddress in MigrationState
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (11 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 12/17] migration/multifd: Consolidate TLS/non-TLS multifd channel error flow Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 14/17] migration: Rename migration_channel_connect() Avihai Horon
` (4 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
This will be used by the new migration channel connect API added in the
following patches.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/migration.h | 3 +++
migration/migration.c | 6 ++++++
2 files changed, 9 insertions(+)
diff --git a/migration/migration.h b/migration/migration.h
index 17972dac34..dc370ab3e8 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -470,6 +470,9 @@ struct MigrationState {
bool switchover_acked;
/* Is this a rdma migration */
bool rdma_migration;
+
+ /* The address used for this migration */
+ MigrationAddress *address;
};
void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/migration.c b/migration/migration.c
index d81d96eaa5..deaa79ff14 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1286,6 +1286,8 @@ static void migrate_fd_cleanup(MigrationState *s)
s->hostname = NULL;
json_writer_free(s->vmdesc);
s->vmdesc = NULL;
+ qapi_free_MigrationAddress(s->address);
+ s->address = NULL;
qemu_savevm_state_cleanup();
@@ -1974,6 +1976,8 @@ void qmp_migrate(const char *uri, bool has_channels,
}
}
+ s->address = QAPI_CLONE(MigrationAddress, addr);
+
if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
SocketAddress *saddr = &addr->u.socket;
if (saddr->type == SOCKET_ADDRESS_TYPE_INET ||
@@ -2005,6 +2009,8 @@ void qmp_migrate(const char *uri, bool has_channels,
}
migrate_fd_error(s, local_err);
error_propagate(errp, local_err);
+ qapi_free_MigrationAddress(s->address);
+ s->address = NULL;
return;
}
}
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 14/17] migration: Rename migration_channel_connect()
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (12 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 13/17] migration: Store MigrationAddress in MigrationState Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 15/17] migration: Add new migration channel connect API Avihai Horon
` (3 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
The following patches will add a new migration channel creation API that
reuses the name migration_channel_connect(), so rename the current
migration_channel_connect() to migration_channel_connect_main().
No functional changes intended.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/channel.h | 6 ++----
migration/channel.c | 10 ++++------
migration/exec.c | 2 +-
migration/fd.c | 2 +-
migration/file.c | 2 +-
migration/socket.c | 2 +-
6 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/migration/channel.h b/migration/channel.h
index 5bdb8208a7..1e36bdd866 100644
--- a/migration/channel.h
+++ b/migration/channel.h
@@ -20,10 +20,8 @@
void migration_channel_process_incoming(QIOChannel *ioc);
-void migration_channel_connect(MigrationState *s,
- QIOChannel *ioc,
- const char *hostname,
- Error *error_in);
+void migration_channel_connect_main(MigrationState *s, QIOChannel *ioc,
+ const char *hostname, Error *error_in);
int migration_channel_read_peek(QIOChannel *ioc,
const char *buf,
diff --git a/migration/channel.c b/migration/channel.c
index 4022b2c9b8..c1f7c6d556 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -55,22 +55,20 @@ static void migration_channel_tls_handshake_main(QIOChannel *ioc, void *opaque,
{
MigrationState *s = opaque;
- migration_channel_connect(s, ioc, NULL, err);
+ migration_channel_connect_main(s, ioc, NULL, err);
object_unref(OBJECT(ioc));
}
/**
- * @migration_channel_connect - Create new outgoing migration channel
+ * @migration_channel_connect_main - Create new main outgoing migration channel
*
* @s: Current migration state
* @ioc: Channel to which we are connecting
* @hostname: Where we want to connect
* @error: Error indicating failure to connect, free'd here
*/
-void migration_channel_connect(MigrationState *s,
- QIOChannel *ioc,
- const char *hostname,
- Error *error)
+void migration_channel_connect_main(MigrationState *s, QIOChannel *ioc,
+ const char *hostname, Error *error)
{
trace_migration_set_outgoing_channel(
ioc, object_get_typename(OBJECT(ioc)), hostname, error);
diff --git a/migration/exec.c b/migration/exec.c
index 47d2f3b8fb..043fb3f14d 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -89,7 +89,7 @@ void exec_start_outgoing_migration(MigrationState *s, strList *command,
}
qio_channel_set_name(ioc, "migration-exec-outgoing");
- migration_channel_connect(s, ioc, NULL, NULL);
+ migration_channel_connect_main(s, ioc, NULL, NULL);
object_unref(OBJECT(ioc));
}
diff --git a/migration/fd.c b/migration/fd.c
index 0eb677dcae..9d3ff249dc 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -39,7 +39,7 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
}
qio_channel_set_name(ioc, "migration-fd-outgoing");
- migration_channel_connect(s, ioc, NULL, NULL);
+ migration_channel_connect_main(s, ioc, NULL, NULL);
object_unref(OBJECT(ioc));
}
diff --git a/migration/file.c b/migration/file.c
index 5d4975f43e..60cdaead1e 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -57,7 +57,7 @@ void file_start_outgoing_migration(MigrationState *s,
return;
}
qio_channel_set_name(ioc, "migration-file-outgoing");
- migration_channel_connect(s, ioc, NULL, NULL);
+ migration_channel_connect_main(s, ioc, NULL, NULL);
}
static gboolean file_accept_incoming_migration(QIOChannel *ioc,
diff --git a/migration/socket.c b/migration/socket.c
index 98e3ea1514..c55129dc9b 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -106,7 +106,7 @@ static void socket_outgoing_migration(QIOTask *task,
}
out:
- migration_channel_connect(data->s, sioc, data->hostname, err);
+ migration_channel_connect_main(data->s, sioc, data->hostname, err);
object_unref(OBJECT(sioc));
}
--
2.26.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 15/17] migration: Add new migration channel connect API
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (13 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 14/17] migration: Rename migration_channel_connect() Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 16/17] migration/multifd: Use the new migration channel connect API for multifd Avihai Horon
` (2 subsequent siblings)
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Add a new API to connect additional migration channels other than the
main migration channel. This API removes the burden of handling the
transport type and TLS upgrade logic, and thus simplifies migration
channel connection.
It will be used in the next patches to connect multifd and postcopy
preempt channels.
Export migration_channels_and_transport_compatible(), as it is now also
used outside of migration.c.
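For example, connecting an extra channel with the new API would look
roughly like this (illustrative sketch; multifd and postcopy preempt are
converted in the next patches):
    static void foo_channel_ready(QIOChannel *ioc, void *opaque, Error *err)
    {
        /* err != NULL means connecting or TLS-upgrading the channel failed */
    }
    ...
    if (!migration_channel_connect(foo_channel_ready, "foo", opaque,
                                   true /* tls_in_thread */, errp)) {
        /* failed to initiate channel establishment */
    }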
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/channel.h | 24 ++++++++++++++
migration/migration.h | 2 ++
migration/channel.c | 74 ++++++++++++++++++++++++++++++++++++++++++
migration/migration.c | 5 ++-
migration/trace-events | 3 ++
5 files changed, 105 insertions(+), 3 deletions(-)
diff --git a/migration/channel.h b/migration/channel.h
index 1e36bdd866..f0fa94ad9e 100644
--- a/migration/channel.h
+++ b/migration/channel.h
@@ -27,4 +27,28 @@ int migration_channel_read_peek(QIOChannel *ioc,
const char *buf,
const size_t buflen,
Error **errp);
+
+typedef void (*MigChannelCallback)(QIOChannel *ioc, void *opaque, Error *err);
+
+/**
+ * migration_channel_connect:
+ * @callback: The callback to invoke when completed
+ * @name: The name of the channel
+ * @opaque: Opaque data to pass to @callback
+ * @tls_in_thread: Whether to run TLS handshake in new thread or not (if TLS is
+ * needed).
+ * @errp: Pointer to a NULL-initialized error object pointer
+ *
+ * Establishes a new migration channel and TLS upgrades it if needed. If this
+ * function succeeds, @callback will be invoked upon completion and
+ * success/failure will be reported to it via the Error object.
+ * In case multiple channels are established in parallel, @tls_in_thread should
+ * be set to true so the TLS handshake will be performed in a new thread, to
+ * avoid a potential risk of migration hang.
+ *
+ * Returns: True on successful initiation of channel establishment process, or
+ * false on failure.
+ */
+bool migration_channel_connect(MigChannelCallback callback, const char *name,
+ void *opaque, bool tls_in_thread, Error **errp);
#endif
diff --git a/migration/migration.h b/migration/migration.h
index dc370ab3e8..52b340e00b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -523,6 +523,8 @@ bool check_dirty_bitmap_mig_alias_map(const BitmapMigrationNodeAliasList *bbm,
void migrate_add_address(SocketAddress *address);
bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
Error **errp);
+bool migration_channels_and_transport_compatible(MigrationAddress *addr,
+ Error **errp);
int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
#define qemu_ram_foreach_block \
diff --git a/migration/channel.c b/migration/channel.c
index c1f7c6d556..741974279f 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -21,6 +21,7 @@
#include "io/channel-socket.h"
#include "qemu/yank.h"
#include "yank_functions.h"
+#include "socket.h"
/**
* @migration_channel_process_incoming - Create new incoming migration channel
@@ -101,6 +102,79 @@ void migration_channel_connect_main(MigrationState *s, QIOChannel *ioc,
error_free(error);
}
+typedef struct {
+ MigChannelCallback callback;
+ void *opaque;
+ char *name;
+ bool tls_in_thread;
+} MigChannelData;
+
+static void migration_channel_connect_tls_handshake(QIOChannel *ioc,
+ void *opaque, Error *err)
+{
+ MigChannelData *data = opaque;
+
+ data->callback(ioc, data->opaque, err);
+ g_free(data->name);
+ g_free(data);
+}
+
+static void migration_channel_connect_callback(QIOTask *task, void *opaque)
+{
+ QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+ MigChannelData *data = opaque;
+ MigrationState *s = migrate_get_current();
+ Error *err = NULL;
+
+ if (qio_task_propagate_error(task, &err)) {
+ trace_migration_channel_connect_error(data->name,
+ error_get_pretty(err));
+ goto out;
+ }
+
+ trace_migration_channel_connect_complete(data->name);
+ if (!migrate_channel_requires_tls_upgrade(ioc)) {
+ goto out;
+ }
+
+ if (migration_tls_channel_connect(ioc, data->name, s->hostname,
+ migration_channel_connect_tls_handshake,
+ data, data->tls_in_thread, &err)) {
+ object_unref(OBJECT(ioc));
+ /* data->callback will be invoked after handshake */
+ return;
+ }
+
+out:
+ data->callback(ioc, data->opaque, err);
+ g_free(data->name);
+ g_free(data);
+}
+
+bool migration_channel_connect(MigChannelCallback callback, const char *name,
+ void *opaque, bool tls_in_thread, Error **errp)
+{
+ MigrationState *s = migrate_get_current();
+ MigChannelData *data;
+
+ g_assert(s->address);
+ g_assert(migration_channels_and_transport_compatible(s->address, NULL));
+
+ data = g_new0(MigChannelData, 1);
+ data->callback = callback;
+ data->opaque = opaque;
+ data->name = g_strdup(name);
+ data->tls_in_thread = tls_in_thread;
+
+ trace_migration_channel_connect_start(s->hostname, name);
+ /*
+ * Currently, creating migration channels other than main channel is
+ * supported only with socket transport.
+ */
+ socket_send_channel_create(migration_channel_connect_callback, data);
+
+ return true;
+}
/**
* @migration_channel_read_peek - Peek at migration channel, without
diff --git a/migration/migration.c b/migration/migration.c
index deaa79ff14..6f985e7f74 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -141,9 +141,8 @@ static bool transport_supports_multi_channels(MigrationAddress *addr)
return false;
}
-static bool
-migration_channels_and_transport_compatible(MigrationAddress *addr,
- Error **errp)
+bool migration_channels_and_transport_compatible(MigrationAddress *addr,
+ Error **errp)
{
if (migration_needs_multiple_sockets() &&
!transport_supports_multi_channels(addr)) {
diff --git a/migration/trace-events b/migration/trace-events
index 9a8ec67115..6c915d8567 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -195,6 +195,9 @@ migration_transferred_bytes(uint64_t qemu_file, uint64_t multifd, uint64_t rdma)
# channel.c
migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname, void *err) "ioc=%p ioctype=%s hostname=%s err=%p"
+migration_channel_connect_start(const char *hostname, const char *name) "hostname=%s, name=%s"
+migration_channel_connect_error(const char *name, const char *err) "name=%s, err=%s"
+migration_channel_connect_complete(const char *name) "name=%s"
# global_state.c
migrate_state_too_big(void) ""
--
2.26.3
* [PATCH 16/17] migration/multifd: Use the new migration channel connect API for multifd
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (14 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 15/17] migration: Add new migration channel connect API Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-01-25 16:25 ` [PATCH 17/17] migration/postcopy: Use the new migration channel connect API for postcopy preempt Avihai Horon
2024-02-06 10:04 ` [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Peter Xu
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Use the new migration channel connect API for multifd and remove old
channel connect code.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/multifd.c | 89 ++++++++++--------------------------------
migration/trace-events | 3 --
2 files changed, 21 insertions(+), 71 deletions(-)
diff --git a/migration/multifd.c b/migration/multifd.c
index cc9a1182fa..c679b64721 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -18,10 +18,10 @@
#include "qemu/error-report.h"
#include "qapi/error.h"
#include "ram.h"
+#include "channel.h"
#include "migration.h"
#include "migration-stats.h"
#include "socket.h"
-#include "tls.h"
#include "qemu-file.h"
#include "trace.h"
#include "multifd.h"
@@ -790,61 +790,6 @@ int multifd_send_channels_created(void)
return ret;
}
-static bool multifd_channel_connect(MultiFDSendParams *p,
- QIOChannel *ioc,
- Error **errp);
-
-static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
- QIOChannel *ioc, Error *err);
-
-static void multifd_tls_outgoing_handshake(QIOChannel *ioc, gpointer opaque,
- Error *err)
-{
- MultiFDSendParams *p = opaque;
-
- if (!err) {
- if (multifd_channel_connect(p, ioc, &err)) {
- return;
- }
- }
-
- multifd_new_send_channel_cleanup(p, ioc, err);
-}
-
-static bool multifd_channel_connect(MultiFDSendParams *p,
- QIOChannel *ioc,
- Error **errp)
-{
- MigrationState *s = migrate_get_current();
-
- trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
- s->hostname);
-
- if (migrate_channel_requires_tls_upgrade(ioc)) {
- /*
- * multifd_tls_outgoing_handshake will call back to this function after
- * the TLS handshake, so we mustn't call multifd_send_thread until then.
- */
- if (migration_tls_channel_connect(ioc, p->name, s->hostname,
- multifd_tls_outgoing_handshake, p,
- true, errp)) {
- object_unref(OBJECT(ioc));
- return true;
- }
- return false;
- }
-
- qio_channel_set_delay(ioc, false);
- migration_ioc_register_yank(ioc);
- p->registered_yank = true;
- p->c = ioc;
- qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
- QEMU_THREAD_JOINABLE);
- p->running = true;
- qemu_sem_post(&p->create_sem);
- return true;
-}
-
static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
QIOChannel *ioc, Error *err)
{
@@ -863,26 +808,34 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
error_free(err);
}
-static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
+static void multifd_new_send_channel_callback(QIOChannel *ioc, void *opaque,
+ Error *err)
{
MultiFDSendParams *p = opaque;
- QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
- Error *local_err = NULL;
- trace_multifd_new_send_channel_async(p->id);
- if (!qio_task_propagate_error(task, &local_err)) {
- if (multifd_channel_connect(p, ioc, &local_err)) {
- return;
- }
+ if (err) {
+ multifd_new_send_channel_cleanup(p, ioc, err);
+ return;
}
- trace_multifd_new_send_channel_async_error(p->id, local_err);
- multifd_new_send_channel_cleanup(p, ioc, local_err);
+ qio_channel_set_delay(ioc, false);
+ migration_ioc_register_yank(ioc);
+ p->registered_yank = true;
+ p->c = ioc;
+ qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
+ QEMU_THREAD_JOINABLE);
+ p->running = true;
+ qemu_sem_post(&p->create_sem);
}
-static void multifd_new_send_channel_create(gpointer opaque)
+static void multifd_new_send_channel_create(MultiFDSendParams *p)
{
- socket_send_channel_create(multifd_new_send_channel_async, opaque);
+ Error *local_err = NULL;
+
+ if (!migration_channel_connect(multifd_new_send_channel_callback, p->name,
+ p, true, &local_err)) {
+ multifd_new_send_channel_cleanup(p, NULL, local_err);
+ }
}
int multifd_save_setup(Error **errp)
diff --git a/migration/trace-events b/migration/trace-events
index 6c915d8567..6ac73b0d85 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -126,8 +126,6 @@ postcopy_preempt_switch_channel(int channel) "%d"
postcopy_preempt_reset_channel(void) ""
# multifd.c
-multifd_new_send_channel_async(uint8_t id) "channel %u"
-multifd_new_send_channel_async_error(uint8_t id, void *err) "channel=%u err=%p"
multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
multifd_recv_new_channel(uint8_t id) "channel %u"
multifd_recv_sync_main(long packet_num) "packet num %ld"
@@ -144,7 +142,6 @@ multifd_send_sync_main_wait(uint8_t id) "channel %u"
multifd_send_terminate_threads(bool error) "error %d"
multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64
multifd_send_thread_start(uint8_t id) "%u"
-multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname) "ioc=%p ioctype=%s hostname=%s"
# migration.c
migrate_set_state(const char *new_state) "new state %s"
--
2.26.3
* [PATCH 17/17] migration/postcopy: Use the new migration channel connect API for postcopy preempt
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (15 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 16/17] migration/multifd: Use the new migration channel connect API for multifd Avihai Horon
@ 2024-01-25 16:25 ` Avihai Horon
2024-02-06 10:04 ` [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Peter Xu
17 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-01-25 16:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Xu, Fabiano Rosas, Avihai Horon
Use the new migration channel connect API for postcopy preempt and
remove old channel connect code.
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
migration/postcopy-ram.h | 2 +-
migration/postcopy-ram.c | 105 +++++++++++++++------------------------
2 files changed, 42 insertions(+), 65 deletions(-)
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 442ab89752..c1b7d9be6d 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -192,7 +192,7 @@ enum PostcopyChannels {
};
void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
-void postcopy_preempt_setup(MigrationState *s);
+int postcopy_preempt_setup(MigrationState *s);
int postcopy_preempt_establish_channel(MigrationState *s);
#endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3df937e7da..1d80acc7b4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -34,9 +34,9 @@
#include "exec/ramblock.h"
#include "socket.h"
#include "yank_functions.h"
-#include "tls.h"
#include "qemu/userfaultfd.h"
#include "qemu/mmap-alloc.h"
+#include "channel.h"
#include "options.h"
/* Arbitrary limit on size of each discard command,
@@ -1620,65 +1620,6 @@ void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
trace_postcopy_preempt_new_channel();
}
-/*
- * Setup the postcopy preempt channel with the IOC. If ERROR is specified,
- * setup the error instead. This helper will free the ERROR if specified.
- */
-static void
-postcopy_preempt_send_channel_done(MigrationState *s,
- QIOChannel *ioc, Error *local_err)
-{
- if (local_err) {
- migrate_set_error(s, local_err);
- error_free(local_err);
- } else {
- migration_ioc_register_yank(ioc);
- s->postcopy_qemufile_src = qemu_file_new_output(ioc);
- trace_postcopy_preempt_new_channel();
- }
-
- /*
- * Kick the waiter in all cases. The waiter should check upon
- * postcopy_qemufile_src to know whether it failed or not.
- */
- qemu_sem_post(&s->postcopy_qemufile_src_sem);
-}
-
-static void postcopy_preempt_tls_handshake(QIOChannel *ioc, gpointer opaque,
- Error *err)
-{
- MigrationState *s = opaque;
-
- postcopy_preempt_send_channel_done(s, ioc, err);
- object_unref(ioc);
-}
-
-static void
-postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
-{
- g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
- MigrationState *s = opaque;
- Error *local_err = NULL;
-
- if (qio_task_propagate_error(task, &local_err)) {
- goto out;
- }
-
- if (migrate_channel_requires_tls_upgrade(ioc)) {
- if (!migration_tls_channel_connect(ioc, "preempt", s->hostname,
- postcopy_preempt_tls_handshake, s,
- false, &local_err)) {
- goto out;
- }
- /* Setup the channel until TLS handshake finished */
- return;
- }
-
-out:
- /* This handles both good and error cases */
- postcopy_preempt_send_channel_done(s, ioc, local_err);
-}
-
/*
* This function will kick off an async task to establish the preempt
* channel, and wait until the connection setup completed. Returns 0 if
@@ -1697,7 +1638,9 @@ int postcopy_preempt_establish_channel(MigrationState *s)
* setup phase of migration (even if racy in an unreliable network).
*/
if (!s->preempt_pre_7_2) {
- postcopy_preempt_setup(s);
+ if (postcopy_preempt_setup(s)) {
+ return -1;
+ }
}
/*
@@ -1709,10 +1652,44 @@ int postcopy_preempt_establish_channel(MigrationState *s)
return s->postcopy_qemufile_src ? 0 : -1;
}
-void postcopy_preempt_setup(MigrationState *s)
+/*
+ * Setup the postcopy preempt channel with the IOC. If ERROR is specified,
+ * setup the error instead. This helper will free the ERROR if specified.
+ */
+static void postcopy_preempt_send_channel_new_callback(QIOChannel *ioc,
+ void *opaque, Error *err)
+{
+ MigrationState *s = opaque;
+
+ if (err) {
+ migrate_set_error(s, err);
+ error_free(err);
+ } else {
+ migration_ioc_register_yank(ioc);
+ s->postcopy_qemufile_src = qemu_file_new_output(ioc);
+ trace_postcopy_preempt_new_channel();
+ }
+
+ /*
+ * Kick the waiter in all cases. The waiter should check upon
+ * postcopy_qemufile_src to know whether it failed or not.
+ */
+ qemu_sem_post(&s->postcopy_qemufile_src_sem);
+ object_unref(OBJECT(ioc));
+}
+
+int postcopy_preempt_setup(MigrationState *s)
{
- /* Kick an async task to connect */
- socket_send_channel_create(postcopy_preempt_send_channel_new, s);
+ Error *local_err = NULL;
+
+ if (!migration_channel_connect(postcopy_preempt_send_channel_new_callback,
+ "preempt", s, false, &local_err)) {
+ migrate_set_error(s, local_err);
+ error_report_err(local_err);
+ return -1;
+ }
+
+ return 0;
}
static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
--
2.26.3
* Re: [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs
2024-01-25 16:25 [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Avihai Horon
` (16 preceding siblings ...)
2024-01-25 16:25 ` [PATCH 17/17] migration/postcopy: Use the new migration channel connect API for postcopy preempt Avihai Horon
@ 2024-02-06 10:04 ` Peter Xu
2024-02-06 13:10 ` Avihai Horon
17 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2024-02-06 10:04 UTC (permalink / raw)
To: Avihai Horon; +Cc: qemu-devel, Fabiano Rosas
On Thu, Jan 25, 2024 at 06:25:11PM +0200, Avihai Horon wrote:
> Hello,
>
> Today there are several types of migration channels that can be used
> during migration: main migration channel, multifd channels and postcopy
> preempt channel. Each channel type has its own code to connect and to
> TLS upgrade itself when needed. There is no unified API for these tasks
> and it makes the code a bit unclear and cumbersome.
>
> This series aims to solve this by introducing two new APIs for the above
> tasks that will replace the existing code.
>
> The first one is an API to TLS upgrade migration channels. A new
> function, migration_tls_channel_connect(), is introduced and is used by
> main migration, multifd and postcopy preempt channels.
>
> The second one is an API to connect migration channels. A new function,
> migration_channel_connect(), is introduced and is used by all migration
> channels other than main migration channel, i.e., multifd and postcopy
> preempt channels. The reason it's not used by main migration channel is
> because the main channel is a bit special and has different flows, and I
> didn't see a smooth way to include it.
>
> Patch breakdown:
> 1-5: Some fixes and cleanups.
> 6-12: Add new migration channel TLS upgrade API.
> 13-17: Add new migration channel connect API.
Hi, Avihai,
I am queuing patch 1 as it does look like a real bug to fix asap. I'll
copy stable for that one. For such patches, feel free to post separately
next time if you think I should merge earlier than the rest.
For the thread race issue: I believe it should be fully covered by
Fabiano's series, at least that's the plan. Let me know if there's still
something missing.
For the rest: I didn't closely check, but obviously many of the changes
will not apply on top of Fabiano's fix for the thread races. If you
have anything that you still think is worth merging, please rebase onto
that (or wait for my pull; it should happen this week).
Thanks,
--
Peter Xu
* Re: [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs
2024-02-06 10:04 ` [PATCH 00/17] migration: Add new migration channel connect and TLS upgrade APIs Peter Xu
@ 2024-02-06 13:10 ` Avihai Horon
0 siblings, 0 replies; 37+ messages in thread
From: Avihai Horon @ 2024-02-06 13:10 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Fabiano Rosas
On 06/02/2024 12:04, Peter Xu wrote:
> On Thu, Jan 25, 2024 at 06:25:11PM +0200, Avihai Horon wrote:
>> Hello,
>>
>> Today there are several types of migration channels that can be used
>> during migration: main migration channel, multifd channels and postcopy
>> preempt channel. Each channel type has its own code to connect and to
>> TLS upgrade itself when needed. There is no unified API for these tasks
>> and it makes the code a bit unclear and cumbersome.
>>
>> This series aims to solve this by introducing two new APIs for the above
>> tasks that will replace the existing code.
>>
>> The first one is an API to TLS upgrade migration channels. A new
>> function, migration_tls_channel_connect(), is introduced and is used by
>> main migration, multifd and postcopy preempt channels.
>>
>> The second one is an API to connect migration channels. A new function,
>> migration_channel_connect(), is introduced and is used by all migration
>> channels other than main migration channel, i.e., multifd and postcopy
>> preempt channels. The reason it's not used by main migration channel is
>> because the main channel is a bit special and has different flows, and I
>> didn't see a smooth way to include it.
>>
>> Patch breakdown:
>> 1-5: Some fixes and cleanups.
>> 6-12: Add new migration channel TLS upgrade API.
>> 13-17: Add new migration channel connect API.
> Hi, Avihai,
>
> I am queuing patch 1 as it does look like a real bug to fix asap. I'll
> copy stable for that one. For such patches, feel free to post separately
> next time if you think I should merge earlier than the rest.
Sure, will do so next time.
>
> For the thread race issue: I believe it should be fully covered by
> Fabiano's series, at least that's the plan. Let me know if there's still
> something missing.
Yep, Fabiano's series seems to have taken care of everything.
>
> For the rest: I didn't closely check, but obviously many of the changes
> will not apply on top of Fabiano's fix for the thread races. If you
> have anything that you still think is worth merging, please rebase onto
> that (or wait for my pull; it should happen this week).
Sure, thanks.