qemu-devel.nongnu.org archive mirror
* [PATCH v3 00/10] Fix segfault on migration return path
@ 2023-08-11 15:08 Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang

I decided to fix the issues with the shutdown instead of complaining
about them. The first 5 patches address all of the possible races I
found. The only problem left to figure out is the -EIO on shutdown
which will need more thought.

Patches 6 & 7 fix the original segfault.

Patches 8-10 make the cleanup of migration files more predictable and
centralized.

I also adjusted some small things that were mentioned by Peter:

- moved the rp.error = false outside of the thread;
- stopped checking for errors during postcopy_pause();
- dropped the tracepoint;

CI run: https://gitlab.com/farosas/qemu/-/pipelines/963407228

v2:
https://lore.kernel.org/r/20230802143644.7534-1-farosas@suse.de

- moved the await into postcopy_pause() as Peter suggested;

- brought back the mark_source_rp_bad call. Turns out that piece of
code is filled with nuance. I just moved it aside since it doesn't
make sense during pause/resume. We can tackle that when we get the
chance.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/953420150
Also ran the switchover and preempt tests 1000 times each on
x86_64.

v1:
https://lore.kernel.org/r/20230728121516.16258-1-farosas@suse.de

The /x86_64/migration/postcopy/preempt/recovery/plain test is
sometimes failing due to a segmentation fault on the migration return
path. There is a race involving the retry logic of the return path and
the migration resume command.

The issue happens when the retry logic tries to clean up the current
return path file, but ends up cleaning the new one and trying to use
it right after. Tracing shows it clearly:

open_return_path_on_source  <-- at migration start
open_return_path_on_source_continue <-- rp thread created
postcopy_pause_incoming
postcopy_pause_fast_load
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. (incoming)
postcopy_pause_fault_thread
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. (source)
postcopy_pause_incoming_continued
open_return_path_on_source   <-- NOK, too soon
postcopy_pause_continued
postcopy_pause_return_path   <-- too late, already operating on the new from_dst_file
postcopy_pause_return_path_continued <-- will continue and crash
postcopy_pause_incoming
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_incoming_continued

We could solve this by adding some form of synchronization to ensure
that we always do the cleanup before setting up the new file, but I
find it more straightforward to move the retry logic outside of the
thread by letting it finish and starting a new thread when resuming
the migration.

More details in the commit message.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/947875609

Fabiano Rosas (10):
  migration: Fix possible race when setting rp_state.error
  migration: Fix possible race when shutting return path
  migration: Fix possible race when checking to_dst_file for errors
  migration: Fix possible race when shutting down to_dst_file
  migration: Remove redundant cleanup of postcopy_qemufile_src
  migration: Consolidate return path closing code
  migration: Replace the return path retry logic
  migration: Move return path cleanup to main migration thread
  migration: Be consistent about shutdown of source shared files
  migration: Add a wrapper to cleanup migration files

 migration/migration.c | 228 +++++++++++++++---------------------------
 migration/migration.h |   1 -
 util/yank.c           |   2 -
 3 files changed, 79 insertions(+), 152 deletions(-)

-- 
2.35.3




* [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:49   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 02/10] migration: Fix possible race when shutting return path Fabiano Rosas
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..f88c86079c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2062,7 +2062,6 @@ static int await_return_path_close_on_source(MigrationState *ms)
          * waiting for the destination.
          */
         qemu_file_shutdown(ms->rp_state.from_dst_file);
-        mark_source_rp_bad(ms);
     }
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
-- 
2.35.3




* [PATCH v3 02/10] migration: Fix possible race when shutting return path
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors Fabiano Rosas
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running its
cleanup code and have just cleared the pointer.

This was caught by inspection and should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.
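
To illustrate the check-under-lock idea outside of QEMU, here is a
minimal sketch using plain pthreads. FakeFile, FakeState and
shutdown_return_path_locked() are hypothetical stand-ins invented for
this example; they are not QEMU types or functions, and the "shutdown"
is reduced to setting a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a QEMUFile. */
typedef struct {
    int shut_down;              /* set once a shutdown was requested */
} FakeFile;

/* Hypothetical stand-in for the relevant bits of MigrationState. */
typedef struct {
    pthread_mutex_t file_lock;  /* plays the role of qemu_file_lock */
    FakeFile *from_dst_file;    /* may be cleared by another thread */
} FakeState;

/*
 * Request a shutdown only if the file is still present. The pointer
 * is both tested and used while holding the lock, so a concurrent
 * cleanup that clears it under the same lock cannot slip in between.
 * Returns 1 if a shutdown was issued, 0 if the file was already gone.
 */
static int shutdown_return_path_locked(FakeState *s)
{
    int done = 0;

    assert(s != NULL);

    pthread_mutex_lock(&s->file_lock);
    if (s->from_dst_file) {
        s->from_dst_file->shut_down = 1;
        done = 1;
    }
    pthread_mutex_unlock(&s->file_lock);

    return done;
}
```

The sketch only helps if every writer of the pointer also takes the
same lock, which is the assumption this patch relies on.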

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f88c86079c..0067c927fa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2052,17 +2052,19 @@ static int open_return_path_on_source(MigrationState *ms,
 static int await_return_path_close_on_source(MigrationState *ms)
 {
     /*
-     * If this is a normal exit then the destination will send a SHUT and the
-     * rp_thread will exit, however if there's an error we need to cause
-     * it to exit.
+     * If this is a normal exit then the destination will send a SHUT
+     * and the rp_thread will exit, however if there's an error we
+     * need to cause it to exit. shutdown(2), if we have it, will
+     * cause it to unblock if it's stuck waiting for the destination.
      */
-    if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
-        /*
-         * shutdown(2), if we have it, will cause it to unblock if it's stuck
-         * waiting for the destination.
-         */
-        qemu_file_shutdown(ms->rp_state.from_dst_file);
+    if (qemu_file_get_error(ms->to_dst_file)) {
+        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+            if (ms->rp_state.from_dst_file) {
+                qemu_file_shutdown(ms->rp_state.from_dst_file);
+            }
+        }
     }
+
     trace_await_return_path_close_on_source_joining();
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
-- 
2.35.3




* [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 02/10] migration: Fix possible race when shutting return path Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:49   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

Checking ms->to_dst_file for errors when cleaning up the return path
could race with migrate_fd_cleanup() which clears the pointer.

Since migrate_fd_cleanup() is reachable via qmp_migrate(), which is
issued by the user, it is safer if we take the lock when reading
ms->to_dst_file.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0067c927fa..85c171f32c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2057,11 +2057,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
      * need to cause it to exit. shutdown(2), if we have it, will
      * cause it to unblock if it's stuck waiting for the destination.
      */
-    if (qemu_file_get_error(ms->to_dst_file)) {
-        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-            if (ms->rp_state.from_dst_file) {
-                qemu_file_shutdown(ms->rp_state.from_dst_file);
-            }
+    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+        if (ms->to_dst_file && ms->rp_state.from_dst_file &&
+            qemu_file_get_error(ms->to_dst_file)) {
+            qemu_file_shutdown(ms->rp_state.from_dst_file);
         }
     }
 
-- 
2.35.3




* [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (2 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:51   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 85c171f32c..5e6a766235 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1245,7 +1245,7 @@ static void migrate_fd_error(MigrationState *s, const Error *error)
 static void migrate_fd_cancel(MigrationState *s)
 {
     int old_state ;
-    QEMUFile *f = migrate_get_current()->to_dst_file;
+
     trace_migrate_fd_cancel();
 
     WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
@@ -1271,11 +1271,13 @@ static void migrate_fd_cancel(MigrationState *s)
      * If we're unlucky the migration code might be stuck somewhere in a
      * send/write while the network has failed and is waiting to timeout;
      * if we've got shutdown(2) available then we can force it to quit.
-     * The outgoing qemu file gets closed in migrate_fd_cleanup that is
-     * called in a bh, so there is no race against this cancel.
      */
-    if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-        qemu_file_shutdown(f);
+    if (s->state == MIGRATION_STATUS_CANCELLING) {
+        WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+            if (s->to_dst_file) {
+                qemu_file_shutdown(s->to_dst_file);
+            }
+        }
     }
     if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
         Error *local_err = NULL;
@@ -1519,12 +1521,14 @@ void qmp_migrate_pause(Error **errp)
 {
     MigrationState *ms = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
-    int ret;
+    int ret = 0;
 
     if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         /* Source side, during postcopy */
         qemu_mutex_lock(&ms->qemu_file_lock);
-        ret = qemu_file_shutdown(ms->to_dst_file);
+        if (ms->to_dst_file) {
+            ret = qemu_file_shutdown(ms->to_dst_file);
+        }
         qemu_mutex_unlock(&ms->qemu_file_lock);
         if (ret) {
             error_setg(errp, "Failed to pause source migration");
-- 
2.35.3




* [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (3 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:56   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 06/10] migration: Consolidate return path closing code Fabiano Rosas
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

This file is owned by the return path thread which is already doing
cleanup.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5e6a766235..195726eb4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1177,12 +1177,6 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
-    if (s->postcopy_qemufile_src) {
-        migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-        qemu_fclose(s->postcopy_qemufile_src);
-        s->postcopy_qemufile_src = NULL;
-    }
-
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
-- 
2.35.3




* [PATCH v3 06/10] migration: Consolidate return path closing code
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (4 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:57   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 07/10] migration: Replace the return path retry logic Fabiano Rosas
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 195726eb4a..4edbee3a5d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2049,6 +2049,14 @@ static int open_return_path_on_source(MigrationState *ms,
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
+    int ret;
+
+    if (!ms->rp_state.rp_thread_created) {
+        return 0;
+    }
+
+    trace_migration_return_path_end_before();
+
     /*
      * If this is a normal exit then the destination will send a SHUT
      * and the rp_thread will exit, however if there's an error we
@@ -2066,7 +2074,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
     qemu_thread_join(&ms->rp_state.rp_thread);
     ms->rp_state.rp_thread_created = false;
     trace_await_return_path_close_on_source_close();
-    return ms->rp_state.error;
+
+    ret = ms->rp_state.error;
+    trace_migration_return_path_end_after(ret);
+    return ret;
 }
 
 static inline void
@@ -2362,20 +2373,8 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    /*
-     * If rp was opened we must clean up the thread before
-     * cleaning everything else up (since if there are no failures
-     * it will wait for the destination to send it's status in
-     * a SHUT command).
-     */
-    if (s->rp_state.rp_thread_created) {
-        int rp_error;
-        trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
-            goto fail;
-        }
+    if (await_return_path_close_on_source(s)) {
+        goto fail;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
-- 
2.35.3




* [PATCH v3 07/10] migration: Replace the return path retry logic
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (5 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 06/10] migration: Consolidate return path closing code Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 21:58   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 08/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be set up *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.
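
As a rough illustration of the "finish and restart" approach, the
sketch below uses plain pthreads. RpState, rp_worker(), rp_open() and
rp_await_close() are hypothetical names made up for this example, and
the channel is just an integer; none of this is the actual QEMU code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the return path state. */
typedef struct {
    pthread_t thread;
    bool thread_created;
    bool error;         /* written by the worker, read after the join */
    int channel;        /* stands in for from_dst_file; < 0 is broken */
    int generation;     /* how many workers were started so far */
} RpState;

/* Worker without a retry loop: on a broken channel, flag it and exit. */
static void *rp_worker(void *opaque)
{
    RpState *rp = opaque;

    if (rp->channel < 0) {
        rp->error = true;
    }
    return NULL;
}

/* Install a channel and start a fresh worker for it. */
static int rp_open(RpState *rp, int channel)
{
    rp->channel = channel;
    if (pthread_create(&rp->thread, NULL, rp_worker, rp) != 0) {
        return -1;
    }
    rp->thread_created = true;
    rp->generation++;
    return 0;
}

/*
 * Wait for the worker to finish, then clean up. Since the cleanup
 * only runs after the join, it cannot race with a worker that is
 * still using the channel. Returns whether the worker saw an error.
 */
static bool rp_await_close(RpState *rp)
{
    bool ret;

    if (!rp->thread_created) {
        return false;
    }
    pthread_join(rp->thread, NULL);
    rp->thread_created = false;

    ret = rp->error;
    rp->error = false;  /* reset for the next worker */
    rp->channel = -1;   /* release the old channel */
    return ret;
}
```

In this model a pause would call rp_await_close() and ignore the
result, and a resume would call rp_open() with the new channel, so
there is never an old worker around to clean up the new channel.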

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 60 ++++++++-----------------------------------
 migration/migration.h |  1 -
 2 files changed, 11 insertions(+), 50 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4edbee3a5d..7dfcbc3634 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1775,18 +1775,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
-/* Return true to retry, false to quit */
-static bool postcopy_pause_return_path_thread(MigrationState *s)
-{
-    trace_postcopy_pause_return_path();
-
-    qemu_sem_wait(&s->postcopy_pause_rp_sem);
-
-    trace_postcopy_pause_return_path_continued();
-
-    return true;
-}
-
 static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
@@ -1870,7 +1858,6 @@ static void *source_return_path_thread(void *opaque)
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
-retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1992,26 +1979,7 @@ retry:
     }
 
 out:
-    res = qemu_file_get_error(rp);
-    if (res) {
-        if (res && migration_in_postcopy()) {
-            /*
-             * Maybe there is something we can do: it looks like a
-             * network down issue, and we pause for a recovery.
-             */
-            migration_release_dst_files(ms);
-            rp = NULL;
-            if (postcopy_pause_return_path_thread(ms)) {
-                /*
-                 * Reload rp, reset the rest.  Referencing it is safe since
-                 * it's reset only by us above, or when migration completes
-                 */
-                rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
-                goto retry;
-            }
-        }
-
+    if (qemu_file_get_error(rp)) {
         trace_source_return_path_thread_bad_end();
         mark_source_rp_bad(ms);
     }
@@ -2022,8 +1990,7 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms,
-                                      bool create_thread)
+static int open_return_path_on_source(MigrationState *ms)
 {
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
@@ -2032,11 +1999,6 @@ static int open_return_path_on_source(MigrationState *ms,
 
     trace_open_return_path_on_source();
 
-    if (!create_thread) {
-        /* We're done */
-        return 0;
-    }
-
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
@@ -2076,6 +2038,7 @@ static int await_return_path_close_on_source(MigrationState *ms)
     trace_await_return_path_close_on_source_close();
 
     ret = ms->rp_state.error;
+    ms->rp_state.error = false;
     trace_migration_return_path_end_after(ret);
     return ret;
 }
@@ -2551,6 +2514,13 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        /*
+         * We're already pausing, so ignore any errors on the return
+         * path and just wait for the thread to finish. It will be
+         * re-created when we resume.
+         */
+        await_return_path_close_on_source(s);
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -2568,12 +2538,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
 
-            /*
-             * Firstly, let's wake up the return path now, with a new
-             * return path channel.
-             */
-            qemu_sem_post(&s->postcopy_pause_rp_sem);
-
             /* Do the resume logic */
             if (postcopy_do_resume(s) == 0) {
                 /* Let's continue! */
@@ -3263,7 +3227,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_return_path()) {
-        if (open_return_path_on_source(s, !resume)) {
+        if (open_return_path_on_source(s)) {
             error_setg(&local_err, "Unable to open return-path for postcopy");
             migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_set_error(s, local_err);
@@ -3327,7 +3291,6 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
-    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
@@ -3347,7 +3310,6 @@ static void migration_instance_init(Object *obj)
     migrate_params_init(&ms->parameters);
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
-    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..36eb5ba70b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -382,7 +382,6 @@ struct MigrationState {
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
-    QemuSemaphore postcopy_pause_rp_sem;
     /*
      * Whether we abort the migration if decompression errors are
      * detected at the destination. It is left at false for qemu
-- 
2.35.3




* [PATCH v3 08/10] migration: Move return path cleanup to main migration thread
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (6 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 07/10] migration: Replace the return path retry logic Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 22:02   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files Fabiano Rosas
  2023-08-11 15:08 ` [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7dfcbc3634..7fec57ad7f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -98,6 +98,7 @@ static int migration_maybe_pause(MigrationState *s,
                                  int *current_active_state,
                                  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
+static int await_return_path_close_on_source(MigrationState *s);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1177,6 +1178,12 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_fclose(tmp);
     }
 
+    /*
+     * We already cleaned up to_dst_file, so errors from the return
+     * path might be due to that, ignore them.
+     */
+    await_return_path_close_on_source(s);
+
     assert(!migration_is_active(s));
 
     if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -1985,7 +1992,6 @@ out:
     }
 
     trace_source_return_path_thread_end();
-    migration_release_dst_files(ms);
     rcu_unregister_thread();
     return NULL;
 }
@@ -2039,6 +2045,9 @@ static int await_return_path_close_on_source(MigrationState *ms)
 
     ret = ms->rp_state.error;
     ms->rp_state.error = false;
+
+    migration_release_dst_files(ms);
+
     trace_migration_return_path_end_after(ret);
     return ret;
 }
-- 
2.35.3




* [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (7 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 08/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 22:08   ` Peter Xu
  2023-08-11 15:08 ` [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

When doing cleanup, we currently close() some of the shared migration
files and shutdown() + close() others. Be consistent by always calling
shutdown() before close().

Do this only for the source files for now because the source runs
multiple threads which could cause races between the two calls. Having
them together allows us to move them to a centralized place under the
protection of a lock in the next patch.
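
For readers less familiar with the two calls, here is a minimal
sketch of the shutdown-then-close order on a plain POSIX socket.
shutdown_then_close() is a hypothetical helper written for this
example; the QEMU code goes through qemu_file_shutdown() and
qemu_fclose() instead.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: shut the socket down in both directions, then
 * close it. The result of shutdown(2) is ignored on purpose, since
 * the descriptor is closed regardless. Returns the result of close(2).
 */
static int shutdown_then_close(int fd)
{
    assert(fd >= 0);

    (void)shutdown(fd, SHUT_RDWR);
    return close(fd);
}
```

On a connected stream socket the peer is expected to see end-of-file
on its next read once this helper has run.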

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 7fec57ad7f..4df5ca25c1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1175,6 +1175,7 @@ static void migrate_fd_cleanup(MigrationState *s)
          * critical section won't block for long.
          */
         migration_ioc_unregister_yank_from_file(tmp);
+        qemu_file_shutdown(tmp);
         qemu_fclose(tmp);
     }
 
@@ -1844,6 +1845,7 @@ static void migration_release_dst_files(MigrationState *ms)
         ms->postcopy_qemufile_src = NULL;
     }
 
+    qemu_file_shutdown(file);
     qemu_fclose(file);
 }
 
-- 
2.35.3




* [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
                   ` (8 preceding siblings ...)
  2023-08-11 15:08 ` [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files Fabiano Rosas
@ 2023-08-11 15:08 ` Fabiano Rosas
  2023-08-15 22:15   ` Peter Xu
  9 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-11 15:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras, Lukas Straub

We currently have a pattern for cleaning up a migration QEMUFile:

  qemu_mutex_lock(&s->qemu_file_lock);
  file = s->file_name;
  s->file_name = NULL;
  qemu_mutex_unlock(&s->qemu_file_lock);

  migration_ioc_unregister_yank_from_file(file);
  qemu_file_shutdown(file);
  qemu_fclose(file);

There are some considerations for this sequence:

- we must clear the pointer under the lock, to avoid TOC/TOU bugs;
- the shutdown() and close() expect to be given a non-null parameter;
- a close() in one thread should not race with a shutdown() in another;

Create a wrapper function to make sure everything works correctly.
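
A minimal standalone sketch of the pattern above, assuming plain pthreads
and stdio rather than QEMU's QEMUFile API. The names `struct state`,
`to_dst` and `file_release()` are hypothetical stand-ins for illustration
only; the shutdown() and yank steps are left out:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for MigrationState: one lock guards the pointer. */
struct state {
    pthread_mutex_t file_lock;
    FILE *to_dst;
};

/*
 * Detach *slot under the lock, then close the detached handle outside
 * the lock. Returns 1 if a file was released, 0 if the slot was empty.
 */
static int file_release(struct state *s, FILE **slot)
{
    FILE *tmp;

    assert(s && slot);

    /* Keep the critical section tiny: only the pointer swap. */
    pthread_mutex_lock(&s->file_lock);
    tmp = *slot;
    *slot = NULL;
    pthread_mutex_unlock(&s->file_lock);

    /* Another caller may have won the swap; then there is nothing to do. */
    if (!tmp) {
        return 0;
    }

    fclose(tmp);
    return 1;
}
```

Because the pointer is swapped to NULL under the lock, a second caller
racing on the same slot sees NULL and returns without touching the file,
which is the property the three considerations above ask for.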

Note: the return path previously did not call
      migration_ioc_unregister_yank_from_file(), but I added it
      nonetheless for uniformity.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 92 ++++++++++++-------------------------------
 util/yank.c           |  2 -
 2 files changed, 26 insertions(+), 68 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4df5ca25c1..3c33e4fae4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -217,6 +217,27 @@ MigrationIncomingState *migration_incoming_get_current(void)
     return current_incoming;
 }
 
+static void migration_file_release(QEMUFile **file)
+{
+    MigrationState *ms = migrate_get_current();
+    QEMUFile *tmp;
+
+    /*
+     * Reset the pointer before releasing it to avoid holding the lock
+     * for too long.
+     */
+    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+        tmp = *file;
+        *file = NULL;
+    }
+
+    if (tmp) {
+        migration_ioc_unregister_yank_from_file(tmp);
+        qemu_file_shutdown(tmp);
+        qemu_fclose(tmp);
+    }
+}
+
 void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
 {
     if (mis->socket_address_list) {
@@ -1155,8 +1176,6 @@ static void migrate_fd_cleanup(MigrationState *s)
     qemu_savevm_state_cleanup();
 
     if (s->to_dst_file) {
-        QEMUFile *tmp;
-
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
         if (s->migration_thread_running) {
@@ -1166,17 +1185,7 @@ static void migrate_fd_cleanup(MigrationState *s)
         qemu_mutex_lock_iothread();
 
         multifd_save_cleanup();
-        qemu_mutex_lock(&s->qemu_file_lock);
-        tmp = s->to_dst_file;
-        s->to_dst_file = NULL;
-        qemu_mutex_unlock(&s->qemu_file_lock);
-        /*
-         * Close the file handle without the lock to make sure the
-         * critical section won't block for long.
-         */
-        migration_ioc_unregister_yank_from_file(tmp);
-        qemu_file_shutdown(tmp);
-        qemu_fclose(tmp);
+        migration_file_release(&s->to_dst_file);
     }
 
     /*
@@ -1816,39 +1825,6 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
     return 0;
 }
 
-/*
- * Release ms->rp_state.from_dst_file (and postcopy_qemufile_src if
- * existed) in a safe way.
- */
-static void migration_release_dst_files(MigrationState *ms)
-{
-    QEMUFile *file;
-
-    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
-        /*
-         * Reset the from_dst_file pointer first before releasing it, as we
-         * can't block within lock section
-         */
-        file = ms->rp_state.from_dst_file;
-        ms->rp_state.from_dst_file = NULL;
-    }
-
-    /*
-     * Do the same to postcopy fast path socket too if there is.  No
-     * locking needed because this qemufile should only be managed by
-     * return path thread.
-     */
-    if (ms->postcopy_qemufile_src) {
-        migration_ioc_unregister_yank_from_file(ms->postcopy_qemufile_src);
-        qemu_file_shutdown(ms->postcopy_qemufile_src);
-        qemu_fclose(ms->postcopy_qemufile_src);
-        ms->postcopy_qemufile_src = NULL;
-    }
-
-    qemu_file_shutdown(file);
-    qemu_fclose(file);
-}
-
 /*
  * Handles messages sent on the return path towards the source VM
  *
@@ -2048,7 +2024,8 @@ static int await_return_path_close_on_source(MigrationState *ms)
     ret = ms->rp_state.error;
     ms->rp_state.error = false;
 
-    migration_release_dst_files(ms);
+    migration_file_release(&ms->rp_state.from_dst_file);
+    migration_file_release(&ms->postcopy_qemufile_src);
 
     trace_migration_return_path_end_after(ret);
     return ret;
@@ -2504,26 +2481,9 @@ static MigThrError postcopy_pause(MigrationState *s)
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     while (true) {
-        QEMUFile *file;
-
-        /*
-         * Current channel is possibly broken. Release it.  Note that this is
-         * guaranteed even without lock because to_dst_file should only be
-         * modified by the migration thread.  That also guarantees that the
-         * unregister of yank is safe too without the lock.  It should be safe
-         * even to be within the qemu_file_lock, but we didn't do that to avoid
-         * taking more mutex (yank_lock) within qemu_file_lock.  TL;DR: we make
-         * the qemu_file_lock critical section as small as possible.
-         */
+        /* Current channel is possibly broken. Release it. */
         assert(s->to_dst_file);
-        migration_ioc_unregister_yank_from_file(s->to_dst_file);
-        qemu_mutex_lock(&s->qemu_file_lock);
-        file = s->to_dst_file;
-        s->to_dst_file = NULL;
-        qemu_mutex_unlock(&s->qemu_file_lock);
-
-        qemu_file_shutdown(file);
-        qemu_fclose(file);
+        migration_file_release(&s->to_dst_file);
 
         /*
          * We're already pausing, so ignore any errors on the return
diff --git a/util/yank.c b/util/yank.c
index abf47c346d..4b6afbf589 100644
--- a/util/yank.c
+++ b/util/yank.c
@@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
             return;
         }
     }
-
-    abort();
 }
 
 void qmp_yank(YankInstanceList *instances,
-- 
2.35.3




* Re: [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors
  2023-08-11 15:08 ` [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors Fabiano Rosas
@ 2023-08-15 21:49   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:49 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:29PM -0300, Fabiano Rosas wrote:
> diff --git a/migration/migration.c b/migration/migration.c
> index 0067c927fa..85c171f32c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2057,11 +2057,10 @@ static int await_return_path_close_on_source(MigrationState *ms)
>       * need to cause it to exit. shutdown(2), if we have it, will
>       * cause it to unblock if it's stuck waiting for the destination.
>       */
> -    if (qemu_file_get_error(ms->to_dst_file)) {
> -        WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> -            if (ms->rp_state.from_dst_file) {
> -                qemu_file_shutdown(ms->rp_state.from_dst_file);
> -            }
> +    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
> +        if (ms->to_dst_file && ms->rp_state.from_dst_file &&
> +            qemu_file_get_error(ms->to_dst_file)) {
> +            qemu_file_shutdown(ms->rp_state.from_dst_file);
>          }
>      }

Squash into previous one?

-- 
Peter Xu




* Re: [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error
  2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
@ 2023-08-15 21:49   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:49 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:27PM -0300, Fabiano Rosas wrote:
> We don't need to set the rp_state.error right after a shutdown because
> qemu_file_shutdown() always sets the QEMUFile error, so the return
> path thread would have seen it and set the rp error itself.
> 
> Setting the error outside of the thread is also racy because the
> thread could clear it after we set it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file
  2023-08-11 15:08 ` [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
@ 2023-08-15 21:51   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:51 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:30PM -0300, Fabiano Rosas wrote:
> It's not safe to call qemu_file_shutdown() on the to_dst_file without
> first checking for the file's presence under the lock. The cleanup of
> this file happens at postcopy_pause() and migrate_fd_cleanup() which
> are not necessarily running in the same thread as migrate_fd_cancel().
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src
  2023-08-11 15:08 ` [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
@ 2023-08-15 21:56   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:56 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:31PM -0300, Fabiano Rosas wrote:
> This file is owned by the return path thread which is already doing
> cleanup.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 06/10] migration: Consolidate return path closing code
  2023-08-11 15:08 ` [PATCH v3 06/10] migration: Consolidate return path closing code Fabiano Rosas
@ 2023-08-15 21:57   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:57 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:32PM -0300, Fabiano Rosas wrote:
> We'll start calling the await_return_path_close_on_source() function
> from other parts of the code, so move all of the related checks and
> tracepoints into it.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 07/10] migration: Replace the return path retry logic
  2023-08-11 15:08 ` [PATCH v3 07/10] migration: Replace the return path retry logic Fabiano Rosas
@ 2023-08-15 21:58   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 21:58 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:33PM -0300, Fabiano Rosas wrote:
> Replace the return path retry logic with finishing and restarting the
> thread. This fixes a race when resuming the migration that leads to a
> segfault.
> 
> Currently when doing postcopy we consider that an IO error on the
> return path file could be due to a network intermittency. We then keep
> the thread alive but have it do cleanup of the 'from_dst_file' and
> wait on the 'postcopy_pause_rp' semaphore. When the user issues a
> migrate resume, a new return path is opened and the thread is allowed
> to continue.
> 
> There's a race condition in the above mechanism. It is possible for
> the new return path file to be set up *before* the cleanup code in the
> return path thread has had a chance to run, leading to the *new* file
> being closed and the pointer set to NULL. When the thread is released
> after the resume, it tries to dereference 'from_dst_file' and crashes:
> 
> Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
> 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
> 154         return f->last_error;
> 
> (gdb) bt
>  #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
>  #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
>  #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
>  #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
>  #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
>  #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Here's the race (important bit is open_return_path happening before
> migration_release_dst_files):
> 
> migration                 | qmp                         | return path
> --------------------------+-----------------------------+---------------------------------
> 			    qmp_migrate_pause()
> 			     shutdown(ms->to_dst_file)
> 			      f->last_error = -EIO
> migrate_detect_error()
>  postcopy_pause()
>   set_state(PAUSED)
>   wait(postcopy_pause_sem)
> 			    qmp_migrate(resume)
> 			    migrate_fd_connect()
> 			     resume = state == PAUSED
> 			     open_return_path <-- TOO SOON!
> 			     set_state(RECOVER)
> 			     post(postcopy_pause_sem)
> 							(incoming closes to_src_file)
> 							res = qemu_file_get_error(rp)
> 							migration_release_dst_files()
> 							ms->rp_state.from_dst_file = NULL
>   post(postcopy_pause_rp_sem)
> 							postcopy_pause_return_path_thread()
> 							  wait(postcopy_pause_rp_sem)
> 							rp = ms->rp_state.from_dst_file
> 							goto retry
> 							qemu_file_get_error(rp)
> 							SIGSEGV
> -------------------------------------------------------------------------------------------
> 
> We can keep the retry logic without having the thread alive and
> waiting. The only piece of data used by it is the 'from_dst_file' and
> it is only allowed to proceed after a migrate resume is issued and the
> semaphore released at migrate_fd_connect().
> 
> Move the retry logic to outside the thread by waiting for the thread
> to finish before pausing the migration.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 08/10] migration: Move return path cleanup to main migration thread
  2023-08-11 15:08 ` [PATCH v3 08/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
@ 2023-08-15 22:02   ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 22:02 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:34PM -0300, Fabiano Rosas wrote:
> Now that the return path thread is allowed to finish during a paused
> migration, we can move the cleanup of the QEMUFiles to the main
> migration thread.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files
  2023-08-11 15:08 ` [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files Fabiano Rosas
@ 2023-08-15 22:08   ` Peter Xu
  2023-08-15 22:19     ` Fabiano Rosas
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2023-08-15 22:08 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Fri, Aug 11, 2023 at 12:08:35PM -0300, Fabiano Rosas wrote:
> When doing cleanup, we currently close() some of the shared migration
> files and shutdown() + close() others. Be consistent by always calling
> shutdown() before close().
> 
> Do this only for the source files for now because the source runs
> multiple threads which could cause races between the two calls. Having
> them together allows us to move them to a centralized place under the
> protection of a lock in the next patch.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Logically I think we should only need shutdown() when we don't want to
close immediately, or can't for some reason..  Maybe instead of adding
shutdown()s, we can remove some?

> ---
>  migration/migration.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 7fec57ad7f..4df5ca25c1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1175,6 +1175,7 @@ static void migrate_fd_cleanup(MigrationState *s)
>           * critical section won't block for long.
>           */
>          migration_ioc_unregister_yank_from_file(tmp);
> +        qemu_file_shutdown(tmp);
>          qemu_fclose(tmp);
>      }
>  
> @@ -1844,6 +1845,7 @@ static void migration_release_dst_files(MigrationState *ms)
>          ms->postcopy_qemufile_src = NULL;
>      }
>  
> +    qemu_file_shutdown(file);
>      qemu_fclose(file);
>  }
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-11 15:08 ` [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
@ 2023-08-15 22:15   ` Peter Xu
  2023-08-15 22:31     ` Fabiano Rosas
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2023-08-15 22:15 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras, Lukas Straub

On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote:
> We currently have a pattern for cleaning up a migration QEMUFile:
> 
>   qemu_mutex_lock(&s->qemu_file_lock);
>   file = s->file_name;
>   s->file_name = NULL;
>   qemu_mutex_unlock(&s->qemu_file_lock);
> 
>   migration_ioc_unregister_yank_from_file(file);
>   qemu_file_shutdown(file);
>   qemu_fclose(file);
> 
> There are some considerations for this sequence:
> 
> - we must clear the pointer under the lock, to avoid TOC/TOU bugs;
> - the shutdown() and close() expect to be given a non-null parameter;
> - a close() in one thread should not race with a shutdown() in another;
> 
> Create a wrapper function to make sure everything works correctly.
> 
> Note: the return path previously did not call
>       migration_ioc_unregister_yank_from_file(), but I added it
>       nonetheless for uniformity.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

This definitely looks cleaner.  It can probably be squashed together with
the previous patch?  If you could double check whether we can just drop the
shutdown() wherever we close() right after it, it'll be even nicer (I hope I
didn't miss any real reason to do that explicitly).

> diff --git a/util/yank.c b/util/yank.c
> index abf47c346d..4b6afbf589 100644
> --- a/util/yank.c
> +++ b/util/yank.c
> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
>              return;
>          }
>      }
> -
> -    abort();

I think we can't silently do this.  This check is very strict and I guess
you removed it because you hit a crash.  What's the crash?  Can we just
pair the yank reg/unreg?

>  }
>  
>  void qmp_yank(YankInstanceList *instances,
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files
  2023-08-15 22:08   ` Peter Xu
@ 2023-08-15 22:19     ` Fabiano Rosas
  2023-08-15 22:34       ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-15 22:19 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

Peter Xu <peterx@redhat.com> writes:

> On Fri, Aug 11, 2023 at 12:08:35PM -0300, Fabiano Rosas wrote:
>> When doing cleanup, we currently close() some of the shared migration
>> files and shutdown() + close() others. Be consistent by always calling
>> shutdown() before close().
>> 
>> Do this only for the source files for now because the source runs
>> multiple threads which could cause races between the two calls. Having
>> them together allows us to move them to a centralized place under the
>> protection of a lock in the next patch.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Logically I think we should only need shutdown() when we don't want to
> close immediately, or can't for some reason..  Maybe instead of adding
> shutdown()s, we can remove some?

Wouldn't shutdown() affect what the other end of the socket sees? I
thought we used shutdown() before close() as a way to end the connection
in a cleaner manner.
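
A small sketch of how this question could be checked in isolation,
assuming an AF_UNIX stream socketpair on a POSIX system rather than a real
migration channel (TCP and TLS channels may behave differently, so this is
not conclusive for QEMU). The helper name `peer_read_after_teardown()` is
hypothetical:

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected AF_UNIX stream pair, tear down one end either with
 * shutdown() + close() or with close() alone, and return what read() on
 * the peer reports (0 means EOF). Returns -1 if the pair can't be created.
 */
static int peer_read_after_teardown(int use_shutdown)
{
    int sv[2];
    char buf[1];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    if (use_shutdown) {
        /* Disallow further sends and receives before closing. */
        shutdown(sv[0], SHUT_RDWR);
    }
    close(sv[0]);

    /* No data was ever written, so the peer can only see EOF or an error. */
    n = read(sv[1], buf, sizeof(buf));
    close(sv[1]);
    return (int)n;
}
```

In this setup the peer reads EOF in both cases, so the remote side cannot
tell the two teardown sequences apart. What the sketch does not cover is
the local effect of shutdown() on other threads blocked on the same
descriptor, which is the reason given elsewhere in this series for calling
it on the return path.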



* Re: [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-15 22:15   ` Peter Xu
@ 2023-08-15 22:31     ` Fabiano Rosas
  2023-08-16 14:26       ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-15 22:31 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras, Lukas Straub

Peter Xu <peterx@redhat.com> writes:

> On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote:
>> We currently have a pattern for cleaning up a migration QEMUFile:
>> 
>>   qemu_mutex_lock(&s->qemu_file_lock);
>>   file = s->file_name;
>>   s->file_name = NULL;
>>   qemu_mutex_unlock(&s->qemu_file_lock);
>> 
>>   migration_ioc_unregister_yank_from_file(file);
>>   qemu_file_shutdown(file);
>>   qemu_fclose(file);
>> 
>> There are some considerations for this sequence:
>> 
>> - we must clear the pointer under the lock, to avoid TOC/TOU bugs;
>> - the shutdown() and close() expect to be given a non-null parameter;
>> - a close() in one thread should not race with a shutdown() in another;
>> 
>> Create a wrapper function to make sure everything works correctly.
>> 
>> Note: the return path previously did not call
>>       migration_ioc_unregister_yank_from_file(), but I added it
>>       nonetheless for uniformity.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> This definitely looks cleaner.  Probably can be squashed together with
> previous patch?  If you could double check whether we can just drop the
> shutdown() all over the places when close() altogether, it'll be even
> nicer (I hope I didn't miss any real reasons to explicitly do that).
>
>> diff --git a/util/yank.c b/util/yank.c
>> index abf47c346d..4b6afbf589 100644
>> --- a/util/yank.c
>> +++ b/util/yank.c
>> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
>>              return;
>>          }
>>      }
>> -
>> -    abort();
>
> I think we can't silently do this.  This check is very strict and I guess
> you removed it because you hit a crash.  What's the crash?  Can we just
> pair the yank reg/unreg?
>

Well, the abort() is the crash. It just means that we looped and didn't
find the handler to unregister. It looks harmless to me. I should have
mentioned this in the commit message.

I could certainly add a yank handler to the rp_state.from_dst_file. But
then I have no idea what will happen if we try to yank the return path
at a random moment.

Side note: I see that yank does a qio_channel_shutdown() without the
controversial setting of -EIO. Which means it is probably susceptible to
the same race described in the qemu_file_shutdown() code.




* Re: [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files
  2023-08-15 22:19     ` Fabiano Rosas
@ 2023-08-15 22:34       ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-15 22:34 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Tue, Aug 15, 2023 at 07:19:43PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Aug 11, 2023 at 12:08:35PM -0300, Fabiano Rosas wrote:
> >> When doing cleanup, we currently close() some of the shared migration
> >> files and shutdown() + close() others. Be consistent by always calling
> >> shutdown() before close().
> >> 
> >> Do this only for the source files for now because the source runs
> >> multiple threads which could cause races between the two calls. Having
> >> them together allows us to move them to a centralized place under the
> >> protection of a lock in the next patch.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > Logically I think we should only need shutdown() when we don't want to
> > close immediately, or can't for some reason..  Maybe instead of adding
> > shutdown()s, we can remove some?
> 
> Wouldn't shutdown() affect what the other end of the socket sees? I
> thought we used shutdown() before close() as a way to end the connection
> in a cleaner manner.

Not something in my memory.  Would you try to avoid shutdown() for whatever
we'll close() immediately with next patch?  I'd expect no change, but I'm
happy to be corrected...

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-15 22:31     ` Fabiano Rosas
@ 2023-08-16 14:26       ` Peter Xu
  2023-08-16 14:57         ` Fabiano Rosas
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Xu @ 2023-08-16 14:26 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras, Lukas Straub

On Tue, Aug 15, 2023 at 07:31:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote:
> >> We currently have a pattern for cleaning up a migration QEMUFile:
> >> 
> >>   qemu_mutex_lock(&s->qemu_file_lock);
> >>   file = s->file_name;
> >>   s->file_name = NULL;
> >>   qemu_mutex_unlock(&s->qemu_file_lock);
> >> 
> >>   migration_ioc_unregister_yank_from_file(file);
> >>   qemu_file_shutdown(file);
> >>   qemu_fclose(file);
> >> 
> >> There are some considerations for this sequence:
> >> 
> >> - we must clear the pointer under the lock, to avoid TOC/TOU bugs;
> >> - the shutdown() and close() expect to be given a non-null parameter;
> >> - a close() in one thread should not race with a shutdown() in another;
> >> 
> >> Create a wrapper function to make sure everything works correctly.
> >> 
> >> Note: the return path previously did not call
> >>       migration_ioc_unregister_yank_from_file(), but I added it
> >>       nonetheless for uniformity.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > This definitely looks cleaner.  Probably can be squashed together with
> > previous patch?  If you could double check whether we can just drop the
> > shutdown() all over the places when close() altogether, it'll be even
> > nicer (I hope I didn't miss any real reasons to explicitly do that).
> >
> >> diff --git a/util/yank.c b/util/yank.c
> >> index abf47c346d..4b6afbf589 100644
> >> --- a/util/yank.c
> >> +++ b/util/yank.c
> >> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
> >>              return;
> >>          }
> >>      }
> >> -
> >> -    abort();
> >
> > I think we can't silently do this.  This check is very strict and I guess
> > you removed it because you hit a crash.  What's the crash?  Can we just
> > pair the yank reg/unreg?
> >
> 
> Well, the abort() is the crash. It just means that we looped and didn't
> find the handler to unregister. It looks harmless to me. I should have
> mentioned this in the commit message.

Yeah, trust me, I have wanted to remove that quite a few times. :) But then I
normally decided to try harder to find what was missing; and so far I have
indeed found that the cleanest way is to always pair the reg/unreg.

> 
> I could certainly add a yank handler to the rp_state.from_dst_file. But
> then I have no idea what will happen if we try to yank the return path
> at a random moment.

I think the idea was it should be registered always when the channel is
created, and then unregistered when the channel is destroyed.  They should
just pair, following the channel's lifecycle?
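
A rough sketch of that pairing, assuming a tiny fixed-size registry instead
of QEMU's yank instance lists. Every name here (`handler_register()`,
`handler_unregister()`, `struct channel`, `channel_open()`,
`channel_close()`) is hypothetical and only meant to show the lifecycle:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8

typedef void (*yank_fn)(void *opaque);

struct handler {
    yank_fn fn;
    void *opaque;
    int used;
};

static struct handler handlers[MAX_HANDLERS];

/* Returns 0 on success, -1 if the registry is full. */
static int handler_register(yank_fn fn, void *opaque)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (!handlers[i].used) {
            handlers[i] = (struct handler){ fn, opaque, 1 };
            return 0;
        }
    }
    return -1;
}

/*
 * Returns 0 if a matching entry was removed, -1 if none was found.
 * The -1 case is the unpaired unregister that a strict check rejects.
 */
static int handler_unregister(yank_fn fn, void *opaque)
{
    for (size_t i = 0; i < MAX_HANDLERS; i++) {
        if (handlers[i].used && handlers[i].fn == fn &&
            handlers[i].opaque == opaque) {
            handlers[i].used = 0;
            return 0;
        }
    }
    return -1;
}

struct channel {
    int fd;
};

static void channel_yank(void *opaque)
{
    struct channel *c = opaque;
    c->fd = -1; /* placeholder for shutting the channel down */
}

/* Registration happens exactly once, when the channel comes to life. */
static int channel_open(struct channel *c, int fd)
{
    c->fd = fd;
    return handler_register(channel_yank, c);
}

/* Unregistration happens exactly once, when the channel goes away. */
static int channel_close(struct channel *c)
{
    c->fd = -1;
    return handler_unregister(channel_yank, c);
}
```

With registration tied to open and unregistration tied to close, a second
close of the same channel is the only way to hit the "not found" branch,
which makes a strict check there a useful bug detector rather than noise.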

> 
> Side note: I see that yank does a qio_channel_shutdown() without the
> controversial setting of -EIO. Which means it is probably susceptible to
> the same race described in the qemu_file_shutdown() code.

Are you looking outside migration code (I saw nbd_teardown_connection()
does have one)?

For migration IIUC it's always via migration_ioc_unregister_yank().

-- 
Peter Xu




* Re: [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-16 14:26       ` Peter Xu
@ 2023-08-16 14:57         ` Fabiano Rosas
  2023-08-16 15:26           ` Peter Xu
  0 siblings, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-08-16 14:57 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras, Lukas Straub

Peter Xu <peterx@redhat.com> writes:

> On Tue, Aug 15, 2023 at 07:31:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Fri, Aug 11, 2023 at 12:08:36PM -0300, Fabiano Rosas wrote:
>> >> We currently have a pattern for cleaning up a migration QEMUFile:
>> >> 
>> >>   qemu_mutex_lock(&s->qemu_file_lock);
>> >>   file = s->file_name;
>> >>   s->file_name = NULL;
>> >>   qemu_mutex_unlock(&s->qemu_file_lock);
>> >> 
>> >>   migration_ioc_unregister_yank_from_file(file);
>> >>   qemu_file_shutdown(file);
>> >>   qemu_fclose(file);
>> >> 
>> >> There are some considerations for this sequence:
>> >> 
>> >> - we must clear the pointer under the lock, to avoid TOC/TOU bugs;
>> >> - the shutdown() and close() expect to be given a non-null parameter;
>> >> - a close() in one thread should not race with a shutdown() in another;
>> >> 
>> >> Create a wrapper function to make sure everything works correctly.
>> >> 
>> >> Note: the return path previously did not call
>> >>       migration_ioc_unregister_yank_from_file(), but I added it
>> >>       nonetheless for uniformity.
>> >> 
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >
>> > This definitely looks cleaner.  Probably can be squashed together with
>> > previous patch?  If you could double check whether we can just drop the
>> > shutdown() all over the places when close() altogether, it'll be even
>> > nicer (I hope I didn't miss any real reasons to explicitly do that).
>> >
>> >> diff --git a/util/yank.c b/util/yank.c
>> >> index abf47c346d..4b6afbf589 100644
>> >> --- a/util/yank.c
>> >> +++ b/util/yank.c
>> >> @@ -146,8 +146,6 @@ void yank_unregister_function(const YankInstance *instance,
>> >>              return;
>> >>          }
>> >>      }
>> >> -
>> >> -    abort();
>> >
>> > I think we can't silently do this.  This check is very strict and I guess
>> > you removed it because you hit a crash.  What's the crash?  Can we just
>> > pair the yank reg/unreg?
>> >
>> 
>> Well, the abort() is the crash. It just means that we looped and didn't
>> find the handler to unregister. It looks harmless to me. I should have
>> mentioned this in the commit message.
>
> Yeah, trust me I wanted to remove that quite a few times. :) But then I
> normally decided to try harder to find what's missing; and so far indeed I
> found that the cleanest way is to always pair the reg/unreg.
>
>> 
>> I could certainly add a yank handler to the rp_state.from_dst_file. But
>> then I have no idea what will happen if we try to yank the return path
>> at a random moment.
>
> I think the idea was it should be registered always when the channel is
> created, and then unregistered when the channel is destroyed.  They should
> just pair, alongside with the channel's lifecycle?
>
>> 
>> Side note: I see that yank does a qio_channel_shutdown() without the
>> controversial setting of -EIO. Which means it is probably susceptible to
>> the same race described in the qemu_file_shutdown() code.
>
> Are you looking outside migration code (I saw nbd_teardown_connection()
> does have one)?
>
> For migration IIUC it's always via migration_ioc_unregister_yank().

I'm talking about the actual yank action, not the unregister.

migration_yank_iochannel() calls qio_channel_shutdown() in the same way
as qemu_file_shutdown(), but unlike the latter, it doesn't set
f->last_error = -EIO. Which means that in theory, we could yank and
still try to use the QEMUFile.

In other words, what commit a555b8092a ("qemu-file: Don't do IO after
shutdown") did does not apply to yank because yank didn't exist at the
time.
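
To make the difference concrete, here is a minimal standalone sketch of
the two paths. Everything in it (ToyChannel, ToyFile and the toy_*
functions) is a hypothetical stand-in made up for illustration; it is
not the QEMU code and only models the "shutdown also records -EIO"
behaviour described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
typedef struct ToyChannel {
    int shut_down;              /* set once the channel has been shut down */
} ToyChannel;

typedef struct ToyFile {
    ToyChannel *ioc;            /* underlying channel */
    int last_error;             /* 0, or a negative errno value */
} ToyFile;

/* Channel-level shutdown: knows nothing about the file on top of it. */
static void toy_channel_shutdown(ToyChannel *c)
{
    assert(c != NULL);
    c->shut_down = 1;
}

/* File-level shutdown: shuts the channel AND records -EIO on the file. */
static void toy_file_shutdown(ToyFile *f)
{
    assert(f != NULL);
    toy_channel_shutdown(f->ioc);
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/* Yank-style action: only reaches the channel, the file is untouched. */
static void toy_yank(ToyChannel *c)
{
    toy_channel_shutdown(c);
}

/*
 * Returns 1 if the file layer would still pass IO down to the channel,
 * 0 if it would refuse up front because an error is already recorded.
 */
static int toy_file_would_do_io(const ToyFile *f)
{
    return f->last_error == 0;
}
```

After toy_file_shutdown() the file refuses further IO, while after
toy_yank() the channel is shut down but the file layer would still
attempt IO on it. That second case is the window I'm describing.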



* Re: [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files
  2023-08-16 14:57         ` Fabiano Rosas
@ 2023-08-16 15:26           ` Peter Xu
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Xu @ 2023-08-16 15:26 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras, Lukas Straub

On Wed, Aug 16, 2023 at 11:57:17AM -0300, Fabiano Rosas wrote:
> I'm talking about the actual yank action, not the unregister.
> 
> migration_yank_iochannel() calls qio_channel_shutdown() in the same way
> as qemu_file_shutdown(), but unlike the latter, it doesn't set
> f->last_error = -EIO. Which means that in theory, we could yank and
> still try to use the QEMUFile.
> 
> In other words, what commit a555b8092a ("qemu-file: Don't do IO after
> shutdown") did does not apply to yank because yank didn't exist at the
> time.

Ah ok..

Perhaps we should register yank on the QEMUFiles rather than the ioc for
migration?
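
A rough standalone sketch of what that could look like, only to show
the shape of the idea. All names here (ToyFile, ToyYankEntry and the
toy_* functions) are hypothetical and are not the real yank or QEMUFile
API; the registry is a fixed-size toy. The handler is registered with
the file as its opaque, so the yank action goes through the file-level
shutdown and records -EIO, and register/unregister are paired.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
typedef struct ToyFile {
    int channel_shut;           /* models the underlying channel state */
    int last_error;             /* 0, or a negative errno value */
} ToyFile;

typedef void (ToyYankFn)(void *opaque);

typedef struct ToyYankEntry {
    ToyYankFn *fn;              /* NULL means the slot is free */
    void *opaque;
} ToyYankEntry;

#define TOY_YANK_MAX 4
static ToyYankEntry toy_entries[TOY_YANK_MAX];

/* File-level shutdown: shuts the channel and records -EIO. */
static void toy_file_shutdown(ToyFile *f)
{
    assert(f != NULL);
    f->channel_shut = 1;
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/* Yank handler registered per file rather than per channel. */
static void toy_yank_file(void *opaque)
{
    toy_file_shutdown(opaque);
}

/* Returns 0 on success, -1 if the toy registry is full. */
static int toy_yank_register(ToyYankFn *fn, void *opaque)
{
    for (size_t i = 0; i < TOY_YANK_MAX; i++) {
        if (!toy_entries[i].fn) {
            toy_entries[i].fn = fn;
            toy_entries[i].opaque = opaque;
            return 0;
        }
    }
    return -1;
}

/* Returns 0 if a matching entry was removed, -1 if none was found. */
static int toy_yank_unregister(ToyYankFn *fn, void *opaque)
{
    for (size_t i = 0; i < TOY_YANK_MAX; i++) {
        if (toy_entries[i].fn == fn && toy_entries[i].opaque == opaque) {
            toy_entries[i].fn = NULL;
            toy_entries[i].opaque = NULL;
            return 0;
        }
    }
    return -1;
}

/* The yank action: run every registered handler. */
static void toy_yank_all(void)
{
    for (size_t i = 0; i < TOY_YANK_MAX; i++) {
        if (toy_entries[i].fn) {
            toy_entries[i].fn(toy_entries[i].opaque);
        }
    }
}
```

In this sketch an unpaired unregister reports -1 instead of aborting,
which is just a choice made for the toy; it is not a statement about
what util/yank.c should do.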

-- 
Peter Xu




