qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/2] Fix segfault on migration return path
@ 2023-08-02 14:36 Fabiano Rosas
  2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
  2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
  0 siblings, 2 replies; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-02 14:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang

For this version:

- moved the await into postcopy_pause() as Peter suggested;

- brought back the mark_source_rp_bad call. Turns out that piece of
code is filled with nuance. I just moved it aside since it doesn't
make sense during pause/resume. We can tackle that when we get the
chance.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/953420150
Also ran the switchover and preempt tests 1000 times each on
x86_64.

v1:
https://lore.kernel.org/r/20230728121516.16258-1-farosas@suse.de

The /x86_64/migration/postcopy/preempt/recovery/plain test is
sometimes failing due to a segmentation fault on the migration return
path. There is a race involving the retry logic of the return path and
the migration resume command.

The issue happens when the retry logic tries to clean up the current
return path file, but ends up cleaning the new one and trying to use
it right after. Tracing shows it clearly:

open_return_path_on_source  <-- at migration start
open_return_path_on_source_continue <-- rp thread created
postcopy_pause_incoming
postcopy_pause_fast_load
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. (incoming)
postcopy_pause_fault_thread
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused. (source)
postcopy_pause_incoming_continued
open_return_path_on_source   <-- NOK, too soon
postcopy_pause_continued
postcopy_pause_return_path   <-- too late, already operating on the new from_dst_file
postcopy_pause_return_path_continued <-- will continue and crash
postcopy_pause_incoming
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
postcopy_pause_incoming_continued

We could solve this by adding some form of synchronization to ensure
that we always do the cleanup before setting up the new file, but I
find it more straight-forward to move the retry logic outside of the
thread by letting it finish and starting a new thread when resuming
the migration.
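
In rough pseudocode (just a sketch; the names match the patches below
and error handling is omitted), the reworked flow is:

    /* migration thread, once a postcopy IO failure is detected */
    static MigThrError postcopy_pause(MigrationState *s)
    {
        ...
        /* let the return path thread finish instead of parking it */
        await_return_path_close_on_source(s);

        migrate_set_state(&s->state, s->state,
                          MIGRATION_STATUS_POSTCOPY_PAUSED);
        qemu_sem_wait(&s->postcopy_pause_sem);
        ...
    }

    /* 'migrate resume' goes through migrate_fd_connect(), which now
     * always opens a new return path and creates a fresh thread with
     * the updated from_dst_file */
    if (migrate_postcopy_ram() || migrate_return_path()) {
        open_return_path_on_source(s);
    }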

More details on the commit message.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/947875609

Fabiano Rosas (2):
  migration: Split await_return_path_close_on_source
  migration: Replace the return path retry logic

 migration/migration.c  | 110 ++++++++++++++++-------------------------
 migration/migration.h  |   1 -
 migration/trace-events |   1 +
 3 files changed, 44 insertions(+), 68 deletions(-)

-- 
2.35.3




* [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
@ 2023-08-02 14:36 ` Fabiano Rosas
  2023-08-02 16:19   ` Peter Xu
  2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
  1 sibling, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-02 14:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

This function currently has a straight-forward part which is waiting
for the thread to join and a complicated part which is doing a
qemu_file_shutdown() on the return path file.

The shutdown is tricky because all calls to qemu_file_shutdown() set
f->last_error to -EIO, which means we can never know if an error is an
actual error or if we cleanly shutdown the file previously.

This is particularly bothersome for postcopy because it would send the
return path thread into the retry routine which would wait on the
postcopy_pause_rp_sem and consequently block the main thread. We
haven't had reports of this so I must presume we never reach here with
postcopy.

The shutdown call is also racy: since it doesn't take the
qemu_file_lock, it could NULL-dereference if the return path thread
happens to be in the middle of the critical region at
migration_release_dst_files().

Move this more complicated part of the code to a separate routine so
we can wait on the thread without all of this baggage.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 46 +++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 91bba630a8..58f09275a8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState *ms,
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
+    if (!ms->rp_state.rp_thread_created) {
+        return 0;
+    }
+
+    trace_await_return_path_close_on_source_joining();
+    qemu_thread_join(&ms->rp_state.rp_thread);
+    ms->rp_state.rp_thread_created = false;
+    trace_await_return_path_close_on_source_close();
+    return ms->rp_state.error;
+}
+
+static int close_return_path_on_source(MigrationState *ms)
+{
+    int ret;
+
+    if (!ms->rp_state.rp_thread_created) {
+        return 0;
+    }
+
     /*
      * If this is a normal exit then the destination will send a SHUT and the
      * rp_thread will exit, however if there's an error we need to cause
@@ -2051,11 +2070,12 @@ static int await_return_path_close_on_source(MigrationState *ms)
         qemu_file_shutdown(ms->rp_state.from_dst_file);
         mark_source_rp_bad(ms);
     }
-    trace_await_return_path_close_on_source_joining();
-    qemu_thread_join(&ms->rp_state.rp_thread);
-    ms->rp_state.rp_thread_created = false;
-    trace_await_return_path_close_on_source_close();
-    return ms->rp_state.error;
+
+    trace_migration_return_path_end_before();
+    ret = await_return_path_close_on_source(ms);
+    trace_migration_return_path_end_after(ret);
+
+    return ret;
 }
 
 static inline void
@@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
         goto fail;
     }
 
-    /*
-     * If rp was opened we must clean up the thread before
-     * cleaning everything else up (since if there are no failures
-     * it will wait for the destination to send it's status in
-     * a SHUT command).
-     */
-    if (s->rp_state.rp_thread_created) {
-        int rp_error;
-        trace_migration_return_path_end_before();
-        rp_error = await_return_path_close_on_source(s);
-        trace_migration_return_path_end_after(rp_error);
-        if (rp_error) {
-            goto fail;
-        }
+    if (close_return_path_on_source(s)) {
+        goto fail;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
-- 
2.35.3




* [PATCH v2 2/2] migration: Replace the return path retry logic
  2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
  2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
@ 2023-08-02 14:36 ` Fabiano Rosas
  2023-08-02 16:02   ` Peter Xu
  1 sibling, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-02 14:36 UTC (permalink / raw)
  To: qemu-devel; +Cc: Juan Quintela, Peter Xu, Wei Wang, Leonardo Bras

Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
154         return f->last_error;

(gdb) bt
 #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
 #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
 #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
 #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
 #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
 #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                 | qmp                         | return path
--------------------------+-----------------------------+---------------------------------
			    qmp_migrate_pause()
			     shutdown(ms->to_dst_file)
			      f->last_error = -EIO
migrate_detect_error()
 postcopy_pause()
  set_state(PAUSED)
  wait(postcopy_pause_sem)
			    qmp_migrate(resume)
			    migrate_fd_connect()
			     resume = state == PAUSED
			     open_return_path <-- TOO SOON!
			     set_state(RECOVER)
			     post(postcopy_pause_sem)
							(incoming closes to_src_file)
							res = qemu_file_get_error(rp)
							migration_release_dst_files()
							ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)
							postcopy_pause_return_path_thread()
							  wait(postcopy_pause_rp_sem)
							rp = ms->rp_state.from_dst_file
							goto retry
							qemu_file_get_error(rp)
							SIGSEGV
-------------------------------------------------------------------------------------------

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by having
open_return_path_on_source() wait for the thread to finish before
creating a new one with the updated 'from_dst_file'.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c  | 64 +++++++++++-------------------------------
 migration/migration.h  |  1 -
 migration/trace-events |  1 +
 3 files changed, 17 insertions(+), 49 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 58f09275a8..1356269122 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1764,18 +1764,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
     }
 }
 
-/* Return true to retry, false to quit */
-static bool postcopy_pause_return_path_thread(MigrationState *s)
-{
-    trace_postcopy_pause_return_path();
-
-    qemu_sem_wait(&s->postcopy_pause_rp_sem);
-
-    trace_postcopy_pause_return_path_continued();
-
-    return true;
-}
-
 static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
 {
     RAMBlock *block = qemu_ram_block_by_name(block_name);
@@ -1859,7 +1847,6 @@ static void *source_return_path_thread(void *opaque)
     trace_source_return_path_thread_entry();
     rcu_register_thread();
 
-retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1981,28 +1968,18 @@ retry:
     }
 
 out:
-    res = qemu_file_get_error(rp);
-    if (res) {
-        if (res && migration_in_postcopy()) {
+    if (qemu_file_get_error(rp)) {
+        if (migration_in_postcopy()) {
             /*
-             * Maybe there is something we can do: it looks like a
-             * network down issue, and we pause for a recovery.
+             * This could be a network issue that would have been
+             * detected by the main migration thread and caused the
+             * migration to pause. Do cleanup and finish.
              */
-            migration_release_dst_files(ms);
-            rp = NULL;
-            if (postcopy_pause_return_path_thread(ms)) {
-                /*
-                 * Reload rp, reset the rest.  Referencing it is safe since
-                 * it's reset only by us above, or when migration completes
-                 */
-                rp = ms->rp_state.from_dst_file;
-                ms->rp_state.error = false;
-                goto retry;
-            }
+            ms->rp_state.error = false;
+        } else {
+            trace_source_return_path_thread_bad_end();
+            mark_source_rp_bad(ms);
         }
-
-        trace_source_return_path_thread_bad_end();
-        mark_source_rp_bad(ms);
     }
 
     trace_source_return_path_thread_end();
@@ -2011,8 +1988,7 @@ out:
     return NULL;
 }
 
-static int open_return_path_on_source(MigrationState *ms,
-                                      bool create_thread)
+static int open_return_path_on_source(MigrationState *ms)
 {
     ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
     if (!ms->rp_state.from_dst_file) {
@@ -2021,11 +1997,6 @@ static int open_return_path_on_source(MigrationState *ms,
 
     trace_open_return_path_on_source();
 
-    if (!create_thread) {
-        /* We're done */
-        return 0;
-    }
-
     qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
@@ -2549,6 +2520,11 @@ static MigThrError postcopy_pause(MigrationState *s)
         qemu_file_shutdown(file);
         qemu_fclose(file);
 
+        if (await_return_path_close_on_source(s)) {
+            trace_migration_return_path_pause_err();
+            return MIG_THR_ERR_FATAL;
+        }
+
         migrate_set_state(&s->state, s->state,
                           MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s)
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
 
-            /*
-             * Firstly, let's wake up the return path now, with a new
-             * return path channel.
-             */
-            qemu_sem_post(&s->postcopy_pause_rp_sem);
-
             /* Do the resume logic */
             if (postcopy_do_resume(s) == 0) {
                 /* Let's continue! */
@@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
      * QEMU uses the return path.
      */
     if (migrate_postcopy_ram() || migrate_return_path()) {
-        if (open_return_path_on_source(s, !resume)) {
+        if (open_return_path_on_source(s)) {
             error_report("Unable to open return-path for postcopy");
             migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
             migrate_fd_cleanup(s);
@@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
-    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
@@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj)
     migrate_params_init(&ms->parameters);
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
-    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
diff --git a/migration/migration.h b/migration/migration.h
index b7c8b67542..e78db5361c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -382,7 +382,6 @@ struct MigrationState {
 
     /* Needed by postcopy-pause state */
     QemuSemaphore postcopy_pause_sem;
-    QemuSemaphore postcopy_pause_rp_sem;
     /*
      * Whether we abort the migration if decompression errors are
      * detected at the destination. It is left at false for qemu
diff --git a/migration/trace-events b/migration/trace-events
index 5259c1044b..19ec649d1d 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms"
 migration_rate_limit_post(int urgent) "urgent: %d"
 migration_return_path_end_before(void) ""
 migration_return_path_end_after(int rp_error) "%d"
+migration_return_path_pause_err(void) ""
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.35.3




* Re: [PATCH v2 2/2] migration: Replace the return path retry logic
  2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
@ 2023-08-02 16:02   ` Peter Xu
  2023-08-02 20:04     ` Fabiano Rosas
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2023-08-02 16:02 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Wed, Aug 02, 2023 at 11:36:44AM -0300, Fabiano Rosas wrote:
> Replace the return path retry logic with finishing and restarting the
> thread. This fixes a race when resuming the migration that leads to a
> segfault.
> 
> Currently when doing postcopy we consider that an IO error on the
> return path file could be due to a network intermittency. We then keep
> the thread alive but have it do cleanup of the 'from_dst_file' and
> wait on the 'postcopy_pause_rp' semaphore. When the user issues a
> migrate resume, a new return path is opened and the thread is allowed
> to continue.
> 
> There's a race condition in the above mechanism. It is possible for
> the new return path file to be setup *before* the cleanup code in the
> return path thread has had a chance to run, leading to the *new* file
> being closed and the pointer set to NULL. When the thread is released
> after the resume, it tries to dereference 'from_dst_file' and crashes:
> 
> Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
> 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
> 154         return f->last_error;
> 
> (gdb) bt
>  #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
>  #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
>  #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
>  #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
>  #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
>  #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Here's the race (important bit is open_return_path happening before
> migration_release_dst_files):
> 
> migration                 | qmp                         | return path
> --------------------------+-----------------------------+---------------------------------
> 			    qmp_migrate_pause()
> 			     shutdown(ms->to_dst_file)
> 			      f->last_error = -EIO
> migrate_detect_error()
>  postcopy_pause()
>   set_state(PAUSED)
>   wait(postcopy_pause_sem)
> 			    qmp_migrate(resume)
> 			    migrate_fd_connect()
> 			     resume = state == PAUSED
> 			     open_return_path <-- TOO SOON!
> 			     set_state(RECOVER)
> 			     post(postcopy_pause_sem)
> 							(incoming closes to_src_file)
> 							res = qemu_file_get_error(rp)
> 							migration_release_dst_files()
> 							ms->rp_state.from_dst_file = NULL
>   post(postcopy_pause_rp_sem)
> 							postcopy_pause_return_path_thread()
> 							  wait(postcopy_pause_rp_sem)
> 							rp = ms->rp_state.from_dst_file
> 							goto retry
> 							qemu_file_get_error(rp)
> 							SIGSEGV
> -------------------------------------------------------------------------------------------
> 
> We can keep the retry logic without having the thread alive and
> waiting. The only piece of data used by it is the 'from_dst_file' and
> it is only allowed to proceed after a migrate resume is issued and the
> semaphore released at migrate_fd_connect().
> 
> Move the retry logic to outside the thread by having
> open_return_path_on_source() wait for the thread to finish before
> creating a new one with the updated 'from_dst_file'.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c  | 64 +++++++++++-------------------------------
>  migration/migration.h  |  1 -
>  migration/trace-events |  1 +
>  3 files changed, 17 insertions(+), 49 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 58f09275a8..1356269122 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1764,18 +1764,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>      }
>  }
>  
> -/* Return true to retry, false to quit */
> -static bool postcopy_pause_return_path_thread(MigrationState *s)
> -{
> -    trace_postcopy_pause_return_path();
> -
> -    qemu_sem_wait(&s->postcopy_pause_rp_sem);
> -
> -    trace_postcopy_pause_return_path_continued();
> -
> -    return true;
> -}
> -
>  static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
>  {
>      RAMBlock *block = qemu_ram_block_by_name(block_name);
> @@ -1859,7 +1847,6 @@ static void *source_return_path_thread(void *opaque)
>      trace_source_return_path_thread_entry();
>      rcu_register_thread();
>  
> -retry:
>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
>             migration_is_setup_or_active(ms->state)) {
>          trace_source_return_path_thread_loop_top();
> @@ -1981,28 +1968,18 @@ retry:
>      }
>  
>  out:
> -    res = qemu_file_get_error(rp);
> -    if (res) {
> -        if (res && migration_in_postcopy()) {
> +    if (qemu_file_get_error(rp)) {
> +        if (migration_in_postcopy()) {
>              /*
> -             * Maybe there is something we can do: it looks like a
> -             * network down issue, and we pause for a recovery.
> +             * This could be a network issue that would have been
> +             * detected by the main migration thread and caused the
> +             * migration to pause. Do cleanup and finish.
>               */
> -            migration_release_dst_files(ms);
> -            rp = NULL;
> -            if (postcopy_pause_return_path_thread(ms)) {
> -                /*
> -                 * Reload rp, reset the rest.  Referencing it is safe since
> -                 * it's reset only by us above, or when migration completes
> -                 */
> -                rp = ms->rp_state.from_dst_file;
> -                ms->rp_state.error = false;
> -                goto retry;
> -            }
> +            ms->rp_state.error = false;

Logically we should reflect an error here after the thread quit.  I think
you cleared it for the next resume, which also makes sense, but would it
be better to always reset it when creating the return path thread?

I noticed this because..

> +        } else {
> +            trace_source_return_path_thread_bad_end();
> +            mark_source_rp_bad(ms);
>          }
> -
> -        trace_source_return_path_thread_bad_end();
> -        mark_source_rp_bad(ms);
>      }
>  
>      trace_source_return_path_thread_end();
> @@ -2011,8 +1988,7 @@ out:
>      return NULL;
>  }
>  
> -static int open_return_path_on_source(MigrationState *ms,
> -                                      bool create_thread)
> +static int open_return_path_on_source(MigrationState *ms)
>  {
>      ms->rp_state.from_dst_file = qemu_file_get_return_path(ms->to_dst_file);
>      if (!ms->rp_state.from_dst_file) {
> @@ -2021,11 +1997,6 @@ static int open_return_path_on_source(MigrationState *ms,
>  
>      trace_open_return_path_on_source();
>  
> -    if (!create_thread) {
> -        /* We're done */
> -        return 0;
> -    }
> -
>      qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>                         source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>      ms->rp_state.rp_thread_created = true;
> @@ -2549,6 +2520,11 @@ static MigThrError postcopy_pause(MigrationState *s)
>          qemu_file_shutdown(file);
>          qemu_fclose(file);
>  
> +        if (await_return_path_close_on_source(s)) {
> +            trace_migration_return_path_pause_err();
> +            return MIG_THR_ERR_FATAL;
> +        }

I see that here on return path failures we'll bail out, which actually
goes against intuition (when we pause, the return path is expected to
have failed, so it's weird to expect it to return 0 here).

So how about above suggestion, plus here we just call
await_return_path_close_on_source(), without caring about the retval?
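
Something like below, maybe (untested, just to illustrate):

    /* postcopy_pause(): a failed return path is expected here, so
     * just wait for the thread to end and ignore its result */
    (void)await_return_path_close_on_source(s);

and then in open_return_path_on_source(), start every thread with a
clean error state:

    ms->rp_state.error = false;
    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
    ms->rp_state.rp_thread_created = true;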

> +
>          migrate_set_state(&s->state, s->state,
>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
>  
> @@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s)
>          if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
>              /* Woken up by a recover procedure. Give it a shot */
>  
> -            /*
> -             * Firstly, let's wake up the return path now, with a new
> -             * return path channel.
> -             */
> -            qemu_sem_post(&s->postcopy_pause_rp_sem);
> -
>              /* Do the resume logic */
>              if (postcopy_do_resume(s) == 0) {
>                  /* Let's continue! */
> @@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>       * QEMU uses the return path.
>       */
>      if (migrate_postcopy_ram() || migrate_return_path()) {
> -        if (open_return_path_on_source(s, !resume)) {
> +        if (open_return_path_on_source(s)) {
>              error_report("Unable to open return-path for postcopy");
>              migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>              migrate_fd_cleanup(s);
> @@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj)
>      qemu_sem_destroy(&ms->rate_limit_sem);
>      qemu_sem_destroy(&ms->pause_sem);
>      qemu_sem_destroy(&ms->postcopy_pause_sem);
> -    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
>      qemu_sem_destroy(&ms->rp_state.rp_sem);
>      qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
>      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
> @@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj)
>      migrate_params_init(&ms->parameters);
>  
>      qemu_sem_init(&ms->postcopy_pause_sem, 0);
> -    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
>      qemu_sem_init(&ms->rp_state.rp_sem, 0);
>      qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
>      qemu_sem_init(&ms->rate_limit_sem, 0);
> diff --git a/migration/migration.h b/migration/migration.h
> index b7c8b67542..e78db5361c 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -382,7 +382,6 @@ struct MigrationState {
>  
>      /* Needed by postcopy-pause state */
>      QemuSemaphore postcopy_pause_sem;
> -    QemuSemaphore postcopy_pause_rp_sem;
>      /*
>       * Whether we abort the migration if decompression errors are
>       * detected at the destination. It is left at false for qemu
> diff --git a/migration/trace-events b/migration/trace-events
> index 5259c1044b..19ec649d1d 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms"
>  migration_rate_limit_post(int urgent) "urgent: %d"
>  migration_return_path_end_before(void) ""
>  migration_return_path_end_after(int rp_error) "%d"
> +migration_return_path_pause_err(void) ""

If it should never trigger, it shouldn't need a tracepoint.  It needs an
assertion if we're 100% confident, or error_report_once() perhaps would be
more suitable.

Thanks,

>  migration_thread_after_loop(void) ""
>  migration_thread_file_err(void) ""
>  migration_thread_setup_complete(void) ""
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
@ 2023-08-02 16:19   ` Peter Xu
  2023-08-02 19:58     ` Fabiano Rosas
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2023-08-02 16:19 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> This function currently has a straight-forward part which is waiting
> for the thread to join and a complicated part which is doing a
> qemu_file_shutdown() on the return path file.
> 
> The shutdown is tricky because all calls to qemu_file_shutdown() set
> f->last_error to -EIO, which means we can never know if an error is an
> actual error or if we cleanly shutdown the file previously.
> 
> This is particularly bothersome for postcopy because it would send the
> return path thread into the retry routine which would wait on the
> postcopy_pause_rp_sem and consequently block the main thread. We
> haven't had reports of this so I must presume we never reach here with
> postcopy.
> 
> The shutdown call is also racy: since it doesn't take the
> qemu_file_lock, it could NULL-dereference if the return path thread
> happens to be in the middle of the critical region at
> migration_release_dst_files().

After you rework the thread model on resume, shall we move
migration_release_dst_files() into the migration thread to be after the
pthread_join()?  I assume then we don't even need a mutex to protect it?
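
I mean something like this (a sketch only, on top of your patch 1):

    static int await_return_path_close_on_source(MigrationState *ms)
    {
        if (!ms->rp_state.rp_thread_created) {
            return 0;
        }

        qemu_thread_join(&ms->rp_state.rp_thread);
        ms->rp_state.rp_thread_created = false;

        /* the return path thread is gone at this point, so nothing
         * else can race with us on the dst file handles */
        migration_release_dst_files(ms);

        return ms->rp_state.error;
    }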

> 
> Move this more complicated part of the code to a separate routine so
> we can wait on the thread without all of this baggage.

I think you mentioned "some nuance" on having mark_source_rp_bad() in
await_return_path_close_on_source(), I did remember I tried to look into
that "nuance" too a long time ago but I just forgot what was that.  Great
if you can share some details.

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 46 +++++++++++++++++++++++++------------------
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..58f09275a8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState *ms,
>  /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
>  static int await_return_path_close_on_source(MigrationState *ms)
>  {
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }
> +
> +    trace_await_return_path_close_on_source_joining();
> +    qemu_thread_join(&ms->rp_state.rp_thread);
> +    ms->rp_state.rp_thread_created = false;
> +    trace_await_return_path_close_on_source_close();
> +    return ms->rp_state.error;
> +}
> +
> +static int close_return_path_on_source(MigrationState *ms)
> +{
> +    int ret;
> +
> +    if (!ms->rp_state.rp_thread_created) {
> +        return 0;
> +    }

Can we still rely on the await_return_path_close_on_source() check, so as
to dedup this one?

> +
>      /*
>       * If this is a normal exit then the destination will send a SHUT and the
>       * rp_thread will exit, however if there's an error we need to cause
> @@ -2051,11 +2070,12 @@ static int await_return_path_close_on_source(MigrationState *ms)
>          qemu_file_shutdown(ms->rp_state.from_dst_file);
>          mark_source_rp_bad(ms);
>      }
> -    trace_await_return_path_close_on_source_joining();
> -    qemu_thread_join(&ms->rp_state.rp_thread);
> -    ms->rp_state.rp_thread_created = false;
> -    trace_await_return_path_close_on_source_close();
> -    return ms->rp_state.error;
> +
> +    trace_migration_return_path_end_before();
> +    ret = await_return_path_close_on_source(ms);
> +    trace_migration_return_path_end_after(ret);
> +
> +    return ret;
>  }
>  
>  static inline void
> @@ -2351,20 +2371,8 @@ static void migration_completion(MigrationState *s)
>          goto fail;
>      }
>  
> -    /*
> -     * If rp was opened we must clean up the thread before
> -     * cleaning everything else up (since if there are no failures
> -     * it will wait for the destination to send it's status in
> -     * a SHUT command).
> -     */
> -    if (s->rp_state.rp_thread_created) {
> -        int rp_error;
> -        trace_migration_return_path_end_before();
> -        rp_error = await_return_path_close_on_source(s);
> -        trace_migration_return_path_end_after(rp_error);
> -        if (rp_error) {
> -            goto fail;
> -        }
> +    if (close_return_path_on_source(s)) {
> +        goto fail;
>      }
>  
>      if (qemu_file_get_error(s->to_dst_file)) {
> -- 
> 2.35.3
> 

-- 
Peter Xu




* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-02 16:19   ` Peter Xu
@ 2023-08-02 19:58     ` Fabiano Rosas
  2023-08-02 20:40       ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-02 19:58 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

Peter Xu <peterx@redhat.com> writes:

> On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
>> This function currently has a straight-forward part which is waiting
>> for the thread to join and a complicated part which is doing a
>> qemu_file_shutdown() on the return path file.
>> 
>> The shutdown is tricky because all calls to qemu_file_shutdown() set
>> f->last_error to -EIO, which means we can never know if an error is an
>> actual error or if we cleanly shutdown the file previously.
>> 
>> This is particularly bothersome for postcopy because it would send the
>> return path thread into the retry routine which would wait on the
>> postcopy_pause_rp_sem and consequently block the main thread. We
>> haven't had reports of this so I must presume we never reach here with
>> postcopy.
>> 
>> The shutdown call is also racy: since it doesn't take the
>> qemu_file_lock, it could NULL-dereference if the return path thread
>> happens to be in the middle of the critical region at
>> migration_release_dst_files().
>
> After you rework the thread model on resume, shall we move
> migration_release_dst_files() into the migration thread to be after the
> pthread_join()?  I assume then we don't even need a mutex to protect it?
>

I just need to figure out if it's ok to move the postcopy_qemufile_src
cleanup along. No idea why it is there in the first place. I see you
moved it from postcopy_pause and we're about to move it back to the
exact same place =D

>> 
>> Move this more complicated part of the code to a separate routine so
>> we can wait on the thread without all of this baggage.
>
> I think you mentioned "some nuance" on having mark_source_rp_bad() in
> await_return_path_close_on_source(), I did remember I tried to look into
> that "nuance" too a long time ago but I just forgot what was that.  Great
> if you can share some details.
>

Well, mark_source_rp_bad() at await_return_path_close_on_source() is
basically useless:

- We only call mark_source_rp_bad() if s->to_dst_file has an error and the
  migration_completion() already checks that condition and fails the
  migration anyway.

- If to_dst_file has an error, chances are the destination already did
  cleanup by this point, so from_dst_file would already have an errno,
  due to that. At qemu_fill_buffer(), the len == 0 case basically means
  "the other end finished cleanly". We still set -EIO in that case, I
  don't know why. Possibly because not all backends will have the same
  semantics for len == 0.

- If the above doesn't happen, then due to the shutdown, from_dst_file
  will already have an error again set by qemu_file_shutdown().

Not to mention that mark_source_rp_bad() is in a race with the return
path thread which could clear the error during postcopy retry.

As this patch tries to convey, this whole shutdown routine is weird. We
don't have any documentation explaining when it could happen, so we're
left with wanting to call it always. Except that doesn't work because in
postcopy we'd trigger the retry logic and that hangs, and because of the
"shutdown means -EIO" issue we'd be eating up whatever error happened
before (it's all -EIO, so there's no way to tell them apart).

Given all of that, I thought just moving this aside would be better for
the time being than to try to rationalise all of this. This series fixes
a reproducible bug while everything I said above is just code inspection
and some artificial testing of mine.

>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/migration.c | 46 +++++++++++++++++++++++++------------------
>>  1 file changed, 27 insertions(+), 19 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 91bba630a8..58f09275a8 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2038,6 +2038,25 @@ static int open_return_path_on_source(MigrationState *ms,
>>  /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
>>  static int await_return_path_close_on_source(MigrationState *ms)
>>  {
>> +    if (!ms->rp_state.rp_thread_created) {
>> +        return 0;
>> +    }
>> +
>> +    trace_await_return_path_close_on_source_joining();
>> +    qemu_thread_join(&ms->rp_state.rp_thread);
>> +    ms->rp_state.rp_thread_created = false;
>> +    trace_await_return_path_close_on_source_close();
>> +    return ms->rp_state.error;
>> +}
>> +
>> +static int close_return_path_on_source(MigrationState *ms)
>> +{
>> +    int ret;
>> +
>> +    if (!ms->rp_state.rp_thread_created) {
>> +        return 0;
>> +    }
>
> Can we still rely on the await_return_path_close_on_source() check, so as
> to dedup this one?
>

I guess we could, the tracepoints would be off though. We'd print

    trace_migration_return_path_end_before();
    trace_migration_return_path_end_after();

with "nothing" in between. I'll try to think of something.



* Re: [PATCH v2 2/2] migration: Replace the return path retry logic
  2023-08-02 16:02   ` Peter Xu
@ 2023-08-02 20:04     ` Fabiano Rosas
  2023-08-02 20:44       ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-02 20:04 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

Peter Xu <peterx@redhat.com> writes:

>> +        if (await_return_path_close_on_source(s)) {
>> +            trace_migration_return_path_pause_err();
>> +            return MIG_THR_ERR_FATAL;
>> +        }
>
> I see that here on return path failures we'll bail out, which actually
> goes against intuition (when we pause, the return path is expected to
> have failed, so it's weird to expect it to return 0 here).
>
> So how about above suggestion, plus here we just call
> await_return_path_close_on_source(), without caring about the retval?

So you are suggesting to remove the knowledge of the retry entirely from
the thread. It just reports the error and the postcopy_pause takes the
responsibility of ignoring it when we want to retry... It could be
clearer that way indeed.

>> +
>>          migrate_set_state(&s->state, s->state,
>>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
>>  
>> @@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s)
>>          if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
>>              /* Woken up by a recover procedure. Give it a shot */
>>  
>> -            /*
>> -             * Firstly, let's wake up the return path now, with a new
>> -             * return path channel.
>> -             */
>> -            qemu_sem_post(&s->postcopy_pause_rp_sem);
>> -
>>              /* Do the resume logic */
>>              if (postcopy_do_resume(s) == 0) {
>>                  /* Let's continue! */
>> @@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>       * QEMU uses the return path.
>>       */
>>      if (migrate_postcopy_ram() || migrate_return_path()) {
>> -        if (open_return_path_on_source(s, !resume)) {
>> +        if (open_return_path_on_source(s)) {
>>              error_report("Unable to open return-path for postcopy");
>>              migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>              migrate_fd_cleanup(s);
>> @@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj)
>>      qemu_sem_destroy(&ms->rate_limit_sem);
>>      qemu_sem_destroy(&ms->pause_sem);
>>      qemu_sem_destroy(&ms->postcopy_pause_sem);
>> -    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
>>      qemu_sem_destroy(&ms->rp_state.rp_sem);
>>      qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
>>      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
>> @@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj)
>>      migrate_params_init(&ms->parameters);
>>  
>>      qemu_sem_init(&ms->postcopy_pause_sem, 0);
>> -    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
>>      qemu_sem_init(&ms->rp_state.rp_sem, 0);
>>      qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
>>      qemu_sem_init(&ms->rate_limit_sem, 0);
>> diff --git a/migration/migration.h b/migration/migration.h
>> index b7c8b67542..e78db5361c 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -382,7 +382,6 @@ struct MigrationState {
>>  
>>      /* Needed by postcopy-pause state */
>>      QemuSemaphore postcopy_pause_sem;
>> -    QemuSemaphore postcopy_pause_rp_sem;
>>      /*
>>       * Whether we abort the migration if decompression errors are
>>       * detected at the destination. It is left at false for qemu
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 5259c1044b..19ec649d1d 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms"
>>  migration_rate_limit_post(int urgent) "urgent: %d"
>>  migration_return_path_end_before(void) ""
>>  migration_return_path_end_after(int rp_error) "%d"
>> +migration_return_path_pause_err(void) ""
>
> If it should never trigger, it shouldn't need a tracepoint.  It needs an
> assertion if we're 100% confident, or error_report_once() perhaps would be
> more suitable.

It would trigger when a rp error happened that wasn't related to the
QEMUFile. If we go with your suggestion above, then this goes away.




* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-02 19:58     ` Fabiano Rosas
@ 2023-08-02 20:40       ` Peter Xu
  2023-08-03 14:45         ` Fabiano Rosas
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2023-08-02 20:40 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> >> This function currently has a straight-forward part which is waiting
> >> for the thread to join and a complicated part which is doing a
> >> qemu_file_shutdown() on the return path file.
> >> 
> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> >> f->last_error to -EIO, which means we can never know if an error is an
> >> actual error or if we cleanly shutdown the file previously.
> >> 
> >> This is particularly bothersome for postcopy because it would send the
> >> return path thread into the retry routine which would wait on the
> >> postcopy_pause_rp_sem and consequently block the main thread. We
> >> haven't had reports of this so I must presume we never reach here with
> >> postcopy.
> >> 
> >> The shutdown call is also racy: since it doesn't take the
> >> qemu_file_lock, it could NULL-dereference if the return path thread
> >> happens to be in the middle of the critical region at
> >> migration_release_dst_files().
> >
> > After you rework the thread model on resume, shall we move
> > migration_release_dst_files() into the migration thread to be after the
> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> >
> 
> I just need to figure out if it's ok to move the postcopy_qemufile_src
> cleanup along. No idea why it is there in the first place. I see you
> moved it from postcopy_pause and we're about to move it back to the
> exact same place =D

It was there because the old postcopy-preempt was sending data via
postcopy_qemufile_src from the migration thread, while postcopy_pause is
also the migration thread context.

Then we had 9358982744 ("migration: Send requested page directly in
rp-return thread") where we moved that "send page" operation into the
return path thread to reduce latencies.  After moving there it also means
the file handle can be accessed in >1 threads, so I just moved it over to
operate that always in the return path thread, then no race should happen.

With your change, return path will vanish before migration thread accesses
it later (so as mentioned above, it must be after pthread_join()
succeeded), then I assume it'll be fine too to have it back in migration
thread.

Or perhaps just take the file lock?

> 
> >> 
> >> Move this more complicated part of the code to a separate routine so
> >> we can wait on the thread without all of this baggage.
> >
> > I think you mentioned "some nuance" on having mark_source_rp_bad() in
> > await_return_path_close_on_source(), I did remember I tried to look into
> > that "nuance" too a long time ago but I just forgot what was that.  Great
> > if you can share some details.
> >
> 
> Well, mark_source_rp_bad() at await_return_path_close_on_source() is
> basically useless:
> 
> - We only call mark_source_rp_bad() if s->to_dst_file has an error and the
>   migration_completion() already checks that condition and fails the
>   migration anyway.
> 
> - If to_dst_file has an error, chances are the destination already did
>   cleanup by this point, so from_dst_file would already have an errno,
>   due to that. At qemu_fill_buffer(), the len == 0 case basically means
>   "the other end finished cleanly". We still set -EIO in that case, I
>   don't know why. Possibly because not all backends will have the same
>   semantics for len == 0.

I don't know either, but I checked it's from a555b8092a ("qemu-file: Don't
do IO after shutdown").  Maybe there's better way to do so we could
identify the difference, but yes can be for later.

> 
> - If the above doesn't happen, then due to the shutdown, from_dst_file
>   will already have an error again set by qemu_file_shutdown().
> 
> Not to mention that mark_source_rp_bad() is in a race with the return
> path thread which could clear the error during postcopy retry.
> 
> As this patch tries to convey, this whole shutdown routine is weird. We
> don't have any documentation explaining when it could happen, so we're
> left with wanting to call it always. Except that doesn't work because in
> postcopy we'd trigger the retry logic and that hangs, and because of the
> "shutdown means -EIO" issue we'd be eating up whatever error happened
> before (it's all -EIO, so there's no way to tell them apart).
> 
> Given all of that, I thought just moving this aside would be better for
> the time being than to try to rationalise all of this. This series fixes
> a reproducible bug while everything I said above is just code inspection
> and some artificial testing of mine.

Yeah we can leave that for later.  Actually my other series removed that (I
totally forgot it, until I just noticed and checked).  But that'll conflict
with yours.  I think this series should go in earlier if it fixes a possible
race, so I'll just rebase when it lands.

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 2/2] migration: Replace the return path retry logic
  2023-08-02 20:04     ` Fabiano Rosas
@ 2023-08-02 20:44       ` Peter Xu
  2023-08-03 15:00         ` Fabiano Rosas
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2023-08-02 20:44 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Wed, Aug 02, 2023 at 05:04:45PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> >> +        if (await_return_path_close_on_source(s)) {
> >> +            trace_migration_return_path_pause_err();
> >> +            return MIG_THR_ERR_FATAL;
> >> +        }
> >
> > I see that here on return path failures we'll bail out, which actually
> > goes against intuition (when we pause, the return path is expected to
> > have failed, so it's weird to expect it to return 0 here).
> >
> > So how about above suggestion, plus here we just call
> > await_return_path_close_on_source(), without caring about the retval?
> 
> So you are suggesting to remove the knowledge of the retry entirely from
> the thread. It just reports the error and the postcopy_pause takes the
> responsibility of ignoring it when we want to retry... It could be
> clearer that way indeed.

That error isn't really important here IMHO, because the to-dst-file
should have already errored out anyway.

I just think it's cleaner if we reset rp_error only once the new thread
is created.

> 
> >> +
> >>          migrate_set_state(&s->state, s->state,
> >>                            MIGRATION_STATUS_POSTCOPY_PAUSED);
> >>  
> >> @@ -2566,12 +2542,6 @@ static MigThrError postcopy_pause(MigrationState *s)
> >>          if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
> >>              /* Woken up by a recover procedure. Give it a shot */
> >>  
> >> -            /*
> >> -             * Firstly, let's wake up the return path now, with a new
> >> -             * return path channel.
> >> -             */
> >> -            qemu_sem_post(&s->postcopy_pause_rp_sem);
> >> -
> >>              /* Do the resume logic */
> >>              if (postcopy_do_resume(s) == 0) {
> >>                  /* Let's continue! */
> >> @@ -3259,7 +3229,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
> >>       * QEMU uses the return path.
> >>       */
> >>      if (migrate_postcopy_ram() || migrate_return_path()) {
> >> -        if (open_return_path_on_source(s, !resume)) {
> >> +        if (open_return_path_on_source(s)) {
> >>              error_report("Unable to open return-path for postcopy");
> >>              migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
> >>              migrate_fd_cleanup(s);
> >> @@ -3320,7 +3290,6 @@ static void migration_instance_finalize(Object *obj)
> >>      qemu_sem_destroy(&ms->rate_limit_sem);
> >>      qemu_sem_destroy(&ms->pause_sem);
> >>      qemu_sem_destroy(&ms->postcopy_pause_sem);
> >> -    qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
> >>      qemu_sem_destroy(&ms->rp_state.rp_sem);
> >>      qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
> >>      qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
> >> @@ -3340,7 +3309,6 @@ static void migration_instance_init(Object *obj)
> >>      migrate_params_init(&ms->parameters);
> >>  
> >>      qemu_sem_init(&ms->postcopy_pause_sem, 0);
> >> -    qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
> >>      qemu_sem_init(&ms->rp_state.rp_sem, 0);
> >>      qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
> >>      qemu_sem_init(&ms->rate_limit_sem, 0);
> >> diff --git a/migration/migration.h b/migration/migration.h
> >> index b7c8b67542..e78db5361c 100644
> >> --- a/migration/migration.h
> >> +++ b/migration/migration.h
> >> @@ -382,7 +382,6 @@ struct MigrationState {
> >>  
> >>      /* Needed by postcopy-pause state */
> >>      QemuSemaphore postcopy_pause_sem;
> >> -    QemuSemaphore postcopy_pause_rp_sem;
> >>      /*
> >>       * Whether we abort the migration if decompression errors are
> >>       * detected at the destination. It is left at false for qemu
> >> diff --git a/migration/trace-events b/migration/trace-events
> >> index 5259c1044b..19ec649d1d 100644
> >> --- a/migration/trace-events
> >> +++ b/migration/trace-events
> >> @@ -164,6 +164,7 @@ migration_rate_limit_pre(int ms) "%d ms"
> >>  migration_rate_limit_post(int urgent) "urgent: %d"
> >>  migration_return_path_end_before(void) ""
> >>  migration_return_path_end_after(int rp_error) "%d"
> >> +migration_return_path_pause_err(void) ""
> >
> > If it should never trigger, it shouldn't need a tracepoint.  It needs an
> > assertion if we're 100% confident, or error_report_once() perhaps would be
> > more suitable.
> 
> It would trigger when a rp error happened that wasn't related to the
> QEMUFile. If we go with your suggestion above, then this goes away.

With your current patch, rp_error seems to always be reset when the
thread quits; if that's true then this will never trigger.

But yeah this is a trivial spot, feel free to choose the best if you plan
to reorganize this patch a bit.  Thanks.

-- 
Peter Xu




* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-02 20:40       ` Peter Xu
@ 2023-08-03 14:45         ` Fabiano Rosas
  2023-08-03 15:15           ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-03 14:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

Peter Xu <peterx@redhat.com> writes:

> On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
>> >> This function currently has a straight-forward part which is waiting
>> >> for the thread to join and a complicated part which is doing a
>> >> qemu_file_shutdown() on the return path file.
>> >> 
>> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
>> >> f->last_error to -EIO, which means we can never know if an error is an
>> >> actual error or if we cleanly shutdown the file previously.
>> >> 
>> >> This is particularly bothersome for postcopy because it would send the
>> >> return path thread into the retry routine which would wait on the
>> >> postcopy_pause_rp_sem and consequently block the main thread. We
>> >> haven't had reports of this so I must presume we never reach here with
>> >> postcopy.
>> >> 
>> >> The shutdown call is also racy: since it doesn't take the
>> >> qemu_file_lock, it could NULL-dereference if the return path thread
>> >> happens to be in the middle of the critical region at
>> >> migration_release_dst_files().
>> >
>> > After you rework the thread model on resume, shall we move
>> > migration_release_dst_files() into the migration thread to be after the
>> > pthread_join()?  I assume then we don't even need a mutex to protect it?
>> >
>> 
>> I just need to figure out if it's ok to move the postcopy_qemufile_src
>> cleanup along. No idea why it is there in the first place. I see you
>> moved it from postcopy_pause and we're about to move it back to the
>> exact same place =D
>
> It was there because the old postcopy-preempt was sending data via
> postcopy_qemufile_src from the migration thread, while postcopy_pause is
> also the migration thread context.
>
> Then we had 9358982744 ("migration: Send requested page directly in
> rp-return thread") where we moved that "send page" operation into the
> return path thread to reduce latencies.  After moving there it also means
> the file handle can be accessed in >1 threads, so I just moved it over to
> operate that always in the return path thread, then no race should happen.
>

Thanks for the context.

> With your change, return path will vanish before migration thread accesses
> it later (so as mentioned above, it must be after pthread_join()
> succeeded), then I assume it'll be fine too to have it back in migration
> thread.
>
> Or perhaps just take the file lock?
>

There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
these files. We might need to lock anyway, let's see.

In general I'd like to drop all of these "ok not to lock, because...",
it's too easy for code to change and the assumptions to stop being
true. IMHO it's not worth it to gain performance by not taking a lock
when the data is still shared and there's nothing stopping someone in
the future from accessing it concurrently.
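
Concretely, I'd rather we always do the pointer swap under the lock,
along these lines (just a sketch reusing the existing qemu_file_lock and
postcopy_qemufile_src fields, not the actual patch):

    QEMUFile *file;

    WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
        file = s->postcopy_qemufile_src;
        s->postcopy_qemufile_src = NULL;
    }
    /* closing may block, so do it outside the lock */
    if (file) {
        qemu_fclose(file);
    }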

>> 
>> >> 
>> >> Move this more complicated part of the code to a separate routine so
>> >> we can wait on the thread without all of this baggage.
>> >
>> > I think you mentioned "some nuance" on having mark_source_rp_bad() in
>> > await_return_path_close_on_source(), I did remember I tried to look into
>> > that "nuance" too a long time ago but I just forgot what was that.  Great
>> > if you can share some details.
>> >
>> 
>> Well, mark_source_rp_bad() at await_return_path_close_on_source() is
>> basically useless:
>> 
>> - We only call mark_source_rp_bad() if s->to_dst_file has an error and the
>>   migration_completion() already checks that condition and fails the
>>   migration anyway.
>> 
>> - If to_dst_file has an error, chances are the destination already did
>>   cleanup by this point, so from_dst_file would already have an errno,
>>   due to that. At qemu_fill_buffer(), the len == 0 case basically means
>>   "the other end finished cleanly". We still set -EIO in that case, I
>>   don't know why. Possibly because not all backends will have the same
>>   semantics for len == 0.
>
> I don't know either, but I checked it's from a555b8092a ("qemu-file: Don't
> do IO after shutdown").  Maybe there's better way to do so we could
> identify the difference, but yes can be for later.
>

That is not the -EIO I'm talking about, but I'm glad you mentioned it,
because I just noticed we might need to revert ac7d25b816 ("qemu-file:
remove shutdown member").

It was meant as a purely logical change, dropping the f->shutdown flag,
but it has the side effect that we can no longer differentiate an orderly
shutdown from an IO error.

I get that, ideally, all of the code would be resilient enough to handle
a shutdown in the same way as an IO error, but I'm not sure we are
prepared for that.
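
If we do bring the flag back, the return path could then tell the two
cases apart with something like this (sketch only; qemu_file_is_shutdown()
is a hypothetical accessor for the revived flag, the rest are the existing
helpers):

    if (qemu_file_get_error(rp)) {
        if (qemu_file_is_shutdown(rp)) {
            /* deliberate teardown from our own side, not a failure */
        } else {
            /* a real IO error on the return path */
            mark_source_rp_bad(ms);
        }
    }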


Now, the actual -EIO I was mentioning is this one in qemu-file.c:

static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
{
    ...
    ...
    do {
        len = qio_channel_read(f->ioc,
                               (char *)f->buf + pending,
                               IO_BUF_SIZE - pending,
                               &local_error);
        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                qio_channel_yield(f->ioc, G_IO_IN);
            } else {
                qio_channel_wait(f->ioc, G_IO_IN);
            }
        } else if (len < 0) {
            len = -EIO;
        }
    } while (len == QIO_CHANNEL_ERR_BLOCK);

    if (len > 0) {
        f->buf_size += len;
        f->total_transferred += len;
    } else if (len == 0) {
        qemu_file_set_error_obj(f, -EIO, local_error); <---- THIS
    } else {
        qemu_file_set_error_obj(f, len, local_error);
    }

    return len;
}

the recvmsg manual says:

  "When a stream socket peer has performed an orderly shutdown, the return
  value will be 0 (the traditional "end-of-file" return)."

So both issues combined put us in a situation where there's never an
"orderly shutdown", everything is an error. I wouldn't say that's wrong,
but it's not clear to me whether we consciously made that design
decision.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] migration: Replace the return path retry logic
  2023-08-02 20:44       ` Peter Xu
@ 2023-08-03 15:00         ` Fabiano Rosas
  0 siblings, 0 replies; 14+ messages in thread
From: Fabiano Rosas @ 2023-08-03 15:00 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

Peter Xu <peterx@redhat.com> writes:

> On Wed, Aug 02, 2023 at 05:04:45PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> >> +        if (await_return_path_close_on_source(s)) {
>> >> +            trace_migration_return_path_pause_err();
>> >> +            return MIG_THR_ERR_FATAL;
>> >> +        }
>> >
> I see that here on return path failures we'll bail out, and actually it's
> against intuition (when paused it should have failed, so it's weird that
> it's returning 0).
>> >
> So how about the above suggestion, plus here we just call
>> > await_return_path_close_on_source(), without caring about the retval?
>> 
>> So you are suggesting to remove the knowledge of the retry entirely from
>> the thread. It just reports the error and the postcopy_pause takes the
>> responsibility of ignoring it when we want to retry... It could be
>> clearer that way indeed.
>
> That error isn't really important here IMHO, because the to-dst-file
> should have already errored out anyway.
>
> I just think it's cleaner if we only reset rp_error once the new thread
> is created.
>

ok

>> 
>> It would trigger when a rp error happened that wasn't related to the
>> QEMUFile. If we go with your suggestion above, then this goes away.
>
> With your current patch, where rp_error seems to always be reset when the
> thread quits, this will simply never trigger (assuming that's true).
>
> But yeah this is a trivial spot, feel free to choose the best if you plan
> to reorganize this patch a bit.  Thanks.

My patch just resets the error when doing postcopy and the error is a
QEMUFile error. The header validation at the start of the loop could
still set rp_state.error and return without going through the postcopy
retry:

        if (header_type >= MIG_RP_MSG_MAX ||
            header_type == MIG_RP_MSG_INVALID) {
            error_report("RP: Received invalid message 0x%04x length 0x%04x",
                         header_type, header_len);
            mark_source_rp_bad(ms);
            goto out;
        }
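
So the retry decision ends up being roughly (a sketch of the condition
described above, not the literal diff):

        if (migrate_postcopy() && qemu_file_get_error(rp)) {
            /*
             * Only a QEMUFile error during postcopy is retriable; clear
             * the flag so a new return path can be set up on resume.
             */
            ms->rp_state.error = false;
        }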


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-03 14:45         ` Fabiano Rosas
@ 2023-08-03 15:15           ` Peter Xu
  2023-08-03 15:24             ` Daniel P. Berrangé
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2023-08-03 15:15 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> >> >> This function currently has a straight-forward part which is waiting
> >> >> for the thread to join and a complicated part which is doing a
> >> >> qemu_file_shutdown() on the return path file.
> >> >> 
> >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> >> >> f->last_error to -EIO, which means we can never know if an error is an
> >> >> actual error or if we cleanly shutdown the file previously.
> >> >> 
> >> >> This is particularly bothersome for postcopy because it would send the
> >> >> return path thread into the retry routine which would wait on the
> >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> >> >> haven't had reports of this so I must presume we never reach here with
> >> >> postcopy.
> >> >> 
> >> >> The shutdown call is also racy because since it doesn't take the
> >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> >> >> happens to be in the middle of the critical region at
> >> >> migration_release_dst_files().
> >> >
> >> > After you rework the thread model on resume, shall we move
> >> > migration_release_dst_files() into the migration thread to be after the
> >> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> >> >
> >> 
> >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> >> cleanup along. No idea why it is there in the first place. I see you
> >> moved it from postcopy_pause and we're about to move it back to the
> >> exact same place =D
> >
> > It was there because the old postcopy-preempt was sending data via
> > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > also the migration thread context.
> >
> > Then we had 9358982744 ("migration: Send requested page directly in
> > rp-return thread") where we moved that "send page" operation into the
> > return path thread to reduce latencies.  After moving there it also means
> > the file handle can be accessed in >1 threads, so I just moved it over to
> > operate that always in the return path thread, then no race should happen.
> >
> 
> Thanks for the context.
> 
> > With your change, return path will vanish before migration thread accesses
> > it later (so as mentioned above, it must be after pthread_join()
> > succeeded), then I assume it'll be fine too to have it back in migration
> > thread.
> >
> > Or perhaps just take the file lock?
> >
> 
> There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> these files. We might need to lock anyway, let's see.

The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
That's based on the assumption that qemu_file_shutdown() is actually thread
safe (say, shutdown() syscall is thread-safe for sockets).

But yeah, that depends on some extra knowledge; as you said below, it'll
be good to just always take the lock, because that shouldn't hurt.

> 
> In general I'd like to drop all of these "ok not to lock, because...",
> it's too easy for code to change and the assumptions to stop being
> true. IMHO it's not worth it to gain performance by not taking a lock
> when the data is still shared and there's nothing stopping someone in
> the future from accessing it concurrently.

Yes I agree with you.

If you want we can move that from the rp thread to the migration thread,
but with the lock added as well.  Then the rp thread is always guaranteed
to have the file there, which should still be helpful; sometimes, even with
a lock, we still need to handle the QEMUFile* == NULL case, but then the rp
thread wouldn't need to care about that.
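
Roughly this ordering, I mean (illustrative only, reusing the current
field names, not a patch):

    QEMUFile *file;

    /* migration thread: make sure the rp thread is really gone first */
    qemu_thread_join(&ms->rp_state.rp_thread);
    ms->rp_state.rp_thread_created = false;

    /* only now touch the dst-side file, and still take the lock */
    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
        file = ms->rp_state.from_dst_file;
        ms->rp_state.from_dst_file = NULL;
    }
    /* qemu_fclose() can block, keep it outside the lock */
    if (file) {
        qemu_fclose(file);
    }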

> 
> >> 
> >> >> 
> >> >> Move this more complicated part of the code to a separate routine so
> >> >> we can wait on the thread without all of this baggage.
> >> >
> >> > I think you mentioned "some nuance" on having mark_source_rp_bad() in
> >> > await_return_path_close_on_source(), I did remember I tried to look into
> >> > that "nuance" too a long time ago but I just forgot what was that.  Great
> >> > if you can share some details.
> >> >
> >> 
> >> Well, mark_source_rp_bad() at await_return_path_close_on_source() is
> >> basically useless:
> >> 
> >> - We only call mark_source_rp_bad() if s->to_dst_file has an error and the
> >>   migration_completion() already checks that condition and fails the
> >>   migration anyway.
> >> 
> >> - If to_dst_file has an error, chances are the destination already did
> >>   cleanup by this point, so from_dst_file would already have an errno,
> >>   due to that. At qemu_fill_buffer(), the len == 0 case basically means
> >>   "the other end finished cleanly". We still set -EIO in that case, I
> >>   don't know why. Possibly because not all backends will have the same
> >>   semantics for len == 0.
> >
> > I don't know either, but I checked it's from a555b8092a ("qemu-file: Don't
> > do IO after shutdown").  Maybe there's better way to do so we could
> > identify the difference, but yes can be for later.
> >
> 
> That is not the -EIO I'm talking about, but I'm glad you mentioned it,
> because I just noticed we might need to revert ac7d25b816 ("qemu-file:
> remove shutdown member").
> 
> It was meant as a purely logical change, dropping the f->shutdown flag,
> but it has the side effect that we can no longer differentiate an orderly
> shutdown from an IO error.
> 
> I get that, ideally, all of the code would be resilient enough to handle
> a shutdown in the same way as an IO error, but I'm not sure we are
> prepared for that.
> 
> 
> Now, the actual -EIO I was mentioning is this one in qemu-file.c:
> 
> static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
> {
>     ...
>     ...
>     do {
>         len = qio_channel_read(f->ioc,
>                                (char *)f->buf + pending,
>                                IO_BUF_SIZE - pending,
>                                &local_error);
>         if (len == QIO_CHANNEL_ERR_BLOCK) {
>             if (qemu_in_coroutine()) {
>                 qio_channel_yield(f->ioc, G_IO_IN);
>             } else {
>                 qio_channel_wait(f->ioc, G_IO_IN);
>             }
>         } else if (len < 0) {
>             len = -EIO;
>         }
>     } while (len == QIO_CHANNEL_ERR_BLOCK);
> 
>     if (len > 0) {
>         f->buf_size += len;
>         f->total_transferred += len;
>     } else if (len == 0) {
>         qemu_file_set_error_obj(f, -EIO, local_error); <---- THIS
>     } else {
>         qemu_file_set_error_obj(f, len, local_error);
>     }
> 
>     return len;
> }
> 
> the recvmsg manual says:
> 
>   "When a stream socket peer has performed an orderly shutdown, the return
>   value will be 0 (the traditional "end-of-file" return)."
> 
> So both issues combined put us in a situation where there's never an
> "orderly shutdown", everything is an error. I wouldn't say that's wrong,
> but it's not clear to me whether we consciously made that design
> decision.

Here, reaching qemu_fill_buffer() normally means we already expect
something (it can originate from a qemu_fill_buffer() call where we're
waiting for a header byte to arrive, for example), so _maybe_ it still
makes sense to error out: even if the other side does close() cleanly, it
still isn't doing it at the right time.

But I also agree this should not be treated as a failure in the qemufile
layer (which is the transport only); instead, the upper layer (the
migration protocol) should be the one setting this error.  But I guess
those two layers are more tightly bound than expected.

The qemufile layer is just not well designed IMHO, starting from
qemu_get_byte() directly returning the byte without being able to fail at
all.  IOW one needs to call qemu_get_byte() and then qemu_file_get_error()
to know whether the get_byte() was valid.
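
I.e. every caller has to do this dance (the two helpers are the real API,
the snippet around them is just for illustration):

    int byte = qemu_get_byte(f);
    int err = qemu_file_get_error(f);

    if (err) {
        /* 'byte' is meaningless here; we only learn about the failure now */
        return err;
    }
    /* only at this point do we know 'byte' is actually valid */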

That gets a bit too far afield, because qemufile is another level of
issue, but for now I think if you have a solid idea for reviving
f->shutdown and making "migration is cancelled" distinct from any other
kind of error, feel free to go ahead; that looks reasonable to me so far.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-03 15:15           ` Peter Xu
@ 2023-08-03 15:24             ` Daniel P. Berrangé
  2023-08-03 15:39               ` Peter Xu
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel P. Berrangé @ 2023-08-03 15:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Thu, Aug 03, 2023 at 11:15:41AM -0400, Peter Xu wrote:
> On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> > >> Peter Xu <peterx@redhat.com> writes:
> > >> 
> > >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> > >> >> This function currently has a straight-forward part which is waiting
> > >> >> for the thread to join and a complicated part which is doing a
> > >> >> qemu_file_shutdown() on the return path file.
> > >> >> 
> > >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> > >> >> f->last_error to -EIO, which means we can never know if an error is an
> > >> >> actual error or if we cleanly shutdown the file previously.
> > >> >> 
> > >> >> This is particularly bothersome for postcopy because it would send the
> > >> >> return path thread into the retry routine which would wait on the
> > >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> > >> >> haven't had reports of this so I must presume we never reach here with
> > >> >> postcopy.
> > >> >> 
> > >> >> The shutdown call is also racy because since it doesn't take the
> > >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> > >> >> happens to be in the middle of the critical region at
> > >> >> migration_release_dst_files().
> > >> >
> > >> > After you rework the thread model on resume, shall we move
> > >> > migration_release_dst_files() into the migration thread to be after the
> > >> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> > >> >
> > >> 
> > >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> > >> cleanup along. No idea why it is there in the first place. I see you
> > >> moved it from postcopy_pause and we're about to move it back to the
> > >> exact same place =D
> > >
> > > It was there because the old postcopy-preempt was sending data via
> > > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > > also the migration thread context.
> > >
> > > Then we had 9358982744 ("migration: Send requested page directly in
> > > rp-return thread") where we moved that "send page" operation into the
> > > return path thread to reduce latencies.  After moving there it also means
> > > the file handle can be accessed in >1 threads, so I just moved it over to
> > > operate that always in the return path thread, then no race should happen.
> > >
> > 
> > Thanks for the context.
> > 
> > > With your change, return path will vanish before migration thread accesses
> > > it later (so as mentioned above, it must be after pthread_join()
> > > succeeded), then I assume it'll be fine too to have it back in migration
> > > thread.
> > >
> > > Or perhaps just take the file lock?
> > >
> > 
> > There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> > these files. We might need to lock anyway, let's see.
> 
> The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
> That's based on the assumption that qemu_file_shutdown() is actually thread
> safe (say, shutdown() syscall is thread-safe for sockets).

The shutdown() syscall and qio_channel_shutdown() method are intended
to be safe to call from any thread *PROVIDED* you can ensure no other
thread is concurrently going to call close() on the FD (or unref the
QIOChannel object).

There is no locking in qemu_file_shutdown() to guarantee this, but
maybe something else in migration code is guaranteeing that the
QIOChannel object is not going to be closed (or unref'd), while a
thread is invoking qemu_file_shutdown().

IOW, in theory qemu_file_shutdown() could be safe to use but
I'm not seeing a clearly expressed guarantee of safety in the
code. If it is safe, the reasons are very subtle and rationale
ought to be documented in the comment for qemu_file_shutdown
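
Something along these lines on top of qemu_file_shutdown() would capture
it, if the guarantee really holds (wording is only a sketch, and the
qemu_file_lock part is my assumption about how migration serializes this
today):

/*
 * qemu_file_shutdown() may be called from any thread, provided the
 * caller guarantees that no other thread concurrently close()s the
 * underlying fd or unrefs the QIOChannel for the duration of the call.
 * In migration code this is expected to be ensured by serializing on
 * MigrationState.qemu_file_lock.
 */
int qemu_file_shutdown(QEMUFile *f);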


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] migration: Split await_return_path_close_on_source
  2023-08-03 15:24             ` Daniel P. Berrangé
@ 2023-08-03 15:39               ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-08-03 15:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Fabiano Rosas, qemu-devel, Juan Quintela, Wei Wang, Leonardo Bras

On Thu, Aug 03, 2023 at 04:24:16PM +0100, Daniel P. Berrangé wrote:
> On Thu, Aug 03, 2023 at 11:15:41AM -0400, Peter Xu wrote:
> > On Thu, Aug 03, 2023 at 11:45:38AM -0300, Fabiano Rosas wrote:
> > > Peter Xu <peterx@redhat.com> writes:
> > > 
> > > > On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> > > >> Peter Xu <peterx@redhat.com> writes:
> > > >> 
> > > >> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> > > >> >> This function currently has a straight-forward part which is waiting
> > > >> >> for the thread to join and a complicated part which is doing a
> > > >> >> qemu_file_shutdown() on the return path file.
> > > >> >> 
> > > >> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> > > >> >> f->last_error to -EIO, which means we can never know if an error is an
> > > >> >> actual error or if we cleanly shutdown the file previously.
> > > >> >> 
> > > >> >> This is particularly bothersome for postcopy because it would send the
> > > >> >> return path thread into the retry routine which would wait on the
> > > >> >> postcopy_pause_rp_sem and consequently block the main thread. We
> > > >> >> haven't had reports of this so I must presume we never reach here with
> > > >> >> postcopy.
> > > >> >> 
> > > >> >> The shutdown call is also racy because since it doesn't take the
> > > >> >> qemu_file_lock, it could NULL-dereference if the return path thread
> > > >> >> happens to be in the middle of the critical region at
> > > >> >> migration_release_dst_files().
> > > >> >
> > > >> > After you rework the thread model on resume, shall we move
> > > >> > migration_release_dst_files() into the migration thread to be after the
> > > >> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> > > >> >
> > > >> 
> > > >> I just need to figure out if it's ok to move the postcopy_qemufile_src
> > > >> cleanup along. No idea why it is there in the first place. I see you
> > > >> moved it from postcopy_pause and we're about to move it back to the
> > > >> exact same place =D
> > > >
> > > > It was there because the old postcopy-preempt was sending data via
> > > > postcopy_qemufile_src from the migration thread, while postcopy_pause is
> > > > also the migration thread context.
> > > >
> > > > Then we had 9358982744 ("migration: Send requested page directly in
> > > > rp-return thread") where we moved that "send page" operation into the
> > > > return path thread to reduce latencies.  After moving there it also means
> > > > the file handle can be accessed in >1 threads, so I just moved it over to
> > > > operate that always in the return path thread, then no race should happen.
> > > >
> > > 
> > > Thanks for the context.
> > > 
> > > > With your change, return path will vanish before migration thread accesses
> > > > it later (so as mentioned above, it must be after pthread_join()
> > > > succeeded), then I assume it'll be fine too to have it back in migration
> > > > thread.
> > > >
> > > > Or perhaps just take the file lock?
> > > >
> > > 
> > > There's also migrate_fd_cleanup and migrate_fd_cancel that can touch
> > > these files. We might need to lock anyway, let's see.
> > 
> > The cancel path shouldn't clear the QEMUFile*, then I assume it's fine.
> > That's based on the assumption that qemu_file_shutdown() is actually thread
> > safe (say, shutdown() syscall is thread-safe for sockets).
> 
> The shutdown() syscall and qio_channel_shutdown() method are intended
> to be safe to call from any thread *PROVIDED* you can ensure no other
> thread is concurrently going to call close() on the FD (or unref the
> QIOChannel object).
> 
> There is no locking in qemu_file_shutdown() to guarantee this, but
> maybe something else in migration code is guaranteeing that the
> QIOChannel object is not going to be closed (or unref'd), while a
> thread is invoking qemu_file_shutdown().

It should currently be guaranteed by the qemu_file_lock I think.

> 
> IOW, in theory qemu_file_shutdown() could be safe to use but
> I'm not seeing a clearly expressed guarantee of safety in the
> code. If it is safe, the reasons are very subtle and rationale
> ought to be documented in the comment for qemu_file_shutdown

I agree.

For now, for this specific use case of the migration qemufile, we can
simply always take the mutex.
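
E.g. (sketch; this is the kind of pattern I mean, not a literal diff):

    WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
        if (ms->rp_state.from_dst_file) {
            qemu_file_shutdown(ms->rp_state.from_dst_file);
        }
    }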

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-08-03 15:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-02 14:36 [PATCH v2 0/2] Fix segfault on migration return path Fabiano Rosas
2023-08-02 14:36 ` [PATCH v2 1/2] migration: Split await_return_path_close_on_source Fabiano Rosas
2023-08-02 16:19   ` Peter Xu
2023-08-02 19:58     ` Fabiano Rosas
2023-08-02 20:40       ` Peter Xu
2023-08-03 14:45         ` Fabiano Rosas
2023-08-03 15:15           ` Peter Xu
2023-08-03 15:24             ` Daniel P. Berrangé
2023-08-03 15:39               ` Peter Xu
2023-08-02 14:36 ` [PATCH v2 2/2] migration: Replace the return path retry logic Fabiano Rosas
2023-08-02 16:02   ` Peter Xu
2023-08-02 20:04     ` Fabiano Rosas
2023-08-02 20:44       ` Peter Xu
2023-08-03 15:00         ` Fabiano Rosas
