Re: [PATCH 3/3] migration: Replace the return path retry logic

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Fabiano Rosas <farosas@suse.de>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH 3/3] migration: Replace the return path retry logic
Date: Mon, 31 Jul 2023 10:04:25 -0300	[thread overview]
Message-ID: <87ila08c6e.fsf@suse.de> (raw)
In-Reply-To: <ZMQ29X5/pcDkR7RC@x1n>

Peter Xu <peterx@redhat.com> writes:

> On Fri, Jul 28, 2023 at 09:15:16AM -0300, Fabiano Rosas wrote:
>> Replace the return path retry logic with finishing and restarting the
>> thread. This fixes a race when resuming the migration that leads to a
>> segfault.
>> 
>> Currently when doing postcopy we consider that an IO error on the
>> return path file could be due to a network intermittency. We then keep
>> the thread alive but have it do cleanup of the 'from_dst_file' and
>> wait on the 'postcopy_pause_rp' semaphore. When the user issues a
>> migrate resume, a new return path is opened and the thread is allowed
>> to continue.
>> 
>> There's a race condition in the above mechanism. It is possible for
>> the new return path file to be setup *before* the cleanup code in the
>> return path thread has had a chance to run, leading to the *new* file
>> being closed and the pointer set to NULL. When the thread is released
>> after the resume, it tries to dereference 'from_dst_file' and crashes:
>> 
>> Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
>> 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
>> 154         return f->last_error;
>> 
>> (gdb) bt
>>  #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
>>  #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
>>  #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
>>  #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
>>  #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
>>  #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>> 
>> Here's the race (important bit is open_return_path happening before
>> migration_release_dst_files):
>> 
>> migration                 | qmp                         | return path
>> --------------------------+-----------------------------+---------------------------------
>> 			    qmp_migrate_pause()
>> 			     shutdown(ms->to_dst_file)
>> 			      f->last_error = -EIO
>> migrate_detect_error()
>>  postcopy_pause()
>>   set_state(PAUSED)
>>   wait(postcopy_pause_sem)
>> 			    qmp_migrate(resume)
>> 			    migrate_fd_connect()
>> 			     resume = state == PAUSED
>> 			     open_return_path <-- TOO SOON!
>> 			     set_state(RECOVER)
>> 			     post(postcopy_pause_sem)
>> 							(incoming closes to_src_file)
>> 							res = qemu_file_get_error(rp)
>> 							migration_release_dst_files()
>> 							ms->rp_state.from_dst_file = NULL
>>   post(postcopy_pause_rp_sem)
>> 							postcopy_pause_return_path_thread()
>> 							  wait(postcopy_pause_rp_sem)
>> 							rp = ms->rp_state.from_dst_file
>> 							goto retry
>> 							qemu_file_get_error(rp)
>> 							SIGSEGV
>> -------------------------------------------------------------------------------------------
>> 
>> We can keep the retry logic without having the thread alive and
>> waiting. The only piece of data used by it is the 'from_dst_file' and
>> it is only allowed to proceed after a migrate resume is issued and the
>> semaphore released at migrate_fd_connect().
>> 
>> Move the retry logic to outside the thread by having
>> open_return_path_on_source() wait for the thread to finish before
>> creating a new one with the updated 'from_dst_file'.
>
> If we can remove that (along with the sync sem) it'll be pretty nice.  If
> you want, and if this works well, not sure whether you're interested in
> doing similarly to the other threads.  Currently we halt all the threads,
> I'm not sure whether we can do similiar things on dest and whether that can
> also benefit on sync efforts.

I'm interested but I don't know the postcopy code too well, I'll have to
spend some time with it.

Next on my list was the multifd p->running flag which seems entirely
superfluous to me.

>
> Still one comment below.
>
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/migration.c | 72 +++++++++++++++----------------------------
>>  migration/migration.h |  1 -
>>  2 files changed, 25 insertions(+), 48 deletions(-)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index d6f4470265..36cdd7bda8 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -97,6 +97,7 @@ static int migration_maybe_pause(MigrationState *s,
>>                                   int *current_active_state,
>>                                   int new_state);
>>  static void migrate_fd_cancel(MigrationState *s);
>> +static int await_return_path_close_on_source(MigrationState *ms);
>>  
>>  static bool migration_needs_multiple_sockets(void)
>>  {
>> @@ -1764,18 +1765,6 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>>      }
>>  }
>>  
>> -/* Return true to retry, false to quit */
>> -static bool postcopy_pause_return_path_thread(MigrationState *s)
>> -{
>> -    trace_postcopy_pause_return_path();
>> -
>> -    qemu_sem_wait(&s->postcopy_pause_rp_sem);
>> -
>> -    trace_postcopy_pause_return_path_continued();
>> -
>> -    return true;
>> -}
>> -
>>  static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
>>  {
>>      RAMBlock *block = qemu_ram_block_by_name(block_name);
>> @@ -1859,7 +1848,6 @@ static void *source_return_path_thread(void *opaque)
>>      trace_source_return_path_thread_entry();
>>      rcu_register_thread();
>>  
>> -retry:
>>      while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
>>             migration_is_setup_or_active(ms->state)) {
>>          trace_source_return_path_thread_loop_top();
>> @@ -1981,28 +1969,18 @@ retry:
>>      }
>>  
>>  out:
>> -    res = qemu_file_get_error(rp);
>> -    if (res) {
>> -        if (res && migration_in_postcopy()) {
>> +    if (qemu_file_get_error(rp)) {
>> +        if (migration_in_postcopy()) {
>>              /*
>> -             * Maybe there is something we can do: it looks like a
>> -             * network down issue, and we pause for a recovery.
>> +             * This could be a network issue that would have been
>> +             * detected by the main migration thread and caused the
>> +             * migration to pause. Do cleanup and finish.
>>               */
>> -            migration_release_dst_files(ms);
>> -            rp = NULL;
>> -            if (postcopy_pause_return_path_thread(ms)) {
>> -                /*
>> -                 * Reload rp, reset the rest.  Referencing it is safe since
>> -                 * it's reset only by us above, or when migration completes
>> -                 */
>> -                rp = ms->rp_state.from_dst_file;
>> -                ms->rp_state.error = false;
>> -                goto retry;
>> -            }
>> +            ms->rp_state.error = false;
>> +        } else {
>> +            trace_source_return_path_thread_bad_end();
>> +            mark_source_rp_bad(ms);
>>          }
>> -
>> -        trace_source_return_path_thread_bad_end();
>> -        mark_source_rp_bad(ms);
>>      }
>>  
>>      trace_source_return_path_thread_end();
>> @@ -2012,8 +1990,21 @@ out:
>>  }
>>  
>>  static int open_return_path_on_source(MigrationState *ms,
>> -                                      bool create_thread)
>> +                                      bool resume)
>>  {
>> +    if (resume) {
>> +        assert(ms->state == MIGRATION_STATUS_POSTCOPY_PAUSED);
>> +
>> +        /*
>> +         * We're resuming from a paused postcopy migration. Wait for
>> +         * the thread to do its cleanup before re-opening the return
>> +         * path.
>> +         */
>> +        if (await_return_path_close_on_source(ms)) {
>> +            return -1;
>> +        }
>
> This smells a bit hacky.  Can we do this in postcopy_pause(), perhaps
> before we switch to PAUSED state?  Then we know after being PAUSED we're
> ready to recover.

It looks like we could. I'll move it.

     prev parent reply	other threads:[~2023-07-31 13:04 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-28 12:15 [PATCH 0/3] Fix segfault on migration return path Fabiano Rosas
2023-07-28 12:15 ` [PATCH 1/3] migration: Stop marking RP bad after shutdown Fabiano Rosas
2023-07-28 21:37   ` Peter Xu
2023-07-31 21:02     ` Fabiano Rosas
2023-07-31 21:13       ` Fabiano Rosas
2023-07-28 12:15 ` [PATCH 2/3] migration: Simplify calling of await_return_path_close_on_source Fabiano Rosas
2023-07-28 21:39   ` Peter Xu
2023-07-31 12:42     ` Fabiano Rosas
2023-07-31 17:06       ` Peter Xu
2023-07-28 12:15 ` [PATCH 3/3] migration: Replace the return path retry logic Fabiano Rosas
2023-07-28 21:45   ` Peter Xu
2023-07-31 13:04     ` Fabiano Rosas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ila08c6e.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=leobras@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.