qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	Wei Wang <wei.w.wang@intel.com>,
	Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v3 07/10] migration: Replace the return path retry logic
Date: Tue, 15 Aug 2023 17:58:46 -0400	[thread overview]
Message-ID: <ZNv1Fi2a6n+TXC3o@x1n> (raw)
In-Reply-To: <20230811150836.2895-8-farosas@suse.de>

On Fri, Aug 11, 2023 at 12:08:33PM -0300, Fabiano Rosas wrote:
> Replace the return path retry logic with finishing and restarting the
> thread. This fixes a race when resuming the migration that leads to a
> segfault.
> 
> Currently when doing postcopy we consider that an IO error on the
> return path file could be due to a network intermittency. We then keep
> the thread alive but have it do cleanup of the 'from_dst_file' and
> wait on the 'postcopy_pause_rp' semaphore. When the user issues a
> migrate resume, a new return path is opened and the thread is allowed
> to continue.
> 
> There's a race condition in the above mechanism. It is possible for
> the new return path file to be setup *before* the cleanup code in the
> return path thread has had a chance to run, leading to the *new* file
> being closed and the pointer set to NULL. When the thread is released
> after the resume, it tries to dereference 'from_dst_file' and crashes:
> 
> Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
> 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
> 154         return f->last_error;
> 
> (gdb) bt
>  #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
>  #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
>  #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
>  #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
>  #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
>  #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> Here's the race (important bit is open_return_path happening before
> migration_release_dst_files):
> 
> migration                 | qmp                         | return path
> --------------------------+-----------------------------+---------------------------------
> 			    qmp_migrate_pause()
> 			     shutdown(ms->to_dst_file)
> 			      f->last_error = -EIO
> migrate_detect_error()
>  postcopy_pause()
>   set_state(PAUSED)
>   wait(postcopy_pause_sem)
> 			    qmp_migrate(resume)
> 			    migrate_fd_connect()
> 			     resume = state == PAUSED
> 			     open_return_path <-- TOO SOON!
> 			     set_state(RECOVER)
> 			     post(postcopy_pause_sem)
> 							(incoming closes to_src_file)
> 							res = qemu_file_get_error(rp)
> 							migration_release_dst_files()
> 							ms->rp_state.from_dst_file = NULL
>   post(postcopy_pause_rp_sem)
> 							postcopy_pause_return_path_thread()
> 							  wait(postcopy_pause_rp_sem)
> 							rp = ms->rp_state.from_dst_file
> 							goto retry
> 							qemu_file_get_error(rp)
> 							SIGSEGV
> -------------------------------------------------------------------------------------------
> 
> We can keep the retry logic without having the thread alive and
> waiting. The only piece of data used by it is the 'from_dst_file' and
> it is only allowed to proceed after a migrate resume is issued and the
> semaphore released at migrate_fd_connect().
> 
> Move the retry logic to outside the thread by waiting for the thread
> to finish before pausing the migration.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



  reply	other threads:[~2023-08-15 21:59 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
2023-08-15 21:49   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 02/10] migration: Fix possible race when shutting return path Fabiano Rosas
2023-08-11 15:08 ` [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors Fabiano Rosas
2023-08-15 21:49   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
2023-08-15 21:51   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
2023-08-15 21:56   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 06/10] migration: Consolidate return path closing code Fabiano Rosas
2023-08-15 21:57   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 07/10] migration: Replace the return path retry logic Fabiano Rosas
2023-08-15 21:58   ` Peter Xu [this message]
2023-08-11 15:08 ` [PATCH v3 08/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
2023-08-15 22:02   ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files Fabiano Rosas
2023-08-15 22:08   ` Peter Xu
2023-08-15 22:19     ` Fabiano Rosas
2023-08-15 22:34       ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
2023-08-15 22:15   ` Peter Xu
2023-08-15 22:31     ` Fabiano Rosas
2023-08-16 14:26       ` Peter Xu
2023-08-16 14:57         ` Fabiano Rosas
2023-08-16 15:26           ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNv1Fi2a6n+TXC3o@x1n \
    --to=peterx@redhat.com \
    --cc=farosas@suse.de \
    --cc=leobras@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=wei.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).