From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
Wei Wang <wei.w.wang@intel.com>,
Leonardo Bras <leobras@redhat.com>
Subject: Re: [PATCH v3 07/10] migration: Replace the return path retry logic
Date: Tue, 15 Aug 2023 17:58:46 -0400 [thread overview]
Message-ID: <ZNv1Fi2a6n+TXC3o@x1n> (raw)
In-Reply-To: <20230811150836.2895-8-farosas@suse.de>
On Fri, Aug 11, 2023 at 12:08:33PM -0300, Fabiano Rosas wrote:
> Replace the return path retry logic with finishing and restarting the
> thread. This fixes a race when resuming the migration that leads to a
> segfault.
>
> Currently when doing postcopy we consider that an IO error on the
> return path file could be due to a network intermittency. We then keep
> the thread alive but have it do cleanup of the 'from_dst_file' and
> wait on the 'postcopy_pause_rp' semaphore. When the user issues a
> migrate resume, a new return path is opened and the thread is allowed
> to continue.
>
> There's a race condition in the above mechanism. It is possible for
> the new return path file to be setup *before* the cleanup code in the
> return path thread has had a chance to run, leading to the *new* file
> being closed and the pointer set to NULL. When the thread is released
> after the resume, it tries to dereference 'from_dst_file' and crashes:
>
> Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
> 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
> 154 return f->last_error;
>
> (gdb) bt
> #0 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
> #1 0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
> #2 0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
> #3 0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
> #4 0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
> #5 0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> Here's the race (important bit is open_return_path happening before
> migration_release_dst_files):
>
> migration | qmp | return path
> --------------------------+-----------------------------+---------------------------------
> qmp_migrate_pause()
> shutdown(ms->to_dst_file)
> f->last_error = -EIO
> migrate_detect_error()
> postcopy_pause()
> set_state(PAUSED)
> wait(postcopy_pause_sem)
> qmp_migrate(resume)
> migrate_fd_connect()
> resume = state == PAUSED
> open_return_path <-- TOO SOON!
> set_state(RECOVER)
> post(postcopy_pause_sem)
> (incoming closes to_src_file)
> res = qemu_file_get_error(rp)
> migration_release_dst_files()
> ms->rp_state.from_dst_file = NULL
> post(postcopy_pause_rp_sem)
> postcopy_pause_return_path_thread()
> wait(postcopy_pause_rp_sem)
> rp = ms->rp_state.from_dst_file
> goto retry
> qemu_file_get_error(rp)
> SIGSEGV
> -------------------------------------------------------------------------------------------
>
> We can keep the retry logic without having the thread alive and
> waiting. The only piece of data used by it is the 'from_dst_file' and
> it is only allowed to proceed after a migrate resume is issued and the
> semaphore released at migrate_fd_connect().
>
> Move the retry logic to outside the thread by waiting for the thread
> to finish before pausing the migration.
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
next prev parent reply other threads:[~2023-08-15 21:59 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-11 15:08 [PATCH v3 00/10] Fix segfault on migration return path Fabiano Rosas
2023-08-11 15:08 ` [PATCH v3 01/10] migration: Fix possible race when setting rp_state.error Fabiano Rosas
2023-08-15 21:49 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 02/10] migration: Fix possible race when shutting return path Fabiano Rosas
2023-08-11 15:08 ` [PATCH v3 03/10] migration: Fix possible race when checking to_dst_file for errors Fabiano Rosas
2023-08-15 21:49 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 04/10] migration: Fix possible race when shutting down to_dst_file Fabiano Rosas
2023-08-15 21:51 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 05/10] migration: Remove redundant cleanup of postcopy_qemufile_src Fabiano Rosas
2023-08-15 21:56 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 06/10] migration: Consolidate return path closing code Fabiano Rosas
2023-08-15 21:57 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 07/10] migration: Replace the return path retry logic Fabiano Rosas
2023-08-15 21:58 ` Peter Xu [this message]
2023-08-11 15:08 ` [PATCH v3 08/10] migration: Move return path cleanup to main migration thread Fabiano Rosas
2023-08-15 22:02 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 09/10] migration: Be consistent about shutdown of source shared files Fabiano Rosas
2023-08-15 22:08 ` Peter Xu
2023-08-15 22:19 ` Fabiano Rosas
2023-08-15 22:34 ` Peter Xu
2023-08-11 15:08 ` [PATCH v3 10/10] migration: Add a wrapper to cleanup migration files Fabiano Rosas
2023-08-15 22:15 ` Peter Xu
2023-08-15 22:31 ` Fabiano Rosas
2023-08-16 14:26 ` Peter Xu
2023-08-16 14:57 ` Fabiano Rosas
2023-08-16 15:26 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZNv1Fi2a6n+TXC3o@x1n \
--to=peterx@redhat.com \
--cc=farosas@suse.de \
--cc=leobras@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=wei.w.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).