From: Fabiano Rosas <farosas@suse.de>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
Date: Tue, 30 Jan 2024 18:32:21 -0300
Message-ID: <87v87abj8a.fsf@suse.de>
In-Reply-To: <a54d5dec-6103-4e10-b732-2156573c4ad8@nvidia.com>
Avihai Horon <avihaih@nvidia.com> writes:
> On 29/01/2024 16:34, Fabiano Rosas wrote:
>>
>>
>> Avihai Horon <avihaih@nvidia.com> writes:
>>
>>> Currently, multifd channels are created asynchronously without waiting
>>> for their creation -- migration simply proceeds and may wait in
>>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>>> hides some race conditions which can cause unexpected behavior if
>>> creation of some channels fails.
>>>
>>> For example, the following scenario of multifd migration with two
>>> channels, where the first channel creation fails, will end in a
>>> segmentation fault (time advances from top to bottom):
>> Is this reproducible? Or just observable at least.
>
> Yes, though I had to engineer it a bit:
> 1. Run migration with two multifd channels and fail creation of the two
> channels (e.g., by changing the address they are connecting to).
> 2. Add sleep(3) in multifd_send_sync_main() before we loop through the
> channels and check p->quit.
> 3. Add sleep(5) only for the second multifd channel connect thread so
> its connection is delayed and runs last.
Ok, well, that's something at least. I'll try to reproduce it so we can
keep track of it.
>> I acknowledge the situation you describe, but with multifd there's
>> usually an issue in cleanup paths. Let's make sure we flushed those out
>> before adding this new semaphore.
>
> Indeed, I was not keen on adding yet another semaphore either.
> I think there are multiple bugs here, some of them overlap and some don't.
> There is also your and Peter's previous work that I was not aware of to
> fix those and to clean up the code.
>
> Maybe we can take it one step at a time, pushing your series first,
> cleaning the code and fixing some bugs.
> Then we can see what bugs are left (if any) and fix them. It might even
> be easier to fix after the cleanups.
>
>> This is similar to an issue Peter was addressing where we missed calling
>> multifd_send_terminate_threads() in the multifd_channel_connect() path:
>>
>> patch 4 in this
>> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
>
> What issue are you referring here? Can you elaborate?
Oh, I just realised that series doesn't address any particular bug. But
my point is that including a call to multifd_send_terminate_threads() at
new_send_channel_cleanup might be all that's needed because that has
code to cause the channels and the migration thread to end.
> The main issue I am trying to fix in my patch is that we don't wait for
> all multifd channels to be created/error out before tearing down
> multifd resources in multifd_save_cleanup().
Ok, let me take a step back and ask why is this not solved by
multifd_save_cleanup() -> qemu_thread_join()? I see you moved
p->running=true to *after* the thread creation in patch 4. That will
always leave a gap where p->running == false but the thread is already
running.
>
>>> Thread | Code execution
>>> ------------------------------------------------------------------------
>>> Multifd 1 |
>>> | multifd_new_send_channel_async (errors and quits)
>>> | multifd_new_send_channel_cleanup
>>> |
>>> Migration thread |
>>> | qemu_savevm_state_setup
>>> | ram_save_setup
>>> | multifd_send_sync_main
>>> | (detects Multifd 1 error and quits)
>>> | [...]
>>> | migration_iteration_finish
>>> | migrate_fd_cleanup_schedule
>>> |
>>> Main thread |
>>> | migrate_fd_cleanup
>>> | multifd_save_cleanup (destroys Multifd 2 resources)
>>> |
>>> Multifd 2 |
>>> | multifd_new_send_channel_async
>>> | (accesses destroyed resources, segfault)
>>>
>>> In another scenario, migration can hang indefinitely:
>>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>>> the semaphores.
>>> 2. Then, all multifd channels creation fails, so they post the
>>> semaphores and quit.
>>> 3. The main migration thread will not identify the error, will proceed
>>> to send pages and will hang.
>>>
>>> Fix it by waiting for all multifd channels to be created before
>>> proceeding with migration.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>> migration/multifd.h | 3 +++
>>> migration/migration.c | 1 +
>>> migration/multifd.c | 34 +++++++++++++++++++++++++++++++---
>>> migration/ram.c | 7 +++++++
>>> 4 files changed, 42 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>> index 35d11f103c..87a64e0a87 100644
>>> --- a/migration/multifd.h
>>> +++ b/migration/multifd.h
>>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>> void multifd_recv_sync_main(void);
>>> int multifd_send_sync_main(void);
>>> int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>>> +int multifd_send_channels_created(void);
>>>
>>> /* Multifd Compression flags */
>>> #define MULTIFD_FLAG_SYNC (1 << 0)
>>> @@ -86,6 +87,8 @@ typedef struct {
>>> /* multifd flags for sending ram */
>>> int write_flags;
>>>
>>> + /* Syncs channel creation and migration thread */
>>> + QemuSemaphore create_sem;
>>> /* sem where to wait for more work */
>>> QemuSemaphore sem;
>>> /* syncs main thread and channels */
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9c769a1ecd..d81d96eaa5 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>> error_report_err(local_err);
>>> migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>> MIGRATION_STATUS_FAILED);
>>> + multifd_send_channels_created();
>>> migrate_fd_cleanup(s);
>>> return;
>>> }
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index 564e911b6c..f0c216f4f9 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>>> multifd_send_channel_destroy(p->c);
>>> p->c = NULL;
>>> qemu_mutex_destroy(&p->mutex);
>>> + qemu_sem_destroy(&p->create_sem);
>>> qemu_sem_destroy(&p->sem);
>>> qemu_sem_destroy(&p->sem_sync);
>>> g_free(p->name);
>>> @@ -766,6 +767,29 @@ out:
>>> return NULL;
>>> }
>>>
>>> +int multifd_send_channels_created(void)
>>> +{
>>> + int ret = 0;
>>> +
>>> + if (!migrate_multifd()) {
>>> + return 0;
>>> + }
>>> +
>>> + for (int i = 0; i < migrate_multifd_channels(); i++) {
>>> + MultiFDSendParams *p = &multifd_send_state->params[i];
>>> +
>>> + qemu_sem_wait(&p->create_sem);
>>> + WITH_QEMU_LOCK_GUARD(&p->mutex) {
>>> + if (p->quit) {
>>> + error_report("%s: channel %d has already quit", __func__, i);
>>> + ret = -1;
>>> + }
>>> + }
>> There are races here when a channel fails at
>> multifd_send_initial_packet(). If p->quit can be set after post to
>> create_sem, then this function could always return true and we'd run
>> into a broken channel. Possibly even the same bug you're trying to fix.
>>
>> I think that's one of the reasons we have channels_ready.
>
> I am not sure exactly what bug you are describing here, can you elaborate?
>
This looks like it could be a preexisting issue actually, but in the
current code, what stops the multifd channel from running ahead is
p->sem. Except that the channel does some work at
multifd_send_initial_packet() before checking p->sem and that work could
fail.
This means that right after checking for p->quit above, the multifd
thread could already be exiting due to an error and
multifd_send_channels_created() would miss it. So there's still a chance
that this function effectively behaves just like the previous code.
Thread | Code execution
------------------------------------------------------------------------
Migration |
| ram_save_setup()
| multifd_send_channels_created()
| qemu_sem_wait(&p->create_sem);
Main |
| multifd_channel_connect()
| qemu_thread_create()
| qemu_sem_post(&p->create_sem)
Multifd 1 |
| multifd_send_initial_packet() *errors*
| goto out
| multifd_send_terminate_threads()
Migration |
| still at multifd_send_channels_created
| qemu_mutex_lock(&p->mutex);
| p->quit == false <--- !!!
| qemu_mutex_unlock(&p->mutex);
| return true from multifd_send_channels_created()
From here onwards, it's the same as not having checked
multifd_send_channels_created() at all. One way this could go is:
| runs unimpeded until multifd_send_sync_main()
| multifd_send_pages()
| *misses the exiting flag*
| qemu_sem_wait(&multifd_send_state->channels_ready);
Multifd 1 |
| still at multifd_send_terminate_threads
| qemu_mutex_lock(&p->mutex);
| p->quit = true;
| qemu_mutex_unlock(&p->mutex);
| qemu_sem_post(&p->sem_sync);
| qemu_sem_post(&multifd_send_state->channels_ready);
Migration |
| qemu_mutex_lock(&p->mutex);
| p->quit == true; <--- correct now
| qemu_mutex_unlock(&p->mutex);
| return -1;
| *all good again*
It seems that in order to avoid this kind of race we'd need the
synchronization point to be at the multifd thread instead. But then,
that's what channels_ready does.
>>
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> static bool multifd_channel_connect(MultiFDSendParams *p,
>>> QIOChannel *ioc,
>>> Error **errp);
>>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>>> p->quit = true;
>>> qemu_sem_post(&multifd_send_state->channels_ready);
>>> qemu_sem_post(&p->sem_sync);
>>> + qemu_sem_post(&p->create_sem);
>>> error_free(err);
>>> }
>>>
>>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>> qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>> QEMU_THREAD_JOINABLE);
>>> p->running = true;
>>> + qemu_sem_post(&p->create_sem);
>>> return true;
>>> }
>>>
>>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>> QIOChannel *ioc, Error *err)
>>> {
>>> migrate_set_error(migrate_get_current(), err);
>>> - /* Error happen, we need to tell who pay attention to me */
>>> - qemu_sem_post(&multifd_send_state->channels_ready);
>>> - qemu_sem_post(&p->sem_sync);
>>> /*
>>> + * Error happen, we need to tell who pay attention to me.
>>> * Although multifd_send_thread is not created, but main migration
>>> * thread need to judge whether it is running, so we need to mark
>>> * its status.
>>> */
>>> p->quit = true;
>>> + qemu_sem_post(&multifd_send_state->channels_ready);
>>> + qemu_sem_post(&p->sem_sync);
>>> + qemu_sem_post(&p->create_sem);
>> Do we still need channels_ready and sem_sync here? The migration thread
>> shouldn't have gone past create_sem at this point.
>
> I think you are right, we can drop channels_ready and sem_sync here.
>
>>
>>> object_unref(OBJECT(ioc));
>>> error_free(err);
>>> }
>>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
>>> MultiFDSendParams *p = &multifd_send_state->params[i];
>>>
>>> qemu_mutex_init(&p->mutex);
>>> + qemu_sem_init(&p->create_sem, 0);
>>> qemu_sem_init(&p->sem, 0);
>>> qemu_sem_init(&p->sem_sync, 0);
>>> p->quit = false;
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index c0cdcccb75..b3e864a22b 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>> RAMBlock *block;
>>> int ret;
>>>
>>> + bql_unlock();
>>> + ret = multifd_send_channels_created();
>>> + bql_lock();
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> +
>>> if (compress_threads_save_setup()) {
>>> return -1;
>>> }