From: Fabiano Rosas <farosas@suse.de>
To: Avihai Horon <avihaih@nvidia.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding
Date: Tue, 30 Jan 2024 18:32:21 -0300	[thread overview]
Message-ID: <87v87abj8a.fsf@suse.de> (raw)
In-Reply-To: <a54d5dec-6103-4e10-b732-2156573c4ad8@nvidia.com>

Avihai Horon <avihaih@nvidia.com> writes:

> On 29/01/2024 16:34, Fabiano Rosas wrote:
>>
>> Avihai Horon <avihaih@nvidia.com> writes:
>>
>>> Currently, multifd channels are created asynchronously without waiting
>>> for their creation -- migration simply proceeds and may wait in
>>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>>> hides some race conditions which can cause unexpected behavior if
>>> creation of some channels fails.
>>>
>>> For example, the following scenario of multifd migration with two
>>> channels, where creation of the first channel fails, will end in a
>>> segmentation fault (time advances from top to bottom):
>> Is this reproducible? Or at least observable?
>
> Yes, though I had to engineer it a bit:
> 1. Run migration with two multifd channels and fail the creation of both
> channels (e.g., by changing the address they are connecting to).
> 2. Add sleep(3) in multifd_send_sync_main() before we loop through the 
> channels and check p->quit.
> 3. Add sleep(5) only for the second multifd channel connect thread so 
> its connection is delayed and runs last.

Ok, well, that's something at least. I'll try to reproduce it so we can
keep track of it.
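
If I read the recipe right, the instrumentation is roughly this
(hypothetical placement, just to force the ordering):

    /* step 2: in multifd_send_sync_main(), before the p->quit loop */
    sleep(3);    /* give the failing channels time to error out first */

    /* step 3: in the connect thread, delay only the second channel */
    if (p->id == 1) {
        sleep(5);    /* channel 2's connect now runs last */
    }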

>> I acknowledge the situation you describe, but with multifd there's
>> usually an issue in cleanup paths. Let's make sure we flush those out
>> before adding this new semaphore.
>
> Indeed, I was not keen on adding yet another semaphore either.
> I think there are multiple bugs here, some of which overlap and some don't.
> There is also previous work from you and Peter, which I was not aware
> of, that fixes those and cleans up the code.
>
> Maybe we can take it one step at a time: push your series first,
> cleaning up the code and fixing some bugs.
> Then we can see what bugs are left (if any) and fix them. They might
> even be easier to fix after the cleanups.
>
>> This is similar to an issue Peter was addressing where we missed calling
>> multifd_send_terminate_threads() in the multifd_channel_connect() path:
>>
>> patch 4 in this
>> https://lore.kernel.org/r/20231022201211.452861-1-peterx@redhat.com
>
> What issue are you referring here? Can you elaborate?

Oh, I just realised that series doesn't address any particular bug. But
my point is that including a call to multifd_send_terminate_threads() in
multifd_new_send_channel_cleanup() might be all that's needed, because
that function has code to make the channels and the migration thread
end.
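
Something along these lines (untested sketch, assuming
multifd_send_terminate_threads() keeps its current Error * parameter):

    static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
                                                 QIOChannel *ioc, Error *err)
    {
        migrate_set_error(migrate_get_current(), err);
        /*
         * Mark all channels as quitting and wake up whoever is waiting
         * on them, including the migration thread.
         */
        multifd_send_terminate_threads(err);
        object_unref(OBJECT(ioc));
        error_free(err);
    }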

> The main issue I am trying to fix in my patch is that we don't wait for 
> all multifd channels to be created (or error out) before tearing down
> multifd resources in multifd_save_cleanup().

Ok, let me take a step back and ask why this is not solved by
multifd_save_cleanup() -> qemu_thread_join(). I see you moved
p->running = true to *after* the thread creation in patch 4. That will
always leave a gap where p->running == false but the thread is already
running.
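
For reference, the path I mean is roughly this (paraphrasing the
current code from memory):

    /* in multifd_save_cleanup() */
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

        if (p->running) {
            qemu_thread_join(&p->thread);
        }
    }

With that gap, p->running can be false while the thread is already up,
so we'd skip the join and tear the resources down under its feet.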

>
>>> Thread           | Code execution
>>> ------------------------------------------------------------------------
>>> Multifd 1        |
>>>                   | multifd_new_send_channel_async (errors and quits)
>>>                   |   multifd_new_send_channel_cleanup
>>>                   |
>>> Migration thread |
>>>                   | qemu_savevm_state_setup
>>>                   |   ram_save_setup
>>>                   |     multifd_send_sync_main
>>>                   |     (detects Multifd 1 error and quits)
>>>                   | [...]
>>>                   | migration_iteration_finish
>>>                   |   migrate_fd_cleanup_schedule
>>>                   |
>>> Main thread      |
>>>                   | migrate_fd_cleanup
>>>                   |   multifd_save_cleanup (destroys Multifd 2 resources)
>>>                   |
>>> Multifd 2        |
>>>                   | multifd_new_send_channel_async
>>>                   | (accesses destroyed resources, segfault)
>>>
>>> In another scenario, migration can hang indefinitely:
>>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>>>     the semaphores.
>>> 2. Then, creation of all multifd channels fails, so they post the
>>>     semaphores and quit.
>>> 3. The main migration thread will not identify the error, will proceed
>>>     to send pages, and will hang.
>>>
>>> Fix it by waiting for all multifd channels to be created before
>>> proceeding with migration.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>>   migration/multifd.h   |  3 +++
>>>   migration/migration.c |  1 +
>>>   migration/multifd.c   | 34 +++++++++++++++++++++++++++++++---
>>>   migration/ram.c       |  7 +++++++
>>>   4 files changed, 42 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>> index 35d11f103c..87a64e0a87 100644
>>> --- a/migration/multifd.h
>>> +++ b/migration/multifd.h
>>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>>>   void multifd_recv_sync_main(void);
>>>   int multifd_send_sync_main(void);
>>>   int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>>> +int multifd_send_channels_created(void);
>>>
>>>   /* Multifd Compression flags */
>>>   #define MULTIFD_FLAG_SYNC (1 << 0)
>>> @@ -86,6 +87,8 @@ typedef struct {
>>>       /* multifd flags for sending ram */
>>>       int write_flags;
>>>
>>> +    /* Syncs channel creation and migration thread */
>>> +    QemuSemaphore create_sem;
>>>       /* sem where to wait for more work */
>>>       QemuSemaphore sem;
>>>       /* syncs main thread and channels */
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 9c769a1ecd..d81d96eaa5 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>>>           error_report_err(local_err);
>>>           migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>>>                             MIGRATION_STATUS_FAILED);
>>> +        multifd_send_channels_created();
>>>           migrate_fd_cleanup(s);
>>>           return;
>>>       }
>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>> index 564e911b6c..f0c216f4f9 100644
>>> --- a/migration/multifd.c
>>> +++ b/migration/multifd.c
>>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>>>           multifd_send_channel_destroy(p->c);
>>>           p->c = NULL;
>>>           qemu_mutex_destroy(&p->mutex);
>>> +        qemu_sem_destroy(&p->create_sem);
>>>           qemu_sem_destroy(&p->sem);
>>>           qemu_sem_destroy(&p->sem_sync);
>>>           g_free(p->name);
>>> @@ -766,6 +767,29 @@ out:
>>>       return NULL;
>>>   }
>>>
>>> +int multifd_send_channels_created(void)
>>> +{
>>> +    int ret = 0;
>>> +
>>> +    if (!migrate_multifd()) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    for (int i = 0; i < migrate_multifd_channels(); i++) {
>>> +        MultiFDSendParams *p = &multifd_send_state->params[i];
>>> +
>>> +        qemu_sem_wait(&p->create_sem);
>>> +        WITH_QEMU_LOCK_GUARD(&p->mutex) {
>>> +            if (p->quit) {
>>> +                error_report("%s: channel %d has already quit", __func__, i);
>>> +                ret = -1;
>>> +            }
>>> +        }
>> There are races here when a channel fails at
>> multifd_send_initial_packet(). If p->quit can be set after the post to
>> create_sem, then this function could still return success and we'd run
>> into a broken channel. Possibly even the same bug you're trying to fix.
>>
>> I think that's one of the reasons we have channels_ready.
>
> I am not sure exactly what bug you are describing here, can you elaborate?
>

This looks like it could actually be a preexisting issue, but in the
current code, what stops the multifd channel from running ahead is
p->sem. Except that the channel does some work in
multifd_send_initial_packet() before waiting on p->sem, and that work
could fail.

This means that right after checking for p->quit above, the multifd
thread could already be exiting due to an error and
multifd_send_channels_created() would miss it. So there's still a chance
that this function effectively behaves just like the previous code.

Thread           | Code execution
------------------------------------------------------------------------
Migration        |
                 | ram_save_setup()
                 |   multifd_send_channels_created()
                 |     qemu_sem_wait(&p->create_sem);
Main             |
                 | multifd_channel_connect()
                 |   qemu_thread_create()
                 |   qemu_sem_post(&p->create_sem)
Multifd 1        |
                 | multifd_send_initial_packet() *errors*
                 | goto out
                 | multifd_send_terminate_threads()
Migration        |
                 | still at multifd_send_channels_created
                 |     qemu_mutex_lock(&p->mutex);
                 |     p->quit == false      <--- !!!
                 |     qemu_mutex_unlock(&p->mutex);
                 | return true from multifd_send_channels_created()

From here onwards, it's the same as not having checked
multifd_send_channels_created() at all. One way this could go is:

                 | runs unimpeded until multifd_send_sync_main()
                 | multifd_send_pages()
                 | *misses the exiting flag*
                 | qemu_sem_wait(&multifd_send_state->channels_ready);
Multifd 1        |
                 | still at multifd_send_terminate_threads
                 |   qemu_mutex_lock(&p->mutex);
                 |   p->quit = true;
                 |   qemu_mutex_unlock(&p->mutex);
                 | qemu_sem_post(&p->sem_sync);
                 | qemu_sem_post(&multifd_send_state->channels_ready);
Migration        |
                 | qemu_mutex_lock(&p->mutex);
                 | p->quit == true;  <--- correct now
                 | qemu_mutex_unlock(&p->mutex);
                 | return -1;
                 | *all good again*

It seems that in order to avoid this kind of race we'd need the
synchronization point to be at the multifd thread instead. But then,
that's what channels_ready does.
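
Schematically, the send thread looks like this (paraphrasing from
memory, not the exact tree):

    static void *multifd_send_thread(void *opaque)
    {
        MultiFDSendParams *p = opaque;
        Error *local_err = NULL;

        if (multifd_send_initial_packet(p, &local_err) < 0) {
            goto out;    /* fails before the first sync point below */
        }

        while (true) {
            qemu_sem_post(&multifd_send_state->channels_ready);
            qemu_sem_wait(&p->sem);
            /* send work, check p->quit, etc. */
        }
    out:
        multifd_send_terminate_threads(local_err);
        /* ... */
        return NULL;
    }

The thread only posts channels_ready once multifd_send_initial_packet()
has succeeded, which is exactly the ordering create_sem can't capture.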

>>
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>>   static bool multifd_channel_connect(MultiFDSendParams *p,
>>>                                       QIOChannel *ioc,
>>>                                       Error **errp);
>>> @@ -794,6 +818,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
>>>       p->quit = true;
>>>       qemu_sem_post(&multifd_send_state->channels_ready);
>>>       qemu_sem_post(&p->sem_sync);
>>> +    qemu_sem_post(&p->create_sem);
>>>       error_free(err);
>>>   }
>>>
>>> @@ -857,6 +882,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
>>>       qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
>>>                          QEMU_THREAD_JOINABLE);
>>>       p->running = true;
>>> +    qemu_sem_post(&p->create_sem);
>>>       return true;
>>>   }
>>>
>>> @@ -864,15 +890,16 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
>>>                                                QIOChannel *ioc, Error *err)
>>>   {
>>>        migrate_set_error(migrate_get_current(), err);
>>> -     /* Error happen, we need to tell who pay attention to me */
>>> -     qemu_sem_post(&multifd_send_state->channels_ready);
>>> -     qemu_sem_post(&p->sem_sync);
>>>        /*
>>> +      * Error happen, we need to tell who pay attention to me.
>>>         * Although multifd_send_thread is not created, but main migration
>>>         * thread need to judge whether it is running, so we need to mark
>>>         * its status.
>>>         */
>>>        p->quit = true;
>>> +     qemu_sem_post(&multifd_send_state->channels_ready);
>>> +     qemu_sem_post(&p->sem_sync);
>>> +     qemu_sem_post(&p->create_sem);
>> Do we still need channels_ready and sem_sync here? The migration thread
>> shouldn't have gone past create_sem at this point.
>
> I think you are right, we can drop channels_ready and sem_sync here.
>
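
Right. With create_sem gating the migration thread, the tail of the
cleanup would then reduce to something like (sketch):

        p->quit = true;
        qemu_sem_post(&p->create_sem);
        object_unref(OBJECT(ioc));
        error_free(err);
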
>>
>>>        object_unref(OBJECT(ioc));
>>>        error_free(err);
>>>   }
>>> @@ -921,6 +948,7 @@ int multifd_save_setup(Error **errp)
>>>           MultiFDSendParams *p = &multifd_send_state->params[i];
>>>
>>>           qemu_mutex_init(&p->mutex);
>>> +        qemu_sem_init(&p->create_sem, 0);
>>>           qemu_sem_init(&p->sem, 0);
>>>           qemu_sem_init(&p->sem_sync, 0);
>>>           p->quit = false;
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index c0cdcccb75..b3e864a22b 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2937,6 +2937,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>>       RAMBlock *block;
>>>       int ret;
>>>
>>> +    bql_unlock();
>>> +    ret = multifd_send_channels_created();
>>> +    bql_lock();
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>>       if (compress_threads_save_setup()) {
>>>           return -1;
>>>       }

