From: Fei Li <fli@suse.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, armbru@redhat.com, dgilbert@redhat.com,
	famz@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v7 4/9] migration: fix some segmentation faults when using multifd
Date: Fri, 2 Nov 2018 14:03:43 +0800
Message-ID: <80eb6add-711d-c3d8-795d-afecc63727f7@suse.com>
In-Reply-To: <20181102023158.GC7804@xz-x1>



On 11/02/2018 10:31 AM, Peter Xu wrote:
> On Thu, Nov 01, 2018 at 06:17:10PM +0800, Fei Li wrote:
>> When multifd is used during migration, a segmentation fault will
>> occur in the source when multifd_save_cleanup() is called again if
>> the multifd_send_state has been freed in earlier error handling. This
>> can happen when migrate_fd_connect() fails and multifd_fd_cleanup()
>> is called, and then multifd_new_send_channel_async() fails and
>> multifd_save_cleanup() is called again.
>>
>> If the QIOChannel *c of multifd_recv_state->params[i] (p->c) is not
>> initialized, there is no need to close the channel. Or else a
>> segmentation fault will occur in multifd_recv_terminate_threads()
>> when multifd_recv_initial_packet() fails.
> It's a bit odd to me when I see that multifd_send_thread() calls
> multifd_send_terminate_threads().  Is that the reason that you
> encountered the problem?
Yes, this is one of the reasons. Actually, this problem almost never
occurs before this patch series, but as the series tries to make
qemu_thread_create() more robust, I found it during debugging. ;)

The second situation is when using multifd (in which case
multifd_new_send_channel_async()[1] is called several times): once one
channel fails in [1], the later channels hit the segmentation fault
when they enter [1] again.

The third is after applying the last patch, i.e. after
multifd_save_setup =>
socket_send_channel_create(multifd_new_send_channel_async, p) succeeds
but qemu_thread_create(migration_thread) then fails. I assume we need
to do some migration cleanup here, like migrate_fd_cleanup() or
something similar to the vm_start() in migration_iteration_finish()?

The fourth is when multifd_new_send_channel_async()[1] fails and
multifd_save_cleanup() is called, and then
multifd_send_sync_main <= qemu_savevm_state_setup <= migration_thread[2]
is called. (BTW, I find that [2] is sometimes called earlier than [1],
but sometimes later than the first channel.)
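
To make the double-cleanup problem concrete, here is a minimal
standalone sketch, not QEMU code. The names send_state, send_setup()
and send_cleanup() are hypothetical stand-ins for multifd_send_state
and its setup/cleanup paths. It only assumes that the cleanup frees a
global pointer and may be entered a second time from a later error
path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for multifd_send_state; not QEMU's real struct. */
struct send_state {
    int channels;
    int *params;
};

static struct send_state *send_state;

/* Allocate the global state; returns 0 on success, -1 on failure. */
static int send_setup(int channels)
{
    assert(channels > 0);
    send_state = calloc(1, sizeof(*send_state));
    if (!send_state) {
        return -1;
    }
    send_state->params = calloc((size_t)channels, sizeof(int));
    if (!send_state->params) {
        free(send_state);
        send_state = NULL;
        return -1;
    }
    send_state->channels = channels;
    return 0;
}

/*
 * Tear the state down. Returns 1 if something was freed and 0 if the
 * state was already gone, so a repeated call is a harmless no-op.
 */
static int send_cleanup(void)
{
    if (!send_state) {
        return 0;
    }
    free(send_state->params);
    free(send_state);
    send_state = NULL;   /* makes the guard above effective next time */
    return 1;
}
```

Calling send_cleanup() twice after one send_setup() frees the state
once and returns early the second time. Without the NULL check and the
reset to NULL, the second call would dereference freed memory.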
> Instead of checking all these null pointers, IMHO we should just let
> multifd_send_terminate_threads() be called only in the main thread...
OK, from your reply to patch 5/9, I understand we should offer better
isolation and just let the main thread handle this cleanup. :)
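
A rough sketch of that isolation, assuming plain pthreads rather than
QEMU's thread wrappers: each worker only records its own failure, and
the main thread joins all workers and then runs the cleanup once. All
names here (worker_fn, main_thread_cleanup, run_workers, N_WORKERS) are
hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define N_WORKERS 4   /* hypothetical channel count */

struct worker {
    pthread_t thread;
    int id;
    bool failed;      /* written by the worker, read by main after join */
};

static struct worker workers[N_WORKERS];
static int cleanup_calls;   /* only ever touched by the main thread */

/* Worker: record a failure locally and exit; never run cleanup here. */
static void *worker_fn(void *opaque)
{
    struct worker *w = opaque;

    assert(w != NULL);
    if (w->id == 2) {        /* pretend channel 2 hits an error */
        w->failed = true;
    }
    return NULL;
}

/* Runs on the main thread only. */
static void main_thread_cleanup(void)
{
    cleanup_calls++;
}

/*
 * Start the workers, join them, then clean up once from the caller's
 * thread. Returns the number of failed workers, or -1 if a thread
 * could not be created.
 */
static int run_workers(void)
{
    int started = 0, failed = 0;
    bool create_err = false;

    for (int i = 0; i < N_WORKERS; i++) {
        workers[i].id = i;
        workers[i].failed = false;
        if (pthread_create(&workers[i].thread, NULL,
                           worker_fn, &workers[i]) != 0) {
            create_err = true;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i].thread, NULL);
        if (workers[i].failed) {
            failed++;
        }
    }
    main_thread_cleanup();
    return create_err ? -1 : failed;
}
```

Because only one thread ever reaches main_thread_cleanup(), the shared
state cannot be freed underneath a worker, and the scattered NULL
checks are no longer needed for that race.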
>> Signed-off-by: Fei Li <fli@suse.com>
>> ---
>>   migration/ram.c | 28 +++++++++++++++++++++-------
>>   1 file changed, 21 insertions(+), 7 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 7e7deec4d8..4db3b3e8f4 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -907,6 +907,11 @@ static void multifd_send_terminate_threads(Error *err)
>>           }
>>       }
>>   
>> +    /* in case multifd_send_state has been freed earlier */
>> +    if (!multifd_send_state) {
>> +        return;
>> +    }
>> +
>>       for (i = 0; i < migrate_multifd_channels(); i++) {
>>           MultiFDSendParams *p = &multifd_send_state->params[i];
The above one is the first case.
>>   
>> @@ -922,7 +927,7 @@ int multifd_save_cleanup(Error **errp)
>>       int i;
>>       int ret = 0;
>>   
>> -    if (!migrate_use_multifd()) {
>> +    if (!migrate_use_multifd() || !multifd_send_state) {
>>           return 0;
>>       }
>>       multifd_send_terminate_threads(NULL);
The above one is the third case.
>> @@ -960,7 +965,7 @@ static void multifd_send_sync_main(void)
>>   {
>>       int i;
>>   
>> -    if (!migrate_use_multifd()) {
>> +    if (!migrate_use_multifd() || !multifd_send_state) {
>>           return;
>>       }
>>       if (multifd_send_state->pages->used) {
The above one is the fourth case.
>> @@ -1070,6 +1075,10 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>       QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
>>       Error *local_err = NULL;
>>   
>> +    if (!multifd_send_state) {
>> +        return;
>> +    }
>> +
>>       if (qio_task_propagate_error(task, &local_err)) {
>>           if (multifd_save_cleanup(&local_err) != 0) {
>>               migrate_set_error(migrate_get_current(), local_err);
The above one is the second case.
>> @@ -1131,7 +1140,7 @@ struct {
>>       uint64_t packet_num;
>>   } *multifd_recv_state;
The changes below fix the issue described in the second paragraph
(p->c) of the commit message. :)
>>   
>> -static void multifd_recv_terminate_threads(Error *err)
>> +static void multifd_recv_terminate_threads(Error *err, bool channel)
>>   {
>>       int i;
>>   
>> @@ -1145,6 +1154,11 @@ static void multifd_recv_terminate_threads(Error *err)
>>           }
>>       }
>>   
>> +    /* in case p->c is not initialized */
>> +    if (!channel) {
>> +        return;
>> +    }
>> +
>>       for (i = 0; i < migrate_multifd_channels(); i++) {
>>           MultiFDRecvParams *p = &multifd_recv_state->params[i];
>>   
>> @@ -1166,7 +1180,7 @@ int multifd_load_cleanup(Error **errp)
>>       if (!migrate_use_multifd()) {
>>           return 0;
>>       }
>> -    multifd_recv_terminate_threads(NULL);
>> +    multifd_recv_terminate_threads(NULL, true);
>>       for (i = 0; i < migrate_multifd_channels(); i++) {
>>           MultiFDRecvParams *p = &multifd_recv_state->params[i];
>>   
>> @@ -1269,7 +1283,7 @@ static void *multifd_recv_thread(void *opaque)
>>       }
>>   
>>       if (local_err) {
>> -        multifd_recv_terminate_threads(local_err);
>> +        multifd_recv_terminate_threads(local_err, true);
>>       }
>>       qemu_mutex_lock(&p->mutex);
>>       p->running = false;
>> @@ -1331,7 +1345,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>   
>>       id = multifd_recv_initial_packet(ioc, &local_err);
>>       if (id < 0) {
>> -        multifd_recv_terminate_threads(local_err);
>> +        multifd_recv_terminate_threads(local_err, false);
>>           return false;
>>       }
>>   
>> @@ -1339,7 +1353,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
>>       if (p->c != NULL) {
>>           error_setg(&local_err, "multifd: received id '%d' already setup'",
>>                      id);
>> -        multifd_recv_terminate_threads(local_err);
>> +        multifd_recv_terminate_threads(local_err, true);
>>           return false;
>>       }
>>       p->c = ioc;
>> -- 
>> 2.13.7
>>
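
For comparison with the "bool channel" flag above, here is a small
standalone sketch of a different way to handle the p->c case: test each
channel pointer and skip the ones that were never set up. This is not
what the patch does. struct channel, recv_params, channel_shutdown()
and recv_terminate() are hypothetical stand-ins, not QEMU's QIOChannel
API:

```c
#include <assert.h>
#include <stddef.h>

#define N_CHANNELS 3   /* hypothetical channel count */

/* Hypothetical stand-in for QIOChannel. */
struct channel {
    int shut;
};

struct recv_params {
    struct channel *c;   /* NULL until the channel has been set up */
};

static struct recv_params recv_params[N_CHANNELS];

static void channel_shutdown(struct channel *c)
{
    assert(c != NULL);
    c->shut = 1;
}

/*
 * Shut down every channel that has been set up and skip the rest.
 * Returns how many channels were shut down.
 */
static int recv_terminate(void)
{
    int n = 0;

    for (int i = 0; i < N_CHANNELS; i++) {
        struct channel *c = recv_params[i].c;

        if (!c) {
            continue;    /* never initialized, nothing to close */
        }
        channel_shutdown(c);
        n++;
    }
    return n;
}
```

With only one of the three channels set up, recv_terminate() shuts down
that one and leaves the NULL entries alone.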
> Regards,
>
Have a nice day, thanks for the comment. :)
Fei
