All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: "Zhanghaoyu (A)" <haoyu.zhang@huawei.com>
Cc: "Huangweidong (C)" <weidong.huang@huawei.com>,
	Gleb Natapov <gleb@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Luonengjun <luonengjun@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Zengjunliang <zengjunliang@huawei.com>,
	"Wangrui (K)" <moon.wangrui@huawei.com>
Subject: Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist
Date: Wed, 06 Nov 2013 10:04:36 +0100	[thread overview]
Message-ID: <527A0624.3040600@redhat.com> (raw)
In-Reply-To: <D3E216785288A145B7BC975F83A2ED1043EF1A15@szxeml556-mbx.china.huawei.com>

Il 06/11/2013 02:50, Zhanghaoyu (A) ha scritto:
>>>>> Avoid starting a new migration task while the previous one still
>>>> exist.
>>>>
>>>> Can you explain how to reproduce the problem?
>>>>
>>> When network disconnection between source and destination happened, 
>>> the migration thread stuck at below stack,
>>> Then I cancel the migration task, the migration state in qemu will be set to MIG_STATE_CANCELLED, so the migration job in libvirt quits.
>>> Then I perform migration again, at this time, the network reconnected 
>>> successfully, since the TCP timeout retransmission, above stack will not return immediately, so two migration tasks exist at the same time.
>>> And still worse, source qemu will crash, because of accessing the NULL 
>>> pointer in qemu_bh_schedule(s->cleanup_bh); statement in latter migration task, since the "s->cleanup_bh" had been deleted by previous migration task.
>>
>> Thanks for explaining.  CANCELLING looks like a useful addition.
>>
>> Why do you need both CANCELLING and COMPLETING?  The COMPLETED state should be set only after all I/O is done.
>
> There is a period of time from the timing of setting COMPLETED state to that of migration task exits,
> so it's problematic in COMPLETED->CANCELLED transition, but if applying your below proposal, the problem gone.
>
> do {
>     old_state = s->state;
>     if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
>         break;
>     }
>     migrate_set_state(s, old_state, MIG_STATE_CANCELLED);
> } while (s->state != MIG_STATE_CANCELLED);

Ok.

>> I agree with Eric that the CANCELLING state should not be exposed via QMP.
>> "info migrate" and "query-migrate" can keep showing "active" for maximum backwards compatibility.
>>
>> More comments below.
>>
>>
>>> -    if (s->state != MIG_STATE_COMPLETED) {
>>> +    if (s->state != MIG_STATE_COMPLETING) {
>>>          qemu_savevm_state_cancel();
>>> +        if (s->state == MIG_STATE_CANCELLING) {
>>> +            migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED); 
>>> +        }
>>
>> I think you can remove the "if" and unconditionally call migrate_set_state.
> 
> Do you mean to remove the "if (s->state == MIG_STATE_CANCELLING)" ?
> The s->state probably is MIG_STATE_ERROR here, is it okay to unconditionally call migrate_set_state?

migrate_set_state has atomic_cmpxchg so it has an "implicit" if, but
you're right it's clearer this way.

Paolo

> Thanks,
> Zhang Haoyu
> 
>>
>>> +    }else {
>>> +        migrate_set_state(s, MIG_STATE_COMPLETING, 
>>> + MIG_STATE_COMPLETED);
>>>      }
>>>  
>>>      notifier_list_notify(&migration_state_notifiers, s);  }
>>>  
>>> -static void migrate_set_state(MigrationState *s, int old_state, int 
>>> new_state) -{
>>> -    if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
>>> -        trace_migrate_set_state(new_state);
>>> -    }
>>> -}
>>> -
>>>  void migrate_fd_error(MigrationState *s)  {
>>>      DPRINTF("setting error state\n"); @@ -328,7 +337,7 @@ static void 
>>> migrate_fd_cancel(MigrationState *s)  {
>>>      DPRINTF("cancelling migration\n");
>>>  
>>> -    migrate_set_state(s, s->state, MIG_STATE_CANCELLED);
>>> +    migrate_set_state(s, s->state, MIG_STATE_CANCELLING);
>>
>> Here probably we want something like
>>
>>    do {
>>        old_state = s->state;
>>        if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
>>            break;
>>        }
>>        migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
>>    } while (s->state != MIG_STATE_CANCELLING);
>>
>> to avoid a bogus COMPLETED->CANCELLED transition.  Please separate the patch in two parts:
>>
>> (1) the first uses the above code, with CANCELLED instead of CANCELLING
>>
>> (2) the second, similar to the one you have posted, introduces the new CANCELLING state
>>
>> Thanks,
>>
>> Paolo

  reply	other threads:[~2013-11-06  9:05 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-04 11:26 [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist Zhanghaoyu (A)
2013-11-04 11:30 ` Paolo Bonzini
2013-11-05  2:23   ` Zhanghaoyu (A)
2013-11-05  9:15     ` Paolo Bonzini
2013-11-06  1:50       ` Zhanghaoyu (A)
2013-11-06  9:04         ` Paolo Bonzini [this message]
2013-11-04 13:59 ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=527A0624.3040600@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=gleb@redhat.com \
    --cc=haoyu.zhang@huawei.com \
    --cc=luonengjun@huawei.com \
    --cc=moon.wangrui@huawei.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=weidong.huang@huawei.com \
    --cc=zengjunliang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.