qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
@ 2014-03-24 13:04 arei.gonglei
  2014-03-24 14:14 ` Eric Blake
  2014-03-24 15:47 ` Paolo Bonzini
  0 siblings, 2 replies; 9+ messages in thread
From: arei.gonglei @ 2014-03-24 13:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: quintela, yanqiangjun, lefty.zhao, owasserm, Gonglei,
	zengjunliang, pbonzini

From: zengjunliang <zengjunliang@huawei.com>

Return an error for migrate cancel when the migration status is not
MIG_STATE_SETUP or MIG_STATE_ACTIVE, so that libvirt can perceive
that the operation failed.

Signed-off-by: zengjunliang <zengjunliang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/qapi/qmp/qerror.h | 3 +++
 migration.c               | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index da75abf..b13e3e0 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -164,6 +164,9 @@ void qerror_report_err(Error *err);
 #define QERR_MIGRATION_ACTIVE \
     ERROR_CLASS_GENERIC_ERROR, "There's a migration process in progress"
 
+#define QERR_MIGRATION_COMPLETED \
+    ERROR_CLASS_GENERIC_ERROR, "There's no migration process in progress"
+
 #define QERR_MIGRATION_NOT_SUPPORTED \
     ERROR_CLASS_GENERIC_ERROR, "State blocked by non-migratable device '%s'"
 
diff --git a/migration.c b/migration.c
index e0e24d4..2f34c67 100644
--- a/migration.c
+++ b/migration.c
@@ -336,7 +336,7 @@ void migrate_fd_error(MigrationState *s)
     notifier_list_notify(&migration_state_notifiers, s);
 }
 
-static void migrate_fd_cancel(MigrationState *s)
+static void migrate_fd_cancel(MigrationState *s, Error **errp)
 {
     int old_state ;
     DPRINTF("cancelling migration\n");
@@ -344,6 +344,7 @@ static void migrate_fd_cancel(MigrationState *s)
     do {
         old_state = s->state;
         if (old_state != MIG_STATE_SETUP && old_state != MIG_STATE_ACTIVE) {
+            error_set(errp, QERR_MIGRATION_COMPLETED);
             break;
         }
         migrate_set_state(s, old_state, MIG_STATE_CANCELLING);
@@ -470,7 +471,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migrate_fd_cancel(migrate_get_current());
+    migrate_fd_cancel(migrate_get_current(), errp);
 }
 
 void qmp_migrate_set_cache_size(int64_t value, Error **errp)
-- 
1.7.12.4

* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-24 13:04 [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel arei.gonglei
@ 2014-03-24 14:14 ` Eric Blake
  2014-03-24 15:47 ` Paolo Bonzini
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Blake @ 2014-03-24 14:14 UTC (permalink / raw)
  To: arei.gonglei, qemu-devel
  Cc: quintela, zengjunliang, yanqiangjun, lefty.zhao, owasserm,
	pbonzini

On 03/24/2014 07:04 AM, arei.gonglei@huawei.com wrote:
> From: zengjunliang <zengjunliang@huawei.com>
> 
> Return error for migrate cancel, when migration status is not
> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can can
> perceive the operation fails.
> 
> Signed-off-by: zengjunliang <zengjunliang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  include/qapi/qmp/qerror.h | 3 +++
>  migration.c               | 5 +++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
> index da75abf..b13e3e0 100644
> --- a/include/qapi/qmp/qerror.h
> +++ b/include/qapi/qmp/qerror.h
> @@ -164,6 +164,9 @@ void qerror_report_err(Error *err);
>  #define QERR_MIGRATION_ACTIVE \
>      ERROR_CLASS_GENERIC_ERROR, "There's a migration process in progress"
>  
> +#define QERR_MIGRATION_COMPLETED \

New code should NOT be adding macros in qerror.h, but just directly
report the error.

> +    ERROR_CLASS_GENERIC_ERROR, "There's no migration process in progress"

You use a generic error both for migration active and for no migration
in progress.  The error API documents that clients (such as libvirt)
must NOT parse the human-readable string.  If libvirt is actually going
to behave differently for this particular error, that argues that it may
need a different error category than GENERIC_ERROR.
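
To illustrate both points above, here is a minimal sketch in C. It assumes
a simplified error object; SketchError, sketch_error_set() and
SKETCH_CLASS_NO_MIGRATION are hypothetical stand-ins invented for this
example and are not QEMU's API (in QEMU the message would simply be passed
to an existing helper such as error_setg() at the call site). The sketch
shows the message being reported directly, without a qerror.h macro, and a
separate error class that a client could match on instead of parsing the
human-readable string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for an error object; not QEMU's real types. */
typedef enum {
    SKETCH_CLASS_GENERIC,       /* clients cannot tell causes apart */
    SKETCH_CLASS_NO_MIGRATION   /* invented class a client could match on */
} SketchErrorClass;

typedef struct {
    SketchErrorClass err_class;
    char msg[128];
} SketchError;

/* Report an error directly at the call site; no per-message macro needed. */
void sketch_error_set(SketchError **errp, SketchErrorClass err_class,
                      const char *fmt, ...)
{
    va_list ap;
    SketchError *err;

    if (errp == NULL) {
        return;                 /* caller is not interested in errors */
    }
    assert(*errp == NULL);      /* never overwrite an earlier error */
    err = calloc(1, sizeof(*err));
    if (err == NULL) {
        abort();
    }
    err->err_class = err_class;
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

/* Returns 0 on success, -1 if there was nothing to cancel. */
int sketch_cancel(int migration_running, SketchError **errp)
{
    if (!migration_running) {
        sketch_error_set(errp, SKETCH_CLASS_NO_MIGRATION,
                         "There's no migration process in progress");
        return -1;
    }
    return 0;
}
```

A client would branch on err_class and only display msg to the user.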

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-24 13:04 [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel arei.gonglei
  2014-03-24 14:14 ` Eric Blake
@ 2014-03-24 15:47 ` Paolo Bonzini
  2014-03-24 16:00   ` Eric Blake
  1 sibling, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2014-03-24 15:47 UTC (permalink / raw)
  To: arei.gonglei, qemu-devel
  Cc: quintela, zengjunliang, yanqiangjun, lefty.zhao, owasserm

On 24/03/2014 14:04, arei.gonglei@huawei.com wrote:
> From: zengjunliang <zengjunliang@huawei.com>
>
> Return error for migrate cancel, when migration status is not
> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can can
> perceive the operation fails.
>
> Signed-off-by: zengjunliang <zengjunliang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>

I think this is done on purpose, because canceling migration is racy. 
Instead, libvirt should do "query-migrate" and check if the migration 
was completed or canceled.

Paolo

* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-24 15:47 ` Paolo Bonzini
@ 2014-03-24 16:00   ` Eric Blake
  2014-03-25 11:15     ` Gonglei (Arei)
  2014-03-28  9:18     ` Gonglei (Arei)
  0 siblings, 2 replies; 9+ messages in thread
From: Eric Blake @ 2014-03-24 16:00 UTC (permalink / raw)
  To: Paolo Bonzini, arei.gonglei, qemu-devel
  Cc: quintela, zengjunliang, libvir-list@redhat.com, yanqiangjun,
	lefty.zhao, owasserm

[adding libvirt]

On 03/24/2014 09:47 AM, Paolo Bonzini wrote:
> On 24/03/2014 14:04, arei.gonglei@huawei.com wrote:
>> From: zengjunliang <zengjunliang@huawei.com>
>>
>> Return error for migrate cancel, when migration status is not
>> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can can
>> perceive the operation fails.
>>
>> Signed-off-by: zengjunliang <zengjunliang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> 
> I think this is done on purpose, because canceling migration is racy.
> Instead, libvirt should do "query-migrate" and check if the migration
> was completed or canceled.

Can you please give more details on how you are triggering the problem
with libvirt?  I think Paolo is probably right - the bug is more likely
to be in libvirt not expecting the race and not recovering correctly
when the race occurs, than it is to be in changing qemu's state algorithm.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-24 16:00   ` Eric Blake
@ 2014-03-25 11:15     ` Gonglei (Arei)
  2014-03-28  9:18     ` Gonglei (Arei)
  1 sibling, 0 replies; 9+ messages in thread
From: Gonglei (Arei) @ 2014-03-25 11:15 UTC (permalink / raw)
  To: Eric Blake, Paolo Bonzini, qemu-devel@nongnu.org
  Cc: quintela@redhat.com, Zengjunliang, libvir-list@redhat.com,
	Yanqiangjun, Zhaoyanbin (A), owasserm@redhat.com

> -----Original Message-----
> From: Eric Blake [mailto:eblake@redhat.com]
> Sent: Tuesday, March 25, 2014 12:01 AM
> To: Paolo Bonzini; Gonglei (Arei); qemu-devel@nongnu.org
> Cc: quintela@redhat.com; owasserm@redhat.com; Yanqiangjun; Zhaoyanbin
> (A); Zengjunliang; libvir-list@redhat.com
> Subject: Re: [PATCH] migration: Fix possible bug for migrate cancel
> 
> [adding libvirt]
> 
> On 03/24/2014 09:47 AM, Paolo Bonzini wrote:
> > On 24/03/2014 14:04, arei.gonglei@huawei.com wrote:
> >> From: zengjunliang <zengjunliang@huawei.com>
> >>
> >> Return error for migrate cancel, when migration status is not
> >> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can can
> >> perceive the operation fails.
> >>
> >> Signed-off-by: zengjunliang <zengjunliang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >
> > I think this is done on purpose, because canceling migration is racy.
> > Instead, libvirt should do "query-migrate" and check if the migration
> > was completed or canceled.
> 
> Can you please give more details at how you are triggering the problem
> with libvirt?  I think Paolo is probably right - the bug is more likely
> to be in libvirt not expecting the race and not recovering correctly
> when the race occurs, than it is to be in changing qemu's state algorithm.
> 
When the migration progress reaches 100%, the migration status becomes MIG_STATE_COMPLETED in QEMU.
It then takes some time, after MIG_STATE_COMPLETED is set, for the migration thread's resources to be reclaimed.
If we cancel the migration during this window, the migrate_fd_cancel function breaks out of its loop immediately
without reporting an error. Libvirt then treats the cancel operation as a success, which is contrary to the facts.
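
The window described above can be sketched in C as follows. This is an
illustration only: it assumes the state update behaves like a
compare-and-swap, uses stand-in SK_* names instead of the real MIG_STATE_*
values, and returns a bool purely to make the silent case visible (the
function in the patch context returns void).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in state names mirroring the MIG_STATE_* values in this thread. */
enum sketch_state {
    SK_SETUP, SK_ACTIVE, SK_CANCELLING, SK_CANCELLED, SK_COMPLETED, SK_ERROR
};

/*
 * Try to move SETUP/ACTIVE to CANCELLING with a compare-and-swap loop.
 * Returns true if this call started a cancellation, false if the
 * migration had already left the cancellable states (the silent case).
 */
bool sketch_cancel_cas(_Atomic int *state)
{
    int old = atomic_load(state);

    for (;;) {
        if (old != SK_SETUP && old != SK_ACTIVE) {
            return false;       /* e.g. already COMPLETED: nothing is done */
        }
        /* On failure 'old' is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(state, &old, SK_CANCELLING)) {
            return true;
        }
    }
}
```

If the state is already SK_COMPLETED when the call arrives, the function
returns without touching it, which is the case the caller never hears about.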

Best regards,
-Gonglei


* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-24 16:00   ` Eric Blake
  2014-03-25 11:15     ` Gonglei (Arei)
@ 2014-03-28  9:18     ` Gonglei (Arei)
  2014-03-28  9:28       ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Gonglei (Arei) @ 2014-03-28  9:18 UTC (permalink / raw)
  To: Gonglei (Arei), Eric Blake, Paolo Bonzini, qemu-devel@nongnu.org
  Cc: quintela@redhat.com, Zengjunliang, libvir-list@redhat.com,
	Yanqiangjun, Zhaoyanbin (A), owasserm@redhat.com

> > >> Return error for migrate cancel, when migration status is not
> > >> MIG_STATE_SETUP or MIG_STATE_ACTIVE. Thus, libvirt can can
> > >> perceive the operation fails.
> > >>
> > >> Signed-off-by: zengjunliang <zengjunliang@huawei.com>
> > >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > >
> > > I think this is done on purpose, because canceling migration is racy.
> > > Instead, libvirt should do "query-migrate" and check if the migration
> > > was completed or canceled.
> >
> > Can you please give more details at how you are triggering the problem
> > with libvirt?  I think Paolo is probably right - the bug is more likely
> > to be in libvirt not expecting the race and not recovering correctly
> > when the race occurs, than it is to be in changing qemu's state algorithm.
> >
> When the migration progress reaches 100%, and the migration status becomes
> MIG_STATE_COMPLETED in Qemu.
> It will take some time which from MIG_STATE_COMPLETED to the migration
> thread resources are recovered.
> If we cancel the migration at this moment, the migrate_fd_cancel function will
> break directly without reporting
> error code. Then, libvirt considers the cancle operation a success, contrary
> facts.
> 

Ping... 


Best regards,
-Gonglei

* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-28  9:18     ` Gonglei (Arei)
@ 2014-03-28  9:28       ` Paolo Bonzini
  2014-03-28 11:30         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2014-03-28  9:28 UTC (permalink / raw)
  To: Gonglei (Arei), Eric Blake, qemu-devel@nongnu.org
  Cc: quintela@redhat.com, Zengjunliang, libvir-list@redhat.com,
	Yanqiangjun, Zhaoyanbin (A), owasserm@redhat.com

On 28/03/2014 10:18, Gonglei (Arei) wrote:
>> > > Can you please give more details at how you are triggering the problem
>> > > with libvirt?  I think Paolo is probably right - the bug is more likely
>> > > to be in libvirt not expecting the race and not recovering correctly
>> > > when the race occurs, than it is to be in changing qemu's state algorithm.
>> > >
>> When the migration progress reaches 100%, and the migration status becomes
>> MIG_STATE_COMPLETED in Qemu.
>> It will take some time which from MIG_STATE_COMPLETED to the migration
>> thread resources are recovered.
>> If we cancel the migration at this moment, the migrate_fd_cancel function will
>> break directly without reporting
>> error code. Then, libvirt considers the cancle operation a success, contrary
>> facts.

There is no error: once migration is completed you can still shut down on 
the destination and continue on the source.  Libvirt should either:

1) poll with "query-migrate" after migrate_cancel, and report an error 
there if it's the desired semantics;

2) toggle a "cancelled" flag before asking QEMU to cancel migration, 
check it in the migration functions after "query-migrate" reported 
completion; if it is true, do not resume on the destination.

Another reason for doing it in libvirt is that the serialization between 
cancellation and completion of migration ultimately is controlled by 
libvirt's lock.  Doing this in QEMU makes it harder to reason about 
concurrency.
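
A rough sketch of the two options on the management side, written in C
under stated assumptions: struct sketch_job, sketch_request_cancel() and
sketch_decide() are hypothetical names invented here and are not libvirt
code. The status strings are the ones "query-migrate" is expected to
report. The flag is set before the cancel request is sent, and the
decision is taken only once a poll returns a final status.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum sketch_action {
    SK_KEEP_POLLING,        /* status not final yet, query again */
    SK_RESUME_DEST,         /* migration finished and nobody cancelled it */
    SK_RESUME_SOURCE        /* cancelled or failed: keep running on source */
};

/* Hypothetical management-side job record; not a libvirt structure. */
struct sketch_job {
    bool cancel_requested;  /* set BEFORE asking QEMU to cancel */
};

void sketch_request_cancel(struct sketch_job *job)
{
    job->cancel_requested = true;
    /* ...the management layer would now send the cancel command... */
}

/* Decide what to do with the status string returned by one poll. */
enum sketch_action sketch_decide(const struct sketch_job *job,
                                 const char *status)
{
    if (strcmp(status, "completed") == 0) {
        /* Completion lost the race against a cancel: honour the cancel. */
        return job->cancel_requested ? SK_RESUME_SOURCE : SK_RESUME_DEST;
    }
    if (strcmp(status, "cancelled") == 0 || strcmp(status, "failed") == 0) {
        return SK_RESUME_SOURCE;
    }
    return SK_KEEP_POLLING; /* e.g. "setup" or "active" */
}
```

Because the flag and the decision both live in the management layer, they
can be protected by the same lock that already serializes the job.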

Paolo

* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-28  9:28       ` Paolo Bonzini
@ 2014-03-28 11:30         ` Dr. David Alan Gilbert
  2014-03-28 12:16           ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Dr. David Alan Gilbert @ 2014-03-28 11:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: quintela@redhat.com, libvir-list@redhat.com, Yanqiangjun,
	qemu-devel@nongnu.org, Zhaoyanbin (A), Zengjunliang,
	Gonglei (Arei)

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 28/03/2014 10:18, Gonglei (Arei) wrote:
> >>> > Can you please give more details at how you are triggering the problem
> >>> > with libvirt?  I think Paolo is probably right - the bug is more likely
> >>> > to be in libvirt not expecting the race and not recovering correctly
> >>> > when the race occurs, than it is to be in changing qemu's state algorithm.
> >>> >
> >>When the migration progress reaches 100%, and the migration status becomes
> >>MIG_STATE_COMPLETED in Qemu.
> >>It will take some time which from MIG_STATE_COMPLETED to the migration
> >>thread resources are recovered.
> >>If we cancel the migration at this moment, the migrate_fd_cancel function will
> >>break directly without reporting
> >>error code. Then, libvirt considers the cancle operation a success, contrary
> >>facts.
> 
> There is no error, once migration is completed you can still
> shutdown on the destination and continue on the source.  Libvirt
> should either:

(I've rewritten my reply below about 4 times - swinging between
different answers, this stuff really isn't obvious, and certainly
not documented)

I think I agree that it's not an error; but I think migrate_fd_cancel
knows what the outcome will be.

If it was MIG_STATE_ERROR on entry to migrate_fd_cancel, then yes it
could tell you that the cancel failed because you were already in error.

If it was MIG_STATE_COMPLETED on entry to migrate_fd_cancel, then yes it
could tell you that the cancel failed because you already finished.

If it was MIG_STATE_ACTIVE on entry to migrate_fd_cancel - it will go to
MIG_STATE_CANCELLING and I believe eventually to MIG_STATE_CANCELLED;
I don't believe it can get to MIG_STATE_ERROR from that point, since
all of the places in the migrate_thread that transition to error
do explicit ACTIVE->ERROR transitions.  I don't believe it can get to
MIG_STATE_COMPLETED for the same reason.

So migrate_fd_cancel knows that the eventual outcome will be Error
or Cancelled or completed, even if the state isn't there yet, and it
could reply to say that.
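
The reasoning above amounts to a small lookup from the state on entry to
the eventual outcome. A minimal sketch in C, using stand-in SKM_* names
rather than the real MIG_STATE_* values; treating SETUP, CANCELLING and
CANCELLED the same way as ACTIVE is an assumption made here and is not
spelled out above.

```c
#include <assert.h>

/* Stand-in state names mirroring the MIG_STATE_* values in this thread. */
enum sk_mig_state {
    SKM_SETUP, SKM_ACTIVE, SKM_CANCELLING,
    SKM_CANCELLED, SKM_COMPLETED, SKM_ERROR
};

/*
 * Given the state seen on entry to a cancel request, return the state
 * the migration is expected to end up in.
 */
enum sk_mig_state sketch_eventual_outcome(enum sk_mig_state on_entry)
{
    switch (on_entry) {
    case SKM_ERROR:
        return SKM_ERROR;       /* cancel "failed": already in error */
    case SKM_COMPLETED:
        return SKM_COMPLETED;   /* cancel "failed": already finished */
    case SKM_SETUP:             /* assumption: handled like ACTIVE */
    case SKM_ACTIVE:
    case SKM_CANCELLING:        /* assumption: a cancel is already under way */
    case SKM_CANCELLED:
        return SKM_CANCELLED;   /* goes via CANCELLING to CANCELLED */
    }
    return on_entry;            /* not reached for valid states */
}
```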

> 1) poll with "query-migrate" after migrate_cancel, and report an
> error there if it's the desired semantics;
> 2) toggle a "cancelled" flag before asking QEMU to cancel migration,
> check it in the migration functions after "query-migrate" reported
> completion; if it is true, do not resume on the destination.

I think you're right you have to poll with query-migrate until you
get one of cancelled/failed/completed.

However it's a bit odd; prior to the introduction of 'CANCELLING', the
state that you would get by a query-migrate after migrate_fd_cancel
returned would in principle be the state you ended up in - i.e.
cancelled/failed/completed.  With cancelling added, query-migrate
might lie to you and say 'active' (when it's really hiding the
fact that cancelling is happening).    So while 'cancelling' apparently
didn't alter the API it did, in that query-migrate after a cancel
can now return active where it couldn't before.
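
As a sketch of that behaviour in C (stand-in names only; the mapping is an
assumption based on the description above, not a copy of QEMU's code): the
internal cancelling state is reported as 'active', so a client has to keep
polling until it sees one of the three final statuses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in state names mirroring the MIG_STATE_* values in this thread. */
enum sk_state {
    SKS_SETUP, SKS_ACTIVE, SKS_CANCELLING,
    SKS_CANCELLED, SKS_COMPLETED, SKS_ERROR
};

/* Status string a client would see for each internal state. */
const char *sketch_reported_status(enum sk_state s)
{
    switch (s) {
    case SKS_SETUP:
        return "setup";
    case SKS_ACTIVE:
    case SKS_CANCELLING:        /* cancelling is hidden behind "active" */
        return "active";
    case SKS_CANCELLED:
        return "cancelled";
    case SKS_COMPLETED:
        return "completed";
    case SKS_ERROR:
        return "failed";
    }
    return "unknown";
}

/* True once polling can stop. */
bool sketch_status_is_final(const char *status)
{
    return strcmp(status, "cancelled") == 0 ||
           strcmp(status, "failed") == 0 ||
           strcmp(status, "completed") == 0;
}
```

A poll issued right after a cancel can therefore legitimately see "active"
and must not be read as "the cancel had no effect".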

> Another reason for doing it in libvirt is that the serialization
> between cancellation and completion of migration ultimately is
> controlled by libvirt's lock.  Doing this in QEMU makes it harder to
> reason about concurrency.

I think you have to be careful when you talk about 'cancellation and completion
of migration' - in that paragraph I don't think you mean the same thing
as MIG_STATE_CANCELLED and MIG_STATE_COMPLETED, I think you're talking
about the larger scale idea of completion after you take into account
that the VM might be paused after qemu has gone to MIG_STATE_COMPLETED and
libvirt might still decide it wants to give up and use the version on
the source that's still paused.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
  2014-03-28 11:30         ` Dr. David Alan Gilbert
@ 2014-03-28 12:16           ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2014-03-28 12:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: quintela@redhat.com, libvir-list@redhat.com, Yanqiangjun,
	qemu-devel@nongnu.org, Zhaoyanbin (A), Zengjunliang,
	Gonglei (Arei)

On 28/03/2014 12:30, Dr. David Alan Gilbert wrote:
>> > Another reason for doing it in libvirt is that the serialization
>> > between cancellation and completion of migration ultimately is
>> > controlled by libvirt's lock.  Doing this in QEMU makes it harder to
>> > reason about concurrency.
> I think you have to be careful when you talk about 'cancellation and completion
> of migration' - in that paragraph I don't think you mean the same thing
> as MIG_STATE_CANCELLED and MIG_STATE_COMPLETED, I think you're talking
> about the larger scale idea of completion after you take into account
> that the VM might be paused after qemu has gone to MIG_STATE_COMPLETED and
> libvirt might still decide it wants to give up and use the version on
> the source that's still paused.

Yes, exactly.  This is why I considered the possibility of adding a 
"cancelled" flag within libvirt.

Libvirt always uses -S on the destination, so it's always possible to 
cancel migration even after MIG_STATE_COMPLETED.

Paolo
