* [PATCH] fix Remus failover regression
@ 2014-07-28  3:35 Yang Hongyang
  2014-07-28  3:44 ` Shriram Rajagopalan
  2014-07-28  4:05 ` Wen Congyang
  0 siblings, 2 replies; 5+ messages in thread
From: Yang Hongyang @ 2014-07-28  3:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Shriram Rajagopalan, Andrew Cooper, Yang Hongyang, Ian Jackson,
	Ian Campbell

Commit c2ba706c ("tools/libxc: goto correct label on error paths") by
Andrew broke Remus in Xen 4.4 and in any earlier version that has this
commit backported.

With Remus, the jump to finish on these two error paths discards the
last incomplete checkpoint received by the backup.
This is required for Remus to work, and it does not break live
migration.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index e73e0a2..5d2fbd6 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
     if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
         PERROR("error when buffering batch, finishing");
-        goto out;
+        goto finish;
     }
     memset(&tmptail, 0, sizeof(tmptail));
     tmptail.ishvm = hvm;
     if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
                      ext_vcpucontext, vcpuextstate_size) < 0 ) {
         ERROR ("error buffering image tail, finishing");
-        goto out;
+        goto finish;
     }
     tailbuf_free(&tailbuf);
     memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
-- 
1.9.1


* Re: [PATCH] fix Remus failover regression
  2014-07-28  3:35 [PATCH] fix Remus failover regression Yang Hongyang
@ 2014-07-28  3:44 ` Shriram Rajagopalan
  2014-07-28  4:05 ` Wen Congyang
  1 sibling, 0 replies; 5+ messages in thread
From: Shriram Rajagopalan @ 2014-07-28  3:44 UTC (permalink / raw)
  To: FNST-Yang Hongyang; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, xen-devel



On Jul 27, 2014 11:35 PM, "Yang Hongyang" <yanghy@cn.fujitsu.com> wrote:
>
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew broke
> Remus in Xen 4.4 or earlier versions that has this commit backported.
>
> With Remus, this jump essentially discards the last incomplete
> checkpoint received by the backup.
> This is required for Remus to work and this does not break live
> migration.
>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/xc_domain_restore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..5d2fbd6 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>
>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>          PERROR("error when buffering batch, finishing");
> -        goto out;
> +        goto finish;
>      }
>      memset(&tmptail, 0, sizeof(tmptail));
>      tmptail.ishvm = hvm;
>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>          ERROR ("error buffering image tail, finishing");
> -        goto out;
> +        goto finish;
>      }
>      tailbuf_free(&tailbuf);
>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
> --
> 1.9.1
>

Can you please add a comment about discarding the incomplete checkpoint
above each of the two goto statements?

Otherwise things look ok to me.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: [PATCH] fix Remus failover regression
  2014-07-28  3:35 [PATCH] fix Remus failover regression Yang Hongyang
  2014-07-28  3:44 ` Shriram Rajagopalan
@ 2014-07-28  4:05 ` Wen Congyang
  2014-07-28  4:20   ` Hongyang Yang
  1 sibling, 1 reply; 5+ messages in thread
From: Wen Congyang @ 2014-07-28  4:05 UTC (permalink / raw)
  To: Yang Hongyang, xen-devel
  Cc: Shriram Rajagopalan, Andrew Cooper, Ian Jackson, Ian Campbell

At 07/28/2014 11:35 AM, Yang Hongyang wrote:
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew broke
> Remus in Xen 4.4 or earlier versions that has this commit backported.
> 
> With Remus, this jump essentially discards the last incomplete
> checkpoint received by the backup.
> This is required for Remus to work and this does not break live
> migration.
> 
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/xc_domain_restore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..5d2fbd6 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>  
>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>          PERROR("error when buffering batch, finishing");
> -        goto out;
> +        goto finish;
>      }
>      memset(&tmptail, 0, sizeof(tmptail));
>      tmptail.ishvm = hvm;
>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>          ERROR ("error buffering image tail, finishing");
> -        goto out;
> +        goto finish;
>      }
>      tailbuf_free(&tailbuf);
>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
> 

The mail is here:
http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html

> Both of these errors have been discovered by xc_domain_restore() returning
> success after suffering a fatal error during migration, leading to the
> toolstack believing that the VM migrated successfully.

This code is only for Remus. So why is it executed during migration?

Thanks
Wen Congyang


* Re: [PATCH] fix Remus failover regression
  2014-07-28  4:05 ` Wen Congyang
@ 2014-07-28  4:20   ` Hongyang Yang
  2014-07-28  4:50     ` Shriram Rajagopalan
  0 siblings, 1 reply; 5+ messages in thread
From: Hongyang Yang @ 2014-07-28  4:20 UTC (permalink / raw)
  To: Wen Congyang, xen-devel
  Cc: Shriram Rajagopalan, Andrew Cooper, Ian Jackson, Ian Campbell



On 07/28/2014 12:05 PM, Wen Congyang wrote:
> At 07/28/2014 11:35 AM, Yang Hongyang wrote:
>> commit: c2ba706c
>> tools/libxc: goto correct label on error paths by Andrew broke
>> Remus in Xen 4.4 or earlier versions that has this commit backported.
>>
>> With Remus, this jump essentially discards the last incomplete
>> checkpoint received by the backup.
>> This is required for Remus to work and this does not break live
>> migration.
>>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxc/xc_domain_restore.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>> index e73e0a2..5d2fbd6 100644
>> --- a/tools/libxc/xc_domain_restore.c
>> +++ b/tools/libxc/xc_domain_restore.c
>> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>
>>       if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>           PERROR("error when buffering batch, finishing");
>> -        goto out;
>> +        goto finish;
>>       }
>>       memset(&tmptail, 0, sizeof(tmptail));
>>       tmptail.ishvm = hvm;
>>       if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>                        ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>           ERROR ("error buffering image tail, finishing");
>> -        goto out;
>> +        goto finish;
>>       }
>>       tailbuf_free(&tailbuf);
>>       memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>
>
> The mail is here:
> http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html
>
>> Both of these errors have been discovered by xc_domain_restore() returning
>> success after suffering a fatal error during migration, leading to the
>> toolstack believing that the VM migrated successfully.
>
> This code is only for Remus. So why is it executed during migration?

I was confused too. Without Remus, I don't think these two error paths
can be hit: a plain migration ends at:
    if ( ctx->last_checkpoint )
    {
        // DPRINTF("Last checkpoint, finishing\n");
        goto finish;
    }
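
To make that argument concrete, here is a rough, self-contained model of the control flow as I understand it. It is only a sketch, not the libxc code: the event and exit-path names and model_restore() are made up for illustration, and it assumes that an error before the first complete checkpoint still fails outright.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream events; this is not the real libxc wire format. */
enum event {
    EV_CHECKPOINT_OK,      /* a complete checkpoint, more will follow (Remus) */
    EV_LAST_CHECKPOINT_OK, /* a complete checkpoint flagged as the last one */
    EV_READ_ERROR          /* the stream broke while buffering */
};

/* Hypothetical names for the ways the restore loop can end. */
enum exit_path {
    EXIT_FINISH_LAST_CHECKPOINT, /* plain migration: "goto finish" at the check */
    EXIT_FINISH_AFTER_ERROR,     /* Remus failover: partial checkpoint dropped */
    EXIT_OUT                     /* nothing complete to resume from: "goto out" */
};

/* Simplified model of the restore loop with this patch applied. */
enum exit_path model_restore(const enum event *ev, size_t n)
{
    bool completed = false;

    for (size_t i = 0; i < n; i++) {
        if (ev[i] == EV_READ_ERROR)
            break;
        completed = true;
        /* A plain migration leaves here and never buffers another batch. */
        if (ev[i] == EV_LAST_CHECKPOINT_OK)
            return EXIT_FINISH_LAST_CHECKPOINT;
    }

    /* Read error or exhausted stream: resume only if a checkpoint completed. */
    return completed ? EXIT_FINISH_AFTER_ERROR : EXIT_OUT;
}
```

In this model a plain migration always leaves through the last_checkpoint check, so only a Remus stream can reach the two patched error paths.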

>
> Thanks
> Wen Congyang
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [PATCH] fix Remus failover regression
  2014-07-28  4:20   ` Hongyang Yang
@ 2014-07-28  4:50     ` Shriram Rajagopalan
  0 siblings, 0 replies; 5+ messages in thread
From: Shriram Rajagopalan @ 2014-07-28  4:50 UTC (permalink / raw)
  To: FNST-Yang Hongyang
  Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wen Congyang, xen-devel



On Jul 28, 2014 12:20 AM, "Hongyang Yang" <yanghy@cn.fujitsu.com> wrote:
>
>
>
> On 07/28/2014 12:05 PM, Wen Congyang wrote:
>>
>> At 07/28/2014 11:35 AM, Yang Hongyang wrote:
>>>
>>> commit: c2ba706c
>>> tools/libxc: goto correct label on error paths by Andrew broke
>>> Remus in Xen 4.4 or earlier versions that has this commit backported.
>>>
>>> With Remus, this jump essentially discards the last incomplete
>>> checkpoint received by the backup.
>>> This is required for Remus to work and this does not break live
>>> migration.
>>>
>>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>>> CC: Ian Campbell <ian.campbell@citrix.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>   tools/libxc/xc_domain_restore.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>>> index e73e0a2..5d2fbd6 100644
>>> --- a/tools/libxc/xc_domain_restore.c
>>> +++ b/tools/libxc/xc_domain_restore.c
>>> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>>
>>>       if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>>           PERROR("error when buffering batch, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>       }
>>>       memset(&tmptail, 0, sizeof(tmptail));
>>>       tmptail.ishvm = hvm;
>>>       if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>>                        ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>>           ERROR ("error buffering image tail, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>       }
>>>       tailbuf_free(&tailbuf);
>>>       memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>>
>>
>> The mail is here:
>> http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html
>>
>>> Both of these errors have been discovered by xc_domain_restore() returning
>>> success after suffering a fatal error during migration, leading to the
>>> toolstack believing that the VM migrated successfully.
>>
>>
>> This code is only for Remus. So why is it executed during migration?
>

I am not familiar with the XenServer code base. I don't know if it has
Remus support. So the xc_domain_restore.c file may or may not be the same
between Xen and XenServer. Please correct me if I am wrong.

Also, can those errors encountered in XenServer be reproduced in Xen 4.4?

Finally, the choice between goto finish and goto out could be made with an
if/else on the ctx->complete and ctx->last_checkpoint variables, to
distinguish a failure in mid-migration from one in mid-checkpoint. I
haven't thought this through fully.
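
A minimal sketch of what such a check might look like, under my own assumptions: the struct, the enum, and label_on_buffer_error() are hypothetical, the field names simply follow this mail, and the rule encoded here is one possible reading rather than tested libxc behaviour.

```c
#include <stdbool.h>

/* Hypothetical subset of the restore context; field names follow this mail. */
struct restore_ctx_sketch {
    bool complete;         /* at least one full checkpoint has been received */
    bool last_checkpoint;  /* sender announced that no more checkpoints follow */
};

/* Stand-ins for the two labels in xc_domain_restore(). */
enum error_label { LABEL_OUT, LABEL_FINISH };

/* Pick the label to jump to when buffering the next batch or tail fails. */
enum error_label label_on_buffer_error(const struct restore_ctx_sketch *ctx)
{
    /*
     * Mid-checkpoint failure (Remus): a complete checkpoint exists and more
     * were expected, so drop the partial one and resume from the complete one.
     */
    if (ctx->complete && !ctx->last_checkpoint)
        return LABEL_FINISH;

    /* Anything else is treated as a failed migration. */
    return LABEL_OUT;
}
```

At each of the two error sites the result would select between goto finish and goto out. Whether ctx->complete and ctx->last_checkpoint can actually take different values at those sites is exactly the part that still needs checking.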

>
> I was confused too. Without Remus, I don't think these two error paths
> can be hit: a plain migration ends at:
>     if ( ctx->last_checkpoint )
>     {
>         // DPRINTF("Last checkpoint, finishing\n");
>         goto finish;
>     }
>
>>
>> Thanks
>> Wen Congyang
>>
>>
>> .
>>
>
> --
> Thanks,
> Yang.
>
