* [PATCH] fix Remus failover regression
From: Yang Hongyang @ 2014-07-28 3:35 UTC
To: xen-devel
Cc: Shriram Rajagopalan, Andrew Cooper, Yang Hongyang, Ian Jackson, Ian Campbell

commit: c2ba706c
tools/libxc: goto correct label on error paths by Andrew broke
Remus in Xen 4.4 or earlier versions that has this commit backported.

With Remus, this jump essentially discards the last incomplete
checkpoint received by the backup.
This is required for Remus to work and this does not break live
migration.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
---
 tools/libxc/xc_domain_restore.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index e73e0a2..5d2fbd6 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 
     if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
         PERROR("error when buffering batch, finishing");
-        goto out;
+        goto finish;
     }
     memset(&tmptail, 0, sizeof(tmptail));
     tmptail.ishvm = hvm;
     if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
                      ext_vcpucontext, vcpuextstate_size) < 0 ) {
         ERROR ("error buffering image tail, finishing");
-        goto out;
+        goto finish;
     }
     tailbuf_free(&tailbuf);
     memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
-- 
1.9.1
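The behaviour the patch description relies on (the backup drops a checkpoint that did not arrive completely and resumes from the last complete one) can be modelled in a few lines of C. This is only an illustrative sketch, not libxc code: `struct checkpoint` and `remus_resume_point()` are hypothetical names made up for the example, and the real restore path buffers pages and a tail rather than a simple flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one checkpoint as seen by the backup host. */
struct checkpoint {
    int  id;        /* checkpoint sequence number */
    bool complete;  /* false if the stream broke while it was being received */
};

/*
 * Walk the received checkpoints in order and return the id of the state the
 * backup should resume from: the last checkpoint that arrived completely.
 * Returns -1 if not even the first checkpoint was complete.
 */
int remus_resume_point(const struct checkpoint *cps, size_t n)
{
    int committed = -1;

    for (size_t i = 0; i < n; i++) {
        if (!cps[i].complete)
            break;              /* analogous to "goto finish": drop the partial data */
        committed = cps[i].id;  /* analogous to committing the buffered checkpoint */
    }
    return committed;
}
```

In this model a broken stream after at least one complete checkpoint is a normal failover rather than a fatal error, which is the distinction the thread goes on to discuss.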
* Re: [PATCH] fix Remus failover regression
From: Shriram Rajagopalan @ 2014-07-28 3:44 UTC
To: FNST-Yang Hongyang
Cc: Andrew Cooper, Ian Jackson, Ian Campbell, xen-devel

On Jul 27, 2014 11:35 PM, "Yang Hongyang" <yanghy@cn.fujitsu.com> wrote:
>
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew broke
> Remus in Xen 4.4 or earlier versions that has this commit backported.
>
> With Remus, this jump essentially discards the last incomplete
> checkpoint received by the backup.
> This is required for Remus to work and this does not break live
> migration.
>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/xc_domain_restore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..5d2fbd6 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>
>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>          PERROR("error when buffering batch, finishing");
> -        goto out;
> +        goto finish;
>      }
>      memset(&tmptail, 0, sizeof(tmptail));
>      tmptail.ishvm = hvm;
>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>          ERROR ("error buffering image tail, finishing");
> -        goto out;
> +        goto finish;
>      }
>      tailbuf_free(&tailbuf);
>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
> --
> 1.9.1
>

Can you please add the comment about discarding the incomplete checkpoint
on top of the two goto statements? Otherwise things look ok to me.
* Re: [PATCH] fix Remus failover regression
From: Wen Congyang @ 2014-07-28 4:05 UTC
To: Yang Hongyang, xen-devel
Cc: Shriram Rajagopalan, Andrew Cooper, Ian Jackson, Ian Campbell

At 07/28/2014 11:35 AM, Yang Hongyang wrote:
> commit: c2ba706c
> tools/libxc: goto correct label on error paths by Andrew broke
> Remus in Xen 4.4 or earlier versions that has this commit backported.
>
> With Remus, this jump essentially discards the last incomplete
> checkpoint received by the backup.
> This is required for Remus to work and this does not break live
> migration.
>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
> ---
>  tools/libxc/xc_domain_restore.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
> index e73e0a2..5d2fbd6 100644
> --- a/tools/libxc/xc_domain_restore.c
> +++ b/tools/libxc/xc_domain_restore.c
> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>
>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>          PERROR("error when buffering batch, finishing");
> -        goto out;
> +        goto finish;
>      }
>      memset(&tmptail, 0, sizeof(tmptail));
>      tmptail.ishvm = hvm;
>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>          ERROR ("error buffering image tail, finishing");
> -        goto out;
> +        goto finish;
>      }
>      tailbuf_free(&tailbuf);
>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>

The mail is here:
http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html

> Both of these errors have been discovered by xc_domain_restore() returning
> success after suffering a fatal error during migration, leading to the
> toolstack believing that the VM migrated successfully.

This code is only for Remus. So why is it executed during migration?

Thanks
Wen Congyang
* Re: [PATCH] fix Remus failover regression
From: Hongyang Yang @ 2014-07-28 4:20 UTC
To: Wen Congyang, xen-devel
Cc: Shriram Rajagopalan, Andrew Cooper, Ian Jackson, Ian Campbell

On 07/28/2014 12:05 PM, Wen Congyang wrote:
> At 07/28/2014 11:35 AM, Yang Hongyang wrote:
>> commit: c2ba706c
>> tools/libxc: goto correct label on error paths by Andrew broke
>> Remus in Xen 4.4 or earlier versions that has this commit backported.
>>
>> With Remus, this jump essentially discards the last incomplete
>> checkpoint received by the backup.
>> This is required for Remus to work and this does not break live
>> migration.
>>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>  tools/libxc/xc_domain_restore.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>> index e73e0a2..5d2fbd6 100644
>> --- a/tools/libxc/xc_domain_restore.c
>> +++ b/tools/libxc/xc_domain_restore.c
>> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>
>>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>          PERROR("error when buffering batch, finishing");
>> -        goto out;
>> +        goto finish;
>>      }
>>      memset(&tmptail, 0, sizeof(tmptail));
>>      tmptail.ishvm = hvm;
>>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>          ERROR ("error buffering image tail, finishing");
>> -        goto out;
>> +        goto finish;
>>      }
>>      tailbuf_free(&tailbuf);
>>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>
>
> The mail is here:
> http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html
>
>> Both of these errors have been discovered by xc_domain_restore() returning
>> success after suffering a fatal error during migration, leading to the
>> toolstack believing that the VM migrated successfully.
>
> This code is only for Remus. So why is it executed during migration?

I was confused too. Without Remus, I think these two error paths will not
be hit; migration will end at:

1776     if ( ctx->last_checkpoint )
1777     {
1778         // DPRINTF("Last checkpoint, finishing\n");
1779         goto finish;
1780     }

>
> Thanks
> Wen Congyang
>
>
> .
>

-- 
Thanks,
Yang.
* Re: [PATCH] fix Remus failover regression
From: Shriram Rajagopalan @ 2014-07-28 4:50 UTC
To: FNST-Yang Hongyang
Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wen Congyang, xen-devel

On Jul 28, 2014 12:20 AM, "Hongyang Yang" <yanghy@cn.fujitsu.com> wrote:
>
> On 07/28/2014 12:05 PM, Wen Congyang wrote:
>>
>> At 07/28/2014 11:35 AM, Yang Hongyang wrote:
>>>
>>> commit: c2ba706c
>>> tools/libxc: goto correct label on error paths by Andrew broke
>>> Remus in Xen 4.4 or earlier versions that has this commit backported.
>>>
>>> With Remus, this jump essentially discards the last incomplete
>>> checkpoint received by the backup.
>>> This is required for Remus to work and this does not break live
>>> migration.
>>>
>>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>>> CC: Ian Campbell <ian.campbell@citrix.com>
>>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>>> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>>> ---
>>>  tools/libxc/xc_domain_restore.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>>> index e73e0a2..5d2fbd6 100644
>>> --- a/tools/libxc/xc_domain_restore.c
>>> +++ b/tools/libxc/xc_domain_restore.c
>>> @@ -1783,14 +1783,14 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>>
>>>      if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>>          PERROR("error when buffering batch, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>      }
>>>      memset(&tmptail, 0, sizeof(tmptail));
>>>      tmptail.ishvm = hvm;
>>>      if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>>                       ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>>          ERROR ("error buffering image tail, finishing");
>>> -        goto out;
>>> +        goto finish;
>>>      }
>>>      tailbuf_free(&tailbuf);
>>>      memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>>
>>
>> The mail is here:
>> http://lists.xenproject.org/archives/html/xen-devel/2014-01/msg02299.html
>>
>>> Both of these errors have been discovered by xc_domain_restore() returning
>>> success after suffering a fatal error during migration, leading to the
>>> toolstack believing that the VM migrated successfully.
>>
>> This code is only for Remus. So why is it executed during migration?
>

I am not familiar with the XenServer code base. I don't know if it has
Remus support. So the xc_domain_restore.c file may or may not be the same
between Xen and XenServer. Please correct me if I am wrong.

Also, can those errors encountered in XenServer be reproduced in Xen 4.4?

Finally, the choice between goto finish and goto out could be made with an
if/else on the ctx->complete and ctx->last_checkpoint variables, to
distinguish a failure in mid-migration from one in mid-checkpoint. I
haven't thought about this fully.

> I was confused too. Without Remus, I think these two error paths will not
> be hit; migration will end at:
> 1776     if ( ctx->last_checkpoint )
> 1777     {
> 1778         // DPRINTF("Last checkpoint, finishing\n");
> 1779         goto finish;
> 1780     }
>
>>
>> Thanks
>> Wen Congyang
>>
>>
>> .
>>
>
> --
> Thanks,
> Yang.
>
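Shriram's idea of choosing between `goto finish` and `goto out` from the context flags could look roughly like the sketch below. Everything here is an assumption made for illustration: `struct restore_state`, `enum restore_exit` and `pick_exit_on_buffer_error()` are hypothetical names, the two fields merely stand in for the `ctx->complete` and `ctx->last_checkpoint` variables named in the mail, and the decision rule is one possible reading of the suggestion, not tested libxc code.

```c
#include <stdbool.h>

/* Which cleanup label the restore loop would jump to (hypothetical names). */
enum restore_exit {
    RESTORE_OUT,    /* fatal error: report failure to the toolstack */
    RESTORE_FINISH  /* discard partial data, resume from last full state */
};

/* Hypothetical stand-in for the relevant fields of the restore context. */
struct restore_state {
    bool completed;        /* at least one full image has been received */
    bool last_checkpoint;  /* sender announced that no checkpoints follow */
};

/*
 * Decide how to react when buffering the next batch or tail fails.
 * Assumed rule: a failure after a full image, while further checkpoints
 * were still expected, is treated as a Remus failover; anything else is
 * treated as a failed migration.
 */
enum restore_exit pick_exit_on_buffer_error(const struct restore_state *s)
{
    if (s->completed && !s->last_checkpoint)
        return RESTORE_FINISH;  /* mid-checkpoint failure */
    return RESTORE_OUT;         /* mid-migration failure */
}
```

Whether these two flags are enough to tell the cases apart at the point of the two error paths is exactly the open question in the thread, so the rule above should be read as a starting point for discussion.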