From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hongyang Yang Subject: Re: [PATCH v2] fix Remus failover regression Date: Mon, 28 Jul 2014 17:29:19 +0800 Message-ID: <53D617EF.8060902@cn.fujitsu.com> References: <1406520207-10769-1-git-send-email-yanghy@cn.fujitsu.com> <53D616B9.20103@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <53D616B9.20103@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Andrew Cooper , xen-devel@lists.xen.org Cc: Shriram Rajagopalan , Ian Jackson , Ian Campbell List-Id: xen-devel@lists.xenproject.org Hi Andrew, On 07/28/2014 05:24 PM, Andrew Cooper wrote: > On 28/07/14 05:03, Yang Hongyang wrote: >> commit: c2ba706c >> tools/libxc: goto correct label on error paths by Andrew Cooper >> broke Remus in Xen 4.4 or earlier versions that has this commit >> backported. > > My appologies for breaking Remus. (it just goes to show how fragile this > code is). > >> >> With Remus, this jump essentially discards the current incomplete >> checkpoint received by the backup and restore backup from the >> last complete checkpoint. >> This is required for Remus to work and this does not break live >> migration. >> It has been around since Xen 4.0. > > However, it is a genuine bugfix for regular migration, so simply > reverting it as this patch does is not appropriate. > > For regular migration, you absolutely have to goto out; on a failure > otherwise the finish code will run and declare the migration a success > despite only having half a domain restored. I think regular migration shouldn't run into this path (see what I commented in v1), but I agree that add a check will be better. > > You need something like: > > if ( !checkpointed_stream ) > goto err; > > /* Remus comment */ > goto finish; > > to deal with the different error handing requirements of remus and > regular streams. > > ~Andrew > >> >> CC: Ian Jackson >> CC: Ian Campbell >> CC: Andrew Cooper >> CC: Shriram Rajagopalan >> Signed-off-by: Yang Hongyang >> --- >> tools/libxc/xc_domain_restore.c | 13 +++++++++++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c >> index e73e0a2..b9a56d5 100644 >> --- a/tools/libxc/xc_domain_restore.c >> +++ b/tools/libxc/xc_domain_restore.c >> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, >> >> if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) { >> PERROR("error when buffering batch, finishing"); >> - goto out; >> + /* >> + * Remus: discard the current incomplete checkpoint and restore >> + * backup from the last complete checkpoint. >> + */ >> + goto finish; >> } >> memset(&tmptail, 0, sizeof(tmptail)); >> tmptail.ishvm = hvm; >> if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap, >> ext_vcpucontext, vcpuextstate_size) < 0 ) { >> ERROR ("error buffering image tail, finishing"); >> - goto out; >> + /* >> + * Remus: discard the current incomplete checkpoint and restore >> + * backup from the last complete checkpoint. >> + */ >> + goto finish; >> } >> tailbuf_free(&tailbuf); >> memcpy(&tailbuf, &tmptail, sizeof(tailbuf)); >> >> goto loadpages; >> >> + /* With Remus: restore from last complete checkpoint */ >> finish: >> if ( hvm ) >> goto finish_hvm; > > . > -- Thanks, Yang.