xen-devel.lists.xenproject.org archive mirror
From: Hongyang Yang <yanghy@cn.fujitsu.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel@lists.xen.org
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>
Subject: Re: [PATCH v2] fix Remus failover regression
Date: Mon, 28 Jul 2014 17:29:19 +0800
Message-ID: <53D617EF.8060902@cn.fujitsu.com>
In-Reply-To: <53D616B9.20103@citrix.com>

Hi Andrew,

On 07/28/2014 05:24 PM, Andrew Cooper wrote:
> On 28/07/14 05:03, Yang Hongyang wrote:
>> Commit c2ba706c ("tools/libxc: goto correct label on error paths",
>> by Andrew Cooper) broke Remus in Xen 4.4 and in earlier versions
>> that have this commit backported.
>
> My apologies for breaking Remus (it just goes to show how fragile this
> code is).
>
>>
>> With Remus, this jump essentially discards the current incomplete
>> checkpoint received by the backup and restores the backup from the
>> last complete checkpoint.
>> This is required for Remus to work, and it does not break live
>> migration.
>> It has been around since Xen 4.0.
>
> However, it is a genuine bugfix for regular migration, so simply
> reverting it as this patch does is not appropriate.
>
> For regular migration, you absolutely have to "goto out;" on a failure,
> otherwise the finish code will run and declare the migration a success
> despite only having half a domain restored.

I think regular migration shouldn't run into this path (see my comments
on v1), but I agree that adding a check would be better.

>
> You need something like:
>
> if ( !checkpointed_stream )
>      goto err;
>
> /* Remus comment */
> goto finish;
>
> to deal with the different error handling requirements of Remus and
> regular streams.
>
> ~Andrew
>
>>
>> CC: Ian Jackson <ian.jackson@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Shriram Rajagopalan <rshriram@cs.ubc.ca>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   tools/libxc/xc_domain_restore.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
>> index e73e0a2..b9a56d5 100644
>> --- a/tools/libxc/xc_domain_restore.c
>> +++ b/tools/libxc/xc_domain_restore.c
>> @@ -1783,20 +1783,29 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
>>
>>       if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
>>           PERROR("error when buffering batch, finishing");
>> -        goto out;
>> +        /*
>> +         * Remus: discard the current incomplete checkpoint and restore
>> +         * backup from the last complete checkpoint.
>> +         */
>> +        goto finish;
>>       }
>>       memset(&tmptail, 0, sizeof(tmptail));
>>       tmptail.ishvm = hvm;
>>       if ( buffer_tail(xch, ctx, &tmptail, io_fd, max_vcpu_id, vcpumap,
>>                        ext_vcpucontext, vcpuextstate_size) < 0 ) {
>>           ERROR ("error buffering image tail, finishing");
>> -        goto out;
>> +        /*
>> +         * Remus: discard the current incomplete checkpoint and restore
>> +         * backup from the last complete checkpoint.
>> +         */
>> +        goto finish;
>>       }
>>       tailbuf_free(&tailbuf);
>>       memcpy(&tailbuf, &tmptail, sizeof(tailbuf));
>>
>>       goto loadpages;
>>
>> +  /* With Remus: restore from last complete checkpoint */
>>     finish:
>>       if ( hvm )
>>           goto finish_hvm;
>
> .
>

-- 
Thanks,
Yang.


Thread overview: 10+ messages
2014-07-28  4:03 [PATCH v2] fix Remus failover regression Yang Hongyang
2014-07-28  4:05 ` Shriram Rajagopalan
2014-07-28  9:24 ` Andrew Cooper
2014-07-28  9:29   ` Hongyang Yang [this message]
2014-07-28 10:11     ` Andrew Cooper
2014-08-07  1:16 ` Hongyang Yang
2014-08-07  7:43   ` Andrew Cooper
2014-08-21  8:12     ` Hongyang Yang
2014-08-21 22:49     ` Ian Campbell
2014-08-21 22:50     ` Ian Campbell
