From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Hongyang Yang <yanghy@cn.fujitsu.com>, xen-devel@lists.xen.org
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Ian Campbell <ian.campbell@citrix.com>
Subject: Re: [PATCH v2] fix Remus failover regression
Date: Mon, 28 Jul 2014 11:11:18 +0100
Message-ID: <53D621C6.1060107@citrix.com>
In-Reply-To: <53D617EF.8060902@cn.fujitsu.com>
On 28/07/14 10:29, Hongyang Yang wrote:
> Hi Andrew,
>
> On 07/28/2014 05:24 PM, Andrew Cooper wrote:
>> On 28/07/14 05:03, Yang Hongyang wrote:
>>> commit: c2ba706c
>>> tools/libxc: goto correct label on error paths by Andrew Cooper
>>> broke Remus in Xen 4.4 or earlier versions that have this commit
>>> backported.
>>
>> My apologies for breaking Remus. (it just goes to show how fragile this
>> code is).
>>
>>>
>>> With Remus, this jump essentially discards the current incomplete
>>> checkpoint received by the backup and restore backup from the
>>> last complete checkpoint.
>>> This is required for Remus to work and this does not break live
>>> migration.
>>> It has been around since Xen 4.0.
>>
>> However, it is a genuine bugfix for regular migration, so simply
>> reverting it as this patch does is not appropriate.
>>
>> For regular migration, you absolutely have to goto out; on a failure
>> otherwise the finish code will run and declare the migration a success
>> despite only having half a domain restored.
>
> I think regular migration shouldn't run into this path (see what I
> commented in v1), but I agree that adding a check would be better.

Hmm - I see what you mean. I can't spot how a regular migration would
end up at that point.

When I debugged the issue, I was encountering the pagebuf error message
on a regular migrate, although I was debugging a single isolated failure
from logs alone. With a bit of hindsight now, this probably means that
ctx->last_checkpoint was wrong.

We regularly test migration from before the point at which
ctx->last_checkpoint was introduced and broke migration
backwards compatibility, but the purpose of checkpointed_stream was to
re-fix this without regressing backwards compatibility.

I have to admit that I am somewhat confused as to what actually went on,
but it is also clear that my changes were based on incorrect reasoning,
and further reasoning at this point suggests the changes were wrong.

Therefore, this patch with comments is probably best.
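
For anyone following the thread, here is a minimal sketch of the
error-path decision discussed above. It is not the libxc code: the enum,
the function name and both parameters are hypothetical stand-ins, and
the exact shape of the check (at least one complete checkpoint on a
checkpointed stream) is an assumption made for illustration only.

```c
#include <stdbool.h>

/* Hypothetical labels for the two exits discussed in the thread. */
enum restore_action {
    RESTORE_ABORT,  /* like "goto out": fail, the domain is only half restored */
    RESTORE_FINISH  /* like "goto finish": drop the incomplete checkpoint and
                       resume from the last complete one (Remus failover) */
};

/*
 * Decide what to do when buffering a batch of pages fails.
 *
 * checkpointed_stream:  assumed flag, true for a Remus-style stream.
 * complete_checkpoints: assumed count of checkpoints fully received.
 */
static enum restore_action
on_pagebuf_error(bool checkpointed_stream, unsigned int complete_checkpoints)
{
    /* Only a checkpointed stream has an earlier consistent state to fall
     * back on; a regular migration must fail outright. */
    if (checkpointed_stream && complete_checkpoints > 0)
        return RESTORE_FINISH;

    return RESTORE_ABORT;
}
```

Under these assumptions, on_pagebuf_error(true, 1) takes the Remus
failover path, while on_pagebuf_error(false, 0) aborts, which is the
behaviour wanted for a regular migration.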

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

IanJ: This fix needs backporting to 4.4 (4.3 and older are fine)

~Andrew