From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
"Ian.Campbell@citrix.com" <Ian.Campbell@citrix.com>,
Wen Congyang <wency@cn.fujitsu.com>,
xen devel <xen-devel@lists.xen.org>
Subject: Re: question about migration
Date: Mon, 4 Jan 2016 16:38:25 +0000
Message-ID: <568AA001.2080904@citrix.com>
In-Reply-To: <22154.36943.266930.151060@mariner.uk.xensource.com>
On 04/01/16 15:31, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 25/12/2015 03:06, Wen Congyang wrote:
>>> Another problem:
>>> If migration fails after the guest is suspended, we will resume it on the source.
>>> In this case, we cannot shut it down, because no process handles the shutdown event.
>>> The log in /var/log/xen/xl-hvm_nopv.log:
>>> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
>>> Domain 1 has shut down, reason code 2 0x2
>>> Domain has suspended.
>>> Done. Exiting now
>>>
>>> The xl has exited...
> ...
>> Hmm yes. This is a libxl bug in libxl_evenable_domain_death(). CC'ing
>> the toolstack maintainers.
> AIUI this is a response to Wen's comments above.
>
>> It waits for the @releaseDomain watch, but doesn't interpret the results
>> correctly. In particular, if it can still make successful hypercalls
>> with the provided domid, that domain was not the subject of
>> @releaseDomain. (I also observe that domain_death_xswatch_callback() is
>> very inefficient. It only needs to make a single hypercall, not query
>> the entire state of all domains.)
> I don't understand precisely what you allege this bug to be, but:
>
> * libxl_evenable_domain_death may generate two events, a
> DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH.
> This is documented in libxl.h (although it refers to DESTROY rather
> than DEATH - see patch below to fix the doc).
>
> * @releaseDomain usually triggers twice for each domain: once when it
> goes to SHUTDOWN and once when it is actually destroyed. (This is
> obviously necessary to implement the above.)
So it does. I clearly had an accident with `git grep` when I came to the
opposite conclusion. Apologies for the noise this generated.
>
> * @releaseDomain does not have a specific domain which is the "subject
> of @releaseDomain". Arguably this is unhelpful, but it is not
> libxl's fault. It arises from the VIRQ generated by Xen. Note that
> xenstored needs to search its own list of active domains to see what
> has happened; it generates the @releaseDomain event and throws away
> the domid.
The semantics of @releaseDomain are quite mad, but this is how it has
always been.

The current semantics are a scalability limitation which someone in
XenServer will likely get around to addressing in due course (we support
1000 VMs per host).
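
To illustrate why a notification without a domid scales poorly, here is a
minimal sketch in Python. It is a model only, not libxl or xenstored code:
the function name diff_on_release, the State enum and the dict-based
bookkeeping are all hypothetical. The point is that every tracked domain has
to be re-examined on each notification to work out what actually happened.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SHUTDOWN = "shutdown"


def diff_on_release(known, current):
    """Work out what a payload-less notification meant (hypothetical model).

    known:   dict of domid -> State as seen at the previous notification
    current: dict of domid -> State as reported now (absent means destroyed)
    Returns a list of (domid, event) tuples and updates `known` in place.
    """
    events = []
    # The notification names no domain, so every tracked domid is checked.
    for domid in sorted(known):
        old = known[domid]
        new = current.get(domid)
        if new is None:
            events.append((domid, "DOMAIN_DEATH"))
        elif old is State.RUNNING and new is State.SHUTDOWN:
            events.append((domid, "DOMAIN_SHUTDOWN"))
    # Forget destroyed domains, then record the latest state of the rest.
    for domid, event in events:
        if event == "DOMAIN_DEATH":
            del known[domid]
    for domid in known:
        known[domid] = current[domid]
    return events
```

With 1000 tracked domains, each notification costs 1000 state checks in this
model, even when only one domain changed. A domain that goes from SHUTDOWN
back to RUNNING (a resume) produces no event here; its state is simply
refreshed.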
> * It is not possible to resume the domain in the source after it has
> suspended.
This functionality exists and is already used in several circumstances,
both by libxl, and other toolstacks.
xl has an added split-brain problem here that plain daemonic toolstacks
don't have; specifically that there are two completely independent
processes playing with the domain state at the same time.
The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
DOMAIN_DEATH. Under these circumstances, a failed migrate which resumes
the domain won't result in qemu being cleaned up.
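
As a rough sketch of that policy, again a Python model rather than xl code:
the monitor function and the action names "keep_waiting" and "cleanup" are
hypothetical, and only the two event names come from the discussion above.

```python
def monitor(events):
    """Consume (event, domid) tuples and return the actions taken.

    Hypothetical model: DOMAIN_SHUTDOWN is never treated as final, because
    the domain may be resumed (for example after a failed migration).
    Cleanup happens only once DOMAIN_DEATH is seen.
    """
    actions = []
    for event, domid in events:
        if event == "DOMAIN_SHUTDOWN":
            # Do not exit and do not tear anything down; keep watching.
            actions.append(("keep_waiting", domid))
            continue
        if event == "DOMAIN_DEATH":
            # The domain is really gone: tidy up and stop monitoring.
            actions.append(("cleanup", domid))
            break
    return actions
```

In this model a suspend followed by a resume leaves the monitor running, so
a later shutdown of the resumed domain is still observed by some process.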
~Andrew