xen-devel.lists.xenproject.org archive mirror
From: Diana Crisan <dcrisan@flexiant.com>
To: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	David Vrabel <david.vrabel@citrix.com>,
	Alex Bligh <alex@alex.org.uk>,
	Anthony PERARD <anthony.perard@citrix.com>
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
Date: Fri, 31 May 2013 09:34:23 +0100
Message-ID: <51A8608F.9000302@flexiant.com>
In-Reply-To: <51A7791C.2020208@eu.citrix.com>

George,
On 30/05/13 17:06, George Dunlap wrote:
> On 05/30/2013 04:55 PM, Diana Crisan wrote:
>> On 30/05/13 16:26, George Dunlap wrote:
>>> On Tue, May 28, 2013 at 4:06 PM, Diana Crisan <dcrisan@flexiant.com>
>>> wrote:
>>>> Hi,
>>>>
>>>>
>>>> On 26/05/13 09:38, Ian Campbell wrote:
>>>>> On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
>>>>>> George,
>>>>>>
>>>>>> --On 24 May 2013 17:16:07 +0100 George Dunlap
>>>>>> <George.Dunlap@eu.citrix.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> FWIW it's reproducible on every host h/w platform we've tried
>>>>>>>> (a total of 2).
>>>>>>> Do you see the same effects if you do a local-host migrate?
>>>>>> I hadn't even realised that was possible. That would have made
>>>>>> testing live migrate easier!
>>>>> That's basically the whole reason it is supported ;-)
>>>>>
>>>>>> How do you avoid the name clash in xen-store?
>>>>> Most toolstacks receive the incoming migration into a domain named
>>>>> FOO-incoming or some such and then rename to FOO upon completion.
>>>>> Some also rename the outgoing domain "FOO-migratedaway" towards the
>>>>> end so that the bits of the final teardown which can safely happen
>>>>> after the target has started can be done then.
>>>>>
>>>>> Ian.
>>>>>
>>>>>
>>>> I am unsure what I am doing wrong, but I cannot seem to be able to
>>>> do a localhost migrate.
>>>>
>>>> I created a domU using "xl create xl.conf" and once it fully booted I
>>>> issued an "xl migrate 11 localhost". This fails and gives the output
>>>> below.
>>>>
>>>> Would you please advise on how to get this working?
>>>>
>>>> Thanks,
>>>> Diana
>>>>
>>>>
>>>> root@ubuntu:~# xl migrate 11 localhost
>>>> root@localhost's password:
>>>> migration target: Ready to receive domain.
>>>> Saving to migration stream new xl format (info 0x0/0x0/2344)
>>>> Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/2344)
>>>>   Savefile contains xl domain config
>>>> xc: progress: Reloading memory pages: 53248/1048575    5%
>>>> xc: progress: Reloading memory pages: 105472/1048575   10%
>>>> libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12 device model: spawn failed (rc=-3)
>>>> libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device model did not start: -3
>>>> libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
>>>> migration target: Domain creation failed (code -3).
>>>> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
>>>> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [10934] exited with error status 3
>>>> Migration failed, resuming at sender.
>>>> xc: error: Cannot resume uncooperative HVM guests: Internal error
>>>> libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume failed for domain 11: Success
>>> Aha -- I managed to reproduce this one as well.
>>>
>>> Your problem is the "vncunused=0" -- that's instructing qemu "You must
>>> use this exact port for the vnc server".  But when you do the migrate,
>>> that port is still in use by the "from" domain; so the qemu for the
>>> "to" domain can't get it, and fails.
>>>
>>> Obviously this should fail a lot more gracefully, but that's a bit of
>>> a lower-priority bug I think.
>>>
>>>   -George
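
To illustrate the port clash George describes, here is a minimal sketch
in Python. It is not qemu code and makes no claim about how qemu binds
its VNC server; it only shows the general effect of two processes asking
for the same exact TCP port on one host. The helper name try_bind and
the use of 127.0.0.1 are assumptions made for the example.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind and listen on an exact port.

    Returns the listening socket, or None if the port cannot be bound
    (for example because another socket already holds it).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return s
    except OSError:
        s.close()
        return None


# Stand-in for the "from" domain's server: port 0 lets the kernel pick
# any free port, which we then read back.
sender = try_bind(0)
port = sender.getsockname()[1]

# Stand-in for the "to" domain's server: it insists on the same exact
# port while the sender still holds it.
receiver = try_bind(port)
print("receiver got the port while sender is alive:", receiver is not None)

# Once the sender has gone away, the same exact-port request can succeed.
sender.close()
receiver_after = try_bind(port)
print("receiver got the port after sender exit:", receiver_after is not None)
```

On a localhost migrate both domains live on the same host, so an
exact-port request from the receiving side competes with the sending
side in just this way.
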
>> Yes, I managed to get to the bottom of it too and got vms migrating on
>> localhost on our end.
>>
>> I can confirm I did get the clock stuck problem while doing a localhost
>> migrate.
>
> Does the script I posted earlier "work" for you (i.e., does it fail 
> after some number of migrations)?
>

I left your script running throughout the night and it seems that it
does not always catch the problem. I see the following:

1. vm has the clock stuck
2. script is still running as it seems the vm is still ping-able.
3. migration fails because the vm does not ack the suspend
request (see below).

libxl: error: libxl_dom.c:1063:libxl__domain_suspend_common_callback: guest didn't acknowledge suspend, cancelling request
libxl: error: libxl_dom.c:1085:libxl__domain_suspend_common_callback: guest didn't acknowledge suspend, request cancelled
xc: error: Suspend request failed: Internal error
xc: error: Domain appears not to have suspended: Internal error
libxl: error: libxl_dom.c:1370:libxl__xc_domain_save_done: saving domain: domain did not respond to suspend request: Invalid argument
migration sender: libxl_domain_suspend failed (rc=-8)
xc: error: 0-length read: Internal error
xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
xc: error: Error when reading batch size (0 = Success): Internal error
xc: error: Error when reading batch (0 = Success): Internal error
libxl: error: libxl_create.c:834:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:916:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1378:libxl__destroy_domid: non-existant domain 111
libxl: error: libxl.c:1342:domain_destroy_callback: unable to destroy guest with domid 111
libxl: error: libxl_create.c:1225:domcreate_destruction_cb: unable to destroy domain 111 following failed creation
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [7849] exited with error status 3
Migration failed, failed to suspend at sender.
PING 172.16.1.223 (172.16.1.223) 56(84) bytes of data.
64 bytes from 172.16.1.223: icmp_req=1 ttl=64 time=0.339 ms
64 bytes from 172.16.1.223: icmp_req=2 ttl=64 time=0.569 ms
64 bytes from 172.16.1.223: icmp_req=3 ttl=64 time=0.535 ms
64 bytes from 172.16.1.223: icmp_req=4 ttl=64 time=0.544 ms
64 bytes from 172.16.1.223: icmp_req=5 ttl=64 time=0.529 ms
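
Since the vm keeps answering ping while its clock is stuck (points 1 and
2 above), a check based on ping alone will miss the problem. Below is a
minimal sketch of a check that compares how far the guest clock moved
against how much time passed on the host. It assumes the guest time can
be sampled somehow, for example by running "date +%s" in the guest over
ssh; that sampling step is an assumption and is not part of the script
discussed in this thread. The name clock_is_stuck, the tolerance
parameter and the sample timestamps are all hypothetical.

```python
def clock_is_stuck(guest_before, guest_after, host_elapsed, tolerance=0.5):
    """Return True if the guest clock advanced far less than the host's.

    guest_before, guest_after: guest wall-clock readings in seconds,
        taken host_elapsed seconds apart as measured on the host.
    tolerance: fraction of host_elapsed the guest must have advanced
        to be considered healthy (hypothetical default).
    """
    guest_elapsed = guest_after - guest_before
    return guest_elapsed < host_elapsed * tolerance


# Made-up readings: the guest reports the same time twice, 10 seconds
# apart on the host.
stuck = clock_is_stuck(1369989263.0, 1369989263.0, 10.0)
print("stuck clock detected:", stuck)

# Made-up readings: the guest advanced roughly 10 seconds in the same
# interval.
healthy = clock_is_stuck(1369989263.0, 1369989273.1, 10.0)
print("healthy clock flagged as stuck:", healthy)
```

Running such a check after each migration, in addition to the ping,
would let a test loop stop as soon as the clock stops advancing rather
than at the next failed suspend.
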


> I've been using it to do a localhost migrate, using a nearly identical 
> config as the one you posted (only difference, I'm using blkback 
> rather than blktap), with an Ubuntu Precise VM using the 
> 3.2.0-39-virtual kernel, and I'm up to 20 migrates with no problems.
>
> Differences between my setup and yours at this point:
>  - probably hardware (I've got an old AMD box)
>  - dom0 kernel is Debian 2.6.32-5-xen
>  - not using blktap
>
> I've also been testing this on an Intel box, with the Debian 
> 3.2.0-4-686-pae kernel, with a Debian distro, and it's up to 103 
> successful migrates.
>
> It's possible that it's a model-specific issue, but it's sort of hard 
> to see how the dom0 kernel, or blktap, could cause this.
>
> Do you have any special kernel config parameters you're passing in to 
> the guest?
>
> Also, could you try a generic Debian Wheezy install, just to see if 
> it's got something to do with the kernel?
>
>  -George


I reckon our code caught a separate problem alongside this issue:
whenever the vm got its clock stuck, the network interface wasn't coming
back up and I would see NO-CARRIER for the guest, which made it unreachable.

--
Diana

