From: George Dunlap
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck
 system clock with ACPI
Date: Thu, 30 May 2013 17:06:52 +0100
Message-ID: <51A7791C.2020208@eu.citrix.com>
In-Reply-To: <51A7767A.9030904@flexiant.com>
References: <1717491994.10371605.1369131737226.JavaMail.root@zimbra002>
 <519B50C9.1000008@citrix.com> <519B577E.6070200@flexiant.com>
 <519B6D51.2060508@citrix.com> <951B3441BAE2324286D3AA6D@Ximines.local>
 <420439EA40B15FCBFDFF2BE3@nimrod.local>
 <1369557503.22605.11.camel@dagon.hellion.org.uk>
 <51A4C7EB.1010406@flexiant.com> <51A7767A.9030904@flexiant.com>
To: Diana Crisan
Cc: Ian Campbell, Konrad Rzeszutek Wilk, "xen-devel@lists.xen.org",
 David Vrabel, Alex Bligh, Anthony PERARD
List-Id: xen-devel@lists.xenproject.org

On 05/30/2013 04:55 PM, Diana Crisan wrote:
> On 30/05/13 16:26, George Dunlap wrote:
>> On Tue, May 28, 2013 at 4:06 PM, Diana Crisan wrote:
>>> Hi,
>>>
>>> On 26/05/13 09:38, Ian Campbell wrote:
>>>> On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
>>>>> George,
>>>>>
>>>>> --On 24 May 2013 17:16:07 +0100 George Dunlap wrote:
>>>>>
>>>>>>> FWIW it's reproducible on every host h/w platform we've tried
>>>>>>> (a total of 2).
>>>>>> Do you see the same effects if you do a local-host migrate?
>>>>> I hadn't even realised that was possible. That would have made
>>>>> testing live migrate easier!
>>>> That's basically the whole reason it is supported ;-)
>>>>
>>>>> How do you avoid the name clash in xenstore?
>>>> Most toolstacks receive the incoming migration into a domain named
>>>> FOO-incoming or some such and then rename it to FOO upon completion.
>>>> Some also rename the outgoing domain "FOO-migratedaway" towards the
>>>> end, so that the bits of the final teardown which can safely happen
>>>> after the target has started can be done then.
>>>>
>>>> Ian.
>>>>
>>> I am unsure what I am doing wrong, but I cannot seem to do a
>>> localhost migrate.
>>>
>>> I created a domU using "xl create xl.conf" and once it had fully
>>> booted I issued an "xl migrate 11 localhost". This fails and gives
>>> the output below.
>>>
>>> Would you please advise on how to get this working?
>>>
>>> Thanks,
>>> Diana
>>>
>>> root@ubuntu:~# xl migrate 11 localhost
>>> root@localhost's password:
>>> migration target: Ready to receive domain.
>>> Saving to migration stream new xl format (info 0x0/0x0/2344)
>>> Loading new save file (new xl fmt info 0x0/0x0/2344)
>>> Savefile contains xl domain config
>>> xc: progress: Reloading memory pages: 53248/1048575 5%
>>> xc: progress: Reloading memory pages: 105472/1048575 10%
>>> libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12
>>> device model: spawn failed (rc=-3)
>>> libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device
>>> model did not start: -3
>>> libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device
>>> Model already exited
>>> migration target: Domain creation failed (code -3).
>>> libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream
>>> truncated reading ready message from migration receiver stream
>>> libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus:
>>> migration target process [10934] exited with error status 3
>>> Migration failed, resuming at sender.
>>> xc: error: Cannot resume uncooperative HVM guests: Internal error
>>> libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume
>>> failed for domain 11: Success
>> Aha -- I managed to reproduce this one as well.
>>
>> Your problem is the "vncunused=0" -- that's instructing qemu "You
>> must use this exact port for the vnc server". But when you do the
>> migrate, that port is still in use by the "from" domain; so the qemu
>> for the "to" domain can't get it, and fails.
>>
>> Obviously this should fail a lot more gracefully, but that's a bit
>> of a lower-priority bug I think.
>>
>>  -George
> Yes, I managed to get to the bottom of it too and got VMs migrating
> on localhost on our end.
>
> I can confirm I did get the stuck-clock problem while doing a
> localhost migrate.

Does the script I posted earlier "work" for you (i.e., does it fail
after some number of migrations)?

I've been using it to do a localhost migrate, using a nearly identical
config to the one you posted (the only difference being that I'm using
blkback rather than blktap), with an Ubuntu Precise VM running the
3.2.0-39-virtual kernel, and I'm up to 20 migrates with no problems.

Differences between my setup and yours at this point:
 - probably hardware (I've got an old AMD box)
 - dom0 kernel is Debian 2.6.32-5-xen
 - not using blktap

I've also been testing this on an Intel box, with the Debian
3.2.0-4-686-pae kernel and a Debian distro, and it's up to 103
successful migrates.

It's possible that it's a model-specific issue, but it's hard to see
how the dom0 kernel, or blktap, could cause this.

Do you have any special kernel config parameters you're passing to the
guest? Also, could you try a generic Debian Wheezy install, just to
see whether it has something to do with the kernel?

 -George
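
P.S. For the archives: the fix for the localhost migrate failure above
is to let qemu pick a free VNC port rather than pinning one. A sketch
of the relevant xl config fragment (illustrative, not Diana's exact
file):

    # VNC settings in the guest's xl config file.
    vnc = 1
    # vncunused=1 asks qemu to search for a free TCP port for the VNC
    # server, so the incoming domain's qemu doesn't fight the outgoing
    # domain's qemu for the same port.  vncunused=0 pins the port, and
    # is what made the device model spawn fail in the log above.
    vncunused = 1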
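
P.P.S. For anyone wanting to reproduce this without digging up the
script from my earlier mail: a loop along the lines below is enough.
This is a sketch, not necessarily the exact script; the guest name
"precise" and the iteration count are arbitrary, and it assumes
passwordless root ssh to localhost:

    #!/bin/bash
    # Migrate a guest to the local host over and over, stopping at
    # the first failure so we know how many migrations it survived.
    dom=precise
    for i in $(seq 1 100); do
        echo "migration #$i"
        xl migrate "$dom" localhost || { echo "failed at #$i"; exit 1; }
        sleep 5    # let the guest settle before the next round
    done

While it runs, "xl list" will briefly show the incoming copy under a
temporary name (FOO-incoming or some such, as Ian described) before it
is renamed back, which is how the xenstore name clash is avoided.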