From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
David Vrabel <david.vrabel@citrix.com>,
Alex Bligh <alex@alex.org.uk>,
Anthony PERARD <anthony.perard@citrix.com>,
Diana Crisan <dcrisan@flexiant.com>
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
Date: Fri, 31 May 2013 17:30:41 -0400
Message-ID: <20130531213041.GG5362@phenom.dumpdata.com>
In-Reply-To: <51A8828A.5090804@eu.citrix.com>
On Fri, May 31, 2013 at 11:59:22AM +0100, George Dunlap wrote:
> On 31/05/13 11:54, George Dunlap wrote:
> >On 31/05/13 09:34, Diana Crisan wrote:
> >>George,
> >>On 30/05/13 17:06, George Dunlap wrote:
> >>>On 05/30/2013 04:55 PM, Diana Crisan wrote:
> >>>>On 30/05/13 16:26, George Dunlap wrote:
> >>>>>On Tue, May 28, 2013 at 4:06 PM, Diana Crisan <dcrisan@flexiant.com>
> >>>>>wrote:
> >>>>>>Hi,
> >>>>>>
> >>>>>>
> >>>>>>On 26/05/13 09:38, Ian Campbell wrote:
> >>>>>>>On Sat, 2013-05-25 at 11:18 +0100, Alex Bligh wrote:
> >>>>>>>>George,
> >>>>>>>>
> >>>>>>>>--On 24 May 2013 17:16:07 +0100 George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >>>>>>>>
> >>>>>>>>>>FWIW it's reproducible on every host h/w platform we've tried
> >>>>>>>>>>(a total of 2).
> >>>>>>>>>Do you see the same effects if you do a local-host migrate?
> >>>>>>>>I hadn't even realised that was possible. That would have made
> >>>>>>>>testing live migrate easier!
> >>>>>>>That's basically the whole reason it is supported ;-)
> >>>>>>>
> >>>>>>>>How do you avoid the name clash in xen-store?
> >>>>>>>Most toolstacks receive the incoming migration into a domain named
> >>>>>>>FOO-incoming or some such and then rename to FOO upon completion.
> >>>>>>>Some also rename the outgoing domain "FOO-migratedaway" towards the
> >>>>>>>end so that the bits of the final teardown which can safely happen
> >>>>>>>after the target has started can be done so.
> >>>>>>>
> >>>>>>>Ian.
> >>>>>>>
> >>>>>>>
> >>>>>>I am unsure what I am doing wrong, but I cannot seem to be able to
> >>>>>>do a localhost migrate.
> >>>>>>
> >>>>>>I created a domU using "xl create xl.conf" and once it fully booted
> >>>>>>I issued an "xl migrate 11 localhost". This fails and gives the
> >>>>>>output below.
> >>>>>>
> >>>>>>Would you please advise on how to get this working?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>Diana
> >>>>>>
> >>>>>>
> >>>>>>root@ubuntu:~# xl migrate 11 localhost
> >>>>>>root@localhost's password:
> >>>>>>migration target: Ready to receive domain.
> >>>>>>Saving to migration stream new xl format (info 0x0/0x0/2344)
> >>>>>>Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/2344)
> >>>>>> Savefile contains xl domain config
> >>>>>>xc: progress: Reloading memory pages: 53248/1048575 5%
> >>>>>>xc: progress: Reloading memory pages: 105472/1048575 10%
> >>>>>>libxl: error: libxl_dm.c:1280:device_model_spawn_outcome: domain 12 device model: spawn failed (rc=-3)
> >>>>>>libxl: error: libxl_create.c:1091:domcreate_devmodel_started: device model did not start: -3
> >>>>>>libxl: error: libxl_dm.c:1311:libxl__destroy_device_model: Device Model already exited
> >>>>>>migration target: Domain creation failed (code -3).
> >>>>>>libxl: error: libxl_utils.c:393:libxl_read_exactly: file/stream truncated reading ready message from migration receiver stream
> >>>>>>libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [10934] exited with error status 3
> >>>>>>Migration failed, resuming at sender.
> >>>>>>xc: error: Cannot resume uncooperative HVM guests: Internal error
> >>>>>>libxl: error: libxl.c:404:libxl__domain_resume: xc_domain_resume failed for domain 11: Success
> >>>>>Aha -- I managed to reproduce this one as well.
> >>>>>
> >>>>>Your problem is the "vncunused=0" -- that's instructing qemu "You
> >>>>>must use this exact port for the vnc server". But when you do the
> >>>>>migrate, that port is still in use by the "from" domain; so the qemu
> >>>>>for the "to" domain can't get it, and fails.
> >>>>>
> >>>>>Obviously this should fail a lot more gracefully, but that's a bit of
> >>>>>a lower-priority bug I think.
> >>>>>
> >>>>> -George
> >>>>Yes, I managed to get to the bottom of it too and got vms migrating on
> >>>>localhost on our end.
> >>>>
> >>>>I can confirm I did get the clock stuck problem while doing a
> >>>>localhost migrate.
> >>>
> >>>Does the script I posted earlier "work" for you (i.e., does it
> >>>fail after some number of migrations)?
> >>>
> >>
> >>I left your script running throughout the night and it seems
> >>that it does not always catch the problem. I see the following:
> >>
> >>1. vm has the clock stuck
> >>2. script is still running as it seems the vm is still ping-able.
> >>3. migration fails on the basis that the vm does not ack the
> >>suspend request (see below).
> >
> >So I wrote a script to run "date", sleep for 2 seconds, and run
> >"date" a second time -- and eventually the *sleep* hung.
> >
> >The VM is still responsive, and I can log in; if I type "date"
> >manually successive times then I get an advancing clock, but if I
> >type "sleep 1" it just hangs.
> >
> >If you run "dmesg" in the guest, do you see the following line?
> >
> >CE: Reprogramming failure. Giving up
>
> I think this must be it; on my other box, I got the following messages:
>
> [ 224.732083] PM: late freeze of devices complete after 3.787 msecs
> [ 224.736062] Xen HVM callback vector for event delivery is enabled
> [ 224.736062] Xen Platform PCI: I/O protocol version 1
> [ 224.736062] xen: --> irq=8, pirq=16
> [ 224.736062] xen: --> irq=12, pirq=17
> [ 224.736062] xen: --> irq=1, pirq=18
> [ 224.736062] xen: --> irq=6, pirq=19
> [ 224.736062] xen: --> irq=4, pirq=20
> [ 224.736062] xen: --> irq=7, pirq=21
> [ 224.736062] xen: --> irq=28, pirq=22
> [ 224.736062] ata_piix 0000:00:01.1: restoring config space at offset 0x1 (was 0x2800001, writing 0x2800005)
> [ 224.736062] PM: early restore of devices complete after 5.854 msecs
> [ 224.739692] ata_piix 0000:00:01.1: setting latency timer to 64
> [ 224.739782] xen-platform-pci 0000:00:03.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
> [ 224.746900] PM: restore of devices complete after 7.540 msecs
> [ 224.758612] Setting capacity to 16777216
> [ 224.758749] Setting capacity to 16777216
> [ 224.898426] ata2.01: NODEV after polling detection
> [ 224.900941] ata2.00: configured for MWDMA2
> [ 231.055978] CE: xen increased min_delta_ns to 150000 nsec
> [ 231.055986] hrtimer: interrupt took 14460 ns
> [ 247.893303] PM: freeze of devices complete after 2.168 msecs
> [ 247.893306] suspending xenstore...
> [ 247.896977] PM: late freeze of devices complete after 3.666 msecs
> [ 247.900067] Xen HVM callback vector for event delivery is enabled
> [ 247.900067] Xen Platform PCI: I/O protocol version 1
> [ 247.900067] xen: --> irq=8, pirq=16
> [ 247.900067] xen: --> irq=12, pirq=17
> [ 247.900067] xen: --> irq=1, pirq=18
> [ 247.900067] xen: --> irq=6, pirq=19
> [ 247.900067] xen: --> irq=4, pirq=20
> [ 247.900067] xen: --> irq=7, pirq=21
> [ 247.900067] xen: --> irq=28, pirq=22
> [ 247.900067] ata_piix 0000:00:01.1: restoring config space at offset 0x1 (was 0x2800001, writing 0x2800005)
> [ 247.900067] PM: early restore of devices complete after 4.612 msecs
> [ 247.906454] ata_piix 0000:00:01.1: setting latency timer to 64
> [ 247.906558] xen-platform-pci 0000:00:03.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
> [ 247.914770] PM: restore of devices complete after 8.762 msecs
> [ 247.926557] Setting capacity to 16777216
> [ 247.926661] Setting capacity to 16777216
> [ 248.066661] ata2.01: NODEV after polling detection
> [ 248.067326] CE: xen increased min_delta_ns to 225000 nsec
> [ 248.067344] CE: xen increased min_delta_ns to 337500 nsec
> [ 248.067361] CE: xen increased min_delta_ns to 506250 nsec
> [ 248.067378] CE: xen increased min_delta_ns to 759375 nsec
> [ 248.067396] CE: xen increased min_delta_ns to 1139062 nsec
> [ 248.067413] CE: xen increased min_delta_ns to 1708593 nsec
> [ 248.067428] CE: xen increased min_delta_ns to 2562889 nsec
> [ 248.067441] CE: xen increased min_delta_ns to 3844333 nsec
> [ 248.067453] CE: xen increased min_delta_ns to 4000000 nsec
> [ 248.067466] CE: Reprogramming failure. Giving up
> [ 248.068075] ata2.00: configured for MWDMA2
>
> Note the "CE: xen increased min_delta_ns to 150000nsec" at 231 for
> the previous suspend, and now it's increasing it up to 4
> milliseconds before giving up for this suspend.
>
> Konrad, Stefano, any idea what's going on here?
VIRQ_TIMER is not being delivered. In other words, this commit:

bee980d9e9642e96351fa3ca9077b853ecf62f57
xen/events: Handle VIRQ_TIMER before any other hardirq in event loop.

should be back-ported but hasn't been yet. Let me put that
on my TODO list.
>
> -George
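
For reference, the min_delta_ns values in the quoted dmesg output grow by
half on each retry and stop at 4000000 nsec, after which the guest prints
"CE: Reprogramming failure. Giving up". The sketch below reproduces that
sequence. It is only an illustration, not the kernel code: the function
name min_delta_steps is made up, the 5000 nsec floor and the "add half,
cap at one tick" rule are my reading of the clockevents code, and HZ=250
is an assumption inferred from the 4000000 nsec cap (it is not stated
anywhere in the thread).

```python
NSEC_PER_SEC = 1_000_000_000


def min_delta_steps(start_ns, hz=250):
    """Return the successive min_delta_ns values until the cap is hit.

    Assumptions (not taken from the thread): the cap is one tick,
    i.e. NSEC_PER_SEC // hz, and each retry adds half of the current
    value, with a 5000 nsec floor for very small starting values.
    """
    limit = NSEC_PER_SEC // hz
    steps = []
    cur = start_ns
    while cur < limit:
        # Grow by 50%, or jump to the floor if the value is tiny.
        cur = 5000 if cur < 5000 else cur + (cur >> 1)
        # Never exceed one tick; at that point the caller gives up.
        cur = min(cur, limit)
        steps.append(cur)
    return steps


if __name__ == "__main__":
    # 150000 nsec is the value left over from the previous suspend.
    for ns in min_delta_steps(150000):
        print(f"CE: xen increased min_delta_ns to {ns} nsec")
```

Starting from the 150000 nsec left over from the suspend at 231, this
yields the nine values logged at 248.067, ending at the one-tick cap.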