From: Alex Bligh
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
Date: Fri, 31 May 2013 16:18:39 +0100
Message-ID: <37A4010683390E8CBA67A8BE@nimrod.local>
In-Reply-To: <1370010963.5199.184.camel@zakaz.uk.xensource.com>
References: <1717491994.10371605.1369131737226.JavaMail.root@zimbra002> <519B50C9.1000008@citrix.com> <519B577E.6070200@flexiant.com> <519B6D51.2060508@citrix.com> <951B3441BAE2324286D3AA6D@Ximines.local> <420439EA40B15FCBFDFF2BE3@nimrod.local> <1369557503.22605.11.camel@dagon.hellion.org.uk> <51A4C7EB.1010406@flexiant.com> <51A7767A.9030904@flexiant.com> <51A7791C.2020208@eu.citrix.com> <51A8608F.9000302@flexiant.com> <51A88151.3080001@eu.citrix.com> <0FE70400-1152-45F5-9BF9-973DF1DA9EE8@flexiant.com> <51A88E3E.5090208@eu.citrix.com> <1370004031.5199.133.camel@zakaz.uk.xensource.com> <3C61B2368D479E44F6D5FACE@nimrod.local> <1370010963.5199.184.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Anthony@alex.org.uk, Konrad Rzeszutek Wilk, George Dunlap, xen-devel@lists.xen.org, David Vrabel, Alex Bligh, PERARD, Diana Crisan
List-Id: xen-devel@lists.xenproject.org

Ian,

> There's no such thing as a "migration" on physical hardware and a
> save/restore etc is under kernel control so it knows not to cache timer
> values etc.

Indeed, so it's the live migrate which is causing it!
>> If that's correct, and I've understood what George said, then
>> I /think/ the only quirky fix that needs doing is to change
>> the API between kernel driver and xen so that 'don't give me a time
>> in the past' means 'don't give me a time in the past unless you've
>> just done a live migrate'.
>
> What does "just" mean here? How do you determine it?

I'd suggest whatever time interval is required to resync. If you said 1 second, for instance, that would be a bodge, but it would presumably work unless the clocks were out by more than a second.

> I said "filling the hypervisor with lots of quirky exceptions", this is
> just one and in isolation maybe it isn't too bad. Now imagine we'd
> accumulated a dozen over the last 10 years, the semantics of our timer
> operation would be impossible to understand, do this unless A, otherwise
> if not B do something else, etc etc.
>
>> If you really want giving a time in the
>> past to error under some circumstances, you can signal that another
>> way ('really don't give me a time in the past').
>
> That would be changing the behaviour of an existing ABI AFAICT, which is
> right out -- what if some other guest is relying on the current
> behaviour?

Well, Linux is sort of relying on it, so we might fix those guests too :-) I suppose the result would be that if anyone relied on the timer event failing in the one second following migration, then sometimes that failure would not happen.

> But in any case until George (or someone else) has actually diagnosed
> what is going on this entire discussion is premature.
>
>> Yes, it would be lovely if everyone always applied the latest
>> patches to their kernel and rebooted, but they don't.
>>
>> Otherwise the net result will be Xen 4.3 does not reliably live migrate
>> a pile of Linux OS's unless running with a patched kernel. That is not
>> a great conclusion.
>
> Are you saying this didn't happen with Xen 4.2 and earlier? That would
> tend to lean towards this being a Xen bug.

It happens in 4.2. We did not discover it in 4.1, but have not retested it as comprehensively. Also, in 4.1 we were using a different device model (if that's relevant).

--
Alex Bligh
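A minimal C sketch of the grace-window semantics proposed above: a past deadline with the "don't give me a time in the past" flag set is still rejected, except within a short window after a live migrate, when the timer fires immediately instead of failing. Everything here (`SSHOT_FUTURE_ONLY`, `MIGRATE_GRACE_NS`, `struct vcpu_timer`, `set_singleshot`) is hypothetical and only models the idea; it is not Xen's timer code, and the 1 second window is the "bodge" value from the discussion, not a tested figure.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag meaning "don't give me a time in the past";
 * an illustration only, not Xen's actual definition. */
#define SSHOT_FUTURE_ONLY 1u

/* Hypothetical grace window after a live migrate: the 1 second
 * value suggested in the discussion, expressed in nanoseconds. */
#define MIGRATE_GRACE_NS 1000000000ull

/* Hypothetical per-vCPU timer state, not a real Xen structure. */
struct vcpu_timer {
    int migrated;             /* nonzero once the domain has been live migrated */
    uint64_t last_migrate_ns; /* system time at which the migration completed */
    uint64_t deadline_ns;     /* currently armed single-shot deadline */
};

/* Arm a single-shot timer. Returns 0 on success, or -ETIME if the
 * deadline is in the past and the caller set SSHOT_FUTURE_ONLY,
 * unless a migration completed less than MIGRATE_GRACE_NS ago. */
static int set_singleshot(struct vcpu_timer *t, uint64_t now_ns,
                          uint64_t timeout_ns, unsigned int flags)
{
    if ((flags & SSHOT_FUTURE_ONLY) && timeout_ns < now_ns) {
        int in_grace = t->migrated &&
                       now_ns >= t->last_migrate_ns &&
                       now_ns - t->last_migrate_ns < MIGRATE_GRACE_NS;
        if (!in_grace)
            return -ETIME;    /* existing behaviour: reject past deadlines */
        timeout_ns = now_ns;  /* grace window: fire immediately instead */
    }
    /* Without the flag, a past deadline is stored as-is and expires at once. */
    t->deadline_ns = timeout_ns;
    return 0;
}
```

In this sketch the exception is confined to one branch, but Ian's objection still applies: any guest that relied on the -ETIME failure during that window would see different behaviour, and the window length is a guess.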