From: George Dunlap
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
Date: Fri, 31 May 2013 14:07:56 +0100
Message-ID: <51A8A0AC.1030301@eu.citrix.com>
References: <1717491994.10371605.1369131737226.JavaMail.root@zimbra002> <519B50C9.1000008@citrix.com> <519B577E.6070200@flexiant.com> <519B6D51.2060508@citrix.com> <951B3441BAE2324286D3AA6D@Ximines.local> <420439EA40B15FCBFDFF2BE3@nimrod.local> <1369557503.22605.11.camel@dagon.hellion.org.uk> <51A4C7EB.1010406@flexiant.com> <51A7767A.9030904@flexiant.com> <51A7791C.2020208@eu.citrix.com> <51A8608F.9000302@flexiant.com> <51A88151.3080001@eu.citrix.com> <0FE70400-1152-45F5-9BF9-973DF1DA9EE8@flexiant.com> <51A88E3E.5090208@eu.citrix.com> <1370004031.5199.133.camel@zakaz.uk.xensource.com>
In-Reply-To: <1370004031.5199.133.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, David Vrabel, Alex Bligh, Anthony PERARD, Diana Crisan
List-Id: xen-devel@lists.xenproject.org

On 31/05/13 13:40, Ian Campbell wrote:
> On Fri, 2013-05-31 at 12:57 +0100, Alex Bligh wrote:
>> --On 31 May 2013 12:49:18 +0100 George Dunlap
>> wrote:
>>
>>> No -- Linux is asking, "Can you give me an alarm in 5ns?" And Xen is
>>> saying, "No". So Linux is saying, "OK, how about 5us? 10us? 20us?" By
>>> the time it reaches 4ms, Linux has had enough, and says, "If this timer
>>> is so bad that it can't give me an event within 4ms it just won't use
>>> timers at all, thank you very much."
>>>
>>> The problem appears to be that Linux thinks it's asking for something in
>>> the future, but is actually asking for something in the past.
>>> It must look at its watch just before the final domain pause, and then
>>> ask for the timer just after the migration resumes on the other side. So
>>> it doesn't realize that 10ms (or something) has already passed, and that
>>> it's actually asking for a timer in the past. The Xen timer driver in
>>> Linux specifically asks Xen to return an error for times set in the past.
>>> Xen is returning an error because the time is in the past; Linux thinks
>>> it's getting an error because the time is too close in the future, and
>>> tries asking a little further away.
>>>
>>> Unfortunately I think this is something which needs to be fixed on the
>>> Linux side; I don't really see how we can work around it in Xen.
>> I don't think fixing it only on the Linux side is a great idea, not least
>> as it makes any current Linux image not reliably live-migratable. That's
>> pretty horrible.
> Ultimately though a guest bug is a guest bug; we don't really want to be
> filling the hypervisor with lots of quirky exceptions to interfaces in
> order to work around them, otherwise where does it end?
>
> A kernel side fix can be pushed to the distros fairly aggressively (it's
> mostly just a case of getting an upstream stable backport then filing
> bugs with the main ones, we've done it before), and for users, upgrading
> the kernel via the distros is really not so hard and mostly reuses the
> process they must have in place for guest kernel security updates and
> other important kernel bugs anyway.

In any case, it seems I was wrong -- Linux does "look at its watch" every
time it asks. The generic timer interface is "set me a timer N nanoseconds
in the future"; the Xen timer implementation executes
pvclock_clocksource_read() and adds the delta. So it may well actually be
a bug in Xen.

Stand by for further investigation...

 -George