From: George Dunlap
Subject: Re: HVM Migration of domU on Qemu-upstream DM causes stuck system clock with ACPI
Date: Mon, 3 Jun 2013 11:25:08 +0100
Message-ID: <51AC6F04.9030501@eu.citrix.com>
In-Reply-To: <51AC55DD.7000507@citrix.com>
To: Roger Pau Monné
Cc: Ian Campbell, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org, David Vrabel, Alex Bligh, Anthony PERARD, Diana Crisan

On 03/06/13 09:37, Roger Pau Monné wrote:
> On 31/05/13 17:10, Roger Pau Monné wrote:
>> On 31/05/13 15:07, George Dunlap wrote:
>>> On 31/05/13 13:40, Ian Campbell wrote:
>>>> On Fri, 2013-05-31 at 12:57 +0100, Alex Bligh wrote:
>>>>> --On 31 May 2013 12:49:18 +0100 George Dunlap wrote:
>>>>>
>>>>>> No -- Linux is asking, "Can you give me an alarm in 5ns?" And Xen is
>>>>>> saying, "No". So Linux is saying, "OK, how about 5us? 10us?
>>>>>> 20us?"
>>>>>> By the time it reaches 4ms, Linux has had enough, and says, "If this
>>>>>> timer is so bad that it can't give me an event within 4ms it just
>>>>>> won't use timers at all, thank you very much."
>>>>>>
>>>>>> The problem appears to be that Linux thinks it's asking for
>>>>>> something in the future, but is actually asking for something in the
>>>>>> past. It must look at its watch just before the final domain pause,
>>>>>> and then asks for the time just after the migration resumes on the
>>>>>> other side. So it doesn't realize that 10ms (or something) has
>>>>>> already passed, and that it's actually asking for a timer in the
>>>>>> past. The Xen timer driver in Linux specifically asks Xen for times
>>>>>> set in the past to return an error.
>>>>>> Xen is returning an error because the time is in the past, Linux
>>>>>> thinks it's getting an error because the time is too close in the
>>>>>> future and tries asking a little further away.
>>>>>>
>>>>>> Unfortunately I think this is something which needs to be fixed on
>>>>>> the Linux side; I don't really see how we can work around it in Xen.
>>>>> I don't think fixing it only on the Linux side is a great idea, not
>>>>> least as it makes any current Linux image not live migrateable
>>>>> reliably. That's pretty horrible.
>>>> Ultimately though a guest bug is a guest bug, we don't really want to be
>>>> filling the hypervisor with lots of quirky exceptions to interfaces in
>>>> order to work around them, otherwise where does it end?
>>>>
>>>> A kernel side fix can be pushed to the distros fairly aggressively (it's
>>>> mostly just a case of getting an upstream stable backport then filing
>>>> bugs with the main ones, we've done it before) and for users upgrading
>>>> the kernel via the distros is really not so hard and mostly reuses the
>>>> process they must have in place for guest kernel security updates and
>>>> other important kernel bugs anyway.
>>> In any case, it seems I was wrong -- Linux does "look at its watch"
>>> every time it asks.
>>>
>>> The generic timer interface is "set me a timer N nanoseconds in the
>>> future"; the Xen timer implementation executes
>>> pvclock_clocksource_read() and adds the delta. So it may well actually
>>> be a bug in Xen.
>>>
>>> Stand by for further investigation...
> I've been investigating further during the weekend, and although I'm not
> familiar with the timer code in Xen, I think the problem comes from the
> fact that in __update_vcpu_system_time when Xen detects that the guest
> is using a vtsc it adds offsets to the time passed to the guest, while
> in VCPUOP_set_singleshot_timer Xen compares the time passed from the
> guest using NOW(), which is just the Xen uptime, without taking into
> account any offsets.

All the code is really complicated, but it seems like the offset is
added because the offset is *subtracted* by the hardware when the HVM
guest does an RDTSC instruction -- and subtracted in a different way by
Xen when emulating the RDTSC instruction, if you've set
tsc_mode="always_emulate".

Just to test some of this stuff, I set the TSC mode to "always_emulate",
and it has the exact same effect -- even though "always_emulate" will
emulate a 1GHz clock.

> This only happens after migration because Xen automatically switches to
> vtsc when it detects that the guest has been migrated. I'm currently
> setting up a Linux PVHVM on shared storage to perform some testing, but
> one possible solution might be to add tsc_mode="native_paravirt" to the
> PVHVM config file, and another one would be fixing
> VCPUOP_set_singleshot_timer to take into account the vtsc offsets and
> correctly translate the time passed from the guest.

So have you tested it with native_paravirt? Does it work around the
problem?

 -George
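A toy C model may make the suspected mismatch easier to see. It is a sketch under stated assumptions, not Xen's or Linux's actual code: every function and macro name is hypothetical, time is frozen while the guest retries, the guest's clock is modelled as Xen's NOW() plus a fixed offset (a negative offset stands in for the vtsc adjustment leaving the guest behind), and the guest doubles its request from 5ns and gives up past 4ms, following the description quoted in this thread rather than the exact Linux back-off. The last function sketches the fix Roger suggests: translate the guest's deadline back into Xen's timebase before comparing.

```c
#include <stdint.h>

/* Toy model only: all names are hypothetical, not Xen or Linux code. */
#define MODEL_ETIME    (-1)        /* stands in for the error Xen returns */
#define START_DELTA_NS 5LL         /* "Can you give me an alarm in 5ns?" */
#define GIVE_UP_NS     4000000LL   /* the guest stops trying past 4ms */

/* Hypervisor side (assumed behaviour): the guest's absolute deadline is
 * compared against Xen's own uptime, NOW(), with no vtsc offset applied. */
int model_set_singleshot_timer(int64_t xen_now_ns, int64_t timeout_abs_ns)
{
    return timeout_abs_ns < xen_now_ns ? MODEL_ETIME : 0;
}

/* Guest side (assumed behaviour): read the guest clock, which is Xen time
 * plus an offset, add delta, and on error retry with a doubled delta.
 * Time does not advance in this model.
 * Returns the delta that was accepted, or -1 if the guest gave up. */
int64_t model_guest_arm_timer(int64_t xen_now_ns, int64_t guest_offset_ns)
{
    for (int64_t delta = START_DELTA_NS; delta <= GIVE_UP_NS; delta *= 2) {
        int64_t guest_now = xen_now_ns + guest_offset_ns;
        if (model_set_singleshot_timer(xen_now_ns, guest_now + delta) == 0)
            return delta;
    }
    return -1; /* "won't use timers at all, thank you very much" */
}

/* Hypothetical sketch of the proposed fix: remove the guest's offset from
 * its deadline so the comparison happens in Xen's timebase. */
int model_set_singleshot_timer_translated(int64_t xen_now_ns,
                                          int64_t timeout_abs_ns,
                                          int64_t guest_offset_ns)
{
    return model_set_singleshot_timer(xen_now_ns,
                                      timeout_abs_ns - guest_offset_ns);
}
```

With xen_now_ns = 1000000000 (an arbitrary 1s of uptime), an offset of 0 is accepted at the first 5ns request; an offset of -100000 (guest 100us behind) is accepted only once delta reaches 163840ns; and an offset of -10000000 (guest 10ms behind) returns -1, because even the largest request tried, 2621440ns, still lands in Xen's past. The same 10ms-behind deadline passed through model_set_singleshot_timer_translated is accepted.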