Date: Wed, 12 Sep 2012 14:23:51 -0300
From: Luiz Capitulino
Message-ID: <20120912142351.44345e09@doriath.home>
In-Reply-To: <87pq5r5otp.fsf@codemonkey.ws>
References: <87pq5r5otp.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
To: Anthony Liguori
Cc: Gleb Natapov, Jan Kiszka, qemu-devel@nongnu.org, Michael Roth,
    Avi Kivity, Paolo Bonzini, Eric Blake

On Wed, 12 Sep 2012 08:54:26 -0500
Anthony Liguori wrote:

> Hi,
>
> We've been running into a lot of problems lately with Windows guests,
> and I think they all ultimately could be addressed by revisiting the
> missed tick catchup algorithms that we use. Mike and I spent a while
> talking about it yesterday and I wanted to take the discussion to the
> list to get some additional input.
>
> Here are the problems we're seeing:
>
> 1) Rapid reinjection can lead to time moving faster for short bursts
>    of time. We've seen a number of RTC watchdog BSoDs, and it's
>    possible that at least one cause is reinjection speed.
>
> 2) When hibernating a host system, the guest is essentially paused
>    for a long period of time. This results in a very large tick
>    catchup while also resulting in a large skew in guest time.
>
>    I've gotten reports of the tick catchup consuming a lot of CPU
>    time from rapid delivery of interrupts (although I haven't
>    reproduced this yet).
>
> 3) Windows appears to have a service that periodically syncs the
>    guest time with the hardware clock. I've been told the resync
>    period is an hour. For large clock skews, this can compete with
>    reinjection, resulting in a positive skew in time (the guest can
>    be ahead of the host).
>
> I've been thinking about an algorithm like this to address these
> problems:
>
> A) Limit the number of interrupts that we reinject to the equivalent
>    of a small period of wallclock time. Something like 60 seconds.
>
> B) In the event of (A), trigger a notification in QEMU. This is easy
>    for the RTC but harder for the in-kernel PIT. Maybe it's a good
>    time to revisit usage of the in-kernel PIT?
>
> C) On accumulated tick overflow, rely on using a qemu-ga command to
>    force a resync of the guest's time to the hardware wallclock time.
>
> D) Whenever the guest reads the wallclock time from the RTC, reset
>    all accumulated ticks.
>
> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike
> and I discussed a low-impact way of doing this (having a separate
> dispatch path for guest agent commands) and I'm confident we could
> do this for 1.3.

Fine with me, but note that we're only two or three commands away from
having the qapi conversion done. So, it's possible that we'll merge
this and re-do it a few weeks later.

> This would mean that management tools would need to consume qemu-ga
> through QMP. Not sure if this is a problem for anyone.

Shouldn't be a problem, I think.

> I'm not sure whether it's worth trying to support this with the
> in-kernel PIT or not either.
>
> Are there other issues with reinjection that people are aware of?
> Does anything seem obviously wrong with the above?
>
> Regards,
>
> Anthony Liguori
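For what it's worth, points (A), (B) and (D) above could look roughly
like the sketch below. This is illustrative only, not actual QEMU code:
all names (TimerCatchup, catchup_account_missed, MAX_CATCHUP_SECONDS,
etc.) are made up, and a real implementation would hang off the RTC/PIT
device state and emit a QMP event instead of just setting a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cap the backlog of missed ticks that will be reinjected to the
 * equivalent of 60 seconds of wallclock time, per point (A). */
#define MAX_CATCHUP_SECONDS 60

typedef struct {
    int64_t coalesced_irqs; /* ticks missed while the guest wasn't running */
    int64_t tick_hz;        /* periodic tick rate, e.g. RTC at 1024 Hz */
    bool    overflow;       /* set when the backlog exceeded the cap */
} TimerCatchup;

/* Account newly missed ticks; returns the number of ticks that will
 * actually be queued for reinjection. */
static int64_t catchup_account_missed(TimerCatchup *tc, int64_t missed)
{
    int64_t limit = (int64_t)MAX_CATCHUP_SECONDS * tc->tick_hz;

    tc->coalesced_irqs += missed;
    if (tc->coalesced_irqs > limit) {
        /* Point (B): drop the excess and flag it, so that management
         * can trigger the qemu-ga time resync of point (C). */
        tc->coalesced_irqs = limit;
        tc->overflow = true;
    }
    return tc->coalesced_irqs;
}

/* Point (D): the guest just read the wallclock from the RTC, so any
 * accumulated backlog is stale and should be discarded. */
static void catchup_reset_on_wallclock_read(TimerCatchup *tc)
{
    tc->coalesced_irqs = 0;
    tc->overflow = false;
}
```

After a host hibernate, the backlog never grows past 60 seconds' worth
of ticks; the overflow flag is the hook where the (B) notification and
the (C) resync would kick in.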