From: Anthony Liguori
In-Reply-To: <50509A66.7010505@siemens.com>
References: <87pq5r5otp.fsf@codemonkey.ws> <50509A66.7010505@siemens.com>
Date: Wed, 12 Sep 2012 09:44:10 -0500
Message-ID: <87y5kfi9mt.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
To: Jan Kiszka
Cc: Michael Roth, Gleb Natapov, "qemu-devel@nongnu.org", Luiz Capitulino, Avi Kivity, Paolo Bonzini, Eric Blake

Jan Kiszka writes:

> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use. Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>    time. We've seen a number of RTC watchdog BSoDs and it's possible
>>    that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>>    for a long period of time.
>>    This results in a very large tick catchup while also resulting in a
>>    large skew in guest time.
>>
>>    I've gotten reports of the tick catchup consuming a lot of CPU time
>>    from rapid delivery of interrupts (although I haven't reproduced this
>>    yet).
>>
>> 3) Windows appears to have a service that periodically syncs the guest
>>    time with the hardware clock. I've been told the resync period is an
>>    hour. For large clock skews, this can compete with reinjection,
>>    resulting in a positive skew in time (the guest can be ahead of the
>>    host).
>>
>> I've been thinking about an algorithm like this to address these
>> problems:
>>
>> A) Limit the number of interrupts that we reinject to the equivalent of
>>    a small period of wallclock time. Something like 60 seconds.
>>
>> B) In the event of (A), trigger a notification in QEMU. This is easy
>>    for the RTC but harder for the in-kernel PIT. Maybe it's a good time
>>    to revisit usage of the in-kernel PIT?
>>
>> C) On accumulated tick overflow, rely on using a qemu-ga command to
>>    force a resync of the guest's time to the hardware wallclock time.
>>
>> D) Whenever the guest reads the wallclock time from the RTC, reset all
>>    accumulated ticks.
>>
>> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
>> discussed a low-impact way of doing this (having a separate dispatch
>> path for guest agent commands) and I'm confident we could do this for
>> 1.3.
>>
>> This would mean that management tools would need to consume qemu-ga
>> through QMP. Not sure if this is a problem for anyone.
>>
>> I'm not sure whether it's worth trying to support this with the
>> in-kernel PIT or not either.
>
> As with our current discussion around fixing the PIC and its impact on
> the PIT, we should try the userspace model first and then check if
> the design can be adapted to support in-kernel as well.
>
> For which guests is the PIT important again? Old Linux kernels?
> Windows should be mostly happy with the RTC - or the HPET.

I thought that only 64-bit Win2k8+ used the RTC. I thought Win2k3 and
even 32-bit Win2k8 still used the PIT.

>> Are there other issues with reinjection that people are aware of? Does
>> anything seem obviously wrong with the above?
>
> We should take the chance and design everything in a way that the HPET
> can finally be (left) enabled.

I thought the issue with the HPET was access frequency and the cost of
heavyweight exits. I don't have concrete data here; I've only heard it
secondhand. Can anyone comment more?

Regards,

Anthony Liguori

>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
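A minimal C sketch of the catch-up policy proposed in (A) through (D) in the thread above, assuming a single periodic timer source (RTC or PIT) whose missed ticks are counted elsewhere. Every name here (TickCatchup, tick_catchup_add_missed() and the rest) is hypothetical and is not QEMU's actual RTC or PIT code. The 60 second cap comes from (A); how resync_requested would be turned into a QMP notification and a qemu-ga resync for (B) and (C) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-timer catch-up state; not QEMU's real data structures. */
typedef struct {
    uint64_t period_ns;     /* tick period the guest programmed */
    uint64_t max_ticks;     /* (A) cap: backlog worth at most max_backlog_ns */
    uint64_t pending_ticks; /* missed ticks still waiting to be reinjected */
    bool resync_requested;  /* (B) raised when the cap is hit, to drive (C) */
} TickCatchup;

void tick_catchup_init(TickCatchup *tc, uint64_t period_ns,
                       uint64_t max_backlog_ns)
{
    assert(period_ns > 0);
    tc->period_ns = period_ns;
    tc->max_ticks = max_backlog_ns / period_ns;
    tc->pending_ticks = 0;
    tc->resync_requested = false;
}

/* Record ticks that could not be delivered on time (e.g. host suspended). */
void tick_catchup_add_missed(TickCatchup *tc, uint64_t missed)
{
    /* Invariant: pending_ticks <= max_ticks, so this subtraction cannot wrap. */
    if (missed > tc->max_ticks - tc->pending_ticks) {
        /* (A) Drop everything beyond the cap instead of replaying it... */
        tc->pending_ticks = tc->max_ticks;
        /* (B) ...and flag that the guest clock needs an explicit resync (C). */
        tc->resync_requested = true;
    } else {
        tc->pending_ticks += missed;
    }
}

/* Take one backlogged tick to reinject; returns false when none remain. */
bool tick_catchup_take(TickCatchup *tc)
{
    if (tc->pending_ticks == 0) {
        return false;
    }
    tc->pending_ticks--;
    return true;
}

/* (D) The guest read the wallclock from the RTC, so its time is current and
 * replaying the remaining backlog could push it ahead of the host. */
void tick_catchup_guest_read_rtc(TickCatchup *tc)
{
    tc->pending_ticks = 0;
}
```

For example, with a 10 ms tick and the 60 second cap, an hour of host hibernation (problem 2) means 360000 missed ticks; the backlog is clamped to 6000 ticks, resync_requested is set, and the guest's next RTC read discards whatever is left. Clamping instead of dropping the backlog keeps the existing behavior for short stalls, while (C) and (D) cover the long ones.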