qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Rethinking missed tick catchup
@ 2012-09-12 13:54 Anthony Liguori
  2012-09-12 14:21 ` Jan Kiszka
                   ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Anthony Liguori @ 2012-09-12 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Gleb Natapov, Jan Kiszka, Michael Roth, Luiz Capitulino,
	Avi Kivity, Paolo Bonzini, Eric Blake


Hi,

We've been running into a lot of problems lately with Windows guests and
I think they all ultimately could be addressed by revisiting the missed
tick catchup algorithms that we use.  Mike and I spent a while talking
about it yesterday and I wanted to take the discussion to the list to
get some additional input.

Here are the problems we're seeing:

1) Rapid reinjection can lead to time moving faster for short bursts of
   time.  We've seen a number of RTC watchdog BSoDs and it's possible
   that at least one cause is reinjection speed.

2) When hibernating a host system, the guest is essentially paused
   for a long period of time.  This results in a very large tick catchup
   while also resulting in a large skew in guest time.

   I've gotten reports of the tick catchup consuming a lot of CPU time
   from rapid delivery of interrupts (although I haven't reproduced this
   yet).

3) Windows appears to have a service that periodically syncs the guest
   time with the hardware clock.  I've been told the resync period is an
   hour.  For large clock skews, this can compete with reinjection
   resulting in a positive skew in time (the guest can be ahead of the
   host).
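To give a feel for the scale of (2), here is a back-of-the-envelope
sketch.  Every number in it is an assumption for illustration only (a
64 Hz periodic timer, an 8 hour hibernation, and reinjection at 4x the
normal tick rate); none of them are measurements from the reports
above, and the function names are made up.

```python
def missed_ticks(tick_hz, pause_seconds):
    """Ticks that pile up while the guest is not running."""
    return tick_hz * pause_seconds


def catchup_seconds(pause_seconds, speedup):
    """Wallclock time needed to drain the backlog.

    While catching up, ticks are delivered at `speedup` times the
    normal rate, but new ticks keep coming due at the normal rate, so
    the backlog only shrinks at (speedup - 1) times the normal rate.
    """
    if speedup <= 1:
        raise ValueError("backlog never drains unless speedup > 1")
    return pause_seconds / (speedup - 1)


# Assumed values, purely illustrative.
TICK_HZ = 64               # guest periodic timer rate
PAUSE_SECONDS = 8 * 3600   # host hibernated for 8 hours
SPEEDUP = 4                # reinjection at 4x the normal tick rate

print(missed_ticks(TICK_HZ, PAUSE_SECONDS))     # 1843200
print(catchup_seconds(PAUSE_SECONDS, SPEEDUP))  # 9600.0
```

Under those assumptions the guest would see time running at 4x for
well over two hours, which is longer than the hourly resync period
mentioned in (3).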

I've been thinking about an algorithm like this to address these
problems:

A) Limit the number of interrupts that we reinject to the equivalent of
   a small period of wallclock time.  Something like 60 seconds.

B) In the event of (A), trigger a notification in QEMU.  This is easy
   for the RTC but harder for the in-kernel PIT.  Maybe it's a good time to
   revisit usage of the in-kernel PIT?

C) On accumulated tick overflow, rely on using a qemu-ga command to
   force a resync of the guest's time to the hardware wallclock time.

D) Whenever the guest reads the wallclock time from the RTC, reset all
   accumulated ticks.
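Steps (A) through (D) could be sketched roughly as follows.  This is
only a model of the bookkeeping, not QEMU code: the class, its method
names, and the `on_overflow` callback are all hypothetical, and the
callback just stands in for whatever notification and qemu-ga resync
path we end up with.

```python
class TickCatchup:
    """Toy model of a capped missed-tick backlog (hypothetical names)."""

    def __init__(self, tick_hz, max_backlog_seconds=60, on_overflow=None):
        # (A) cap the backlog at a small amount of wallclock time
        self.max_ticks = tick_hz * max_backlog_seconds
        self.pending = 0
        self.on_overflow = on_overflow

    def tick_missed(self, count=1):
        """Record ticks that could not be delivered on time."""
        self.pending += count
        if self.pending > self.max_ticks:
            dropped = self.pending - self.max_ticks
            self.pending = self.max_ticks
            # (B)/(C) notify, so that something outside the timer
            # model can ask the guest agent to resync guest time
            if self.on_overflow is not None:
                self.on_overflow(dropped)

    def tick_reinjected(self):
        """Record that one backlog tick was delivered to the guest."""
        if self.pending > 0:
            self.pending -= 1

    def wallclock_read(self):
        """(D) the guest read the RTC wallclock; drop the backlog."""
        self.pending = 0


# Usage with assumed numbers: 64 Hz timer, 100 seconds of missed ticks.
dropped_log = []
catchup = TickCatchup(tick_hz=64, on_overflow=dropped_log.append)
catchup.tick_missed(64 * 100)
print(catchup.pending, dropped_log)  # 3840 [2560]
```

The point of the cap is that the backlog, and so the length of any
fast-forward burst, is bounded no matter how long the guest was
stopped; anything beyond the cap is left to the resync in (C).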

In order to do (C), we'll need to plumb qemu-ga through QMP.  Mike and I
discussed a low-impact way of doing this (having a separate dispatch
path for guest agent commands) and I'm confident we could do this for
1.3.

This would mean that management tools would need to consume qemu-ga
through QMP.  Not sure if this is a problem for anyone.

I'm also not sure whether it's worth trying to support this with the
in-kernel PIT.

Are there other issues with reinjection that people are aware of?  Does
anything seem obviously wrong with the above?

Regards,

Anthony Liguori

* Re: [Qemu-devel] Rethinking missed tick catchup
@ 2012-09-12 18:03 Clemens Kolbitsch
  2012-09-13  6:25 ` Paolo Bonzini
  0 siblings, 1 reply; 48+ messages in thread
From: Clemens Kolbitsch @ 2012-09-12 18:03 UTC (permalink / raw)
  To: qemu-devel

> On 2012-09-12 15:54, Anthony Liguori wrote:
>>
>> Hi,
>>
>> We've been running into a lot of problems lately with Windows guests and
>> I think they all ultimately could be addressed by revisiting the missed
>> tick catchup algorithms that we use.  Mike and I spent a while talking
>> about it yesterday and I wanted to take the discussion to the list to
>> get some additional input.
>>
>> Here are the problems we're seeing:
>>
>> 1) Rapid reinjection can lead to time moving faster for short bursts of
>>    time.  We've seen a number of RTC watchdog BSoDs and it's possible
>>    that at least one cause is reinjection speed.
>>
>> 2) When hibernating a host system, the guest is essentially paused
>>    for a long period of time.  This results in a very large tick catchup
>>    while also resulting in a large skew in guest time.
>>
>>    I've gotten reports of the tick catchup consuming a lot of CPU time
>>    from rapid delivery of interrupts (although I haven't reproduced this
>>    yet).

Guys,

not much that I can contribute to solving the problem, but I have a
bunch of VMs where this happens _every_ time I resume a snapshot (but
without hibernating). In case this could be a connected problem and
you need help testing a patch, I'm more than happy to help.

-Clemens


end of thread, other threads:[~2012-09-19 16:57 UTC]

Thread overview: 48+ messages
2012-09-12 13:54 [Qemu-devel] Rethinking missed tick catchup Anthony Liguori
2012-09-12 14:21 ` Jan Kiszka
2012-09-12 14:44   ` Anthony Liguori
2012-09-12 14:50     ` Jan Kiszka
2012-09-12 15:06     ` Gleb Natapov
2012-09-12 15:42       ` Jan Kiszka
2012-09-12 15:45         ` Gleb Natapov
2012-09-12 16:16       ` Gleb Natapov
2012-09-12 15:15 ` Gleb Natapov
2012-09-12 18:19   ` Anthony Liguori
2012-09-13 10:49     ` Gleb Natapov
2012-09-13 13:14       ` Eric Blake
2012-09-13 13:28         ` Daniel P. Berrange
2012-09-13 14:06           ` Anthony Liguori
2012-09-13 14:22             ` Gleb Natapov
2012-09-13 14:34               ` Avi Kivity
2012-09-13 14:42                 ` Eric Blake
2012-09-13 15:40                   ` Avi Kivity
2012-09-13 15:50                     ` Anthony Liguori
2012-09-13 15:53                       ` Avi Kivity
2012-09-13 18:27                         ` Anthony Liguori
2012-09-16 10:05                           ` Avi Kivity
2012-09-16 14:37                             ` Anthony Liguori
2012-09-19 15:34                               ` Avi Kivity
2012-09-19 16:37                                 ` Gleb Natapov
2012-09-19 16:44                                   ` Avi Kivity
2012-09-19 16:55                                     ` Gleb Natapov
2012-09-19 16:57                                       ` Avi Kivity
2012-09-13 14:35               ` Anthony Liguori
2012-09-13 14:48                 ` Gleb Natapov
2012-09-13 15:51                   ` Avi Kivity
2012-09-13 15:56                   ` Anthony Liguori
2012-09-13 16:06                     ` Gleb Natapov
2012-09-13 18:33                       ` Anthony Liguori
2012-09-13 18:56                         ` Gleb Natapov
2012-09-13 20:06                           ` Anthony Liguori
2012-09-13 16:08                     ` Avi Kivity
2012-09-13 13:47         ` Gleb Natapov
2012-09-12 16:27 ` Stefan Weil
2012-09-12 16:45   ` Gleb Natapov
2012-09-12 17:30     ` Stefan Weil
2012-09-12 18:13       ` Gleb Natapov
2012-09-12 19:45         ` Stefan Weil
2012-09-13 10:50           ` Gleb Natapov
2012-09-12 20:06       ` Michael Roth
2012-09-12 17:23 ` Luiz Capitulino
  -- strict thread matches above, loose matches on Subject: below --
2012-09-12 18:03 Clemens Kolbitsch
2012-09-13  6:25 ` Paolo Bonzini
