From: Gleb Natapov <gleb@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>,
Jan Kiszka <jan.kiszka@siemens.com>,
qemu-devel@nongnu.org, Luiz Capitulino <lcapitulino@redhat.com>,
Avi Kivity <avi@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] Rethinking missed tick catchup
Date: Thu, 13 Sep 2012 13:49:40 +0300
Message-ID: <20120913104940.GA20907@redhat.com>
In-Reply-To: <87y5kfrtne.fsf@codemonkey.ws>

On Wed, Sep 12, 2012 at 01:19:17PM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
>
> > On Wed, Sep 12, 2012 at 08:54:26AM -0500, Anthony Liguori wrote:
> >>
> >> Hi,
> >>
> >> We've been running into a lot of problems lately with Windows guests and
> >> I think they all ultimately could be addressed by revisiting the missed
> >> tick catchup algorithms that we use. Mike and I spent a while talking
> >> about it yesterday and I wanted to take the discussion to the list to
> >> get some additional input.
> >>
> >> Here are the problems we're seeing:
> >>
> >> 1) Rapid reinjection can lead to time moving faster for short bursts of
> >> time. We've seen a number of RTC watchdog BSoDs and it's possible
> >> that at least one cause is reinjection speed.
> >>
> >> 2) When hibernating a host system, the guest is essentially paused
> >> for a long period of time. This results in a very large tick catchup
> >> while also resulting in a large skew in guest time.
> >>
> >> I've gotten reports of the tick catchup consuming a lot of CPU time
> >> from rapid delivery of interrupts (although I haven't reproduced this
> >> yet).
> >>
> >> 3) Windows appears to have a service that periodically syncs the guest
> >> time with the hardware clock. I've been told the resync period is an
> >> hour. For large clock skews, this can compete with reinjection
> >> resulting in a positive skew in time (the guest can be ahead of the
> >> host).
> >>
> >> I've been thinking about an algorithm like this to address these
> >> problems:
> >>
> >> A) Limit the number of interrupts that we reinject to the equivalent of
> >> a small period of wallclock time. Something like 60 seconds.
> >>
> > How will this fix the BSOD problem, for instance? 60 seconds is long
> > enough to cause all the problems you are talking about above. We can
> > make the amount of accumulated ticks easily configurable, though, so
> > we can play with it and see.
>
> It won't, but the goal of an upper limit is to cap time correction at
> something reasonably caused by overcommit, not by suspend/resume.
>
> 60 seconds is probably way too long. Maybe 5 seconds? We can try
> various amounts as you said.
>
> What do you think about slowing down the catchup rate? I think now we
> increase wallclock time by 100-700%.
>
Right now we reinject up to 20 lost ticks on guest interrupt
acknowledgement (RTC register C read) and increase the frequency, as you
say, if that is not enough. We do both because on machines without
high-resolution timers we cannot increase the frequency if the guest sets
the RTC to 1kHz, and injecting a lot of RTC interrupts at once makes
Windows think that the RTC irq line is stuck -> BSOD.
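
To make that concrete, here is one way to read the mechanism as a rough
sketch; the type, helper names and constants are illustrative, not the
actual QEMU mc146818rtc code:

    #include <stdint.h>

    /* Backlog size the ACK path is expected to drain on its own. */
    #define MAX_ACK_REINJECT_BACKLOG 20

    typedef struct RTCState {
        int64_t coalesced_irqs; /* ticks lost while the guest could not run */
    } RTCState;

    /* Stubs for the sketch; the real device would inject an IRQ into
     * the guest and retune its periodic timer here. */
    static void rtc_raise_irq(RTCState *s) { (void)s; }
    static void rtc_speed_up_periodic_timer(RTCState *s) { (void)s; }

    /* Called when the guest reads RTC register C, acknowledging the IRQ. */
    static void rtc_ack_reinject(RTCState *s)
    {
        if (s->coalesced_irqs > 0) {
            /* The guest just acknowledged, so one more tick can be
             * delivered right away; pacing reinjection on ACKs is what
             * keeps Windows from deciding the irq line is stuck. */
            rtc_raise_irq(s);
            s->coalesced_irqs--;
        }

        /* A backlog too large for the ACK path is worked off by
         * running the periodic timer faster than nominal -- only
         * possible with high-resolution timers. */
        if (s->coalesced_irqs > MAX_ACK_REINJECT_BACKLOG) {
            rtc_speed_up_periodic_timer(s);
        }
    }
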
> This is very fast. I wonder if this makes sense anymore since hr timers
> are pretty much ubiquitous.
We can drop reinject-on-ACK if we do not want to support old kernels.
The frequency increase was arbitrary; we can make it smaller, but we have
to make sure that under load the drift will not be stronger than our
attempts to fix it.
>
> I think we could probably even just increase wallclock time by as little
> as 10-20%. That should avoid false watchdog alerts but still give us a
> chance to inject enough interrupts.
We can start from 10-20% and, if the coalesced counter still grows,
increase it.
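
As a sketch of what such an adaptive rate could look like (made-up
constants and helper names, not existing QEMU code):

    #include <stdint.h>

    static unsigned catchup_percent = 10;  /* initial speed-up: 10% */

    /* Periodic timer interval while catching up: at 10% the timer runs
     * at 110% of the nominal frequency, i.e. ~91% of the period. */
    static int64_t catchup_period_ns(int64_t nominal_period_ns)
    {
        return nominal_period_ns * 100 / (100 + catchup_percent);
    }

    /* Call periodically: if the coalesced counter is still growing,
     * the current rate is losing the race against the drift, so crank
     * it up, capped at +100% (double speed). */
    static void tune_catchup(int64_t coalesced_now, int64_t coalesced_prev)
    {
        if (coalesced_now > coalesced_prev && catchup_percent < 100) {
            catchup_percent += 10;
        }
    }
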
>
> >
> >> B) In the event of (A), trigger a notification in QEMU. This is easy
> >> for the RTC but harder for the in-kernel PIT. Maybe it's a good time to
> >> revisit usage of the in-kernel PIT?
> >>
> > PIT does not matter for Windows guests.
> >
> >> C) On accumulated tick overflow, rely on using a qemu-ga command to
> >> force a resync of the guest's time to the hardware wallclock time.
> >>
> > Needs guest cooperation.
>
> Yes, hence qemu-ga. But is there any other choice? Hibernation can
> cause us to miss an unbounded number of ticks. Days worth of time. It
> seems unreasonable to gradually catch up that much time.
The timedrift fix was never meant to fix time drift from vmstop. That is
a side effect of making the RTC use the real time clock instead of the vm
clock. With the RTC on the real time clock, on resume qemu_timer tries to
catch up with the current time and fires the timer callback for each lost
tick. The ticks are all coalesced, of course, since the guest has no
chance to run between them, and they accumulate in the coalesced_irq
counter. If you configure the RTC to use the vm clock you should not see
this.
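
For example (the -rtc option syntax of QEMU from this era; check your
version's -help output):

    qemu -rtc clock=vm [...]

With clock=vm the RTC timer simply stops while the VM is stopped, so no
tick backlog builds up across a long pause; the trade-off is that guest
time then lags the host by the length of the pause.
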
I agree with you, of course, that qemu-ga is the only sane way to fix
time drift due to vmstop, but it is better not to get into this situation
at all if possible. See below.
>
> >> D) Whenever the guest reads the wallclock time from the RTC, reset all
> >> accumulated ticks.
> >>
> >> In order to do (C), we'll need to plumb qemu-ga through QMP. Mike and I
> >> discussed a low-impact way of doing this (having a separate dispatch
> >> path for guest agent commands) and I'm confident we could do this for
> >> 1.3.
> >>
> >> This would mean that management tools would need to consume qemu-ga
> >> through QMP. Not sure if this is a problem for anyone.
> >>
> >> I'm not sure whether it's worth trying to support this with the
> >> in-kernel PIT or not either.
> >>
> >> Are there other issues with reinjection that people are aware of? Does
> >> anything seem obviously wrong with the above?
> >>
> > It looks like you are trying to solve only pathologically big timedrift
> > problems. Those do not happen normally.
>
> They do if you hibernate your laptop.
>
AFAIK libvirt migrates the VM into a file on hibernate. It would be
better to move the guest to S3 (using qemu-ga) instead, and migrate to a
file only if S3 fails.
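
For example, qemu-ga already provides a guest-suspend-ram command (since
QEMU 1.1, IIRC), so the management layer could ask the guest to enter S3
with a single agent call:

    {"execute": "guest-suspend-ram"}

(JSON over the guest agent channel; on success the guest suspends without
sending a response, so the caller should treat an error reply or a
timeout as the trigger for the migrate-to-file fallback.)
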
--
Gleb.