kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Michael Tokarev <mjt@tls.msk.ru>, kvm <kvm@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Wed, 7 Oct 2009 21:54:56 -0300	[thread overview]
Message-ID: <20091008005456.GA10032@amt.cnet> (raw)
In-Reply-To: <20091007231733.GG5903@nowhere>

On Thu, Oct 08, 2009 at 01:17:35AM +0200, Frederic Weisbecker wrote:
> (Adding Thomas in Cc)
> 
> 
> On Sat, Oct 03, 2009 at 08:12:05PM -0300, Marcelo Tosatti wrote:
> > Michael,
> > 
> > Can you please give the patch below a try please? (without acpi_pm timer 
> > or priority adjustments for the guest).
> > 
> > On Tue, Sep 29, 2009 at 05:12:17PM +0400, Michael Tokarev wrote:
> > > Hello.
> > >
> > > I'm having quite an.. unusable system here.
> > > It's not really a regresssion with 0.11.0,
> > > it was something similar before, but with
> > > 0.11.0 and/or 2.6.31 it become much worse.
> > >
> > > The thing is that after some uptime, kvm
> > > guest prints something like this:
> > >
> > > hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
> > >
> > > after which system (guest) speeed becomes
> > > very slow.  The above message is from
> > > 2.6.31 guest running wiht 0.11.0 & 2.6.31
> > > host.  Before I tried it with 0.10.6 and
> > > 2.6.30 or 2.6.27, and the delta were a
> > > bit less than that:
> > >
> > > hrtimer: interrupt too slow, forcing clock min delta to 152222415 ns
> > > hrtimer: interrupt too slow, forcing clock min delta to 93629025 ns
> > 
> > It seems the way hrtimer_interrupt_hanging calculates min_delta is
> > wrong (especially to virtual machines). The guest vcpu can be scheduled
> > out during the execution of the hrtimer callbacks (and the callbacks
> > themselves can do operations that translate to blocking operations in
> > the hypervisor).
> > 
> > So high min_delta values can be calculated if, for example, a single
> > hrtimer_interrupt run takes two host time slices to execute, while some
> > other higher priority task runs for N slices in between.
> > 
> > Using the hrtimer_interrupt execution time (which can be the worse
> > case at any given time), as the min_delta is problematic.
> > 
> > So simply increase min_delta_ns by 50% once every detected failure,
> > which will eventually lead to an acceptable threshold (the algorithm
> > should scale back to down lower min_delta, to adjust back to wealthier
> > times, too).
> > 
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index 49da79a..8997978 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -1234,28 +1234,20 @@ static void __run_hrtimer(struct hrtimer *timer)
> >  
> >  #ifdef CONFIG_HIGH_RES_TIMERS
> >  
> > -static int force_clock_reprogram;
> > -
> >  /*
> >   * After 5 iteration's attempts, we consider that hrtimer_interrupt()
> >   * is hanging, which could happen with something that slows the interrupt
> > - * such as the tracing. Then we force the clock reprogramming for each future
> > - * hrtimer interrupts to avoid infinite loops and use the min_delta_ns
> > - * threshold that we will overwrite.
> > - * The next tick event will be scheduled to 3 times we currently spend on
> > - * hrtimer_interrupt(). This gives a good compromise, the cpus will spend
> > - * 1/4 of their time to process the hrtimer interrupts. This is enough to
> > - * let it running without serious starvation.
> > + * such as the tracing, so we increase min_delta_ns.
> >   */
> >  
> >  static inline void
> > -hrtimer_interrupt_hanging(struct clock_event_device *dev,
> > -			ktime_t try_time)
> > +hrtimer_interrupt_hanging(struct clock_event_device *dev)
> >  {
> > -	force_clock_reprogram = 1;
> > -	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
> > -	printk(KERN_WARNING "hrtimer: interrupt too slow, "
> > -		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
> > +	dev->min_delta_ns += dev->min_delta_ns >> 1;
> 
> 
> I haven't thought about the guest that could be scheduled out in
> the middle of the timers servicing, making wrong this check based
> of the time spent in hrtimer_interrupt().
> 
> I guess there is no easy/generic/cheap way to rebase this check
> on the _virtual_ time spent in the timers servicing. By virtual,
> I mean the time spent in the guest only.
> 
> In a non-guest kernel, the old check forces an adaptive rate
> sharing:
> 
> - we spent n nanosecs to service the batch of timers.
> - we are hanging
> - we want at least 3/4 of time reserved for non-timer
>   servicing in the kernel, this is a minimum prerequisite
>   for the system to not starve
> - adapt the min_clock_delta against to fit the above constraint
> 
> All that does not make sense anymore in a guest. The hang detection
> and warnings, the recalibrations of the min_clock_deltas are completely
> wrong in this context.
> Not only does it spuriously warn, but the minimum timer is increasing
> slowly and the guest progressively suffers from higher and higher
> latencies.
> 
> That's really bad.
> 
> Your patch lowers the immediate impact and makes this illness evolving
> smoother by scaling down the recalibration to the min_clock_delta.
> This appeases the bug but doesn't solve it. I fear it could be even
> worse because it makes it more discreet.

True.

> May be can we instead increase the minimum threshold of loop in the
> hrtimer interrupt before considering it as a hang? Hmm, but a too high
> number could make this check useless, depending of the number of pending
> timers, which is a finite number.
> 
> Well, actually I'm not confident anymore in this check. Or actually we
> should change it. May be we can rebase it on the time spent on the hrtimer
> interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupts).
> 
> Would a mimimum threshold of 5 seconds spent in hrtimer_interrupt() be
> a reasonable check to perform?
> We should probably base our check on such kind of high boundary.
> What we want is an ultimate rescue against hard hangs anyway, not
> something that can solve the hang source itself. After the min_clock_delta
> recalibration, the system will be unstable (eg: high latencies).
> So if this must behave as a hammer, let's ensure we really need this hammer,
> even if we need to wait for few seconds before it triggers.
> 
> 
> What do you think?

What about getting rid of the retry loop, instead? So something
like:

- run hrtimer callbacks (once)
- while (tick_program_event(expires))
	expires = ktime_add_ns(expires, dev->min_delta_ns)

This way there's no static tuning involved.

Its not clear to me why the loop is there in the first place.


  reply	other threads:[~2009-10-08  1:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-29 13:12 kvm guest: hrtimer: interrupt too slow Michael Tokarev
2009-09-29 13:47 ` Avi Kivity
2009-09-29 13:58   ` Michael Tokarev
2009-10-05 10:47     ` Avi Kivity
2009-10-03 23:12 ` Marcelo Tosatti
     [not found]   ` <4AC88E7E.8050909@msgid.tls.msk.ru>
2009-10-05  0:50     ` Marcelo Tosatti
2009-10-05  9:31       ` Michael Tokarev
2009-10-06 13:30         ` Michael Tokarev
2009-10-07 23:17   ` Frederic Weisbecker
2009-10-08  0:54     ` Marcelo Tosatti [this message]
2009-10-08  7:54       ` Michael Tokarev
2009-10-08  8:06         ` Thomas Gleixner
2009-10-08  8:14           ` Michael Tokarev
2009-10-08  9:29             ` Thomas Gleixner
2009-10-08 14:06               ` Michael Tokarev
2009-10-08 15:06                 ` Thomas Gleixner
2009-10-08 19:52                 ` Marcelo Tosatti
2009-10-09 21:22                   ` Michael Tokarev
2009-10-09 22:27                     ` Frederic Weisbecker
2009-10-09 22:34                       ` Michael Tokarev
2009-10-10  9:18                         ` Michael Tokarev
2009-10-10  9:24                           ` Frederic Weisbecker
2009-10-10 17:37                         ` Marcelo Tosatti
2009-10-08  8:05       ` Thomas Gleixner
2009-10-08 19:22         ` Marcelo Tosatti
2009-10-08 20:25           ` Thomas Gleixner
2009-10-08 21:02             ` Michael Tokarev
2009-10-10 17:32             ` [PATCH] tune hrtimer_interrupt hang logic Marcelo Tosatti
2009-10-08  8:09     ` kvm guest: hrtimer: interrupt too slow Michael Tokarev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091008005456.GA10032@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mjt@tls.msk.ru \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).