All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Michael Tokarev <mjt@tls.msk.ru>, kvm <kvm@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Wed, 7 Oct 2009 21:54:56 -0300	[thread overview]
Message-ID: <20091008005456.GA10032@amt.cnet> (raw)
In-Reply-To: <20091007231733.GG5903@nowhere>

On Thu, Oct 08, 2009 at 01:17:35AM +0200, Frederic Weisbecker wrote:
> (Adding Thomas in Cc)
> 
> 
> On Sat, Oct 03, 2009 at 08:12:05PM -0300, Marcelo Tosatti wrote:
> > Michael,
> > 
> > Can you please give the patch below a try please? (without acpi_pm timer 
> > or priority adjustments for the guest).
> > 
> > On Tue, Sep 29, 2009 at 05:12:17PM +0400, Michael Tokarev wrote:
> > > Hello.
> > >
> > > I'm having quite an.. unusable system here.
> > > It's not really a regresssion with 0.11.0,
> > > it was something similar before, but with
> > > 0.11.0 and/or 2.6.31 it become much worse.
> > >
> > > The thing is that after some uptime, kvm
> > > guest prints something like this:
> > >
> > > hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
> > >
> > > after which system (guest) speeed becomes
> > > very slow.  The above message is from
> > > 2.6.31 guest running wiht 0.11.0 & 2.6.31
> > > host.  Before I tried it with 0.10.6 and
> > > 2.6.30 or 2.6.27, and the delta were a
> > > bit less than that:
> > >
> > > hrtimer: interrupt too slow, forcing clock min delta to 152222415 ns
> > > hrtimer: interrupt too slow, forcing clock min delta to 93629025 ns
> > 
> > It seems the way hrtimer_interrupt_hanging calculates min_delta is
> > wrong (especially to virtual machines). The guest vcpu can be scheduled
> > out during the execution of the hrtimer callbacks (and the callbacks
> > themselves can do operations that translate to blocking operations in
> > the hypervisor).
> > 
> > So high min_delta values can be calculated if, for example, a single
> > hrtimer_interrupt run takes two host time slices to execute, while some
> > other higher priority task runs for N slices in between.
> > 
> > Using the hrtimer_interrupt execution time (which can be the worse
> > case at any given time), as the min_delta is problematic.
> > 
> > So simply increase min_delta_ns by 50% once every detected failure,
> > which will eventually lead to an acceptable threshold (the algorithm
> > should scale back to down lower min_delta, to adjust back to wealthier
> > times, too).
> > 
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index 49da79a..8997978 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -1234,28 +1234,20 @@ static void __run_hrtimer(struct hrtimer *timer)
> >  
> >  #ifdef CONFIG_HIGH_RES_TIMERS
> >  
> > -static int force_clock_reprogram;
> > -
> >  /*
> >   * After 5 iteration's attempts, we consider that hrtimer_interrupt()
> >   * is hanging, which could happen with something that slows the interrupt
> > - * such as the tracing. Then we force the clock reprogramming for each future
> > - * hrtimer interrupts to avoid infinite loops and use the min_delta_ns
> > - * threshold that we will overwrite.
> > - * The next tick event will be scheduled to 3 times we currently spend on
> > - * hrtimer_interrupt(). This gives a good compromise, the cpus will spend
> > - * 1/4 of their time to process the hrtimer interrupts. This is enough to
> > - * let it running without serious starvation.
> > + * such as the tracing, so we increase min_delta_ns.
> >   */
> >  
> >  static inline void
> > -hrtimer_interrupt_hanging(struct clock_event_device *dev,
> > -			ktime_t try_time)
> > +hrtimer_interrupt_hanging(struct clock_event_device *dev)
> >  {
> > -	force_clock_reprogram = 1;
> > -	dev->min_delta_ns = (unsigned long)try_time.tv64 * 3;
> > -	printk(KERN_WARNING "hrtimer: interrupt too slow, "
> > -		"forcing clock min delta to %lu ns\n", dev->min_delta_ns);
> > +	dev->min_delta_ns += dev->min_delta_ns >> 1;
> 
> 
> I haven't thought about the guest that could be scheduled out in
> the middle of the timers servicing, making wrong this check based
> of the time spent in hrtimer_interrupt().
> 
> I guess there is no easy/generic/cheap way to rebase this check
> on the _virtual_ time spent in the timers servicing. By virtual,
> I mean the time spent in the guest only.
> 
> In a non-guest kernel, the old check forces an adaptive rate
> sharing:
> 
> - we spent n nanosecs to service the batch of timers.
> - we are hanging
> - we want at least 3/4 of time reserved for non-timer
>   servicing in the kernel, this is a minimum prerequisite
>   for the system to not starve
> - adapt the min_clock_delta against to fit the above constraint
> 
> All that does not make sense anymore in a guest. The hang detection
> and warnings, the recalibrations of the min_clock_deltas are completely
> wrong in this context.
> Not only does it spuriously warn, but the minimum timer is increasing
> slowly and the guest progressively suffers from higher and higher
> latencies.
> 
> That's really bad.
> 
> Your patch lowers the immediate impact and makes this illness evolving
> smoother by scaling down the recalibration to the min_clock_delta.
> This appeases the bug but doesn't solve it. I fear it could be even
> worse because it makes it more discreet.

True.

> May be can we instead increase the minimum threshold of loop in the
> hrtimer interrupt before considering it as a hang? Hmm, but a too high
> number could make this check useless, depending of the number of pending
> timers, which is a finite number.
> 
> Well, actually I'm not confident anymore in this check. Or actually we
> should change it. May be we can rebase it on the time spent on the hrtimer
> interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupts).
> 
> Would a mimimum threshold of 5 seconds spent in hrtimer_interrupt() be
> a reasonable check to perform?
> We should probably base our check on such kind of high boundary.
> What we want is an ultimate rescue against hard hangs anyway, not
> something that can solve the hang source itself. After the min_clock_delta
> recalibration, the system will be unstable (eg: high latencies).
> So if this must behave as a hammer, let's ensure we really need this hammer,
> even if we need to wait for few seconds before it triggers.
> 
> 
> What do you think?

What about getting rid of the retry loop, instead? So something
like:

- run hrtimer callbacks (once)
- while (tick_program_event(expires))
	expires = ktime_add_ns(expires, dev->min_delta_ns)

This way there's no static tuning involved.

Its not clear to me why the loop is there in the first place.


  reply	other threads:[~2009-10-08  1:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-29 13:12 kvm guest: hrtimer: interrupt too slow Michael Tokarev
2009-09-29 13:47 ` Avi Kivity
2009-09-29 13:58   ` Michael Tokarev
2009-10-05 10:47     ` Avi Kivity
2009-10-03 23:12 ` Marcelo Tosatti
     [not found]   ` <4AC88E7E.8050909@msgid.tls.msk.ru>
2009-10-05  0:50     ` Marcelo Tosatti
2009-10-05  9:31       ` Michael Tokarev
2009-10-06 13:30         ` Michael Tokarev
2009-10-07 23:17   ` Frederic Weisbecker
2009-10-08  0:54     ` Marcelo Tosatti [this message]
2009-10-08  7:54       ` Michael Tokarev
2009-10-08  8:06         ` Thomas Gleixner
2009-10-08  8:14           ` Michael Tokarev
2009-10-08  9:29             ` Thomas Gleixner
2009-10-08 14:06               ` Michael Tokarev
2009-10-08 15:06                 ` Thomas Gleixner
2009-10-08 19:52                 ` Marcelo Tosatti
2009-10-09 21:22                   ` Michael Tokarev
2009-10-09 22:27                     ` Frederic Weisbecker
2009-10-09 22:34                       ` Michael Tokarev
2009-10-10  9:18                         ` Michael Tokarev
2009-10-10  9:24                           ` Frederic Weisbecker
2009-10-10 17:37                         ` Marcelo Tosatti
2009-10-08  8:05       ` Thomas Gleixner
2009-10-08 19:22         ` Marcelo Tosatti
2009-10-08 20:25           ` Thomas Gleixner
2009-10-08 21:02             ` Michael Tokarev
2009-10-10 17:32             ` [PATCH] tune hrtimer_interrupt hang logic Marcelo Tosatti
2009-10-08  8:09     ` kvm guest: hrtimer: interrupt too slow Michael Tokarev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091008005456.GA10032@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mjt@tls.msk.ru \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.