From: Michael Tokarev <mjt@tls.msk.ru>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>, kvm <kvm@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Thu, 08 Oct 2009 12:09:57 +0400 [thread overview]
Message-ID: <4ACD9E55.4040604@msgid.tls.msk.ru> (raw)
In-Reply-To: <20091007231733.GG5903@nowhere>
[]
>>> hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
[]
> All that does not make sense anymore in a guest. The hang detection
> and warnings, the recalibrations of the min_clock_deltas are completely
> wrong in this context.
> Not only does it spuriously warn, but the minimum timer is increasing
> slowly and the guest progressively suffers from higher and higher
> latencies.
Well, it's not "slowly", -- that huge jump shown above is typical.
If my calculations are correct, that's about 0.5 sec min_delta.
> That's really bad.
*nod* :)
> Your patch lowers the immediate impact and makes this illness evolving
> smoother by scaling down the recalibration to the min_clock_delta.
> This appeases the bug but doesn't solve it. I fear it could be even
> worse because it makes it more discreet.
Well, long-term it's not worse still. New code has a chance to hitting
the same values for min_delta in a long run, but this chance is so small
and the time spent is so long that it can be forgotten about completely.
> May be can we instead increase the minimum threshold of loop in the
> hrtimer interrupt before considering it as a hang? Hmm, but a too high
> number could make this check useless, depending of the number of pending
> timers, which is a finite number.
>
> Well, actually I'm not confident anymore in this check. Or actually we
> should change it. May be we can rebase it on the time spent on the hrtimer
> interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupts).
>
> Would a mimimum threshold of 5 seconds spent in hrtimer_interrupt() be
> a reasonable check to perform?
> We should probably base our check on such kind of high boundary.
> What we want is an ultimate rescue against hard hangs anyway, not
> something that can solve the hang source itself. After the min_clock_delta
> recalibration, the system will be unstable (eg: high latencies).
> So if this must behave as a hammer, let's ensure we really need this hammer,
> even if we need to wait for few seconds before it triggers.
By the way, all other cases I've seen this message (hrtimer: interrupt too slow..)
triggering, the problems were elsewhere and re-calibrating timer was not a good
idea anyway, because the problem was elsewhere and changing timer didn't solve
it.
Back into the vm issue at hand. I (almost) understand what's happening in
the discussion above, but I does not see how it is possible to have such a
*huge* delays explained by scheduling on a different CPU etc. The delays
are measured in *seconds*, not nano- or micro-secs etc.
I can imagine, say, swapping on host that causes the whole guest to be
swapped out for a while during the timer interrupt handling for example.
But it is NOT what's happening here, at least not that I can see it.
Yes host had some swapping:
pswpin 17535
pswpout 41602
but it's not massive and I know when exactly it happened - when I was testing
something else. Right now free(1) reports:
total used free shared buffers cached
Mem: 8155280 8105704 49576 0 1209136 27440
-/+ buffers/cache: 6869128 1286152
Swap: 8388856 124112 8264744
(and f*ng vmstat that, again, does not show swapping activity at all)
So, I think, the problem is somewhere elsewhere.
By the way, I *think* it only happens with kvm_clock, and does not
happen with acpi_pm clocksource. Is it worth to check?
Thanks!
/mjt
prev parent reply other threads:[~2009-10-08 8:10 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-29 13:12 kvm guest: hrtimer: interrupt too slow Michael Tokarev
2009-09-29 13:47 ` Avi Kivity
2009-09-29 13:58 ` Michael Tokarev
2009-10-05 10:47 ` Avi Kivity
2009-10-03 23:12 ` Marcelo Tosatti
[not found] ` <4AC88E7E.8050909@msgid.tls.msk.ru>
2009-10-05 0:50 ` Marcelo Tosatti
2009-10-05 9:31 ` Michael Tokarev
2009-10-06 13:30 ` Michael Tokarev
2009-10-07 23:17 ` Frederic Weisbecker
2009-10-08 0:54 ` Marcelo Tosatti
2009-10-08 7:54 ` Michael Tokarev
2009-10-08 8:06 ` Thomas Gleixner
2009-10-08 8:14 ` Michael Tokarev
2009-10-08 9:29 ` Thomas Gleixner
2009-10-08 14:06 ` Michael Tokarev
2009-10-08 15:06 ` Thomas Gleixner
2009-10-08 19:52 ` Marcelo Tosatti
2009-10-09 21:22 ` Michael Tokarev
2009-10-09 22:27 ` Frederic Weisbecker
2009-10-09 22:34 ` Michael Tokarev
2009-10-10 9:18 ` Michael Tokarev
2009-10-10 9:24 ` Frederic Weisbecker
2009-10-10 17:37 ` Marcelo Tosatti
2009-10-08 8:05 ` Thomas Gleixner
2009-10-08 19:22 ` Marcelo Tosatti
2009-10-08 20:25 ` Thomas Gleixner
2009-10-08 21:02 ` Michael Tokarev
2009-10-10 17:32 ` [PATCH] tune hrtimer_interrupt hang logic Marcelo Tosatti
2009-10-08 8:09 ` Michael Tokarev [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ACD9E55.4040604@msgid.tls.msk.ru \
--to=mjt@tls.msk.ru \
--cc=fweisbec@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mtosatti@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.