From: Michael Tokarev <mjt@tls.msk.ru>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>, kvm <kvm@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Thu, 08 Oct 2009 12:09:57 +0400 [thread overview]
Message-ID: <4ACD9E55.4040604@msgid.tls.msk.ru> (raw)
In-Reply-To: <20091007231733.GG5903@nowhere>
[]
>>> hrtimer: interrupt too slow, forcing clock min delta to 461487495 ns
[]
> All that does not make sense anymore in a guest. The hang detection
> and warnings, the recalibrations of the min_clock_deltas are completely
> wrong in this context.
> Not only does it spuriously warn, but the minimum timer is increasing
> slowly and the guest progressively suffers from higher and higher
> latencies.
Well, it's not "slowly", -- that huge jump shown above is typical.
If my calculations are correct, that's about 0.5 sec min_delta.
> That's really bad.
*nod* :)
> Your patch lowers the immediate impact and makes this illness evolving
> smoother by scaling down the recalibration to the min_clock_delta.
> This appeases the bug but doesn't solve it. I fear it could be even
> worse because it makes it more discreet.
Well, long-term it's not worse still. New code has a chance to hitting
the same values for min_delta in a long run, but this chance is so small
and the time spent is so long that it can be forgotten about completely.
> May be can we instead increase the minimum threshold of loop in the
> hrtimer interrupt before considering it as a hang? Hmm, but a too high
> number could make this check useless, depending of the number of pending
> timers, which is a finite number.
>
> Well, actually I'm not confident anymore in this check. Or actually we
> should change it. May be we can rebase it on the time spent on the hrtimer
> interrupt (and check it every 10 loops of reprocessing in hrtimer_interrupts).
>
> Would a mimimum threshold of 5 seconds spent in hrtimer_interrupt() be
> a reasonable check to perform?
> We should probably base our check on such kind of high boundary.
> What we want is an ultimate rescue against hard hangs anyway, not
> something that can solve the hang source itself. After the min_clock_delta
> recalibration, the system will be unstable (eg: high latencies).
> So if this must behave as a hammer, let's ensure we really need this hammer,
> even if we need to wait for few seconds before it triggers.
By the way, all other cases I've seen this message (hrtimer: interrupt too slow..)
triggering, the problems were elsewhere and re-calibrating timer was not a good
idea anyway, because the problem was elsewhere and changing timer didn't solve
it.
Back into the vm issue at hand. I (almost) understand what's happening in
the discussion above, but I does not see how it is possible to have such a
*huge* delays explained by scheduling on a different CPU etc. The delays
are measured in *seconds*, not nano- or micro-secs etc.
I can imagine, say, swapping on host that causes the whole guest to be
swapped out for a while during the timer interrupt handling for example.
But it is NOT what's happening here, at least not that I can see it.
Yes host had some swapping:
pswpin 17535
pswpout 41602
but it's not massive and I know when exactly it happened - when I was testing
something else. Right now free(1) reports:
total used free shared buffers cached
Mem: 8155280 8105704 49576 0 1209136 27440
-/+ buffers/cache: 6869128 1286152
Swap: 8388856 124112 8264744
(and f*ng vmstat that, again, does not show swapping activity at all)
So, I think, the problem is somewhere elsewhere.
By the way, I *think* it only happens with kvm_clock, and does not
happen with acpi_pm clocksource. Is it worth to check?
Thanks!
/mjt
prev parent reply other threads:[~2009-10-08 8:10 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-29 13:12 kvm guest: hrtimer: interrupt too slow Michael Tokarev
2009-09-29 13:47 ` Avi Kivity
2009-09-29 13:58 ` Michael Tokarev
2009-10-05 10:47 ` Avi Kivity
2009-10-03 23:12 ` Marcelo Tosatti
[not found] ` <4AC88E7E.8050909@msgid.tls.msk.ru>
2009-10-05 0:50 ` Marcelo Tosatti
2009-10-05 9:31 ` Michael Tokarev
2009-10-06 13:30 ` Michael Tokarev
2009-10-07 23:17 ` Frederic Weisbecker
2009-10-08 0:54 ` Marcelo Tosatti
2009-10-08 7:54 ` Michael Tokarev
2009-10-08 8:06 ` Thomas Gleixner
2009-10-08 8:14 ` Michael Tokarev
2009-10-08 9:29 ` Thomas Gleixner
2009-10-08 14:06 ` Michael Tokarev
2009-10-08 15:06 ` Thomas Gleixner
2009-10-08 19:52 ` Marcelo Tosatti
2009-10-09 21:22 ` Michael Tokarev
2009-10-09 22:27 ` Frederic Weisbecker
2009-10-09 22:34 ` Michael Tokarev
2009-10-10 9:18 ` Michael Tokarev
2009-10-10 9:24 ` Frederic Weisbecker
2009-10-10 17:37 ` Marcelo Tosatti
2009-10-08 8:05 ` Thomas Gleixner
2009-10-08 19:22 ` Marcelo Tosatti
2009-10-08 20:25 ` Thomas Gleixner
2009-10-08 21:02 ` Michael Tokarev
2009-10-10 17:32 ` [PATCH] tune hrtimer_interrupt hang logic Marcelo Tosatti
2009-10-08 8:09 ` Michael Tokarev [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ACD9E55.4040604@msgid.tls.msk.ru \
--to=mjt@tls.msk.ru \
--cc=fweisbec@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mtosatti@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).