From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tokarev <mjt@tls.msk.ru>
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Thu, 08 Oct 2009 18:06:39 +0400
Message-ID: <4ACDF1EF.3070005@msgid.tls.msk.ru>
References: <4AC207B1.7020901@msgid.tls.msk.ru> <20091003231205.GA15015@amt.cnet> <20091007231733.GG5903@nowhere> <20091008005456.GA10032@amt.cnet> <4ACD9AB9.3080803@msgid.tls.msk.ru> <alpine.LFD.2.00.0910081005140.9428@localhost.localdomain> <4ACD9F68.10303@msgid.tls.msk.ru> <alpine.LFD.2.00.0910081127530.9428@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	kvm <kvm@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>
To: Thomas Gleixner <tglx@linutronix.de>
Return-path: <kvm-owner@vger.kernel.org>
Received: from isrv.corpit.ru ([81.13.33.159]:32929 "EHLO isrv.corpit.ru"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757690AbZJHOHS (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 8 Oct 2009 10:07:18 -0400
In-Reply-To: <alpine.LFD.2.00.0910081127530.9428@localhost.localdomain>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Thomas Gleixner wrote:
> On Thu, 8 Oct 2009, Michael Tokarev wrote:
> 
>> Thomas Gleixner wrote:
>>> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>>>> Yesterday I was "lucky" enough to actually watch what's
>>>> going on when the delay actually happens.
>>>>
>>>> I run desktop environment on a kvm virtual machine here.
>>>> The server is on diskless terminal, and the rest, incl.
>>>> the window manager etc, is started from a VM.
>>>>
>>>> And yesterday, during normal system load (nothing extra,
>>>> and not idle either, and all the other guests were running
>>>> under normal load too), I had a stall of everything on this
>>>> X session for about 2..3, maybe 5 seconds.
>>>>
>>>> It felt like a completely stuck machine. Nothing was moving
>>>> on the screen, no reaction to the keyboard etc.
>>>>
>>>> And after several seconds it returned to normal.  With
>>>> the familiar message in dmesg -- increasing hrtimer etc,
>>>> to the next 50%.  (Without a patch from Marcelo at this
>>>> time it should increase min_delta to a large number).
>>>>
>>>> To summarize: there's something, well, more interesting
>>>> going on here.  In addition to the scheduling issues that
>>>> causes timers to be calculated on the "wrong" CPU etc as
>>> Care to elaborate ?
>> Such huge delays (in terms of seconds, not ms or ns) - I don't
>> understand how such delays can be explained by scheduling to the
>> different cpu etc.  That's what I mean.  I know very little about
>> all this low-level stuff so I may be completely out of context,
>> but such explanation does not look right to me, simple as that.
>> By "scheduling mistakes" we can get mistakes in range of millisecs,
>> but not secs.
> 
> I'm really missing the big picture here. 
> 
> What means "causes timers to be calculated on the "wrong" CPU etc" ?
> And what do you consider a "scheduling mistake" ?

Quoting Marcelo's initial diagnosis:

> It seems the way hrtimer_interrupt_hanging calculates min_delta is
> wrong (especially to virtual machines). The guest vcpu can be scheduled
> out during the execution of the hrtimer callbacks (and the callbacks
> themselves can do operations that translate to blocking operations in
> the hypervisor).
>
> So high min_delta values can be calculated if, for example, a single
> hrtimer_interrupt run takes two host time slices to execute, while some
> other higher priority task runs for N slices in between.

I conclude from this that the huge min_delta comes from some other task(s)
running on the host while this guest is in an hrtimer callback.  But I
fail to see why that host process takes SO MUCH time that it would push
min_delta to 0.5s, or cause delays of 3..5 seconds in the guest.
Delays of a few extra milliseconds are fine, but *seconds* is too
much.
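
To put rough numbers on that objection, here is a toy model in Python (not
kernel code).  It assumes the guest measures a timer-interrupt pass by wall
clock, so time during which the host ran something else is counted as if
the guest had spent it.  The function names, the 10 ms host time slice, the
20 us of guest work and the scaling factor of 3 are all illustrative
assumptions, not values taken from the kernel or from the affected machine.

```python
# Toy model only: every name and number here is a hypothetical illustration.

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def measured_pass_ns(guest_work_ns, stolen_slices, host_slice_ns):
    """Wall-clock length of one timer-interrupt pass as seen by the guest.

    The guest cannot separate its own work from time the host spent
    running other tasks, so both are simply added together.
    """
    return guest_work_ns + stolen_slices * host_slice_ns


def derived_min_delta_ns(pass_ns, factor=3):
    """Assumed calibration rule: min_delta scales with the measured pass."""
    return pass_ns * factor


def slices_needed(stall_ns, host_slice_ns):
    """Number of whole host slices required to account for a stall."""
    return -(-stall_ns // host_slice_ns)  # ceiling division


HOST_SLICE_NS = 10 * NS_PER_MS  # assumed host time slice
GUEST_WORK_NS = 20_000          # assumed real callback work per pass

undisturbed = derived_min_delta_ns(
    measured_pass_ns(GUEST_WORK_NS, 0, HOST_SLICE_NS))
preempted = derived_min_delta_ns(
    measured_pass_ns(GUEST_WORK_NS, 5, HOST_SLICE_NS))
stall_slices = slices_needed(3 * NS_PER_S, HOST_SLICE_NS)

print("min_delta, vcpu never preempted:", undisturbed, "ns")
print("min_delta, 5 slices stolen:     ", preempted, "ns")
print("slices needed for a 3 s stall:  ", stall_slices)
```

With these made-up numbers, five stolen slices already inflate the derived
value from 60 us to about 150 ms, while a 3 s stall corresponds to 300
back-to-back slices.  That is hard to reconcile with a host that is not
heavily loaded.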

Note again that neither host nor guest is under high load when
this jump happens.  Also note that there are no high-priority processes
running on the host; all are at the same priority level, including
all the guests.

Note also that so far I only see it on SMP guests, never on UP
guests.  And only on guests with kvm_clock, not with acpi_pm
clocksource.
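
For anyone comparing setups, the clocksource a guest is actually using can
be read from sysfs.  A small Python sketch follows; the helper names are
made up for illustration, and note that the sysfs name of the KVM
paravirtual clock is usually spelled kvm-clock rather than kvm_clock.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_sysfs_value(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def parse_available(text):
    """Split the space-separated available_clocksource line into names."""
    return text.split()


if __name__ == "__main__":
    current = read_sysfs_value(SYSFS_DIR / "current_clocksource")
    available = read_sysfs_value(SYSFS_DIR / "available_clocksource")
    print("current:  ", current)
    print("available:", parse_available(available) if available else [])
```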

What I'm trying to say is that it looks like there's something
else wrong here in the guest code.  Huge stalls, huge delays
while in hrtimer callback (I think it happens whenever such a
delay occurs, it's just the hrtimer code that notices it) -- that's
the root cause of all this; the (probably) wrong logic in hrtimer
calibration just shows the results of something that's wrong
elsewhere.

/mjt