From: Michael Tokarev
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Thu, 08 Oct 2009 12:14:32 +0400
Message-ID: <4ACD9F68.10303@msgid.tls.msk.ru>
References: <4AC207B1.7020901@msgid.tls.msk.ru> <20091003231205.GA15015@amt.cnet> <20091007231733.GG5903@nowhere> <20091008005456.GA10032@amt.cnet> <4ACD9AB9.3080803@msgid.tls.msk.ru>
To: Thomas Gleixner
Cc: Marcelo Tosatti, Frederic Weisbecker, kvm, Ingo Molnar

Thomas Gleixner wrote:
> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>> Yesterday I was "lucky" enough to actually watch what's
>> going on when the delay happens.
>>
>> I run a desktop environment on a kvm virtual machine here.
>> The server runs on a diskless terminal, and the rest, incl.
>> the window manager etc, is started from a VM.
>>
>> And yesterday, during normal system load (nothing extra,
>> and not idle either, and all the other guests were running
>> under normal load too), I had a stall of everything in this
>> X session for about 2..3, maybe 5 seconds.
>>
>> It felt like a completely stuck machine. Nothing was moving
>> on the screen, no reaction to the keyboard etc.
>>
>> And after several seconds it returned to normal, with
>> the familiar message in dmesg: increasing hrtimer etc,
>> to the next 50%. (Without Marcelo's patch, at this point
>> it would have increased min_delta to a large number.)
>>
>> To summarize: there's something, well, more interesting
>> going on here. In addition to the scheduling issues that
>> cause timers to be calculated on the "wrong" CPU etc as
>
> Care to elaborate?
Such huge delays (seconds, not milliseconds or nanoseconds): I don't understand how delays of that size can be explained by scheduling on a different CPU and the like. That's what I mean.

I know very little about this low-level stuff, so I may be completely off base, but that explanation does not look right to me, simple as that. "Scheduling mistakes" can produce errors in the range of milliseconds, but not seconds.

/mjt