From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tokarev Subject: Re: kvm guest: hrtimer: interrupt too slow Date: Sat, 10 Oct 2009 02:34:22 +0400 Message-ID: <4ACFBA6E.7030901@msgid.tls.msk.ru> References: <20091003231205.GA15015@amt.cnet> <20091007231733.GG5903@nowhere> <20091008005456.GA10032@amt.cnet> <4ACD9AB9.3080803@msgid.tls.msk.ru> <4ACD9F68.10303@msgid.tls.msk.ru> <4ACDF1EF.3070005@msgid.tls.msk.ru> <20091008195232.GA8350@amt.cnet> <4ACFA988.6070500@msgid.tls.msk.ru> <20091009222733.GC5859@nowhere> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , Thomas Gleixner , kvm To: Frederic Weisbecker Return-path: Received: from isrv.corpit.ru ([81.13.33.159]:54785 "EHLO isrv.corpit.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751449AbZJIWfC (ORCPT ); Fri, 9 Oct 2009 18:35:02 -0400 In-Reply-To: <20091009222733.GC5859@nowhere> Sender: kvm-owner@vger.kernel.org List-ID: Frederic Weisbecker wrote: > On Sat, Oct 10, 2009 at 01:22:16AM +0400, Michael Tokarev wrote: >> Marcelo Tosatti wrote: >> [snip] >> >>> Would be useful to collect sar (sar -B -b -u) output every one second >>> in both host/guest. You already mentioned load was low, but this should >>> give more details. >> Here we go: http://www.corpit.ru/mjt/hrtimer-interrupt-too-slow/ >> >> Two incindents - cases when hrtimer: interrupt is too slow were >> reported in the guest (with Marcelo's patch so that min_delta is >> increased to 50% only), happened at 11:44:48 and 21:46:56 (as >> shown in guest-dmesg file). For both, there's `sar -BWbd' output >> for a 2-minute interval (starting one minute before the delay >> and ending one minute after) from both host and guest. >> >>> Was there swapping going on? >> Not as far as I can see, and sar output agrees. > > But I can read this from you guest traces: > > pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff > > 11:44:45 0.00 32.32 174.75 0.00 176.77 0.00 0.00 0.00 0.00 > 11:44:46 0.00 16.00 789.00 0.00 323.00 0.00 0.00 0.00 0.00 > 11:44:47 0.00 32.32 907.07 0.00 277.78 0.00 0.00 0.00 0.00 > 11:44:48 27.59 22.99 44.83 0.00 150.57 0.00 0.00 0.00 0.00 > 11:44:49 0.00 33.68 22.11 0.00 218.95 0.00 0.00 0.00 0.00 > 11:44:50 0.00 101.01 17.17 0.00 151.52 0.00 0.00 0.00 0.00 > 11:44:51 0.00 15.69 16.67 0.00 126.47 0.00 0.00 0.00 0.00 > [...] > 21:46:52 0.00 40.00 17.00 0.00 82.00 0.00 0.00 0.00 0.00 > 21:46:53 0.00 31.68 18.81 0.00 94.06 0.00 0.00 0.00 0.00 > 21:46:54 0.00 31.68 16.83 0.00 90.10 0.00 0.00 0.00 0.00 > 21:46:55 0.00 108.00 17.00 0.00 89.00 0.00 0.00 0.00 0.00 > 21:46:56 9.76 482.93 3890.24 0.00 439.02 0.00 0.00 0.00 0.00 > 21:46:57 0.00 760.00 8627.00 0.00 1133.00 0.00 0.00 0.00 0.00 > 21:46:58 0.00 84.85 2612.12 0.00 138.38 0.00 0.00 0.00 0.00 > 21:46:59 0.00 16.00 17.00 0.00 90.00 0.00 0.00 0.00 0.00 > 21:47:00 0.00 36.36 17.17 0.00 90.91 0.00 0.00 0.00 0.00 > > > So it looks like there was some swapping in when the hrtimer (spuriously) > hanged. It's not swapping. Swapping is in a separate table, with columns titled pswpin/s and pswpout/s -- first table. On my home machine with no swap at all, 4gig memory and 2gig free, pgpgin and pgpgout are increasing too. Also, while in the second case (21:46:56) there's actually some noticeable activity (page faults at least), in first case that activity is modest. Note there's no documentation for /proc/vmstat file :) /mjt