From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: kvm guest: hrtimer: interrupt too slow
Date: Fri, 09 Oct 2009 01:02:30 +0400
Message-ID: <4ACE5366.7020708@msgid.tls.msk.ru>
References: <4AC207B1.7020901@msgid.tls.msk.ru>
 <20091003231205.GA15015@amt.cnet>
 <20091007231733.GG5903@nowhere>
 <20091008005456.GA10032@amt.cnet>
 <20091008192223.GA8111@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, Frederic Weisbecker, kvm, Ingo Molnar
To: Thomas Gleixner
Return-path:
Received: from isrv.corpit.ru ([81.13.33.159]:36401 "EHLO isrv.corpit.ru"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1756443AbZJHVDI (ORCPT ); Thu, 8 Oct 2009 17:03:08 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

Thomas Gleixner wrote:
[]
> Also it's not clear to me why the problem does only happen with
> kvm_clock and not with acpi_pm timer emulation (according to the
> reporter) and is restricted to SMP guests.

I just reproduced it with acpi_pm.  I explained it already to Marcelo,
the problem is that the issue is difficult to trigger.  I still don't
have any pointers as to how to trigger it; all my attempts so far to
create network, disk or cpu load have failed.  So the only way is to
run the guest and wait, in the hope it'll show up.  And I restarted the
"guinea pig" guest today (which happens to be our main office
server :), and voila, after ~4 hours of uptime it said the same thing
about hrtimer.  That was lucky timing, since it may run stable for
several days...

It just happens (and I mentioned it each time) that I didn't *see*
the issue with acpi_pm.  Now I see it with acpi_pm too.

Speaking of smp -- well, that one is of the same category.  Maybe smp
just makes the issue easier to trigger but it exists with UP guests
too, maybe it's SMP-specific - I don't know.
What I know for sure is that out of 4 guests here (running on the
same host), 2 are SMP and 2 UP, loaded approximately equally
(according to the number of CPUs), and the two SMP guests show the
issue quite easily, while for the 2 UP guests I don't see anything in
the logs for the last 2 months.

The issue isn't unique to my machines; other people reported it too in
#kvm, including at least one active participant there.  For him, the
issues stopped when he switched from an SMP to a UP guest.  Yet there's
no definite knowledge of whether the issue is really SMP-specific or
not.

>>  retry:
>>  	/* 5 retries is enough to notice a hang */
>> -	if (!(++nr_retries % 5))
>> -		hrtimer_interrupt_hanging(dev, ktime_sub(ktime_get(), now));
>> +	if (!(++nr_retries % 5)) {
>> +		ktime_t try_time = ktime_sub(ktime_get(), now);
>> +
>> +		do {
>> +			for (i = 0; i < 3; i++)
>> +				expires_next = ktime_add(expires_next,try_time);
>> +		} while (tick_program_event(expires_next, 0));
>
> This needs at least a WARN_ON_ONCE() or some other way (sysfs, proc,
> ...) where we can find out how often this happens.

Definitely.  Or printk_ratelimit.  Before Marcelo came up with his
first version I was thinking about exposing that min_delta over procfs
to be able to reset it back to a reasonable value.. ;)

Thanks!

/mjt