From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andre Przywara Subject: Re: 2.6.35-rc1 regression with pvclock and smp guests Date: Tue, 27 Jul 2010 23:51:49 +0200 Message-ID: <4C4F54F5.6080400@amd.com> References: <4C483F67.1010007@amd.com> <4C4BF96B.7010005@redhat.com> <4C4D4B8B.80006@amd.com> <4C4EAEFC.20207@redhat.com> <4C4EC7D1.6030708@amd.com> <4C4ECBC7.1070405@redhat.com> <4C4ECF2E.4070103@amd.com> <4C4ED257.40002@redhat.com> <4C4EE3B3.2090900@amd <4C4EE61F.3070908@redhat.com> <4C4EF361.4010303@amd.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: "glommer@redhat.com" , Zachary Amsden , KVM list To: Avi Kivity Return-path: Received: from va3ehsobe005.messaging.microsoft.com ([216.32.180.31]:9134 "EHLO VA3EHSOBE005.bigfish.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751228Ab0G0VtJ (ORCPT ); Tue, 27 Jul 2010 17:49:09 -0400 In-Reply-To: <4C4EF361.4010303@amd.com> Sender: kvm-owner@vger.kernel.org List-ID: Andre Przywara wrote: > Avi Kivity wrote: >> On 07/27/2010 04:48 PM, Andre Przywara wrote: >>>> Wierd. Maybe the clock goes crazy. >>>> >>>> Let's see if it jumps forward alot: >>>> >>>> } while (unlikely(last != ret)); >>>> + >>>> + { >>>> + static u64 last_report; >>>> + if (ret > last_report + 10000) { >>>> + last_report = ret; >>>> + printk("kvmclock: %llx\n", ret); >>>> + } >>>> + >>>> + } >>>> >>>> return ret; >>>> } >>>> >>>> Worth updating the 'return last' to update ret and goto the new code, >>>> so we don't miss that path. >>> Did that. There is _a lot_ of output (about 350 lines per second via >>> the 115k serial console), both with smp=1 and smp=2. >>> The majority is differing about 2,000,000 (ticks?), but a handful of >>> them are in the range of 20 million. >> nanoseconds. So 2-20ms. Consistent with 350 lines/sec. >> >>> No difference between smp=2 and smp=1. >>> I also get some "BUG: recent printk recursion!" and I don't see any >>> kernel boot progress beyond outputting the BogoMIPS value. >> Right, printk() wants the time too. >> >>> BTW: I found two message from your earlier debug statement: >>> [ 0.000000] kvm-clock: cpu 0, msr 0:1ac0401, boot clock >>> [ 0.000000] kvm-clock: cpu 0, msr 0:1e15401, primary cpu clock >> Those are from kvmclock initialization, not from the older patch. >> >> I'm completely confused, everything seems to be in order. >> >> Let's see. if you s/return last/return ret/ in the original, does this >> help things along? this makes pvclock drop the computation and should >> be exactly the same as before the patch. > Yes, this works, both smp version boot. I see a short very short break > after the line in question, but then it proceeds well. > Thanks for your help, now I got a much better insight into the issue. I > will see if I can find something more. Did some more investigations, some observations: - The cmpxchg does not seem to be a problem, I didn't see the loop iterated more than once. - Turning off printk-timestamps makes the bug go away. But I guess it is just hiding or deferring it, and it's no real workaround anyway. - I instrumented the "if (ret < last) return last;" statement, when the kernel hangs I get only printks from there, although it has hit before: ---------- [ 0.820000] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.820000] returning last instead (cnt=19001) [ 0.820000] returning last instead (cnt=20001) The last line repeats forever with the same timestamp, the counter (counting the number of "return last;") increments about 3500 times/second. I will see if I find something more... Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12