From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: smp guest questions
Date: Thu, 18 Jun 2009 13:14:05 +0400
Message-ID: <4A3A055D.7040002@msgid.tls.msk.ru>
References: <4A38ABA3.2010401@msgid.tls.msk.ru> <4A38B43E.6010704@redhat.com> <4A38C997.5020005@msgid.tls.msk.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: KVM list, Marcelo Tosatti
To: Avi Kivity
Return-path:
Received: from isrv.corpit.ru ([81.13.33.159]:34922 "EHLO isrv.corpit.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753281AbZFRJOI (ORCPT ); Thu, 18 Jun 2009 05:14:08 -0400
In-Reply-To: <4A38C997.5020005@msgid.tls.msk.ru>
Sender: kvm-owner@vger.kernel.org
List-ID:

Replying to myself & top-posting for reference.

I can't reproduce the problem - neither of the two issues with
timers mentioned in my original email quoted below.

But there IS a race somewhere, that's for sure.  When I saw
both - "pm-timer running at 200% rate" and "hrtimer: interrupt
too slow" (and I saw them more than once on this configuration) -
it was during host system startup, when it starts all the guest
machines (several of them) and they continue their own startup
in the background, all at once.  I.e., it happened more than once
when several kvm guests got started all together.

Playing with it more I wasn't able to repeat the issue, and
can't trigger it with 4 guests on my test machine at home
either.  But it happened again "when I wasn't watching", also
during massive guest startup.

Another issue happened during startup (or, rather, AFTER such
massive startup, when one guest reported the 200% rate of
pm-timer, probably at the same time when the hrtimer message
popped up) - another guest locked up hard: the kvm process was
looping using 100% cpu time and did not answer monitor socket
requests (it was supposed to listen on a unix socket for
monitor commands).
*Probably* at the time when one guest was in a locked state,
another guest reported that hrtimer message - but I'm not 100%
sure since I can only see it by "--MARK--" messages in syslog
of the dead guest, which are at 20-minute intervals.

Maybe some "random glitch", I dunno ;)  In any case, since I can't
provide more information about all this despite all my attempts
to reproduce the situation.. I consider this issue closed, for
now anyway.  But let it be archived for future reference :)

Thanks!

/mjt

Michael Tokarev wrote:
> Avi Kivity wrote:
>> On 06/17/2009 11:38 AM, Michael Tokarev wrote:
>>> After seeing words from Avi about that smp guests
>>> are ok now, I decided to try.  And immediately
>>> got a few questions.
>>>
>>> Running on a Phenom 9750 machine (PhenomI), AMD780G
>>> chipset.  Host is 2.6.29 x86-64, qemu-kvm 0.10.5,
>>> guests are linux with kvm paravirt bits enabled, also
>>> dynticks (on both host and guest).
>>>
>>>
>>> When booting a 2-CPU guest, I see in dmesg:
>>>
>>> PM-Timer running at invalid rate: 200% of normal - aborting.
>>>
>>> and indeed, in available_clocksource there's no pmtimer.
>>> Should I be concerned?  It does not look healthy.
>>>
>>
>> It's a bug, please post guest details (kernel version, bitness).
>
> The guest kernel is also 2.6.29[.5], but this time it's x86-32
> (compiled for P4).  kvm userspace is also 32bits (historical) --
> only host kernel is 64bit for now.  I'll try to do some more
> experiments later today on a test machine (this is a production
> box) -- "hopefully" that same issue will occur on another
> machine :)
>
>> Copying Marcelo.
>>
>>>
>>> Some time later, I see stuff like:
>>>
>>> hrtimer: interrupt too slow, forcing clock min delta to 47210997 ns
>>>
>>> Which reminds me issues I had with broken hpet (time goes
>>> back-n-forth with similar messages shown in dmesg, but
>>> about hpet not hrtimer).  Also does not look healthy.
>>>
>>>
>>> I haven't seen either of the two messages above on any of
>>> single-processor guests so far, at least with recent kernels
>>> and kvm userspace, only on smp (2 cpu for now).
>>
>> Please also post host /proc/cpuinfo.
>
> HOST cpuinfo (only for 4th core, other cores are similar):
> processor : 3
> vendor_id : AuthenticAMD
> cpu family : 16
> model : 2
> model name : AMD Phenom(tm) 9750 Quad-Core Processor
> stepping : 3
> cpu MHz : 1200.000
> (yes ondemand cpufreq is in effect - nominal frequency is 2400.
> I had no issues with cpufreq on this box so far, including all
> the guests).
> cache size : 512 KB
> physical id : 0
> siblings : 4
> core id : 3
> cpu cores : 4
> apicid : 3
> initial apicid : 3
> fpu : yes
> fpu_exception : yes
> cpuid level : 5
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc pni
> monitor cx16 lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
> misalignsse 3dnowprefetch osvw ibs
> bogomips : 4812.67
> TLB size : 1024 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
>
>
>
> cpuinfo on GUEST (also for only one CPU):
>
> processor : 1
> vendor_id : AuthenticAMD
> cpu family : 6
> model : 2
> model name : QEMU Virtual CPU version 0.10.5
> stepping : 3
> cpu MHz : 2405.894
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx fxsr sse sse2 syscall lm pni hypervisor
> bogomips : 4811.78
> clflush size : 64
> power management:
>
>
> Thanks!
>
> /mjt