From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Ahern"
Subject: Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
Date: Fri, 23 Apr 2010 15:42:49 -0600
Message-ID: <4BD21459.9010903@cisco.com>
References: <4BD1D406.1040508@cisco.com> <201004231539.32205.iggy@theiggy.com> <4BD213AA.7070601@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Brian Jackson, kvm-devel
To: Zachary Amsden
Return-path:
Received: from sj-iport-1.cisco.com ([171.71.176.70]:61763 "EHLO sj-iport-1.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753737Ab0DWVmw (ORCPT ); Fri, 23 Apr 2010 17:42:52 -0400
In-Reply-To: <4BD213AA.7070601@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 04/23/2010 03:39 PM, Zachary Amsden wrote:
> On 04/23/2010 10:39 AM, Brian Jackson wrote:
>> On Friday 23 April 2010 12:08:22 David S. Ahern wrote:
>>
>>> After a few days of debugging I think kvmclock is the source of lockups
>>> for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
>>> locks up on another.
>>>
>>> Server 1 - VM locks up repeatedly
>>> -- DL580 G5
>>> -- 4 quad-core X7350 processors at 2.93GHz
>>> -- 48GB RAM
>>>
>>> Server 2 - VM works just fine
>>> -- DL380 G6
>>> -- 2 quad-core E5540 processors at 2.53GHz
>>> -- 24GB RAM
>>>
>>> Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
>>> kernel. I have tried various versions of qemu-kvm -- the version in
>>> FC-12 and the version for FC-12 in virt-preview. In both cases the
>>> qemu-kvm command line is identical.
>>>
>>> VM
>>> - RHEL5.5, PAE kernel (also tried standard 32-bit)
>>> - 2 vcpus
>>> - 3GB RAM
>>> - virtio network and disk
>>>
>>> When the VM locks up both vcpu threads are spinning at 100%. Changing
>>> the clocksource to jiffies appears to have addressed the problem.
>>>
>>
>> Does changing the guest to -smp 1 help?
>>
>>
>
> Based on our current understanding of the problem, it should help, but
> it may not prevent the problem entirely.
>
> There are three issues with kvmclock due to sampling:
>
> 1) smp clock alignment may be slightly off due to timing conditions
> 2) kvmclock is resampled at each switch of vcpu to another pcpu
> 3) kvmclock granularity exceeds that of kernel timespec, which means
>    sampling errors may show even on UP
>
> Recommend using a different clocksource (tsc is great if you have stable
> TSC and don't migrate across different-speed machines) until we have all
> the fixes in place.

That's my plan for now. As I recall jiffies was the default in early
RHEL5 versions. Not sure what that means hardware wise. The biggest
problem for me is that RHEL5.5 defaults to kvmclock; I'll find some
workaround for it.

David

>
> Zach
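
For reference, a minimal sketch of how a guest could check which clocksource
it is using and switch away from kvmclock at runtime. It is not taken from
this thread and rests on several assumptions: that the guest kernel exposes
the usual Linux sysfs files under
/sys/devices/system/clocksource/clocksource0, that kvmclock is listed there
under the name "kvm-clock", and that preferring tsc over jiffies is
acceptable (per the advice quoted above, tsc is only a good choice with a
stable TSC and no migration across different-speed machines). The helper
names are hypothetical.

```python
from pathlib import Path

# Usual Linux sysfs location (assumed to exist in the guest). It is passed
# as a parameter so the helpers can be tried against a scratch directory.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_CLOCKSOURCE):
    """Return (current, available) clocksources as reported under `base`."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def pick_fallback(current, available, avoid="kvm-clock",
                  preferred=("tsc", "jiffies")):
    """Hypothetical policy: keep `current` unless it equals `avoid`.

    Otherwise return the first name in `preferred` that the kernel offers,
    or None if none of them is available. The name "kvm-clock" and the
    preference order are assumptions, not something stated in the thread.
    """
    if current != avoid:
        return current
    for name in preferred:
        if name in available:
            return name
    return None


def set_clocksource(name, base=SYSFS_CLOCKSOURCE):
    """Write `name` to current_clocksource (requires root on a real guest)."""
    (Path(base) / "current_clocksource").write_text(name + "\n")
```

On a real guest this would be used roughly as: read the current and available
clocksources, call pick_fallback(), and pass the result to set_clocksource()
if it differs from the current one. A runtime change does not survive a
reboot; the clocksource= kernel boot parameter is the usual way to make the
choice persistent, though whether that is the best workaround for RHEL5.5 is
not settled in this thread.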