public inbox for kvm@vger.kernel.org
* RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
@ 2010-04-23 17:08 David S. Ahern
  2010-04-23 20:39 ` Brian Jackson
  0 siblings, 1 reply; 7+ messages in thread
From: David S. Ahern @ 2010-04-23 17:08 UTC (permalink / raw)
  To: kvm-devel

After a few days of debugging I think kvmclock is the source of lockups
for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
locks up on another.

Server 1 - VM locks up repeatedly
-- DL580 G5
-- 4 quad-core X7350 processors at 2.93GHz
-- 48GB RAM

Server 2 - VM works just fine
-- DL380 G6
-- 2 quad-core E5540 processors at 2.53GHz
-- 24GB RAM

Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
kernel. I have tried various versions of qemu-kvm -- the version in
FC-12 and the version for FC-12 in virt-preview. In both cases the
qemu-kvm command line is identical.

VM
- RHEL5.5, PAE kernel (also tried standard 32-bit)
- 2 vcpus
- 3GB RAM
- virtio network and disk

When the VM locks up both vcpu threads are spinning at 100%. Changing
the clocksource to jiffies appears to have addressed the problem.
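
For readers who want to check which clocksource a guest is using, here is a minimal sketch. It assumes the guest exposes the usual sysfs clocksource files and has a Python 3 interpreter; neither is stated in this report. The path and the function name `read_clocksources` are illustrative assumptions, not something taken from the original message.

```python
from pathlib import Path

# Assumed sysfs location; not every guest kernel is guaranteed to expose it.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from `base`."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        # The files are missing or unreadable on this system.
        print(f"clocksource sysfs files not readable: {exc}")
    else:
        print(f"current:   {current}")
        print(f"available: {' '.join(available)}")
```

Taking the directory as a parameter keeps the function usable against a copy of the files, for example ones collected from a guest that has since locked up.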

David


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 17:08 RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock David S. Ahern
@ 2010-04-23 20:39 ` Brian Jackson
  2010-04-23 21:39   ` Zachary Amsden
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Jackson @ 2010-04-23 20:39 UTC (permalink / raw)
  To: David S. Ahern; +Cc: kvm-devel

On Friday 23 April 2010 12:08:22 David S. Ahern wrote:
> After a few days of debugging I think kvmclock is the source of lockups
> for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
> locks up on another.
> 
> Server 1 - VM locks up repeatedly
> -- DL580 G5
> -- 4 quad-core X7350 processors at 2.93GHz
> -- 48GB RAM
> 
> Server 2 - VM works just fine
> -- DL380 G6
> -- 2 quad-core E5540 processors at 2.53GHz
> -- 24GB RAM
> 
> Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
> kernel. I have tried various versions of qemu-kvm -- the version in
> FC-12 and the version for FC-12 in virt-preview. In both cases the
> qemu-kvm command line is identical.
> 
> VM
> - RHEL5.5, PAE kernel (also tried standard 32-bit)
> - 2 vcpus
> - 3GB RAM
> - virtio network and disk
> 
> When the VM locks up both vcpu threads are spinning at 100%. Changing
> the clocksource to jiffies appears to have addressed the problem.


Does changing the guest to -smp 1 help?


> 
> David
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 20:39 ` Brian Jackson
@ 2010-04-23 21:39   ` Zachary Amsden
  2010-04-23 21:42     ` David S. Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: Zachary Amsden @ 2010-04-23 21:39 UTC (permalink / raw)
  To: Brian Jackson; +Cc: David S. Ahern, kvm-devel

On 04/23/2010 10:39 AM, Brian Jackson wrote:
> On Friday 23 April 2010 12:08:22 David S. Ahern wrote:
>    
>> After a few days of debugging I think kvmclock is the source of lockups
>> for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
>> locks up on another.
>>
>> Server 1 - VM locks up repeatedly
>> -- DL580 G5
>> -- 4 quad-core X7350 processors at 2.93GHz
>> -- 48GB RAM
>>
>> Server 2 - VM works just fine
>> -- DL380 G6
>> -- 2 quad-core E5540 processors at 2.53GHz
>> -- 24GB RAM
>>
>> Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
>> kernel. I have tried various versions of qemu-kvm -- the version in
>> FC-12 and the version for FC-12 in virt-preview. In both cases the
>> qemu-kvm command line is identical.
>>
>> VM
>> - RHEL5.5, PAE kernel (also tried standard 32-bit)
>> - 2 vcpus
>> - 3GB RAM
>> - virtio network and disk
>>
>> When the VM locks up both vcpu threads are spinning at 100%. Changing
>> the clocksource to jiffies appears to have addressed the problem.
>>      
>
> Does changing the guest to -smp 1 help?
>
>    

Based on our current understanding of the problem, it should help, but 
it may not prevent the problem entirely.

There are three issues with kvmclock due to sampling:

1) smp clock alignment may be slightly off due to timing conditions
2) kvmclock is resampled at each switch of vcpu to another pcpu
3) kvmclock granularity exceeds that of kernel timespec, which means 
sampling errors may show even on UP

I recommend using a different clocksource until we have all the fixes in
place (tsc is great if you have a stable TSC and don't migrate across
different-speed machines).
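
One rough way to look for the kind of sampling error listed above is to read the clock in a tight loop and flag any reading that is smaller than the one before it. The sketch below is only a userspace probe under stated assumptions: it needs Python 3.7 or later for `time.time_ns`, the helper names `backward_steps` and `sample_clock` are made up for illustration, and it does not reproduce the lockup itself. Note that the wall clock can also step backwards for unrelated reasons such as an NTP adjustment.

```python
import time


def backward_steps(samples):
    """Return (index, delta) for every sample smaller than its predecessor."""
    return [
        (i, cur - prev)
        for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1)
        if cur < prev
    ]


def sample_clock(count, clock=time.time_ns):
    """Take `count` back-to-back readings from `clock`."""
    return [clock() for _ in range(count)]


if __name__ == "__main__":
    steps = backward_steps(sample_clock(1_000_000))
    print(f"{len(steps)} backward steps observed")
    for index, delta in steps[:10]:
        print(f"  sample {index}: {delta} ns")
```

Running one copy per vcpu, each pinned to a different vcpu, would exercise the SMP case; a single copy only covers readings taken on whichever vcpu the process happens to run on.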

Zach


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 21:39   ` Zachary Amsden
@ 2010-04-23 21:42     ` David S. Ahern
  2010-04-23 22:21       ` BRUNO CESAR RIBAS
  0 siblings, 1 reply; 7+ messages in thread
From: David S. Ahern @ 2010-04-23 21:42 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: Brian Jackson, kvm-devel



On 04/23/2010 03:39 PM, Zachary Amsden wrote:
> On 04/23/2010 10:39 AM, Brian Jackson wrote:
>> On Friday 23 April 2010 12:08:22 David S. Ahern wrote:
>>   
>>> After a few days of debugging I think kvmclock is the source of lockups
>>> for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
>>> locks up on another.
>>>
>>> Server 1 - VM locks up repeatedly
>>> -- DL580 G5
>>> -- 4 quad-core X7350 processors at 2.93GHz
>>> -- 48GB RAM
>>>
>>> Server 2 - VM works just fine
>>> -- DL380 G6
>>> -- 2 quad-core E5540 processors at 2.53GHz
>>> -- 24GB RAM
>>>
>>> Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
>>> kernel. I have tried various versions of qemu-kvm -- the version in
>>> FC-12 and the version for FC-12 in virt-preview. In both cases the
>>> qemu-kvm command line is identical.
>>>
>>> VM
>>> - RHEL5.5, PAE kernel (also tried standard 32-bit)
>>> - 2 vcpus
>>> - 3GB RAM
>>> - virtio network and disk
>>>
>>> When the VM locks up both vcpu threads are spinning at 100%. Changing
>>> the clocksource to jiffies appears to have addressed the problem.
>>>      
>>
>> Does changing the guest to -smp 1 help?
>>
>>    
> 
> Based on our current understanding of the problem, it should help, but
> it may not prevent the problem entirely.
> 
> There are three issues with kvmclock due to sampling:
> 
> 1) smp clock alignment may be slightly off due to timing conditions
> 2) kvmclock is resampled at each switch of vcpu to another pcpu
> 3) kvmclock granularity exceeds that of kernel timespec, which means
> sampling errors may show even on UP
> 
> Recommend using a different clocksource (tsc is great if you have stable
> TSC and don't migrate across different-speed machines) until we have all
> the fixes in place.

That's my plan for now. As I recall, jiffies was the default in early
RHEL5 versions; I'm not sure what that means hardware-wise.

The biggest problem for me is that RHEL5.5 defaults to kvmclock; I'll
find some workaround for it.
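
As one possible shape for such a workaround, the sketch below switches the clocksource at runtime by writing to sysfs. This is an assumption-laden illustration, not the workaround the author ended up using: the sysfs path, the function name `set_clocksource`, and the availability of Python 3 in the guest are all assumed, and the write needs root. A runtime switch does not survive a reboot; making it persistent would need something like a `clocksource=` kernel boot parameter, assuming the guest kernel honors it.

```python
from pathlib import Path

# Assumed sysfs location; not every guest kernel is guaranteed to expose it.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def set_clocksource(name, base=CLOCKSOURCE_DIR):
    """Switch to clocksource `name` if the kernel lists it as available."""
    available = (base / "available_clocksource").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} not in available clocksources: {available}")
    # On a real system this write requires root privileges.
    (base / "current_clocksource").write_text(name)
    return name


if __name__ == "__main__":
    try:
        print("switched to", set_clocksource("tsc"))
    except (OSError, ValueError) as exc:
        print(f"could not switch clocksource: {exc}")
```

Checking the requested name against the available list first gives a readable error instead of a bare write failure when the name is misspelled or unsupported.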

David

> 
> Zach


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 21:42     ` David S. Ahern
@ 2010-04-23 22:21       ` BRUNO CESAR RIBAS
  2010-04-23 23:21         ` David S. Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: BRUNO CESAR RIBAS @ 2010-04-23 22:21 UTC (permalink / raw)
  To: David S. Ahern; +Cc: Zachary Amsden, Brian Jackson, kvm-devel

On Fri, Apr 23, 2010 at 03:42:49PM -0600, David S. Ahern wrote:
> 
> 
> On 04/23/2010 03:39 PM, Zachary Amsden wrote:
> > On 04/23/2010 10:39 AM, Brian Jackson wrote:
> >> On Friday 23 April 2010 12:08:22 David S. Ahern wrote:
> >>   
> >>> After a few days of debugging I think kvmclock is the source of lockups
> >>> for a RHEL5.5-based VM. The VM works fine on one host, but repeatedly
> >>> locks up on another.
> >>>
> >>> Server 1 - VM locks up repeatedly
> >>> -- DL580 G5
> >>> -- 4 quad-core X7350 processors at 2.93GHz
> >>> -- 48GB RAM
> >>>
> >>> Server 2 - VM works just fine
> >>> -- DL380 G6
> >>> -- 2 quad-core E5540 processors at 2.53GHz
> >>> -- 24GB RAM
> >>>
> >>> Both host servers are running Fedora Core 12, 2.6.32.11-99.fc12.x86_64
> >>> kernel. I have tried various versions of qemu-kvm -- the version in
> >>> FC-12 and the version for FC-12 in virt-preview. In both cases the
> >>> qemu-kvm command line is identical.
> >>>
> >>> VM
> >>> - RHEL5.5, PAE kernel (also tried standard 32-bit)
> >>> - 2 vcpus
> >>> - 3GB RAM
> >>> - virtio network and disk
> >>>
> >>> When the VM locks up both vcpu threads are spinning at 100%. Changing
> >>> the clocksource to jiffies appears to have addressed the problem.
> >>>      
> >>
> >> Does changing the guest to -smp 1 help?
> >>
> >>    
> > 
> > Based on our current understanding of the problem, it should help, but
> > it may not prevent the problem entirely.
> > 
> > There are three issues with kvmclock due to sampling:
> > 
> > 1) smp clock alignment may be slightly off due to timing conditions
> > 2) kvmclock is resampled at each switch of vcpu to another pcpu
> > 3) kvmclock granularity exceeds that of kernel timespec, which means
> > sampling errors may show even on UP
> > 
> > Recommend using a different clocksource (tsc is great if you have stable
> > TSC and don't migrate across different-speed machines) until we have all
> > the fixes in place.
> 
> That's my plan for now. As I recall jiffies was the default in early
> RHEL5 versions. Not sure what that means hardware wise.
> 
> The biggest problem for me is that RHEL5.5 defaults to kvmclock; I'll
> find some workaround for it.

Could you try hpet? I had a similar problem on multicore, multi-CPU (per
motherboard) hosts, even with constant_tsc.

Since I changed the guest to hpet I have had no more problems.

> 
> David
> 
> > 
> > Zach

-- 
Bruno Ribas - ribas@c3sl.ufpr.br
http://www.inf.ufpr.br/ribas
C3SL: http://www.c3sl.ufpr.br


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 22:21       ` BRUNO CESAR RIBAS
@ 2010-04-23 23:21         ` David S. Ahern
  2010-04-24 18:40           ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: David S. Ahern @ 2010-04-23 23:21 UTC (permalink / raw)
  To: BRUNO CESAR RIBAS; +Cc: Zachary Amsden, Brian Jackson, kvm-devel



On 04/23/2010 04:21 PM, BRUNO CESAR RIBAS wrote:
> 
> Could you try hpet? I had a similar problem on multicore, multi-CPU (per
> motherboard) hosts, even with constant_tsc.
> 
> Since I changed the guest to hpet I have had no more problems.

It's stable in the sense of no lockups yet, but it is a much slower time
source from a gettimeofday perspective than tsc and jiffies (judging by
its speed, jiffies appears to be tsc-based).
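
A comparison like this can be approximated by timing a large number of clock reads under each clocksource. The sketch below is a rough userspace micro-benchmark, not the measurement used in this message: the function name `ns_per_call` and the iteration count are assumptions, and interpreter overhead is included in every figure, so only the relative difference between clocksources is meaningful.

```python
import time


def ns_per_call(clock=time.time, iterations=1_000_000):
    """Average elapsed nanoseconds for one call to `clock`."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        clock()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Run once per clocksource (switching in between) and compare the numbers.
    print(f"{ns_per_call():.1f} ns per time.time() call")
```

The timer used for the measurement is itself backed by the active clocksource, which is another reason to treat the output as a relative indicator rather than an absolute cost.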

David

> 
>>
>> David
>>
>>>
>>> Zach
> 


* Re: RHEL5.5, 32-bit VM repeatedly locks up due to kvmclock
  2010-04-23 23:21         ` David S. Ahern
@ 2010-04-24 18:40           ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2010-04-24 18:40 UTC (permalink / raw)
  To: David S. Ahern
  Cc: BRUNO CESAR RIBAS, Zachary Amsden, Brian Jackson, kvm-devel

On 04/24/2010 02:21 AM, David S. Ahern wrote:
>
> On 04/23/2010 04:21 PM, BRUNO CESAR RIBAS wrote:
>    
>> Could you try hpet? I had a similar problem on multicore, multi-CPU (per
>> motherboard) hosts, even with constant_tsc.
>>
>> Since I changed the guest to hpet I have had no more problems.
>>      
> It's stable in the sense of no lockups yet, but is a much slower time
> source from a gettimeofday perspective compared to tsc and jiffies
> (based on speed jiffies appears to be tsc-based).
>    

Jiffies doesn't sample any hardware; instead, a timer interrupt causes a
counter to be incremented, and that counter is sampled.  The downside is
that clock resolution is very coarse (one timer tick), so you can't use it
for accurate timing.
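
The effect of a tick counter on resolution can be illustrated with a small model. This is only a sketch of the idea described above, not kernel code: the function name `jiffies_time_ns` is hypothetical, and the tick rate `hz=1000` is an assumed example value rather than a figure from this thread.

```python
def jiffies_time_ns(true_ns, hz=1000):
    """Time a tick counter would report: `true_ns` rounded down to a whole tick.

    `hz` is the assumed number of timer interrupts per second.
    """
    tick_ns = 1_000_000_000 // hz
    return (true_ns // tick_ns) * tick_ns


if __name__ == "__main__":
    # Two events 300 microseconds apart fall inside the same 1 ms tick,
    # so a jiffies-style clock reports no elapsed time between them.
    first = jiffies_time_ns(5_200_000)
    second = jiffies_time_ns(5_500_000)
    print(second - first)  # prints 0
```

Any interval shorter than one tick can read as zero, which is why a clock of this kind is unsuitable for fine-grained timing.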

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

