public inbox for kvm@vger.kernel.org
* KVM host kernel hang
@ 2009-01-07  8:35 Alexander Graf
  2009-01-07 10:15 ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Graf @ 2009-01-07  8:35 UTC (permalink / raw)
  To: kvm@vger.kernel.org; +Cc: Joerg Roedel


Hi,

while trying to run a current openSUSE inside VMware ESX inside KVM (using NPT),
some KVM code seems to be stuck in an endless loop. The qemu process
hangs, I can't attach gdb to it, and the kernel module seems to be
hanging in a place where I don't see any looping code. One CPU is
definitely stuck at 100% sys time, though.

This is running git as of yesterday with some minor ESX modifications
that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a
snippet of a SysRq-t trace with all qemu-relevant parts below. The
registers (incl. IP) of these tasks don't change over time.

Alex

qemu-system-x D ffff810001025280     0 27900   9501
 ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
 ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
 ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
Call Trace:
 [<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
 [<ffffffff8044847a>] mutex_lock+0x1e/0x22
 [<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
 [<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
 [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
 [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
 [<ffffffff802acdde>] sys_ioctl+0x55/0x77
 [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
 [<00007f2f3b15eb67>]

qemu-system-x R  running task        0 27908   9501
 0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
 ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
 ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
Call Trace:
Inexact backtrace:

 [<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
 [<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
 [<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
 [<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
 [<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
 [<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
 [<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
 [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
 [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
 [<ffffffff802a13a3>] fget_light+0x1/0x83
 [<ffffffff802acdde>] sys_ioctl+0x55/0x77
 [<ffffffff802a0b48>] sys_writev+0x60/0x94
 [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f



[-- Attachment #2: dmesg.kvm.gz --]
[-- Type: application/x-gzip, Size: 22311 bytes --]


* Re: KVM host kernel hang
  2009-01-07  8:35 KVM host kernel hang Alexander Graf
@ 2009-01-07 10:15 ` Avi Kivity
  2009-01-07 13:02   ` Alexander Graf
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-01-07 10:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel

Alexander Graf wrote:
> Hi,
>
> while trying to run a current openSUSE in VMWare ESX in KVM (using NPT),
> some KVM code seems to be stuck in an endless loop. The qemu process
> hangs, I can't attach gdb to it and the kernel module seems to be
> hanging in a place where I don't see any looping code. One CPU is
> definitely stuck in sys at 100% though.
>
> This is running git as of yesterday with some minor ESX modifications
> that should not touch any of these parts (userspace and MSRs).
>
> Maybe one of you guys has a clue what's going on here. You'll find a
> snippet of a t-sysrq trace with all qemu relevant parts below. The
> registers (incl. IP) of these don't change over time.
>
> Alex
>
> qemu-system-x D ffff810001025280     0 27900   9501
>  ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
>  ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
>  ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
> Call Trace:
>  [<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
>  [<ffffffff8044847a>] mutex_lock+0x1e/0x22
>  [<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
>  [<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
>  [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>  [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>  [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>  [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>  [<00007f2f3b15eb67>]
>
>   

Waiting for kvm->lock, so can't kill or strace.

> qemu-system-x R  running task        0 27908   9501
>  0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
>  ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
>  ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
> Call Trace:
> Inexact backtrace:
>
>  [<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
>  [<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
>  [<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
>  [<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
>  [<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
>  [<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
>  [<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
>  [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>  [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>  [<ffffffff802a13a3>] fget_light+0x1/0x83
>  [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>  [<ffffffff802a0b48>] sys_writev+0x60/0x94
>  [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>   

But the mutex is not taken here.  Looks like we lost it, maybe 
CONFIG_LOCKDEP can find out where.
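
One generic way a mutex can go missing is an exit path that returns
without unlocking. The sketch below is a userspace illustration with
POSIX threads, not KVM code and not a diagnosis of this hang; the names
read_device, device_lock and device_lock_is_held are hypothetical. After
the faulty path has run, no task is inside the critical section, yet
pthread_mutex_trylock() still reports the lock as busy.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock standing in for a shared per-VM lock. */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical read helper with a deliberate bug: the error path
 * returns while device_lock is still held.
 */
static int read_device(int fail)
{
    pthread_mutex_lock(&device_lock);
    if (fail)
        return -1;              /* BUG: missing pthread_mutex_unlock() */
    pthread_mutex_unlock(&device_lock);
    return 0;
}

/* Returns 1 if device_lock is currently held, 0 if it is free. */
static int device_lock_is_held(void)
{
    int err = pthread_mutex_trylock(&device_lock);

    /* trylock either succeeds or reports the lock as busy. */
    assert(err == 0 || err == EBUSY);
    if (err == 0) {
        pthread_mutex_unlock(&device_lock);
        return 0;
    }
    return 1;
}
```

Calling read_device(1) and then device_lock_is_held() shows the leaked
lock. In the kernel, lockdep is meant to report this class of bug when a
task returns to userspace with a lock still held.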

-- 
error compiling committee.c: too many arguments to function



* Re: KVM host kernel hang
  2009-01-07 10:15 ` Avi Kivity
@ 2009-01-07 13:02   ` Alexander Graf
  2009-01-07 13:12     ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Graf @ 2009-01-07 13:02 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel


On 07.01.2009, at 11:15, Avi Kivity wrote:

> Alexander Graf wrote:
>> Hi,
>>
>> while trying to run a current openSUSE in VMWare ESX in KVM (using  
>> NPT),
>> some KVM code seems to be stuck in an endless loop. The qemu process
>> hangs, I can't attach gdb to it and the kernel module seems to be
>> hanging in a place where I don't see any looping code. One CPU is
>> definitely stuck in sys at 100% though.
>>
>> This is running git as of yesterday with some minor ESX modifications
>> that should not touch any of these parts (userspace and MSRs).
>>
>> Maybe one of you guys has a clue what's going on here. You'll find a
>> snippet of a t-sysrq trace with all qemu relevant parts below. The
>> registers (incl. IP) of these don't change over time.
>>
>> Alex
>>
>> qemu-system-x D ffff810001025280     0 27900   9501
>> ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
>> ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
>> ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
>> Call Trace:
>> [<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
>> [<ffffffff8044847a>] mutex_lock+0x1e/0x22
>> [<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
>> [<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
>> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>> [<00007f2f3b15eb67>]
>>
>>
>
> Waiting for kvm->lock, so can't kill or strace.
>
>> qemu-system-x R  running task        0 27908   9501
>> 0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
>> ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
>> ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
>> Call Trace:
>> Inexact backtrace:
>>
>> [<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
>> [<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
>> [<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
>> [<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
>> [<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
>> [<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
>> [<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
>> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>> [<ffffffff802a13a3>] fget_light+0x1/0x83
>> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>> [<ffffffff802a0b48>] sys_writev+0x60/0x94
>> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>>
>
> But the mutex is not taken here.  Looks like we lost it, maybe  
> CONFIG_LOCKDEP can find out where.

I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's  
actually locking itself up?
Btw: The issue seems to be easily reproducible :-)

Alex



* Re: KVM host kernel hang
  2009-01-07 13:02   ` Alexander Graf
@ 2009-01-07 13:12     ` Avi Kivity
  2009-01-07 13:41       ` Alexander Graf
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-01-07 13:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel

Alexander Graf wrote:
>
> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's 
> actually locking itself up?
> Btw: The issue seems to be easily reproducible :-)

Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just 
indicates the arch can do it if you want, IIUC.
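
As a sketch of what that could look like in a kernel .config:
CONFIG_LOCKDEP_SUPPORT is set by the architecture, while the options
below are the ones a user turns on under "Kernel hacking".
CONFIG_PROVE_LOCKING selects CONFIG_LOCKDEP itself. Which of these
options exist depends on the kernel version, so treat the list as an
assumption to check against the tree being built.

```
# Lock debugging (Kernel hacking menu)
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCK_STAT=y
```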

-- 
error compiling committee.c: too many arguments to function



* Re: KVM host kernel hang
  2009-01-07 13:12     ` Avi Kivity
@ 2009-01-07 13:41       ` Alexander Graf
  2009-01-07 13:53         ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Graf @ 2009-01-07 13:41 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel

Avi Kivity wrote:
> Alexander Graf wrote:
>>
>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>> actually locking itself up?
>> Btw: The issue seems to be easily reproducible :-)
>
> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
> indicates the arch can do it if you want, IIUC.

I just added some debug #defines to show me exactly where things break.


Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {

   2145 mmio:
   2146         /*
   2147          * Is this MMIO handled locally?
   2148          */
   2149         mutex_lock(&vcpu->kvm->lock);
   2150         mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
   2151         if (mmio_dev) {
   2152                 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
   2153                 mutex_unlock(&vcpu->kvm->lock);
   2154                 return X86EMUL_CONTINUE;
   2155         }
   2156         mutex_unlock(&vcpu->kvm->lock);

   1901         case KVM_IRQ_LINE: {
   1902                 struct kvm_irq_level irq_event;
   1903
   1904                 r = -EFAULT;
   1905                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
   1906                         goto out;
   1907                 if (irqchip_in_kernel(kvm)) {
   1908                         mutex_lock(&kvm->lock);
   1909                         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
   1910                                     irq_event.irq, irq_event.level);
   1911                         mutex_unlock(&kvm->lock);
   1912                         r = 0;
   1913                 }
   1914                 break;
   1915         }
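
The debug #defines themselves were not posted. A minimal userspace
sketch of the idea, assuming the opening brace is printed when a lock is
grabbed and a closing brace when it is released, could look like this
(TRACE_MUTEX_LOCK, TRACE_MUTEX_UNLOCK and open_braces are hypothetical
names). Under that assumption, the two log lines above would mean that
line 2149 took the lock and never reached its unlock, while line 1908 is
still waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Number of locks taken through the wrappers and not yet released. */
static int open_braces;

/* Hypothetical tracing wrappers: log the source line of each lock/unlock. */
#define TRACE_MUTEX_LOCK(m) do {                                    \
        fprintf(stderr, "%d: Grabbing lock {\n", __LINE__);         \
        pthread_mutex_lock(m);                                      \
        open_braces++;              /* updated under the lock */    \
    } while (0)

#define TRACE_MUTEX_UNLOCK(m) do {                                  \
        assert(open_braces > 0);    /* unlock without a lock? */    \
        open_braces--;              /* updated under the lock */    \
        pthread_mutex_unlock(m);                                    \
        fprintf(stderr, "%d: } released lock\n", __LINE__);         \
    } while (0)

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Takes and releases the lock; returns the count seen afterwards. */
static int balanced_section(void)
{
    TRACE_MUTEX_LOCK(&demo_lock);
    TRACE_MUTEX_UNLOCK(&demo_lock);
    return open_braces;
}

/* Returns the count observed while the lock is held. */
static int depth_while_held(void)
{
    int depth;

    TRACE_MUTEX_LOCK(&demo_lock);
    depth = open_braces;
    TRACE_MUTEX_UNLOCK(&demo_lock);
    return depth;
}
```

Because the "{" line is printed before the lock is actually acquired, a
line without a matching "}" can mean either "holding the lock" or "still
waiting for it"; only the order of the lines tells the two apart.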

Any ideas?

Alex


* Re: KVM host kernel hang
  2009-01-07 13:41       ` Alexander Graf
@ 2009-01-07 13:53         ` Avi Kivity
  2009-01-07 19:06           ` Alexander Graf
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-01-07 13:53 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel

Alexander Graf wrote:
> Avi Kivity wrote:
>   
>> Alexander Graf wrote:
>>     
>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>> actually locking itself up?
>>> Btw: The issue seems to be easily reproducible :-)
>>>       
>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
>> indicates the arch can do it if you want, IIUC.
>>     
>
> I just added some debug #define's to show me where exactly things break.
>
>
> Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
> Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>
>    2145 mmio:
>    2146         /*
>    2147          * Is this MMIO handled locally?
>    2148          */
>    2149         mutex_lock(&vcpu->kvm->lock);
>    2150         mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
>    2151         if (mmio_dev) {
>    2152                 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
>    2153                 mutex_unlock(&vcpu->kvm->lock);
>    2154                 return X86EMUL_CONTINUE;
>    2155         }
>    2156         mutex_unlock(&vcpu->kvm->lock);
>
>   

The lock was lost here.  But how?

>    1901         case KVM_IRQ_LINE: {
>    1902                 struct kvm_irq_level irq_event;
>    1903
>    1904                 r = -EFAULT;
>    1905                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
>    1906                         goto out;
>    1907                 if (irqchip_in_kernel(kvm)) {
>    1908                         mutex_lock(&kvm->lock);
>    1909                         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>    1910                                     irq_event.irq, irq_event.level);
>    1911                         mutex_unlock(&kvm->lock);
>    1912                         r = 0;
>    1913                 }
>    1914                 break;
>    1915         }
>
>   
This is your hung iothread trying to inject an interrupt.  It's waiting 
for the lost lock.
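
The situation described here, one task spinning while it holds a lock
and a second task blocked on the same lock, can be reproduced in
userspace. The following is only a sketch with POSIX threads; vm_lock,
spinning_holder, try_inject_irq and run_demo are hypothetical names, and
the timeout exists only so that the demonstration terminates (the
kernel's mutex_lock() has none, which is why the real waiter sits in D
state).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for a per-VM lock */
static atomic_int holder_ready;
static atomic_int holder_stop;

/* Plays the runnable task: takes the lock, then spins without releasing it. */
static void *spinning_holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&vm_lock);
    atomic_store(&holder_ready, 1);
    while (!atomic_load(&holder_stop))
        ;                           /* burns CPU while holding the lock */
    pthread_mutex_unlock(&vm_lock);
    return NULL;
}

/* Plays the waiter: tries for 100 ms, returns the timedlock result. */
static int try_inject_irq(void)
{
    struct timespec deadline;
    int err;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 100L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    err = pthread_mutex_timedlock(&vm_lock, &deadline);
    assert(err == 0 || err == ETIMEDOUT);
    if (err == 0)
        pthread_mutex_unlock(&vm_lock);
    return err;
}

/*
 * Returns 1 if the waiter timed out while the holder was spinning and
 * got the lock once the holder was told to stop; -1 on thread failure.
 */
static int run_demo(void)
{
    pthread_t tid;
    int blocked, freed;

    atomic_store(&holder_ready, 0);
    atomic_store(&holder_stop, 0);
    if (pthread_create(&tid, NULL, spinning_holder, NULL) != 0)
        return -1;
    while (!atomic_load(&holder_ready))
        ;                           /* wait until the holder owns the lock */
    blocked = try_inject_irq();
    atomic_store(&holder_stop, 1);
    pthread_join(tid, NULL);
    freed = try_inject_irq();
    return blocked == ETIMEDOUT && freed == 0;
}
```

While the holder spins, top would show one thread at 100% CPU and the
waiter asleep, which matches the R and D tasks in the original trace.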

I suggest enabling all the lock debug magic you can find in kconfig.

-- 
error compiling committee.c: too many arguments to function



* Re: KVM host kernel hang
  2009-01-07 13:53         ` Avi Kivity
@ 2009-01-07 19:06           ` Alexander Graf
  0 siblings, 0 replies; 7+ messages in thread
From: Alexander Graf @ 2009-01-07 19:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel





On 07.01.2009, at 14:53, Avi Kivity <avi@redhat.com> wrote:

> Alexander Graf wrote:
>> Avi Kivity wrote:
>>
>>> Alexander Graf wrote:
>>>
>>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>>> actually locking itself up?
>>>> Btw: The issue seems to be easily reproducible :-)
>>>>
>>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP.  _SUPPORT just
>>> indicates the arch can do it if you want, IIUC.
>>>
>>
>> I just added some debug #define's to show me where exactly things  
>> break.
>>
>>
>> Jan  7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
>> Jan  7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>>
>>   2145 mmio:
>>   2146         /*
>>   2147          * Is this MMIO handled locally?
>>   2148          */
>>   2149         mutex_lock(&vcpu->kvm->lock);
>>   2150         mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
>>   2151         if (mmio_dev) {
>>   2152                 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
>>   2153                 mutex_unlock(&vcpu->kvm->lock);
>>   2154                 return X86EMUL_CONTINUE;
>>   2155         }
>>   2156         mutex_unlock(&vcpu->kvm->lock);
>>
>>
>
> The lock was lost here.  But how?
>
>>   1901         case KVM_IRQ_LINE: {
>>   1902                 struct kvm_irq_level irq_event;
>>   1903
>>   1904                 r = -EFAULT;
>>   1905                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>   1906                         goto out;
>>   1907                 if (irqchip_in_kernel(kvm)) {
>>   1908                         mutex_lock(&kvm->lock);
>>   1909                         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>>   1910                                     irq_event.irq, irq_event.level);
>>   1911                         mutex_unlock(&kvm->lock);
>>   1912                         r = 0;
>>   1913                 }
>>   1914                 break;
>>   1915         }
>>
>>
> This is your hung iothread trying to inject an interrupt.  It's  
> waiting for the lost lock.
>
> I suggest enabling all the lock debug magic you can find in kconfig.

I did that and still don't get anything. I'll try digging deeper into  
this tomorrow.

Alex


