* KVM host kernel hang
From: Alexander Graf @ 2009-01-07 8:35 UTC (permalink / raw)
To: kvm@vger.kernel.org; +Cc: Joerg Roedel
Hi,

while trying to run a current openSUSE in VMware ESX in KVM (using NPT),
some KVM code seems to be stuck in an endless loop. The qemu process
hangs, I can't attach gdb to it, and the kernel module seems to be
hanging in a place where I don't see any looping code. One CPU is
definitely stuck in sys at 100% though.

This is running git as of yesterday with some minor ESX modifications
that should not touch any of these parts (userspace and MSRs).

Maybe one of you guys has a clue what's going on here. You'll find a
snippet of a t-sysrq trace with all qemu-relevant parts below. The
registers (incl. IP) of these tasks don't change over time.

Alex
qemu-system-x D ffff810001025280 0 27900 9501
ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
Call Trace:
[<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
[<ffffffff8044847a>] mutex_lock+0x1e/0x22
[<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
[<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
[<ffffffff802acada>] vfs_ioctl+0x2a/0x78
[<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
[<ffffffff802acdde>] sys_ioctl+0x55/0x77
[<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
[<00007f2f3b15eb67>]
qemu-system-x R running task 0 27908 9501
0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
Call Trace:
Inexact backtrace:
[<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
[<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
[<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
[<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
[<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
[<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
[<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
[<ffffffff802acada>] vfs_ioctl+0x2a/0x78
[<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
[<ffffffff802a13a3>] fget_light+0x1/0x83
[<ffffffff802acdde>] sys_ioctl+0x55/0x77
[<ffffffff802a0b48>] sys_writev+0x60/0x94
[<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
[-- Attachment #2: dmesg.kvm.gz --]
[-- Type: application/x-gzip, Size: 22311 bytes --]
* Re: KVM host kernel hang
From: Avi Kivity @ 2009-01-07 10:15 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel
Alexander Graf wrote:
> Hi,
>
> while trying to run a current openSUSE in VMware ESX in KVM (using NPT),
> some KVM code seems to be stuck in an endless loop. The qemu process
> hangs, I can't attach gdb to it and the kernel module seems to be
> hanging in a place where I don't see any looping code. One CPU is
> definitely stuck in sys at 100% though.
>
> This is running git as of yesterday with some minor ESX modifications
> that should not touch any of these parts (userspace and MSRs).
>
> Maybe one of you guys has a clue what's going on here. You'll find a
> snippet of a t-sysrq trace with all qemu relevant parts below. The
> registers (incl. IP) of these don't change over time.
>
> Alex
>
> qemu-system-x D ffff810001025280 0 27900 9501
> ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
> ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
> ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
> Call Trace:
> [<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
> [<ffffffff8044847a>] mutex_lock+0x1e/0x22
> [<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
> [<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
> [<00007f2f3b15eb67>]
>
>
Waiting for kvm->lock, so can't kill or strace.
> qemu-system-x R running task 0 27908 9501
> 0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
> ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
> ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
> Call Trace:
> Inexact backtrace:
>
> [<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
> [<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
> [<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
> [<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
> [<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
> [<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
> [<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
> [<ffffffff802a13a3>] fget_light+0x1/0x83
> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
> [<ffffffff802a0b48>] sys_writev+0x60/0x94
> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>
But the mutex is not taken here. Looks like we lost it, maybe
CONFIG_LOCKDEP can find out where.
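
For readers unsure what a "lost" lock means here: the thread never establishes the cause, but one common way to lose a mutex is an exit path that returns without unlocking. The following is only a userspace sketch with pthreads, not KVM code; vm_lock, leaky_handler() and lock_is_held() are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for kvm->lock. */
static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical handler. When take_early_exit is non-zero it returns
 * without unlocking, so the mutex stays held forever ("lost").
 */
static int leaky_handler(int take_early_exit)
{
    pthread_mutex_lock(&vm_lock);
    if (take_early_exit)
        return -1;              /* BUG: vm_lock is still held */
    pthread_mutex_unlock(&vm_lock);
    return 0;
}

/* Returns 1 if vm_lock is currently held, 0 if it is free. */
static int lock_is_held(void)
{
    int rc = pthread_mutex_trylock(&vm_lock);

    if (rc == 0) {
        /* We got it, so it was free; give it back. */
        pthread_mutex_unlock(&vm_lock);
        return 0;
    }
    assert(rc == EBUSY);
    return 1;
}
```

After leaky_handler(1) returns, no running code path owns the mutex any more, yet every later mutex_lock()-style call on it blocks. That is the same picture as in the traces above: one task sleeping in the mutex slowpath while the running task is not inside a critical section.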
--
error compiling committee.c: too many arguments to function
* Re: KVM host kernel hang
From: Alexander Graf @ 2009-01-07 13:02 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel
On 07.01.2009, at 11:15, Avi Kivity wrote:
> Alexander Graf wrote:
>> Hi,
>>
>> while trying to run a current openSUSE in VMware ESX in KVM (using
>> NPT),
>> some KVM code seems to be stuck in an endless loop. The qemu process
>> hangs, I can't attach gdb to it and the kernel module seems to be
>> hanging in a place where I don't see any looping code. One CPU is
>> definitely stuck in sys at 100% though.
>>
>> This is running git as of yesterday with some minor ESX modifications
>> that should not touch any of these parts (userspace and MSRs).
>>
>> Maybe one of you guys has a clue what's going on here. You'll find a
>> snippet of a t-sysrq trace with all qemu relevant parts below. The
>> registers (incl. IP) of these don't change over time.
>>
>> Alex
>>
>> qemu-system-x D ffff810001025280 0 27900 9501
>> ffff8101000e5c58 0000000000000082 0000000000000000 ffff8101000e5c1c
>> ffff81011446e728 ffffffff807e6280 ffffffff807e6280 ffff8100388ca680
>> ffffffff80601890 ffff8100388ca9c0 0000000000200200 ffff8100388ca9c0
>> Call Trace:
>> [<ffffffff804485ec>] __mutex_lock_slowpath+0x72/0xa9
>> [<ffffffff8044847a>] mutex_lock+0x1e/0x22
>> [<ffffffff88d7f630>] :kvm:kvm_arch_vm_ioctl+0x30e/0x5ae
>> [<ffffffff88d7c78e>] :kvm:kvm_vm_ioctl+0x744/0x777
>> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>> [<00007f2f3b15eb67>]
>>
>>
>
> Waiting for kvm->lock, so can't kill or strace.
>
>> qemu-system-x R running task 0 27908 9501
>> 0000000000000000 ffffffff88d7d3ad 0000000000000390 ffff810100120040
>> ffff810116491000 00000000fee00390 0000000000000000 0000000000000000
>> ffff81011b361d08 ffffffff88d7f1fb 0000000000000000 0000000100000000
>> Call Trace:
>> Inexact backtrace:
>>
>> [<ffffffff88d7d3ad>] :kvm:kvm_get_cs_db_l_bits+0x27/0x3e
>> [<ffffffff88d7f1fb>] :kvm:emulate_instruction+0x199/0x266
>> [<ffffffff88d86700>] :kvm:kvm_mmu_page_fault+0x49/0x86
>> [<ffffffff88a3ebe8>] :kvm_amd:pf_interception+0xa8/0xb1
>> [<ffffffff88a3e1b4>] :kvm_amd:handle_exit+0x218/0x221
>> [<ffffffff88d810f6>] :kvm:kvm_arch_vcpu_ioctl_run+0x600/0x81a
>> [<ffffffff88d7a4f0>] :kvm:kvm_vcpu_ioctl+0xf6/0x485
>> [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
>> [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
>> [<ffffffff802a13a3>] fget_light+0x1/0x83
>> [<ffffffff802acdde>] sys_ioctl+0x55/0x77
>> [<ffffffff802a0b48>] sys_writev+0x60/0x94
>> [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
>>
>
> But the mutex is not taken here. Looks like we lost it, maybe
> CONFIG_LOCKDEP can find out where.
I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
actually locking itself up?
Btw: The issue seems to be easily reproducible :-)
Alex
* Re: KVM host kernel hang
From: Avi Kivity @ 2009-01-07 13:12 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel
Alexander Graf wrote:
>
> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
> actually locking itself up?
> Btw: The issue seems to be easily reproducible :-)
Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
indicates the arch can do it if you want, IIUC.
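
As a rough sketch, a .config fragment for this kind of lock debugging might look like the following. The exact set of options is an assumption on my part and depends on the kernel version; CONFIG_LOCKDEP itself is normally selected by CONFIG_PROVE_LOCKING rather than set by hand.

```
# Lock debugging options (sketch; check "Kernel hacking" in menuconfig
# for what your kernel version actually offers)
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_SPINLOCK=y
```

The kernel and the kvm modules both need to be rebuilt with these set for the checks to take effect.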
--
error compiling committee.c: too many arguments to function
* Re: KVM host kernel hang
From: Alexander Graf @ 2009-01-07 13:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel
Avi Kivity wrote:
> Alexander Graf wrote:
>>
>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>> actually locking itself up?
>> Btw: The issue seems to be easily reproducible :-)
>
> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
> indicates the arch can do it if you want, IIUC.
I just added some debug #define's to show me where exactly things break.

Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {

2145 mmio:
2146         /*
2147          * Is this MMIO handled locally?
2148          */
2149         mutex_lock(&vcpu->kvm->lock);
2150         mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
2151         if (mmio_dev) {
2152                 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
2153                 mutex_unlock(&vcpu->kvm->lock);
2154                 return X86EMUL_CONTINUE;
2155         }
2156         mutex_unlock(&vcpu->kvm->lock);

1901         case KVM_IRQ_LINE: {
1902                 struct kvm_irq_level irq_event;
1903
1904                 r = -EFAULT;
1905                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
1906                         goto out;
1907                 if (irqchip_in_kernel(kvm)) {
1908                         mutex_lock(&kvm->lock);
1909                         kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
1910                                     irq_event.irq, irq_event.level);
1911                         mutex_unlock(&kvm->lock);
1912                         r = 0;
1913                 }
1914                 break;
1915         }

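
The debug #defines themselves are not part of this mail. A userspace approximation of the idea, using pthreads instead of the kernel mutex API, could look roughly like this. TRACED_LOCK, TRACED_UNLOCK, lock_depth and balanced_section() are hypothetical names, and the closing "}" message is an assumption, since only the "Grabbing lock {" lines appear in the log.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Number of traced locks currently held; only touched under the lock. */
static int lock_depth;

/* Print the source line before blocking, so a hang shows who was waiting. */
#define TRACED_LOCK(m)                                          \
    do {                                                        \
        fprintf(stderr, "%d: Grabbing lock {\n", __LINE__);     \
        pthread_mutex_lock(m);                                  \
        lock_depth++;                                           \
    } while (0)

/* Print the matching close, so an unpaired "{" in the log stands out. */
#define TRACED_UNLOCK(m)                                        \
    do {                                                        \
        assert(lock_depth > 0);                                 \
        lock_depth--;                                           \
        pthread_mutex_unlock(m);                                \
        fprintf(stderr, "%d: } Released lock\n", __LINE__);     \
    } while (0)

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* A balanced critical section: the depth is back to 0 afterwards. */
static int balanced_section(void)
{
    TRACED_LOCK(&demo_lock);
    /* ... critical section ... */
    TRACED_UNLOCK(&demo_lock);
    return lock_depth;
}
```

With wrappers like these, a "Grabbing lock {" line that is never followed by its closing line points at the call site that blocked, which is how the two log lines above map to source lines 2149 and 1908.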
Any ideas?
Alex
* Re: KVM host kernel hang
From: Avi Kivity @ 2009-01-07 13:53 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm@vger.kernel.org, Joerg Roedel
Alexander Graf wrote:
> Avi Kivity wrote:
>
>> Alexander Graf wrote:
>>
>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>> actually locking itself up?
>>> Btw: The issue seems to be easily reproducible :-)
>>>
>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
>> indicates the arch can do it if you want, IIUC.
>>
>
> I just added some debug #define's to show me where exactly things break.
>
>
> Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
> Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>
> 2145 mmio:
> 2146 /*
> 2147 * Is this MMIO handled locally?
> 2148 */
> 2149 mutex_lock(&vcpu->kvm->lock);
> 2150 mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
> 2151 if (mmio_dev) {
> 2152 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
> 2153 mutex_unlock(&vcpu->kvm->lock);
> 2154 return X86EMUL_CONTINUE;
> 2155 }
> 2156 mutex_unlock(&vcpu->kvm->lock);
>
>
The lock was lost here. But how?
> 1901 case KVM_IRQ_LINE: {
> 1902 struct kvm_irq_level irq_event;
> 1903
> 1904 r = -EFAULT;
> 1905 if (copy_from_user(&irq_event, argp, sizeof irq_event))
> 1906 goto out;
> 1907 if (irqchip_in_kernel(kvm)) {
> 1908 mutex_lock(&kvm->lock);
> 1909 kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> 1910 irq_event.irq, irq_event.level);
> 1911 mutex_unlock(&kvm->lock);
> 1912 r = 0;
> 1913 }
> 1914 break;
> 1915 }
>
>
This is your hung iothread trying to inject an interrupt. It's waiting
for the lost lock.
I suggest enabling all the lock debug magic you can find in kconfig.
--
error compiling committee.c: too many arguments to function
* Re: KVM host kernel hang
From: Alexander Graf @ 2009-01-07 19:06 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm@vger.kernel.org, Joerg Roedel
On 07.01.2009, at 14:53, Avi Kivity <avi@redhat.com> wrote:
> Alexander Graf wrote:
>> Avi Kivity wrote:
>>
>>> Alexander Graf wrote:
>>>
>>>> I have CONFIG_LOCKDEP_SUPPORT=y. How do I make it detect that it's
>>>> actually locking itself up?
>>>> Btw: The issue seems to be easily reproducible :-)
>>>>
>>> Perhaps CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP. _SUPPORT just
>>> indicates the arch can do it if you want, IIUC.
>>>
>>
>> I just added some debug #define's to show me where exactly things
>> break.
>>
>>
>> Jan 7 14:34:46 linux-dp8n kernel: 2149: Grabbing lock {
>> Jan 7 14:34:46 linux-dp8n kernel: 1908: Grabbing lock {
>>
>> 2145 mmio:
>> 2146 /*
>> 2147 * Is this MMIO handled locally?
>> 2148 */
>> 2149 mutex_lock(&vcpu->kvm->lock);
>> 2150 mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
>> 2151 if (mmio_dev) {
>> 2152 kvm_iodevice_read(mmio_dev, gpa, bytes, val);
>> 2153 mutex_unlock(&vcpu->kvm->lock);
>> 2154 return X86EMUL_CONTINUE;
>> 2155 }
>> 2156 mutex_unlock(&vcpu->kvm->lock);
>>
>>
>
> The lock was lost here. But how?
>
>> 1901 case KVM_IRQ_LINE: {
>> 1902 struct kvm_irq_level irq_event;
>> 1903
>> 1904 r = -EFAULT;
>> 1905 if (copy_from_user(&irq_event, argp, sizeof irq_event))
>> 1906 goto out;
>> 1907 if (irqchip_in_kernel(kvm)) {
>> 1908 mutex_lock(&kvm->lock);
>> 1909 kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
>> 1910 irq_event.irq, irq_event.level);
>> 1911 mutex_unlock(&kvm->lock);
>> 1912 r = 0;
>> 1913 }
>> 1914 break;
>> 1915 }
>>
>>
> This is your hung iothread trying to inject an interrupt. It's
> waiting for the lost lock.
>
> I suggest enabling all the lock debug magic you can find in kconfig.
I did that and still don't get anything. I'll try digging deeper into
this tomorrow.
Alex
>
>
> --
> error compiling committee.c: too many arguments to function
>