Re: kernel bug in kvm

public inbox for linux-kernel@vger.kernel.org

* Re: kernel bug in kvm_intel
       [not found]                       ` <4AED5C3F.9050506@kernel.org>
@ 2009-11-01 10:20                         ` Avi Kivity
  2009-11-01 10:45                           ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-11-01 10:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Theurer, kvm, Linux-kernel

On 11/01/2009 12:00 PM, Tejun Heo wrote:
> Hello,
>
> Avi Kivity wrote:
>    
>> Only, that merge doesn't change virt/kvm or arch/x86/kvm.
>>
>> Tejun, anything known bad about that merge?  ada3fa15 kills kvm.
>>      
> Nothing rings a bell at the moment.  How does it kill kvm?  One big
> difference caused by that merge is use of sparse areas near the top of
> vmalloc area.  This caused vmalloc area shortage on sparc64 and
> exposed paging code bug on ppc64 which caused the cpu to fault
> repeatedly on the same address.  Maybe something similar is happening
> with kvm?
>
>    

We get a page fault immediately (next instruction) after returning from 
the guest when running with oprofile.  The page fault address does not 
match anything the instruction does, so presumably it is one of the 
accesses the processor performs in order to service an NMI (ordinary 
interrupts are masked; and the fact that it happens with oprofile 
strengthens this assumption).

If this is correct, the fault is not in the NMI handler itself, but in 
one of the memory areas the cpu looks in to vector the NMI, which can be:

- the IDT
- the GDT
- the TSS
- the NMI stack

Except for the IDT, these are per-cpu structures, though I don't know
whether they are allocated with the percpu infrastructure.

Here is the code in question:

>       3ae7:       75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>       3ae9:       0f 01 c2                vmlaunch
>       3aec:       eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>       3aee:       0f 01 c3                vmresume
>       3af1:       48 87 0c 24             xchg   %rcx,(%rsp)

^^^ fault, but not at (%rsp)

>       3af5:       48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
>       3afc:       48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
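
A note on reading the listing above: both branches are short (rel8) jumps, so each target is the address of the next instruction plus the signed 8-bit displacement. A minimal sketch of that arithmetic, using only the bytes shown in the disassembly (the helper name `short_jump_target` is made up for illustration):

```python
def short_jump_target(addr, insn_bytes):
    """Return the target of a 2-byte short jump (opcode + rel8) located at addr."""
    if len(insn_bytes) != 2:
        raise ValueError("expected a 2-byte short jump (opcode, rel8)")
    rel8 = insn_bytes[1]
    # rel8 is a signed 8-bit displacement.
    if rel8 >= 0x80:
        rel8 -= 0x100
    # The displacement is relative to the address of the next instruction.
    return addr + len(insn_bytes) + rel8

# 3ae7: 75 05  jne
print(hex(short_jump_target(0x3ae7, bytes([0x75, 0x05]))))
# 3aec: eb 03  jmp
print(hex(short_jump_target(0x3aec, bytes([0xeb, 0x03]))))
```

Both results agree with the listing: the jne lands on the vmresume at 3aee and the jmp lands on the xchg at 3af1.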




--

error compiling committee.c: too many arguments to function



* Re: kernel bug in kvm_intel
  2009-11-01 10:20                         ` kernel bug in kvm_intel Avi Kivity
@ 2009-11-01 10:45                           ` Tejun Heo
  2009-11-01 11:31                             ` Avi Kivity
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2009-11-01 10:45 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andrew Theurer, kvm, Linux-kernel

Hello,

Avi Kivity wrote:
> We get a page fault immediately (next instruction) after returning from
> the guest when running with oprofile.  The page fault address does not
> match anything the instruction does, so presumably it is one of the
> accesses the processor performs in order to service an NMI (ordinary
> interrupts are masked; and the fact that it happens with oprofile
> strengthens this assumption).

Ah... okay, that's tricky but IIRC faults like that can be
distinguished from regular ones via processor state, right?

> If this is correct, the fault is not in the NMI handler itself, but in
> one of the memory areas the cpu looks in to vector the NMI, which can be:
> 
> - the IDT
> - the GDT
> - the TSS
> - the NMI stack
> 
> Except for the IDT, these are per-cpu structures, though I don't know
> whether they are allocated with the percpu infrastructure.

Don't know where NMI stack is but all else are percpu.

> Here is the code in question:
> 
>>       3ae7:       75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>       3ae9:       0f 01 c2                vmlaunch
>>       3aec:       eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>       3aee:       0f 01 c3                vmresume
>>       3af1:       48 87 0c 24             xchg   %rcx,(%rsp)
> 
> ^^^ fault, but not at (%rsp)

Can you please post the full oops (including kernel debug messages
during boot) or give me a pointer to the original message?  Also, does
the faulting address coincide with any symbol?

Thanks.

-- 
tejun


* Re: kernel bug in kvm_intel
  2009-11-01 10:45                           ` Tejun Heo
@ 2009-11-01 11:31                             ` Avi Kivity
  2009-11-18  9:26                               ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-11-01 11:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Andrew Theurer, kvm, Linux-kernel

On 11/01/2009 12:45 PM, Tejun Heo wrote:
> Hello,
>
> Avi Kivity wrote:
>    
>> We get a page fault immediately (next instruction) after returning from
>> the guest when running with oprofile.  The page fault address does not
>> match anything the instruction does, so presumably it is one of the
>> accesses the processor performs in order to service an NMI (ordinary
>> interrupts are masked; and the fact that it happens with oprofile
>> strengthens this assumption).
>>      
> Ah... okay, that's tricky but IIRC faults like that can be
> distinguished from regular ones via processor state, right?
>    

Not on x86.  But given that the fault address is different from %rsp 
(which is what the instruction accesses) and %rip, there aren't many 
alternatives.
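
The elimination argument can be sketched as follows. This is illustrative only: the function name and the sample addresses are hypothetical (they are not taken from the oops), and the sizes assume the 4-byte xchg encoding shown in the listing with an 8-byte stack operand.

```python
def explained_by_instruction(fault_addr, rip, insn_len, operands):
    """True if fault_addr lies in the instruction bytes or in an explicit memory operand.

    operands is a list of (address, size) pairs for the explicit memory accesses.
    """
    ranges = [(rip, insn_len)] + list(operands)
    return any(start <= fault_addr < start + size for start, size in ranges)

# Hypothetical values, for illustration only.
rip = 0xffffffffa0003af1
rsp = 0xffff880012345f00

# A fault on the stack operand would be explained by the xchg itself.
print(explained_by_instruction(rsp, rip, 4, [(rsp, 8)]))
# A fault elsewhere is explained neither by the fetch nor by the operand.
print(explained_by_instruction(0xfffff98000000000, rip, 4, [(rsp, 8)]))
```

When the check fails, what remains are accesses the processor makes implicitly while delivering an event, such as the IDT, GDT, TSS, or NMI stack.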

>> Here is the code in question:
>>
>>      
>>>        3ae7:       75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>>        3ae9:       0f 01 c2                vmlaunch
>>>        3aec:       eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>>        3aee:       0f 01 c3                vmresume
>>>        3af1:       48 87 0c 24             xchg   %rcx,(%rsp)
>>>        
>> ^^^ fault, but not at (%rsp)
>>      
> Can you please post the full oops (including kernel debug messages
> during boot) or give me a pointer to the original message?

http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html

> Also, does
> the faulting address coincide with any symbol?
>    

No (at least, not in System.map).
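
For reference, a check like this can be sketched as a nearest-preceding-symbol lookup. The sketch assumes the usual three-column `address type name` System.map layout; the function names and the sample entries are hypothetical. System.map does not record symbol sizes, so a hit only means "closest symbol at or below the address", and a very large offset should be read as no match.

```python
import bisect

def parse_system_map(lines):
    """Parse 'address type name' lines into a list of (address, name) sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def nearest_symbol(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

# Made-up sample entries, for illustration only.
sample = [
    "ffffffff81000000 T _text",
    "ffffffff81001000 T foo_init",
    "ffffffff81002000 t bar_helper",
]
syms = parse_system_map(sample)
print(nearest_symbol(syms, 0xffffffff81001010))  # inside foo_init
print(nearest_symbol(syms, 0x1000))              # below every symbol
```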

-- 
error compiling committee.c: too many arguments to function



* Re: kernel bug in kvm_intel
  2009-11-01 11:31                             ` Avi Kivity
@ 2009-11-18  9:26                               ` Tejun Heo
  2009-11-26  1:35                                 ` Andrew Theurer
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2009-11-18  9:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andrew Theurer, kvm, Linux-kernel

Hello,

11/01/2009 08:31 PM, Avi Kivity wrote:
>>> Here is the code in question:
>>>
>>>     
>>>>        3ae7:       75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>>>        3ae9:       0f 01 c2                vmlaunch
>>>>        3aec:       eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>>>        3aee:       0f 01 c3                vmresume
>>>>        3af1:       48 87 0c 24             xchg   %rcx,(%rsp)
>>>>        
>>> ^^^ fault, but not at (%rsp)
>>>      
>> Can you please post the full oops (including kernel debug messages
>> during boot) or give me a pointer to the original message?
> 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html
> 
>> Also, does
>> the faulting address coincide with any symbol?
>>    
> 
> No (at least, not in System.map).

Has there been any progress?  Is kvm + oprofile still broken?

Thanks.

-- 
tejun


* Re: kernel bug in kvm_intel
  2009-11-18  9:26                               ` Tejun Heo
@ 2009-11-26  1:35                                 ` Andrew Theurer
  2009-11-26  1:41                                   ` Tejun Heo
                                                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Theurer @ 2009-11-26  1:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Avi Kivity, kvm, Linux-kernel

Tejun Heo wrote:
> Hello,
> 
> 11/01/2009 08:31 PM, Avi Kivity wrote:
>>>> Here is the code in question:
>>>>
>>>>     
>>>>>        3ae7:       75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>>>>        3ae9:       0f 01 c2                vmlaunch
>>>>>        3aec:       eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>>>>        3aee:       0f 01 c3                vmresume
>>>>>        3af1:       48 87 0c 24             xchg   %rcx,(%rsp)
>>>>>        
>>>> ^^^ fault, but not at (%rsp)
>>>>      
>>> Can you please post the full oops (including kernel debug messages
>>> during boot) or give me a pointer to the original message?
>> http://www.mail-archive.com/kvm@vger.kernel.org/msg23458.html
>>
>>> Also, does
>>> the faulting address coincide with any symbol?
>>>    
>> No (at least, not in System.map).
> 
> Has there been any progress?  Is kvm + oprofile still broken?
>

I just tried testing tip of kvm.git, but unfortunately I think I might
be hitting a different problem, where processes run 100% in kernel mode.
In my case, cpus 9 and 13 were stuck, running qemu processes.  Stack
backtraces for both cpus are below.  FWIW, kernel.org 2.6.32-rc7 does not
have this problem, or the original problem.


> NMI backtrace for cpu 9
> CPU 9:
> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
> Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  -[7947AC1]-
> RIP: 0010:[<ffffffff810b802b>]  [<ffffffff810b802b>] fire_user_return_notifiers+0x31/0x36
> RSP: 0018:ffff88095024df08  EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000800 RCX: ffff88095024c000
> RDX: ffff880028340000 RSI: 0000000000000000 RDI: ffff88095024df58
> RBP: ffff88095024df18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000caf1fff62d R11: ffff8805b584de40 R12: 00007fffae48e0f0
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007f45c69d57c0(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: fffff9800121056e CR3: 0000000953d36000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
>  <#DB[1]>  <<EOE>> Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
> Call Trace:
>  <NMI>  [<ffffffff8100af53>] ? show_regs+0x44/0x49
>  [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
>  [<ffffffff812e4e73>] do_nmi+0xb0/0x252
>  [<ffffffff812e48a0>] nmi+0x20/0x30
>  [<ffffffff810b802b>] ? fire_user_return_notifiers+0x31/0x36
>  <<EOE>>  [<ffffffff8100b844>] do_notify_resume+0x62/0x69
>  [<ffffffff8100bf48>] ? int_check_syscall_exit_work+0x9/0x3d
>  [<ffffffff8100bf8e>] int_signal+0x12/0x17

> NMI backtrace for cpu 13
> CPU 13:
> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
> Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  -[7947AC1]-
> RIP: 0010:[<ffffffff8100bfb0>]  [<ffffffff8100bfb0>] int_restore_rest+0x1d/0x3d
> RSP: 0018:ffff88124f491f58  EFLAGS: 00000292
> RAX: 0000000000000800 RBX: 00007fff9df852e0 RCX: ffff88124f490000
> RDX: ffff88099ff40000 RSI: 0000000000000000 RDI: 000000000000fe2e
> RBP: 00007fff9df85260 R08: ffff88124f490000 R09: 0000000000000000
> R10: 0000000000000005 R11: ffff880954971da0 R12: 00007fff9df851e0
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> FS:  00007f73b5b1d7c0(0000) GS:ffff88099ff40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f8d5a8de9d0 CR3: 0000000eb34d7000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
>  <#DB[1]>  <<EOE>> Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
> Call Trace:
>  <NMI>  [<ffffffff8100af53>] ? show_regs+0x44/0x49
>  [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
>  [<ffffffff812e4e73>] do_nmi+0xb0/0x252
>  [<ffffffff812e48a0>] nmi+0x20/0x30
>  [<ffffffff8100bfb0>] ? int_restore_rest+0x1d/0x3d
>  <<EOE>> 


-Andrew




* Re: kernel bug in kvm_intel
  2009-11-26  1:35                                 ` Andrew Theurer
@ 2009-11-26  1:41                                   ` Tejun Heo
  2009-11-26 10:31                                   ` Avi Kivity
  2009-11-29 14:46                                   ` Avi Kivity
  2 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2009-11-26  1:41 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Avi Kivity, kvm, Linux-kernel

Hello,

11/26/2009 10:35 AM, Andrew Theurer wrote:
> I just tried testing tip of kvm.git, but unfortunately I think I might
> be hitting a different problem, where processes run 100% in kernel mode.
> In my case, cpus 9 and 13 were stuck, running qemu processes.  Stack
> backtraces for both cpus are below.  FWIW, kernel.org 2.6.32-rc7 does not
> have this problem, or the original problem.

2.6.32-rc7 doesn't have a problem with kvm + oprofile?  If the original
analysis was right, I can't think of anything which could have changed
that between the merge commit and 2.6.32-rc7.

Thanks.

-- 
tejun


* Re: kernel bug in kvm_intel
  2009-11-26  1:35                                 ` Andrew Theurer
  2009-11-26  1:41                                   ` Tejun Heo
@ 2009-11-26 10:31                                   ` Avi Kivity
  2009-11-26 13:47                                     ` Andrew Theurer
  2009-11-29 14:46                                   ` Avi Kivity
  2 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-11-26 10:31 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Tejun Heo, kvm, Linux-kernel

On 11/26/2009 03:35 AM, Andrew Theurer wrote:
>
>> NMI backtrace for cpu 9
>> CPU 9:
>> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc 
>> dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel 
>> kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata 
>> ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii 
>> matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 
>> matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core 
>> pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 
>> rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx 
>> scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd 
>> mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
>> Pid: 5687, comm: qemu-system-x86 Not tainted 
>> 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  
>> -[7947AC1]-
>> RIP: 0010:[<ffffffff810b802b>]  [<ffffffff810b802b>] 
>> fire_user_return_notifiers+0x31/0x36
>> RSP: 0018:ffff88095024df08  EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: 0000000000000800 RCX: ffff88095024c000
>> RDX: ffff880028340000 RSI: 0000000000000000 RDI: ffff88095024df58
>> RBP: ffff88095024df18 R08: 0000000000000000 R09: 0000000000000001
>> R10: 000000caf1fff62d R11: ffff8805b584de40 R12: 00007fffae48e0f0
>> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
>> FS:  00007f45c69d57c0(0000) GS:ffff880028340000(0000) 
>> knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: fffff9800121056e CR3: 0000000953d36000 CR4: 00000000000026e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Call Trace:
>> <#DB[1]> <<EOE>> Pid: 5687, comm: qemu-system-x86 Not tainted 
>> 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
>> Call Trace:
>> <NMI>  [<ffffffff8100af53>] ? show_regs+0x44/0x49
>>  [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
>>  [<ffffffff812e4e73>] do_nmi+0xb0/0x252
>>  [<ffffffff812e48a0>] nmi+0x20/0x30
>>  [<ffffffff810b802b>] ? fire_user_return_notifiers+0x31/0x36
>> <<EOE>>  [<ffffffff8100b844>] do_notify_resume+0x62/0x69
>>  [<ffffffff8100bf48>] ? int_check_syscall_exit_work+0x9/0x3d
>>  [<ffffffff8100bf8e>] int_signal+0x12/0x17
>

That's a bug with the new user return notifiers.  Is your host kernel 
preemptible?

I think I saw this once but I'm not sure.  I can't reproduce with a host 
kernel build, some silly guest workload, and 'perf top' to generate an 
nmi load.

-- 
error compiling committee.c: too many arguments to function



* Re: kernel bug in kvm_intel
  2009-11-26 10:31                                   ` Avi Kivity
@ 2009-11-26 13:47                                     ` Andrew Theurer
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Theurer @ 2009-11-26 13:47 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Tejun Heo, kvm, Linux-kernel

Avi Kivity wrote:
> On 11/26/2009 03:35 AM, Andrew Theurer wrote:
>>
>>> NMI backtrace for cpu 9
>>> CPU 9:
>>> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc 
>>> dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel 
>>> kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata 
>>> ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii 
>>> matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 
>>> matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core 
>>> pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 
>>> rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx 
>>> scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd 
>>> mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
>>> Pid: 5687, comm: qemu-system-x86 Not tainted 
>>> 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1  
>>> -[7947AC1]-
>>> RIP: 0010:[<ffffffff810b802b>]  [<ffffffff810b802b>] 
>>> fire_user_return_notifiers+0x31/0x36
>>> RSP: 0018:ffff88095024df08  EFLAGS: 00000246
>>> RAX: 0000000000000000 RBX: 0000000000000800 RCX: ffff88095024c000
>>> RDX: ffff880028340000 RSI: 0000000000000000 RDI: ffff88095024df58
>>> RBP: ffff88095024df18 R08: 0000000000000000 R09: 0000000000000001
>>> R10: 000000caf1fff62d R11: ffff8805b584de40 R12: 00007fffae48e0f0
>>> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
>>> FS:  00007f45c69d57c0(0000) GS:ffff880028340000(0000) 
>>> knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: fffff9800121056e CR3: 0000000953d36000 CR4: 00000000000026e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Call Trace:
>>> <#DB[1]> <<EOE>> Pid: 5687, comm: qemu-system-x86 Not tainted 
>>> 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
>>> Call Trace:
>>> <NMI>  [<ffffffff8100af53>] ? show_regs+0x44/0x49
>>>  [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
>>>  [<ffffffff812e4e73>] do_nmi+0xb0/0x252
>>>  [<ffffffff812e48a0>] nmi+0x20/0x30
>>>  [<ffffffff810b802b>] ? fire_user_return_notifiers+0x31/0x36
>>> <<EOE>>  [<ffffffff8100b844>] do_notify_resume+0x62/0x69
>>>  [<ffffffff8100bf48>] ? int_check_syscall_exit_work+0x9/0x3d
>>>  [<ffffffff8100bf8e>] int_signal+0x12/0x17
>>
> 
> That's a bug with the new user return notifiers.  Is your host kernel 
> preemptible?

preempt is off.
> 
> I think I saw this once but I'm not sure.  I can't reproduce with a host 
> kernel build, some silly guest workload, and 'perf top' to generate an 
> nmi load.
> 

-Andrew




* Re: kernel bug in kvm_intel
  2009-11-26  1:35                                 ` Andrew Theurer
  2009-11-26  1:41                                   ` Tejun Heo
  2009-11-26 10:31                                   ` Avi Kivity
@ 2009-11-29 14:46                                   ` Avi Kivity
  2009-11-30 16:27                                     ` Andrew Theurer
  2 siblings, 1 reply; 10+ messages in thread
From: Avi Kivity @ 2009-11-29 14:46 UTC (permalink / raw)
  To: Andrew Theurer; +Cc: Tejun Heo, kvm, Linux-kernel

On 11/26/2009 03:35 AM, Andrew Theurer wrote:
> I just tried testing tip of kvm.git, but unfortunately I think I might 
> be hitting a different problem, where processes run 100% in kernel 
> mode.  In my case, cpus 9 and 13 were stuck, running qemu processes.  
> Stack backtraces for both cpus are below.  FWIW, kernel.org
> 2.6.32-rc7 does not have this problem, or the original problem.

I just posted a patch fixing this, titled "[PATCH tip:x86/entry] core: 
fix user return notifier on fork()".

-- 
error compiling committee.c: too many arguments to function



* Re: kernel bug in kvm_intel
  2009-11-29 14:46                                   ` Avi Kivity
@ 2009-11-30 16:27                                     ` Andrew Theurer
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Theurer @ 2009-11-30 16:27 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Tejun Heo, kvm, Linux-kernel

On Sun, 2009-11-29 at 16:46 +0200, Avi Kivity wrote:
> On 11/26/2009 03:35 AM, Andrew Theurer wrote:
> > I just tried testing tip of kvm.git, but unfortunately I think I might 
> > be hitting a different problem, where processes run 100% in kernel 
> > mode.  In my case, cpus 9 and 13 were stuck, running qemu processes.  
> > Stack backtraces for both cpus are below.  FWIW, kernel.org
> > 2.6.32-rc7 does not have this problem, or the original problem.
> 
> I just posted a patch fixing this, titled "[PATCH tip:x86/entry] core: 
> fix user return notifier on fork()".
> 
Thank you, Avi.  I am running on this patch and am not seeing this
problem anymore.  I'll be testing for the previous issue next.

-Andrew

