* [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Gonglei (Arei) @ 2014-11-27 12:58 UTC
To: qemu-devel@nongnu.org
Hi,
Running a redhat-6.4-64bit (kernel 2.6.32-358.el6.x86_64) or older guest on
qemu-2.1, with KVM enabled, -cpu host, a non-default CPU topology, and guest
NUMA, I'm seeing a reliable kernel panic from the guest shortly after boot.
It happens in find_busiest_group().
We bisected the regression to commit
787aaf5703a702094f395db6795e74230282cd62.
The reproducer:
(1) full qemu cmd line:
qemu-system-x86_64 -machine pc-i440fx-2.1,accel=kvm,usb=off \
-cpu host -m 16384 \
-smp 16,sockets=2,cores=4,threads=2 \
-object memory-backend-ram,size=8192M,id=ram-node0 \
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
-object memory-backend-ram,size=8192M,id=ram-node1 \
-numa node,nodeid=1,cpus=8-15,memdev=ram-node1 \
-boot c -drive file=/data/wxin/vm/redhat_6.4_64 \
-vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x1.0x4 \
-msg timestamp=on
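The bisected commit, as the discussion below shows, forwards the host's
CPUID cache leaves to the guest under -cpu host. As an illustration only
(this helper is not part of the original report), a small program along
these lines can dump what the guest actually sees in CPUID leaf 4; build
it with gcc inside the guest:

/* Illustrative helper (not from the original report): dump the CPUID
 * leaf 4 cache-sharing fields as seen inside the guest.  With -cpu host
 * these leaves are forwarded from the host, so the sharing counts can
 * span more logical CPUs than one guest NUMA node contains. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    unsigned int i;

    for (i = 0; ; i++) {
        __cpuid_count(4, i, eax, ebx, ecx, edx);
        if ((eax & 0x1f) == 0)          /* cache type 0: no more caches */
            break;
        /* EAX[7:5] = cache level, EAX[25:14] = max sharing IDs - 1 */
        printf("level %u cache: shared by up to %u logical CPUs\n",
               (eax >> 5) & 0x7, ((eax >> 14) & 0xfff) + 1);
    }
    return 0;
}

If the sharing count reported for a cache exceeds the number of vCPUs in
one guest NUMA node, the guest scheduler ends up with cache (MC) domains
that overlap its NUMA domains, which matches the sched_debug errors shown
in (4) below.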
(2) the guest kernel messages:
divide error: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.32-358.el6.x86_64 #1 QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:[<ffffffff81059a9c>] [<ffffffff81059a9c>] find_busiest_group+0x55c/0x9f0
RSP: 0018:ffff88023c85f9e0 EFLAGS: 00010046
RAX: 0000000000100000 RBX: ffff88023c85fbdc RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000010
RBP: ffff88023c85fb50 R08: ffff88023ca16c10 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffff01
R13: 0000000000016700 R14: ffffffffffffffff R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a85000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88023c85e000, task ffff88043d27c040)
Stack:
ffff88023c85faf0 ffff88023c85fa60 ffff88023c85fbc8 0000000200000000
<d> 0000000100000000 ffff880028210b60 0000000100000001 0000000000000008
<d> 0000000000016700 0000000000016700 ffff88023ca16c00 0000000000016700
Call Trace:
[<ffffffff8150da2a>] thread_return+0x398/0x76e
[<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
[<ffffffff81065905>] ? enqueue_entity+0x125/0x410
[<ffffffff8150e1d3>] wait_for_common+0x123/0x180
[<ffffffff81063310>] ? default_wake_function+0x0/0x20
[<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
[<ffffffff81096a89>] kthread_create+0x99/0x120
[<ffffffff81090950>] ? worker_thread+0x0/0x2a0
[<ffffffff81167769>] ? alternate_node_alloc+0xc9/0xe0
[<ffffffff810908d9>] create_workqueue_thread+0x59/0xd0
[<ffffffff8150ebce>] ? mutex_lock+0x1e/0x50
[<ffffffff810911bd>] __create_workqueue_key+0x14d/0x200
[<ffffffff81c47233>] init_workqueues+0x9f/0xb1
[<ffffffff81c2788c>] kernel_init+0x25e/0x2fe
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff81c2762e>] ? kernel_init+0x0/0x2fe
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 8b b5 b0 fe ff ff 48 8b bd b8 fe ff ff e8 9d 85 ff ff 0f 1f 44 00 00 48 8b 95 e0
fe ff ff 48 8b 45 a8 8b 4a 08 48 c1 e0 0a 31 d2 <48> f7 f1 48 8b 4d b0 48 89 45 a0
31 c0 48 85 c9 74 0c 48 8b 45
RIP [<ffffffff81059a9c>] find_busiest_group+0x55c/0x9f0
RSP <ffff88023c85f9e0>
divide error: 0000 [#2]
---[ end trace d7d20afc6dd05e71 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1, comm: swapper Tainted: G D --------------- 2.6.32-358.el6.x86_64 #1
Call Trace:
[<ffffffff8150cfc8>] ? panic+0xa7/0x16f
[<ffffffff815111f4>] ? oops_end+0xe4/0x100
[<ffffffff8100f19b>] ? die+0x5b/0x90
[<ffffffff81510a34>] ? do_trap+0xc4/0x160
[<ffffffff8100cf7f>] ? do_divide_error+0x8f/0xb0
[<ffffffff81059a9c>] ? find_busiest_group+0x55c/0x9f0
[<ffffffff8113b3a9>] ? zone_statistics+0x99/0xc0
[<ffffffff8100bdfb>] ? divide_error+0x1b/0x20
[<ffffffff81059a9c>] ? find_busiest_group+0x55c/0x9f0
[<ffffffff8150da2a>] ? thread_return+0x398/0x76e
[<ffffffff8150e555>] ? schedule_timeout+0x215/0x2e0
[<ffffffff81065905>] ? enqueue_entity+0x125/0x410
[<ffffffff8150e1d3>] ? wait_for_common+0x123/0x180
[<ffffffff81063310>] ? default_wake_function+0x0/0x20
[<ffffffff8150e2ed>] ? wait_for_completion+0x1d/0x20
[<ffffffff81096a89>] ? kthread_create+0x99/0x120
[<ffffffff81090950>] ? worker_thread+0x0/0x2a0
[<ffffffff81167769>] ? alternate_node_alloc+0xc9/0xe0
[<ffffffff810908d9>] ? create_workqueue_thread+0x59/0xd0
[<ffffffff8150ebce>] ? mutex_lock+0x1e/0x50
[<ffffffff810911bd>] ? __create_workqueue_key+0x14d/0x200
[<ffffffff81c47233>] ? init_workqueues+0x9f/0xb1
[<ffffffff81c2788c>] ? kernel_init+0x25e/0x2fe
[<ffffffff8100c0ca>] ? child_rip+0xa/0x20
[<ffffffff81c2762e>] ? kernel_init+0x0/0x2fe
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
---
(3) host info
/proc/cpuinfo on the host has 16 of these:
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
stepping : 7
microcode : 1803
cpu MHz : 3301.000
cache size : 10240 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 39
initial apicid : 39
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts
dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 6599.83
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
host numa topo:
node 0 cpus: 0 1 2 3 8 9 10 11
node 0 size: 40936 MB
node 0 free: 39625 MB
node 1 cpus: 4 5 6 7 12 13 14 15
node 1 size: 40960 MB
node 1 free: 39876 MB
node distances:
node 0 1
0: 10 21
1: 21 10
(4) With "sched_debug loglevel=8" kernel parameter command line,
you can see follow error log(those "ERROR"s):
CPU0 attaching sched-domain:
domain 0: span 0-15 level MC
groups: 0 (cpu_power = 1023) 1 2 3 4 5 6 7 8 9 10 (cpu_power = 1023) 11 12 13 14 15
ERROR: parent span is not a superset of domain->span
domain 1: span 0-7 level CPU
ERROR: domain->groups does not contain CPU0
groups: 8-15 (cpu_power = 16382)
ERROR: groups don't span domain->span
domain 2: span 0-15 level NODE
groups:
ERROR: domain->cpu_power not set
Any comments or help would be appreciated!
Best regards,
-Gonglei
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Gonglei (Arei) @ 2014-11-27 13:00 UTC
To: Gonglei (Arei), qemu-devel@nongnu.org
Cc: Huangweidong (C), benoit@irqsave.net, wangxin (U),
Huangpeng (Peter), Paolo Bonzini, Herongguang (Stephen)
Cc'ing Paolo and Benoît.
Best regards,
-Gonglei
> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Thursday, November 27, 2014 8:58 PM
> To: qemu-devel@nongnu.org
> Subject: [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and
> guest numa
>
> Hi,
>
> Running a redhat-6.4-64bit (kernel 2.6.32-358.el6.x86_64) or older guest on
> qemu-2.1, with KVM enabled, -cpu host, a non-default CPU topology, and guest
> NUMA, I'm seeing a reliable kernel panic from the guest shortly after boot.
> It happens in find_busiest_group().
>
> [...]
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Paolo Bonzini @ 2014-11-27 17:20 UTC
To: Gonglei (Arei), qemu-devel@nongnu.org
Cc: Huangpeng (Peter), wangxin (U), Huangweidong (C),
benoit@irqsave.net, Herongguang (Stephen)
On 27/11/2014 14:00, Gonglei (Arei) wrote:
>>
>> Running a redhat-6.4-64bit (kernel 2.6.32-358.el6.x86_64) or older guest on
>> qemu-2.1, with KVM enabled, -cpu host, a non-default CPU topology, and guest
>> NUMA, I'm seeing a reliable kernel panic from the guest shortly after boot.
>> It happens in find_busiest_group().
>>
>> [...]
Can you find what line of kernel/sched.c it is?
Thanks,
Paolo
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Gonglei @ 2014-11-28 2:38 UTC
To: Paolo Bonzini
Cc: Huangweidong (C), benoit@irqsave.net, wangxin (U),
qemu-devel@nongnu.org, Huangpeng (Peter), Herongguang (Stephen)
On 2014/11/28 1:20, Paolo Bonzini wrote:
>
> On 27/11/2014 14:00, Gonglei (Arei) wrote:
>> [...]
>
> Can you find what line of kernel/sched.c it is?
Yes, of course. It is:
    sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
in update_sg_lb_stats(), kernel/sched.c, line 4094.
I can also share the cause we found. Since commit 787aaf57 (target-i386:
forward CPUID cache leaves when -cpu host is used), the guest takes its
CPU cache topology from the host when -cpu host is used. But if we
configure guest NUMA as
node 0: cpus 0-7
node 1: cpus 8-15
then the two NUMA nodes lie within the same host-provided CPU cache, which
spans all 16 vCPUs (0-15). When the guest OS boots and calculates
group->cpu_power, it finds the two different nodes sharing the same cache,
so node 1's group->cpu_power never gets set and stays at its initial value
of 0. When a vCPU is later scheduled, the division by 0 causes the kernel
panic.
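To make the failure concrete, here is a minimal userspace sketch of that
computation (simplified from the kernel: only the fields involved are
kept, and cpu_power is forced to its uninitialized value of 0, so this is
an illustration rather than the actual RHEL6 code):

#include <stdio.h>

#define SCHED_LOAD_SCALE 1024UL   /* 1 << 10, as in the 2.6.32 kernel */

struct sched_group {
    unsigned long cpu_power;      /* stays 0 for node 1's group in this bug */
};

struct sg_lb_stats {
    unsigned long group_load;
    unsigned long avg_load;
};

int main(void)
{
    struct sched_group group = { .cpu_power = 0 };  /* never initialized */
    struct sg_lb_stats sgs = { .group_load = 16 };

    /* The expression from kernel/sched.c:4094: an integer division by
     * group->cpu_power.  In the kernel this raises a divide error (#DE)
     * and panics; in userspace it dies with SIGFPE. */
    sgs.avg_load = (sgs.group_load * SCHED_LOAD_SCALE) / group.cpu_power;

    printf("avg_load = %lu\n", sgs.avg_load);
    return 0;
}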
Regards,
-Gonglei
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Paolo Bonzini @ 2014-12-01 9:48 UTC
To: Gonglei
Cc: Huangweidong (C), benoit@irqsave.net, wangxin (U),
qemu-devel@nongnu.org, Huangpeng (Peter), Herongguang (Stephen)
On 28/11/2014 03:38, Gonglei wrote:
>> Can you find what line of kernel/sched.c it is?
> Yes, of course. It is:
>     sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
> in update_sg_lb_stats(), kernel/sched.c, line 4094.
>
> [...] node 1's group->cpu_power never gets set and stays at its initial
> value of 0. When a vCPU is later scheduled, the division by 0 causes the
> kernel panic.
Thanks. Please open a Red Hat bugzilla with the information, and Cc
Larry Woodman <lwoodman@redhat.com> who fixed a few instances of this in
the past.
Paolo
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Gonglei @ 2014-12-02 3:41 UTC
To: Paolo Bonzini
Cc: Huangweidong (C), benoit@irqsave.net, wangxin (U),
qemu-devel@nongnu.org, Huangpeng (Peter), Herongguang (Stephen)
On 2014/12/1 17:48, Paolo Bonzini wrote:
>
> On 28/11/2014 03:38, Gonglei wrote:
>> [...]
>
> Thanks. Please open a Red Hat bugzilla with the information, and Cc
> Larry Woodman <lwoodman@redhat.com> who fixed a few instances of this in
> the past.
>
Hi, Paolo
A bug has been reported:
https://bugzilla.redhat.com/process_bug.cgi
Regards,
-Gonglei
* Re: [Qemu-devel] [BUG] Redhat-6.4_64bit-guest kernel panic with cpu-passthrough and guest numa
From: Gonglei @ 2014-12-02 3:43 UTC
To: Paolo Bonzini
Cc: Huangweidong (C), benoit@irqsave.net, wangxin (U),
qemu-devel@nongnu.org, Huangpeng (Peter), Herongguang (Stephen)
On 2014/12/2 11:41, Gonglei wrote:
> Hi, Paolo
>
> A bug has been reported:
The correct link is: https://bugzilla.redhat.com/show_bug.cgi?id=1169577
Regards,
-Gonglei