* BUG at drivers/xen/balloon.c:353
[not found] <E1VJGJg-0001Pu-Fw@woking.cam.xci-test.com>
@ 2013-09-10 16:12 ` Ian Jackson
2013-09-10 16:21 ` Jan Beulich
2013-09-10 16:30 ` BUG at drivers/xen/balloon.c:353 David Vrabel
0 siblings, 2 replies; 6+ messages in thread
From: Ian Jackson @ 2013-09-10 16:12 UTC
To: xen-devel
Now that we are on Linux 3.4.y, I am once again trying to commission
my pair of weevils. Things are better than they were but it's still
totally broken.
Below you can see the logs from my adhoc osstest flight 19169. I
looked at one of the save/restore failures (test-amd64-i386-xl) and
see this:
Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
I'd appreciate any help available with debugging this. The logs are
here:
http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/test-amd64-i386-xl/info.html
I looked at the code in balloon.c:
        BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
               phys_to_machine_mapping_valid(pfn));
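For anyone reading along, the condition in that BUG_ON() can be modelled in userspace. This is only a sketch under assumptions: the struct, the MODEL_* names and both functions are hypothetical stand-ins, not kernel code, and it assumes a p2m entry counts as "valid" when it differs from an invalid-entry sentinel. It shows that the assertion fires when a guest that is not auto-translated hands the balloon driver a pfn that still has a machine frame behind it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the kernel's own. */
#define MODEL_INVALID_P2M_ENTRY (~0UL)
#define MODEL_NR_PFNS 8

struct model_domain {
    bool auto_translated;             /* stands in for XENFEAT_auto_translated_physmap */
    unsigned long p2m[MODEL_NR_PFNS]; /* pfn -> mfn, or the sentinel if unmapped */
};

/* Assumed meaning of "mapping valid": the entry is not the sentinel. */
static bool model_p2m_valid(const struct model_domain *d, unsigned long pfn)
{
    assert(pfn < MODEL_NR_PFNS);
    return d->p2m[pfn] != MODEL_INVALID_P2M_ENTRY;
}

/* True exactly when the expression passed to BUG_ON() would be true. */
static bool model_bug_condition(const struct model_domain *d, unsigned long pfn)
{
    return !d->auto_translated && model_p2m_valid(d, pfn);
}
```

With every entry initialised to the sentinel, the condition is false for all pfns. It becomes true as soon as an entry holds a real frame number while auto_translated is false.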
Ian.
Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
Sep 10 00:03:09.093098 [ 240.142991] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
Sep 10 00:03:09.101077 [ 240.142998] Workqueue: events balloon_process
Sep 10 00:03:09.113048 [ 240.143001] task: db4ee680 ti: d59de000 task.ti: d59de000
Sep 10 00:03:09.113084 [ 240.143004] EIP: 0061:[<c1306e83>] EFLAGS: 00010207 CPU: 6
Sep 10 00:03:09.121057 [ 240.143008] EIP is at balloon_process+0x343/0x350
Sep 10 00:03:09.121091 [ 240.143011] EAX: 0012042f EBX: 00006a6f ECX: c1a67000 EDX: c7146000
Sep 10 00:03:09.133054 [ 240.143014] ESI: 00000000 EDI: dfb3b000 EBP: d59dfef4 ESP: d59dfebc
Sep 10 00:03:09.133093 [ 240.143017] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Sep 10 00:03:09.141065 [ 240.143022] CR0: 80050033 CR2: b763c09c CR3: 06d8b000 CR4: 00042660
Sep 10 00:03:09.153047 [ 240.143027] Stack:
Sep 10 00:03:09.153074 [ 240.143029] 00000037 c19a4f40 00000001 00000001 00000000 dfc0fde0 c1a4d440 00000001
Sep 10 00:03:09.161064 [ 240.143035] 00000000 00000000 00007ff0 c18bebc0 dfb26ac0 dad1d000 d59dff30 c10b1f0b
Sep 10 00:03:09.161112 [ 240.143042] 80931d8c 00000000 c19a4f40 dfb26f40 db4ee680 db4ecbd2 dfb27d05 dfb27d00
Sep 10 00:03:09.173069 [ 240.143049] Call Trace:
Sep 10 00:03:09.173096 [ 240.143064] [<c10b1f0b>] process_one_work+0x13b/0x440
Sep 10 00:03:09.181069 [ 240.143073] [<c10b4299>] worker_thread+0xf9/0x490
Sep 10 00:03:09.181105 [ 240.143081] [<c10b8fdc>] kthread+0x9c/0xb0
Sep 10 00:03:09.189064 [ 240.143088] [<c10b41a0>] ? manage_workers+0x310/0x310
Sep 10 00:03:09.189100 [ 240.143098] [<c169eaf7>] ret_from_kernel_thread+0x1b/0x28
Sep 10 00:03:09.201066 [ 240.143106] [<c10b8f40>] ? kthread_freezable_should_stop+0x60/0x60
Sep 10 00:03:09.209055 [ 240.143113] Code: ff eb ca 0f 0b eb fe 89 45 cc e8 39 e8 38 00 8b 45 cc e9 25 fd ff ff 0f 0b eb fe 89 d8 e8 46 98 d4 ff 83 f8 ff 0f 84 a9 fe ff ff <0f> 0b eb fe 89 f6 8d bc 27 00 00 00 00 55 89 e5 57 89 cf 56 53
Sep 10 00:03:09.221085 [ 240.143235] EIP: [<c1306e83>] balloon_process+0x343/0x350 SS:ESP 0069:d59dfebc
Sep 10 00:03:09.229071 [ 240.143248] ---[ end trace 85d5730f0a872188 ]---
Sep 10 00:03:09.249100 [ 240.143324] Oops: 0000 [#2] SMP
Sep 10 00:03:09.261054 [ 240.143330] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
Sep 10 00:03:09.261104 [ 240.143347] CPU: 6 PID: 847 Comm: kworker/6:1 Tainted: G D 3.11.0+ #1
Sep 10 00:03:09.269082 [ 240.143354] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
Sep 10 00:03:09.281072 [ 240.143370] task: db4ee680 ti: d59de000 task.ti: d59de000
Sep 10 00:03:09.289055 [ 240.143376] EIP: 0061:[<c10b8c8a>] EFLAGS: 00010002 CPU: 6
Sep 10 00:03:09.289091 [ 240.143383] EIP is at kthread_data+0xa/0x10
Sep 10 00:03:09.301048 [ 240.143389] EAX: 00000000 EBX: 00000006 ECX: 00000000 EDX: 00000006
Sep 10 00:03:09.301109 [ 240.143396] ESI: 00000006 EDI: db4ee680 EBP: d59dfcb0 ESP: d59dfca8
Sep 10 00:03:09.309063 [ 240.143403] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Sep 10 00:03:09.309099 [ 240.143410] CR0: 80050033 CR2: 00000014 CR3: 06d8b000 CR4: 00042660
Sep 10 00:03:09.321062 [ 240.143419] Stack:
Sep 10 00:03:09.321088 [ 240.143423] c10b032b db4ee934 d59dfd40 c169516a c10b0964 00000000 c127e918 db4ee680
Sep 10 00:03:09.329073 [ 240.143437] c19a4f40 c19a4f40 db472208 00000027 c19a4f40 db472180 00000000 c19a4f40
Sep 10 00:03:09.341057 [ 240.143452] dfb26f40 db4ee680 d59dfd08 c10b0bae 00000008 c6a1284c c6a1284c c6a12840
Sep 10 00:03:09.349054 [ 240.143466] Call Trace:
Sep 10 00:03:09.349081 [ 240.143473] [<c10b032b>] ? wq_worker_sleeping+0xb/0x80
Sep 10 00:03:09.349121 [ 240.143483] [<c169516a>] __schedule+0x54a/0x830
Sep 10 00:03:09.361054 [ 240.143492] [<c10b0964>] ? __queue_work+0x144/0x310
Sep 10 00:03:09.361089 [ 240.143500] [<c127e918>] ? cfq_put_queue+0x58/0xd0
Sep 10 00:03:09.369061 [ 240.143509] [<c10b0bae>] ? queue_work_on+0x5e/0xa0
Sep 10 00:03:09.369096 [ 240.143517] [<c126eab7>] ? put_io_context+0x67/0x90
Sep 10 00:03:09.381055 [ 240.143524] [<c126eb7b>] ? put_io_context_active+0x9b/0xc0
Sep 10 00:03:09.381091 [ 240.143533] [<c16956be>] schedule+0x1e/0x50
Sep 10 00:03:09.389061 [ 240.143541] [<c109b55f>] do_exit+0x61f/0x990
Sep 10 00:03:09.389095 [ 240.143549] [<c1692de8>] ? printk+0x38/0x3a
Sep 10 00:03:09.401050 [ 240.143557] [<c1698d38>] oops_end+0x98/0x150
Sep 10 00:03:09.401083 [ 240.143566] [<c105a77f>] die+0x4f/0x70
Sep 10 00:03:09.401118 [ 240.143573] [<c1698615>] do_trap+0x95/0xc0
Sep 10 00:03:09.409066 [ 240.143580] [<c1057e20>] ? do_coprocessor_segment_overrun+0x80/0x80
Sep 10 00:03:09.421047 [ 240.143589] [<c1057eb6>] do_invalid_op+0x96/0xb0
Sep 10 00:03:09.421082 [ 240.143597] [<c1306e83>] ? balloon_process+0x343/0x350
Sep 10 00:03:09.429058 [ 240.143605] [<c104c00d>] ? arbitrary_virt_to_machine+0x9d/0xc0
Sep 10 00:03:09.429096 [ 240.143614] [<c104b115>] ? xen_mc_flush+0xe5/0x1f0
Sep 10 00:03:09.441053 [ 240.143622] [<c104a012>] ? xen_end_context_switch+0x12/0x20
Sep 10 00:03:09.441091 [ 240.143631] [<c10564f8>] ? __switch_to+0x158/0x440
Sep 10 00:03:09.449060 [ 240.143638] [<c16983ca>] error_code+0x5a/0x60
Sep 10 00:03:09.449093 [ 240.143647] [<c114007b>] ? generic_file_aio_read+0x1bb/0x710
Sep 10 00:03:09.461053 [ 240.143656] [<c1057e20>] ? do_coprocessor_segment_overrun+0x80/0x80
Sep 10 00:03:09.461092 [ 240.143664] [<c1306e83>] ? balloon_process+0x343/0x350
Sep 10 00:03:09.469068 [ 240.143673] [<c10b1f0b>] process_one_work+0x13b/0x440
Sep 10 00:03:09.469104 [ 240.143681] [<c10b4299>] worker_thread+0xf9/0x490
Sep 10 00:03:09.481059 [ 240.143689] [<c10b8fdc>] kthread+0x9c/0xb0
Sep 10 00:03:09.481092 [ 240.143695] [<c10b41a0>] ? manage_workers+0x310/0x310
Sep 10 00:03:09.489066 [ 240.143704] [<c169eaf7>] ret_from_kernel_thread+0x1b/0x28
Sep 10 00:03:09.489103 [ 240.143717] [<c10b8f40>] ? kthread_freezable_should_stop+0x60/0x60
Sep 10 00:03:09.501065 [ 240.143727] Code: 00 55 64 a1 8c ef 99 c1 8b 80 88 02 00 00 89 e5 5d 8b 40 e4 c1 e8 02 83 e0 01 c3 8d b6 00 00 00 00 55 8b 80 88 02 00 00 89 e5 5d <8b> 40 ec c3 66 90 55 31 c0 89 e5 5d c3 89 f6 8d bc 27 00 00 00
Sep 10 00:03:09.521061 [ 240.143821] EIP: [<c10b8c8a>] kthread_data+0xa/0x10 SS:ESP 0069:d59dfca8
Sep 10 00:03:09.529068 [ 240.143832] CR2: 00000000ffffffec
Sep 10 00:03:09.529099 [ 240.143838] ---[ end trace 85d5730f0a872189 ]---
Sep 10 00:03:09.529135 [ 240.143844] Fixing recursive fault but reboot is needed!
Sep 10 00:03:09.537055 [ 261.145745] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 3, t=21002 jiffies, g=15652, c=15651, q=545)
Sep 10 00:03:30.085066 [ 261.145785] sending NMI to all CPUs:
Sep 10 00:03:30.085102 [ 261.145792] xen: vector 0x2 is not implemented
Sep 10 00:03:30.085138 [ 324.150727] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=84007 jiffies, g=15652, c=15651, q=691)
Sep 10 00:04:33.089113 [ 324.150760] sending NMI to all CPUs:
Sep 10 00:04:33.089155 [ 324.150766] xen: vector 0x2 is not implemented
Sep 10 00:04:33.089186 [ 387.155740] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=147012 jiffies, g=15652, c=15651, q=715)
Sep 10 00:05:36.097069 [ 387.155772] sending NMI to all CPUs:
Sep 10 00:05:36.097106 [ 387.155778] xen: vector 0x2 is not implemented
Sep 10 00:05:36.097142 [ 450.160728] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=210017 jiffies, g=15652, c=15651, q=728)
Sep 10 00:06:39.101065 [ 450.160761] sending NMI to all CPUs:
Sep 10 00:06:39.101099 [ 450.160767] xen: vector 0x2 is not implemented
Sep 10 00:06:39.101136 Sep 10 00:06:43.943216 <client 0x8061350 connected - now 1 clients>
> [adhoc commission-weevil] <testing.git wip /dev/pts/25>
> harness b5a3ebc: ts-logs-capture: retry log capture after doing power ...
> 19169: regressions - trouble: broken/fail/pass
>
> flight 19169 xen-unstable commission-weevil [commission-weevil]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/
>
> Regressions :-(
[ Disregard the "broken" entries, which are due to the weevils being
AMD so the host allocation part of the test failed. ]
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> build-armhf                               1 hosts-allocate                  broken REGR. vs. 19155
> build-armhf                               2 capture-logs                    !broken [st=!broken!]
> test-amd64-amd64-xl-pcipt-intel           2 hosts-allocate                  broken REGR. vs. 19155
> test-amd64-i386-qemut-rhel6hvm-intel      7 redhat-install                  fail REGR. vs. 19155
> test-amd64-amd64-xl-sedf                 10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-amd64-pv                      10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-i386-xl                       10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-i386-rhel6hvm-intel            7 redhat-install                  fail REGR. vs. 19155
> test-amd64-amd64-xl                      10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-amd64-xl-sedf-pin             10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-i386-qemuu-rhel6hvm-amd        2 hosts-allocate                  broken REGR. vs. 19155
> test-amd64-i386-qemut-rhel6hvm-amd        2 hosts-allocate                  broken REGR. vs. 19155
> test-amd64-i386-rhel6hvm-amd              2 hosts-allocate                  broken REGR. vs. 19155
> test-amd64-amd64-xl-qemuu-win7-amd64      7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-xl-credit2               10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-amd64-pair                    18 guest-migrate/dst_host/src_host fail REGR. vs. 19155
> test-amd64-i386-qemuu-rhel6hvm-intel      7 redhat-install                  fail REGR. vs. 19155
> test-amd64-i386-pv                       10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-i386-xl-multivcpu             10 guest-saverestore               fail REGR. vs. 19155
> test-amd64-i386-xl-winxpsp3-vcpus1        7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-pair                     18 guest-migrate/dst_host/src_host fail REGR. vs. 19155
> test-amd64-i386-xend-winxpsp3             7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-xl-qemut-winxpsp3-vcpus1  7 windows-install                 fail REGR. vs. 19155
> test-amd64-amd64-xl-qemuu-winxpsp3        7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-xend-qemut-winxpsp3       7 windows-install                 fail REGR. vs. 19155
> test-amd64-amd64-xl-qemut-win7-amd64      7 windows-install                 fail REGR. vs. 19155
> test-amd64-amd64-xl-qemut-winxpsp3        7 windows-install                 fail REGR. vs. 19155
> test-amd64-amd64-xl-winxpsp3              7 windows-install                 fail REGR. vs. 19155
> test-amd64-amd64-xl-win7-amd64            7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-xl-qemut-win7-amd64       7 windows-install                 fail REGR. vs. 19155
> test-amd64-i386-xl-win7-amd64             7 windows-install                 fail REGR. vs. 19155
* Re: BUG at drivers/xen/balloon.c:353
2013-09-10 16:12 ` BUG at drivers/xen/balloon.c:353 Ian Jackson
@ 2013-09-10 16:21 ` Jan Beulich
2013-09-10 17:23 ` BUG at drivers/xen/balloon.c:353 [and 3 more messages] Ian Jackson
2013-09-10 16:30 ` BUG at drivers/xen/balloon.c:353 David Vrabel
1 sibling, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2013-09-10 16:21 UTC
To: Ian Jackson; +Cc: xen-devel
>>> On 10.09.13 at 18:12, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Now that we are on Linux 3.4.y, I am once again trying to commission
> my pair of weevils. Things are better than they were but it's still
> totally broken.
>
> Below you can see the logs from my adhoc osstest flight 19169. I
> looked at one of the save/restore failures (test-amd64-i386-xl) and
> see this:
>
> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>
> I'd appreciate any help available with debugging this. The logs are
> here:
>
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/test-amd64-i386-xl/info.html
>
> I looked at the code in balloon.c:
>
> BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> phys_to_machine_mapping_valid(pfn));
>
> Ian.
>
> Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
> Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
> Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
> Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
But that's 3.11, and we had seen this very failure in logs on other
hosts. Did you try 3.4 on those systems?
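Checking which kernel actually produced an oops can be automated. The following is a minimal sketch, assuming the "CPU: ... Not tainted <release> #N" line format seen in the trace above. oops_kernel_release() is a hypothetical helper, and it deliberately ignores "Tainted: ..." lines, whose flag field would need extra parsing.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the kernel release that follows the
 * "Not tainted " marker on an oops "CPU:" line into out.
 * Returns 0 on success, -1 if the marker is missing or out is too small.
 */
static int oops_kernel_release(const char *line, char *out, size_t outlen)
{
    static const char marker[] = "Not tainted ";
    const char *p;
    size_t n;

    assert(line != NULL && out != NULL);

    p = strstr(line, marker);
    if (p == NULL || outlen == 0)
        return -1;

    p += sizeof(marker) - 1;   /* skip past the marker itself */
    n = strcspn(p, " \t\n");   /* the release ends at the next whitespace */
    if (n == 0 || n >= outlen)
        return -1;

    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Fed the CPU line from the first oops, it yields the release string 3.11.0+, which does not start with 3.4.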
Jan
* Re: BUG at drivers/xen/balloon.c:353
2013-09-10 16:12 ` BUG at drivers/xen/balloon.c:353 Ian Jackson
2013-09-10 16:21 ` Jan Beulich
@ 2013-09-10 16:30 ` David Vrabel
2013-09-10 16:52 ` Boris Ostrovsky
1 sibling, 1 reply; 6+ messages in thread
From: David Vrabel @ 2013-09-10 16:30 UTC
To: Ian Jackson; +Cc: xen-devel
On 10/09/13 17:12, Ian Jackson wrote:
> Now that we are on Linux 3.4.y, I am once again trying to commission
> my pair of weevils. Things are better than they were but it's still
> totally broken.
>
> Below you can see the logs from my adhoc osstest flight 19169. I
> looked at one of the save/restore failures (test-amd64-i386-xl) and
> see this:
>
> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>
> I'd appreciate any help available with debugging this. The logs are
> here:
>
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/test-amd64-i386-xl/info.html
>
> I looked at the code in balloon.c:
>
> BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
> phys_to_machine_mapping_valid(pfn));
>
> Ian.
>
> Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
> Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
> Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
> Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
There's a known bug in this area with 3.11+ (3.12-rc0) and there is a
pending pull request for the fix.
I am a bit confused by the reference to 3.4.y above. Is this BUG in a
guest running 3.11+ on a host with 3.4.y as dom0? Or have you
accidentally installed 3.11+ as your dom0 kernel?
David
* Re: BUG at drivers/xen/balloon.c:353
2013-09-10 16:30 ` BUG at drivers/xen/balloon.c:353 David Vrabel
@ 2013-09-10 16:52 ` Boris Ostrovsky
2013-09-10 17:01 ` David Vrabel
0 siblings, 1 reply; 6+ messages in thread
From: Boris Ostrovsky @ 2013-09-10 16:52 UTC
To: David Vrabel; +Cc: Ian Jackson, xen-devel
On 09/10/2013 12:30 PM, David Vrabel wrote:
> On 10/09/13 17:12, Ian Jackson wrote:
>> Now that we are on Linux 3.4.y, I am once again trying to commission
>> my pair of weevils. Things are better than they were but it's still
>> totally broken.
>>
>> Below you can see the logs from my adhoc osstest flight 19169. I
>> looked at one of the save/restore failures (test-amd64-i386-xl) and
>> see this:
>>
>> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>>
>> I'd appreciate any help available with debugging this. The logs are
>> here:
>>
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/test-amd64-i386-xl/info.html
>>
>> I looked at the code in balloon.c:
>>
>> BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
>> phys_to_machine_mapping_valid(pfn));
>>
>> Ian.
>>
>> Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
>> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>> Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
>> Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
>> Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
> There's a known bug in this area with 3.11+ (3.12-rc0) and there is a
> pending pull request for the fix.
If you are thinking of the preempt count bug, then it's likely a different
one.
-boris
>
> I am a bit confused by the reference to 3.4.y above. Is this BUG in a
> guest running 3.11+ on a host with 3.4.y as dom0? Or have you
> accidentally installed 3.11+ as your dom0 kernel?
>
> David
* Re: BUG at drivers/xen/balloon.c:353
2013-09-10 16:52 ` Boris Ostrovsky
@ 2013-09-10 17:01 ` David Vrabel
0 siblings, 0 replies; 6+ messages in thread
From: David Vrabel @ 2013-09-10 17:01 UTC
To: Boris Ostrovsky; +Cc: Ian Jackson, David Vrabel, xen-devel
On 10/09/13 17:52, Boris Ostrovsky wrote:
> On 09/10/2013 12:30 PM, David Vrabel wrote:
>> On 10/09/13 17:12, Ian Jackson wrote:
>>> Now that we are on Linux 3.4.y, I am once again trying to commission
>>> my pair of weevils. Things are better than they were but it's still
>>> totally broken.
>>>
>>> Below you can see the logs from my adhoc osstest flight 19169. I
>>> looked at one of the save/restore failures (test-amd64-i386-xl) and
>>> see this:
>>>
>>> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>>>
>>> I'd appreciate any help available with debugging this. The logs are
>>> here:
>>>
>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/19169/test-amd64-i386-xl/info.html
>>>
>>> I looked at the code in balloon.c:
>>>
>>> BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
>>> phys_to_machine_mapping_valid(pfn));
>>>
>>> Ian.
>>>
>>> Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
>>> Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
>>> Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
>>> Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
>>> Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
>> There's a known bug in this area with 3.11+ (3.12-rc0) and there is a
>> pending pull request for the fix.
>
> If you are thinking of the preempt count bug, then it's likely a different
> one.
No, I'm thinking of Wei's "xen/balloon: remove BUG_ON in
increase_reservation" which actually doesn't appear to be queued up.
David
* Re: BUG at drivers/xen/balloon.c:353 [and 3 more messages]
2013-09-10 16:21 ` Jan Beulich
@ 2013-09-10 17:23 ` Ian Jackson
0 siblings, 0 replies; 6+ messages in thread
From: Ian Jackson @ 2013-09-10 17:23 UTC
To: Boris Ostrovsky, David Vrabel, Jan Beulich; +Cc: xen-devel
Jan Beulich writes ("Re: [Xen-devel] BUG at drivers/xen/balloon.c:353"):
> On 10.09.13 at 18:12, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > Sep 10 00:03:08.629133 [ 240.142955] ------------[ cut here ]------------
> > Sep 10 00:03:09.073066 [ 240.142970] kernel BUG at drivers/xen/balloon.c:353!
> > Sep 10 00:03:09.073102 [ 240.142974] invalid opcode: 0000 [#1] SMP
> > Sep 10 00:03:09.081064 [ 240.142978] Modules linked in: xen_acpi_processor xen_gntalloc ext4 jbd2 mbcache e1000e
> > Sep 10 00:03:09.093059 [ 240.142987] CPU: 6 PID: 847 Comm: kworker/6:1 Not tainted 3.11.0+ #1
>
> But that's 3.11, and we had seen this very failure in logs on other
> hosts. Did you try 3.4 on those systems?
I _intended_ to try 3.4. How odd.
...
Hmm, I have fixed my tools to really pick the right branch. I will
try again. Sorry for the noise.
Ian.