* Xen 3.2.1-rc1: FATAL PAGE FAULT
@ 2008-04-03 4:34 Christopher S. Aker
2008-04-03 14:04 ` Christopher S. Aker
0 siblings, 1 reply; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-03 4:34 UTC (permalink / raw)
To: xen devel
Xen: 3.2.1-rc1 (I can get the exact changeset if needed)
domU: 2.6.16.33 PAE
(XEN) ----[ Xen-3.2.1-rc1 x86_64 debug=y Not tainted ]----
(XEN) CPU: 3
(XEN) RIP: e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) RFLAGS: 0000000000210282 CONTEXT: hypervisor
(XEN) rax: 00001c9f2d2abca8 rbx: ffff9f232d2abca8 rcx: 0000000080000000
(XEN) rdx: 000000b72dedde51 rsi: 00000000002f25fd rdi: ffff9f232d2abca8
(XEN) rbp: ffff8300cee0fcb8 rsp: ffff8300cee0fc98 r8: 0000000000000000
(XEN) r9: 00000000deadbeef r10: ffff828c801c5bf0 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffff9f232d2abca8 r14: ffff8300cfc84100
(XEN) r15: ffff8300cfc84118 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 000000062ffd7000 cr2: ffff9f232d2abcc0
(XEN) ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8300cee0fc98:
(XEN) ffff8300cee0fcd8 ffff9f232d2abca8 0000000000000000 00000000002f25fd
(XEN) ffff8300cee0fcd8 ffff828c8013b409 ffff8300cfc850f8 ffff8302f25fd000
(XEN) ffff8300cee0fd08 ffff828c8013c06d ffff8300cfc84100 ffff8284075def88
(XEN) 0000000068000001 ffff8300cfc850f8 ffff8300cee0fd38 ffff828c8013de5a
(XEN) 0000000060000001 0000000068000000 ffff8284075def88 ffff8300cfc850f8
(XEN) ffff8300cee0fd68 ffff828c8013df63 ffff8284075def88 ffff8284075def88
(XEN) ffff8284075def88 ffff8300cfc84100 ffff8300cee0fdb8 ffff828c80131680
(XEN) 0000000088000000 0000000080000000 ffff8300cee0ff28 ffff8300cfc84100
(XEN) ffff8300cfc84100 00000000b31fc868 0000000000000000 0000000000000000
(XEN) ffff8300cee0fdd8 ffff828c80131a94 ffff8300cfc84100 0000000000000000
(XEN) ffff8300cee0fe08 ffff828c80105638 ffff8300cee0fe18 ffff828c80114d70
(XEN) 00000000b31fc868 fffffffffffffff3 ffff8300cee0ff08 ffff828c8010479f
(XEN) ffff8300cee0fe48 ffff8300cee34130 0000000000000003 0001b932a9ddc50a
(XEN) 0000000000200282 0000000000000000 0000000500000002 083ca594b7b50067
(XEN) 0832ab4c011fc898 b7dadc50b7b5d68c b7a733e400000001 00000001b79fccdc
(XEN) 080facafb31fc8c8 081361e008313e98 080797e7b76c1934 00000000b76c1950
(XEN) b7da802c00000060 b761db6c00000000 0805946cb31fc8e8 b7da802cb761db6c
(XEN) b7dab6a000000000 00000002b5d5451c a5dba1eea5dba1ee 0000001f00000000
(XEN) ffff8300cee0fee8 ffff8300cee34100 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00007cff311f00b7 ffff828c801bdd50
(XEN) Xen call trace:
(XEN) [<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
(XEN) [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
(XEN) [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
(XEN) [<ffff828c8013df63>] put_page_type+0x96/0x107
(XEN) [<ffff828c80131680>] relinquish_memory+0xce/0x262
(XEN) [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
(XEN) [<ffff828c80105638>] domain_kill+0x77/0x164
(XEN) [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
(XEN) [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
(XEN)
(XEN) Pagetable walk from ffff9f232d2abcc0:
(XEN) L4[0x13e] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff9f232d2abcc0
(XEN) ****************************************
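The "Pagetable walk" and "error_code" lines in the dump above can be cross-checked by hand. The following is a minimal sketch, not Xen code: it splits an x86-64 linear address into its four 9-bit page-table indices and decodes the low bits of a page-fault error code. The function names are hypothetical and chosen only for illustration.

```python
# Sketch only: helper names are illustrative, not taken from Xen.

def pt_indices(linear_addr: int) -> dict:
    """Split a 4-level x86-64 linear address into table indices and offset."""
    return {
        "l4": (linear_addr >> 39) & 0x1FF,
        "l3": (linear_addr >> 30) & 0x1FF,
        "l2": (linear_addr >> 21) & 0x1FF,
        "l1": (linear_addr >> 12) & 0x1FF,
        "offset": linear_addr & 0xFFF,
    }


def decode_error_code(code: int) -> dict:
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),         # 0 means the access was a read
        "user_mode": bool(code & 0x4),     # 0 means supervisor-mode access
    }


fault = 0xFFFF9F232D2ABCC0  # "Faulting linear address" from the dump
idx = pt_indices(fault)
print(hex(idx["l4"]))             # 0x13e, matching "L4[0x13e]" in the walk
print(decode_error_code(0x0000))  # all False: supervisor read of a non-present page
```

The L4 slot for that address is empty in the walk, so the access faults at the top level of the page tables.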
* Re: Xen 3.2.1-rc1: FATAL PAGE FAULT
2008-04-03 4:34 Xen 3.2.1-rc1: FATAL PAGE FAULT Christopher S. Aker
@ 2008-04-03 14:04 ` Christopher S. Aker
2008-04-03 15:55 ` Keir Fraser
0 siblings, 1 reply; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-03 14:04 UTC (permalink / raw)
To: xen devel; +Cc: Keir Fraser
Keir Fraser wrote:
> On 3/4/08 14:27, "Christopher S. Aker" <caker@theshore.net> wrote:
>
>> I misspoke, dom0 was the 2.6.16.33. domUs were a mix of 2.6.24.3
>> pv_ops, and 2.6.18.8. We have about a dozen of these boxes deployed
>> with this version, each with 30-40 domains just doing their thing --
>> nothing crazy.
>
> That's interesting. 2.6.24 is less tested than other Linux kernels, and
> being pv_ops it is quite different. It's not unlikely to have corner-case
> bugs that crash it or, worst case, tickle dormant problems in the
> hypervisor itself.
>
>> Maybe the symbols would help just a little bit? In any case, here are
>> the files:
>>
>> http://theshore.net/~caker/xen/BUGfatal_page_fault/
>
> I will take a look. It might help narrow down the possibilities a bit.
>
>> I guess I'll set up a thrash test environment full of nothing but
>> domains looping crashme and make -j kernel builds and the like. Sounds
>> like fun.
>
> Okay, is this a bug you've seen exactly once so far? That would be
> annoying!
So far just the one time.
We just took Xen out of (a three year) beta, and so we're gearing up for
a large deployment and need to eliminate any potential host/hypervisor
crashes. I can deal with domain bugs, but having the whole box go down
is painful. Needless to say, I'm anxious to get this fixed, and will
help in any way I can. Can I provide anything else that you can think of?
In the meantime, we'll work up a thrash-xen box.
Thanks,
-Chris
* Re: Xen 3.2.1-rc1: FATAL PAGE FAULT
2008-04-03 14:04 ` Christopher S. Aker
@ 2008-04-03 15:55 ` Keir Fraser
2008-04-22 18:19 ` Xen 3.2.1-rc5: " Christopher S. Aker
0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2008-04-03 15:55 UTC (permalink / raw)
To: Christopher S. Aker, xen devel
On 3/4/08 15:04, "Christopher S. Aker" <caker@theshore.net> wrote:
>>> Maybe the symbols would help just a little bit? In any case, here are
>>> the files:
>>>
>>> http://theshore.net/~caker/xen/BUGfatal_page_fault/
>>
>> I will take a look. It might help narrow down the possibilities a bit.
>>
My analysis is that the hypervisor crashed because one of the entries in a
dying guest's third-level page directory has the present bit (bit 0) set,
yet the physical address mapped by that entry is 0xb72dedde51000. That is a
rather large and obviously bogus number! It causes us to access way off the
end of an array indexed by physical address, resulting in a fatal page
fault.
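The arithmetic behind this can be checked against the register dump in the first message. The sketch below is not Xen code, and its constants are assumptions inferred from the registers rather than confirmed anywhere in this thread: the array base is taken as rbx - rax, the element size as rax / rdx (40 bytes), and the offset of the faulting read as cr2 - rbx. The name `describe` is hypothetical.

```python
# Sketch only: constants are inferred from the register dump, not from Xen source.
PAGE_SHIFT = 12
FRAME_TABLE_BASE = 0xFFFF828400000000  # assumption: rbx - rax
ENTRY_SIZE = 40                        # assumption: rax / rdx
FIELD_OFFSET = 0x18                    # assumption: cr2 - rbx


def describe(mfn: int) -> dict:
    """Show where indexing the per-page array by frame number `mfn` lands."""
    entry = FRAME_TABLE_BASE + mfn * ENTRY_SIZE
    return {
        "phys_addr": mfn << PAGE_SHIFT,
        "array_offset": mfn * ENTRY_SIZE,
        "entry_addr": entry,
        "fault_addr": entry + FIELD_OFFSET,
    }


info = describe(0xB72DEDDE51)  # rdx in the first trace
for name, value in info.items():
    print(f"{name:>12}: {value:#x}")
```

Under those assumptions the computed values line up with the dump: `phys_addr` is the 0xb72dedde51000 quoted above, `array_offset` equals rax, `entry_addr` equals rbx, and `fault_addr` equals cr2, the faulting linear address.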
Obviously the question is: Where did the bogus address come from?
That's going to be rather hard to answer without finding a more reliable
repro of the bug, and then adding some hypervisor tracing.
-- Keir
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-03 15:55 ` Keir Fraser
@ 2008-04-22 18:19 ` Christopher S. Aker
2008-04-22 18:46 ` Keir Fraser
0 siblings, 1 reply; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-22 18:19 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen devel
Keir Fraser wrote:
> That's going to be rather hard to answer without finding a more reliable
> repro of the bug, and then adding some hypervisor tracing.
Here are two more Xen traces with this problem. These always appear to
occur after we're forced to destroy a domain. The first trace is a
DoubleDump<tm> and has something new in the second dump...
http://www.theshore.net/~caker/xen/build-1.11/
I still don't have a method to reproduce, but since we're hitting this
with some frequency, would it be worth it to stick in some extra
debugging now?
====== First trace ======
----[ Xen-3.2.1-rc5 x86_64 debug=y Not tainted ]----
CPU: 1
RIP: e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
RFLAGS: 0000000000210286 CONTEXT: hypervisor
rax: 00001da2f4162bf0 rbx: ffffa026f4162bf0 rcx: 0000000080000000
rdx: 000000bdac808de6 rsi: 0000000000402fe3 rdi: ffffa026f4162bf0
rbp: ffff8300cf13fbf8 rsp: ffff8300cf13fbd8 r8: 0000000000000000
r9: 00000000deadbeef r10: ffff828c801c5bf0 r11: 0000000000000000
r12: 0000000000000000 r13: ffffa026f4162bf0 r14: 0000000000402fe3
r15: ffff82840a077b78 cr0: 000000008005003b cr4: 00000000000026b0
cr3: 000000062ffdd000 cr2: ffffa026f4162c08
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0000 cs: e008
Xen stack trace from rsp=ffff8300cf13fbd8:
0000000000000002 ffffa026f4162bf0 0000000000000000 ffff8300cee48100
ffff8300cf13fc18 ffff828c8013b3bb 0000000000200202 ffff830402fe3000
ffff8300cf13fc58 ffff828c8013bfcd 00000000cee48100 ffff8300cee48100
ffff82840a077b78 000000004c000001 ffff8300cee48100 ffff8300cee48118
ffff8300cf13fc88 ffff828c8013de4a 0000000044000001 000000004c000000
ffff82840a077b78 ffff8300cee48100 ffff8300cf13fcb8 ffff828c8013df63
00007cff30ec0337 ffff82840a077b78 0000000000000003 00000000004011a4
ffff8300cf13fcd8 ffff828c8013b409 ffff8300cf13fd68 ffff8304011a4018
ffff8300cf13fd08 ffff828c8013c06d ffff8300cee48100 ffff82840a02c1a0
0000000068000001 ffff8300cee490f8 ffff8300cf13fd38 ffff828c8013de5a
0000000060000001 0000000068000000 ffff82840a02c1a0 ffff8300cee490f8
ffff8300cf13fd68 ffff828c8013df63 ffff82840a02c1a0 ffff82840a02c1a0
ffff82840a02c1a0 ffff8300cee48100 ffff8300cf13fdb8 ffff828c80131680
0000000088000000 0000000080000000 ffff8300cf13ff28 ffff8300cee48100
ffff8300cee48100 00000000b4dfc508 0000000000000000 0000000000000000
ffff8300cf13fdd8 ffff828c80131a94 ffff8300cee48100 0000000000000000
ffff8300cf13fe08 ffff828c80105638 ffff82840f448b58 ffff8300cf13fe28
00000000b4dfc508 fffffffffffffff3 ffff8300cf13ff08 ffff828c8010479f
00000000000000fb ffff8300cee3a130 ffff8300cf13fe68 ffff828c8011c746
0000000000200282 ffff8300ceefe118 0000000500000002 083010acb7ab000a
Xen call trace:
[<ffff828c8013dee4>] put_page_type+0x17/0x107
[<ffff828c8013b3bb>] put_page_from_l2e+0x3f/0x4e
[<ffff828c8013bfcd>] free_l2_table+0xa6/0xce
[<ffff828c8013de4a>] free_page_type+0x1c4/0x247
[<ffff828c8013df63>] put_page_type+0x96/0x107
[<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
[<ffff828c8013c06d>] free_l3_table+0x78/0xc4
[<ffff828c8013de5a>] free_page_type+0x1d4/0x247
[<ffff828c8013df63>] put_page_type+0x96/0x107
[<ffff828c80131680>] relinquish_memory+0xce/0x262
[<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
[<ffff828c80105638>] domain_kill+0x77/0x164
[<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
[<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
Pagetable walk from ffffa026f4162c08:
L4[0x140] = 0000000000000000 ffffffffffffffff
****************************************
Panic on CPU 1:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffffa026f4162c08
****************************************
Reboot in five seconds...
...3 seconds later, this occurred...
Assertion '__cpus_subset(&(cpumask), &(cpu_online_map), 32)' failed at smp.c:84
----[ Xen-3.2.1-rc5 x86_64 debug=y Not tainted ]----
CPU: 0
RIP: e008:[<ffff828c80145c68>] send_IPI_mask_flat+0x29/0x9c
RFLAGS: 0000000000010002 CONTEXT: hypervisor
rax: 00000000fffffffe rbx: ffff8300cee3c100 rcx: 0000000000000003
rdx: 0000000000000040 rsi: 00000000000000fc rdi: 0000000000000004
rbp: ffff828c80237be8 rsp: ffff828c80237bd0 r8: ffff828c8024c780
r9: 0000000000000002 r10: 00000000deadbeef r11: 0000000000000000
r12: 0000000000000004 r13: 00000000000000fc r14: 0000000000000010
r15: 00001485db7a5091 cr0: 000000008005003b cr4: 00000000000026b0
cr3: 00000003ff15a000 cr2: 00000000e3015078
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
Xen stack trace from rsp=ffff828c80237bd0:
ffff8300cee3c100 0000000000000086 0000000000000000 ffff828c80237c08
ffff828c8014601a ffff8300cee30f00 0000000000000004 ffff828c80237c38
ffff828c80114da0 0000000000000004 ffff828c80137fe0 0000000000000004
ffff828c8025951c ffff828c80237c68 ffff828c80119b18 ffff828c80237c98
ffff828c80137ac2 ffff8300cee3c100 ffff8300cfdd4100 ffff828c80237c98
ffff828c80107409 00000000c0621300 ffff8300cfdd4100 ffff8300cee30f00
0000000000000000 ffff828c80237ca8 ffff828c801075c9 ffff828c80237cd8
ffff828c80137fe0 ffff828c80259500 ffff828c8025951c 0000000000000098
ffff828c80237d38 ffff828c80237d28 ffff828c80137ac2 0000000000000082
0000000000000000 ffff828c80237d18 0000000000000009 00000000ffffffff
ffff828c801ebb60 ffff828c8020e100 00001485db7a5091 00007d737fdc82a7
ffff828c801336e6 00001485db7a5091 ffff828c8020e100 ffff828c801ebb60
00000000ffffffff ffff828c80237de8 0000000000000009 0000000000000000
00000000deadbeef 0000000000000000 0000000000000000 000000007d9b040e
000000007d8a4358 000000000000290c 00000000001e8480 00000000000003e8
0000009800000000 ffff828c8012ac48 000000000000e008 0000000000000216
ffff828c80237de8 0000000000000000 00001485db7a5091 ffff828c80237e08
ffff828c80146257 ffff828c80237f28 ffff828c8020e534 ffff828c80237e28
ffff828c80145b9a ffff828c80237f28 ffff828c8020e534 ffff828c80237e38
ffff828c80146312 00007d737fdc8197 ffff828c801347a0 00001485db7a5091
Xen call trace:
[<ffff828c80145c68>] send_IPI_mask_flat+0x29/0x9c
[<ffff828c8014601a>] smp_send_event_check_mask+0x3e/0x40
[<ffff828c80114da0>] csched_vcpu_wake+0x242/0x259
[<ffff828c80119b18>] vcpu_wake+0x12d/0x248
[<ffff828c80107409>] evtchn_set_pending+0xe5/0x15c
[<ffff828c801075c9>] send_guest_pirq+0x61/0x63
[<ffff828c80137fe0>] __do_IRQ_guest+0x19c/0x1b2
[<ffff828c80137ac2>] do_IRQ+0x5a/0x1a7
[<ffff828c801336e6>] common_interrupt+0x26/0x30
[<ffff828c8012ac48>] __udelay+0x30/0x48
[<ffff828c80146257>] smp_send_stop+0x39/0x67
[<ffff828c80145b9a>] machine_restart+0x4f/0xc5
[<ffff828c80146312>] smp_call_function_interrupt+0x79/0xa7
[<ffff828c801347a0>] call_function_interrupt+0x30/0x40
[<ffff828c8012c73b>] default_idle+0x2f/0x34
[<ffff828c8012c7ff>] idle_loop+0x70/0x77
****************************************
Panic on CPU 0:
Assertion '__cpus_subset(&(cpumask), &(cpu_online_map), 32)' failed at smp.c:84
****************************************
Reboot in five seconds...
====== Second trace ======
----[ Xen-3.2.1-rc5 x86_64 debug=y Not tainted ]----
CPU: 0
RIP: e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
RFLAGS: 0000000000210286 CONTEXT: hypervisor
rax: 00000a51169fd050 rbx: ffff8cd5169fd050 rcx: 0000000080000000
rdx: 0000004206f73202 rsi: 00000000004041e1 rdi: ffff8cd5169fd050
rbp: ffff828c80237bf8 rsp: ffff828c80237bd8 r8: 0000000000000000
r9: 00000000deadbeef r10: ffff828c801c5bf0 r11: 0000000000000000
r12: 0000000000000000 r13: ffff8cd5169fd050 r14: 00000000004041e1
r15: ffff82840a0a4b28 cr0: 000000008005003b cr4: 00000000000026b0
cr3: 000000062ffd9000 cr2: ffff8cd5169fd068
ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0000 cs: e008
Xen stack trace from rsp=ffff828c80237bd8:
ffff828409df5d01 ffff8cd5169fd050 0000000000000000 ffff8300ceea0100
ffff828c80237c18 ffff828c8013b3bb 0000000400000004 ffff8304041e1000
ffff828c80237c58 ffff828c8013bfcd 00000003f2f24027 ffff8300ceea0100
ffff82840a0a4b28 0000000048000001 ffff8300ceea0100 ffff8300ceea0118
ffff828c80237c88 ffff828c8013de4a 0000000040000001 0000000048000000
ffff82840a0a4b28 ffff8300ceea0100 ffff828c80237cb8 ffff828c8013df63
0000000000000000 ffff82840a0a4b28 0000000000000000 0000000000402dd4
ffff828c80237cd8 ffff828c8013b409 ffff8300ceea0100 ffff830402dd4000
ffff828c80237d08 ffff828c8013c06d ffff8300ceea0100 ffff82840a072920
0000000068000001 ffff8300ceea10f8 ffff828c80237d38 ffff828c8013de5a
0000000060000001 0000000068000000 ffff82840a072920 ffff8300ceea10f8
ffff828c80237d68 ffff828c8013df63 ffff82840a072920 ffff82840a072920
ffff82840a072920 ffff8300ceea0100 ffff828c80237db8 ffff828c80131680
0000000088000000 0000000080000000 ffff828c80237f28 ffff8300ceea0100
ffff8300ceea0100 00000000b2cf9868 0000000000000000 0000000000000000
ffff828c80237dd8 ffff828c80131a94 ffff8300ceea0100 0000000000000000
ffff828c80237e08 ffff828c80105638 ffff828c80237e18 ffff828c80114da0
00000000b2cf9868 fffffffffffffff3 ffff828c80237f08 ffff828c8010479f
ffff828c80237e48 ffff8300cee36130 0000000000000000 000078cdfb20f27f
0000000000200282 0000000000000000 0000000500000002 081d66ecb7af0010
Xen call trace:
[<ffff828c8013dee4>] put_page_type+0x17/0x107
[<ffff828c8013b3bb>] put_page_from_l2e+0x3f/0x4e
[<ffff828c8013bfcd>] free_l2_table+0xa6/0xce
[<ffff828c8013de4a>] free_page_type+0x1c4/0x247
[<ffff828c8013df63>] put_page_type+0x96/0x107
[<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
[<ffff828c8013c06d>] free_l3_table+0x78/0xc4
[<ffff828c8013de5a>] free_page_type+0x1d4/0x247
[<ffff828c8013df63>] put_page_type+0x96/0x107
[<ffff828c80131680>] relinquish_memory+0xce/0x262
[<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
[<ffff828c80105638>] domain_kill+0x77/0x164
[<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
[<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
Pagetable walk from ffff8cd5169fd068:
L4[0x119] = 0000000000000000 ffffffffffffffff
****************************************
Panic on CPU 0:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffff8cd5169fd068
****************************************
Reboot in five seconds...
-Chris
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-22 18:19 ` Xen 3.2.1-rc5: " Christopher S. Aker
@ 2008-04-22 18:46 ` Keir Fraser
2008-04-22 19:39 ` Christopher S. Aker
0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2008-04-22 18:46 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: xen devel
On 22/4/08 19:19, "Christopher S. Aker" <caker@theshore.net> wrote:
> Here are two more Xen traces with this problem. These always appear to
> occur after we're forced to destroy a domain. The first trace is a
> DoubleDump<tm> and has something new in the second dump...
>
> http://www.theshore.net/~caker/xen/build-1.11/
>
> I still don't have a method to reproduce, but since we're hitting this
> with some frequency, would it be worth it to stick in some extra
> debugging now?
The second crash is just some overzealous asserting. Easily fixed but also
not very interesting, unfortunately.
The two main backtraces are exactly the same bug as you saw last time.
Except in this case you have bogus nonsense in a pair of L2 pagetable
entries, whereas last time the garbage was in an L3 entry.
My best guess just now, seeing as no one else has reported ever seeing this,
is that maybe you have a bad driver or hardware corrupting memory? Obviously
that's a bit of a stab in the dark though.
Have you seen this particular type of crash on multiple different machines?
If so, are they different types of machine?
-- Keir
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-22 18:46 ` Keir Fraser
@ 2008-04-22 19:39 ` Christopher S. Aker
2008-04-22 20:21 ` Keir Fraser
0 siblings, 1 reply; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-22 19:39 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen devel
Keir Fraser wrote:
> The second crash is just some overzealous asserting. Easily fixed but also
> not very interesting, unfortunately.
>
> The two main backtraces are exactly the same bug as you saw last time.
> Except in this case you have bogus nonsense in a pair of L2 pagetable
> entries, whereas last time the garbage was in an L3 entry.
>
> My best guess just now, seeing as no one else has reported ever seeing this,
> is that maybe you have a bad driver or hardware corrupting memory? Obviously
> that's a bit of a stab in the dark though.
>
> Have you seen this particular type of crash on multiple different machines?
> If so, are they different types of machine?
Two machines thus far, both are of identical software and hardware
configurations.
Now that it looks like the 3ware issues have been corrected in post-Xen
3.1 dom0, I'll update our boxes from 2.6.16.x to 2.6.18.8 and hope for
the best.
Thanks for your help so far.
-Chris
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-22 19:39 ` Christopher S. Aker
@ 2008-04-22 20:21 ` Keir Fraser
2008-04-28 14:02 ` Christopher S. Aker
0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2008-04-22 20:21 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: xen devel
On 22/4/08 20:39, "Christopher S. Aker" <caker@theshore.net> wrote:
>> My best guess just now, seeing as no one else has reported ever seeing this,
>> is that maybe you have a bad driver or hardware corrupting memory? Obviously
>> that's a bit of a stab in the dark though.
>>
>> Have you seen this particular type of crash on multiple different machines?
>> If so, are they different types of machine?
>
> Two machines thus far, both are of identical software and hardware
> configurations.
Have you been running this type of workload on a variety of hardware, or are
you limited in the range of types of hardware that you're testing on? This
might indicate whether it is significant that you have only seen the crash
on a single hardware type.
-- Keir
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-22 20:21 ` Keir Fraser
@ 2008-04-28 14:02 ` Christopher S. Aker
2008-04-28 14:44 ` Keir Fraser
0 siblings, 1 reply; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-28 14:02 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen devel
Keir Fraser wrote:
> On 22/4/08 20:39, "Christopher S. Aker" <caker@theshore.net> wrote:
>
>>> My best guess just now, seeing as no one else has reported ever seeing this,
>>> is that maybe you have a bad driver or hardware corrupting memory? Obviously
>>> that's a bit of a stab in the dark though.
>>>
>>> Have you seen this particular type of crash on multiple different machines?
>>> If so, are they different types of machine?
>> Two machines thus far, both are of identical software and hardware
>> configurations.
>
> Have you been running this type of workload on a variety of hardware, or are
> you limited in the range of types of hardware that you're testing on? This
> might indicate whether it is significant that you have only seen the crash
> on a single hardware type.
Make that three machines. They're all of the same config. This
identical hardware config runs fine under non-Xen. It also only occurs
when a domain is being destroyed, so I wouldn't suspect this is a driver
issue or memory corruption given the pattern. Xen is most suspect, in
my mind.
Will you provide me with some debugging code that'll make these
occurrences more useful in tracking down the problem the next time it
triggers?
(XEN) Pagetable walk from 00000000c16e3f30:
(XEN) L4[0x000] = 00000002bfe8d027 00000000000258e3
(XEN) L3[0x003] = 646c696843206120 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 84 (vcpu#2) crashed on cpu#1:
(XEN) ----[ Xen-3.2.1-rc1 x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: 0061:[<00000000c0101347>]
(XEN) RFLAGS: 0000000000010246 CONTEXT: guest
(XEN) rax: 0000000000000000 rbx: 00000000deadbeef rcx: 00000000deadbeef
(XEN) rdx: 00000000deadbeef rsi: 00000000deadbeef rdi: 00000000c7006030
(XEN) rbp: 00000000c16e3fac rsp: 00000000c16e3f38 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 000000060f4c8000 cr2: 00000000c0101347
(XEN) ds: 007b es: 007b fs: 0000 gs: 0000 ss: 0069 cs: 0061
(XEN) Guest stack trace from esp=c16e3f38:
(XEN) Fault while accessing guest memory.
(XEN) ----[ Xen-3.2.1-rc1 x86_64 debug=y Not tainted ]----
(XEN) CPU: 5
(XEN) RIP: e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) RFLAGS: 0000000000210282 CONTEXT: hypervisor
(XEN) rax: 000006162f512f98 rbx: ffff889a2f512f98 rcx: 6765746143206568
(XEN) rdx: 00000026f4620797 rsi: 00000000002bfe8d rdi: ffff889a2f512f98
(XEN) rbp: ffff8300cfde7cb8 rsp: ffff8300cfde7c98 r8: 0000000000000000
(XEN) r9: 00000000deadbeef r10: ffff828c801c5bf0 r11: 0000000000000000
(XEN) r12: 0000000000000001 r13: ffff889a2f512f98 r14: ffff8300cee88100
(XEN) r15: ffff8300cee88118 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 000000062ffdf000 cr2: ffff889a2f512fb0
(XEN) ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8300cfde7c98:
(XEN) ffff8300cfde7ca8 ffff889a2f512f98 0000000000000001 00000000002bfe8d
(XEN) ffff8300cfde7cd8 ffff828c8013b409 ffff8300cee88100 ffff8302bfe8d008
(XEN) ffff8300cfde7d08 ffff828c8013c06d ffff8300cee88100 ffff828406dfc608
(XEN) 0000000068000001 ffff8300cee890f8 ffff8300cfde7d38 ffff828c8013de5a
(XEN) 0000000060000001 0000000068000000 ffff828406dfc608 ffff8300cee890f8
(XEN) ffff8300cfde7d68 ffff828c8013df63 ffff828406dfc608 ffff828406dfc608
(XEN) ffff828406dfc608 ffff8300cee88100 ffff8300cfde7db8 ffff828c80131680
(XEN) 0000000088000000 0000000080000000 ffff8300cfde7f28 ffff8300cee88100
(XEN) ffff8300cee88100 00000000b4dfb508 0000000000000000 0000000000000000
(XEN) ffff8300cfde7dd8 ffff828c80131a94 ffff8300cee88100 0000000000000000
(XEN) ffff8300cfde7e08 ffff828c80105638 ffff8300cfde7e08 ffff828c8014601a
(XEN) 00000000b4dfb508 fffffffffffffff3 ffff8300cfde7f08 ffff828c8010479f
(XEN) 0000000000000001 0000000000000000 0000000000000001 0000000000000000
(XEN) ffff8300cfde7e68 0000000000200286 0000000500000002 082ebba4b7b80054
(XEN) 0836d2a401dfb538 b7ddfc50b7b8f68c b7aa53e400000001 00000001b7a2ecdc
(XEN) 080facafb4dfb568 081361e0082f17c0 080797e7b775bf0c 00000000b775bf28
(XEN) b7dda02c00000060 b76f084c00000000 0805946cb4dfb588 b7dda02cb76f084c
(XEN) b7ddd6a000000000 00000002b765eeac a5dba1eea5dba1ee 0000001f00000000
(XEN) 0000000000000010 ffff8300cee3c100 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00007cff302180b7 ffff828c801bdd50
(XEN) Xen call trace:
(XEN) [<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
(XEN) [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
(XEN) [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
(XEN) [<ffff828c8013df63>] put_page_type+0x96/0x107
(XEN) [<ffff828c80131680>] relinquish_memory+0xce/0x262
(XEN) [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
(XEN) [<ffff828c80105638>] domain_kill+0x77/0x164
(XEN) [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
(XEN) [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
(XEN)
(XEN) Pagetable walk from ffff889a2f512fb0:
(XEN) L4[0x111] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff889a2f512fb0
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Thanks,
-Chris
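One detail in the dump above can be checked mechanically: the L3 entry printed in the guest page-table walk (646c696843206120) and the rcx value in the hypervisor register dump (6765746143206568) are both made up entirely of printable ASCII bytes when read in little-endian memory order. The sketch below only performs that byte decoding; what it implies about the origin of the bad entries is not established anywhere in this thread. The helper name is hypothetical.

```python
# Sketch only: decode two 64-bit values from the dump as raw memory bytes.

def as_memory_bytes(value: int) -> bytes:
    """Return the 8 bytes of `value` in x86 (little-endian) memory order."""
    return value.to_bytes(8, "little")


samples = [
    ("L3[0x003]", 0x646C696843206120),  # from the guest page-table walk
    ("rcx", 0x6765746143206568),        # from the hypervisor register dump
]

for label, value in samples:
    text = as_memory_bytes(value).decode("ascii", errors="replace")
    print(f"{label}: {value:#018x} -> {text!r}")
```

The two values decode to ' a Child' and 'he Categ' respectively.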
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-28 14:02 ` Christopher S. Aker
@ 2008-04-28 14:44 ` Keir Fraser
2008-04-28 15:00 ` Christopher S. Aker
0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2008-04-28 14:44 UTC (permalink / raw)
To: Christopher S. Aker; +Cc: xen devel
On 28/4/08 15:02, "Christopher S. Aker" <caker@theshore.net> wrote:
> Make that three machines. They're all of the same config. This
> identical hardware config runs fine under non-Xen. It also only occurs
> when a domain is being destroyed, so I wouldn't suspect this is a driver
> issue or memory corruption given the pattern. Xen is most suspect, in
> my mind.
>
> Will you provide me with some debugging code that'll make these
> occurrences more useful in tracking down the problem the next time it
> triggers?
I suggest you try repro'ing on a slightly different hardware configuration.
For example, a different storage controller. Did you repro with a 2.6.18
dom0 yet?
-- Keir
* Re: Xen 3.2.1-rc5: FATAL PAGE FAULT
2008-04-28 14:44 ` Keir Fraser
@ 2008-04-28 15:00 ` Christopher S. Aker
0 siblings, 0 replies; 10+ messages in thread
From: Christopher S. Aker @ 2008-04-28 15:00 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen devel
Keir Fraser wrote:
> On 28/4/08 15:02, "Christopher S. Aker" <caker@theshore.net> wrote:
>> Will you provide me with some debugging code that'll make these
>> occurrences more useful in tracking down the problem the next time it
>> triggers?
>
> I suggest you try repro'ing on a slightly different hardware configuration.
> For example, a different storage controller. Did you repro with a 2.6.18
> dom0 yet?
All of our machines are using 3ware RAID cards, so trying this on
alternate hardware isn't an option.
We haven't hit this on 2.6.18 dom0 yet. Newly deployed machines and
boxes that crash are being updated to 2.6.18 dom0. I'll keep you posted :)
Thanks,
-Chris