* [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
       [not found] <20070825154617.GA27424@wavehammer.waldi.eu.org>
From: Bastian Blank @ 2007-08-29 10:02 UTC
To: xen-devel

Hi folks

After disabling the MCE support, the kernel crashes:

| (XEN) domain_crash_sync called from entry.S (ff16f829)
| (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
| (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----
| (XEN) CPU: 0
| (XEN) EIP: 0061:[<c0107a20>]
| (XEN) EFLAGS: 00010286 CONTEXT: guest
| (XEN) eax: 00010061 ebx: 00010061 ecx: c027ac8c edx: 0000000d
| (XEN) esi: ffffffff edi: 00000000 ebp: 00000020 esp: c004000c
| (XEN) cr0: 80050033 cr4: 000006f0 cr3: 0bd89000 cr2: 00010061
| (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: e021 cs: 0061
| (XEN) Guest stack trace from esp=c004000c:
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
| (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
| (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
| (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
| (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
| (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
| (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
| (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061

There seem to be two code addresses in the trace:

| c027abd0 T page_fault
| c0107a20 T invalid_op

The address in ecx (c027ac8c) points to

| mov %eax,%fs

Bastian
--
I'm a soldier, not a diplomat. I can only tell the truth.
		-- Kirk, "Errand of Mercy", stardate 3198.9

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Keir Fraser @ 2007-08-29 10:14 UTC
To: Bastian Blank, xen-devel

You mean 2.6.23-rc3? And what do you mean by 'disabling the MCE support'?

 -- Keir

On 29/8/07 11:02, "Bastian Blank" <bastian@waldi.eu.org> wrote:

> Hi folks
>
> After disabling the MCE support, the kernel crashes:
> | (XEN) domain_crash_sync called from entry.S (ff16f829)
> | (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
> | (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----
> | (XEN) CPU: 0
> | (XEN) EIP: 0061:[<c0107a20>]
> | (XEN) EFLAGS: 00010286 CONTEXT: guest
> | (XEN) eax: 00010061 ebx: 00010061 ecx: c027ac8c edx: 0000000d
> | (XEN) esi: ffffffff edi: 00000000 ebp: 00000020 esp: c004000c
> | (XEN) cr0: 80050033 cr4: 000006f0 cr3: 0bd89000 cr2: 00010061
> | (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: e021 cs: 0061
> | (XEN) Guest stack trace from esp=c004000c:
> | (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
> | (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
> | (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
> | (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
> | (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
> | (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
> | (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
> | (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
> | (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
> | (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
> | (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
> | (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
> | (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
> | (XEN) 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086
> | (XEN) c027abd0 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0
> | (XEN) 00010061 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061
> | (XEN) 00010086 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086
> | (XEN) 00000002 c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002
> | (XEN) c0107a20 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20
> | (XEN) 00010061 00010086 c027abd0 00010061 00010086 00000002 c0107a20 00010061
>
> There seem to be two code addresses in the trace:
> | c027abd0 T page_fault
> | c0107a20 T invalid_op
>
> The address in ecx (c027ac8c) points to
> | mov %eax,%fs
>
> Bastian

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Bastian Blank @ 2007-08-29 10:27 UTC
To: Keir Fraser
Cc: xen-devel

On Wed, Aug 29, 2007 at 11:14:19AM +0100, Keir Fraser wrote:
> You mean 2.6.23-rc3?

Yes.

> And what do you mean by 'disabling the MCE support'?

CONFIG_X86_MCE. If it is enabled, the kernel tries to disable MCE on boot,
and the CR4 write results in a GPF.

Bastian
--
Change is the essential process of all existence.
		-- Spock, "Let That Be Your Last Battlefield", stardate 5730.2
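
For readers following the dump in this thread, the `cr4: 000006f0` value can be decoded with the architectural CR4 bit layout. The sketch below is illustrative only: the helper names `cr4_has` and `cr4_describe` are made up for this example, and the masks are the values documented in the Intel SDM (they match Linux's `X86_CR4_*` constants). Only bits 0-10 are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* CR4 bits 0-10 as documented in the Intel SDM. */
struct cr4_bit { uint32_t mask; const char *name; };

static const struct cr4_bit cr4_bits[] = {
    { 0x001, "VME" }, { 0x002, "PVI" }, { 0x004, "TSD" }, { 0x008, "DE" },
    { 0x010, "PSE" }, { 0x020, "PAE" }, { 0x040, "MCE" }, { 0x080, "PGE" },
    { 0x100, "PCE" }, { 0x200, "OSFXSR" }, { 0x400, "OSXMMEXCPT" },
};

/* Nonzero if every bit in mask is set in cr4. */
int cr4_has(uint32_t cr4, uint32_t mask)
{
    return (cr4 & mask) == mask;
}

/* Write the names of the set bits into buf, separated by spaces.
 * Returns how many names were written; stops early if buf is too small. */
int cr4_describe(uint32_t cr4, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    assert(buf != NULL || len == 0);
    if (len > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < sizeof(cr4_bits) / sizeof(cr4_bits[0]); i++) {
        if (!(cr4 & cr4_bits[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", cr4_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                  /* buffer full: keep what fits */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Applied to the value in the dump, `cr4_describe(0x000006f0, buf, sizeof buf)` produces `PSE PAE MCE PGE OSFXSR OSXMMEXCPT`, so the MCE bit (mask 0x040) is set in the CR4 value Xen printed for the guest.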

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Jan Beulich @ 2007-08-29 10:39 UTC
To: Bastian Blank
Cc: xen-devel

>There seem to be two code addresses in the trace:
>| c027abd0 T page_fault
>| c0107a20 T invalid_op

These aren't really meaningful, as the VM was obviously in a loop getting
repeated exceptions (and the stack pointer clearly went bad meanwhile).
You'd need to catch the state much earlier, when the first (or just very few)
of these exceptions happened, so that looking at the stack can actually
provide some insight.

Jan
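
The repetition Jan refers to can be checked mechanically. The following is a minimal sketch, not part of any Xen tooling: `stack_period` is a hypothetical helper that looks for the smallest repeat length in a list of stack words, and `sample` holds the first 16 words of the guest stack trace posted at the start of this thread. It makes no attempt to interpret what the individual words mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 16 words of the guest stack trace from the original report. */
static const uint32_t sample[16] = {
    0xc027abd0, 0x00010061, 0x00010086, 0x00000002,
    0xc0107a20, 0x00010061, 0x00010086, 0xc027abd0,
    0x00010061, 0x00010086, 0x00000002, 0xc0107a20,
    0x00010061, 0x00010086, 0xc027abd0, 0x00010061,
};

/* Smallest p (1 <= p <= n/2) such that words[i] == words[i + p] for every
 * valid i, i.e. the dump is the same p-word group repeated at least twice.
 * Returns 0 when no such p exists. */
size_t stack_period(const uint32_t *words, size_t n)
{
    assert(words != NULL || n == 0);
    for (size_t p = 1; p <= n / 2; p++) {
        size_t i = 0;
        while (i + p < n && words[i] == words[i + p])
            i++;
        if (i + p == n)
            return p;
    }
    return 0;
}
```

For the sample above the function reports a period of seven words, containing one `page_fault` address and one `invalid_op` address per group. A dump that is nothing but one repeated group carries no more information than a single copy of it, which is consistent with the advice to capture the state when the first exception occurs.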

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Bastian Blank @ 2007-08-29 11:02 UTC
To: Jan Beulich
Cc: xen-devel

On Wed, Aug 29, 2007 at 11:39:25AM +0100, Jan Beulich wrote:
> These aren't really meaningful, as the VM was obviously in a loop getting
> repeated exceptions (and the stack pointer clearly went bad meanwhile).
> You'd need to catch the state much earlier, when the first (or just very few)
> of these exceptions happened, so that looking at the stack can actually
> provide some insight.

How? There seems to be something in tools/debugger/gdb. It seems to
build a special gdbserver. Does this gdbserver stop the domain on an
exception or do I need to break explicitly?

Bastian
--
A princess should not be afraid -- not with a brave knight to protect her.
		-- McCoy, "Shore Leave", stardate 3025.3

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Jan Beulich @ 2007-08-29 11:32 UTC
To: Bastian Blank
Cc: xen-devel

>>> Bastian Blank <bastian@waldi.eu.org> 29.08.07 13:02 >>>
>On Wed, Aug 29, 2007 at 11:39:25AM +0100, Jan Beulich wrote:
>> These aren't really meaningful, as the VM was obviously in a loop getting
>> repeated exceptions (and the stack pointer clearly went bad meanwhile).
>> You'd need to catch the state much earlier, when the first (or just very few)
>> of these exceptions happened, so that looking at the stack can actually
>> provide some insight.
>
>How? There seems to be something in tools/debugger/gdb. It seems to
>build a special gdbserver. Does this gdbserver stop the domain on an
>exception or do I need to break explicitly?

I never used it, so I don't know. But you must have been building the kernel
yourself, and given that you're testing -rc kernels I also assume you're
familiar with modifying the kernel sources, so it shouldn't be too difficult
to e.g. remove registration of the illegal opcode handler so that Xen dumps
the VCPU state (and kills the VM) the first time such an exception occurs in
the guest (of course assuming there are no other instances of 'valid' uses of
the exception - you'd see this pretty quickly).

Jan

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Keir Fraser @ 2007-08-29 13:09 UTC
To: Jan Beulich, Bastian Blank
Cc: xen-devel

Sounds like Bastian already worked out the problem is we do not allow
X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
change that bit. Even better would be to remember guest value of that flag
and return appropriate value on reads of CR4. But that's more than required
here, I think.

Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
XENLOG_WARNING and return. That would avoid any problems for any other CR4
bits too.

 -- Keir

On 29/8/07 12:32, "Jan Beulich" <jbeulich@novell.com> wrote:

>>>> Bastian Blank <bastian@waldi.eu.org> 29.08.07 13:02 >>>
>> On Wed, Aug 29, 2007 at 11:39:25AM +0100, Jan Beulich wrote:
>>> These aren't really meaningful, as the VM was obviously in a loop getting
>>> repeated exceptions (and the stack pointer clearly went bad meanwhile).
>>> You'd need to catch the state much earlier, when the first (or just very few)
>>> of these exceptions happened, so that looking at the stack can actually
>>> provide some insight.
>>
>> How? There seems to be something in tools/debugger/gdb. It seems to
>> build a special gdbserver. Does this gdbserver stop the domain on an
>> exception or do I need to break explicitly?
>
> I never used it, so I don't know. But you must have been building the kernel
> yourself, and given that you're testing -rc kernels I also assume you're
> familiar with modifying the kernel sources, so it shouldn't be too difficult
> to e.g. remove registration of the illegal opcode handler so that Xen dumps
> the VCPU state (and kills the VM) the first time such an exception occurs in
> the guest (of course assuming there are no other instances of 'valid' uses of
> the exception - you'd see this pretty quickly).
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
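
The three ideas in Keir's message (ignore changes to bits the guest may not control, remember the guest's value for later reads, and warn instead of raising a GPF) can be combined in one small model. The sketch below is only an illustration of that combination and is not Xen code: `struct vcpu_cr4`, `guest_write_cr4`, `guest_read_cr4` and the choice of `guest_mask` are all assumptions made for this example, and `fprintf` stands in for a hypervisor log call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CR4_MCE 0x040u  /* CR4 bit 6, same value as Linux's X86_CR4_MCE */

/* Hypothetical per-vCPU state, invented for this sketch. */
struct vcpu_cr4 {
    uint32_t real;        /* value the hypervisor actually keeps in CR4 */
    uint32_t shadow;      /* value the guest last wrote; returned on reads */
    uint32_t guest_mask;  /* bits the guest is allowed to change for real */
    unsigned warnings;    /* how many writes touched a protected bit */
};

/* Emulate a guest write to CR4 without ever faulting the guest. */
void guest_write_cr4(struct vcpu_cr4 *v, uint32_t val)
{
    assert(v != NULL);

    /* Protected bits the guest is trying to flip relative to its own view. */
    uint32_t denied = (val ^ v->shadow) & ~v->guest_mask;
    if (denied) {
        v->warnings++;
        fprintf(stderr, "warning: ignoring guest change to CR4 bits %#x\n",
                (unsigned)denied);
    }

    /* Only permitted bits reach the real register. */
    v->real = (v->real & ~v->guest_mask) | (val & v->guest_mask);
    /* The guest reads back exactly what it wrote. */
    v->shadow = val;
}

uint32_t guest_read_cr4(const struct vcpu_cr4 *v)
{
    assert(v != NULL);
    return v->shadow;
}
```

Starting from `real == shadow == 0x6f0` (the CR4 value in the crash dump) with MCE outside `guest_mask`, a guest write of `0x6b0` leaves `real` at `0x6f0`, makes later reads return `0x6b0`, and logs one warning. Whether a guest should be shown a shadowed value at all is a design question the thread leaves open.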

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Bastian Blank @ 2007-08-29 14:13 UTC
To: xen-devel

On Wed, Aug 29, 2007 at 02:09:23PM +0100, Keir Fraser wrote:
> Sounds like Bastian already worked out the problem is we do not allow
> X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
> change that bit. Even better would be to remember guest value of that flag
> and return appropriate value on reads of CR4. But that's more than required
> here, I think.

What should happen with the upcoming MCE support in Xen?

> Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
> XENLOG_WARNING and return. That would avoid any problems for any other CR4
> bits too.

What is the documented behaviour if a bit is set while the machine lacks
support for it?

Bastian
--
The face of war has never changed. Surely it is more logical to heal
than to kill.
		-- Surak of Vulcan, "The Savage Curtain", stardate 5906.5

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Keir Fraser @ 2007-08-29 14:22 UTC
To: Bastian Blank, xen-devel

On 29/8/07 15:13, "Bastian Blank" <bastian@waldi.eu.org> wrote:

> On Wed, Aug 29, 2007 at 02:09:23PM +0100, Keir Fraser wrote:
>> Sounds like Bastian already worked out the problem is we do not allow
>> X86_CR4_MCE to be cleared. Xen should probably just ignore attempts to
>> change that bit. Even better would be to remember guest value of that flag
>> and return appropriate value on reads of CR4. But that's more than required
>> here, I think.
>
> What should happen with the upcoming MCE support in Xen?

We'll cross that bridge when we come to it. ;-) The default will be that it
all continues to be handled by Xen, and explicit paravirtualisations will be
introduced to allow dom0, and perhaps domUs also, to get involved.

>> Actually, instead of GPF'ing on 'bad' CR4 writes, we could just log a
>> XENLOG_WARNING and return. That would avoid any problems for any other CR4
>> bits too.
>
> What is the documented behaviour if a bit is set while the machine lacks
> support for it?

That should GPF. But no OS actually probes for features that way, as there
are perfectly good CPUID feature flags for that. Linux in particular will
not be happy if any of its writes to CR4 results in a GPF.

 -- Keir
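
As an illustration of probing via CPUID rather than by writing CR4 and seeing whether it faults, the sketch below filters a desired CR4 value against the CPUID leaf 1 EDX feature word. It handles only the MCE flag, the function name `cr4_filter_by_cpuid` is hypothetical, and the EDX value is passed in as a plain argument so that the example does not depend on running on x86. The bit positions are the architectural ones from the Intel SDM (CPUID.1:EDX bit 7 for MCE, CR4 bit 6 for CR4.MCE).

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_EDX_MCE (1u << 7)  /* CPUID leaf 1, EDX bit 7: MCE supported */
#define CR4_MCE        (1u << 6)  /* CR4 bit 6, as in Linux's X86_CR4_MCE */
#define CR4_KNOWN      0x7ffu     /* bits 0-10, all this sketch considers */

/* Return `wanted` with CR4.MCE dropped when CPUID does not advertise MCE,
 * so the caller never attempts to set an unsupported bit. */
uint32_t cr4_filter_by_cpuid(uint32_t wanted, uint32_t cpuid1_edx)
{
    assert((wanted & ~CR4_KNOWN) == 0);  /* sketch handles only bits 0-10 */
    if (!(cpuid1_edx & CPUID1_EDX_MCE))
        wanted &= ~CR4_MCE;
    return wanted;
}
```

On GCC or Clang, the EDX word could be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; that call is left out of the block so the example stays portable.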

* Re: [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs
From: Bastian Blank @ 2007-08-29 13:23 UTC
To: xen-devel

On Wed, Aug 29, 2007 at 12:02:59PM +0200, Bastian Blank wrote:
> | (XEN) domain_crash_sync called from entry.S (ff16f829)
> | (XEN) Domain 1 (vcpu#0) crashed on cpu#0:
> | (XEN) ----[ Xen-3.1.0 x86_32p debug=n Not tainted ]----

I can no longer reproduce it.

Bastian
--
We do not colonize. We conquer. We rule. There is no other way for us.
		-- Rojan, "By Any Other Name", stardate 4657.5

end of thread

Thread overview: 10+ messages
     [not found] <20070825154617.GA27424@wavehammer.waldi.eu.org>
2007-08-29 10:02 ` [RESEND] Linux 2.3.23-rc3 on Xen 3.1 crashs Bastian Blank
2007-08-29 10:14   ` Keir Fraser
2007-08-29 10:27     ` Bastian Blank
2007-08-29 10:39   ` Jan Beulich
2007-08-29 11:02     ` Bastian Blank
2007-08-29 11:32       ` Jan Beulich
2007-08-29 13:09         ` Keir Fraser
2007-08-29 14:13           ` Bastian Blank
2007-08-29 14:22             ` Keir Fraser
2007-08-29 13:23   ` Bastian Blank