* Reproducible system crash
@ 2005-03-15 14:36 ` Peter Joanes
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Joanes @ 2005-03-15 14:36 UTC (permalink / raw)
  To: xen-devel

Hello,

My system instantly reboots every time a particular program is run as a user 
(unprivileged) process in dom0. I haven't tried running it under a domU 
kernel yet and, of course, can't even be sure that it is a Xen-specific 
problem.
I'm using Xen 2.0.4 with kernel 2.6.10 on a recent Gentoo installation (gcc 
3.4.3).

Unfortunately the program at issue is proprietary - it's IE6 running on wine 
(v. 20050211).
I had copied over a previously working installation (user's ".wine" 
directory), but I get the same result at the end of the ie6setup install 
stage of the instructions at:
	http://frankscorner.org/index.php?p=ie6

I'd be grateful for any suggestions of ways to debug this.


-	Peter Joanes.


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click


* RE: Reproducible system crash
@ 2005-03-15 16:04 Ian Pratt
  2005-03-15 16:30   ` Peter Joanes
  0 siblings, 1 reply; 15+ messages in thread
From: Ian Pratt @ 2005-03-15 16:04 UTC (permalink / raw)
  To: Peter Joanes, xen-devel; +Cc: ian.pratt

> My system instantly reboots every time a particular program is run as a
> user (unprivileged) process in dom0.

Are you running this as root or is the binary suid root?

Ian


> I haven't tried running it 
> under a domU 
> kernel yet, and, of course, can't even be sure that it is a 
> xen-specific 
> problem.
> I'm using Xen 2.0.4 with kernel 2.6.10 on a recent Gentoo 
> installation (gcc 
> 3.4.3).
> 
> Unfortunately the program at issue is proprietary - it's IE6 
> running on wine 
> (v. 20050211).
> I had copied over a previously working installation (user's ".wine" 
> directory), but I get the same result at the end of the 
> ie6setup install 
> stage of the instructions at:
> 	http://frankscorner.org/index.php?p=ie6
> 
> I'd be grateful for any suggestions of ways to debug this.
> 
> 
> -	Peter Joanes.



* Re: Reproducible system crash
@ 2005-03-15 16:30   ` Peter Joanes
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Joanes @ 2005-03-15 16:30 UTC (permalink / raw)
  To: xen-devel

On Tuesday 15 March 2005 16:04, Ian Pratt wrote:
> > My system instantly reboots every time a particular program is run as a
> > user (unprivileged) process in dom0.
>
> Are you running this as root or is the binary suid root?

No, it isn't. One thing that I had thought must be coincidental, but that might 
be worth mentioning anyway, is that at the end of the IE6 install a message 
said that the computer needed to be restarted, and it was upon dismissing that 
box that the whole machine rebooted. Is it possible that the Wine environment 
is prevented from trapping a call that does this?

-	Peter.



* RE: Reproducible system crash
@ 2005-03-15 16:44 Ian Pratt
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Pratt @ 2005-03-15 16:44 UTC (permalink / raw)
  To: Peter Joanes, xen-devel; +Cc: ian.pratt

> > > My system instantly reboots every time a particular program is run as a
> > > user (unprivileged) process in dom0.
> >
> > Are you running this as root or is the binary suid root?
> 
> No it isn't. One thing that I had thought must be 
> coincidental, but that might 
> be worth mentioning anyway was that at the end of the IE6 
> install, the 
> message said that the computer needed to be restarted, and it 
> was upon 
> clicking away that box that the whole machine rebooted. Is it 
> possible that 
> the Wine environment is prevented from trapping a call that does this?

I'm not entirely surprised that wine doesn't run, but I am concerned
about it crashing the machine if it is indeed non root. Could you try
running it under strace to see what it is doing?

Thanks,
Ian



* Re: Reproducible system crash
@ 2005-03-16 14:29   ` Peter Joanes
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Joanes @ 2005-03-16 14:29 UTC (permalink / raw)
  To: xen-devel

On Tuesday 15 March 2005 20:12, Ian Pratt wrote:
> Do you know if wine uses vm86 mode?
Google seems to think so.

> Can you get a serial line on the machines? 
Yes. [Actually this took a little while because I didn't realise that Xen 
manages the serial port, so the dom0 8250 module can't be loaded.]
Here's the end of the output (and I have the whole capture if that would 
help):

writev(4, [{"^\0\0\0\2\0\0\0D\0\0\0\24\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64}, {"*\0", 2}], 2) = 66
read(5, "\0\0\0\0 \0\0\0\1\0\0\0 \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 64) = 64
read(5, "n\0a\0t\0i\0v\0e\0,\0 \0b\0u\0i\0l\0t\0i\0n\0\0\0", 32) = 32
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
open("/usr/lib/wine/apphelp.dll.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/wine/apphelp.dll.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
(XEN) BUG at domain.c:142
(XEN) CPU:    0
(XEN) EIP:    0808:[<fc505a0e>]
(XEN) EFLAGS: 00011296
(XEN) eax: 00000000   ebx: fc5fd9e0   ecx: 00000000   edx: 00000000
(XEN) esi: 000e0003   edi: c010a960   ebp: 77acdfa0   esp: fc503f9c
(XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
(XEN) Stack trace from ESP=fc503f9c:
(XEN) fc52a478 fc52a4ce 0000008e 000e0003 [fc505930] fc5fd9e0 000e0003 77ef5ca4
(XEN)        0000007b 00000000 ffffffff c010a960 77acdfa0 ffffffff 000e0003 c010990a
(XEN)        00000061 00011296 d2278000 00000069 0000007b 0000007b 0000003b 00000033
(XEN)        fc5fd9e0
(XEN) Call Trace from ESP=fc503f9c: [<fc505930>]

****************************************
CPU0 FATAL TRAP: vector = 6 (invalid operand)
[error_code=0000]
Aieee! CPU0 is toast...
****************************************

Reboot in five seconds...



-	Peter.



* RE: Reproducible system crash
@ 2005-03-16 16:08 Ian Pratt
  2005-03-16 16:22 ` Keir Fraser
  2005-03-16 18:00   ` Peter Joanes
  0 siblings, 2 replies; 15+ messages in thread
From: Ian Pratt @ 2005-03-16 16:08 UTC (permalink / raw)
  To: Peter Joanes, xen-devel; +Cc: ian.pratt

 

> -----Original Message-----
> From: xen-devel-admin@lists.sourceforge.net 
> [mailto:xen-devel-admin@lists.sourceforge.net] On Behalf Of 
> Peter Joanes
> Sent: 16 March 2005 14:30
> To: xen-devel@lists.sourceforge.net
> Subject: Re: [Xen-devel] Reproducible system crash
> 
> On Tuesday 15 March 2005 20:12, Ian Pratt wrote:
> > Do you know if wine uses vm86 mode?
> Google seems to think so.
> 
> > Can you get a serial line on the machines? 
> Yes. [Actually this took a little while because I didn't 
> realise that Xen 
> manages the serial port, so the dom0 8250 module can't be loaded.]
> Here's the end of the output (and I have the whole capture if 
> that would 
> help):

It looks like the domain 0 kernel is trying to do something illegal, and
Xen is killing it. It would be interesting to know whether this still
happens with the unstable.bk tree.

Please can you try this again, but this time building xen with debug=y
i.e.
 cd xen; make clean; make debug=y; cd .. ; make dist

We might learn a little more about what's going wrong. 

Thanks,
Ian



* Re: Reproducible system crash
  2005-03-16 16:08 Reproducible system crash Ian Pratt
@ 2005-03-16 16:22 ` Keir Fraser
  2005-03-16 18:39     ` Peter Joanes
  2005-03-16 19:35   ` Kip Macy
  2005-03-16 18:00   ` Peter Joanes
  1 sibling, 2 replies; 15+ messages in thread
From: Keir Fraser @ 2005-03-16 16:22 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Peter Joanes, ian.pratt, xen-devel


On 16 Mar 2005, at 16:08, Ian Pratt wrote:

> It looks like the domain 0 kernel is trying to do something illegal, 
> and
> Xen is killing it. It would be interesting to know whether this still
> happens with the unstable.bk tree.
>
> Please can you try this again, but this time building xen with debug=y
> i.e.
>  cd xen; make clean; make debug=y; cd .. ; make dist
>
> We might learn a little more about what's going wrong.

The stack backtrace also indicates that the code at address 0xc010990a 
in XenLinux may also be interesting (that is the code point that caused 
the crash).
'objdump -d vmlinux' and then grep the output for that address....
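
One way to pick guest-kernel addresses such as 0xc010990a out of a Xen stack dump is to filter the dumped words by address range. The following is a minimal Python sketch, not part of the original exchange. It assumes an address-space split inferred from the dumps in this thread (hypervisor words at or above 0xfc000000, guest kernel at or above 0xc0000000); `XEN_BASE`, `GUEST_KERNEL_BASE` and the helper names are hypothetical. The sample is the stack trace from the first crash report.

```python
import re

# Assumed split, inferred from the dumps in this thread rather than taken
# from Xen documentation: hypervisor at or above XEN_BASE, guest kernel
# between GUEST_KERNEL_BASE and XEN_BASE.
XEN_BASE = 0xFC000000
GUEST_KERNEL_BASE = 0xC0000000


def stack_words(dump):
    """Return every 8-digit hex word in the dump, in order of appearance."""
    return [int(w, 16) for w in re.findall(r"\b[0-9a-f]{8}\b", dump)]


def guest_kernel_words(dump):
    """Words that fall inside the assumed guest-kernel address range."""
    return [w for w in stack_words(dump) if GUEST_KERNEL_BASE <= w < XEN_BASE]


def bracketed_words(dump):
    """Words the dump printed in square brackets, e.g. [fc505930]."""
    return [int(w, 16) for w in re.findall(r"\[([0-9a-f]{8})\]", dump)]


SAMPLE = """\
(XEN) fc52a478 fc52a4ce 0000008e 000e0003 [fc505930] fc5fd9e0 000e0003 77ef5ca4
(XEN)        0000007b 00000000 ffffffff c010a960 77acdfa0 ffffffff 000e0003 c010990a
(XEN)        00000061 00011296 d2278000 00000069 0000007b 0000007b 0000003b 00000033
(XEN)        fc5fd9e0
"""

if __name__ == "__main__":
    for word in guest_kernel_words(SAMPLE):
        print("guest-range word: %08x" % word)
    for word in bracketed_words(SAMPLE):
        print("bracketed word:   %08x" % word)
```

A range filter only narrows the candidates: not every word in the guest range is a code address, so each hit still needs checking against the kernel image.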

  -- Keir




* Re: Reproducible system crash
@ 2005-03-16 18:00   ` Peter Joanes
  2005-03-16 18:20     ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Joanes @ 2005-03-16 18:00 UTC (permalink / raw)
  To: xen-devel

On Wednesday 16 March 2005 16:08, Ian Pratt wrote:
> Please can you try this again, but this time building xen with debug=y
I have used 2.0-testing for this, with the same dom0 kernel as before -- I 
haven't got round to patching my kernel source tree again yet so that it'll 
work with a newer xen.
This was the output this time:

rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
open("/usr/lib/wine/apphelp.dll.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/wine/apphelp.dll.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
(XEN) DOM0: (file=memory.c, line=409) Bad L1 type settings 0100
(XEN) DOM0: (file=memory.c, line=1759) ptwr: Could not re-validate l1 page
(XEN)
(XEN) BUG at domain.c:143
(XEN) CPU:    0
(XEN) EIP:    0808:[<fc505e0e>]
(XEN) EFLAGS: 00011292
(XEN) eax: 00000000   ebx: fcff99e0   ecx: 00000000   edx: 00000000
(XEN) esi: 00000010   edi: ffb51000   ebp: 00000004   esp: fc503e84
(XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
(XEN) Stack trace from ESP=fc503e84:
(XEN) fc5323f8 fc532497 0000008f 00000000 fcff0000 00000000 00000010 [fc521b76]
(XEN)        ffb50000 00000000 000006df [fc52279c] 00000000 c0386f08 00000008 80000002
(XEN)        fc57a180 00000004 fcff0004 00000001 00000ffc e0000000 fcff99e0 fd45b740
(XEN)        fcff9a54 00000010 fcff9a54 fcff99e0 00000000 ffb50000 00000000 d29d9ffd
(XEN)        fef4a764 00000001 c0109929 07e27061 00001096 00049aa2 07e27061 d29d8ffc
(XEN)        00000010 00000000 00007e28 [fc52394d] 00000001 fc503f6c 00000000 000000b5
(XEN)        [fc518117] f1ce2900 0000002e 000002da 00000010 07e28061 05b40000 00000000
(XEN)        00000001 feffbb68 07e27063 fcff99e0 d29d8ffc fc503fb8 00000000 [fc52c568]
(XEN)        d29d8ffc 00000000 c010a960 77acdfa0 [fc511972] fc551b40 00000000 fcff99e0
(XEN)        ffffffff c010a960 77acdfa0 [fc530c3e] fc503fb8 77ef5ca4 0000007b 00000000
(XEN)        ffffffff c010a960 77acdfa0 d29d9010 000e0003 c0109b38 00000061 00011296
(XEN)        d29d9000 00000069 0000007b 0000007b 0000003b 00000033 fcff99e0
(XEN) Call Trace from ESP=fc503e84: [<fc521b76>] [<fc52279c>] [<fc52394d>] [<fc518117>] [<fc52c568>] [<fc511972>]
(XEN)    [<fc530c3e>]

****************************************
CPU0 FATAL TRAP: vector = 6 (invalid operand)
[error_code=0000]
Aieee! CPU0 is toast...
****************************************

Reboot in five seconds...



-	Peter.



* Re: Reproducible system crash
  2005-03-16 18:00   ` Peter Joanes
@ 2005-03-16 18:20     ` Keir Fraser
  2005-03-16 19:07         ` Peter Joanes
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2005-03-16 18:20 UTC (permalink / raw)
  To: Peter Joanes; +Cc: xen-devel


On 16 Mar 2005, at 18:00, Peter Joanes wrote:

> (XEN) DOM0: (file=memory.c, line=409) Bad L1 type settings 0100
> (XEN) DOM0: (file=memory.c, line=1759) ptwr: Could not re-validate l1 
> page
> (XEN)

Someone is attempting to set _PAGE_GLOBAL bit in one of the page-table 
entries.
XenLinux should never do this -- do you have any suspicious-looking 
modules installed in your kernel that may need fixing? Alternatively we 
could silently drop the bit rather than killing the guest. :-)

It would be interesting to know, if you kill the test that fails above 
in the Xen code, whether your problems all go away. It's the test that 
starts:
     if ( unlikely(l1v & (_PAGE_GLOBAL|_PAGE_PAT)) )
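
As a cross-check on reading "Bad L1 type settings 0100" as the _PAGE_GLOBAL bit, the flag value can be decoded against the x86 page-table-entry bit layout. This is a minimal Python sketch, not Xen code. It assumes the hex value in the log message is the set of rejected flag bits, and it uses the standard low-bit positions for a 4 KiB page-table entry on 32-bit x86; `decode_pte_flags` and `rejected_bits` are hypothetical helper names.

```python
# Low flag bits of a 32-bit x86 4 KiB page-table entry. The names follow the
# _PAGE_* style used in this thread; the bit positions are architectural.
PTE_FLAGS = {
    0x001: "_PAGE_PRESENT",
    0x002: "_PAGE_RW",
    0x004: "_PAGE_USER",
    0x008: "_PAGE_PWT",
    0x010: "_PAGE_PCD",
    0x020: "_PAGE_ACCESSED",
    0x040: "_PAGE_DIRTY",
    0x080: "_PAGE_PAT",
    0x100: "_PAGE_GLOBAL",
}

_PAGE_PAT = 0x080
_PAGE_GLOBAL = 0x100


def decode_pte_flags(value):
    """Return the names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if value & bit]


def rejected_bits(l1v):
    """Mirror the mask in the quoted test: bits that would fail it."""
    return l1v & (_PAGE_GLOBAL | _PAGE_PAT)


if __name__ == "__main__":
    # Value taken from the "Bad L1 type settings 0100" log line.
    print(decode_pte_flags(0x0100))
```

Under those assumptions the value 0100 decodes to _PAGE_GLOBAL alone, which matches the diagnosis above.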

  -- Keir




* Re: Reproducible system crash
@ 2005-03-16 18:39     ` Peter Joanes
  2005-03-16 18:55       ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Joanes @ 2005-03-16 18:39 UTC (permalink / raw)
  To: xen-devel

On Wednesday 16 March 2005 16:22, Keir Fraser wrote:
> The stack backtrace also indicates that the code at address 0xc010990a
> in XenLinux may also be interesting (that is the code point that caused
> the crash).
> 'objdump -d vmlinux' and then grep the output for that address....

Here's what I think is the relevant section:

c0109900:       1e                      push   %ds
c0109901:       50                      push   %eax
c0109902:       31 c0                   xor    %eax,%eax
c0109904:       55                      push   %ebp
c0109905:       57                      push   %edi
c0109906:       56                      push   %esi
c0109907:       52                      push   %edx
c0109908:       48                      dec    %eax
c0109909:       51                      push   %ecx
c010990a:       53                      push   %ebx
c010990b:       fc                      cld
c010990c:       8c c1                   mov    %es,%ecx
c010990e:       8b 7c 24 20             mov    0x20(%esp),%edi
c0109912:       8b 54 24 24             mov    0x24(%esp),%edx
c0109916:       89 44 24 24             mov    %eax,0x24(%esp)
c010991a:       89 4c 24 20             mov    %ecx,0x20(%esp)
c010991e:       b9 7b 00 00 00          mov    $0x7b,%ecx
c0109923:       8e d9                   mov    %ecx,%ds
c0109925:       8e c1                   mov    %ecx,%es
c0109927:       89 e0                   mov    %esp,%eax
c0109929:       8b 35 9c cf 2f c0       mov    0xc02fcf9c,%esi
c010992f:       8a 5e 01                mov    0x1(%esi),%bl
c0109932:       88 5c 24 2e             mov    %bl,0x2e(%esp)
c0109936:       ff d7                   call   *%edi
c0109938:       e9 3b fd ff ff          jmp    0xc0109678
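
The objdump-and-grep step can also be scripted when the disassembly is large. The following is a minimal Python sketch, not part of the original exchange; `find_instruction` and its `context` parameter are hypothetical names, and it assumes the `objdump -d` line format shown in the listing above (lowercase hex address followed by a colon). The sample lines are copied from that listing.

```python
import re


def find_instruction(disassembly, address, context=2):
    """Return the line for `address` plus `context` lines on either side.

    Returns an empty list when no line starts with that address.
    """
    lines = disassembly.splitlines()
    # Match the address at the start of the line, allowing leading
    # whitespace and zero padding.
    pattern = re.compile(r"^\s*0*%x:" % address)
    for i, line in enumerate(lines):
        if pattern.match(line):
            return lines[max(0, i - context): i + context + 1]
    return []


SAMPLE = """\
c0109908:       48                      dec    %eax
c0109909:       51                      push   %ecx
c010990a:       53                      push   %ebx
c010990b:       fc                      cld
c010990c:       8c c1                   mov    %es,%ecx
"""

if __name__ == "__main__":
    for line in find_instruction(SAMPLE, 0xC010990A, context=1):
        print(line)
```

In practice the `disassembly` string would be the captured output of `objdump -d vmlinux`; the sketch only searches text and does not run objdump itself.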


-	Peter.



* Re: Reproducible system crash
  2005-03-16 18:39     ` Peter Joanes
@ 2005-03-16 18:55       ` Keir Fraser
  2005-03-17 12:33           ` Peter Joanes
  0 siblings, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2005-03-16 18:55 UTC (permalink / raw)
  To: Peter Joanes; +Cc: xen-devel


On 16 Mar 2005, at 18:39, Peter Joanes wrote:

> On Wednesday 16 March 2005 16:22, Keir Fraser wrote:
>> The stack backtrace also indicates that the code at address 0xc010990a
>> in XenLinux may also be interesting (that is the code point that 
>> caused
>> the crash).
>> 'objdump -d vmlinux' and then grep the output for that address....
>
> Here's what I think is the relevant section:

Not so useful unfortunately. Here's a slightly harder thing to try:

1. Edit linux/include/asm-xen/asm-i386/pgtable-2level.h, and change 
set_pte_batched() to use xen_l1_pgentry_update rather than 
queue_l1_pgentry_update. Also, kill current definition of set_pte() and 
redefine it to the same as the new set_pte_batched().

2. Edit xen/arch/x86/memory.c, at the test that fails for you ('Bad L1 
type'). In the error path, before returning 0, add:
     extern void show_guest_stack(void);
     show_guest_stack();

This should cause us to get a backtrace of the guest kernel state when 
the bad PTE is written. If you send a link to that along with your 
vmlinux file then we can work out where the _PAGE_GLOBAL bit is coming 
from.

  -- Keir




* Re: Reproducible system crash
@ 2005-03-16 19:07         ` Peter Joanes
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Joanes @ 2005-03-16 19:07 UTC (permalink / raw)
  To: xen-devel

On Wednesday 16 March 2005 18:20, Keir Fraser wrote:
> Someone is attempting to set _PAGE_GLOBAL bit in one of the page-table
> entries.
> XenLinux should never do this -- do you have any suspicious-looking
> modules installed in your kernel that may need fixing?
Nothing that doesn't come with the vanilla sources; lsmod says:
Module                  Size  Used by
pppoatm                 4640  1
nfsd                   92328  9
exportfs                5024  1 nfsd
lockd                  61416  2 nfsd
sunrpc                132676  12 nfsd,lockd
ipt_MASQUERADE          2528  1
iptable_nat            23240  2 ipt_MASQUERADE
ip_conntrack           40340  2 ipt_MASQUERADE,iptable_nat
ipt_REJECT              5632  9
iptable_filter          2752  1
ip_tables              16640  4 ipt_MASQUERADE,iptable_nat,ipt_REJECT,iptable_filter
bridge                 45944  0
speedtch               10600  0
firmware_class          7552  1 speedtch
usb_atm                13360  2 speedtch
uhci_hcd               30736  0
usbcore               106616  4 speedtch,usb_atm,uhci_hcd
br2684                  7236  0
atm                    37528  5 pppoatm,usb_atm,br2684
ppp_generic            21396  5 pppoatm
slhc                    6304  1 ppp_generic
loop                   12968  0

> It would be interesting to know, if you kill the test that fails above
> in the Xen code, whether your problems all go away. It's the test that
> starts:
>      if ( unlikely(l1v & (_PAGE_GLOBAL|_PAGE_PAT)) )
I commented out that part of memory.c, and it still crashes. This time the end 
of the output is:

open("/usr/lib/wine/apphelp.dll.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
(XEN) (file=extable.c, line=71) Pre-exception: fc530a7e -> fc530b34
(XEN) (file=traps.c, line=463) Page fault: fc530b49 -> fc505d30
(XEN) BUG at domain.c:143
(XEN) CPU:    0
(XEN) EIP:    0808:[<fc505e0e>]
(XEN) EFLAGS: 00011296
(XEN) eax: 00000000   ebx: fcff99e0   ecx: 00000000   edx: 00000000
(XEN) esi: 000e0003   edi: c010a960   ebp: 77acdfa0   esp: fc503f9c
(XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
(XEN) Stack trace from ESP=fc503f9c:
(XEN) fc5323c8 fc532467 0000008f 000e0003 [fc505d30] fcff99e0 000e0003 77ef5ca4
(XEN)        0000007b 00000000 ffffffff c010a960 77acdfa0 00000000 000e0003 c0109904
(XEN)        00000061 00011246 d2411000 00000069 0000007b 0000007b 0000003b 00000033
(XEN)        fcff99e0
(XEN) Call Trace from ESP=fc503f9c: [<fc505d30>]

****************************************
CPU0 FATAL TRAP: vector = 6 (invalid operand)
[error_code=0000]
Aieee! CPU0 is toast...
****************************************

Reboot in five seconds...


-	Peter.



* Re: Reproducible system crash
  2005-03-16 16:22 ` Keir Fraser
  2005-03-16 18:39     ` Peter Joanes
@ 2005-03-16 19:35   ` Kip Macy
  1 sibling, 0 replies; 15+ messages in thread
From: Kip Macy @ 2005-03-16 19:35 UTC (permalink / raw)
  To: xen-devel, Peter Joanes

Real men don't use debuggers ;-) but for those of us who do:

gdb vmlinux 
(gdb) x/i 0xc010990a 

will do the same thing. If you have symbols compiled in, you can also do:

(gdb) info line *0xc010990a 

and get the exact line of C code.

  -Kip

> The stack backtrace also indicates that the code at address 0xc010990a
> in XenLinux may also be interesting (that is the code point that caused
> the crash).
> 'objdump -d vmlinux' and then grep the output for that address....
>



* Re: Reproducible system crash
@ 2005-03-17 12:33           ` Peter Joanes
  2005-03-17 13:17             ` Keir Fraser
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Joanes @ 2005-03-17 12:33 UTC (permalink / raw)
  To: xen-devel

On Wednesday 16 March 2005 18:55, Keir wrote:
> ... Here's a slightly harder thing to try:
>
> 1. Edit linux/include/asm-xen/asm-i386/pgtable-2level.h, and change
> set_pte_batched() to use xen_l1_pgentry_update rather than
> queue_l1_pgentry_update. Also, kill current definition of set_pte() and
> redefine it to the same as the new set_pte_batched().
s/pgentry/entry/ in the above? -- That's what I've used.

> 2. Edit xen/arch/x86/memory.c, at the test that fails for you ('Bad L1
> type'). In the error path, before returning 0, add:
>      extern void show_guest_stack(void);
>      show_guest_stack();
I left out the definition, it was somewhere else.

> This should cause us to get a backtrace of the guest kernel state when
> the bad PTE is written. If you send a link to that along with your
> vmlinux file then we can work out where the _PAGE_GLOBAL bit is coming
> from.
You can temporarily get the vmlinux from <http://joanes.net/vmlinux>. The 
output contains just a couple of extra lines:

open("/usr/lib/wine/apphelp.dll.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
(XEN) (file=extable.c, line=71) Pre-exception: fc530abe -> fc530b74
(XEN) DOM0: (file=memory.c, line=409) Bad L1 type settings 0100
(XEN) Guest EIP is c0109904
(XEN)
(XEN) DOM0: (file=memory.c, line=1760) ptwr: Could not re-validate l1 page
(XEN)
(XEN) BUG at domain.c:143
(XEN) CPU:    0
(XEN) EIP:    0808:[<fc505e0e>]
(XEN) EFLAGS: 00011292
(XEN) eax: 00000000   ebx: fcff99e0   ecx: 00000000   edx: 00000000
(XEN) esi: 00000010   edi: ff951000   ebp: 00000010   esp: fc503e84
(XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
(XEN) Stack trace from ESP=fc503e84:
(XEN) fc532408 fc5324a7 0000008f 00000000 fcff0000 00000000 00000010 [fc521b86]
(XEN)        ff950000 00000000 000006e0 [fc5227ac] 00000000 c0386f08 00000008 80000001
(XEN)        ff913448 00000010 fcff0010 00000004 00000ff0 e0000000 fcff99e0 00000010
(XEN)        fcff9a54 00000010 00000008 fcff99e0 00000000 ff950000 00000000 ce42affd
(XEN)        fef390a8 00000001 c0109929 0c3d6061 00001096 0030b033 0c3d6061 ce429ffc
(XEN)        00000010 00000000 0000c3d7 [fc52395d] 00000001 fc503f6c 00000000 0000005c
(XEN)        [fc518117] 2d47dca4 0000003e 000002e0 00000010 0c3d7061 05c00000 00000000
(XEN)        00000001 feffbb80 0c3d6063 fcff99e0 ce429ffc fc503fb8 ce521ff9 [fc52c578]
(XEN)        ce429ffc 00000000 c010a960 77acdfa0 [fc511972] 000e0003 0000a960 fcff99e0
(XEN)        ffffffff c010a960 77acdfa0 [fc530c4e] fc503fb8 77ef5ca4 0000007b 00000000
(XEN)        ffffffff c010a960 77acdfa0 00000000 000e0003 c0109904 00000061 00011246
(XEN)        ce42a000 00000069 0000007b 0000007b 0000003b 00000033 fcff99e0
(XEN) Call Trace from ESP=fc503e84: [<fc521b86>] [<fc5227ac>] [<fc52395d>] [<fc518117>] [<fc52c578>] [<fc511972>]
(XEN)    [<fc530c4e>]

****************************************
CPU0 FATAL TRAP: vector = 6 (invalid operand)
[error_code=0000]
Aieee! CPU0 is toast...
****************************************

Reboot in five seconds...


-	Peter.



* Re: Reproducible system crash
  2005-03-17 12:33           ` Peter Joanes
@ 2005-03-17 13:17             ` Keir Fraser
  0 siblings, 0 replies; 15+ messages in thread
From: Keir Fraser @ 2005-03-17 13:17 UTC (permalink / raw)
  To: Peter Joanes; +Cc: xen-devel


On 17 Mar 2005, at 12:33, Peter Joanes wrote:

> On Wednesday 16 March 2005 18:55, Keir wrote:
>> ... Here's a slightly harder thing to try:
>>
>> 1. Edit linux/include/asm-xen/asm-i386/pgtable-2level.h, and change
>> set_pte_batched() to use xen_l1_pgentry_update rather than
>> queue_l1_pgentry_update. Also, kill current definition of set_pte() 
>> and
>> redefine it to the same as the new set_pte_batched().
> s/pgentry/entry/ in the above? -- That's what I've used

Yes.

>> 2. Edit xen/arch/x86/memory.c, at the test that fails for you ('Bad L1
>> type'). In the error path, before returning 0, add:
>>      extern void show_guest_stack(void);
>>      show_guest_stack();
> I left out the definition, it was somewhere else.

Okay.

The kernel is still using writable pagetables however (you can tell 
because of the 'could not re-validate L1 page' error message). Perhaps 
you ran the wrong kernel, or didn't redefine set_pte() correctly in the 
right place?

  -- Keir



