* refcount errors then crash on XenoLinux with the latest source
  2004-02-22 21:38 dumping a domain's core Ian Pratt
@ 2004-02-23 21:02 ` Kip Macy
  2004-02-23 21:36   ` Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Kip Macy @ 2004-02-23 21:02 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel

I had just tested my domain builder for the nth time on xeno-unstable
(very latest source), when I saw the messages below on the console.
DOM0 no longer responds to ping. I'm hoping that it will recover;
however, in all likelihood I will be hitting the rpb in a few minutes.

audit_all_pages
zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
refcount error: pfn=000000 cf=fffffffd refcount=1
audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0

refcount error: pfn=000247 cf=00000001 refcount=0
audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00024d cf=00000001 refcount=0
audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00036f cf=40000002 refcount=1
audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3f9 *pte_idx=0036f063

refcount error: pfn=000371 cf=40000002 refcount=1
audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3fe *pte_idx=00371063

refcount error: pfn=000372 cf=40000002 refcount=1
audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
    pte_idx=3fd *pte_idx=00372063

refcount error: pfn=000390 cf=00000001 refcount=0
audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780

refcount error: pfn=000392 cf=00000001 refcount=0
audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780

refcount error: pfn=000393 cf=00000001 refcount=0
audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320

refcount error: pfn=000395 cf=00000001 refcount=0
audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320

refcount error: pfn=00039f cf=00000001 refcount=0
audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0

refcount error: pfn=0003a1 cf=00000001 refcount=0
audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0

refcount error: pfn=0003a2 cf=00000001 refcount=0
audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060

refcount error: pfn=0003a8 cf=00000001 refcount=0
audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060

refcount error: pfn=0003a9 cf=00000001 refcount=0
audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00

refcount error: pfn=0003ab cf=00000001 refcount=0
audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00

refcount error: pfn=0003ac cf=00000001 refcount=0
audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0

refcount error: pfn=0003ae cf=00000001 refcount=0
audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0

refcount error: pfn=0003af cf=00000001 refcount=0
audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340

refcount error: pfn=0003b1 cf=00000001 refcount=0
audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340

refcount error: pfn=0003b2 cf=00000001 refcount=0
audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0

refcount error: pfn=0003b4 cf=00000001 refcount=0
audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0






-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click


* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-23 21:02 ` refcount errors then crash on XenoLinux with the latest source Kip Macy
@ 2004-02-23 21:36   ` Kip Macy
  2004-02-23 23:35     ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Kip Macy @ 2004-02-23 21:36 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel

After a few more minutes the following popped out on the console:

CPU:    1
EIP:    0808:[<fc532927>]
EFLAGS: 00010206
eax: 0a725012   ebx: 00000010   ecx: fc657560   edx: fc76a460
esi: fc76a460   edi: fc657540   ebp: 00000000   esp: fc64fda0
ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
Stack trace from ESP=fc64fda0:
ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 [fc53240a]
       fc657540 fc657400 [fc509664] 0000000a [fc509299] fc648040 00000040 0000003e
       fc657400 fc780140 00000040 fc76a740 04000001 fc657540 00005048 [fc531ef0]
       fc657540 00000046 [fc509664] 0000000a 30303030 74203130 00000046 fc76a740
       04000001 fc64fe90 00000010 [fc5b1a7d] 00000010 fc657400 fc64fe90 3d6e6670
       33303030 63203462 00000001 fc76a740 fc600200 00000010 fc64fe90 [fc5b1c43]
       00000010 fc64fe90 fc76a740 0007fff0 000003b4 25c4fe2d 00000001 0007fff0
       0007fff0 00000000 00000000 [fc5af4c0] 0007fff0 fd800000 00000001 0007fff0
       00000000 00000000 00000040 00010810 00000810 00000810 fc500810 ffffff10
       [fc50cff5] 00000808 00000202 fc654d4d 0000004d fc64ff6c [fc518078] 0000004d
       00000000 fc64ff6c [fc518046] 0036bfec 00000000 00000292 fc654d00 02000001
       fc64ff6c 00000004 [fc5b1a7d] 00000004 00000000 fc64ff6c [fc5af0aa] fc650200
       00000086 00000001 fc654d00 fc5fff00 00000004 fc64ff6c [fc5b1c43] 00000004
       fc64ff6c fc654d00 [fc50465b] 35c9c161 50d04d38 00000001 00000040 fc648040
       00000040 fc7b8080 [fc5af4c0] 00000040 00000028 00000040 fc648040 00000040
       fc7b8080 00000040 fc640810 fc640810 00000810 fc7b0810 ffffff04 [fc5b585c]
       00000808 00000246 [fc5b5898] fc648040 004c4b40 ffffffff 61007372 69745f63
       5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172
       636f6c72 00632e6b 736e6f63 2e656c6f 65640063 2e677562 65640063 fc648040

****************************************
CPU1 FATAL PAGE FAULT
[error_code=00000000]
Faulting linear address might be 0a725012
Aieee! CPU1 is toast...
****************************************

Is this oops from Xen or from XenoLinux? I downloaded the latest
ksymoops and did the following:
kmacy@xentap ./ksymoops -v ../xenolinux-2.4.25/vmlinux -m ../xenolinux-2.4.25/System.map < ../xeno-unstable.bk.home/tools/xc/lib/crash1.txt
ksymoops 2.4.9 on i686 2.4.25-xeno.  Options used
     -v ../xenolinux-2.4.25/vmlinux (specified)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.25-xeno/ (default)
     -m ../xenolinux-2.4.25/System.map (specified)

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid
lsmod file?
Warning (compare_maps): mismatch on symbol state d, System.map says
c0175ca8, vmlinux says 0.  Ignoring System.map entry
Warning (compare_maps): mismatch on symbol state a, vmlinux says 0,
System.map says c0175ca8.  Ignoring System.map entry
CPU:    1
EIP:    0808:[<fc532927>]
Using defaults from ksymoopsSegmentation fault


				-Kip




* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-23 21:36   ` Kip Macy
@ 2004-02-23 23:35     ` Keir Fraser
  2004-02-24  1:11       ` Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2004-02-23 23:35 UTC (permalink / raw)
  To: Kip Macy; +Cc: Ian Pratt, xen-devel


This is a Xen crash dump. ksymoops won't help -- you'll need to map
the crash dump to Xen code by hand. It doesn't take long. The
addresses in the stack trace that are enclosed in square brackets are
likely to be return addresses in the function-call trace.

'objdump -d xen >xen.s'. Then you can search in xen.s with a text
editor to find the call-trace addresses.
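
The manual search described above can be sketched as a short script. This is
only an illustration, assuming the usual `objdump -d` layout (a label line
such as `fc5095be <putchar>:` followed by instruction lines that start with
an address and a colon). The function names `bracketed_addresses` and
`locate` are made up for this example and are not part of Xen or binutils.

```python
import re

# A bracketed word in the Xen stack dump, e.g. "[fc5095ef]" or "[<fc532927>]".
BRACKETED = re.compile(r"\[<?([0-9a-f]{8})>?\]")
# An objdump function label, e.g. "fc5095be <putchar>:".
LABEL = re.compile(r"^([0-9a-f]+) <([^>]+)>:")
# An objdump instruction line, e.g. "fc5095ef:\t83 c4 04 ...".
INSN = re.compile(r"^\s*([0-9a-f]+):")


def bracketed_addresses(trace_text):
    """Return the bracketed addresses in order of appearance, without repeats."""
    seen = []
    for match in BRACKETED.finditer(trace_text):
        addr = int(match.group(1), 16)
        if addr not in seen:
            seen.append(addr)
    return seen


def locate(disasm_lines, wanted):
    """Map each wanted address to the function label it is listed under."""
    wanted = set(wanted)
    found = {}
    current = None  # name from the most recent label line
    for line in disasm_lines:
        label = LABEL.match(line)
        if label:
            current = label.group(2)
            continue
        insn = INSN.match(line)
        if insn:
            addr = int(insn.group(1), 16)
            if addr in wanted:
                found[addr] = current
    return found
```

With the crash output saved to a file and the disassembly in xen.s, something
like `locate(open("xen.s"), bracketed_addresses(open("crash1.txt").read()))`
would give the function for each bracketed address that appears in the
disassembly.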

 -- Keir





* RE: refcount errors then crash on XenoLinux with the latest source
@ 2004-02-23 23:52 Neugebauer, Rolf
  2004-02-24  0:21 ` Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Neugebauer, Rolf @ 2004-02-23 23:52 UTC (permalink / raw)
  To: Kip Macy, Ian Pratt; +Cc: xen-devel, Neugebauer, Rolf



> -----Original Message-----
> From: xen-devel-admin@lists.sourceforge.net [mailto:xen-devel-
> admin@lists.sourceforge.net] On Behalf Of Kip Macy
> Sent: 23 February 2004 21:03
> To: Ian Pratt
> Cc: xen-devel@lists.sourceforge.net
> Subject: [Xen-devel] refcount errors then crash on XenoLinux with the
> latest source
> 
> I had just tested my domain builder for the nth time on xeno-unstable
> (very latest source), when I saw the messages below on the console.
> DOM0 no longer responds to ping - I'm hoping that it will recover,

That is normal if you do an audit of all pages. Interrupts are disabled
for the whole time and you are basically trawling the entire frametable
twice, which can take some time depending on your memory size.

> however, in all likelihood I will be hitting the rpb in a few minutes.
> 
> audit_all_pages
> zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
> refcount error: pfn=000000 cf=fffffffd refcount=1
> audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0

This one is odd, I think. At least I haven't seen it before.

> refcount error: pfn=000247 cf=00000001 refcount=0
> audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040
> 
> refcount error: pfn=00024d cf=00000001 refcount=0
> audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

These are expected. They are used for comms rings.

The rest all indicate an error of some sort. The first three have a
mapping but their ref count is wrong.
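
The split described here (entries that carry a PTE mapping versus entries
that do not) can be pulled out of the console log mechanically. The following
is a minimal sketch, assuming only the log format shown in this thread. The
name `parse_audit` and the dictionary keys are illustrative and not part of
Xen.

```python
import re

# One "refcount error" line, e.g.
# "refcount error: pfn=00036f cf=40000002 refcount=1"
ERROR = re.compile(r"refcount error: pfn=([0-9a-f]+) cf=([0-9a-f]+) refcount=([0-9a-f]+)")
# The follow-up line that is only printed when a mapping was found.
PTE = re.compile(r"pte_pfn=([0-9a-f]+)")


def parse_audit(text):
    """Return one dict per refcount error, noting whether a PTE line follows."""
    entries = []
    for line in text.splitlines():
        error = ERROR.search(line)
        if error:
            entries.append({
                "pfn": int(error.group(1), 16),
                "cf": int(error.group(2), 16),
                # Radix of the refcount field is an assumption; the log
                # only shows 0 and 1, which read the same either way.
                "refcount": int(error.group(3), 16),
                "mapped": False,
            })
        elif entries and PTE.search(line):
            entries[-1]["mapped"] = True
    return entries
```

Filtering the result with `[e for e in parse_audit(log) if e["mapped"]]`
leaves the entries that have a mapping; quoted lines with leading `>` markers
are handled too, because the patterns are searched for anywhere in the line.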

I don't know the bitmasks for the type off the top of my head, but I can
have a look tomorrow.

If it helps, I also have some more debug code which allows a domain to
get the pfn_info from Xen for a given page. I could send you a patch
against unstable again tomorrow.

Rolf

> refcount error: pfn=00036f cf=40000002 refcount=1
> audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3f9 *pte_idx=0036f063
> 
> refcount error: pfn=000371 cf=40000002 refcount=1
> audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3fe *pte_idx=00371063
> 
> refcount error: pfn=000372 cf=40000002 refcount=1
> audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3fd *pte_idx=00372063
> 
> refcount error: pfn=000390 cf=00000001 refcount=0
> audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> 
> refcount error: pfn=000392 cf=00000001 refcount=0
> audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> 
> refcount error: pfn=000393 cf=00000001 refcount=0
> audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> 
> refcount error: pfn=000395 cf=00000001 refcount=0
> audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> 
> refcount error: pfn=00039f cf=00000001 refcount=0
> audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> 
> refcount error: pfn=0003a1 cf=00000001 refcount=0
> audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> 
> refcount error: pfn=0003a2 cf=00000001 refcount=0
> audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> 
> refcount error: pfn=0003a8 cf=00000001 refcount=0
> audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> 
> refcount error: pfn=0003a9 cf=00000001 refcount=0
> audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> 
> refcount error: pfn=0003ab cf=00000001 refcount=0
> audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> 
> refcount error: pfn=0003ac cf=00000001 refcount=0
> audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> 
> refcount error: pfn=0003ae cf=00000001 refcount=0
> audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> 
> refcount error: pfn=0003af cf=00000001 refcount=0
> audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> 
> refcount error: pfn=0003b1 cf=00000001 refcount=0
> audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> 
> refcount error: pfn=0003b2 cf=00000001 refcount=0
> audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> 
> refcount error: pfn=0003b4 cf=00000001 refcount=0
> audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0




* RE: refcount errors then crash on XenoLinux with the latest source
  2004-02-23 23:52 refcount errors then crash on XenoLinux with the latest source Neugebauer, Rolf
@ 2004-02-24  0:21 ` Kip Macy
  2004-02-24 18:42   ` Rolf Neugebauer
  0 siblings, 1 reply; 13+ messages in thread
From: Kip Macy @ 2004-02-24  0:21 UTC (permalink / raw)
  To: Neugebauer, Rolf; +Cc: Ian Pratt, xen-devel

>
> If it helps, I also have some more debug code which allows a domain to
> get the pfn_info from Xen for a given page. I could send you a patch
> against unstable again tomorrow.

That would be great. Although I'm hoping to figure out why I'm getting a
FAULT10 in my domain before I spend too much time on Xen.

			-Kip




* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-23 23:35     ` Keir Fraser
@ 2004-02-24  1:11       ` Kip Macy
  2004-02-24  3:44         ` Kip Macy
  2004-02-24  8:40         ` Keir Fraser
  0 siblings, 2 replies; 13+ messages in thread
From: Kip Macy @ 2004-02-24  1:11 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel


>
> This is a Xen crash dump. ksymoops won't help -- you'll need to map
> the crash dump to Xen code by hand. It doesn't take long. The
> addresses in the stack trace that are enclosed in square brackets are
> likely to be return addresses in the function-call trace.

This is sufficiently tedious that if this happens again I'm going to
either run screaming or write a ksymoops for xen.

>
> 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> editor to find the call-trace addresses.

I did this and got what you see below. It looks like two backtraces
interleaved. All of the values in brackets are legitimate return
addresses (they immediately follow a call instruction). "function addr"
is the address of the function itself and "ret addr" is the address
taken from the oops.

function		function addr	ret addr
================================================
putchar			fc5095be	fc5095ef
e100_rx_srv		fc532048	fc53240a
printf			fc5095f7	fc509664
putchar_serial		fc50927c	fc509299
e100intr		fc531d8f	fc531ef0
handle_IRQ_event	fc5b1a25	fc5b1a7d
do_IRQ			fc5b1bbb	fc5b1c43
call_do_IRQ		fc5af4bb	fc5af4c0
serial_rx_int		fc51801d	fc518078
serial_rx_int		fc51801d	fc518046
handle_IRQ_event	fc5b1a25	fc5b1a7d
reprogram_ac_timer	fc5af087	fc5af0aa
do_IRQ			fc5b1bbb	fc5b1c43
ac_timer_softirq_action	fc50455c	fc50465b
call_do_IRQ		fc5af4bb	fc5af4c0
default_idle		fc5b582e	fc5b585c
continue_cpu_idle_loop	fc5b585f	fc5b5898
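
The lookup behind a table like the one above (find the function whose start
address is the closest one at or below a return address) is the core of what
a ksymoops-style tool for Xen would do. A rough sketch, assuming a sorted
list of function start addresses is available: the few entries below are
copied from the table, and the name `containing_function` is made up for
this example.

```python
import bisect

# (start address, name) pairs taken from the table; only a subset.
SYMBOLS = sorted([
    (0xfc50927c, "putchar_serial"),
    (0xfc5095be, "putchar"),
    (0xfc5095f7, "printf"),
    (0xfc531d8f, "e100intr"),
    (0xfc532048, "e100_rx_srv"),
    (0xfc5b1a25, "handle_IRQ_event"),
    (0xfc5b1bbb, "do_IRQ"),
])
STARTS = [start for start, _ in SYMBOLS]


def containing_function(addr):
    """Return (name, offset) for the nearest symbol at or below addr."""
    index = bisect.bisect_right(STARTS, addr) - 1
    if index < 0:
        return None  # addr lies below every known symbol
    start, name = SYMBOLS[index]
    return name, addr - start
```

For example, `containing_function(0xfc5095ef)` gives `("putchar", 0x31)`.
With a partial list like this one, an address that belongs to a function
missing from the list is attributed to whichever listed function precedes
it, so a real tool would need the complete symbol list for the xen image
(for instance from `nm -n`).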


The fault instruction is this:
fc532927:	66 83 38 00          	cmpw   $0x0,(%eax)
It is in e100_start_ru. Obviously eax is pointing at some piece of
unmapped memory. I'm not sufficiently versed in assembler, particularly
optimized assembler, to tell where we are going wrong:


	list_for_each(entry_ptr, &(bdp->active_rx_list)) {
		rx_struct =
			list_entry(entry_ptr, struct rx_list_elem, list_elem);
		pci_dma_sync_single(bdp->pdev, rx_struct->dma_addr,
				    bdp->rfd_size, PCI_DMA_FROMDEVICE);
		if (!((SKB_RFD_STATUS(rx_struct->skb, bdp) &
		       __constant_cpu_to_le16(RFD_STATUS_COMPLETE)))) {
			buffer_found = 1;
			break;
		}
	}

Could the list have been corrupted?


				-Kip


>
>  -- Keir
>
> > After a few more minutes the following popped out on the console:
> >
> > CPU:    1
> > EIP:    0808:[<fc532927>]
> > EFLAGS: 00010206
> > eax: 0a725012   ebx: 00000010   ecx: fc657560   edx: fc76a460
> > esi: fc76a460   edi: fc657540   ebp: 00000000   esp: fc64fda0
> > ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
> > Stack trace from ESP=fc64fda0:
> > ff865012 0000000a [fc5095ef] fc780140 0000003c fc657540 fc657540 [fc53240a]
> >        fc657540 fc657400 [fc509664] 0000000a [fc509299] fc648040 00000040 0000003e
> >        fc657400 fc780140 00000040 fc76a740 04000001 fc657540 00005048 [fc531ef0]
> >        fc657540 00000046 [fc509664] 0000000a 30303030 74203130 00000046 fc76a740
> >        04000001 fc64fe90 00000010 [fc5b1a7d] 00000010 fc657400 fc64fe90 3d6e6670
> >        33303030 63203462 00000001 fc76a740 fc600200 00000010 fc64fe90 [fc5b1c43]
> >        00000010 fc64fe90 fc76a740 0007fff0 000003b4 25c4fe2d 00000001 0007fff0
> >        0007fff0 00000000 00000000 [fc5af4c0] 0007fff0 fd800000 00000001 0007fff0
> >        00000000 00000000 00000040 00010810 00000810 00000810 fc500810 ffffff10
> >        [fc50cff5] 00000808 00000202 fc654d4d 0000004d fc64ff6c [fc518078] 0000004d
> >        00000000 fc64ff6c [fc518046] 0036bfec 00000000 00000292 fc654d00 02000001
> >        fc64ff6c 00000004 [fc5b1a7d] 00000004 00000000 fc64ff6c [fc5af0aa] fc650200
> >        00000086 00000001 fc654d00 fc5fff00 00000004 fc64ff6c [fc5b1c43] 00000004
> >        fc64ff6c fc654d00 [fc50465b] 35c9c161 50d04d38 00000001 00000040 fc648040
> >        00000040 fc7b8080 [fc5af4c0] 00000040 00000028 00000040 fc648040 00000040
> >        fc7b8080 00000040 fc640810 fc640810 00000810 fc7b0810 ffffff04 [fc5b585c]
> >        00000808 00000246 [fc5b5898] fc648040 004c4b40 ffffffff 61007372 69745f63
> >        5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172
> >        636f6c72 00632e6b 736e6f63 2e656c6f 65640063 2e677562 65640063 fc648040
> >
> > ****************************************
> > CPU1 FATAL PAGE FAULT
> > [error_code=00000000]
> > Faulting linear address might be 0a725012
> > Aieee! CPU1 is toast...
> > ****************************************
> >
> > Is this oops from Xen or from XenoLinux? I downloaded the latest
> > ksymoops and did the following:
> > kmacy@xentap ./ksymoops -v ../xenolinux-2.4.25/vmlinux -m ../xenolinux-2.4.25/System.map < ../xeno-unstable.bk.home/tools/xc/lib/crash1.txt
> > ksymoops 2.4.9 on i686 2.4.25-xeno.  Options used
> >      -v ../xenolinux-2.4.25/vmlinux (specified)
> >      -k /proc/ksyms (default)
> >      -l /proc/modules (default)
> >      -o /lib/modules/2.4.25-xeno/ (default)
> >      -m ../xenolinux-2.4.25/System.map (specified)
> >
> > No modules in ksyms, skipping objects
> > Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid
> > lsmod file?
> > Warning (compare_maps): mismatch on symbol state d, System.map says
> > c0175ca8, vmlinux says 0.  Ignoring System.map entry
> > Warning (compare_maps): mismatch on symbol state a, vmlinux says 0,
> > System.map says c0175ca8.  Ignoring System.map entry
> > CPU:    1
> > EIP:    0808:[<fc532927>]
> > Using defaults from ksymoopsSegmentation fault
> >
> >
> > 				-Kip
> >


-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  1:11       ` Kip Macy
@ 2004-02-24  3:44         ` Kip Macy
  2004-02-24  8:15           ` Ian Pratt
  2004-02-24  8:40         ` Keir Fraser
  1 sibling, 1 reply; 13+ messages in thread
From: Kip Macy @ 2004-02-24  3:44 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel

It happened again. Is it possible that Xen isn't disabling network
interrupts while it is "auditing all pages"?

			-Kip


Killing domain 1
Releasing task 1
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
DOM0: INIT: Id "x1" respawning too fast: disabled for 5 minutes
audit_all_pages
refcount error: pfn=000247 cf=00000001 refcount=0
audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00024d cf=00000001 refcount=0
audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

refcount error: pfn=00036f cf=40000002 refcount=1
audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3f9 *pte_idx=0036f063

refcount error: pfn=000371 cf=40000002 refcount=1
audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3fe *pte_idx=00371063

refcount error: pfn=000372 cf=40000002 refcount=1
audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
  pte_pfn=002207 cf=10000023 tf=30000021 dom=fc648be0
    pte_idx=3fd *pte_idx=00372063

CPU:    1
EIP:    0808:[<fc532ddf>]
EFLAGS: 00010206
eax: 06081012   ebx: 00000010   ecx: fc657560   edx: fc650da0
esi: fc650da0   edi: fc657540   ebp: 00000000   esp: fc64fd70
ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810
Stack trace from ESP=fc64fd70:
ff803012 00000008 [fc51302a] fc76bb40 0000003c fc657540 fc657540 [fc5328c2]
       fc657540 fc657400 00000017 ffffffff 00000017 fc648040 00000040 0000003e
       fc657400 fc76bb40 00000040 fc76a740 04000001 fc657540 00005048 [fc5323a8]
       fc657540 fc5ebc80 fc5d1f1c 00000001 00000046 00000004 [fc509639] fc76a740
       04000001 fc64fe60 00000010 [fc5b1f2d] 00000010 fc657400 fc64fe60 3d6e6670
       33303030 63203237 00000001 fc76a740 fc600bc0 00000010 fc64fe60 [fc5b20f3]
       00000010 fc64fe60 fc76a740 00002207 00372063 34429e8f 00000001 0007fff0
       0007fff0 00000002 00000004 [fc5af970] 0007fff0 fd800000 00000001 0007fff0
       00000002 00000004 00000040 00010810 00000810 00000810 fc500810 ffffff10
       [fc50d0e5] 00000808 00000202 0000004d fc64ff6c fc64ff6c [fc509f22] 0000004d
       00000000 fc64ff6c [fc5162b6] 00000003 00000040 fc5ebc80 [fc517c4e] 0000004d
       fc64ff6c fc64ff6c [fc512529] 00000003 fc6501e0 02000001 [fc517ca4] fc5ebc80
       fc64ff6c 0092578a 00000000 fc651200 00000006 00000006 [fc5b1f2d] 00000004
       fc5ebc80 fc64ff6c [fc5b151c] 00000004 00000001 00000001 fc6501e0 fc6008c0
       00000004 fc64ff6c [fc5b20f3] 00000004 fc64ff6c fc6501e0 [fc511e2e] fc624494
       431ea128 00000001 00000040 fc648040 00000040 fc649780 [fc5af970] 00000040
       00000001 00000040 fc648040 00000040 fc649780 00000040 fc640810 fc640810
       00000810 fc640810 ffffff04 [fc5b5e04] 00000808 00000246 [fc5b5e40] fc648040
       004c4b40 ffffffff 655f6464 7972746e 5f636100 656d6974 61007372 69745f63
       5f72656d 74666f73 5f717269 69746361 64006e6f 5f706d75 656d6974 62007172




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  3:44         ` Kip Macy
@ 2004-02-24  8:15           ` Ian Pratt
  2004-02-24  8:35             ` Keir Fraser
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Pratt @ 2004-02-24  8:15 UTC (permalink / raw)
  To: Kip Macy; +Cc: Keir Fraser, Ian Pratt, xen-devel

> It happened again. Is it possible that Xen isn't disabling network
> interrupts while it is "auditing all pages"?

Quite possibly. The auditing code was added fairly recently
specifically to assist debugging of a guest OS that was
internally 'leaking' references to pages.

It's not in any of the non-debug builds, and is not well
tested. In the circumstance we were using it, the problem with the
guest OS was rather subtle and just a couple of pages were failing
the audit and generating log messages. (The audit code gets
called when the guest does something 'weird', or when you invoke
the appropriate keyboard handler.)
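
To make the idea of such an audit concrete, here is a rough sketch. It
is not the Xen audit code: struct page_info, audit_pages() and the flat
array of page-table entries are all assumptions made for illustration.
The only detail taken from the logs in this thread is the entry layout,
where an entry such as 0036f063 carries pfn 36f in its upper 20 bits and
flag bits (including "present" in bit 0) in the lower 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12      /* 4 KiB pages, as on 32-bit x86 */
#define PTE_PRESENT 0x1u    /* bit 0 of a page-table entry */

/* Hypothetical per-page bookkeeping: the count currently recorded. */
struct page_info {
    uint32_t stored_count;
};

/* Recount references by scanning every page-table entry, then compare
 * the result with the recorded count of each page. Returns the number
 * of mismatching pages and stores up to max_bad of their pfns in bad[].
 * scratch[] must provide npages counters. */
static size_t audit_pages(const struct page_info *pages, size_t npages,
                          const uint32_t *ptes, size_t nptes,
                          uint32_t *scratch,
                          uint32_t *bad, size_t max_bad)
{
    size_t i, nbad = 0;

    assert(pages != NULL && ptes != NULL && scratch != NULL);

    for (i = 0; i < npages; i++)
        scratch[i] = 0;

    /* Pass 1: one reference per present entry that maps a known page. */
    for (i = 0; i < nptes; i++) {
        uint32_t pfn = ptes[i] >> PAGE_SHIFT;
        if ((ptes[i] & PTE_PRESENT) && pfn < npages)
            scratch[pfn]++;
    }

    /* Pass 2: report every page whose recorded count disagrees. */
    for (i = 0; i < npages; i++) {
        if (scratch[i] != pages[i].stored_count) {
            if (nbad < max_bad)
                bad[nbad] = (uint32_t)i;
            nbad++;
        }
    }
    return nbad;
}
```

Both passes are linear in the number of pages and entries, so the
running time of a check of this shape grows with the amount of memory
in the machine.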

If your machine has lots of physical memory, the auditing will
take some time, and I'm not surprised that it's causing
problems. If it's not helping you, just comment it out, or invoke
it via the keyboard handler when you want it.

Cheers,
Ian






^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  8:15           ` Ian Pratt
@ 2004-02-24  8:35             ` Keir Fraser
  2004-02-24 17:21               ` Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Keir Fraser @ 2004-02-24  8:35 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Kip Macy, Keir Fraser, xen-devel

> > It happened again. Is it possible that Xen isn't disabling network
> > interrupts while it is "auditing all pages"?
> 
> Quite possibly. The auditing code was added fairly recently
> specifically to assist debugging of a guest OS that was
> internally 'leaking' references to pages.
> 
> > It's not in any of the non-debug builds, and is not well
> > tested. In the circumstance we were using it, the problem with the
> > guest OS was rather subtle and just a couple of pages were failing
> > the audit and generating log messages. (The audit code gets
> > called when the guest does something 'weird', or when you invoke
> > the appropriate keyboard handler.)
> >
> > If your machine has lots of physical memory, the auditing will
> > take some time, and I'm not surprised that it's causing
> > problems. If it's not helping you, just comment it out, or invoke
> > it via the keyboard handler when you want it.

In fact the auditing code is only ever invoked in response to keyboard
or serial input. Just avoid pressing the 'm' key. :-)

 -- Keir



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  1:11       ` Kip Macy
  2004-02-24  3:44         ` Kip Macy
@ 2004-02-24  8:40         ` Keir Fraser
  1 sibling, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2004-02-24  8:40 UTC (permalink / raw)
  To: Kip Macy; +Cc: Keir Fraser, Ian Pratt, xen-devel

> > 'objdump -d xen >xen.s'. Then you can search in xen.s with a text
> > editor to find the call-trace addresses.
> 
> I did this and got what you see below. It looks like two backtraces
> interleaved. All of the values in brackets are legitimate return
> addresses (they immediately follow a call instruction). "function addr"
> is the address of the function itself and "ret addr" is the address
> taken from the oops.

Yeah, we cannot precisely print the call trace because we build Xen
with '-fomit-frame-pointer', even when doing a debug build. If we
included a frame pointer then we could "chase" register %ebp to find
the true call trace.
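
For readers unfamiliar with the technique, "chasing" %ebp can be
sketched as follows, assuming the usual x86 frame layout in which the
word at %ebp holds the caller's saved %ebp and the word at %ebp+4 holds
the return address. The sketch walks a captured stack image rather than
live memory; struct stack_image, read_word() and walk_frames() are
hypothetical names, and nothing here is taken from the Xen source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A captured stack image: words[i] is the 32-bit word at base + 4*i.
 * base is assumed to be 4-byte aligned. */
struct stack_image {
    uint32_t base;
    const uint32_t *words;
    size_t nwords;
};

/* Read the word at addr. Returns 0 and leaves *out untouched if addr
 * is misaligned or outside the image. */
static int read_word(const struct stack_image *s, uint32_t addr,
                     uint32_t *out)
{
    uint32_t off;

    if (addr < s->base || (addr & 3))
        return 0;
    off = (addr - s->base) / 4;
    if (off >= s->nwords)
        return 0;
    *out = s->words[off];
    return 1;
}

/* Follow the saved-%ebp chain starting at ebp and collect the return
 * address of each frame. Returns the number of frames found. */
static size_t walk_frames(const struct stack_image *s, uint32_t ebp,
                          uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;

    assert(s != NULL && ret_addrs != NULL);
    while (n < max) {
        uint32_t next_ebp, ret;

        if (!read_word(s, ebp, &next_ebp) ||
            !read_word(s, ebp + 4, &ret))
            break;
        ret_addrs[n++] = ret;
        if (next_ebp <= ebp)    /* callers live at higher addresses */
            break;
        ebp = next_ebp;
    }
    return n;
}
```

Because the walk only visits the slots that the frame chain points at,
any other code-like words lying on the stack are never reported.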

As it is, we just look at the entire stack contents and enclose in
square brackets any value that could correspond to an address between
labels '_start' and '_end' in Xen.
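
A minimal sketch of that heuristic, for illustration only. The two
bounds are placeholders standing in for '_start' and '_end'; the real
values depend on the build, and the ones below were picked purely so
that the sample row from the posted dump comes out bracketed the same
way. looks_like_text() and format_stack() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bounds, NOT the real '_start' and '_end' of any Xen build. */
#define XEN_START 0xfc501000u
#define XEN_END   0xfc5c0000u

/* A word is a candidate return address if it falls inside the image. */
static int looks_like_text(uint32_t w)
{
    return w >= XEN_START && w < XEN_END;
}

/* Print the stack words into buf, separated by spaces, enclosing each
 * candidate in square brackets. Returns the number of bracketed words
 * that were written in full. */
static size_t format_stack(const uint32_t *words, size_t n,
                           char *buf, size_t buflen)
{
    size_t i, used = 0, hits = 0;

    assert(words != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int hit = looks_like_text(words[i]);
        int len = snprintf(buf + used, buflen - used, "%s%08x%s%s",
                           hit ? "[" : "", (unsigned)words[i],
                           hit ? "]" : "", i + 1 < n ? " " : "");

        if (len < 0 || (size_t)len >= buflen - used)
            break;              /* out of space: stop early */
        used += (size_t)len;
        hits += (size_t)hit;
    }
    return hits;
}
```

Note that the test is purely a range check on the value, so it has no
way to tell whether a matching word belongs to a live frame.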

The effect of this is that if you have stale return addresses on your
stack then they get included in the approximate call trace. These
stale addresses may occur because the stack frame was popped, then
another function invocation has pushed itself a large stack frame
that encompasses the stale one, but hasn't blown away all of the stale
contents.

 -- Keir



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  8:35             ` Keir Fraser
@ 2004-02-24 17:21               ` Kip Macy
  2004-02-24 17:45                 ` Ian Pratt
  0 siblings, 1 reply; 13+ messages in thread
From: Kip Macy @ 2004-02-24 17:21 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Ian Pratt, xen-devel

Or accidentally left-clicking on the console window with the mouse while
trying to copy the output. Interesting side effect, several mouse clicks
== reboot.

		-Kip




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: refcount errors then crash on XenoLinux with the latest source
  2004-02-24 17:21               ` Kip Macy
@ 2004-02-24 17:45                 ` Ian Pratt
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Pratt @ 2004-02-24 17:45 UTC (permalink / raw)
  To: Kip Macy; +Cc: Keir Fraser, Ian Pratt, xen-devel

> Or accidentally left-clicking on the console window with the mouse while
> trying to copy the output. Interesting side effect, several mouse clicks
> == reboot.

That would be a capital 'R' ;-)

Hit 'h' for help :

'h' pressed -> showing installed handlers
 key 'B' (ascii '42') => reboot machine gracefully
 key 'L' (ascii '4c') => reset sched latency histogram
 key 'P' (ascii '50') => reset performance counters
 key 'R' (ascii '52') => reboot machine ungracefully
 key 'a' (ascii '61') => dump ac_timer queues
 key 'b' (ascii '62') => dump xen ide blkdev statistics
 key 'd' (ascii '64') => dump registers
 key 'h' (ascii '68') => show this message
 key 'l' (ascii '6c') => print sched latency histogram
 key 'p' (ascii '70') => print performance counters
 key 'q' (ascii '71') => dump task queues + guest state
 key 'r' (ascii '72') => dump run queues
 key '~' (ascii '7e') => toggle serial echo

This is from a 1.2 build. The 1.3 build has more debug handlers.

They're available over the serial line, or by pressing scroll
lock-and-key on the keyboard (even in production builds).  We've
found them very useful.
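
As a rough illustration of how a handler table like the one above could be organised: each key indexes a slot holding a callback and a description, and 'h' simply walks the table. This is a sketch only, not Xen's actual code; `register_key`, `dispatch_key` and `show_handlers` are names made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical handler type: receives the key that was pressed. */
typedef void key_handler_fn(unsigned char key);

struct key_entry {
    key_handler_fn *fn;    /* NULL means no handler installed */
    const char     *desc;  /* text shown by the help handler  */
};

/* One slot per possible 8-bit key code. */
static struct key_entry key_table[256];

/* Install a handler; refuse to overwrite an existing one. */
int register_key(unsigned char key, key_handler_fn *fn, const char *desc)
{
    if ( key_table[key].fn != NULL )
        return -1;
    key_table[key].fn   = fn;
    key_table[key].desc = desc;
    return 0;
}

/* Run the handler for a key. Returns 1 if one was installed, else 0. */
int dispatch_key(unsigned char key)
{
    if ( key_table[key].fn == NULL )
        return 0;
    key_table[key].fn(key);
    return 1;
}

/* Help handler: list every installed key with its description. */
void show_handlers(unsigned char key)
{
    printf("'%c' pressed -> showing installed handlers\n", key);
    for ( int i = 0; i < 256; i++ )
        if ( key_table[i].fn != NULL )
            printf(" key '%c' (ascii '%02x') => %s\n",
                   i, (unsigned)i, key_table[i].desc);
}
```

With this layout, a stray keypress such as 'R' is a single table lookup away from its handler, which is why a destructive key bound this way is easy to hit by accident.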

Best,
Ian




^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: refcount errors then crash on XenoLinux with the latest source
  2004-02-24  0:21 ` Kip Macy
@ 2004-02-24 18:42   ` Rolf Neugebauer
  0 siblings, 0 replies; 13+ messages in thread
From: Rolf Neugebauer @ 2004-02-24 18:42 UTC (permalink / raw)
  To: Kip Macy; +Cc: rolf.neugebauer, Ian Pratt, xen-devel

[-- Attachment #1: Type: text/plain, Size: 4002 bytes --]

On Tue, 2004-02-24 at 00:21, Kip Macy wrote:
> >
> > If it helps, I also have some more debug code which allows a domain to
> > get the pfn_info from Xen for a given page. I could send you a patch
> > against unstable again tomorrow.
> 
> That would be great. Although I'm hoping to figure out why I'm getting a
> FAULT10 in my domain before I spend too much time on Xen.
> 

Patch attached. It adds a new hypercall that lets a domain get Xen's info
on a given pfn, provided the page belongs to that domain. dom0 should be
able to query all memory.

I haven't tested this patch! It compiles against the latest tree. Let me
know if you have problems with it or if you find it useful.
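
For anyone who wants to see the intended behaviour without booting a patched Xen, here is a user-space sketch of the checks the hypercall performs. It is an approximation under stated assumptions: `sim_get_page_info`, `dump_page_info` and the three-entry `fake_frame_table` are stand-ins invented for the example, and the caller's identity is passed explicitly instead of coming from `current`. Only the `get_page_info_t` layout is taken from the patch.

```c
#include <errno.h>
#include <stdio.h>

/* Layout copied from the hypervisor-if.h hunk of the attached patch. */
typedef struct
{
    unsigned long   pfn;
    unsigned long   owner;
    unsigned long   count_and_flags;
    unsigned long   type_and_flags;
    unsigned long   tlbflush_timestamp;
} get_page_info_t;

/* Hypothetical stand-in for Xen's frame table: three pages only. */
struct fake_page { unsigned long owner, cf, tf, ts; };
static const struct fake_page fake_frame_table[] = {
    { 0xfc648040UL, 0x00000001UL, 0xf0000001UL, 0x0UL     },
    { 0xfc648be0UL, 0x40000002UL, 0xf0000002UL, 0x0UL     },
    { 0xfc649780UL, 0x00000001UL, 0xf0000001UL, 0x4ddcaUL },
};
#define FAKE_MAX_PAGE (sizeof(fake_frame_table) / sizeof(fake_frame_table[0]))

/* Same order of checks as do_get_page_info(): range, then ownership. */
int sim_get_page_info(unsigned long caller, int is_priv, get_page_info_t *pi)
{
    if ( pi->pfn >= FAKE_MAX_PAGE )
        return -EINVAL;

    const struct fake_page *pg = &fake_frame_table[pi->pfn];
    if ( pg->owner != caller && !is_priv )
        return -EPERM;

    pi->owner              = pg->owner;
    pi->count_and_flags    = pg->cf;
    pi->type_and_flags     = pg->tf;
    pi->tlbflush_timestamp = pg->ts;
    return 0;
}

/* Query one pfn and print it in roughly the style of the audit output. */
int dump_page_info(unsigned long caller, int is_priv, unsigned long pfn)
{
    get_page_info_t pi = { .pfn = pfn };
    int ret = sim_get_page_info(caller, is_priv, &pi);

    if ( ret != 0 )
    {
        printf("pfn=%lx: query failed (%d)\n", pfn, ret);
        return ret;
    }
    printf("pfn=%lx cf=%lx tf=%lx ts=%lx dom=%lx\n", pi.pfn,
           pi.count_and_flags, pi.type_and_flags,
           pi.tlbflush_timestamp, pi.owner);
    return 0;
}
```

In a real guest the call would be `HYPERVISOR_get_page_info(&pi)` with `pi.pfn` filled in beforehand. The caller should check the return value before reading the other fields.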

Rolf
> 			-Kip
> 
> >
> > Rolf
> >
> > > refcount error: pfn=00036f cf=40000002 refcount=1
> > > audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3f9 *pte_idx=0036f063
> > >
> > > refcount error: pfn=000371 cf=40000002 refcount=1
> > > audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3fe *pte_idx=00371063
> > >
> > > refcount error: pfn=000372 cf=40000002 refcount=1
> > > audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
> > >   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
> > >     pte_idx=3fd *pte_idx=00372063
> > >
> > > refcount error: pfn=000390 cf=00000001 refcount=0
> > > audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000392 cf=00000001 refcount=0
> > > audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> > >
> > > refcount error: pfn=000393 cf=00000001 refcount=0
> > > audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> > >
> > > refcount error: pfn=000395 cf=00000001 refcount=0
> > > audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> > >
> > > refcount error: pfn=00039f cf=00000001 refcount=0
> > > audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a1 cf=00000001 refcount=0
> > > audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> > >
> > > refcount error: pfn=0003a2 cf=00000001 refcount=0
> > > audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a8 cf=00000001 refcount=0
> > > audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> > >
> > > refcount error: pfn=0003a9 cf=00000001 refcount=0
> > > audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ab cf=00000001 refcount=0
> > > audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> > >
> > > refcount error: pfn=0003ac cf=00000001 refcount=0
> > > audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003ae cf=00000001 refcount=0
> > > audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> > >
> > > refcount error: pfn=0003af cf=00000001 refcount=0
> > > audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b1 cf=00000001 refcount=0
> > > audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> > >
> > > refcount error: pfn=0003b2 cf=00000001 refcount=0
> > > audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > >
> > > refcount error: pfn=0003b4 cf=00000001 refcount=0
> > > audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> > >
> > >
> >
> >
> >
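
A note on reading the `*pte_idx` values in the quoted audit output. Assuming the ordinary non-PAE 32-bit x86 page table entry format, the top 20 bits are the frame number and the low 12 bits are flags. The sketch below splits an entry that way; the helper names are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Flag bits of a 32-bit (non-PAE) x86 page table entry. */
#define PTE_PRESENT   0x001u
#define PTE_RW        0x002u
#define PTE_USER      0x004u
#define PTE_ACCESSED  0x020u
#define PTE_DIRTY     0x040u

/* Frame number: bits 31..12 of the entry. */
uint32_t pte_to_pfn(uint32_t pte)
{
    return pte >> 12;
}

/* Flags: bits 11..0 of the entry. */
uint32_t pte_flags(uint32_t pte)
{
    return pte & 0xfffu;
}

/* Print an entry as pfn plus the flags that are set. */
void print_pte(uint32_t pte)
{
    uint32_t f = pte_flags(pte);

    printf("pte=%08x pfn=%x flags=%03x [%s%s%s%s%s ]\n",
           (unsigned)pte, (unsigned)pte_to_pfn(pte), (unsigned)f,
           (f & PTE_PRESENT)  ? " P"  : "",
           (f & PTE_RW)       ? " RW" : "",
           (f & PTE_USER)     ? " US" : "",
           (f & PTE_ACCESSED) ? " A"  : "",
           (f & PTE_DIRTY)    ? " D"  : "");
}
```

For `*pte_idx=0036f063` this gives pfn 0x36f, matching the `pfn=36f` on the audit line, with the present, writable, accessed and dirty bits set.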

[-- Attachment #2: get_page_info.patch --]
[-- Type: text/x-patch, Size: 4705 bytes --]

diff -Nur --exclude=RCS --exclude=CVS --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet xeno.bk/xen/arch/i386/entry.S xeno.foo/xen/arch/i386/entry.S
--- xeno.bk/xen/arch/i386/entry.S	Mon Feb 23 17:49:50 2004
+++ xeno.foo/xen/arch/i386/entry.S	Tue Feb 24 17:58:04 2004
@@ -727,6 +727,7 @@
         .long SYMBOL_NAME(do_set_timer_op)       /* 20 */
         .long SYMBOL_NAME(do_event_channel_op)
         .long SYMBOL_NAME(do_xen_version)
+        .long SYMBOL_NAME(do_get_page_info)
         .rept NR_syscalls-((.-hypervisor_call_table)/4)
         .long SYMBOL_NAME(do_ni_syscall)
         .endr
diff -Nur --exclude=RCS --exclude=CVS --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet xeno.bk/xen/common/memory.c xeno.foo/xen/common/memory.c
--- xeno.bk/xen/common/memory.c	Mon Feb 23 14:32:33 2004
+++ xeno.foo/xen/common/memory.c	Tue Feb 24 17:58:54 2004
@@ -1102,6 +1102,54 @@
 }
 
 
+/*
+ * allows a domain to query some information on a page
+ * this is primarily for debugging purposes
+ * a normal domain is only allowed to query its own pages.
+ * privileged domains can query any page.
+ */
+long do_get_page_info(get_page_info_t *u_page_info)
+{
+    long             ret = 0;
+    get_page_info_t *pi;
+
+    if ( (pi = kmalloc(sizeof(*pi), GFP_KERNEL)) == NULL )
+        return -ENOMEM;
+
+
+    /* sanity checks */
+    if ( copy_from_user(pi, u_page_info, sizeof(*pi)) )
+    {
+        ret = -EFAULT;
+        goto out;
+    }
+
+    if ( unlikely(pi->pfn >= max_page) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    if ( (frame_table[pi->pfn].u.domain != current) && 
+         !IS_PRIV(current) )
+    {
+        ret = -EPERM;
+        goto out;
+    }
+
+    /* copy the info */
+    pi->owner = (unsigned long) frame_table[pi->pfn].u.domain;
+    pi->count_and_flags = frame_table[pi->pfn].count_and_flags;
+    pi->type_and_flags = frame_table[pi->pfn].type_and_flags;
+    pi->tlbflush_timestamp = frame_table[pi->pfn].tlbflush_timestamp;
+
+    copy_to_user(u_page_info, pi, sizeof(*pi));
+ out:
+    kfree(pi);
+    return ret;
+}
+
+
 #ifndef NDEBUG
 /*
  * below are various memory debugging functions: 
diff -Nur --exclude=RCS --exclude=CVS --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet xeno.bk/xen/include/hypervisor-ifs/hypervisor-if.h xeno.foo/xen/include/hypervisor-ifs/hypervisor-if.h
--- xeno.bk/xen/include/hypervisor-ifs/hypervisor-if.h	Mon Feb 23 17:49:51 2004
+++ xeno.foo/xen/include/hypervisor-ifs/hypervisor-if.h	Tue Feb 24 17:57:17 2004
@@ -63,6 +63,7 @@
 #define __HYPERVISOR_set_timer_op         20
 #define __HYPERVISOR_event_channel_op     21
 #define __HYPERVISOR_xen_version          22
+#define __HYPERVISOR_get_page_info        23
 
 /* And the trap vector is... */
 #define TRAP_INSTR "int $0x82"
@@ -245,6 +246,21 @@
     unsigned long esp;
     unsigned long ss;
 } execution_context_t;
+
+/*
+ * returned by get_page_info hypercall
+ * useful for debugging. allows a dom to find out info xen keeps about a page
+ */
+typedef struct
+{
+    unsigned long   pfn;
+    unsigned long   owner;
+    unsigned long   count_and_flags;
+    unsigned long   type_and_flags;
+    unsigned long   tlbflush_timestamp;
+} get_page_info_t;
+
+
 
 /*
  * Xen/guestos shared data -- pointer provided in start_info.
diff -Nur --exclude=RCS --exclude=CVS --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet xeno.bk/xenolinux-2.4.25-sparse/include/asm-xeno/hypervisor.h xeno.foo/xenolinux-2.4.25-sparse/include/asm-xeno/hypervisor.h
--- xeno.bk/xenolinux-2.4.25-sparse/include/asm-xeno/hypervisor.h	Mon Feb 23 17:49:51 2004
+++ xeno.foo/xenolinux-2.4.25-sparse/include/asm-xeno/hypervisor.h	Tue Feb 24 18:13:37 2004
@@ -446,4 +446,16 @@
     return ret;
 }
 
+static inline int HYPERVISOR_get_page_info(get_page_info_t *pi)
+{
+    int ret;
+    __asm__ __volatile__ (
+        TRAP_INSTR
+        : "=a" (ret) : "0" (__HYPERVISOR_get_page_info),
+        "b" (pi) : "memory" );
+
+    return ret;
+}
+
+
 #endif /* __HYPERVISOR_H__ */
diff -Nur --exclude=RCS --exclude=CVS --exclude=SCCS --exclude=BitKeeper --exclude=ChangeSet xeno.bk/xenolinux-sparse/include/asm-xeno/hypervisor.h xeno.foo/xenolinux-sparse/include/asm-xeno/hypervisor.h
--- xeno.bk/xenolinux-sparse/include/asm-xeno/hypervisor.h	Mon Feb 23 17:49:51 2004
+++ xeno.foo/xenolinux-sparse/include/asm-xeno/hypervisor.h	Tue Feb 24 18:13:37 2004
@@ -446,4 +446,16 @@
     return ret;
 }
 
+static inline int HYPERVISOR_get_page_info(get_page_info_t *pi)
+{
+    int ret;
+    __asm__ __volatile__ (
+        TRAP_INSTR
+        : "=a" (ret) : "0" (__HYPERVISOR_get_page_info),
+        "b" (pi) : "memory" );
+
+    return ret;
+}
+
+
 #endif /* __HYPERVISOR_H__ */

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-02-24 18:42 UTC | newest]

Thread overview: 13+ messages
2004-02-23 23:52 refcount errors then crash on XenoLinux with the latest source Neugebauer, Rolf
2004-02-24  0:21 ` Kip Macy
2004-02-24 18:42   ` Rolf Neugebauer
  -- strict thread matches above, loose matches on Subject: below --
2004-02-22 21:38 dumping a domain's core Ian Pratt
2004-02-23 21:02 ` refcount errors then crash on XenoLinux with the latest source Kip Macy
2004-02-23 21:36   ` Kip Macy
2004-02-23 23:35     ` Keir Fraser
2004-02-24  1:11       ` Kip Macy
2004-02-24  3:44         ` Kip Macy
2004-02-24  8:15           ` Ian Pratt
2004-02-24  8:35             ` Keir Fraser
2004-02-24 17:21               ` Kip Macy
2004-02-24 17:45                 ` Ian Pratt
2004-02-24  8:40         ` Keir Fraser
