* RE: refcount errors then crash on XenoLinux with the latest source
@ 2004-02-23 23:52 Neugebauer, Rolf
  2004-02-24  0:21 ` Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Neugebauer, Rolf @ 2004-02-23 23:52 UTC
  To: Kip Macy, Ian Pratt; +Cc: xen-devel, Neugebauer, Rolf



> -----Original Message-----
> From: xen-devel-admin@lists.sourceforge.net
> [mailto:xen-devel-admin@lists.sourceforge.net] On Behalf Of Kip Macy
> Sent: 23 February 2004 21:03
> To: Ian Pratt
> Cc: xen-devel@lists.sourceforge.net
> Subject: [Xen-devel] refcount errors then crash on XenoLinux with the
> latest source
> 
> I had just tested my domain builder for the nth time on xeno-unstable
> (very latest source), when I saw the messages below on the console.
> DOM0 no longer responds to ping - I'm hoping that it will recover,

That is normal if you do an audit of all pages. Interrupts are disabled
for the duration of the audit, and you are basically trawling the entire
frametable twice, which can take some time depending on your memory size.

> however, in all likelihood I will be hitting the rpb in a few minutes.
> 
> audit_all_pages
> zombie: pfn=00000000 cf=fffffffd tf=fffffffd dom=00000000
> refcount error: pfn=000000 cf=fffffffd refcount=1
> audit page: pfn=0 info: cf=fffffffd tf=fffffffd ts=0 dom=0

This one is odd, I think. At least I haven't seen it before.

> refcount error: pfn=000247 cf=00000001 refcount=0
> audit page: pfn=247 info: cf=1 tf=f0000001 ts=0 dom=fc648040
> 
> refcount error: pfn=00024d cf=00000001 refcount=0
> audit page: pfn=24d info: cf=1 tf=f0000001 ts=0 dom=fc648040

These are expected. They are used for comms rings.

The rest all indicate an error of some sort. The first three have a
mapping but their ref count is wrong.
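
For sorting through a long audit dump like the one quoted in this message, a
small script can separate the entries that carry a pte_pfn mapping line from
those that do not. This is only a sketch: the line formats are copied from the
console output in this thread, while the names parse_audit and
split_by_mapping and the dictionary layout are made up for illustration, and
reading the refcount= field as hex is an assumption (the output only ever
shows 0 and 1).

```python
import re

# Patterns follow the console lines quoted in this message.
ERR_RE = re.compile(
    r"refcount error: pfn=([0-9a-f]+) cf=([0-9a-f]+) refcount=([0-9a-f]+)"
)
PTE_RE = re.compile(r"pte_pfn=([0-9a-f]+)")


def parse_audit(text):
    """Return one dict per 'refcount error' line, plus any pte_pfn lines after it."""
    entries = []
    for line in text.splitlines():
        m = ERR_RE.search(line)
        if m:
            entries.append({
                "pfn": int(m.group(1), 16),
                "cf": int(m.group(2), 16),
                "refcount": int(m.group(3), 16),  # base 16 is an assumption
                "mapped_by": [],                  # pfns of page tables mapping this page
            })
            continue
        m = PTE_RE.search(line)
        if m and entries:
            entries[-1]["mapped_by"].append(int(m.group(1), 16))
    return entries


def split_by_mapping(entries):
    """Split entries into those with at least one pte_pfn line and those with none."""
    mapped = [e for e in entries if e["mapped_by"]]
    unmapped = [e for e in entries if not e["mapped_by"]]
    return mapped, unmapped
```

Feeding the quoted console text (with or without the "> " quote prefix) to
parse_audit and then split_by_mapping should put pfns 36f, 371 and 372 in the
first list, which are the three entries described as having a mapping.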

I don't know the bitmasks for the type off the top of my head, but I can
have a look tomorrow.

If it helps, I also have some more debug code which allows a domain to
get the pfn_info from Xen for a given page. I could send you a patch
against unstable again tomorrow.

Rolf

> refcount error: pfn=00036f cf=40000002 refcount=1
> audit page: pfn=36f info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3f9 *pte_idx=0036f063
> 
> refcount error: pfn=000371 cf=40000002 refcount=1
> audit page: pfn=371 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3fe *pte_idx=00371063
> 
> refcount error: pfn=000372 cf=40000002 refcount=1
> audit page: pfn=372 info: cf=40000002 tf=f0000002 ts=0 dom=fc648be0
>   pte_pfn=002207 cf=10000026 tf=30000024 dom=fc648be0
>     pte_idx=3fd *pte_idx=00372063
> 
> refcount error: pfn=000390 cf=00000001 refcount=0
> audit page: pfn=390 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> 
> refcount error: pfn=000392 cf=00000001 refcount=0
> audit page: pfn=392 info: cf=1 tf=f0000001 ts=4ddca dom=fc649780
> 
> refcount error: pfn=000393 cf=00000001 refcount=0
> audit page: pfn=393 info: cf=1 tf=f0000001 ts=4ae4c dom=fc64a320
> 
> refcount error: pfn=000395 cf=00000001 refcount=0
> audit page: pfn=395 info: cf=1 tf=f0000001 ts=0 dom=fc64a320
> 
> refcount error: pfn=00039f cf=00000001 refcount=0
> audit page: pfn=39f info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> 
> refcount error: pfn=0003a1 cf=00000001 refcount=0
> audit page: pfn=3a1 info: cf=1 tf=f0000001 ts=0 dom=fc64aec0
> 
> refcount error: pfn=0003a2 cf=00000001 refcount=0
> audit page: pfn=3a2 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> 
> refcount error: pfn=0003a8 cf=00000001 refcount=0
> audit page: pfn=3a8 info: cf=1 tf=f0000001 ts=0 dom=fc7a4060
> 
> refcount error: pfn=0003a9 cf=00000001 refcount=0
> audit page: pfn=3a9 info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> 
> refcount error: pfn=0003ab cf=00000001 refcount=0
> audit page: pfn=3ab info: cf=1 tf=f0000001 ts=0 dom=fc7a4c00
> 
> refcount error: pfn=0003ac cf=00000001 refcount=0
> audit page: pfn=3ac info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> 
> refcount error: pfn=0003ae cf=00000001 refcount=0
> audit page: pfn=3ae info: cf=1 tf=f0000001 ts=191ab3 dom=fc7a57a0
> 
> refcount error: pfn=0003af cf=00000001 refcount=0
> audit page: pfn=3af info: cf=1 tf=f0000001 ts=191ab2 dom=fc7a6340
> 
> refcount error: pfn=0003b1 cf=00000001 refcount=0
> audit page: pfn=3b1 info: cf=1 tf=f0000001 ts=0 dom=fc7a6340
> 
> refcount error: pfn=0003b2 cf=00000001 refcount=0
> audit page: pfn=3b2 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> 
> refcount error: pfn=0003b4 cf=00000001 refcount=0
> audit page: pfn=3b4 info: cf=1 tf=f0000001 ts=0 dom=fc7a6ee0
> 
> -------------------------------------------------------
> SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> Build and deploy apps & Web services for Linux with
> a free DVD software kit from IBM. Click Now!
> http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xen-devel




* Re: dumping a domain's core
@ 2004-02-22 21:38 Ian Pratt
  2004-02-23 21:02 ` refcount errors then crash on XenoLinux with the latest source Kip Macy
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Pratt @ 2004-02-22 21:38 UTC
  To: Kip Macy; +Cc: Ian Pratt, xen-devel


> > Right now, the only way to do this is rather grim -- see the auto
> > reboot stuff in xc_dom_create. It polls get_domain_info once a
> > second.
> 
> Hmm. I guess that could work.

Using the new inter-domain comms rings mechanism it'll be easy to
add events for things like this.
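
The once-a-second polling described in the quoted text can be sketched as
follows. This is not the code in xc_dom_create: get_info is a hypothetical
callable standing in for whatever wraps get_domain_info, and the convention
that it returns None once the domain is gone is an assumption made for the
example.

```python
import time


def wait_for_exit(get_info, domid, interval=1.0, sleep=time.sleep):
    """Poll get_info(domid) until the domain is gone; return the number of polls."""
    polls = 0
    while True:
        polls += 1
        # Hypothetical convention: None means "no such domain any more".
        if get_info(domid) is None:
            return polls
        sleep(interval)
```

The sleep parameter is only there so the loop can be exercised without
actually waiting. An event delivered over the comms rings would replace the
loop entirely, which is the attraction of that approach.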

> > In the 1.3 tree, if you've got the pages mapped into domain 0
> > they won't go straight back on the free list when the domain dies
> > (as they're reference counted). You can then write out a core
> > dump.
> 
> I was hoping that I could map them in on demand. I guess there isn't
> any good reason why DOM0 shouldn't have access to everyone's memory
> all the time.

The trouble with mapping them on demand is that as soon as the
domain exits the reference count on the pages will go to zero and
they'll end up on the free list, and hence may get overwritten,
e.g. by network packets.
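
A toy model of the reference counting may make the difference concrete. It is
a sketch under assumptions, not Xen's frame table code: the PagePool class and
its get/put methods are invented for illustration, and the pfn values are
arbitrary.

```python
class PagePool:
    """Toy model: a page goes back on the free list only when its count hits zero."""

    def __init__(self):
        self.refs = {}        # pfn -> reference count
        self.free_list = []   # pfns available for reuse

    def get(self, pfn):
        self.refs[pfn] = self.refs.get(pfn, 0) + 1

    def put(self, pfn):
        self.refs[pfn] -= 1
        if self.refs[pfn] == 0:
            del self.refs[pfn]
            self.free_list.append(pfn)


pool = PagePool()
for pfn in (0x100, 0x101):
    pool.get(pfn)             # reference held by the guest domain
pool.get(0x100)               # extra reference: domain 0 already has this page mapped

for pfn in (0x100, 0x101):
    pool.put(pfn)             # the guest domain exits and drops its references

print(pool.free_list)         # only the page domain 0 never mapped is freed
```

Page 0x100 survives the domain's exit because domain 0 took its reference
beforehand. A page mapped only on demand, after the exit, would already be on
the free list by then.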

Rather than destroying a domain when it faults, it's arguable that we
should just mark it as a zombie, and then rely on user-space
domain0 tools to issue a 'destroy' on the zombies, after writing
a coredump if required.

This would be an easy hack to add for your purposes.

You could create the coredump by modifying the xc_linux_save
function.

Cheers,
Ian




Thread overview: 13+ messages
2004-02-23 23:52 refcount errors then crash on XenoLinux with the latest source Neugebauer, Rolf
2004-02-24  0:21 ` Kip Macy
2004-02-24 18:42   ` Rolf Neugebauer
  -- strict thread matches above, loose matches on Subject: below --
2004-02-22 21:38 dumping a domain's core Ian Pratt
2004-02-23 21:02 ` refcount errors then crash on XenoLinux with the latest source Kip Macy
2004-02-23 21:36   ` Kip Macy
2004-02-23 23:35     ` Keir Fraser
2004-02-24  1:11       ` Kip Macy
2004-02-24  3:44         ` Kip Macy
2004-02-24  8:15           ` Ian Pratt
2004-02-24  8:35             ` Keir Fraser
2004-02-24 17:21               ` Kip Macy
2004-02-24 17:45                 ` Ian Pratt
2004-02-24  8:40         ` Keir Fraser
