From: Kip Macy
Subject: Re: xm pause causing lockup
Date: Thu, 14 Apr 2005 14:04:30 -0700
References: <4afce18847157fad34cd38e14fb83c2c@cl.cam.ac.uk>
Reply-To: Kip Macy
To: Keir Fraser
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

I think there may be a bug in your page-pinning validation logic: the
lockup occurs when stepping through xen_pgd_pin. I don't know whether
I'm really passing in 0, since register locals can quickly get
overwritten (gdb shows ma=0x0 on entry and ma=0x630f a few steps
later), but it is certainly worth checking.

Breakpoint 15, pmap_pinit (pmap=0xc06900c0) at ../../../i386-xen/i386-xen/pmap.c:1206
1206            xen_pgd_pin(ma);
(gdb)
Continuing.

Breakpoint 8, xen_pgd_pin (ma=0x0) at ../../../i386-xen/i386-xen/xen_machdep.c:490
490             op.cmd = MMUEXT_PIN_L2_TABLE;
(gdb) s
491             op.mfn = ma >> PAGE_SHIFT;
(gdb)
492             xen_flush_queue();
(gdb)

Breakpoint 4, xen_flush_queue () at ../../../i386-xen/i386-xen/xen_machdep.c:431
431             if (XPQ_IDX != 0) _xen_flush_queue();
(gdb)
432     }
(gdb)
xen_pgd_pin (ma=0x630f) at hypervisor.h:72
72      {
(gdb)
76              __asm__ __volatile__ (
(gdb)

On 4/14/05, Kip Macy wrote:
> I haven't tracked down the problem yet, but I thought the following
> was sufficiently interesting to post:
>
> kmacy@curly while (1)
> while? xm list
> while? sleep 5
> while? end
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     67.9
> xen-vm2            1      128    1  r----      4.0     9601
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     68.1
> xen-vm2            1      128    1  r----      4.0     9601
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     68.3
> xen-vm2            1      128    1  r----      4.0     9601
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     68.5
> xen-vm2            1      128    1  r----      4.0     9601
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     68.7
> xen-vm2            1      128    1  r----      4.0     9601
> Name              Id  Mem(MB)  CPU  State  Time(s)  Console
> Domain-0           0      507    0  r----     68.9
> xen-vm2            1      128    1  r----      4.0     9601
>
> xen-vm2 is always shown as running, but its time is not increasing.
>
> -Kip
>
>
> On 4/13/05, Kip Macy wrote:
> > On 4/13/05, Keir Fraser wrote:
> > > Probably easiest way to trace this is with printk's in Xen. The guts of
> > > the work is done by domain_pause_by_systemcontroller() in xen/sched.h.
> > > This in turn calls domain_sleep() in common/schedule.c.
> >
> > I traced through that code a while back when trying to decide what to
> > call from the int3 handler.
> >
> > > A particularly interesting place to look will be the synchronous spin
> > > loop at the end of domain_sleep -- if the paused domain isn't
> > > descheduled for some weird reason then the spin loop would never exit
> > > and domain0 would hang.
> >
> > Good point. It will be interesting to see.
> >
> > I sometimes wonder if I should keep some of the buggy versions of
> > FreeBSD around for regression testing as they trigger some interesting
> > behaviours in xen and xend.
> >
> > -Kip
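To check whether a bad machine address really reaches xen_pgd_pin in the gdb trace, one option is to validate the argument before the pin request is built. The standalone C sketch below is hypothetical: struct pin_request, pin_request_from_ma(), PAGE_MASK_LOW and the placeholder value of MMUEXT_PIN_L2_TABLE only mirror the names visible in the trace (op.cmd, op.mfn, ma >> PAGE_SHIFT) and are not the actual FreeBSD or Xen definitions. It assumes 4 KiB pages (PAGE_SHIFT of 12, as on i386) and rejects both 0 and a non-page-aligned address.

```c
#include <stdio.h>

/* Hypothetical stand-ins modeled on the names in the gdb trace.
 * The real struct layout and MMUEXT_PIN_L2_TABLE value come from
 * Xen's public headers; the value below is only a placeholder. */
#define PAGE_SHIFT 12                              /* 4 KiB pages on i386 */
#define PAGE_MASK_LOW ((1UL << PAGE_SHIFT) - 1)    /* hypothetical: offset bits */
#define MMUEXT_PIN_L2_TABLE 1                      /* placeholder value */

struct pin_request {                               /* hypothetical */
    unsigned int cmd;
    unsigned long mfn;
};

/* Build a pin request from a machine address, refusing zero and any
 * address that is not page-aligned. Returns 0 on success, or -1 after
 * logging the bad argument to stderr. */
static int pin_request_from_ma(unsigned long ma, struct pin_request *op)
{
    if (ma == 0 || (ma & PAGE_MASK_LOW) != 0) {
        fprintf(stderr, "pin_request_from_ma: bad machine address 0x%lx\n", ma);
        return -1;
    }
    op->cmd = MMUEXT_PIN_L2_TABLE;
    op->mfn = ma >> PAGE_SHIFT;                    /* machine frame number */
    return 0;
}
```

For example, a page-aligned machine address of 0x5000 yields mfn 5, while 0 and 0x630f are rejected with a message on stderr.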
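Keir's point about the synchronous spin loop at the end of domain_sleep() suggests a temporary debugging aid: bound the wait and print a diagnostic when the bound is hit, so a domain that never gets descheduled shows up in the log instead of silently hanging domain0. The sketch below is a hypothetical userspace model, not Xen's code: struct vcpu_model, its running flag and wait_until_descheduled() stand in for whatever scheduler state the real loop polls.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a vcpu's "currently running" flag; the real
 * check lives in domain_sleep() in common/schedule.c. */
struct vcpu_model {
    atomic_bool running;
};

/* Spin until the target stops running, but give up after max_spins
 * iterations and report it instead of spinning forever. Returns true
 * if the target was seen descheduled, false if the bound was hit. */
static bool wait_until_descheduled(struct vcpu_model *v, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (!atomic_load(&v->running))
            return true;
        /* real code would typically issue a PAUSE hint here */
    }
    fprintf(stderr, "wait_until_descheduled: still running after %lu spins\n",
            max_spins);
    return false;
}
```

In a hypervisor build the diagnostic would go through printk rather than stderr, and the bound would need to be large enough not to fire during ordinary descheduling latency.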
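The csh loop in the quoted message relies on eyeballing the xm list output for a domain whose state stays 'r' while its Time(s) does not grow. A sketch of automating that comparison in C follows, assuming whitespace-separated columns in the order shown (Name, Id, Mem(MB), CPU, State, Time(s), optional Console). struct xm_row, parse_xm_row() and looks_stuck() are hypothetical helpers, and the xm list format may differ between Xen versions.

```c
#include <stdbool.h>
#include <stdio.h>

/* One row of `xm list` output, using the columns shown in the thread. */
struct xm_row {
    char name[64];
    int id;
    int mem_mb;
    int cpu;
    char state[8];
    double time_s;
};

/* Parse the first six whitespace-separated fields of a row. The Console
 * column is optional (Domain-0 has none), so it is ignored. Header
 * lines fail because "Id" is not a number. Returns true on success. */
static bool parse_xm_row(const char *line, struct xm_row *r)
{
    return sscanf(line, "%63s %d %d %d %7s %lf",
                  r->name, &r->id, &r->mem_mb, &r->cpu,
                  r->state, &r->time_s) == 6;
}

/* A domain looks stuck if it is marked running ('r' as the first State
 * character) in both samples but its CPU time did not grow. */
static bool looks_stuck(const struct xm_row *before, const struct xm_row *after)
{
    return before->state[0] == 'r' && after->state[0] == 'r' &&
           after->time_s <= before->time_s;
}
```

Fed two of the snapshots in the quoted output, it flags xen-vm2 (4.0 s in both) but not Domain-0 (67.9 s, then 68.1 s).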