From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: Detecting deadlocks with hypervisor..
Date: Sun, 19 Mar 2006 10:30:09 -0600
Message-ID: <441D8711.2090502@us.ibm.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Thileepan Subramaniam
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Thileepan Subramaniam wrote:
> Hello,
>
> I am trying to see if the hypervisor can be used to detect deadlocks
> in the guest VMs. My goal is to detect if a guest OS is deadlocked,
> and if it is, then create a clone of the deadlocked OS without the
> locking condition, and letting the clone run. While the clone runs I
> am hoping to generate some hints that could tell me what caused the
> deadlock.
>
> I simulated a deadlock/hang situation in a guest OS (by loading a
> badly written module to the kernel) and when the guestOS kernel was
> hanging, I ran "xm save" from Dom-0. But this command waits forever.
>
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.).
> These seem to be called when I run 'xm save'. But beyond a point I am
> not sure what the python scripts do. I also see some libxc files such
> as xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or
> the XenU). Can someone help me by explaining me what happens behind
> the scene when "xm save" is called ? Is there any good documentation
> explaining which actions are done by which layers (eg: python layer, C
> layer etc).
>
> Also, does it seem viable to clone a copy of a deadlocked guest OS in
> the first place?

As Ewan pointed out, xm save is guest-assisted so a hung guest will not
be savable. You may want to look at xc_domain_dumpcore().
You could do some post-analysis of the core dump to determine where it
locked. Determining why it deadlocked is, of course, impossible in the
general case, but you may be able to develop some interesting heuristics
with appropriate static analysis.

As for recovering the guest, a really clever approach would be to
rewrite some of the locking code (maybe temporarily?) by mapping the
guest's code page into dom0's memory after examining EIP in the core.

I reckon there's a rather interesting paper to be written on something
like this :-)

Regards,

Anthony Liguori

> thanks!
> - ts
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
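
To make the post-analysis idea above a little more concrete, here is a
minimal sketch. It assumes you have already recovered from the core dump
which task holds each lock and which lock each blocked task is waiting
on; that extraction step is not shown, and the function name, the two
dictionaries and the task/lock names are all hypothetical. A cycle in
the resulting wait-for graph points at the tasks involved in the
deadlock. This is ordinary wait-for-graph cycle detection, not something
Xen or libxc provides.

```python
from typing import Dict, List, Optional


def find_deadlock_cycle(
    holder_of: Dict[str, str],    # lock name -> task currently holding it
    waiting_for: Dict[str, str],  # blocked task -> lock it is waiting on
) -> Optional[List[str]]:
    """Return the tasks forming a wait-for cycle, or None if there is none."""
    # Build task -> task edges: a task waits on whoever holds the lock it wants.
    waits_on = {
        task: holder_of[lock]
        for task, lock in waiting_for.items()
        if lock in holder_of
    }

    finished = set()  # tasks already shown not to lead into a cycle
    for start in waits_on:
        path: List[str] = []            # tasks visited on this walk, in order
        position: Dict[str, int] = {}   # task -> index in path
        task: Optional[str] = start
        while task is not None and task not in finished:
            if task in position:
                # We came back to a task on the current walk: that is the cycle.
                return path[position[task]:]
            position[task] = len(path)
            path.append(task)
            task = waits_on.get(task)   # None if this task is not blocked
        finished.update(path)
    return None


if __name__ == "__main__":
    # Hypothetical data: task1 and task2 each hold the lock the other wants.
    holders = {"lock_a": "task1", "lock_b": "task2"}
    waiters = {"task1": "lock_b", "task2": "lock_a", "task3": "lock_a"}
    print(find_deadlock_cycle(holders, waiters))
```

Because each blocked task waits on exactly one lock here, every task has
at most one outgoing edge, so a simple walk is enough. This only tells
you where the guest is stuck, not why; the "why" still needs the kind of
heuristics mentioned above.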