* Detecting deadlocks with hypervisor..
@ 2006-03-19  2:14 Thileepan Subramaniam
  2006-03-19  6:37 ` Randy Thelen
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Thileepan Subramaniam @ 2006-03-19  2:14 UTC (permalink / raw)
  To: xen-devel

Hello,

I am trying to see if the hypervisor can be used to detect deadlocks in the 
guest VMs. My goal is to detect if a guest OS is deadlocked, and if it is, 
then create a clone of the deadlocked OS without the locking condition, and 
let the clone run. While the clone runs I am hoping to generate some 
hints that could tell me what caused the deadlock.

I simulated a deadlock/hang situation in a guest OS (by loading a badly 
written module into the kernel) and when the guest OS kernel was hanging, I 
ran "xm save" from Dom-0. But this command waits forever.

I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These 
seem to be called when I run 'xm save'. But beyond a point I am not sure 
what the python scripts do. I also see some libxc files such as 
xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or the 
XenU). Can someone help me by explaining what happens behind the scenes 
when "xm save" is called? Is there any good documentation explaining which 
actions are done by which layers (e.g. Python layer, C layer, etc.)?

Also, does it seem viable to clone a copy of a deadlocked guest OS in the 
first place?

thanks!
- ts


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19  2:14 Detecting deadlocks with hypervisor Thileepan Subramaniam
@ 2006-03-19  6:37 ` Randy Thelen
  2006-03-19 10:16 ` Edwin Zhai
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Randy Thelen @ 2006-03-19  6:37 UTC (permalink / raw)
  To: Thileepan Subramaniam; +Cc: xen-devel

Thileepan Subramaniam wrote:

> Can someone help me by explaining me what happens behind the scene  
> when "xm save" is called ? Is there any good documentation  
> explaining which actions are done by which layers (eg: python  
> layer, C layer etc).

This would be immensely valuable.  I imagine there's a college  
student looking for some way to make their mark in the open source  
community.

> Also, does it seem viable to clone a copy of a deadlocked guest OS  
> in the first place?

The idea of using clones as a way of detecting deadlocks is intriguing.

> I am trying to see if the hypervisor can be used to detect  
> deadlocks in the guest VMs. My goal is to detect if a guest OS is  
> deadlocked, and if it is, then create a clone of the deadlocked OS  
> without the locking condition, and letting the clone run. While the  
> clone runs I am hoping to generate some hints that could tell me  
> what caused the deadlock.

But I suspect that some logic injected into the lock routines (and  
data structures) of the host O/S is an easier and possibly better bet.

-- Randy Thelen

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19  2:14 Detecting deadlocks with hypervisor Thileepan Subramaniam
  2006-03-19  6:37 ` Randy Thelen
@ 2006-03-19 10:16 ` Edwin Zhai
  2006-03-19 13:17 ` Ewan Mellor
  2006-03-19 16:30 ` Anthony Liguori
  3 siblings, 0 replies; 15+ messages in thread
From: Edwin Zhai @ 2006-03-19 10:16 UTC (permalink / raw)
  To: Thileepan Subramaniam; +Cc: xen-devel

On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These 
> seem to be called when I run 'xm save'. But beyond a point I am not sure what 
> the python scripts do. I also see some libxc files such as xc_linux_save.c, 
> but I am not sure who is using it (Dom-0 or Xen or the XenU). Can someone help 
> me by explaining me what happens behind the scene when "xm save" is called ? 
> Is there any good documentation explaining which actions are done by which 
> layers (eg: python layer, C layer etc).
The Python layer only saves some domain info, I think.
The xc_save program is then called, which in turn calls xc_linux_save.
xc_linux_save saves all of the guest's memory and VCPU context.

-- 
thanks,
edwin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19  2:14 Detecting deadlocks with hypervisor Thileepan Subramaniam
  2006-03-19  6:37 ` Randy Thelen
  2006-03-19 10:16 ` Edwin Zhai
@ 2006-03-19 13:17 ` Ewan Mellor
  2006-03-24 18:57   ` T S
  2006-03-24 19:04   ` T S
  2006-03-19 16:30 ` Anthony Liguori
  3 siblings, 2 replies; 15+ messages in thread
From: Ewan Mellor @ 2006-03-19 13:17 UTC (permalink / raw)
  To: Thileepan Subramaniam; +Cc: xen-devel

On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:

> Hello,
> 
> I am trying to see if the hypervisor can be used to detect deadlocks in the 
> guest VMs. My goal is to detect if a guest OS is deadlocked, and if it is, 
> then create a clone of the deadlocked OS without the locking condition, and 
> letting the clone run. While the clone runs I am hoping to generate some 
> hints that could tell me what caused the deadlock.
> 
> I simulated a deadlock/hang situation in a guest OS (by loading a badly 
> written module to the kernel) and when the guestOS kernel was hanging, I 
> ran "xm save" from Dom-0. But this command waits forever.
> 
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These 
> seem to be called when I run 'xm save'. But beyond a point I am not sure 
> what the python scripts do. I also see some libxc files such as 
> xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or the 
> XenU). Can someone help me by explaining me what happens behind the scene 
> when "xm save" is called ? Is there any good documentation explaining which 
> actions are done by which layers (eg: python layer, C layer etc).

xc_save, the executable, calls xc_linux_save, the libxc function.  Depending
upon whether this is a live or non-live save, some stuff is done (see
xc_linux_save for details).  The Python layer is then called back, requesting
that the domain is suspended.  This request is passed through to the guest by
writing /local/domain/<domid>/control/shutdown = suspend in the store.  This
is seen by the guest (a watch fires inside reboot.c) and then the guest
suspends itself.  This is probably where you are falling down -- if the guest
kernel is completely deadlocked, it's going to struggle to suspend itself
correctly.

If a suspend completes correctly, Xend will see it (another watch will fire),
and xc_linux_save will be free to complete the save.
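
The handshake described above can be sketched as a toy model. This is
only an illustration, not Xend or libxc code: ToyStore, make_guest and
save_domain are hypothetical names, the store is an in-memory dict, and
the timeout is an assumption added so the example terminates (the real
"xm save" in this thread waits forever).

```python
import threading


class ToyStore:
    """Tiny in-memory stand-in for the store: keys, values, watches."""

    def __init__(self):
        self._data = {}
        self._watches = {}

    def watch(self, key, callback):
        self._watches.setdefault(key, []).append(callback)

    def write(self, key, value):
        self._data[key] = value
        # Fire every watch registered on this key.
        for callback in self._watches.get(key, []):
            callback(value)


def make_guest(store, domid, responsive):
    """Register the guest-side watch; a hung guest never reacts to it."""
    suspended = threading.Event()

    def on_shutdown(value):
        if responsive and value == "suspend":
            suspended.set()  # the guest suspends itself and this is seen

    store.watch(f"/local/domain/{domid}/control/shutdown", on_shutdown)
    return suspended


def save_domain(store, domid, suspended, timeout=0.1):
    """Tool-side view: request a suspend, then wait for the guest."""
    store.write(f"/local/domain/{domid}/control/shutdown", "suspend")
    if not suspended.wait(timeout):
        return "timed out waiting for guest to suspend"
    return "saved"
```

With a responsive guest save_domain() returns "saved"; with
responsive=False the watch never leads to a suspend and the save side
can only give up or keep waiting.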

> Also, does it seem viable to clone a copy of a deadlocked guest OS in the 
> first place?

If you have a byte-for-byte copy of a deadlocked guest, even if you could
suspend it, surely it will be deadlocked when it is resumed.  How do you
intend to break the deadlock, and how is it easier to do that from outside
than it is to perform deadlock detection in the guest?

Ewan.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19  2:14 Detecting deadlocks with hypervisor Thileepan Subramaniam
                   ` (2 preceding siblings ...)
  2006-03-19 13:17 ` Ewan Mellor
@ 2006-03-19 16:30 ` Anthony Liguori
  3 siblings, 0 replies; 15+ messages in thread
From: Anthony Liguori @ 2006-03-19 16:30 UTC (permalink / raw)
  To: Thileepan Subramaniam; +Cc: xen-devel

Thileepan Subramaniam wrote:
> Hello,
>
> I am trying to see if the hypervisor can be used to detect deadlocks 
> in the guest VMs. My goal is to detect if a guest OS is deadlocked, 
> and if it is, then create a clone of the deadlocked OS without the 
> locking condition, and letting the clone run. While the clone runs I 
> am hoping to generate some hints that could tell me what caused the 
> deadlock.
>
> I simulated a deadlock/hang situation in a guest OS (by loading a 
> badly written module to the kernel) and when the guestOS kernel was 
> hanging, I ran "xm save" from Dom-0. But this command waits forever.
>
> I tried to follow the flow of the .py files (XendCheckpoint.py etc.). 
> These seem to be called when I run 'xm save'. But beyond a point I am 
> not sure what the python scripts do. I also see some libxc files such 
> as xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or 
> the XenU). Can someone help me by explaining me what happens behind 
> the scene when "xm save" is called ? Is there any good documentation 
> explaining which actions are done by which layers (eg: python layer, C 
> layer etc).
>
> Also, does it seem viable to clone a copy of a deadlocked guest OS in 
> the first place?

As Ewan pointed out, xm save is guest-assisted so a hung guest will not 
be savable.

You may want to look at xc_domain_dumpcore().  You could do some 
post-analysis of the core dump to determine where it locked.  
Determining why it dead-locked is of course impossible for the general 
case but you may be able to develop some interesting heuristics with 
appropriate static analysis.

As for recovering the guest, a really clever approach would be to 
rewrite some of the locking code (maybe temporarily?) by mapping the 
guest's code page into dom0's memory after examining EIP in the core.
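
As a rough sketch of the post-analysis idea, assuming you have already
extracted an EIP value from the core by some other means and have the
guest kernel's System.map ("address type name" per line): the helper
names below are hypothetical, and the addresses in any sample map are
made up. Note that symbols of a loaded module are not in System.map, so
an EIP inside the faulty module would need the module's own symbol
information.

```python
import bisect


def load_symbols(system_map_text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def symbol_for(symbols, eip):
    """Return (name, offset) of the last symbol at or below eip, or None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below every known symbol
    addr, name = symbols[i]
    return name, eip - addr
```

Repeating the lookup for each VCPU's EIP gives a first hint of which
function each one was spinning or sleeping in when the dump was taken.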

I reckon there's a rather interesting paper to be written on something 
like this :-)

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19 13:17 ` Ewan Mellor
@ 2006-03-24 18:57   ` T S
  2006-03-24 19:04   ` T S
  1 sibling, 0 replies; 15+ messages in thread
From: T S @ 2006-03-24 18:57 UTC (permalink / raw)
  To: xen-devel

>From: Ewan Mellor
>To: Thileepan Subramaniam CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Sun, 19 Mar 2006 13:17:35 +0000
>
>On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:
>
> > Hello,
> >
> > I am trying to see if the hypervisor can be used to detect deadlocks in the
> > guest VMs. My goal is to detect if a guest OS is deadlocked, and if it is,
> > then create a clone of the deadlocked OS without the locking condition, and
> > letting the clone run. While the clone runs I am hoping to generate some
> > hints that could tell me what caused the deadlock.
> >
> > I simulated a deadlock/hang situation in a guest OS (by loading a badly
> > written module to the kernel) and when the guestOS kernel was hanging, I
> > ran "xm save" from Dom-0. But this command waits forever.
> >
> > I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These
> > seem to be called when I run 'xm save'. But beyond a point I am not sure
> > what the python scripts do. I also see some libxc files such as
> > xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or the
> > XenU). Can someone help me by explaining me what happens behind the scene
> > when "xm save" is called ? Is there any good documentation explaining which
> > actions are done by which layers (eg: python layer, C layer etc).
>
>xc_save, the executable, calls xc_linux_save, the libxc function.  Depending
>upon whether this is a live or non-live save, some stuff is done (see
>xc_linux_save for details).  The Python layer is then called back, requesting
>that the domain is suspended.  This request is passed through to the guest by
>writing /local/domain/<domid>/control/shutdown = suspend in the store.  This
>is seen by the guest (a watch fires inside reboot.c) and then the guest
>suspends itself.  This is probably where you are falling down -- if the guest
>kernel is completely deadlocked, it's going to struggle to suspend itself
>correctly.
>
>If a suspend completes correctly, Xend will see it (another watch will fire),
>and xc_linux_save will be free to complete the save.

So I experimented with this: I changed XendCheckpoint.py to NOT wait for 
the guest to shut down, and I changed xc_linux_save() to proceed with the 
save without waiting (essentially, suspend_and_state() returns 0 instead of 
retrying repeatedly). With this I am able to save a deadlocked kernel 
smoothly.

But when I try restore, I get this error message:
Error: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2 failed

And the log says,
[2006-03-24 13:48:42 xend] DEBUG (XendCheckpoint:152) [xc_restore]: /usr/lib/xen/bin/xc_restore 10 19 5 34816 1 2
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) xc_linux_restore start: max_pfn = 8800
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Increased domain reservationby22000KB
[2006-03-24 13:48:42 xend] ERROR (XendCheckpoint:231) Reloading memory pages:   0%
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Received all pages (0 races)
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Failed to pin batch of 22 page tables: 22
[2006-03-24 13:48:54 xend] ERROR (XendCheckpoint:231) Restore exit with rc=1

Any clues on how I can overcome this and restore the kernel to its 
previous (i.e. deadlocked) state?

thanks,
TS


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-19 13:17 ` Ewan Mellor
  2006-03-24 18:57   ` T S
@ 2006-03-24 19:04   ` T S
  2006-03-24 19:24     ` Anthony Liguori
  1 sibling, 1 reply; 15+ messages in thread
From: T S @ 2006-03-24 19:04 UTC (permalink / raw)
  To: xen-devel

>From: Ewan Mellor <ewan@xensource.com>
>To: Thileepan Subramaniam <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Sun, 19 Mar 2006 13:17:35 +0000
>
>On Sat, Mar 18, 2006 at 06:14:09PM -0800, Thileepan Subramaniam wrote:
>
> > Hello,
> >
> > I am trying to see if the hypervisor can be used to detect deadlocks in the
> > guest VMs. My goal is to detect if a guest OS is deadlocked, and if it is,
> > then create a clone of the deadlocked OS without the locking condition, and
> > letting the clone run. While the clone runs I am hoping to generate some
> > hints that could tell me what caused the deadlock.
> >
> > I simulated a deadlock/hang situation in a guest OS (by loading a badly
> > written module to the kernel) and when the guestOS kernel was hanging, I
> > ran "xm save" from Dom-0. But this command waits forever.
> >
> > I tried to follow the flow of the .py files (XendCheckpoint.py etc.). These
> > seem to be called when I run 'xm save'. But beyond a point I am not sure
> > what the python scripts do. I also see some libxc files such as
> > xc_linux_save.c, but I am not sure who is using it (Dom-0 or Xen or the
> > XenU). Can someone help me by explaining me what happens behind the scene
> > when "xm save" is called ? Is there any good documentation explaining which
> > actions are done by which layers (eg: python layer, C layer etc).
>
>xc_save, the executable, calls xc_linux_save, the libxc function.  Depending
>upon whether this is a live or non-live save, some stuff is done (see
>xc_linux_save for details).  The Python layer is then called back, requesting
>that the domain is suspended.  This request is passed through to the guest by
>writing /local/domain/<domid>/control/shutdown = suspend in the store.  This
>is seen by the guest (a watch fires inside reboot.c) and then the guest
>suspends itself.  This is probably where you are falling down -- if the guest
>kernel is completely deadlocked, it's going to struggle to suspend itself
>correctly.

This may sound like a silly question (pardon me, I am relatively new to the 
Linux kernel), but will it be possible to continue running reboot.c (or for 
that matter any kernel thread) when the kernel is deadlocked? In Linux, is 
the kernel a single process or a bunch of entities executing in parallel? If 
the latter, then during a kernel deadlock (e.g. one caused by loading a 
faulty module that disables interrupts and does something silly) some other 
processes/threads can still run, right?

thanks
TS


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-24 19:04   ` T S
@ 2006-03-24 19:24     ` Anthony Liguori
  2006-03-24 20:30       ` T S
  2006-04-07 17:11       ` T S
  0 siblings, 2 replies; 15+ messages in thread
From: Anthony Liguori @ 2006-03-24 19:24 UTC (permalink / raw)
  To: T S; +Cc: xen-devel

T S wrote:
> This may sound a silly question (pardon me because i am relatively new 
> to linux kernel) .. will it be possible to continue running reboot.c 
> (or for that matter any kernel thread) when the kernel is deadlocked ? 
> In Linux, is the kernel a single process or a bunch of parallelly 
> executing entities? If later, then during a kernel deadlock (eg: by 
> loading a faulty module that disables interrupts and do something 
> silly) there can still be some other processes/threads run, right?

Sorry for not making this clearer previously.  You cannot restore a 
dead-locked domain if a normal xm save doesn't work.  One thing that 
makes Xen unique is that guests actually are aware of what physical 
pages are assigned to them.  When one does a save/restore, the guest has 
to canonicalize all of its internal references to physical pages.  When 
it's restored, it then remaps its newly assigned physical pages to all 
the old places where it needed to know about them for some reason or 
another.

If the guest isn't responsive when you do a save, then it will never 
canonicalize itself and there is no way to restore the domain.
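
To make "canonicalize" concrete, here is a toy sketch, assuming a page
table is just a list of machine frame numbers (MFNs) and the guest has a
pseudo-physical-to-machine map. The function names and all frame numbers
are made up for illustration; this is not the libxc implementation.

```python
def canonicalize(page_table, m2p):
    """Replace each machine frame number with its pseudo-physical number."""
    return [m2p[mfn] for mfn in page_table]


def uncanonicalize(canonical_table, new_p2m):
    """On restore, map each PFN to the machine frame it now lives in."""
    return [new_p2m[pfn] for pfn in canonical_table]


# Frames the guest owned before the save (PFN -> MFN), and the inverse map.
old_p2m = {0: 0x1A0, 1: 0x1A7, 2: 0x2C3}
old_m2p = {mfn: pfn for pfn, mfn in old_p2m.items()}

# A page table whose entries hold MFNs, as a guest aware of machine
# addresses would have.
page_table = [0x2C3, 0x1A0]

# After restore the domain is given different machine frames.
new_p2m = {0: 0x500, 1: 0x501, 2: 0x502}

saved = canonicalize(page_table, old_m2p)   # machine-independent form
restored = uncanonicalize(saved, new_p2m)   # valid on the new frames
```

The saved form only mentions PFNs, so it stays meaningful whichever
machine frames the restored domain ends up with.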

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-24 19:24     ` Anthony Liguori
@ 2006-03-24 20:30       ` T S
  2006-04-07 17:11       ` T S
  1 sibling, 0 replies; 15+ messages in thread
From: T S @ 2006-03-24 20:30 UTC (permalink / raw)
  To: aliguori; +Cc: xen-devel

>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 24 Mar 2006 13:24:46 -0600
>
>T S wrote:
>>This may sound a silly question (pardon me because i am relatively new to 
>>linux kernel) .. will it be possible to continue running reboot.c (or for 
>>that matter any kernel thread) when the kernel is deadlocked ? In Linux, 
>>is the kernel a single process or a bunch of parallelly executing 
>>entities? If later, then during a kernel deadlock (eg: by loading a faulty 
>>module that disables interrupts and do something silly) there can still be 
>>some other processes/threads run, right?
>
>Sorry for not making this more clear previously.  You cannot restore a 
>dead-locked domain if a normal xm save doesn't work.  One thing that makes 
>Xen unique is that guests actually are aware of what physical pages are 
>assigned to them.  When one does a save/restore, the guest has to 
>canonicalize all of it's internal references to physical pages.  When it's 
>restored, it then remaps it's newly assigned physical pages to all the old 
>places where it needed to know about them for some reason or another.

Thank you for the reply. Do you mean to say that the canonicalize..() 
functions in the xc_linux_save.c are actually invoked in the guest OS' 
context?

>If the guest isn't responsive when you do a save, then it will never 
>canonicalize itself and there is no way to restore the domain.
>
>Regards,
>
>Anthony Liguori
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-03-24 19:24     ` Anthony Liguori
  2006-03-24 20:30       ` T S
@ 2006-04-07 17:11       ` T S
  2006-04-07 17:22         ` Keir Fraser
  2006-04-07 17:41         ` Anthony Liguori
  1 sibling, 2 replies; 15+ messages in thread
From: T S @ 2006-04-07 17:11 UTC (permalink / raw)
  To: aliguori, ewan, edwin.zhai, rthelen; +Cc: Xen-devel

>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 24 Mar 2006 13:24:46 -0600
>
>T S wrote:
>>This may sound a silly question (pardon me because i am relatively new to 
>>linux kernel) .. will it be possible to continue running reboot.c (or for 
>>that matter any kernel thread) when the kernel is deadlocked ? In Linux, 
>>is the kernel a single process or a bunch of parallelly executing 
>>entities? If later, then during a kernel deadlock (eg: by loading a faulty 
>>module that disables interrupts and do something silly) there can still be 
>>some other processes/threads run, right?
>
>Sorry for not making this more clear previously.  You cannot restore a 
>dead-locked domain if a normal xm save doesn't work.  One thing that makes 
>Xen unique is that guests actually are aware of what physical pages are 
>assigned to them.  When one does a save/restore, the guest has to 
>canonicalize all of it's internal references to physical pages.  When it's 
>restored, it then remaps it's newly assigned physical pages to all the old 
>places where it needed to know about them for some reason or another.

We took a look at the xc_linux_save() function, and what we see is that the 
canonicalization is actually done by Dom-0 (and not by the Dom-U). Dom-0 is 
able to do this because it can access the page tables of the Dom-U as well 
as the Dom-U's pfn2mfn list. Based on this, we think Dom-0 can actually save 
the 'context' of the deadlocked Dom-U. Please correct me if this claim is 
wrong.

Also, given that Dom-0 can access the page tables and other structures of 
the deadlocked guest, could one of you tell me what changes I need to make 
to xc_linux_save() (and other related functions) to save the state of the 
deadlocked guest without doing any handshake with the guest OS?

thanks!
- T


>If the guest isn't responsive when you do a save, then it will never 
>canonicalize itself and there is no way to restore the domain.
>
>Regards,
>
>Anthony Liguori
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-04-07 17:11       ` T S
@ 2006-04-07 17:22         ` Keir Fraser
  2006-04-07 17:45           ` Anthony Liguori
  2006-04-07 17:41         ` Anthony Liguori
  1 sibling, 1 reply; 15+ messages in thread
From: Keir Fraser @ 2006-04-07 17:22 UTC (permalink / raw)
  To: T S; +Cc: rthelen, Xen-devel, ewan, edwin.zhai


On 7 Apr 2006, at 18:11, T S wrote:

> We took a look at the xc_linux_save() function ... and what we see is 
> that
> the canonicalize action is actually done by the Dom-0 (and not by the 
> Dom-U);
> Dom-0 is able to do this because it is able to access the page tables 
> of Dom-U
> as well as the pfn2mfn list of the Dom-U. Based on this, we think the 
> Dom-0 can
> actually save the 'context' of the deadlocked Dom-U. Please correct me 
> if this
> claim is wrong.
>
> Also, given that Dom-0 can access the page tables and other structures 
> of the deadlocked guest,
> can one of you be able to tell me what changes I need to do to 
> xm_linux_save( ) (and other related functions) to save the state of 
> the deadlocked guest without doing any handshake with the guest OS ?

You can get at the consistent state of a guest by pausing it and then 
reading its state. However, the reason for the handshake is to ensure 
that the guest is not currently accessing pagetables or doing other 
critical operations. If it were then we could not safely translate its 
memory page addresses as it could have those addresses in places like 
its kernel stacks or register contexts, where they would not get 
translated and would cause a crash on restore.
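
A toy illustration of the hazard, assuming the tools can find and
translate page-table entries but treat register and stack contents as
opaque data. The dictionaries and frame numbers are invented for the
example and do not reflect any real save format.

```python
def translate_page_tables(page_table, m2p, new_p2m):
    """Translate every page-table entry from old MFN to new MFN via its PFN."""
    return [new_p2m[m2p[mfn]] for mfn in page_table]


old_p2m = {0: 0x1A0, 1: 0x1A7}
old_m2p = {mfn: pfn for pfn, mfn in old_p2m.items()}
new_p2m = {0: 0x500, 1: 0x501}

# Guest paused in the middle of a page-table update: the same machine
# frame number sits in a page table and in a CPU register.
guest = {"page_table": [0x1A7], "register": 0x1A7}

restored = {
    # Page tables have a known layout, so they can be translated.
    "page_table": translate_page_tables(guest["page_table"], old_m2p, new_p2m),
    # The register is just a number to the tools; it is copied verbatim.
    "register": guest["register"],
}

# Does the register still point at a frame the restored guest owns?
valid_frames = set(new_p2m.values())
stale = restored["register"] not in valid_frames
```

Here the page table ends up pointing at the new frame while the register
still names the old one, which is the kind of leftover reference the
suspend handshake is meant to rule out.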

  -- Keir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting deadlocks with hypervisor..
  2006-04-07 17:11       ` T S
  2006-04-07 17:22         ` Keir Fraser
@ 2006-04-07 17:41         ` Anthony Liguori
  2006-04-08  1:47           ` T S
  1 sibling, 1 reply; 15+ messages in thread
From: Anthony Liguori @ 2006-04-07 17:41 UTC (permalink / raw)
  To: T S; +Cc: rthelen, Xen-devel, ewan, edwin.zhai

T S wrote:
>> From: Anthony Liguori <aliguori@us.ibm.com>
>> To: T S <thileepan_@hotmail.com>
>> CC: xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>> Date: Fri, 24 Mar 2006 13:24:46 -0600
>>
>> T S wrote:
>>> This may sound a silly question (pardon me because i am relatively 
>>> new to linux kernel) .. will it be possible to continue running 
>>> reboot.c (or for that matter any kernel thread) when the kernel is 
>>> deadlocked ? In Linux, is the kernel a single process or a bunch of 
>>> parallelly executing entities? If later, then during a kernel 
>>> deadlock (eg: by loading a faulty module that disables interrupts 
>>> and do something silly) there can still be some other 
>>> processes/threads run, right?
>>
>> Sorry for not making this more clear previously. You cannot restore a 
>> dead-locked domain if a normal xm save doesn't work. One thing that 
>> makes Xen unique is that guests actually are aware of what physical 
>> pages are assigned to them. When one does a save/restore, the guest 
>> has to canonicalize all of it's internal references to physical 
>> pages. When it's restored, it then remaps it's newly assigned 
>> physical pages to all the old places where it needed to know about 
>> them for some reason or another.
>
> We took a look at the xc_linux_save() function ... and what we see is 
> that
> the canonicalize action is actually done by the Dom-0 (and not by the 
> Dom-U);

Take a look at linux-2.6-xen-sparse/drivers/xen/core/reboot.c:__do_suspend(). 
Canonicalization is done both in Dom-0 and in the guest itself. Dom-0 
attempts to do as much of it as it can but, as I've said before, it 
cannot do all of it.

> Also, given that Dom-0 can access the page tables and other structures
> of the deadlocked guest, could one of you tell me what changes I need
> to make to xc_linux_save() (and other related functions) to save the
> state of the deadlocked guest without doing any handshake with the
> guest OS?

If you want to attempt to futz with the state of a guest while it's 
running without the guest cooperating, your best bet is to do as Keir 
suggested and pause the domain, make your changes, and then unpause.
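
As a rough illustration of that pause / change / unpause pattern, here
is a small Python sketch. ToyDomain and paused() are hypothetical names
made up for this example and are not a Xen API; as far as I know, the
real libxc entry points for this are xc_domain_pause() and
xc_domain_unpause(). The point of the sketch is only the ordering: the
domain is not running while its state is touched, and it is always
unpaused afterwards, even if the change fails.

```python
from contextlib import contextmanager


class ToyDomain:
    """Hypothetical stand-in for a guest domain; not a real Xen API."""

    def __init__(self, domid):
        self.domid = domid
        self.running = True
        self.memory = {}

    def pause(self):
        self.running = False

    def unpause(self):
        self.running = True


@contextmanager
def paused(domain):
    """Pause the domain for the duration of the block, then always unpause."""
    domain.pause()
    try:
        yield domain
    finally:
        domain.unpause()


dom = ToyDomain(domid=5)
with paused(dom) as d:
    was_running_inside = d.running   # False: nothing runs while we edit state
    d.memory[0x1000] = b"patched"    # the "make your changes" step
```

Note that pausing only stops the guest at an arbitrary point; it does
not put the guest into the quiescent state that the suspend handshake
provides.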

Regards,

Anthony Liguori

>
> thanks!
> - T
>
>
>> If the guest isn't responsive when you do a save, then it will never 
>> canonicalize itself and there is no way to restore the domain.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> thanks
>>> TS
>>>
>>>>
>>>> If a suspend completes correctly, Xend will see it (another watch
>>>> will fire), and xc_linux_save will be free to complete the save.
>>>>
>>>> > Also, does it seem viable to clone a copy of a deadlocked guest
>>>> > OS in the first place?
>>>>
>>>> If you have a byte-for-byte copy of a deadlocked guest, even if you
>>>> could suspend it, surely it will be deadlocked when it is resumed.
>>>> How do you intend to break the deadlock, and how is it easier to do
>>>> that from outside than it is to perform deadlock detection in the
>>>> guest?
>>>>
>>>> Ewan.
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xensource.com
>>>> http://lists.xensource.com/xen-devel
>>>


* Re: Detecting deadlocks with hypervisor..
  2006-04-07 17:22         ` Keir Fraser
@ 2006-04-07 17:45           ` Anthony Liguori
  0 siblings, 0 replies; 15+ messages in thread
From: Anthony Liguori @ 2006-04-07 17:45 UTC (permalink / raw)
  To: Keir Fraser; +Cc: rthelen, T S, Xen-devel, ewan, edwin.zhai

Keir Fraser wrote:
>
> On 7 Apr 2006, at 18:11, T S wrote:
>
>> We took a look at the xc_linux_save() function ... and what we see is
>> that the canonicalize action is actually done by the Dom-0 (and not by
>> the Dom-U); Dom-0 is able to do this because it is able to access the
>> page tables of Dom-U as well as the pfn2mfn list of the Dom-U. Based
>> on this, we think the Dom-0 can actually save the 'context' of the
>> deadlocked Dom-U. Please correct me if this claim is wrong.
>>
>> Also, given that Dom-0 can access the page tables and other structures
>> of the deadlocked guest, could one of you tell me what changes I need
>> to make to xc_linux_save() (and other related functions) to save the
>> state of the deadlocked guest without doing any handshake with the
>> guest OS?
>
> You can get at the consistent state of a guest by pausing it and then 
> reading its state. However, the reason for the handshake is to ensure 
> that the guest is not currently accessing pagetables or doing other 
> critical operations. If it were then we could not safely translate its 
> memory page addresses as it could have those addresses in places like 
> its kernel stacks or register contexts, where they would not get 
> translated and would cause a crash on restore.

I should add that this is a problem specific to writable page tables as 
the guest must be aware of the actual physical pages that it is using.   
With a VT/SVM guest or on an architecture that doesn't use writable page 
tables, this isn't an issue.

Regards,

Anthony Liguori

>  -- Keir
>


* Re: Detecting deadlocks with hypervisor..
  2006-04-07 17:41         ` Anthony Liguori
@ 2006-04-08  1:47           ` T S
  2006-04-08 14:38             ` Anthony Liguori
  0 siblings, 1 reply; 15+ messages in thread
From: T S @ 2006-04-08  1:47 UTC (permalink / raw)
  To: aliguori; +Cc: rthelen, Xen-devel, ewan, edwin.zhai

>From: Anthony Liguori <aliguori@us.ibm.com>
>To: T S <thileepan_@hotmail.com>
>CC: ewan@xensource.com, edwin.zhai@intel.com, rthelen@netapp.com,        
>Xen-devel@lists.xensource.com, Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>Date: Fri, 07 Apr 2006 12:41:20 -0500
>
>T S wrote:
>>>From: Anthony Liguori <aliguori@us.ibm.com>
>>>To: T S <thileepan_@hotmail.com>
>>>CC: xen-devel@lists.xensource.com
>>>Subject: Re: [Xen-devel] Detecting deadlocks with hypervisor..
>>>Date: Fri, 24 Mar 2006 13:24:46 -0600
>>>
>>>T S wrote:
>>>>This may sound like a silly question (pardon me, I am relatively new
>>>>to the Linux kernel), but is it possible to keep running reboot.c (or,
>>>>for that matter, any kernel thread) when the kernel is deadlocked? In
>>>>Linux, is the kernel a single process or a set of entities executing
>>>>in parallel? If the latter, then during a kernel deadlock (e.g. caused
>>>>by loading a faulty module that disables interrupts and does something
>>>>silly) some other processes/threads can still run, right?
>>>
>>>Sorry for not making this clearer previously. You cannot restore a
>>>deadlocked domain if a normal xm save doesn't work. One thing that makes
>>>Xen unique is that guests are actually aware of which physical pages are
>>>assigned to them. When one does a save/restore, the guest has to
>>>canonicalize all of its internal references to physical pages. When it's
>>>restored, it then remaps its newly assigned physical pages to all the
>>>old places where it needed to know about them for some reason or another.
>>
>>We took a look at the xc_linux_save() function ... and what we see is that
>>the canonicalize action is actually done by the Dom-0 (and not by the 
>>Dom-U);
>
>Take a look at linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(). 
>Canonicalization is done both in Dom-0 and in the guest itself. Dom-0 
>attempts to do as much of it as it can but as I've said before, it cannot 
>do all of it.

Anthony,
Thank you for your reply.
In linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(), we see store_mfn
and console_mfn being canonicalized before the guest OS goes to sleep (as
done in "xm save"). But before this canonicalization takes place, the Python
layer writes store_mfn and console_mfn into the save file (in the file's
header area).

Does this mean the store_mfn and console_mfn values present in the header of
the file are rewritten in a later part of the file?

Other than the store and console MFNs, are there any other parameters
canonicalized BY the guest OS during "xm save"?

thanks.




* Re: Detecting deadlocks with hypervisor..
  2006-04-08  1:47           ` T S
@ 2006-04-08 14:38             ` Anthony Liguori
  0 siblings, 0 replies; 15+ messages in thread
From: Anthony Liguori @ 2006-04-08 14:38 UTC (permalink / raw)
  To: T S; +Cc: rthelen, Xen-devel, ewan, edwin.zhai

T S wrote:
>> Take a look at linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(). 
>> Canonicalization is done both in Dom-0 and in the guest itself. Dom-0 
>> attempts to do as much of it as it can but as I've said before, it 
>> cannot do all of it.
>
> Anthony,
> Thank you for your reply.
> In linux-2.6-sparse/drivers/core/reboot.c:__do_suspend(), we see
> store_mfn and console_mfn being canonicalized before the guest OS goes
> to sleep (as done in "xm save"). But before this canonicalization takes
> place, the Python layer writes store_mfn and console_mfn into the
> save file (in the file's header area).

Yes, although this isn't strictly necessary.

> Does this mean the store_mfn and console_mfn values present in the
> header of the file are rewritten in a later part of the file?
>
> Other than the store and console MFNs, are there any other parameters
> canonicalized BY the guest OS during "xm save"?

Not currently, although, as Keir pointed out, you still have to contend 
with the fact that a guest may have a cached PFN somewhere (for 
instance, because it's in the process of updating a page table).
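
A toy Python sketch of that hazard, under the assumption that the save
code can only translate frame numbers stored in places it knows about.
The dictionary layout, the "register" field, and the frame numbers are
all hypothetical; the sketch only shows why a frame number cached in a
register or on a stack survives the save untranslated and no longer
refers to one of the guest's frames after restore.

```python
# Hypothetical pseudo-physical -> machine mappings before and after restore.
p2m_before = {0: 0x1A0, 1: 0x2B7}
m2p_before = {mfn: pfn for pfn, mfn in p2m_before.items()}
p2m_after = {0: 0x500, 1: 0x501}

guest = {
    "page_table": [0x1A0, 0x2B7],  # known location: the save code translates it
    "register": 0x2B7,             # raw frame number cached mid-update: opaque
}

# Save: only the known location is canonicalized.
image = {
    "page_table": [m2p_before[m] for m in guest["page_table"]],
    "register": guest["register"],
}

# Restore onto the new machine frames.
restored = {
    "page_table": [p2m_after[p] for p in image["page_table"]],
    "register": image["register"],
}

# The cached value still names a frame from the old host.
owned = set(p2m_after.values())
stale = restored["register"] not in owned
```

The suspend handshake exists to make sure the guest is not holding such
values when its state is captured.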

Regards,

Anthony Liguori



end of thread, other threads:[~2006-04-08 14:38 UTC | newest]

Thread overview: 15+ messages
2006-03-19  2:14 Detecting deadlocks with hypervisor Thileepan Subramaniam
2006-03-19  6:37 ` Randy Thelen
2006-03-19 10:16 ` Edwin Zhai
2006-03-19 13:17 ` Ewan Mellor
2006-03-24 18:57   ` T S
2006-03-24 19:04   ` T S
2006-03-24 19:24     ` Anthony Liguori
2006-03-24 20:30       ` T S
2006-04-07 17:11       ` T S
2006-04-07 17:22         ` Keir Fraser
2006-04-07 17:45           ` Anthony Liguori
2006-04-07 17:41         ` Anthony Liguori
2006-04-08  1:47           ` T S
2006-04-08 14:38             ` Anthony Liguori
2006-03-19 16:30 ` Anthony Liguori
