All of lore.kernel.org
 help / color / mirror / Atom feed
* Should shadow_lock be spin_lock_recursive?
@ 2005-05-11 17:42 Arun Sharma
  2005-05-12  1:57 ` Xiaofeng Ling
  0 siblings, 1 reply; 5+ messages in thread
From: Arun Sharma @ 2005-05-11 17:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Ling, Xiaofeng


During our testing, we found this code path where xen attempts to grab 
the shadow_lock, while holding it - leading to a deadlock.

 >> free_dom_mem->
 >> shadow_sync_and_drop_references->
 >> shadow_lock -> ..................... first lock
 >> shadow_remove_all_access->
 >> remove_all_access_in_page->
 >> put_page->
 >> free_domheap_pages->
 >> shadow_drop_references->
 >> shadow_lock -> ..................... second lock

Questions:

- should shadow lock be recursive?
- is shadow lock too coarse grained? It seems to have led to a lot of 
code refactoring (__foo without lock and foo with lock). But there may 
be more such instances we haven't found yet.

	-Arun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Should shadow_lock be spin_lock_recursive?
  2005-05-11 17:42 Should shadow_lock be spin_lock_recursive? Arun Sharma
@ 2005-05-12  1:57 ` Xiaofeng Ling
  2005-05-12  7:57   ` Keir Fraser
  2005-05-12 12:59   ` Michael A Fetterman
  0 siblings, 2 replies; 5+ messages in thread
From: Xiaofeng Ling @ 2005-05-12  1:57 UTC (permalink / raw)
  To: Arun Sharma; +Cc: xen-devel

This dead lock happened on VNIF code when enabled shadow mode.
The shadow_lock path is so complex and maybe using recurisve lock
is the easiest way to avoid dead lock currently before clearing
the path or splitting the lock .

Arun Sharma wrote:
> 
> During our testing, we found this code path where xen attempts to grab 
> the shadow_lock, while holding it - leading to a deadlock.
> 
>  >> free_dom_mem->
>  >> shadow_sync_and_drop_references->
>  >> shadow_lock -> ..................... first lock
>  >> shadow_remove_all_access->
>  >> remove_all_access_in_page->
>  >> put_page->
>  >> free_domheap_pages->
>  >> shadow_drop_references->
>  >> shadow_lock -> ..................... second lock
> 
> Questions:
> 
> - should shadow lock be recursive?
> - is shadow lock too coarse grained? It seems to have led to a lot of 
> code refactoring (__foo without lock and foo with lock). But there may 
> be more such instances we haven't found yet.
> 
>     -Arun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: Should shadow_lock be spin_lock_recursive?
  2005-05-12  1:57 ` Xiaofeng Ling
@ 2005-05-12  7:57   ` Keir Fraser
  2005-05-12 12:59   ` Michael A Fetterman
  1 sibling, 0 replies; 5+ messages in thread
From: Keir Fraser @ 2005-05-12  7:57 UTC (permalink / raw)
  To: Xiaofeng Ling; +Cc: Arun Sharma, xen-devel


On 12 May 2005, at 02:57, Xiaofeng Ling wrote:

> This dead lock happened on VNIF code when enabled shadow mode.
> The shadow_lock path is so complex and maybe using recurisve lock
> is the easiest way to avoid dead lock currently before clearing
> the path or splitting the lock .

shadow_lock() and show_unlock() should just get the per-domain BIGLOCK 
I think. That's recursive anyway.

  -- Keir

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: Re: Should shadow_lock be spin_lock_recursive?
  2005-05-12  1:57 ` Xiaofeng Ling
  2005-05-12  7:57   ` Keir Fraser
@ 2005-05-12 12:59   ` Michael A Fetterman
  2005-05-13 22:40     ` Arun Sharma
  1 sibling, 1 reply; 5+ messages in thread
From: Michael A Fetterman @ 2005-05-12 12:59 UTC (permalink / raw)
  To: Ling, Xiaofeng, Sharma, Arun; +Cc: xen-devel

I think there's another bug somewhere that's provoking this.
There may be a flaw in my argument, below, but I currently think this
argument is correct:

Consider the call tree:

free_dom_mem() is trying to get rid of all shadow references to page X,
so that it can relinguish page X back to the free list.
*Note that free_dom_mem() has done a get_page() on X, so X's refcount
must be >= 1...

free_dom_mem() calls shadow_sync_and_drop_references(X),
which calls shadow_remove_all_access(X),
which calls remove_all_access_in_page(random shadow page, X),
which (when it finds references to X) calls put_page(X).

However, those calls to put_page(X) should never result in calls
to free_domheap_pages(), as X's refcount should always be >= 1
because of the get_page performed in free_dom_mem().

So that tells me the refcount on X was already broken before we got
here...

Michael 

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Xiaofeng Ling
Sent: Thursday, May 12, 2005 2:57 AM
To: Sharma, Arun
Cc: xen-devel@lists.xensource.com
Subject: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive?

This dead lock happened on VNIF code when enabled shadow mode.
The shadow_lock path is so complex and maybe using recurisve lock
is the easiest way to avoid dead lock currently before clearing
the path or splitting the lock .

Arun Sharma wrote:
> 
> During our testing, we found this code path where xen attempts to grab 
> the shadow_lock, while holding it - leading to a deadlock.
> 
>  >> free_dom_mem->
>  >> shadow_sync_and_drop_references->
>  >> shadow_lock -> ..................... first lock
>  >> shadow_remove_all_access->
>  >> remove_all_access_in_page->
>  >> put_page->
>  >> free_domheap_pages->
>  >> shadow_drop_references->
>  >> shadow_lock -> ..................... second lock
> 
> Questions:
> 
> - should shadow lock be recursive?
> - is shadow lock too coarse grained? It seems to have led to a lot of 
> code refactoring (__foo without lock and foo with lock). But there may 
> be more such instances we haven't found yet.
> 
>     -Arun


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: Should shadow_lock be spin_lock_recursive?
  2005-05-12 12:59   ` Michael A Fetterman
@ 2005-05-13 22:40     ` Arun Sharma
  0 siblings, 0 replies; 5+ messages in thread
From: Arun Sharma @ 2005-05-13 22:40 UTC (permalink / raw)
  To: Michael A Fetterman; +Cc: Ling, Xiaofeng, xen-devel

Michael A Fetterman wrote:
> 
> However, those calls to put_page(X) should never result in calls
> to free_domheap_pages(), as X's refcount should always be >= 1
> because of the get_page performed in free_dom_mem().
> 
> So that tells me the refcount on X was already broken before we got
> here...
> 

Thanks for the explanation. It was a patched xen that we were testing 
and the problem may have been caused by our patch. We'll try to look 
closely at the patch to figure out why the refrence counts were wrong.

	-Arun

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2005-05-13 22:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-11 17:42 Should shadow_lock be spin_lock_recursive? Arun Sharma
2005-05-12  1:57 ` Xiaofeng Ling
2005-05-12  7:57   ` Keir Fraser
2005-05-12 12:59   ` Michael A Fetterman
2005-05-13 22:40     ` Arun Sharma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.