* Should shadow_lock be spin_lock_recursive? @ 2005-05-11 17:42 Arun Sharma 2005-05-12 1:57 ` Xiaofeng Ling 0 siblings, 1 reply; 5+ messages in thread From: Arun Sharma @ 2005-05-11 17:42 UTC (permalink / raw) To: xen-devel; +Cc: Ling, Xiaofeng During our testing, we found this code path where xen attempts to grab the shadow_lock, while holding it - leading to a deadlock. >> free_dom_mem-> >> shadow_sync_and_drop_references-> >> shadow_lock -> ..................... first lock >> shadow_remove_all_access-> >> remove_all_access_in_page-> >> put_page-> >> free_domheap_pages-> >> shadow_drop_references-> >> shadow_lock -> ..................... second lock Questions: - should shadow lock be recursive? - is shadow lock too coarse grained? It seems to have led to a lot of code refactoring (__foo without lock and foo with lock). But there may be more such instances we haven't found yet. -Arun ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Should shadow_lock be spin_lock_recursive? 2005-05-11 17:42 Should shadow_lock be spin_lock_recursive? Arun Sharma @ 2005-05-12 1:57 ` Xiaofeng Ling 2005-05-12 7:57 ` Keir Fraser 2005-05-12 12:59 ` Michael A Fetterman 0 siblings, 2 replies; 5+ messages in thread From: Xiaofeng Ling @ 2005-05-12 1:57 UTC (permalink / raw) To: Arun Sharma; +Cc: xen-devel This dead lock happened on VNIF code when enabled shadow mode. The shadow_lock path is so complex and maybe using recurisve lock is the easiest way to avoid dead lock currently before clearing the path or splitting the lock . Arun Sharma wrote: > > During our testing, we found this code path where xen attempts to grab > the shadow_lock, while holding it - leading to a deadlock. > > >> free_dom_mem-> > >> shadow_sync_and_drop_references-> > >> shadow_lock -> ..................... first lock > >> shadow_remove_all_access-> > >> remove_all_access_in_page-> > >> put_page-> > >> free_domheap_pages-> > >> shadow_drop_references-> > >> shadow_lock -> ..................... second lock > > Questions: > > - should shadow lock be recursive? > - is shadow lock too coarse grained? It seems to have led to a lot of > code refactoring (__foo without lock and foo with lock). But there may > be more such instances we haven't found yet. > > -Arun ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: Should shadow_lock be spin_lock_recursive? 2005-05-12 1:57 ` Xiaofeng Ling @ 2005-05-12 7:57 ` Keir Fraser 2005-05-12 12:59 ` Michael A Fetterman 1 sibling, 0 replies; 5+ messages in thread From: Keir Fraser @ 2005-05-12 7:57 UTC (permalink / raw) To: Xiaofeng Ling; +Cc: Arun Sharma, xen-devel On 12 May 2005, at 02:57, Xiaofeng Ling wrote: > This dead lock happened on VNIF code when enabled shadow mode. > The shadow_lock path is so complex and maybe using recurisve lock > is the easiest way to avoid dead lock currently before clearing > the path or splitting the lock . shadow_lock() and show_unlock() should just get the per-domain BIGLOCK I think. That's recursive anyway. -- Keir ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: Re: Should shadow_lock be spin_lock_recursive? 2005-05-12 1:57 ` Xiaofeng Ling 2005-05-12 7:57 ` Keir Fraser @ 2005-05-12 12:59 ` Michael A Fetterman 2005-05-13 22:40 ` Arun Sharma 1 sibling, 1 reply; 5+ messages in thread From: Michael A Fetterman @ 2005-05-12 12:59 UTC (permalink / raw) To: Ling, Xiaofeng, Sharma, Arun; +Cc: xen-devel I think there's another bug somewhere that's provoking this. There may be a flaw in my argument, below, but I currently think this argument is correct: Consider the call tree: free_dom_mem() is trying to get rid of all shadow references to page X, so that it can relinguish page X back to the free list. *Note that free_dom_mem() has done a get_page() on X, so X's refcount must be >= 1... free_dom_mem() calls shadow_sync_and_drop_references(X), which calls shadow_remove_all_access(X), which calls remove_all_access_in_page(random shadow page, X), which (when it finds references to X) calls put_page(X). However, those calls to put_page(X) should never result in calls to free_domheap_pages(), as X's refcount should always be >= 1 because of the get_page performed in free_dom_mem(). So that tells me the refcount on X was already broken before we got here... Michael -----Original Message----- From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Xiaofeng Ling Sent: Thursday, May 12, 2005 2:57 AM To: Sharma, Arun Cc: xen-devel@lists.xensource.com Subject: [Xen-devel] Re: Should shadow_lock be spin_lock_recursive? This dead lock happened on VNIF code when enabled shadow mode. The shadow_lock path is so complex and maybe using recurisve lock is the easiest way to avoid dead lock currently before clearing the path or splitting the lock . Arun Sharma wrote: > > During our testing, we found this code path where xen attempts to grab > the shadow_lock, while holding it - leading to a deadlock. > > >> free_dom_mem-> > >> shadow_sync_and_drop_references-> > >> shadow_lock -> ..................... first lock > >> shadow_remove_all_access-> > >> remove_all_access_in_page-> > >> put_page-> > >> free_domheap_pages-> > >> shadow_drop_references-> > >> shadow_lock -> ..................... second lock > > Questions: > > - should shadow lock be recursive? > - is shadow lock too coarse grained? It seems to have led to a lot of > code refactoring (__foo without lock and foo with lock). But there may > be more such instances we haven't found yet. > > -Arun _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: Should shadow_lock be spin_lock_recursive? 2005-05-12 12:59 ` Michael A Fetterman @ 2005-05-13 22:40 ` Arun Sharma 0 siblings, 0 replies; 5+ messages in thread From: Arun Sharma @ 2005-05-13 22:40 UTC (permalink / raw) To: Michael A Fetterman; +Cc: Ling, Xiaofeng, xen-devel Michael A Fetterman wrote: > > However, those calls to put_page(X) should never result in calls > to free_domheap_pages(), as X's refcount should always be >= 1 > because of the get_page performed in free_dom_mem(). > > So that tells me the refcount on X was already broken before we got > here... > Thanks for the explanation. It was a patched xen that we were testing and the problem may have been caused by our patch. We'll try to look closely at the patch to figure out why the refrence counts were wrong. -Arun ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2005-05-13 22:40 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-11 17:42 Should shadow_lock be spin_lock_recursive? Arun Sharma 2005-05-12 1:57 ` Xiaofeng Ling 2005-05-12 7:57 ` Keir Fraser 2005-05-12 12:59 ` Michael A Fetterman 2005-05-13 22:40 ` Arun Sharma
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.