From: Jeremy Fitzhardinge
Subject: Shutdown problems in xs.c
Date: Tue, 11 May 2010 15:01:29 -0700
Message-ID: <4BE9D3B9.2050206@goop.org>
To: Keir Fraser
Cc: Xen-devel, Stefano Stabellini

I've been getting deadlocks in xl, particularly "xl destroy". It turns
out the main thread is stuck in a pthread_join while holding all the
mutexes, while the xenstore reading thread is stuck in a
pthread_mutex_lock before it can get to a cancellation point and exit.

This looks like a very long-standing deadlock (the code in question
mostly dates back to 2005), but perhaps something has changed that makes
it more likely to happen.

I think the original intention of the code was to hold all the mutexes
while doing the cancel/join, to avoid cancelling the reader while it is
holding any mutexes. This fails when the reader loop is not holding any,
but needs to take one before getting to a cancellation point
(pthread_mutex_lock is not itself a cancellation point).

The following two patches address it by:

 1) making sure that the read thread has sufficient pthread cleanup
    handlers to free any allocated-but-unused memory and release the
    mutexes when cancelled, and
 2) doing the pthread cancel/join while not holding any mutexes.

Thanks,
    J
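
For illustration, the two-part fix described above can be sketched roughly as follows. This is not the actual patch to xs.c. The names demo_handle, reader_loop, demo_shutdown and run_shutdown_demo are hypothetical, and a short nanosleep stands in for the blocking xenstore read. The sketch assumes the default deferred cancellation type.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the real handle; not the xs.c structure. */
struct demo_handle {
    pthread_mutex_t lock;
    pthread_t reader;
    int messages;
};

static void unlock_on_cancel(void *arg) { pthread_mutex_unlock(arg); }
static void free_on_cancel(void *arg) { free(arg); }

static void *reader_loop(void *arg)
{
    struct demo_handle *h = arg;
    struct timespec ts = { 0, 1000 * 1000 };    /* 1 ms */

    for (;;) {
        char *msg = malloc(64);                 /* allocated-but-unused buffer */
        if (msg == NULL)
            return NULL;
        pthread_cleanup_push(free_on_cancel, msg);

        nanosleep(&ts, NULL);   /* stands in for read(); a cancellation point */

        pthread_mutex_lock(&h->lock);           /* NOT a cancellation point */
        pthread_cleanup_push(unlock_on_cancel, &h->lock);
        h->messages++;
        pthread_testcancel();   /* cancelled here: both handlers run */
        pthread_cleanup_pop(1);                 /* unlock */

        pthread_cleanup_pop(1);                 /* free msg */
    }
    return NULL;
}

/* Cancel and join WITHOUT holding h->lock, so the reader can always take
 * the mutex, reach a cancellation point, and exit. */
static int demo_shutdown(struct demo_handle *h)
{
    void *ret;

    if (pthread_cancel(h->reader) != 0)
        return -1;
    if (pthread_join(h->reader, &ret) != 0)
        return -1;
    if (ret != PTHREAD_CANCELED)
        return -1;

    /* The cleanup handlers must have left the mutex unlocked. */
    if (pthread_mutex_trylock(&h->lock) != 0)
        return -1;
    pthread_mutex_unlock(&h->lock);
    pthread_mutex_destroy(&h->lock);
    return 0;
}

int run_shutdown_demo(void)
{
    struct demo_handle h = { .messages = 0 };
    struct timespec ts = { 0, 20 * 1000 * 1000 };   /* let the reader spin */

    if (pthread_mutex_init(&h.lock, NULL) != 0)
        return -1;
    if (pthread_create(&h.reader, NULL, reader_loop, &h) != 0)
        return -1;
    nanosleep(&ts, NULL);
    return demo_shutdown(&h);
}
```

Calling run_shutdown_demo() from a main() and building with -pthread should return 0. If demo_shutdown instead took h->lock before pthread_cancel, the reader could block forever in pthread_mutex_lock and the join would never complete, which is the deadlock described above.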