From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47621) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X6bkQ-0001Aq-DZ for qemu-devel@nongnu.org; Mon, 14 Jul 2014 04:36:43 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X6bkH-00053U-Da for qemu-devel@nongnu.org; Mon, 14 Jul 2014 04:36:34 -0400 Received: from mail-qa0-x22c.google.com ([2607:f8b0:400d:c00::22c]:64676) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X6bkH-00053L-7n for qemu-devel@nongnu.org; Mon, 14 Jul 2014 04:36:25 -0400 Received: by mail-qa0-f44.google.com with SMTP id f12so2911440qad.31 for ; Mon, 14 Jul 2014 01:36:24 -0700 (PDT) Sender: Paolo Bonzini Message-ID: <53C39685.2090400@redhat.com> Date: Mon, 14 Jul 2014 10:36:21 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <1405077612-7806-1-git-send-email-stefanha@redhat.com> <1405077612-7806-3-git-send-email-stefanha@redhat.com> In-Reply-To: <1405077612-7806-3-git-send-email-stefanha@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH for-2.1? 2/2] thread-pool: avoid deadlock in nested aio_poll() calls List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi , qemu-devel@nongnu.org Cc: Christian Borntraeger Il 11/07/2014 13:20, Stefan Hajnoczi ha scritto: > The thread pool has a race condition if two elements complete before > thread_pool_completion_bh() runs: > > If element A's callback waits for element B using aio_poll() it will > deadlock since pool->completion_bh is not marked scheduled when the > nested aio_poll() runs. > > Fix this by marking the BH scheduled while thread_pool_completion_bh() > is executing. This way any nested aio_poll() loops will enter > thread_pool_completion_bh() and complete the remaining elements. > > Signed-off-by: Stefan Hajnoczi > --- > thread-pool.c | 27 +++++++++++++++++++++++++-- > 1 file changed, 25 insertions(+), 2 deletions(-) > > diff --git a/thread-pool.c b/thread-pool.c > index 4cfd078..0ede168 100644 > --- a/thread-pool.c > +++ b/thread-pool.c > @@ -65,6 +65,9 @@ struct ThreadPool { > int max_threads; > QEMUBH *new_thread_bh; > > + /* Atomic counter to detect completions while completion handler runs */ > + uint32_t completion_token; > + > /* The following variables are only accessed from one AioContext. */ > QLIST_HEAD(, ThreadPoolElement) head; > > @@ -118,6 +121,7 @@ static void *worker_thread(void *opaque) > qemu_cond_broadcast(&pool->check_cancel); > } > > + atomic_inc(&pool->completion_token); > qemu_bh_schedule(pool->completion_bh); > } > > @@ -167,9 +171,8 @@ static void spawn_thread(ThreadPool *pool) > } > } > > -static void thread_pool_completion_bh(void *opaque) > +static void thread_pool_complete_elements(ThreadPool *pool) > { > - ThreadPool *pool = opaque; > ThreadPoolElement *elem, *next; > > restart: > @@ -196,6 +199,26 @@ restart: > } > } > > +static void thread_pool_completion_bh(void *opaque) > +{ > + ThreadPool *pool = opaque; > + uint32_t token; > + > + do { > + token = atomic_mb_read(&pool->completion_token); > + > + /* Stay scheduled in case elem->common.cb() makes a nested aio_poll() > + * call. This avoids deadlock if element A's callback waits for > + * element B and both completed at the same time. > + */ > + qemu_bh_schedule(pool->completion_bh); > + > + thread_pool_complete_elements(pool); > + > + qemu_bh_cancel(pool->completion_bh); > + } while (token != pool->completion_token); > +} > + > static void thread_pool_cancel(BlockDriverAIOCB *acb) > { > ThreadPoolElement *elem = (ThreadPoolElement *)acb; > I am not sure I understand this patch. The simplest way to fix deadlock is to change this in thread_pool_completion_bh: elem->common.cb(elem->common.opaque, elem->ret); qemu_aio_release(elem); goto restart; to /* In case elem->common.cb() makes a nested aio_poll() call, * next may become invalid as well. Instead of just * restarting the QLIST_FOREACH_SAFE, go through the BH * once more, which also avoids deadlock if element A's * callback waits for element B and both completed at the * same time. */ qemu_bh_schedule(pool->completion_bh); elem->common.cb(elem->common.opaque, elem->ret); qemu_aio_release(elem); There is no change in logic, it's just that the goto is switched to a BH representing a continuation. I am then not sure why pool->completion_token is necessary? Perhaps it is just an optimization to avoid going multiple times around aio_poll()? Paolo