From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 15 Jul 2014 16:23:27 +0200
From: Stefan Hajnoczi
Message-ID: <20140715142327.GC9837@stefanha-thinkpad.redhat.com>
References: <1405077612-7806-1-git-send-email-stefanha@redhat.com>
 <1405077612-7806-3-git-send-email-stefanha@redhat.com>
 <53C39685.2090400@redhat.com>
In-Reply-To: <53C39685.2090400@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for-2.1? 2/2] thread-pool: avoid deadlock in nested aio_poll() calls
To: Paolo Bonzini
Cc: Christian Borntraeger, qemu-devel@nongnu.org, Stefan Hajnoczi

On Mon, Jul 14, 2014 at 10:36:21AM +0200, Paolo Bonzini wrote:
> On 11/07/2014 13:20, Stefan Hajnoczi wrote:
> >The thread pool has a race condition if two elements complete before
> >thread_pool_completion_bh() runs:
> >
> >  If element A's callback waits for element B using aio_poll() it will
> >  deadlock since pool->completion_bh is not marked scheduled when the
> >  nested aio_poll() runs.
> >
> >Fix this by marking the BH scheduled while thread_pool_completion_bh()
> >is executing.  This way any nested aio_poll() loops will enter
> >thread_pool_completion_bh() and complete the remaining elements.
> >
> >Signed-off-by: Stefan Hajnoczi
> >---
> > thread-pool.c | 27 +++++++++++++++++++++++++--
> > 1 file changed, 25 insertions(+), 2 deletions(-)
> >
> >diff --git a/thread-pool.c b/thread-pool.c
> >index 4cfd078..0ede168 100644
> >--- a/thread-pool.c
> >+++ b/thread-pool.c
> >@@ -65,6 +65,9 @@ struct ThreadPool {
> >     int max_threads;
> >     QEMUBH *new_thread_bh;
> >
> >+    /* Atomic counter to detect completions while completion handler runs */
> >+    uint32_t completion_token;
> >+
> >     /* The following variables are only accessed from one AioContext. */
> >     QLIST_HEAD(, ThreadPoolElement) head;
> >
> >@@ -118,6 +121,7 @@ static void *worker_thread(void *opaque)
> >             qemu_cond_broadcast(&pool->check_cancel);
> >         }
> >
> >+        atomic_inc(&pool->completion_token);
> >         qemu_bh_schedule(pool->completion_bh);
> >     }
> >
> >@@ -167,9 +171,8 @@ static void spawn_thread(ThreadPool *pool)
> >     }
> > }
> >
> >-static void thread_pool_completion_bh(void *opaque)
> >+static void thread_pool_complete_elements(ThreadPool *pool)
> > {
> >-    ThreadPool *pool = opaque;
> >     ThreadPoolElement *elem, *next;
> >
> > restart:
> >@@ -196,6 +199,26 @@ restart:
> >     }
> > }
> >
> >+static void thread_pool_completion_bh(void *opaque)
> >+{
> >+    ThreadPool *pool = opaque;
> >+    uint32_t token;
> >+
> >+    do {
> >+        token = atomic_mb_read(&pool->completion_token);
> >+
> >+        /* Stay scheduled in case elem->common.cb() makes a nested aio_poll()
> >+         * call.  This avoids deadlock if element A's callback waits for
> >+         * element B and both completed at the same time.
> >+         */
> >+        qemu_bh_schedule(pool->completion_bh);
> >+
> >+        thread_pool_complete_elements(pool);
> >+
> >+        qemu_bh_cancel(pool->completion_bh);
> >+    } while (token != pool->completion_token);
> >+}
> >+
> > static void thread_pool_cancel(BlockDriverAIOCB *acb)
> > {
> >     ThreadPoolElement *elem = (ThreadPoolElement *)acb;
>
> I am not sure I understand this patch.
>
> The simplest way to fix the deadlock is to change this in
> thread_pool_completion_bh:
>
>         elem->common.cb(elem->common.opaque, elem->ret);
>         qemu_aio_release(elem);
>         goto restart;
>
> to
>
>         /* In case elem->common.cb() makes a nested aio_poll() call,
>          * next may become invalid as well.  Instead of just
>          * restarting the QLIST_FOREACH_SAFE, go through the BH
>          * once more, which also avoids deadlock if element A's
>          * callback waits for element B and both completed at the
>          * same time.
>          */
>         qemu_bh_schedule(pool->completion_bh);
>         elem->common.cb(elem->common.opaque, elem->ret);
>         qemu_aio_release(elem);
>
> There is no change in logic, it's just that the goto is switched to a BH
> representing a continuation.  I am then not sure why pool->completion_token
> is necessary?
>
> Perhaps it is just an optimization to avoid going multiple times around
> aio_poll()?

Yes, it's just an optimization.  I wanted to cancel pool->completion_bh to
avoid needlessly entering the BH.

But if we call cancel, then we must watch for worker threads that complete
a request while we're invoking completions - hence the completion_token.

Your approach is simpler and I like that.  I'll send a v2.

Stefan
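
For readers trying to picture the failure mode, below is a rough sketch of
the nested-wait pattern the commit message describes.  This is not code from
the patch or from QEMU: only aio_poll(), qemu_get_aio_context() and the
completion callback signature are real APIs, while done_b, cb_a, cb_b and
the header paths are assumptions made for illustration.

    #include "qemu/main-loop.h"   /* qemu_get_aio_context(), path assumed */
    #include "block/aio.h"        /* aio_poll(), path assumed */

    /* Completion callback for request B: just record that it finished. */
    static bool done_b;

    static void cb_b(void *opaque, int ret)
    {
        done_b = true;
    }

    /* Completion callback for request A: wait for request B by running a
     * nested event loop.  If pool->completion_bh is not marked scheduled
     * while the completion handler is running, the nested aio_poll() never
     * dispatches B's completion and this call blocks forever - the deadlock
     * the patch is about.
     */
    static void cb_a(void *opaque, int ret)
    {
        while (!done_b) {
            aio_poll(qemu_get_aio_context(), true);
        }
    }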