From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36471) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X4akD-0007Ol-0d for qemu-devel@nongnu.org; Tue, 08 Jul 2014 15:08:10 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X4ak3-0001Sl-Ob for qemu-devel@nongnu.org; Tue, 08 Jul 2014 15:08:00 -0400 Received: from e06smtp13.uk.ibm.com ([195.75.94.109]:60972) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X4ak3-0001Sc-G8 for qemu-devel@nongnu.org; Tue, 08 Jul 2014 15:07:51 -0400 Received: from /spool/local by e06smtp13.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 8 Jul 2014 20:07:49 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by d06dlp03.portsmouth.uk.ibm.com (Postfix) with ESMTP id EAB471B08051 for ; Tue, 8 Jul 2014 20:08:24 +0100 (BST) Received: from d06av05.portsmouth.uk.ibm.com (d06av05.portsmouth.uk.ibm.com [9.149.37.229]) by b06cxnps3075.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s68J7lXZ29622310 for ; Tue, 8 Jul 2014 19:07:47 GMT Received: from d06av05.portsmouth.uk.ibm.com (localhost [127.0.0.1]) by d06av05.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s68J7kGC024005 for ; Tue, 8 Jul 2014 13:07:47 -0600 Message-ID: <53BC4181.6070604@de.ibm.com> Date: Tue, 08 Jul 2014 21:07:45 +0200 From: Christian Borntraeger MIME-Version: 1.0 References: <53BA8B49.9050709@de.ibm.com> <20140708155956.GB11505@stefanha-thinkpad.redhat.com> <53BC2579.9060200@redhat.com> In-Reply-To: <53BC2579.9060200@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] another locking issue in current dataplane code? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini , Stefan Hajnoczi Cc: Cornelia Huck , Kevin Wolf , ming.lei@canonical.com, "qemu-devel@nongnu.org" , Dominik Dingel On 08/07/14 19:08, Paolo Bonzini wrote: > Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto: >> I sent Christian an initial patch to fix this but now both threads are >> stuck in rfifolock_lock() inside cond wait. That's very strange and >> should never happen. > > I had this patch pending for 2.2: > > commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f > Author: Paolo Bonzini > Date: Mon Jul 7 10:39:49 2014 +0200 > > iothread: do not rely on aio_poll(ctx, true) result to end a loop > > Currently, whenever aio_poll(ctx, true) has completed all pending > work it returns true *and* the next call to aio_poll(ctx, true) > will not block. > > This invariant has its roots in qemu_aio_flush()'s implementation > as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does > not exist anymore and bdrv_drain_all() is implemented differently; > and this invariant is complicated to maintain and subtly different > from the return value of GMainLoop's g_main_context_iteration. > > All calls to aio_poll(ctx, true) except one are guarded by a > while() loop checking for a request to be incomplete, or a > BlockDriverState to be idle. Modify that one exception in > iothread.c. > > Signed-off-by: Paolo Bonzini The hangs are gone. Looks like 2.1 material now... Acked-by: Christian Borntraeger Tested-by: Christian Borntraeger > > diff --git a/iothread.c b/iothread.c > index 1fbf9f1..d9403cf 100644 > --- a/iothread.c > +++ b/iothread.c > @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass; > static void *iothread_run(void *opaque) > { > IOThread *iothread = opaque; > + bool blocking; > > qemu_mutex_lock(&iothread->init_done_lock); > iothread->thread_id = qemu_get_thread_id(); > @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque) > > while (!iothread->stopping) { > aio_context_acquire(iothread->ctx); > - while (!iothread->stopping && aio_poll(iothread->ctx, true)) { > + blocking = true; > + while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) { > /* Progress was made, keep going */ > + blocking = false; > } > aio_context_release(iothread->ctx); > } > > Christian, can you test it? > > Paolo >