From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53441) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X4mjT-0006P2-8R for qemu-devel@nongnu.org; Wed, 09 Jul 2014 03:56:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X4mjR-000612-GT for qemu-devel@nongnu.org; Wed, 09 Jul 2014 03:56:03 -0400 Received: from mail-ob0-x22a.google.com ([2607:f8b0:4003:c01::22a]:54232) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X4mjR-00060l-Bb for qemu-devel@nongnu.org; Wed, 09 Jul 2014 03:56:01 -0400 Received: by mail-ob0-f170.google.com with SMTP id uz6so7637478obc.15 for ; Wed, 09 Jul 2014 00:56:00 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <53BC4181.6070604@de.ibm.com> References: <53BA8B49.9050709@de.ibm.com> <20140708155956.GB11505@stefanha-thinkpad.redhat.com> <53BC2579.9060200@redhat.com> <53BC4181.6070604@de.ibm.com> Date: Wed, 9 Jul 2014 09:56:00 +0200 Message-ID: From: Stefan Hajnoczi Content-Type: text/plain; charset=UTF-8 Subject: Re: [Qemu-devel] another locking issue in current dataplane code? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Christian Borntraeger Cc: Kevin Wolf , Ming Lei , "qemu-devel@nongnu.org" , Dominik Dingel , Stefan Hajnoczi , Cornelia Huck , Paolo Bonzini On Tue, Jul 8, 2014 at 9:07 PM, Christian Borntraeger wrote: > On 08/07/14 19:08, Paolo Bonzini wrote: >> Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto: >>> I sent Christian an initial patch to fix this but now both threads are >>> stuck in rfifolock_lock() inside cond wait. That's very strange and >>> should never happen. >> >> I had this patch pending for 2.2: >> >> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f >> Author: Paolo Bonzini >> Date: Mon Jul 7 10:39:49 2014 +0200 >> >> iothread: do not rely on aio_poll(ctx, true) result to end a loop >> >> Currently, whenever aio_poll(ctx, true) has completed all pending >> work it returns true *and* the next call to aio_poll(ctx, true) >> will not block. >> >> This invariant has its roots in qemu_aio_flush()'s implementation >> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does >> not exist anymore and bdrv_drain_all() is implemented differently; >> and this invariant is complicated to maintain and subtly different >> from the return value of GMainLoop's g_main_context_iteration. >> >> All calls to aio_poll(ctx, true) except one are guarded by a >> while() loop checking for a request to be incomplete, or a >> BlockDriverState to be idle. Modify that one exception in >> iothread.c. >> >> Signed-off-by: Paolo Bonzini > > The hangs are gone. Looks like 2.1 material now... > > Acked-by: Christian Borntraeger > Tested-by: Christian Borntraeger > > > > >> >> diff --git a/iothread.c b/iothread.c >> index 1fbf9f1..d9403cf 100644 >> --- a/iothread.c >> +++ b/iothread.c >> @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass; >> static void *iothread_run(void *opaque) >> { >> IOThread *iothread = opaque; >> + bool blocking; >> >> qemu_mutex_lock(&iothread->init_done_lock); >> iothread->thread_id = qemu_get_thread_id(); >> @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque) >> >> while (!iothread->stopping) { >> aio_context_acquire(iothread->ctx); >> - while (!iothread->stopping && aio_poll(iothread->ctx, true)) { >> + blocking = true; >> + while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) { >> /* Progress was made, keep going */ >> + blocking = false; >> } >> aio_context_release(iothread->ctx); >> } >> >> Christian, can you test it? Could affect performance because of the extra poll/release/acquire but a clean solution for broken iothread_run(). Reviewed-by: Stefan Hajnoczi Good for 2.1