From: Paolo Bonzini
Date: Tue, 08 Jul 2014 21:50:29 +0200
Message-ID: <53BC4B85.8040107@redhat.com>
In-Reply-To: <53BC4181.6070604@de.ibm.com>
Subject: Re: [Qemu-devel] another locking issue in current dataplane code?
To: Christian Borntraeger, Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei@canonical.com, "qemu-devel@nongnu.org", Dominik Dingel

On 08/07/2014 21:07, Christian Borntraeger wrote:
> On 08/07/14 19:08, Paolo Bonzini wrote:
>> On 08/07/2014 17:59, Stefan Hajnoczi wrote:
>>> I sent Christian an initial patch to fix this but now both threads are
>>> stuck in rfifolock_lock() inside cond wait. That's very strange and
>>> should never happen.
>>
>> I had this patch pending for 2.2:
>>
>> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
>> Author: Paolo Bonzini
>> Date: Mon Jul 7 10:39:49 2014 +0200
>>
>> iothread: do not rely on aio_poll(ctx, true) result to end a loop
>>
>> Currently, whenever aio_poll(ctx, true) has completed all pending
>> work it returns true *and* the next call to aio_poll(ctx, true)
>> will not block.
>>
>> This invariant has its roots in qemu_aio_flush()'s implementation
>> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
>> not exist anymore and bdrv_drain_all() is implemented differently;
>> and this invariant is complicated to maintain and subtly different
>> from the return value of GMainLoop's g_main_context_iteration.
>>
>> All calls to aio_poll(ctx, true) except one are guarded by a
>> while() loop checking for a request to be incomplete, or a
>> BlockDriverState to be idle. Modify that one exception in
>> iothread.c.
>>
>> Signed-off-by: Paolo Bonzini
>
> The hangs are gone. Looks like 2.1 material now...
>
> Acked-by: Christian Borntraeger
> Tested-by: Christian Borntraeger

Great, I'll send it out tomorrow morning.

Paolo
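
For readers of the archive, here is a toy sketch of the two loop shapes the commit message contrasts. It is not the patch and does not use QEMU's API: ToyContext, toy_poll(), drain_by_return_value() and drain_by_condition() are hypothetical names invented for illustration. The first loop stops only when the poll reports no progress, so it always makes one extra call after the work is done; with a blocking poll that extra call is safe only while the "next call will not block" invariant holds. The second loop checks an explicit condition and never makes that call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an event-loop context; not QEMU's AioContext. */
typedef struct {
    int pending; /* requests still in flight */
    int polls;   /* number of toy_poll() calls made so far */
} ToyContext;

/* Hypothetical poll: completes at most one request per call and reports
 * whether it made progress. A real blocking poll could sleep when idle;
 * this toy version just returns false. */
static bool toy_poll(ToyContext *ctx)
{
    assert(ctx->pending >= 0);
    ctx->polls++;
    if (ctx->pending > 0) {
        ctx->pending--;
        return true;
    }
    return false;
}

/* Fragile shape: the loop ends only when the poll reports no progress,
 * so one call is always made after all the work has completed. */
static int drain_by_return_value(ToyContext *ctx)
{
    while (toy_poll(ctx)) {
        /* progress was made, keep going */
    }
    return ctx->polls;
}

/* Guarded shape: the loop is driven by an explicit condition, so the
 * poll's return value is not needed to terminate it. */
static int drain_by_condition(ToyContext *ctx)
{
    while (ctx->pending > 0) {
        toy_poll(ctx);
    }
    return ctx->polls;
}
```

With three pending requests, the first loop calls toy_poll() four times and the guarded loop calls it three times; the fourth call is the one that depends on the invariant.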