From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:36008)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1S4Zco-0003CL-9h
	for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:15:02 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1S4Zch-0000L8-Vd
	for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:14:57 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33415)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1S4Zch-0000Kx-Nt
	for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:14:51 -0500
Message-ID: <4F54D866.30402@redhat.com>
Date: Mon, 05 Mar 2012 17:14:46 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
References: <1330936455-23802-1-git-send-email-pbonzini@redhat.com>
	<4F548263.1070905@siemens.com> <4F54CC8A.5010509@redhat.com>
	<4F54CDFE.3030309@redhat.com>
In-Reply-To: <4F54CDFE.3030309@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH] fix select(2) race between
	main_loop_wait and qemu_aio_wait
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: anthony@codemonkey.ws, Jan Kiszka <jan.kiszka@siemens.com>, qemu-devel@nongnu.org, kvm@vger.kernel.org, laurent@vivier.eu

On 03/05/2012 04:30 PM, Paolo Bonzini wrote:
> Il 05/03/2012 15:24, Avi Kivity ha scritto:
> > On 03/05/2012 11:07 AM, Jan Kiszka wrote:
> >> On 2012-03-05 09:34, Paolo Bonzini wrote:
> >>> This is quite ugly.  Two threads, one running main_loop_wait and
> >>> one running qemu_aio_wait, can race with each other on running the
> >>> same iohandler.  The result is that an iohandler could run while the
> >>> underlying socket is not readable or writable, with possibly ill effects.
> >>
> >> Hmm, isn't it a problem already that a socket is polled by two threads
> >> at the same time? Can't that be avoided?
> > 
> > Could it be done simply by adding a mutex there?  It's hardly a clean
> > fix, but it's not a clean problem.
>
> Hmm, I don't think so.  It would need to protect execution of the
> iohandlers too, and pretty much everything can happen there including a
> nested loop.  Of course recursive mutexes exist, but it sounds like too
> big an axe.

The I/O handlers would still use the qemu mutex, no?  we'd just protect
the select() (taking the mutex from before releasing the global lock,
and reacquiring it afterwards).

> I could add a generation count updated by qemu_aio_wait(), and rerun the
> select() only if the generation count changes during its execution.
>
> Or we can call it an NBD bug.  I'm not against that, but it seemed to me
> that the problem is more general.

What about making sure all callers of qemu_aio_wait() run from
coroutines (or threads)?  Then they just ask the main thread to wake
them up, instead of dispatching completions themselves.

-- 
error compiling committee.c: too many arguments to function