From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:36008) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4Zco-0003CL-9h for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:15:02 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4Zch-0000L8-Vd for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:14:57 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33415) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4Zch-0000Kx-Nt for qemu-devel@nongnu.org; Mon, 05 Mar 2012 10:14:51 -0500 Message-ID: <4F54D866.30402@redhat.com> Date: Mon, 05 Mar 2012 17:14:46 +0200 From: Avi Kivity MIME-Version: 1.0 References: <1330936455-23802-1-git-send-email-pbonzini@redhat.com> <4F548263.1070905@siemens.com> <4F54CC8A.5010509@redhat.com> <4F54CDFE.3030309@redhat.com> In-Reply-To: <4F54CDFE.3030309@redhat.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: anthony@codemonkey.ws, Jan Kiszka , qemu-devel@nongnu.org, kvm@vger.kernel.org, laurent@vivier.eu On 03/05/2012 04:30 PM, Paolo Bonzini wrote: > Il 05/03/2012 15:24, Avi Kivity ha scritto: > > On 03/05/2012 11:07 AM, Jan Kiszka wrote: > >> On 2012-03-05 09:34, Paolo Bonzini wrote: > >>> This is quite ugly. Two threads, one running main_loop_wait and > >>> one running qemu_aio_wait, can race with each other on running the > >>> same iohandler. The result is that an iohandler could run while the > >>> underlying socket is not readable or writable, with possibly ill effects. > >> > >> Hmm, isn't it a problem already that a socket is polled by two threads > >> at the same time? Can't that be avoided? > > > > Could it be done simply by adding a mutex there? It's hardly a clean > > fix, but it's not a clean problem. > > Hmm, I don't think so. It would need to protect execution of the > iohandlers too, and pretty much everything can happen there including a > nested loop. Of course recursive mutexes exist, but it sounds like too > big an axe. The I/O handlers would still use the qemu mutex, no? we'd just protect the select() (taking the mutex from before releasing the global lock, and reacquiring it afterwards). > I could add a generation count updated by qemu_aio_wait(), and rerun the > select() only if the generation count changes during its execution. > > Or we can call it an NBD bug. I'm not against that, but it seemed to me > that the problem is more general. What about making sure all callers of qemu_aio_wait() run from coroutines (or threads)? Then they just ask the main thread to wake them up, instead of dispatching completions themselves. -- error compiling committee.c: too many arguments to function