From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:48236) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4YwA-00006E-Sp for qemu-devel@nongnu.org; Mon, 05 Mar 2012 09:30:59 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4Yvm-0006Su-4W for qemu-devel@nongnu.org; Mon, 05 Mar 2012 09:30:54 -0500
Received: from mx1.redhat.com ([209.132.183.28]:45704) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4Yvl-0006SO-S6 for qemu-devel@nongnu.org; Mon, 05 Mar 2012 09:30:30 -0500
Message-ID: <4F54CDFE.3030309@redhat.com>
Date: Mon, 05 Mar 2012 15:30:22 +0100
From: Paolo Bonzini
MIME-Version: 1.0
References: <1330936455-23802-1-git-send-email-pbonzini@redhat.com> <4F548263.1070905@siemens.com> <4F54CC8A.5010509@redhat.com>
In-Reply-To: <4F54CC8A.5010509@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
To: Avi Kivity
Cc: anthony@codemonkey.ws, Jan Kiszka , qemu-devel@nongnu.org, kvm@vger.kernel.org, laurent@vivier.eu

On 05/03/2012 15:24, Avi Kivity wrote:
> On 03/05/2012 11:07 AM, Jan Kiszka wrote:
>> On 2012-03-05 09:34, Paolo Bonzini wrote:
>>> This is quite ugly. Two threads, one running main_loop_wait and
>>> one running qemu_aio_wait, can race with each other on running the
>>> same iohandler. The result is that an iohandler could run while the
>>> underlying socket is not readable or writable, with possibly ill effects.
>>
>> Hmm, isn't it a problem already that a socket is polled by two threads
>> at the same time? Can't that be avoided?
>
> Could it be done simply by adding a mutex there? It's hardly a clean
> fix, but it's not a clean problem.

Hmm, I don't think so.
It would need to protect execution of the iohandlers too, and pretty
much everything can happen there including a nested loop. Of course
recursive mutexes exist, but it sounds like too big an axe.

I could add a generation count updated by qemu_aio_wait(), and rerun the
select() only if the generation count changes during its execution.

Or we can call it an NBD bug. I'm not against that, but it seemed to me
that the problem is more general.

Paolo
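
The generation-count idea above could look roughly like the following.
This is only a single-threaded sketch under stated assumptions, not the
QEMU code: aio_generation, poll_readable(), sketch_aio_wait() and
sketch_main_loop_iteration() are made-up names, and the `between`
callback merely stands in for another thread running qemu_aio_wait()
while the main loop's select() is in flight. Real code would also need
the counter to be read and updated safely across threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter, bumped each time the AIO side has dispatched. */
static unsigned aio_generation;

/* Non-blocking select() on a single fd; true if it is readable now. */
static bool poll_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(fd, &rfds);
}

/* Stand-in for qemu_aio_wait(): consume pending data, bump the count. */
static void sketch_aio_wait(int fd)
{
    char buf[64];

    if (poll_readable(fd)) {
        ssize_t n = read(fd, buf, sizeof(buf));
        (void)n; /* the sketch does not care how much was consumed */
    }
    aio_generation++;
}

/*
 * Stand-in for one main_loop_wait() iteration. Returns true if the read
 * handler for fd would be dispatched. `between` (may be NULL) models
 * work done by the other poller while our select() result is pending.
 */
static bool sketch_main_loop_iteration(int fd, void (*between)(int))
{
    unsigned gen = aio_generation;
    bool readable = poll_readable(fd);

    if (between) {
        between(fd);
    }
    if (gen != aio_generation) {
        /* The fd sets may be stale: rerun select() before dispatching. */
        readable = poll_readable(fd);
    }
    return readable;
}
```

With a pipe holding one byte, an undisturbed iteration reports the fd as
readable, while an iteration during which sketch_aio_wait() drains the
pipe notices the changed count, polls again, and skips the handler.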