From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:35755)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1S4TNE-00049Q-0E for qemu-devel@nongnu.org;
	Mon, 05 Mar 2012 03:34:33 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1S4TN9-0005Ya-4v for qemu-devel@nongnu.org;
	Mon, 05 Mar 2012 03:34:27 -0500
Received: from mail-ee0-f45.google.com ([74.125.83.45]:44647)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1S4TN8-0005YL-Sn for qemu-devel@nongnu.org;
	Mon, 05 Mar 2012 03:34:23 -0500
Received: by eeit10 with SMTP id t10so1625802eei.4 for ;
	Mon, 05 Mar 2012 00:34:20 -0800 (PST)
Sender: Paolo Bonzini
From: Paolo Bonzini
Date: Mon, 5 Mar 2012 09:34:15 +0100
Message-Id: <1330936455-23802-1-git-send-email-pbonzini@redhat.com>
Subject: [Qemu-devel] [RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: anthony@codemonkey.ws, laurent@vivier.eu, kvm@vger.kernel.org

This is quite ugly.  Two threads, one running main_loop_wait and
one running qemu_aio_wait, can race with each other on running the
same iohandler.  The result is that an iohandler could run while the
underlying socket is not readable or writable, possibly with ill
effects.

This shows up as a failure to boot an IDE disk using the NBD device.
We can consider it a bug in NBD or in the main loop.  The patch fixes
this in main_loop_wait, which is always going to lose the race because
qemu_aio_wait runs select with the global lock held.

Reported-by: Laurent Vivier
Signed-off-by: Paolo Bonzini
---
	Anthony, if you think this is too ugly, tell me and I can
	post an NBD fix too.

 main-loop.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index db23de0..3beccff 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)
 
     if (timeout > 0) {
         qemu_mutex_lock_iothread();
+
+        /* Poll again.  A qemu_aio_wait() on another thread
+         * could have made the fdsets stale.
+         */
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     }
 
     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
-- 
1.7.7.6
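
For readers who want to see the stale-fdset effect in isolation, here is
a minimal standalone sketch.  It is not QEMU code: the helper names
fd_readable_now() and stale_fdset_demo() are made up for illustration, a
pipe stands in for the NBD socket, and a plain read() stands in for the
iohandler that the other thread (the one in qemu_aio_wait) would run
between our select() and our dispatch.  It only assumes POSIX pipe(2),
select(2), read(2) and write(2).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: poll fd with a zero timeout, as the patch does.
 * Returns 1 if readable right now, 0 if not, -1 on select() error. */
static int fd_readable_now(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: poll, never block */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0) {
        return -1;
    }
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical demo.  Returns 1 if the first select() result went stale
 * (fd reported readable, then drained, then no longer readable),
 * 0 if it did not, -1 on setup error. */
static int stale_fdset_demo(void)
{
    int p[2];
    char c = 'x';
    int first, second;

    if (pipe(p) < 0) {
        return -1;
    }
    if (write(p[1], &c, 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* First poll: one byte is pending, so the fd is readable. */
    first = fd_readable_now(p[0]);

    /* Stand-in for the other thread running the handler and
     * consuming the data before we get to dispatch. */
    if (read(p[0], &c, 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Second poll: the earlier "readable" answer is now stale. */
    second = fd_readable_now(p[0]);

    close(p[0]);
    close(p[1]);
    return (first == 1 && second == 0) ? 1 : 0;
}
```

Calling stale_fdset_demo() from a small main() should show the first
answer going stale once the data has been consumed, which is why the
patch repeats select() with a zero timeout after taking the iothread
lock instead of trusting the fdsets filled in before it.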