From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:43527)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1SRqCX-0004jA-H8 for qemu-devel@nongnu.org;
    Tue, 08 May 2012 15:36:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1SRqCU-0000jg-EJ for qemu-devel@nongnu.org;
    Tue, 08 May 2012 15:36:01 -0400
Received: from goliath.siemens.de ([192.35.17.28]:18410)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1SRqCU-0000ie-3g for qemu-devel@nongnu.org;
    Tue, 08 May 2012 15:35:58 -0400
Message-ID: <4FA97596.4000807@siemens.com>
Date: Tue, 08 May 2012 16:35:50 -0300
From: Jan Kiszka
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] coroutine-ucontext broken for x86-32
To: qemu-devel
Cc: Kevin Wolf , Anthony Liguori , Michael Tokarev

Hi,

I hunted down a fairly subtle corruption of the VCPU thread signal mask
in KVM mode when using the ucontext version of coroutines:

coroutine_new calls getcontext, makecontext, swapcontext. Those
functions also get/set the signal mask of the caller. Unfortunately,
they only use the sigprocmask syscall on i386, not the rt_sigprocmask
version. So they do not properly save/restore the blocked RT signals,
namely our SIG_IPI - it becomes unblocked this way. And this will sooner
or later make the kernel actually deliver a SIG_IPI to our
dummy_handler, and we miss a wakeup, which means losing control over
the VCPU thread - qemu hangs.

I was able to reproduce the issue very reliably with virtio-block
enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP
guest.

Simple workaround:

diff --git a/main-loop.h b/main-loop.h
index c06b8bc..dce1cd9 100644
--- a/main-loop.h
+++ b/main-loop.h
@@ -25,11 +25,7 @@
 #ifndef QEMU_MAIN_LOOP_H
 #define QEMU_MAIN_LOOP_H 1
 
-#ifdef SIGRTMIN
-#define SIG_IPI (SIGRTMIN+4)
-#else
 #define SIG_IPI SIGUSR1
-#endif
 
 /**
  * qemu_init_main_loop: Set up the process so that it can run the main loop.

But maybe someone has a better idea, i.e. something that addresses the
issue at the root. Otherwise we would have to erect large warning signs:
"Do not use RT signals! Coroutines will break them for you."

Michael, maybe this also relates to the issue you saw. I'm not able to
reproduce any VAPIC problems after making Windows bootable by switching
to SIGUSR1.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux