Message-ID: <4FAA5119.4070109@siemens.com>
Date: Wed, 09 May 2012 08:12:25 -0300
From: Jan Kiszka
MIME-Version: 1.0
References: <4FA97596.4000807@siemens.com> <4FAA1D75.6080108@msgid.tls.msk.ru>
In-Reply-To: <4FAA1D75.6080108@msgid.tls.msk.ru>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] coroutine-ucontext broken for x86-32
To: Michael Tokarev
Cc: Kevin Wolf, Anthony Liguori, qemu-devel

On 2012-05-09 04:32, Michael Tokarev wrote:
> On 08.05.2012 23:35, Jan Kiszka wrote:
>> Hi,
>>
>> I hunted down a fairly subtle corruption of the VCPU thread signal mask
>> in KVM mode when using the ucontext version of coroutines:
>>
>> coroutine_new calls getcontext, makecontext, swapcontext. Those
>> functions also get/set the signal mask of the caller. Unfortunately,
>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask
>> version. So they do not properly save/restore the blocked RT signals,
>> namely our SIG_IPI - it becomes unblocked this way. And this will sooner
>> or later make the kernel actually deliver a SIG_IPI to our
>> dummy_handler, and we miss a wakeup, which means losing control over
>> the VCPU thread - qemu hangs.
>>
>> I was able to reproduce the issue very reliably with virtio-block
>> enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP
>> guest.
>
> Jan, I tried to hunt down (well, FSVO anyway, since I don't understand
> qemu code as a whole still) this very issue since some 0.15 (IIRC -
> when coroutines were introduced) version. The symptom I faced was
> 32bit kvm process lockup when rebooting windows guest. The cause
> was lost/ignored interrupts, and for me it was possible to just
> suspend/resume (SIGSTOP/SIGCONT) the kvm process or to attach a
> debugger or strace to it. It looked like a corruption somewhere,
> and while bisecting I was finding "unrelated" commits -- like,
> eg, "switch qcow2 to coroutines" (I was using -snapshot, so qcow2
> was actually in use, but the commit itself was innocent). There
> are several discussions in archives, a Debian bug report about it and
> several IRC discussions, all with no outcome. So at least now I
> can say that it is not only me who sees the issue, so it passes a
> reality check somehow... ;)
>
> But the thing is: generally, almost no one cares about 32/64bit
> "mixed" environment anymore. I had a few users in Debian who
> complained, and it has always been the same scenario: an old 32bit
> install moved to new hardware, next due to large amount of
> memory, switch to 64bit kernel, and the result is "something
> not working". My suggestion to them has always been "reinstall".
> I use such a mixed environment myself on my development box
> (and actually even on production machines @office), so I'm
> one of the first to face issues in this area, and it sometimes
> does not let me do other things -- eg, I can't debug some
> other bug because qemu locks up due to this 32/64 thing. I
> learned to use a 64bit chroot for these things after all.
>
> So I'm not sure if there's enough interest to hunt this.
> It must be something very simple, and it might pop up somewhere
> else, but so far it - seemingly - only affects 32/64bit mixed
> environment.

This issue also affects 32/32 installations.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux