From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FAAB598.9000000@siemens.com>
Date: Wed, 09 May 2012 15:21:12 -0300
From: Jan Kiszka
MIME-Version: 1.0
References: <4FA97596.4000807@siemens.com> <4FAA42EB.2080407@redhat.com> <4FAA5721.9060201@siemens.com> <4FAAA6AA.2040400@codemonkey.ws> <4FAAA893.9050506@siemens.com> <4FAAA9F7.7020702@us.ibm.com>
In-Reply-To: <4FAAA9F7.7020702@us.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] coroutine-ucontext broken for x86-32
To: Anthony Liguori
Cc: Kevin Wolf, Peter Maydell, Michael Tokarev, qemu-devel, Anthony Liguori

On 2012-05-09 14:31, Anthony Liguori wrote:
> On 05/09/2012 12:25 PM, Jan Kiszka wrote:
>> On 2012-05-09 14:17, Anthony Liguori wrote:
>>> On 05/09/2012 06:38 AM, Jan Kiszka wrote:
>>>> On 2012-05-09 08:15, Peter Maydell wrote:
>>>>> On 9 May 2012 11:11, Kevin Wolf wrote:
>>>>>> On 08.05.2012 21:35, Jan Kiszka wrote:
>>>>>>> I hunted down a fairly subtle corruption of the VCPU thread signal mask
>>>>>>> in KVM mode when using the ucontext version of coroutines:
>>>>>>>
>>>>>>> coroutine_new calls getcontext, makecontext, swapcontext. Those
>>>>>>> functions get/set also the signal mask of the caller. Unfortunately,
>>>>>>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask
>>>>>>> version. So they do not properly save/restore the blocked RT signals,
>>>>>>> namely our SIG_IPI - it becomes unblocked this way.
>>>>>>
>>>>>> If other coroutine backends work (sigaltstack?), we could try to detect
>>>>>> the situation in configure and set the right default. Not sure what the
>>>>>> condition is, glibc + i386?
>>>>>
>>>>> I don't think you can do a compile-time test for this short of
>>>>> just disabling use of the ucontext code on all i386/Linux platforms.
>>>>>
>>>>> I think it's becoming increasingly obvious that the setcontext/getcontext
>>>>> code path is not very well used and prone to nasty libc bugs. Trying
>>>>> to implement coroutines in C is just a really bad idea and I think
>>>>> we should be trying to reduce our use of them if we possibly can,
>>>>> presumably by switching to actually using threads where we really
>>>>> need the parallelism.
>>>>
>>>> I tend to agree.
>>>>
>>>> FWIW, sigaltstack works around the issue here, but I'm still looking a
>>>> bit skeptical at its implementation.
>>>
>>> Is there any downside to using SIGUSR1?
>>
>> You mean for SIG_IPI? I don't think so. But the point is that the, well,
>> limitation of ucontext will continue to break RT signals,
>
> Yes, but we currently don't use RT signals, right? So we could switch to
> SIGUSR1, fix the problem in glibc, and call it a day, no?

That cures the current symptom but does not prevent future diseases
around RT signals. I would prefer to disable ucontext usage on those
platforms we identified as broken.

BTW, I'm starting to believe it's not a glibc issue but rather a Linux
kernel issue, only biting us on 32/64. sigprocmask should only
manipulate those signals that its mask can address. Digging deeper...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
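
For reference, the mask round trip described in the quoted report can be
probed with a small standalone test. This is only a sketch under
assumptions, not QEMU code: TEST_SIG is a hypothetical stand-in for
SIG_IPI (any real-time signal should do), and the function and variable
names are made up for illustration. It blocks the signal, runs one
getcontext/makecontext/swapcontext round trip as coroutine_new would, and
then checks whether the signal is still blocked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <ucontext.h>

/* Hypothetical stand-in for QEMU's SIG_IPI; any real-time signal works. */
#define TEST_SIG SIGRTMIN

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];
static volatile int co_ran;

/* Coroutine body: returning from it resumes uc_link (main_ctx). */
static void co_entry(void)
{
    co_ran = 1;
}

/*
 * Returns 1 if TEST_SIG is still blocked after the context round trip,
 * 0 if it became unblocked, and -1 if one of the calls failed.
 */
int rt_signal_still_blocked_after_swap(void)
{
    sigset_t set, now;

    /* Block the real-time signal before touching any context. */
    sigemptyset(&set);
    sigaddset(&set, TEST_SIG);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0) {
        return -1;
    }

    /* Same call sequence the report attributes to coroutine_new. */
    if (getcontext(&co_ctx) != 0) {
        return -1;
    }
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_entry, 0);

    co_ran = 0;
    if (swapcontext(&main_ctx, &co_ctx) != 0) {
        return -1;
    }
    assert(co_ran); /* we only get back here after co_entry returned */

    /* Query the current mask without changing it. */
    if (sigprocmask(SIG_BLOCK, NULL, &now) != 0) {
        return -1;
    }
    return sigismember(&now, TEST_SIG);
}
```

On an unaffected setup one would expect the function to return 1; a
return value of 0 would match the symptom reported above. A
multi-threaded program such as QEMU would use pthread_sigmask instead of
sigprocmask for the same check in the VCPU thread.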