* [Qemu-devel] coroutine-ucontext broken for x86-32 @ 2012-05-08 19:35 Jan Kiszka 2012-05-09 7:32 ` Michael Tokarev ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Jan Kiszka @ 2012-05-08 19:35 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Michael Tokarev Hi, I hunted down a fairly subtle corruption of the VCPU thread signal mask in KVM mode when using the ucontext version of coroutines: coroutine_new calls getcontext, makecontext, swapcontext. Those functions get/set also the signal mask of the caller. Unfortunately, they only use the sigprocmask syscall on i386, not the rt_sigprocmask version. So they do not properly save/restore the blocked RT signals, namely our SIG_IPI - it becomes unblocke this way. And this will sooner or later make the kernel actually deliver a SIG_IPI to our dummy_handler, and we miss a wakeup, which means losing control over VCPU thread - qemu hangs. I was able to reproduce the issue very reliably with virtio-block enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP guest. Simple workaround: diff --git a/main-loop.h b/main-loop.h index c06b8bc..dce1cd9 100644 --- a/main-loop.h +++ b/main-loop.h @@ -25,11 +25,7 @@ #ifndef QEMU_MAIN_LOOP_H #define QEMU_MAIN_LOOP_H 1 -#ifdef SIGRTMIN -#define SIG_IPI (SIGRTMIN+4) -#else #define SIG_IPI SIGUSR1 -#endif /** * qemu_init_main_loop: Set up the process so that it can run the main loop. But maybe someone has a better idea, ie. something that addresses the issue at the root. Otherwise we would have to erect large warning signs: "Do not use RT signals! Coroutines will break them for you." Michael, maybe this also relates to the issue you saw. I'm not able to reproduce any VAPIC problems after make Windows bootable by switching to SIGUSR1. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-08 19:35 [Qemu-devel] coroutine-ucontext broken for x86-32 Jan Kiszka @ 2012-05-09 7:32 ` Michael Tokarev 2012-05-09 11:12 ` Jan Kiszka 2012-05-09 10:11 ` Kevin Wolf 2012-05-09 18:05 ` Michael Tokarev 2 siblings, 1 reply; 13+ messages in thread From: Michael Tokarev @ 2012-05-09 7:32 UTC (permalink / raw) To: Jan Kiszka; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel On 08.05.2012 23:35, Jan Kiszka wrote: > Hi, > > I hunted down a fairly subtle corruption of the VCPU thread signal mask > in KVM mode when using the ucontext version of coroutines: > > coroutine_new calls getcontext, makecontext, swapcontext. Those > functions get/set also the signal mask of the caller. Unfortunately, > they only use the sigprocmask syscall on i386, not the rt_sigprocmask > version. So they do not properly save/restore the blocked RT signals, > namely our SIG_IPI - it becomes unblocke this way. And this will sooner > or later make the kernel actually deliver a SIG_IPI to our > dummy_handler, and we miss a wakeup, which means losing control over > VCPU thread - qemu hangs. > > I was able to reproduce the issue very reliably with virtio-block > enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP > guest. Jan, I tried to hunt down (well, FSVO anyway, since I don't understand qemu code as a whole still) this very issue since some 0.15 (IIRC - when coroutines were introduced) version. The sympthom I faced was 32bit kvm process lockup when rebooting windows guest. The cause was lost/ignored interrupts, and for me it was possible to just suspend/resume (SIGSTOP/SIGCONT) the kvm process or to attach a debugger or strace to it. It looked like a corruption somewhere, and while bisecting I were finding "unrelated" commits -- like, eg, "switch qcow2 to coroutines" (I was using -snapshot, so qcow2 was actually in use, but the commit itself were innocent). There are several discussions in archives, debian bugreport about it and several IRC discussions, all with no outcome. So at least now I can say that it is not only me who see the issue, so it passes a reality check somehow... ;) But the thing is: generally, almost no one cares about 32/64bit "mixed" environment anymore. I had a few users in Debian who complained, and it has always been the same scenario: an old 32bit install moved to a new hardware, next due to large amount of memory, switch to 64bit kernel, and the result is "something not working". My suggestion to them has always been "reinstall". I use such a mixed environment myself on my development box (and actually even on production machines @office), so I'm one of the first to face issues in this area, and it sometimes does not let me to do other things -- eg, I can't debug some other bug because qemu locks up due to this 32/64 thing. I learned to use a 64bit chroot for this things after all. So I'm not sure if there's enough interest to hunt this. It must be something very simple, and it might pop up somewhere else, but so far it - seemingly - only affects 32/64bit mixed environment. > Simple workaround: > > diff --git a/main-loop.h b/main-loop.h > index c06b8bc..dce1cd9 100644 > --- a/main-loop.h > +++ b/main-loop.h > @@ -25,11 +25,7 @@ > #ifndef QEMU_MAIN_LOOP_H > #define QEMU_MAIN_LOOP_H 1 > > -#ifdef SIGRTMIN > -#define SIG_IPI (SIGRTMIN+4) > -#else > #define SIG_IPI SIGUSR1 > -#endif > > /** > * qemu_init_main_loop: Set up the process so that it can run the main loop. > > > But maybe someone has a better idea, ie. something that addresses the > issue at the root. Otherwise we would have to erect large warning signs: > "Do not use RT signals! Coroutines will break them for you." > > Michael, maybe this also relates to the issue you saw. I'm not able to > reproduce any VAPIC problems after make Windows bootable by switching > to SIGUSR1. I'll try to verify it later today, I've unrelated urgent family issues right now... Thank you! /mjt ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 7:32 ` Michael Tokarev @ 2012-05-09 11:12 ` Jan Kiszka 0 siblings, 0 replies; 13+ messages in thread From: Jan Kiszka @ 2012-05-09 11:12 UTC (permalink / raw) To: Michael Tokarev; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel On 2012-05-09 04:32, Michael Tokarev wrote: > On 08.05.2012 23:35, Jan Kiszka wrote: >> Hi, >> >> I hunted down a fairly subtle corruption of the VCPU thread signal mask >> in KVM mode when using the ucontext version of coroutines: >> >> coroutine_new calls getcontext, makecontext, swapcontext. Those >> functions get/set also the signal mask of the caller. Unfortunately, >> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >> version. So they do not properly save/restore the blocked RT signals, >> namely our SIG_IPI - it becomes unblocke this way. And this will sooner >> or later make the kernel actually deliver a SIG_IPI to our >> dummy_handler, and we miss a wakeup, which means losing control over >> VCPU thread - qemu hangs. >> >> I was able to reproduce the issue very reliably with virtio-block >> enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP >> guest. > > Jan, I tried to hunt down (well, FSVO anyway, since I don't understand > qemu code as a whole still) this very issue since some 0.15 (IIRC - > when coroutines were introduced) version. The sympthom I faced was > 32bit kvm process lockup when rebooting windows guest. The cause > was lost/ignored interrupts, and for me it was possible to just > suspend/resume (SIGSTOP/SIGCONT) the kvm process or to attach a > debugger or strace to it. It looked like a corruption somewhere, > and while bisecting I were finding "unrelated" commits -- like, > eg, "switch qcow2 to coroutines" (I was using -snapshot, so qcow2 > was actually in use, but the commit itself were innocent). There > are several discussions in archives, debian bugreport about it and > several IRC discussions, all with no outcome. So at least now I > can say that it is not only me who see the issue, so it passes a > reality check somehow... ;) > > But the thing is: generally, almost no one cares about 32/64bit > "mixed" environment anymore. I had a few users in Debian who > complained, and it has always been the same scenario: an old 32bit > install moved to a new hardware, next due to large amount of > memory, switch to 64bit kernel, and the result is "something > not working". My suggestion to them has always been "reinstall". > I use such a mixed environment myself on my development box > (and actually even on production machines @office), so I'm > one of the first to face issues in this area, and it sometimes > does not let me to do other things -- eg, I can't debug some > other bug because qemu locks up due to this 32/64 thing. I > learned to use a 64bit chroot for this things after all. > > So I'm not sure if there's enough interest to hunt this. It > must be something very simple, and it might pop up somewhere > else, but so far it - seemingly - only affects 32/64bit mixed > environment. This issue also affects 32/32 installations. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-08 19:35 [Qemu-devel] coroutine-ucontext broken for x86-32 Jan Kiszka 2012-05-09 7:32 ` Michael Tokarev @ 2012-05-09 10:11 ` Kevin Wolf 2012-05-09 10:55 ` Jan Kiszka 2012-05-09 11:15 ` Peter Maydell 2012-05-09 18:05 ` Michael Tokarev 2 siblings, 2 replies; 13+ messages in thread From: Kevin Wolf @ 2012-05-09 10:11 UTC (permalink / raw) To: Jan Kiszka; +Cc: Anthony Liguori, Michael Tokarev, qemu-devel Am 08.05.2012 21:35, schrieb Jan Kiszka: > Hi, > > I hunted down a fairly subtle corruption of the VCPU thread signal mask > in KVM mode when using the ucontext version of coroutines: > > coroutine_new calls getcontext, makecontext, swapcontext. Those > functions get/set also the signal mask of the caller. Unfortunately, > they only use the sigprocmask syscall on i386, not the rt_sigprocmask > version. So they do not properly save/restore the blocked RT signals, > namely our SIG_IPI - it becomes unblocke this way. If other coroutine backends work (sigaltstack?), we could try to detect the situation in configure and set the right default. Not sure what the condition is, glibc + i386? Kevin ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 10:11 ` Kevin Wolf @ 2012-05-09 10:55 ` Jan Kiszka 2012-05-09 11:15 ` Peter Maydell 1 sibling, 0 replies; 13+ messages in thread From: Jan Kiszka @ 2012-05-09 10:55 UTC (permalink / raw) To: Kevin Wolf; +Cc: Anthony Liguori, Michael Tokarev, qemu-devel On 2012-05-09 07:11, Kevin Wolf wrote: > Am 08.05.2012 21:35, schrieb Jan Kiszka: >> Hi, >> >> I hunted down a fairly subtle corruption of the VCPU thread signal mask >> in KVM mode when using the ucontext version of coroutines: >> >> coroutine_new calls getcontext, makecontext, swapcontext. Those >> functions get/set also the signal mask of the caller. Unfortunately, >> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >> version. So they do not properly save/restore the blocked RT signals, >> namely our SIG_IPI - it becomes unblocke this way. > > If other coroutine backends work (sigaltstack?), Had a brief look at signalstack. Is kill(getpid(), SIGUSR2) in coroutine_new correct? This should broadcast to the whole process, but we only block SIGUSR2 in the caller's context. Why not pthread_kill? I will give it a try nevertheless. > we could try to detect > the situation in configure and set the right default. Not sure what the > condition is, glibc + i386? So far it looks like a glibc/i386 thing, but maybe it is driven by the ABI that defines the size of jmpbuf and, thus, the ability to store RT signals as well. Then it is a generic i386 limitation. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 10:11 ` Kevin Wolf 2012-05-09 10:55 ` Jan Kiszka @ 2012-05-09 11:15 ` Peter Maydell 2012-05-09 11:38 ` Jan Kiszka 2012-05-09 14:37 ` Avi Kivity 1 sibling, 2 replies; 13+ messages in thread From: Peter Maydell @ 2012-05-09 11:15 UTC (permalink / raw) To: Kevin Wolf; +Cc: Jan Kiszka, Anthony Liguori, Michael Tokarev, qemu-devel On 9 May 2012 11:11, Kevin Wolf <kwolf@redhat.com> wrote: > Am 08.05.2012 21:35, schrieb Jan Kiszka: >> I hunted down a fairly subtle corruption of the VCPU thread signal mask >> in KVM mode when using the ucontext version of coroutines: >> >> coroutine_new calls getcontext, makecontext, swapcontext. Those >> functions get/set also the signal mask of the caller. Unfortunately, >> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >> version. So they do not properly save/restore the blocked RT signals, >> namely our SIG_IPI - it becomes unblocke this way. > > If other coroutine backends work (sigaltstack?), we could try to detect > the situation in configure and set the right default. Not sure what the > condition is, glibc + i386? I don't think you can do a compile-time test for this short of just disabling use of the ucontext code on all i386/Linux platforms. I think it's becoming increasingly obvious that the setcontext/getcontext code path is not very well used and prone to nasty libc bugs. Trying to implement coroutines in C is just a really bad idea and I think we should be trying to reduce our use of them if we possibly can, presumably by switching to actually using threads where we really need the parallelism. -- PMM ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 11:15 ` Peter Maydell @ 2012-05-09 11:38 ` Jan Kiszka 2012-05-09 17:17 ` Anthony Liguori 2012-05-09 14:37 ` Avi Kivity 1 sibling, 1 reply; 13+ messages in thread From: Jan Kiszka @ 2012-05-09 11:38 UTC (permalink / raw) To: Peter Maydell; +Cc: Kevin Wolf, Anthony Liguori, Michael Tokarev, qemu-devel On 2012-05-09 08:15, Peter Maydell wrote: > On 9 May 2012 11:11, Kevin Wolf <kwolf@redhat.com> wrote: >> Am 08.05.2012 21:35, schrieb Jan Kiszka: >>> I hunted down a fairly subtle corruption of the VCPU thread signal mask >>> in KVM mode when using the ucontext version of coroutines: >>> >>> coroutine_new calls getcontext, makecontext, swapcontext. Those >>> functions get/set also the signal mask of the caller. Unfortunately, >>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >>> version. So they do not properly save/restore the blocked RT signals, >>> namely our SIG_IPI - it becomes unblocke this way. >> >> If other coroutine backends work (sigaltstack?), we could try to detect >> the situation in configure and set the right default. Not sure what the >> condition is, glibc + i386? > > I don't think you can do a compile-time test for this short of > just disabling use of the ucontext code on all i386/Linux platforms. > > I think it's becoming increasingly obvious that the setcontext/getcontext > code path is not very well used and prone to nasty libc bugs. Trying > to implement coroutines in C is just a really bad idea and I think > we should be trying to reduce our use of them if we possibly can, > presumably by switching to actually using threads where we really > need the parallelism. I tend to agree. FWIW, sigaltstack works around the issue here, but I'm still looking s bit skeptical at its implementation. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 11:38 ` Jan Kiszka @ 2012-05-09 17:17 ` Anthony Liguori 2012-05-09 17:25 ` Jan Kiszka 0 siblings, 1 reply; 13+ messages in thread From: Anthony Liguori @ 2012-05-09 17:17 UTC (permalink / raw) To: Jan Kiszka Cc: Kevin Wolf, Peter Maydell, Anthony Liguori, Michael Tokarev, qemu-devel On 05/09/2012 06:38 AM, Jan Kiszka wrote: > On 2012-05-09 08:15, Peter Maydell wrote: >> On 9 May 2012 11:11, Kevin Wolf<kwolf@redhat.com> wrote: >>> Am 08.05.2012 21:35, schrieb Jan Kiszka: >>>> I hunted down a fairly subtle corruption of the VCPU thread signal mask >>>> in KVM mode when using the ucontext version of coroutines: >>>> >>>> coroutine_new calls getcontext, makecontext, swapcontext. Those >>>> functions get/set also the signal mask of the caller. Unfortunately, >>>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >>>> version. So they do not properly save/restore the blocked RT signals, >>>> namely our SIG_IPI - it becomes unblocke this way. >>> >>> If other coroutine backends work (sigaltstack?), we could try to detect >>> the situation in configure and set the right default. Not sure what the >>> condition is, glibc + i386? >> >> I don't think you can do a compile-time test for this short of >> just disabling use of the ucontext code on all i386/Linux platforms. >> >> I think it's becoming increasingly obvious that the setcontext/getcontext >> code path is not very well used and prone to nasty libc bugs. Trying >> to implement coroutines in C is just a really bad idea and I think >> we should be trying to reduce our use of them if we possibly can, >> presumably by switching to actually using threads where we really >> need the parallelism. > > I tend to agree. > > FWIW, sigaltstack works around the issue here, but I'm still looking s > bit skeptical at its implementation. Is there any downside to using SIGUSR1? Regards, Anthony Liguori > > Jan > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 17:17 ` Anthony Liguori @ 2012-05-09 17:25 ` Jan Kiszka 2012-05-09 17:31 ` Anthony Liguori 0 siblings, 1 reply; 13+ messages in thread From: Jan Kiszka @ 2012-05-09 17:25 UTC (permalink / raw) To: Anthony Liguori Cc: Kevin Wolf, Peter Maydell, Anthony Liguori, Michael Tokarev, qemu-devel On 2012-05-09 14:17, Anthony Liguori wrote: > On 05/09/2012 06:38 AM, Jan Kiszka wrote: >> On 2012-05-09 08:15, Peter Maydell wrote: >>> On 9 May 2012 11:11, Kevin Wolf<kwolf@redhat.com> wrote: >>>> Am 08.05.2012 21:35, schrieb Jan Kiszka: >>>>> I hunted down a fairly subtle corruption of the VCPU thread signal mask >>>>> in KVM mode when using the ucontext version of coroutines: >>>>> >>>>> coroutine_new calls getcontext, makecontext, swapcontext. Those >>>>> functions get/set also the signal mask of the caller. Unfortunately, >>>>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >>>>> version. So they do not properly save/restore the blocked RT signals, >>>>> namely our SIG_IPI - it becomes unblocke this way. >>>> >>>> If other coroutine backends work (sigaltstack?), we could try to detect >>>> the situation in configure and set the right default. Not sure what the >>>> condition is, glibc + i386? >>> >>> I don't think you can do a compile-time test for this short of >>> just disabling use of the ucontext code on all i386/Linux platforms. >>> >>> I think it's becoming increasingly obvious that the setcontext/getcontext >>> code path is not very well used and prone to nasty libc bugs. Trying >>> to implement coroutines in C is just a really bad idea and I think >>> we should be trying to reduce our use of them if we possibly can, >>> presumably by switching to actually using threads where we really >>> need the parallelism. >> >> I tend to agree. >> >> FWIW, sigaltstack works around the issue here, but I'm still looking s >> bit skeptical at its implementation. > > Is there any downside to using SIGUSR1? You mean for SIG_IPI? I don't think so. But the point is that the, well, limitation of ucontext will continue to break RT signals, and this in a very nasty way as only a specific setup is affected. I can't imagine we want this. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 17:25 ` Jan Kiszka @ 2012-05-09 17:31 ` Anthony Liguori 2012-05-09 18:21 ` Jan Kiszka 0 siblings, 1 reply; 13+ messages in thread From: Anthony Liguori @ 2012-05-09 17:31 UTC (permalink / raw) To: Jan Kiszka Cc: Kevin Wolf, Peter Maydell, Michael Tokarev, qemu-devel, Anthony Liguori On 05/09/2012 12:25 PM, Jan Kiszka wrote: > On 2012-05-09 14:17, Anthony Liguori wrote: >> On 05/09/2012 06:38 AM, Jan Kiszka wrote: >>> On 2012-05-09 08:15, Peter Maydell wrote: >>>> On 9 May 2012 11:11, Kevin Wolf<kwolf@redhat.com> wrote: >>>>> Am 08.05.2012 21:35, schrieb Jan Kiszka: >>>>>> I hunted down a fairly subtle corruption of the VCPU thread signal mask >>>>>> in KVM mode when using the ucontext version of coroutines: >>>>>> >>>>>> coroutine_new calls getcontext, makecontext, swapcontext. Those >>>>>> functions get/set also the signal mask of the caller. Unfortunately, >>>>>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >>>>>> version. So they do not properly save/restore the blocked RT signals, >>>>>> namely our SIG_IPI - it becomes unblocke this way. >>>>> >>>>> If other coroutine backends work (sigaltstack?), we could try to detect >>>>> the situation in configure and set the right default. Not sure what the >>>>> condition is, glibc + i386? >>>> >>>> I don't think you can do a compile-time test for this short of >>>> just disabling use of the ucontext code on all i386/Linux platforms. >>>> >>>> I think it's becoming increasingly obvious that the setcontext/getcontext >>>> code path is not very well used and prone to nasty libc bugs. Trying >>>> to implement coroutines in C is just a really bad idea and I think >>>> we should be trying to reduce our use of them if we possibly can, >>>> presumably by switching to actually using threads where we really >>>> need the parallelism. >>> >>> I tend to agree. >>> >>> FWIW, sigaltstack works around the issue here, but I'm still looking s >>> bit skeptical at its implementation. >> >> Is there any downside to using SIGUSR1? > > You mean for SIG_IPI? I don't think so. But the point is that the, well, > limitation of ucontext will continue to break RT signals, Yes, but we currently don't use RT signals, right? So we could switch to SIGUSR1, fix the problem in glibc, and call it a day, no? Regards, Anthony Liguori and this in a > very nasty way as only a specific setup is affected. I can't imagine we > want this. > > Jan > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 17:31 ` Anthony Liguori @ 2012-05-09 18:21 ` Jan Kiszka 0 siblings, 0 replies; 13+ messages in thread From: Jan Kiszka @ 2012-05-09 18:21 UTC (permalink / raw) To: Anthony Liguori Cc: Kevin Wolf, Peter Maydell, Michael Tokarev, qemu-devel, Anthony Liguori On 2012-05-09 14:31, Anthony Liguori wrote: > On 05/09/2012 12:25 PM, Jan Kiszka wrote: >> On 2012-05-09 14:17, Anthony Liguori wrote: >>> On 05/09/2012 06:38 AM, Jan Kiszka wrote: >>>> On 2012-05-09 08:15, Peter Maydell wrote: >>>>> On 9 May 2012 11:11, Kevin Wolf<kwolf@redhat.com> wrote: >>>>>> Am 08.05.2012 21:35, schrieb Jan Kiszka: >>>>>>> I hunted down a fairly subtle corruption of the VCPU thread signal mask >>>>>>> in KVM mode when using the ucontext version of coroutines: >>>>>>> >>>>>>> coroutine_new calls getcontext, makecontext, swapcontext. Those >>>>>>> functions get/set also the signal mask of the caller. Unfortunately, >>>>>>> they only use the sigprocmask syscall on i386, not the rt_sigprocmask >>>>>>> version. So they do not properly save/restore the blocked RT signals, >>>>>>> namely our SIG_IPI - it becomes unblocke this way. >>>>>> >>>>>> If other coroutine backends work (sigaltstack?), we could try to detect >>>>>> the situation in configure and set the right default. Not sure what the >>>>>> condition is, glibc + i386? >>>>> >>>>> I don't think you can do a compile-time test for this short of >>>>> just disabling use of the ucontext code on all i386/Linux platforms. >>>>> >>>>> I think it's becoming increasingly obvious that the setcontext/getcontext >>>>> code path is not very well used and prone to nasty libc bugs. Trying >>>>> to implement coroutines in C is just a really bad idea and I think >>>>> we should be trying to reduce our use of them if we possibly can, >>>>> presumably by switching to actually using threads where we really >>>>> need the parallelism. >>>> >>>> I tend to agree. >>>> >>>> FWIW, sigaltstack works around the issue here, but I'm still looking s >>>> bit skeptical at its implementation. >>> >>> Is there any downside to using SIGUSR1? >> >> You mean for SIG_IPI? I don't think so. But the point is that the, well, >> limitation of ucontext will continue to break RT signals, > > Yes, but we currently don't use RT signals, right? So we could switch to > SIGUSR1, fix the problem in glibc, and call it a day, no? That cures the current symptom but does not prevent future diseases around RT signals. I would prefer to disable ucontext usage on those platforms we identified as broken. BTW, I'm starting to believe it's not a glibc but rather a Linux kernel issue, only biting us on 32/64. sigprocmask should only manipulate those signals, its masks can address. Digging deeper... Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-09 11:15 ` Peter Maydell 2012-05-09 11:38 ` Jan Kiszka @ 2012-05-09 14:37 ` Avi Kivity 1 sibling, 0 replies; 13+ messages in thread From: Avi Kivity @ 2012-05-09 14:37 UTC (permalink / raw) To: Peter Maydell Cc: Kevin Wolf, Jan Kiszka, Anthony Liguori, Michael Tokarev, qemu-devel On 05/09/2012 02:15 PM, Peter Maydell wrote: > I think it's becoming increasingly obvious that the setcontext/getcontext > code path is not very well used and prone to nasty libc bugs. Trying > to implement coroutines in C is just a really bad idea and I think > we should be trying to reduce our use of them if we possibly can, > presumably by switching to actually using threads where we really > need the parallelism. > I happen to like coroutines, but if we switch to threads, we should ensure that we eliminate the extra context switch if the layer below (usually posix-aio-compat) also uses threads. That leaves a theoretical regression with qcow2-on-native-aio, but that's a fairly rare use case. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] coroutine-ucontext broken for x86-32 2012-05-08 19:35 [Qemu-devel] coroutine-ucontext broken for x86-32 Jan Kiszka 2012-05-09 7:32 ` Michael Tokarev 2012-05-09 10:11 ` Kevin Wolf @ 2012-05-09 18:05 ` Michael Tokarev 2 siblings, 0 replies; 13+ messages in thread From: Michael Tokarev @ 2012-05-09 18:05 UTC (permalink / raw) To: Jan Kiszka; +Cc: Kevin Wolf, Anthony Liguori, qemu-devel On 08.05.2012 23:35, Jan Kiszka wrote: > Hi, > > I hunted down a fairly subtle corruption of the VCPU thread signal mask > in KVM mode when using the ucontext version of coroutines: > > coroutine_new calls getcontext, makecontext, swapcontext. Those > functions get/set also the signal mask of the caller. Unfortunately, > they only use the sigprocmask syscall on i386, not the rt_sigprocmask > version. So they do not properly save/restore the blocked RT signals, > namely our SIG_IPI - it becomes unblocke this way. And this will sooner > or later make the kernel actually deliver a SIG_IPI to our > dummy_handler, and we miss a wakeup, which means losing control over > VCPU thread - qemu hangs. > > I was able to reproduce the issue very reliably with virtio-block > enabled, 32-bit qemu userspace on a 64-bit host, using a 32-bit WinXP > guest. > > Simple workaround: > > diff --git a/main-loop.h b/main-loop.h > index c06b8bc..dce1cd9 100644 > --- a/main-loop.h > +++ b/main-loop.h > @@ -25,11 +25,7 @@ > #ifndef QEMU_MAIN_LOOP_H > #define QEMU_MAIN_LOOP_H 1 > > -#ifdef SIGRTMIN > -#define SIG_IPI (SIGRTMIN+4) > -#else > #define SIG_IPI SIGUSR1 > -#endif [] > Michael, maybe this also relates to the issue you saw. I'm not able to > reproduce any VAPIC problems after make Windows bootable by switching > to SIGUSR1. FWIW, this fixes both the STOP 0x5c during reboot of windows 32bit guest and my other issue, with qemu stalling on 32/64bit environment. So yes indeed, that's the same thing apparently... Nice catch! /mjt ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-05-09 18:21 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-08 19:35 [Qemu-devel] coroutine-ucontext broken for x86-32 Jan Kiszka 2012-05-09 7:32 ` Michael Tokarev 2012-05-09 11:12 ` Jan Kiszka 2012-05-09 10:11 ` Kevin Wolf 2012-05-09 10:55 ` Jan Kiszka 2012-05-09 11:15 ` Peter Maydell 2012-05-09 11:38 ` Jan Kiszka 2012-05-09 17:17 ` Anthony Liguori 2012-05-09 17:25 ` Jan Kiszka 2012-05-09 17:31 ` Anthony Liguori 2012-05-09 18:21 ` Jan Kiszka 2012-05-09 14:37 ` Avi Kivity 2012-05-09 18:05 ` Michael Tokarev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).