* [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s
@ 2008-10-31 18:40 Anthony Liguori
2008-10-31 18:52 ` Anthony Liguori
2008-10-31 19:41 ` [Qemu-devel] " Jamie Lokier
0 siblings, 2 replies; 28+ messages in thread
From: Anthony Liguori @ 2008-10-31 18:40 UTC (permalink / raw)
To: qemu-devel
Revision: 5578
http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=5578
Author: aliguori
Date: 2008-10-31 18:40:25 +0000 (Fri, 31 Oct 2008)
Log Message:
-----------
Increase default IO timeout from 10ms to 5s
With the recent changes to the main loop, we no longer have unconditional
polling. This means we can now sleep in select() for much longer than we
previously did. This patch increases our select() sleep time from 10ms to 5s
which is effectively unlimited since we're going to wake up sooner than that
in almost all circumstances.
With this patch, I see the number of wake-ups with an idle dynamic ticks guest
drop from 80 per second to about 15 times per second.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Modified Paths:
--------------
trunk/vl.c
Modified: trunk/vl.c
===================================================================
--- trunk/vl.c 2008-10-31 18:07:17 UTC (rev 5577)
+++ trunk/vl.c 2008-10-31 18:40:25 UTC (rev 5578)
@@ -8182,7 +8182,7 @@
timeout = 0;
}
} else {
- timeout = 10;
+ timeout = 5000;
}
} else {
timeout = 0;
@@ -8192,7 +8192,7 @@
ret = EXCP_INTERRUPT;
break;
}
- timeout = 10;
+ timeout = 5000;
}
#ifdef CONFIG_PROFILER
ti = profile_getclock();
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s
2008-10-31 18:40 [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s Anthony Liguori
@ 2008-10-31 18:52 ` Anthony Liguori
2008-11-02 19:08 ` [Qemu-devel] " Jan Kiszka
2008-10-31 19:41 ` [Qemu-devel] " Jamie Lokier
1 sibling, 1 reply; 28+ messages in thread
From: Anthony Liguori @ 2008-10-31 18:52 UTC (permalink / raw)
To: qemu-devel
If anyone notices a slow down, I'd check this revision. There may be
broken bits of code out there that depends on wake up rate. I've done a
fair bit of testing and haven't found any but you never know.
Regards,
Anthony Liguori
Anthony Liguori wrote:
> Revision: 5578
> http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=5578
> Author: aliguori
> Date: 2008-10-31 18:40:25 +0000 (Fri, 31 Oct 2008)
>
> Log Message:
> -----------
> Increase default IO timeout from 10ms to 5s
>
> With the recent changes to the main loop, we no longer have unconditional
> polling. This means we can now sleep in select() for much longer than we
> previously did. This patch increases our select() sleep time from 10ms to 5s
> which is effectively unlimited since we're going to wake up sooner than that
> in almost all circumstances.
>
> With this patch, I see the number of wake-ups with an idle dynamic ticks guest
> drop from 80 per second to about 15 times per second.
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>
> Modified Paths:
> --------------
> trunk/vl.c
>
> Modified: trunk/vl.c
> ===================================================================
> --- trunk/vl.c 2008-10-31 18:07:17 UTC (rev 5577)
> +++ trunk/vl.c 2008-10-31 18:40:25 UTC (rev 5578)
> @@ -8182,7 +8182,7 @@
> timeout = 0;
> }
> } else {
> - timeout = 10;
> + timeout = 5000;
> }
> } else {
> timeout = 0;
> @@ -8192,7 +8192,7 @@
> ret = EXCP_INTERRUPT;
> break;
> }
> - timeout = 10;
> + timeout = 5000;
> }
> #ifdef CONFIG_PROFILER
> ti = profile_getclock();
>
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s
2008-10-31 18:40 [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s Anthony Liguori
2008-10-31 18:52 ` Anthony Liguori
@ 2008-10-31 19:41 ` Jamie Lokier
2008-10-31 20:13 ` Anthony Liguori
1 sibling, 1 reply; 28+ messages in thread
From: Jamie Lokier @ 2008-10-31 19:41 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> With this patch, I see the number of wake-ups with an idle dynamic
> ticks guest drop from 80 per second to about 15 times per second.
That's great, it really is, but shouldn't it be less than that?
Did you look at the wakeup rate shown by powertop in the guest, to see
how many extra wakeups are caused by QEMU internals?
-- Jamie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s
2008-10-31 19:41 ` [Qemu-devel] " Jamie Lokier
@ 2008-10-31 20:13 ` Anthony Liguori
0 siblings, 0 replies; 28+ messages in thread
From: Anthony Liguori @ 2008-10-31 20:13 UTC (permalink / raw)
To: qemu-devel
Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>> With this patch, I see the number of wake-ups with an idle dynamic
>> ticks guest drop from 80 per second to about 15 times per second.
>>
>
> That's great, it really is, but shouldn't it be less than that?
>
> Did you look at the wakeup rate shown by powertop in the guest, to see
> how many extra wakeups are caused by QEMU internals?
>
I don't have a guest handy that I could run powertop on, but I know of
at least a few timers that can be gotten rid of (like the rtc timers).
Regards,
Anthony Liguori
> -- Jamie
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-10-31 18:52 ` Anthony Liguori
@ 2008-11-02 19:08 ` Jan Kiszka
2008-11-03 20:04 ` Anthony Liguori
0 siblings, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2008-11-02 19:08 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 562 bytes --]
Anthony Liguori wrote:
> If anyone notices a slow down, I'd check this revision. There may be
> broken bits of code out there that depends on wake up rate. I've done a
> fair bit of testing and haven't found any but you never know.
OK, here is one brokenness for you: Musicpal (qemu-arm-system) becomes
unusable, sounds is stopping for seconds, even keyboard input is
inpossible, including switching to the monitor console. What should I
check for? Note that this board uses a ptimer to a PIT to the guest. May
this make a difference here?
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-02 19:08 ` [Qemu-devel] " Jan Kiszka
@ 2008-11-03 20:04 ` Anthony Liguori
2008-11-03 20:36 ` Jan Kiszka
0 siblings, 1 reply; 28+ messages in thread
From: Anthony Liguori @ 2008-11-03 20:04 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel
Jan Kiszka wrote:
> Anthony Liguori wrote:
>
>> If anyone notices a slow down, I'd check this revision. There may be
>> broken bits of code out there that depends on wake up rate. I've done a
>> fair bit of testing and haven't found any but you never know.
>>
>
> OK, here is one brokenness for you: Musicpal (qemu-arm-system) becomes
> unusable, sounds is stopping for seconds, even keyboard input is
> inpossible, including switching to the monitor console. What should I
> check for? Note that this board uses a ptimer to a PIT to the guest. May
> this make a difference here?
>
Keyboard input should be governed by the VGA refresh timer which should
be firing 30 times a second.
How do you have qemu configured? Are you using SDL? What sound driver
are you using on the host?
Regards,
Anthony Liguori
> Jan
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 20:04 ` Anthony Liguori
@ 2008-11-03 20:36 ` Jan Kiszka
2008-11-03 21:50 ` Jan Kiszka
0 siblings, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2008-11-03 20:36 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]
Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Anthony Liguori wrote:
>>
>>> If anyone notices a slow down, I'd check this revision. There may be
>>> broken bits of code out there that depends on wake up rate. I've done a
>>> fair bit of testing and haven't found any but you never know.
>>>
>>
>> OK, here is one brokenness for you: Musicpal (qemu-arm-system) becomes
>> unusable, sounds is stopping for seconds, even keyboard input is
>> inpossible, including switching to the monitor console. What should I
>> check for? Note that this board uses a ptimer to a PIT to the guest. May
>> this make a difference here?
>>
>
> Keyboard input should be governed by the VGA refresh timer which should
> be firing 30 times a second.
If you mean *the* VGA emulation - there is none, the target is an
embedded ARM board.
>
> How do you have qemu configured?
qemu-system-arm -M musicpal -pflash musicpal.image -snapshot \
-kernel u-boot.image -serial stdio
(I can provide you both privately if you want to play with it.)
> Are you using SDL?
Yes.
> What sound driver
> are you using on the host?
ALSA.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 20:36 ` Jan Kiszka
@ 2008-11-03 21:50 ` Jan Kiszka
2008-11-03 22:00 ` Anthony Liguori
2008-11-04 8:29 ` Avi Kivity
0 siblings, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2008-11-03 21:50 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 2736 bytes --]
Jan Kiszka wrote:
> Anthony Liguori wrote:
>> Jan Kiszka wrote:
>>> Anthony Liguori wrote:
>>>
>>>> If anyone notices a slow down, I'd check this revision. There may be
>>>> broken bits of code out there that depends on wake up rate. I've done a
>>>> fair bit of testing and haven't found any but you never know.
>>>>
>>> OK, here is one brokenness for you: Musicpal (qemu-arm-system) becomes
>>> unusable, sounds is stopping for seconds, even keyboard input is
>>> inpossible, including switching to the monitor console. What should I
>>> check for? Note that this board uses a ptimer to a PIT to the guest. May
>>> this make a difference here?
>>>
I think I understood the problem:
[...]
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xca8d90) = 0
select(24, [0 5 6 8 23], [], [], {0, 0}) = 0 (Timeout)
ioctl(20, 0x4122, 0xbb) = 0
ioctl(20, 0x4122, 0x54) = 0
ioctl(20, 0x4122, 0x3) = 0
ioctl(20, 0x4122, 0) = 0
ioctl(20, 0x4122, 0xac) = 0
ioctl(20, 0x4122, 0) = 0
ioctl(20, 0x4122, 0x400) = 0
ioctl(21, USBDEVFS_IOCTL, 0xd64080) = 0
ioctl(20, 0x4122, 0xc000000000020400) = 0
semop(34242562, 0x7fff65de5960, 2) = 0
semop(34242562, 0x7fff65de5970, 1) = 0
read(19, "\1\0\0\0\0\201\377\377\321Z\2\0\0\0\0\0\'PO&\0\0\0\0\1\0\0\0\377\377\377\377", 128) = 32
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0xca8d90) = 1
select(24, [0 5 6 8 23], [], [], {5, 0}^C <unfinished ...>
There is a race between the alarm_timer firing SIGALRM and
main_loop_wait reaching the safe harbor of select (with that infamous 5
second timeout). If the signal comes when already blocked in select, it
will properly resume the latter immediately. But if the timer fired
BEFORE that point, host_alarm_handler will only set a flag that the host
timer has fired, the actual rearming will be done AFTER return from
select. Ooops....
So, select should actually include the host timer as event. timerfd?
Unfortunately a recent Linux-only feature :-/. I don't think we can
rearm the timer from within the signal handler, at least not without
running all the pending qemu timers. And that is surely not a signal
handler job (qemu timer handler aren't thread-safe in general).
Anyone any ideas? /me is thinking a bit more about it as well.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 21:50 ` Jan Kiszka
@ 2008-11-03 22:00 ` Anthony Liguori
2008-11-03 22:03 ` Jan Kiszka
2008-11-04 8:07 ` andrzej zaborowski
2008-11-04 8:29 ` Avi Kivity
1 sibling, 2 replies; 28+ messages in thread
From: Anthony Liguori @ 2008-11-03 22:00 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel
Jan Kiszka wrote:
> Jan Kiszka wrote:
>
>
> There is a race between the alarm_timer firing SIGALRM and
> main_loop_wait reaching the safe harbor of select (with that infamous 5
> second timeout). If the signal comes when already blocked in select, it
> will properly resume the latter immediately. But if the timer fired
> BEFORE that point, host_alarm_handler will only set a flag that the host
> timer has fired, the actual rearming will be done AFTER return from
> select. Ooops....
>
Ah, so before this was causing the timer to potentially come 10ms later
than it should have. I was hoping that this change would shake out this
stuff :-)
> So, select should actually include the host timer as event. timerfd?
> Unfortunately a recent Linux-only feature :-/. I don't think we can
> rearm the timer from within the signal handler, at least not without
> running all the pending qemu timers. And that is surely not a signal
> handler job (qemu timer handler aren't thread-safe in general).
>
> Anyone any ideas? /me is thinking a bit more about it as well.
>
host_alarm_handler should write to a file descriptor instead of setting
a flag. That file descriptor should then be select()'d on (just like we
do for SIGUSR2 in block-raw-posix.c).
Regards,
Anthony Liguori
> Jan
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 22:00 ` Anthony Liguori
@ 2008-11-03 22:03 ` Jan Kiszka
2008-11-04 8:07 ` andrzej zaborowski
1 sibling, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2008-11-03 22:03 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]
Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>
>> There is a race between the alarm_timer firing SIGALRM and
>> main_loop_wait reaching the safe harbor of select (with that infamous 5
>> second timeout). If the signal comes when already blocked in select, it
>> will properly resume the latter immediately. But if the timer fired
>> BEFORE that point, host_alarm_handler will only set a flag that the host
>> timer has fired, the actual rearming will be done AFTER return from
>> select. Ooops....
>>
>
> Ah, so before this was causing the timer to potentially come 10ms later
> than it should have. I was hoping that this change would shake out this
> stuff :-)
>
>> So, select should actually include the host timer as event. timerfd?
>> Unfortunately a recent Linux-only feature :-/. I don't think we can
>> rearm the timer from within the signal handler, at least not without
>> running all the pending qemu timers. And that is surely not a signal
>> handler job (qemu timer handler aren't thread-safe in general).
>>
>> Anyone any ideas? /me is thinking a bit more about it as well.
>>
>
> host_alarm_handler should write to a file descriptor instead of setting
> a flag. That file descriptor should then be select()'d on (just like we
> do for SIGUSR2 in block-raw-posix.c).
A pipe, that slowly came to my mind as well. OK, will play with it.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 22:00 ` Anthony Liguori
2008-11-03 22:03 ` Jan Kiszka
@ 2008-11-04 8:07 ` andrzej zaborowski
2008-11-04 8:22 ` Jan Kiszka
1 sibling, 1 reply; 28+ messages in thread
From: andrzej zaborowski @ 2008-11-04 8:07 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Kiszka
2008/11/3 Anthony Liguori <anthony@codemonkey.ws>:
> Jan Kiszka wrote:
>>
>> Jan Kiszka wrote:
>>
>> There is a race between the alarm_timer firing SIGALRM and
>> main_loop_wait reaching the safe harbor of select (with that infamous 5
>> second timeout). If the signal comes when already blocked in select, it
>> will properly resume the latter immediately. But if the timer fired
>> BEFORE that point, host_alarm_handler will only set a flag that the host
>> timer has fired, the actual rearming will be done AFTER return from
>> select. Ooops....
>>
>
> Ah, so before this was causing the timer to potentially come 10ms later than
> it should have. I was hoping that this change would shake out this stuff
> :-)
>
>> So, select should actually include the host timer as event. timerfd?
>> Unfortunately a recent Linux-only feature :-/. I don't think we can
>> rearm the timer from within the signal handler, at least not without
>> running all the pending qemu timers. And that is surely not a signal
>> handler job (qemu timer handler aren't thread-safe in general).
>>
>> Anyone any ideas? /me is thinking a bit more about it as well.
The select() man page on Linux mentions this race explicitely and
explains that pselect() is a solution.
>>
>
> host_alarm_handler should write to a file descriptor instead of setting a
> flag. That file descriptor should then be select()'d on (just like we do
> for SIGUSR2 in block-raw-posix.c).
Or you can do this.
Cheers
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 8:07 ` andrzej zaborowski
@ 2008-11-04 8:22 ` Jan Kiszka
2008-11-04 8:33 ` andrzej zaborowski
0 siblings, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2008-11-04 8:22 UTC (permalink / raw)
To: andrzej zaborowski; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]
andrzej zaborowski wrote:
> 2008/11/3 Anthony Liguori <anthony@codemonkey.ws>:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>
>>> There is a race between the alarm_timer firing SIGALRM and
>>> main_loop_wait reaching the safe harbor of select (with that infamous 5
>>> second timeout). If the signal comes when already blocked in select, it
>>> will properly resume the latter immediately. But if the timer fired
>>> BEFORE that point, host_alarm_handler will only set a flag that the host
>>> timer has fired, the actual rearming will be done AFTER return from
>>> select. Ooops....
>>>
>> Ah, so before this was causing the timer to potentially come 10ms later than
>> it should have. I was hoping that this change would shake out this stuff
>> :-)
>>
>>> So, select should actually include the host timer as event. timerfd?
>>> Unfortunately a recent Linux-only feature :-/. I don't think we can
>>> rearm the timer from within the signal handler, at least not without
>>> running all the pending qemu timers. And that is surely not a signal
>>> handler job (qemu timer handler aren't thread-safe in general).
>>>
>>> Anyone any ideas? /me is thinking a bit more about it as well.
>
> The select() man page on Linux mentions this race explicitely and
> explains that pselect() is a solution.
>
>> host_alarm_handler should write to a file descriptor instead of setting a
>> flag. That file descriptor should then be select()'d on (just like we do
>> for SIGUSR2 in block-raw-posix.c).
>
> Or you can do this.
I think this is safer. Or what's the state of pselect on all supported
platforms (including WIN32)? My man page even warns that the Linux
kernel is not implementing it yet, though I don't think this still
applies to recent 2.6.2x kernels.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-03 21:50 ` Jan Kiszka
2008-11-03 22:00 ` Anthony Liguori
@ 2008-11-04 8:29 ` Avi Kivity
1 sibling, 0 replies; 28+ messages in thread
From: Avi Kivity @ 2008-11-04 8:29 UTC (permalink / raw)
To: qemu-devel
Jan Kiszka wrote:
> There is a race between the alarm_timer firing SIGALRM and
> main_loop_wait reaching the safe harbor of select (with that infamous 5
> second timeout). If the signal comes when already blocked in select, it
> will properly resume the latter immediately. But if the timer fired
> BEFORE that point, host_alarm_handler will only set a flag that the host
> timer has fired, the actual rearming will be done AFTER return from
> select. Ooops....
>
> So, select should actually include the host timer as event. timerfd?
> Unfortunately a recent Linux-only feature :-/. I don't think we can
> rearm the timer from within the signal handler, at least not without
> running all the pending qemu timers. And that is surely not a signal
> handler job (qemu timer handler aren't thread-safe in general).
>
> Anyone any ideas? /me is thinking a bit more about it as well.
>
pselect():
1. run tcg code
2. block signals
3. pselect (enabling signals)
4. process pselect results (signals disabled again)
5. unblock signals
6. goto 1
Prior to Linux 2.6.16, pselect() was racy since it was implemented in
userspace rather than the kernel. I don't think it's a problem now.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 8:22 ` Jan Kiszka
@ 2008-11-04 8:33 ` andrzej zaborowski
2008-11-04 11:32 ` Jamie Lokier
0 siblings, 1 reply; 28+ messages in thread
From: andrzej zaborowski @ 2008-11-04 8:33 UTC (permalink / raw)
To: Jan Kiszka; +Cc: qemu-devel
2008/11/4 Jan Kiszka <jan.kiszka@web.de>:
> andrzej zaborowski wrote:
>> 2008/11/3 Anthony Liguori <anthony@codemonkey.ws>:
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>>
>>>> There is a race between the alarm_timer firing SIGALRM and
>>>> main_loop_wait reaching the safe harbor of select (with that infamous 5
>>>> second timeout). If the signal comes when already blocked in select, it
>>>> will properly resume the latter immediately. But if the timer fired
>>>> BEFORE that point, host_alarm_handler will only set a flag that the host
>>>> timer has fired, the actual rearming will be done AFTER return from
>>>> select. Ooops....
>>>>
>>> Ah, so before this was causing the timer to potentially come 10ms later than
>>> it should have. I was hoping that this change would shake out this stuff
>>> :-)
>>>
>>>> So, select should actually include the host timer as event. timerfd?
>>>> Unfortunately a recent Linux-only feature :-/. I don't think we can
>>>> rearm the timer from within the signal handler, at least not without
>>>> running all the pending qemu timers. And that is surely not a signal
>>>> handler job (qemu timer handler aren't thread-safe in general).
>>>>
>>>> Anyone any ideas? /me is thinking a bit more about it as well.
>>
>> The select() man page on Linux mentions this race explicitely and
>> explains that pselect() is a solution.
>>
>>> host_alarm_handler should write to a file descriptor instead of setting a
>>> flag. That file descriptor should then be select()'d on (just like we do
>>> for SIGUSR2 in block-raw-posix.c).
>>
>> Or you can do this.
>
> I think this is safer. Or what's the state of pselect on all supported
> platforms (including WIN32)?
Supposedly it's in posix, but no idea about win32. Maybe the pipe is safer.
> My man page even warns that the Linux
> kernel is not implementing it yet, though I don't think this still
> applies to recent 2.6.2x kernels.
According to the man page it moved to kernel at 2.6.16 but the glibc
wrapper should be ok too.
Cheers
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 8:33 ` andrzej zaborowski
@ 2008-11-04 11:32 ` Jamie Lokier
2008-11-04 16:22 ` M. Warner Losh
0 siblings, 1 reply; 28+ messages in thread
From: Jamie Lokier @ 2008-11-04 11:32 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Kiszka
andrzej zaborowski wrote:
> > My man page even warns that the Linux
> > kernel is not implementing it yet, though I don't think this still
> > applies to recent 2.6.2x kernels.
>
> According to the man page it moved to kernel at 2.6.16 but the glibc
> wrapper should be ok too.
If there's a glibc wrapper, it cannot be reliable...
*Looks at glibc source*
That's right. The glibc pselect() wrapper has the same race condition
which prompted this QEMU bug. If the signal arrives after unmasking
and before select() in the wrapper, then blocks.
In other words, don't use pselect() if you might run on a kernel older
than 2.6.16, or on a host architecture which adds pselect() in a later
kernel version. Also, I wouldn't be surprised if older versions of
some BSDs have similar dodgy wrappers.
-- Jamie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 11:32 ` Jamie Lokier
@ 2008-11-04 16:22 ` M. Warner Losh
2008-11-04 17:10 ` Jan Kiszka
2008-11-05 15:00 ` Jamie Lokier
0 siblings, 2 replies; 28+ messages in thread
From: M. Warner Losh @ 2008-11-04 16:22 UTC (permalink / raw)
To: qemu-devel, jamie; +Cc: jan.kiszka
In message: <20081104113204.GA32125@shareable.org>
Jamie Lokier <jamie@shareable.org> writes:
: andrzej zaborowski wrote:
: > > My man page even warns that the Linux
: > > kernel is not implementing it yet, though I don't think this still
: > > applies to recent 2.6.2x kernels.
: >
: > According to the man page it moved to kernel at 2.6.16 but the glibc
: > wrapper should be ok too.
:
: If there's a glibc wrapper, it cannot be reliable...
:
: *Looks at glibc source*
:
: That's right. The glibc pselect() wrapper has the same race condition
: which prompted this QEMU bug. If the signal arrives after unmasking
: and before select() in the wrapper, then blocks.
:
: In other words, don't use pselect() if you might run on a kernel older
: than 2.6.16, or on a host architecture which adds pselect() in a later
: kernel version. Also, I wouldn't be surprised if older versions of
: some BSDs have similar dodgy wrappers.
Which ones have a good kernel implementation of it? FreeBSD's is
currently approximately:
if (!mask)
_sigprocmask(mask, &oldmask);
/* here */
select();
if (!mask)
_sigprocmask(oldmask, NULL);
I'm assuming that the problem is due to a signal arriving at /* here */.
Warner
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 16:22 ` M. Warner Losh
@ 2008-11-04 17:10 ` Jan Kiszka
2008-11-04 17:55 ` M. Warner Losh
2008-11-05 15:00 ` Jamie Lokier
1 sibling, 1 reply; 28+ messages in thread
From: Jan Kiszka @ 2008-11-04 17:10 UTC (permalink / raw)
To: M. Warner Losh; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1546 bytes --]
M. Warner Losh wrote:
> In message: <20081104113204.GA32125@shareable.org>
> Jamie Lokier <jamie@shareable.org> writes:
> : andrzej zaborowski wrote:
> : > > My man page even warns that the Linux
> : > > kernel is not implementing it yet, though I don't think this still
> : > > applies to recent 2.6.2x kernels.
> : >
> : > According to the man page it moved to kernel at 2.6.16 but the glibc
> : > wrapper should be ok too.
> :
> : If there's a glibc wrapper, it cannot be reliable...
> :
> : *Looks at glibc source*
> :
> : That's right. The glibc pselect() wrapper has the same race condition
> : which prompted this QEMU bug. If the signal arrives after unmasking
> : and before select() in the wrapper, then blocks.
> :
> : In other words, don't use pselect() if you might run on a kernel older
> : than 2.6.16, or on a host architecture which adds pselect() in a later
> : kernel version. Also, I wouldn't be surprised if older versions of
> : some BSDs have similar dodgy wrappers.
>
> Which ones have a good kernel implementation of it? FreeBSD's is
> currently approximately:
>
> if (!mask)
> _sigprocmask(mask, &oldmask);
> /* here */
> select();
> if (!mask)
> _sigprocmask(oldmask, NULL);
>
> I'm assuming that the problem is due to a signal arriving at /* here */.
I guess those things happen under some kind of preemption lock,
otherwise it would be a really poor implementation.
However, think we buried the idea of using pselect for this anyway.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 17:10 ` Jan Kiszka
@ 2008-11-04 17:55 ` M. Warner Losh
2008-11-04 19:08 ` Jan Kiszka
0 siblings, 1 reply; 28+ messages in thread
From: M. Warner Losh @ 2008-11-04 17:55 UTC (permalink / raw)
To: jan.kiszka; +Cc: qemu-devel
In message: <491081EB.5070905@web.de>
Jan Kiszka <jan.kiszka@web.de> writes:
: M. Warner Losh wrote:
: > In message: <20081104113204.GA32125@shareable.org>
: > Jamie Lokier <jamie@shareable.org> writes:
: > : andrzej zaborowski wrote:
: > : > > My man page even warns that the Linux
: > : > > kernel is not implementing it yet, though I don't think this still
: > : > > applies to recent 2.6.2x kernels.
: > : >
: > : > According to the man page it moved to kernel at 2.6.16 but the glibc
: > : > wrapper should be ok too.
: > :
: > : If there's a glibc wrapper, it cannot be reliable...
: > :
: > : *Looks at glibc source*
: > :
: > : That's right. The glibc pselect() wrapper has the same race condition
: > : which prompted this QEMU bug. If the signal arrives after unmasking
: > : and before select() in the wrapper, then blocks.
: > :
: > : In other words, don't use pselect() if you might run on a kernel older
: > : than 2.6.16, or on a host architecture which adds pselect() in a later
: > : kernel version. Also, I wouldn't be surprised if older versions of
: > : some BSDs have similar dodgy wrappers.
: >
: > Which ones have a good kernel implementation of it? FreeBSD's is
: > currently approximately:
: >
: > if (!mask)
: > _sigprocmask(mask, &oldmask);
: > /* here */
: > select();
: > if (!mask)
: > _sigprocmask(oldmask, NULL);
: >
: > I'm assuming that the problem is due to a signal arriving at /* here */.
:
: I guess those things happen under some kind of preemption lock,
: otherwise it would be a really poor implementation.
Why? There's races there anyway. IF you have a mutex to prevent
multiple calls, doesn't that serialize calls to select?
Warner
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 17:55 ` M. Warner Losh
@ 2008-11-04 19:08 ` Jan Kiszka
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kiszka @ 2008-11-04 19:08 UTC (permalink / raw)
To: M. Warner Losh; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 2217 bytes --]
M. Warner Losh wrote:
> In message: <491081EB.5070905@web.de>
> Jan Kiszka <jan.kiszka@web.de> writes:
> : M. Warner Losh wrote:
> : > In message: <20081104113204.GA32125@shareable.org>
> : > Jamie Lokier <jamie@shareable.org> writes:
> : > : andrzej zaborowski wrote:
> : > : > > My man page even warns that the Linux
> : > : > > kernel is not implementing it yet, though I don't think this still
> : > : > > applies to recent 2.6.2x kernels.
> : > : >
> : > : > According to the man page it moved to kernel at 2.6.16 but the glibc
> : > : > wrapper should be ok too.
> : > :
> : > : If there's a glibc wrapper, it cannot be reliable...
> : > :
> : > : *Looks at glibc source*
> : > :
> : > : That's right. The glibc pselect() wrapper has the same race condition
> : > : which prompted this QEMU bug. If the signal arrives after unmasking
> : > : and before select() in the wrapper, then blocks.
> : > :
> : > : In other words, don't use pselect() if you might run on a kernel older
> : > : than 2.6.16, or on a host architecture which adds pselect() in a later
> : > : kernel version. Also, I wouldn't be surprised if older versions of
> : > : some BSDs have similar dodgy wrappers.
> : >
> : > Which ones have a good kernel implementation of it? FreeBSD's is
> : > currently approximately:
> : >
> : > if (!mask)
> : > _sigprocmask(mask, &oldmask);
> : > /* here */
> : > select();
> : > if (!mask)
> : > _sigprocmask(oldmask, NULL);
> : >
> : > I'm assuming that the problem is due to a signal arriving at /* here */.
> :
> : I guess those things happen under some kind of preemption lock,
> : otherwise it would be a really poor implementation.
>
> Why? There's races there anyway. IF you have a mutex to prevent
> multiple calls, doesn't that serialize calls to select?
[ sorting my mind ] What I meant was that some care will likely be taken
to prevent signal delivery _to userspace_ between the unmasking and the
select() call. At least Linux does delivery on syscall return, so should
FreeBSD do. I guess that signals arriving around "/* here */" will
simply prevent select() to block (or even run).
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 258 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-04 16:22 ` M. Warner Losh
2008-11-04 17:10 ` Jan Kiszka
@ 2008-11-05 15:00 ` Jamie Lokier
2008-11-05 16:10 ` M. Warner Losh
1 sibling, 1 reply; 28+ messages in thread
From: Jamie Lokier @ 2008-11-05 15:00 UTC (permalink / raw)
To: M. Warner Losh; +Cc: jan.kiszka, qemu-devel
M. Warner Losh wrote:
> : In other words, don't use pselect() if you might run on a kernel older
> : than 2.6.16, or on a host architecture which adds pselect() in a later
> : kernel version. Also, I wouldn't be surprised if older versions of
> : some BSDs have similar dodgy wrappers.
>
> Which ones have a good kernel implementation of it? FreeBSD's is
> currently approximately:
>
> if (!mask)
> _sigprocmask(mask, &oldmask);
> /* here */
> select();
> if (!mask)
> _sigprocmask(oldmask, NULL);
>
> I'm assuming that the problem is due to a signal arriving at /* here */.
If that's _kernel_ code and the kernel behaves like Linux, it's not a
problem because signals don't affect the control flow until returning
to userspace, meaning the select() will return EINTR.
If that's userspace (libc) code, then it is no good. Nobody should
ever have written crappy pselect() wrappers in userspace (i.e. Glibc),
it just causes portable software to have to keep a whitelist of
reliable pselect() platforms (i.e. not Linux) instead of just using
it. Same for crappy broken pread() and pwrite() wrappers.
-- Jamie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 15:00 ` Jamie Lokier
@ 2008-11-05 16:10 ` M. Warner Losh
2008-11-05 18:21 ` Jan Kiszka
2008-11-06 0:53 ` Jamie Lokier
0 siblings, 2 replies; 28+ messages in thread
From: M. Warner Losh @ 2008-11-05 16:10 UTC (permalink / raw)
To: jamie; +Cc: jan.kiszka, qemu-devel
In message: <20081105150042.GJ13630@shareable.org>
Jamie Lokier <jamie@shareable.org> writes:
: M. Warner Losh wrote:
: > : In other words, don't use pselect() if you might run on a kernel older
: > : than 2.6.16, or on a host architecture which adds pselect() in a later
: > : kernel version. Also, I wouldn't be surprised if older versions of
: > : some BSDs have similar dodgy wrappers.
: >
: > Which ones have a good kernel implementation of it? FreeBSD's is
: > currently approximately:
: >
: > if (!mask)
: > _sigprocmask(mask, &oldmask);
: > /* here */
: > select();
: > if (!mask)
: > _sigprocmask(oldmask, NULL);
: >
: > I'm assuming that the problem is due to a signal arriving at /* here */.
:
: If that's _kernel_ code and the kernel behaves like Linux, it's not a
: problem because signals don't affect the control flow until returning
: to userspace, meaning the select() will return EINTR.
It is currently user level code, and I'm looking at moving it into the
kernel, but I need to understand the race being talked about here.
: If that's userspace (libc) code, then it is no good. Nobody should
: ever have written crappy pselect() wrappers in userspace (i.e. Glibc),
: it just causes portable software to have to keep a whitelist of
: reliable pselect() platforms (i.e. not Linux) instead of just using
: it. Same for crappy broken pread() and pwrite() wrappers.
Why is it no good. What is the race here? Is it just the oldmask
thing and multiple callers to select, or is it something else? And if
it is the oldmask thing, why wouldn't multiple callers of pselect mess
it up depending on what order they have. I must have missed the
original description of the race and why it matters..
Warner
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 16:10 ` M. Warner Losh
@ 2008-11-05 18:21 ` Jan Kiszka
2008-11-05 18:41 ` Daniel P. Berrange
2008-11-05 20:16 ` andrzej zaborowski
2008-11-06 0:53 ` Jamie Lokier
1 sibling, 2 replies; 28+ messages in thread
From: Jan Kiszka @ 2008-11-05 18:21 UTC (permalink / raw)
To: M. Warner Losh; +Cc: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1915 bytes --]
M. Warner Losh wrote:
> In message: <20081105150042.GJ13630@shareable.org>
> Jamie Lokier <jamie@shareable.org> writes:
> : M. Warner Losh wrote:
> : > : In other words, don't use pselect() if you might run on a kernel older
> : > : than 2.6.16, or on a host architecture which adds pselect() in a later
> : > : kernel version. Also, I wouldn't be surprised if older versions of
> : > : some BSDs have similar dodgy wrappers.
> : >
> : > Which ones have a good kernel implementation of it? FreeBSD's is
> : > currently approximately:
> : >
> : > if (!mask)
> : > _sigprocmask(mask, &oldmask);
> : > /* here */
> : > select();
> : > if (!mask)
> : > _sigprocmask(oldmask, NULL);
> : >
> : > I'm assuming that the problem is due to a signal arriving at /* here */.
> :
> : If that's _kernel_ code and the kernel behaves like Linux, it's not a
> : problem because signals don't affect the control flow until returning
> : to userspace, meaning the select() will return EINTR.
>
> It is currently user level code, and I'm looking at moving it into the
> kernel, but I need to understand the race being talked about here.
From the Linux man page on [p]select:
"The reason that pselect() is needed is that if one wants to wait for
either a signal or for a file descriptor to become ready, then an atomic
test is needed to prevent race conditions. (Suppose the signal handler
sets a global flag and returns. Then a test of this global flag followed
by a call of select() could hang indefinitely if the signal arrived just
after the test but just before the call. By contrast, pselect() allows
one to first block signals, handle the signals that have come in, then
call pselect() with the desired sigmask, avoiding the race.)"
So the unmasking and possible blocking on select must be done
atomically. And that is only feasible in kernel land.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 18:21 ` Jan Kiszka
@ 2008-11-05 18:41 ` Daniel P. Berrange
2008-11-05 20:16 ` andrzej zaborowski
1 sibling, 0 replies; 28+ messages in thread
From: Daniel P. Berrange @ 2008-11-05 18:41 UTC (permalink / raw)
To: qemu-devel
On Wed, Nov 05, 2008 at 07:21:40PM +0100, Jan Kiszka wrote:
> M. Warner Losh wrote:
> > In message: <20081105150042.GJ13630@shareable.org>
> > Jamie Lokier <jamie@shareable.org> writes:
> > : M. Warner Losh wrote:
> > : > : In other words, don't use pselect() if you might run on a kernel older
> > : > : than 2.6.16, or on a host architecture which adds pselect() in a later
> > : > : kernel version. Also, I wouldn't be surprised if older versions of
> > : > : some BSDs have similar dodgy wrappers.
> > : >
> > : > Which ones have a good kernel implementation of it? FreeBSD's is
> > : > currently approximately:
> > : >
> > : > if (!mask)
> > : > _sigprocmask(mask, &oldmask);
> > : > /* here */
> > : > select();
> > : > if (!mask)
> > : > _sigprocmask(oldmask, NULL);
> > : >
> > : > I'm assuming that the problem is due to a signal arriving at /* here */.
> > :
> > : If that's _kernel_ code and the kernel behaves like Linux, it's not a
> > : problem because signals don't affect the control flow until returning
> > : to userspace, meaning the select() will return EINTR.
> >
> > It is currently user level code, and I'm looking at moving it into the
> > kernel, but I need to understand the race being talked about here.
>
> From the Linux man page on [p]select:
>
> "The reason that pselect() is needed is that if one wants to wait for
> either a signal or for a file descriptor to become ready, then an atomic
> test is needed to prevent race conditions. (Suppose the signal handler
> sets a global flag and returns. Then a test of this global flag followed
> by a call of select() could hang indefinitely if the signal arrived just
> after the test but just before the call. By contrast, pselect() allows
> one to first block signals, handle the signals that have come in, then
> call pselect() with the desired sigmask, avoiding the race.)"
There's another good expanded description on LWN too:
http://lwn.net/Articles/176911/
Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 18:21 ` Jan Kiszka
2008-11-05 18:41 ` Daniel P. Berrange
@ 2008-11-05 20:16 ` andrzej zaborowski
2008-11-05 20:28 ` Daniel P. Berrange
1 sibling, 1 reply; 28+ messages in thread
From: andrzej zaborowski @ 2008-11-05 20:16 UTC (permalink / raw)
To: qemu-devel
2008/11/5 Jan Kiszka <jan.kiszka@web.de>:
> M. Warner Losh wrote:
>> In message: <20081105150042.GJ13630@shareable.org>
>> Jamie Lokier <jamie@shareable.org> writes:
>> : M. Warner Losh wrote:
>> : > : In other words, don't use pselect() if you might run on a kernel older
>> : > : than 2.6.16, or on a host architecture which adds pselect() in a later
>> : > : kernel version. Also, I wouldn't be surprised if older versions of
>> : > : some BSDs have similar dodgy wrappers.
>> : >
>> : > Which ones have a good kernel implementation of it? FreeBSD's is
>> : > currently approximately:
>> : >
>> : > if (!mask)
>> : > _sigprocmask(mask, &oldmask);
>> : > /* here */
>> : > select();
>> : > if (!mask)
>> : > _sigprocmask(oldmask, NULL);
>> : >
>> : > I'm assuming that the problem is due to a signal arriving at /* here */.
>> :
>> : If that's _kernel_ code and the kernel behaves like Linux, it's not a
>> : problem because signals don't affect the control flow until returning
>> : to userspace, meaning the select() will return EINTR.
>>
>> It is currently user level code, and I'm looking at moving it into the
>> kernel, but I need to understand the race being talked about here.
>
> From the Linux man page on [p]select:
>
> "The reason that pselect() is needed is that if one wants to wait for
> either a signal or for a file descriptor to become ready, then an atomic
> test is needed to prevent race conditions. (Suppose the signal handler
> sets a global flag and returns. Then a test of this global flag followed
> by a call of select() could hang indefinitely if the signal arrived just
> after the test but just before the call. By contrast, pselect() allows
> one to first block signals, handle the signals that have come in, then
> call pselect() with the desired sigmask, avoiding the race.)"
>
> So the unmasking and possible blocking on select must be done
> atomically. And that is only feasible in kernel land.
To be exact, it *was* possible for glibc to implement a pselect free of races:
that is by using the same trick as your patch, i.e. making a pipe and
adding it to select()ed fd's and mangling the sigmask.
Cheers
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 20:16 ` andrzej zaborowski
@ 2008-11-05 20:28 ` Daniel P. Berrange
2008-11-05 23:38 ` Jamie Lokier
0 siblings, 1 reply; 28+ messages in thread
From: Daniel P. Berrange @ 2008-11-05 20:28 UTC (permalink / raw)
To: qemu-devel
On Wed, Nov 05, 2008 at 09:16:59PM +0100, andrzej zaborowski wrote:
> 2008/11/5 Jan Kiszka <jan.kiszka@web.de>:
> > M. Warner Losh wrote:
> >> In message: <20081105150042.GJ13630@shareable.org>
> >> Jamie Lokier <jamie@shareable.org> writes:
> >> : M. Warner Losh wrote:
> >> : > : In other words, don't use pselect() if you might run on a kernel older
> >> : > : than 2.6.16, or on a host architecture which adds pselect() in a later
> >> : > : kernel version. Also, I wouldn't be surprised if older versions of
> >> : > : some BSDs have similar dodgy wrappers.
> >> : >
> >> : > Which ones have a good kernel implementation of it? FreeBSD's is
> >> : > currently approximately:
> >> : >
> >> : > if (!mask)
> >> : > _sigprocmask(mask, &oldmask);
> >> : > /* here */
> >> : > select();
> >> : > if (!mask)
> >> : > _sigprocmask(oldmask, NULL);
> >> : >
> >> : > I'm assuming that the problem is due to a signal arriving at /* here */.
> >> :
> >> : If that's _kernel_ code and the kernel behaves like Linux, it's not a
> >> : problem because signals don't affect the control flow until returning
> >> : to userspace, meaning the select() will return EINTR.
> >>
> >> It is currently user level code, and I'm looking at moving it into the
> >> kernel, but I need to understand the race being talked about here.
> >
> > From the Linux man page on [p]select:
> >
> > "The reason that pselect() is needed is that if one wants to wait for
> > either a signal or for a file descriptor to become ready, then an atomic
> > test is needed to prevent race conditions. (Suppose the signal handler
> > sets a global flag and returns. Then a test of this global flag followed
> > by a call of select() could hang indefinitely if the signal arrived just
> > after the test but just before the call. By contrast, pselect() allows
> > one to first block signals, handle the signals that have come in, then
> > call pselect() with the desired sigmask, avoiding the race.)"
> >
> > So the unmasking and possible blocking on select must be done
> > atomically. And that is only feasible in kernel land.
>
> To be exact, it *was* possible for glibc to implement a pselect free of races:
> that is by using the same trick as your patch, i.e. making a pipe and
> adding it to select()ed fd's and mangling the sigmask.
Yes & no. The trouble with glibc using pipes behind your back is that
then it creates a totally different race in threaded apps, where a FD
could be leaked to a child process between glibc opening its secret
pipe and setting the O_CLOSEXEC flag. Indeed it already suffers from
this problem with name resolving
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 20:28 ` Daniel P. Berrange
@ 2008-11-05 23:38 ` Jamie Lokier
0 siblings, 0 replies; 28+ messages in thread
From: Jamie Lokier @ 2008-11-05 23:38 UTC (permalink / raw)
To: Daniel P. Berrange, qemu-devel
Daniel P. Berrange wrote:
> > To be exact, it *was* possible for glibc to implement a pselect
> > free of races: that is by using the same trick as your patch,
> > i.e. making a pipe and adding it to select()ed fd's and mangling
> > the sigmask.
>
> Yes & no. The trouble with glibc using pipes behind your back is that
> then it creates a totally different race in threaded apps, where a FD
> could be leaked to a child process between glibc opening its secret
> pipe and setting the O_CLOSEXEC flag. Indeed it already suffers from
> this problem with name resolving
That involves wrapping every signal handler too, and because of
threads, handlers would need to be wrapped all the time by a
sigaction() replacement - quite a lot of cruft in libc would be
required!
If you're writing libc, and willing to go to such crazy lengths to
implement pselect() "properly", you can fix the close-on-exec race by
wrapping execve() too.
Or you can avoid close-on-exec by using siglongjmp() from the wrapped
signal handlers to jump out of select(), not requiring an fd at all.
Unsurprisingly no libc that I know goes to these lengths.
-- Jamie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-05 16:10 ` M. Warner Losh
2008-11-05 18:21 ` Jan Kiszka
@ 2008-11-06 0:53 ` Jamie Lokier
2008-11-06 5:19 ` M. Warner Losh
1 sibling, 1 reply; 28+ messages in thread
From: Jamie Lokier @ 2008-11-06 0:53 UTC (permalink / raw)
To: M. Warner Losh; +Cc: jan.kiszka, qemu-devel
M. Warner Losh wrote:
> : > Which ones have a good kernel implementation of it? FreeBSD's is
> : > currently approximately:
> : >
> : > if (!mask)
> : > _sigprocmask(mask, &oldmask);
> : > /* here */
> : > select();
> : > if (!mask)
> : > _sigprocmask(oldmask, NULL);
> : >
> : > I'm assuming that the problem is due to a signal arriving at /* here */.
> :
> : If that's _kernel_ code and the kernel behaves like Linux, it's not a
> : problem because signals don't affect the control flow until returning
> : to userspace, meaning the select() will return EINTR.
>
> It is currently user level code, and I'm looking at moving it into the
> kernel, but I need to understand the race being talked about here.
Ugh, I had imagined FreeBSD would have got that right, since it's
quite good in other areas. I've added FreeBSD to my blacklist of
broken pselect() implementations, thanks for the info.
Do you know if FreeBSD's pread() and pwrite() are also thread-unsafe
userspace wrappers using lseek+read/write? They are harder to avoid
when you're looking at high performance code.
> Why is it no good. What is the race here? Is it just the oldmask
> thing and multiple callers to select, or is it something else?
It's racy with a single caller. The race is: program's signal handler
sets a flag like "alarm_happened = 1". The program's main loop checks
the flag before calling select(). If the signal is delivered before
that check, the program doesn't call select() and handles the reason
for the flag. If the signal is delivered during select(), that
returns EINTR and the program handles the reason for the flag. But if
the signal is delivered _between_ checking the flag and calling
select(), the program gets stuck.
pselect() avoids that stuck state, by blocking the signal before the
program checks the flag, and guaranteeing if the signal is delivered
after that point, pselect() returns EINTR. It's sort of analogous to
pthread_cond_wait() needing a mutex.
> And if it is the oldmask thing, why wouldn't multiple callers of
> pselect mess it up depending on what order they have.
Signal masks are per-thread anyway, multiple callers isn't an issue.
-- Jamie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Qemu-devel] Re: [5578] Increase default IO timeout from 10ms to 5s
2008-11-06 0:53 ` Jamie Lokier
@ 2008-11-06 5:19 ` M. Warner Losh
0 siblings, 0 replies; 28+ messages in thread
From: M. Warner Losh @ 2008-11-06 5:19 UTC (permalink / raw)
To: jamie; +Cc: jan.kiszka, qemu-devel
In message: <20081106005312.GA26173@shareable.org>
Jamie Lokier <jamie@shareable.org> writes:
: M. Warner Losh wrote:
: > : > Which ones have a good kernel implementation of it? FreeBSD's is
: > : > currently approximately:
: > : >
: > : > if (!mask)
: > : > _sigprocmask(mask, &oldmask);
: > : > /* here */
: > : > select();
: > : > if (!mask)
: > : > _sigprocmask(oldmask, NULL);
: > : >
: > : > I'm assuming that the problem is due to a signal arriving at /* here */.
: > :
: > : If that's _kernel_ code and the kernel behaves like Linux, it's not a
: > : problem because signals don't affect the control flow until returning
: > : to userspace, meaning the select() will return EINTR.
: >
: > It is currently user level code, and I'm looking at moving it into the
: > kernel, but I need to understand the race being talked about here.
:
: Ugh, I had imagined FreeBSD would have got that right, since it's
: quite good in other areas. I've added FreeBSD to my blacklist of
: broken pselect() implementations, thanks for the info.
:
: Do you know if FreeBSD's pread() and pwrite() are also thread-unsafe
: userspace wrappers using lseek+read/write? They are harder to avoid
: when you're looking at high performance code.
I haven't looked...
: > Why is it no good. What is the race here? Is it just the oldmask
: > thing and multiple callers to select, or is it something else?
:
: It's racy with a single caller. The race is: program's signal handler
: sets a flag like "alarm_happened = 1". The program's main loop checks
: the flag before calling select(). If the signal is delivered before
: that check, the program doesn't call select() and handles the reason
: for the flag. If the signal is delivered during select(), that
: returns EINTR and the program handles the reason for the flag. But if
: the signal is delivered _between_ checking the flag and calling
: select(), the program gets stuck.
:
: pselect() avoids that stuck state, by blocking the signal before the
: program checks the flag, and guaranteeing if the signal is delivered
: after that point, pselect() returns EINTR. It's sort of analogous to
: pthread_cond_wait() needing a mutex.
OK. that makes sense.
: > And if it is the oldmask thing, why wouldn't multiple callers of
: > pselect mess it up depending on what order they have.
:
: Signal masks are per-thread anyway, multiple callers isn't an issue.
OK. I have never had good things happen with signals and threads...
Warner
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2008-11-06 5:20 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-31 18:40 [Qemu-devel] [5578] Increase default IO timeout from 10ms to 5s Anthony Liguori
2008-10-31 18:52 ` Anthony Liguori
2008-11-02 19:08 ` [Qemu-devel] " Jan Kiszka
2008-11-03 20:04 ` Anthony Liguori
2008-11-03 20:36 ` Jan Kiszka
2008-11-03 21:50 ` Jan Kiszka
2008-11-03 22:00 ` Anthony Liguori
2008-11-03 22:03 ` Jan Kiszka
2008-11-04 8:07 ` andrzej zaborowski
2008-11-04 8:22 ` Jan Kiszka
2008-11-04 8:33 ` andrzej zaborowski
2008-11-04 11:32 ` Jamie Lokier
2008-11-04 16:22 ` M. Warner Losh
2008-11-04 17:10 ` Jan Kiszka
2008-11-04 17:55 ` M. Warner Losh
2008-11-04 19:08 ` Jan Kiszka
2008-11-05 15:00 ` Jamie Lokier
2008-11-05 16:10 ` M. Warner Losh
2008-11-05 18:21 ` Jan Kiszka
2008-11-05 18:41 ` Daniel P. Berrange
2008-11-05 20:16 ` andrzej zaborowski
2008-11-05 20:28 ` Daniel P. Berrange
2008-11-05 23:38 ` Jamie Lokier
2008-11-06 0:53 ` Jamie Lokier
2008-11-06 5:19 ` M. Warner Losh
2008-11-04 8:29 ` Avi Kivity
2008-10-31 19:41 ` [Qemu-devel] " Jamie Lokier
2008-10-31 20:13 ` Anthony Liguori
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).