* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
[not found] ` <1466581044.3188.34.camel@gmail.com>
@ 2016-06-22 8:44 ` Thomas Gleixner
2016-06-22 9:06 ` Mike Galbraith
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Thomas Gleixner @ 2016-06-22 8:44 UTC (permalink / raw)
To: ltp
B1;2802;0cOn Wed, 22 Jun 2016, Mike Galbraith wrote:
> FWIW, testing with ltp, I noticed a new failure in logs. It turns out
> to be intermittent, but the testcase mostly fails.
You forgot to cc the LTP folks ...
> rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> Test FAILED: sigtimedwait() did not return in the required time
> time_elapsed: 1.197057
> ...come on, you can do it...
> rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> Test PASSED
>
> #define ERRORMARGIN 0.1
> ...
> if ((time_elapsed > SIGTIMEDWAITSEC + ERRORMARGIN)
> || (time_elapsed < SIGTIMEDWAITSEC - ERRORMARGIN)) {
> printf("Test FAILED: sigtimedwait() did not return in "
> "the required time\n");
> printf("time_elapsed: %lf\n", time_elapsed);
> return PTS_FAIL;
> }
>
> Looks hohum to me, but gripe did arrive with patch set, so you get a note.
hohum is a euphemism. That's completely bogus.
The only guarantee a syscall with timers has is: timer does not fire early.
Thanks,
tglx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-22 8:44 ` [LTP] [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
@ 2016-06-22 9:06 ` Mike Galbraith
2016-06-22 13:37 ` Mike Galbraith
2016-06-22 10:28 ` Cyril Hrubis
[not found] ` <20160626190025.GC11162@amd>
2 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2016-06-22 9:06 UTC (permalink / raw)
To: ltp
On Wed, 2016-06-22 at 10:44 +0200, Thomas Gleixner wrote:
> B1;2802;0cOn Wed, 22 Jun 2016, Mike Galbraith wrote:
> > FWIW, testing with ltp, I noticed a new failure in logs. It turns
> out
> > to be intermittent, but the testcase mostly fails.
>
> You forgot to cc the LTP folks ...
This ain't the only one, it's just new. I'll mention it.
File under FYI/FWIW: I also plugged the set into RT, and nothing fell
out of local boxen. The below is falling out of my 8 socket box
though.. maybe a portage booboo.
[ 1503.988863] clocksource: timekeeping watchdog on CPU42: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 1504.203800] clocksource: 'hpet' wd_now: 38b55bb wd_last: 8303f269 mask: ffffffff
[ 1504.296111] clocksource: 'tsc' cs_now: 3a3aa717794 cs_last: 354624eea7b mask: ffffffffffffffff
[ 1504.402329] clocksource: Switched to clocksource hpet
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-22 8:44 ` [LTP] [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-22 9:06 ` Mike Galbraith
@ 2016-06-22 10:28 ` Cyril Hrubis
2016-06-23 8:27 ` Thomas Gleixner
[not found] ` <20160626190025.GC11162@amd>
2 siblings, 1 reply; 11+ messages in thread
From: Cyril Hrubis @ 2016-06-22 10:28 UTC (permalink / raw)
To: ltp
Hi!
> > rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> > Test FAILED: sigtimedwait() did not return in the required time
> > time_elapsed: 1.197057
> > ...come on, you can do it...
> > rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> > Test PASSED
> >
> > #define ERRORMARGIN 0.1
> > ...
> > if ((time_elapsed > SIGTIMEDWAITSEC + ERRORMARGIN)
> > || (time_elapsed < SIGTIMEDWAITSEC - ERRORMARGIN)) {
> > printf("Test FAILED: sigtimedwait() did not return in "
> > "the required time\n");
> > printf("time_elapsed: %lf\n", time_elapsed);
> > return PTS_FAIL;
> > }
> >
> > Looks hohum to me, but gripe did arrive with patch set, so you get a note.
>
> hohum is a euphemism. That's completely bogus.
>
> The only guarantee a syscall with timers has is: timer does not fire early.
While this is true, checking with reasonable error margin works just
fine 99% of the time. You cannot really test that timer expires, without
setting arbitrary margin.
Looking into POSIX sigtimedwait() timer should run on CLOCK_MONOTONIC so
we can call clock_getres(CLOCK_MONOTOINC, ...) double or tripple the
value and use it for error margin. And also fix the test to use
the CLOCK_MONOTONIC timer.
And of course the error margin must not be used when we check that the
elapsed time wasn't shorter than we expected.
Does that sound reasonable?
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-22 9:06 ` Mike Galbraith
@ 2016-06-22 13:37 ` Mike Galbraith
0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2016-06-22 13:37 UTC (permalink / raw)
To: ltp
On Wed, 2016-06-22 at 11:06 +0200, Mike Galbraith wrote:
> On Wed, 2016-06-22 at 10:44 +0200, Thomas Gleixner wrote:
> > B1;2802;0cOn Wed, 22 Jun 2016, Mike Galbraith wrote:
> > > FWIW, testing with ltp, I noticed a new failure in logs. It turns
> > out
> > > to be intermittent, but the testcase mostly fails.
> >
> > You forgot to cc the LTP folks ...
>
> This ain't the only one, it's just new. I'll mention it.
>
> File under FYI/FWIW: I also plugged the set into RT, and nothing fell
> out of local boxen. The below is falling out of my 8 socket box
> though.. maybe a portage booboo.
>
> [ 1503.988863] clocksource: timekeeping watchdog on CPU42: Marking clocksource 'tsc' as unstable because the skew is too large:
> [ 1504.203800] clocksource: 'hpet' wd_now: 38b55bb wd_last: 8303f269 mask: ffffffff
> [ 1504.296111] clocksource: 'tsc' cs_now: 3a3aa717794 cs_last: 354624eea7b mask: ffffffffffffffff
> [ 1504.402329] clocksource: Switched to clocksource hpet
Nope, not RT portage booboo. Virgin x86-tip/WIP.timers..
vogelweide:~/:[130]# dmesg|grep clocksource:
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[ 5.608205] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 9.151208] clocksource: Switched to clocksource hpet
[ 9.485907] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 11.947226] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x20974986637, max_idle_ns: 440795286310 ns
[ 13.012145] clocksource: Switched to clocksource tsc
[ 434.868215] clocksource: timekeeping watchdog on CPU59: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 434.982251] clocksource: 'hpet' wd_now: 732cdf37 wd_last: df3d99d8 mask: ffffffff
[ 435.085875] clocksource: 'tsc' cs_now: 16e6780d1eb cs_last: 11326fa576e mask: ffffffffffffffff
[ 435.211249] clocksource: Switched to clocksource hpet
vogelweide:~/:[0]#
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-22 10:28 ` Cyril Hrubis
@ 2016-06-23 8:27 ` Thomas Gleixner
2016-06-23 11:47 ` Cyril Hrubis
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2016-06-23 8:27 UTC (permalink / raw)
To: ltp
On Wed, 22 Jun 2016, Cyril Hrubis wrote:
> Hi!
> > > rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> > > Test FAILED: sigtimedwait() did not return in the required time
> > > time_elapsed: 1.197057
> > > ...come on, you can do it...
> > > rtbox:~ # /usr/local/ltp/conformance/interfaces/sigtimedwait/sigtimedwait_1-1.run-test
> > > Test PASSED
> > >
> > > #define ERRORMARGIN 0.1
> > > ...
> > > if ((time_elapsed > SIGTIMEDWAITSEC + ERRORMARGIN)
> > > || (time_elapsed < SIGTIMEDWAITSEC - ERRORMARGIN)) {
> > > printf("Test FAILED: sigtimedwait() did not return in "
> > > "the required time\n");
> > > printf("time_elapsed: %lf\n", time_elapsed);
> > > return PTS_FAIL;
> > > }
> > >
> > > Looks hohum to me, but gripe did arrive with patch set, so you get a note.
> >
> > hohum is a euphemism. That's completely bogus.
> >
> > The only guarantee a syscall with timers has is: timer does not fire early.
>
> While this is true, checking with reasonable error margin works just
> fine 99% of the time. You cannot really test that timer expires, without
> setting arbitrary margin.
Err. You know that the timer expired because sigtimedwait() returns
EAGAIN. And the only thing you can reliably check for is that the timer did
not expired to early. Anything else is guesswork and voodoo programming.
Thanks,
tglx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-23 8:27 ` Thomas Gleixner
@ 2016-06-23 11:47 ` Cyril Hrubis
[not found] ` <20160623135803.636.qmail@ns.sciencehorizons.net>
0 siblings, 1 reply; 11+ messages in thread
From: Cyril Hrubis @ 2016-06-23 11:47 UTC (permalink / raw)
To: ltp
Hi!
> > While this is true, checking with reasonable error margin works just
> > fine 99% of the time. You cannot really test that timer expires, without
> > setting arbitrary margin.
>
> Err. You know that the timer expired because sigtimedwait() returns
> EAGAIN. And the only thing you can reliably check for is that the timer did
> not expired to early. Anything else is guesswork and voodoo programming.
There is quite a lot of things that can happen on mutitasking OS and
there are even NMIs in hardware, etc. But seriously is there a reason
why OS that is not under heavy load cannot expire timers with reasonable
overruns? I.e. if I ask for a second of sleep and expect it to be woken
up not much more than half of a second later?
If we stick only to guarantees that are defined in POSIX playing music
with mplayer would not be possible since it sleeps in futex() and if it
wakes too late it will fail to fill buffers. In practice this worked
fine for me for years.
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
[not found] ` <20160623135803.636.qmail@ns.sciencehorizons.net>
@ 2016-06-23 14:10 ` Thomas Gleixner
2016-06-23 15:11 ` Cyril Hrubis
1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2016-06-23 14:10 UTC (permalink / raw)
To: ltp
On Thu, 23 Jun 2016, George Spelvin wrote:
> Cyril Hrubis wrote:
> > Thomas Gleixner wrote:
> >> Err. You know that the timer expired because sigtimedwait() returns
> >> EAGAIN. And the only thing you can reliably check for is that the timer did
> >> not expired to early. Anything else is guesswork and voodoo programming.
>
> > But seriously is there a reason
> > why OS that is not under heavy load cannot expire timers with reasonable
> > overruns? I.e. if I ask for a second of sleep and expect it to be woken
> > up not much more than half of a second later?
>
> > If we stick only to guarantees that are defined in POSIX playing music
> > with mplayer would not be possible since it sleeps in futex() and if it
> > wakes too late it will fail to fill buffers. In practice this worked
> > fine for me for years.
>
> Two points:
> 1) sigtimedwait() is unusual in that it uses the jiffies timer. Most
> system call timeouts (including specifically the one in FUTEX_WAIT)
> use the high-resolution timer subsystem, which is a whole different
> animal with tighter guarantees, and
As Peter said we want to convert sigtimedwait() to use hrtimers as well. We
converted almost all syscalls with timeouts (futex, poll, select ....) to
hrtimers years ago, but somehow we missed to do the same to sigtimedwait.
Thanks,
tglx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
[not found] ` <20160623135803.636.qmail@ns.sciencehorizons.net>
2016-06-23 14:10 ` Thomas Gleixner
@ 2016-06-23 15:11 ` Cyril Hrubis
2016-06-23 15:21 ` Thomas Gleixner
1 sibling, 1 reply; 11+ messages in thread
From: Cyril Hrubis @ 2016-06-23 15:11 UTC (permalink / raw)
To: ltp
Hi!
> Two points:
> 1) sigtimedwait() is unusual in that it uses the jiffies timer. Most
> system call timeouts (including specifically the one in FUTEX_WAIT)
> use the high-resolution timer subsystem, which is a whole different
> animal with tighter guarantees, and
That is likely POSIX conformance bug, since POSIX explicitly states that
sigtimedwait() shall use CLOCK_MONOTONIC to measure the timeout.
"If the Monotonic Clock option is supported, the CLOCK_MONOTONIC clock
shall be used to measure the time interval specified by the timeout
argument."
> 2) The worst-case error in tglx's proposal is 1/8 of the requested
> timeout: the wakeup is after 112.5% of the requested time, plus
> one tick. This is well within your requested accuracy. (For very
> short timeouts, the "plus one tick" can dominate the percentage error.)
Hmm, that still does not add up to the number in the original email
where it says time_elapsed: 1.197057. As far as I can tell the worst
case for a tick is CONFIG_HZ=100 so one tick is 0.01s and even after
that we get 118.7% since we requested 1s. But that may be caused by the
fact that the test uses gettimeofday() to measure the elapsed time, it
should use CLOCK_MONOTONIC instead.
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-23 15:11 ` Cyril Hrubis
@ 2016-06-23 15:21 ` Thomas Gleixner
2016-06-23 16:31 ` Cyril Hrubis
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2016-06-23 15:21 UTC (permalink / raw)
To: ltp
On Thu, 23 Jun 2016, Cyril Hrubis wrote:
> > 1) sigtimedwait() is unusual in that it uses the jiffies timer. Most
> > system call timeouts (including specifically the one in FUTEX_WAIT)
> > use the high-resolution timer subsystem, which is a whole different
> > animal with tighter guarantees, and
>
> That is likely POSIX conformance bug, since POSIX explicitly states that
> sigtimedwait() shall use CLOCK_MONOTONIC to measure the timeout.
>
> "If the Monotonic Clock option is supported, the CLOCK_MONOTONIC clock
> shall be used to measure the time interval specified by the timeout
> argument."
That's fine because jiffies is a less granular form of CLOCK_MONOTONIC.
Thanks,
tglx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
2016-06-23 15:21 ` Thomas Gleixner
@ 2016-06-23 16:31 ` Cyril Hrubis
0 siblings, 0 replies; 11+ messages in thread
From: Cyril Hrubis @ 2016-06-23 16:31 UTC (permalink / raw)
To: ltp
Hi!
> > That is likely POSIX conformance bug, since POSIX explicitly states that
> > sigtimedwait() shall use CLOCK_MONOTONIC to measure the timeout.
> >
> > "If the Monotonic Clock option is supported, the CLOCK_MONOTONIC clock
> > shall be used to measure the time interval specified by the timeout
> > argument."
>
> That's fine because jiffies is a less granular form of CLOCK_MONOTONIC.
Looking into POSIX Realtime Clock and Timers it seems to allow that time
service based on CLOCK_* clocks to have different resolution if it's
less or equal than 20ms and if this fact is documented. If we wanted to
be pedantic about this the man page shoud be patched...
Also this gives us reasonably safe upper bound on timer expiration to be
something as:
sleep_time * 1.125 + 20ms
Does this sounds reasonable now?
--
Cyril Hrubis
chrubis@suse.cz
^ permalink raw reply [flat|nested] 11+ messages in thread
* [LTP] [patch V2 00/20] timer: Refactor the timer wheel
[not found] ` <20160626190025.GC11162@amd>
@ 2016-06-26 19:21 ` Arjan van de Ven
0 siblings, 0 replies; 11+ messages in thread
From: Arjan van de Ven @ 2016-06-26 19:21 UTC (permalink / raw)
To: ltp
On Sun, Jun 26, 2016 at 12:00 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Umm. I'm not sure if you should be designing kernel...
>
> I have alarm clock application. It does sleep(60) many times till its
> time to wake me up. I'll be very angry if sleep(60) takes 65 seconds
> without some very, very good reason.
I'm fairly sure you shouldn't be designing alarm clock applications!
Because on busy systems you get random (scheduler) delays added to your timer.
Having said that, your example is completely crooked here, sleep()
does not use these kernel timers, it uses hrtimers instead.
(hrtimers also have slack, but an alarm clock application that is this
broken would have the choice to set such slack to 0)
What happened here is that these sigtimewait were actually not great,
it is just about the only application visible interface that's still
in jiffies/HZ,
and in the follow-on patch set, Thomas converted them properly to
hrtimers as well to make them both accurate and CONFIG_HZ independent.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-06-26 19:21 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20160617121134.417319325@linutronix.de>
[not found] ` <1466581044.3188.34.camel@gmail.com>
2016-06-22 8:44 ` [LTP] [patch V2 00/20] timer: Refactor the timer wheel Thomas Gleixner
2016-06-22 9:06 ` Mike Galbraith
2016-06-22 13:37 ` Mike Galbraith
2016-06-22 10:28 ` Cyril Hrubis
2016-06-23 8:27 ` Thomas Gleixner
2016-06-23 11:47 ` Cyril Hrubis
[not found] ` <20160623135803.636.qmail@ns.sciencehorizons.net>
2016-06-23 14:10 ` Thomas Gleixner
2016-06-23 15:11 ` Cyril Hrubis
2016-06-23 15:21 ` Thomas Gleixner
2016-06-23 16:31 ` Cyril Hrubis
[not found] ` <20160626190025.GC11162@amd>
2016-06-26 19:21 ` Arjan van de Ven
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox