Detecting shift of CLOCK_REALTIME with clock

linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
@ 2012-10-31 15:54 Scot Salmon
  2012-10-31 20:08 ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Scot Salmon @ 2012-10-31 15:54 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Thomas Gleixner, Alexander Shishkin

At RTLWS, I talked to tglx about patching clock_nanosleep to detect shifts 
in CLOCK_REALTIME.  He suggested I bring it up here for comments.

Alexander Shishkin proposed a patch for this a while back: 
http://thread.gmane.org/gmane.linux.kernel/1110759/focus=1131917

At that time Thomas rightly responded that hacking the POSIX 
clock_nanosleep API (both overloading a parameter and adding a new return 
value) was quite horrible.  The nanosleep use case in that thread seemed 
somewhat speculative, with references to "weird" cron jobs, and Alexander 
seemed satisfied when Thomas added this feature to timerfd.

I described a more concrete use case to Thomas that is not solved by 
timerfd.  We have multiple devices running control loops using 
clock_nanosleep and TIMER_ABSTIME to get good periodic wakeups.  The 
clocks need to be synchronized across the controllers so that the loops 
themselves can be in sync.  In order to use a synchronized clock we have 
to use CLOCK_REALTIME.  But if the control loop starts, and then the time 
sync protocol kicks in and shifts the clock, that breaks the control loop, 
the most obvious case being if time shifts backwards and a loop that 
should be running at 100us takes 100us + some arbitrary amount of time 
shift, potentially measured in minutes or even days.  timerfd has the 
behavior I need, but its performance is much worse than clock_nanosleep, 
we believe because the wakeup goes through ksoftirqd.

I applied Alexander's patches 2 and 3, which is the subset I care about 
(avoiding 1 and 4, since we already implemented 1 as a first pass at this 
issue, and Thomas already implemented 4 separately), and it does what I 
need.  But, Thomas's objections are still valid so I am not sure I like 
this solution.

Could reworking the way that timerfd handles wakeups be an option?  We'd 
like some guidance: if that's something you feel worth investigating, 
we're willing to do so.

-Scot

------
Scot Salmon (scot.salmon@ni.com)
National Instruments, Austin, TX
101010, 222, 52, ...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-10-31 15:54 Detecting shift of CLOCK_REALTIME with clock_nanosleep (again) Scot Salmon
@ 2012-10-31 20:08 ` Thomas Gleixner
  2012-11-15 19:28   ` Scot Salmon
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2012-10-31 20:08 UTC (permalink / raw)
  To: Scot Salmon
  Cc: linux-rt-users, Alexander Shishkin, Peter Zijlstra, John Stultz

On Wed, 31 Oct 2012, Scot Salmon wrote:
> I described a more concrete use case to Thomas that is not solved by 
> timerfd.  We have multiple devices running control loops using 
> clock_nanosleep and TIMER_ABSTIME to get good periodic wakeups.  The 
> clocks need to be synchronized across the controllers so that the loops 
> themselves can be in sync.  In order to use a synchronized clock we have 
> to use CLOCK_REALTIME.  But if the control loop starts, and then the time 
> sync protocol kicks in and shifts the clock, that breaks the control loop, 
> the most obvious case being if time shifts backwards and a loop that 
> should be running at 100us takes 100us + some arbitrary amount of time 
> shift, potentially measured in minutes or even days.  timerfd has the 
> behavior I need, but its performance is much worse than clock_nanosleep, 
> we believe because the wakeup goes through ksoftirqd.

With less conference induced brain damage I think your problem needs
to be solved differently.

What you are concerned about is keeping the machines in sync on a
common global timeline. Though your more fundamental requirement is
that you get the wakeup on each machine in the given cycle time. The
global synchronization mechanism just adjusts that local periodic
schedule.

So when you start up a control process on a node you align the cycle
time of this node to the global CLOCK_REALTIME timeline. That's why
you decided to use CLOCK_REALTIME in the first place, but then as you
observed correctly this sucks due to the nature of CLOCK_REALTIME
which can be affected by leap seconds, daylight saving changes and
other interesting events.

So ideally you should use CLOCK_MONOTONIC for scheduling your periodic
timeline, but you can't as you do not have a proper correlation
between CLOCK_REALTIME, which provides your global synchronization,
and the machine local CLOCK_MONOTONIC.

What you really want is an atomic readout facility for CLOCK_MONOTONIC
and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
timer with the global CLOCK_REALTIME based time line and in the event
that the CLOCK_REALTIME clock was set and jumped forward/backward you
have full software control over the aligning mechanism including the
ability to do sanity checking.

Lets look at an example.

T1   1000
     1050	<--- Time correction resets global time to 1000
T2   1100

Now you have the problem when your wakeup is actually happening. 50 us
delta is not a huge amount of time to propagate this to all CPUs and
all involved distributed systems. So what happens if system 1 sees
that update right away, but system 2 sees it just at the real timer
wakeup point? Then suddenly your loops are off by 50us for at least
one cycle. Not what you want, right?

So in the CLOCK_MONOTONIC case you still maintain the accuracy of your
periodic 100us event. The accuracy of CLOCK_MONOTONIC across (NTP/PTP)
time synced systems is way better than any mechanism which relies on
"timely" notification of CLOCK_REALTIME changes.

The minimal clock skew adjustments which affect the global
CLOCK_REALTIME are propagated to CLOCK_MONOTONIC as well, so you don't
have to worry about that at all. All what you need to be concerned
about is the time jump issue. But then again CLOCK_MONOTONIC will not
follow those time jumps and therefor maintain your XXXus periods for
quite some time with accurate synchronous behaviour.

With an atomic readout of CLOCK_MONOTONIC and CLOCK_REALTIME you can
be clever and safe about adjusting to a 50us or whatever large scale
global time line change. You can actually verify in your cluster
whether this was a legitimate change or just the random typo of the
sysadmin and you can agree on how to deal with the time jump in a
coordinated way, i.e. jumping forward sychronously on a given time
stamp or gradually adjusting it in microsecond steps.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-10-31 20:08 ` Thomas Gleixner
@ 2012-11-15 19:28   ` Scot Salmon
  2012-11-15 20:53     ` John Stultz
  0 siblings, 1 reply; 15+ messages in thread
From: Scot Salmon @ 2012-11-15 19:28 UTC (permalink / raw)
  To: linux-rt-users; +Cc: John Stultz, Thomas Gleixner

Thomas Gleixner <tglx@linutronix.de> wrote on 10/31/2012 03:08:45 PM:
> What you are concerned about is keeping the machines in sync on a
> common global timeline. Though your more fundamental requirement is
> that you get the wakeup on each machine in the given cycle time. The
> global synchronization mechanism just adjusts that local periodic
> schedule.
[...]
> What you really want is an atomic readout facility for CLOCK_MONOTONIC
> and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
> timer with the global CLOCK_REALTIME based time line and in the event
> that the CLOCK_REALTIME clock was set and jumped forward/backward you
> have full software control over the aligning mechanism including the
> ability to do sanity checking.

This works for me.  We discussed it on IRC and agreed on "something
like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
that allows us other combinations than MONO/REAL as well".

-Scot


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-11-15 19:28   ` Scot Salmon
@ 2012-11-15 20:53     ` John Stultz
  2012-11-15 21:01       ` Thomas Gleixner
  2012-12-19 20:43       ` Scot Salmon
  0 siblings, 2 replies; 15+ messages in thread
From: John Stultz @ 2012-11-15 20:53 UTC (permalink / raw)
  To: Scot Salmon; +Cc: linux-rt-users, Thomas Gleixner

On 11/15/2012 11:28 AM, Scot Salmon wrote:
> Thomas Gleixner <tglx@linutronix.de> wrote on 10/31/2012 03:08:45 PM:
>> What you are concerned about is keeping the machines in sync on a
>> common global timeline. Though your more fundamental requirement is
>> that you get the wakeup on each machine in the given cycle time. The
>> global synchronization mechanism just adjusts that local periodic
>> schedule.
> [...]
>> What you really want is an atomic readout facility for CLOCK_MONOTONIC
>> and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
>> timer with the global CLOCK_REALTIME based time line and in the event
>> that the CLOCK_REALTIME clock was set and jumped forward/backward you
>> have full software control over the aligning mechanism including the
>> ability to do sanity checking.
> This works for me.  We discussed it on IRC and agreed on "something
> like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
> that allows us other combinations than MONO/REAL as well".
Yea, there's been a few times that folks have brought up the need, and 
this sort of interface is probably the best way to go.

I suspect clock_gettime3 isn't far behind though :)

thanks
-john


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-11-15 20:53     ` John Stultz
@ 2012-11-15 21:01       ` Thomas Gleixner
  2012-11-15 22:25         ` John Stultz
  2012-12-19 20:43       ` Scot Salmon
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2012-11-15 21:01 UTC (permalink / raw)
  To: John Stultz; +Cc: Scot Salmon, linux-rt-users

On Thu, 15 Nov 2012, John Stultz wrote:
> On 11/15/2012 11:28 AM, Scot Salmon wrote:
> > Thomas Gleixner <tglx@linutronix.de> wrote on 10/31/2012 03:08:45 PM:
> > > What you are concerned about is keeping the machines in sync on a
> > > common global timeline. Though your more fundamental requirement is
> > > that you get the wakeup on each machine in the given cycle time. The
> > > global synchronization mechanism just adjusts that local periodic
> > > schedule.
> > [...]
> > > What you really want is an atomic readout facility for CLOCK_MONOTONIC
> > > and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
> > > timer with the global CLOCK_REALTIME based time line and in the event
> > > that the CLOCK_REALTIME clock was set and jumped forward/backward you
> > > have full software control over the aligning mechanism including the
> > > ability to do sanity checking.
> > This works for me.  We discussed it on IRC and agreed on "something
> > like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
> > that allows us other combinations than MONO/REAL as well".
> Yea, there's been a few times that folks have brought up the need, and this
> sort of interface is probably the best way to go.
> 
> I suspect clock_gettime3 isn't far behind though :)

I guess you are talking about MONO/MONORAW/REAL. So if you already
know about such requirements we should go for that, but we should try
to restrict it to combinations which can be easily supported in the
VDSO as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-11-15 21:01       ` Thomas Gleixner
@ 2012-11-15 22:25         ` John Stultz
  2012-11-19 18:27           ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2012-11-15 22:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Scot Salmon, linux-rt-users

On 11/15/2012 01:01 PM, Thomas Gleixner wrote:
> On Thu, 15 Nov 2012, John Stultz wrote:
>> On 11/15/2012 11:28 AM, Scot Salmon wrote:
>>> Thomas Gleixner <tglx@linutronix.de> wrote on 10/31/2012 03:08:45 PM:
>>>> What you are concerned about is keeping the machines in sync on a
>>>> common global timeline. Though your more fundamental requirement is
>>>> that you get the wakeup on each machine in the given cycle time. The
>>>> global synchronization mechanism just adjusts that local periodic
>>>> schedule.
>>> [...]
>>>> What you really want is an atomic readout facility for CLOCK_MONOTONIC
>>>> and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
>>>> timer with the global CLOCK_REALTIME based time line and in the event
>>>> that the CLOCK_REALTIME clock was set and jumped forward/backward you
>>>> have full software control over the aligning mechanism including the
>>>> ability to do sanity checking.
>>> This works for me.  We discussed it on IRC and agreed on "something
>>> like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
>>> that allows us other combinations than MONO/REAL as well".
>> Yea, there's been a few times that folks have brought up the need, and this
>> sort of interface is probably the best way to go.
>>
>> I suspect clock_gettime3 isn't far behind though :)
> I guess you are talking about MONO/MONORAW/REAL.
Or MONO/BOOT/REAL  or eventually MONO/TAI/REAL, etc..

But there's probably someone out there who will want all kernel state 
exported atomically, and its just not reasonable.

The real problem is that figuring out the relationship between clocks 
has required the three read interpoloation, which isn't something easily 
or quickly done with good accuracy. If we provide clock_gettime2(), that 
will allow the relationship between clockid pairs to be accurately 
determined. And if folks need to calculate the relationship between 
three clockids, they can do a similar three read and bound the output.
ie:
do {
     clock_gettime2(MONO, REAL, mon1,real1);
     clock_gettime2(MONO,BOOT, mon2,boot2);
     clock_gettime2(MONO,REAL, mon3, real3);
} while( real1-mon1 != real3-mon3);

So it might be over-designing to provide a clock_gettime3 from the start 
(since then folks will want clock_gettime5.. :)

>   So if you already
> know about such requirements we should go for that, but we should try
> to restrict it to combinations which can be easily supported in the
> VDSO as well.
Other then the cputime clockids, I'm not sure I see what combinations 
wouldn't work with the vdso.
As long as the clocksource is accessible, everything else is exportable.


thanks
-john


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-11-15 22:25         ` John Stultz
@ 2012-11-19 18:27           ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2012-11-19 18:27 UTC (permalink / raw)
  To: John Stultz; +Cc: Scot Salmon, linux-rt-users

On Thu, 15 Nov 2012, John Stultz wrote:
> On 11/15/2012 01:01 PM, Thomas Gleixner wrote:
> > On Thu, 15 Nov 2012, John Stultz wrote:
> > > On 11/15/2012 11:28 AM, Scot Salmon wrote:
> > > > Thomas Gleixner <tglx@linutronix.de> wrote on 10/31/2012 03:08:45 PM:
> > > > > What you are concerned about is keeping the machines in sync on a
> > > > > common global timeline. Though your more fundamental requirement is
> > > > > that you get the wakeup on each machine in the given cycle time. The
> > > > > global synchronization mechanism just adjusts that local periodic
> > > > > schedule.
> > > > [...]
> > > > > What you really want is an atomic readout facility for CLOCK_MONOTONIC
> > > > > and CLOCK_REALTIME. That allows you to align the CLOCK_MONOTONIC based
> > > > > timer with the global CLOCK_REALTIME based time line and in the event
> > > > > that the CLOCK_REALTIME clock was set and jumped forward/backward you
> > > > > have full software control over the aligning mechanism including the
> > > > > ability to do sanity checking.
> > > > This works for me.  We discussed it on IRC and agreed on "something
> > > > like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
> > > > that allows us other combinations than MONO/REAL as well".
> > > Yea, there's been a few times that folks have brought up the need, and
> > > this
> > > sort of interface is probably the best way to go.
> > > 
> > > I suspect clock_gettime3 isn't far behind though :)
> > I guess you are talking about MONO/MONORAW/REAL.
> Or MONO/BOOT/REAL  or eventually MONO/TAI/REAL, etc..
> 
> But there's probably someone out there who will want all kernel state exported
> atomically, and its just not reasonable.
> 
> The real problem is that figuring out the relationship between clocks has
> required the three read interpoloation, which isn't something easily or
> quickly done with good accuracy. If we provide clock_gettime2(), that will
> allow the relationship between clockid pairs to be accurately determined. And
> if folks need to calculate the relationship between three clockids, they can
> do a similar three read and bound the output.
> ie:
> do {
>     clock_gettime2(MONO, REAL, mon1,real1);
>     clock_gettime2(MONO,BOOT, mon2,boot2);
>     clock_gettime2(MONO,REAL, mon3, real3);
> } while( real1-mon1 != real3-mon3);
> 
> So it might be over-designing to provide a clock_gettime3 from the start
> (since then folks will want clock_gettime5.. :)

Fair enough!
 
> >   So if you already
> > know about such requirements we should go for that, but we should try
> > to restrict it to combinations which can be easily supported in the
> > VDSO as well.
> Other then the cputime clockids, I'm not sure I see what combinations wouldn't
> work with the vdso.
> As long as the clocksource is accessible, everything else is exportable.

Ok. We probably should add the missing ones to the VDSO then.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-11-15 20:53     ` John Stultz
  2012-11-15 21:01       ` Thomas Gleixner
@ 2012-12-19 20:43       ` Scot Salmon
  2012-12-19 20:57         ` John Stultz
  1 sibling, 1 reply; 15+ messages in thread
From: Scot Salmon @ 2012-12-19 20:43 UTC (permalink / raw)
  To: John Stultz; +Cc: linux-rt-users, Thomas Gleixner

John Stultz wrote on 11/15/2012 02:53:56 PM:
> On 11/15/2012 11:28 AM, Scot Salmon wrote:
> > Thomas Gleixner wrote on 10/31/2012 03:08:45 PM:
> > > What you are concerned about is keeping the machines in sync on a
> > > common global timeline. Though your more fundamental requirement is
> > > that you get the wakeup on each machine in the given cycle time. The
> > > global synchronization mechanism just adjusts that local periodic
> > > schedule.
> > [...]
> > > What you really want is an atomic readout facility for 
> > > CLOCK_MONOTONIC and CLOCK_REALTIME. That allows you to align the 
> > > CLOCK_MONOTONIC based timer with the global CLOCK_REALTIME based 
> > > time line and in the event that the CLOCK_REALTIME clock was set 
> > > and jumped forward/backward you have full software control over 
> > > the aligning mechanism including the ability to do sanity checking.
> > This works for me.  We discussed it on IRC and agreed on "something
> > like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
> > that allows us other combinations than MONO/REAL as well".
> Yea, there's been a few times that folks have brought up the need, and 
> this sort of interface is probably the best way to go.

I spent a few days working on clock_gettime2 this week, and I've hit a 
snag.  I tried an implementation where I look up the two kclocks from 
their clockid and call kc1->clock_get() and kc2->clock_get() with the 
timekeeping data locked.  The problem is, if id1 and id2 are the obvious 
choices of MONO and REAL, their clock_get's end up calling 
timekeeping_get_ns which samples the clock source synchronously, so I 
get two discrete samples of the underlying clock even if I have locked 
out any changes to the timekeeping data.

This implementation does work for the _COARSE clockid variants, but it 
seems from some quick checks that those aren't going to be suitable for 
my needs (too coarse).

Since there's nothing I can do to keep the underlying hardware clock 
source from ticking, it seems like my options are pretty limited.  I 
can't do anything that will lead to two samples of the clock source. 
All I can think of is to sample it for the first clockid, then derive 
the value for the second from the first, for example by offsetting a 
CLOCK_MONOTONIC read with tk->offs_real and returning that as the 
synchronized value for CLOCK_REALTIME.  Which isn't incredibly horrible 
for those two, but making it general would get very ugly.

Any thoughts?

-Scot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-12-19 20:43       ` Scot Salmon
@ 2012-12-19 20:57         ` John Stultz
  2013-01-05  4:09           ` Richard Cochran
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2012-12-19 20:57 UTC (permalink / raw)
  To: Scot Salmon; +Cc: linux-rt-users, Thomas Gleixner

On 12/19/2012 12:43 PM, Scot Salmon wrote:
> John Stultz wrote on 11/15/2012 02:53:56 PM:
>> On 11/15/2012 11:28 AM, Scot Salmon wrote:
>>> Thomas Gleixner wrote on 10/31/2012 03:08:45 PM:
>>>> What you are concerned about is keeping the machines in sync on a
>>>> common global timeline. Though your more fundamental requirement is
>>>> that you get the wakeup on each machine in the given cycle time. The
>>>> global synchronization mechanism just adjusts that local periodic
>>>> schedule.
>>> [...]
>>>> What you really want is an atomic readout facility for
>>>> CLOCK_MONOTONIC and CLOCK_REALTIME. That allows you to align the
>>>> CLOCK_MONOTONIC based timer with the global CLOCK_REALTIME based
>>>> time line and in the event that the CLOCK_REALTIME clock was set
>>>> and jumped forward/backward you have full software control over
>>>> the aligning mechanism including the ability to do sanity checking.
>>> This works for me.  We discussed it on IRC and agreed on "something
>>> like clock_gettime2(id, id, *t1, *t2) with a sanity check on the ids,
>>> that allows us other combinations than MONO/REAL as well".
>> Yea, there's been a few times that folks have brought up the need, and
>> this sort of interface is probably the best way to go.
> I spent a few days working on clock_gettime2 this week, and I've hit a
> snag.  I tried an implementation where I look up the two kclocks from
> their clockid and call kc1->clock_get() and kc2->clock_get() with the
> timekeeping data locked.  The problem is, if id1 and id2 are the obvious
> choices of MONO and REAL, their clock_get's end up calling
> timekeeping_get_ns which samples the clock source synchronously, so I
> get two discrete samples of the underlying clock even if I have locked
> out any changes to the timekeeping data.
>
> This implementation does work for the _COARSE clockid variants, but it
> seems from some quick checks that those aren't going to be suitable for
> my needs (too coarse).
>
> Since there's nothing I can do to keep the underlying hardware clock
> source from ticking, it seems like my options are pretty limited.  I
> can't do anything that will lead to two samples of the clock source.
> All I can think of is to sample it for the first clockid, then derive
> the value for the second from the first, for example by offsetting a
> CLOCK_MONOTONIC read with tk->offs_real and returning that as the
> synchronized value for CLOCK_REALTIME.  Which isn't incredibly horrible
> for those two, but making it general would get very ugly.
>
> Any thoughts?

Likely this will require some refactoring of the current timekeeping 
accessor functions.

Basically you need to split the current accsessor functions into two parts:
1) The clocksource hardware read
2) The time calculation, given the clocksource value

Then for the gettime2() implementation, do a *single* read of the 
hardware, and pass that clocksource value into the different specified 
clockid functions.

Another important thing to consider here, is that if we are considering 
adding a new syscall, we should avoid the 2038 issue, and specify a new 
time64_t and timespec64 types, and use them in the new interface. This 
way we can provide a 64bit time interface for 32bit systems.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2012-12-19 20:57         ` John Stultz
@ 2013-01-05  4:09           ` Richard Cochran
  2013-01-21 15:36             ` Scot Salmon
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Cochran @ 2013-01-05  4:09 UTC (permalink / raw)
  To: John Stultz; +Cc: Scot Salmon, linux-rt-users, Thomas Gleixner

Instead of looking for new system calls, why not just solve this
problem at the application level?

1. boot machine
2. start ptp4l
3. wait five seconds
4. start control app, and just use clock_realtime

HTH,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2013-01-05  4:09           ` Richard Cochran
@ 2013-01-21 15:36             ` Scot Salmon
  2013-01-21 19:08               ` John Stultz
  2013-01-21 19:12               ` Richard Cochran
  0 siblings, 2 replies; 15+ messages in thread
From: Scot Salmon @ 2013-01-21 15:36 UTC (permalink / raw)
  To: Richard Cochran; +Cc: John Stultz, linux-rt-users, Thomas Gleixner

Richard Cochran <richardcochran@gmail.com> wrote on 01/04/2013 10:09:46 
PM:
> Instead of looking for new system calls, why not just solve this
> problem at the application level?
> 
> 1. boot machine
> 2. start ptp4l
> 3. wait five seconds
> 4. start control app, and just use clock_realtime
> 
> HTH,
> Richard

Richard and I talked about this a bit (thank you again Richard).  My 
solution needs to handle a variety of different clock shift scenarios, 
because I'm not designing the final application but trying to provide a 
system that can support a variety of applications.  I think the two 
variables that matter most are (1) that the two controllers might be on 
the same subnet or in different countries, and (2) that they might be 
synced by 1588 or SNTP.  Another possibly important variable is that the 
time sync server might not be reliably available.  Ideally the same system 
would work on all the combinations. 

Richard suggested a user-mode solution where I just sample the two clocks 
in a tight loop and determine the shift between them by the smallest delta 
across N loops.  That solution is nice for a couple of reasons, first, no 
syscall, second, it supports arbitrary combinations of clockid's really 
nicely.

One drawback of that solution for me is that for the specific clocks I 
care about (MONO and REAL) it is not optimal.  The delta between those two 
clocks is available in the kernel timekeeping data, so a syscall just 
handling those two would be able to get the exact offset quickly.  I'd 
still really like to be able to get that offset.

One way to do that would be to take a small step along the clock_gettime2 
path and just have it return an error if the clockid's are anything other 
than MONO and REAL.  That seems ugly, but it's straightforward to 
implement, and would work for me and maybe others with basically similar 
requirements, so I'll throw it out there.  Another option is to continue 
all the way down the clock_gettime2 path, as John suggested:

> Basically you need to split the current accsessor functions into two 
parts:
> 1) The clocksource hardware read
> 2) The time calculation, given the clocksource value
> 
> Then for the gettime2() implementation, do a *single* read of the 
> hardware, and pass that clocksource value into the different specified 
> clockid functions.
> 
> Another important thing to consider here, is that if we are considering 
> adding a new syscall, we should avoid the 2038 issue, and specify a new 
> time64_t and timespec64 types, and use them in the new interface. This 
> way we can provide a 64bit time interface for 32bit systems.

This is something I'd like to work on, but it seems like a significant 
project, and I'm wondering if it might be acceptable to start with the 
reduced functionality version I mentioned above and work towards this.

As a side note, Richard mentioned that the idea of a CLOCK_TAI has been 
thrown around before.  I agree with him that it would be a useful addition 
for test & measurement applications.

-Scot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2013-01-21 15:36             ` Scot Salmon
@ 2013-01-21 19:08               ` John Stultz
  2013-01-22  2:42                 ` John Stultz
  2013-01-21 19:12               ` Richard Cochran
  1 sibling, 1 reply; 15+ messages in thread
From: John Stultz @ 2013-01-21 19:08 UTC (permalink / raw)
  To: Scot Salmon; +Cc: Richard Cochran, linux-rt-users, Thomas Gleixner

On 01/21/2013 07:36 AM, Scot Salmon wrote:
>
> One drawback of that solution for me is that for the specific clocks I
> care about (MONO and REAL) it is not optimal.  The delta between those two
> clocks is available in the kernel timekeeping data, so a syscall just
> handling those two would be able to get the exact offset quickly.  I'd
> still really like to be able to get that offset.
>
> One way to do that would be to take a small step along the clock_gettime2
> path and just have it return an error if the clockid's are anything other
> than MONO and REAL.  That seems ugly, but it's straightforward to
> implement, and would work for me and maybe others with basically similar
> requirements, so I'll throw it out there.  Another option is to continue
> all the way down the clock_gettime2 path, as John suggested:
>
>> Basically you need to split the current accsessor functions into two
> parts:
>> 1) The clocksource hardware read
>> 2) The time calculation, given the clocksource value
>>
>> Then for the gettime2() implementation, do a *single* read of the
>> hardware, and pass that clocksource value into the different specified
>> clockid functions.
>>
>> Another important thing to consider here, is that if we are considering
>> adding a new syscall, we should avoid the 2038 issue, and specify a new
>> time64_t and timespec64 types, and use them in the new interface. This
>> way we can provide a 64bit time interface for 32bit systems.
> This is something I'd like to work on, but it seems like a significant
> project, and I'm wondering if it might be acceptable to start with the
> reduced functionality version I mentioned above and work towards this.

Yea, its likely to be a reasonable amount of work to do it right 
(although really, its not too crazy, given its pretty limited scope - 
you're really just moving code into sub-functions). The problem with 
starting with the reduced functionality, especially when there's a sense 
of what the right approach should be, is that if we merge that, 
*someone* still has to clean things up to really finish the interface. 
So its usually not a good idea to merge such an approach unless there's 
a clear commitment and history of the submitter following through.

The exceptions for this are usually when there's some interesting 
feature, but no one knows how to best integrate it (ie: how deeply), and 
in those cases, having a reduced functionality (even hackish) 
implementation, lets folks get a better sense of the functionality and 
what it duplicates or would integrate well with. In those cases, having 
something merged to learn from is more valuable.

Now, as far as the 2038 issue goes, just defining the interface to be 
64bit would be fine. You wouldn't have to convert all the internals to 
64bits for just the interface to be useful (since converting the 
internals to be 64bits is something on my todo already, and on 64bit 
systems it would already be supported).

> As a side note, Richard mentioned that the idea of a CLOCK_TAI has been
> thrown around before.  I agree with him that it would be a useful addition
> for test & measurement applications.
Yea, I've had some CLOCK_TAI patches in my tree for awhile now. However, 
its sort of half done, as I need to still integrate CLOCK_TAI hrtimer 
support before it can properly be merged. I've not had the time recently 
to finish this, but I'm hoping to get to it before too long (or so I've 
said for the last few releases).

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2013-01-21 19:08               ` John Stultz
@ 2013-01-22  2:42                 ` John Stultz
  0 siblings, 0 replies; 15+ messages in thread
From: John Stultz @ 2013-01-22  2:42 UTC (permalink / raw)
  To: Scot Salmon; +Cc: Richard Cochran, linux-rt-users, Thomas Gleixner

On 01/21/2013 11:08 AM, John Stultz wrote:
> On 01/21/2013 07:36 AM, Scot Salmon wrote:
>> As a side note, Richard mentioned that the idea of a CLOCK_TAI has been
>> thrown around before.  I agree with him that it would be a useful 
>> addition
>> for test & measurement applications.
> Yea, I've had some CLOCK_TAI patches in my tree for awhile now. 
> However, its sort of half done, as I need to still integrate CLOCK_TAI 
> hrtimer support before it can properly be merged. I've not had the 
> time recently to finish this, but I'm hoping to get to it before too 
> long (or so I've said for the last few releases).

So I figured I'd spend a few minutes and try to get some progress on this.

My current tree can be found here:
http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=shortlog;h=refs/heads/fortglx/3.10/time

There's still some issues around notifying the hrtimer when the 
tai_offset changes that have to be dealt with, but if folks want to play 
around with it, I'd appreciate any feedback.

thanks
-john


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2013-01-21 15:36             ` Scot Salmon
  2013-01-21 19:08               ` John Stultz
@ 2013-01-21 19:12               ` Richard Cochran
  2013-01-23 15:14                 ` Scot Salmon
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Cochran @ 2013-01-21 19:12 UTC (permalink / raw)
  To: Scot Salmon; +Cc: John Stultz, linux-rt-users, Thomas Gleixner

I still think that your problems can be solved at the application
level quite easily. Just out of curiosity, I wrote a little test
program, below, and I think it shows a promising direction to solve
your issue. (The 'clk_cmp' function was taken from the userland servo
in the phc2sys program, from the linuxptp project.)

I am not sure what your application wants to do if it should notice a
jump in CLOCK_REALTIME: keep the old phase angle or re-align to the
new UTC second. My test program does not shift its phase, but I think
you would know how to do it. 

Running on my dinky atom netbook (32 bit, no vdso, no RT) in an
unscientific test, I see the value of 'expected - offset' typically
around a few hundred nanoseconds, with occasional outliers of a few
microseconds. With RT-PREMPT these could surely be avoided, or by
using a moving average of say, 10 readings.

Also, rather than jumping to the new phase angle (if that is what you
want in fact to do), you could smoothly ajust the loop phase using a
PI controller or similar.

Anyhow, here it is. What do you think?

Thanks,
Richard

---
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <inttypes.h>

#define NS_PER_SEC 1000000000LL

static int clk_cmp(clockid_t clkid, clockid_t sysclk, int readings,
		    int64_t *offset, uint64_t *ts)
{
	struct timespec tdst1, tdst2, tsrc;
	int i;
	int64_t interval, best_interval = INT64_MAX;

	/* Pick the quickest clkid reading. */
	for (i = 0; i < readings; i++) {
		if (clock_gettime(sysclk, &tdst1) ||
				clock_gettime(clkid, &tsrc) ||
				clock_gettime(sysclk, &tdst2)) {
			perror("clock_gettime");
			return -1;
		}

		interval = (tdst2.tv_sec - tdst1.tv_sec) * NS_PER_SEC +
			tdst2.tv_nsec - tdst1.tv_nsec;

		if (best_interval > interval) {
			best_interval = interval;
			*offset = (tdst1.tv_sec - tsrc.tv_sec) * NS_PER_SEC +
				tdst1.tv_nsec - tsrc.tv_nsec + interval / 2;
			*ts = tdst2.tv_sec * NS_PER_SEC + tdst2.tv_nsec;
		}
	}

	return 0;
}

int main(int argc, char *argv[])
{
	int64_t expected, offset;
	uint64_t systs;
	struct timespec next;
	struct timespec incr = {
		0, 250000000
	};
	clockid_t clk = CLOCK_MONOTONIC;
	int n_readings = 1;

	/*
	 * Calibrate the offset.
	 */
	if (clk_cmp(clk, CLOCK_REALTIME, 25, &expected, &systs)) {
		return -1;
	}
	if (clock_gettime(clk, &next)) {
		perror("clock_gettime");
		return -1;
	}

	next.tv_sec++;

	while (1) {
		if (clock_nanosleep(clk, TIMER_ABSTIME, &next, NULL)) {
			perror("clock_nanosleep");
			break;
		}
		if (clk_cmp(clk, CLOCK_REALTIME, n_readings, &offset, &systs)) {
			break;
		}
		printf("%" PRId64 "\n", expected - offset);
		/*
		 * At this point, if the difference is more than, say,
		 * 10 microseconds, then you could simply recalibrate
		 * the offset and set the next deadline.
		 */
		next.tv_sec  += incr.tv_sec;
		next.tv_nsec += incr.tv_nsec;
		while (next.tv_nsec >= NS_PER_SEC) {
			next.tv_nsec -= NS_PER_SEC;
			next.tv_sec++;
		}
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Detecting shift of CLOCK_REALTIME with clock_nanosleep (again)
  2013-01-21 19:12               ` Richard Cochran
@ 2013-01-23 15:14                 ` Scot Salmon
  0 siblings, 0 replies; 15+ messages in thread
From: Scot Salmon @ 2013-01-23 15:14 UTC (permalink / raw)
  To: Richard Cochran
  Cc: John Stultz, linux-rt-users, linux-rt-users-owner,
	Thomas Gleixner

Richard-

Skipping to the punch line, that worked for us. Thanks again.

Details:

> I still think that your problems can be solved at the application
> level quite easily. Just out of curiosity, I wrote a little test
> program, below, and I think it shows a promising direction to solve
> your issue. (The 'clk_cmp' function was taken from the userland servo
> in the phc2sys program, from the linuxptp project.)
...
> Running on my dinky atom netbook (32 bit, no vdso, no RT) in an
> unscientific test, I see the value of 'expected - offset' typically
> around a few hundred nanoseconds, with occasional outliers of a few
> microseconds. With RT-PREMPT these could surely be avoided...

I tried your code on a 667MHz ARM Cortex-A9 RT patched system and the 
initial results looked good.  I made a few tweaks to make the test a 
little more aggressive, changing the period to something faster and 
not-round so that it would be more likely to conflict with periodic 
interrupts, changing the reporting so that it only tells me when the delta 
got worse so that the spam from the faster rate is manageable, and setting 
it to run at an RT priority appropriate for our applications (without the 
RT priority, the results were predictably very bad under load, like 180us 
of variation).  Then I ran it under some load (gzip'ing two large files to 
load both cores, then scp'ing another large file to hit the network 
interrupt).  I saw a few readings more than 10us from the expected value 
which concerned me at first.  But, I think this turns out to be okay, as 
you predicted:

> ...using a moving average of say, 10 readings.
> 
> Also, rather than jumping to the new phase angle (if that is what you
> want in fact to do), you could smoothly ajust the loop phase using a
> PI controller or similar.

I talked to the guys who actually use the data I'm providing here, and 
they already do what you suggest, and figure that the occasional spike I 
reported is acceptable for that reason.  They also like the fact that the 
user mode solution would work on older kernels where the syscall would not 
be available.

I also noticed later that the 2nd...n'th readings after the nanosleep are 
consistently much better than the first one, i.e. n_readings of at least 2 
gives a perhaps-surprising improvement in the results.  The 1st offset 
reading is consistently low, i.e. with n_readings = 1, expected - offset 
is almost always positive.  This may be because of a first-iteration cache 
miss in the gettime(MONO) code path, and is mitigated either by using 
n_readings = 2 or adding a dummy gettime(MONO) before clk_cmp.

It still makes me itch a little to use an approximation when the exact 
value is tantalizingly close, but all things considered your user mode 
solution is the right answer.  Thank you very much for the idea, the demo, 
and all the patient advice.

John had mentioned earlier in the thread, regarding clock_gettime2:
> Yea, there's been a few times that folks have brought up the need, and 
> this sort of interface is probably the best way to go.

I'd be interested to know more about the other needs.  Ultimately if 
gettime2 is still useful and worth adding anyway, then I'd use it, so I 
can still pursue it if others think it's worthwhile for uses not covered 
by Richard's solution.

-Scot

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-01-23 15:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-31 15:54 Detecting shift of CLOCK_REALTIME with clock_nanosleep (again) Scot Salmon
2012-10-31 20:08 ` Thomas Gleixner
2012-11-15 19:28   ` Scot Salmon
2012-11-15 20:53     ` John Stultz
2012-11-15 21:01       ` Thomas Gleixner
2012-11-15 22:25         ` John Stultz
2012-11-19 18:27           ` Thomas Gleixner
2012-12-19 20:43       ` Scot Salmon
2012-12-19 20:57         ` John Stultz
2013-01-05  4:09           ` Richard Cochran
2013-01-21 15:36             ` Scot Salmon
2013-01-21 19:08               ` John Stultz
2013-01-22  2:42                 ` John Stultz
2013-01-21 19:12               ` Richard Cochran
2013-01-23 15:14                 ` Scot Salmon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).