hardware time stamps + existing time stamp usage

* hardware time stamps + existing time stamp usage
@ 2008-10-17 14:23 Patrick Ohly
  2008-10-18  5:02 ` Eric Dumazet
  2008-10-18 19:37 ` Octavian Purdila
  0 siblings, 2 replies; 14+ messages in thread
From: Patrick Ohly @ 2008-10-17 14:23 UTC
  To: netdev
  Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
	Ronciak, John

Hello folks!

It's been a while, I hope you are still interested in the topic. It was
previously discussed under the subject "[RFC][PATCH 1/1] net: support
for hardware timestamping". I would like to revive the discussion (and
eventually the implementation), therefore I'm starting a new thread. I
also have two questions about oddities (?) in the current code.

Octavian posted a patch which modified the sk_buff::tstamp field so that
it can store both system time and hardware time stamps (which may be
unrelated to system time!). A single bit distinguishes the two. Ingo
suggested to drop that distinction. Before going into details of what
might have to be changed, let me take stock of what is currently done
with sk_buff::tstamp.

There seem to be at least three usages: 
      * the netfilter code uses it to trigger timing related filter
        rules (net/netfilter/xt_time.c)
      * keep track of the time stamp of the last packet received via a
        socket (SOCK_TIMESTAMP, net/sock/core.c), used for
        SIOCGSTAMP[NS] 
      * deliver receive time together with packet to user space
        (SOCK_RCVTSTAMP[NS], net/sock/sock.c)

Currently time stamping is enabled via sock_enable_timestamp(), which
itself uses the lower level net_enable_timestamp(). At that level, a
counter keeps track of how many users need time stamping.
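The counting scheme described above can be modelled in a few lines of user-space C. This is only a sketch: the demo_* names are hypothetical and it is not the kernel's implementation of net_enable_timestamp(); it merely shows why every enable call needs exactly one matching disable call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: number of users that currently need time stamps. */
static int demo_tstamp_users;

/* Each user that needs time stamps takes one reference. */
static void demo_enable_timestamp(void)
{
	demo_tstamp_users++;
}

/* Each enable must be paired with exactly one disable. */
static void demo_disable_timestamp(void)
{
	assert(demo_tstamp_users > 0); /* an unmatched disable is a bug */
	demo_tstamp_users--;
}

/* Packets only need to be stamped while at least one user is registered. */
static bool demo_timestamp_needed(void)
{
	return demo_tstamp_users > 0;
}
```

With such a counter, a disable without a prior enable (or a repeated enable without a disable) leaves the count permanently wrong, which is the kind of mismatch the ip*_queue.c question further down is about.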

Based on how sk_buff::tstamp is used, one can conclude that it needs to
be reasonably close to system time (for the netfilter code) but not
absolutely the same. Ingo also said that it should be monotonically
increasing. However, I doubt that this is currently guaranteed: the
value is created with ktime_get_real(), which in contrast to ktime_get()
is not monotonic (if I read the comments right).

While looking at the code I ran into a few oddities which I don't quite
understand. Could be me, of course ;-}

First, in net/ipv[46]/netfilter/ip*_queue.c, the call to
net_enable_timestamp() is in an else branch of __ipq_rcv_skb(). The
net_disable_timestamp() is unconditionally in __ipq_reset(). Shouldn't
the code take care that enable/disable calls always match exactly?
Perhaps I'm missing something, but at least at first glance that doesn't
seem to be the case. Also, is it possible that net_enable_timestamp() in
__ipq_rcv_skb() is called repeatedly? 

Second, sock_recv_timestamp() in include/net/sock.h only copies
sk_buff::tstamp into sock::sk_tstamp if SOCK_RCVTSTAMP[NS] is not set.
If this is set (note that SOCK_RCVTSTAMPNS also sets SOCK_RCVTSTAMP),
then __sock_recv_timestamp() copies the value into cmsgs instead. Is
that really the intended semantic? My expectation is that all of the
usages above are possible at the same time.

Let's move on to the changes necessary for hardware time stamping.

With regards to hardware time stamps we identified the following
additional usages of sk_buff::tstamp (assuming that we recycle it
instead of adding a new field): 
      * Transport the original hardware timestamps to user space:
        Octavian is doing that with custom patches at the moment that he
        would like to replace with an upstream solution. These hardware
        time stamps are *not* synchronized with system time, only
        between cards. Transforming them to system time decreases their
        accuracy and therefore is not desirable. 
      * Use hardware timestamps as replacement for the currently rather
        inaccurate, software-only time stamps, both for incoming and for
        outgoing packets: this improves the accuracy of system time
        synchronization with PTP [1]. For this use case, the time stamp
        delivered to the user space PTPd should be consistently
        generated either by hardware or in software. Alternating between
        the two methods introduces jumps, which decreases the accuracy
        of the clock synchronization.

The first use case is problematic if the hardware time diverges from
system time *and* net time stamping is enabled (implying that one of the
existing usages of tstamp is active). Would it be acceptable to let the
user of the Linux kernel avoid this conflict or does the kernel itself
need to detect the conflict?

The second additional use case has no such conflict. Ensuring that the
user space daemon gets only the kind of time stamps it wants is harder.
In the previous discussion we ended with the proposal to add socket
flags which determine what kind of time stamps are to be generated (TX
or RX, hardware or software). After looking at this again I believe that
deciding that at the socket level is too late: suppose the daemon has
initialized the hardware time stamping successfully and then requests to
get only hardware time stamps. A packet is received but couldn't be time
stamped (can happen due to hardware limitations). The IP filter needs a
time stamp and therefore generates one in software, which is stored in
sk_buff::tstamp. Now the socket code cannot tell whether this is a time
stamp that it can report to the daemon.

The only solution that I see is to use one bit as flag to distinguish
between hardware and software time stamps, as Octavian originally
suggested. In contrast to his proposal, the rest of the bits are to be
interpreted as system time, i.e., there would be no delayed conversion
of hardware time stamps to system time stamps. In my opinion, such a
conversion would be tricky, for example because it would have to be done
by the hardware driver which generated the time stamp, but there is no
link back to it from sk_buff.
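To make the flag-bit idea concrete, here is a small user-space sketch. It assumes a 64-bit nanosecond value and uses the lowest bit as the flag; the mail does not say which bit would be used, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the lowest bit marks a time stamp generated by hardware. */
#define DEMO_TSTAMP_HW_FLAG ((int64_t)1)

/* Store system time (ns) plus the origin flag in one 64-bit value. */
static int64_t demo_tstamp_pack(int64_t system_ns, bool from_hardware)
{
	assert(system_ns >= 0);
	int64_t cleared = system_ns & ~DEMO_TSTAMP_HW_FLAG;

	return from_hardware ? (cleared | DEMO_TSTAMP_HW_FLAG) : cleared;
}

/* True if the value was produced by hardware. */
static bool demo_tstamp_is_hardware(int64_t tstamp)
{
	return (tstamp & DEMO_TSTAMP_HW_FLAG) != 0;
}

/* System time with the flag bit masked out (at most 1 ns is lost). */
static int64_t demo_tstamp_system_ns(int64_t tstamp)
{
	return tstamp & ~DEMO_TSTAMP_HW_FLAG;
}
```

Code that does not care about the origin can keep treating the value as system time; the error introduced by the flag is at most one nanosecond.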

If that flag bit is not acceptable for Linux upstream, then PTPd would
still work, albeit with lower accuracy.

That's all for now - the mail is long enough as it is...
Comments?

[1] http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.


* Re: hardware time stamps + existing time stamp usage
  2008-10-17 14:23 hardware time stamps + existing time stamp usage Patrick Ohly
@ 2008-10-18  5:02 ` Eric Dumazet
  2008-10-18  7:38   ` Oliver Hartkopp
  2008-10-18 19:37 ` Octavian Purdila
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2008-10-18  5:02 UTC
  To: Patrick Ohly
  Cc: netdev, Octavian Purdila, Stephen Hemminger, Ingo Oeser,
	Andi Kleen, Ronciak, John

Patrick Ohly wrote:
> Hello folks!
> 
> It's been a while, I hope you are still interested in the topic. It was
> previously discussed under the subject "[RFC][PATCH 1/1] net: support
> for hardware timestamping". I would like to revive the discussion (and
> eventually the implementation), therefore I'm starting a new thread. I
> also have two questions about oddities (?) in the current code.
> 
> Octavian posted a patch which modified the sk_buff::tstamp field so that
> it can store both system time and hardware time stamps (which may be
> unrelated to system time!). A single bit distinguishes the two. Ingo
> suggested to drop that distinction. Before going into details of what
> might have to be changed, let me take stock of what is currently done
> with sk_buff::tstamp.
> 
> There seem to be at least three usages: 
>       * the netfilter code uses it to trigger timing related filter
>         rules (net/netfilter/xt_time.c)

This netfilter code could be changed to use the current system time
(low resolution, in seconds: xtime) rather than the skb time. When no sniffer
is running, we get the timestamp at the time the netfilter code runs anyway,
not at the time the packet is received by the NIC driver.


>       * keep track of the time stamp of the last packet received via a
>         socket (SOCK_TIMESTAMP, net/sock/core.c), used for
>         SIOCGSTAMP[NS] 
>       * deliver receive time together with packet to user space
>         (SOCK_RCVTSTAMP[NS], net/sock/sock.c)
> 
> Currently time stamping is enabled via sock_enable_timestamp(), which
> itself uses the lower level net_enable_timestamp(). At that level, a
> counter keeps track of how many users need time stamping.
> 
> Based on how sk_buff::tstamp is used, one can conclude that it needs to
> be reasonably close to system time (for the netfilter code) but not
> absolutely the same. Ingo also said that it should be monotonically
> increasing. However, I doubt that this is currently guaranteed: the
> value is created with ktime_get_real(), which in contrast to ktime_get()
> is not monotonic (if I read the comments right).
> 
> While looking at the code I ran into a few oddities which I don't quite
> understand. Could be me, of course ;-}
> 
> First, in net/ipv[46]/netfilter/ip*_queue.c, the call to
> net_enable_timestamp() is in an else branch of __ipq_rcv_skb(). The
> net_disable_timestamp() is unconditionally in __ipq_reset(). Shouldn't
> the code take care that enable/disable calls always match exactly?
> Perhaps I'm missing something, but at least at first glance that doesn't
> seem to be the case. Also, is it possible that net_enable_timestamp() in
> __ipq_rcv_skb() is called repeatedly? 
> 
> Second, sock_recv_timestamp() in include/net/sock.h only copies
> sk_buff::tstamp into sock::sk_tstamp if SOCK_RCVTSTAMP[NS] is not set.
> If this is set (note that SOCK_RCVTSTAMPNS also sets SOCK_RCVTSTAMP),
> then __sock_recv_timestamp() copies the value into cmsgs instead. Is
> that really the intended semantic? My expectation is that all of the
> usages above are possible at the same time.
> 
> Let's move on to the changes necessary for hardware time stamping.
> 
> With regards to hardware time stamps we identified the following
> additional usages of sk_buff::tstamp (assuming that we recycle it
> instead of adding a new field): 
>       * Transport the original hardware timestamps to user space:
>         Octavian is doing that with custom patches at the moment that he
>         would like to replace with an upstream solution. These hardware
>         time stamps are *not* synchronized with system time, only
>         between cards. Transforming them to system time decreases their
>         accuracy and therefore is not desirable. 
>       * Use hardware timestamps as replacement for the currently rather
>         inaccurate, software-only time stamps, both for incoming and for
>         outgoing packets: this improves the accuracy of system time
>         synchronization with PTP [1]. For this use case, the time stamp
>         delivered to the user space PTPd should be consistently
>         generated either by hardware or in software. Alternating between
>         the two methods introduces jumps, which decreases the accuracy
>         of the clock synchronization.
> 
> The first use case is problematic if the hardware time diverges from
> system time *and* net time stamping is enabled (implying that one of the
> existing usages of tstamp is active). Would it be acceptable to let the
> user of the Linux kernel avoid this conflict or does the kernel itself
> need to detect the conflict?
> 
> The second additional use case has no such conflict. Ensuring that the
> user space daemon just gets the kind of time stamps he wants is harder.
> In the previous discussion we ended with the proposal to add socket
> flags which determine what kind of time stamps are to be generated (TX
> or RX, hardware or software). After looking at this again I believe that
> deciding that at the socket level is too late: suppose the daemon has
> initialized the hardware time stamping successfully and then requests to
> get only hardware time stamps. A packet is received but couldn't be time
> stamped (can happen due to hardware limitations). The IP filter needs a
> time stamp and therefore generates one in software, which is stored in
> sk_buff::tstamp. Now the socket code cannot tell whether this is a time
> stamp that it can report to the daemon.
> 
> The only solution that I see is to use one bit as flag to distinguish
> between hardware and software time stamps, as Octavian originally
> suggested. In contrast to his proposal, the rest of the bits are to be
> interpreted as system time, i.e., there would be no delayed conversion
> of hardware time stamps to system time stamps. In my opinion, such a
> conversion would be tricky, for example because it would have to be done
> by the hardware driver which generated the time stamp, but there is no
> link back to it from sk_buff.
> 
> If that flag bit is not acceptable for Linux upstream, then PTPd would
> still work, albeit with lower accuracy.
> 
> That's all for now - the mail is long enough as it is...
> Comments?
> 
> [1] http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
> 

Interesting stuff :)

1) You want hardware TX stamping on all frames sent on behalf of a given socket
   Mark a WANT_HARDWARE_TX_STAMP flag at socket level
   Copy this flag when generating skb for this socket.
     When transmitting WANT_HARDWARE_TX_STAMP tagged frame to device,
     don't feed it to dev_queue_xmit_nit() in dev_hard_start_xmit()
     In NIC driver tx completion, test skb WANT_HARDWARE_TX_STAMP flag.
     If set, get tstamp from hardware and copy it to skb tstamp,
     and call dev_queue_xmit_nit() (we might avoid cloning skb there, since
     NIC driver doesn't need it anymore)

   This flag could also be set at device level, for all sent frames. (tcpdump new option)

2) You want hardware RX stamping on a particular device, yet being able to
   deliver system time to legacy apps, unaware of hardware tstamps.

   Set a global flag on device, telling linux stack this device feeds hardware stamp.
   In driver RX completion, set skb tstamp with hardware stamps.

   Mark a WANT_HARDWARE_RX_STAMP flag at socket level, for PTP applications.

   In recv(), if current socket is not marked WANT_HARDWARE_RX_STAMP and device has
   the global flag set, copy system time in tstamp, overriding hardware tstamp.
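
The RX decision in 2) can be sketched as a small user-space model. The struct and function names below are hypothetical stand-ins for the device flag, the socket flag and the skb; this is an illustration of the proposed rule, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dev  { bool feeds_hw_stamp; };   /* per-device flag */
struct demo_sock { bool want_hw_rx_stamp; }; /* per-socket flag */
struct demo_skb  { const struct demo_dev *dev; int64_t tstamp; };

/*
 * Return the time stamp to deliver to this socket. A socket that did not
 * ask for hardware stamps gets system time written over the hardware value.
 */
static int64_t demo_recv_tstamp(const struct demo_sock *sk,
				struct demo_skb *skb, int64_t system_now_ns)
{
	assert(sk != NULL && skb != NULL && skb->dev != NULL);
	if (!sk->want_hw_rx_stamp && skb->dev->feeds_hw_stamp)
		skb->tstamp = system_now_ns;
	return skb->tstamp;
}
```

In this model the overwrite modifies the shared tstamp field, so whichever socket reads the same skb afterwards no longer sees the hardware value.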







* Re: hardware time stamps + existing time stamp usage
  2008-10-18  5:02 ` Eric Dumazet
@ 2008-10-18  7:38   ` Oliver Hartkopp
  2008-10-18  8:54     ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2008-10-18  7:38 UTC
  To: Eric Dumazet
  Cc: Patrick Ohly, netdev, Octavian Purdila, Stephen Hemminger,
	Ingo Oeser, Andi Kleen, Ronciak, John

Eric Dumazet wrote:
>
> Interesting stuff :)
>

Indeed.

> 1) You want hardware TX stamping on all frames sent on behalf of a given socket
>  Mark a WANT_HARDWARE_TX_STAMP flag at socket level
>   Copy this flag when generating skb for this socket.
>     When transmitting WANT_HARDWARE_TX_STAMP tagged frame to device,
>     don't feed it to dev_queue_xmit_nit() in dev_hard_start_xmit()
>     In NIC driver tx completion, test skb WANT_HARDWARE_TX_STAMP flag.
>     If set, get tstamp from hardware and copy it to skb tstamp,
>     and call dev_queue_xmit_nit() (we might avoid cloning skb there, since
>     NIC driver doesn't need it anymore)
>
>  This flag could also be set at device level, for all sent frames.
>  (tcpdump new option)
>
> 2) You want hardware RX stamping on a particular device, yet being able to
>   deliver system time to legacy apps, unaware of hardware tstamps.
>
>   Set a global flag on device, telling linux stack this device feeds
>   hardware stamp.
>   In driver RX completion, set skb tstamp with hardware stamps.
>
>   Mark a WANT_HARDWARE_RX_STAMP flag at socket level, for PTP applications.
>
>   In recv(), if current socket is not marked WANT_HARDWARE_RX_STAMP and
>   device has the global flag set, copy system time in tstamp, overriding
>   hardware tstamp.
>
>

Looks good to me. Just one question regarding
'copy system time in tstamp, overriding hardware tstamp':

When recv() delivers to several sockets, there would probably be
*different* system time values copied and delivered for the *same* skb,
right?

If so, I would tend to fill both (system time and hw timestamp) at driver
level into the skb and then decide at socket level what to push into
user space, as you suggested above.

Regards,
Oliver



* Re: hardware time stamps + existing time stamp usage
  2008-10-18  7:38   ` Oliver Hartkopp
@ 2008-10-18  8:54     ` Eric Dumazet
  2008-10-18 10:10       ` Oliver Hartkopp
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2008-10-18  8:54 UTC
  To: Oliver Hartkopp
  Cc: Patrick Ohly, netdev, Octavian Purdila, Stephen Hemminger,
	Ingo Oeser, Andi Kleen, Ronciak, John

Oliver Hartkopp wrote:
> Eric Dumazet wrote:
>>
>> Interesting stuff :)
>>
> 
> Indeed.
> 
>> 1) You want hardware TX stamping on all frames sent on behalf of a given socket
>>  Mark a WANT_HARDWARE_TX_STAMP flag at socket level
>>   Copy this flag when generating skb for this socket.
>>     When transmitting WANT_HARDWARE_TX_STAMP tagged frame to device,
>>     don't feed it to dev_queue_xmit_nit() in dev_hard_start_xmit()
>>     In NIC driver tx completion, test skb WANT_HARDWARE_TX_STAMP flag.
>>     If set, get tstamp from hardware and copy it to skb tstamp,
>>     and call dev_queue_xmit_nit() (we might avoid cloning skb there, since
>>     NIC driver doesn't need it anymore)
>>
>>  This flag could also be set at device level, for all sent frames.
>>  (tcpdump new option)
>>
>> 2) You want hardware RX stamping on a particular device, yet being able to
>>   deliver system time to legacy apps, unaware of hardware tstamps.
>>
>>   Set a global flag on device, telling linux stack this device feeds
>>   hardware stamp.
>>   In driver RX completion, set skb tstamp with hardware stamps.
>>
>>   Mark a WANT_HARDWARE_RX_STAMP flag at socket level, for PTP applications.
>>
>>   In recv(), if current socket is not marked WANT_HARDWARE_RX_STAMP and
>>   device has the global flag set, copy system time in tstamp, overriding
>>   hardware tstamp.
>>
>>
> 
> Looks good to me. Just one question regarding
> 'copy system time in tstamp, overriding hardware tstamp':
> 
> When recv() delivers to several sockets there would be probably 
> *different* system time values copied and delivered for the *same* skb, 
> right?

As we introduced a new skb flag for the TX part, we could reuse it in order
to copy system time to tstamp only once for the RX part.
> 
> If so i would tend to fill both (system time and hw timestamp) on driver 
> level into the skb and then decide on socket level what to push into 
> user space as you suggested above.

Well, this would enlarge the skb structure by 8 bytes, since you cannot use
the same tstamp location to fill both 8-byte values.
This is probably the easy way, but very expensive...






* Re: hardware time stamps + existing time stamp usage
  2008-10-18  8:54     ` Eric Dumazet
@ 2008-10-18 10:10       ` Oliver Hartkopp
  2008-10-20  7:35         ` Patrick Ohly
  0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2008-10-18 10:10 UTC
  To: Eric Dumazet
  Cc: Patrick Ohly, netdev, Octavian Purdila, Stephen Hemminger,
	Ingo Oeser, Andi Kleen, Ronciak, John

Eric Dumazet wrote:
> Oliver Hartkopp wrote:
>> Eric Dumazet wrote:
>>
>>> 2) You want hardware RX stamping on a particular device, yet being able to
>>>   deliver system time to legacy apps, unaware of hardware tstamps.
>>>
>>>   Set a global flag on device, telling linux stack this device feeds
>>>   hardware stamp.
>>>   In driver RX completion, set skb tstamp with hardware stamps.
>>>

But at this point you would also need to save the system timestamp into 
the skb for sockets that do not want the hardware timestamp. The device 
driver has no chance to know whether the socket this skb is sent to 
wants hw tstamps or not.

>>>   Mark a WANT_HARDWARE_RX_STAMP flag at socket level, for PTP applications.
>>>
>>>   In recv(), if current socket is not marked WANT_HARDWARE_RX_STAMP and
>>>   device has the global flag set, copy system time in tstamp, overriding
>>>   hardware tstamp.
>>>
>>>
>>
>> Looks good to me. Just one question regarding
>> 'copy system time in tstamp, overriding hardware tstamp':
>>
>> When recv() delivers to several sockets there would be probably 
>> *different* system time values copied and delivered for the *same* 
>> skb, right?
>
> As we introduced a new skb flag for the TX part, we could reuse it in order
> to copy system time to tstamp only once for the RX part.

But this does not help on received packets, right?

>>
>> If so i would tend to fill both (system time and hw timestamp) on 
>> driver level into the skb and then decide on socket level what to 
>> push into user space as you suggested above.
>
> Well, this would enlarge the skb structure by 8 bytes, since you cannot use
> the same tstamp location to fill both 8-byte values.
> This is probably the easy way, but very expensive...

IMHO this is the only way to fulfill the given requirements.
Maybe we should introduce a new kernel config option for hw tstamps then ...

Regards,
Oliver


* Re: hardware time stamps + existing time stamp usage
  2008-10-17 14:23 hardware time stamps + existing time stamp usage Patrick Ohly
  2008-10-18  5:02 ` Eric Dumazet
@ 2008-10-18 19:37 ` Octavian Purdila
  2008-10-20 12:27   ` Patrick Ohly
  2008-10-21  7:04   ` Andi Kleen
  1 sibling, 2 replies; 14+ messages in thread
From: Octavian Purdila @ 2008-10-18 19:37 UTC
  To: Patrick Ohly
  Cc: netdev, Stephen Hemminger, Ingo Oeser, Andi Kleen, Ronciak, John

From: Patrick Ohly <patrick.ohly@intel.com>
Date:	Fri, 17 Oct 2008 16:23:43 +0200

> The only solution that I see is to use one bit as flag to distinguish
> between hardware and software time stamps, as Octavian originally
> suggested. 

There is one more approach that I think could work.

Once hardware time stamping has been enabled, the driver will:
a) compute the delta between the system time and the hardware time
b) store in the skb the hardware timestamp + delta (thus effectively using an 
approximation of system time, which I think should be OK since system time 
is not that precise anyway)

This is the approach that Stephen used in his patch for sky2. But we need 
more: we need to get the "untainted" hardware timestamps to userspace.

And we can do that through a driver-specific ioctl (or maybe via a new ethtool 
method?):

hwtimestamp hwtimestamp_ioctl(timestamp)
{
	return timestamp - delta;
}

There are some corner cases with this approach: 

1. Resetting hardware timestamps. Our devices do that when the hardware 
timestamps are synchronized across different cards, in which case hardware 
timestamps are all reset to zero. 

We can easily detect this condition (current hardware timestamp < hardware 
timestamp at the moment at which we computed the delta) and update the delta.
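
The delta scheme together with the reset check just described can be sketched in user-space C. All demo_* names are hypothetical, and the sketch assumes that both clocks are available as 64-bit nanosecond counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_hwclock_map {
	int64_t delta_ns;      /* system time minus hardware time */
	int64_t hw_at_sync_ns; /* hardware time when delta was computed */
};

/* (Re)compute the delta from one pair of simultaneous clock readings. */
static void demo_map_sync(struct demo_hwclock_map *m,
			  int64_t system_ns, int64_t hw_ns)
{
	m->delta_ns = system_ns - hw_ns;
	m->hw_at_sync_ns = hw_ns;
}

/*
 * Map a hardware stamp to approximate system time. A hardware value
 * smaller than the one seen at sync time is taken as a clock reset,
 * in which case the delta is refreshed first.
 */
static int64_t demo_hw_to_system(struct demo_hwclock_map *m,
				 int64_t hw_ns, int64_t system_now_ns)
{
	assert(m != NULL);
	if (hw_ns < m->hw_at_sync_ns)
		demo_map_sync(m, system_now_ns, hw_ns);
	return hw_ns + m->delta_ns;
}

/* Reverse mapping, i.e. what the proposed ioctl would return. */
static int64_t demo_system_to_hw(const struct demo_hwclock_map *m,
				 int64_t tstamp_ns)
{
	return tstamp_ns - m->delta_ns;
}
```

Note that the reverse mapping only works with the delta that was in effect when the stamp was stored; after a reset, older stamps would be converted with the wrong delta.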

2. When the hardware is unable to generate a hardware timestamp (Patrick 
mentioned that this may occur with certain hardware).

In that case the driver should generate a system time timestamp. 

The problem here is that we want to distinguish between system and hardware 
timestamps. A possible approach would be to use a slightly coarser precision 
(say Xns instead of 1ns) and then use the modulo X to encode state into the 
timestamp.

For example, we could say that hardware timestamp = (hwtimestamp/X)*X and 
software timestamp = ((system time/X)*X) +1

Then, in hwtimestamp_ioctl, we can check whether a received time is software or 
hardware, and we can let the application know.
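
A minimal user-space sketch of the modulo encoding, assuming X = 8 ns (the value of X is left open above) and non-negative nanosecond values; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granularity; any X > 1 would work the same way. */
#define DEMO_X_NS 8

/* Hardware stamps are rounded down to a multiple of X (residue 0). */
static int64_t demo_encode_hw(int64_t hw_ns)
{
	assert(hw_ns >= 0);
	return (hw_ns / DEMO_X_NS) * DEMO_X_NS;
}

/* Software stamps are rounded down and then tagged with residue 1. */
static int64_t demo_encode_sw(int64_t system_ns)
{
	assert(system_ns >= 0);
	return (system_ns / DEMO_X_NS) * DEMO_X_NS + 1;
}

/* The residue modulo X is the state: 0 = hardware, 1 = software. */
static int demo_tstamp_state(int64_t tstamp_ns)
{
	return (int)(tstamp_ns % DEMO_X_NS);
}
```

The residues 2 to X-1 stay unused in this sketch and could carry further state.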

We can even compute the delta periodically now, to maintain better system - 
hardware timestamp synchronization, as we can keep multiple deltas (each 
one associated with a modulo number).

Thanks,
tavi


* Re: hardware time stamps + existing time stamp usage
  2008-10-18 10:10       ` Oliver Hartkopp
@ 2008-10-20  7:35         ` Patrick Ohly
  2008-10-20 18:01           ` Oliver Hartkopp
  0 siblings, 1 reply; 14+ messages in thread
From: Patrick Ohly @ 2008-10-20  7:35 UTC
  To: Oliver Hartkopp
  Cc: Eric Dumazet, netdev@vger.kernel.org, Octavian Purdila,
	Stephen Hemminger, Ingo Oeser, Andi Kleen, Ronciak, John

On Sat, 2008-10-18 at 04:10 -0600, Oliver Hartkopp wrote:
> Eric Dumazet wrote:
> > Oliver Hartkopp wrote:
> >> Eric Dumazet wrote:
> >> If so i would tend to fill both (system time and hw timestamp) on
> >> driver level into the skb and then decide on socket level what to
> >> push into user space as you suggested above.
> >
> > Well, this would enlarge the skb structure by 8 bytes, since you cannot use
> > the same tstamp location to fill both 8-byte values.
> > This is probably the easy way, but very expensive...
> 
> IMHO this is the only way to fulfill the given requirements.
> Maybe we should introduce a new kernel config option for hw tstamps then ...

The last time this topic was discussed the initial proposal also was to
add another time stamp, pretty much for the same reasons. This approach
was discarded because enlarging a common structure like skb for rather
obscure ("Objection, your honor!" - "Rejected.") use cases is not
acceptable. A config option doesn't help much either because to be
useful for distribution users, it would have to be on by default.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.


* Re: hardware time stamps + existing time stamp usage
  2008-10-18 19:37 ` Octavian Purdila
@ 2008-10-20 12:27   ` Patrick Ohly
  2008-10-20 13:07     ` Octavian Purdila
  2008-10-21  7:04   ` Andi Kleen
  1 sibling, 1 reply; 14+ messages in thread
From: Patrick Ohly @ 2008-10-20 12:27 UTC
  To: Octavian Purdila
  Cc: netdev@vger.kernel.org, Stephen Hemminger, Ingo Oeser, Andi Kleen,
	Ronciak, John

On Sat, 2008-10-18 at 13:37 -0600, Octavian Purdila wrote:
> From: Patrick Ohly <patrick.ohly@intel.com>
> Date:   Fri, 17 Oct 2008 16:23:43 +0200
> 
> > The only solution that I see is to use one bit as flag to distinguish
> > between hardware and software time stamps, as Octavian originally
> > suggested.
> 
> There is one more approach that I think could work.
> 
> Once hardware time stamping has been enabled, the driver will:
> a) compute the delta between the system time and the hardware time
> b) store in the skb the hardware timestamp + delta (thus effectively using an
> approximation of system time, which I think should be OK since system time
> is not that precise anyway)

In other words, skb::tstamp will always look like a system time stamp to
most of the kernel (basically everything above the driver), but those
parts which need to know can still detect true hardware time stamps.
Yes, that should work.

> This is the approach that Stephen used in his patch for sky2. But we need
> more, we need to get the "untainted" hardware timestamps to userspace.
> 
> And we can do that through a driver-specific ioctl (or maybe via a new ethtool
> method?):
> 
> hwtimestamp hwtimestamp_ioctl(timestamp)
> {
>         return timestamp - delta;
> }

The problem here is correlating the hardware time stamp with the
hardware that generated it. Consider for example multiple NICs, or the
delay between receiving the time stamp and asking for the transformation
(there's a race condition).

I think it would be best to transform back to raw time stamps at the
socket level if requested by the application: SO_TIMESTAMPNS would
always return the transformed time stamp and a new SO_TIMESTAMP_HARDWARE
the corresponding hardware time stamp, if one exists. If that value is
not needed and computing it is considered too costly, a
SO_TIMESTAMP_IS_HARDWARE could also be added.

> There are some corner cases with this approach:
> 
> 1. Resetting hardware timestamps. Our devices do that when the hardware
> timestamps are synchronized across different cards, in which case hardware
> timestamps are all reset to zero.
> 
> We can easily detect this condition (current hardware timestamp < hardware
> timestamp at the moment at which we computed the delta) and update the delta.

My proposal is to implement as much of this as possible in generic code. A
specific driver then only has to provide access to its clock and alert the
generic code of special circumstances (like resetting the clock). It can
also choose between an advanced method (see below) and a simple delta,
as needed.

> 2. When the hardware is unable to generate a hardware timestamp (Patrick
> mentioned that this may occur with certain hardware).
> 
> In that case the driver should generate a system time timestamp.

Agreed.

> The problem here is that we want to distinguish between system and hardware
> timestamps. A possible approach would be to use a slightly coarser precision
> (say Xns instead of 1ns) and then use the modulo X to encode state into the
> timestamp.
> 
> For example, we could say that hardware timestamp = (hwtimestamp/X)*X and
> software timestamp = ((system time/X)*X) +1

My expectation is that the lower bits of both software and hardware time
stamps are unused anyway. But I would reverse the logic and return the
more common software time stamps with the lower bits cleared, so that
ideally they are identical to time stamps without the additional
semantic.

Perhaps it would be acceptable to add a single bit flag to sk_buff
itself instead, but I'm not sure about that.

> Then, in hwtimestamp_ioctl, we can check whether a received time is software or
> hardware, and we can let the application know.

As I said above, I think this should be done in recv_msg() as configured
by socket flags.

> We can even compute the delta periodically now, to maintain better system -
> hardware timestamp synchronization, as we can keep multiple deltas (each
> one associated with a modulo number).

The transformation that I used was "system time = hardware time + delta
+ skew * time since last measurement". Perhaps this is overkill: the
last summand often was small (a few nanoseconds), but that depends on
the skew. Although it complicates the implementation, I would prefer to
implement that mapping function, just to be on the safe side.
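
A minimal sketch of that mapping function in user-space C. It assumes that the skew is expressed in parts per billion and that "time since last measurement" is taken on the hardware clock; both are assumptions made for this example, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_clock_fit {
	int64_t hw_at_measure_ns; /* hardware time of the last measurement */
	int64_t delta_ns;         /* system minus hardware time at that point */
	int64_t skew_ppb;         /* assumed unit: drift in parts per billion */
};

/* system time = hardware time + delta + skew * time since last measurement */
static int64_t demo_fit_hw_to_system(const struct demo_clock_fit *f,
				     int64_t hw_ns)
{
	assert(f != NULL);
	int64_t elapsed_ns = hw_ns - f->hw_at_measure_ns;

	/* Fine for short intervals; very long ones could overflow int64_t. */
	return hw_ns + f->delta_ns + (f->skew_ppb * elapsed_ns) / 1000000000LL;
}
```

For example, with a skew of 20000 ppb (20 ppm) and half a second since the last measurement, the skew term contributes 10000 ns on top of the plain delta.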

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.


* Re: hardware time stamps + existing time stamp usage
  2008-10-20 12:27   ` Patrick Ohly
@ 2008-10-20 13:07     ` Octavian Purdila
  2008-10-20 13:37       ` Patrick Ohly
  0 siblings, 1 reply; 14+ messages in thread
From: Octavian Purdila @ 2008-10-20 13:07 UTC
  To: Patrick Ohly
  Cc: netdev@vger.kernel.org, Stephen Hemminger, Ingo Oeser, Andi Kleen,
	Ronciak, John

From: Patrick Ohly <patrick.ohly@intel.com>
Date: Mon, 20 Oct 2008 14:27:25 +0200

> I think it would be best to transform back to raw time stamps at the
> socket level if requested by the application: SO_TIMESTAMPNS would
> always return the transformed time stamp and a new SO_TIMESTAMP_HARDWARE
> the corresponding hardware time stamp, if one exists. 

This is better, indeed. 

> If that value is
> not needed and computing it is considered too costly, a
> SO_TIMESTAMP_IS_HARDWARE could also be added.

I didn't get this part.

> My proposal is to implement as much of this in generic code. A specific
> driver then only has to provide access to its clock and alert the
> generic code of special circumstances (like resetting the clock). It can
> also choose between an advanced method (see below) and a simple delta,
> as needed.

Agreed.

> > The problem here is that we want to distinguish between system and
> > hardware timestamps. A possible approach would be to use a slightly
> > coarser precision (say Xns instead of 1ns) and then use the modulo X to
> > encode state into the timestamp.
> >
> > For example, we could say that hardware timestamp = (hwtimestamp/X)*X and
> > software timestamp = ((system time/X)*X) +1
>
> My expectation is that the lower bits of both software and hardware time
> stamps are unused anyway. But I would reverse the logic and return the
> more common software time stamps with the lower bits cleared, so that
> ideally they are identical to time stamps without the additional
> semantic.
>

Right, it's better this way.

> Perhaps it would be acceptable to add a single bit flag to sk_buff
> itself instead, but I'm not sure about that.
>

Last time I checked, sk_buff didn't have any holes, so adding one bit would
enlarge it. And I think that this approach gives us more room for any
enhancements we may need in the future.

> > Then, in the hwtimestamp_ioctl we can check if a received time is
> > software or hardware, and we can let the application know.
>
> As I said above, I think this should be done in recv_msg() as configured
> by socket flags.
>

Agreed.

> > We can even compute the delta periodically now, to maintain better system
> > - hardware timestamps synchronization, as we can keep multiple deltas
> > (each one associated with a modulo number).
>
> The transformation that I used was "system time = hardware time + delta
> + skew * time since last measurement". Perhaps this is overkill: the
> last summand often was small (a few nanoseconds), but that depends on
> the skew. Although it complicates the implementation, I would prefer to
> implement that mapping function, just to be on the safe side.

Sure, we can use multiple sync methods as you've said, one being the simple 
delta and then this one as the more advanced method. 


Thanks,
tavi



* Re: hardware time stamps + existing time stamp usage
  2008-10-20 13:07     ` Octavian Purdila
@ 2008-10-20 13:37       ` Patrick Ohly
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick Ohly @ 2008-10-20 13:37 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: netdev@vger.kernel.org, Stephen Hemminger, Ingo Oeser, Andi Kleen,
	Ronciak, John

Hello Octavian!

Seems like we agree on the way forward. I'll follow up with patches...

On Mon, 2008-10-20 at 07:07 -0600, Octavian Purdila wrote:
> > If that value is
> > not needed and computing it is considered too costly, a
> > SO_TIMESTAMP_IS_HARDWARE could also be added.
>
> I didn't get this part.

For PTPd, access to the original hardware time stamps isn't necessary.
PTPd only needs to know whether the value returned by SO_TIMESTAMPNS was
created by hardware or software so that it can skip the ones done in
software. PTPd would use SO_TIMESTAMPNS + SO_TIMESTAMP_IS_HARDWARE, but
not SO_TIMESTAMP_HARDWARE.
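
For illustration, here is how the SO_TIMESTAMPNS half of that looks in user
space today. This is only a sketch: it assumes the option was enabled with
setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on)), and
get_rx_timestamp() is a name invented here. SO_TIMESTAMP_IS_HARDWARE is
still just a proposal, so it does not appear in the code.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SCM_TIMESTAMPNS
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS	/* same value on Linux */
#endif

/* Search the control messages returned by recvmsg() for the receive time
 * stamp. Returns true and fills *ts if one was attached to the packet. */
static bool get_rx_timestamp(struct msghdr *msg, struct timespec *ts)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_TIMESTAMPNS &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(*ts))) {
			memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
			return true;
		}
	}
	return false;
}
```

A PTPd-like application would call this after each recvmsg() and, with the
proposed flag, additionally drop the values that were generated in software.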

Computing the original value can be costly, in particular when using the
advanced conversion to system time (okay, not that expensive, but
still...). Avoiding it when not necessary seems prudent.

There's one more argument in favor of adding both
SO_TIMESTAMP_IS_HARDWARE and SO_TIMESTAMP_HARDWARE: as Andi mentioned in
a discussion I had with him today off the list, the link back to the
interface can get lost when a packet passes through complex IP filter
rules. SO_TIMESTAMP_IS_HARDWARE would always work while
SO_TIMESTAMP_HARDWARE fails in this case.

-- 
Best Regards, Patrick Ohly



* Re: hardware time stamps + existing time stamp usage
  2008-10-20  7:35         ` Patrick Ohly
@ 2008-10-20 18:01           ` Oliver Hartkopp
  2008-10-21  7:29             ` Patrick Ohly
  0 siblings, 1 reply; 14+ messages in thread
From: Oliver Hartkopp @ 2008-10-20 18:01 UTC (permalink / raw)
  To: Patrick Ohly
  Cc: Eric Dumazet, netdev@vger.kernel.org, Octavian Purdila,
	Stephen Hemminger, Ingo Oeser, Andi Kleen, Ronciak, John

Patrick Ohly wrote:
> On Sat, 2008-10-18 at 04:10 -0600, Oliver Hartkopp wrote:
>   
>> Eric Dumazet wrote:
>>     
>>> Oliver Hartkopp a écrit :
>>>       
>>>> If so i would tend to fill both (system time and hw timestamp) on
>>>> driver level into the skb and then decide on socket level what to
>>>> push into user space as you suggested above.
>>>>         
>>> Well, this would enlarge skb structure by 8 bytes, since you cannot use
>>> same tstamp location to fill both 8-byte values.
>>> This is probably the easy way, but very expensive...
>>>       
>> IMHO this is the only way to fulfill the given requirements.
>> Maybe we should introduce a new kernel config option for hw tstamps then ...
>>     
>
> The last time this topic was discussed the initial proposal also was to
> add another time stamp, pretty much for the same reasons. This approach
> was discarded because enlarging a common structure like skb for rather
> obscure ("Objection, your honor!" - "Rejected.") use cases is not
> acceptable.

I don't want to raise dust again but having HW timestamps is also
interesting for some CAN (Controller Area Network) users.
We had several discussions on the SocketCAN ML on HW timestamps that are 
provided by some CAN controllers or active/intelligent CAN nodes (with 
onboard-CPUs). For me it was not that relevant as stamping the skb in 
the rx-path was always 'accurate enough' for me - but I'm not the CAN 
timestamp expert. Fortunately the HW timestamp was not pushed into 
skb->data (ugh!) but supporting HW timestamps for userspace apps is 
still a wanted feature.

>  A config option doesn't help much either because to be
> useful for distribution users, it would have to be on by default.
>   

Hm - i tried to follow your points in the linked PDF 
(http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf) 
- and from my perspective having a kernel config option looks like an 
appropriate solution here. Neither the CAN controllers nor the HPC clusters
that would benefit from HW timestamps are IMHO 'standard use-cases'
that use 'standard kernels' provided by a 'standard distributor', right?

I assume the system timestamps to be accurate enough for 'standard 
users' so HW timestamps could be a possible candidate for a config 
option - or did i miss anything vital here?

Especially it makes the implementation very clear and without any 
expensive how-to-bitcompress-several-values-into-tstamp approaches.

Regards,
Oliver



* Re: hardware time stamps + existing time stamp usage
  2008-10-18 19:37 ` Octavian Purdila
  2008-10-20 12:27   ` Patrick Ohly
@ 2008-10-21  7:04   ` Andi Kleen
  2008-10-21  7:40     ` Patrick Ohly
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-10-21  7:04 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Patrick Ohly, netdev, Stephen Hemminger, Ingo Oeser,
	Ronciak, John

> We can even compute the delta periodically now, to maintain better system - 
> hardware timestamps synchronization, as we can keep multiple deltas (each
> one associated with a modulo number).

The problem with this scheme is that it's unlikely to be precise enough to guarantee
monotonicity (that is, that your delta clock compared to the system clock never goes
backwards). And that tends to be a common requirement for system time stamps.
Not having that would risk breaking existing applications.
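
To make the concern concrete, here is a small sketch in plain user-space C
(function names made up for this example) of how re-measuring a simple
delta between two packets can make the second converted time stamp smaller
than the first, even though the hardware clock itself moved forward.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simple delta method: converted time = hardware time + current delta. */
static int64_t hw_plus_delta(int64_t hw_ns, int64_t delta_ns)
{
	return hw_ns + delta_ns;
}

/* Packet 1 is stamped at hw1 while delta1 is in effect, packet 2 later at
 * hw2 (hw2 > hw1) after the delta was re-measured as delta2. Returns true
 * if packet 2 ends up with the earlier converted time stamp. */
static bool goes_backwards(int64_t hw1, int64_t delta1,
			   int64_t hw2, int64_t delta2)
{
	return hw_plus_delta(hw2, delta2) < hw_plus_delta(hw1, delta1);
}
```

For example, two packets 100ns apart in hardware time appear reordered as
soon as the delta is corrected downwards by more than 100ns in between.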

My recommendation would be to find some way to use a separate field and also
use a separate API. That would also allow you to extend it (e.g. pass down
the interface number), so that different time stamps from different interfaces
are supported.

-Andi


* Re: hardware time stamps + existing time stamp usage
  2008-10-20 18:01           ` Oliver Hartkopp
@ 2008-10-21  7:29             ` Patrick Ohly
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick Ohly @ 2008-10-21  7:29 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Eric Dumazet, netdev@vger.kernel.org, Octavian Purdila,
	Stephen Hemminger, Ingo Oeser, Andi Kleen, Ronciak, John

On Mon, 2008-10-20 at 11:01 -0700, Oliver Hartkopp wrote:
> Patrick Ohly wrote:
> > The last time this topic was discussed the initial proposal also was to
> > add another time stamp, pretty much for the same reasons. This approach
> > was discarded because enlarging a common structure like skb for rather
> > obscure ("Objection, your honor!" - "Rejected.") use cases is not
> > acceptable.
> 
> I don't want to raise dust again

Please, keep raising that dust ;-) I'd very much prefer to have
another field myself, but I also want to get the patch into the upstream
kernel. The more people argue in favor of adding it, the more likely
that gets.

In the meantime I'll proceed with an implementation based on bit
mangling. The latest iteration of the user space APIs hides this
implementation detail, so it'll be easy to switch from bit mangling to a
separate field.

> >  A config option doesn't help much either because to be
> > useful for distribution users, it would have to be on by default.
> >
> 
> Hm - i tried to follow your points in the linked PDF
> (http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf)
> - and from my perspective having a kernel config option looks like an
> appropriate solution here. Neither the CAN controllers nor the HPC clusters
> that would benefit from HW timestamps are IMHO 'standard use-cases'
> that use 'standard kernels' provided by a 'standard distributor', right?

My estimation is that there are a lot more HPC clusters which use
standard "Enterprise Linux" distributions with vendor support and/or
cannot/do not want to use a self-compiled kernel. My goal therefore is
to have the support for HW time stamps enabled in the default kernel
configuration.

Perhaps the "use separate field" implementation of that support could be
selected via an option for those who really need it.

-- 
Best Regards, Patrick Ohly



* Re: hardware time stamps + existing time stamp usage
  2008-10-21  7:04   ` Andi Kleen
@ 2008-10-21  7:40     ` Patrick Ohly
  0 siblings, 0 replies; 14+ messages in thread
From: Patrick Ohly @ 2008-10-21  7:40 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Octavian Purdila, netdev@vger.kernel.org, Stephen Hemminger,
	Ingo Oeser, Ronciak, John

On Tue, 2008-10-21 at 01:04 -0600, Andi Kleen wrote:
> > We can even compute the delta periodically now, to maintain better system -
> > hardware timestamps synchronization, as we can keep multiple deltas (each
> > one associated with a modulo number).
> 
> The problem with this scheme is that it's unlikely to be precise enough to guarantee
> monotonicity (that is, that your delta clock compared to the system clock never goes
> backwards). And that tends to be a common requirement for system time stamps.
> Not having that would risk breaking existing applications.

Agreed. But even those users who need absolute monotonicity would be able
to use PTP: at least the Intel hardware would be configured to only time
stamp PTP packets while the application packets that the user cares
about are still time stamped in software, as before.

> My recommendation would be to find some way to use a separate field and also
> use a separate API. That would also allow you to extend it (e.g. pass down
> the interface number), so that different time stamps from different interfaces
> are supported.

The latest proposal already uses such a separate API for HW time stamps,
so we are fine in that regard. In my opinion the API should only return
information which isn't available otherwise (currently the original HW
time stamp); the interface number should be returned with the existing
IP_PKTINFO.
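
For reference, a sketch of how user space could pull the interface index
out of the existing IP_PKTINFO control message. It assumes IPv4 and that
the option was enabled with setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on,
sizeof(on)); get_rx_ifindex() is a name invented here.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Search the control messages returned by recvmsg() for IP_PKTINFO.
 * Returns the index of the receiving interface, or -1 if the packet did
 * not carry that information. */
static int get_rx_ifindex(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == IPPROTO_IP &&
		    cmsg->cmsg_type == IP_PKTINFO &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(struct in_pktinfo))) {
			struct in_pktinfo info;

			memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
			return info.ipi_ifindex;
		}
	}
	return -1;
}
```

An application that wants both pieces of information would walk the same
control buffer once and pick up the time stamp and the interface index.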

-- 
Best Regards, Patrick Ohly



end of thread

Thread overview: 14+ messages
2008-10-17 14:23 hardware time stamps + existing time stamp usage Patrick Ohly
2008-10-18  5:02 ` Eric Dumazet
2008-10-18  7:38   ` Oliver Hartkopp
2008-10-18  8:54     ` Eric Dumazet
2008-10-18 10:10       ` Oliver Hartkopp
2008-10-20  7:35         ` Patrick Ohly
2008-10-20 18:01           ` Oliver Hartkopp
2008-10-21  7:29             ` Patrick Ohly
2008-10-18 19:37 ` Octavian Purdila
2008-10-20 12:27   ` Patrick Ohly
2008-10-20 13:07     ` Octavian Purdila
2008-10-20 13:37       ` Patrick Ohly
2008-10-21  7:04   ` Andi Kleen
2008-10-21  7:40     ` Patrick Ohly
