* [RFC] support for IEEE 1588
From: Octavian Purdila @ 2008-07-03 22:47 UTC
To: netdev
Hi everybody,
IEEE 1588 Precision Time Protocol [1] requires hardware timestamping support
for both RX and TX frames. It seems that in Linux we do not have the support
required for this protocol to be implemented.
Any feedback on the approach we are planning to take is greatly appreciated. We
will follow up with a patch at some point, but I just want to check with you
gurus early, to avoid potential design flaws. If a patch is preferred for
commenting on, then please ignore this and we will come back later with the
patch.
1. RX path
- add a new field in skb to keep the hardware stamp (hwstamp)
- add a new socket flag to enable RX stamping
- add a new control message to retrieve the hwstamp from the skb to user-space
application (for UDP and maybe PF_PACKET)
2. TX path - this is a bit more complicated since we need a new mechanism to
wait for a packet transmission on wire, from user space.
- add a new flag for the skb to request TX stamping
- add a new control message to propagate the TX stamping request from
userspace to the skb
- when the driver sends the packet, it will get the stamp from the TX
completion ring; the driver will then propagate the stamp either to
(a) the skb stamp field, or (b) some special structure - this to avoid keeping
the skb around
- the special structure or the skb will be linked to a special queue in the
socket and a POLLPRI event will be generated
- the application will use recvmsg and will receive a new control message
which contains the timestamp from the socket special queue
We will probably need to associate a cookie with each TX stamping control
message which will be retrieved in the later control message, so that the
application can match sent packets with timestamps.
[1] http://ieee1588.nist.gov/tutorial-basic.pdf
Thanks,
tavi
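Editorial sketch of the cookie idea from the RFC above. This is only user-space bookkeeping under stated assumptions: the class `TxStampMatcher` and its method names are hypothetical, the cookie is a simple counter rather than the skb pointer mentioned later in the thread, and no real socket API is used. It shows how an application could pair a later TX stamp with the packet it belongs to, even when stamps arrive out of order.

```python
from itertools import count
from typing import Dict, Optional, Tuple


class TxStampMatcher:
    """Hypothetical bookkeeping that pairs TX stamps with sent packets."""

    def __init__(self) -> None:
        self._next_cookie = count(1)          # assumed cookie source: a counter
        self._pending: Dict[int, bytes] = {}  # cookie -> payload awaiting a stamp

    def register_send(self, payload: bytes) -> int:
        """Remember a packet sent with a TX stamping request; return its cookie."""
        cookie = next(self._next_cookie)
        self._pending[cookie] = payload
        return cookie

    def match_stamp(self, cookie: int, hwstamp_ns: int) -> Optional[Tuple[bytes, int]]:
        """Return (payload, stamp) for a reported cookie, or None if unknown."""
        payload = self._pending.pop(cookie, None)
        if payload is None:
            return None
        return payload, hwstamp_ns
```

With several packets outstanding, the application looks up each reported cookie instead of assuming that stamps arrive in send order.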
* Re: [RFC] support for IEEE 1588
From: Stephen Hemminger @ 2008-07-03 23:24 UTC
To: Octavian Purdila; +Cc: netdev

On Fri, 4 Jul 2008 01:47:11 +0300
Octavian Purdila <opurdila@ixiacom.com> wrote:

> Hi everybody,
>
> IEEE 1588 Precision Time Protocol [1] requires hardware timestamping support
> for both RX and TX frames. It seems that in Linux we do not have the support
> required for this protocol to be implemented.
>
> Any feedback on the approach we are planning to take is greatly appreciated. We
> will follow up with a patch at some point, but I just want to check with you
> gurus early, to avoid potential design flaws. If a patch is preferred for
> commenting on, then please ignore this and we will come back later with the
> patch.
>
> 1. RX path
> - add a new field in skb to keep the hardware stamp (hwstamp)
> - add a new socket flag to enable RX stamping
> - add a new control message to retrieve the hwstamp from the skb to user-space
> application (for UDP and maybe PF_PACKET)

The existing skb timestamp is there, and if the hardware supports it, it
could be updated by the device driver. I had a version of sky2 that did
just that but never fully pushed it upstream because of available time and
testing issues.

The APIs are already there (and used) for timestamping; don't invent
new ones.
* Re: [RFC] support for IEEE 1588
From: Octavian Purdila @ 2008-07-03 23:40 UTC
To: Stephen Hemminger; +Cc: netdev

On Friday 04 July 2008, Stephen Hemminger wrote:
> > 1. RX path
> > - add a new field in skb to keep the hardware stamp (hwstamp)
> > - add a new socket flag to enable RX stamping
> > - add a new control message to retrieve the hwstamp from the skb to
> > user-space application (for UDP and maybe PF_PACKET)
>
> The existing skb timestamp is there, and if the hardware supports it, it
> could be updated by the device driver. I had a version of sky2 that did
> just that but never fully pushed it upstream because of available time and
> testing issues.
>
> The APIs are already there (and used) for timestamping; don't invent
> new ones.

Hi Stephen,

Thanks for taking the time to respond.

The hardware we will be using will not have the timestamping unit synchronized
to gettimeofday(). Under these conditions, is it OK to put our hw stamp into
skb->tstamp?

Also, are there any APIs we can use on the TX side as well?

Thanks,
tavi
* Re: [RFC] support for IEEE 1588
From: Rick Jones @ 2008-07-04 0:15 UTC
To: Octavian Purdila; +Cc: Stephen Hemminger, netdev

Octavian Purdila wrote:
> On Friday 04 July 2008, Stephen Hemminger wrote:
>
>>>1. RX path
>>>- add a new field in skb to keep the hardware stamp (hwstamp)
>>>- add a new socket flag to enable RX stamping
>>>- add a new control message to retrieve the hwstamp from the skb to
>>>user-space application (for UDP and maybe PF_PACKET)
>>
>>The existing skb timestamp is there, and if the hardware supports it, it
>>could be updated by the device driver. I had a version of sky2 that did
>>just that but never fully pushed it upstream because of available time and
>>testing issues.
>>
>>The APIs are already there (and used) for timestamping; don't invent
>>new ones.
>
> Hi Stephen,
>
> Thanks for taking the time to respond.
>
> The hardware we will be using will not have the timestamping unit synchronized
> to gettimeofday(). Under these conditions, is it OK to put our hw stamp into
> skb->tstamp?

I've not had a good Emily Litella moment in at least a week, so I'll ask
- if the clock in the hardware generating the timestamp and the clock in
the host aren't synchronized in _some_ way, what benefit is there to
putting the hardware's timestamp in there?

rick jones

wonders the extent to which 1588 might enable one-way latency
measurements in something like netperf...
* Re: [RFC] support for IEEE 1588
From: Octavian Purdila @ 2008-07-04 0:42 UTC
To: Rick Jones; +Cc: Stephen Hemminger, netdev

On Friday 04 July 2008, Rick Jones wrote:
> I've not had a good Emily Litella moment in at least a week, so I'll ask
> - if the clock in the hardware generating the timestamp and the clock in
> the host aren't synchronized in _some_ way, what benefit is there to
> putting the hardware's timestamp in there?

We actually currently use them for delay/jitter calculation in conjunction
with having the RX and TX ports' source timestamping units running in sync. We
can do that since both the RX and TX ports (Linux based) run in our
hardware (chassis).

I guess we could try to do a simple sync between the host clock and the hw
clock by getting the initial delta between the two. But since the two clocks
are not in sync, they will diverge over time.

And since I do not know enough about the way in which tstamp is currently
used, I'm not very confident that this will not break something... so, back to
grepping :)

Thanks,
tavi
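Editorial note on the divergence mentioned in the message above: a one-shot delta between host clock and hw clock goes stale at a rate set by the relative frequency error of the two oscillators. The sketch below is plain arithmetic; the function name and the 10 ppm figure in the usage note are illustrative assumptions, not measurements from this thread.

```python
def one_shot_offset_error_ns(drift_ppm: float, elapsed_s: float) -> float:
    """Error accumulated by a fixed initial delta between two free-running clocks.

    drift_ppm is the assumed relative frequency error in parts per million.
    1 ppm means the clocks move apart by 1 microsecond (1000 ns) every second.
    """
    return drift_ppm * 1_000.0 * elapsed_s
```

For example, with an assumed 10 ppm error the initial delta would be off by 600 microseconds after one minute, which is far coarser than the resolution of a hardware stamp.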
* Re: [RFC] support for IEEE 1588
From: Andi Kleen @ 2008-07-04 11:24 UTC
To: Rick Jones; +Cc: Octavian Purdila, Stephen Hemminger, netdev

Rick Jones <rick.jones2@hp.com> writes:
>
> I've not had a good emily litella moment in at least a week, so I'll
> ask - if the clock in the hardware generating the timestamp and the
> clock in the host aren't synchronized in _some_ way, what benefit is
> there to putting the hardware's timestamp in there?

The point is to synchronize them using a high quality data stamp.

-Andi
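Editorial note: the reason PTP needs both RX and TX stamps can be seen from the standard delay request-response arithmetic. The sketch below is not taken from this thread; it assumes a symmetric path and uses hypothetical names. `t1` is the master's TX time of SYNC, `t2` the slave's RX time of SYNC, `t3` the slave's TX time of DELAY_REQUEST and `t4` the master's RX time of DELAY_REQUEST.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Return (slave clock offset from master, one-way path delay).

    Assumes the path delay is the same in both directions.
    All four inputs are in the same unit, for example nanoseconds.
    """
    master_to_slave = t2 - t1   # path delay + offset
    slave_to_master = t4 - t3   # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay
```

Any error in one of the four stamps feeds directly into the computed offset, which is why stamps taken close to the wire are preferable to stamps taken in the IP stack.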
* Re: [RFC] support for IEEE 1588
From: Patrick Ohly @ 2008-07-04 13:37 UTC
To: netdev

Hallo Tavi,

Interesting initiative. I'm employed by Intel and had the chance to do
some exploratory work on software PTP support for Intel's new 82576
Gigabit Ethernet Controller [1], which introduces hardware time stamping
for PTP packets. I modified the open source PTPd so that it uses the
more accurate hardware time stamps instead of time stamps generated by
the Linux IP stack. The advantage was 50x higher accuracy under load.
You can read more about that in a paper [2].

[1] http://download.intel.com/design/network/ProdBrf/320025.pdf
[2] http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf

In order to get these time stamps and read the clock inside the NIC
which generates these time stamps, we had to add ioctl() calls to the
igb driver - not nice and certainly not a suitable long-term solution.
If there is a consensus on a better user space API and the Linux IP
stack gets a general framework for PTP, then perhaps it could also be
used with Intel's new NICs.

Note that I'm not speaking in any official capacity for Intel here, just
expressing my own opinion (and hope). I'm not even in the network team.
I cannot release the PTPd and igb patches right now because that would
require legal approval, but if there is interest I can get that process
started. There's no reason not to do that.

So, let's move on to Tavi's proposal:

On Fri, 2008-07-04 at 01:47 +0300, Octavian Purdila wrote:
> 1. RX path
> - add a new field in skb to keep the hardware stamp (hwstamp)
> - add a new socket flag to enable RX stamping
> - add a new control message to retrieve the hwstamp from the skb to user-space
> application (for UDP and maybe PF_PACKET)

I agree. Currently there is something similar with SO_TIMESTAMP and
SCM_TIMESTAMP, but the problem with those is that only a timeval is
returned, i.e., accuracy is limited to microseconds. To make full use of
hardware time stamps we'll want a timespec with nanoseconds.

We also need something more flexible than SO_TIMESTAMP. Depending on
what the user space program wants to measure, it would be useful to time
stamp
* the various flavors of PTP packets (v1/v2/802.1as,
  SYNC/DELAY_REQUEST) selectively
* all packets

The hardware might not be capable of supporting all modes, but at least
the API should support them and provide room for future extensions.

It would be possible to fall back to time stamping using system time if
the hardware is incapable of implementing the requested operation.
Depending on how that fallback is implemented, PTPd's accuracy might be
improved even without any hardware support.

> 2. TX path - this is a bit more complicated since we need a new mechanism to
> wait for a packet transmission on wire, from user space.
> - add a new flag for the skb to request TX stamping
> - add a new control message to propagate the TX stamping request from
> userspace to the skb

Forgive my ignorance; can you provide more details on how that would
work?

How about adding a new flag for send/sendto/sendmsg() instead of a new
control message?

> - when the driver sends the packet, it will get the stamp from the TX
> completion ring; the driver will then propagate the stamp either to
> (a) the skb stamp field, or (b) some special structure - this to avoid keeping
> the skb around
> - the special structure or the skb will be linked to a special queue in the
> socket and a POLLPRI event will be generated
> - the application will use recvmsg and will receive a new control message
> which contains the timestamp from the socket special queue

Sounds a bit complicated to me. The trick currently used by PTPd might
be more elegant and/or require fewer changes: it enables looping of
outgoing packets with IP_MULTICAST_LOOP. The RX timestamp of the looped
packet is then used as an approximation for the TX time stamp of the
original outgoing packet. Clearly this is inaccurate, in particular
under load, but it is very easy to use.

When a driver gets a skb with the request to generate a TX time stamp,
it could send the packet, upon completion obtain the time stamp from the
hardware and feed the packet and the time stamp back to the upper layers
as if it had just been received. Would that work?

The user space then obtains TX time stamps just like RX time stamps and
can use the payload to determine what kind of time stamp it got. That
also avoids the need for special cookies to detect packet loss or
reordering.

So far all that we get out of this is access to the raw time stamps.
There may be some use for that, as Tavi said, but it would be a lot more
interesting if the kernel would transform the raw time stamps into
system time stamps if the user space process wants that. Then it can be
used by a modified PTPd to synchronize the system time inside a cluster
a lot more accurately than is currently possible with NTP (think
sub-microsecond accuracy instead of milliseconds).

On Fri, 2008-07-04 at 03:42 +0300, Octavian Purdila wrote:
> I guess we could try to do a simple sync between the host clock and the hw
> clock by getting the initial delta between the two. But since the two clocks
> are not in sync, they will diverge in time.

For the paper I tried out two different ways of synchronizing the system
time with the NIC time. The one called "Assisted System Time" could be
implemented relatively easily inside the IP stack: the driver only has
to provide access to the NIC's hardware clock. Then the layer above it
can sample the system time/NIC time offset at regular intervals; when
they drift apart, that drift rate can be tracked as part of the
measurements and be taken into account when transforming from one time
base into the other. The other method ("Two-Level PTP") is more
complicated and didn't bring much benefit.

Bye, Patrick
* Re: [RFC] support for IEEE 1588
From: Octavian Purdila @ 2008-07-05 0:21 UTC
To: Patrick Ohly; +Cc: netdev

On Friday 04 July 2008, Patrick Ohly wrote:
> Hallo Tavi,
>
> Interesting initiative. I'm employed by Intel and had the chance to do
> some exploratory work on software PTP support for Intel's new 82576
> Gigabit Ethernet Controller [1], which introduces hardware time stamping
> for PTP packets. I modified the open source PTPd so that it uses the
> more accurate hardware time stamps instead of time stamps generated by
> the Linux IP stack. The advantage was 50x higher accuracy under load.
> You can read more about that in a paper [2].

Nice work, will need some time to chew on the paper :)

> > 2. TX path - this is a bit more complicated since we need a new mechanism
> > to wait for a packet transmission on wire, from user space.
> > - add a new flag for the skb to request TX stamping
> > - add a new control message to propagate the TX stamping request from
> > userspace to the skb
>
> Forgive my ignorance; can you provide more details on how that would
> work?
>
> How about adding a new flag for send/sendto/sendmsg() instead of a new
> control message?

The control message will allow us to associate a cookie with the skb (say, for
instance, that the app will receive the value of the skb pointer). That cookie
will be returned when we get the TX stamp, and will thus allow us to
match the stamp and the packet.

For the PTPd, this is probably not required, but it will help with
applications that have multiple outstanding TX packets.

> > - when the driver sends the packet, it will get the stamp from the TX
> > completion ring; the driver will then propagate the stamp either to
> > (a) the skb stamp field, or (b) some special structure - this to avoid
> > keeping the skb around
> > - the special structure or the skb will be linked to a special queue in
> > the socket and a POLLPRI event will be generated
> > - the application will use recvmsg and will receive a new control message
> > which contains the timestamp from the socket special queue
>
> Sounds a bit complicated to me. The trick currently used by PTPd might
> be more elegant and/or require fewer changes: it enables looping of
> outgoing packets with IP_MULTICAST_LOOP. The RX timestamp of the looped
> packet is then used as an approximation for the TX time stamp of the
> original outgoing packet. Clearly this is inaccurate, in particular
> under load, but it is very easy to use.

I am probably missing something: I thought IP_MULTICAST_LOOP is done in
software... If so, how would the hardware be able to timestamp?

> When a driver gets a skb with the request to generate a TX time stamp,
> it could send the packet, upon completion obtain the time stamp from the
> hardware and feed the packet and the time stamp back to the upper layers
> as if it had just been received. Would that work?
>
> The user space then obtains TX time stamps just like RX time stamps and
> can use the payload to determine what kind of time stamp it got. That
> also avoids the need for special cookies to detect packet loss or
> reordering.

For generic protocols (not PTP) I think this will not work: e.g. a UDP
packet could be dropped or hit the wrong socket (due to a mismatch between the
source and destination port).

> So far all that we get out of this is access to the raw time stamps.
> There may be some use for that, as Tavi said, but it would be a lot more
> interesting if the kernel would transform the raw time stamps into
> system time stamps if the user space process wants that. Then it can be
> used by a modified PTPd to synchronize the system time inside a cluster
> a lot more accurately than is currently possible with NTP (think
> sub-microsecond accuracy instead of milliseconds).
>
> For the paper I tried out two different ways of synchronizing the system
> time with the NIC time. The one called "Assisted System Time" could be
> implemented relatively easily inside the IP stack: the driver only has
> to provide access to the NIC's hardware clock. Then the layer above it
> can sample the system time/NIC time offset at regular intervals; when
> they drift apart, that drift rate can be tracked as part of the
> measurements and be taken into account when transforming from one time
> base into the other. The other method ("Two-Level PTP") is more
> complicated and didn't bring much benefit.

Will look into it, thanks for pointing it out.

Thanks,
tavi
* Re: [RFC] support for IEEE 1588
From: Patrick Ohly @ 2008-07-07 12:34 UTC
To: Octavian Purdila; +Cc: netdev

On Sat, 2008-07-05 at 03:21 +0300, Octavian Purdila wrote:
> On Friday 04 July 2008, Patrick Ohly wrote:
[cookie for TX time stamp]
> For the PTPd, this is probably not required, but it will help with
> applications that have multiple outstanding TX packets.

I think it is relevant also for PTPd: without such a cookie the next
SYNC cannot be sent unless the previous time stamp was obtained
successfully. If the packet is lost without triggering the hardware time
stamping (which I have seen happen in practice under load), then the
PTPd would get stuck. With a cookie it can send SYNCs at the normal
intervals and be sure not to mix up time stamps.

This is more of a theoretical problem, though: if there is no time stamp
after two seconds, the corresponding packet most likely got lost and it
is fairly safe to send a new SYNC and to assume that the next time stamp
will be for that SYNC.

> > > - when the driver sends the packet, it will get the stamp from the TX
> > > completion ring; the driver will then propagate the stamp either to
> > > (a) the skb stamp field, or (b) some special structure - this to avoid
> > > keeping the skb around
> > > - the special structure or the skb will be linked to a special queue in
> > > the socket and a POLLPRI event will be generated
> > > - the application will use recvmsg and will receive a new control message
> > > which contains the timestamp from the socket special queue
> >
> > Sounds a bit complicated to me. The trick currently used by PTPd might
> > be more elegant and/or require fewer changes: it enables looping of
> > outgoing packets with IP_MULTICAST_LOOP. The RX timestamp of the looped
> > packet is then used as an approximation for the TX time stamp of the
> > original outgoing packet. Clearly this is inaccurate, in particular
> > under load, but it is very easy to use.
>
> I am probably missing something: I thought IP_MULTICAST_LOOP is done in
> software... If so, how would the hardware be able to timestamp?

I was just using IP_MULTICAST_LOOP+SO_TIMESTAMP as an example of how a
similar mechanism could work for hardware time stamping.

> > When a driver gets a skb with the request to generate a TX time stamp,
> > it could send the packet, upon completion obtain the time stamp from the
> > hardware and feed the packet and the time stamp back to the upper layers
> > as if it had just been received. Would that work?
> >
> > The user space then obtains TX time stamps just like RX time stamps and
> > can use the payload to determine what kind of time stamp it got. That
> > also avoids the need for special cookies to detect packet loss or
> > reordering.
>
> For generic protocols (not PTP) I think this will not work: e.g. a UDP
> packet could be dropped or hit the wrong socket (due to a mismatch between the
> source and destination port).

Right. Perhaps that could be solved with an additional control message
which specifies where the bounced packet is to be sent to, but that is
not much (not at all?) easier than the cookie approach.

I'm confident that adapting PTPd to the cookie approach wouldn't be
difficult, so if you want to implement it, then go for it.

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.

The email footer below is automatically added to comply with company
policy; this particular email is not confidential and does not have
a limited set of recipients. Therefore it can be redistributed and
discussed without restrictions.
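Editorial sketch of the timeout reasoning in the message above: a sender that tracks outstanding TX stamp requests by cookie can give up on a stamp after a fixed wait and carry on with the next SYNC. The two-second default comes from the message; the function name and the cookie-to-send-time dictionary are hypothetical.

```python
from typing import Dict, List

STAMP_TIMEOUT_S = 2.0  # "no time stamp after two seconds" from the message above


def expire_pending(pending: Dict[int, float], now_s: float,
                   timeout_s: float = STAMP_TIMEOUT_S) -> List[int]:
    """Drop cookies whose TX stamp never arrived and return them, sorted.

    pending maps cookie -> send time in seconds and is modified in place.
    """
    expired = sorted(c for c, sent_s in pending.items()
                     if now_s - sent_s >= timeout_s)
    for cookie in expired:
        del pending[cookie]
    return expired
```

A daemon could call this before each SYNC interval, so that a lost packet costs one missing stamp rather than stalling the sender.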
* Re: [RFC] support for IEEE 1588
From: Lennart Sorensen @ 2008-07-09 15:31 UTC
To: Octavian Purdila; +Cc: netdev

On Fri, Jul 04, 2008 at 01:47:11AM +0300, Octavian Purdila wrote:
>
> Hi everybody,
>
> IEEE 1588 Precision Time Protocol [1] requires hardware timestamping support
> for both RX and TX frames. It seems that in Linux we do not have the support
> required for this protocol to be implemented.

ptpd in user space plus some specialized hardware is doing it for us.
No kernel changes were required. Of course the specialized hardware
maintains time and receives it from GPS, so I guess that's only one
possible use. As a client the hardware deals with the 1588 packets and
generates NMEA style messages on a serial port and makes it look to ntp
as if it is receiving time from a standard GPS with PPS on the CD pin.

To support receiving 1588 on one port and forwarding it out another
would certainly need more support both from hardware and probably
software.

> Any feedback on the approach we are planning to take is greatly appreciated. We
> will follow up with a patch at some point, but I just want to check with you
> gurus early, to avoid potential design flaws. If a patch is preferred for
> commenting on, then please ignore this and we will come back later with the
> patch.
>
> 1. RX path
> - add a new field in skb to keep the hardware stamp (hwstamp)
> - add a new socket flag to enable RX stamping
> - add a new control message to retrieve the hwstamp from the skb to user-space
> application (for UDP and maybe PF_PACKET)
>
> 2. TX path - this is a bit more complicated since we need a new mechanism to
> wait for a packet transmission on wire, from user space.
> - add a new flag for the skb to request TX stamping
> - add a new control message to propagate the TX stamping request from
> userspace to the skb
> - when the driver sends the packet, it will get the stamp from the TX
> completion ring; the driver will then propagate the stamp either to
> (a) the skb stamp field, or (b) some special structure - this to avoid keeping
> the skb around
> - the special structure or the skb will be linked to a special queue in the
> socket and a POLLPRI event will be generated
> - the application will use recvmsg and will receive a new control message
> which contains the timestamp from the socket special queue
>
> We will probably need to associate a cookie with each TX stamping control
> message which will be retrieved in the later control message, so that the
> application can match sent packets with timestamps.
>
> [1] http://ieee1588.nist.gov/tutorial-basic.pdf

-- 
Len Sorensen