* [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
@ 2008-10-22 8:17 ` Patrick Ohly
2008-11-12 7:41 ` Eric Dumazet
2008-11-12 9:58 ` David Miller
2008-10-22 12:46 ` [RFC PATCH 01/13] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller Patrick Ohly
` (12 subsequent siblings)
13 siblings, 2 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-10-22 8:17 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
If generated in hardware, then the driver must convert to system
time before storing the transformed value in sk_buff with skb_hwtstamp_set().
If conversion back to the original hardware time stamp is desired,
then the driver needs to implement the hwtstamp_raw() callback, which
is called by skb_hwtstamp_raw().
The purpose of the new skb_* methods is the hiding of how hardware
time stamps are really stored. Later they might be stored in an extra
field instead of mangling the existing tstamp.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/netdevice.h | 12 +++++++
include/linux/skbuff.h | 76 +++++++++++++++++++++++++++++++++++++++++++-
net/core/skbuff.c | 32 +++++++++++++++++++
3 files changed, 118 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 488c56e..4da51cb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -749,6 +749,18 @@ struct net_device
/* for setting kernel sock attribute on TCP connection setup */
#define GSO_MAX_SIZE 65536
unsigned int gso_max_size;
+
+ /* hardware time stamping support */
+#define HAVE_HW_TIME_STAMP
+ /* Transforms skb->tstamp back to the original, raw hardware
+ * time stamp. The value must have been generated by the
+ * device. Implementing this is optional, but necessary for
+ * SO_TIMESTAMP_HARDWARE.
+ *
+ * Returns 1 if value could be retrieved, 0 otherwise.
+ */
+ int (*hwtstamp_raw)(const struct sk_buff *skb,
+ struct timespec *stamp);
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9099237..0b3b36a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -199,7 +199,10 @@ typedef unsigned char *sk_buff_data_t;
* @next: Next buffer in list
* @prev: Previous buffer in list
* @sk: Socket we are owned by
- * @tstamp: Time we arrived
+ * @tstamp: Time we arrived: usually generated by ktime_get_real() and
+ * thus is recorded in system time. If the lowest bit is set,
+ * then the value was originally generated by a different clock
+ * in the receiving hardware and then transformed to system time.
* @dev: Device we arrived on/are leaving by
* @transport_header: Transport layer header
* @network_header: Network layer header
@@ -1524,23 +1527,52 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
extern void skb_init(void);
+/** returns skb->tstamp without the bit which marks hardware time stamps */
+static inline union ktime skb_get_ktime(const struct sk_buff *skb)
+{
+#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ return ktime_set(skb->tstamp.tv.sec,
+ skb->tstamp.tv.nsec & ~1);
+#else
+ return (ktime_t) { .tv64 = skb->tstamp.tv64 & ~1UL };
+#endif
+}
+
/**
* skb_get_timestamp - get timestamp from a skb
* @skb: skb to get stamp from
* @stamp: pointer to struct timeval to store stamp in
*
* Timestamps are stored in the skb as offsets to a base timestamp.
+ * The lowest bit is set if and only if the time stamp was originally
+ * created by hardware when processing the packet.
+ *
* This function converts the offset back to a struct timeval and stores
* it in stamp.
*/
static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp)
{
- *stamp = ktime_to_timeval(skb->tstamp);
+ *stamp = ktime_to_timeval(skb_get_ktime(skb));
+}
+
+static inline void skb_get_timestampns(const struct sk_buff *skb, struct timespec *stamp)
+{
+ *stamp = ktime_to_timespec(skb_get_ktime(skb));
}
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
+
+ /*
+ * make sure that lowest bit is never set: it marks hardware
+ * time stamps
+ */
+#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
+#else
+ skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
+#endif
}
static inline ktime_t net_timedelta(ktime_t t)
@@ -1553,6 +1585,46 @@ static inline ktime_t net_invalid_timestamp(void)
return ktime_set(0, 0);
}
+/**
+ * checks whether the time stamp value has been set (= non-zero)
+ * and really came from hardware
+ */
+static inline int skb_hwtstamp_available(const struct sk_buff *skb)
+{
+ return skb->tstamp.tv64 & 1;
+}
+
+static inline void skb_hwtstamp_set(struct sk_buff *skb,
+ union ktime stamp)
+{
+#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ skb->tstamp.tv.sec = stamp.tv.sec;
+ skb->tstamp.tv.nsec = stamp.tv.nsec | 1;
+#else
+ skb->tstamp.tv64 = stamp.tv64 | 1;
+#endif
+}
+
+/**
+ * Fills the timespec with the original, "raw" time stamp as generated
+ * by the hardware when it processed the packet and returns 1 if such
+ * a hardware time stamp is available and could be retrieved. Otherwise
+ * it returns 0.
+ */
+int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp);
+
+/**
+ * Fills the timespec with the hardware time stamp generated when the
+ * hardware processed the packet, transformed to system time. Beware
+ * that this transformation is not perfect: packet A received on
+ * interface 1 before packet B on interface 2 might have a higher
+ * transformed time stamp.
+ *
+ * Returns 1 if a transformed hardware time stamp is available, 0
+ * otherwise.
+ */
+int skb_hwtstamp_transformed(const struct sk_buff *skb, struct timespec *stamp);
+
extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ca1ccdf..7a95062 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -44,6 +44,7 @@
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/slab.h>
+#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#ifdef CONFIG_NET_CLS_ACT
#include <net/pkt_sched.h>
@@ -2323,6 +2324,37 @@ err:
EXPORT_SYMBOL_GPL(skb_segment);
+int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
+{
+ struct rtable *rt;
+ struct in_device *idev;
+ struct net_device *netdev;
+
+ if (skb_hwtstamp_available(skb) &&
+ (rt = skb->rtable) != NULL &&
+ (idev = rt->idev) != NULL &&
+ (netdev = idev->dev) != NULL &&
+ netdev->hwtstamp_raw) {
+ return netdev->hwtstamp_raw(skb, stamp);
+ } else {
+ return 0;
+ }
+}
+
+EXPORT_SYMBOL_GPL(skb_hwtstamp_raw);
+
+int skb_hwtstamp_transformed(const struct sk_buff *skb, struct timespec *stamp)
+{
+ if (skb_hwtstamp_available(skb)) {
+ skb_get_timestampns(skb, stamp);
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+EXPORT_SYMBOL_GPL(skb_hwtstamp_transformed);
+
void __init skb_init(void)
{
skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-10-22 8:17 ` [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps Patrick Ohly
@ 2008-11-12 7:41 ` Eric Dumazet
2008-11-12 8:09 ` Patrick Ohly
2008-11-12 9:58 ` David Miller
1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2008-11-12 7:41 UTC (permalink / raw)
To: Patrick Ohly
Cc: netdev, Octavian Purdila, Stephen Hemminger, Ingo Oeser,
Andi Kleen, John Ronciak, Oliver Hartkopp
Patrick Ohly wrote:
> If generated in hardware, then the driver must convert to system
> time before storing the transformed value in sk_buff with skb_hwtstamp_set().
> If conversion back to the original hardware time stamp is desired,
> then the driver needs to implement the hwtstamp_raw() callback, which
> is called by skb_hwtstamp_raw().
>
> The purpose of the new skb_* methods is the hiding of how hardware
> time stamps are really stored. Later they might be stored in an extra
> field instead of mangling the existing tstamp.
>
> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
> ---
> include/linux/netdevice.h | 12 +++++++
> include/linux/skbuff.h | 76 +++++++++++++++++++++++++++++++++++++++++++-
> net/core/skbuff.c | 32 +++++++++++++++++++
> 3 files changed, 118 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 488c56e..4da51cb 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -749,6 +749,18 @@ struct net_device
> /* for setting kernel sock attribute on TCP connection setup */
> #define GSO_MAX_SIZE 65536
> unsigned int gso_max_size;
> +
> + /* hardware time stamping support */
> +#define HAVE_HW_TIME_STAMP
> + /* Transforms skb->tstamp back to the original, raw hardware
> + * time stamp. The value must have been generated by the
> + * device. Implementing this is optional, but necessary for
> + * SO_TIMESTAMP_HARDWARE.
> + *
> + * Returns 1 if value could be retrieved, 0 otherwise.
> + */
> + int (*hwtstamp_raw)(const struct sk_buff *skb,
> + struct timespec *stamp);
> };
> #define to_net_dev(d) container_of(d, struct net_device, dev)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9099237..0b3b36a 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -199,7 +199,10 @@ typedef unsigned char *sk_buff_data_t;
> * @next: Next buffer in list
> * @prev: Previous buffer in list
> * @sk: Socket we are owned by
> - * @tstamp: Time we arrived
> + * @tstamp: Time we arrived: usually generated by ktime_get_real() and
> + * thus is recorded in system time. If the lowest bit is set,
> + * then the value was originally generated by a different clock
> + * in the receiving hardware and then transformed to system time.
> * @dev: Device we arrived on/are leaving by
> * @transport_header: Transport layer header
> * @network_header: Network layer header
> @@ -1524,23 +1527,52 @@ static inline void skb_copy_to_linear_data_offset(struct sk_buff *skb,
>
> extern void skb_init(void);
Please use ktime_t instead of "union ktime"
>
> +/** returns skb->tstamp without the bit which marks hardware time stamps */
> +static inline union ktime skb_get_ktime(const struct sk_buff *skb)
> +{
> +#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
> + return ktime_set(skb->tstamp.tv.sec,
> + skb->tstamp.tv.nsec & ~1);
> +#else
> + return (ktime_t) { .tv64 = skb->tstamp.tv64 & ~1UL };
> +#endif
> +}
> +
> /**
> * skb_get_timestamp - get timestamp from a skb
> * @skb: skb to get stamp from
> * @stamp: pointer to struct timeval to store stamp in
> *
> * Timestamps are stored in the skb as offsets to a base timestamp.
> + * The lowest bit is set if and only if the time stamp was originally
> + * created by hardware when processing the packet.
> + *
> * This function converts the offset back to a struct timeval and stores
> * it in stamp.
> */
> static inline void skb_get_timestamp(const struct sk_buff *skb, struct timeval *stamp)
> {
> - *stamp = ktime_to_timeval(skb->tstamp);
> + *stamp = ktime_to_timeval(skb_get_ktime(skb));
> +}
> +
> +static inline void skb_get_timestampns(const struct sk_buff *skb, struct timespec *stamp)
> +{
> + *stamp = ktime_to_timespec(skb_get_ktime(skb));
> }
>
> static inline void __net_timestamp(struct sk_buff *skb)
> {
> skb->tstamp = ktime_get_real();
> +
> + /*
> + * make sure that lowest bit is never set: it marks hardware
> + * time stamps
> + */
> +#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
> + skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
.tv.sec? Are you sure you don't want .tv.nsec?
> +#else
> + skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
> +#endif
> }
>
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-11-12 7:41 ` Eric Dumazet
@ 2008-11-12 8:09 ` Patrick Ohly
2008-11-12 10:09 ` David Miller
0 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-11-12 8:09 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev@vger.kernel.org, Octavian Purdila, Stephen Hemminger,
Ingo Oeser, Andi Kleen, Ronciak, John, Oliver Hartkopp
On Wed, 2008-11-12 at 07:41 +0000, Eric Dumazet wrote:
> Please use ktime_t instead of "union ktime"
Are you sure?
include/linux/ktime.h says
typedef union ktime ktime_t; /* Kill this */
and the CodingStyle also seems to be against it.
I thought it would be good to avoid using the typedef in new code, but
if consistency with the existing code is preferred, then I'll change it.
> > +
> > + /*
> > + * make sure that lowest bit is never set: it marks hardware
> > + * time stamps
> > + */
> > +#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
> > + skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
>
> .tv.sec? Are you sure you don't want .tv.nsec?
Eek! Right. I'm pretty sure I compiled this in 32 bit mode, but I
haven't actually tried the result.
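
The effect of the slip discussed here can be seen in a small userspace model. This is only a sketch: `struct split_time` and the helper names are hypothetical stand-ins for the non-scalar (sec, nsec) time layout, not kernel code. Clearing the low bit of the seconds field can move the stamp back by a full second and leaves the marker bit in the nanoseconds field untouched, while clearing it in the nanoseconds field costs at most 1 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the split (sec, nsec) time layout. */
struct split_time {
	int32_t sec;
	int32_t nsec;
};

/* Mirrors the posted non-scalar branch: rounds sec down to an even value. */
struct split_time clear_mark_in_sec(struct split_time t)
{
	t.sec = t.sec / 2 * 2;
	return t;
}

/* Presumably intended variant: clear bit 0 of nsec only. */
struct split_time clear_mark_in_nsec(struct split_time t)
{
	t.nsec &= ~1;
	return t;
}

/* Total nanoseconds, used to compare the error of both variants. */
int64_t to_ns(struct split_time t)
{
	return (int64_t)t.sec * 1000000000LL + t.nsec;
}
```

With an odd seconds value the first variant loses exactly one second, which is why the marker has to live in the nanoseconds field.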
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-11-12 8:09 ` Patrick Ohly
@ 2008-11-12 10:09 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 10:09 UTC (permalink / raw)
To: patrick.ohly
Cc: dada1, netdev, opurdila, shemminger, netdev, ak, john.ronciak,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 12 Nov 2008 09:09:04 +0100
> On Wed, 2008-11-12 at 07:41 +0000, Eric Dumazet wrote:
> > Please use ktime_t instead of "union ktime"
>
> Are you sure?
>
> include/linux/ktime.h says
> typedef union ktime ktime_t; /* Kill this */
> and the CodingStyle also seems to be against it.
>
> I thought it would be good to avoid using the typedef in new code, but
> if consistency with the existing code is preferred, then I'll change it.
Well you then go ahead and cast return values to "ktime_t"
so this code is not even being consistent about the choice.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-10-22 8:17 ` [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps Patrick Ohly
2008-11-12 7:41 ` Eric Dumazet
@ 2008-11-12 9:58 ` David Miller
2008-11-19 12:50 ` Patrick Ohly
1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2008-11-12 9:58 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 22 Oct 2008 10:17:24 +0200
> +int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
> +{
> + struct rtable *rt;
> + struct in_device *idev;
> + struct net_device *netdev;
> +
> + if (skb_hwtstamp_available(skb) &&
> + (rt = skb->rtable) != NULL &&
> + (idev = rt->idev) != NULL &&
> + (netdev = idev->dev) != NULL &&
> + netdev->hwtstamp_raw) {
> + return netdev->hwtstamp_raw(skb, stamp);
> + } else {
> + return 0;
> + }
> +}
> +
> +EXPORT_SYMBOL_GPL(skb_hwtstamp_raw);
You can't be accessing the generic destination cache entry attached to
the SKB, here in generic SKB code, as a pointer to an ipv4 specific
route object. What if this is an IPV6 or DECNET packet?
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps
2008-11-12 9:58 ` David Miller
@ 2008-11-19 12:50 ` Patrick Ohly
0 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-19 12:50 UTC (permalink / raw)
To: David Miller; +Cc: netdev@vger.kernel.org
On Wed, 2008-11-12 at 09:58 +0000, David Miller wrote:
> From: Patrick Ohly <patrick.ohly@intel.com>
> Date: Wed, 22 Oct 2008 10:17:24 +0200
>
> > +int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
> > +{
> > + struct rtable *rt;
> > + struct in_device *idev;
> > + struct net_device *netdev;
> > +
> > + if (skb_hwtstamp_available(skb) &&
> > + (rt = skb->rtable) != NULL &&
> > + (idev = rt->idev) != NULL &&
> > + (netdev = idev->dev) != NULL &&
> > + netdev->hwtstamp_raw) {
> > + return netdev->hwtstamp_raw(skb, stamp);
> > + } else {
> > + return 0;
> > + }
> > +}
> > +
> > +EXPORT_SYMBOL_GPL(skb_hwtstamp_raw);
>
> You can't be accessing the generic destination cache entry attached to
> the SKB, here in generic SKB code, as a pointer to an ipv4 specific
> route object. What if this is an IPV6 or DECNET packet?
Yes, this is problematic. The revised patch still depends on this
pointer chain, now to convert the raw hardware time stamp to system
time. I don't see any clean solution right now except adding both raw
and transformed value to the skb (16 additional bytes instead of just 8,
or zero as in the original patch).
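
As a rough illustration of the alternative mentioned above, the sketch below models a packet record that carries both values side by side. The struct, field, and function names are hypothetical and not taken from any posted patch. The point is only that two 64-bit stamps cost 16 bytes, and that an all-zero value can serve as the "not available" marker instead of borrowing the lowest bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of stamps stored next to each other. */
struct hw_stamps_model {
	int64_t raw_ns;	/* as read from the NIC clock */
	int64_t sys_ns;	/* same event, transformed to system time */
};

/* Returns 1 and stores the raw stamp if one was recorded, 0 otherwise. */
int model_hwtstamp_raw(const struct hw_stamps_model *s, int64_t *out)
{
	if (s->raw_ns == 0)
		return 0;
	*out = s->raw_ns;
	return 1;
}
```

In this model no lookup through a protocol-specific route object is needed, because the raw value travels with the packet record itself.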
--
Best Regards, Patrick Ohly
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 01/13] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
2008-10-22 8:17 ` [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps Patrick Ohly
@ 2008-10-22 12:46 ` Patrick Ohly
2008-11-12 9:55 ` David Miller
2008-10-22 15:01 ` [RFC PATCH 03/13] user space API for time stamping of incoming and outgoing packets Patrick Ohly
` (11 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-10-22 12:46 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
In __sock_recv_timestamp() the additional SCM_TIMESTAMP[NS] is used. This
has the same value as SO_TIMESTAMP[NS], so this is a purely cosmetic change.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
net/compat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/compat.c b/net/compat.c
index 67fb6a3..6ce1a1c 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -226,14 +226,14 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
return 0; /* XXX: return error? check spec. */
}
- if (level == SOL_SOCKET && type == SO_TIMESTAMP) {
+ if (level == SOL_SOCKET && type == SCM_TIMESTAMP) {
struct timeval *tv = (struct timeval *)data;
ctv.tv_sec = tv->tv_sec;
ctv.tv_usec = tv->tv_usec;
data = &ctv;
len = sizeof(ctv);
}
- if (level == SOL_SOCKET && type == SO_TIMESTAMPNS) {
+ if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) {
struct timespec *ts = (struct timespec *)data;
cts.tv_sec = ts->tv_sec;
cts.tv_nsec = ts->tv_nsec;
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 01/13] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller
2008-10-22 12:46 ` [RFC PATCH 01/13] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller Patrick Ohly
@ 2008-11-12 9:55 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 9:55 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 22 Oct 2008 14:46:39 +0200
> In __sock_recv_timestamp() the additional SCM_TIMESTAMP[NS] is used. This
> has the same value as SO_TIMESTAMP[NS], so this is a purely cosmetic change.
>
> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
I've applied this to net-2.6 as really this is a correction.
Thanks!
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 03/13] user space API for time stamping of incoming and outgoing packets
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
2008-10-22 8:17 ` [RFC PATCH 02/13] extended semantic of sk_buff::tstamp: lowest bit marks hardware time stamps Patrick Ohly
2008-10-22 12:46 ` [RFC PATCH 01/13] put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller Patrick Ohly
@ 2008-10-22 15:01 ` Patrick Ohly
2008-11-12 10:02 ` David Miller
2008-10-24 13:41 ` [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support Patrick Ohly
` (10 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-10-22 15:01 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
New socket option SO_TIMESTAMPING. Supersedes SO_TIMESTAMP[NS]. The
overlap with the old SO_TIMESTAMP[NS] options is handled so that time
stamping in software (net_enable_timestamp()) is enabled when
SO_TIMESTAMP[NS] and/or SOF_TIMESTAMPING_RX_SOFTWARE is set. It's
disabled if all of these are off.
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
New SIOCSHWTSTAMP ioctl number which controls the hardware which
does the hardware time stamping. Must be implemented by each
device driver which supports hardware time stamping together
with the new time stamp transformation methods in struct
net_device.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
Documentation/networking/timestamping.txt | 147 +++++++
.../networking/timestamping/timestamping.c | 441 ++++++++++++++++++++
include/asm-x86/socket.h | 3 +
include/linux/errqueue.h | 1 +
include/linux/netdevice.h | 15 +-
include/linux/skbuff.h | 4 +-
include/linux/sockios.h | 3 +
include/net/sock.h | 22 +-
include/net/timestamping.h | 92 ++++
net/compat.c | 19 +-
net/core/skbuff.c | 13 +-
net/core/sock.c | 88 ++++-
net/socket.c | 68 +++-
13 files changed, 861 insertions(+), 55 deletions(-)
create mode 100644 Documentation/networking/timestamping.txt
create mode 100644 Documentation/networking/timestamping/timestamping.c
create mode 100644 include/net/timestamping.h
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
new file mode 100644
index 0000000..10ecb1d
--- /dev/null
+++ b/Documentation/networking/timestamping.txt
@@ -0,0 +1,147 @@
+The existing interfaces for getting network packets time stamped are:
+
+* SO_TIMESTAMP
+ Generate time stamp for each incoming packet using the (not necessarily
+ monotonic!) system time. Result is returned via recvmsg() in a
+ control message as timeval (usec resolution).
+
+* SO_TIMESTAMPNS
+ Same time stamping mechanism as SO_TIMESTAMP, but returns result as
+ timespec (nsec resolution).
+
+* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
+ Only for multicasts: approximate send time stamp by receiving the looped
+ packet and using its receive time stamp.
+
+The following interface complements the existing ones: receive time
+stamps can be generated and returned for arbitrary packets and much
+closer to the point where the packet is really sent. Time stamps can
+be generated in software (as before) or in hardware (if the hardware
+has such a feature).
+
+SO_TIMESTAMPING:
+
+Instructs the socket layer which kind of information is wanted. The
+parameter is an integer with some of the following bits set. Setting
+other bits is an error and doesn't change the current state.
+
+SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time stamp in hardware
+SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time stamp
+ as generated by the hardware
+SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or
+ fails, then do it in software
+SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time stamp
+SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time stamp transformed to
+ the system time base
+SOF_TIMESTAMPING_SOFTWARE: return system time stamp generated in
+ software
+
+SOF_TIMESTAMPING_TX/RX determine how time stamps are generated.
+SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the
+following control message:
+ struct scm_timestamping {
+ struct timespec systime;
+ struct timespec hwtimetrans;
+ struct timespec hwtimeraw;
+ };
+
+recvmsg() can be used to get this control message for regular incoming
+packets. For send time stamps the outgoing packet is looped back to
+the socket's error queue with the send time stamp(s) attached. It can
+be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the
+original outgoing packet data preceded by all headers down to and
+including the link layer (for example, PTPv1 over Ethernet has 14
+bytes Ethernet header, 20 bytes IP header, 8 bytes UDP header in front
+of the original data), the scm_timestamping control message and a
+sock_extended_err control message with ee_errno==0 and
+ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending
+bounced packet is ready for reading as far as select() is concerned.
+
+All three values correspond to the same event in time, but were
+generated in different ways. Each of these values may be empty (= all
+zero), in which case no such value was available. If the application
+is not interested in some of these values, they can be left blank to
+avoid the potential overhead of calculating them.
+
+systime is the value of the system time at that moment. This
+corresponds to the value also returned via SO_TIMESTAMP[NS]. If the
+time stamp was generated by hardware, then this field is
+empty. Otherwise it is filled in if SOF_TIMESTAMPING_SOFTWARE is
+set.
+
+hwtimetrans is the hardware time stamp transformed so that it
+corresponds as well as possible to system time. This correlation is
+not perfect; as a consequence, sorting packets received via different
+NICs by their hwtimetrans may differ from the order in which they were
+received. hwtimetrans may be non-monotonic even for the same NIC.
+Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set.
+
+hwtimeraw is the original hardware time stamp. Depending on the device
+driver and how Linux was compiled (separate fields in sk_buff or
+just one time stamp field, see below) it might not be possible to
+fill in this value. Filled in if SOF_TIMESTAMPING_RAW_HARDWARE is
+set.
+
+
+SIOCSHWTSTAMP:
+
+Hardware time stamping must also be initialized for each device driver
+that is expected to do hardware time stamping. The parameter is:
+
+struct hwtstamp_config {
+ int flags; /**< no flags defined right now, must be zero */
+ int tx_type; /**< HWTSTAMP_TX_* */
+ int rx_filter_type; /**< HWTSTAMP_FILTER_* */
+};
+
+/** possible values for hwtstamp_config->tx_type */
+enum {
+ /**
+ * no outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done
+ */
+ HWTSTAMP_TX_OFF,
+
+ /**
+ * enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
+ * before sending the packet
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/** possible values for hwtstamp_config->rx_filter_type */
+enum {
+ /** time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /** time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /** PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+
+ ...
+};
+
+
+LIMITATIONS
+
+Unless the Linux kernel gets modified, there is only one field in
+sk_buff which can hold time stamps. Therefore it is impossible to
+record both real system time and hardware time for the same packet.
+The lowest bit is used to distinguish between hardware and software
+time stamps; this reduces the resolution to 2ns.
+
+In order to remain as compatible as possible, hardware time stamps are
+stored in sk_buff after a transformation to the system time base.
+This value is then returned by SO_TIMESTAMP[NS] and hwtimetrans.
+
+The original hardware time stamp can only be returned after
+transforming it back, which might not be supported by the driver which
+generated the packet. In that case hwtimetrans is set, but hwtimeraw
+is not.
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
new file mode 100644
index 0000000..abb2b79
--- /dev/null
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -0,0 +1,441 @@
+/**
+ * This program demonstrates how the various time stamping features in
+ * the Linux kernel work. It emulates the behavior of a PTP
+ * implementation in stand-alone master mode by sending PTPv1 Sync
+ * multicasts once every second. It looks for similar packets, but
+ * beyond that doesn't actually implement PTP.
+ *
+ * Outgoing packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support.
+ *
+ * Incoming packets are time stamped with SO_TIMESTAMPING with or
+ * without hardware support, SIOCGSTAMP[NS] (per-socket time stamp) and
+ * SO_TIMESTAMP[NS].
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include "net/timestamping.h"
+
+#ifndef SO_TIMESTAMPNS
+# define SO_TIMESTAMPNS 35
+#endif
+
+#ifndef SIOCGSTAMPNS
+# define SIOCGSTAMPNS 0x8907
+#endif
+
+static void usage(const char *error)
+{
+ if (error) {
+ printf("invalid option: %s\n", error);
+ }
+ printf("timestamping interface (IP_MULTICAST_LOOP|SO_TIMESTAMP|SO_TIMESTAMPNS|SOF_TIMESTAMPING_TX_HARDWARE|SOF_TIMESTAMPING_TX_SOFTWARE|SOF_TIMESTAMPING_RX_HARDWARE|SOF_TIMESTAMPING_RX_SOFTWARE|SOF_TIMESTAMPING_SOFTWARE|SOF_TIMESTAMPING_SYS_HARDWARE|SOF_TIMESTAMPING_RAW_HARDWARE|SIOCGSTAMP|SIOCGSTAMPNS)*\n");
+ exit(1);
+}
+
+static void bail(const char *error)
+{
+ printf("%s: %s\n", error, strerror(errno));
+ exit(1);
+}
+
+static const unsigned char sync[] = {
+ 0x00,0x01, 0x00,0x01,
+ 0x5f,0x44, 0x46,0x4c,
+ 0x54,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x00,
+ 0x01,0x01,
+
+ /* fake uuid */
+ 0x00,0x01,
+ 0x02,0x03, 0x04,0x05,
+
+ 0x00,0x01, 0x00,0x37,
+ 0x00,0x00, 0x00,0x08,
+ 0x00,0x00, 0x00,0x00,
+ 0x49,0x05, 0xcd,0x01,
+ 0x29,0xb1, 0x8d,0xb0,
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x01,
+
+ /* fake uuid */
+ 0x00,0x01,
+ 0x02,0x03, 0x04,0x05,
+
+ 0x00,0x00, 0x00,0x37,
+ 0x00,0x00, 0x00,0x04,
+ 0x44,0x46, 0x4c,0x54,
+ 0x00,0x00, 0xf0,0x60,
+ 0x00,0x01, 0x00,0x00,
+ 0x00,0x00, 0x00,0x01,
+ 0x00,0x00, 0xf0,0x60,
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x04,
+ 0x44,0x46, 0x4c,0x54,
+ 0x00,0x01,
+
+ /* fake uuid */
+ 0x00,0x01,
+ 0x02,0x03, 0x04,0x05,
+
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x00,
+ 0x00,0x00, 0x00,0x00
+};
+
+static void sendpacket(int sock, struct sockaddr *addr, socklen_t addr_len)
+{
+ struct timeval now;
+ int res;
+
+ res = sendto(sock, sync, sizeof(sync), 0,
+ addr, addr_len);
+ gettimeofday(&now, 0);
+ if (res < 0) {
+ printf("%s: %s\n", "send", strerror(errno));
+ } else {
+ printf("%ld.%06ld: sent %d bytes\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res);
+ }
+}
+
+static void recvpacket(int sock, int recvmsg_flags,
+ int siocgstamp, int siocgstampns)
+{
+ char data[256];
+ struct timeval now;
+ struct msghdr msg;
+ struct iovec entry;
+ struct sockaddr_in from_addr;
+ struct {
+ struct cmsghdr cm;
+ char control[512];
+ } control;
+ int res;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = &entry;
+ msg.msg_iovlen = 1;
+ entry.iov_base = data;
+ entry.iov_len = sizeof(data);
+ msg.msg_name = (caddr_t)&from_addr;
+ msg.msg_namelen = sizeof(from_addr);
+ msg.msg_control = &control;
+ msg.msg_controllen = sizeof(control);
+
+ res = recvmsg(sock, &msg, recvmsg_flags|MSG_DONTWAIT);
+ gettimeofday(&now, 0);
+ if (res < 0) {
+ printf("%s %s: %s\n",
+ "recvmsg",
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ strerror(errno));
+ } else {
+ struct cmsghdr *cmsg;
+ struct timeval tv;
+ struct timespec ts;
+
+ printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
+ res,
+ inet_ntoa(from_addr.sin_addr),
+ (int)msg.msg_controllen);
+ for (cmsg = CMSG_FIRSTHDR(&msg);
+ cmsg;
+ cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ printf(" cmsg len %d: ", (int)cmsg->cmsg_len);
+ switch (cmsg->cmsg_level) {
+ case SOL_SOCKET:
+ printf("SOL_SOCKET ");
+ switch (cmsg->cmsg_type) {
+ case SO_TIMESTAMP: {
+ struct timeval *stamp =
+ (struct timeval *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMP %ld.%06ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_usec);
+ break;
+ }
+ case SO_TIMESTAMPNS: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPNS %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ case SO_TIMESTAMPING: {
+ struct timespec *stamp =
+ (struct timespec *)CMSG_DATA(cmsg);
+ printf("SO_TIMESTAMPING ");
+ printf("SW %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW transformed %ld.%09ld ",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ stamp++;
+ printf("HW raw %ld.%09ld",
+ (long)stamp->tv_sec,
+ (long)stamp->tv_nsec);
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ case IPPROTO_IP:
+ printf("IPPROTO_IP ");
+ switch (cmsg->cmsg_type) {
+ case IP_RECVERR: {
+#ifdef SO_EE_ORIGIN_TIMESTAMPING
+ struct sock_extended_err *err =
+ (struct sock_extended_err *)CMSG_DATA(cmsg);
+ printf("IP_RECVERR ee_errno '%s' ee_origin %d => %s",
+ strerror(err->ee_errno),
+ err->ee_origin,
+ err->ee_origin == SO_EE_ORIGIN_TIMESTAMPING ?
+ "bounced packet" : "error");
+#else
+ printf("IP_RECVERR, probably SO_EE_ORIGIN_TIMESTAMPING");
+#endif
+ if (!memcmp(sync, data + 14 + 20 + 8,
+ sizeof(sync))) {
+ printf(" => GOT OUR DATA BACK (HURRAY!)");
+ }
+ break;
+ }
+ default:
+ printf("type %d", cmsg->cmsg_type);
+ break;
+ }
+ break;
+ default:
+ printf("level %d type %d",
+ cmsg->cmsg_level,
+ cmsg->cmsg_type);
+ break;
+ }
+ printf("\n");
+ }
+
+ if (siocgstamp) {
+ if (ioctl(sock, SIOCGSTAMP, &tv)) {
+ printf(" %s: %s\n", "SIOCGSTAMP", strerror(errno));
+ } else {
+ printf("SIOCGSTAMP %ld.%06ld\n",
+ (long)tv.tv_sec,
+ (long)tv.tv_usec);
+ }
+ }
+ if (siocgstampns) {
+ if (ioctl(sock, SIOCGSTAMPNS, &ts)) {
+ printf(" %s: %s\n", "SIOCGSTAMPNS", strerror(errno));
+ } else {
+ printf("SIOCGSTAMPNS %ld.%09ld\n",
+ (long)ts.tv_sec,
+ (long)ts.tv_nsec);
+ }
+ }
+ }
+}
+
+int main(int argc, char **argv)
+{
+ int so_timestamping_flags = 0;
+ int so_timestamp = 0;
+ int so_timestampns = 0;
+ int siocgstamp = 0;
+ int siocgstampns = 0;
+ int ip_multicast_loop = 0;
+ char *interface;
+ int i;
+ int enabled = 1;
+ int sock;
+ struct ifreq device;
+ struct sockaddr_in addr;
+ struct ip_mreq imr;
+ struct in_addr iaddr;
+ int val;
+ socklen_t len;
+ struct timeval next;
+
+ if (argc < 2) {
+ usage(0);
+ }
+ interface = argv[1];
+
+ for (i = 2; i < argc; i++ ) {
+ if (!strcasecmp(argv[i], "SO_TIMESTAMP")) {
+ so_timestamp = 1;
+ } else if (!strcasecmp(argv[i], "SO_TIMESTAMPNS")) {
+ so_timestampns = 1;
+ } else if (!strcasecmp(argv[i], "SIOCGSTAMP")) {
+ siocgstamp = 1;
+ } else if (!strcasecmp(argv[i], "SIOCGSTAMPNS")) {
+ siocgstampns = 1;
+ } else if (!strcasecmp(argv[i], "IP_MULTICAST_LOOP")) {
+ ip_multicast_loop = 1;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_HARDWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_HARDWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_TX_SOFTWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_HARDWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_HARDWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RX_SOFTWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ } else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE")) {
+ so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ } else {
+ usage(argv[i]);
+ }
+ }
+
+ sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
+ if (sock < 0) {
+ bail("socket");
+ }
+
+ memset(&device, 0, sizeof(device));
+ strncpy(device.ifr_name, interface, sizeof(device.ifr_name));
+ if (ioctl(sock, SIOCGIFADDR, &device) < 0) {
+ bail("getting interface IP address");
+ }
+
+ /* bind to PTP port */
+ addr.sin_family = AF_INET;
+ addr.sin_addr.s_addr = htonl(INADDR_ANY);
+ addr.sin_port = htons(319 /* PTP event port */);
+ if (bind(sock, (struct sockaddr*)&addr, sizeof(struct sockaddr_in)) < 0) {
+ bail("bind");
+ }
+
+ /* set multicast group for outgoing packets */
+ inet_aton("224.0.1.130", &iaddr); /* alternate PTP domain 1 */
+ addr.sin_addr = iaddr;
+ imr.imr_multiaddr.s_addr = iaddr.s_addr;
+ imr.imr_interface.s_addr = ((struct sockaddr_in *)&device.ifr_addr)->sin_addr.s_addr;
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_IF, &imr.imr_interface.s_addr, sizeof(struct in_addr)) < 0) {
+ bail("set multicast");
+ }
+
+ /* join multicast group, loop our own packet */
+ if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &imr, sizeof(struct ip_mreq)) < 0) {
+ bail("join multicast group");
+ }
+ if (setsockopt(sock, IPPROTO_IP, IP_MULTICAST_LOOP, &ip_multicast_loop, sizeof(ip_multicast_loop)) < 0) {
+ bail("loop multicast");
+ }
+
+ /* set socket options for time stamping */
+ if (so_timestamp &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &enabled, sizeof(enabled)) < 0) {
+ bail("setsockopt SO_TIMESTAMP");
+ }
+ if (so_timestampns &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &enabled, sizeof(enabled)) < 0) {
+ bail("setsockopt SO_TIMESTAMPNS");
+ }
+ if (so_timestamping_flags &&
+ setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &so_timestamping_flags, sizeof(so_timestamping_flags)) < 0) {
+ bail("setsockopt SO_TIMESTAMPING");
+ }
+
+ /* verify socket options */
+ len = sizeof(val);
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &val, &len) < 0) {
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMP", strerror(errno));
+ } else {
+ printf("SO_TIMESTAMP %d\n", val);
+ }
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &val, &len) < 0) {
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPNS", strerror(errno));
+ } else {
+ printf("SO_TIMESTAMPNS %d\n", val);
+ }
+ if (getsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &val, &len) < 0) {
+ printf("%s: %s\n", "getsockopt SO_TIMESTAMPING", strerror(errno));
+ } else {
+ printf("SO_TIMESTAMPING %d\n", val);
+ if (val != so_timestamping_flags) {
+ printf(" not the expected value %d\n", so_timestamping_flags);
+ }
+ }
+
+ /* send packets forever every five seconds */
+ gettimeofday(&next, 0);
+ next.tv_sec = (next.tv_sec + 1) / 5 * 5;
+ next.tv_usec = 0;
+ while(1) {
+ struct timeval now;
+ struct timeval delta;
+ long delta_us;
+ int res;
+ fd_set readfs, errorfs;
+
+ gettimeofday(&now, 0);
+ delta_us = (long)(next.tv_sec - now.tv_sec) * 1000000 +
+ (long)(next.tv_usec - now.tv_usec);
+ if (delta_us > 0) {
+ /* continue waiting for timeout or data */
+ delta.tv_sec = delta_us / 1000000;
+ delta.tv_usec = delta_us % 1000000;
+
+ FD_ZERO(&readfs);
+ FD_ZERO(&errorfs);
+ FD_SET(sock, &readfs);
+ FD_SET(sock, &errorfs);
+ printf("%ld.%06ld: select %ldus\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ delta_us);
+ res = select(sock + 1, &readfs, 0, &errorfs, &delta);
+ gettimeofday(&now, 0);
+ printf("%ld.%06ld: select returned: %d, %s\n",
+ (long)now.tv_sec, (long)now.tv_usec,
+ res,
+ res < 0 ? strerror(errno) : "success");
+ if (res > 0) {
+ if (FD_ISSET(sock, &readfs)) {
+ printf("ready for reading\n");
+ }
+ if (FD_ISSET(sock, &errorfs)) {
+ printf("has error\n");
+ }
+ recvpacket(sock, 0,
+ siocgstamp,
+ siocgstampns);
+ recvpacket(sock, MSG_ERRQUEUE,
+ siocgstamp,
+ siocgstampns);
+ }
+ } else {
+ /* write one packet */
+ sendpacket(sock, (struct sockaddr *)&addr, sizeof(addr));
+ next.tv_sec += 5;
+ continue;
+ }
+ }
+
+ return 0;
+}
diff --git a/include/asm-x86/socket.h b/include/asm-x86/socket.h
index 80af9c4..8412a75 100644
--- a/include/asm-x86/socket.h
+++ b/include/asm-x86/socket.h
@@ -54,4 +54,7 @@
#define SO_MARK 36
+#define SO_TIMESTAMPING 37
+#define SCM_TIMESTAMPING SO_TIMESTAMPING
+
#endif /* _ASM_SOCKET_H */
diff --git a/include/linux/errqueue.h b/include/linux/errqueue.h
index 92f8d4f..86d88dd 100644
--- a/include/linux/errqueue.h
+++ b/include/linux/errqueue.h
@@ -16,6 +16,7 @@ struct sock_extended_err
#define SO_EE_ORIGIN_LOCAL 1
#define SO_EE_ORIGIN_ICMP 2
#define SO_EE_ORIGIN_ICMP6 3
+#define SO_EE_ORIGIN_TIMESTAMPING 4
#define SO_EE_OFFENDER(ee) ((struct sockaddr*)((ee)+1))
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4da51cb..79221a1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -752,15 +752,16 @@ struct net_device
/* hardware time stamping support */
#define HAVE_HW_TIME_STAMP
- /* Transforms skb->tstamp back to the original, raw hardware
- * time stamp. The value must have been generated by the
- * device. Implementing this is optional, but necessary for
- * SO_TIMESTAMP_HARDWARE.
+ /* Transforms time stamp back from system time base
+ * to the original, raw hardware time stamp. This call
+ * is necessary only when scm_timestamping::hwtimeraw
+ * is to be supported.
*
- * Returns 1 if value could be retrieved, 0 otherwise.
+ * Returns empty stamp (= all zero) if conversion wasn't
+ * possible.
*/
- int (*hwtstamp_raw)(const struct sk_buff *skb,
- struct timespec *stamp);
+ union ktime (*hwtstamp_sys2raw)(struct net_device *dev,
+ union ktime stamp);
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0b3b36a..b8818dc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1609,7 +1609,7 @@ static inline void skb_hwtstamp_set(struct sk_buff *skb,
* Fills the timespec with the original, "raw" time stamp as generated
* by the hardware when it processed the packet and returns 1 if such
* a hardware time stamp is unavailable or cannot be inferred. Otherwise
- * it returns 0;
+ * it returns 0 and doesn't modify the stamp.
*/
int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp);
@@ -1621,7 +1621,7 @@ int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp);
* transformed time stamp.
*
* Returns 1 if a transformed hardware time stamp is available, 0
- * otherwise.
+ * otherwise. In that case the stamp is left unchanged.
*/
int skb_hwtstamp_transformed(const struct sk_buff *skb, struct timespec *stamp);
diff --git a/include/linux/sockios.h b/include/linux/sockios.h
index abef759..209ee22 100644
--- a/include/linux/sockios.h
+++ b/include/linux/sockios.h
@@ -122,6 +122,9 @@
#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
#define SIOCBRDELIF 0x89a3 /* remove interface from bridge */
+/* hardware time stamping: parameters in net/timestamping.h */
+#define SIOCSHWTSTAMP 0x89b0
+
/* Device private ioctl calls */
/*
diff --git a/include/net/sock.h b/include/net/sock.h
index 06c5259..739a8e8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -152,7 +152,7 @@ struct sock_common {
* @sk_allocation: allocation mode
* @sk_sndbuf: size of send buffer in bytes
* @sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
- * %SO_OOBINLINE settings
+ * %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
* @sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
* @sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
* @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
@@ -413,6 +413,13 @@ enum sock_flags {
SOCK_RCVTSTAMPNS, /* %SO_TIMESTAMPNS setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+ SOCK_TIMESTAMPING_TX_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_TX_HARDWARE */
+ SOCK_TIMESTAMPING_TX_SOFTWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_TX_SOFTWARE */
+ SOCK_TIMESTAMPING_RX_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RX_HARDWARE */
+ SOCK_TIMESTAMPING_RX_SOFTWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RX_SOFTWARE */
+ SOCK_TIMESTAMPING_SOFTWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_SOFTWARE */
+ SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_RAW_HARDWARE */
+ SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SO_TIMESTAMPING %SOF_TIMESTAMPING_SYS_HARDWARE */
};
static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1260,7 +1267,16 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
{
ktime_t kt = skb->tstamp;
- if (sock_flag(sk, SOCK_RCVTSTAMP))
+ /*
+ * generate control messages if receive time stamping requested
+ * or if time stamp available (RX hardware or TX software/hardware
+ * case) and reporting via SO_TIMESTAMPING enabled
+ */
+ if ((sock_flag(sk, SOCK_RCVTSTAMP) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) ||
+ (kt.tv64 && (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE))))
__sock_recv_timestamp(msg, sk, skb);
else
sk->sk_stamp = kt;
@@ -1322,7 +1338,7 @@ static inline void sk_change_net(struct sock *sk, struct net *net)
sock_net_set(sk, hold_net(net));
}
-extern void sock_enable_timestamp(struct sock *sk);
+extern void sock_enable_timestamp(struct sock *sk, int flag);
extern int sock_get_timestamp(struct sock *, struct timeval __user *);
extern int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/include/net/timestamping.h b/include/net/timestamping.h
new file mode 100644
index 0000000..53cb603
--- /dev/null
+++ b/include/net/timestamping.h
@@ -0,0 +1,92 @@
+#ifndef _NET_TIMESTAMPING_H
+#define _NET_TIMESTAMPING_H
+
+#include <linux/socket.h> /* for SO_TIMESTAMPING */
+
+/**
+ * user space linux/socket.h might not have these defines yet:
+ * provide fallback
+ */
+#if !defined(__kernel__) && !defined(SO_TIMESTAMPING)
+# define SO_TIMESTAMPING 37
+# define SCM_TIMESTAMPING SO_TIMESTAMPING
+#endif
+
+/** %SO_TIMESTAMPING gets an integer bit field comprised of these values */
+enum {
+ SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
+ SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
+ SOF_TIMESTAMPING_RX_HARDWARE = (1<<2),
+ SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3),
+ SOF_TIMESTAMPING_SOFTWARE = (1<<4),
+ SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
+ SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
+ SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_RAW_HARDWARE - 1) | SOF_TIMESTAMPING_RAW_HARDWARE
+};
+
+#if !defined(__kernel__) && !defined(SIOCSHWTSTAMP)
+# define SIOCSHWTSTAMP 0x89b0
+#endif
+
+/** %SIOCSHWTSTAMP expects a pointer to this struct */
+struct hwtstamp_config {
+ int flags; /**< no flags defined right now, must be zero */
+ int tx_type; /**< one of HWTSTAMP_TX_* */
+ int rx_filter_type; /**< one of HWTSTAMP_RX_* */
+};
+
+/** possible values for hwtstamp_config->tx_type */
+enum {
+ /**
+ * no outgoing packet will need hardware time stamping;
+ * should a packet arrive which asks for it, no hardware
+ * time stamping will be done
+ */
+ HWTSTAMP_TX_OFF,
+
+ /**
+ * enables hardware time stamping for outgoing packets;
+ * the sender of the packet decides which are to be
+ * time stamped by setting SOF_TIMESTAMPING_TX_HARDWARE
+ * before sending the packet
+ */
+ HWTSTAMP_TX_ON,
+};
+
+/** possible values for hwtstamp_config->rx_filter_type */
+enum {
+ /** time stamp no incoming packet at all */
+ HWTSTAMP_FILTER_NONE,
+
+ /** time stamp any incoming packet */
+ HWTSTAMP_FILTER_ALL,
+
+ /** PTP v1, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
+ /** PTP v1, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC,
+ /** PTP v1, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ,
+ /** PTP v2, UDP, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_EVENT,
+ /** PTP v2, UDP, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_SYNC,
+ /** PTP v2, UDP, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ,
+
+ /** 802.AS1, Ethernet, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_EVENT,
+ /** 802.AS1, Ethernet, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_SYNC,
+ /** 802.AS1, Ethernet, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ,
+
+ /** PTP v2/802.AS1, any layer, any kind of event packet */
+ HWTSTAMP_FILTER_PTP_V2_EVENT,
+ /** PTP v2/802.AS1, any layer, Sync packet */
+ HWTSTAMP_FILTER_PTP_V2_SYNC,
+ /** PTP v2/802.AS1, any layer, Delay_req packet */
+ HWTSTAMP_FILTER_PTP_V2_DELAY_REQ,
+};
+
+#endif /* _NET_TIMESTAMPING_H */
diff --git a/net/compat.c b/net/compat.c
index 6ce1a1c..954377e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -216,7 +216,7 @@ Efault:
int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *data)
{
struct compat_timeval ctv;
- struct compat_timespec cts;
+ struct compat_timespec cts[3];
struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
struct compat_cmsghdr cmhdr;
int cmlen;
@@ -233,12 +233,17 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
data = &ctv;
len = sizeof(ctv);
}
- if (level == SOL_SOCKET && type == SCM_TIMESTAMPNS) {
+ if (level == SOL_SOCKET &&
+ (type == SCM_TIMESTAMPNS || type == SCM_TIMESTAMPING)) {
+ int count = type == SCM_TIMESTAMPNS ? 1 : 3;
+ int i;
struct timespec *ts = (struct timespec *)data;
- cts.tv_sec = ts->tv_sec;
- cts.tv_nsec = ts->tv_nsec;
+ for (i = 0; i < count; i++) {
+ cts[i].tv_sec = ts[i].tv_sec;
+ cts[i].tv_nsec = ts[i].tv_nsec;
+ }
data = &cts;
- len = sizeof(cts);
+ len = sizeof(cts[0]) * count;
}
cmlen = CMSG_COMPAT_LEN(len);
@@ -455,7 +460,7 @@ int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return err;
@@ -479,7 +484,7 @@ int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *usersta
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return err;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7a95062..3663b62 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2334,11 +2334,16 @@ int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
(rt = skb->rtable) != NULL &&
(idev = rt->idev) != NULL &&
(netdev = idev->dev) != NULL &&
- netdev->hwtstamp_raw) {
- return netdev->hwtstamp_raw(skb, stamp);
- } else {
- return 0;
+ netdev->hwtstamp_sys2raw) {
+ union ktime kstamp = netdev->hwtstamp_sys2raw(netdev,
+ skb_get_ktime(skb));
+ if (kstamp.tv64) {
+ *stamp = ktime_to_timespec(kstamp);
+ return 1;
+ }
}
+
+ return 0;
}
EXPORT_SYMBOL_GPL(skb_hwtstamp_raw);
diff --git a/net/core/sock.c b/net/core/sock.c
index 91f8bbc..d02a831 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -120,6 +120,7 @@
#include <net/net_namespace.h>
#include <net/request_sock.h>
#include <net/sock.h>
+#include <net/timestamping.h>
#include <net/xfrm.h>
#include <linux/ipsec.h>
@@ -254,11 +255,14 @@ static void sock_warn_obsolete_bsdism(const char *name)
}
}
-static void sock_disable_timestamp(struct sock *sk)
+static void sock_disable_timestamp(struct sock *sk, int flag)
{
- if (sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_reset_flag(sk, SOCK_TIMESTAMP);
- net_disable_timestamp();
+ if (sock_flag(sk, flag)) {
+ sock_reset_flag(sk, flag);
+ if (!sock_flag(sk, SOCK_TIMESTAMP) &&
+ !sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) {
+ net_disable_timestamp();
+ }
}
}
@@ -613,13 +617,37 @@ set_rcvbuf:
else
sock_set_flag(sk, SOCK_RCVTSTAMPNS);
sock_set_flag(sk, SOCK_RCVTSTAMP);
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
} else {
sock_reset_flag(sk, SOCK_RCVTSTAMP);
sock_reset_flag(sk, SOCK_RCVTSTAMPNS);
}
break;
+ case SO_TIMESTAMPING:
+ if (val & ~SOF_TIMESTAMPING_MASK) {
+ ret = -EINVAL;
+ break;
+ }
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE,
+ val & SOF_TIMESTAMPING_TX_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE,
+ val & SOF_TIMESTAMPING_TX_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE,
+ val & SOF_TIMESTAMPING_RX_HARDWARE);
+ if (val & SOF_TIMESTAMPING_RX_SOFTWARE) {
+ sock_enable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE);
+ } else {
+ sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE);
+ }
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE,
+ val & SOF_TIMESTAMPING_SOFTWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE,
+ val & SOF_TIMESTAMPING_SYS_HARDWARE);
+ sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE,
+ val & SOF_TIMESTAMPING_RAW_HARDWARE);
+ break;
+
case SO_RCVLOWAT:
if (val < 0)
val = INT_MAX;
@@ -765,6 +793,31 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
v.val = sock_flag(sk, SOCK_RCVTSTAMPNS);
break;
+ case SO_TIMESTAMPING:
+ v.val = 0;
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE)) {
+ v.val |= SOF_TIMESTAMPING_TX_HARDWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE)) {
+ v.val |= SOF_TIMESTAMPING_TX_SOFTWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE)) {
+ v.val |= SOF_TIMESTAMPING_RX_HARDWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE)) {
+ v.val |= SOF_TIMESTAMPING_RX_SOFTWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) {
+ v.val |= SOF_TIMESTAMPING_SOFTWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)) {
+ v.val |= SOF_TIMESTAMPING_SYS_HARDWARE;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) {
+ v.val |= SOF_TIMESTAMPING_RAW_HARDWARE;
+ }
+ break;
+
case SO_RCVTIMEO:
lv=sizeof(struct timeval);
if (sk->sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT) {
@@ -966,7 +1019,8 @@ void sk_free(struct sock *sk)
rcu_assign_pointer(sk->sk_filter, NULL);
}
- sock_disable_timestamp(sk);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMP);
+ sock_disable_timestamp(sk, SOCK_TIMESTAMPING_RX_SOFTWARE);
if (atomic_read(&sk->sk_omem_alloc))
printk(KERN_DEBUG "%s: optmem leakage (%d bytes) detected.\n",
@@ -1780,7 +1834,7 @@ int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
{
struct timeval tv;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
tv = ktime_to_timeval(sk->sk_stamp);
if (tv.tv_sec == -1)
return -ENOENT;
@@ -1796,7 +1850,7 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
{
struct timespec ts;
if (!sock_flag(sk, SOCK_TIMESTAMP))
- sock_enable_timestamp(sk);
+ sock_enable_timestamp(sk, SOCK_TIMESTAMP);
ts = ktime_to_timespec(sk->sk_stamp);
if (ts.tv_sec == -1)
return -ENOENT;
@@ -1808,11 +1862,21 @@ int sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
}
EXPORT_SYMBOL(sock_get_timestampns);
-void sock_enable_timestamp(struct sock *sk)
+void sock_enable_timestamp(struct sock *sk, int flag)
{
- if (!sock_flag(sk, SOCK_TIMESTAMP)) {
- sock_set_flag(sk, SOCK_TIMESTAMP);
- net_enable_timestamp();
+ if (!sock_flag(sk, flag)) {
+ sock_set_flag(sk, flag);
+ /*
+ * we just set one of the two flags which require net
+ * time stamping, but time stamping might have been on
+ * already because of the other one
+ */
+ if (!sock_flag(sk,
+ flag == SOCK_TIMESTAMP ?
+ SOCK_TIMESTAMPING_RX_SOFTWARE :
+ SOCK_TIMESTAMP)) {
+ net_enable_timestamp();
+ }
}
}
diff --git a/net/socket.c b/net/socket.c
index 3e8d4e3..6fb6b40 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -602,26 +602,54 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
struct sk_buff *skb)
{
- ktime_t kt = skb->tstamp;
-
- if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
- struct timeval tv;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- tv = ktime_to_timeval(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, sizeof(tv), &tv);
- } else {
- struct timespec ts;
- /* Race occurred between timestamp enabling and packet
- receiving. Fill in the current time for now. */
- if (kt.tv64 == 0)
- kt = ktime_get_real();
- skb->tstamp = kt;
- ts = ktime_to_timespec(kt);
- put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, sizeof(ts), &ts);
+ /* Race occurred between timestamp enabling and packet
+ receiving. Fill in the current time for now. */
+ if (skb->tstamp.tv64 == 0)
+ __net_timestamp(skb);
+
+ if (sock_flag(sk, SOCK_RCVTSTAMP)) {
+ if (!sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+ struct timeval tv;
+ skb_get_timestamp(skb, &tv);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+ sizeof(tv), &tv);
+ } else {
+ struct timespec ts;
+ skb_get_timestampns(skb, &ts);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+ sizeof(ts), &ts);
+ }
+ }
+
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE) ||
+ sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) {
+ struct timespec ts[3];
+ int empty = 1;
+ memset(ts, 0, sizeof(ts));
+ /*
+ * currently either a hardware or a software time stamp is
+ * available, but not both
+ */
+ if (!skb_hwtstamp_available(skb)) {
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) {
+ skb_get_timestampns(skb, ts + 0);
+ empty = 0;
+ }
+ } else {
+ if (sock_flag(sk, SOCK_TIMESTAMPING_SYS_HARDWARE)) {
+ skb_hwtstamp_transformed(skb, ts + 1);
+ empty = 0;
+ }
+ if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)) {
+ skb_hwtstamp_raw(skb, ts + 2);
+ empty = 0;
+ }
+ }
+ if (!empty) {
+ put_cmsg(msg, SOL_SOCKET,
+ SCM_TIMESTAMPING, sizeof(ts), &ts);
+ }
}
}
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 03/13] user space API for time stamping of incoming and outgoing packets
2008-10-22 15:01 ` [RFC PATCH 03/13] user space API for time stamping of incoming and outgoing packets Patrick Ohly
@ 2008-11-12 10:02 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 10:02 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 22 Oct 2008 17:01:05 +0200
> diff --git a/include/net/timestamping.h b/include/net/timestamping.h
> new file mode 100644
> index 0000000..53cb603
> --- /dev/null
> +++ b/include/net/timestamping.h
> @@ -0,0 +1,92 @@
> +#ifndef _NET_TIMESTAMPING_H
> +#define _NET_TIMESTAMPING_H
> +
> +#include <linux/socket.h> /* for SO_TIMESTAMPING */
> +
> +/**
> + * user space linux/socket.h might not have these defines yet:
> + * provide fallback
> + */
> +#if !defined(__kernel__) && !defined(SO_TIMESTAMPING)
It's __KERNEL__ not __kernel__, and user visible interfaces
should not be added to <net/foo.h> files.
Rather, they should be put into <linux/foo.h> files.
Since "timestamping.h" is too generic a name, use something
like <linux/net_tstamp.h>
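As a rough sketch only (not the final header), the fallback block with
the corrected macro name could look like the fragment below. The guard
name _NET_TSTAMP_SKETCH_H and the helper sof_timestamping_flags_valid()
are hypothetical and only illustrate the idea; the flag values and the
number 37 are copied from the patch under review, where 37 is the x86
value.

```c
/* Sketch of a user-visible header fragment; the guard name is made up. */
#ifndef _NET_TSTAMP_SKETCH_H
#define _NET_TSTAMP_SKETCH_H

/*
 * Fallback for user space whose socket headers predate SO_TIMESTAMPING.
 * Note the upper-case __KERNEL__. 37 is the x86 value from the patch.
 */
#if !defined(__KERNEL__) && !defined(SO_TIMESTAMPING)
# define SO_TIMESTAMPING 37
# define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif

/* SO_TIMESTAMPING takes an integer bit field made of these values. */
enum {
	SOF_TIMESTAMPING_TX_HARDWARE = (1<<0),
	SOF_TIMESTAMPING_TX_SOFTWARE = (1<<1),
	SOF_TIMESTAMPING_RX_HARDWARE = (1<<2),
	SOF_TIMESTAMPING_RX_SOFTWARE = (1<<3),
	SOF_TIMESTAMPING_SOFTWARE = (1<<4),
	SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
	SOF_TIMESTAMPING_MASK =
		(SOF_TIMESTAMPING_RAW_HARDWARE - 1) |
		SOF_TIMESTAMPING_RAW_HARDWARE
};

/*
 * Hypothetical helper, not part of the patch: mirrors the check that
 * sock_setsockopt() does before accepting a SO_TIMESTAMPING value.
 * Returns 1 if only known flag bits are set, 0 otherwise.
 */
static inline int sof_timestamping_flags_valid(int val)
{
	return (val & ~SOF_TIMESTAMPING_MASK) == 0;
}

#endif /* _NET_TSTAMP_SKETCH_H */
```

With the upper-case macro the fallback is skipped inside the kernel
build and only takes effect in user space.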
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (2 preceding siblings ...)
2008-10-22 15:01 ` [RFC PATCH 03/13] user space API for time stamping of incoming and outgoing packets Patrick Ohly
@ 2008-10-24 13:41 ` Patrick Ohly
2008-11-11 23:15 ` Octavian Purdila
2008-10-24 13:49 ` [RFC PATCH 05/13] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
` (9 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-10-24 13:41 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
We make use of the upper bits in the skb->tstamp to transport the
sender's time stamping settings into the lower levels. Currently these
are per-socket settings, but a per-packet control message could also
be added.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated.
The actual timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
hard_start_xmit routine.
The new semantic for hardware/software time stamping around
net_device->hard_start_xmit() is based on two assumptions about
existing network device drivers which don't support hardware
time stamping and know nothing about it:
- they leave the skb->tstamp field unmodified
- they keep the connection to the originating socket in skb->sk
alive, i.e., don't call skb_orphan()
The first assumption seems to hold for in-tree drivers. The second
is only true for some drivers. As a result, software TX time stamping
currently works with the bnx2 driver, but not with the unmodified
igb driver (the two drivers this patch was tested with).
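The flag handling described above can be modelled in plain user-space C.
This is only a sketch under stated assumptions: struct model_skb and the
model_* functions are hypothetical stand-ins for sk_buff and the skb_*
helpers, and the fallback rule is the one described in this message
(software time stamping happens only if the driver did not mark the
hardware request as in progress). Only the three bit definitions are
taken from the patch itself.

```c
#include <stdint.h>

/* Flag bits as defined by this patch in include/linux/skbuff.h. */
#define SKB_TSTAMP_TX_HARDWARE             (1LL << 62)
#define SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS (1LL << 61)
#define SKB_TSTAMP_TX_SOFTWARE             (1LL << 60)

/* Hypothetical stand-in for sk_buff; tstamp models skb->tstamp.tv64. */
struct model_skb {
	int64_t tstamp;
};

/* Models skb_hwtstamp_check_tx_hardware(): was a HW stamp requested? */
static int model_check_tx_hardware(const struct model_skb *skb)
{
	return (skb->tstamp & SKB_TSTAMP_TX_HARDWARE) ? 1 : 0;
}

/* Models skb_hwtstamp_tx_in_progress(): the driver takes over. */
static void model_tx_in_progress(struct model_skb *skb)
{
	skb->tstamp |= SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS;
}

/*
 * Models the decision made after hard_start_xmit() returns: fall back
 * to a software time stamp only if one was requested and the driver
 * did not declare that it is producing a hardware time stamp.
 */
static int model_needs_software_fallback(const struct model_skb *skb)
{
	return (skb->tstamp & SKB_TSTAMP_TX_SOFTWARE) &&
	       !(skb->tstamp & SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS);
}
```

A driver that knows nothing about time stamping never sets the in
progress bit, so in this model the software fallback applies to it
without any driver change, which is the behaviour the two assumptions
above are meant to guarantee.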
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
Documentation/networking/timestamping.txt | 31 +++++++++++++++++
include/linux/netdevice.h | 10 ++++++
include/linux/skbuff.h | 51 +++++++++++++++++++++++++++++
include/net/sock.h | 14 ++++++++
net/core/dev.c | 34 +++++++++++++++++--
net/core/skbuff.c | 36 ++++++++++++++++++++
net/socket.c | 15 ++++++++
7 files changed, 188 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 10ecb1d..6a87a96 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -145,3 +145,34 @@ The original hardware time stamp can only be returned after
transforming it back, which might not be supported by the driver which
generated the packet. In that case hwtimetrans is set, but hwtimeraw
is not.
+
+
+DEVICE IMPLEMENTATION
+
+A driver which supports hardware time stamping must support the
+SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored
+in the skb with skb_hwtstamp_set().
+
+Time stamps for outgoing packets are to be generated as follows:
+- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware()
+ returns non-zero. If yes, then the driver is expected
+ to do hardware time stamping.
+- If this is possible for the skb and requested, then declare
+ that the driver is doing the time stamping by calling
+ skb_hwtstamp_tx_in_progress(). A driver not supporting
+ hardware time stamping doesn't do that. A driver must never
+ touch sk_buff::tstamp! It is used to store how time stamping
+ for an outgoing packet is to be done.
+- As soon as the driver has sent the packet and/or obtained a
+ hardware time stamp for it, it passes the time stamp back by
+ calling skb_hwtstamp_tx() with the original skb, the raw
+ hardware time stamp and a handle to the device (necessary
+ to convert the hardware time stamp to system time). If obtaining
+ the hardware time stamp somehow fails, then the driver should
+ not fall back to software time stamping. The rationale is that
+ this would occur at a later time in the processing pipeline
+ than other software time stamping and therefore could lead
+ to unexpected deltas between time stamps.
+- If the driver did not call skb_hwtstamp_tx_in_progress(), then
+ dev_hard_start_xmit() checks whether software time stamping
+ is wanted as fallback and potentially generates the time stamp.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 79221a1..89f4025 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -752,6 +752,16 @@ struct net_device
/* hardware time stamping support */
#define HAVE_HW_TIME_STAMP
+ /* Transforms original raw hardware time stamp to
+ * system time base. Always required when supporting
+ * hardware time stamping.
+ *
+ * Returns empty stamp (= all zero) if conversion wasn't
+ * possible.
+ */
+ union ktime (*hwtstamp_raw2sys)(struct net_device *dev,
+ union ktime stamp);
+
/* Transforms time stamp back from system time base
* to the original, raw hardware time stamp. This call
* is necessary only when scm_timestamping::hwtimeraw
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b8818dc..bcca8fc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1625,6 +1625,57 @@ int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp);
*/
int skb_hwtstamp_transformed(const struct sk_buff *skb, struct timespec *stamp);
+/*
+ * Timestamps for outgoing skbs have special meaning:
+ * - request TX timestamping in hardware
+ * - request for TX hardware time stamp is processed by hardware
+ * - request TX timestamping in software as fallback
+ */
+#define SKB_TSTAMP_TX_HARDWARE (1LL << 62)
+#define SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS (1LL << 61)
+#define SKB_TSTAMP_TX_SOFTWARE (1LL << 60)
+
+static inline int skb_hwtstamp_check_tx_hardware(struct sk_buff *skb)
+{
+ return (skb->tstamp.tv64 & SKB_TSTAMP_TX_HARDWARE) ? 1 : 0;
+}
+
+static inline void skb_hwtstamp_tx_in_progress(struct sk_buff *skb)
+{
+ skb->tstamp.tv64 |= SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS;
+}
+static inline int skb_hwtstamp_check_tx_software(struct sk_buff *skb)
+{
+ return (skb->tstamp.tv64 & SKB_TSTAMP_TX_SOFTWARE) ? 1 : 0;
+}
+
+/**
+ * skb_hwtstamp_tx - queue clone of skb with send time stamp
+ * @orig_skb: the original outgoing packet
+ * @stamp: either raw hardware time stamp or result of ktime_get_real()
+ * @dev: NULL if time stamp from ktime_get_real(), otherwise device
+ * which generated the hardware time stamp; the device may or
+ * may not implement hwtstamp_raw2sys()
+ *
+ * This function will not actually timestamp the skb, but, if the skb has a
+ * socket associated, clone the skb, timestamp it, and queue it to the error
+ * queue of the socket. Errors are silently ignored.
+ */
+void skb_hwtstamp_tx(struct sk_buff *orig_skb,
+ union ktime stamp,
+ struct net_device *dev);
+
+/**
+ * skb_tx_software_timestamp - software fallback for send time stamping
+ */
+static inline void skb_tx_software_timestamp(struct sk_buff *skb)
+{
+ if ((skb->tstamp.tv64 & SKB_TSTAMP_TX_SOFTWARE) &&
+ !(skb->tstamp.tv64 & SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS)) {
+ skb_hwtstamp_tx(skb, ktime_get_real(), NULL);
+ }
+}
+
extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/include/net/sock.h b/include/net/sock.h
index 739a8e8..98af0a4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1283,6 +1283,20 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
}
/**
+ * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
+ * @msg: outgoing packet
+ * @sk: socket sending this packet
+ * @tstamp: set to combination of SKB_TSTAMP_TX_* flags by this function
+ *
+ * Currently only depends on SOCK_TIMESTAMPING* flags. Returns error code if
+ * parameters are invalid.
+ */
+extern int sock_tx_timestamp(struct msghdr *msg,
+ struct sock *sk,
+ union ktime *tstamp);
+
+
+/**
* sk_eat_skb - Release a skb if it is no longer needed
* @sk: socket to eat this skb from
* @skb: socket buffer to eat
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ae08d3..7cf31fb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1623,9 +1623,20 @@ static int dev_gso_segment(struct sk_buff *skb)
int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq)
{
+ int rc;
+ union ktime tstamp = skb->tstamp;
+
if (likely(!skb->next)) {
- if (!list_empty(&ptype_all))
+ if (!list_empty(&ptype_all)) {
+ /*
+ * dev_queue_xmit_nit() sets skb->tstamp if
+ * net time stamping is on: when calling
+ * dev->hard_start_xmit() we need the original
+ * SKB_TSTAMP_* flags there, so restore it
+ */
dev_queue_xmit_nit(skb, dev);
+ skb->tstamp = tstamp;
+ }
if (netif_needs_gso(dev, skb)) {
if (unlikely(dev_gso_segment(skb)))
@@ -1634,13 +1645,29 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
- return dev->hard_start_xmit(skb, dev);
+ rc = dev->hard_start_xmit(skb, dev);
+ /*
+ * TODO: if skb_orphan() was called by
+ * dev->hard_start_xmit() (for example, the unmodified
+ * igb driver does that; bnx2 doesn't), then
+ * skb_tx_software_timestamp() will be unable to send
+ * back the time stamp.
+ *
+ * How can this be prevented? Always create another
+ * reference to the socket before calling
+ * dev->hard_start_xmit()? Prevent that skb_orphan()
+ * does anything in dev->hard_start_xmit() by clearing
+ * the skb destructor before the call and restoring it
+ * afterwards, then doing the skb_orphan() ourselves?
+ */
+ if (likely(!rc))
+ skb_tx_software_timestamp(skb);
+ return rc;
}
gso:
do {
struct sk_buff *nskb = skb->next;
- int rc;
skb->next = nskb->next;
nskb->next = NULL;
@@ -1650,6 +1677,7 @@ gso:
skb->next = nskb;
return rc;
}
+ skb_tx_software_timestamp(skb);
if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
return NETDEV_TX_BUSY;
} while (skb->next);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3663b62..7d714b8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2566,6 +2566,42 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
return elt;
}
+void skb_hwtstamp_tx(struct sk_buff *orig_skb,
+ union ktime stamp,
+ struct net_device *dev)
+{
+ struct sock *sk = orig_skb->sk;
+ struct sk_buff *skb;
+ int err = -ENOMEM;
+
+ if (!sk)
+ return;
+
+ skb = skb_clone(orig_skb, GFP_ATOMIC);
+ if (!skb)
+ return;
+
+ if (dev) {
+ skb_hwtstamp_set(skb,
+ dev->hwtstamp_raw2sys ?
+ dev->hwtstamp_raw2sys(dev, stamp) :
+ stamp);
+ } else {
+ skb->tstamp = stamp;
+#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
+#else
+ skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
+#endif
+ }
+
+ err = sock_queue_err_skb(sk, skb);
+ if (err)
+ kfree_skb(skb);
+}
+EXPORT_SYMBOL_GPL(skb_hwtstamp_tx);
+
+
/**
* skb_partial_csum_set - set up and verify partial csum values for packet
* @skb: the skb to set
diff --git a/net/socket.c b/net/socket.c
index 6fb6b40..ea4b128 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -546,6 +546,21 @@ void sock_release(struct socket *sock)
sock->file = NULL;
}
+int sock_tx_timestamp(struct msghdr *msg, struct sock *sk, ktime_t *tstamp)
+{
+ if (!sk) {
+ tstamp->tv64 = 0;
+ } else {
+ tstamp->tv64 =
+ (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE) ?
+ SKB_TSTAMP_TX_HARDWARE : 0) |
+ (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE) ?
+ SKB_TSTAMP_TX_SOFTWARE : 0);
+ }
+ return 0;
+}
+EXPORT_SYMBOL(sock_tx_timestamp);
+
static inline int __sock_sendmsg(struct kiocb *iocb, struct socket *sock,
struct msghdr *msg, size_t size)
{
--
1.6.0.4
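
The transmit-side handshake described in the DEVICE IMPLEMENTATION text of this patch can be illustrated with a small user-space sketch. The flag values are copied from the skbuff.h hunk above; struct fake_skb, fake_hard_start_xmit() and wants_software_fallback() are hypothetical stand-ins invented for illustration and are not part of the patch. The sketch only models the flag logic, not the actual queuing of a time-stamped clone.

```c
#include <stdint.h>

/* Flag values copied from the include/linux/skbuff.h hunk of this patch. */
#define SKB_TSTAMP_TX_HARDWARE             (1LL << 62)
#define SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS (1LL << 61)
#define SKB_TSTAMP_TX_SOFTWARE             (1LL << 60)

/* Hypothetical stand-in for the tstamp field of an outgoing skb. */
struct fake_skb {
	int64_t tstamp;
};

/*
 * Hypothetical driver transmit routine: it claims the time stamp only
 * if hardware stamping was requested and the device can do it for this
 * packet. This corresponds to calling skb_hwtstamp_tx_in_progress().
 */
static void fake_hard_start_xmit(struct fake_skb *skb, int hw_capable)
{
	if ((skb->tstamp & SKB_TSTAMP_TX_HARDWARE) && hw_capable)
		skb->tstamp |= SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS;
}

/*
 * Same condition as skb_tx_software_timestamp() in the patch: the
 * software fallback is used only if it was requested and the driver
 * did not take over.
 */
static int wants_software_fallback(const struct fake_skb *skb)
{
	return (skb->tstamp & SKB_TSTAMP_TX_SOFTWARE) &&
	       !(skb->tstamp & SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS);
}
```

With both flags requested, a capable device suppresses the software fallback, while an incapable one leaves it to dev_hard_start_xmit(), which matches the last bullet of the documentation text.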
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support
2008-10-24 13:41 ` [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support Patrick Ohly
@ 2008-11-11 23:15 ` Octavian Purdila
2008-11-12 8:38 ` Patrick Ohly
0 siblings, 1 reply; 48+ messages in thread
From: Octavian Purdila @ 2008-11-11 23:15 UTC (permalink / raw)
To: Patrick Ohly
Cc: netdev, Stephen Hemminger, Ingo Oeser, Andi Kleen, John Ronciak,
Eric Dumazet, Oliver Hartkopp
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Fri, 24 Oct 2008 15:41:34 +0200
> + if (dev) {
> + skb_hwtstamp_set(skb,
> + dev->hwtstamp_raw2sys ?
> + dev->hwtstamp_raw2sys(dev, stamp) :
> + stamp);
> + } else {
> + skb->tstamp = stamp;
> +#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
> + skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
> +#else
> + skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
> +#endif
> + }
> +
I think the addition of the following bits will be of use to applications:
serr = SKB_EXT_ERR(skb);
serr->ee.ee_origin = SO_EE_ORIGIN_TXTSTAMP;
serr->ee.ee_mac = skb->mac.raw - skb->data;
serr->ee.ee_network = skb->nh.raw - skb->data;
serr->ee.ee_transport = skb->h.raw - skb->data;
For example, for UDP PTP we don't have to manually skip the ethernet (and take
into account VLANs) and IP headers.
Thanks,
tavi
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support
2008-11-11 23:15 ` Octavian Purdila
@ 2008-11-12 8:38 ` Patrick Ohly
0 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-12 8:38 UTC (permalink / raw)
To: Octavian Purdila
Cc: netdev@vger.kernel.org, Stephen Hemminger, Ingo Oeser, Andi Kleen,
Ronciak, John, Eric Dumazet, Oliver Hartkopp
On Tue, 2008-11-11 at 23:15 +0000, Octavian Purdila wrote:
> I think the addition of the following bits will be of use to applications:
>
> serr = SKB_EXT_ERR(skb);
> serr->ee.ee_origin = SO_EE_ORIGIN_TXTSTAMP;
> serr->ee.ee_mac = skb->mac.raw - skb->data;
> serr->ee.ee_network = skb->nh.raw - skb->data;
> serr->ee.ee_transport = skb->h.raw - skb->data;
>
> For example, for UDP PTP we don't have to manually skip the ethernet (and take
> into account VLANs) and IP headers.
Yes, that would be useful. Will add it.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 05/13] ip: support for TX timestamps on UDP and RAW sockets
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (3 preceding siblings ...)
2008-10-24 13:41 ` [RFC PATCH 04/13] net: implement generic SOF_TIMESTAMPING_TX_* support Patrick Ohly
@ 2008-10-24 13:49 ` Patrick Ohly
2008-11-12 9:59 ` David Miller
2008-10-29 14:48 ` [RFC PATCH 06/13] workaround: detect time stamp when command flags are expected Patrick Ohly
` (8 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-10-24 13:49 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/net/ip.h | 1 +
net/can/raw.c | 8 ++++++++
net/ipv4/icmp.c | 2 ++
net/ipv4/ip_output.c | 2 ++
net/ipv4/raw.c | 1 +
net/ipv4/udp.c | 4 ++++
6 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 250e6ef..76cee15 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -54,6 +54,7 @@ struct ipcm_cookie
__be32 addr;
int oif;
struct ip_options *opt;
+ union ktime tstamp;
};
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
diff --git a/net/can/raw.c b/net/can/raw.c
index 6e0663f..b3a978b 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -618,6 +618,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
struct raw_sock *ro = raw_sk(sk);
struct sk_buff *skb;
struct net_device *dev;
+ union ktime tstamp = {
+ .tv64 = 0,
+ };
int ifindex;
int err;
@@ -639,6 +642,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
if (!dev)
return -ENXIO;
+ err = sock_tx_timestamp(msg, sk, &tstamp);
+ if (err < 0)
+ return err;
+
skb = sock_alloc_send_skb(sk, size, msg->msg_flags & MSG_DONTWAIT,
&err);
if (!skb) {
@@ -654,6 +661,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct socket *sock,
}
skb->dev = dev;
skb->sk = sk;
+ skb->tstamp = tstamp;
err = can_send(skb, ro->loopback);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 55c355e..27cd661 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -375,6 +375,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
inet->tos = ip_hdr(skb)->tos;
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.tstamp.tv64 = 0;
if (icmp_param->replyopts.optlen) {
ipc.opt = &icmp_param->replyopts;
if (ipc.opt->srr)
@@ -532,6 +533,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
inet_sk(sk)->tos = tos;
ipc.addr = iph->saddr;
ipc.opt = &icmp_param.replyopts;
+ ipc.tstamp.tv64 = 0;
{
struct flowi fl = {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d533a89..437906d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -939,6 +939,7 @@ alloc_new_skb:
skb->ip_summed = csummode;
skb->csum = 0;
skb_reserve(skb, hh_len);
+ skb->tstamp = ipc->tstamp;
/*
* Find where to start putting bytes.
@@ -1353,6 +1354,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
+ ipc.tstamp.tv64 = 0;
if (replyopts.opt.optlen) {
ipc.opt = &replyopts.opt;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index cd97574..2120ac5 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -493,6 +493,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.opt = NULL;
+ ipc.tstamp.tv64 = 0;
ipc.oif = sk->sk_bound_dev_if;
if (msg->msg_controllen) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 57e26fa..6b9c544 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -557,6 +557,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
return -EOPNOTSUPP;
ipc.opt = NULL;
+ ipc.tstamp.tv64 = 0;
if (up->pending) {
/*
@@ -604,6 +605,9 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
ipc.addr = inet->saddr;
ipc.oif = sk->sk_bound_dev_if;
+ err = sock_tx_timestamp(msg, sk, &ipc.tstamp);
+ if (err)
+ return err;
if (msg->msg_controllen) {
err = ip_cmsg_send(sock_net(sk), msg, &ipc);
if (err)
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 05/13] ip: support for TX timestamps on UDP and RAW sockets
2008-10-24 13:49 ` [RFC PATCH 05/13] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
@ 2008-11-12 9:59 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 9:59 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Fri, 24 Oct 2008 15:49:10 +0200
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 250e6ef..76cee15 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -54,6 +54,7 @@ struct ipcm_cookie
> __be32 addr;
> int oif;
> struct ip_options *opt;
> + union ktime tstamp;
> };
>
> #define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb))
Please use ktime_t
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 06/13] workaround: detect time stamp when command flags are expected
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (4 preceding siblings ...)
2008-10-24 13:49 ` [RFC PATCH 05/13] ip: support for TX timestamps on UDP and RAW sockets Patrick Ohly
@ 2008-10-29 14:48 ` Patrick Ohly
2008-11-12 10:00 ` David Miller
2008-10-31 11:43 ` [RFC PATCH 07/13] net: add SIOCSHWTSTAMP - hardware time stamping of packets Patrick Ohly
` (7 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-10-29 14:48 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
This happens when IP_MULTICAST_LOOP is on. Apparently the time
stamped packet goes through the loop device's hard_start_xmit?!
TODO: find a clean solution.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
net/core/skbuff.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7d714b8..7d9f1dd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2574,6 +2574,13 @@ void skb_hwtstamp_tx(struct sk_buff *orig_skb,
struct sk_buff *skb;
int err = -ENOMEM;
+ /* sanity check: extra bits set => might be a real time stamp */
+ if (orig_skb->tstamp.tv64 & ~(SKB_TSTAMP_TX_HARDWARE|SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS|SKB_TSTAMP_TX_SOFTWARE)) {
+ printk(KERN_DEBUG
+ "skb_hwtstamp_tx: invalid command flags\n");
+ return;
+ }
+
if (!sk)
return;
--
1.6.0.4
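
The sanity check added by this workaround can be tried out in user space with the sketch below. The three flag values are taken from patch 04; SKB_TSTAMP_TX_MASK and tstamp_holds_only_command_flags() are hypothetical names introduced only for this illustration. It is a heuristic, as the patch itself says: any bit outside the three command bits is taken as a sign that the field holds a real time stamp.

```c
#include <stdint.h>

/* Flag values as defined in patch 04 of this series. */
#define SKB_TSTAMP_TX_HARDWARE             (1LL << 62)
#define SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS (1LL << 61)
#define SKB_TSTAMP_TX_SOFTWARE             (1LL << 60)

/* Hypothetical helper macro: all bits that may legitimately be set. */
#define SKB_TSTAMP_TX_MASK (SKB_TSTAMP_TX_HARDWARE | \
			    SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS | \
			    SKB_TSTAMP_TX_SOFTWARE)

/*
 * Returns 1 if tv64 contains nothing but command flags (or is zero),
 * 0 if extra bits are set and the value is therefore likely to be a
 * real time stamp, e.g. one written by the loopback path.
 */
static int tstamp_holds_only_command_flags(int64_t tv64)
{
	return (tv64 & ~SKB_TSTAMP_TX_MASK) == 0;
}
```

A nanosecond value such as the one returned by ktime_get_real() will almost always have lower bits set and is rejected, which is what the printk() branch in the patch relies on.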
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 06/13] workaround: detect time stamp when command flags are expected
2008-10-29 14:48 ` [RFC PATCH 06/13] workaround: detect time stamp when command flags are expected Patrick Ohly
@ 2008-11-12 10:00 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 10:00 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 29 Oct 2008 15:48:48 +0100
> + /* sanity check: extra bits set => might be a real time stamp */
> + if (orig_skb->tstamp.tv64 & ~(SKB_TSTAMP_TX_HARDWARE|SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS|SKB_TSTAMP_TX_SOFTWARE)) {
Line is way too long, split up and group the bits:
if (orig_skb->tstamp.tv64 & ~(SKB_TSTAMP_TX_HARDWARE |
SKB_TSTAMP_TX_HARDWARE_IN_PROGRESS |
SKB_TSTAMP_TX_SOFTWARE)) {
^ permalink raw reply [flat|nested] 48+ messages in thread
* [RFC PATCH 07/13] net: add SIOCSHWTSTAMP - hardware time stamping of packets
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (5 preceding siblings ...)
2008-10-29 14:48 ` [RFC PATCH 06/13] workaround: detect time stamp when command flags are expected Patrick Ohly
@ 2008-10-31 11:43 ` Patrick Ohly
2008-10-31 12:21 ` [RFC PATCH 08/13] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
` (6 subsequent siblings)
13 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-10-31 11:43 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
In its current form the new ioctl allows time stamping
PTP packets (all currently useful flavors) and all packets.
This should be good enough for the use cases discussed
on Linux netdev so far.
It does not yet allow user space control over the clock
in the NIC. If this should become necessary, then it will
have to be extended.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
Documentation/networking/timestamping.txt | 23 ++++++++++++++++-
.../networking/timestamping/timestamping.c | 26 ++++++++++++++++++++
fs/compat_ioctl.c | 1 +
include/net/timestamping.h | 7 ++++-
net/core/dev.c | 2 +
5 files changed, 56 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 6a87a96..537e55b 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -96,6 +96,24 @@ struct hwtstamp_config {
int rx_filter_type; /**< HWTSTAMP_RX_* */
};
+Desired behavior is passed into the kernel and to a specific device by
+calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
+ifr_data points to a struct hwtstamp_config. The tx_type and
+rx_filter_type are hints to the driver about what it is expected to do. If
+the requested fine-grained filtering for incoming packets is not
+supported, the driver may time stamp more than just the requested types
+of packets.
+
+A driver which supports hardware time stamping shall update the struct
+with the actual, possibly more permissive configuration. If the
+requested packets cannot be time stamped, then nothing should be
+changed and ERANGE shall be returned (in contrast to EINVAL, which
+indicates that SIOCSHWTSTAMP is not supported at all).
+
+Only a process with admin rights may change the configuration. User
+space is responsible for ensuring that multiple processes don't interfere
+with each other and that the settings are reset.
+
/** possible values for hwtstamp_config->tx_type */
enum {
/**
@@ -111,7 +129,7 @@ enum {
* time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
* before sending the packet
*/
- HWSTAMP_TX_ON,
+ HWTSTAMP_TX_ON,
};
/** possible values for hwtstamp_config->rx_filter_type */
@@ -122,6 +140,9 @@ enum {
/** time stamp any incoming packet */
HWTSTAMP_FILTER_ALL,
+ /** return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
/** PTP v1, UDP, any kind of event packet */
HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index abb2b79..bfb6cd6 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -270,6 +270,8 @@ int main(int argc, char **argv)
int enabled = 1;
int sock;
struct ifreq device;
+ struct ifreq hwtstamp;
+ struct hwtstamp_config hwconfig, hwconfig_requested;
struct sockaddr_in addr;
struct ip_mreq imr;
struct in_addr iaddr;
@@ -323,6 +325,30 @@ int main(int argc, char **argv)
bail("getting interface IP address");
}
+ memset(&hwtstamp, 0, sizeof(hwtstamp));
+ strncpy(hwtstamp.ifr_name, interface, sizeof(hwtstamp.ifr_name));
+ hwtstamp.ifr_data = (void *)&hwconfig;
+ memset(&hwconfig, 0, sizeof(hwconfig));
+ hwconfig.tx_type =
+ (so_timestamping_flags & SOF_TIMESTAMPING_TX_HARDWARE) ?
+ HWTSTAMP_TX_ON : HWTSTAMP_TX_OFF;
+ hwconfig.rx_filter_type =
+ (so_timestamping_flags & SOF_TIMESTAMPING_RX_HARDWARE) ?
+ HWTSTAMP_FILTER_PTP_V1_L4_SYNC : HWTSTAMP_FILTER_NONE;
+ hwconfig_requested = hwconfig;
+ if (ioctl(sock, SIOCSHWTSTAMP, &hwtstamp) < 0) {
+ if (errno == EINVAL &&
+ hwconfig_requested.tx_type == HWTSTAMP_TX_OFF &&
+ hwconfig_requested.rx_filter_type == HWTSTAMP_FILTER_NONE) {
+ printf("SIOCSHWTSTAMP: disabling hardware time stamping not possible\n");
+ } else {
+ bail("SIOCSHWTSTAMP");
+ }
+ }
+ printf("SIOCSHWTSTAMP: tx_type %d requested, got %d; rx_filter_type %d requested, got %d\n",
+ hwconfig_requested.tx_type, hwconfig.tx_type,
+ hwconfig_requested.rx_filter_type, hwconfig.rx_filter_type);
+
/* bind to PTP port */
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 5235c67..a5001a6 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -2555,6 +2555,7 @@ HANDLE_IOCTL(SIOCSIFMAP, dev_ifsioc)
HANDLE_IOCTL(SIOCGIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFADDR, dev_ifsioc)
HANDLE_IOCTL(SIOCSIFHWBROADCAST, dev_ifsioc)
+HANDLE_IOCTL(SIOCSHWTSTAMP, dev_ifsioc)
/* ioctls used by appletalk ddp.c */
HANDLE_IOCTL(SIOCATALKDIFADDR, dev_ifsioc)
diff --git a/include/net/timestamping.h b/include/net/timestamping.h
index 53cb603..c271caa 100644
--- a/include/net/timestamping.h
+++ b/include/net/timestamping.h
@@ -28,7 +28,7 @@ enum {
# define SIOCSHWTSTAMP 0x89b0
#endif
-/** %SIOCSHWTSTAMP expects a pointer to this struct */
+/** %SIOCSHWTSTAMP expects a struct ifreq with an ifr_data pointer to this struct */
struct hwtstamp_config {
int flags; /**< no flags defined right now, must be zero */
int tx_type; /**< one of HWTSTAMP_TX_* */
@@ -50,7 +50,7 @@ enum {
* time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
* before sending the packet
*/
- HWSTAMP_TX_ON,
+ HWTSTAMP_TX_ON,
};
/** possible values for hwtstamp_config->rx_filter_type */
@@ -61,6 +61,9 @@ enum {
/** time stamp any incoming packet */
HWTSTAMP_FILTER_ALL,
+ /** return value: time stamp all packets requested plus some others */
+ HWTSTAMP_FILTER_SOME,
+
/** PTP v1, UDP, any kind of event packet */
HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
/** PTP v1, UDP, Sync packet */
diff --git a/net/core/dev.c b/net/core/dev.c
index 7cf31fb..69d7c04 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3621,6 +3621,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, unsigned int cmd)
cmd == SIOCSMIIREG ||
cmd == SIOCBRADDIF ||
cmd == SIOCBRDELIF ||
+ cmd == SIOCSHWTSTAMP ||
cmd == SIOCWANDEV) {
err = -EOPNOTSUPP;
if (dev->do_ioctl) {
@@ -3776,6 +3777,7 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
case SIOCBONDCHANGEACTIVE:
case SIOCBRADDIF:
case SIOCBRDELIF:
+ case SIOCSHWTSTAMP:
if (!capable(CAP_NET_ADMIN))
return -EPERM;
/* fall through */
--
1.6.0.4
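
As a rough illustration of how user space would set up the request described in the documentation hunk, the following sketch fills in a struct ifreq whose ifr_data points to the configuration. struct rfc_hwtstamp_config is a local mirror of the struct hwtstamp_config proposed in this RFC (the field names are taken from the patch), and prepare_hwtstamp_request() is a hypothetical helper, not something the patch provides. The sketch assumes a Linux/glibc build environment for <net/if.h>.

```c
#define _DEFAULT_SOURCE		/* for struct ifreq with glibc in strict modes */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>		/* struct ifreq, ifr_name, ifr_data */

/* Local mirror of the struct proposed in this RFC (assumption). */
struct rfc_hwtstamp_config {
	int flags;		/* no flags defined right now, must be zero */
	int tx_type;		/* one of HWTSTAMP_TX_* */
	int rx_filter_type;	/* one of HWTSTAMP_FILTER_* */
};

/*
 * Hypothetical helper: zero both structs completely, copy the
 * interface name with guaranteed NUL termination and let ifr_data
 * point to the configuration.
 */
static void prepare_hwtstamp_request(struct ifreq *ifr,
				     struct rfc_hwtstamp_config *cfg,
				     const char *ifname,
				     int tx_type, int rx_filter_type)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->tx_type = tx_type;
	cfg->rx_filter_type = rx_filter_type;

	memset(ifr, 0, sizeof(*ifr));
	strncpy(ifr->ifr_name, ifname, sizeof(ifr->ifr_name) - 1);
	ifr->ifr_data = (void *)cfg;
}
```

The prepared struct ifreq would then be passed to ioctl(sock, SIOCSHWTSTAMP, &ifr) as in the timestamping.c hunk above. Following the documentation text, ERANGE would mean that the requested mode is unsupported, and EINVAL that the ioctl is not supported at all.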
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [RFC PATCH 08/13] igb: stub support for SIOCSHWTSTAMP
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (6 preceding siblings ...)
2008-10-31 11:43 ` [RFC PATCH 07/13] net: add SIOCSHWTSTAMP - hardware time stamping of packets Patrick Ohly
@ 2008-10-31 12:21 ` Patrick Ohly
2008-11-04 9:23 ` [RFC PATCH 09/13] clocksource: allow usage independent of timekeeping.c Patrick Ohly
` (5 subsequent siblings)
13 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-10-31 12:21 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/igb_main.c | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 634c4c9..becf8d6 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -34,6 +34,7 @@
#include <linux/ipv6.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <net/timestamping.h>
#include <linux/mii.h>
#include <linux/ethtool.h>
#include <linux/if_vlan.h>
@@ -4103,6 +4104,35 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
}
/**
+ * igb_hwtstamp_ioctl - control hardware time stamping
+ * @netdev:
+ * @ifreq:
+ * @cmd:
+ *
+ * Currently cannot enable any kind of hardware time stamping, but
+ * supports SIOCSHWTSTAMP in general.
+ **/
+static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+ struct hwtstamp_config config;
+
+ printk("igb_hwtstamp_ioctl\n");
+
+ if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
+ return -EFAULT;
+
+ /* reserved for future extensions */
+ if (config.flags)
+ return -EINVAL;
+
+ if (config.tx_type == HWTSTAMP_TX_OFF &&
+ config.rx_filter_type == HWTSTAMP_FILTER_NONE)
+ return 0;
+
+ return -ERANGE;
+}
+
+/**
* igb_ioctl -
* @netdev:
* @ifreq:
@@ -4115,6 +4145,8 @@ static int igb_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
case SIOCGMIIREG:
case SIOCSMIIREG:
return igb_mii_ioctl(netdev, ifr, cmd);
+ case SIOCSHWTSTAMP:
+ return igb_hwtstamp_ioctl(netdev, ifr, cmd);
default:
return -EOPNOTSUPP;
}
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [RFC PATCH 09/13] clocksource: allow usage independent of timekeeping.c
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (7 preceding siblings ...)
2008-10-31 12:21 ` [RFC PATCH 08/13] igb: stub support for SIOCSHWTSTAMP Patrick Ohly
@ 2008-11-04 9:23 ` Patrick Ohly
2008-11-12 10:04 ` David Miller
2008-11-04 9:27 ` [RFC PATCH 10/13] igb: infrastructure for hardware time stamping Patrick Ohly
` (4 subsequent siblings)
13 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-11-04 9:23 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
So far struct clocksource acted as the interface between time/timekeeping
and hardware. This patch generalizes the concept so that the same
interface can also be used in other contexts.
The only change as far as kernel/time/timekeeping is concerned is that
the hardware access can be done either with or without passing
the clocksource pointer as context. This is necessary in those
cases when there is more than one instance of the hardware.
The extensions in this patch add code which turns the raw cycle count
provided by hardware into a continuously increasing time value. This
reuses fields also used by timekeeping.c. Because of slightly different
semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
does that transparently) timekeeping.c was not modified to use the
generalized code.
The new code does no locking of the clocksource. This is the responsibility
of the caller.
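
The way a raw cycle count is turned into elapsed nanoseconds, including a single counter wrap-around, can be sketched in user space as follows. struct fake_clocksource, fake_cyc2ns() and fake_read_ns() are hypothetical, simplified stand-ins for the kernel structure and helpers touched by this patch. Unlike clocksource_read_ns(), the sketch takes the current counter value as a parameter instead of reading hardware, and it uses unsigned types throughout.

```c
#include <stdint.h>

/* Hypothetical, reduced version of struct clocksource. */
struct fake_clocksource {
	uint64_t mask;		/* valid counter bits, e.g. 0xff for 8 bits */
	uint32_t mult;		/* cycle to nanosecond multiplier */
	uint32_t shift;		/* cycle to nanosecond divisor (power of two) */
	uint64_t cycle_last;	/* counter value at the previous read */
};

/* Cycles to nanoseconds: (cycles * mult) >> shift. */
static uint64_t fake_cyc2ns(const struct fake_clocksource *cs,
			    uint64_t cycles)
{
	return (cycles * cs->mult) >> cs->shift;
}

/*
 * Nanoseconds since the previous call. Masking the difference makes
 * the result correct across one wrap-around of the counter; more than
 * one wrap between calls cannot be detected.
 */
static uint64_t fake_read_ns(struct fake_clocksource *cs,
			     uint64_t cycle_now)
{
	uint64_t cycle_delta = (cycle_now - cs->cycle_last) & cs->mask;

	cs->cycle_last = cycle_now;
	return fake_cyc2ns(cs, cycle_delta);
}
```

For an 8-bit counter with mult = 1 and shift = 0, going from a counter value of 250 to 4 yields a delta of 10 cycles and therefore 10 ns, even though the raw value decreased.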
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/clocksource.h | 119 ++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 118 insertions(+), 1 deletions(-)
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 55e434f..da4c7cd 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -24,6 +24,9 @@ struct clocksource;
/**
* struct clocksource - hardware abstraction for a free running counter
* Provides mostly state-free accessors to the underlying hardware.
+ * Also provides utility functions which convert the underlying
+ * hardware cycle values into a non-decreasing count of nanoseconds
+ * ("time").
*
* @name: ptr to clocksource name
* @list: list head for registration
@@ -43,6 +46,9 @@ struct clocksource;
* The ideal clocksource. A must-use where
* available.
* @read: returns a cycle value
+ * @read_clock: alternative to read which gets a pointer to the clock
+ * source so that the same code can read different clocks;
+ * either read or read_clock must be set
* @mask: bitmask for two's complement
* subtraction of non 64 bit counters
* @mult: cycle to nanosecond multiplier
@@ -61,6 +67,7 @@ struct clocksource {
struct list_head list;
int rating;
cycle_t (*read)(void);
+ cycle_t (*read_clock)(struct clocksource *cs);
cycle_t mask;
u32 mult;
u32 shift;
@@ -166,7 +173,7 @@ static inline u32 clocksource_hz2mult(u32 hz, u32 shift_constant)
*/
static inline cycle_t clocksource_read(struct clocksource *cs)
{
- return cs->read();
+ return (cs->read ? cs->read() : cs->read_clock(cs));
}
/**
@@ -186,6 +193,116 @@ static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
}
/**
+ * clocksource_read_ns - get nanoseconds since last call of this function
+ * (never negative)
+ * @cs: Pointer to clocksource
+ *
+ * When the underlying cycle counter runs over, this will be handled
+ * correctly as long as it does not run over more than once between
+ * calls.
+ *
+ * The first call to this function for a new clock source initializes
+ * the time tracking and returns bogus results.
+ */
+static inline s64 clocksource_read_ns(struct clocksource *cs)
+{
+ cycle_t cycle_now, cycle_delta;
+ s64 ns_offset;
+
+ /* read clocksource: */
+ cycle_now = clocksource_read(cs);
+
+ /* calculate the delta since the last clocksource_read_ns: */
+ cycle_delta = (cycle_now - cs->cycle_last) & cs->mask;
+
+ /* convert to nanoseconds: */
+ ns_offset = cyc2ns(cs, cycle_delta);
+
+ /* update time stamp of clocksource_read_ns call: */
+ cs->cycle_last = cycle_now;
+
+ return ns_offset;
+}
+
+/**
+ * clocksource_init_time - initialize a clock source for use with
+ * %clocksource_read_time() and
+ * %clocksource_cyc2time()
+ * @cs: Pointer to clocksource.
+ * @start_tstamp: Arbitrary initial time stamp.
+ *
+ * After this call the current cycle register (roughly) corresponds to
+ * the initial time stamp. Every call to %clocksource_read_time()
+ * increments the time stamp counter by the number of elapsed
+ * nanoseconds.
+ */
+static inline void clocksource_init_time(struct clocksource *cs,
+ u64 start_tstamp)
+{
+ cs->cycle_last = clocksource_read(cs);
+ cs->xtime_nsec = start_tstamp;
+}
+
+/**
+ * clocksource_read_time - return nanoseconds since %clocksource_init_time()
+ * plus the initial time stamp
+ * @cs: Pointer to clocksource.
+ *
+ * In other words, keeps track of time since the same epoch as
+ * the function which generated the initial time stamp. Don't mix
+ * with calls to %clocksource_read_ns()!
+ */
+static inline u64 clocksource_read_time(struct clocksource *cs)
+{
+ u64 nsec;
+
+ /* increment time by nanoseconds since last call */
+ nsec = clocksource_read_ns(cs);
+ nsec += cs->xtime_nsec;
+ cs->xtime_nsec = nsec;
+
+ return nsec;
+}
+
+/**
+ * clocksource_cyc2time - convert an absolute cycle time stamp to same
+ * time base as values returned by
+ * %clocksource_read_time()
+ * @cs: Pointer to clocksource.
+ * @cycle_tstamp: a value returned by cs->read()
+ *
+ * Cycle time stamps are converted correctly as long as they fall
+ * into the interval [-1/2 max cycle count, 1/2 max cycle count]
+ * around cs->cycle_last, with "max cycle count" == cs->mask+1.
+ *
+ * This avoids situations where a cycle time stamp is generated, the
+ * current cycle counter is updated, and then when transforming the
+ * time stamp the value is treated as if it was in the future. Always
+ * updating the cycle counter would also work, but would incur
+ * additional overhead.
+ */
+static inline u64 clocksource_cyc2time(struct clocksource *cs,
+ cycle_t cycle_tstamp)
+{
+ u64 cycle_delta = (cycle_tstamp - cs->cycle_last) & cs->mask;
+ u64 nsec;
+
+ /*
+ * Instead of always treating cycle_tstamp as more recent
+ * than cs->cycle_last, detect when it is too far in the
+ * future and treat it as old time stamp instead.
+ */
+ if (cycle_delta > cs->mask / 2) {
+ cycle_delta = (cs->cycle_last - cycle_tstamp) & cs->mask;
+ nsec = cs->xtime_nsec - cyc2ns(cs, cycle_delta);
+ } else {
+ nsec = cyc2ns(cs, cycle_delta) + cs->xtime_nsec;
+ }
+
+ return nsec;
+}
+
+/**
* clocksource_calculate_interval - Calculates a clocksource interval struct
*
* @c: Pointer to clocksource.
--
1.6.0.4
* Re: [RFC PATCH 09/13] clocksource: allow usage independent of timekeeping.c
2008-11-04 9:23 ` [RFC PATCH 09/13] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2008-11-12 10:04 ` David Miller
0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 10:04 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Tue, 4 Nov 2008 10:23:42 +0100
> So far struct clocksource acted as the interface between time/timekeeping
> and hardware. This patch generalizes the concept so that the same
> interface can also be used in other contexts.
>
> The only change as far as kernel/time/timekeeping is concerned is that
> the hardware access can be done either with or without passing
> the clocksource pointer as context. This is necessary in those
> cases when there is more than one instance of the hardware.
>
> The extensions in this patch add code which turns the raw cycle count
> provided by hardware into a continuously increasing time value. This
> reuses fields also used by timekeeping.c. Because of slightly different
> semantics (__get_nsec_offset does not update cycle_last, clocksource_read_ns
> does that transparently) timekeeping.c was not modified to use the
> generalized code.
>
> The new code does no locking of the clocksource. This is the responsibility
> of the caller.
>
> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
This patch, since it changes generic facilities in the kernel rather
than networking-specific ones, will need to get a full review on
linux-kernel.
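For readers following the patch 09 discussion, the wraparound handling
in clocksource_read_ns() and clocksource_cyc2time() can be tried out in
userspace. What follows is only a minimal sketch, not the kernel code:
struct cyc_clock and all function names are hypothetical stand-ins for
the struct clocksource fields used by the patch (mask, mult, shift,
cycle_last, xtime_nsec).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the clocksource fields in the patch. */
struct cyc_clock {
	uint64_t mask;       /* counter width as a bitmask, e.g. 0xff */
	uint32_t mult;       /* cycle to nanosecond multiplier */
	uint32_t shift;      /* cycle to nanosecond divisor (power of two) */
	uint64_t cycle_last; /* counter value at the last update */
	uint64_t time_nsec;  /* time in ns that corresponds to cycle_last */
};

/* Same formula as cyc2ns(): (cycles * mult) >> shift. */
static uint64_t cyc_to_ns(const struct cyc_clock *c, uint64_t cycles)
{
	assert(c->shift < 64);
	return (cycles * c->mult) >> c->shift;
}

/* Advance the clock to counter value "now"; tolerates one wraparound. */
static uint64_t clock_advance(struct cyc_clock *c, uint64_t now)
{
	uint64_t delta = (now - c->cycle_last) & c->mask;

	c->time_nsec += cyc_to_ns(c, delta);
	c->cycle_last = now;
	return c->time_nsec;
}

/* Convert a counter value to time without touching the clock state. */
static uint64_t clock_cyc2time(const struct cyc_clock *c, uint64_t stamp)
{
	uint64_t delta = (stamp - c->cycle_last) & c->mask;

	/* More than half the counter range ahead: treat it as an old stamp. */
	if (delta > c->mask / 2) {
		delta = (c->cycle_last - stamp) & c->mask;
		return c->time_nsec - cyc_to_ns(c, delta);
	}
	return c->time_nsec + cyc_to_ns(c, delta);
}
```

With an 8-bit counter (mask 0xff), mult = 1, shift = 0, cycle_last = 250
and time_nsec = 1000, advancing to counter value 4 crosses the
wraparound and yields 1010. Converting the stamp 2 afterwards is treated
as 2 cycles in the past (1008) rather than 254 cycles in the future.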
* [RFC PATCH 10/13] igb: infrastructure for hardware time stamping
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (8 preceding siblings ...)
2008-11-04 9:23 ` [RFC PATCH 09/13] clocksource: allow usage independent of timekeeping.c Patrick Ohly
@ 2008-11-04 9:27 ` Patrick Ohly
2008-11-05 9:58 ` [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time Patrick Ohly
` (3 subsequent siblings)
13 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-04 9:27 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
Adds register definitions and a clocksource accessing the
NIC time.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/e1000_regs.h | 28 +++++++++++
drivers/net/igb/igb.h | 3 +
drivers/net/igb/igb_main.c | 105 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 136 insertions(+), 0 deletions(-)
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 95523af..37f9d55 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -75,6 +75,34 @@
#define E1000_FCRTH 0x02168 /* Flow Control Receive Threshold High - RW */
#define E1000_RDFPCQ(_n) (0x02430 + (0x4 * (_n)))
#define E1000_FCRTV 0x02460 /* Flow Control Refresh Timer Value - RW */
+
+/* IEEE 1588 TIMESYNCH */
+#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCFG 0x05F50
+
+#define E1000_SYSTIML 0x0B600
+#define E1000_SYSTIMH 0x0B604
+#define E1000_TIMINCA 0x0B608
+
+#define E1000_RXMTRL 0x0B634
+#define E1000_RXSTMPL 0x0B624
+#define E1000_RXSTMPH 0x0B628
+#define E1000_RXSATRL 0x0B62C
+#define E1000_RXSATRH 0x0B630
+
+#define E1000_TXSTMPL 0x0B618
+#define E1000_TXSTMPH 0x0B61C
+
+#define E1000_ETQF0 0x05CB0
+#define E1000_ETQF1 0x05CB4
+#define E1000_ETQF2 0x05CB8
+#define E1000_ETQF3 0x05CBC
+#define E1000_ETQF4 0x05CC0
+#define E1000_ETQF5 0x05CC4
+#define E1000_ETQF6 0x05CC8
+#define E1000_ETQF7 0x05CCC
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 4ff6f05..2938ab3 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -34,6 +34,8 @@
#include "e1000_mac.h"
#include "e1000_82575.h"
+#include <linux/clocksource.h>
+
struct igb_adapter;
#ifdef CONFIG_IGB_LRO
@@ -262,6 +264,7 @@ struct igb_adapter {
struct napi_struct napi;
struct pci_dev *pdev;
struct net_device_stats net_stats;
+ struct clocksource clock;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index becf8d6..3a4772e 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -179,6 +179,54 @@ MODULE_DESCRIPTION("Intel(R) Gigabit Ethernet Network Driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(DRV_VERSION);
+/**
+ * Scale the NIC clock cycle by a large factor so that
+ * relatively small clock corrections can be added or
+ * subtracted at each clock tick. The drawbacks of a
+ * large factor are a) that the clock register overflows
+ * more quickly (not such a big deal) and b) that the
+ * increment per tick has to fit into 24 bits.
+ *
+ * Note that
+ * TIMINCA = IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS *
+ * IGB_TSYNC_SCALE
+ * TIMINCA += TIMINCA * adjustment [ppm] / 1e9
+ *
+ * The base scale factor is intentionally a power of two
+ * so that the division in clocksource can be done with
+ * a shift.
+ */
+#define IGB_TSYNC_SHIFT (19)
+#define IGB_TSYNC_SCALE (1<<IGB_TSYNC_SHIFT)
+
+/**
+ * The duration of one clock cycle of the NIC.
+ *
+ * @todo This hard-coded value is part of the specification and might change
+ * in future hardware revisions. Add revision check.
+ */
+#define IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS 16
+
+#if (IGB_TSYNC_SCALE * IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS) >= (1<<24)
+# error IGB_TSYNC_SCALE and/or IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS are too large to fit into TIMINCA
+#endif
+
+/**
+ * igb_read_clock - read raw cycle counter (to be used by clocksource)
+ */
+static cycle_t igb_read_clock(struct clocksource *cs)
+{
+ struct igb_adapter *adapter =
+ container_of(cs, struct igb_adapter, clock);
+ struct e1000_hw *hw = &adapter->hw;
+ u64 stamp;
+
+ stamp = rd32(E1000_SYSTIML);
+ stamp |= (u64)rd32(E1000_SYSTIMH) << 32ULL;
+
+ return stamp;
+}
+
#ifdef DEBUG
/**
* igb_get_hw_dev_name - return device name string
@@ -189,6 +237,27 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw)
struct igb_adapter *adapter = hw->back;
return adapter->netdev->name;
}
+
+/**
+ * igb_get_time_str - format current NIC and system time as string
+ */
+static char *igb_get_time_str(struct igb_adapter *adapter,
+ char buffer[160])
+{
+ struct timespec nic = ns_to_timespec(clocksource_read_time(&adapter->clock));
+ struct timespec sys;
+ struct timespec delta;
+ getnstimeofday(&sys);
+
+ delta = timespec_sub(nic, sys);
+
+ sprintf(buffer, "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ (long)nic.tv_sec, nic.tv_nsec,
+ (long)sys.tv_sec, sys.tv_nsec,
+ (long)delta.tv_sec, delta.tv_nsec);
+
+ return buffer;
+}
#endif
/**
@@ -1250,6 +1319,42 @@ static int __devinit igb_probe(struct pci_dev *pdev,
}
#endif
+ /*
+ * Initialize hardware timer: we keep it running in case some
+ * program needs it later on.
+ */
+ memset(&adapter->clock, 0, sizeof(adapter->clock));
+ adapter->clock.read_clock = igb_read_clock;
+ adapter->clock.mask = (u64)(s64)-1;
+ adapter->clock.mult = 1;
+ adapter->clock.shift = IGB_TSYNC_SHIFT;
+ wr32(E1000_TIMINCA, (1<<24) | IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS * IGB_TSYNC_SCALE);
+#if 0
+ /*
+ * Avoid rollover while we initialize by resetting the time counter.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0x00000000);
+#else
+ /*
+ * Set registers so that rollover occurs soon to test this.
+ */
+ wr32(E1000_SYSTIML, 0x00000000);
+ wr32(E1000_SYSTIMH, 0xFF800000);
+#endif
+ wrfl();
+ clocksource_init_time(&adapter->clock, ktime_to_ns(ktime_get_real()));
+
+#ifdef DEBUG
+ {
+ char buffer[160];
+ printk(KERN_DEBUG
+ "igb: %s: hw %p initialized timer\n",
+ igb_get_time_str(adapter, buffer),
+ &adapter->hw);
+ }
+#endif
+
dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network Connection\n");
/* print bus type/speed/width info */
dev_info(&pdev->dev,
--
1.6.0.4
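The scaling arithmetic behind IGB_TSYNC_SHIFT and the TIMINCA increment
in patch 10 can be checked with a few lines of userspace C. This is a
sketch under the constants given in the patch (shift 19, 16 ns per NIC
clock cycle, clocksource mult = 1); the macro and function names are
made up for illustration, and bit 24, which the patch additionally ORs
into the register value, is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define TSYNC_SHIFT    19                           /* IGB_TSYNC_SHIFT */
#define TSYNC_SCALE    (UINT32_C(1) << TSYNC_SHIFT) /* IGB_TSYNC_SCALE */
#define TSYNC_CYCLE_NS 16 /* IGB_TSYNC_CYCLE_TIME_IN_NANOSECONDS */

/* Mirrors the #error check in the patch: the increment needs 24 bits. */
_Static_assert(TSYNC_CYCLE_NS * TSYNC_SCALE < (UINT32_C(1) << 24),
	       "increment per tick does not fit into 24 bits");

/* Amount added to the 64-bit counter on every NIC clock tick. */
static uint32_t timinca_increment(void)
{
	return TSYNC_CYCLE_NS * TSYNC_SCALE;
}

/* Counter value to nanoseconds, as cyc2ns() computes it with mult = 1. */
static uint64_t counter_to_ns(uint64_t counter)
{
	return counter >> TSYNC_SHIFT;
}

/* Whole seconds until a counter that starts at zero wraps around. */
static uint64_t seconds_until_wrap(void)
{
	return counter_to_ns(UINT64_MAX) / UINT64_C(1000000000);
}
```

The increment works out to 16 * 2^19 = 2^23 (0x800000), so 1000 ticks
advance the converted time by 16000 ns. A counter starting at zero wraps
after 2^45 ns, about 35184 seconds. With SYSTIMH preset to 0xFF800000 as
in the rollover test branch, 2^55 counter units remain, which is 2^36 ns
or roughly 68.7 seconds.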
* [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (9 preceding siblings ...)
2008-11-04 9:27 ` [RFC PATCH 10/13] igb: infrastructure for hardware time stamping Patrick Ohly
@ 2008-11-05 9:58 ` Patrick Ohly
2008-11-11 16:18 ` Andi Kleen
2008-11-12 10:05 ` David Miller
2008-11-06 11:13 ` [RFC PATCH 12/13] igb: use clocksync to implement hardware time stamping Patrick Ohly
` (2 subsequent siblings)
13 siblings, 2 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-05 9:58 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
Currently only mapping from clock source to system time is implemented.
The interface could have been made more versatile by not depending on a clock source,
but this wasn't done to avoid writing glue code elsewhere.
The method implemented here is the one used and analyzed under the name
"assisted PTP" in the LCI PTP paper:
http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
include/linux/clocksync.h | 141 +++++++++++++++++++++++++++++++++++++++++++++
kernel/time/Makefile | 2 +-
kernel/time/clocksync.c | 108 ++++++++++++++++++++++++++++++++++
3 files changed, 250 insertions(+), 1 deletions(-)
create mode 100644 include/linux/clocksync.h
create mode 100644 kernel/time/clocksync.c
diff --git a/include/linux/clocksync.h b/include/linux/clocksync.h
new file mode 100644
index 0000000..a5bec6f
--- /dev/null
+++ b/include/linux/clocksync.h
@@ -0,0 +1,141 @@
+/*
+ * Utility code which helps transforming between hardware time stamps
+ * generated by a clocksource and system time. The clocksource is
+ * assumed to return monotonically increasing time (but this code does
+ * its best to compensate if that is not the case) whereas system time
+ * may jump.
+ */
+#ifndef _LINUX_CLOCKSYNC_H
+#define _LINUX_CLOCKSYNC_H
+
+#include <linux/clocksource.h>
+#include <linux/ktime.h>
+
+/**
+ * struct clocksync - stores state and configuration for the two clocks
+ *
+ * Initialize to zero, then set clock, systime, num_samples.
+ *
+ * Transformation between HW time and system time is done with:
+ * HW time transformed = HW time + offset +
+ * (HW time - last_update) * skew / CLOCKSYNC_SKEW_RESOLUTION
+ *
+ * @clock: the source for HW time stamps (%clocksource_read_time)
+ * @systime: function returning current system time (ktime_get
+ * for monotonic time, or ktime_get_real for wall clock)
+ * @num_samples: number of times that HW time and system time are to
+ * be compared when determining their offset
+ * @offset: (system time - HW time) at the time of the last update
+ * @skew: average change of (system time - HW time) per unit of
+ * HW time, scaled by CLOCKSYNC_SKEW_RESOLUTION
+ * @last_update: last HW time stamp when clock offset was measured
+ */
+struct clocksync {
+ struct clocksource *clock;
+ union ktime (*systime)(void);
+ int num_samples;
+
+ s64 offset;
+ s64 skew;
+ u64 last_update;
+};
+
+/**
+ * CLOCKSYNC_SKEW_RESOLUTION - fixed point arithmetic scale factor for skew
+ *
+ * Usually one would measure skew in ppb (parts per billion, 1e9), but
+ * using a power of two as the scale factor simplifies the math.
+ */
+#define CLOCKSYNC_SKEW_RESOLUTION (((s64)1)<<30)
+
+/**
+ * clocksync_hw2sys - transform HW time stamp into corresponding system time
+ * @sync: context for clock sync
+ * @hwtstamp: the result of %clocksource_read_time or
+ * %clocksource_cyc2time
+ */
+static inline union ktime clocksync_hw2sys(struct clocksync *sync,
+ u64 hwtstamp)
+{
+ u64 nsec;
+
+ nsec = hwtstamp + sync->offset;
+ nsec += (s64)(hwtstamp - sync->last_update) * sync->skew /
+ CLOCKSYNC_SKEW_RESOLUTION;
+
+ return ns_to_ktime(nsec);
+}
+
+/**
+ * clocksync_offset - measure current (system time - HW time) offset
+ * @sync: context for clock sync
+ * @offset: average offset during sample period returned here
+ * @hwtstamp: average HW time during sample period returned here
+ *
+ * Returns number of samples used. Might be zero (= no result) in the
+ * unlikely case that system time was monotonically decreasing for all
+ * samples (= broken).
+ */
+int clocksync_offset(struct clocksync *sync,
+ s64 *offset,
+ u64 *hwtstamp);
+
+/**
+ * clocksync_update - update offset and skew by measuring current offset
+ * @sync: context for clock sync
+ * @hwtstamp: the result of %clocksource_read_time or
+ * %clocksource_cyc2time, pass zero to force update
+ *
+ * Updates are only done at most once per second.
+ */
+static inline void clocksync_update(struct clocksync *sync,
+ u64 hwtstamp)
+{
+ s64 offset;
+ u64 average_time;
+
+ if (hwtstamp &&
+ (s64)(hwtstamp - sync->last_update) < NSEC_PER_SEC) {
+ return;
+ }
+
+ if (!clocksync_offset(sync, &offset, &average_time)) {
+ return;
+ }
+
+ printk(KERN_DEBUG
+ "average offset: %lld\n", offset);
+
+ if (!sync->last_update) {
+ sync->last_update = average_time;
+ sync->offset = offset;
+ sync->skew = 0;
+ } else {
+ s64 delta_nsec = average_time - sync->last_update;
+
+ /* avoid division by negative or small deltas */
+ if (delta_nsec >= 10000) {
+ s64 delta_offset_nsec = offset - sync->offset;
+ s64 skew = delta_offset_nsec *
+ CLOCKSYNC_SKEW_RESOLUTION /
+ delta_nsec;
+
+ /**
+ * Calculate new overall skew as 4/16 the
+ * old value and 12/16 the new one. This is
+ * a rather arbitrary tradeoff between
+ * only using the latest measurement (0/16 and
+ * 16/16) and even more weight on past measurements.
+ */
+#define CLOCKSYNC_NEW_SKEW_PER_16 12
+ sync->skew =
+ ((16 - CLOCKSYNC_NEW_SKEW_PER_16) * sync->skew +
+ CLOCKSYNC_NEW_SKEW_PER_16 * skew) /
+ 16;
+ sync->last_update = average_time;
+ sync->offset = offset;
+ }
+ }
+}
+
+#endif /* _LINUX_CLOCKSYNC_H */
diff --git a/kernel/time/Makefile b/kernel/time/Makefile
index 905b0b5..6279fb0 100644
--- a/kernel/time/Makefile
+++ b/kernel/time/Makefile
@@ -1,4 +1,4 @@
-obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o
+obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o clocksync.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS_BUILD) += clockevents.o
obj-$(CONFIG_GENERIC_CLOCKEVENTS) += tick-common.o
diff --git a/kernel/time/clocksync.c b/kernel/time/clocksync.c
new file mode 100644
index 0000000..8942ab5
--- /dev/null
+++ b/kernel/time/clocksync.c
@@ -0,0 +1,108 @@
+/*
+ * Utility code which helps transforming between hardware time stamps
+ * generated by a clocksource and system time.
+ *
+ * Copyright (C) 2008 Intel, Patrick Ohly (patrick.ohly@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/clocksync.h>
+#include <linux/module.h>
+
+int clocksync_offset(struct clocksync *sync,
+ s64 *offset,
+ u64 *hwtstamp)
+{
+ u64 starthw = 0, endhw = 0;
+ struct {
+ s64 offset;
+ s64 duration_sys;
+ } samples[100], sample;
+ int counter = 0, i;
+ int used;
+ int index;
+ int num_samples = sync->num_samples;
+
+ if (num_samples > sizeof(samples)/sizeof(samples[0])) {
+ num_samples = sizeof(samples)/sizeof(samples[0]);
+ }
+
+ /* run until we have enough valid samples, but do not try forever */
+ i = 0;
+ counter = 0;
+ while (1) {
+ u64 ts;
+ union ktime start, end;
+
+ start = sync->systime();
+ ts = clocksource_read_time(sync->clock);
+ end = sync->systime();
+
+ if (!i) {
+ starthw = ts;
+ }
+
+ /* ignore negative durations */
+ sample.duration_sys = ktime_to_ns(ktime_sub(end, start));
+ if (sample.duration_sys >= 0) {
+ /*
+ * assume symmetric delay to and from HW: average system time
+ * corresponds to measured HW time
+ */
+ sample.offset = ktime_to_ns(ktime_add(end, start)) / 2 -
+ ts;
+
+ /* simple insertion sort based on duration */
+ index = counter - 1;
+ while (index >= 0) {
+ if(samples[index].duration_sys < sample.duration_sys) {
+ break;
+ }
+ samples[index + 1] = samples[index];
+ index--;
+ }
+ samples[index + 1] = sample;
+ counter++;
+ }
+
+ i++;
+ if (counter >= num_samples || i >= 100000) {
+ endhw = ts;
+ break;
+ }
+ }
+
+ *hwtstamp = (endhw + starthw) / 2;
+
+ /* remove outliers by only using 75% of the samples */
+ used = counter * 3 / 4;
+ if (!used) {
+ used = counter;
+ }
+ if (used) {
+ /* calculate average */
+ s64 off = 0;
+ for (index = 0; index < used; index++) {
+ off += samples[index].offset;
+ }
+ off /= used;
+ *offset = off;
+ }
+
+ return used;
+}
+
+EXPORT_SYMBOL_GPL(clocksync_offset);
--
1.6.0.4
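The outlier filtering in clocksync_offset() (order the samples by how
long the hardware read took, then average the offsets of the fastest
75%) can be illustrated in isolation. The sketch below is a userspace
approximation, not the patch code: it takes already collected samples
as input, uses qsort() where the patch uses an insertion sort during
collection, and the names trimmed_offset() and by_duration() are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct sample {
	int64_t offset;       /* system time midpoint minus HW time, ns */
	int64_t duration_sys; /* system time spent on the HW read, ns */
};

/* qsort() comparator: shortest read duration first. */
static int by_duration(const void *a, const void *b)
{
	const struct sample *x = a, *y = b;

	return (x->duration_sys > y->duration_sys) -
	       (x->duration_sys < y->duration_sys);
}

/*
 * Average the offsets of the 75% of samples with the shortest duration.
 * Sorts the array in place. Returns the number of samples used, or 0 if
 * there were none (in which case *offset is left untouched).
 */
static int trimmed_offset(struct sample *s, int count, int64_t *offset)
{
	int used = count * 3 / 4;
	int64_t sum = 0;
	int i;

	assert(offset != NULL);
	if (count <= 0)
		return 0;
	if (!used)
		used = count; /* fewer than two samples: use what we have */

	qsort(s, count, sizeof(*s), by_duration);
	for (i = 0; i < used; i++)
		sum += s[i].offset;
	*offset = sum / used;
	return used;
}
```

Given four samples with (offset, duration) pairs (100, 10), (9000, 500),
(102, 12) and (104, 11), the slow read with offset 9000 is discarded and
the remaining three average to 102.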
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-05 9:58 ` [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time Patrick Ohly
@ 2008-11-11 16:18 ` Andi Kleen
2008-11-12 8:01 ` Patrick Ohly
2008-11-12 10:05 ` David Miller
1 sibling, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-11-11 16:18 UTC (permalink / raw)
To: Patrick Ohly
Cc: netdev, Octavian Purdila, Stephen Hemminger, Ingo Oeser,
John Ronciak, Eric Dumazet, Oliver Hartkopp
> +
> +int clocksync_offset(struct clocksync *sync,
> + s64 *offset,
> + u64 *hwtstamp)
> +{
> + u64 starthw = 0, endhw = 0;
> + struct {
> + s64 offset;
> + s64 duration_sys;
> + } samples[100],
That should be separately allocated to avoid potential stack overflow.
Also as a style nit there are normally no {} around single line
statements.
-Andi
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-11 16:18 ` Andi Kleen
@ 2008-11-12 8:01 ` Patrick Ohly
2008-11-12 10:08 ` David Miller
0 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-11-12 8:01 UTC (permalink / raw)
To: Andi Kleen
Cc: netdev@vger.kernel.org, Octavian Purdila, Stephen Hemminger,
Ingo Oeser, Ronciak, John, Eric Dumazet, Oliver Hartkopp
On Tue, 2008-11-11 at 16:18 +0000, Andi Kleen wrote:
> > +
> > +int clocksync_offset(struct clocksync *sync,
> > + s64 *offset,
> > + u64 *hwtstamp)
> > +{
> > + u64 starthw = 0, endhw = 0;
> > + struct {
> > + s64 offset;
> > + s64 duration_sys;
> > + } samples[100],
>
> That should be separately allocated to avoid potential stack overflow.
Good catch. "make checkstack" also complains about it, but I didn't get
around to fixing it yet.
I'd prefer to allocate a very small array on the stack (10 entries = 160
bytes) and only fall back to dynamic allocation if the user of clocksync
wants more samples.
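The approach described above, a small on-stack array with a dynamic
fallback, might look roughly like the following userspace sketch. It is
only an illustration under assumptions: malloc()/free() stand in for
whatever kernel allocator would be used, and with_sample_buffer() and
fill_and_sum() are hypothetical names, not part of the patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_SAMPLES 10 /* 10 entries of 16 bytes each = 160 bytes */

struct sample {
	int64_t offset;
	int64_t duration_sys;
};

/*
 * Run fn() on a buffer of num_samples entries. Small requests use the
 * stack; larger ones fall back to the heap. Returns the number of
 * samples made available, or -1 if the allocation failed.
 */
static int with_sample_buffer(int num_samples,
			      void (*fn)(struct sample *buf, int n, void *ctx),
			      void *ctx)
{
	struct sample on_stack[STACK_SAMPLES];
	struct sample *buf = on_stack;

	assert(fn != NULL);
	if (num_samples <= 0)
		return 0;
	if (num_samples > STACK_SAMPLES) {
		buf = malloc(num_samples * sizeof(*buf));
		if (!buf)
			return -1;
	}

	fn(buf, num_samples, ctx);

	if (buf != on_stack)
		free(buf);
	return num_samples;
}

/* Demo callback: writes every slot and sums the offsets into *ctx. */
static void fill_and_sum(struct sample *buf, int n, void *ctx)
{
	int64_t *sum = ctx;
	int i;

	for (i = 0; i < n; i++) {
		buf[i].offset = i;
		buf[i].duration_sys = 0;
		*sum += buf[i].offset;
	}
}
```

A request for 4 samples stays on the stack, while a request for 25
takes the allocation path; the caller sees the same interface in both
cases.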
> Also as a style nit there are normally no {} around single line
> statements.
This is the part of the CodingStyle that I had most trouble adapting to
because a) I wrote a lot of code where the required style explicitly
asked for {} and b) I can think of several reasons for adding them
always and only one for not adding them.
Anyway, I'll try to keep this in mind, but would prefer to not reformat
the patches unless I have to touch them for other reasons.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-12 8:01 ` Patrick Ohly
@ 2008-11-12 10:08 ` David Miller
2008-11-12 16:14 ` Patrick Ohly
0 siblings, 1 reply; 48+ messages in thread
From: David Miller @ 2008-11-12 10:08 UTC (permalink / raw)
To: patrick.ohly
Cc: ak, netdev, opurdila, shemminger, netdev, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 12 Nov 2008 09:01:38 +0100
> Anyway, I'll try to keep this in mind, but would prefer to not reformat
> the patches unless I have to touch them for other reasons.
That distracts the eyes of the people reviewing the code, because
such people spend most of their time reading code that conforms
to the proper kernel coding style.
Therefore, please fix up these issues rather than defer them.
What does it take, like 5 minutes of your time? About the same
amount of time it took you to say you would defer it? Come on...
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-12 10:08 ` David Miller
@ 2008-11-12 16:14 ` Patrick Ohly
2008-11-12 16:28 ` Eric Dumazet
0 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-11-12 16:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Wed, 2008-11-12 at 10:08 +0000, David Miller wrote:
> From: Patrick Ohly <patrick.ohly@intel.com>
> Date: Wed, 12 Nov 2008 09:01:38 +0100
>
> > Anyway, I'll try to keep this in mind, but would prefer to not reformat
> > the patches unless I have to touch them for other reasons.
>
> That distracts the eyes of the people reviewing the code, because
> such people spend most of their time reading code that conforms
> to the proper kernel coding style.
You are right, of course. I have changed this and also addressed the
other comments. I'll give it a few more days in case there are
further comments, then resubmit with linux-kernel on CC.
Should I rebase against net-2.6 or net-next-2.6?
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-12 16:14 ` Patrick Ohly
@ 2008-11-12 16:28 ` Eric Dumazet
0 siblings, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2008-11-12 16:28 UTC (permalink / raw)
To: Patrick Ohly; +Cc: David Miller, netdev
Patrick Ohly wrote:
> On Wed, 2008-11-12 at 10:08 +0000, David Miller wrote:
>> From: Patrick Ohly <patrick.ohly@intel.com>
>> Date: Wed, 12 Nov 2008 09:01:38 +0100
>>
>>> Anyway, I'll try to keep this in mind, but would prefer to not reformat
>>> the patches unless I have to touch them for other reasons.
>> That distracts the eyes of the people reviewing the code, because
>> such people spend most of their time reading code that conforms
>> to the proper kernel coding style.
>
> You are right of course. I have changed this and also addressed the
> other comments. I'll give it a few more days in case that there are
> further comments, then resubmit with linux-kernel on CC.
>
> Should I rebase against net-2.6 or net-next-2.6?
>
net-next-2.6 is the tree you want to use for new network developments;
net-2.6 is for bug fixes only.
* Re: [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time
2008-11-05 9:58 ` [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time Patrick Ohly
2008-11-11 16:18 ` Andi Kleen
@ 2008-11-12 10:05 ` David Miller
1 sibling, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 10:05 UTC (permalink / raw)
To: patrick.ohly
Cc: netdev, opurdila, shemminger, netdev, ak, john.ronciak, dada1,
oliver
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Wed, 5 Nov 2008 10:58:39 +0100
> Currently only mapping from clock source to system time is implemented.
> The interface could have been made more versatile by not depending on a clock source,
> but this wasn't done to avoid writing glue code elsewhere.
>
> The method implemented here is the one used and analyzed under the name
> "assisted PTP" in the LCI PTP paper:
> http://www.linuxclustersinstitute.org/conferences/archive/2008/PDF/Ohly_92221.pdf
>
> Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Like patch 9, this will need a full review on linux-kernel
* [RFC PATCH 12/13] igb: use clocksync to implement hardware time stamping
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (10 preceding siblings ...)
2008-11-05 9:58 ` [RFC PATCH 11/13] time sync: generic infrastructure to map between time stamps generated by a clock source and system time Patrick Ohly
@ 2008-11-06 11:13 ` Patrick Ohly
2008-11-07 9:26 ` [RFC PATCH 13/13] skbuff: optionally store hardware time stamps in new field Patrick Ohly
2008-11-12 16:06 ` [RFC PATCH 00/13] hardware time stamping + igb example implementation Andi Kleen
13 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-06 11:13 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
Currently only TX hardware time stamping is implemented. Due to
hardware limitations it is not possible to verify reliably which
packet was time stamped when multiple were pending for sending; this
will be solved by only allowing one packet marked for hardware time
stamping into the queue (not implemented yet).
RX time stamping relies on the flag in the packet descriptor which
marks packets that were time stamped. In "all packet" mode this flag
is not set. TODO: also support that mode (even though it'll suffer
from race conditions).
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/e1000_82575.h | 1 +
drivers/net/igb/e1000_defines.h | 1 +
drivers/net/igb/e1000_regs.h | 40 +++++++
drivers/net/igb/igb.h | 2 +
drivers/net/igb/igb_main.c | 239 +++++++++++++++++++++++++++++++++++++--
5 files changed, 275 insertions(+), 8 deletions(-)
diff --git a/drivers/net/igb/e1000_82575.h b/drivers/net/igb/e1000_82575.h
index c1928b5..dd32a6f 100644
--- a/drivers/net/igb/e1000_82575.h
+++ b/drivers/net/igb/e1000_82575.h
@@ -116,6 +116,7 @@ union e1000_adv_tx_desc {
};
/* Adv Transmit Descriptor Config Masks */
+#define E1000_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */
#define E1000_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */
#define E1000_ADVTXD_DTYP_DATA 0x00300000 /* Advanced Data Descriptor */
#define E1000_ADVTXD_DCMD_IFCS 0x02000000 /* Insert FCS (Ethernet CRC) */
diff --git a/drivers/net/igb/e1000_defines.h b/drivers/net/igb/e1000_defines.h
index ce70068..2a19698 100644
--- a/drivers/net/igb/e1000_defines.h
+++ b/drivers/net/igb/e1000_defines.h
@@ -104,6 +104,7 @@
#define E1000_RXD_STAT_UDPCS 0x10 /* UDP xsum calculated */
#define E1000_RXD_STAT_TCPCS 0x20 /* TCP xsum calculated */
#define E1000_RXD_STAT_DYNINT 0x800 /* Pkt caused INT via DYNINT */
+#define E1000_RXD_STAT_TS 0x10000 /* Pkt was time stamped */
#define E1000_RXD_ERR_CE 0x01 /* CRC Error */
#define E1000_RXD_ERR_SE 0x02 /* Symbol Error */
#define E1000_RXD_ERR_SEQ 0x04 /* Sequence Error */
diff --git a/drivers/net/igb/e1000_regs.h b/drivers/net/igb/e1000_regs.h
index 37f9d55..7b561a1 100644
--- a/drivers/net/igb/e1000_regs.h
+++ b/drivers/net/igb/e1000_regs.h
@@ -78,9 +78,37 @@
/* IEEE 1588 TIMESYNCH */
#define E1000_TSYNCTXCTL 0x0B614
+#define E1000_TSYNCTXCTL_VALID (1<<0)
+#define E1000_TSYNCTXCTL_ENABLED (1<<4)
#define E1000_TSYNCRXCTL 0x0B620
+#define E1000_TSYNCRXCTL_VALID (1<<0)
+#define E1000_TSYNCRXCTL_ENABLED (1<<4)
+enum {
+ E1000_TSYNCRXCTL_TYPE_L2_V2 = 0,
+ E1000_TSYNCRXCTL_TYPE_L4_V1 = (1<<1),
+ E1000_TSYNCRXCTL_TYPE_L2_L4_V2 = (1<<2),
+ E1000_TSYNCRXCTL_TYPE_ALL = (1<<3),
+ E1000_TSYNCRXCTL_TYPE_EVENT_V2 = (1<<3) | (1<<1),
+};
#define E1000_TSYNCRXCFG 0x05F50
+enum {
+ E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE = 0<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE = 1<<0,
+ E1000_TSYNCRXCFG_PTP_V1_FOLLOWUP_MESSAGE = 2<<0,
+ E1000_TSYNCRXCFG_PTP_V1_DELAY_RESP_MESSAGE = 3<<0,
+ E1000_TSYNCRXCFG_PTP_V1_MANAGEMENT_MESSAGE = 4<<0,
+ E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE = 0<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE = 1<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_REQ_MESSAGE = 2<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_RESP_MESSAGE = 3<<8,
+ E1000_TSYNCRXCFG_PTP_V2_FOLLOWUP_MESSAGE = 8<<8,
+ E1000_TSYNCRXCFG_PTP_V2_DELAY_RESP_MESSAGE = 9<<8,
+ E1000_TSYNCRXCFG_PTP_V2_PATH_DELAY_FOLLOWUP_MESSAGE = 0xA<<8,
+ E1000_TSYNCRXCFG_PTP_V2_ANNOUNCE_MESSAGE = 0xB<<8,
+ E1000_TSYNCRXCFG_PTP_V2_SIGNALLING_MESSAGE = 0xC<<8,
+ E1000_TSYNCRXCFG_PTP_V2_MANAGEMENT_MESSAGE = 0xD<<8,
+};
#define E1000_SYSTIML 0x0B600
#define E1000_SYSTIMH 0x0B604
#define E1000_TIMINCA 0x0B608
@@ -103,6 +131,18 @@
#define E1000_ETQF6 0x05CC8
#define E1000_ETQF7 0x05CCC
+/* Filtering Registers */
+#define E1000_SAQF(_n) (0x5980 + 4 * (_n))
+#define E1000_DAQF(_n) (0x59A0 + 4 * (_n))
+#define E1000_SPQF(_n) (0x59C0 + 4 * (_n))
+#define E1000_FTQF(_n) (0x59E0 + 4 * (_n))
+#define E1000_SAQF0 E1000_SAQF(0)
+#define E1000_DAQF0 E1000_DAQF(0)
+#define E1000_SPQF0 E1000_SPQF(0)
+#define E1000_FTQF0 E1000_FTQF(0)
+#define E1000_SYNQF(_n) (0x055FC + (4 * (_n))) /* SYN Packet Queue Fltr */
+#define E1000_ETQF(_n) (0x05CB0 + (4 * (_n))) /* EType Queue Fltr */
+
/* Split and Replication RX Control - RW */
/*
* Convenience macros
diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index 2938ab3..86ef1a2 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -35,6 +35,7 @@
#include "e1000_82575.h"
#include <linux/clocksource.h>
+#include <linux/clocksync.h>
struct igb_adapter;
@@ -265,6 +266,7 @@ struct igb_adapter {
struct pci_dev *pdev;
struct net_device_stats net_stats;
struct clocksource clock;
+ struct clocksync sync;
/* structs defined in e1000_hw.h */
struct e1000_hw hw;
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 3a4772e..b320fec 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -227,6 +227,13 @@ static cycle_t igb_read_clock(struct clocksource *cs)
return stamp;
}
+static union ktime igb_hwtstamp_raw2sys(struct net_device *netdev,
+ union ktime stamp)
+{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ return clocksync_hw2sys(&adapter->sync, ktime_to_ns(stamp));
+}
+
#ifdef DEBUG
/**
* igb_get_hw_dev_name - return device name string
@@ -244,6 +251,7 @@ char *igb_get_hw_dev_name(struct e1000_hw *hw)
static char *igb_get_time_str(struct igb_adapter *adapter,
char buffer[160])
{
+ cycle_t hw = clocksource_read(&adapter->clock);
struct timespec nic = ns_to_timespec(clocksource_read_time(&adapter->clock));
struct timespec sys;
struct timespec delta;
@@ -251,7 +259,8 @@ static char *igb_get_time_str(struct igb_adapter *adapter,
delta = timespec_sub(nic, sys);
- sprintf(buffer, "NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ sprintf(buffer, "HW %llu, NIC %ld.%09lus, SYS %ld.%09lus, NIC-SYS %lds + %09luns",
+ (unsigned long long)hw,
(long)nic.tv_sec, nic.tv_nsec,
(long)sys.tv_sec, sys.tv_nsec,
(long)delta.tv_sec, delta.tv_nsec);
@@ -1345,6 +1354,19 @@ static int __devinit igb_probe(struct pci_dev *pdev,
wrfl();
clocksource_init_time(&adapter->clock, ktime_to_ns(ktime_get_real()));
+ /*
+ * Synchronize our NIC clock against system wall clock. NIC
+ * time stamp reading requires ~3us per sample, each sample
+ * was pretty stable even under load => only require 10
+ * samples for each offset comparison.
+ */
+ memset(&adapter->sync, 0, sizeof(adapter->sync));
+ adapter->sync.clock = &adapter->clock;
+ adapter->sync.systime = ktime_get_real;
+ adapter->sync.num_samples = 10;
+ clocksync_update(&adapter->sync, 0);
+ netdev->hwtstamp_raw2sys = igb_hwtstamp_raw2sys;
+
#ifdef DEBUG
{
char buffer[160];
@@ -2716,6 +2738,7 @@ set_itr_now:
#define IGB_TX_FLAGS_VLAN 0x00000002
#define IGB_TX_FLAGS_TSO 0x00000004
#define IGB_TX_FLAGS_IPV4 0x00000008
+#define IGB_TX_FLAGS_TSTAMP 0x00000010
#define IGB_TX_FLAGS_VLAN_MASK 0xffff0000
#define IGB_TX_FLAGS_VLAN_SHIFT 16
@@ -2936,6 +2959,9 @@ static inline void igb_tx_queue_adv(struct igb_adapter *adapter,
if (tx_flags & IGB_TX_FLAGS_VLAN)
cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
+ if (tx_flags & IGB_TX_FLAGS_TSTAMP)
+ cmd_type_len |= E1000_ADVTXD_MAC_TSTAMP;
+
if (tx_flags & IGB_TX_FLAGS_TSO) {
cmd_type_len |= E1000_ADVTXD_DCMD_TSE;
@@ -3048,7 +3074,27 @@ static int igb_xmit_frame_ring_adv(struct sk_buff *skb,
/* this is a hard error */
return NETDEV_TX_BUSY;
}
- skb_orphan(skb);
+
+ /*
+ * TODO: check that there currently is no other packet with
+ * time stamping in the queue
+ *
+ * when doing time stamping, keep the connection to the socket
+ * a while longer, it is still needed by skb_hwtstamp_tx(), either
+ * in igb_clean_tx_irq() or
+ */
+ if (skb_hwtstamp_check_tx_hardware(skb)) {
+ skb_hwtstamp_tx_in_progress(skb);
+ tx_flags |= IGB_TX_FLAGS_TSTAMP;
+ } else if (!skb_hwtstamp_check_tx_software(skb)) {
+ /*
+ * TODO: can this be solved in dev.c:dev_hard_start_xmit()?
+ * There are probably unmodified drivers which do something
+ * like this and thus don't work in combination with
+ * SOF_TIMESTAMPING_TX_SOFTWARE.
+ */
+ skb_orphan(skb);
+ }
if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
tx_flags |= IGB_TX_FLAGS_VLAN;
@@ -3746,6 +3792,28 @@ static bool igb_clean_tx_irq(struct igb_ring *tx_ring)
skb->len;
total_packets += segs;
total_bytes += bytecount;
+
+ /*
+ * if we were asked to do hardware
+ * stamping and such a time stamp is
+ * available, then it must have been
+ * for this one here because we
+ * allow only one such packet into the
+ * queue
+ */
+ if (skb_hwtstamp_check_tx_hardware(skb)) {
+ u32 valid = rd32(E1000_TSYNCTXCTL) & E1000_TSYNCTXCTL_VALID;
+ if (valid) {
+ u64 tstamp = rd32(E1000_TXSTMPL);
+ tstamp |= (u64)rd32(E1000_TXSTMPH) << 32;
+ clocksync_update(&adapter->sync, tstamp);
+ skb_hwtstamp_tx(skb,
+ ns_to_ktime(clocksource_cyc2time(&adapter->clock,
+ tstamp)),
+ netdev);
+ }
+ skb_orphan(skb);
+ }
}
igb_unmap_and_free_tx_resource(adapter, buffer_info);
@@ -3929,6 +3997,7 @@ static bool igb_clean_rx_irq_adv(struct igb_ring *rx_ring,
{
struct igb_adapter *adapter = rx_ring->adapter;
struct net_device *netdev = adapter->netdev;
+ struct e1000_hw *hw = &adapter->hw;
struct pci_dev *pdev = adapter->pdev;
union e1000_adv_rx_desc *rx_desc , *next_rxd;
struct igb_buffer *buffer_info , *next_buffer;
@@ -4018,6 +4087,38 @@ send_up:
goto next_desc;
}
+ /*
+ * If this bit is set, then the RX registers contain
+ * the time stamp. No other packet will be time
+ * stamped until we read these registers, so read the
+ * registers to make them available again. Because
+ * only one packet can be time stamped at a time, we
+ * know that the register values must belong to this
+ * one here and therefore we don't need to compare
+ * any of the additional attributes stored for it.
+ *
+ * TODO: can time stamping be triggered (thus locking
+ * the registers) without the packet reaching this point
+ * here? In that case RX time stamping would get stuck.
+ *
+ * TODO: in "time stamp all packets" mode this bit is
+ * not set. Need a global flag for this mode and then
+ * always read the registers. Cannot be done without
+ * a race condition.
+ */
+ if (staterr & E1000_RXD_STAT_TS) {
+ u64 tstamp;
+
+ WARN(!(rd32(E1000_TSYNCRXCTL) & E1000_TSYNCRXCTL_VALID),
+ "igb: no RX time stamp available for time stamped packet");
+ tstamp = rd32(E1000_RXSTMPL);
+ tstamp |= (u64)rd32(E1000_RXSTMPH) << 32;
+ clocksync_update(&adapter->sync, tstamp);
+ skb_hwtstamp_set(skb,
+ ns_to_ktime(clocksource_cyc2time(&adapter->clock,
+ tstamp)));
+ }
+
if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
dev_kfree_skb_irq(skb);
goto next_desc;
@@ -4214,12 +4315,32 @@ static int igb_mii_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
* @ifreq:
* @cmd:
*
- * Currently cannot enable any kind of hardware time stamping, but
- * supports SIOCSHWTSTAMP in general.
+ * Outgoing time stamping can be enabled and disabled. Play nice and
+ * disable it when requested, although it shouldn't cause any overhead
+ * when no packet needs it. At most one packet in the queue may be
+ * marked for time stamping, otherwise it would be impossible to tell
+ * for sure to which packet the hardware time stamp belongs.
+ *
+ * Incoming time stamping has to be configured via the hardware
+ * filters. Not all combinations are supported, in particular event
+ * type has to be specified. Matching the kind of event packet is
+ * not supported, with the exception of "all V2 events regardless of
+ * level 2 or 4".
+ *
**/
static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
{
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ struct e1000_hw *hw = &adapter->hw;
struct hwtstamp_config config;
+ u32 tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ u32 tsync_rx_ctl_bit = E1000_TSYNCRXCTL_ENABLED;
+ u32 tsync_rx_ctl_type = 0;
+ u32 tsync_rx_cfg = 0;
+ int is_l4 = 0;
+ int is_l2 = 0;
+ short port = 319; /* PTP */
+ u32 regval;
printk("igb_hwtstamp_ioctl\n");
@@ -4230,11 +4351,113 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr, int
if (config.flags)
return -EINVAL;
- if (config.tx_type == HWTSTAMP_TX_OFF &&
- config.rx_filter_type == HWTSTAMP_FILTER_NONE)
- return 0;
+ switch (config.tx_type) {
+ case HWTSTAMP_TX_OFF:
+ tsync_tx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_TX_ON:
+ tsync_tx_ctl_bit = E1000_TSYNCTXCTL_ENABLED;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ switch (config.rx_filter_type) {
+ case HWTSTAMP_FILTER_NONE:
+ tsync_rx_ctl_bit = 0;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+ case HWTSTAMP_FILTER_ALL:
+ /*
+ * register TSYNCRXCFG must be set, therefore it is not
+ * possible to time stamp both Sync and Delay_Req messages
+ * => fall back to time stamping all packets
+ */
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_ALL;
+ config.rx_filter_type = HWTSTAMP_FILTER_ALL;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_SYNC_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L4_V1;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V1_DELAY_REQ_MESSAGE;
+ is_l4 = 1;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_SYNC_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter_type = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+ case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_L2_L4_V2;
+ tsync_rx_cfg = E1000_TSYNCRXCFG_PTP_V2_DELAY_REQ_MESSAGE;
+ is_l2 = 1;
+ is_l4 = 1;
+ config.rx_filter_type = HWTSTAMP_FILTER_SOME;
+ break;
+ case HWTSTAMP_FILTER_PTP_V2_EVENT:
+ case HWTSTAMP_FILTER_PTP_V2_SYNC:
+ case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+ tsync_rx_ctl_type = E1000_TSYNCRXCTL_TYPE_EVENT_V2;
+ config.rx_filter_type = HWTSTAMP_FILTER_PTP_V2_EVENT;
+ is_l2 = 1;
+ break;
+ default:
+ return -ERANGE;
+ }
+
+ /* enable/disable TX */
+ regval = rd32(E1000_TSYNCTXCTL);
+ regval = (regval & ~E1000_TSYNCTXCTL_ENABLED) | tsync_tx_ctl_bit;
+ wr32(E1000_TSYNCTXCTL, regval);
+
+ /* enable/disable RX, define which PTP packets are time stamped */
+ regval = rd32(E1000_TSYNCRXCTL);
+ regval = (regval & ~E1000_TSYNCRXCTL_ENABLED) | tsync_rx_ctl_bit;
+ regval = (regval & ~0xE) | tsync_rx_ctl_type;
+ wr32(E1000_TSYNCRXCTL, regval);
+ wr32(E1000_TSYNCRXCFG, tsync_rx_cfg);
+
+ /*
+ * Ethertype Filter Queue Filter[0][15:0] = 0x88F7 (Ethertype to filter on)
+ * Ethertype Filter Queue Filter[0][26] = 0x1 (Enable filter)
+ * Ethertype Filter Queue Filter[0][30] = 0x1 (Enable Timestamping)
+ */
+ wr32(E1000_ETQF0, is_l2 ? 0x440088f7 : 0);
+
+ /* L4 Queue Filter[0]: only filter by source and destination port */
+ wr32(E1000_SPQF0, htons(port));
+ wr32(E1000_IMIREXT(0), is_l4 ?
+ ((1<<12) | (1<<19) /* bypass size and control flags */) : 0);
+ wr32(E1000_IMIR(0), is_l4 ?
+ (htons(port)
+ | (0<<16) /* immediate interrupt disabled */
+ | 0 /* (1<<17) bit cleared: do not bypass destination port check */)
+ : 0);
+ wr32(E1000_FTQF0, is_l4 ?
+ (0x11 /* UDP */
+ | (1<<15) /* VF not compared */
+ | (1<<27) /* Enable Timestamping */
+ | (7<<28) /* only source port filter enabled, source/target address and protocol masked */ )
+ : ( (1<<15) | (15<<28) /* all mask bits set = filter not enabled */));
+
+ wrfl();
+
+ /* clear TX/RX time stamp registers, just to be sure */
+ regval = rd32(E1000_TXSTMPH);
+ regval = rd32(E1000_RXSTMPH);
- return -ERANGE;
+ return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
+ -EFAULT : 0;
}
/**
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* [RFC PATCH 13/13] skbuff: optionally store hardware time stamps in new field
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (11 preceding siblings ...)
2008-11-06 11:13 ` [RFC PATCH 12/13] igb: use clocksync to implement hardware time stamping Patrick Ohly
@ 2008-11-07 9:26 ` Patrick Ohly
2008-11-12 16:06 ` [RFC PATCH 00/13] hardware time stamping + igb example implementation Andi Kleen
13 siblings, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-07 9:26 UTC (permalink / raw)
To: netdev
Cc: Octavian Purdila, Stephen Hemminger, Ingo Oeser, Andi Kleen,
John Ronciak, Eric Dumazet, Oliver Hartkopp
For performance reasons, adding a new field to struct sk_buff
was avoided. Hardware time stamps are stored in the existing field, but
in order to not break other code, they must have been transformed to
the system time base.
To obtain the original hardware time stamp before the transformation,
a network device driver must implement the inverse transformation.
The clocksync code has no support for that yet and it would be
difficult to implement 100% accurately (rounding errors, updated
offset/skew values).
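The rounding problem can be illustrated with a small user-space sketch. It assumes a hypothetical linear clock model (an offset plus a rational skew factor); the names demo_clock_model, demo_hw2sys and demo_sys2hw are invented for illustration and are not part of the clocksync code. The arithmetic is only safe for small demo values because the multiplication can overflow.

```c
#include <stdint.h>

/* Hypothetical linear clock model: sys = offset + hw * num / den. */
struct demo_clock_model {
	int64_t offset_ns;	/* system time at hardware time 0 */
	int64_t skew_num;	/* skew as a fraction num/den */
	int64_t skew_den;
};

/* Forward transformation: hardware time to system time (truncating). */
static int64_t demo_hw2sys(const struct demo_clock_model *m, int64_t hw_ns)
{
	return m->offset_ns + hw_ns * m->skew_num / m->skew_den;
}

/* Attempted inverse: system time back to hardware time (truncating). */
static int64_t demo_sys2hw(const struct demo_clock_model *m, int64_t sys_ns)
{
	return (sys_ns - m->offset_ns) * m->skew_den / m->skew_num;
}
```

With an offset of 100 ns and a skew of 1000001/1000000, a hardware stamp of 999999 maps to system time 1000099, but the inverse yields 999998. The round trip loses one nanosecond even though the model did not change in between; an updated offset or skew would make the error larger.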
Instead of implementing this inverse transformation, this patch
adds another field for hardware time stamps. It is off by default
and mainstream Linux distributions should leave it off (PTP time
synchronization doesn't need it), but special distributions/users
could enable it if needed without having to patch the mainline
kernel source.
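The difference between the two storage schemes can be sketched in user space as follows. This is a simplified illustration, not the kernel code: struct demo_skb and all demo_* helpers are hypothetical stand-ins, time values are plain 64-bit nanosecond counts, and the low bit is cleared with a mask where the patch uses "/ 2 * 2".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the time stamp part of struct sk_buff. */
struct demo_skb {
	int64_t tstamp;		/* time stamp in system time (ns) */
	int64_t hwtstamp;	/* raw hardware stamp, 0 = none (scheme 2 only) */
};

/* Scheme 1: the lowest bit of tstamp marks a hardware time stamp. */
static void demo_set_sw_lowbit(struct demo_skb *skb, int64_t sys_ns)
{
	skb->tstamp = sys_ns & ~(int64_t)1;	/* software stamps never set the bit */
}

static void demo_set_hw_lowbit(struct demo_skb *skb, int64_t sys_ns)
{
	skb->tstamp = sys_ns | 1;		/* costs 1 ns of resolution */
}

static bool demo_hw_available_lowbit(const struct demo_skb *skb)
{
	return (skb->tstamp & 1) != 0;
}

static int64_t demo_get_lowbit(const struct demo_skb *skb)
{
	return skb->tstamp & ~(int64_t)1;	/* strip the marker bit */
}

/* Scheme 2: a separate field holds the raw stamp, tstamp stays untouched. */
static void demo_set_sw_field(struct demo_skb *skb, int64_t sys_ns)
{
	skb->tstamp = sys_ns;
	skb->hwtstamp = 0;
}

static void demo_set_hw_field(struct demo_skb *skb, int64_t sys_ns,
			      int64_t raw_ns)
{
	skb->tstamp = sys_ns;
	skb->hwtstamp = raw_ns;
}

static bool demo_hw_available_field(const struct demo_skb *skb)
{
	return skb->hwtstamp != 0;
}
```

In the second scheme the raw value is available without any inverse transformation, at the price of 64 more bits per packet.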
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
---
drivers/net/igb/igb_main.c | 3 +-
include/linux/skbuff.h | 48 ++++++++++++++++++++++++++++++++++---------
net/Kconfig | 16 ++++++++++++++
net/core/skbuff.c | 18 +++++++++++++--
4 files changed, 71 insertions(+), 14 deletions(-)
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index b320fec..2fed508 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4116,7 +4116,8 @@ send_up:
clocksync_update(&adapter->sync, tstamp);
skb_hwtstamp_set(skb,
ns_to_ktime(clocksource_cyc2time(&adapter->clock,
- tstamp)));
+ tstamp)),
+ ns_to_ktime(tstamp));
}
if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bcca8fc..123711d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -203,6 +203,7 @@ typedef unsigned char *sk_buff_data_t;
* thus is recorded in system time. If the lowest bit is set,
* then the value was originally generated by a different clock
* in the receiving hardware and then transformed to system time.
+ * @hwtstamp: raw, unmodified hardware time stamp (optional)
* @dev: Device we arrived on/are leaving by
* @transport_header: Transport layer header
* @network_header: Network layer header
@@ -260,6 +261,9 @@ struct sk_buff {
struct sock *sk;
ktime_t tstamp;
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ union ktime hwtstamp;
+#endif
struct net_device *dev;
union {
@@ -1530,11 +1534,15 @@ extern void skb_init(void);
/** returns skb->tstamp without the bit which marks hardware time stamps */
static inline union ktime skb_get_ktime(const struct sk_buff *skb)
{
-#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ return skb->tstamp;
+#else
+# if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
return ktime_set(skb->tstamp.tv.sec,
skb->tstamp.tv.nsec & ~1);
-#else
+# else
return (ktime_t) { .tv64 = skb->tstamp.tv64 & ~1UL };
+# endif
#endif
}
@@ -1564,15 +1572,19 @@ static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ skb->hwtstamp.tv64 = 0;
+#else
/*
* make sure that lowest bit is never set: it marks hardware
* time stamps
*/
-#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+# if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
-#else
+# else
skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
-#endif
+# endif
+#endif /* CONFIG_NET_SKBUFF_HWTSTAMPS */
}
static inline ktime_t net_timedelta(ktime_t t)
@@ -1591,18 +1603,34 @@ static inline ktime_t net_invalid_timestamp(void)
*/
static inline int skb_hwtstamp_available(const struct sk_buff *skb)
{
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ return skb->hwtstamp.tv64 != 0;
+#else
return skb->tstamp.tv64 & 1;
+#endif
}
+/**
+ * skb_hwtstamp_set - stores a time stamp generated by hardware in the skb
+ * @skb: time stamp is stored here
+ * @stamp: hardware time stamp transformed to system time
+ * @hwtstamp: original, untransformed hardware time stamp
+ */
static inline void skb_hwtstamp_set(struct sk_buff *skb,
- union ktime stamp)
-{
-#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ union ktime stamp,
+ union ktime hwtstamp)
+{
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ skb->tstamp = stamp;
+ skb->hwtstamp = hwtstamp;
+#else /* CONFIG_NET_SKBUFF_HWTSTAMPS */
+# if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
skb->tstamp.tv.sec = stamp.tv.sec;
skb->tstamp.tv.nsec = stamp.tv.nsec | 1;
-#else
+# else
skb->tstamp.tv64 = stamp.tv64 | 1;
-#endif
+# endif
+#endif /* CONFIG_NET_SKBUFF_HWTSTAMPS */
}
/**
diff --git a/net/Kconfig b/net/Kconfig
index 7612cc8..b37b891 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -32,6 +32,22 @@ config NET_NS
Allow user space to create what appear to be multiple instances
of the network stack.
+config NET_SKBUFF_HWTSTAMPS
+ bool "Additional hardware time stamp field in struct sk_buff"
+ default n
+ depends on EXPERIMENTAL
+ help
+ Increase the size of sk_buff by 64 bits to store a raw hardware
+ time stamp in addition to the system time stamp. This is only
+ necessary when a) there is a network device which supports
+ hardware time stamping and b) access to these raw, unmodified values
+ is required.
+
+ Usually it is sufficient to convert the raw time stamps into system
+ time and store that in the existing time stamp value. Increasing
+ the size of sk_buff can have a performance impact, so if in doubt
+ say N here.
+
source "net/packet/Kconfig"
source "net/unix/Kconfig"
source "net/xfrm/Kconfig"
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7d9f1dd..8b7960e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2326,6 +2326,12 @@ EXPORT_SYMBOL_GPL(skb_segment);
int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
{
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ if (skb_hwtstamp_available(skb)) {
+ *stamp = ktime_to_timespec(skb->hwtstamp);
+ return 1;
+ }
+#else
struct rtable *rt;
struct in_device *idev;
struct net_device *netdev;
@@ -2342,6 +2348,7 @@ int skb_hwtstamp_raw(const struct sk_buff *skb, struct timespec *stamp)
return 1;
}
}
+#endif
return 0;
}
@@ -2592,14 +2599,19 @@ void skb_hwtstamp_tx(struct sk_buff *orig_skb,
skb_hwtstamp_set(skb,
dev->hwtstamp_raw2sys ?
dev->hwtstamp_raw2sys(dev, stamp) :
+ stamp,
stamp);
} else {
skb->tstamp = stamp;
-#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
- skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
+#ifdef CONFIG_NET_SKBUFF_HWTSTAMPS
+ skb->hwtstamp.tv64 = 0;
#else
+# if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
+ skb->tstamp.tv.sec = skb->tstamp.tv.sec / 2 * 2;
+# else
skb->tstamp.tv64 = skb->tstamp.tv64 / 2 * 2;
-#endif
+# endif
+#endif /* CONFIG_NET_SKBUFF_HWTSTAMPS */
}
err = sock_queue_err_skb(sk, skb);
--
1.6.0.4
^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-11 14:44 [RFC PATCH 00/13] hardware time stamping + igb example implementation Patrick Ohly
` (12 preceding siblings ...)
2008-11-07 9:26 ` [RFC PATCH 13/13] skbuff: optionally store hardware time stamps in new field Patrick Ohly
@ 2008-11-12 16:06 ` Andi Kleen
2008-11-12 16:25 ` Patrick Ohly
13 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2008-11-12 16:06 UTC (permalink / raw)
To: Patrick Ohly
Cc: netdev, Octavian Purdila, Stephen Hemminger, Ingo Oeser,
John Ronciak, Eric Dumazet, Oliver Hartkopp
As a general comment on the patch series, I'm still a little sceptical
that the time stamp offset method is a good idea. Since it tries to approximate
several unsynchronized clocks, the result will always be of somewhat poor
quality, which will likely lead to problems sooner or later (or rather
require ugly workarounds in the user).
I think it would be better to just bite the bullet and add new fields
for this to the skbs. Hardware timestamps are useful enough to justify
this.
-Andi
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 16:06 ` [RFC PATCH 00/13] hardware time stamping + igb example implementation Andi Kleen
@ 2008-11-12 16:25 ` Patrick Ohly
2008-11-12 18:44 ` Oliver Hartkopp
0 siblings, 1 reply; 48+ messages in thread
From: Patrick Ohly @ 2008-11-12 16:25 UTC (permalink / raw)
To: Andi Kleen
Cc: netdev@vger.kernel.org, Octavian Purdila, Stephen Hemminger,
Ingo Oeser, Ronciak, John, Eric Dumazet, Oliver Hartkopp
On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
> As a general comment on the patch series I'm still a little sceptical
> the time stamp offset method is a good idea. Since it tries to approximate
> several unsynchronized clocks the result will always be of a little poor
> quality, which will likely lead to problems sooner or later (or rather
> require ugly workarounds in the user).
>
> I think it would be better to just bite the bullet and add new fields
> for this to the skbs. Hardware timestamps are useful enough to justify
> this.
I'm all for it, as long as it doesn't keep this feature out of the
mainline.
At least one additional ktime_t field would be needed for the raw
hardware time stamp. Transformation to system time (as needed by PTP)
would have to be delayed until the packet is delivered via a socket. The
code would be easier (and a bit more accurate) if also another ktime_t
was added to store the transformed value directly after generating it.
An extra field would also solve one of the open problems (tstamp set to
time stamp when dev_hard_start_xmit is called for IP_MULTICAST_LOOP).
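The "delay the transformation until delivery" variant could look roughly like the following user-space sketch. It is only an illustration under assumptions: struct demo_dev, struct demo_pkt and the demo_* functions are hypothetical, the raw2sys member merely mimics the role of a per-device conversion callback, and falling back to the raw value when no callback exists is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_dev {
	/* hypothetical per-device conversion from raw hardware time to system time */
	int64_t (*raw2sys)(const struct demo_dev *dev, int64_t raw_ns);
	int64_t offset_ns;	/* state used by the example callback */
};

struct demo_pkt {
	const struct demo_dev *dev;
	int64_t hw_raw_ns;	/* raw hardware stamp, 0 = none */
	int64_t sw_ns;		/* software stamp in system time */
};

/* Example callback: a pure offset correction. */
static int64_t demo_offset_raw2sys(const struct demo_dev *dev, int64_t raw_ns)
{
	return raw_ns + dev->offset_ns;
}

/*
 * Called when the packet is handed to the socket: only now is the raw
 * stamp converted, so packets nobody asks about never pay for it.
 */
static int64_t demo_deliver_stamp(const struct demo_pkt *pkt)
{
	if (pkt->hw_raw_ns == 0)
		return pkt->sw_ns;
	if (pkt->dev && pkt->dev->raw2sys)
		return pkt->dev->raw2sys(pkt->dev, pkt->hw_raw_ns);
	return pkt->hw_raw_ns;	/* no conversion available: pass raw value on */
}
```

A drawback of converting this late is that the offset in use at delivery time may differ from the one that was valid when the packet was stamped, which is why a second field holding the already transformed value would be a bit more accurate.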
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 16:25 ` Patrick Ohly
@ 2008-11-12 18:44 ` Oliver Hartkopp
2008-11-12 19:22 ` Eric Dumazet
2008-11-19 12:39 ` Patrick Ohly
0 siblings, 2 replies; 48+ messages in thread
From: Oliver Hartkopp @ 2008-11-12 18:44 UTC (permalink / raw)
To: Patrick Ohly
Cc: Andi Kleen, netdev@vger.kernel.org, Octavian Purdila,
Stephen Hemminger, Ingo Oeser, Ronciak, John, Eric Dumazet
Patrick Ohly wrote:
> On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
>
>> As a general comment on the patch series I'm still a little sceptical
>> the time stamp offset method is a good idea. Since it tries to approximate
>> several unsynchronized clocks the result will always be of a little poor
>> quality, which will likely lead to problems sooner or later (or rather
>> require ugly workarounds in the user).
>>
>> I think it would be better to just bite the bullet and add new fields
>> for this to the skbs. Hardware timestamps are useful enough to justify
>> this.
>>
>
> I'm all for it, as long as it doesn't keep this feature out of the
> mainline.
>
> At least one additional ktime_t field would be needed for the raw
> hardware time stamp. Transformation to system time (as needed by PTP)
> would have to be delayed until the packet is delivered via a socket. The
> code would be easier (and a bit more accurate) if also another ktime_t
> was added to store the transformed value directly after generating it.
>
> An extra field would also solve one of the open problems (tstamp set to
> time stamp when dev_start_xmit_hard is called for IP_MULTICAST_LOOP).
>
>
I really wondered if you posted the series to get an impression why
adding a new field is a good idea ;-)
Ok, I'm not that experienced on timestamps but I really got confused
reading the patches and their documentation (even together with the
discussion on the ML). I would also vote for having a new field in the
skb instead of this current 'bit-compression' approach which smells
quite expensive at runtime and in code size. Not talking about the
mentioned potential locking issues ...
Regards,
Oliver
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 18:44 ` Oliver Hartkopp
@ 2008-11-12 19:22 ` Eric Dumazet
2008-11-12 20:23 ` Andi Kleen
2008-11-12 20:23 ` Andi Kleen
2008-11-19 12:39 ` Patrick Ohly
1 sibling, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2008-11-12 19:22 UTC (permalink / raw)
To: Oliver Hartkopp
Cc: Patrick Ohly, Andi Kleen, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Oliver Hartkopp a écrit :
> Patrick Ohly wrote:
>> On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
>>
>>> As a general comment on the patch series I'm still a little sceptical
>>> the time stamp offset method is a good idea. Since it tries to
>>> approximate
>>> several unsynchronized clocks the result will always be of a little poor
>>> quality, which will likely lead to problems sooner or later (or rather
>>> require ugly workarounds in the user).
>>>
>>> I think it would be better to just bite the bullet and add new fields
>>> for this to the skbs. Hardware timestamps are useful enough to justify
>>> this.
>>>
>>
>> I'm all for it, as long as it doesn't keep this feature out of the
>> mainline.
>>
>> At least one additional ktime_t field would be needed for the raw
>> hardware time stamp. Transformation to system time (as needed by PTP)
>> would have to be delayed until the packet is delivered via a socket. The
>> code would be easier (and a bit more accurate) if also another ktime_t
>> was added to store the transformed value directly after generating it.
>>
>> An extra field would also solve one of the open problems (tstamp set to
>> time stamp when dev_start_xmit_hard is called for IP_MULTICAST_LOOP).
>>
>>
>
> I really wondered if you posted the series to get an impression why
> adding a new field is a good idea ;-)
> Ok, i'm not that experienced on timestamps but i really got confused
> reading the patches and it's documentation (even together with the
> discussion on the ML). I would also vote for having a new field in the
> skb instead of this current 'bit-compression' approach which smells
> quite expensive at runtime and in code size. Not talking about the
> mentioned potential locking issues ...
New fields in skb are probably the easy way to handle the problem, we
all know that.
But adding fields to such a heavy-duty structure for less than 0.001 % of
handled frames is disgusting.
Crazy idea here:
Say your NIC is capable of generating hw timestamps at TX or RX time.
Instead of storing them in the skb, store them in a local structure (of the driver).
The local structure could contain an array of 4096 (or whatever size) pairs of
{pointer to skb, hardware timestamp in whatever format is needed by this NIC}.
If an application needs the skb hw timestamp, it gets it when reading the message, with an appropriate
API that calls a NIC driver method, giving the skb pointer as an argument. The NIC driver
searches its local table for a match of the skb pointer (taking the most recent match, of course)
and converts the hw timestamp into a "generic application format". No need for a fast search, just
a linear search in the table, so that feeding it is really easy (maybe lockless).
For the TX side, a flag on the skb could ask the NIC driver to feed transmitted skbs (or a copy of them)
to a raw socket (a kind of loopback for selected packets), once the TX hw timestamp is collected in the local table.
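The driver-local table idea could be sketched like this in plain user-space C. Everything here is an assumption made for illustration: the demo_* names are invented, the skb is represented by an opaque pointer, and no locking is attempted, so this says nothing about whether a lockless version is feasible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 4096	/* size taken from the mail, otherwise arbitrary */

struct demo_entry {
	const void *skb;	/* identity of the packet */
	uint64_t hwstamp;	/* raw stamp in the NIC's own format */
};

struct demo_table {
	struct demo_entry slot[DEMO_TABLE_SIZE];
	uint64_t head;		/* number of entries ever recorded */
};

/* Feeding the table: overwrite the oldest slot, no search needed. */
static void demo_table_record(struct demo_table *t, const void *skb,
			      uint64_t hwstamp)
{
	struct demo_entry *e = &t->slot[t->head % DEMO_TABLE_SIZE];

	e->skb = skb;
	e->hwstamp = hwstamp;
	t->head++;
}

/* Linear search from newest to oldest, so the most recent match wins. */
static bool demo_table_lookup(const struct demo_table *t, const void *skb,
			      uint64_t *hwstamp)
{
	uint64_t n = t->head < DEMO_TABLE_SIZE ? t->head : DEMO_TABLE_SIZE;
	uint64_t i;

	for (i = 1; i <= n; i++) {
		const struct demo_entry *e =
			&t->slot[(t->head - i) % DEMO_TABLE_SIZE];

		if (e->skb == skb) {
			*hwstamp = e->hwstamp;
			return true;
		}
	}
	return false;
}
```

Recording is cheap, but a lookup can touch up to 4096 entries, and an skb pointer that has been freed and reused would match a stale entry unless the driver invalidates entries in some way.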
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 19:22 ` Eric Dumazet
@ 2008-11-12 20:23 ` Andi Kleen
2008-11-12 20:23 ` Andi Kleen
1 sibling, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2008-11-12 20:23 UTC (permalink / raw)
To: Eric Dumazet
Cc: Oliver Hartkopp, Patrick Ohly, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Eric Dumazet wrote:
> Oliver Hartkopp a écrit :
>> Patrick Ohly wrote:
>>> On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
>>>
>>>> As a general comment on the patch series I'm still a little sceptical
>>>> the time stamp offset method is a good idea. Since it tries to
>>>> approximate
>>>> several unsynchronized clocks the result will always be of a little
>>>> poor
>>>> quality, which will likely lead to problems sooner or later (or rather
>>>> require ugly workarounds in the user).
>>>>
>>>> I think it would be better to just bite the bullet and add new fields
>>>> for this to the skbs. Hardware timestamps are useful enough to justify
>>>> this.
>>>>
>>>
>>> I'm all for it, as long as it doesn't keep this feature out of the
>>> mainline.
>>>
>>> At least one additional ktime_t field would be needed for the raw
>>> hardware time stamp. Transformation to system time (as needed by PTP)
>>> would have to be delayed until the packet is delivered via a socket. The
>>> code would be easier (and a bit more accurate) if also another ktime_t
>>> was added to store the transformed value directly after generating it.
>>>
>>> An extra field would also solve one of the open problems (tstamp set to
>>> time stamp when dev_start_xmit_hard is called for IP_MULTICAST_LOOP).
>>>
>>>
>>
>> I really wondered if you posted the series to get an impression why
>> adding a new field is a good idea ;-)
>> Ok, i'm not that experienced on timestamps but i really got confused
>> reading the patches and it's documentation (even together with the
>> discussion on the ML). I would also vote for having a new field in the
>> skb instead of this current 'bit-compression' approach which smells
>> quite expensive at runtime and in code size. Not talking about the
>> mentioned potential locking issues ...
>
> New fields in skb are probably the easy way to handle the problem, we
> all know that.
>
> But adding fields on such heavy duty structure for less than 0.001 % of
> handled frames is disgusting.
You have a strange definition of "disgusting".
But if that's true, then it applies to the existing timestamp in there too
(and to a couple of other fields as well).
Also I suspect that your percent numbers are wrong, depending on the workload.
Personally I think hardware time stamps should replace the existing
time stamp and I suspect more and more applications will move to that eventually.
> If an application needs skb hw timestamp, get it when reading message,
> with appropriate
> API, that calls NIC driver method, giving skb pointer as an argument.
> NIC driver
> search in its local table a match of skb pointer (giving the most recent
> match of course),
> and converts hwtimestamp into "generic application format". No need for
> a fast search, just
> a linear search in the table, so that feeding it is really easy (maybe
> lockless)
This will probably be a disaster on e.g. high speed network sniffing
(which is one of the primary use cases of the hardware time stamps).
As soon as there is any reordering in the queue (and that is inevitable
if you scale over multiple CPUs) your linear searches could get quite
long and bounce cache lines like mad. Also I doubt it can be really
done lockless.
Also to be honest such a complicated and likely badly performing scheme just to save 4-8 bytes
would match my own definition of "disgusting".
-Andi
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 19:22 ` Eric Dumazet
2008-11-12 20:23 ` Andi Kleen
@ 2008-11-12 20:23 ` Andi Kleen
2008-11-12 20:56 ` Eric Dumazet
2008-11-12 22:17 ` David Miller
1 sibling, 2 replies; 48+ messages in thread
From: Andi Kleen @ 2008-11-12 20:23 UTC (permalink / raw)
To: Eric Dumazet
Cc: Oliver Hartkopp, Patrick Ohly, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Eric Dumazet wrote:
> Oliver Hartkopp wrote:
>> Patrick Ohly wrote:
>>> On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
>>>
>>>> As a general comment on the patch series I'm still a little sceptical
>>>> the time stamp offset method is a good idea. Since it tries to
>>>> approximate
>>>> several unsynchronized clocks the result will always be of a little
>>>> poor
>>>> quality, which will likely lead to problems sooner or later (or rather
>>>> require ugly workarounds in the user).
>>>>
>>>> I think it would be better to just bite the bullet and add new fields
>>>> for this to the skbs. Hardware timestamps are useful enough to justify
>>>> this.
>>>>
>>>
>>> I'm all for it, as long as it doesn't keep this feature out of the
>>> mainline.
>>>
>>> At least one additional ktime_t field would be needed for the raw
>>> hardware time stamp. Transformation to system time (as needed by PTP)
>>> would have to be delayed until the packet is delivered via a socket. The
>>> code would be easier (and a bit more accurate) if also another ktime_t
>>> was added to store the transformed value directly after generating it.
>>>
>>> An extra field would also solve one of the open problems (tstamp set to
>>> time stamp when dev_hard_start_xmit is called for IP_MULTICAST_LOOP).
>>>
>>>
>>
>> I really wondered if you posted the series to get an impression why
>> adding a new field is a good idea ;-)
>> Ok, I'm not that experienced with timestamps but I really got confused
>> reading the patches and their documentation (even together with the
>> discussion on the ML). I would also vote for having a new field in the
>> skb instead of this current 'bit-compression' approach which smells
>> quite expensive at runtime and in code size. Not talking about the
>> mentioned potential locking issues ...
>
> New fields in skb are probably the easy way to handle the problem, we
> all know that.
>
> But adding fields on such heavy duty structure for less than 0.001 % of
> handled frames is disgusting.
You have a strange definition of "disgusting".
But if that's true, then it applies to the existing timestamp in there too
(and to a couple of other fields as well).
Also I suspect that your percent numbers are wrong, depending on the workload.
Personally I think hardware time stamps should replace the existing
time stamp and I suspect more and more applications will move to that eventually.
> If an application needs skb hw timestamp, get it when reading message,
> with appropriate
> API, that calls NIC driver method, giving skb pointer as an argument.
> NIC driver
> search in its local table a match of skb pointer (giving the most recent
> match of course),
> and converts hwtimestamp into "generic application format". No need for
> a fast search, just
> a linear search in the table, so that feeding it is really easy (maybe
> lockless)
This will probably be a disaster on e.g. high speed network sniffing
(which is one of the primary use cases of the hardware time stamps).
As soon as there is any reordering in the queue (and that is inevitable
if you scale over multiple CPUs) your linear searches could get quite
long and bounce cache lines like mad. I also doubt it can really be
done locklessly.
Also, to be honest, such a complicated and likely badly performing scheme
just to save 4-8 bytes would match my own definition of "disgusting".
-Andi
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 20:23 ` Andi Kleen
@ 2008-11-12 20:56 ` Eric Dumazet
2008-11-12 21:34 ` Andi Kleen
2008-11-12 22:17 ` David Miller
1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2008-11-12 20:56 UTC (permalink / raw)
To: Andi Kleen
Cc: Oliver Hartkopp, Patrick Ohly, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Andi Kleen wrote:
> Eric Dumazet wrote:
>> Oliver Hartkopp wrote:
>>> Patrick Ohly wrote:
>>>> On Wed, 2008-11-12 at 16:06 +0000, Andi Kleen wrote:
>>>>
>>>>> As a general comment on the patch series I'm still a little sceptical
>>>>> the time stamp offset method is a good idea. Since it tries to
>>>>> approximate
>>>>> several unsynchronized clocks the result will always be of a little
>>>>> poor
>>>>> quality, which will likely lead to problems sooner or later (or rather
>>>>> require ugly workarounds in the user).
>>>>>
>>>>> I think it would be better to just bite the bullet and add new fields
>>>>> for this to the skbs. Hardware timestamps are useful enough to justify
>>>>> this.
>>>>>
>>>>
>>>> I'm all for it, as long as it doesn't keep this feature out of the
>>>> mainline.
>>>>
>>>> At least one additional ktime_t field would be needed for the raw
>>>> hardware time stamp. Transformation to system time (as needed by PTP)
>>>> would have to be delayed until the packet is delivered via a socket.
>>>> The
>>>> code would be easier (and a bit more accurate) if also another ktime_t
>>>> was added to store the transformed value directly after generating it.
>>>>
>>>> An extra field would also solve one of the open problems (tstamp set to
>>>> time stamp when dev_hard_start_xmit is called for IP_MULTICAST_LOOP).
>>>>
>>>>
>>>
>>> I really wondered if you posted the series to get an impression why
>>> adding a new field is a good idea ;-)
>>> Ok, I'm not that experienced with timestamps but I really got confused
>>> reading the patches and their documentation (even together with the
>>> discussion on the ML). I would also vote for having a new field in
>>> the skb instead of this current 'bit-compression' approach which
>>> smells quite expensive at runtime and in code size. Not talking about
>>> the mentioned potential locking issues ...
>>
>> New fields in skb are probably the easy way to handle the problem, we
>> all know that.
>>
>> But adding fields on such heavy duty structure for less than 0.001 % of
>> handled frames is disgusting.
>
> You have a strange definition of "disgusting".
>
> But if that's true that applies to the existing timestamp in there then too
> (and a couple of other fields in there too)
>
> Also I suspect that your percent numbers are wrong, depending on the
> workload.
>
> Personally I think hardware time stamps should replace the existing
> time stamp and I suspect more and more applications will move to that
> eventually.
>
tstamp is the time stamp taken when the NIC driver got the frame, not the
time the NIC got the frame from the wire.
In about five years, maybe libpcap will be updated to use timespec instead of timeval.
>
>> If an application needs skb hw timestamp, get it when reading message,
>> with appropriate
>> API, that calls NIC driver method, giving skb pointer as an argument.
>> NIC driver
>> search in its local table a match of skb pointer (giving the most
>> recent match of course),
>> and converts hwtimestamp into "generic application format". No need
>> for a fast search, just
>> a linear search in the table, so that feeding it is really easy (maybe
>> lockless)
>
> This will probably be a disaster on e.g. high speed network sniffing
> (which is one of the primary use cases of the hardware time stamps).
> As soon as there is any reordering in the queue (and that is inevitable
> if you scale over multiple CPUs) your linear searches could get quite
> long and bounce cache lines like mad. Also I doubt it can be really
> done lockless.
>
> Also to be honest such a complicated and likely badly performing scheme
> just to save 4-8 bytes
> would match my own definition of "disgusting".
>
This scheme is only needed for special devices, used by PTP.
Only *selected* frames really need to gather hwtstamp.
TCP traffic won't use hwtstamp. Most UDP traffic won't use hwtstamp.
I threw out a "crazy idea" that can be changed if necessary, say with a cookie
that identifies the slot in the NIC driver structure. O(1) lookup if really needed.
Yes, I find the year 2008 not appropriate for enlarging the skb with a hwtstamp,
but then YMMV
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 20:56 ` Eric Dumazet
@ 2008-11-12 21:34 ` Andi Kleen
2008-11-12 22:26 ` Oliver Hartkopp
` (2 more replies)
0 siblings, 3 replies; 48+ messages in thread
From: Andi Kleen @ 2008-11-12 21:34 UTC (permalink / raw)
To: Eric Dumazet
Cc: Oliver Hartkopp, Patrick Ohly, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Eric Dumazet wrote:
> This scheme is only needed for special devices,
It's going to be supported by a large range of mass market NICs,
not special devices.
> used by PTP.
>
> Only *selected* frames really need to gather hwtstamp.
Yes, but the NIC cannot decide that.
> TCP traffic won't use hwtstamp.
Actually it wouldn't surprise me if one of the numerous
TCP congestion avoidance algorithms that get added all the time
starts making use of such an enhanced time stamp.
> Most UDP traffic won't use hwtstamp.
That depends. For example, if you're running a packet sniffer,
most packets will carry it. Probably others too: e.g. DHCP
uses timestamps, and it would make sense to switch it to
hw timestamps when available.
Now if you're running a DHCP server ...
> I threw a "crazy idea", that can be changed if necessary, say with a cookie
> that identifies the slot in NIC driver structure. O(1) lookup if really
> needed.
I think "crazy" describes it well, because it would be a lot of dubious
and likely poorly performing effort just to save 8 bytes.
BTW, it wouldn't surprise me if skb heads had some free space in common
situations anyway, because it's unlikely that they fit exactly into 4K pages
in slab/slub.
-Andi
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 21:34 ` Andi Kleen
@ 2008-11-12 22:26 ` Oliver Hartkopp
2008-11-13 15:53 ` Ohly, Patrick
2008-11-13 6:15 ` Oliver Hartkopp
2008-11-16 8:15 ` Andrew Shewmaker
2 siblings, 1 reply; 48+ messages in thread
From: Oliver Hartkopp @ 2008-11-12 22:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Eric Dumazet, Patrick Ohly, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Andi Kleen wrote:
> Eric Dumazet wrote:
>
>> This scheme is only needed for special devices,
>
> It's going to be supported by a large range of mass market NICs,
> not special devices.
>
> used by PTP.
HW Timestamps are also state-of-the-art in a large range of Controller
Area Network (CAN) NICs.
And when you want to write 'really professional' traffic sniffers or you
need to deal with sensor fusion, hw timestamps allow big improvements in
reliability of the sensor information.
>
>> TCP traffic won't use hwtstamp.
>
> Actually it wouldn't surprise me if one of the numerous
> TCP congestion avoidance algorithms that get added all the time
> starts making use of such an enhanced time stamp.
>
I would also assume that people will find new use-cases for hw
timestamps once they are available.
>> I threw a "crazy idea", that can be changed if necessary, say with a
>> cookie
>> that identifies the slot in NIC driver structure. O(1) lookup if
>> really needed.
>
> I think "crazy" describes it well because it would be a lot of dubious
> and likely not performing well effort just to save 8 bytes.
>
The crazy idea from Eric looks easier and clearer to me than the
discussed patch set from Patrick Ohly - but I wonder if we should give a
separate hw timestamp a try ...
I know Patrick is not a friend of a CONFIG option here. But if we do
it right, HW timestamps could be disabled only with CONFIG_EMBEDDED or
something like that.
Regards,
Oliver
> BTW it wouldn't surprise me if skb heads had some free space in common
> situations anyways becaus it's unlikely it fits exactly into 4K pages
> in slab/slub.
>
> -Andi
^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 22:26 ` Oliver Hartkopp
@ 2008-11-13 15:53 ` Ohly, Patrick
0 siblings, 0 replies; 48+ messages in thread
From: Ohly, Patrick @ 2008-11-13 15:53 UTC (permalink / raw)
To: Oliver Hartkopp, Andi Kleen
Cc: Eric Dumazet, netdev@vger.kernel.org, Octavian Purdila,
Stephen Hemminger, Ingo Oeser, Ronciak, John
Oliver Hartkopp wrote:
>Andi Kleen wrote:
>>Eric Dumazet wrote:
>>> I threw a "crazy idea", that can be changed if necessary, say with a
>>> cookie
>>> that identifies the slot in NIC driver structure. O(1) lookup if
>>> really needed.
>>
>> I think "crazy" describes it well because it would be a lot of dubious
>> and likely not performing well effort just to save 8 bytes.
>>
>
> The crazy idea from Eric looks easier and more clearly to me than the
> discussed patch set from Patrick Ohly - but i wonder if we should give a
> separate hw timestamp a try ...
For reasons that have been mentioned already here (some hardware can
time stamp every packet, new use cases) I think it would be important
to have the hwtstamp information right in the skb. I can change the
patch series so that it uses one additional ktime_t hwtstamp field; give
me a few days, I'm currently traveling.
> I know Patrick is not a friend of a CONFIG option here. But when we make
> it right HW timestamp could only be disabled on CONFIG_EMBEDDED or
> something like that.
I'm not a friend of a config option because it was suggested that
hardware tstamps should be off on *standard* kernels. That's of little
use for users of unmodified distributions who want to run PTP.
If the feature is only disabled on special distributions, then I really
don't mind, but at the same time I wonder whether these distributions
are performance sensitive enough to care about the additional field.
Bye, Patrick
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 21:34 ` Andi Kleen
2008-11-12 22:26 ` Oliver Hartkopp
@ 2008-11-13 6:15 ` Oliver Hartkopp
2008-11-13 6:29 ` Eric Dumazet
2008-11-13 16:05 ` Ohly, Patrick
2008-11-16 8:15 ` Andrew Shewmaker
2 siblings, 2 replies; 48+ messages in thread
From: Oliver Hartkopp @ 2008-11-13 6:15 UTC (permalink / raw)
To: Patrick Ohly
Cc: Andi Kleen, Eric Dumazet, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Andi Kleen wrote:
> Eric Dumazet wrote:
>
>> I threw a "crazy idea", that can be changed if necessary, say with a
>> cookie
>> that identifies the slot in NIC driver structure. O(1) lookup if
>> really needed.
>
> I think "crazy" describes it well because it would be a lot of dubious
> and likely not performing well effort just to save 8 bytes.
Patrick,
one question about a new crazy idea:
If we tend toward adding new space in the skb, wouldn't 4 bytes be enough?
With nsec resolution, a 32 bit value covers a range of 4.294967296 seconds,
or +/- 2.147483648 seconds.
If we make a 'fully qualified' 64 bit sys-timestamp available anyway, the
new 32 bit value could be used as an offset (or it could be given to
userspace directly) to calculate the hw timestamp within the
sys-timestamp context, right?
Regards,
Oliver
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-13 6:15 ` Oliver Hartkopp
@ 2008-11-13 6:29 ` Eric Dumazet
2008-11-13 16:05 ` Ohly, Patrick
1 sibling, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2008-11-13 6:29 UTC (permalink / raw)
To: Oliver Hartkopp
Cc: Patrick Ohly, Andi Kleen, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Oliver Hartkopp wrote:
> Andi Kleen wrote:
>> Eric Dumazet wrote:
>>
>>> I threw a "crazy idea", that can be changed if necessary, say with a
>>> cookie
>>> that identifies the slot in NIC driver structure. O(1) lookup if
>>> really needed.
>>
>> I think "crazy" describes it well because it would be a lot of dubious
>> and likely not performing well effort just to save 8 bytes.
>
> Patrick,
>
> one question about a new crazy idea:
>
> If we tend toward adding new space in the skb, wouldn't 4 bytes be enough?
>
> With nsec resolution, a 32 bit value covers a range of 4.294967296 seconds,
> or +/- 2.147483648 seconds.
>
> If we make a 'fully qualified' 64 bit sys-timestamp available anyway, the
> new 32 bit value could be used as an offset (or it could be given to
> userspace directly) to calculate the hw timestamp within the
> sys-timestamp context, right?
>
If the NIC is going to receive 100,000 frames per second, as Andi mentioned
earlier, my guess is you don't want to do sophisticated computation in the
NIC rx handler, but rather store the raw data delivered by the NIC.
Then, later, for the happy few^Wmany applications that need to get hwtstamp,
perform the computation if needed?
I hope the TCP stack won't need hwtstamp before 2013 or so ;)
^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-13 6:15 ` Oliver Hartkopp
2008-11-13 6:29 ` Eric Dumazet
@ 2008-11-13 16:05 ` Ohly, Patrick
1 sibling, 0 replies; 48+ messages in thread
From: Ohly, Patrick @ 2008-11-13 16:05 UTC (permalink / raw)
To: Oliver Hartkopp
Cc: Andi Kleen, Eric Dumazet, netdev@vger.kernel.org,
Octavian Purdila, Stephen Hemminger, Ingo Oeser, Ronciak, John
Oliver Hartkopp wrote:
> one question about a new crazy idea:
>
> If we tend toward adding new space in the skb, wouldn't 4 bytes be enough?
>
> With nsec resolution, a 32 bit value covers a range of 4.294967296 seconds,
> or +/- 2.147483648 seconds.
>
> If we make a 'fully qualified' 64 bit sys-timestamp available anyway, the
> new 32 bit value could be used as an offset (or it could be given to
> userspace directly) to calculate the hw timestamp within the
> sys-timestamp context, right?
The sys-timestamp is normally not generated. The offset scheme would
add a call to gettimeofdayns() even if there is no other use for the
value. This might be acceptable; the bigger problem IMHO is that without
tracking system time in the hardware, hardware and system time will quickly
(~ a few days with the hardware I was looking at, if I remember correctly)
diverge by more than can be stored in the 32 bit offset.
I'd prefer to spend 64 bits and be done without the need for further
encoding hacks.
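
To put a number on the divergence argument: a free-running clock that drifts by N parts per billion gains N nanoseconds of error per second, so a signed 32 bit nanosecond offset overflows after 2^31 / N seconds. The helper below is only a back-of-the-envelope sketch; its name is made up, and the drift figure in the note that follows is an assumed value, not a measurement of the hardware mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Whole seconds until an unsynchronized clock with the given drift
 * (in parts per billion, i.e. ns of error per second) has moved
 * further than a signed 32 bit nanosecond offset can express. */
static uint64_t seconds_until_offset_overflow(uint64_t drift_ppb)
{
	const uint64_t range_ns = UINT64_C(1) << 31;	/* 2147483648 ns */

	assert(drift_ppb > 0);
	return range_ns / drift_ppb;
}
```

With an assumed drift of 10 ppm (10000 ppb) this gives 214748 seconds, or about two and a half days, which is in line with the "few days" estimate.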
Bye, Patrick
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 21:34 ` Andi Kleen
2008-11-12 22:26 ` Oliver Hartkopp
2008-11-13 6:15 ` Oliver Hartkopp
@ 2008-11-16 8:15 ` Andrew Shewmaker
2 siblings, 0 replies; 48+ messages in thread
From: Andrew Shewmaker @ 2008-11-16 8:15 UTC (permalink / raw)
To: Andi Kleen
Cc: Eric Dumazet, Oliver Hartkopp, Patrick Ohly,
netdev@vger.kernel.org, Octavian Purdila, Stephen Hemminger,
Ingo Oeser, Ronciak, John
On Wed, Nov 12, 2008 at 2:34 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> TCP traffic won't use hwtstamp.
>
> Actually it wouldn't surprise me if one of the numerous
> TCP congestion avoidance algorithms that get added all the time
> starts making use of such an enhanced time stamp.
I would like to point to CUBIC, the Probe Control Protocol, and TCP Santa
Cruz as evidence that you are correct.
CUBIC v2.3 has a new slow start variant called Hystart.
http://marc.info/?l=linux-netdev&m=122531684115306&w=2
In their tech report, they refer to the packet train technique for
measuring available bandwidth used by Dovrolis in his Pathload tool.
One of the reasons Hystart uses heuristics rather than the algorithms
described in the Pathload paper is the unavailability of high
precision timestamps.
http://www.cc.gatech.edu/fac/Constantinos.Dovrolis/pathload.html
The Probe Control Protocol is a non-TCP protocol. The authors
implemented the Pathload algorithm for measuring available bandwidth,
but they used libpcap to do it.
http://www.cs.washington.edu/research/networking/pcp/
TCP Santa Cruz is another, older, variant of TCP that proposed using
timestamps to model the depth of the queue in the bottleneck switch
between two hosts.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.981
And lastly, I would welcome good TX and RX timestamps for use with my
own research in providing better QoS on commodity networks.
--
Andrew Shewmaker
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 20:23 ` Andi Kleen
2008-11-12 20:56 ` Eric Dumazet
@ 2008-11-12 22:17 ` David Miller
1 sibling, 0 replies; 48+ messages in thread
From: David Miller @ 2008-11-12 22:17 UTC (permalink / raw)
To: ak
Cc: dada1, oliver, patrick.ohly, netdev, opurdila, shemminger, netdev,
john.ronciak
From: Andi Kleen <ak@linux.intel.com>
Date: Wed, 12 Nov 2008 21:23:37 +0100
> Also I suspect that your percent numbers are wrong, depending on the
> workload.
I think that, considered globally, Eric's estimate is an over-estimate
in fact.
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [RFC PATCH 00/13] hardware time stamping + igb example implementation
2008-11-12 18:44 ` Oliver Hartkopp
2008-11-12 19:22 ` Eric Dumazet
@ 2008-11-19 12:39 ` Patrick Ohly
1 sibling, 0 replies; 48+ messages in thread
From: Patrick Ohly @ 2008-11-19 12:39 UTC (permalink / raw)
To: Oliver Hartkopp
Cc: Andi Kleen, netdev@vger.kernel.org, Octavian Purdila,
Stephen Hemminger, Ingo Oeser, Ronciak, John, Eric Dumazet
On Wed, 2008-11-12 at 18:44 +0000, Oliver Hartkopp wrote:
> I really wondered if you posted the series to get an impression why
> adding a new field is a good idea ;-)
Oh dear, my secret plan has been revealed ;-) No, I was really hoping
that the patch would be acceptable. After rewriting the patch series
with one additional field the code is simpler (or so I hope). I just
posted it to linux-kernel and linux-net.
> I would also vote for having a new field in the
> skb instead of this current 'bit-compression' approach which smells
> quite expensive at runtime and in code size. Not talking about the
> mentioned potential locking issues ...
The locking issues still remain: the hardware reconfiguration in the
ioctl needs to be coordinated with the ongoing time stamping. Then
there's the raw2sys callback which is made by the socket layer into the
device. That one is problematic also because finding that device isn't
as easy as I thought (see my other mails), so perhaps we should get rid
of the delayed transformation and add two fields.
--
Best Regards, Patrick Ohly
The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.
^ permalink raw reply [flat|nested] 48+ messages in thread