netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* SO_TIMESTAMPING fix and design decisions
@ 2009-09-19 17:25 Christopher Zimmermann
  2009-09-19 22:09 ` Peter P Waskiewicz Jr
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Zimmermann @ 2009-09-19 17:25 UTC (permalink / raw)
  To: netdev


[-- Attachment #1.1: Type: text/plain, Size: 3049 bytes --]

Hi, 

I'm currently working on the SO_TIMESTAMPING feature which is currently 
pretty much broken. The current status is the following:

-tx software timestamps don't work because of a race condition. See 
commit cd4d8fdad1f1320.
-rx software timestamps do work. But they are nothing new.
SO_TIMESTAMP[NS] has been available for years.

hardware timestamps only work for the Intel igb driver. I have access to 
two test machines with NICs supported by this driver.

-tx hardware timestamps do work. I still have to check what happens when 
there is high load of packets requesting timestamping.
-rx hardware timestamps work only for special PTP (Precision Time 
Protocol) packets. There exists a HWTSTAMP_FILTER_ALL option to 
timestamp all packets, but it doesn't work and it will not work. This is 
due to a problem in the hardware design. The Intel hardware is tuned for 
PTP (and so is the ioctl interface).


Right now I'm trying to fix the software tx timestamps. I see several 
ways to fix it:

-Do skb_get() before calling ops->ndo_start_xmit(). This breaks the 
wireless stack, because it is incompatible with pskb_expand_head().

-Do skb_clone() and skb_set_owner_w() before calling 
ops->ndo_start_xmit(). If the driver promises to do timestamping 
(shtx->in_progress==1), then this clone will be abandoned. -> Software 
timestamps only as fallback.

-Do skb_clone() and skb_set_owner_w() before calling 
ops->ndo_start_xmit(). Use this to send the software timestamp 
regardless of what the driver is doing. This results in software 
timestamps being always generated. Not only as fallback. The drawback is 
that userspace will eventually have to parse two timestamping messages 
(only if requested). This is not a big deal.

I chose the last option since it is easiest to implement without much 
interaction with the driver. Patch is attached. Any comments or ideas 
for a better implementation are welcome.
Currently the tx timestamp is returned with the whole packet, including 
all transport layer headers. I would like to return only the payload, 
since this would make the interface easier for userspace. There is 
nothing like a "payload" pointer in the sk_buff. How can I solve this? 
Add such a pointer?


To fix the hardware rx timestamps my idea is to change the ioctl 
interface, so that it allows userspace to fine tune the relevant 
hardware registers of the intel controler. This would allow hardware 
timestamps to be used in other scenarios than just for PTP.
I don't know any application which already uses this interface right 
now. But if there is one it would be easy to fix.
In case there appear some other controlers with timestamping support 
they will either need their own custom interface (if they are similarly 
limited) or they can just go without the ioctl interface and do hardware 
timestamping of all received packages if they are not so limited.

Could this be a way to go or what would you suggest?


Regards, 

Christopher Zimmermann

[-- Attachment #1.2: patch --]
[-- Type: application/octet-stream, Size: 4924 bytes --]

commit fe7b307ab374a495feba8c951fb7da2518a4e422
Author: Christopher Zimmermann <madroach@zakweb.de>
Date:   Sat Sep 19 19:21:28 2009 +0200

    net: Implement timestamping
    
    Avoid the skb_clone, skb_hold and software fallback problems by
    returning two seperate messages to userspace. One for software and one
    for hardware timestamp.

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d2639c4..1bdce95 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4435,7 +4435,7 @@ static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb)
 			shhwtstamps.hwtstamp = ns_to_ktime(ns);
 			shhwtstamps.syststamp =
 				timecompare_transform(&adapter->compare, ns);
-			skb_tstamp_tx(skb, &shhwtstamps);
+			skb_tstamp_hw_tx(skb, &shhwtstamps);
 		}
 	}
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..315a4c4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1868,7 +1868,7 @@ static inline ktime_t net_invalid_timestamp(void)
 }
 
 /**
- * skb_tstamp_tx - queue clone of skb with send time stamps
+ * skb_tstamp_tx - queue clone of skb with hardware send time stamps
  * @orig_skb:	the original outgoing packet
  * @hwtstamps:	hardware time stamps, may be NULL if not available
  *
@@ -1878,9 +1878,21 @@ static inline ktime_t net_invalid_timestamp(void)
  * generates a software time stamp (otherwise), then queues the clone
  * to the error queue of the socket.  Errors are silently ignored.
  */
-extern void skb_tstamp_tx(struct sk_buff *orig_skb,
+extern void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
 			struct skb_shared_hwtstamps *hwtstamps);
 
+/**
+ * skb_tstamp_tx - queue clone of skb with software generated send time stamps
+ * @orig_skb:	the original outgoing packet
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures), generates a
+ * software time stamp, then queues the clone to the error queue of the socket.
+ * Errors are silently ignored.
+ */
+extern void skb_tstamp_tx(struct sk_buff *skb);
+			
+
 extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
 extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 560c8c9..c04d3dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1701,6 +1701,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	int rc;
+	struct sk_buff *skb_tstamp = NULL;
+
+	if (unlikely(skb_tx(skb)->software && skb->sk)) {
+		skb_tstamp = skb_clone(skb, 0);
+
+		/* TODO: Is a sock_hold() needed here?
+		 * skb_set_owner_w doesn't do it. */
+		if (likely(skb_tstamp))
+		    	skb_set_owner_w(skb_tstamp, skb->sk);
+	}
 
 	if (likely(!skb->next)) {
 		if (!list_empty(&ptype_all))
@@ -1721,8 +1731,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			skb_dst_drop(skb);
 
 		rc = ops->ndo_start_xmit(skb, dev);
-		if (rc == NETDEV_TX_OK)
+		if (rc == NETDEV_TX_OK) {
+			if (unlikely(skb_tstamp))
+				skb_tstamp_tx(skb_tstamp);
 			txq_trans_update(txq);
+		}
 		/*
 		 * TODO: if skb_orphan() was called by
 		 * dev->hard_start_xmit() (for example, the unmodified
@@ -1752,6 +1765,8 @@ gso:
 			skb->next = nskb;
 			return rc;
 		}
+		if (unlikely(skb_tstamp))
+			skb_tstamp_tx(skb_tstamp);
 		txq_trans_update(txq);
 		if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
 			return NETDEV_TX_BUSY;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 80a9616..1517a5e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2967,7 +2967,7 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 }
 EXPORT_SYMBOL_GPL(skb_cow_data);
 
-void skb_tstamp_tx(struct sk_buff *orig_skb,
+void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
 		struct skb_shared_hwtstamps *hwtstamps)
 {
 	struct sock *sk = orig_skb->sk;
@@ -2982,17 +2982,9 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!skb)
 		return;
 
-	if (hwtstamps) {
+	if (hwtstamps)
 		*skb_hwtstamps(skb) =
 			*hwtstamps;
-	} else {
-		/*
-		 * no hardware time stamps available,
-		 * so keep the skb_shared_tx and only
-		 * store software time stamp
-		 */
-		skb->tstamp = ktime_get_real();
-	}
 
 	serr = SKB_EXT_ERR(skb);
 	memset(serr, 0, sizeof(*serr));
@@ -3002,6 +2994,23 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (err)
 		kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(skb_tstamp_hw_tx);
+
+void skb_tstamp_tx(struct sk_buff *skb)
+{
+	struct sock_exterr_skb *serr;
+	int err;
+
+	skb->tstamp = ktime_get_real();
+
+	serr = SKB_EXT_ERR(skb);
+	memset(serr, 0, sizeof(*serr));
+	serr->ee.ee_errno = ENOMSG;
+	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	err = sock_queue_err_skb(skb->sk, skb);
+	if (err)
+		kfree_skb(skb);
+}
 EXPORT_SYMBOL_GPL(skb_tstamp_tx);
 
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-09-21 17:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-19 17:25 SO_TIMESTAMPING fix and design decisions Christopher Zimmermann
2009-09-19 22:09 ` Peter P Waskiewicz Jr
2009-09-20  7:52   ` Christopher Zimmermann
2009-09-20 17:48     ` Peter P Waskiewicz Jr
2009-09-20 18:50       ` Christopher Zimmermann
2009-09-21 17:59         ` John Ronciak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).