From: Christopher Zimmermann <madroach@zakweb.de>
To: netdev@vger.kernel.org
Subject: SO_TIMESTAMPING fix and design decisions
Date: Sat, 19 Sep 2009 19:25:49 +0200 [thread overview]
Message-ID: <20090919192549.0735c93a@pundit> (raw)
[-- Attachment #1.1: Type: text/plain, Size: 3049 bytes --]
Hi,
I'm currently working on the SO_TIMESTAMPING feature which is currently
pretty much broken. The current status is the following:
-tx software timestamps don't work because of a race condition. See
commit cd4d8fdad1f1320.
-rx software timestamps do work. But they are nothing new.
SO_TIMESTAMP[NS] has been available for years.
hardware timestamps only work for the Intel igb driver. I have access to
two test machines with NICs supported by this driver.
-tx hardware timestamps do work. I still have to check what happens when
there is high load of packets requesting timestamping.
-rx hardware timestamps work only for special PTP (Precision Time
Protocol) packets. There exists a HWTSTAMP_FILTER_ALL option to
timestamp all packets, but it doesn't work and it will not work. This is
due to a problem in the hardware design. The Intel hardware is tuned for
PTP (and so is the ioctl interface).
Right now I'm trying to fix the software tx timestamps. I see several
ways to fix it:
-Do skb_get() before calling ops->ndo_start_xmit(). This breaks the
wireless stack, because it is incompatible with pskb_expand_head().
-Do skb_clone() and skb_set_owner_w() before calling
ops->ndo_start_xmit(). If the driver promises to do timestamping
(shtx->in_progress==1), then this clone will be abandoned. -> Software
timestamps only as fallback.
-Do skb_clone() and skb_set_owner_w() before calling
ops->ndo_start_xmit(). Use this to send the software timestamp
regardless of what the driver is doing. This results in software
timestamps being always generated. Not only as fallback. The drawback is
that userspace will eventually have to parse two timestamping messages
(only if requested). This is not a big deal.
I chose the last option since it is easiest to implement without much
interaction with the driver. Patch is attached. Any comments or ideas
for a better implementation are welcome.
Currently the tx timestamp is returned with the whole packet, including
all transport layer headers. I would like to return only the payload,
since this would make the interface easier for userspace. There is
nothing like a "payload" pointer in the sk_buff. How can I solve this?
Add such a pointer?
To fix the hardware rx timestamps my idea is to change the ioctl
interface, so that it allows userspace to fine tune the relevant
hardware registers of the intel controler. This would allow hardware
timestamps to be used in other scenarios than just for PTP.
I don't know any application which already uses this interface right
now. But if there is one it would be easy to fix.
In case there appear some other controlers with timestamping support
they will either need their own custom interface (if they are similarly
limited) or they can just go without the ioctl interface and do hardware
timestamping of all received packages if they are not so limited.
Could this be a way to go or what would you suggest?
Regards,
Christopher Zimmermann
[-- Attachment #1.2: patch --]
[-- Type: application/octet-stream, Size: 4924 bytes --]
commit fe7b307ab374a495feba8c951fb7da2518a4e422
Author: Christopher Zimmermann <madroach@zakweb.de>
Date: Sat Sep 19 19:21:28 2009 +0200
net: Implement timestamping
Avoid the skb_clone, skb_hold and software fallback problems by
returning two seperate messages to userspace. One for software and one
for hardware timestamp.
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d2639c4..1bdce95 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4435,7 +4435,7 @@ static void igb_tx_hwtstamp(struct igb_adapter *adapter, struct sk_buff *skb)
shhwtstamps.hwtstamp = ns_to_ktime(ns);
shhwtstamps.syststamp =
timecompare_transform(&adapter->compare, ns);
- skb_tstamp_tx(skb, &shhwtstamps);
+ skb_tstamp_hw_tx(skb, &shhwtstamps);
}
}
}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index df7b23a..315a4c4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1868,7 +1868,7 @@ static inline ktime_t net_invalid_timestamp(void)
}
/**
- * skb_tstamp_tx - queue clone of skb with send time stamps
+ * skb_tstamp_tx - queue clone of skb with hardware send time stamps
* @orig_skb: the original outgoing packet
* @hwtstamps: hardware time stamps, may be NULL if not available
*
@@ -1878,9 +1878,21 @@ static inline ktime_t net_invalid_timestamp(void)
* generates a software time stamp (otherwise), then queues the clone
* to the error queue of the socket. Errors are silently ignored.
*/
-extern void skb_tstamp_tx(struct sk_buff *orig_skb,
+extern void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
struct skb_shared_hwtstamps *hwtstamps);
+/**
+ * skb_tstamp_tx - queue clone of skb with software generated send time stamps
+ * @orig_skb: the original outgoing packet
+ *
+ * If the skb has a socket associated, then this function clones the
+ * skb (thus sharing the actual data and optional structures), generates a
+ * software time stamp, then queues the clone to the error queue of the socket.
+ * Errors are silently ignored.
+ */
+extern void skb_tstamp_tx(struct sk_buff *skb);
+
+
extern __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len);
extern __sum16 __skb_checksum_complete(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 560c8c9..c04d3dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1701,6 +1701,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
{
const struct net_device_ops *ops = dev->netdev_ops;
int rc;
+ struct sk_buff *skb_tstamp = NULL;
+
+ if (unlikely(skb_tx(skb)->software && skb->sk)) {
+ skb_tstamp = skb_clone(skb, 0);
+
+ /* TODO: Is a sock_hold() needed here?
+ * skb_set_owner_w doesn't do it. */
+ if (likely(skb_tstamp))
+ skb_set_owner_w(skb_tstamp, skb->sk);
+ }
if (likely(!skb->next)) {
if (!list_empty(&ptype_all))
@@ -1721,8 +1731,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
skb_dst_drop(skb);
rc = ops->ndo_start_xmit(skb, dev);
- if (rc == NETDEV_TX_OK)
+ if (rc == NETDEV_TX_OK) {
+ if (unlikely(skb_tstamp))
+ skb_tstamp_tx(skb_tstamp);
txq_trans_update(txq);
+ }
/*
* TODO: if skb_orphan() was called by
* dev->hard_start_xmit() (for example, the unmodified
@@ -1752,6 +1765,8 @@ gso:
skb->next = nskb;
return rc;
}
+ if (unlikely(skb_tstamp))
+ skb_tstamp_tx(skb_tstamp);
txq_trans_update(txq);
if (unlikely(netif_tx_queue_stopped(txq) && skb->next))
return NETDEV_TX_BUSY;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 80a9616..1517a5e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2967,7 +2967,7 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
}
EXPORT_SYMBOL_GPL(skb_cow_data);
-void skb_tstamp_tx(struct sk_buff *orig_skb,
+void skb_tstamp_hw_tx(struct sk_buff *orig_skb,
struct skb_shared_hwtstamps *hwtstamps)
{
struct sock *sk = orig_skb->sk;
@@ -2982,17 +2982,9 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
if (!skb)
return;
- if (hwtstamps) {
+ if (hwtstamps)
*skb_hwtstamps(skb) =
*hwtstamps;
- } else {
- /*
- * no hardware time stamps available,
- * so keep the skb_shared_tx and only
- * store software time stamp
- */
- skb->tstamp = ktime_get_real();
- }
serr = SKB_EXT_ERR(skb);
memset(serr, 0, sizeof(*serr));
@@ -3002,6 +2994,23 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
if (err)
kfree_skb(skb);
}
+EXPORT_SYMBOL_GPL(skb_tstamp_hw_tx);
+
+void skb_tstamp_tx(struct sk_buff *skb)
+{
+ struct sock_exterr_skb *serr;
+ int err;
+
+ skb->tstamp = ktime_get_real();
+
+ serr = SKB_EXT_ERR(skb);
+ memset(serr, 0, sizeof(*serr));
+ serr->ee.ee_errno = ENOMSG;
+ serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+ err = sock_queue_err_skb(skb->sk, skb);
+ if (err)
+ kfree_skb(skb);
+}
EXPORT_SYMBOL_GPL(skb_tstamp_tx);
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
next reply other threads:[~2009-09-19 17:26 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-19 17:25 Christopher Zimmermann [this message]
2009-09-19 22:09 ` SO_TIMESTAMPING fix and design decisions Peter P Waskiewicz Jr
2009-09-20 7:52 ` Christopher Zimmermann
2009-09-20 17:48 ` Peter P Waskiewicz Jr
2009-09-20 18:50 ` Christopher Zimmermann
2009-09-21 17:59 ` John Ronciak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090919192549.0735c93a@pundit \
--to=madroach@zakweb.de \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).