* [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and
@ 2014-07-30 15:48 Willem de Bruijn
  2014-07-30 15:48 ` [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
Extend socket tx timestamping:
- allow multiple types of software timestamps aside from send (1)
- add software timestamp on tc enqueue (3)
- add software timestamp for TCP (4)
- add software timestamp for TCP on ACK (5)
The sk_flags option space is nearly exhausted. Also move the
many timestamp options to a new sk->sk_tstamps (2).
The patchset extends Linux tx timestamping to monitoring of latency
incurred within the kernel stack and to protocols embedded in TCP.
Complex kernel setups may have multiple layers of queueing, including
multiple instances of traffic shaping, and many classes per layer.
Many applications embed discrete payloads into TCP bytestreams for
reliability, flow control, etcetera. Detecting application tail
latency in such scenarios relies on identifying the exact queue
responsible if on the host, or the network latency if otherwise.
Changelog:
v3->v4
  - (v3 review comment) removed skb->mark packet identification (*A)
  - (v3 review comment) fixed indentation
  - tcp: fixed poll() to return POLLERR on non-zero queue
  - rebased to work without syststamp
  - comments: removed all traces of MSG_TSTAMP_.. (*B)
v2->v3
  - extend the SO_TIMESTAMPING API, instead of defining a new one.
  - add protocol independent support to correlate tstamps with data,
    based on returning skb->mark.
  - removed no-payload optimization and documentation (for now):
    I have a follow-on patch that reintroduces MSG_TSTAMP along with a
    new socket option SOF_TIMESTAMPING_OPT_ONFLAG. This is equivalent
    to sequence setsockopt(<enable>); send(..); setsockopt(<disable>),
    but avoids the need to define a MSG_TSTAMP_<TYPE> for each type.
    I will leave these three patches as follow-on, as this patchset is
    large enough as is.
v1->2
  - expand timestamping (existing and new) to SOCK_RAW and ping sockets
  - rename sock_errqueue_timestamping to scm_timestamping
  - change timestamp data format: do not add fields to scm_timestamping.
      Doing so could break legacy applications. Instead, communicate
      through an existing, but unused, field in the error message.
  - rename SOF_.._OPT_TX_NO_PAYLOAD to shorter SOF_.._OPT_TSONLY
  - move msg_tstamp test app out of patchset and to github
      git://github.com/wdebruij/kerneltools.git
v4 comments:
A: unique correlation of payload and timestamp when multiple payloads
   are in flight requires more extensive changes if we do not use
   skb->mark. That is better left to a follow-up patch.
B: similarly, I intend to submit the MSG_TSTAMP short-hand for
   setsockopt(<enable>); send; setsockopt(<disable>); as follow-up
   for review.
-- 
2.0.0.526.g5318336
^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
@ 2014-07-30 15:48 ` Willem de Bruijn
  2014-07-31 20:37   ` David Miller
  2014-07-30 15:48 ` [PATCH net-next v4 2/5] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
Applications that request kernel tx timestamps with SO_TIMESTAMPING
read timestamps as recvmsg() ancillary data. The response is defined
implicitly as timespec[3].
1) define struct scm_timestamping explicitly and
2) add support for new tstamp types. On tx, scm_timestamping always
   accompanies a sock_extended_err. Define previously unused field
   ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND.
The reception path is not modified. On rx, no struct similar to
sock_extended_err is passed along with SCM_TIMESTAMPING.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h        |  3 +++
 include/net/sock.h            |  4 +++-
 include/uapi/linux/errqueue.h | 15 +++++++++++++++
 net/core/skbuff.c             |  1 +
 net/socket.c                  | 20 +++++++++++---------
 5 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 281dece..477f0f6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -249,6 +249,9 @@ enum {
 	SKBTX_SHARED_FRAG = 1 << 5,
 };
 
+#define SKBTX_ANY_SW_TSTAMP	SKBTX_SW_TSTAMP
+#define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
diff --git a/include/net/sock.h b/include/net/sock.h
index b91c886..02f5b35 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2169,7 +2169,9 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
 	    sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
-	    (kt.tv64 && sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE)) ||
+	    (kt.tv64 &&
+	     (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	      skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP)) ||
 	    (hwtstamps->hwtstamp.tv64 &&
 	     sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index aacd4fb..97e1872 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -22,5 +22,20 @@ struct sock_extended_err {
 
 #define SO_EE_OFFENDER(ee)	((struct sockaddr*)((ee)+1))
 
+/**
+ *	struct scm_timestamping - timestamps exposed through cmsg
+ *
+ *	The timestamping interfaces SO_TIMESTAMPING, MSG_TSTAMP_*
+ *	communicate network timestamps by passing this struct in a cmsg with
+ *	recvmsg(). See Documentation/networking/timestamping.txt for details.
+ */
+struct scm_timestamping {
+	struct timespec ts[3];
+};
+
+/* type of ts[0], passed in ee_info */
+enum {
+	SCM_TSTAMP_SND = 1,	/* driver passed skb to NIC */
+};
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c1a3303..1bcd05d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3521,6 +3521,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
+	serr->ee.ee_info = hwtstamps ? 0 : SCM_TSTAMP_SND;
 
 	err = sock_queue_err_skb(sk, skb);
 
diff --git a/net/socket.c b/net/socket.c
index d8222c0..dc0cc5d 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -106,6 +106,7 @@
 #include <linux/sockios.h>
 #include <linux/atalk.h>
 #include <net/busy_poll.h>
+#include <linux/errqueue.h>
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 unsigned int sysctl_net_busy_read __read_mostly;
@@ -697,7 +698,7 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	struct sk_buff *skb)
 {
 	int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
-	struct timespec ts[3];
+	struct scm_timestamping tss;
 	int empty = 1;
 	struct skb_shared_hwtstamps *shhwtstamps =
 		skb_hwtstamps(skb);
@@ -714,24 +715,25 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
 				 sizeof(tv), &tv);
 		} else {
-			skb_get_timestampns(skb, &ts[0]);
+			struct timespec ts;
+			skb_get_timestampns(skb, &ts);
 			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
-				 sizeof(ts[0]), &ts[0]);
+				 sizeof(ts), &ts);
 		}
 	}
 
-
-	memset(ts, 0, sizeof(ts));
-	if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) &&
-	    ktime_to_timespec_cond(skb->tstamp, ts + 0))
+	memset(&tss, 0, sizeof(tss));
+	if ((sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	     skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP) &&
+	    ktime_to_timespec_cond(skb->tstamp, tss.ts + 0))
 		empty = 0;
 	if (shhwtstamps &&
 	    sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
-	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, ts + 2))
+	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
 		empty = 0;
 	if (!empty)
 		put_cmsg(msg, SOL_SOCKET,
-			 SCM_TIMESTAMPING, sizeof(ts), &ts);
+			 SCM_TIMESTAMPING, sizeof(tss), &tss);
 }
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
 
-- 
2.0.0.526.g5318336
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH net-next v4 2/5] net-timestamp: move timestamp flags out of sk_flags
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
  2014-07-30 15:48 ` [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
@ 2014-07-30 15:48 ` Willem de Bruijn
  2014-07-30 15:48 ` [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer Willem de Bruijn
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
sk_flags is reaching its limit. New timestamping options will not fit.
Move all of them into a new field sk->sk_tsflags.
Added benefit is that this removes boilerplate code to convert between
SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt.
SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive
timestamp logic (netstamp_needed). That can be simplified and this
last key removed, but will leave that for a separate patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/net/sock.h | 28 ++++++++++------------------
 net/core/sock.c    | 25 ++-----------------------
 net/socket.c       |  8 ++++----
 3 files changed, 16 insertions(+), 45 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 02f5b35..cb5e15b 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -67,6 +67,7 @@
 #include <linux/atomic.h>
 #include <net/dst.h>
 #include <net/checksum.h>
+#include <linux/net_tstamp.h>
 
 struct cgroup;
 struct cgroup_subsys;
@@ -411,6 +412,7 @@ struct sock {
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
+	u16			sk_tsflags;
 	struct socket		*sk_socket;
 	void			*sk_user_data;
 	struct page_frag	sk_frag;
@@ -701,12 +703,7 @@ enum sock_flags {
 	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
 	SOCK_MEMALLOC, /* VM depends on this socket for swapping */
-	SOCK_TIMESTAMPING_TX_HARDWARE,  /* %SOF_TIMESTAMPING_TX_HARDWARE */
-	SOCK_TIMESTAMPING_TX_SOFTWARE,  /* %SOF_TIMESTAMPING_TX_SOFTWARE */
-	SOCK_TIMESTAMPING_RX_HARDWARE,  /* %SOF_TIMESTAMPING_RX_HARDWARE */
 	SOCK_TIMESTAMPING_RX_SOFTWARE,  /* %SOF_TIMESTAMPING_RX_SOFTWARE */
-	SOCK_TIMESTAMPING_SOFTWARE,     /* %SOF_TIMESTAMPING_SOFTWARE */
-	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
 	SOCK_RXQ_OVFL,
 	SOCK_ZEROCOPY, /* buffers from userspace */
@@ -2160,20 +2157,17 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 
 	/*
 	 * generate control messages if
-	 * - receive time stamping in software requested (SOCK_RCVTSTAMP
-	 *   or SOCK_TIMESTAMPING_RX_SOFTWARE)
+	 * - receive time stamping in software requested
 	 * - software time stamp available and wanted
-	 *   (SOCK_TIMESTAMPING_SOFTWARE)
 	 * - hardware time stamps available and wanted
-	 *   SOCK_TIMESTAMPING_RAW_HARDWARE
 	 */
 	if (sock_flag(sk, SOCK_RCVTSTAMP) ||
-	    sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE) ||
+	    (sk->sk_tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
 	    (kt.tv64 &&
-	     (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	     (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE ||
 	      skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP)) ||
 	    (hwtstamps->hwtstamp.tv64 &&
-	     sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE)))
+	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
 	else
 		sk->sk_stamp = kt;
@@ -2189,11 +2183,11 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
 					  struct sk_buff *skb)
 {
 #define FLAGS_TS_OR_DROPS ((1UL << SOCK_RXQ_OVFL)			| \
-			   (1UL << SOCK_RCVTSTAMP)			| \
-			   (1UL << SOCK_TIMESTAMPING_SOFTWARE)		| \
-			   (1UL << SOCK_TIMESTAMPING_RAW_HARDWARE))
+			   (1UL << SOCK_RCVTSTAMP))
+#define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE			| \
+			   SOF_TIMESTAMPING_RAW_HARDWARE)
 
-	if (sk->sk_flags & FLAGS_TS_OR_DROPS)
+	if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
 		__sock_recv_ts_and_drops(msg, sk, skb);
 	else
 		sk->sk_stamp = skb->tstamp;
@@ -2203,8 +2197,6 @@ static inline void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
  * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
  * @sk:		socket sending this packet
  * @tx_flags:	filled with instructions for time stamping
- *
- * Currently only depends on SOCK_TIMESTAMPING* flags.
  */
 void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags);
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 134291d..2ad5407 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -848,22 +848,13 @@ set_rcvbuf:
 			ret = -EINVAL;
 			break;
 		}
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE,
-				  val & SOF_TIMESTAMPING_TX_HARDWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE,
-				  val & SOF_TIMESTAMPING_TX_SOFTWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE,
-				  val & SOF_TIMESTAMPING_RX_HARDWARE);
+		sk->sk_tsflags = val;
 		if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 			sock_enable_timestamp(sk,
 					      SOCK_TIMESTAMPING_RX_SOFTWARE);
 		else
 			sock_disable_timestamp(sk,
 					       (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE));
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_SOFTWARE,
-				  val & SOF_TIMESTAMPING_SOFTWARE);
-		sock_valbool_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE,
-				  val & SOF_TIMESTAMPING_RAW_HARDWARE);
 		break;
 
 	case SO_RCVLOWAT:
@@ -1089,19 +1080,7 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		break;
 
 	case SO_TIMESTAMPING:
-		v.val = 0;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_TX_HARDWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_TX_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RX_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_RX_HARDWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RX_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_RX_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE))
-			v.val |= SOF_TIMESTAMPING_SOFTWARE;
-		if (sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE))
-			v.val |= SOF_TIMESTAMPING_RAW_HARDWARE;
+		v.val = sk->sk_tsflags;
 		break;
 
 	case SO_RCVTIMEO:
diff --git a/net/socket.c b/net/socket.c
index dc0cc5d..255d9b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -613,9 +613,9 @@ EXPORT_SYMBOL(sock_release);
 void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 {
 	*tx_flags = 0;
-	if (sock_flag(sk, SOCK_TIMESTAMPING_TX_HARDWARE))
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_HARDWARE)
 		*tx_flags |= SKBTX_HW_TSTAMP;
-	if (sock_flag(sk, SOCK_TIMESTAMPING_TX_SOFTWARE))
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SOFTWARE)
 		*tx_flags |= SKBTX_SW_TSTAMP;
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
@@ -723,12 +723,12 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	}
 
 	memset(&tss, 0, sizeof(tss));
-	if ((sock_flag(sk, SOCK_TIMESTAMPING_SOFTWARE) ||
+	if ((sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE ||
 	     skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP) &&
 	    ktime_to_timespec_cond(skb->tstamp, tss.ts + 0))
 		empty = 0;
 	if (shhwtstamps &&
-	    sock_flag(sk, SOCK_TIMESTAMPING_RAW_HARDWARE) &&
+	    (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
 	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
 		empty = 0;
 	if (!empty)
-- 
2.0.0.526.g5318336
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
  2014-07-30 15:48 ` [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
  2014-07-30 15:48 ` [PATCH net-next v4 2/5] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
@ 2014-07-30 15:48 ` Willem de Bruijn
  2014-07-31 20:38   ` David Miller
  2014-07-30 15:48 ` [PATCH net-next v4 4/5] net-timestamp: TCP timestamping Willem de Bruijn
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
Kernel transmit latency is often incurred in the traffic shaping
layer. This patch adds a new timestamp on transmission just before
entering traffic shaping. When data travels through multiple devices
(bonding, tunneling, ...) each device will export an individual
timestamp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h          | 11 +++++++++--
 include/uapi/linux/errqueue.h   |  1 +
 include/uapi/linux/net_tstamp.h |  8 +++++---
 net/core/dev.c                  |  4 ++++
 net/core/skbuff.c               | 16 ++++++++++++----
 net/socket.c                    |  3 +++
 6 files changed, 34 insertions(+), 9 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 477f0f6..faa7ebe 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -229,7 +229,7 @@ enum {
 	/* generate hardware time stamp */
 	SKBTX_HW_TSTAMP = 1 << 0,
 
-	/* generate software time stamp */
+	/* generate software time stamp when queueing packet to NIC */
 	SKBTX_SW_TSTAMP = 1 << 1,
 
 	/* device driver is going to provide hardware time stamp */
@@ -247,9 +247,12 @@ enum {
 	 * all frags to avoid possible bad checksum
 	 */
 	SKBTX_SHARED_FRAG = 1 << 5,
+
+	/* generate software time stamp when queueing packing in TC */
+	SKBTX_ENQ_TSTAMP = 1 << 6,
 };
 
-#define SKBTX_ANY_SW_TSTAMP	SKBTX_SW_TSTAMP
+#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP | SKBTX_ENQ_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
 
 /*
@@ -2694,6 +2697,10 @@ static inline bool skb_defer_rx_timestamp(struct sk_buff *skb)
 void skb_complete_tx_timestamp(struct sk_buff *skb,
 			       struct skb_shared_hwtstamps *hwtstamps);
 
+void __skb_tstamp_tx(struct sk_buff *orig_skb,
+		     struct skb_shared_hwtstamps *hwtstamps,
+		     struct sock *sk, int tstype);
+
 /**
  * skb_tstamp_tx - queue clone of skb with send time stamps
  * @orig_skb:	the original outgoing packet
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 97e1872..d37e5c7 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -36,6 +36,7 @@ struct scm_timestamping {
 /* type of ts[0], passed in ee_info */
 enum {
 	SCM_TSTAMP_SND = 1,	/* driver passed skb to NIC */
+	SCM_TSTAMP_ENQ,		/* data enqueued in traffic shaping layer */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index f53879c..99c2035 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -20,9 +20,11 @@ enum {
 	SOF_TIMESTAMPING_SOFTWARE = (1<<4),
 	SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
 	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
-	SOF_TIMESTAMPING_MASK =
-	(SOF_TIMESTAMPING_RAW_HARDWARE - 1) |
-	SOF_TIMESTAMPING_RAW_HARDWARE
+	SOF_TIMESTAMPING_TX_ENQ = (1<<7),
+
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ENQ,
+	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
+				 SOF_TIMESTAMPING_LAST
 };
 
 /**
diff --git a/net/core/dev.c b/net/core/dev.c
index e1b7cfa..9854bdc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -132,6 +132,7 @@
 #include <linux/hashtable.h>
 #include <linux/vmalloc.h>
 #include <linux/if_macvlan.h>
+#include <linux/errqueue.h>
 
 #include "net-sysfs.h"
 
@@ -2876,6 +2877,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 
 	skb_reset_mac_header(skb);
 
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_ENQ_TSTAMP))
+		__skb_tstamp_tx(skb, NULL, skb->sk, SCM_TSTAMP_ENQ);
+
 	/* Disable soft irqs for various locks below. Also
 	 * stops preemption for RCU.
 	 */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1bcd05d..8f58faa 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3490,10 +3490,10 @@ int sock_queue_err_skb(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_queue_err_skb);
 
-void skb_tstamp_tx(struct sk_buff *orig_skb,
-		struct skb_shared_hwtstamps *hwtstamps)
+void __skb_tstamp_tx(struct sk_buff *orig_skb,
+		     struct skb_shared_hwtstamps *hwtstamps,
+		     struct sock *sk, int tstype)
 {
-	struct sock *sk = orig_skb->sk;
 	struct sock_exterr_skb *serr;
 	struct sk_buff *skb;
 	int err;
@@ -3521,13 +3521,21 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
 	memset(serr, 0, sizeof(*serr));
 	serr->ee.ee_errno = ENOMSG;
 	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
-	serr->ee.ee_info = hwtstamps ? 0 : SCM_TSTAMP_SND;
+	serr->ee.ee_info = tstype;
 
 	err = sock_queue_err_skb(sk, skb);
 
 	if (err)
 		kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
+
+void skb_tstamp_tx(struct sk_buff *orig_skb,
+		   struct skb_shared_hwtstamps *hwtstamps)
+{
+	return __skb_tstamp_tx(orig_skb, hwtstamps, orig_skb->sk,
+			       hwtstamps ? 0 : SCM_TSTAMP_SND);
+}
 EXPORT_SYMBOL_GPL(skb_tstamp_tx);
 
 void skb_complete_wifi_ack(struct sk_buff *skb, bool acked)
diff --git a/net/socket.c b/net/socket.c
index 255d9b8..c2849ba 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -617,6 +617,9 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 		*tx_flags |= SKBTX_HW_TSTAMP;
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_SOFTWARE)
 		*tx_flags |= SKBTX_SW_TSTAMP;
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ENQ)
+		*tx_flags |= SKBTX_ENQ_TSTAMP;
+
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
 }
-- 
2.0.0.526.g5318336
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH net-next v4 4/5] net-timestamp: TCP timestamping
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
                   ` (2 preceding siblings ...)
  2014-07-30 15:48 ` [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer Willem de Bruijn
@ 2014-07-30 15:48 ` Willem de Bruijn
  2014-07-31 20:41   ` David Miller
  2014-07-30 15:48 ` [PATCH net-next v4 5/5] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
  2014-07-31 20:43 ` [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and David Miller
  5 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
TCP timestamping extends SO_TIMESTAMPING to bytestreams.
Bytestreams do not have a 1:1 relationship between send() buffers and
network packets. The feature interprets a send call on a bytestream as
a request for a timestamp for the last byte in that send() buffer.
The choice corresponds to a request for a timestamp when all bytes in
the buffer have been sent. That assumption depends on in-order kernel
transmission. This is the common case. That said, it is possible to
construct a traffic shaping tree that would result in reordering.
The guarantee is strong, then, but not ironclad.
This implementation supports send and sendpages (splice). GSO replaces
one large packet with multiple smaller packets. This patch also copies
the option into the correct smaller packet.
This patch does not yet support timestamping on data in an initial TCP
Fast Open SYN, because that takes a very different data path.
The implementation supports a single timestamp per packet. To avoid
having multiple timestamp requests per sk_buff, the skb is locked
against extension once the flag is set.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/tcp.c         | 23 ++++++++++++++++++-----
 net/ipv4/tcp_offload.c |  4 ++++
 2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9d2118e..739d514 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -523,7 +523,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	}
 	/* This barrier is coupled with smp_wmb() in tcp_reset() */
 	smp_rmb();
-	if (sk->sk_err)
+	if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
 		mask |= POLLERR;
 
 	return mask;
@@ -878,6 +878,11 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
 	return mss_now;
 }
 
+static bool tcp_skb_can_extend(struct sk_buff *skb)
+{
+	return !(skb_shinfo(skb)->tx_flags & SKBTX_ANY_SW_TSTAMP);
+}
+
 static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 				size_t size, int flags)
 {
@@ -911,7 +916,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		int copy, i;
 		bool can_coalesce;
 
-		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0 ||
+		    !tcp_skb_can_extend(skb)) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
@@ -959,8 +965,10 @@ new_segment:
 
 		copied += copy;
 		offset += copy;
-		if (!(size -= copy))
+		if (!(size -= copy)) {
+			sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
 			goto out;
+		}
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
 			continue;
@@ -1160,7 +1168,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				copy = max - skb->len;
 			}
 
-			if (copy <= 0) {
+			if (copy <= 0 || !tcp_skb_can_extend(skb)) {
 new_segment:
 				/* Allocate new segment. If the interface is SG,
 				 * allocate skb fitting to single page.
@@ -1252,8 +1260,10 @@ new_segment:
 
 			from += copy;
 			copied += copy;
-			if ((seglen -= copy) == 0 && iovlen == 0)
+			if ((seglen -= copy) == 0 && iovlen == 0) {
+				sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
 				goto out;
+			}
 
 			if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
 				continue;
@@ -1617,6 +1627,9 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return ip_recv_error(sk, msg, len, addr_len);
+
 	if (sk_can_busy_loop(sk) && skb_queue_empty(&sk->sk_receive_queue) &&
 	    (sk->sk_state == TCP_ESTABLISHED))
 		sk_busy_loop(sk, nonblock);
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 55046ec..156e21e 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -134,6 +134,10 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				(__force u32)delta));
 	if (skb->ip_summed != CHECKSUM_PARTIAL)
 		th->check = gso_make_checksum(skb, ~th->check);
+
+	if (unlikely(skb_shinfo(gso_skb)->tx_flags & SKBTX_SW_TSTAMP))
+		skb_shinfo(skb)->tx_flags |= SKBTX_SW_TSTAMP;
+
 out:
 	return segs;
 }
-- 
2.0.0.526.g5318336
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [PATCH net-next v4 5/5] net-timestamp: ACK timestamp for bytestreams
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
                   ` (3 preceding siblings ...)
  2014-07-30 15:48 ` [PATCH net-next v4 4/5] net-timestamp: TCP timestamping Willem de Bruijn
@ 2014-07-30 15:48 ` Willem de Bruijn
  2014-07-31 20:43 ` [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and David Miller
  5 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-30 15:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, richardcochran, Willem de Bruijn
Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
in the send() call is acknowledged. It implements the feature for TCP.
The timestamp is generated when the TCP socket cumulative ACK is moved
beyond the tracked seqno for the first time. The feature ignores SACK
and FACK, because those acknowledge the specific byte, but not
necessarily the entire contents of the buffer up to that byte.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/skbuff.h          | 7 ++++++-
 include/uapi/linux/errqueue.h   | 1 +
 include/uapi/linux/net_tstamp.h | 3 ++-
 net/ipv4/tcp_input.c            | 4 ++++
 net/socket.c                    | 2 ++
 5 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index faa7ebe..17b0df7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -250,9 +250,14 @@ enum {
 
 	/* generate software time stamp when queueing packing in TC */
 	SKBTX_ENQ_TSTAMP = 1 << 6,
+
+	/* generate software timestamp on peer data acknowledgment */
+	SKBTX_ACK_TSTAMP = 1 << 7,
 };
 
-#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP | SKBTX_ENQ_TSTAMP)
+#define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP  | \
+				 SKBTX_ENQ_TSTAMP | \
+				 SKBTX_ACK_TSTAMP)
 #define SKBTX_ANY_TSTAMP	(SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP)
 
 /*
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index d37e5c7..81639bc 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -37,6 +37,7 @@ struct scm_timestamping {
 enum {
 	SCM_TSTAMP_SND = 1,	/* driver passed skb to NIC */
 	SCM_TSTAMP_ENQ,		/* data enqueued in traffic shaping layer */
+	SCM_TSTAMP_ACK,		/* data acknowledged by peer */
 };
 
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 99c2035..bf9f338 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -21,8 +21,9 @@ enum {
 	SOF_TIMESTAMPING_SYS_HARDWARE = (1<<5),
 	SOF_TIMESTAMPING_RAW_HARDWARE = (1<<6),
 	SOF_TIMESTAMPING_TX_ENQ = (1<<7),
+	SOF_TIMESTAMPING_TX_ACK = (1<<8),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ENQ,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_TX_ACK,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7832d94..f52f159 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -74,6 +74,7 @@
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
 #include <net/netdma.h>
+#include <linux/errqueue.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -3102,6 +3103,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		if (!fully_acked)
 			break;
 
+		if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_ACK_TSTAMP))
+			__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
+
 		tcp_unlink_write_queue(skb, sk);
 		sk_wmem_free_skb(sk, skb);
 		if (skb == tp->retransmit_skb_hint)
diff --git a/net/socket.c b/net/socket.c
index c2849ba..340098a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -619,6 +619,8 @@ void sock_tx_timestamp(struct sock *sk, __u8 *tx_flags)
 		*tx_flags |= SKBTX_SW_TSTAMP;
 	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ENQ)
 		*tx_flags |= SKBTX_ENQ_TSTAMP;
+	if (sk->sk_tsflags & SOF_TIMESTAMPING_TX_ACK)
+		*tx_flags |= SKBTX_ACK_TSTAMP;
 
 	if (sock_flag(sk, SOCK_WIFI_STATUS))
 		*tx_flags |= SKBTX_WIFI_STATUS;
-- 
2.0.0.526.g5318336
^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct
  2014-07-30 15:48 ` [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
@ 2014-07-31 20:37   ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2014-07-31 20:37 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 30 Jul 2014 11:48:44 -0400
> Applications that request kernel tx timestamps with SO_TIMESTAMPING
> read timestamps as recvmsg() ancillary data. The response is defined
> implicitly as timespec[3].
> 
> 1) define struct scm_timestamping explicitly and
> 
> 2) add support for new tstamp types. On tx, scm_timestamping always
>    accompanies a sock_extended_err. Define previously unused field
>    ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND.
> 
> The reception path is not modified. On rx, no struct similar to
> sock_extended_err is passed along with SCM_TIMESTAMPING.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
 ...
> +/* type of ts[0], passed in ee_info */
> +enum {
> +	SCM_TSTAMP_SND = 1,	/* driver passed skb to NIC */
> +};
 ...
> @@ -3521,6 +3521,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
>  	memset(serr, 0, sizeof(*serr));
>  	serr->ee.ee_errno = ENOMSG;
>  	serr->ee.ee_origin = SO_EE_ORIGIN_TIMESTAMPING;
> +	serr->ee.ee_info = hwtstamps ? 0 : SCM_TSTAMP_SND;
>  
>  	err = sock_queue_err_skb(sk, skb);
>  
Up until now we've placed the value zero in the ee_info field, do you have
a strong reason to not define SCM_TSTAMP_SND to zero as well?
Unless you have a compelling reason to do otherwise, we should use zero.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer
  2014-07-30 15:48 ` [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer Willem de Bruijn
@ 2014-07-31 20:38   ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2014-07-31 20:38 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 30 Jul 2014 11:48:46 -0400
> Kernel transmit latency is often incurred in the traffic shaping
> layer. This patch adds a new timestamp on transmission just before
> entering traffic shaping. When data travels through multiple devices
> (bonding, tunneling, ...) each device will export an individual
> timestamp.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
 ...
> @@ -247,9 +247,12 @@ enum {
>  	 * all frags to avoid possible bad checksum
>  	 */
>  	SKBTX_SHARED_FRAG = 1 << 5,
> +
> +	/* generate software time stamp when queueing packing in TC */
> +	SKBTX_ENQ_TSTAMP = 1 << 6,
A lot of mixed up terminology being used here.
Subject and commit message say "traffic shaping", yet this comment
says "TC" which I assume means "Traffic Classifier" which isn't
exactly right, the packet scheduler can have no classifiers attached
to it.
Everyone knows what net/sched is, and therefore it might be best to
say "packet scheduler" in all of these places.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 4/5] net-timestamp: TCP timestamping
  2014-07-30 15:48 ` [PATCH net-next v4 4/5] net-timestamp: TCP timestamping Willem de Bruijn
@ 2014-07-31 20:41   ` David Miller
  0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2014-07-31 20:41 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 30 Jul 2014 11:48:47 -0400
> TCP timestamping extends SO_TIMESTAMPING to bytestreams.
> 
> Bytestreams do not have a 1:1 relationship between send() buffers and
> network packets. The feature interprets a send call on a bytestream as
> a request for a timestamp for the last byte in that send() buffer.
> 
> The choice corresponds to a request for a timestamp when all bytes in
> the buffer have been sent. That assumption depends on in-order kernel
> transmission. This is the common case. That said, it is possible to
> construct a traffic shaping tree that would result in reordering.
> The guarantee is strong, then, but not ironclad.
> 
> This implementation supports send and sendpages (splice). GSO replaces
> one large packet with multiple smaller packets. This patch also copies
> the option into the correct smaller packet.
> 
> This patch does not yet support timestamping on data in an initial TCP
> Fast Open SYN, because that takes a very different data path.
> 
> The implementation supports a single timestamp per packet. To avoid
> having multiple timestamp requests per sk_buff, the skb is locked
> against extension once the flag is set.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
I'm cautious about a timestamping facility which changes the byte
stream, as this patch does.
Is it possible to define different semantics, in order to avoid this
adverse effect?  For example, only adhere to the first timestamp
request and ignore the rest?
Or perhaps come up with a cheap way to chain them up when coalescing
and expansion occurs.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and
  2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
                   ` (4 preceding siblings ...)
  2014-07-30 15:48 ` [PATCH net-next v4 5/5] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
@ 2014-07-31 20:43 ` David Miller
  2014-07-31 21:36   ` Willem de Bruijn
  5 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2014-07-31 20:43 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran
From: Willem de Bruijn <willemb@google.com>
Date: Wed, 30 Jul 2014 11:48:43 -0400
> Extend socket tx timestamping:
> - allow multiple types of software timestamps aside from send (1)
> - add software timestamp on tc enqueue (3)
> - add software timestamp for TCP (4)
> - add software timestamp for TCP on ACK (5)
> 
> The sk_flags option space is nearly exhausted. Also move the
> many timestamp options to a new sk->sk_tstamps (2).
> 
> The patchset extends Linux tx timestamping to monitoring of latency
> incurred within the kernel stack and to protocols embedded in TCP.
> Complex kernel setups may have multiple layers of queueing, including
> multiple instances of traffic shaping, and many classes per layer.
> Many applications embed discrete payloads into TCP bytestreams for
> reliability, flow control, etcetera. Detecting application tail
> latency in such scenarios relies on identifying the exact queue
> responsible if on the host, or the network latency if otherwise.
I'm mostly fine with this patchset, the only major blocker is the bit
that changes TCP segmenting behavior when timestamps are enabled.  I
really don't want to see us doing that, because that makes the
timestamps non-passive.
Please work towards a solution to that problem and also fixup the
minor naming et al. issues I brought up too.
Thanks.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and
  2014-07-31 20:43 ` [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and David Miller
@ 2014-07-31 21:36   ` Willem de Bruijn
  2014-08-01  5:35     ` David Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2014-07-31 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: Network Development, Eric Dumazet, Richard Cochran
> I'm mostly fine with this patchset, the only major blocker is the bit
> that changes TCP segmenting behavior when timestamps are enabled.  I
> really don't want to see us doing that, because that makes the
> timestamps non-passive.
Good point. It can be avoided with some storage overhead.
The only reason for locking the buffers is to ensure that
the tracked byte is the last byte in the packet. We have to
know the tracked byte at GSO time, to apply the SKBTX_..
flag to the right segment, and at ACK, to identify when the cumulative
ACK exceeds this byte for the first time.
An earlier internal version of the patch did not lock the buffers, but
recorded the seqno in skb_shared_info and referenced that in these
cases. The obvious drawback is having to store an u32 in
skb_shinfo. We did just regain 64b with the removal of syststamp,
though. Would this be a reasonable approach?
> Please work towards a solution to that problem and also fixup the
> minor naming et al. issues I brought up too.
The other issues are clear and easy to fix. Thanks for the review, David.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and
  2014-07-31 21:36   ` Willem de Bruijn
@ 2014-08-01  5:35     ` David Miller
  2014-08-01 13:44       ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2014-08-01  5:35 UTC (permalink / raw)
  To: willemb; +Cc: netdev, eric.dumazet, richardcochran
From: Willem de Bruijn <willemb@google.com>
Date: Thu, 31 Jul 2014 17:36:16 -0400
> An earlier internal version of the patch did not lock the buffers,
> but recorded the seqno in skb_shared_info and referenced that in
> these cases. The obvious drawback is having to store an u32 in
> skb_shinfo. We did just regain 64b with the removal of syststamp,
> though. Would this be a reasonable approach?
At the time we were discussing the removal syststamp, the intention
was to use that space for a new value that can be use to match up
timestamps properly with the packets they are for.
Originally you wanted to use skb->mark for this and then we discussed
all of the drawbacks and shortcoming of that.
What happened to those plans?
Also, there might be 4 bytes available in tcp_skb_cb.
^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and
  2014-08-01  5:35     ` David Miller
@ 2014-08-01 13:44       ` Willem de Bruijn
  0 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2014-08-01 13:44 UTC (permalink / raw)
  To: David Miller; +Cc: Network Development, Eric Dumazet, Richard Cochran
> At the time we were discussing the removal syststamp, the intention
> was to use that space for a new value that can be use to match up
> timestamps properly with the packets they are for.
>
> Originally you wanted to use skb->mark for this and then we discussed
> all of the drawbacks and shortcoming of that.
>
> What happened to those plans?
Before skb->mark, in v1, I returned this stored seqno for that purpose.
Leaking the raw seqno is not ideal, and hard to work with in userspace
without the initial seqno.
An option that avoids this problem is to store the initial seqno in
sock and return the offset, to give a natural byte index into the stream.
For datagrams, the same field in skb_shared_info can store a packet
index, incremented on each timestamp request, in which case the
field in sock stores the timestamp counter.
> Also, there might be 4 bytes available in tcp_skb_cb.
I'll look into that. If the field stores the key to return to
userspace, it has to persist across GSO and into skb_tstamp_tx,
where skb->cb can longer be trusted to hold tcp_skb_cb.
A minor variation is to always return the simple counter, for both datagram
and bytestream sockets, in which case seqno is not used for this purpose,
never exposed, nor needed beyond GSO. In that case, I can try to store
this u32 in tcp_skb_cb and store the separate u16 counter in
skb_shared_info (a separate patch).
I prefer the first variant, using the same field for both purposes and
returning a byte index for bytestreams and packet index for datagram
sockets.
^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2014-08-01 13:45 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-30 15:48 [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and Willem de Bruijn
2014-07-30 15:48 ` [PATCH net-next v4 1/5] net-timestamp: extend SCM_TIMESTAMPING ancillary data struct Willem de Bruijn
2014-07-31 20:37   ` David Miller
2014-07-30 15:48 ` [PATCH net-next v4 2/5] net-timestamp: move timestamp flags out of sk_flags Willem de Bruijn
2014-07-30 15:48 ` [PATCH net-next v4 3/5] net-timestamp: ENQ timestamp on enqueue to traffic shaping layer Willem de Bruijn
2014-07-31 20:38   ` David Miller
2014-07-30 15:48 ` [PATCH net-next v4 4/5] net-timestamp: TCP timestamping Willem de Bruijn
2014-07-31 20:41   ` David Miller
2014-07-30 15:48 ` [PATCH net-next v4 5/5] net-timestamp: ACK timestamp for bytestreams Willem de Bruijn
2014-07-31 20:43 ` [PATCH net-next v4 0/5] net-timestamp: additional sw tstamps and David Miller
2014-07-31 21:36   ` Willem de Bruijn
2014-08-01  5:35     ` David Miller
2014-08-01 13:44       ` Willem de Bruijn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).