netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path
@ 2014-09-15 11:19 Eric Dumazet
  2014-09-15 11:19 ` [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Eric Dumazet @ 2014-09-15 11:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

Looking at tcp_try_coalesce() I was wondering why I did :

if (tcp_hdr(from)->fin)
     return false;

The answer would be to allow the aggregation, if we simply OR the FIN and PSH
flags eventually present in @from to @to packet. (Note a change is also
needed in skb_try_coalesce() to avoid calling skb_put() with 0 len)

Then, looking at tcp_recvmsg(), I realized we access tcp_hdr(skb)->syn
(and maybe tcp_hdr(skb)->fin) for every packet we process from socket
receive queue.

We have to understand TCP flags are cold in cpu caches most of the time
(assuming TCP timestamps, and that application calls recvmsg() a long
time after incoming packet was processed), and bringing a whole
cache line only to access one bit is not very nice.

It would make sense to use in TCP input path TCP_SKB_CB(skb)->tcp_flags
as we do in output path.

This saves one cache line miss, and TCP tcp_collapse() can avoid dealing
with the headers.


Eric Dumazet (3):
  tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
  tcp: allow segment with FIN in tcp_try_coalesce()
  tcp: do not copy headers in tcp_collapse()

 net/core/skbuff.c    |  3 ++-
 net/ipv4/tcp.c       | 18 ++++++++++--------
 net/ipv4/tcp_input.c | 31 ++++++++-----------------------
 net/ipv4/tcp_ipv4.c  |  1 +
 net/ipv6/tcp_ipv6.c  |  1 +
 5 files changed, 22 insertions(+), 32 deletions(-)

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
  2014-09-15 11:19 [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
@ 2014-09-15 11:19 ` Eric Dumazet
  2014-09-15 16:13   ` Neal Cardwell
  2014-09-15 11:19 ` [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-09-15 11:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

Input path of TCP do not currently uses TCP_SKB_CB(skb)->tcp_flags,
which is only used in output path.

tcp_recvmsg(), looks at tcp_hdr(skb)->syn for every skb found in receive queue,
and its unfortunate because this bit is located in a cache line right before
the payload.

We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags.

This patch does so, and avoids the cache line miss in tcp_recvmsg()

Following patches will
- allow a segment with FIN being coalesced in tcp_try_coalesce()
- simplify tcp_collapse() by not copying the headers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c       | 18 ++++++++++--------
 net/ipv4/tcp_input.c | 10 +++++-----
 net/ipv4/tcp_ipv4.c  |  1 +
 net/ipv6/tcp_ipv6.c  |  1 +
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 541f26a67ba2..070aeff1b131 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1510,9 +1510,9 @@ static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 
 	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
 		offset = seq - TCP_SKB_CB(skb)->seq;
-		if (tcp_hdr(skb)->syn)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 			offset--;
-		if (offset < skb->len || tcp_hdr(skb)->fin) {
+		if (offset < skb->len || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) {
 			*off = offset;
 			return skb;
 		}
@@ -1585,7 +1585,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			if (offset + 1 != skb->len)
 				continue;
 		}
-		if (tcp_hdr(skb)->fin) {
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
 			sk_eat_skb(sk, skb, false);
 			++seq;
 			break;
@@ -1722,11 +1722,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				break;
 
 			offset = *seq - TCP_SKB_CB(skb)->seq;
-			if (tcp_hdr(skb)->syn)
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 				offset--;
 			if (offset < skb->len)
 				goto found_ok_skb;
-			if (tcp_hdr(skb)->fin)
+			if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 				goto found_fin_ok;
 			WARN(!(flags & MSG_PEEK),
 			     "recvmsg bug 2: copied %X seq %X rcvnxt %X fl %X\n",
@@ -1959,7 +1959,7 @@ skip_copy:
 		if (used + offset < skb->len)
 			continue;
 
-		if (tcp_hdr(skb)->fin)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
 		if (!(flags & MSG_PEEK)) {
 			sk_eat_skb(sk, skb, copied_early);
@@ -2160,8 +2160,10 @@ void tcp_close(struct sock *sk, long timeout)
 	 *  reader process may not have drained the data yet!
 	 */
 	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
-		u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq -
-			  tcp_hdr(skb)->fin;
+		u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
+
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+			len--;
 		data_was_unread += len;
 		__kfree_skb(skb);
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f97003ad0af5..8f639a4face9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4093,7 +4093,7 @@ static void tcp_ofo_queue(struct sock *sk)
 		__skb_unlink(skb, &tp->out_of_order_queue);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
-		if (tcp_hdr(skb)->fin)
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			tcp_fin(sk);
 	}
 }
@@ -4513,7 +4513,7 @@ restart:
 		 * - bloated or contains data before "start" or
 		 *   overlaps to the next one.
 		 */
-		if (!tcp_hdr(skb)->syn && !tcp_hdr(skb)->fin &&
+		if (!(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) &&
 		    (tcp_win_from_space(skb->truesize) > skb->len ||
 		     before(TCP_SKB_CB(skb)->seq, start))) {
 			end_of_skbs = false;
@@ -4532,7 +4532,8 @@ restart:
 		/* Decided to skip this, advance start seq. */
 		start = TCP_SKB_CB(skb)->end_seq;
 	}
-	if (end_of_skbs || tcp_hdr(skb)->syn || tcp_hdr(skb)->fin)
+	if (end_of_skbs ||
+	    (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
 		return;
 
 	while (before(start, end)) {
@@ -4579,8 +4580,7 @@ restart:
 				skb = tcp_collapse_one(sk, skb, list);
 				if (!skb ||
 				    skb == tail ||
-				    tcp_hdr(skb)->syn ||
-				    tcp_hdr(skb)->fin)
+				    (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
 					return;
 			}
 		}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7881b96d2b72..006b045716d8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1638,6 +1638,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
 				    skb->len - th->doff * 4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
+	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
 	TCP_SKB_CB(skb)->sacked	 = 0;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 1835480336ac..de51a88bec6f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1415,6 +1415,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
 				    skb->len - th->doff*4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
+	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
 	TCP_SKB_CB(skb)->sacked = 0;
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce()
  2014-09-15 11:19 [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
  2014-09-15 11:19 ` [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
@ 2014-09-15 11:19 ` Eric Dumazet
  2014-09-15 16:20   ` Neal Cardwell
  2014-09-15 11:19 ` [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
  2014-09-15 18:41 ` [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path David Miller
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-09-15 11:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

We can allow a segment with FIN to be aggregated,
if we take care to add tcp flags,
and if skb_try_coalesce() takes care of zero sized skbs.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c    | 3 ++-
 net/ipv4/tcp_input.c | 4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c8259ac38745..29f7f0121491 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3936,7 +3936,8 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 		return false;
 
 	if (len <= skb_tailroom(to)) {
-		BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
+		if (len)
+			BUG_ON(skb_copy_bits(from, 0, skb_put(to, len), len));
 		*delta_truesize = 0;
 		return true;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8f639a4face9..228bf0c5ff19 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4143,9 +4143,6 @@ static bool tcp_try_coalesce(struct sock *sk,
 
 	*fragstolen = false;
 
-	if (tcp_hdr(from)->fin)
-		return false;
-
 	/* Its possible this segment overlaps with prior segment in queue */
 	if (TCP_SKB_CB(from)->seq != TCP_SKB_CB(to)->end_seq)
 		return false;
@@ -4158,6 +4155,7 @@ static bool tcp_try_coalesce(struct sock *sk,
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPRCVCOALESCE);
 	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
 	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
+	TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
 	return true;
 }
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse()
  2014-09-15 11:19 [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
  2014-09-15 11:19 ` [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
  2014-09-15 11:19 ` [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
@ 2014-09-15 11:19 ` Eric Dumazet
  2014-09-15 16:29   ` Neal Cardwell
  2014-09-15 18:41 ` [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path David Miller
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2014-09-15 11:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Yuchung Cheng, Neal Cardwell, Eric Dumazet

tcp_collapse() wants to shrink skb so that the overhead is minimal.

Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer
need to keep around full headers.
Whole available space is dedicated to the payload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 228bf0c5ff19..ea92f23ffaf1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4535,26 +4535,13 @@ restart:
 		return;
 
 	while (before(start, end)) {
+		int copy = min_t(int, SKB_MAX_ORDER(0, 0), end - start);
 		struct sk_buff *nskb;
-		unsigned int header = skb_headroom(skb);
-		int copy = SKB_MAX_ORDER(header, 0);
 
-		/* Too big header? This can happen with IPv6. */
-		if (copy < 0)
-			return;
-		if (end - start < copy)
-			copy = end - start;
-		nskb = alloc_skb(copy + header, GFP_ATOMIC);
+		nskb = alloc_skb(copy, GFP_ATOMIC);
 		if (!nskb)
 			return;
 
-		skb_set_mac_header(nskb, skb_mac_header(skb) - skb->head);
-		skb_set_network_header(nskb, (skb_network_header(skb) -
-					      skb->head));
-		skb_set_transport_header(nskb, (skb_transport_header(skb) -
-						skb->head));
-		skb_reserve(nskb, header);
-		memcpy(nskb->head, skb->head, header);
 		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
 		TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
 		__skb_queue_before(list, skb, nskb);
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags in input path
  2014-09-15 11:19 ` [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
@ 2014-09-15 16:13   ` Neal Cardwell
  0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2014-09-15 16:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Netdev, Yuchung Cheng

On Mon, Sep 15, 2014 at 7:19 AM, Eric Dumazet <edumazet@google.com> wrote:
> Input path of TCP do not currently uses TCP_SKB_CB(skb)->tcp_flags,
> which is only used in output path.
>
> tcp_recvmsg(), looks at tcp_hdr(skb)->syn for every skb found in receive queue,
> and its unfortunate because this bit is located in a cache line right before
> the payload.
>
> We can simplify TCP by copying tcp flags into TCP_SKB_CB(skb)->tcp_flags.
>
> This patch does so, and avoids the cache line miss in tcp_recvmsg()
>
> Following patches will
> - allow a segment with FIN being coalesced in tcp_try_coalesce()
> - simplify tcp_collapse() by not copying the headers.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce()
  2014-09-15 11:19 ` [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
@ 2014-09-15 16:20   ` Neal Cardwell
  0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2014-09-15 16:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Netdev, Yuchung Cheng

On Mon, Sep 15, 2014 at 7:19 AM, Eric Dumazet <edumazet@google.com> wrote:
> We can allow a segment with FIN to be aggregated,
> if we take care to add tcp flags,
> and if skb_try_coalesce() takes care of zero sized skbs.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse()
  2014-09-15 11:19 ` [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
@ 2014-09-15 16:29   ` Neal Cardwell
  0 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2014-09-15 16:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Netdev, Yuchung Cheng

On Mon, Sep 15, 2014 at 7:19 AM, Eric Dumazet <edumazet@google.com> wrote:
> tcp_collapse() wants to shrink skb so that the overhead is minimal.
>
> Now we store tcp flags into TCP_SKB_CB(skb)->tcp_flags, we no longer
> need to keep around full headers.
> Whole available space is dedicated to the payload.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path
  2014-09-15 11:19 [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
                   ` (2 preceding siblings ...)
  2014-09-15 11:19 ` [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
@ 2014-09-15 18:41 ` David Miller
  3 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2014-09-15 18:41 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, ycheng, ncardwell

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 15 Sep 2014 04:19:50 -0700

> Looking at tcp_try_coalesce() I was wondering why I did :
> 
> if (tcp_hdr(from)->fin)
>      return false;
> 
> The answer would be to allow the aggregation, if we simply OR the FIN and PSH
> flags eventually present in @from to @to packet. (Note a change is also
> needed in skb_try_coalesce() to avoid calling skb_put() with 0 len)
> 
> Then, looking at tcp_recvmsg(), I realized we access tcp_hdr(skb)->syn
> (and maybe tcp_hdr(skb)->fin) for every packet we process from socket
> receive queue.
> 
> We have to understand TCP flags are cold in cpu caches most of the time
> (assuming TCP timestamps, and that application calls recvmsg() a long
> time after incoming packet was processed), and bringing a whole
> cache line only to access one bit is not very nice.
> 
> It would make sense to use in TCP input path TCP_SKB_CB(skb)->tcp_flags
> as we do in output path.
> 
> This saves one cache line miss, and TCP tcp_collapse() can avoid dealing
> with the headers.

Looks great, applied, thanks Eric.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-09-15 18:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-15 11:19 [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path Eric Dumazet
2014-09-15 11:19 ` [PATCH v2 net-next 1/3] tcp: use TCP_SKB_CB(skb)->tcp_flags " Eric Dumazet
2014-09-15 16:13   ` Neal Cardwell
2014-09-15 11:19 ` [PATCH v2 net-next 2/3] tcp: allow segment with FIN in tcp_try_coalesce() Eric Dumazet
2014-09-15 16:20   ` Neal Cardwell
2014-09-15 11:19 ` [PATCH v2 net-next 3/3] tcp: do not copy headers in tcp_collapse() Eric Dumazet
2014-09-15 16:29   ` Neal Cardwell
2014-09-15 18:41 ` [PATCH v2 net-next 0/3] tcp: no longer keep around headers in input path David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).