netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/6] tcp: remove non GSO code
@ 2018-02-19 19:56 Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 1/6] tcp: switch to GSO being always on Eric Dumazet
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

Switching TCP to GSO mode, relying on core networking layers
to perform eventual adaptation for dumb devices was overdue.

1) Most TCP developments are done with TSO in mind.
2) Less high-resolution timers needs to be armed for TCP-pacing
3) GSO can benefit of xmit_more hint
4) Receiver GRO is more effective (as if TSO was used for real on sender)
   -> less ACK packets and overhead.
5) Write queues have less overhead (one skb holds about 64KB of payload)
6) SACK coalescing just works. (no payload in skb->head)
7) rtx rb-tree contains less packets, SACK is cheaper.
8) Removal of legacy code. Less maintenance hassles.

Note that I have left the sendpage/zerocopy paths, but they probably can
benefit from the same strategy.

Thanks to Oleksandr Natalenko for reporting a performance issue for BBR/fq_codel,
which was the main reason I worked on this patch series.

Eric Dumazet (6):
  tcp: switch to GSO being always on
  tcp: remove sk_can_gso() use
  tcp: remove sk_check_csum_caps()
  tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL
  tcp: remove dead code from tcp_set_skb_tso_segs()
  tcp: remove dead code after CHECKSUM_PARTIAL adoption

 include/net/sock.h    | 10 +-------
 net/core/sock.c       |  2 +-
 net/ipv4/tcp.c        | 57 ++++++++++++-------------------------------
 net/ipv4/tcp_input.c  |  3 ---
 net/ipv4/tcp_ipv4.c   | 13 +++-------
 net/ipv4/tcp_output.c | 40 +++++-------------------------
 6 files changed, 26 insertions(+), 99 deletions(-)

-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net-next 1/6] tcp: switch to GSO being always on
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-20  1:22   ` kbuild test robot
  2018-02-19 19:56 ` [PATCH net-next 2/6] tcp: remove sk_can_gso() use Eric Dumazet
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

Oleksandr Natalenko reported performance issues with BBR without FQ
packet scheduler that were root caused to lack of SG and GSO/TSO on
his configuration.

In this mode, TCP internal pacing has to setup a high resolution timer
for each MSS sent.

We could implement in TCP a strategy similar to the one adopted
in commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts")
or decide to finally switch TCP stack to a GSO only mode.

This has many benefits :

1) Most TCP developments are done with TSO in mind.
2) Less high-resolution timers needs to be armed for TCP-pacing
3) GSO can benefit of xmit_more hint
4) Receiver GRO is more effective (as if TSO was used for real on sender)
   -> Lower ACK traffic
5) Write queues have less overhead (one skb holds about 64KB of payload)
6) SACK coalescing just works.
7) rtx rb-tree contains less packets, SACK is cheaper.

This patch implements the minimum patch, but we can remove some legacy
code as follow ups.

Tested:

On 40Gbit link, one netperf -t TCP_STREAM

BBR+fq:
sg on:  26 Gbits/sec
sg off: 15.7 Gbits/sec   (was 2.3 Gbit before patch)

BBR+pfifo_fast:
sg on:  24.2 Gbits/sec
sg off: 14.9 Gbits/sec  (was 0.66 Gbit before patch !!! )

BBR+fq_codel:
sg on:  24.4 Gbits/sec
sg off: 15 Gbits/sec  (was 0.66 Gbit before patch !!! )

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 include/net/sock.h | 1 +
 net/core/sock.c    | 2 +-
 net/ipv4/tcp.c     | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 3aa7b7d6e6c7faddf08a77f5c0844049b31d8442..f0f576ff5603eb0f282f37fbf76138a9bdd0f724 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -417,6 +417,7 @@ struct sock {
 	struct page_frag	sk_frag;
 	netdev_features_t	sk_route_caps;
 	netdev_features_t	sk_route_nocaps;
+	netdev_features_t	sk_route_forced_caps;
 	int			sk_gso_type;
 	unsigned int		sk_gso_max_size;
 	gfp_t			sk_allocation;
diff --git a/net/core/sock.c b/net/core/sock.c
index a1fa4a548f1be714c5b505b4f269ffbee5572321..507d8c6c431965242efa19f206a1eef28d0f2cff 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1777,7 +1777,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 	u32 max_segs = 1;
 
 	sk_dst_set(sk, dst);
-	sk->sk_route_caps = dst->dev->features;
+	sk->sk_route_caps = dst->dev->features | sk->sk_route_forced_caps;
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
 	sk->sk_route_caps &= ~sk->sk_route_nocaps;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 48636aee23c31244494b7c7acbc911a7f1823691..4b46a2ae46e3ae89e26e7c0885347ab289f16814 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -453,6 +453,7 @@ void tcp_init_sock(struct sock *sk)
 	sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1];
 
 	sk_sockets_allocated_inc(sk);
+	sk->sk_route_forced_caps = NETIF_F_GSO;
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 2/6] tcp: remove sk_can_gso() use
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 1/6] tcp: switch to GSO being always on Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 3/6] tcp: remove sk_check_csum_caps() Eric Dumazet
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

After previous commit, sk_can_gso() is always true.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c       | 34 ++++++++--------------------------
 net/ipv4/tcp_input.c |  3 ---
 2 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4b46a2ae46e3ae89e26e7c0885347ab289f16814..6f35c12af85aaed5bbed5090c7d7e63a952b8dab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -898,7 +898,7 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 new_size_goal, size_goal;
 
-	if (!large_allowed || !sk_can_gso(sk))
+	if (!large_allowed)
 		return mss_now;
 
 	/* Note : tcp_tso_autosize() will eventually split this later */
@@ -1103,27 +1103,11 @@ static int linear_payload_sz(bool first_skb)
 	return 0;
 }
 
-static int select_size(const struct sock *sk, bool sg, bool first_skb, bool zc)
+static int select_size(bool first_skb, bool zc)
 {
-	const struct tcp_sock *tp = tcp_sk(sk);
-	int tmp = tp->mss_cache;
-
-	if (sg) {
-		if (zc)
-			return 0;
-
-		if (sk_can_gso(sk)) {
-			tmp = linear_payload_sz(first_skb);
-		} else {
-			int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
-
-			if (tmp >= pgbreak &&
-			    tmp <= pgbreak + (MAX_SKB_FRAGS - 1) * PAGE_SIZE)
-				tmp = pgbreak;
-		}
-	}
-
-	return tmp;
+	if (zc)
+		return 0;
+	return linear_payload_sz(first_skb);
 }
 
 void tcp_free_fastopen_req(struct tcp_sock *tp)
@@ -1188,7 +1172,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	int flags, err, copied = 0;
 	int mss_now = 0, size_goal, copied_syn = 0;
 	bool process_backlog = false;
-	bool sg, zc = false;
+	bool zc = false;
 	long timeo;
 
 	flags = msg->msg_flags;
@@ -1269,8 +1253,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto do_error;
 
-	sg = !!(sk->sk_route_caps & NETIF_F_SG);
-
 	while (msg_data_left(msg)) {
 		int copy = 0;
 		int max = size_goal;
@@ -1298,7 +1280,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				goto restart;
 			}
 			first_skb = tcp_rtx_and_write_queues_empty(sk);
-			linear = select_size(sk, sg, first_skb, zc);
+			linear = select_size(first_skb, zc);
 			skb = sk_stream_alloc_skb(sk, linear, sk->sk_allocation,
 						  first_skb);
 			if (!skb)
@@ -1344,7 +1326,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 			if (!skb_can_coalesce(skb, i, pfrag->page,
 					      pfrag->offset)) {
-				if (i >= sysctl_max_skb_frags || !sg) {
+				if (i >= sysctl_max_skb_frags) {
 					tcp_mark_push(tp, skb);
 					goto new_segment;
 				}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a6b48f6253e3f91d396bf6b03f06be285ba1006c..06b9c4765f422ff6b23b75f4345b0c77a0d967e2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1358,9 +1358,6 @@ static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
 	int len;
 	int in_sack;
 
-	if (!sk_can_gso(sk))
-		goto fallback;
-
 	/* Normally R but no L won't result in plain S */
 	if (!dup_sack &&
 	    (TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_RETRANS)) == TCPCB_SACKED_RETRANS)
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 3/6] tcp: remove sk_check_csum_caps()
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 1/6] tcp: switch to GSO being always on Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 2/6] tcp: remove sk_can_gso() use Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 4/6] tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL Eric Dumazet
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

Since TCP relies on GSO, we do not need this helper anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h |  9 ---------
 net/ipv4/tcp.c     | 11 +++--------
 2 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index f0f576ff5603eb0f282f37fbf76138a9bdd0f724..b9624581d639ff7f1f9466d7e7cd50c6b87cabf5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1863,15 +1863,6 @@ static inline void sk_nocaps_add(struct sock *sk, netdev_features_t flags)
 	sk->sk_route_caps &= ~flags;
 }
 
-static inline bool sk_check_csum_caps(struct sock *sk)
-{
-	return (sk->sk_route_caps & NETIF_F_HW_CSUM) ||
-	       (sk->sk_family == PF_INET &&
-		(sk->sk_route_caps & NETIF_F_IP_CSUM)) ||
-	       (sk->sk_family == PF_INET6 &&
-		(sk->sk_route_caps & NETIF_F_IPV6_CSUM));
-}
-
 static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb,
 					   struct iov_iter *from, char *to,
 					   int copy, int offset)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6f35c12af85aaed5bbed5090c7d7e63a952b8dab..7c41402718876ef8adc7dc90cd51994e6a4fded9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1063,8 +1063,7 @@ EXPORT_SYMBOL_GPL(do_tcp_sendpages);
 int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags)
 {
-	if (!(sk->sk_route_caps & NETIF_F_SG) ||
-	    !sk_check_csum_caps(sk))
+	if (!(sk->sk_route_caps & NETIF_F_SG))
 		return sock_no_sendpage_locked(sk, page, offset, size, flags);
 
 	tcp_rate_check_app_limited(sk);  /* is sending application-limited? */
@@ -1190,7 +1189,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out_err;
 		}
 
-		zc = sk_check_csum_caps(sk) && sk->sk_route_caps & NETIF_F_SG;
+		zc = sk->sk_route_caps & NETIF_F_SG;
 		if (!zc)
 			uarg->zerocopy = 0;
 	}
@@ -1287,11 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 				goto wait_for_memory;
 
 			process_backlog = true;
-			/*
-			 * Check whether we can use HW checksum.
-			 */
-			if (sk_check_csum_caps(sk))
-				skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->ip_summed = CHECKSUM_PARTIAL;
 
 			skb_entail(sk, skb);
 			copy = size_goal;
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 4/6] tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (2 preceding siblings ...)
  2018-02-19 19:56 ` [PATCH net-next 3/6] tcp: remove sk_check_csum_caps() Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 5/6] tcp: remove dead code from tcp_set_skb_tso_segs() Eric Dumazet
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

We no longer have skbs with skb->ip_summed == CHECKSUM_NONE
in TCP write queues.

We can remove dead code in tcp_sendmsg().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7c41402718876ef8adc7dc90cd51994e6a4fded9..a33539798bf61b99760b8f5923e3df14cd7400a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1254,14 +1254,10 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 	while (msg_data_left(msg)) {
 		int copy = 0;
-		int max = size_goal;
 
 		skb = tcp_write_queue_tail(sk);
-		if (skb) {
-			if (skb->ip_summed == CHECKSUM_NONE)
-				max = mss_now;
-			copy = max - skb->len;
-		}
+		if (skb)
+			copy = size_goal - skb->len;
 
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
 			bool first_skb;
@@ -1290,7 +1286,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 			skb_entail(sk, skb);
 			copy = size_goal;
-			max = size_goal;
 
 			/* All packets are restored as if they have
 			 * already been sent. skb_mstamp isn't set to
@@ -1374,7 +1369,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			goto out;
 		}
 
-		if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
+		if (skb->len < size_goal || (flags & MSG_OOB) || unlikely(tp->repair))
 			continue;
 
 		if (forced_push(tp)) {
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 5/6] tcp: remove dead code from tcp_set_skb_tso_segs()
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (3 preceding siblings ...)
  2018-02-19 19:56 ` [PATCH net-next 4/6] tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-19 19:56 ` [PATCH net-next 6/6] tcp: remove dead code after CHECKSUM_PARTIAL adoption Eric Dumazet
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

We no longer have skbs with skb->ip_summed == CHECKSUM_NONE
in TCP write queues.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e9f985e42405a38fc95980da5debb7ac8b51fbb5..3867d9e59b84c4f6b26a5c8ebfabb82452fe9268 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1206,7 +1206,7 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
 /* Initialize TSO segments for a packet. */
 static void tcp_set_skb_tso_segs(struct sk_buff *skb, unsigned int mss_now)
 {
-	if (skb->len <= mss_now || skb->ip_summed == CHECKSUM_NONE) {
+	if (skb->len <= mss_now) {
 		/* Avoid the costly divide in the normal
 		 * non-TSO case.
 		 */
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 6/6] tcp: remove dead code after CHECKSUM_PARTIAL adoption
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (4 preceding siblings ...)
  2018-02-19 19:56 ` [PATCH net-next 5/6] tcp: remove dead code from tcp_set_skb_tso_segs() Eric Dumazet
@ 2018-02-19 19:56 ` Eric Dumazet
  2018-02-20  1:45 ` [PATCH net-next 0/6] tcp: remove non GSO code Soheil Hassas Yeganeh
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-19 19:56 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
	Eric Dumazet, Oleksandr Natalenko, Eric Dumazet

Since all skbs in write/rtx queues have CHECKSUM_PARTIAL,
we can remove dead code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_ipv4.c   | 13 +++----------
 net/ipv4/tcp_output.c | 38 +++++---------------------------------
 2 files changed, 8 insertions(+), 43 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f3e52bc9898029ad7606479bcd1fd3dc84ed261d..2c6aec2643e8ed68b1cb20c996edbb8859cb8a0f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -561,16 +561,9 @@ void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr)
 {
 	struct tcphdr *th = tcp_hdr(skb);
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		th->check = ~tcp_v4_check(skb->len, saddr, daddr, 0);
-		skb->csum_start = skb_transport_header(skb) - skb->head;
-		skb->csum_offset = offsetof(struct tcphdr, check);
-	} else {
-		th->check = tcp_v4_check(skb->len, saddr, daddr,
-					 csum_partial(th,
-						      th->doff << 2,
-						      skb->csum));
-	}
+	th->check = ~tcp_v4_check(skb->len, saddr, daddr, 0);
+	skb->csum_start = skb_transport_header(skb) - skb->head;
+	skb->csum_offset = offsetof(struct tcphdr, check);
 }
 
 /* This routine computes an IPv4 TCP checksum. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3867d9e59b84c4f6b26a5c8ebfabb82452fe9268..e14ae1d657b696decfd92853dd579f90ed150851 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1335,21 +1335,9 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	TCP_SKB_CB(buff)->sacked = TCP_SKB_CB(skb)->sacked;
 	tcp_skb_fragment_eor(skb, buff);
 
-	if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_PARTIAL) {
-		/* Copy and checksum data tail into the new buffer. */
-		buff->csum = csum_partial_copy_nocheck(skb->data + len,
-						       skb_put(buff, nsize),
-						       nsize, 0);
-
-		skb_trim(skb, len);
-
-		skb->csum = csum_block_sub(skb->csum, buff->csum, len);
-	} else {
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		skb_split(skb, buff, len);
-	}
+	skb_split(skb, buff, len);
 
-	buff->ip_summed = skb->ip_summed;
+	buff->ip_summed = CHECKSUM_PARTIAL;
 
 	buff->tstamp = skb->tstamp;
 	tcp_fragment_tstamp(skb, buff);
@@ -1901,7 +1889,7 @@ static int tso_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 
 	tcp_skb_fragment_eor(skb, buff);
 
-	buff->ip_summed = skb->ip_summed = CHECKSUM_PARTIAL;
+	buff->ip_summed = CHECKSUM_PARTIAL;
 	skb_split(skb, buff, len);
 	tcp_fragment_tstamp(skb, buff);
 
@@ -2113,7 +2101,7 @@ static int tcp_mtu_probe(struct sock *sk)
 	TCP_SKB_CB(nskb)->tcp_flags = TCPHDR_ACK;
 	TCP_SKB_CB(nskb)->sacked = 0;
 	nskb->csum = 0;
-	nskb->ip_summed = skb->ip_summed;
+	nskb->ip_summed = CHECKSUM_PARTIAL;
 
 	tcp_insert_write_queue_before(nskb, skb, sk);
 	tcp_highest_sack_replace(sk, skb, nskb);
@@ -2121,14 +2109,7 @@ static int tcp_mtu_probe(struct sock *sk)
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
 		copy = min_t(int, skb->len, probe_size - len);
-		if (nskb->ip_summed) {
-			skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
-		} else {
-			__wsum csum = skb_copy_and_csum_bits(skb, 0,
-							     skb_put(nskb, copy),
-							     copy, 0);
-			nskb->csum = csum_block_add(nskb->csum, csum, len);
-		}
+		skb_copy_bits(skb, 0, skb_put(nskb, copy), copy);
 
 		if (skb->len <= copy) {
 			/* We've eaten all the data from this skb.
@@ -2141,9 +2122,6 @@ static int tcp_mtu_probe(struct sock *sk)
 						   ~(TCPHDR_FIN|TCPHDR_PSH);
 			if (!skb_shinfo(skb)->nr_frags) {
 				skb_pull(skb, copy);
-				if (skb->ip_summed != CHECKSUM_PARTIAL)
-					skb->csum = csum_partial(skb->data,
-								 skb->len, 0);
 			} else {
 				__pskb_trim_head(skb, copy);
 				tcp_set_skb_tso_segs(skb, mss_now);
@@ -2721,12 +2699,6 @@ static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	}
 	tcp_highest_sack_replace(sk, next_skb, skb);
 
-	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
-		skb->ip_summed = CHECKSUM_PARTIAL;
-
-	if (skb->ip_summed != CHECKSUM_PARTIAL)
-		skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
-
 	/* Update sequence range on original skb. */
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(next_skb)->end_seq;
 
-- 
2.16.1.291.g4437f3f132-goog

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 1/6] tcp: switch to GSO being always on
  2018-02-19 19:56 ` [PATCH net-next 1/6] tcp: switch to GSO being always on Eric Dumazet
@ 2018-02-20  1:22   ` kbuild test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kbuild test robot @ 2018-02-20  1:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kbuild-all, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh, Eric Dumazet,
	Oleksandr Natalenko, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 16891 bytes --]

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-remove-non-GSO-code/20180220-064839
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   net/mac80211/sta_info.h:584: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'msdu_retries' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'msdu_failed' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
   net/mac80211/sta_info.h:584: warning: Function parameter or member 'msdu' not described in 'sta_info'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf'
   include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf'
   include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array'
   include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.sign' not described in 'iio_chan_spec'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.realbits' not described in 'iio_chan_spec'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.storagebits' not described in 'iio_chan_spec'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.shift' not described in 'iio_chan_spec'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.repeat' not described in 'iio_chan_spec'
   include/linux/iio/iio.h:270: warning: Function parameter or member 'scan_type.endianness' not described in 'iio_chan_spec'
   include/linux/iio/hw-consumer.h:1: warning: no structured comments found
   include/linux/device.h:294: warning: Function parameter or member 'coredump' not described in 'device_driver'
   include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry'
   include/linux/mtd/rawnand.h:709: warning: Function parameter or member 'timings.sdr' not described in 'nand_data_interface'
   include/linux/mtd/rawnand.h:774: warning: Function parameter or member 'buf' not described in 'nand_op_data_instr'
   include/linux/mtd/rawnand.h:774: warning: Function parameter or member 'buf.in' not described in 'nand_op_data_instr'
   include/linux/mtd/rawnand.h:774: warning: Function parameter or member 'buf.out' not described in 'nand_op_data_instr'
   include/linux/mtd/rawnand.h:820: warning: Function parameter or member 'ctx' not described in 'nand_op_instr'
   include/linux/mtd/rawnand.h:820: warning: Function parameter or member 'ctx.cmd' not described in 'nand_op_instr'
   include/linux/mtd/rawnand.h:820: warning: Function parameter or member 'ctx.addr' not described in 'nand_op_instr'
   include/linux/mtd/rawnand.h:820: warning: Function parameter or member 'ctx.data' not described in 'nand_op_instr'
   include/linux/mtd/rawnand.h:820: warning: Function parameter or member 'ctx.waitrdy' not described in 'nand_op_instr'
   include/linux/mtd/rawnand.h:967: warning: Function parameter or member 'ctx' not described in 'nand_op_parser_pattern_elem'
   include/linux/mtd/rawnand.h:967: warning: Function parameter or member 'ctx.addr' not described in 'nand_op_parser_pattern_elem'
   include/linux/mtd/rawnand.h:967: warning: Function parameter or member 'ctx.data' not described in 'nand_op_parser_pattern_elem'
   include/linux/mtd/rawnand.h:1281: warning: Function parameter or member 'manufacturer.desc' not described in 'nand_chip'
   include/linux/mtd/rawnand.h:1281: warning: Function parameter or member 'manufacturer.priv' not described in 'nand_chip'
   include/linux/regulator/driver.h:221: warning: Function parameter or member 'resume_early' not described in 'regulator_ops'
   drivers/regulator/core.c:4299: warning: Excess function parameter 'state' description in 'regulator_suspend_late'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb'
   arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
   include/drm/drm_drv.h:609: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
   include/drm/drm_connector.h:370: warning: Function parameter or member 'margins.left' not described in 'drm_tv_connector_state'
   include/drm/drm_connector.h:370: warning: Function parameter or member 'margins.right' not described in 'drm_tv_connector_state'
   include/drm/drm_connector.h:370: warning: Function parameter or member 'margins.top' not described in 'drm_tv_connector_state'
   include/drm/drm_connector.h:370: warning: Function parameter or member 'margins.bottom' not described in 'drm_tv_connector_state'
   include/drm/drm_vblank.h:63: warning: Function parameter or member 'event.base' not described in 'drm_pending_vblank_event'
   include/drm/drm_vblank.h:63: warning: Function parameter or member 'event.vbl' not described in 'drm_pending_vblank_event'
   include/drm/drm_vblank.h:63: warning: Function parameter or member 'event.seq' not described in 'drm_pending_vblank_event'
   drivers/gpu/drm/tve200/tve200_drv.c:1: warning: no structured comments found
   include/linux/skbuff.h:849: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'skb_mstamp' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member '__unused' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'pfmemalloc' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'offload_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'offload_mr_fwd_mark' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:849: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:234: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:234: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:488: warning: Function parameter or member 'sk_backlog.rmem_alloc' not described in 'sock'
   include/net/sock.h:488: warning: Function parameter or member 'sk_backlog.len' not described in 'sock'
   include/net/sock.h:488: warning: Function parameter or member 'sk_backlog.head' not described in 'sock'
   include/net/sock.h:488: warning: Function parameter or member 'sk_backlog.tail' not described in 'sock'
   include/net/sock.h:488: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:488: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
>> include/net/sock.h:488: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'adj_list.upper' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'adj_list.lower' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'switchdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:1946: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   include/linux/rcupdate.h:570: ERROR: Unexpected indentation.
   include/linux/rcupdate.h:574: ERROR: Unexpected indentation.
   include/linux/rcupdate.h:578: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/rcupdate.h:580: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/rcupdate.h:580: WARNING: Inline literal start-string without end-string.
   kernel/time/timer.c:1259: ERROR: Unexpected indentation.
   kernel/time/timer.c:1261: ERROR: Unexpected indentation.
   kernel/time/timer.c:1262: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:110: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/wait.h:113: ERROR: Unexpected indentation.
   include/linux/wait.h:115: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/time/hrtimer.c:1113: WARNING: Block quote ends without a blank line; unexpected unindent.
   kernel/signal.c:327: WARNING: Inline literal start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   drivers/video/fbdev/core/modedb.c:647: WARNING: Inline strong start-string without end-string.
   include/linux/iio/iio.h:191: ERROR: Unexpected indentation.
   include/linux/iio/iio.h:192: WARNING: Block quote ends without a blank line; unexpected unindent.
   include/linux/iio/iio.h:198: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/ata/libata-core.c:5920: ERROR: Unknown target name: "hw".
   drivers/message/fusion/mptbase.c:5052: WARNING: Definition list ends without a blank line; unexpected unindent.
   drivers/tty/serial/serial_core.c:1901: WARNING: Definition list ends without a blank line; unexpected unindent.
   include/linux/mtd/rawnand.h:805: ERROR: Unexpected indentation.
   include/linux/mtd/rawnand.h:1391: WARNING: Inline strong start-string without end-string.
   include/linux/mtd/rawnand.h:1393: WARNING: Inline strong start-string without end-string.
   include/linux/regulator/driver.h:273: ERROR: Unknown target name: "regulator_regmap_x_voltage".
   Documentation/driver-api/slimbus.rst:93: WARNING: Title underline too short.

vim +488 include/net/sock.h

^1da177e Linus Torvalds 2005-04-16 @488  

:::::: The code at line 488 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6812 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (5 preceding siblings ...)
  2018-02-19 19:56 ` [PATCH net-next 6/6] tcp: remove dead code after CHECKSUM_PARTIAL adoption Eric Dumazet
@ 2018-02-20  1:45 ` Soheil Hassas Yeganeh
  2018-02-20  9:32 ` Oleksandr Natalenko
  2018-02-21 19:37 ` [PATCH net-next 0/6] tcp: remove non GSO code David Miller
  8 siblings, 0 replies; 30+ messages in thread
From: Soheil Hassas Yeganeh @ 2018-02-20  1:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng, oleksandr,
	Eric Dumazet

On Mon, Feb 19, 2018 at 2:56 PM Eric Dumazet <edumazet@google.com> wrote:
> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices was overdue.

> 1) Most TCP developments are done with TSO in mind.
> 2) Less high-resolution timers needs to be armed for TCP-pacing
> 3) GSO can benefit of xmit_more hint
> 4) Receiver GRO is more effective (as if TSO was used for real on sender)
>      -> less ACK packets and overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of payload)
> 6) SACK coalescing just works. (no payload in skb->head)
> 7) rtx rb-tree contains less packets, SACK is cheaper.
> 8) Removal of legacy code. Less maintenance hassles.

> Note that I have left the sendpage/zerocopy paths, but they probably can
> benefit from the same strategy.

> Thanks to Oleksandr Natalenko for reporting a performance issue for
BBR/fq_codel,
> which was the main reason I worked on this patch series.

> Eric Dumazet (6):
>     tcp: switch to GSO being always on
>     tcp: remove sk_can_gso() use
>     tcp: remove sk_check_csum_caps()
>     tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL
>     tcp: remove dead code from tcp_set_skb_tso_segs()
>     tcp: remove dead code after CHECKSUM_PARTIAL adoption

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Very nice patch-series! Thank you, Eric!

>    include/net/sock.h    | 10 +-------
>    net/core/sock.c       |  2 +-
>    net/ipv4/tcp.c        | 57 ++++++++++++-------------------------------
>    net/ipv4/tcp_input.c  |  3 ---
>    net/ipv4/tcp_ipv4.c   | 13 +++-------
>    net/ipv4/tcp_output.c | 40 +++++-------------------------
>    6 files changed, 26 insertions(+), 99 deletions(-)
> --
> 2.16.1.291.g4437f3f132-goog

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (6 preceding siblings ...)
  2018-02-20  1:45 ` [PATCH net-next 0/6] tcp: remove non GSO code Soheil Hassas Yeganeh
@ 2018-02-20  9:32 ` Oleksandr Natalenko
  2018-02-20 15:39   ` Eric Dumazet
  2018-02-21 19:37 ` [PATCH net-next 0/6] tcp: remove non GSO code David Miller
  8 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-20  9:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh, Eric Dumazet

Hi.

19.02.2018 20:56, Eric Dumazet wrote:
> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices was overdue.
> 
> 1) Most TCP developments are done with TSO in mind.
> 2) Less high-resolution timers needs to be armed for TCP-pacing
> 3) GSO can benefit of xmit_more hint
> 4) Receiver GRO is more effective (as if TSO was used for real on 
> sender)
>    -> less ACK packets and overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of 
> payload)
> 6) SACK coalescing just works. (no payload in skb->head)
> 7) rtx rb-tree contains less packets, SACK is cheaper.
> 8) Removal of legacy code. Less maintenance hassles.
> 
> Note that I have left the sendpage/zerocopy paths, but they probably 
> can
> benefit from the same strategy.
> 
> Thanks to Oleksandr Natalenko for reporting a performance issue for
> BBR/fq_codel,
> which was the main reason I worked on this patch series.

Thanks for dealing with this that fast.

Does this mean that the option to optimise internal TCP pacing is still 
an open question?

Oleksandr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20  9:32 ` Oleksandr Natalenko
@ 2018-02-20 15:39   ` Eric Dumazet
  2018-02-20 18:57     ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 15:39 UTC (permalink / raw)
  To: Oleksandr Natalenko, Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh

On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 19.02.2018 20:56, Eric Dumazet wrote:
> > Switching TCP to GSO mode, relying on core networking layers
> > to perform eventual adaptation for dumb devices was overdue.
> > 
> > 1) Most TCP developments are done with TSO in mind.
> > 2) Less high-resolution timers needs to be armed for TCP-pacing
> > 3) GSO can benefit of xmit_more hint
> > 4) Receiver GRO is more effective (as if TSO was used for real on 
> > sender)
> >    -> less ACK packets and overhead.
> > 5) Write queues have less overhead (one skb holds about 64KB of 
> > payload)
> > 6) SACK coalescing just works. (no payload in skb->head)
> > 7) rtx rb-tree contains less packets, SACK is cheaper.
> > 8) Removal of legacy code. Less maintenance hassles.
> > 
> > Note that I have left the sendpage/zerocopy paths, but they probably 
> > can
> > benefit from the same strategy.
> > 
> > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > BBR/fq_codel,
> > which was the main reason I worked on this patch series.
> 
> Thanks for dealing with this that fast.
> 
> Does this mean that the option to optimise internal TCP pacing is still 
> an open question?

It is not an optimization that is needed, but taking into account that
highres timers can have latencies of ~2 usec or more.

When sending 64KB TSO packets, having extra 2 usec after every ~54 usec
(at 10Gbit) has no big impact, since TCP computes a slightly inflated
pacing rate anyway.

But when sending one MSS/packet every usec, this definitely can
demonstrate a big slowdown.

But the anser is yes, I will take a look at this timer drift.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 15:39   ` Eric Dumazet
@ 2018-02-20 18:57     ` Eric Dumazet
  2018-02-20 19:35       ` Oleksandr Natalenko
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 18:57 UTC (permalink / raw)
  To: Oleksandr Natalenko, Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh

On Tue, 2018-02-20 at 07:39 -0800, Eric Dumazet wrote:
> On Tue, 2018-02-20 at 10:32 +0100, Oleksandr Natalenko wrote:
> > Hi.
> > 
> > 19.02.2018 20:56, Eric Dumazet wrote:
> > > Switching TCP to GSO mode, relying on core networking layers
> > > to perform eventual adaptation for dumb devices was overdue.
> > > 
> > > 1) Most TCP developments are done with TSO in mind.
> > > 2) Less high-resolution timers needs to be armed for TCP-pacing
> > > 3) GSO can benefit of xmit_more hint
> > > 4) Receiver GRO is more effective (as if TSO was used for real on 
> > > sender)
> > >    -> less ACK packets and overhead.
> > > 5) Write queues have less overhead (one skb holds about 64KB of 
> > > payload)
> > > 6) SACK coalescing just works. (no payload in skb->head)
> > > 7) rtx rb-tree contains less packets, SACK is cheaper.
> > > 8) Removal of legacy code. Less maintenance hassles.
> > > 
> > > Note that I have left the sendpage/zerocopy paths, but they probably 
> > > can
> > > benefit from the same strategy.
> > > 
> > > Thanks to Oleksandr Natalenko for reporting a performance issue for
> > > BBR/fq_codel,
> > > which was the main reason I worked on this patch series.
> > 
> > Thanks for dealing with this that fast.
> > 
> > Does this mean that the option to optimise internal TCP pacing is still 
> > an open question?
> 
> It is not an optimization that is needed, but taking into account that
> highres timers can have latencies of ~2 usec or more.
> 
> When sending 64KB TSO packets, having extra 2 usec after every ~54 usec
> (at 10Gbit) has no big impact, since TCP computes a slightly inflated
> pacing rate anyway.
> 
> But when sending one MSS/packet every usec, this definitely can
> demonstrate a big slowdown.
> 
> But the anser is yes, I will take a look at this timer drift.

Actually timer drifts are not horrible (at least on my lab hosts)

But BBR has a pessimistic way to sense the burst size, as it is tied to
TSO/GSO being there.

Following patch helps a lot.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 */
 	segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
-	return min_t(u32, segs, sk->sk_gso_max_segs);
+	return segs;
 }
 EXPORT_SYMBOL(tcp_tso_autosize);
 
@@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-	return tso_segs ? :
-		tcp_tso_autosize(sk, mss_now,
-				 sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	if (!tso_segs)
+		tso_segs = tcp_tso_autosize(sk, mss_now,
+				sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 18:57     ` Eric Dumazet
@ 2018-02-20 19:35       ` Oleksandr Natalenko
  2018-02-20 19:39         ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-20 19:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

Hi.

On úterý 20. února 2018 19:57:42 CET Eric Dumazet wrote:
> Actually timer drifts are not horrible (at least on my lab hosts)
> 
> But BBR has a pessimistic way to sense the burst size, as it is tied to
> TSO/GSO being there.
> 
> Following patch helps a lot.

Not really, at least if applied to v4.15.4. Still getting 2 Gbps less between 
VMs if using BBR instead of Reno.

Am I doing something wrong?

Oleksandr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 19:35       ` Oleksandr Natalenko
@ 2018-02-20 19:39         ` Eric Dumazet
  2018-02-20 19:51           ` Oleksandr Natalenko
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 19:39 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Tue, Feb 20, 2018 at 11:35 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> Hi.
>
> On úterý 20. února 2018 19:57:42 CET Eric Dumazet wrote:
>> Actually timer drifts are not horrible (at least on my lab hosts)
>>
>> But BBR has a pessimistic way to sense the burst size, as it is tied to
>> TSO/GSO being there.
>>
>> Following patch helps a lot.
>
> Not really, at least if applied to v4.15.4. Still getting 2 Gbps less between
> VMs if using BBR instead of Reno.
>
> Am I doing something wrong?

I am not trying to compare BBR and Reno on a lossless link.

Reno is running as fast as possible and will win when bufferbloat is
not an issue.

If bufferbloat is not an issue, simply use Reno and be happy ;)

My patch helps BBR only, I thought it was obvious ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 19:39         ` Eric Dumazet
@ 2018-02-20 19:51           ` Oleksandr Natalenko
  2018-02-20 19:56             ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-20 19:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On úterý 20. února 2018 20:39:49 CET Eric Dumazet wrote:
> I am not trying to compare BBR and Reno on a lossless link.
> 
> Reno is running as fast as possible and will win when bufferbloat is
> not an issue.
> 
> If bufferbloat is not an issue, simply use Reno and be happy ;)
> 
> My patch helps BBR only, I thought it was obvious ;)

Umm, yes, and my point was rather something like "the speed on a lossless link 
while using BBR with and without this patch is the same". Sorry for a 
confusion. I guess, the key word here is "lossless".

Oleksandr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 19:51           ` Oleksandr Natalenko
@ 2018-02-20 19:56             ` Eric Dumazet
  2018-02-20 20:06               ` Oleksandr Natalenko
  2018-02-20 20:09               ` Eric Dumazet
  0 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 19:56 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Tue, Feb 20, 2018 at 11:51 AM, Oleksandr Natalenko
<oleksandr@natalenko.name> wrote:
> On úterý 20. února 2018 20:39:49 CET Eric Dumazet wrote:
>> I am not trying to compare BBR and Reno on a lossless link.
>>
>> Reno is running as fast as possible and will win when bufferbloat is
>> not an issue.
>>
>> If bufferbloat is not an issue, simply use Reno and be happy ;)
>>
>> My patch helps BBR only, I thought it was obvious ;)
>
> Umm, yes, and my point was rather something like "the speed on a lossless link
> while using BBR with and without this patch is the same". Sorry for a
> confusion. I guess, the key word here is "lossless".

That is with the other patches _not_ applied ?

Here the gain is quite big, since BBR can setup a slightly better
cwnd, allowing proper GRO on receiver.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 19:56             ` Eric Dumazet
@ 2018-02-20 20:06               ` Oleksandr Natalenko
  2018-02-20 20:09               ` Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-20 20:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On úterý 20. února 2018 20:56:24 CET Eric Dumazet wrote:
> That is with the other patches _not_ applied ?

Yes, other patches are not applied. It is v4.15.4 + this patch only + BBR + 
fq_codel or pfifo_fast. Shall I re-test it on the net-next with the whole 
patchset (because it is not applied cleanly to 4.15)?

Oleksandr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 19:56             ` Eric Dumazet
  2018-02-20 20:06               ` Oleksandr Natalenko
@ 2018-02-20 20:09               ` Eric Dumazet
  2018-02-20 20:45                 ` Oleksandr Natalenko
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 20:09 UTC (permalink / raw)
  To: Eric Dumazet, Oleksandr Natalenko
  Cc: David S . Miller, netdev, Neal Cardwell, Yuchung Cheng,
	Soheil Hassas Yeganeh

On Tue, 2018-02-20 at 11:56 -0800, Eric Dumazet wrote:
> On Tue, Feb 20, 2018 at 11:51 AM, Oleksandr Natalenko
> <oleksandr@natalenko.name> wrote:
> > On úterý 20. února 2018 20:39:49 CET Eric Dumazet wrote:
> > > I am not trying to compare BBR and Reno on a lossless link.
> > > 
> > > Reno is running as fast as possible and will win when bufferbloat is
> > > not an issue.
> > > 
> > > If bufferbloat is not an issue, simply use Reno and be happy ;)
> > > 
> > > My patch helps BBR only, I thought it was obvious ;)
> > 
> > Umm, yes, and my point was rather something like "the speed on a lossless link
> > while using BBR with and without this patch is the same". Sorry for a
> > confusion. I guess, the key word here is "lossless".
> 
> That is with the other patches _not_ applied ?
> 
> Here the gain is quite big, since BBR can setup a slightly better
> cwnd, allowing proper GRO on receiver.

Also you can tune your NIC to accept few MSS per GSO/TSO packet

ip link set dev eth0 gso_max_segs 2

So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to
size its bursts, since burt sizes are also impacting GRO on the
receiver.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 20:09               ` Eric Dumazet
@ 2018-02-20 20:45                 ` Oleksandr Natalenko
  2018-02-20 23:21                   ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-20 20:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On úterý 20. února 2018 21:09:37 CET Eric Dumazet wrote:
> Also you can tune your NIC to accept few MSS per GSO/TSO packet
> 
> ip link set dev eth0 gso_max_segs 2
> 
> So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to
> size its bursts, since burt sizes are also impacting GRO on the
> receiver.

net-next + 7 patches (6 from the patchset + this one).

Before playing with gso_max_segs:

BBR+fq
sg on:  4.39 Gbits/sec
sg off: 1.33 Gbits/sec

BBR+fq_codel
sg on:  4.02 Gbits/sec
sg off: 1.41 Gbits/sec

BBR+pfifo_fast
sg on:  3.66 Gbits/sec
sg off: 1.41 Gbits/sec

Reno+fq
sg on:  5.69 Gbits/sec
sg off: 1.53 Gbits/sec

Reno+fq_codel
sg on:  6.33 Gbits/sec
sg off: 1.50 Gbits/sec

Reno+pfifo_fast
sg on:  6.26 Gbits/sec
sg off: 1.48 Gbits/sec

After "ip link set dev eth1 gso_max_segs 2":

BBR+fq
sg on:  806 Mbits/sec
sg off: 886 Mbits/sec

BBR+fq_codel
sg on:  206 Mbits/sec
sg off: 207 Mbits/sec

BBR+pfifo_fast
sg on:  220 Mbits/sec
sg off: 200 Mbits/sec

Reno+fq
sg on:  2.16 Gbits/sec
sg off: 1.27 Gbits/sec

Reno+fq_codel
sg on:  2.45 Gbits/sec
sg off: 1.52 Gbits/sec

Reno+pfifo_fast
sg on:  2.31 Gbits/sec
sg off: 1.54 Gbits/sec

Oleksandr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 20:45                 ` Oleksandr Natalenko
@ 2018-02-20 23:21                   ` Eric Dumazet
  2018-02-21  6:14                     ` Oleksandr Natalenko
  2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
  0 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-20 23:21 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

On Tue, 2018-02-20 at 21:45 +0100, Oleksandr Natalenko wrote:
> On úterý 20. února 2018 21:09:37 CET Eric Dumazet wrote:
> > Also you can tune your NIC to accept few MSS per GSO/TSO packet
> > 
> > ip link set dev eth0 gso_max_segs 2
> > 
> > So even if TSO/GSO is there, BBR should not use sk->sk_gso_max_segs to
> > size its bursts, since burt sizes are also impacting GRO on the
> > receiver.
> 
> net-next + 7 patches (6 from the patchset + this one).

My latest patch (fixing BBR underestimation of cwnd)
was meant for net tree, on a NIC where SG/TSO/GSO) are disabled.

( ie when sk->sk_gso_max_segs is not set to 'infinite' )

It is packet scheduler independent really.

Tested here with pfifo_fast, TSO/GSO off.

Before patch :
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
    691  (ss -temoi shows cwnd is stuck around 6 )
    667
    651
    631
    517

After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
   1733 (ss -temoi shows cwnd is around 386 )
   1778
   1746
   1781
   1718

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-20 23:21                   ` Eric Dumazet
@ 2018-02-21  6:14                     ` Oleksandr Natalenko
  2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: Oleksandr Natalenko @ 2018-02-21  6:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
	Yuchung Cheng, Soheil Hassas Yeganeh

Hi.

On středa 21. února 2018 0:21:37 CET Eric Dumazet wrote:
> My latest patch (fixing BBR underestimation of cwnd)
> was meant for net tree, on a NIC where SG/TSO/GSO) are disabled.
> 
> ( ie when sk->sk_gso_max_segs is not set to 'infinite' )
> 
> It is packet scheduler independent really.
> 
> Tested here with pfifo_fast, TSO/GSO off.

Well, before the patch with BBR and sg off here it is ~450 Mbps for fq and 
~115 Mbps for pfifo_fast. So, comparing to what I see with the patch (850 and 
200 respectively), it is definitely an improvement.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-20 23:21                   ` Eric Dumazet
  2018-02-21  6:14                     ` Oleksandr Natalenko
@ 2018-02-21 14:43                     ` Eric Dumazet
  2018-02-21 15:01                       ` Paolo Abeni
                                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-02-21 14:43 UTC (permalink / raw)
  To: Oleksandr Natalenko, David Miller
  Cc: Eric Dumazet, netdev, Soheil Hassas Yeganeh, Neal Cardwell

From: Eric Dumazet <edumazet@google.com>

BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :

/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;

But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)

What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.

So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.

To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs

Only tcp_tso_segs() has to enforce the gso_max_segs limit.

Tested:

ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast

Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
    691  (ss -temoi shows cwnd is stuck around 6 )
    667
    651
    631
    517

After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
   1733 (ss -temoi shows cwnd is around 386 )
   1778
   1746
   1781
   1718

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 net/ipv4/tcp_output.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 */
 	segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
-	return min_t(u32, segs, sk->sk_gso_max_segs);
+	return segs;
 }
 EXPORT_SYMBOL(tcp_tso_autosize);
 
@@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
 	const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
 	u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-	return tso_segs ? :
-		tcp_tso_autosize(sk, mss_now,
-				 sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	if (!tso_segs)
+		tso_segs = tcp_tso_autosize(sk, mss_now,
+				sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+	return min_t(u32, tso_segs, sk->sk_gso_max_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
@ 2018-02-21 15:01                       ` Paolo Abeni
  2018-02-21 15:09                         ` Eric Dumazet
  2018-02-21 15:14                       ` Neal Cardwell
  2018-02-22 19:16                       ` David Miller
  2 siblings, 1 reply; 30+ messages in thread
From: Paolo Abeni @ 2018-02-21 15:01 UTC (permalink / raw)
  To: Eric Dumazet, Oleksandr Natalenko, David Miller
  Cc: Eric Dumazet, netdev, Soheil Hassas Yeganeh, Neal Cardwell

Hi,

On Wed, 2018-02-21 at 06:43 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
> 
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
> 
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
> 
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
> 
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
> 
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
> 
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
> 
> Tested:
> 
> ethtool -K eth0 tso off gso off
> tc qd replace dev eth0 root pfifo_fast
> 
> Before patch:
> for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>     691  (ss -temoi shows cwnd is stuck around 6 )
>     667
>     651
>     631
>     517
> 
> After patch :
> # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>    1733 (ss -temoi shows cwnd is around 386 )
>    1778
>    1746
>    1781
>    1718
> 
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  net/ipv4/tcp_output.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index b2bca373f8bee35267df49b5947a6793fed71a12..6818042cd8a9a1778f54637861647091afd9a769 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
>  	 */
>  	segs = max_t(u32, bytes / mss_now, min_tso_segs);
>  
> -	return min_t(u32, segs, sk->sk_gso_max_segs);
> +	return segs;
>  }
>  EXPORT_SYMBOL(tcp_tso_autosize);
>  

Very minor nit, why don't:

	return max_t(u32, bytes / mss_now, min_tso_segs);

and drop the 'segs' local variable?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 15:01                       ` Paolo Abeni
@ 2018-02-21 15:09                         ` Eric Dumazet
  2018-02-21 15:55                           ` Paolo Abeni
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-02-21 15:09 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Eric Dumazet, Oleksandr Natalenko, David Miller, netdev,
	Soheil Hassas Yeganeh, Neal Cardwell

On Wed, Feb 21, 2018 at 7:01 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Very minor nit, why don't:
>
>         return max_t(u32, bytes / mss_now, min_tso_segs);
>
> and drop the 'segs' local variable?

Simply to ease backports.

We had some constant changes in this function in the past.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
  2018-02-21 15:01                       ` Paolo Abeni
@ 2018-02-21 15:14                       ` Neal Cardwell
  2018-02-21 15:18                         ` Soheil Hassas Yeganeh
  2018-02-22 19:16                       ` David Miller
  2 siblings, 1 reply; 30+ messages in thread
From: Neal Cardwell @ 2018-02-21 15:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Oleksandr Natalenko, David Miller, Eric Dumazet, netdev,
	Soheil Hassas Yeganeh

On Wed, Feb 21, 2018 at 9:43 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
>
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
>
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
>
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
>
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
>
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
>
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
>
> Tested:
>
> ethtool -K eth0 tso off gso off
> tc qd replace dev eth0 root pfifo_fast
>
> Before patch:
> for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>     691  (ss -temoi shows cwnd is stuck around 6 )
>     667
>     651
>     631
>     517
>
> After patch :
> # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
>    1733 (ss -temoi shows cwnd is around 386 )
>    1778
>    1746
>    1781
>    1718
>
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  net/ipv4/tcp_output.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 15:14                       ` Neal Cardwell
@ 2018-02-21 15:18                         ` Soheil Hassas Yeganeh
  0 siblings, 0 replies; 30+ messages in thread
From: Soheil Hassas Yeganeh @ 2018-02-21 15:18 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Eric Dumazet, oleksandr, David Miller, Eric Dumazet, netdev

On Wed, Feb 21, 2018 at 10:14 AM Neal Cardwell <ncardwell@google.com> wrote:

> On Wed, Feb 21, 2018 at 9:43 AM, Eric Dumazet <eric.dumazet@gmail.com>
wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> >
> > BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> > burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> > gold formula :
> >
> > /* Allow enough full-sized skbs in flight to utilize end systems. */
> > cwnd += 3 * bbr->tso_segs_goal;
> >
> > But GSO can be lacking or be constrained to very small
> > units (ip link set dev ... gso_max_segs 2)
> >
> > What we really want is to have enough packets in flight so that both
> > GSO and GRO are efficient.
> >
> > So in the case GSO is off or downgraded, we still want to have the same
> > number of packets in flight as if GSO/TSO was fully operational, so
> > that GRO can hopefully be working efficiently.
> >
> > To fix this issue, we make tcp_tso_autosize() unaware of
> > sk->sk_gso_max_segs
> >
> > Only tcp_tso_segs() has to enforce the gso_max_segs limit.
> >
> > Tested:
> >
> > ethtool -K eth0 tso off gso off
> > tc qd replace dev eth0 root pfifo_fast
> >
> > Before patch:
> > for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
> >     691  (ss -temoi shows cwnd is stuck around 6 )
> >     667
> >     651
> >     631
> >     517
> >
> > After patch :
> > # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
> >    1733 (ss -temoi shows cwnd is around 386 )
> >    1778
> >    1746
> >    1781
> >    1718
> >
> > Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > ---
> >  net/ipv4/tcp_output.c |    9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)

> Acked-by: Neal Cardwell <ncardwell@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you Eric for the nice patch!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 15:09                         ` Eric Dumazet
@ 2018-02-21 15:55                           ` Paolo Abeni
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Abeni @ 2018-02-21 15:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Oleksandr Natalenko, David Miller, netdev,
	Soheil Hassas Yeganeh, Neal Cardwell

On Wed, 2018-02-21 at 07:09 -0800, Eric Dumazet wrote:
> On Wed, Feb 21, 2018 at 7:01 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Very minor nit, why don't:
> > 
> >         return max_t(u32, bytes / mss_now, min_tso_segs);
> > 
> > and drop the 'segs' local variable?
> 
> Simply to ease backports.
> 
> We had some constant changes in this function in the past.

Ok, thank you for the explanation. No objections on my side.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
                   ` (7 preceding siblings ...)
  2018-02-20  9:32 ` Oleksandr Natalenko
@ 2018-02-21 19:37 ` David Miller
  2018-02-28 20:10   ` Marcelo Ricardo Leitner
  8 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2018-02-21 19:37 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, ncardwell, ycheng, soheil, oleksandr, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 19 Feb 2018 11:56:46 -0800

> Switching TCP to GSO mode, relying on core networking layers
> to perform eventual adaptation for dumb devices was overdue.
> 
> 1) Most TCP developments are done with TSO in mind.
> 2) Less high-resolution timers needs to be armed for TCP-pacing
> 3) GSO can benefit of xmit_more hint
> 4) Receiver GRO is more effective (as if TSO was used for real on sender)
>    -> less ACK packets and overhead.
> 5) Write queues have less overhead (one skb holds about 64KB of payload)
> 6) SACK coalescing just works. (no payload in skb->head)
> 7) rtx rb-tree contains less packets, SACK is cheaper.
> 8) Removal of legacy code. Less maintenance hassles.
> 
> Note that I have left the sendpage/zerocopy paths, but they probably can
> benefit from the same strategy.
> 
> Thanks to Oleksandr Natalenko for reporting a performance issue for BBR/fq_codel,
> which was the main reason I worked on this patch series.

Series applied, thanks Eric.

SCTP might want to do something similar, and if so we can get rid
of sk_can_gso() too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] tcp_bbr: better deal with suboptimal GSO
  2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
  2018-02-21 15:01                       ` Paolo Abeni
  2018-02-21 15:14                       ` Neal Cardwell
@ 2018-02-22 19:16                       ` David Miller
  2 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2018-02-22 19:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: oleksandr, edumazet, netdev, soheil, ncardwell

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Feb 2018 06:43:03 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> BBR uses tcp_tso_autosize() in an attempt to probe what would be the
> burst sizes and to adjust cwnd in bbr_target_cwnd() with following
> gold formula :
> 
> /* Allow enough full-sized skbs in flight to utilize end systems. */
> cwnd += 3 * bbr->tso_segs_goal;
> 
> But GSO can be lacking or be constrained to very small
> units (ip link set dev ... gso_max_segs 2)
> 
> What we really want is to have enough packets in flight so that both
> GSO and GRO are efficient.
> 
> So in the case GSO is off or downgraded, we still want to have the same
> number of packets in flight as if GSO/TSO was fully operational, so
> that GRO can hopefully be working efficiently.
> 
> To fix this issue, we make tcp_tso_autosize() unaware of
> sk->sk_gso_max_segs
> 
> Only tcp_tso_segs() has to enforce the gso_max_segs limit.
 . ..
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>


Applied and queued up for -stable, thanks Eric.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 0/6] tcp: remove non GSO code
  2018-02-21 19:37 ` [PATCH net-next 0/6] tcp: remove non GSO code David Miller
@ 2018-02-28 20:10   ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 30+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-02-28 20:10 UTC (permalink / raw)
  To: David Miller
  Cc: edumazet, netdev, ncardwell, ycheng, soheil, oleksandr,
	eric.dumazet, linux-sctp

On Wed, Feb 21, 2018 at 02:37:48PM -0500, David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Mon, 19 Feb 2018 11:56:46 -0800
> 
> > Switching TCP to GSO mode, relying on core networking layers
> > to perform eventual adaptation for dumb devices was overdue.
> > 
> > 1) Most TCP developments are done with TSO in mind.
> > 2) Less high-resolution timers needs to be armed for TCP-pacing
> > 3) GSO can benefit of xmit_more hint
> > 4) Receiver GRO is more effective (as if TSO was used for real on sender)
> >    -> less ACK packets and overhead.
> > 5) Write queues have less overhead (one skb holds about 64KB of payload)
> > 6) SACK coalescing just works. (no payload in skb->head)
> > 7) rtx rb-tree contains less packets, SACK is cheaper.
> > 8) Removal of legacy code. Less maintenance hassles.
> > 
> > Note that I have left the sendpage/zerocopy paths, but they probably can
> > benefit from the same strategy.
> > 
> > Thanks to Oleksandr Natalenko for reporting a performance issue for BBR/fq_codel,
> > which was the main reason I worked on this patch series.
> 
> Series applied, thanks Eric.
> 
> SCTP might want to do something similar, and if so we can get rid
> of sk_can_gso() too.

Cc'ing linux-sctp and adding to the ToDo here, although it may be too
soon for SCTP. GSO support was added just a few months ago and
considering that it is not that much widely used as TCP, I fear we may
have some issues that didn't show up yet.

  Marcelo

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-02-28 20:10 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-19 19:56 [PATCH net-next 0/6] tcp: remove non GSO code Eric Dumazet
2018-02-19 19:56 ` [PATCH net-next 1/6] tcp: switch to GSO being always on Eric Dumazet
2018-02-20  1:22   ` kbuild test robot
2018-02-19 19:56 ` [PATCH net-next 2/6] tcp: remove sk_can_gso() use Eric Dumazet
2018-02-19 19:56 ` [PATCH net-next 3/6] tcp: remove sk_check_csum_caps() Eric Dumazet
2018-02-19 19:56 ` [PATCH net-next 4/6] tcp: tcp_sendmsg() only deals with CHECKSUM_PARTIAL Eric Dumazet
2018-02-19 19:56 ` [PATCH net-next 5/6] tcp: remove dead code from tcp_set_skb_tso_segs() Eric Dumazet
2018-02-19 19:56 ` [PATCH net-next 6/6] tcp: remove dead code after CHECKSUM_PARTIAL adoption Eric Dumazet
2018-02-20  1:45 ` [PATCH net-next 0/6] tcp: remove non GSO code Soheil Hassas Yeganeh
2018-02-20  9:32 ` Oleksandr Natalenko
2018-02-20 15:39   ` Eric Dumazet
2018-02-20 18:57     ` Eric Dumazet
2018-02-20 19:35       ` Oleksandr Natalenko
2018-02-20 19:39         ` Eric Dumazet
2018-02-20 19:51           ` Oleksandr Natalenko
2018-02-20 19:56             ` Eric Dumazet
2018-02-20 20:06               ` Oleksandr Natalenko
2018-02-20 20:09               ` Eric Dumazet
2018-02-20 20:45                 ` Oleksandr Natalenko
2018-02-20 23:21                   ` Eric Dumazet
2018-02-21  6:14                     ` Oleksandr Natalenko
2018-02-21 14:43                     ` [PATCH net] tcp_bbr: better deal with suboptimal GSO Eric Dumazet
2018-02-21 15:01                       ` Paolo Abeni
2018-02-21 15:09                         ` Eric Dumazet
2018-02-21 15:55                           ` Paolo Abeni
2018-02-21 15:14                       ` Neal Cardwell
2018-02-21 15:18                         ` Soheil Hassas Yeganeh
2018-02-22 19:16                       ` David Miller
2018-02-21 19:37 ` [PATCH net-next 0/6] tcp: remove non GSO code David Miller
2018-02-28 20:10   ` Marcelo Ricardo Leitner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).