* [RFC net-next 0/6] tcp: remove prequeue and header prediction
@ 2017-07-27 23:31 Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 1/6] tcp: remove prequeue support Florian Westphal
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev; +Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo
This RFC removes tcp prequeueing and header prediction support.
After a hallway discussion with Eric Dumazet, some
maybe-not-so-useful-anymore TCP stack features came up, header
prediction (HP) and prequeue among them.
So this RFC proposes to axe both.
In brief, TCP prequeue assumes a single-process-blocking-read
design, which is not that common anymore, and the most frequently
used high-performance networking program that does this is netperf :)
With the more common (e)poll designs, prequeue doesn't work.
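To illustrate (a userspace sketch, not from the tree; names and sizes
are made up): prequeue can only kick in while a task is blocked inside
recv() on the socket, which the second, more common pattern below never
does for long.

#include <sys/epoll.h>
#include <sys/socket.h>

/* prequeue-friendly: the task sleeps inside recv() between packets,
 * so tcp_prequeue() finds a waiting reader (tp->ucopy.task != NULL).
 */
static void blocking_reader(int fd)
{
	char buf[4096];

	while (recv(fd, buf, sizeof(buf), 0) > 0)
		;	/* consume data */
}

/* common today: the task waits in epoll_wait(), not in recv(), so the
 * prequeue path is effectively dead code for this program.
 */
static void epoll_reader(int epfd)
{
	struct epoll_event ev;
	char buf[4096];

	while (epoll_wait(epfd, &ev, 1, -1) > 0) {
		while (recv(ev.data.fd, buf, sizeof(buf), MSG_DONTWAIT) > 0)
			;	/* drain without blocking in recv() */
	}
}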
The idea behind prequeueing isn't so bad in itself; it moves
part of tcp processing -- including ack processing and
retransmit queue processing -- into process context.
However, removing it would not just avoid some code; for most
programs it eliminates dead code.
As processing then always occurs in BH context, it would allow us
to experiment e.g. with bulk-freeing of skb heads when a packet acks
data on the retransmit queue.
Header prediction is also less useful nowadays.
For packet trains, GRO will aggregate packets so we do not get
a per-packet benefit.
Header prediction will also break down with light packet loss due to SACK.
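For reference, the fast-path gate boils down to the check below
(simplified sketch of the condition removed in patch 4; the helper name
is made up):

/* A segment takes the fast path only if its flags, header length and
 * window match the precomputed pred_flags, it is the next in-order
 * segment, and it does not ack data we never sent.  SACK blocks change
 * the header length, so even light loss falls out of the prediction.
 */
static bool tcp_header_predicted(const struct tcp_sock *tp,
				 const struct tcphdr *th,
				 const struct sk_buff *skb)
{
	return (tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
	       TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
	       !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt);
}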
So, in short: what do others think?
Florian Westphal (6):
tcp: remove prequeue support
tcp: reindent two spots after prequeue removal
tcp: remove low_latency sysctl
tcp: remove header prediction
tcp: remove CA_ACK_SLOWPATH
tcp: remove unused mib counters
Documentation/networking/ip-sysctl.txt | 7
include/linux/tcp.h | 15 -
include/net/tcp.h | 40 ----
include/uapi/linux/snmp.h | 8
net/ipv4/proc.c | 8
net/ipv4/sysctl_net_ipv4.c | 3
net/ipv4/tcp.c | 109 -----------
net/ipv4/tcp_input.c | 303 +++------------------------------
net/ipv4/tcp_ipv4.c | 63 ------
net/ipv4/tcp_minisocks.c | 3
net/ipv4/tcp_output.c | 2
net/ipv4/tcp_timer.c | 12 -
net/ipv4/tcp_westwood.c | 31 ---
net/ipv6/tcp_ipv6.c | 3
14 files changed, 43 insertions(+), 564 deletions(-)
* [RFC PATCH net-next 1/6] tcp: remove prequeue support
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 2/6] tcp: reindent two spots after prequeue removal Florian Westphal
` (7 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
Prequeue is a tcp receive optimization that moves part of rx processing from
bh to process context.
This only works if the socket being processed belongs to a process that
blocks in recv on this socket. In practice, this doesn't happen that often
anymore, as servers normally use an event-driven (epoll) model.
Even normal clients (e.g. web browsers) commonly use many tcp connections
in parallel.
Let's remove this.
This has measurable impact only on netperf from host to local vm.
There are no changes with bulk transfers that use select/poll etc. to
get notified about new data.
I also see no changes when using netperf between two physical hosts
with ixgbe interfaces.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/linux/tcp.h | 9 ----
include/net/tcp.h | 11 -----
net/ipv4/tcp.c | 105 -----------------------------------------------
net/ipv4/tcp_input.c | 62 ----------------------------
net/ipv4/tcp_ipv4.c | 61 +--------------------------
net/ipv4/tcp_minisocks.c | 1 -
net/ipv4/tcp_timer.c | 12 ------
net/ipv6/tcp_ipv6.c | 3 +-
8 files changed, 2 insertions(+), 262 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 542ca1ae02c4..32fb37cfb0d1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -192,15 +192,6 @@ struct tcp_sock {
struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
- /* Data for direct copy to user */
- struct {
- struct sk_buff_head prequeue;
- struct task_struct *task;
- struct msghdr *msg;
- int memory;
- int len;
- } ucopy;
-
u32 snd_wl1; /* Sequence for window update */
u32 snd_wnd; /* The window we expect to receive */
u32 max_window; /* Maximal window ever seen from peer */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 12d68335acd4..93f115cfc8f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1244,17 +1244,6 @@ static inline bool tcp_checksum_complete(struct sk_buff *skb)
__tcp_checksum_complete(skb);
}
-/* Prequeue for VJ style copy to user, combined with checksumming. */
-
-static inline void tcp_prequeue_init(struct tcp_sock *tp)
-{
- tp->ucopy.task = NULL;
- tp->ucopy.len = 0;
- tp->ucopy.memory = 0;
- skb_queue_head_init(&tp->ucopy.prequeue);
-}
-
-bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb);
int tcp_filter(struct sock *sk, struct sk_buff *skb);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 71ce33decd97..62018ea6f45f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -400,7 +400,6 @@ void tcp_init_sock(struct sock *sk)
tp->out_of_order_queue = RB_ROOT;
tcp_init_xmit_timers(sk);
- tcp_prequeue_init(tp);
INIT_LIST_HEAD(&tp->tsq_node);
icsk->icsk_rto = TCP_TIMEOUT_INIT;
@@ -1525,20 +1524,6 @@ static void tcp_cleanup_rbuf(struct sock *sk, int copied)
tcp_send_ack(sk);
}
-static void tcp_prequeue_process(struct sock *sk)
-{
- struct sk_buff *skb;
- struct tcp_sock *tp = tcp_sk(sk);
-
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
-
- while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
- sk_backlog_rcv(sk, skb);
-
- /* Clear memory counter. */
- tp->ucopy.memory = 0;
-}
-
static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
{
struct sk_buff *skb;
@@ -1671,7 +1656,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
int err;
int target; /* Read at least this many bytes */
long timeo;
- struct task_struct *user_recv = NULL;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
@@ -1806,51 +1790,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
tcp_cleanup_rbuf(sk, copied);
- if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
- /* Install new reader */
- if (!user_recv && !(flags & (MSG_TRUNC | MSG_PEEK))) {
- user_recv = current;
- tp->ucopy.task = user_recv;
- tp->ucopy.msg = msg;
- }
-
- tp->ucopy.len = len;
-
- WARN_ON(tp->copied_seq != tp->rcv_nxt &&
- !(flags & (MSG_PEEK | MSG_TRUNC)));
-
- /* Ugly... If prequeue is not empty, we have to
- * process it before releasing socket, otherwise
- * order will be broken at second iteration.
- * More elegant solution is required!!!
- *
- * Look: we have the following (pseudo)queues:
- *
- * 1. packets in flight
- * 2. backlog
- * 3. prequeue
- * 4. receive_queue
- *
- * Each queue can be processed only if the next ones
- * are empty. At this point we have empty receive_queue.
- * But prequeue _can_ be not empty after 2nd iteration,
- * when we jumped to start of loop because backlog
- * processing added something to receive_queue.
- * We cannot release_sock(), because backlog contains
- * packets arrived _after_ prequeued ones.
- *
- * Shortly, algorithm is clear --- to process all
- * the queues in order. We could make it more directly,
- * requeueing packets from backlog to prequeue, if
- * is not empty. It is more elegant, but eats cycles,
- * unfortunately.
- */
- if (!skb_queue_empty(&tp->ucopy.prequeue))
- goto do_prequeue;
-
- /* __ Set realtime policy in scheduler __ */
- }
-
if (copied >= target) {
/* Do not sleep, just process backlog. */
release_sock(sk);
@@ -1859,31 +1798,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
sk_wait_data(sk, &timeo, last);
}
- if (user_recv) {
- int chunk;
-
- /* __ Restore normal policy in scheduler __ */
-
- chunk = len - tp->ucopy.len;
- if (chunk != 0) {
- NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, chunk);
- len -= chunk;
- copied += chunk;
- }
-
- if (tp->rcv_nxt == tp->copied_seq &&
- !skb_queue_empty(&tp->ucopy.prequeue)) {
-do_prequeue:
- tcp_prequeue_process(sk);
-
- chunk = len - tp->ucopy.len;
- if (chunk != 0) {
- NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
- len -= chunk;
- copied += chunk;
- }
- }
- }
if ((flags & MSG_PEEK) &&
(peek_seq - copied - urg_hole != tp->copied_seq)) {
net_dbg_ratelimited("TCP(%s:%d): Application bug, race in MSG_PEEK\n",
@@ -1955,25 +1869,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
break;
} while (len > 0);
- if (user_recv) {
- if (!skb_queue_empty(&tp->ucopy.prequeue)) {
- int chunk;
-
- tp->ucopy.len = copied > 0 ? len : 0;
-
- tcp_prequeue_process(sk);
-
- if (copied > 0 && (chunk = len - tp->ucopy.len) != 0) {
- NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
- len -= chunk;
- copied += chunk;
- }
- }
-
- tp->ucopy.task = NULL;
- tp->ucopy.len = 0;
- }
-
/* According to UNIX98, msg_name/msg_namelen are ignored
* on connected socket. I was just happy when found this 8) --ANK
*/
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index adc3f3e9468c..770ce6cb3eca 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4611,22 +4611,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
goto out_of_window;
/* Ok. In sequence. In window. */
- if (tp->ucopy.task == current &&
- tp->copied_seq == tp->rcv_nxt && tp->ucopy.len &&
- sock_owned_by_user(sk) && !tp->urg_data) {
- int chunk = min_t(unsigned int, skb->len,
- tp->ucopy.len);
-
- __set_current_state(TASK_RUNNING);
-
- if (!skb_copy_datagram_msg(skb, 0, tp->ucopy.msg, chunk)) {
- tp->ucopy.len -= chunk;
- tp->copied_seq += chunk;
- eaten = (chunk == skb->len);
- tcp_rcv_space_adjust(sk);
- }
- }
-
if (eaten <= 0) {
queue_and_out:
if (eaten < 0) {
@@ -5186,26 +5170,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, const struct tcphdr *t
}
}
-static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
-{
- struct tcp_sock *tp = tcp_sk(sk);
- int chunk = skb->len - hlen;
- int err;
-
- if (skb_csum_unnecessary(skb))
- err = skb_copy_datagram_msg(skb, hlen, tp->ucopy.msg, chunk);
- else
- err = skb_copy_and_csum_datagram_msg(skb, hlen, tp->ucopy.msg);
-
- if (!err) {
- tp->ucopy.len -= chunk;
- tp->copied_seq += chunk;
- tcp_rcv_space_adjust(sk);
- }
-
- return err;
-}
-
/* Accept RST for rcv_nxt - 1 after a FIN.
* When tcp connections are abruptly terminated from Mac OSX (via ^C), a
* FIN is sent followed by a RST packet. The RST is sent with the same
@@ -5446,32 +5410,6 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
int eaten = 0;
bool fragstolen = false;
- if (tp->ucopy.task == current &&
- tp->copied_seq == tp->rcv_nxt &&
- len - tcp_header_len <= tp->ucopy.len &&
- sock_owned_by_user(sk)) {
- __set_current_state(TASK_RUNNING);
-
- if (!tcp_copy_to_iovec(sk, skb, tcp_header_len)) {
- /* Predicted packet is in window by definition.
- * seq == rcv_nxt and rcv_wup <= rcv_nxt.
- * Hence, check seq<=rcv_wup reduces to:
- */
- if (tcp_header_len ==
- (sizeof(struct tcphdr) +
- TCPOLEN_TSTAMP_ALIGNED) &&
- tp->rcv_nxt == tp->rcv_wup)
- tcp_store_ts_recent(tp);
-
- tcp_rcv_rtt_measure_ts(sk, skb);
-
- __skb_pull(skb, tcp_header_len);
- tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
- NET_INC_STATS(sock_net(sk),
- LINUX_MIB_TCPHPHITSTOUSER);
- eaten = 1;
- }
- }
if (!eaten) {
if (tcp_checksum_complete(skb))
goto csum_error;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3a19ea28339f..a68eb4577d36 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1541,61 +1541,6 @@ void tcp_v4_early_demux(struct sk_buff *skb)
}
}
-/* Packet is added to VJ-style prequeue for processing in process
- * context, if a reader task is waiting. Apparently, this exciting
- * idea (VJ's mail "Re: query about TCP header on tcp-ip" of 07 Sep 93)
- * failed somewhere. Latency? Burstiness? Well, at least now we will
- * see, why it failed. 8)8) --ANK
- *
- */
-bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (sysctl_tcp_low_latency || !tp->ucopy.task)
- return false;
-
- if (skb->len <= tcp_hdrlen(skb) &&
- skb_queue_len(&tp->ucopy.prequeue) == 0)
- return false;
-
- /* Before escaping RCU protected region, we need to take care of skb
- * dst. Prequeue is only enabled for established sockets.
- * For such sockets, we might need the skb dst only to set sk->sk_rx_dst
- * Instead of doing full sk_rx_dst validity here, let's perform
- * an optimistic check.
- */
- if (likely(sk->sk_rx_dst))
- skb_dst_drop(skb);
- else
- skb_dst_force_safe(skb);
-
- __skb_queue_tail(&tp->ucopy.prequeue, skb);
- tp->ucopy.memory += skb->truesize;
- if (skb_queue_len(&tp->ucopy.prequeue) >= 32 ||
- tp->ucopy.memory + atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
- struct sk_buff *skb1;
-
- BUG_ON(sock_owned_by_user(sk));
- __NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPPREQUEUEDROPPED,
- skb_queue_len(&tp->ucopy.prequeue));
-
- while ((skb1 = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
- sk_backlog_rcv(sk, skb1);
-
- tp->ucopy.memory = 0;
- } else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
- wake_up_interruptible_sync_poll(sk_sleep(sk),
- POLLIN | POLLRDNORM | POLLRDBAND);
- if (!inet_csk_ack_scheduled(sk))
- inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
- (3 * tcp_rto_min(sk)) / 4,
- TCP_RTO_MAX);
- }
- return true;
-}
-EXPORT_SYMBOL(tcp_prequeue);
-
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
{
u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
@@ -1770,8 +1715,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
- if (!tcp_prequeue(sk, skb))
- ret = tcp_v4_do_rcv(sk, skb);
+ ret = tcp_v4_do_rcv(sk, skb);
} else if (tcp_add_backlog(sk, skb)) {
goto discard_and_relse;
}
@@ -1936,9 +1880,6 @@ void tcp_v4_destroy_sock(struct sock *sk)
}
#endif
- /* Clean prequeue, it must be empty really */
- __skb_queue_purge(&tp->ucopy.prequeue);
-
/* Clean up a referenced TCP bind bucket. */
if (inet_csk(sk)->icsk_bind_hash)
inet_put_port(sk);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 0ff83c1637d8..188a6f31356d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -445,7 +445,6 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
newtp->snd_sml = newtp->snd_una =
newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
- tcp_prequeue_init(newtp);
INIT_LIST_HEAD(&newtp->tsq_node);
tcp_init_wl(newtp, treq->rcv_isn);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c0feeeef962a..f753f9d2fee3 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -239,7 +239,6 @@ static int tcp_write_timeout(struct sock *sk)
/* Called with BH disabled */
void tcp_delack_timer_handler(struct sock *sk)
{
- struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
sk_mem_reclaim_partial(sk);
@@ -254,17 +253,6 @@ void tcp_delack_timer_handler(struct sock *sk)
}
icsk->icsk_ack.pending &= ~ICSK_ACK_TIMER;
- if (!skb_queue_empty(&tp->ucopy.prequeue)) {
- struct sk_buff *skb;
-
- __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSCHEDULERFAILED);
-
- while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
- sk_backlog_rcv(sk, skb);
-
- tp->ucopy.memory = 0;
- }
-
if (inet_csk_ack_scheduled(sk)) {
if (!icsk->icsk_ack.pingpong) {
/* Delayed ACK missed: inflate ATO. */
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 90a32576c3d0..ced5dcf37465 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1505,8 +1505,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
tcp_segs_in(tcp_sk(sk), skb);
ret = 0;
if (!sock_owned_by_user(sk)) {
- if (!tcp_prequeue(sk, skb))
- ret = tcp_v6_do_rcv(sk, skb);
+ ret = tcp_v6_do_rcv(sk, skb);
} else if (tcp_add_backlog(sk, skb)) {
goto discard_and_relse;
}
--
2.13.0
* [RFC PATCH net-next 2/6] tcp: reindent two spots after prequeue removal
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 1/6] tcp: remove prequeue support Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 3/6] tcp: remove low_latency sysctl Florian Westphal
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
These two conditions are now always true, so remove the conditionals.
objdiff shows no changes.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/tcp_input.c | 50 +++++++++++++++++++++++---------------------------
1 file changed, 23 insertions(+), 27 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 770ce6cb3eca..87efde9f5a90 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4611,16 +4611,14 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
goto out_of_window;
/* Ok. In sequence. In window. */
- if (eaten <= 0) {
queue_and_out:
- if (eaten < 0) {
- if (skb_queue_len(&sk->sk_receive_queue) == 0)
- sk_forced_mem_schedule(sk, skb->truesize);
- else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
- goto drop;
- }
- eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
+ if (eaten < 0) {
+ if (skb_queue_len(&sk->sk_receive_queue) == 0)
+ sk_forced_mem_schedule(sk, skb->truesize);
+ else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
+ goto drop;
}
+ eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
if (skb->len)
tcp_event_data_recv(sk, skb);
@@ -5410,30 +5408,28 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
int eaten = 0;
bool fragstolen = false;
- if (!eaten) {
- if (tcp_checksum_complete(skb))
- goto csum_error;
+ if (tcp_checksum_complete(skb))
+ goto csum_error;
- if ((int)skb->truesize > sk->sk_forward_alloc)
- goto step5;
+ if ((int)skb->truesize > sk->sk_forward_alloc)
+ goto step5;
- /* Predicted packet is in window by definition.
- * seq == rcv_nxt and rcv_wup <= rcv_nxt.
- * Hence, check seq<=rcv_wup reduces to:
- */
- if (tcp_header_len ==
- (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
- tp->rcv_nxt == tp->rcv_wup)
- tcp_store_ts_recent(tp);
+ /* Predicted packet is in window by definition.
+ * seq == rcv_nxt and rcv_wup <= rcv_nxt.
+ * Hence, check seq<=rcv_wup reduces to:
+ */
+ if (tcp_header_len ==
+ (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
+ tp->rcv_nxt == tp->rcv_wup)
+ tcp_store_ts_recent(tp);
- tcp_rcv_rtt_measure_ts(sk, skb);
+ tcp_rcv_rtt_measure_ts(sk, skb);
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
- /* Bulk data transfer: receiver */
- eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
- &fragstolen);
- }
+ /* Bulk data transfer: receiver */
+ eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
+ &fragstolen);
tcp_event_data_recv(sk, skb);
--
2.13.0
* [RFC PATCH net-next 3/6] tcp: remove low_latency sysctl
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 1/6] tcp: remove prequeue support Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 2/6] tcp: reindent two spots after prequeue removal Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 4/6] tcp: remove header prediction Florian Westphal
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
This option was used by the now-removed prequeue code; it has no effect
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Documentation/networking/ip-sysctl.txt | 7 +------
include/net/tcp.h | 1 -
net/ipv4/sysctl_net_ipv4.c | 3 +++
net/ipv4/tcp_ipv4.c | 2 --
4 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f485d553e65c..84c9b8cee780 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -353,12 +353,7 @@ tcp_l3mdev_accept - BOOLEAN
compiled with CONFIG_NET_L3_MASTER_DEV.
tcp_low_latency - BOOLEAN
- If set, the TCP stack makes decisions that prefer lower
- latency as opposed to higher throughput. By default, this
- option is not set meaning that higher throughput is preferred.
- An example of an application where this default should be
- changed would be a Beowulf compute cluster.
- Default: 0
+ This is a legacy option, it has no effect anymore.
tcp_max_orphans - INTEGER
Maximal number of TCP sockets not attached to any user file handle,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 93f115cfc8f8..8507c81fb0e9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -256,7 +256,6 @@ extern int sysctl_tcp_rmem[3];
extern int sysctl_tcp_app_win;
extern int sysctl_tcp_adv_win_scale;
extern int sysctl_tcp_frto;
-extern int sysctl_tcp_low_latency;
extern int sysctl_tcp_nometrics_save;
extern int sysctl_tcp_moderate_rcvbuf;
extern int sysctl_tcp_tso_win_divisor;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 9bf809726066..0d3c038d7b04 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -45,6 +45,9 @@ static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
static int ip_ping_group_range_min[] = { 0, 0 };
static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+/* obsolete */
+static int sysctl_tcp_low_latency __read_mostly;
+
/* Update system visible IP port range */
static void set_local_port_range(struct net *net, int range[2])
{
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a68eb4577d36..9b51663cd5a4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -85,8 +85,6 @@
#include <crypto/hash.h>
#include <linux/scatterlist.h>
-int sysctl_tcp_low_latency __read_mostly;
-
#ifdef CONFIG_TCP_MD5SIG
static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
__be32 daddr, __be32 saddr, const struct tcphdr *th);
--
2.13.0
* [RFC PATCH net-next 4/6] tcp: remove header prediction
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (2 preceding siblings ...)
2017-07-27 23:31 ` [RFC PATCH net-next 3/6] tcp: remove low_latency sysctl Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 5/6] tcp: remove CA_ACK_SLOWPATH Florian Westphal
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
Like prequeue, I am not sure this is overly useful nowadays.
If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/linux/tcp.h | 6 --
include/net/tcp.h | 23 ------
net/ipv4/tcp.c | 4 +-
net/ipv4/tcp_input.c | 192 +++--------------------------------------------
net/ipv4/tcp_minisocks.c | 2 -
net/ipv4/tcp_output.c | 2 -
6 files changed, 10 insertions(+), 219 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32fb37cfb0d1..d7389ea36e10 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -148,12 +148,6 @@ struct tcp_sock {
u16 gso_segs; /* Max number of segs per GSO packet */
/*
- * Header prediction flags
- * 0x5?10 << 16 + snd_wnd in net byte order
- */
- __be32 pred_flags;
-
-/*
* RFC793 variables by their proper names. This means you can
* read the code and the spec side by side (and laugh ...)
* See RFC793 and RFC1122. The RFC writes these in capitals.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8507c81fb0e9..8f11b82b5b5a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -631,29 +631,6 @@ static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
return usecs_to_jiffies((tp->srtt_us >> 3) + tp->rttvar_us);
}
-static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
-{
- tp->pred_flags = htonl((tp->tcp_header_len << 26) |
- ntohl(TCP_FLAG_ACK) |
- snd_wnd);
-}
-
-static inline void tcp_fast_path_on(struct tcp_sock *tp)
-{
- __tcp_fast_path_on(tp, tp->snd_wnd >> tp->rx_opt.snd_wscale);
-}
-
-static inline void tcp_fast_path_check(struct sock *sk)
-{
- struct tcp_sock *tp = tcp_sk(sk);
-
- if (RB_EMPTY_ROOT(&tp->out_of_order_queue) &&
- tp->rcv_wnd &&
- atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf &&
- !tp->urg_data)
- tcp_fast_path_on(tp);
-}
-
/* Compute the actual rto_min value */
static inline u32 tcp_rto_min(struct sock *sk)
{
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 62018ea6f45f..e022874d509f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1848,10 +1848,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
tcp_rcv_space_adjust(sk);
skip_copy:
- if (tp->urg_data && after(tp->copied_seq, tp->urg_seq)) {
+ if (tp->urg_data && after(tp->copied_seq, tp->urg_seq))
tp->urg_data = 0;
- tcp_fast_path_check(sk);
- }
if (used + offset < skb->len)
continue;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87efde9f5a90..bfde9d7d210e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -103,7 +103,6 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
#define FLAG_DATA_SACKED 0x20 /* New SACK. */
#define FLAG_ECE 0x40 /* ECE in this ACK */
#define FLAG_LOST_RETRANS 0x80 /* This ACK marks some retransmission lost */
-#define FLAG_SLOWPATH 0x100 /* Do not skip RFC checks for window update.*/
#define FLAG_ORIG_SACK_ACKED 0x200 /* Never retransmitted data are (s)acked */
#define FLAG_SND_UNA_ADVANCED 0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
#define FLAG_DSACKING_ACK 0x800 /* SACK blocks contained D-SACK info */
@@ -3367,12 +3366,6 @@ static int tcp_ack_update_window(struct sock *sk, const struct sk_buff *skb, u32
if (tp->snd_wnd != nwin) {
tp->snd_wnd = nwin;
- /* Note, it is the only place, where
- * fast path is recovered for sending TCP.
- */
- tp->pred_flags = 0;
- tcp_fast_path_check(sk);
-
if (tcp_send_head(sk))
tcp_slow_start_after_idle_check(sk);
@@ -3597,19 +3590,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
if (flag & FLAG_UPDATE_TS_RECENT)
tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
- if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
- /* Window is constant, pure forward advance.
- * No more checks are required.
- * Note, we use the fact that SND.UNA>=SND.WL2.
- */
- tcp_update_wl(tp, ack_seq);
- tcp_snd_una_update(tp, ack);
- flag |= FLAG_WIN_UPDATE;
-
- tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPACKS);
- } else {
+ {
u32 ack_ev_flags = CA_ACK_SLOWPATH;
if (ack_seq != TCP_SKB_CB(skb)->end_seq)
@@ -4398,8 +4379,6 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
return;
}
- /* Disable header prediction. */
- tp->pred_flags = 0;
inet_csk_schedule_ack(sk);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
@@ -4638,8 +4617,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
if (tp->rx_opt.num_sacks)
tcp_sack_remove(tp);
- tcp_fast_path_check(sk);
-
if (eaten > 0)
kfree_skb_partial(skb, fragstolen);
if (!sock_flag(sk, SOCK_DEAD))
@@ -4965,7 +4942,6 @@ static int tcp_prune_queue(struct sock *sk)
NET_INC_STATS(sock_net(sk), LINUX_MIB_RCVPRUNED);
/* Massive buffer overcommit. */
- tp->pred_flags = 0;
return -1;
}
@@ -5137,9 +5113,6 @@ static void tcp_check_urg(struct sock *sk, const struct tcphdr *th)
tp->urg_data = TCP_URG_NOTYET;
tp->urg_seq = ptr;
-
- /* Disable header prediction. */
- tp->pred_flags = 0;
}
/* This is the 'fast' part of urgent handling. */
@@ -5298,26 +5271,6 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
/*
* TCP receive function for the ESTABLISHED state.
- *
- * It is split into a fast path and a slow path. The fast path is
- * disabled when:
- * - A zero window was announced from us - zero window probing
- * is only handled properly in the slow path.
- * - Out of order segments arrived.
- * - Urgent data is expected.
- * - There is no buffer space left
- * - Unexpected TCP flags/window values/header lengths are received
- * (detected by checking the TCP header against pred_flags)
- * - Data is sent in both directions. Fast path only supports pure senders
- * or pure receivers (this means either the sequence number or the ack
- * value must stay constant)
- * - Unexpected TCP option.
- *
- * When these conditions are not satisfied it drops into a standard
- * receive procedure patterned after RFC793 to handle all cases.
- * The first three cases are guaranteed by proper pred_flags setting,
- * the rest is checked inline. Fast processing is turned on in
- * tcp_data_queue when everything is OK.
*/
void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
const struct tcphdr *th)
@@ -5328,144 +5281,19 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
tcp_mstamp_refresh(tp);
if (unlikely(!sk->sk_rx_dst))
inet_csk(sk)->icsk_af_ops->sk_rx_dst_set(sk, skb);
- /*
- * Header prediction.
- * The code loosely follows the one in the famous
- * "30 instruction TCP receive" Van Jacobson mail.
- *
- * Van's trick is to deposit buffers into socket queue
- * on a device interrupt, to call tcp_recv function
- * on the receive process context and checksum and copy
- * the buffer to user space. smart...
- *
- * Our current scheme is not silly either but we take the
- * extra cost of the net_bh soft interrupt processing...
- * We do checksum and copy also but from device to kernel.
- */
tp->rx_opt.saw_tstamp = 0;
- /* pred_flags is 0xS?10 << 16 + snd_wnd
- * if header_prediction is to be made
- * 'S' will always be tp->tcp_header_len >> 2
- * '?' will be 0 for the fast path, otherwise pred_flags is 0 to
- * turn it off (when there are holes in the receive
- * space for instance)
- * PSH flag is ignored.
- */
-
- if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
- TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
- !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
- int tcp_header_len = tp->tcp_header_len;
-
- /* Timestamp header prediction: tcp_header_len
- * is automatically equal to th->doff*4 due to pred_flags
- * match.
- */
-
- /* Check timestamp */
- if (tcp_header_len == sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) {
- /* No? Slow path! */
- if (!tcp_parse_aligned_timestamp(tp, th))
- goto slow_path;
-
- /* If PAWS failed, check it more carefully in slow path */
- if ((s32)(tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent) < 0)
- goto slow_path;
-
- /* DO NOT update ts_recent here, if checksum fails
- * and timestamp was corrupted part, it will result
- * in a hung connection since we will drop all
- * future packets due to the PAWS test.
- */
- }
-
- if (len <= tcp_header_len) {
- /* Bulk data transfer: sender */
- if (len == tcp_header_len) {
- /* Predicted packet is in window by definition.
- * seq == rcv_nxt and rcv_wup <= rcv_nxt.
- * Hence, check seq<=rcv_wup reduces to:
- */
- if (tcp_header_len ==
- (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
- tp->rcv_nxt == tp->rcv_wup)
- tcp_store_ts_recent(tp);
-
- /* We know that such packets are checksummed
- * on entry.
- */
- tcp_ack(sk, skb, 0);
- __kfree_skb(skb);
- tcp_data_snd_check(sk);
- return;
- } else { /* Header too small */
- TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
- goto discard;
- }
- } else {
- int eaten = 0;
- bool fragstolen = false;
-
- if (tcp_checksum_complete(skb))
- goto csum_error;
-
- if ((int)skb->truesize > sk->sk_forward_alloc)
- goto step5;
-
- /* Predicted packet is in window by definition.
- * seq == rcv_nxt and rcv_wup <= rcv_nxt.
- * Hence, check seq<=rcv_wup reduces to:
- */
- if (tcp_header_len ==
- (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
- tp->rcv_nxt == tp->rcv_wup)
- tcp_store_ts_recent(tp);
-
- tcp_rcv_rtt_measure_ts(sk, skb);
-
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
-
- /* Bulk data transfer: receiver */
- eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
- &fragstolen);
-
- tcp_event_data_recv(sk, skb);
-
- if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) {
- /* Well, only one small jumplet in fast path... */
- tcp_ack(sk, skb, FLAG_DATA);
- tcp_data_snd_check(sk);
- if (!inet_csk_ack_scheduled(sk))
- goto no_ack;
- }
-
- __tcp_ack_snd_check(sk, 0);
-no_ack:
- if (eaten)
- kfree_skb_partial(skb, fragstolen);
- sk->sk_data_ready(sk);
- return;
- }
- }
-
-slow_path:
if (len < (th->doff << 2) || tcp_checksum_complete(skb))
goto csum_error;
if (!th->ack && !th->rst && !th->syn)
goto discard;
- /*
- * Standard slow path.
- */
-
if (!tcp_validate_incoming(sk, skb, th, 1))
return;
-step5:
- if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
+ if (tcp_ack(sk, skb, FLAG_UPDATE_TS_RECENT) < 0)
goto discard;
tcp_rcv_rtt_measure_ts(sk, skb);
@@ -5519,11 +5347,10 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
if (sock_flag(sk, SOCK_KEEPOPEN))
inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
- if (!tp->rx_opt.snd_wscale)
- __tcp_fast_path_on(tp, tp->snd_wnd);
- else
- tp->pred_flags = 0;
-
+ if (!sock_flag(sk, SOCK_DEAD)) {
+ sk->sk_state_change(sk);
+ sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT);
+ }
}
static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
@@ -5652,7 +5479,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
tcp_ecn_rcv_synack(tp, th);
tcp_init_wl(tp, TCP_SKB_CB(skb)->seq);
- tcp_ack(sk, skb, FLAG_SLOWPATH);
+ tcp_ack(sk, skb, 0);
/* Ok.. it's good. Set up sequence numbers and
* move to established.
@@ -5888,8 +5715,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
return 0;
/* step 5: check the ACK field */
- acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH |
- FLAG_UPDATE_TS_RECENT |
+
+ acceptable = tcp_ack(sk, skb, FLAG_UPDATE_TS_RECENT |
FLAG_NO_CHALLENGE_ACK) > 0;
if (!acceptable) {
@@ -5957,7 +5784,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
tp->lsndtime = tcp_jiffies32;
tcp_initialize_rcv_mss(sk);
- tcp_fast_path_on(tp);
break;
case TCP_FIN_WAIT1: {
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 188a6f31356d..1537b87c657f 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -436,8 +436,6 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
struct tcp_sock *newtp = tcp_sk(newsk);
/* Now setup tcp_sock */
- newtp->pred_flags = 0;
-
newtp->rcv_wup = newtp->copied_seq =
newtp->rcv_nxt = treq->rcv_isn + 1;
newtp->segs_in = 1;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 886d874775df..8380464aead1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -295,9 +295,7 @@ static u16 tcp_select_window(struct sock *sk)
/* RFC1323 scaling applied */
new_win >>= tp->rx_opt.rcv_wscale;
- /* If we advertise zero window, disable fast path. */
if (new_win == 0) {
- tp->pred_flags = 0;
if (old_win)
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPTOZEROWINDOWADV);
--
2.13.0
* [RFC PATCH net-next 5/6] tcp: remove CA_ACK_SLOWPATH
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (3 preceding siblings ...)
2017-07-27 23:31 ` [RFC PATCH net-next 4/6] tcp: remove header prediction Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-27 23:31 ` [RFC PATCH net-next 6/6] tcp: remove unused mib counters Florian Westphal
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
Re-indent tcp_ack() and remove CA_ACK_SLOWPATH; it is always set now.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/tcp.h | 5 ++---
net/ipv4/tcp_input.c | 35 ++++++++++++++++-------------------
net/ipv4/tcp_westwood.c | 31 ++++---------------------------
3 files changed, 22 insertions(+), 49 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8f11b82b5b5a..3ecb62811004 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -880,9 +880,8 @@ enum tcp_ca_event {
/* Information about inbound ACK, passed to cong_ops->in_ack_event() */
enum tcp_ca_ack_event_flags {
- CA_ACK_SLOWPATH = (1 << 0), /* In slow path processing */
- CA_ACK_WIN_UPDATE = (1 << 1), /* ACK updated window */
- CA_ACK_ECE = (1 << 2), /* ECE bit is set on ack */
+ CA_ACK_WIN_UPDATE = (1 << 0), /* ACK updated window */
+ CA_ACK_ECE = (1 << 1), /* ECE bit is set on ack */
};
/*
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfde9d7d210e..af0a98d54b62 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3547,6 +3547,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
u32 lost = tp->lost;
int acked = 0; /* Number of packets newly acked */
int rexmit = REXMIT_NONE; /* Flag to (re)transmit to recover losses */
+ u32 ack_ev_flags = 0;
sack_state.first_sackt = 0;
sack_state.rate = &rs;
@@ -3590,30 +3591,26 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
if (flag & FLAG_UPDATE_TS_RECENT)
tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
- {
- u32 ack_ev_flags = CA_ACK_SLOWPATH;
-
- if (ack_seq != TCP_SKB_CB(skb)->end_seq)
- flag |= FLAG_DATA;
- else
- NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPUREACKS);
+ if (ack_seq != TCP_SKB_CB(skb)->end_seq)
+ flag |= FLAG_DATA;
+ else
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPUREACKS);
- flag |= tcp_ack_update_window(sk, skb, ack, ack_seq);
+ flag |= tcp_ack_update_window(sk, skb, ack, ack_seq);
- if (TCP_SKB_CB(skb)->sacked)
- flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
- &sack_state);
+ if (TCP_SKB_CB(skb)->sacked)
+ flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
+ &sack_state);
- if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
- flag |= FLAG_ECE;
- ack_ev_flags |= CA_ACK_ECE;
- }
+ if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
+ flag |= FLAG_ECE;
+ ack_ev_flags = CA_ACK_ECE;
+ }
- if (flag & FLAG_WIN_UPDATE)
- ack_ev_flags |= CA_ACK_WIN_UPDATE;
+ if (flag & FLAG_WIN_UPDATE)
+ ack_ev_flags |= CA_ACK_WIN_UPDATE;
- tcp_in_ack_event(sk, ack_ev_flags);
- }
+ tcp_in_ack_event(sk, ack_ev_flags);
/* We passed data and got it acked, remove any soft error
* log. Something worked...
diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c
index bec9cafbe3f9..e5de84310949 100644
--- a/net/ipv4/tcp_westwood.c
+++ b/net/ipv4/tcp_westwood.c
@@ -154,24 +154,6 @@ static inline void update_rtt_min(struct westwood *w)
}
/*
- * @westwood_fast_bw
- * It is called when we are in fast path. In particular it is called when
- * header prediction is successful. In such case in fact update is
- * straight forward and doesn't need any particular care.
- */
-static inline void westwood_fast_bw(struct sock *sk)
-{
- const struct tcp_sock *tp = tcp_sk(sk);
- struct westwood *w = inet_csk_ca(sk);
-
- westwood_update_window(sk);
-
- w->bk += tp->snd_una - w->snd_una;
- w->snd_una = tp->snd_una;
- update_rtt_min(w);
-}
-
-/*
* @westwood_acked_count
* This function evaluates cumul_ack for evaluating bk in case of
* delayed or partial acks.
@@ -223,17 +205,12 @@ static u32 tcp_westwood_bw_rttmin(const struct sock *sk)
static void tcp_westwood_ack(struct sock *sk, u32 ack_flags)
{
- if (ack_flags & CA_ACK_SLOWPATH) {
- struct westwood *w = inet_csk_ca(sk);
-
- westwood_update_window(sk);
- w->bk += westwood_acked_count(sk);
+ struct westwood *w = inet_csk_ca(sk);
- update_rtt_min(w);
- return;
- }
+ westwood_update_window(sk);
+ w->bk += westwood_acked_count(sk);
- westwood_fast_bw(sk);
+ update_rtt_min(w);
}
static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event)
--
2.13.0
* [RFC PATCH net-next 6/6] tcp: remove unused mib counters
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (4 preceding siblings ...)
2017-07-27 23:31 ` [RFC PATCH net-next 5/6] tcp: remove CA_ACK_SLOWPATH Florian Westphal
@ 2017-07-27 23:31 ` Florian Westphal
2017-07-28 19:19 ` [RFC net-next 0/6] tcp: remove prequeue and header prediction Yuchung Cheng
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2017-07-27 23:31 UTC (permalink / raw)
To: netdev
Cc: ycheng, ncardwell, edumazet, soheil, weiwan, brakmo,
Florian Westphal
These counters were used by tcp prequeue and header prediction; the
TCPFORWARDRETRANS use was removed in January.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/uapi/linux/snmp.h | 8 --------
net/ipv4/proc.c | 8 --------
2 files changed, 16 deletions(-)
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index d85693295798..73c15719fd35 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -185,13 +185,7 @@ enum
LINUX_MIB_LISTENOVERFLOWS, /* ListenOverflows */
LINUX_MIB_LISTENDROPS, /* ListenDrops */
LINUX_MIB_TCPPREQUEUED, /* TCPPrequeued */
- LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, /* TCPDirectCopyFromBacklog */
- LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, /* TCPDirectCopyFromPrequeue */
- LINUX_MIB_TCPPREQUEUEDROPPED, /* TCPPrequeueDropped */
- LINUX_MIB_TCPHPHITS, /* TCPHPHits */
- LINUX_MIB_TCPHPHITSTOUSER, /* TCPHPHitsToUser */
LINUX_MIB_TCPPUREACKS, /* TCPPureAcks */
- LINUX_MIB_TCPHPACKS, /* TCPHPAcks */
LINUX_MIB_TCPRENORECOVERY, /* TCPRenoRecovery */
LINUX_MIB_TCPSACKRECOVERY, /* TCPSackRecovery */
LINUX_MIB_TCPSACKRENEGING, /* TCPSACKReneging */
@@ -208,14 +202,12 @@ enum
LINUX_MIB_TCPSACKFAILURES, /* TCPSackFailures */
LINUX_MIB_TCPLOSSFAILURES, /* TCPLossFailures */
LINUX_MIB_TCPFASTRETRANS, /* TCPFastRetrans */
- LINUX_MIB_TCPFORWARDRETRANS, /* TCPForwardRetrans */
LINUX_MIB_TCPSLOWSTARTRETRANS, /* TCPSlowStartRetrans */
LINUX_MIB_TCPTIMEOUTS, /* TCPTimeouts */
LINUX_MIB_TCPLOSSPROBES, /* TCPLossProbes */
LINUX_MIB_TCPLOSSPROBERECOVERY, /* TCPLossProbeRecovery */
LINUX_MIB_TCPRENORECOVERYFAIL, /* TCPRenoRecoveryFail */
LINUX_MIB_TCPSACKRECOVERYFAIL, /* TCPSackRecoveryFail */
- LINUX_MIB_TCPSCHEDULERFAILED, /* TCPSchedulerFailed */
LINUX_MIB_TCPRCVCOLLAPSED, /* TCPRcvCollapsed */
LINUX_MIB_TCPDSACKOLDSENT, /* TCPDSACKOldSent */
LINUX_MIB_TCPDSACKOFOSENT, /* TCPDSACKOfoSent */
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 43eb6567b3a0..e2c91375cadc 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -207,13 +207,7 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS),
SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS),
SNMP_MIB_ITEM("TCPPrequeued", LINUX_MIB_TCPPREQUEUED),
- SNMP_MIB_ITEM("TCPDirectCopyFromBacklog", LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG),
- SNMP_MIB_ITEM("TCPDirectCopyFromPrequeue", LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE),
- SNMP_MIB_ITEM("TCPPrequeueDropped", LINUX_MIB_TCPPREQUEUEDROPPED),
- SNMP_MIB_ITEM("TCPHPHits", LINUX_MIB_TCPHPHITS),
- SNMP_MIB_ITEM("TCPHPHitsToUser", LINUX_MIB_TCPHPHITSTOUSER),
SNMP_MIB_ITEM("TCPPureAcks", LINUX_MIB_TCPPUREACKS),
- SNMP_MIB_ITEM("TCPHPAcks", LINUX_MIB_TCPHPACKS),
SNMP_MIB_ITEM("TCPRenoRecovery", LINUX_MIB_TCPRENORECOVERY),
SNMP_MIB_ITEM("TCPSackRecovery", LINUX_MIB_TCPSACKRECOVERY),
SNMP_MIB_ITEM("TCPSACKReneging", LINUX_MIB_TCPSACKRENEGING),
@@ -230,14 +224,12 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("TCPSackFailures", LINUX_MIB_TCPSACKFAILURES),
SNMP_MIB_ITEM("TCPLossFailures", LINUX_MIB_TCPLOSSFAILURES),
SNMP_MIB_ITEM("TCPFastRetrans", LINUX_MIB_TCPFASTRETRANS),
- SNMP_MIB_ITEM("TCPForwardRetrans", LINUX_MIB_TCPFORWARDRETRANS),
SNMP_MIB_ITEM("TCPSlowStartRetrans", LINUX_MIB_TCPSLOWSTARTRETRANS),
SNMP_MIB_ITEM("TCPTimeouts", LINUX_MIB_TCPTIMEOUTS),
SNMP_MIB_ITEM("TCPLossProbes", LINUX_MIB_TCPLOSSPROBES),
SNMP_MIB_ITEM("TCPLossProbeRecovery", LINUX_MIB_TCPLOSSPROBERECOVERY),
SNMP_MIB_ITEM("TCPRenoRecoveryFail", LINUX_MIB_TCPRENORECOVERYFAIL),
SNMP_MIB_ITEM("TCPSackRecoveryFail", LINUX_MIB_TCPSACKRECOVERYFAIL),
- SNMP_MIB_ITEM("TCPSchedulerFailed", LINUX_MIB_TCPSCHEDULERFAILED),
SNMP_MIB_ITEM("TCPRcvCollapsed", LINUX_MIB_TCPRCVCOLLAPSED),
SNMP_MIB_ITEM("TCPDSACKOldSent", LINUX_MIB_TCPDSACKOLDSENT),
SNMP_MIB_ITEM("TCPDSACKOfoSent", LINUX_MIB_TCPDSACKOFOSENT),
--
2.13.0
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (5 preceding siblings ...)
2017-07-27 23:31 ` [RFC PATCH net-next 6/6] tcp: remove unused mib counters Florian Westphal
@ 2017-07-28 19:19 ` Yuchung Cheng
2017-07-29 22:22 ` David Miller
2017-07-30 2:25 ` Neal Cardwell
8 siblings, 0 replies; 13+ messages in thread
From: Yuchung Cheng @ 2017-07-28 19:19 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, Neal Cardwell, Eric Dumazet, Soheil Hassas Yeganeh,
Wei Wang, Lawrence Brakmo
On Thu, Jul 27, 2017 at 4:31 PM, Florian Westphal <fw@strlen.de> wrote:
>
> This RFC removes tcp prequeueing and header prediction support.
>
> After a hallway discussion with Eric Dumazet some
> maybe-not-so-useful-anymore TCP stack features came up, HP and
> Prequeue among these.
>
> So this RFC proposes to axe both.
>
> In brief, TCP prequeue assumes a single-process-blocking-read
> design, which is not that common anymore, and the most frequently
> used high-performance networking program that does this is netperf :)
>
> With more commong (e)poll designs, prequeue doesn't work.
>
> The idea behind prequeueing isn't so bad in itself; it moves
> part of tcp processing -- including ack processing (including
> retransmit queue processing) into process context.
> However, removing it would not just avoid some code, for most
> programs it elimiates dead code.
>
> As processing then always occurs in BH context, it would allow us
> to experiment e.g. with bulk-freeing of skb heads when a packet acks
> data on the retransmit queue.
>
> Header prediction is also less useful nowadays.
> For packet trains, GRO will aggregate packets so we do not get
> a per-packet benefit.
> Header prediction will also break down with light packet loss due to SACK.
>
> So, In short: What do others think?
+2 for this move. Will review the patches soon.
>
> Florian Westphal (6):
> tcp: remove prequeue support
> tcp: reindent two spots after prequeue removal
> tcp: remove low_latency sysctl
> tcp: remove header prediction
> tcp: remove CA_ACK_SLOWPATH
> tcp: remove unused mib counters
>
> Documentation/networking/ip-sysctl.txt | 7
> include/linux/tcp.h | 15 -
> include/net/tcp.h | 40 ----
> include/uapi/linux/snmp.h | 8
> net/ipv4/proc.c | 8
> net/ipv4/sysctl_net_ipv4.c | 3
> net/ipv4/tcp.c | 109 -----------
> net/ipv4/tcp_input.c | 303 +++------------------------------
> net/ipv4/tcp_ipv4.c | 63 ------
> net/ipv4/tcp_minisocks.c | 3
> net/ipv4/tcp_output.c | 2
> net/ipv4/tcp_timer.c | 12 -
> net/ipv4/tcp_westwood.c | 31 ---
> net/ipv6/tcp_ipv6.c | 3
> 14 files changed, 43 insertions(+), 564 deletions(-)
>
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (6 preceding siblings ...)
2017-07-28 19:19 ` [RFC net-next 0/6] tcp: remove prequeue and header prediction Yuchung Cheng
@ 2017-07-29 22:22 ` David Miller
2017-07-30 2:25 ` Neal Cardwell
8 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2017-07-29 22:22 UTC (permalink / raw)
To: fw; +Cc: netdev, ycheng, ncardwell, edumazet, soheil, weiwan, brakmo
From: Florian Westphal <fw@strlen.de>
Date: Fri, 28 Jul 2017 01:31:11 +0200
> This RFC removes tcp prequeueing and header prediction support.
>
> After a hallway discussion with Eric Dumazet some
> maybe-not-so-useful-anymore TCP stack features came up, HP and
> Prequeue among these.
>
> So this RFC proposes to axe both.
>
> In brief, TCP prequeue assumes a single-process-blocking-read
> design, which is not that common anymore, and the most frequently
> used high-performance networking program that does this is netperf :)
>
> With more commong (e)poll designs, prequeue doesn't work.
>
> The idea behind prequeueing isn't so bad in itself; it moves
> part of tcp processing -- including ack processing (including
> retransmit queue processing) into process context.
> However, removing it would not just avoid some code, for most
> programs it elimiates dead code.
>
> As processing then always occurs in BH context, it would allow us
> to experiment e.g. with bulk-freeing of skb heads when a packet acks
> data on the retransmit queue.
>
> Header prediction is also less useful nowadays.
> For packet trains, GRO will aggregate packets so we do not get
> a per-packet benefit.
> Header prediction will also break down with light packet loss due to SACK.
>
> So, In short: What do others think?
I have no objections to any of this. :)
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-27 23:31 [RFC net-next 0/6] tcp: remove prequeue and header prediction Florian Westphal
` (7 preceding siblings ...)
2017-07-29 22:22 ` David Miller
@ 2017-07-30 2:25 ` Neal Cardwell
2017-07-31 20:04 ` Yuchung Cheng
8 siblings, 1 reply; 13+ messages in thread
From: Neal Cardwell @ 2017-07-30 2:25 UTC (permalink / raw)
To: Florian Westphal
Cc: Netdev, Yuchung Cheng, Eric Dumazet, Soheil Hassas Yeganeh,
Wei Wang, Lawrence Brakmo, David Miller, Lorenzo Colitti
On Thu, Jul 27, 2017 at 7:31 PM, Florian Westphal <fw@strlen.de> wrote:
> This RFC removes tcp prequeueing and header prediction support.
>
> After a hallway discussion with Eric Dumazet some
> maybe-not-so-useful-anymore TCP stack features came up, HP and
> Prequeue among these.
>
> So this RFC proposes to axe both.
>
> In brief, TCP prequeue assumes a single-process-blocking-read
> design, which is not that common anymore, and the most frequently
> used high-performance networking program that does this is netperf :)
>
> With more commong (e)poll designs, prequeue doesn't work.
>
> The idea behind prequeueing isn't so bad in itself; it moves
> part of tcp processing -- including ack processing (including
> retransmit queue processing) into process context.
> However, removing it would not just avoid some code, for most
> programs it elimiates dead code.
>
> As processing then always occurs in BH context, it would allow us
> to experiment e.g. with bulk-freeing of skb heads when a packet acks
> data on the retransmit queue.
>
> Header prediction is also less useful nowadays.
> For packet trains, GRO will aggregate packets so we do not get
> a per-packet benefit.
> Header prediction will also break down with light packet loss due to SACK.
>
> So, In short: What do others think?
>
> Florian Westphal (6):
> tcp: remove prequeue support
> tcp: reindent two spots after prequeue removal
> tcp: remove low_latency sysctl
> tcp: remove header prediction
> tcp: remove CA_ACK_SLOWPATH
> tcp: remove unused mib counters
>
> Documentation/networking/ip-sysctl.txt | 7
> include/linux/tcp.h | 15 -
> include/net/tcp.h | 40 ----
> include/uapi/linux/snmp.h | 8
> net/ipv4/proc.c | 8
> net/ipv4/sysctl_net_ipv4.c | 3
> net/ipv4/tcp.c | 109 -----------
> net/ipv4/tcp_input.c | 303 +++------------------------------
> net/ipv4/tcp_ipv4.c | 63 ------
> net/ipv4/tcp_minisocks.c | 3
> net/ipv4/tcp_output.c | 2
> net/ipv4/tcp_timer.c | 12 -
> net/ipv4/tcp_westwood.c | 31 ---
> net/ipv6/tcp_ipv6.c | 3
> 14 files changed, 43 insertions(+), 564 deletions(-)
>
I unconditionally support the removal of prequeue support.
For the header prediction code: IMHO before removing the header
prediction code it would be useful to do some kind of before-and-after
benchmarking on a low-powered device where battery life is the main
concern. I am thinking about ARM-based cell phones, IoT/embedded
devices, raspberry pi, etc. You mention GRO helping to make header
prediction obsolete, but in those devices packets arrive so slowly
that probably GRO does not help. With slow CPUs and battery life the
main concern, it seems conceivable to me that header prediction might
still be a win (and worth keeping, since the complexity cost is
largely in the past; the maintenance overhead has been low). Just a
thought.
thanks,
neal
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-30 2:25 ` Neal Cardwell
@ 2017-07-31 20:04 ` Yuchung Cheng
2017-07-31 20:22 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Yuchung Cheng @ 2017-07-31 20:04 UTC (permalink / raw)
To: Neal Cardwell
Cc: Florian Westphal, Netdev, Eric Dumazet, Soheil Hassas Yeganeh,
Wei Wang, Lawrence Brakmo, David Miller, Lorenzo Colitti,
Van Jacobson
On Sat, Jul 29, 2017 at 7:25 PM, Neal Cardwell <ncardwell@google.com> wrote:
> On Thu, Jul 27, 2017 at 7:31 PM, Florian Westphal <fw@strlen.de> wrote:
>> This RFC removes tcp prequeueing and header prediction support.
>>
>> After a hallway discussion with Eric Dumazet some
>> maybe-not-so-useful-anymore TCP stack features came up, HP and
>> Prequeue among these.
>>
>> So this RFC proposes to axe both.
>>
>> In brief, TCP prequeue assumes a single-process-blocking-read
>> design, which is not that common anymore, and the most frequently
>> used high-performance networking program that does this is netperf :)
>>
>> With more commong (e)poll designs, prequeue doesn't work.
>>
>> The idea behind prequeueing isn't so bad in itself; it moves
>> part of tcp processing -- including ack processing (including
>> retransmit queue processing) into process context.
>> However, removing it would not just avoid some code, for most
>> programs it elimiates dead code.
>>
>> As processing then always occurs in BH context, it would allow us
>> to experiment e.g. with bulk-freeing of skb heads when a packet acks
>> data on the retransmit queue.
>>
>> Header prediction is also less useful nowadays.
>> For packet trains, GRO will aggregate packets so we do not get
>> a per-packet benefit.
>> Header prediction will also break down with light packet loss due to SACK.
>>
>> So, In short: What do others think?
>>
>> Florian Westphal (6):
>> tcp: remove prequeue support
>> tcp: reindent two spots after prequeue removal
>> tcp: remove low_latency sysctl
>> tcp: remove header prediction
>> tcp: remove CA_ACK_SLOWPATH
>> tcp: remove unused mib counters
>>
>> Documentation/networking/ip-sysctl.txt | 7
>> include/linux/tcp.h | 15 -
>> include/net/tcp.h | 40 ----
>> include/uapi/linux/snmp.h | 8
>> net/ipv4/proc.c | 8
>> net/ipv4/sysctl_net_ipv4.c | 3
>> net/ipv4/tcp.c | 109 -----------
>> net/ipv4/tcp_input.c | 303 +++------------------------------
>> net/ipv4/tcp_ipv4.c | 63 ------
>> net/ipv4/tcp_minisocks.c | 3
>> net/ipv4/tcp_output.c | 2
>> net/ipv4/tcp_timer.c | 12 -
>> net/ipv4/tcp_westwood.c | 31 ---
>> net/ipv6/tcp_ipv6.c | 3
>> 14 files changed, 43 insertions(+), 564 deletions(-)
>>
>
> I unconditionally support the removal of prequeue support.
>
> For the header prediction code: IMHO before removing the header
> prediction code it would be useful to do some kind of before-and-after
> benchmarking on a low-powered device where battery life is the main
> concern. I am thinking about ARM-based cell phones, IoT/embedded
> devices, raspberry pi, etc. You mention GRO helping to make header
> prediction obsolete, but in those devices packets arrive so slowly
> that probably GRO does not help. With slow CPUs and battery life the
> main concern, it seems conceivable to me that header prediction might
> still be a win (and worth keeping, since the complexity cost is
by the time these devices use 4.12 kernels they are likely powerful
enough to make header prediction irrelevant...
> largely in the past; the maintenance overhead has been low). Just a
> thought.
>
> thanks,
> neal
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-31 20:04 ` Yuchung Cheng
@ 2017-07-31 20:22 ` Eric Dumazet
2017-07-31 21:38 ` David Miller
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2017-07-31 20:22 UTC (permalink / raw)
To: Yuchung Cheng
Cc: Neal Cardwell, Florian Westphal, Netdev, Soheil Hassas Yeganeh,
Wei Wang, Lawrence Brakmo, David Miller, Lorenzo Colitti,
Van Jacobson
On Mon, Jul 31, 2017 at 1:04 PM, Yuchung Cheng <ycheng@google.com> wrote:
> by the time these devices use 4.12 kernels they are likely powerful
> enough to make header prediction irrelevant...
Also note that TCP stack complexity has increased a lot, I seriously
doubt anyone could notice any difference.
On small devices, the major cost is the wakeup of the cpu to process
one frame before going back to idle...
* Re: [RFC net-next 0/6] tcp: remove prequeue and header prediction
2017-07-31 20:22 ` Eric Dumazet
@ 2017-07-31 21:38 ` David Miller
0 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2017-07-31 21:38 UTC (permalink / raw)
To: edumazet
Cc: ycheng, ncardwell, fw, netdev, soheil, weiwan, brakmo, lorenzo,
vanj
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 31 Jul 2017 13:22:22 -0700
> On Mon, Jul 31, 2017 at 1:04 PM, Yuchung Cheng <ycheng@google.com> wrote:
>> by the time these devices use 4.12 kernels they are likely powerful
>> enough to make header prediction irrelevant...
>
> Also note that TCP stack complexity has increased a lot, I seriously
> doubt anyone could notice any difference.
>
> On small devices, the major cost is the wakeup of the cpu to process
> one frame before going back to idle...
I agree with Yuchung and Eric on all counts.