Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction
@ 2018-06-27  1:52 Lawrence Brakmo
  2018-06-27  2:18 ` kbuild test robot
  2018-06-27  4:22 ` kbuild test robot
  0 siblings, 2 replies; 3+ messages in thread
From: Lawrence Brakmo @ 2018-06-27  1:52 UTC (permalink / raw)
  To: netdev; +Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet

When using dctcp and doing RPCs, if the last packet of a request is
ECN marked as having seen congestion (CE), the sender can decrease its
cwnd to 1. As a result, it will only send one packet when a new request
is sent. In some instances this results in high tail latencies.

For example, in one setup there are 3 hosts sending to a 4th one, with
each sender having 3 flows (1 stream, 1 1MB back-to-back RPCs and 1 10KB
back-to-back RPCs). The following table shows the 99% and 99.9%
latencies for both Cubic and dctcp

           Cubic 99%  Cubic 99.9%   dctcp 99%    dctcp 99.9%
 1MB RPCs    3.5ms       6.0ms         43ms          208ms
10KB RPCs    1.0ms       2.5ms         53ms          212ms

On 4.11, pcap traces indicate that in some instances the 1st packet of
the RPC is received but no ACK is sent before the packet is
retransmitted. On 4.11 netstat shows TCP timeouts, with some of them
spurious.

On 4.16, we don't see retransmits in netstat but the high tail latencies
are still there. Forcing cwnd to be at least 2 in tcp_cwnd_reduction
fixes the problem with the high tail latencies. The latencies now look
like this:

             dctcp 99%       dctcp 99.9%
 1MB RPCs      3.8ms             4.4ms
10KB RPCs      168us             211us

Another group working with dctcp saw the same issue with production
traffic and it was solved with this patch.

The only issue is if it is safe to always use 2 or if it is better to
use min(2, snd_ssthresh) (which could still trigger the problem).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 76ca88f63b70..a9255c424761 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2477,7 +2477,7 @@ void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int flag)
 	}
 	/* Force a fast retransmit upon entering fast recovery */
 	sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
-	tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
+	tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
 }
 
 static inline void tcp_end_cwnd_reduction(struct sock *sk)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction
  2018-06-27  1:52 [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction Lawrence Brakmo
@ 2018-06-27  2:18 ` kbuild test robot
  2018-06-27  4:22 ` kbuild test robot
  1 sibling, 0 replies; 3+ messages in thread
From: kbuild test robot @ 2018-06-27  2:18 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: kbuild-all, netdev, Kernel Team, Blake Matheny,
	Alexei Starovoitov, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

Hi Lawrence,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Lawrence-Brakmo/tcp-force-cwnd-at-least-2-in-tcp_cwnd_reduction/20180627-095533
config: x86_64-randconfig-x003-201825 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:18:0,
                    from arch/x86/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from net/ipv4/tcp_input.c:67:
   net/ipv4/tcp_input.c: In function 'tcp_cwnd_reduction':
   include/linux/kernel.h:812:29: warning: comparison of distinct pointer types lacks a cast
      (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                ^
   include/linux/kernel.h:826:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), \
                           ^~~~~~~~~~
   include/linux/kernel.h:852:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
>> net/ipv4/tcp_input.c:2480:17: note: in expansion of macro 'max'
     tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
                    ^~~

vim +/max +2480 net/ipv4/tcp_input.c

  2455	
  2456	void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int flag)
  2457	{
  2458		struct tcp_sock *tp = tcp_sk(sk);
  2459		int sndcnt = 0;
  2460		int delta = tp->snd_ssthresh - tcp_packets_in_flight(tp);
  2461	
  2462		if (newly_acked_sacked <= 0 || WARN_ON_ONCE(!tp->prior_cwnd))
  2463			return;
  2464	
  2465		tp->prr_delivered += newly_acked_sacked;
  2466		if (delta < 0) {
  2467			u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
  2468				       tp->prior_cwnd - 1;
  2469			sndcnt = div_u64(dividend, tp->prior_cwnd) - tp->prr_out;
  2470		} else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
  2471			   !(flag & FLAG_LOST_RETRANS)) {
  2472			sndcnt = min_t(int, delta,
  2473				       max_t(int, tp->prr_delivered - tp->prr_out,
  2474					     newly_acked_sacked) + 1);
  2475		} else {
  2476			sndcnt = min(delta, newly_acked_sacked);
  2477		}
  2478		/* Force a fast retransmit upon entering fast recovery */
  2479		sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
> 2480		tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
  2481	}
  2482	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32495 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction
  2018-06-27  1:52 [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction Lawrence Brakmo
  2018-06-27  2:18 ` kbuild test robot
@ 2018-06-27  4:22 ` kbuild test robot
  1 sibling, 0 replies; 3+ messages in thread
From: kbuild test robot @ 2018-06-27  4:22 UTC (permalink / raw)
  To: Lawrence Brakmo
  Cc: kbuild-all, netdev, Kernel Team, Blake Matheny,
	Alexei Starovoitov, Eric Dumazet

Hi Lawrence,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Lawrence-Brakmo/tcp-force-cwnd-at-least-2-in-tcp_cwnd_reduction/20180627-095533
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   net/ipv4/tcp_input.c:168:42: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:168:42: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:213:21: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:213:21: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:329:19: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:329:19: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:336:19: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:337:19: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:337:19: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:347:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:347:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:412:32: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:412:32: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:413:44: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:413:44: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:436:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:436:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:464:44: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:464:44: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:473:36: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:473:36: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:475:28: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:475:28: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:492:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:492:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:496:36: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:496:36: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:509:29: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:509:29: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:511:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:511:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:512:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:513:16: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:652:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:652:26: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:790:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:790:33: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:794:23: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:818:17: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:818:17: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:827:9: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:827:9: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:867:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:867:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:902:34: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:902:34: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1654:25: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1848:17: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1849:17: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1849:17: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1869:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1869:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1896:34: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1995:34: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:1995:34: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2024:39: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2024:39: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2400:44: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2472:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2476:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2476:26: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2479:18: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2479:18: sparse: expression using sizeof(void)
>> net/ipv4/tcp_input.c:2480:24: sparse: incompatible types in comparison expression (different signedness)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:1138:24: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:2990:48: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:3195:46: sparse: expression using sizeof(void)
   net/ipv4/tcp_input.c:3195:46: sparse: expression using sizeof(void)
   include/net/tcp.h:739:16: sparse: expression using sizeof(void)
   include/net/tcp.h:1206:16: sparse: expression using sizeof(void)
   include/net/tcp.h:1215:31: sparse: expression using sizeof(void)
   include/net/tcp.h:1215:31: sparse: too many warnings
   In file included from include/asm-generic/bug.h:18:0,
                    from arch/x86/include/asm/bug.h:83,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from net/ipv4/tcp_input.c:67:
   net/ipv4/tcp_input.c: In function 'tcp_cwnd_reduction':
   include/linux/kernel.h:812:29: warning: comparison of distinct pointer types lacks a cast
      (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                ^
   include/linux/kernel.h:826:4: note: in expansion of macro '__typecheck'
      (__typecheck(x, y) && __no_side_effects(x, y))
       ^~~~~~~~~~~
   include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
     __builtin_choose_expr(__safe_cmp(x, y), 117-                        ^~~~~~~~~~
   include/linux/kernel.h:852:19: note: in expansion of macro '__careful_cmp'
    #define max(x, y) __careful_cmp(x, y, >)
                      ^~~~~~~~~~~~~
   net/ipv4/tcp_input.c:2480:17: note: in expansion of macro 'max'
     tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
                    ^~~

vim +2480 net/ipv4/tcp_input.c

  2455	
  2456	void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int flag)
  2457	{
  2458		struct tcp_sock *tp = tcp_sk(sk);
  2459		int sndcnt = 0;
  2460		int delta = tp->snd_ssthresh - tcp_packets_in_flight(tp);
  2461	
  2462		if (newly_acked_sacked <= 0 || WARN_ON_ONCE(!tp->prior_cwnd))
  2463			return;
  2464	
  2465		tp->prr_delivered += newly_acked_sacked;
  2466		if (delta < 0) {
  2467			u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
  2468				       tp->prior_cwnd - 1;
  2469			sndcnt = div_u64(dividend, tp->prior_cwnd) - tp->prr_out;
  2470		} else if ((flag & FLAG_RETRANS_DATA_ACKED) &&
  2471			   !(flag & FLAG_LOST_RETRANS)) {
> 2472			sndcnt = min_t(int, delta,
  2473				       max_t(int, tp->prr_delivered - tp->prr_out,
  2474					     newly_acked_sacked) + 1);
  2475		} else {
  2476			sndcnt = min(delta, newly_acked_sacked);
  2477		}
  2478		/* Force a fast retransmit upon entering fast recovery */
  2479		sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
> 2480		tp->snd_cwnd = max(tcp_packets_in_flight(tp) + sndcnt, 2);
  2481	}
  2482	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-06-27  4:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-27  1:52 [PATCH net-next] tcp: force cwnd at least 2 in tcp_cwnd_reduction Lawrence Brakmo
2018-06-27  2:18 ` kbuild test robot
2018-06-27  4:22 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox