* [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams @ 2015-09-23 16:49 Bendik Rønning Opstad 2015-09-23 18:05 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 5+ messages in thread From: Bendik Rønning Opstad @ 2015-09-23 16:49 UTC (permalink / raw) To: David S. Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Eric Dumazet, Neal Cardwell Cc: netdev, Bendik Rønning Opstad, Andreas Petlund, Carsten Griwodz, Jonas Markussen, Kenneth Klette Jonassen, Mads Johannessen Application limited streams such as thin streams, that transmit small amounts of payload in relatively few packets per RTT, can be prevented from growing the CWND when in congestion avoidance. This leads to increased sojourn times for data segments in streams that often transmit time-dependent data. Currently, a connection is considered CWND limited only after having successfully transmitted at least one packet with new data, while at the same time failing to transmit some unsent data from the output queue because the CWND is full. Applications that produce small amounts of data may be left in a state where it is never considered to be CWND limited, because all unsent data is successfully transmitted each time an incoming ACK opens up for more data to be transmitted in the send window. Fix by always testing whether the CWND is fully used after successful packet transmissions, such that a connection is considered CWND limited whenever the CWND has been filled. This is the correct behavior as specified in RFC2861 (section 3.1). Cc: Andreas Petlund <apetlund@simula.no> Cc: Carsten Griwodz <griff@simula.no> Cc: Jonas Markussen <jonassm@ifi.uio.no> Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Cc: Mads Johannessen <madsjoh@ifi.uio.no> Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> --- Changes in v2: - Instead of updating tp->is_cwnd_limited after failing to send any packets, test whether CWND is full after successfull packet transmissions. - Updating commit message according to changes. net/ipv4/tcp_output.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 4cd0b50..57a586f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1827,7 +1827,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb, /* Ok, it looks like it is advisable to defer. */ - if (cong_win < send_win && cong_win < skb->len) + if (cong_win < send_win && cong_win <= skb->len) *is_cwnd_limited = true; return true; @@ -2060,7 +2060,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, cwnd_quota = tcp_cwnd_test(tp, skb); if (!cwnd_quota) { - is_cwnd_limited = true; if (push_one == 2) /* Force out a loss probe pkt. */ cwnd_quota = 1; @@ -2142,6 +2141,7 @@ repair: /* Send one loss probe per tail loss episode. */ if (push_one != 2) tcp_schedule_loss_probe(sk); + is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd); tcp_cwnd_validate(sk, is_cwnd_limited); return false; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams 2015-09-23 16:49 [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams Bendik Rønning Opstad @ 2015-09-23 18:05 ` Eric Dumazet 2015-09-23 18:37 ` Neal Cardwell 2015-09-27 16:33 ` Eric Dumazet 2 siblings, 0 replies; 5+ messages in thread From: Eric Dumazet @ 2015-09-23 18:05 UTC (permalink / raw) To: Bendik Rønning Opstad Cc: David S. Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Eric Dumazet, Neal Cardwell, netdev, Bendik Rønning Opstad, Andreas Petlund, Carsten Griwodz, Jonas Markussen, Kenneth Klette Jonassen, Mads Johannessen On Wed, 2015-09-23 at 18:49 +0200, Bendik Rønning Opstad wrote: > Application limited streams such as thin streams, that transmit small > amounts of payload in relatively few packets per RTT, can be prevented > from growing the CWND when in congestion avoidance. This leads to > increased sojourn times for data segments in streams that often transmit > time-dependent data. > > Currently, a connection is considered CWND limited only after having > successfully transmitted at least one packet with new data, while at the > same time failing to transmit some unsent data from the output queue > because the CWND is full. Applications that produce small amounts of > data may be left in a state where it is never considered to be CWND > limited, because all unsent data is successfully transmitted each time > an incoming ACK opens up for more data to be transmitted in the send > window. > > Fix by always testing whether the CWND is fully used after successful > packet transmissions, such that a connection is considered CWND limited > whenever the CWND has been filled. This is the correct behavior as > specified in RFC2861 (section 3.1). > > Cc: Andreas Petlund <apetlund@simula.no> > Cc: Carsten Griwodz <griff@simula.no> > Cc: Jonas Markussen <jonassm@ifi.uio.no> > Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> > Cc: Mads Johannessen <madsjoh@ifi.uio.no> > Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> > --- Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Eric Dumazet <edumazet@google.com> Tested with following packetdrill script : // Establish a connection and send 1 MSS. 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(3, IPPROTO_TCP, TCP_CONGESTION, "reno", 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.200 < . 1:1(0) ack 1 win 65535 +0 accept(3, ..., ...) = 4 +0 write(4, ..., 100) = 100 +0 > P. 1:101(100) ack 1 +.000 %{ print tcpi_rto }% // TLP +.500~+.505 > P. 1:101(100) ack 1 // RTO +.600~+.605 > P. 1:101(100) ack 1 +.200 < . 1:1(0) ack 101 win 65535 // cwnd should be 2, ssthresh should be 7 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 2.000 write(4, ..., 100) = 100 +0 > P. 101:201(100) ack 1 // TLP +.500~+.505 > P. 101:201(100) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% // RTO +1.200~+1.210 > P. 101:201(100) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% +.200 < . 1:1(0) ack 201 win 65535 4.00 write(4, ..., 100) = 100 +0 > P. 201:301(100) ack 1 4.01 write(4, ..., 100) = 100 +0 > P. 301:401(100) ack 1 4.02 write(4, ..., 100) = 100 4.03 write(4, ..., 100) = 100 4.04 write(4, ..., 100) = 100 4.05 write(4, ..., 100) = 100 4.06 write(4, ..., 100) = 100 4.07 write(4, ..., 100) = 100 4.08 write(4, ..., 100) = 100 4.09 write(4, ..., 100) = 100 4.10 write(4, ..., 100) = 100 4.11 write(4, ..., 100) = 100 4.12 write(4, ..., 100) = 100 4.13 write(4, ..., 100) = 100 4.14 write(4, ..., 100) = 100 4.15 write(4, ..., 100) = 100 4.16 write(4, ..., 100) = 100 4.17 write(4, ..., 100) = 100 4.18 write(4, ..., 100) = 100 4.19 write(4, ..., 100) = 100 4.20 write(4, ..., 100) = 100 4.20 < . 1:1(0) ack 301 win 65535 4.20 > . 401:1401(1000) ack 1 4.21 write(4, ..., 100) = 100 4.21 < . 1:1(0) ack 401 win 65535 4.21 > P. 1401:2401(1000) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 4.22 write(4, ..., 100) = 100 4.22 > P. 2401:2501(100) ack 1 4.23 write(4, ..., 100) = 100 4.24 write(4, ..., 100) = 100 4.25 write(4, ..., 100) = 100 4.26 write(4, ..., 100) = 100 4.27 write(4, ..., 100) = 100 4.28 write(4, ..., 100) = 100 4.29 write(4, ..., 100) = 100 4.31 write(4, ..., 100) = 100 4.32 write(4, ..., 100) = 100 4.33 write(4, ..., 100) = 100 4.34 write(4, ..., 100) = 100 4.35 write(4, ..., 100) = 100 4.36 write(4, ..., 100) = 100 4.37 write(4, ..., 100) = 100 4.38 write(4, ..., 100) = 100 4.39 write(4, ..., 100) = 100 4.40 write(4, ..., 100) = 100 4.40 < . 1:1(0) ack 1401 win 65535 4.40 > . 2501:3501(1000) ack 1 4.41 write(4, ..., 100) = 100 4.41 < . 1:1(0) ack 2401 win 65535 4.41 > P. 3501:4301(800) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 4.42 write(4, ..., 100) = 100 4.43 write(4, ..., 100) = 100 4.44 write(4, ..., 100) = 100 4.45 write(4, ..., 100) = 100 4.46 write(4, ..., 100) = 100 4.47 write(4, ..., 100) = 100 4.48 write(4, ..., 100) = 100 4.49 write(4, ..., 100) = 100 4.50 write(4, ..., 100) = 100 4.51 write(4, ..., 100) = 100 4.52 write(4, ..., 100) = 100 4.53 write(4, ..., 100) = 100 4.54 write(4, ..., 100) = 100 4.55 write(4, ..., 100) = 100 4.56 write(4, ..., 100) = 100 4.57 write(4, ..., 100) = 100 4.58 write(4, ..., 100) = 100 4.59 write(4, ..., 100) = 100 4.60 write(4, ..., 100) = 100 4.60 < . 1:1(0) ack 3401 win 65535 4.60 > P. 4301:6201(1900) ack 1 4.61 write(4, ..., 100) = 100 4.61 < . 1:1(0) ack 4301 win 65535 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 4.61 > P. 6201:6301(100) ack 1 4.62 write(4, ..., 100) = 100 4.62 > P. 6301:6401(100) ack 1 4.63 write(4, ..., 100) = 100 4.64 write(4, ..., 100) = 100 4.65 write(4, ..., 100) = 100 4.66 write(4, ..., 100) = 100 4.67 write(4, ..., 100) = 100 4.68 write(4, ..., 100) = 100 4.69 write(4, ..., 100) = 100 4.70 write(4, ..., 100) = 100 4.71 write(4, ..., 100) = 100 4.72 write(4, ..., 100) = 100 4.73 write(4, ..., 100) = 100 4.74 write(4, ..., 100) = 100 4.75 write(4, ..., 100) = 100 4.76 write(4, ..., 100) = 100 4.77 write(4, ..., 100) = 100 4.78 write(4, ..., 100) = 100 4.79 write(4, ..., 100) = 100 4.80 write(4, ..., 100) = 100 4.80 < . 1:1(0) ack 5301 win 65535 4.80 > . 6401:7401(1000) ack 1 4.81 write(4, ..., 100) = 100 4.81 < . 1:1(0) ack 6301 win 65535 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 4.81 > P. 7401:8301(900) ack 1 4.82 write(4, ..., 100) = 100 4.82 > P. 8301:8401(100) ack 1 4.83 write(4, ..., 100) = 100 4.83 > P. 8401:8501(100) ack 1 4.84 write(4, ..., 100) = 100 4.85 write(4, ..., 100) = 100 4.86 write(4, ..., 100) = 100 4.87 write(4, ..., 100) = 100 4.88 write(4, ..., 100) = 100 4.89 write(4, ..., 100) = 100 4.90 write(4, ..., 100) = 100 4.91 write(4, ..., 100) = 100 4.92 write(4, ..., 100) = 100 4.93 write(4, ..., 100) = 100 4.94 write(4, ..., 100) = 100 4.95 write(4, ..., 100) = 100 4.96 write(4, ..., 100) = 100 4.97 write(4, ..., 100) = 100 4.98 write(4, ..., 100) = 100 4.99 write(4, ..., 100) = 100 5.00 write(4, ..., 100) = 100 5.00 < . 1:1(0) ack 7301 win 65535 5.00 > . 8501:9501(1000) ack 1 5.01 write(4, ..., 100) = 100 5.01 < . 1:1(0) ack 8301 win 65535 5.01 > P. 9501:10301(800) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% 5.02 write(4, ..., 100) = 100 5.02 > P. 10301:10401(100) ack 1 5.03 write(4, ..., 100) = 100 5.04 write(4, ..., 100) = 100 5.05 write(4, ..., 100) = 100 5.06 write(4, ..., 100) = 100 5.07 write(4, ..., 100) = 100 5.08 write(4, ..., 100) = 100 5.09 write(4, ..., 100) = 100 5.10 write(4, ..., 100) = 100 5.11 write(4, ..., 100) = 100 5.12 write(4, ..., 100) = 100 5.13 write(4, ..., 100) = 100 5.14 write(4, ..., 100) = 100 5.15 write(4, ..., 100) = 100 5.16 write(4, ..., 100) = 100 5.17 write(4, ..., 100) = 100 5.18 write(4, ..., 100) = 100 5.19 write(4, ..., 100) = 100 5.20 write(4, ..., 100) = 100 5.20 < . 1:1(0) ack 9301 win 65535 5.20 > P. 10401:12201(1800) ack 1 +0 %{ print "tcpi_snd_cwnd=%d tcpi_snd_ssthresh=%d" % (tcpi_snd_cwnd, tcpi_snd_ssthresh) }% Result : 601000 tcpi_snd_cwnd=2 tcpi_snd_ssthresh=5 tcpi_snd_cwnd=2 tcpi_snd_ssthresh=5 tcpi_snd_cwnd=1 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=3 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=3 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=4 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=5 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=5 tcpi_snd_ssthresh=2 tcpi_snd_cwnd=6 tcpi_snd_ssthresh=2 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams 2015-09-23 16:49 [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams Bendik Rønning Opstad 2015-09-23 18:05 ` Eric Dumazet @ 2015-09-23 18:37 ` Neal Cardwell 2015-09-27 16:33 ` Eric Dumazet 2 siblings, 0 replies; 5+ messages in thread From: Neal Cardwell @ 2015-09-23 18:37 UTC (permalink / raw) To: Bendik Rønning Opstad Cc: David S. Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Eric Dumazet, Netdev, Bendik Rønning Opstad, Andreas Petlund, Carsten Griwodz, Jonas Markussen, Kenneth Klette Jonassen, Mads Johannessen On Wed, Sep 23, 2015 at 12:49 PM, Bendik Rønning Opstad <bro.devel@gmail.com> wrote: > Application limited streams such as thin streams, that transmit small > amounts of payload in relatively few packets per RTT, can be prevented > from growing the CWND when in congestion avoidance. This leads to > increased sojourn times for data segments in streams that often transmit > time-dependent data. > > Currently, a connection is considered CWND limited only after having > successfully transmitted at least one packet with new data, while at the > same time failing to transmit some unsent data from the output queue > because the CWND is full. Applications that produce small amounts of > data may be left in a state where it is never considered to be CWND > limited, because all unsent data is successfully transmitted each time > an incoming ACK opens up for more data to be transmitted in the send > window. > > Fix by always testing whether the CWND is fully used after successful > packet transmissions, such that a connection is considered CWND limited > whenever the CWND has been filled. This is the correct behavior as > specified in RFC2861 (section 3.1). > > Cc: Andreas Petlund <apetlund@simula.no> > Cc: Carsten Griwodz <griff@simula.no> > Cc: Jonas Markussen <jonassm@ifi.uio.no> > Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> > Cc: Mads Johannessen <madsjoh@ifi.uio.no> > Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> > --- Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> I ran all the Google packetdrill test cases with this patched in, and all still pass. I also re-ran the test I posted in the v1 thread (based on Eric's test), and it still passes. neal ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams 2015-09-23 16:49 [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams Bendik Rønning Opstad 2015-09-23 18:05 ` Eric Dumazet 2015-09-23 18:37 ` Neal Cardwell @ 2015-09-27 16:33 ` Eric Dumazet 2015-09-29 5:37 ` David Miller 2 siblings, 1 reply; 5+ messages in thread From: Eric Dumazet @ 2015-09-27 16:33 UTC (permalink / raw) To: Bendik Rønning Opstad, David Miller Cc: Neal Cardwell, netdev, Bendik Rønning Opstad On Wed, 2015-09-23 at 18:49 +0200, Bendik Rønning Opstad wrote: > Application limited streams such as thin streams, that transmit small > amounts of payload in relatively few packets per RTT, can be prevented > from growing the CWND when in congestion avoidance. This leads to > increased sojourn times for data segments in streams that often transmit > time-dependent data. > > Currently, a connection is considered CWND limited only after having > successfully transmitted at least one packet with new data, while at the > same time failing to transmit some unsent data from the output queue > because the CWND is full. Applications that produce small amounts of > data may be left in a state where it is never considered to be CWND > limited, because all unsent data is successfully transmitted each time > an incoming ACK opens up for more data to be transmitted in the send > window. > > Fix by always testing whether the CWND is fully used after successful > packet transmissions, such that a connection is considered CWND limited > whenever the CWND has been filled. This is the correct behavior as > specified in RFC2861 (section 3.1). > > Cc: Andreas Petlund <apetlund@simula.no> > Cc: Carsten Griwodz <griff@simula.no> > Cc: Jonas Markussen <jonassm@ifi.uio.no> > Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> > Cc: Mads Johannessen <madsjoh@ifi.uio.no> > Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com> > --- > Changes in v2: > - Instead of updating tp->is_cwnd_limited after failing to send > any packets, test whether CWND is full after successfull packet > transmissions. > - Updating commit message according to changes. > > net/ipv4/tcp_output.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 4cd0b50..57a586f 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1827,7 +1827,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb, > > /* Ok, it looks like it is advisable to defer. */ > > - if (cong_win < send_win && cong_win < skb->len) > + if (cong_win < send_win && cong_win <= skb->len) > *is_cwnd_limited = true; > > return true; > @@ -2060,7 +2060,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > cwnd_quota = tcp_cwnd_test(tp, skb); > if (!cwnd_quota) { > - is_cwnd_limited = true; > if (push_one == 2) > /* Force out a loss probe pkt. */ > cwnd_quota = 1; > @@ -2142,6 +2141,7 @@ repair: > /* Send one loss probe per tail loss episode. */ > if (push_one != 2) > tcp_schedule_loss_probe(sk); > + is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd); > tcp_cwnd_validate(sk, is_cwnd_limited); > return false; > } David, any idea of what happened to Bendik patch ? https://patchwork.ozlabs.org/patch/521765 Do we need to re-submit or something ? Thanks ! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams 2015-09-27 16:33 ` Eric Dumazet @ 2015-09-29 5:37 ` David Miller 0 siblings, 0 replies; 5+ messages in thread From: David Miller @ 2015-09-29 5:37 UTC (permalink / raw) To: eric.dumazet; +Cc: bro.devel, ncardwell, netdev, bro.devel+kernel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 27 Sep 2015 09:33:00 -0700 > David, any idea of what happened to Bendik patch ? > > https://patchwork.ozlabs.org/patch/521765 > > Do we need to re-submit or something ? Sorry, applied and build testing, I don't know how this disappeared like that :-) ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-09-29 5:37 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-23 16:49 [PATCH v2 net-next] tcp: Fix CWV being too strict on thin streams Bendik Rønning Opstad 2015-09-23 18:05 ` Eric Dumazet 2015-09-23 18:37 ` Neal Cardwell 2015-09-27 16:33 ` Eric Dumazet 2015-09-29 5:37 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).