* Why does tcp collapse behavior depend on nr_frags? @ 2016-11-01 8:15 Ilya Lesokhin 2016-11-01 13:33 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Ilya Lesokhin @ 2016-11-01 8:15 UTC (permalink / raw) To: netdev@vger.kernel.org; +Cc: Eric Dumazet, tls-fpga-sw-dev Hi, I've notice that tcp_can_collapse() returns false if skb_shinfo(skb)->nr_frags != 0. Is there a reason why we want to base the collapse decision in retransmission on whether the data is located in a frag or the linear part? The relevant commit is tcp: collapse more than two on retransmission ('4a17fc3add594fcc1c778e93a95b6ecf47f630e5') Thanks, Ilya ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Why does tcp collapse behavior depend on nr_frags? 2016-11-01 8:15 Why does tcp collapse behavior depend on nr_frags? Ilya Lesokhin @ 2016-11-01 13:33 ` Eric Dumazet 2016-11-01 15:32 ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2016-11-01 13:33 UTC (permalink / raw) To: Ilya Lesokhin; +Cc: netdev@vger.kernel.org, tls-fpga-sw-dev, Ilpo Järvinen On Tue, 2016-11-01 at 08:15 +0000, Ilya Lesokhin wrote: > Hi, > I've notice that tcp_can_collapse() returns false if skb_shinfo(skb)->nr_frags != 0. > Is there a reason why we want to base the collapse decision in retransmission on whether > the data is located in a frag or the linear part? > > The relevant commit is > tcp: collapse more than two on retransmission ('4a17fc3add594fcc1c778e93a95b6ecf47f630e5') > > Thanks, > Ilya Good point. I am guessing you see this after commit 3613b3dbd1ade9a6a626dae1f608c57638eb5e8a ("tcp: prepare skbs for better sack shifting") ? Problem is that the left skb might have no skb_availroom() anyway... We could theoretically use the same helpers we use at sack shifting, but are these collapse events frequent enough we should care ? Thanks ! ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next] tcp: enhance tcp collapsing 2016-11-01 13:33 ` Eric Dumazet @ 2016-11-01 15:32 ` Eric Dumazet 2016-11-01 16:35 ` Neal Cardwell 2016-11-01 17:53 ` [PATCH v2 " Eric Dumazet 0 siblings, 2 replies; 8+ messages in thread From: Eric Dumazet @ 2016-11-01 15:32 UTC (permalink / raw) To: Ilya Lesokhin, David Miller Cc: netdev@vger.kernel.org, Ilpo Järvinen, Neal Cardwell, Yuchung Cheng From: Eric Dumazet <edumazet@google.com> As Ilya Lesokhin suggested, we can collapse two skbs at retransmit time even if the skb at the right has fragments. We simply have to use more generic skb_copy_bits() instead of skb_copy_from_linear_data() in tcp_collapse_retrans() Tested: Used following packetdrill test // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> // Check that TCP collapse works : +0 > P. 1:1001(1000) ack 1 Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> --- net/ipv4/tcp_output.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 896e9dfbdb5cd9ca0fa003f6be2c5cd332dde7cf..ad668d97fbb1660a9ad44ddb459ea45e15a22de1 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2529,8 +2529,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb) tcp_unlink_write_queue(next_skb, sk); - skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size), - next_skb_size); + skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size), next_skb_size); if (next_skb->ip_summed == CHECKSUM_PARTIAL) skb->ip_summed = CHECKSUM_PARTIAL; @@ -2567,14 +2566,11 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb) { if (tcp_skb_pcount(skb) > 1) return false; - /* TODO: SACK collapsing could be used to remove this condition */ - if (skb_shinfo(skb)->nr_frags != 0) - return false; if (skb_cloned(skb)) return false; if (skb == tcp_send_head(sk)) return false; - /* Some heurestics for collapsing over SACK'd could be invented */ + /* Some heuristics for collapsing over SACK'd could be invented */ if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) return false; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: enhance tcp collapsing 2016-11-01 15:32 ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet @ 2016-11-01 16:35 ` Neal Cardwell 2016-11-01 17:39 ` Eric Dumazet 2016-11-01 17:53 ` [PATCH v2 " Eric Dumazet 1 sibling, 1 reply; 8+ messages in thread From: Neal Cardwell @ 2016-11-01 16:35 UTC (permalink / raw) To: Eric Dumazet Cc: Ilya Lesokhin, David Miller, netdev@vger.kernel.org, Ilpo Järvinen, Yuchung Cheng On Tue, Nov 1, 2016 at 11:32 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > > As Ilya Lesokhin suggested, we can collapse two skbs at retransmit > time even if the skb at the right has fragments. > > We simply have to use more generic skb_copy_bits() instead of > skb_copy_from_linear_data() in tcp_collapse_retrans() > > Tested: > > Used following packetdrill test > > // Establish a connection. > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> > +.100 < . 1:1(0) ack 1 win 257 > +0 accept(3, ..., ...) = 4 > > +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 > +0 write(4, ..., 200) = 200 > +0 > P. 1:201(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 201:401(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 401:601(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 601:801(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 801:1001(200) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1001:1101(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1101:1201(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1201:1301(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1301:1401(100) ack 1 > > +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> > // Check that TCP collapse works : > +0 > P. 1:1001(1000) ack 1 > > > Reported-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > --- Acked-by: Neal Cardwell <ncardwell@google.com> Thanks, Eric! neal ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: enhance tcp collapsing 2016-11-01 16:35 ` Neal Cardwell @ 2016-11-01 17:39 ` Eric Dumazet 2016-11-01 17:47 ` Eric Dumazet 0 siblings, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2016-11-01 17:39 UTC (permalink / raw) To: Neal Cardwell Cc: Ilya Lesokhin, David Miller, netdev@vger.kernel.org, Ilpo Järvinen, Yuchung Cheng On Tue, 2016-11-01 at 12:35 -0400, Neal Cardwell wrote: > On Tue, Nov 1, 2016 at 11:32 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > From: Eric Dumazet <edumazet@google.com> > > > > As Ilya Lesokhin suggested, we can collapse two skbs at retransmit > > time even if the skb at the right has fragments. > > > > We simply have to use more generic skb_copy_bits() instead of > > skb_copy_from_linear_data() in tcp_collapse_retrans() > > > > > > Reported-by: Ilya Lesokhin <ilyal@mellanox.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > --- > > Acked-by: Neal Cardwell <ncardwell@google.com> > Thanks Neal. David please hold this patch, further tests show an issue I need to track and fix. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: enhance tcp collapsing 2016-11-01 17:39 ` Eric Dumazet @ 2016-11-01 17:47 ` Eric Dumazet 0 siblings, 0 replies; 8+ messages in thread From: Eric Dumazet @ 2016-11-01 17:47 UTC (permalink / raw) To: Neal Cardwell Cc: Ilya Lesokhin, David Miller, netdev@vger.kernel.org, Ilpo Järvinen, Yuchung Cheng On Tue, 2016-11-01 at 10:39 -0700, Eric Dumazet wrote: > Thanks Neal. > > David please hold this patch, further tests show an issue I need to > track and fix. The issue is that the right skb can be a FIN packet with no data, merged to a left skb with frag(s). Thus I need to call skb_copy_bits() only if there is something to copy, otherwise skb_put() can BUG_ON in SKB_LINEAR_ASSERT(), even if @len is 0. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 net-next] tcp: enhance tcp collapsing 2016-11-01 15:32 ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet 2016-11-01 16:35 ` Neal Cardwell @ 2016-11-01 17:53 ` Eric Dumazet 2016-11-01 19:35 ` Neal Cardwell 1 sibling, 1 reply; 8+ messages in thread From: Eric Dumazet @ 2016-11-01 17:53 UTC (permalink / raw) To: Ilya Lesokhin, David Miller Cc: netdev@vger.kernel.org, Ilpo Järvinen, Neal Cardwell, Yuchung Cheng From: Eric Dumazet <edumazet@google.com> As Ilya Lesokhin suggested, we can collapse two skbs at retransmit time even if the skb at the right has fragments. We simply have to use more generic skb_copy_bits() instead of skb_copy_from_linear_data() in tcp_collapse_retrans() Also need to guard this skb_copy_bits() in case there is nothing to copy, otherwise skb_put() could panic if left skb has frags. Tested: Used following packetdrill test // Establish a connection. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> // Check that TCP collapse works : +0 > P. 1:1001(1000) ack 1 Reported-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_output.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 896e9dfbdb5cd9ca0fa003f6be2c5cd332dde7cf..f57b5aa51b59cf0a58975fe34a7dcdb886ea8c50 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2529,8 +2529,9 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb) tcp_unlink_write_queue(next_skb, sk); - skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size), - next_skb_size); + if (next_skb_size) + skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size), + next_skb_size); if (next_skb->ip_summed == CHECKSUM_PARTIAL) skb->ip_summed = CHECKSUM_PARTIAL; @@ -2567,14 +2568,11 @@ static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb) { if (tcp_skb_pcount(skb) > 1) return false; - /* TODO: SACK collapsing could be used to remove this condition */ - if (skb_shinfo(skb)->nr_frags != 0) - return false; if (skb_cloned(skb)) return false; if (skb == tcp_send_head(sk)) return false; - /* Some heurestics for collapsing over SACK'd could be invented */ + /* Some heuristics for collapsing over SACK'd could be invented */ if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) return false; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 net-next] tcp: enhance tcp collapsing 2016-11-01 17:53 ` [PATCH v2 " Eric Dumazet @ 2016-11-01 19:35 ` Neal Cardwell 0 siblings, 0 replies; 8+ messages in thread From: Neal Cardwell @ 2016-11-01 19:35 UTC (permalink / raw) To: Eric Dumazet Cc: Ilya Lesokhin, David Miller, netdev@vger.kernel.org, Ilpo Järvinen, Yuchung Cheng On Tue, Nov 1, 2016 at 1:53 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > As Ilya Lesokhin suggested, we can collapse two skbs at retransmit > time even if the skb at the right has fragments. > > We simply have to use more generic skb_copy_bits() instead of > skb_copy_from_linear_data() in tcp_collapse_retrans() > > Also need to guard this skb_copy_bits() in case there is nothing to > copy, otherwise skb_put() could panic if left skb has frags. > > Tested: > > Used following packetdrill test > > // Establish a connection. > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> > +.100 < . 1:1(0) ack 1 win 257 > +0 accept(3, ..., ...) = 4 > > +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 > +0 write(4, ..., 200) = 200 > +0 > P. 1:201(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 201:401(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 401:601(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 601:801(200) ack 1 > +.001 write(4, ..., 200) = 200 > +0 > P. 801:1001(200) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1001:1101(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1101:1201(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1201:1301(100) ack 1 > +.001 write(4, ..., 100) = 100 > +0 > P. 1301:1401(100) ack 1 > > +.100 < . 1:1(0) ack 1 win 257 <nop,nop,sack 1001:1401> > // Check that TCP collapse works : > +0 > P. 1:1001(1000) ack 1 > > > Reported-by: Ilya Lesokhin <ilyal@mellanox.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > --- Acked-by: Neal Cardwell <ncardwell@google.com> Thanks, Eric. :-) neal ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-11-01 19:35 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-11-01 8:15 Why does tcp collapse behavior depend on nr_frags? Ilya Lesokhin 2016-11-01 13:33 ` Eric Dumazet 2016-11-01 15:32 ` [PATCH net-next] tcp: enhance tcp collapsing Eric Dumazet 2016-11-01 16:35 ` Neal Cardwell 2016-11-01 17:39 ` Eric Dumazet 2016-11-01 17:47 ` Eric Dumazet 2016-11-01 17:53 ` [PATCH v2 " Eric Dumazet 2016-11-01 19:35 ` Neal Cardwell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).