netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
@ 2015-06-05 16:16 Martin KaFai Lau
  2015-06-05 16:53 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 16:16 UTC (permalink / raw)
  To: netdev; +Cc: Kernel Team, Eric Dumazet, Neal Cardwell, Yuchung Cheng

The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():

[<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
[<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
[<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
[<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
[<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
[<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
[<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0

The WARN_ON pointed out that tcp_skb_pcount (i.e.
TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.

The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.

This patch is to set the pcount after skb_pull() was called
in tcp_mtu_probe().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Grant Zhang <gzhang@fastly.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a369e8a..6c95c28 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk)
 								 skb->len, 0);
 			} else {
 				__pskb_trim_head(skb, copy);
-				tcp_set_skb_tso_segs(sk, skb, mss_now);
 			}
+			tcp_set_skb_tso_segs(sk, skb, mss_now);
+
 			TCP_SKB_CB(skb)->seq += copy;
 		}
 
-- 
1.8.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
  2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau
@ 2015-06-05 16:53 ` Eric Dumazet
  2015-06-05 18:02   ` Martin KaFai Lau
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 16:53 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng

On Fri, Jun 5, 2015 at 9:16 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
>
> [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
> [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
> [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
> [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
> [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
> [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
> [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
>
> The WARN_ON pointed out that tcp_skb_pcount (i.e.
> TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
>
> The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
>
> This patch is to set the pcount after skb_pull() was called
> in tcp_mtu_probe().
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Reported-by: Grant Zhang <gzhang@fastly.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
>  net/ipv4/tcp_output.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a369e8a..6c95c28 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk)
>                                                                  skb->len, 0);
>                         } else {
>                                 __pskb_trim_head(skb, copy);
> -                               tcp_set_skb_tso_segs(sk, skb, mss_now);
>                         }
> +                       tcp_set_skb_tso_segs(sk, skb, mss_now);
> +
>                         TCP_SKB_CB(skb)->seq += copy;
>                 }
>
>

Sounds good, although I would simply get rid of all this complexity in
this very unlikely path.

Would you instead try the following ?

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk)
                } else {
                        TCP_SKB_CB(nskb)->tcp_flags |=
TCP_SKB_CB(skb)->tcp_flags &
                                                   ~(TCPHDR_FIN|TCPHDR_PSH);
-                       if (!skb_shinfo(skb)->nr_frags) {
-                               skb_pull(skb, copy);
-                               if (skb->ip_summed != CHECKSUM_PARTIAL)
-                                       skb->csum = csum_partial(skb->data,
-                                                                skb->len, 0);
-                       } else {
-                               __pskb_trim_head(skb, copy);
-                               tcp_set_skb_tso_segs(sk, skb, mss_now);
-                       }
-                       TCP_SKB_CB(skb)->seq += copy;
+                       tcp_trim_head(sk, skb, copy);
                }

                len += copy;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
  2015-06-05 16:53 ` Eric Dumazet
@ 2015-06-05 18:02   ` Martin KaFai Lau
  2015-06-05 21:23     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 18:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng

On Fri, Jun 05, 2015 at 09:53:51AM -0700, Eric Dumazet wrote:
> Sounds good, although I would simply get rid of all this complexity in
> this very unlikely path.
> 
> Would you instead try the following ?
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk)
>                 } else {
>                         TCP_SKB_CB(nskb)->tcp_flags |=
> TCP_SKB_CB(skb)->tcp_flags &
>                                                    ~(TCPHDR_FIN|TCPHDR_PSH);
> -                       if (!skb_shinfo(skb)->nr_frags) {
> -                               skb_pull(skb, copy);
> -                               if (skb->ip_summed != CHECKSUM_PARTIAL)
> -                                       skb->csum = csum_partial(skb->data,
> -                                                                skb->len, 0);
> -                       } else {
> -                               __pskb_trim_head(skb, copy);
> -                               tcp_set_skb_tso_segs(sk, skb, mss_now);
> -                       }
> -                       TCP_SKB_CB(skb)->seq += copy;
> +                       tcp_trim_head(sk, skb, copy);
tcp_trim_head() does not take the mss_now.
Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
  2015-06-05 18:02   ` Martin KaFai Lau
@ 2015-06-05 21:23     ` Eric Dumazet
  2015-06-05 23:02       ` Martin KaFai Lau
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 21:23 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng

On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:

> tcp_trim_head() does not take the mss_now.
> Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?

It should be fine : packets not yet sent have tcp_skb_pcount()==0,
so that tcp_init_tso_segs() can do the computation at the right time.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
  2015-06-05 21:23     ` Eric Dumazet
@ 2015-06-05 23:02       ` Martin KaFai Lau
  2015-06-05 23:45         ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 23:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng

On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:
> 
> > tcp_trim_head() does not take the mss_now.
> > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
> 
> It should be fine : packets not yet sent have tcp_skb_pcount()==0,
> so that tcp_init_tso_segs() can do the computation at the right time.
hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0
but it may not be sent out immediately (like failing tcp_cwnd_test()).  Later,
this skb is considered for tcp_mtu_probe(), sliced and forgot to update the
pcount, I think.

I am probably missing something.  Hence, I have a side question
in tcp_init_tso_segs(),  should pcount be also recalculated if
(tso_segs == 1 && mss_now < skb->len)?  Like this:

diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c
index a369e8a..15d1c44 100644
--- i/net/ipv4/tcp_output.c
+++ w/net/ipv4/tcp_output.c
@@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb,
 {
 	int tso_segs = tcp_skb_pcount(skb);

-	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
+	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) ||
+	    (tso_segs == 1 && mss_now < skb->len)) {
 		tcp_set_skb_tso_segs(sk, skb, mss_now);
 		tso_segs = tcp_skb_pcount(skb);
 	}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
  2015-06-05 23:02       ` Martin KaFai Lau
@ 2015-06-05 23:45         ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 23:45 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng

On Fri, 2015-06-05 at 16:02 -0700, Martin KaFai Lau wrote:
> On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote:
> > On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:
> > 
> > > tcp_trim_head() does not take the mss_now.
> > > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> > > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
> > 
> > It should be fine : packets not yet sent have tcp_skb_pcount()==0,
> > so that tcp_init_tso_segs() can do the computation at the right time.
> hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0
> but it may not be sent out immediately (like failing tcp_cwnd_test()).  Later,
> this skb is considered for tcp_mtu_probe(), sliced and forgot to update the
> pcount, I think.

Then clear pcount at this point, if this is a requirement.

Or else the test on !tso_segs seems quite lazy to me.

> 
> I am probably missing something.  Hence, I have a side question
> in tcp_init_tso_segs(),  should pcount be also recalculated if
> (tso_segs == 1 && mss_now < skb->len)?  Like this:
> 
> diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c
> index a369e8a..15d1c44 100644
> --- i/net/ipv4/tcp_output.c
> +++ w/net/ipv4/tcp_output.c
> @@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb,
>  {
>  	int tso_segs = tcp_skb_pcount(skb);
> 
> -	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> +	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) ||
> +	    (tso_segs == 1 && mss_now < skb->len)) {
>  		tcp_set_skb_tso_segs(sk, skb, mss_now);
>  		tso_segs = tcp_skb_pcount(skb);
>  	}


See above. Lets keep fast path as fast as possible ;)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-06-05 23:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau
2015-06-05 16:53 ` Eric Dumazet
2015-06-05 18:02   ` Martin KaFai Lau
2015-06-05 21:23     ` Eric Dumazet
2015-06-05 23:02       ` Martin KaFai Lau
2015-06-05 23:45         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).