* [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing @ 2015-06-05 16:16 Martin KaFai Lau 2015-06-05 16:53 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Martin KaFai Lau @ 2015-06-05 16:16 UTC (permalink / raw) To: netdev; +Cc: Kernel Team, Eric Dumazet, Neal Cardwell, Yuchung Cheng The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20 [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0 [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230 [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80 [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90 [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0 [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0 The WARN_ON pointed out that tcp_skb_pcount (i.e. TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent. The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0. This patch is to set the pcount after skb_pull() was called in tcp_mtu_probe(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Grant Zhang <gzhang@fastly.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> --- net/ipv4/tcp_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index a369e8a..6c95c28 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk) skb->len, 0); } else { __pskb_trim_head(skb, copy); - tcp_set_skb_tso_segs(sk, skb, mss_now); } + tcp_set_skb_tso_segs(sk, skb, mss_now); + TCP_SKB_CB(skb)->seq += copy; } -- 1.8.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing 2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau @ 2015-06-05 16:53 ` Eric Dumazet 2015-06-05 18:02 ` Martin KaFai Lau 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2015-06-05 16:53 UTC (permalink / raw) To: Martin KaFai Lau; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng On Fri, Jun 5, 2015 at 9:16 AM, Martin KaFai Lau <kafai@fb.com> wrote: > The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): > > [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20 > [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0 > [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230 > [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80 > [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90 > [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0 > [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0 > > The WARN_ON pointed out that tcp_skb_pcount (i.e. > TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent. > > The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0. > > This patch is to set the pcount after skb_pull() was called > in tcp_mtu_probe(). > > Signed-off-by: Martin KaFai Lau <kafai@fb.com> > Reported-by: Grant Zhang <gzhang@fastly.com> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > --- > net/ipv4/tcp_output.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index a369e8a..6c95c28 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk) > skb->len, 0); > } else { > __pskb_trim_head(skb, copy); > - tcp_set_skb_tso_segs(sk, skb, mss_now); > } > + tcp_set_skb_tso_segs(sk, skb, mss_now); > + > TCP_SKB_CB(skb)->seq += copy; > } > > Sounds good, although I would simply get rid of all this complexity in this very unlikely path. Would you instead try the following ? diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk) } else { TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags & ~(TCPHDR_FIN|TCPHDR_PSH); - if (!skb_shinfo(skb)->nr_frags) { - skb_pull(skb, copy); - if (skb->ip_summed != CHECKSUM_PARTIAL) - skb->csum = csum_partial(skb->data, - skb->len, 0); - } else { - __pskb_trim_head(skb, copy); - tcp_set_skb_tso_segs(sk, skb, mss_now); - } - TCP_SKB_CB(skb)->seq += copy; + tcp_trim_head(sk, skb, copy); } len += copy; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing 2015-06-05 16:53 ` Eric Dumazet @ 2015-06-05 18:02 ` Martin KaFai Lau 2015-06-05 21:23 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Martin KaFai Lau @ 2015-06-05 18:02 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng On Fri, Jun 05, 2015 at 09:53:51AM -0700, Eric Dumazet wrote: > Sounds good, although I would simply get rid of all this complexity in > this very unlikely path. > > Would you instead try the following ? > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09 > 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk) > } else { > TCP_SKB_CB(nskb)->tcp_flags |= > TCP_SKB_CB(skb)->tcp_flags & > ~(TCPHDR_FIN|TCPHDR_PSH); > - if (!skb_shinfo(skb)->nr_frags) { > - skb_pull(skb, copy); > - if (skb->ip_summed != CHECKSUM_PARTIAL) > - skb->csum = csum_partial(skb->data, > - skb->len, 0); > - } else { > - __pskb_trim_head(skb, copy); > - tcp_set_skb_tso_segs(sk, skb, mss_now); > - } > - TCP_SKB_CB(skb)->seq += copy; > + tcp_trim_head(sk, skb, copy); tcp_trim_head() does not take the mss_now. Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing 2015-06-05 18:02 ` Martin KaFai Lau @ 2015-06-05 21:23 ` Eric Dumazet 2015-06-05 23:02 ` Martin KaFai Lau 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2015-06-05 21:23 UTC (permalink / raw) To: Martin KaFai Lau Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote: > tcp_trim_head() does not take the mss_now. > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it? It should be fine : packets not yet sent have tcp_skb_pcount()==0, so that tcp_init_tso_segs() can do the computation at the right time. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing 2015-06-05 21:23 ` Eric Dumazet @ 2015-06-05 23:02 ` Martin KaFai Lau 2015-06-05 23:45 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Martin KaFai Lau @ 2015-06-05 23:02 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote: > On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote: > > > tcp_trim_head() does not take the mss_now. > > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on > > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it? > > It should be fine : packets not yet sent have tcp_skb_pcount()==0, > so that tcp_init_tso_segs() can do the computation at the right time. hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0 but it may not be sent out immediately (like failing tcp_cwnd_test()). Later, this skb is considered for tcp_mtu_probe(), sliced and forgot to update the pcount, I think. I am probably missing something. Hence, I have a side question in tcp_init_tso_segs(), should pcount be also recalculated if (tso_segs == 1 && mss_now < skb->len)? Like this: diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c index a369e8a..15d1c44 100644 --- i/net/ipv4/tcp_output.c +++ w/net/ipv4/tcp_output.c @@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb, { int tso_segs = tcp_skb_pcount(skb); - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) { + if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) || + (tso_segs == 1 && mss_now < skb->len)) { tcp_set_skb_tso_segs(sk, skb, mss_now); tso_segs = tcp_skb_pcount(skb); } ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing 2015-06-05 23:02 ` Martin KaFai Lau @ 2015-06-05 23:45 ` Eric Dumazet 0 siblings, 0 replies; 6+ messages in thread From: Eric Dumazet @ 2015-06-05 23:45 UTC (permalink / raw) To: Martin KaFai Lau Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng On Fri, 2015-06-05 at 16:02 -0700, Martin KaFai Lau wrote: > On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote: > > On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote: > > > > > tcp_trim_head() does not take the mss_now. > > > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on > > > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it? > > > > It should be fine : packets not yet sent have tcp_skb_pcount()==0, > > so that tcp_init_tso_segs() can do the computation at the right time. > hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0 > but it may not be sent out immediately (like failing tcp_cwnd_test()). Later, > this skb is considered for tcp_mtu_probe(), sliced and forgot to update the > pcount, I think. Then clear pcount at this point, if this is a requirement. Or else the test on !tso_segs seems quite lazy to me. > > I am probably missing something. Hence, I have a side question > in tcp_init_tso_segs(), should pcount be also recalculated if > (tso_segs == 1 && mss_now < skb->len)? Like this: > > diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c > index a369e8a..15d1c44 100644 > --- i/net/ipv4/tcp_output.c > +++ w/net/ipv4/tcp_output.c > @@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb, > { > int tso_segs = tcp_skb_pcount(skb); > > - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) { > + if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) || > + (tso_segs == 1 && mss_now < skb->len)) { > tcp_set_skb_tso_segs(sk, skb, mss_now); > tso_segs = tcp_skb_pcount(skb); > } See above. Lets keep fast path as fast as possible ;) ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-06-05 23:45 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau 2015-06-05 16:53 ` Eric Dumazet 2015-06-05 18:02 ` Martin KaFai Lau 2015-06-05 21:23 ` Eric Dumazet 2015-06-05 23:02 ` Martin KaFai Lau 2015-06-05 23:45 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).