* [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
@ 2015-06-05 16:16 Martin KaFai Lau
2015-06-05 16:53 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 16:16 UTC (permalink / raw)
To: netdev; +Cc: Kernel Team, Eric Dumazet, Neal Cardwell, Yuchung Cheng
The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
[<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
[<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
[<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
[<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
[<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
[<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
[<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
The WARN_ON pointed out that tcp_skb_pcount (i.e.
TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
This patch is to set the pcount after skb_pull() was called
in tcp_mtu_probe().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Grant Zhang <gzhang@fastly.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
net/ipv4/tcp_output.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a369e8a..6c95c28 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk)
skb->len, 0);
} else {
__pskb_trim_head(skb, copy);
- tcp_set_skb_tso_segs(sk, skb, mss_now);
}
+ tcp_set_skb_tso_segs(sk, skb, mss_now);
+
TCP_SKB_CB(skb)->seq += copy;
}
--
1.8.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau
@ 2015-06-05 16:53 ` Eric Dumazet
2015-06-05 18:02 ` Martin KaFai Lau
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 16:53 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng
On Fri, Jun 5, 2015 at 9:16 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
>
> [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
> [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
> [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
> [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
> [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
> [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
> [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
>
> The WARN_ON pointed out that tcp_skb_pcount (i.e.
> TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
>
> The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
>
> This patch is to set the pcount after skb_pull() was called
> in tcp_mtu_probe().
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Reported-by: Grant Zhang <gzhang@fastly.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> net/ipv4/tcp_output.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a369e8a..6c95c28 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1984,8 +1984,9 @@ static int tcp_mtu_probe(struct sock *sk)
> skb->len, 0);
> } else {
> __pskb_trim_head(skb, copy);
> - tcp_set_skb_tso_segs(sk, skb, mss_now);
> }
> + tcp_set_skb_tso_segs(sk, skb, mss_now);
> +
> TCP_SKB_CB(skb)->seq += copy;
> }
>
>
Sounds good, although I would simply get rid of all this complexity in
this very unlikely path.
Would you instead try the following ?
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk)
} else {
TCP_SKB_CB(nskb)->tcp_flags |=
TCP_SKB_CB(skb)->tcp_flags &
~(TCPHDR_FIN|TCPHDR_PSH);
- if (!skb_shinfo(skb)->nr_frags) {
- skb_pull(skb, copy);
- if (skb->ip_summed != CHECKSUM_PARTIAL)
- skb->csum = csum_partial(skb->data,
- skb->len, 0);
- } else {
- __pskb_trim_head(skb, copy);
- tcp_set_skb_tso_segs(sk, skb, mss_now);
- }
- TCP_SKB_CB(skb)->seq += copy;
+ tcp_trim_head(sk, skb, copy);
}
len += copy;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
2015-06-05 16:53 ` Eric Dumazet
@ 2015-06-05 18:02 ` Martin KaFai Lau
2015-06-05 21:23 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 18:02 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Kernel Team, Neal Cardwell, Yuchung Cheng
On Fri, Jun 05, 2015 at 09:53:51AM -0700, Eric Dumazet wrote:
> Sounds good, although I would simply get rid of all this complexity in
> this very unlikely path.
>
> Would you instead try the following ?
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index eeb59befaf06867b00e1dd6ded7742b2f0bcd821..41ef2c96b2e810e289d6d773e4be09c493b99c09
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1987,16 +1987,7 @@ static int tcp_mtu_probe(struct sock *sk)
> } else {
> TCP_SKB_CB(nskb)->tcp_flags |=
> TCP_SKB_CB(skb)->tcp_flags &
> ~(TCPHDR_FIN|TCPHDR_PSH);
> - if (!skb_shinfo(skb)->nr_frags) {
> - skb_pull(skb, copy);
> - if (skb->ip_summed != CHECKSUM_PARTIAL)
> - skb->csum = csum_partial(skb->data,
> - skb->len, 0);
> - } else {
> - __pskb_trim_head(skb, copy);
> - tcp_set_skb_tso_segs(sk, skb, mss_now);
> - }
> - TCP_SKB_CB(skb)->seq += copy;
> + tcp_trim_head(sk, skb, copy);
tcp_trim_head() does not take the mss_now.
Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
2015-06-05 18:02 ` Martin KaFai Lau
@ 2015-06-05 21:23 ` Eric Dumazet
2015-06-05 23:02 ` Martin KaFai Lau
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 21:23 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng
On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:
> tcp_trim_head() does not take the mss_now.
> Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
It should be fine : packets not yet sent have tcp_skb_pcount()==0,
so that tcp_init_tso_segs() can do the computation at the right time.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
2015-06-05 21:23 ` Eric Dumazet
@ 2015-06-05 23:02 ` Martin KaFai Lau
2015-06-05 23:45 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Martin KaFai Lau @ 2015-06-05 23:02 UTC (permalink / raw)
To: Eric Dumazet
Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng
On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:
>
> > tcp_trim_head() does not take the mss_now.
> > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
>
> It should be fine : packets not yet sent have tcp_skb_pcount()==0,
> so that tcp_init_tso_segs() can do the computation at the right time.
hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0
but it may not be sent out immediately (like failing tcp_cwnd_test()). Later,
this skb is considered for tcp_mtu_probe(), sliced and forgot to update the
pcount, I think.
I am probably missing something. Hence, I have a side question
in tcp_init_tso_segs(), should pcount be also recalculated if
(tso_segs == 1 && mss_now < skb->len)? Like this:
diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c
index a369e8a..15d1c44 100644
--- i/net/ipv4/tcp_output.c
+++ w/net/ipv4/tcp_output.c
@@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb,
{
int tso_segs = tcp_skb_pcount(skb);
- if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
+ if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) ||
+ (tso_segs == 1 && mss_now < skb->len)) {
tcp_set_skb_tso_segs(sk, skb, mss_now);
tso_segs = tcp_skb_pcount(skb);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing
2015-06-05 23:02 ` Martin KaFai Lau
@ 2015-06-05 23:45 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-06-05 23:45 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Eric Dumazet, netdev, Kernel Team, Neal Cardwell, Yuchung Cheng
On Fri, 2015-06-05 at 16:02 -0700, Martin KaFai Lau wrote:
> On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote:
> > On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote:
> >
> > > tcp_trim_head() does not take the mss_now.
> > > Is it fine to have mss_now <= tcp_skb_mss(skb)? or we can depend on
> > > the tcp_init_tso_segs() in the tcp_write_xmit() to take care of it?
> >
> > It should be fine : packets not yet sent have tcp_skb_pcount()==0,
> > so that tcp_init_tso_segs() can do the computation at the right time.
> hmm.... From tcp_write_xmit(), tcp_init_tso_segs() makes tcp_skb_pcount(skb) > 0
> but it may not be sent out immediately (like failing tcp_cwnd_test()). Later,
> this skb is considered for tcp_mtu_probe(), sliced and forgot to update the
> pcount, I think.
Then clear pcount at this point, if this is a requirement.
Or else the test on !tso_segs seems quite lazy to me.
>
> I am probably missing something. Hence, I have a side question
> in tcp_init_tso_segs(), should pcount be also recalculated if
> (tso_segs == 1 && mss_now < skb->len)? Like this:
>
> diff --git i/net/ipv4/tcp_output.c w/net/ipv4/tcp_output.c
> index a369e8a..15d1c44 100644
> --- i/net/ipv4/tcp_output.c
> +++ w/net/ipv4/tcp_output.c
> @@ -1624,7 +1624,8 @@ static int tcp_init_tso_segs(const struct sock *sk, struct sk_buff *skb,
> {
> int tso_segs = tcp_skb_pcount(skb);
>
> - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> + if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now) ||
> + (tso_segs == 1 && mss_now < skb->len)) {
> tcp_set_skb_tso_segs(sk, skb, mss_now);
> tso_segs = tcp_skb_pcount(skb);
> }
See above. Lets keep fast path as fast as possible ;)
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-06-05 23:45 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-05 16:16 [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing Martin KaFai Lau
2015-06-05 16:53 ` Eric Dumazet
2015-06-05 18:02 ` Martin KaFai Lau
2015-06-05 21:23 ` Eric Dumazet
2015-06-05 23:02 ` Martin KaFai Lau
2015-06-05 23:45 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).