* [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
@ 2015-06-06 0:46 Martin KaFai Lau
2015-06-06 1:11 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Martin KaFai Lau @ 2015-06-06 0:46 UTC (permalink / raw)
To: netdev; +Cc: Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
[<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
[<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
[<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
[<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
[<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
[<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
[<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
The WARN_ON pointed out that tcp_skb_pcount (i.e.
TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
v2
- Replace the skb slicing codes by the existing tcp_trim_head(),
suggested by Eric Dumazet.
v1
- Call tcp_set_skb_tso_segs() for all slicing cases.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Grant Zhang <gzhang@fastly.com>
Cc: Grant Zhang <gzhang@fastly.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
net/ipv4/tcp_output.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a369e8a..4ae4f0c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk)
} else {
TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags &
~(TCPHDR_FIN|TCPHDR_PSH);
- if (!skb_shinfo(skb)->nr_frags) {
- skb_pull(skb, copy);
- if (skb->ip_summed != CHECKSUM_PARTIAL)
- skb->csum = csum_partial(skb->data,
- skb->len, 0);
- } else {
- __pskb_trim_head(skb, copy);
- tcp_set_skb_tso_segs(sk, skb, mss_now);
- }
- TCP_SKB_CB(skb)->seq += copy;
+ tcp_skb_pcount_set(skb, 0);
+ tcp_trim_head(sk, skb, copy);
}
len += copy;
--
1.8.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-06 0:46 [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing Martin KaFai Lau
@ 2015-06-06 1:11 ` Eric Dumazet
2015-06-08 17:58 ` Martin KaFai Lau
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2015-06-06 1:11 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Fri, 2015-06-05 at 17:46 -0700, Martin KaFai Lau wrote:
> The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
>
> [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
> [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
> [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
> [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
> [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
> [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
> [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
>
> The WARN_ON pointed out that tcp_skb_pcount (i.e.
> TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
>
> The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
>
> v2
> - Replace the skb slicing codes by the existing tcp_trim_head(),
> suggested by Eric Dumazet.
>
> v1
> - Call tcp_set_skb_tso_segs() for all slicing cases.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Reported-by: Grant Zhang <gzhang@fastly.com>
> Cc: Grant Zhang <gzhang@fastly.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> net/ipv4/tcp_output.c | 12 ++----------
> 1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a369e8a..4ae4f0c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk)
> } else {
> TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags &
> ~(TCPHDR_FIN|TCPHDR_PSH);
> - if (!skb_shinfo(skb)->nr_frags) {
> - skb_pull(skb, copy);
> - if (skb->ip_summed != CHECKSUM_PARTIAL)
> - skb->csum = csum_partial(skb->data,
> - skb->len, 0);
> - } else {
> - __pskb_trim_head(skb, copy);
> - tcp_set_skb_tso_segs(sk, skb, mss_now);
> - }
> - TCP_SKB_CB(skb)->seq += copy;
> + tcp_skb_pcount_set(skb, 0);
> + tcp_trim_head(sk, skb, copy);
> }
>
> len += copy;
I think the invariant should be that if a packet had been never sent,
its pcount should be already 0.
(cleared in do_tcp_sendpages() and tcp_sendmsg() : it seems we hacked
these functions already in the past :( )
So we might need to track places where we violate this rule, then get
rid of the tcp_skb_pcount_set(skb, 0); done in do_tcp_sendpages() and
tcp_sendmsg().
Here, trimming a packet that was never sent (by definition) should not
force pcount to 0, it should already be the case.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-06 1:11 ` Eric Dumazet
@ 2015-06-08 17:58 ` Martin KaFai Lau
2015-06-08 18:11 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Martin KaFai Lau @ 2015-06-08 17:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Fri, Jun 05, 2015 at 06:11:33PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-05 at 17:46 -0700, Martin KaFai Lau wrote:
> > The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment():
> >
> > [<ffffffff810510ca>] warn_slowpath_null+0x1a/0x20
> > [<ffffffff8160ec90>] tcp_fragment+0x2a0/0x2b0
> > [<ffffffff81604e06>] tcp_mark_head_lost+0x196/0x230
> > [<ffffffff8160585d>] tcp_update_scoreboard+0x4d/0x80
> > [<ffffffff8160a9ac>] tcp_fastretrans_alert+0x6ac/0xa90
> > [<ffffffff8160b834>] tcp_ack+0x9d4/0x10e0
> > [<ffffffff8160c699>] tcp_rcv_established+0x309/0x7e0
> >
> > The WARN_ON pointed out that tcp_skb_pcount (i.e.
> > TCP_SKB_CB(skb)->tcp_gso_segs) and skb->len is inconsistent.
> >
> > The WARN_ON stack goes away after setting net.ipv4.tcp_mtu_probing to 0.
> >
> > v2
> > - Replace the skb slicing codes by the existing tcp_trim_head(),
> > suggested by Eric Dumazet.
> >
> > v1
> > - Call tcp_set_skb_tso_segs() for all slicing cases.
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > Reported-by: Grant Zhang <gzhang@fastly.com>
> > Cc: Grant Zhang <gzhang@fastly.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > ---
> > net/ipv4/tcp_output.c | 12 ++----------
> > 1 file changed, 2 insertions(+), 10 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index a369e8a..4ae4f0c 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1977,16 +1977,8 @@ static int tcp_mtu_probe(struct sock *sk)
> > } else {
> > TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags &
> > ~(TCPHDR_FIN|TCPHDR_PSH);
> > - if (!skb_shinfo(skb)->nr_frags) {
> > - skb_pull(skb, copy);
> > - if (skb->ip_summed != CHECKSUM_PARTIAL)
> > - skb->csum = csum_partial(skb->data,
> > - skb->len, 0);
> > - } else {
> > - __pskb_trim_head(skb, copy);
> > - tcp_set_skb_tso_segs(sk, skb, mss_now);
> > - }
> > - TCP_SKB_CB(skb)->seq += copy;
> > + tcp_skb_pcount_set(skb, 0);
> > + tcp_trim_head(sk, skb, copy);
> > }
> >
> > len += copy;
>
>
> I think the invariant should be that if a packet had been never sent,
> its pcount should be already 0.
>
> (cleared in do_tcp_sendpages() and tcp_sendmsg() : it seems we hacked
> these functions already in the past :( )
>
> So we might need to track places where we violate this rule, then get
> rid of the tcp_skb_pcount_set(skb, 0); done in do_tcp_sendpages() and
> tcp_sendmsg().
>
> Here, trimming a packet that was never sent (by definition) should not
> force pcount to 0, it should already be the case.
It seems the invariant does not hold at this point also.
Should the invariant fix be something for net-next? or Would you like
to post a patch for it?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-08 17:58 ` Martin KaFai Lau
@ 2015-06-08 18:11 ` Eric Dumazet
2015-06-09 17:06 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2015-06-08 18:11 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Mon, 2015-06-08 at 10:58 -0700, Martin KaFai Lau wrote:
> It seems the invariant does not hold at this point also.
> Should the invariant fix be something for net-next? or Would you like
> to post a patch for it?
This patch definitely can target net-next, it is not a new regression.
Once fully tested, we can ask David to submit a stable version later.
Yes, I can provide a v3, probably at the end of today.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-08 18:11 ` Eric Dumazet
@ 2015-06-09 17:06 ` Eric Dumazet
2015-06-09 17:45 ` Martin KaFai Lau
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2015-06-09 17:06 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Mon, 2015-06-08 at 11:11 -0700, Eric Dumazet wrote:
> On Mon, 2015-06-08 at 10:58 -0700, Martin KaFai Lau wrote:
>
> > It seems the invariant does not hold at this point also.
> > Should the invariant fix be something for net-next? or Would you like
> > to post a patch for it?
>
> This patch definitely can target net-next, it is not a new regression.
>
> Once fully tested, we can ask David to submit a stable version later.
>
> Yes, I can provide a v3, probably at the end of today.
I've been working on this, but still can get the bug triggering in
tcp_fragment(), no matter what (Neal patch , yours, mine...)
I suspect a problem in skb_shift() or tcp_shifted_skb() and am doing
additional tests.
[ 1806.206345] ------------[ cut here ]------------
[ 1806.206359] WARNING: CPU: 19 PID: 0 at net/ipv4/tcp_output.c:1159 tcp_fragment+0x2b0/0x2d0()
[ 1806.206360] Modules linked in: ip_gre gre sit ip_tunnel tunnel4 bonding w1_therm wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid ib_uverbs mlx4_ib ib_sa ib_mad ib_core ib_addr ip6table_mangle ip6_tables ipv6
[ 1806.206376] CPU: 19 PID: 0 Comm: swapper/19 Tainted: G W 4.1.0-smp-DEV #1762
[ 1806.206378] ffffffff8196ba4b ffff88407fae38f8 ffffffff81643125 00000000000000dd
[ 1806.206380] 0000000000000000 ffff88407fae3938 ffffffff810a8c97 ffff88407fae3998
[ 1806.206382] ffff883f75498200 ffff881fe1d5f800 000000000000000d 0000000000004802
[ 1806.206384] Call Trace:
[ 1806.206385] <IRQ> [<ffffffff81643125>] dump_stack+0x45/0x57
[ 1806.206395] [<ffffffff810a8c97>] warn_slowpath_common+0x97/0xe0
[ 1806.206396] [<ffffffff810a8cfa>] warn_slowpath_null+0x1a/0x20
[ 1806.206398] [<ffffffff815df0c0>] tcp_fragment+0x2b0/0x2d0
[ 1806.206400] [<ffffffff815d62e1>] tcp_mark_head_lost+0x191/0x280
[ 1806.206402] [<ffffffff815d641d>] tcp_update_scoreboard+0x4d/0x80
[ 1806.206404] [<ffffffff815daa24>] tcp_fastretrans_alert+0x6e4/0xc20
[ 1806.206406] [<ffffffff815dbcef>] tcp_ack+0xb0f/0x13c0
[ 1806.206408] [<ffffffff815dcc31>] tcp_rcv_established+0x311/0x820
[ 1806.206410] [<ffffffff815e79cb>] tcp_v4_do_rcv+0x14b/0x3d0
[ 1806.206414] [<ffffffff812f15e6>] ? security_sock_rcv_skb+0x16/0x20
[ 1806.206415] [<ffffffff815e8e46>] tcp_v4_rcv+0x8c6/0x9d0
[ 1806.206418] [<ffffffff815c1dec>] ip_local_deliver_finish+0xac/0x230
[ 1806.206419] [<ffffffff815c216a>] ip_local_deliver+0xaa/0xc0
[ 1806.206421] [<ffffffff815c1d40>] ? ip_rcv_finish+0x380/0x380
[ 1806.206422] [<ffffffff815c1a48>] ip_rcv_finish+0x88/0x380
[ 1806.206423] [<ffffffff815c2487>] ip_rcv+0x307/0x420
[ 1806.206426] [<ffffffff815c19c0>] ? inet_add_protocol+0x50/0x50
[ 1806.206431] [<ffffffff8157cc0c>] __netif_receive_skb_core+0x5bc/0xa30
[ 1806.206432] [<ffffffff8157d0a6>] __netif_receive_skb+0x26/0x70
[ 1806.206434] [<ffffffff8157d180>] process_backlog+0x90/0x130
[ 1806.206436] [<ffffffff8157d9a7>] net_rx_action+0x157/0x330
[ 1806.206439] [<ffffffff810ac9c7>] __do_softirq+0xe7/0x270
[ 1806.206441] [<ffffffff810acd85>] irq_exit+0xa5/0xb0
[ 1806.206445] [<ffffffff8108aa65>] smp_call_function_single_interrupt+0x35/0x40
[ 1806.206448] [<ffffffff8164ae2b>] call_function_single_interrupt+0x6b/0x70
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-09 17:06 ` Eric Dumazet
@ 2015-06-09 17:45 ` Martin KaFai Lau
2015-06-09 17:59 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Martin KaFai Lau @ 2015-06-09 17:45 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Tue, Jun 09, 2015 at 10:06:25AM -0700, Eric Dumazet wrote:
> I've been working on this, but still can get the bug triggering in
> tcp_fragment(), no matter what (Neal patch , yours, mine...)
Can you describe the test case that can reproduce it?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing
2015-06-09 17:45 ` Martin KaFai Lau
@ 2015-06-09 17:59 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2015-06-09 17:59 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: netdev, Kernel Team, Grant Zhang, Eric Dumazet, Neal Cardwell,
Yuchung Cheng
On Tue, 2015-06-09 at 10:45 -0700, Martin KaFai Lau wrote:
> On Tue, Jun 09, 2015 at 10:06:25AM -0700, Eric Dumazet wrote:
> > I've been working on this, but still can get the bug triggering in
> > tcp_fragment(), no matter what (Neal patch , yours, mine...)
> Can you describe the test case that can reproduce it?
This might not be easy : I used a 40Gb testbed and following.
Warning triggers in about 30 minutes, but this might be related to some
unrelated traffic.
echo 4 >/proc/sys/net/ipv4/tcp_min_tso_segs
echo 0 >/proc/sys/kernel/timer_migration
echo fq >/proc/sys/net/core/default_qdisc
tc qd replace dev eth1 root mq
./super_netperf 1500 --google-pacing-rate 3028000 -H lpaa24 -l 10000 &
You might implement the pacing stuff using regular netperf with
for ETH in eth1
do
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 16`
do
slot=$( printf %x $(( i )) )
tc qd add dev $ETH parent 1:$slot fq maxrate 3028000
done
done
(Change eth1 by eth0 or something, and 16 by number of TX queues.)
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-06-09 17:59 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-06-06 0:46 [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing Martin KaFai Lau
2015-06-06 1:11 ` Eric Dumazet
2015-06-08 17:58 ` Martin KaFai Lau
2015-06-08 18:11 ` Eric Dumazet
2015-06-09 17:06 ` Eric Dumazet
2015-06-09 17:45 ` Martin KaFai Lau
2015-06-09 17:59 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).