* TCP syn flood handling regression @ 2012-03-10 12:27 Simon Kirby 2012-03-10 17:44 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Simon Kirby @ 2012-03-10 12:27 UTC (permalink / raw) To: David S. Miller; +Cc: netdev Hello! A typical port 80 SYN flood started up to one of our clusters, but this time, it didn't work so well. Legitimate connections and trying to fetch server-status via localhost would hang for ~30 seconds before responding, even though though the box had plenty of spare cycles. An strace of all Apache processes showed quite a bit of sleeping in accept4(). This was with 3.2.9, so I went back in kernel builds and found that 3.1 and 3.0 were also broken, while 2.6.39 works as I remember -- when syn cookies are enabled, everything just works and is fast. The DoS kept up, so I was able to feed a bit to a node to do some bisection. Of course, the DoS stopped literally seconds before the last bisection test, but I got it down to: # good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP. # bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit(). ...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits. I've stared at them but can't see what could be doing this. Simon- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: TCP syn flood handling regression 2012-03-10 12:27 TCP syn flood handling regression Simon Kirby @ 2012-03-10 17:44 ` Eric Dumazet 2012-03-10 17:56 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2012-03-10 17:44 UTC (permalink / raw) To: Simon Kirby; +Cc: David S. Miller, netdev Le samedi 10 mars 2012 à 04:27 -0800, Simon Kirby a écrit : > Hello! > > A typical port 80 SYN flood started up to one of our clusters, but this > time, it didn't work so well. Legitimate connections and trying to fetch > server-status via localhost would hang for ~30 seconds before responding, > even though though the box had plenty of spare cycles. An strace of all > Apache processes showed quite a bit of sleeping in accept4(). > > This was with 3.2.9, so I went back in kernel builds and found that 3.1 > and 3.0 were also broken, while 2.6.39 works as I remember -- when syn > cookies are enabled, everything just works and is fast. The DoS kept up, > so I was able to feed a bit to a node to do some bisection. > > Of course, the DoS stopped literally seconds before the last bisection > test, but I got it down to: > > # good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP. > # bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit(). > > ...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and > d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits. > > I've stared at them but can't see what could be doing this. > Hi Simon, thanks a lot for this report. This sounds as a SYNCOOKIE side effect I could reproduce this with a typical DOS on ssh port. Yes, it seems first messages are not sent at all, only after several tcp timeouts : Here my sshd write() its first frame (23 bytes) right after the 3WHS, but we can see first frame leaves the machine 17 seconds later : 17:42:49.958432 IP 10.170.237.2.37550 > 192.168.20.110.22: S 3066771530:3066771530(0) win 14600 <mss 1460,sackOK,timestamp 19981015 0,nop,wscale 6> 17:42:49.958438 IP 192.168.20.110.22 > 10.170.237.2.37550: S 881027529:881027529(0) ack 3066771531 win 14480 <mss 1460,sackOK,timestamp 1550353622 19981015,nop,wscale 4> 17:42:50.137163 IP 10.170.237.2.37550 > 192.168.20.110.22: . ack 1 win 229 <nop,nop,timestamp 19981060 1550353622> 17:43:07.557096 IP 192.168.20.110.22 > 10.170.237.2.37550: P 1:24(23) ack 1 win 913 <nop,nop,timestamp 1550371264 19981060> 17:43:07.757096 IP 10.170.237.2.37550 > 192.168.20.110.22: . ack 24 win 229 <nop,nop,timestamp 19985459 1550371264> 17:43:07.936756 IP 10.170.237.2.37550 > 192.168.20.110.22: P 1:40(39) ack 24 win 229 <nop,nop,timestamp 19985459 1550371264> 17:43:07.936769 IP 192.168.20.110.22 > 10.170.237.2.37550: . ack 40 win 913 <nop,nop,timestamp 1550371643 19985459> 17:43:07.937384 IP 192.168.20.110.22 > 10.170.237.2.37550: P 24:664(640) ack 40 win 913 <nop,nop,timestamp 1550371644 19985459> 17:43:08.156252 IP 10.170.237.2.37550 > 192.168.20.110.22: P 40:1184(1144) ack 24 win 229 <nop,nop,timestamp 19985554 1550371643> To check that its indeed the tcp retransmits that trigger and eventually send the frame : $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 23 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(on,700ms,0) ino:11067765 sk:f3564780 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:1209 rtt:403/201.5 cwnd:10 ssthresh:5 send 287.4Kbps rcv_space:14600 $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 23 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(on,1.529ms,1) ino:11067765 sk:f3564780 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:2418 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600 $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 23 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(on,4.005ms,2) ino:11067765 sk:f3564780 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:4836 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600 $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 23 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(on,1.773ms,2) ino:11067765 sk:f3564780 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:4836 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600 $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 23 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(on,8.574ms,3) ino:11067765 sk:f3564780 mem:(r0,w2240,f1856,t0) ts sack bic wscale:6,4 rto:9672 rtt:403/201.5 ssthresh:5 send 57.5Kbps rcv_space:14600 $ ss -emoi dst 10.170.237.2 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 ::ffff:192.168.20.110:ssh ::ffff:10.170.237.2:37604 timer:(keepalive,48min,0) ino:11067765 sk:f3564780 mem:(r0,w0,f0,t0) ts sack bic wscale:6,4 rto:806 rtt:263.875/105.5 ato:40 cwnd:6 ssthresh:5 send 263.4Kbps rcv_rtt:181 rcv_space:14600 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: TCP syn flood handling regression 2012-03-10 17:44 ` Eric Dumazet @ 2012-03-10 17:56 ` Eric Dumazet 2012-03-10 19:20 ` [PATCH] tcp: fix syncookie regression Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2012-03-10 17:56 UTC (permalink / raw) To: Simon Kirby; +Cc: David S. Miller, netdev Le samedi 10 mars 2012 à 09:44 -0800, Eric Dumazet a écrit : > Le samedi 10 mars 2012 à 04:27 -0800, Simon Kirby a écrit : > > Hello! > > > > A typical port 80 SYN flood started up to one of our clusters, but this > > time, it didn't work so well. Legitimate connections and trying to fetch > > server-status via localhost would hang for ~30 seconds before responding, > > even though though the box had plenty of spare cycles. An strace of all > > Apache processes showed quite a bit of sleeping in accept4(). > > > > This was with 3.2.9, so I went back in kernel builds and found that 3.1 > > and 3.0 were also broken, while 2.6.39 works as I remember -- when syn > > cookies are enabled, everything just works and is fast. The DoS kept up, > > so I was able to feed a bit to a node to do some bisection. > > > > Of course, the DoS stopped literally seconds before the last bisection > > test, but I got it down to: > > > > # good: [0e734419923bd8e599858f8fc196c7804bb85564] ipv4: Use inet_csk_route_child_sock() in DCCP and TCP. > > # bad: [ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0] ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit(). > > > > ...leaving ea4fc0d6193ff56fcef39b0d2210d402a7acb5f0 and > > d9d8da805dcb503ef8ee49918a94d49085060f23 as culprits. > > > > I've stared at them but can't see what could be doing this. > > > > Hi Simon, thanks a lot for this report. > > This sounds as a SYNCOOKIE side effect > It seems we miss a rebuild_header() indeed in the case of a syncookie initiated socket. So inet->cork.fl.u.ip4 is not correctly setup and ea4fc0d6193ff56fcef39b0d2210d402a7acb5f relied on this. I'll send a patch once tested. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] tcp: fix syncookie regression 2012-03-10 17:56 ` Eric Dumazet @ 2012-03-10 19:20 ` Eric Dumazet 2012-03-11 22:54 ` David Miller ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Eric Dumazet @ 2012-03-10 19:20 UTC (permalink / raw) To: Simon Kirby; +Cc: David S. Miller, netdev commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit()) added a serious regression on synflood handling. Simon Kirby discovered a successful connection was delayed by 20 seconds before being responsive. In my tests, I discovered that xmit frames were lost, and needed ~4 retransmits and a socket dst rebuild before being really sent. In case of syncookie initiated connection, we use a different path to initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared. As ip_queue_xmit() now depends on inet flow being setup, fix this by copying the temp flowi4 we use in cookie_v4_check(). Reported-by: Simon Kirby <sim@netnation.com> Bisected-by: Simon Kirby <sim@netnation.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/syncookies.c | 30 ++++++++++++++++-------------- net/ipv4/tcp_ipv4.c | 10 +++++++--- 2 files changed, 23 insertions(+), 17 deletions(-) diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 51fdbb4..eab2a7f 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -278,6 +278,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, struct rtable *rt; __u8 rcv_wscale; bool ecn_ok = false; + struct flowi4 fl4; if (!sysctl_tcp_syncookies || !th->ack || th->rst) goto out; @@ -346,20 +347,16 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, * hasn't changed since we received the original syn, but I see * no easy way to do this. */ - { - struct flowi4 fl4; - - flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), - RT_SCOPE_UNIVERSE, IPPROTO_TCP, - inet_sk_flowi_flags(sk), - (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, - ireq->loc_addr, th->source, th->dest); - security_req_classify_flow(req, flowi4_to_flowi(&fl4)); - rt = ip_route_output_key(sock_net(sk), &fl4); - if (IS_ERR(rt)) { - reqsk_free(req); - goto out; - } + flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), + RT_SCOPE_UNIVERSE, IPPROTO_TCP, + inet_sk_flowi_flags(sk), + (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, + ireq->loc_addr, th->source, th->dest); + security_req_classify_flow(req, flowi4_to_flowi(&fl4)); + rt = ip_route_output_key(sock_net(sk), &fl4); + if (IS_ERR(rt)) { + reqsk_free(req); + goto out; } /* Try to redo what tcp_v4_send_synack did. */ @@ -373,5 +370,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, ireq->rcv_wscale = rcv_wscale; ret = get_cookie_sock(sk, skb, req, &rt->dst); + /* ip_queue_xmit() depends on our flow being setup + * Normal sockets get it right from inet_csk_route_child_sock() + */ + if (ret) + inet_sk(ret)->cork.fl.u.ip4 = fl4; out: return ret; } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 94d683a..fd54c5f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1466,9 +1466,13 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen; newinet->inet_id = newtp->write_seq ^ jiffies; - if (!dst && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL) - goto put_and_exit; - + if (!dst) { + dst = inet_csk_route_child_sock(sk, newsk, req); + if (!dst) + goto put_and_exit; + } else { + /* syncookie case : see end of cookie_v4_check() */ + } sk_setup_caps(newsk, dst); tcp_mtup_init(newsk); ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-10 19:20 ` [PATCH] tcp: fix syncookie regression Eric Dumazet @ 2012-03-11 22:54 ` David Miller 2012-03-12 20:30 ` Simon Kirby 2012-03-13 7:26 ` Simon Kirby 2 siblings, 0 replies; 14+ messages in thread From: David Miller @ 2012-03-11 22:54 UTC (permalink / raw) To: eric.dumazet; +Cc: sim, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sat, 10 Mar 2012 11:20:21 -0800 > commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit()) > added a serious regression on synflood handling. > > Simon Kirby discovered a successful connection was delayed by 20 seconds > before being responsive. > > In my tests, I discovered that xmit frames were lost, and needed ~4 > retransmits and a socket dst rebuild before being really sent. > > In case of syncookie initiated connection, we use a different path to > initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared. > > As ip_queue_xmit() now depends on inet flow being setup, fix this by > copying the temp flowi4 we use in cookie_v4_check(). > > Reported-by: Simon Kirby <sim@netnation.com> > Bisected-by: Simon Kirby <sim@netnation.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-10 19:20 ` [PATCH] tcp: fix syncookie regression Eric Dumazet 2012-03-11 22:54 ` David Miller @ 2012-03-12 20:30 ` Simon Kirby 2012-03-13 7:26 ` Simon Kirby 2 siblings, 0 replies; 14+ messages in thread From: Simon Kirby @ 2012-03-12 20:30 UTC (permalink / raw) To: Eric Dumazet; +Cc: David S. Miller, netdev On Sat, Mar 10, 2012 at 11:20:21AM -0800, Eric Dumazet wrote: > commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit()) > added a serious regression on synflood handling. > > Simon Kirby discovered a successful connection was delayed by 20 seconds > before being responsive. > > In my tests, I discovered that xmit frames were lost, and needed ~4 > retransmits and a socket dst rebuild before being really sent. > > In case of syncookie initiated connection, we use a different path to > initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared. > > As ip_queue_xmit() now depends on inet flow being setup, fix this by > copying the temp flowi4 we use in cookie_v4_check(). > > Reported-by: Simon Kirby <sim@netnation.com> > Bisected-by: Simon Kirby <sim@netnation.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Tested-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/ipv4/syncookies.c | 30 ++++++++++++++++-------------- > net/ipv4/tcp_ipv4.c | 10 +++++++--- > 2 files changed, 23 insertions(+), 17 deletions(-) > > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c > index 51fdbb4..eab2a7f 100644 > --- a/net/ipv4/syncookies.c > +++ b/net/ipv4/syncookies.c > @@ -278,6 +278,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, > struct rtable *rt; > __u8 rcv_wscale; > bool ecn_ok = false; > + struct flowi4 fl4; > > if (!sysctl_tcp_syncookies || !th->ack || th->rst) > goto out; > @@ -346,20 +347,16 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, > * hasn't changed since we received the original syn, but I see > * no easy way to do this. > */ > - { > - struct flowi4 fl4; > - > - flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), > - RT_SCOPE_UNIVERSE, IPPROTO_TCP, > - inet_sk_flowi_flags(sk), > - (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, > - ireq->loc_addr, th->source, th->dest); > - security_req_classify_flow(req, flowi4_to_flowi(&fl4)); > - rt = ip_route_output_key(sock_net(sk), &fl4); > - if (IS_ERR(rt)) { > - reqsk_free(req); > - goto out; > - } > + flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk), > + RT_SCOPE_UNIVERSE, IPPROTO_TCP, > + inet_sk_flowi_flags(sk), > + (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, > + ireq->loc_addr, th->source, th->dest); > + security_req_classify_flow(req, flowi4_to_flowi(&fl4)); > + rt = ip_route_output_key(sock_net(sk), &fl4); > + if (IS_ERR(rt)) { > + reqsk_free(req); > + goto out; > } > > /* Try to redo what tcp_v4_send_synack did. */ > @@ -373,5 +370,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, > ireq->rcv_wscale = rcv_wscale; > > ret = get_cookie_sock(sk, skb, req, &rt->dst); > + /* ip_queue_xmit() depends on our flow being setup > + * Normal sockets get it right from inet_csk_route_child_sock() > + */ > + if (ret) > + inet_sk(ret)->cork.fl.u.ip4 = fl4; > out: return ret; > } > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 94d683a..fd54c5f 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -1466,9 +1466,13 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, > inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen; > newinet->inet_id = newtp->write_seq ^ jiffies; > > - if (!dst && (dst = inet_csk_route_child_sock(sk, newsk, req)) == NULL) > - goto put_and_exit; > - > + if (!dst) { > + dst = inet_csk_route_child_sock(sk, newsk, req); > + if (!dst) > + goto put_and_exit; > + } else { > + /* syncookie case : see end of cookie_v4_check() */ > + } > sk_setup_caps(newsk, dst); > > tcp_mtup_init(newsk); Tested under real SYN flood on 3.3-rc7 and 3.2.9 -- works! Thanks! Simon- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-10 19:20 ` [PATCH] tcp: fix syncookie regression Eric Dumazet 2012-03-11 22:54 ` David Miller 2012-03-12 20:30 ` Simon Kirby @ 2012-03-13 7:26 ` Simon Kirby 2012-03-13 14:07 ` Dave Jones 2012-03-13 16:44 ` Neal Cardwell 2 siblings, 2 replies; 14+ messages in thread From: Simon Kirby @ 2012-03-13 7:26 UTC (permalink / raw) To: Eric Dumazet, Neal Cardwell; +Cc: netdev On Sat, Mar 10, 2012 at 11:20:21AM -0800, Eric Dumazet wrote: > Reported-by: Simon Kirby <sim@netnation.com> > Bisected-by: Simon Kirby <sim@netnation.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Tested-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/ipv4/syncookies.c | 30 ++++++++++++++++-------------- > net/ipv4/tcp_ipv4.c | 10 +++++++--- > 2 files changed, 23 insertions(+), 17 deletions(-) > > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c > index 51fdbb4..eab2a7f 100644 While deploying this on top of 3.2.9, we hit what seems to be the bug fixed by 4648dc97af9d496218a05353b0e442b3dfa6aaab in 3.3. I see 3.2.9 has daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, so maybe this is exposed in 3.2 now? [ 1196.683749] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() [ 1196.687638] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0() [ 1196.716010] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() [ 1196.722262] WARNING: at net/ipv4/tcp_input.c:3058 tcp_ack+0x209b/0x25f0() [ 1196.726111] WARNING: at net/ipv4/tcp_input.c:3439 tcp_ack+0x25a6/0x25f0() net/ipv4/tcp_input.c: 3438 #if FASTRETRANS_DEBUG > 0 ---> 3439 WARN_ON((int)tp->sacked_out < 0); 3440 WARN_ON((int)tp->lost_out < 0); 3441 WARN_ON((int)tp->retrans_out < 0); 3442 if (!tp->packets_out && tcp_is_sack(tp)) { ... 3057 /* D. Check consistency of the current state. */ ---> 3058 tcp_verify_left_out(tp); Oops two seconds later, scrolled off console, didn't get written to disk or remote syslog server, all we have is syslog-broadcasted Oops and Code lines due to broken printk priorities: Message from syslogd at Tue Mar 13 00:25:13 2012 ... kernel: [ 1198.637303] Oops: 0000 [#1] SMP ... kernel: [ 1198.640041] Code: f8 01 00 00 48 39 f9 49 89 ca 0f 84 cd 03 00 00 44 3b 67 40 79 28 e9 f7 04 00 00 66 90 4c 39 d1 0f 1f 44 00 00 0f 84 b2 03 00 00 <45> 3b 62 40 66 0f 1f 44 00 00 0f 88 a2 03 00 00 4c 89 d7 8b 87 kernel: [ 1198.640041] CR2: 0000000000000040 kernel: [ 1198.646208] Kernel panic - not syncing: Fatal exception in interrupt (gdb) find /b 0, +1000000000, 0x45,0x3b,0x62,0x40,0x66,0x0f,0x1f,0x44,0x00,0x00 0x66594e <tcp_sacktag_write_queue+958> (gdb) disass /m tcp_sacktag_write_queue ... 1709 if (after(TCP_SKB_CB(skb)->end_seq, skip_to_seq)) 0x0000000000665933 <+931>: cmp 0x40(%rdi),%r12d 0x0000000000665937 <+935>: jns 0x665961 <tcp_sacktag_write_queue+977> 0x0000000000665939 <+937>: jmpq 0x665e35 <tcp_sacktag_write_queue+2213> 0x000000000066593e <+942>: xchg %ax,%ax -> 0x000000000066594e <+958>: cmp 0x40(%r10),%r12d 0x0000000000665952 <+962>: nopw 0x0(%rax,%rax,1) 0x0000000000665958 <+968>: js 0x665d00 <tcp_sacktag_write_queue+1904> 0x000000000066595e <+974>: mov %r10,%rdi This only happened on one of the 30 or so servers I had just deployed it to. Simon- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 7:26 ` Simon Kirby @ 2012-03-13 14:07 ` Dave Jones 2012-03-13 16:52 ` Neal Cardwell 2012-03-13 16:44 ` Neal Cardwell 1 sibling, 1 reply; 14+ messages in thread From: Dave Jones @ 2012-03-13 14:07 UTC (permalink / raw) To: Simon Kirby; +Cc: Eric Dumazet, Neal Cardwell, netdev On Tue, Mar 13, 2012 at 12:26:44AM -0700, Simon Kirby wrote: > Oops two seconds later, scrolled off console, didn't get written to disk > or remote syslog server, all we have is syslog-broadcasted Oops and Code > lines due to broken printk priorities: I posted two reports we've had against Fedora yesterday which look like this bug. https://bugzilla.redhat.com/show_bug.cgi?id=801599 https://bugzilla.redhat.com/show_bug.cgi?id=802660 Dave ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 14:07 ` Dave Jones @ 2012-03-13 16:52 ` Neal Cardwell 0 siblings, 0 replies; 14+ messages in thread From: Neal Cardwell @ 2012-03-13 16:52 UTC (permalink / raw) To: Dave Jones; +Cc: Simon Kirby, Eric Dumazet, netdev On Tue, Mar 13, 2012 at 10:07 AM, Dave Jones <davej@redhat.com> wrote: > On Tue, Mar 13, 2012 at 12:26:44AM -0700, Simon Kirby wrote: > > > Oops two seconds later, scrolled off console, didn't get written to disk > > or remote syslog server, all we have is syslog-broadcasted Oops and Code > > lines due to broken printk priorities: > > I posted two reports we've had against Fedora yesterday which look like this bug. > > https://bugzilla.redhat.com/show_bug.cgi?id=801599 > https://bugzilla.redhat.com/show_bug.cgi?id=802660 Yes, these look like the same issue. neal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 7:26 ` Simon Kirby 2012-03-13 14:07 ` Dave Jones @ 2012-03-13 16:44 ` Neal Cardwell 2012-03-13 17:37 ` Eric Dumazet 1 sibling, 1 reply; 14+ messages in thread From: Neal Cardwell @ 2012-03-13 16:44 UTC (permalink / raw) To: Simon Kirby; +Cc: Eric Dumazet, netdev On Tue, Mar 13, 2012 at 3:26 AM, Simon Kirby <sim@netnation.com> wrote: > > While deploying this on top of 3.2.9, we hit what seems to be the bug > fixed by 4648dc97af9d496218a05353b0e442b3dfa6aaab in 3.3. I see 3.2.9 has > daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, so maybe this is exposed in 3.2 > now? ... > net/ipv4/tcp_input.c: > 3438 #if FASTRETRANS_DEBUG > 0 > ---> 3439 WARN_ON((int)tp->sacked_out < 0); > 3440 WARN_ON((int)tp->lost_out < 0); > 3441 WARN_ON((int)tp->retrans_out < 0); > 3442 if (!tp->packets_out && tcp_is_sack(tp)) { > ... > 3057 /* D. Check consistency of the current state. */ > ---> 3058 tcp_verify_left_out(tp); Yes, exactly. 3.2.9 and 3.2.10 have daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting issue with sacked_out going negative, and these lines you've flagged are exactly the symptoms we'd expect because of that. The fix is queued up for the -stable series already, so it should be in 3.2.11, I presume. The fix is already in 3.3-rc7. > Oops two seconds later, scrolled off console, didn't get written to disk > or remote syslog server, all we have is syslog-broadcasted Oops and Code > lines due to broken printk priorities: As you can imagine, the oops is probably connected to sacked_out going negative; others have reported an oops along with sacked_out going negative. The fix should take care of that. neal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 16:44 ` Neal Cardwell @ 2012-03-13 17:37 ` Eric Dumazet 2012-03-13 22:59 ` David Miller 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2012-03-13 17:37 UTC (permalink / raw) To: Neal Cardwell; +Cc: Simon Kirby, netdev On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote: > Yes, exactly. 3.2.9 and 3.2.10 have > daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the > 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting > issue with sacked_out going negative, and these lines you've flagged > are exactly the symptoms we'd expect because of that. The fix is > queued up for the -stable series already, so it should be in 3.2.11, I > presume. The fix is already in 3.3-rc7. 3.2.12 more probably, as 3.2.11 is a single build fix ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 17:37 ` Eric Dumazet @ 2012-03-13 22:59 ` David Miller 2012-03-13 23:30 ` Eric Dumazet 0 siblings, 1 reply; 14+ messages in thread From: David Miller @ 2012-03-13 22:59 UTC (permalink / raw) To: eric.dumazet; +Cc: ncardwell, sim, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 13 Mar 2012 10:37:31 -0700 > On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote: > >> Yes, exactly. 3.2.9 and 3.2.10 have >> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the >> 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting >> issue with sacked_out going negative, and these lines you've flagged >> are exactly the symptoms we'd expect because of that. The fix is >> queued up for the -stable series already, so it should be in 3.2.11, I >> presume. The fix is already in 3.3-rc7. > > 3.2.12 more probably, as 3.2.11 is a single build fix The syncookie regression fix isn't even pushed to Linus's tree yet, and once it's there it still needs to cook for a few more days before I'm willing to sent it to -stable. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 22:59 ` David Miller @ 2012-03-13 23:30 ` Eric Dumazet 2012-03-14 0:36 ` Simon Kirby 0 siblings, 1 reply; 14+ messages in thread From: Eric Dumazet @ 2012-03-13 23:30 UTC (permalink / raw) To: David Miller; +Cc: ncardwell, sim, netdev On Tue, 2012-03-13 at 15:59 -0700, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 13 Mar 2012 10:37:31 -0700 > > > On Tue, 2012-03-13 at 12:44 -0400, Neal Cardwell wrote: > > > >> Yes, exactly. 3.2.9 and 3.2.10 have > >> daef52bab1fd26e24e8e9578f8fb33ba1d0cb412 but not the > >> 4648dc97af9d496218a05353b0e442b3dfa6aaab that fixes the resulting > >> issue with sacked_out going negative, and these lines you've flagged > >> are exactly the symptoms we'd expect because of that. The fix is > >> queued up for the -stable series already, so it should be in 3.2.11, I > >> presume. The fix is already in 3.3-rc7. > > > > 3.2.12 more probably, as 3.2.11 is a single build fix > > The syncookie regression fix isn't even pushed to Linus's tree yet, > and once it's there it still needs to cook for a few more days before > I'm willing to sent it to -stable. Yeah, discussion diverged from tcp syncookie to other stuff (tcp at least ;) ) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] tcp: fix syncookie regression 2012-03-13 23:30 ` Eric Dumazet @ 2012-03-14 0:36 ` Simon Kirby 0 siblings, 0 replies; 14+ messages in thread From: Simon Kirby @ 2012-03-14 0:36 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, ncardwell, netdev On Tue, Mar 13, 2012 at 04:30:57PM -0700, Eric Dumazet wrote: > On Tue, 2012-03-13 at 15:59 -0700, David Miller wrote: > > > The syncookie regression fix isn't even pushed to Linus's tree yet, > > and once it's there it still needs to cook for a few more days before > > I'm willing to sent it to -stable. > > Yeah, discussion diverged from tcp syncookie to other stuff (tcp at > least ;) ) Yes, sorry, at first I thought it might be related. Anyway, the syncookie fix is cooking in production(tm) for us, and working well. Thanks, Simon- ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2012-03-14 0:36 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-03-10 12:27 TCP syn flood handling regression Simon Kirby 2012-03-10 17:44 ` Eric Dumazet 2012-03-10 17:56 ` Eric Dumazet 2012-03-10 19:20 ` [PATCH] tcp: fix syncookie regression Eric Dumazet 2012-03-11 22:54 ` David Miller 2012-03-12 20:30 ` Simon Kirby 2012-03-13 7:26 ` Simon Kirby 2012-03-13 14:07 ` Dave Jones 2012-03-13 16:52 ` Neal Cardwell 2012-03-13 16:44 ` Neal Cardwell 2012-03-13 17:37 ` Eric Dumazet 2012-03-13 22:59 ` David Miller 2012-03-13 23:30 ` Eric Dumazet 2012-03-14 0:36 ` Simon Kirby
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).