* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
@ 2008-06-11 15:06 Alexey Kuznetsov
0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kuznetsov @ 2008-06-11 15:06 UTC (permalink / raw)
To: davem, mcmanus, netdev
Hello!
> On Tue, 2008-06-10 at 15:32 -0700, David Miller wrote:
>
> I took a close look at this, it seems this patch adds
> an ABBA deadlock. But I might be wrong.
Yes, this is not a real deadlock. But also it is not even a lock.
bh_lock_sock on listening socket is not enough to lock it.
Seems, the solution would be forcing bh_lock_sock() to protect accept_queue,
which is a mess. Or adding a separate spinlock, like it was done with
syn_wait_lock. Also not good.
Alexey
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ @ 2008-05-29 8:45 Ingo Molnar 2008-05-29 11:14 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-29 8:45 UTC (permalink / raw) To: linux-kernel; +Cc: netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton * Ingo Molnar <mingo@elte.hu> wrote: > in an overnight -tip testruns that is based on recent -git i got two > stuck TCP connections: > > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 0 174592 10.0.1.14:58015 10.0.1.14:3632 ESTABLISHED > tcp 72134 0 10.0.1.14:3632 10.0.1.14:58015 ESTABLISHED update: in the past 5 days of -tip testing i've gathered about 10 randconfig kernel configs that all produced such failures. Since the bug itself is very elusive (it takes up to 50 boot + kernel-rebuild-via-distccc iterations to trigger) bisection was still not an option - but with 10 configs statistical analysis of the configs is now possible. I made a histogram of all kernel options present in those configs, and one networking related kernel option stood out: 5 CONFIG_TCP_CONG_ADVANCED=y 6 CONFIG_INET_TCP_DIAG=y 6 CONFIG_TCP_MD5SIG=y 9 CONFIG_TCP_CONG_CUBIC=y that code is called in the bootlogs: > [ 13.279410] calling cubictcp_register+0x0/0x80 > [ 13.279412] TCP cubic registered the likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig runs is 75%. The likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in 10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say it with a 95% confidence that the presence of this option correlates to the hung sockets. i have started testing this theory now, via the patch below, which turns off TCP_CONG_CUBIC. It will take about 50 bootups on the affected testsystems to confirm. (it will take a couple of hours today as not all testsystems show these hung socket symptoms) distributions enable TCP_CONG_CUBIC by default: $ grep CUBIC /boot/config-2.6.24.7-92.fc8 CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_CUBIC=y which would explain why Arjan and Peter triggered similar hangs as well. Ingo ----------------------> Subject: qa: no TCP_CONG_CUBIC From: Ingo Molnar <mingo@elte.hu> Date: Thu May 29 09:45:51 CEST 2008 --- net/ipv4/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) Index: tip/net/ipv4/Kconfig =================================================================== --- tip.orig/net/ipv4/Kconfig +++ tip/net/ipv4/Kconfig @@ -454,6 +454,8 @@ config TCP_CONG_BIC config TCP_CONG_CUBIC tristate "CUBIC TCP" default y + depends on BROKEN_BOOT_ALLOWED + select BROKEN_BOOT ---help--- This is version 2.0 of BIC-TCP which uses a cubic growth function among other techniques. @@ -608,6 +610,8 @@ endif config TCP_CONG_CUBIC tristate depends on !TCP_CONG_ADVANCED + depends on BROKEN_BOOT_ALLOWED + select BROKEN_BOOT default y config DEFAULT_TCP_CONG ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-29 8:45 Ingo Molnar @ 2008-05-29 11:14 ` Ilpo Järvinen 2008-05-29 11:22 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-05-29 11:14 UTC (permalink / raw) To: Ingo Molnar Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton On Thu, 29 May 2008, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > in an overnight -tip testruns that is based on recent -git i got two > > stuck TCP connections: > > > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State > > tcp 0 174592 10.0.1.14:58015 10.0.1.14:3632 ESTABLISHED > > tcp 72134 0 10.0.1.14:3632 10.0.1.14:58015 ESTABLISHED > > update: in the past 5 days of -tip testing i've gathered about 10 > randconfig kernel configs that all produced such failures. ...I tried yesterday some accept (& read some) & close/exit type stressing but I couldn't get it to show up (though I'll try longer time later on and also fault style exiting). > Since the bug itself is very elusive (it takes up to 50 boot + > kernel-rebuild-via-distccc iterations to trigger) bisection was still > not an option - but with 10 configs statistical analysis of the configs > is now possible. > > I made a histogram of all kernel options present in those configs, and > one networking related kernel option stood out: > > 5 CONFIG_TCP_CONG_ADVANCED=y > 6 CONFIG_INET_TCP_DIAG=y > 6 CONFIG_TCP_MD5SIG=y > 9 CONFIG_TCP_CONG_CUBIC=y > > that code is called in the bootlogs: > > > [ 13.279410] calling cubictcp_register+0x0/0x80 > > [ 13.279412] TCP cubic registered > > the likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig > runs is 75%. The likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in > 10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say > it with a 95% confidence that the presence of this option correlates to > the hung sockets. Do I understand you correctly... it doesn't explain the tenth case out of ten but just nine of them? > i have started testing this theory now, via the patch below, which turns > off TCP_CONG_CUBIC. It will take about 50 bootups on the affected > testsystems to confirm. (it will take a couple of hours today as not all > testsystems show these hung socket symptoms) > > distributions enable TCP_CONG_CUBIC by default: > > $ grep CUBIC /boot/config-2.6.24.7-92.fc8 > CONFIG_TCP_CONG_CUBIC=y > CONFIG_DEFAULT_CUBIC=y > > which would explain why Arjan and Peter triggered similar hangs as well. Main problem with this explanation is that congestion control modules are only in use when TCP is in ESTABLISHED and transmitting normally, while it has nothing to do how we enter or leave ESTABLISHED. But if it's really that the process who owned the connection already went away, I think we should end up into tcp_close() which changes the state from established (and send RST too if there's still data to be received, which would be picked up by the other end and that end would no longer keep established either). ...A failure to send the reset would show up in LINUX_MIB_TCPABORTFAILED. Because both ends remain in established, it kind of excludes the possibility that something would have accidently allowed the Recv-Q end to return back to established too (due to some bug). To me there are mainly two weird things: 1) Why we see orphaning with data in the first place (I think distcc would be interested to read everything, unless some worker crashed in early... Though some timeout in distcc could explain it as well but I don't know too well how distcc does everything)... 2) Why the connection is still in ESTABLISHED when it was orphaned and it has some data to receive... If it had some unread data, should be in CLOSE or in FIN_WAIT1 otherwise. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-29 11:14 ` Ilpo Järvinen @ 2008-05-29 11:22 ` Ingo Molnar 2008-05-30 18:18 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-29 11:22 UTC (permalink / raw) To: Ilpo J?rvinen Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton after about 50 bootups i got a hung test again: titan:~/tip> netstat -nt Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 10.0.1.14:22 10.0.1.16:58062 ESTABLISHED tcp 0 0 10.0.1.14:22 10.0.1.16:60109 ESTABLISHED tcp 0 86368 10.0.1.14:43914 10.0.1.16:3632 ESTABLISHED and this time with CUBIC_TCP disabled - so that was a red herring. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-29 11:22 ` Ingo Molnar @ 2008-05-30 18:18 ` Ingo Molnar 2008-05-31 6:09 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-30 18:18 UTC (permalink / raw) To: Ilpo J?rvinen Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton * Ingo Molnar <mingo@elte.hu> wrote: > after about 50 bootups i got a hung test again: > > titan:~/tip> netstat -nt > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address > State > tcp 0 0 10.0.1.14:22 10.0.1.16:58062 ESTABLISHED > tcp 0 0 10.0.1.14:22 10.0.1.16:60109 ESTABLISHED > tcp 0 86368 10.0.1.14:43914 10.0.1.16:3632 ESTABLISHED > > and this time with CUBIC_TCP disabled - so that was a red herring. ah, in retrospect i realized that this test had one flaw: some of the systems i the build cluster already ran a newer kernel and hence were targets for this bug. so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and rebooted the cluster boxes into 2.6.25, and the hung sockets are now gone. (about 150 successful iterations) i did another change as well: i removed the localhost distcc component. I'll reinstate that now to make sure it's really related to TCP_CONG_CUBIC and not to localhost networking. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-30 18:18 ` Ingo Molnar @ 2008-05-31 6:09 ` Ingo Molnar 2008-05-31 11:46 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-31 6:09 UTC (permalink / raw) To: Ilpo J?rvinen Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton * Ingo Molnar <mingo@elte.hu> wrote: > ah, in retrospect i realized that this test had one flaw: some of the > systems i the build cluster already ran a newer kernel and hence were > targets for this bug. > > so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and > rebooted the cluster boxes into 2.6.25, and the hung sockets are now > gone. (about 150 successful iterations) > > i did another change as well: i removed the localhost distcc > component. I'll reinstate that now to make sure it's really related to > TCP_CONG_CUBIC and not to localhost networking. ok, once i added back the localhost distcc component and the hung kernel build + stuck TCP socket bug happened again overnight: Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED so it seems distcc over localhost was the aspect that made it fail. _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ sides of the connection. But that is just a theory - it could be timing, etc. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 6:09 ` Ingo Molnar @ 2008-05-31 11:46 ` Ilpo Järvinen 2008-05-31 12:18 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-05-31 11:46 UTC (permalink / raw) To: Ingo Molnar Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov On Sat, 31 May 2008, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > ah, in retrospect i realized that this test had one flaw: some of the > > systems i the build cluster already ran a newer kernel and hence were > > targets for this bug. > > > > so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and > > rebooted the cluster boxes into 2.6.25, and the hung sockets are now > > gone. (about 150 successful iterations) > > > > i did another change as well: i removed the localhost distcc > > component. I'll reinstate that now to make sure it's really related to > > TCP_CONG_CUBIC and not to localhost networking. > > ok, once i added back the localhost distcc component and the hung kernel > build + stuck TCP socket bug happened again overnight: > > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED > > so it seems distcc over localhost was the aspect that made it fail. > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ > sides of the connection. But that is just a theory - it could be timing, > etc. Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were some post 2.6.25 changes into it)? -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 11:46 ` Ilpo Järvinen @ 2008-05-31 12:18 ` Ilpo Järvinen 2008-05-31 12:54 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-05-31 12:18 UTC (permalink / raw) To: Ingo Molnar, Peter Zijlstra Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov [-- Attachment #1: Type: TEXT/PLAIN, Size: 11376 bytes --] On Sat, 31 May 2008, Ilpo Järvinen wrote: > On Sat, 31 May 2008, Ingo Molnar wrote: > > > ok, once i added back the localhost distcc component and the hung kernel > > build + stuck TCP socket bug happened again overnight: > > > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State > > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED > > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED > > > > so it seems distcc over localhost was the aspect that made it fail. > > > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ > > sides of the connection. But that is just a theory - it could be timing, > > etc. > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were > some post 2.6.25 changes into it)? Hmm, if so, please try this revert below (I hope I got everything that is related there, gitk is currently broken for me so that viewing many commits quickly is a pain, so I might have accidently missed something)... -- i. -- [PATCH] tcp: Reverts DEFER_ACCEPT modifications Reverts these commits: [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack [TCP]: TCP_DEFER_ACCEPT updates - process as established tcp: Fix slab corruption with ipv6 and tcp6fuzz ..., ie., 539fae89bebd16ebeafd57a87169bc56eb530d76, e4c78840284f3f51b1896cf3936d60a6033c4d2c, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/linux/tcp.h | 7 ------ include/net/request_sock.h | 4 +- include/net/tcp.h | 1 - net/ipv4/inet_connection_sock.c | 11 +++++++-- net/ipv4/tcp.c | 18 +++++++++------ net/ipv4/tcp_input.c | 45 --------------------------------------- net/ipv4/tcp_ipv4.c | 8 ------- net/ipv4/tcp_minisocks.c | 32 ++++++++++----------------- net/ipv4/tcp_timer.c | 5 ---- 9 files changed, 33 insertions(+), 98 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 18e62e3..b31b6b7 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } -struct tcp_deferred_accept_info { - struct sock *listen_sk; - struct request_sock *request; -}; - struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -379,8 +374,6 @@ struct tcp_sock { unsigned int keepalive_intvl; /* time interval between keep alive probes */ int linger2; - struct tcp_deferred_accept_info defer_tcp_accept; - unsigned long last_synq_overflow; u32 tso_deferred; diff --git a/include/net/request_sock.h b/include/net/request_sock.h index b220b5f..0c96e7b 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -115,8 +115,8 @@ struct request_sock_queue { struct request_sock *rskq_accept_head; struct request_sock *rskq_accept_tail; rwlock_t syn_wait_lock; - u16 rskq_defer_accept; - /* 2 bytes hole, try to pack */ + u8 rskq_defer_accept; + /* 3 bytes hole, try to pack */ struct listen_sock *listen_opt; }; diff --git a/include/net/tcp.h b/include/net/tcp.h index 633147c..981d5ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_KEEPINTVL 32767 #define MAX_TCP_KEEPCNT 127 #define MAX_TCP_SYNCNT 127 -#define MAX_TCP_ACCEPT_DEFERRED 65535 #define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */ diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 828ea21..ec83448 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, struct inet_connection_sock *icsk = inet_csk(parent); struct request_sock_queue *queue = &icsk->icsk_accept_queue; struct listen_sock *lopt = queue->listen_opt; - int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int thresh = max_retries; unsigned long now = jiffies; struct request_sock **reqp, *req; int i, budget; @@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, } } + if (queue->rskq_defer_accept) + max_retries = queue->rskq_defer_accept; + budget = 2 * (lopt->nr_table_entries / (timeout / interval)); i = lopt->clock_hand; @@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, reqp=&lopt->syn_table[i]; while ((req = *reqp) != NULL) { if (time_after_eq(now, req->expires)) { - if (req->retrans < thresh && - !req->rsk_ops->rtx_syn_ack(parent, req)) { + if ((req->retrans < thresh || + (inet_rsk(req)->acked && req->retrans < max_retries)) + && !req->rsk_ops->rtx_syn_ack(parent, req)) { unsigned long timeo; if (req->retrans++ == 0) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f886531..a0deead 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2105,12 +2105,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: - if (val < 0) { - err = -EINVAL; - } else { - if (val > MAX_TCP_ACCEPT_DEFERRED) - val = MAX_TCP_ACCEPT_DEFERRED; - icsk->icsk_accept_queue.rskq_defer_accept = val; + icsk->icsk_accept_queue.rskq_defer_accept = 0; + if (val > 0) { + /* Translate value in seconds to number of + * retransmits */ + while (icsk->icsk_accept_queue.rskq_defer_accept < 32 && + val > ((TCP_TIMEOUT_INIT / HZ) << + icsk->icsk_accept_queue.rskq_defer_accept)) + icsk->icsk_accept_queue.rskq_defer_accept++; + icsk->icsk_accept_queue.rskq_defer_accept++; } break; @@ -2292,7 +2295,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, val = (val ? : sysctl_tcp_fin_timeout) / HZ; break; case TCP_DEFER_ACCEPT: - val = icsk->icsk_accept_queue.rskq_defer_accept; + val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 : + ((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1)); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b54d9d3..ddc2e89 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4532,49 +4532,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) } } -static int tcp_defer_accept_check(struct sock *sk) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (tp->defer_tcp_accept.request) { - int queued_data = tp->rcv_nxt - tp->copied_seq; - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? - tcp_hdr((struct sk_buff *) - sk->sk_receive_queue.prev)->fin : 0; - - if (queued_data && hasfin) - queued_data--; - - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { - if (sock_flag(sk, SOCK_KEEPOPEN)) { - inet_csk_reset_keepalive_timer(sk, - keepalive_time_when(tp)); - } else { - inet_csk_delete_keepalive_timer(sk); - } - - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); - - tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); - - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { - tcp_reset(sk); - return -1; - } - } - return 0; -} - static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen) { struct tcp_sock *tp = tcp_sk(sk); @@ -4935,8 +4892,6 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - - tcp_defer_accept_check(sk); return 0; csum_error: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index cd601a8..b698b5b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk) sk->sk_sndmsg_page = NULL; } - if (tp->defer_tcp_accept.request) { - reqsk_free(tp->defer_tcp_accept.request); - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } - atomic_dec(&tcp_sockets_allocated); return 0; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 019c8c1..8245247 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, does sequence test, SYN is truncated, and thus we consider it a bare ACK. - Both ends (listening sockets) accept the new incoming - connection and try to talk to each other. 8-) + If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this + bare ACK. Otherwise, we create an established connection. Both + ends (listening sockets) accept the new incoming connection and try + to talk to each other. 8-) Note: This case is both harmless, and rare. Possibility is about the same as us discovering intelligent life on another plant tomorrow. @@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, if (!(flg & TCP_FLAG_ACK)) return NULL; + /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */ + if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && + TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { + inet_rsk(req)->acked = 1; + return NULL; + } + /* OK, ACK is valid, create big socket and * feed this segment to it. It will repeat all * the tests. THIS SEGMENT MUST MOVE SOCKET TO @@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, inet_csk_reqsk_queue_unlink(sk, req, prev); inet_csk_reqsk_queue_removed(sk, req); - if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && - TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { - - /* the accept queue handling is done is est recv slow - * path so lets make sure to start there - */ - tcp_sk(child)->pred_flags = 0; - sock_hold(sk); - sock_hold(child); - tcp_sk(child)->defer_tcp_accept.listen_sk = sk; - tcp_sk(child)->defer_tcp_accept.request = req; - - inet_csk_reset_keepalive_timer(child, - inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ); - } else { - inet_csk_reqsk_queue_add(sk, req, child); - } - + inet_csk_reqsk_queue_add(sk, req, child); return child; listen_overflow: diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4de68cf..63ed9d6 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data) goto death; } - if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { - tcp_send_active_reset(sk, GFP_ATOMIC); - goto death; - } - if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE) goto out; -- 1.5.5.3 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 12:18 ` Ilpo Järvinen @ 2008-05-31 12:54 ` Ingo Molnar 2008-05-31 12:58 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-31 12:54 UTC (permalink / raw) To: Ilpo Järvinen Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there > > were some post 2.6.25 changes into it)? > > Hmm, if so, please try this revert below (I hope I got everything that > is related there, gitk is currently broken for me so that viewing many > commits quickly is a pain, so I might have accidently missed > something)... what is the easiest way to figure out whether my version of distcc enables TCP_DEFER_ACCEPT? Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 12:54 ` Ingo Molnar @ 2008-05-31 12:58 ` Ilpo Järvinen 2008-05-31 16:35 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-05-31 12:58 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov [-- Attachment #1: Type: TEXT/PLAIN, Size: 781 bytes --] On Sat, 31 May 2008, Ingo Molnar wrote: > > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there > > > were some post 2.6.25 changes into it)? > > > > Hmm, if so, please try this revert below (I hope I got everything that > > is related there, gitk is currently broken for me so that viewing many > > commits quickly is a pain, so I might have accidently missed > > something)... > > what is the easiest way to figure out whether my version of distcc > enables TCP_DEFER_ACCEPT? strace distccd (the listening party is the interesting one here, not the distcc)? ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, sizeof(val)) seems to be the magic trick that is interestion here. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 12:58 ` Ilpo Järvinen @ 2008-05-31 16:35 ` Ingo Molnar 2008-06-03 9:40 ` [fixed] [patch] " Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-05-31 16:35 UTC (permalink / raw) To: Ilpo Järvinen Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > what is the easiest way to figure out whether my version of distcc > > enables TCP_DEFER_ACCEPT? > > strace distccd (the listening party is the interesting one here, not > the distcc)? > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, sizeof(val)) > seems to be the magic trick that is interestion here. seems to be used: 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62 22003 listen(4, 10) = 0 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0 i'll queue up your reverts for testing in -tip. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-05-31 16:35 ` Ingo Molnar @ 2008-06-03 9:40 ` Ingo Molnar 2008-06-03 14:41 ` Patrick McManus 2008-06-03 21:46 ` Ilpo Järvinen 0 siblings, 2 replies; 35+ messages in thread From: Ingo Molnar @ 2008-06-03 9:40 UTC (permalink / raw) To: Ilpo Järvinen Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov * Ingo Molnar <mingo@elte.hu> wrote: > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, > > sizeof(val)) seems to be the magic trick that is interestion here. > > seems to be used: > > 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62 > 22003 listen(4, 10) = 0 > 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0 > > i'll queue up your reverts for testing in -tip. update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely fixed the hangs! Here is the testing i did: first i ran about 500+ successful iterations on the affected testboxes with your revert patch applied, on multiple systems. Then today, without changing anything else on one of the testsystems i reverted your revert on that single system. After about an hour of testing, in 20 iterations i got a hang again over localhost: titan:~> netstat -nt Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 174592 10.0.1.14:34710 10.0.1.14:3632 ESTABLISHED tcp 72145 0 10.0.1.14:3632 10.0.1.14:34710 ESTABLISHED so i hereby conclude that your revert works :) I've repeated the commit below that resolves this nasty regression. phew ... Ingo ----------------> commit dad98991c137c089d64a3057dddc3a12bd6866b0 Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Date: Sat May 31 15:18:56 2008 +0300 tcp: revert DEFER_ACCEPT modifications On Sat, 31 May 2008, Ilpo Järvinen wrote: > On Sat, 31 May 2008, Ingo Molnar wrote: > > > ok, once i added back the localhost distcc component and the hung kernel > > build + stuck TCP socket bug happened again overnight: > > > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State > > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED > > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED > > > > so it seems distcc over localhost was the aspect that made it fail. > > > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ > > sides of the connection. But that is just a theory - it could be timing, > > etc. > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were > some post 2.6.25 changes into it)? Hmm, if so, please try this revert below (I hope I got everything that is related there, gitk is currently broken for me so that viewing many commits quickly is a pain, so I might have accidently missed something)... Reverts these commits: [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack [TCP]: TCP_DEFER_ACCEPT updates - process as established tcp: Fix slab corruption with ipv6 and tcp6fuzz ..., ie., 539fae89bebd16ebeafd57a87169bc56eb530d76, e4c78840284f3f51b1896cf3936d60a6033c4d2c, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 18e62e3..b31b6b7 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } -struct tcp_deferred_accept_info { - struct sock *listen_sk; - struct request_sock *request; -}; - struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -379,8 +374,6 @@ struct tcp_sock { unsigned int keepalive_intvl; /* time interval between keep alive probes */ int linger2; - struct tcp_deferred_accept_info defer_tcp_accept; - unsigned long last_synq_overflow; u32 tso_deferred; diff --git a/include/net/request_sock.h b/include/net/request_sock.h index b220b5f..0c96e7b 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -115,8 +115,8 @@ struct request_sock_queue { struct request_sock *rskq_accept_head; struct request_sock *rskq_accept_tail; rwlock_t syn_wait_lock; - u16 rskq_defer_accept; - /* 2 bytes hole, try to pack */ + u8 rskq_defer_accept; + /* 3 bytes hole, try to pack */ struct listen_sock *listen_opt; }; diff --git a/include/net/tcp.h b/include/net/tcp.h index 633147c..981d5ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_KEEPINTVL 32767 #define MAX_TCP_KEEPCNT 127 #define MAX_TCP_SYNCNT 127 -#define MAX_TCP_ACCEPT_DEFERRED 65535 #define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */ diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 828ea21..ec83448 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, struct inet_connection_sock *icsk = inet_csk(parent); struct request_sock_queue *queue = &icsk->icsk_accept_queue; struct listen_sock *lopt = queue->listen_opt; - int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int thresh = max_retries; unsigned long now = jiffies; struct request_sock **reqp, *req; int i, budget; @@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, } } + if (queue->rskq_defer_accept) + max_retries = queue->rskq_defer_accept; + budget = 2 * (lopt->nr_table_entries / (timeout / interval)); i = lopt->clock_hand; @@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, reqp=&lopt->syn_table[i]; while ((req = *reqp) != NULL) { if (time_after_eq(now, req->expires)) { - if (req->retrans < thresh && - !req->rsk_ops->rtx_syn_ack(parent, req)) { + if ((req->retrans < thresh || + (inet_rsk(req)->acked && req->retrans < max_retries)) + && !req->rsk_ops->rtx_syn_ack(parent, req)) { unsigned long timeo; if (req->retrans++ == 0) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f886531..a0deead 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2105,12 +2105,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: - if (val < 0) { - err = -EINVAL; - } else { - if (val > MAX_TCP_ACCEPT_DEFERRED) - val = MAX_TCP_ACCEPT_DEFERRED; - icsk->icsk_accept_queue.rskq_defer_accept = val; + icsk->icsk_accept_queue.rskq_defer_accept = 0; + if (val > 0) { + /* Translate value in seconds to number of + * retransmits */ + while (icsk->icsk_accept_queue.rskq_defer_accept < 32 && + val > ((TCP_TIMEOUT_INIT / HZ) << + icsk->icsk_accept_queue.rskq_defer_accept)) + icsk->icsk_accept_queue.rskq_defer_accept++; + icsk->icsk_accept_queue.rskq_defer_accept++; } break; @@ -2292,7 +2295,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, val = (val ? : sysctl_tcp_fin_timeout) / HZ; break; case TCP_DEFER_ACCEPT: - val = icsk->icsk_accept_queue.rskq_defer_accept; + val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 : + ((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1)); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b54d9d3..ddc2e89 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4532,49 +4532,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) } } -static int tcp_defer_accept_check(struct sock *sk) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (tp->defer_tcp_accept.request) { - int queued_data = tp->rcv_nxt - tp->copied_seq; - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? - tcp_hdr((struct sk_buff *) - sk->sk_receive_queue.prev)->fin : 0; - - if (queued_data && hasfin) - queued_data--; - - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { - if (sock_flag(sk, SOCK_KEEPOPEN)) { - inet_csk_reset_keepalive_timer(sk, - keepalive_time_when(tp)); - } else { - inet_csk_delete_keepalive_timer(sk); - } - - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); - - tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); - - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { - tcp_reset(sk); - return -1; - } - } - return 0; -} - static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen) { struct tcp_sock *tp = tcp_sk(sk); @@ -4935,8 +4892,6 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - - tcp_defer_accept_check(sk); return 0; csum_error: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index cd601a8..b698b5b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk) sk->sk_sndmsg_page = NULL; } - if (tp->defer_tcp_accept.request) { - reqsk_free(tp->defer_tcp_accept.request); - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } - atomic_dec(&tcp_sockets_allocated); return 0; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 019c8c1..8245247 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, does sequence test, SYN is truncated, and thus we consider it a bare ACK. - Both ends (listening sockets) accept the new incoming - connection and try to talk to each other. 8-) + If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this + bare ACK. Otherwise, we create an established connection. Both + ends (listening sockets) accept the new incoming connection and try + to talk to each other. 8-) Note: This case is both harmless, and rare. Possibility is about the same as us discovering intelligent life on another plant tomorrow. @@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, if (!(flg & TCP_FLAG_ACK)) return NULL; + /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */ + if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && + TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { + inet_rsk(req)->acked = 1; + return NULL; + } + /* OK, ACK is valid, create big socket and * feed this segment to it. It will repeat all * the tests. THIS SEGMENT MUST MOVE SOCKET TO @@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, inet_csk_reqsk_queue_unlink(sk, req, prev); inet_csk_reqsk_queue_removed(sk, req); - if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && - TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { - - /* the accept queue handling is done is est recv slow - * path so lets make sure to start there - */ - tcp_sk(child)->pred_flags = 0; - sock_hold(sk); - sock_hold(child); - tcp_sk(child)->defer_tcp_accept.listen_sk = sk; - tcp_sk(child)->defer_tcp_accept.request = req; - - inet_csk_reset_keepalive_timer(child, - inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ); - } else { - inet_csk_reqsk_queue_add(sk, req, child); - } - + inet_csk_reqsk_queue_add(sk, req, child); return child; listen_overflow: diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4de68cf..63ed9d6 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data) goto death; } - if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { - tcp_send_active_reset(sk, GFP_ATOMIC); - goto death; - } - if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE) goto out; ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 9:40 ` [fixed] [patch] " Ingo Molnar @ 2008-06-03 14:41 ` Patrick McManus 2008-06-03 21:46 ` Ilpo Järvinen 1 sibling, 0 replies; 35+ messages in thread From: Patrick McManus @ 2008-06-03 14:41 UTC (permalink / raw) To: Ingo Molnar Cc: Ilpo Järvinen, Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov On Tue, 2008-06-03 at 11:40 +0200, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, > > > sizeof(val)) seems to be the magic trick that is interestion here. > > > > seems to be used: > > > > 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62 > > 22003 listen(4, 10) = 0 > > 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0 > > > > i'll queue up your reverts for testing in -tip. > > update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely > fixed the hangs! > > Here is the testing i did: hm. Here's a theory - the DA code put the socket in the accept queue after data has been queued on the socket, instead of the other way around which is normal. Could this be confusing the select() mechanism? -Pat ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 9:40 ` [fixed] [patch] " Ingo Molnar 2008-06-03 14:41 ` Patrick McManus @ 2008-06-03 21:46 ` Ilpo Järvinen 2008-06-03 22:01 ` Ilpo Järvinen 2008-06-04 7:23 ` Ingo Molnar 1 sibling, 2 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-03 21:46 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov, Patrick McManus [-- Attachment #1: Type: TEXT/PLAIN, Size: 3897 bytes --] On Tue, 3 Jun 2008, Ingo Molnar wrote: > * Ingo Molnar <mingo@elte.hu> wrote: > > > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, > > > sizeof(val)) seems to be the magic trick that is interestion here. > > > > seems to be used: > > > > 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62 > > 22003 listen(4, 10) = 0 > > 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0 > > > > i'll queue up your reverts for testing in -tip. > > update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely > fixed the hangs! ...It wasn't exactly out-of-tree, Evgeniy fixed a problem that was found in "TCP_DEFER_ACCEPT updates - process as established", perhaps it just wasn't in your testing tree yet. $ git-log -n 1 --pretty=full 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 | grep "Commit:" Commit: David S. Miller <davem@davemloft.net> > Here is the testing i did: > > first i ran about 500+ successful iterations on the affected testboxes > with your revert patch applied, on multiple systems. Are you sure this is enough to conclude the results? Seems quite small number to me to rule out luck. Especially considering that it was some amount of time in the tree already until you noticed it for the first time. Anyway, nice that it seems to be helping. It was almost the only possibility on TCP side, I don't think there were any other state machine related changes. So it wasn't just "random revert" in that sense like you were implying :-), I just didn't have any theory how it would cause the problem... ...I even first disregarded DA that because of timeline in-exactness and because I wrongly assumed that distcc probably won't use it anyway, but then I checked later on and found out that it was present at least in the source I had lying around. Anyway, it might be that the revert was a bit overkill, I'm not fully sure if 539fae and e4c7884 need to be reverted to fix it since main changes are in ec3c098. I just didn't want to take chances at first and put them all to the revert list. > Then today, without > changing anything else on one of the testsystems i reverted your revert > on that single system. After about an hour of testing, in 20 iterations > i got a hang again over localhost: > > titan:~> netstat -nt > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 0 174592 10.0.1.14:34710 10.0.1.14:3632 ESTABLISHED > tcp 72145 0 10.0.1.14:3632 10.0.1.14:34710 ESTABLISHED > > so i hereby conclude that your revert works :) I've repeated the commit > below that resolves this nasty regression. ...I couldn't immediately find anything obviously wrong with those changes but the patch below might be worth of a try (without the revert of course). If it ever spits out that WARN_ON for you, we were playing with fire too much and it's better to return on the safe side there... -- i. [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on If header prediction is turned on under some circumstances, DA can deadlock though I have great trouble in figuring out how it could ever happen while ending up into that else branch (but I've been wrong before as well :-)). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c9454f0..0d9a3fe 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4595,6 +4595,9 @@ static int tcp_defer_accept_check(struct sock *sk) tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { tcp_reset(sk); return -1; + } else { + WARN_ON(tp->pred_flags); + tp->pred_flags = 0; } } return 0; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 21:46 ` Ilpo Järvinen @ 2008-06-03 22:01 ` Ilpo Järvinen 2008-06-03 22:03 ` David Miller 2008-06-04 7:23 ` Ingo Molnar 1 sibling, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-03 22:01 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov, Patrick McManus [-- Attachment #1: Type: TEXT/PLAIN, Size: 647 bytes --] On Wed, 4 Jun 2008, Ilpo Järvinen wrote: > ...I couldn't immediately find anything obviously wrong with those changes > but the patch below might be worth of a try (without the revert of > course). If it ever spits out that WARN_ON for you, we were playing with > fire too much and it's better to return on the safe side there... > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on > > If header prediction is turned on under some circumstances, > DA can deadlock though I have great trouble in figuring out ...Nah, keepalive timer would then eventually kill it then, so no deadlock seems possible through that one. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 22:01 ` Ilpo Järvinen @ 2008-06-03 22:03 ` David Miller 2008-06-03 22:10 ` Ilpo Järvinen 2008-06-03 23:22 ` Ilpo Järvinen 0 siblings, 2 replies; 35+ messages in thread From: David Miller @ 2008-06-03 22:03 UTC (permalink / raw) To: ilpo.jarvinen Cc: mingo, peterz, linux-kernel, netdev, rjw, akpm, johnpol, mcmanus From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST) > On Wed, 4 Jun 2008, Ilpo Järvinen wrote: > > > ...I couldn't immediately find anything obviously wrong with those changes > > but the patch below might be worth of a try (without the revert of > > course). If it ever spits out that WARN_ON for you, we were playing with > > fire too much and it's better to return on the safe side there... > > > > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on > > > > If header prediction is turned on under some circumstances, > > DA can deadlock though I have great trouble in figuring out > > ...Nah, keepalive timer would then eventually kill it then, so no > deadlock seems possible through that one. Keepalive is very long, it might still "seem" like a deadlock for someone without much patience :-) ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 22:03 ` David Miller @ 2008-06-03 22:10 ` Ilpo Järvinen 2008-06-03 23:22 ` Ilpo Järvinen 1 sibling, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-03 22:10 UTC (permalink / raw) To: David Miller Cc: mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol, mcmanus [-- Attachment #1: Type: TEXT/PLAIN, Size: 1161 bytes --] On Tue, 3 Jun 2008, David Miller wrote: > From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> > Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST) > > > On Wed, 4 Jun 2008, Ilpo Järvinen wrote: > > > > > ...I couldn't immediately find anything obviously wrong with those changes > > > but the patch below might be worth of a try (without the revert of > > > course). If it ever spits out that WARN_ON for you, we were playing with > > > fire too much and it's better to return on the safe side there... > > > > > > > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on > > > > > > If header prediction is turned on under some circumstances, > > > DA can deadlock though I have great trouble in figuring out > > > > ...Nah, keepalive timer would then eventually kill it then, so no > > deadlock seems possible through that one. > > Keepalive is very long, it might still "seem" like a deadlock for > someone without much patience :-) Sure, but I think Ingo kept it once hang for hours, ~8h or so... ...Though I'm not sure if that qualifies as a proof of Ingo's patience but rather shows his willingness to help solving the matter... ;-) -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 22:03 ` David Miller 2008-06-03 22:10 ` Ilpo Järvinen @ 2008-06-03 23:22 ` Ilpo Järvinen 2008-06-03 23:54 ` Joe Perches ` (3 more replies) 1 sibling, 4 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-03 23:22 UTC (permalink / raw) To: David Miller, mingo, mcmanus, peterz Cc: LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 3981 bytes --] On Tue, 3 Jun 2008, David Miller wrote: > From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> > Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST) > > > On Wed, 4 Jun 2008, Ilpo Järvinen wrote: > > > > > ...I couldn't immediately find anything obviously wrong with those changes > > > but the patch below might be worth of a try (without the revert of > > > course). If it ever spits out that WARN_ON for you, we were playing with > > > fire too much and it's better to return on the safe side there... > > > > > > > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on > > > > > > If header prediction is turned on under some circumstances, > > > DA can deadlock though I have great trouble in figuring out > > > > ...Nah, keepalive timer would then eventually kill it then, so no > > deadlock seems possible through that one. > > Keepalive is very long, it might still "seem" like a deadlock for > someone without much patience :-) I think we want that clearing there, it's better to be safe than sorry there and to not put any trust on the keepalive thingie which tears down rather than results in a connection. But here's somewhat more likely explanation... Only compile tested... It probably needs some commenting from people who understand locking variants & details (I don't). -- i. -- [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk It seems that replacement of DA code also moved parts outside of appropriate locking. The Ingo's problem seems to come from the fact that two flows could now race in (inet_csk_)reqsk_queue_add corrupting the queue. ...This can leave dangling socks around which won't resolve themselves without stimuli from outside (e.g., external RST would help I think). Then some details I'm not too sure of: I guess we want to put listen_sk->sk_state checking under the lock as well. I've not evaluated if ->sk_data_ready too requires locking but assumed it does. I'm by no means familiar with all locking variants, requirements, etc. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 23 +++++++++++++---------- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c9454f0..d21d2b9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); if (tp->defer_tcp_accept.request) { + struct sock *listen_sk = tp->defer_tcp_accept.listen_sk; int queued_data = tp->rcv_nxt - tp->copied_seq; int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? tcp_hdr((struct sk_buff *) @@ -4570,8 +4571,9 @@ static int tcp_defer_accept_check(struct sock *sk) if (queued_data && hasfin) queued_data--; - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { + bh_lock_sock(listen_sk); + + if (queued_data && listen_sk->sk_state == TCP_LISTEN) { if (sock_flag(sk, SOCK_KEEPOPEN)) { inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp)); @@ -4579,23 +4581,24 @@ static int tcp_defer_accept_check(struct sock *sk) inet_csk_delete_keepalive_timer(sk); } - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); + inet_csk_reqsk_queue_add(listen_sk, + tp->defer_tcp_accept.request, + sk); tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); + listen_sk, 0); - sock_put(tp->defer_tcp_accept.listen_sk); + sock_put(listen_sk); sock_put(sk); tp->defer_tcp_accept.listen_sk = NULL; tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { + } else if (hasfin || listen_sk->sk_state != TCP_LISTEN) { + bh_unlock_sock(listen_sk); tcp_reset(sk); return -1; } + + bh_unlock_sock(listen_sk); } return 0; } -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 23:22 ` Ilpo Järvinen @ 2008-06-03 23:54 ` Joe Perches 2008-06-04 6:25 ` Ilpo Järvinen 2008-06-04 2:54 ` Patrick McManus ` (2 subsequent siblings) 3 siblings, 1 reply; 35+ messages in thread From: Joe Perches @ 2008-06-03 23:54 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, mingo, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote: > But here's somewhat more likely explanation... Only compile tested... > It probably needs some commenting from people who understand locking > variants & details (I don't). > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index c9454f0..d21d2b9 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk) > struct tcp_sock *tp = tcp_sk(sk); > > if (tp->defer_tcp_accept.request) { > + struct sock *listen_sk = tp->defer_tcp_accept.listen_sk; Not commenting on the locking, but I think the code would be clearer if tp->defer_tcp_accept was used in a temporary instead. Perhaps: net/ipv4/tcp_input.c | 33 ++++++++++++++++----------------- 1 files changed, 16 insertions(+), 17 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b54d9d3..f846e11 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4535,18 +4535,18 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) static int tcp_defer_accept_check(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); - - if (tp->defer_tcp_accept.request) { - int queued_data = tp->rcv_nxt - tp->copied_seq; - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? + struct tcp_deferred_accept_info *tdai = &tp->defer_tcp_accept; + if (tdai->request) { + int queued_data = tp->rcv_nxt - tp->copied_seq; + int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? tcp_hdr((struct sk_buff *) sk->sk_receive_queue.prev)->fin : 0; if (queued_data && hasfin) queued_data--; - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { + bh_lock_sock(tdai->listen_sk); + if (queued_data && tdai->listen_sk->sk_state == TCP_LISTEN) { if (sock_flag(sk, SOCK_KEEPOPEN)) { inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp)); @@ -4554,23 +4554,22 @@ static int tcp_defer_accept_check(struct sock *sk) inet_csk_delete_keepalive_timer(sk); } - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); + inet_csk_reqsk_queue_add(tdai->listen_sk, + tdai->request, + sk); - tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); + tdai->listen_sk->sk_data_ready(tdai->listen_sk, 0); - sock_put(tp->defer_tcp_accept.listen_sk); + sock_put(tdai->listen_sk); sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { + tdai->listen_sk = NULL; + tdai->request = NULL; + } else if (hasfin || tdai->listen_sk->sk_state != TCP_LISTEN) { + bh_unlock_sock(tdai->listen_sk); tcp_reset(sk); return -1; } + bh_unlock_sock(tdai->listen_sk); } return 0; } ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 23:54 ` Joe Perches @ 2008-06-04 6:25 ` Ilpo Järvinen 0 siblings, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-04 6:25 UTC (permalink / raw) To: Joe Perches Cc: David Miller, mingo, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 3532 bytes --] On Tue, 3 Jun 2008, Joe Perches wrote: > On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote: > > But here's somewhat more likely explanation... Only compile tested... > > It probably needs some commenting from people who understand locking > > variants & details (I don't). > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index c9454f0..d21d2b9 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk) > > struct tcp_sock *tp = tcp_sk(sk); > > > > if (tp->defer_tcp_accept.request) { > > + struct sock *listen_sk = tp->defer_tcp_accept.listen_sk; > > Not commenting on the locking, but I think the code > would be clearer if tp->defer_tcp_accept was used in > a temporary instead. Only fixes to net-2.6 now, as agreed. ...So I try to leave syntax cleanup to net-next. Besides, there's a trap which makes your patch a lot worse than mine... ;-) I too once tried a version without temporary until noticed that trap. > Perhaps: > > net/ipv4/tcp_input.c | 33 ++++++++++++++++----------------- > 1 files changed, 16 insertions(+), 17 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index b54d9d3..f846e11 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -4535,18 +4535,18 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) > static int tcp_defer_accept_check(struct sock *sk) > { > struct tcp_sock *tp = tcp_sk(sk); > - > - if (tp->defer_tcp_accept.request) { > - int queued_data = tp->rcv_nxt - tp->copied_seq; > - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? > + struct tcp_deferred_accept_info *tdai = &tp->defer_tcp_accept; > + if (tdai->request) { > + int queued_data = tp->rcv_nxt - tp->copied_seq; > + int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? > tcp_hdr((struct sk_buff *) > sk->sk_receive_queue.prev)->fin : 0; > > if (queued_data && hasfin) > queued_data--; > > - if (queued_data && > - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { > + bh_lock_sock(tdai->listen_sk); > + if (queued_data && tdai->listen_sk->sk_state == TCP_LISTEN) { > if (sock_flag(sk, SOCK_KEEPOPEN)) { > inet_csk_reset_keepalive_timer(sk, > keepalive_time_when(tp)); > @@ -4554,23 +4554,22 @@ static int tcp_defer_accept_check(struct sock *sk) > inet_csk_delete_keepalive_timer(sk); > } > > - inet_csk_reqsk_queue_add( > - tp->defer_tcp_accept.listen_sk, > - tp->defer_tcp_accept.request, > - sk); > + inet_csk_reqsk_queue_add(tdai->listen_sk, > + tdai->request, > + sk); > > - tp->defer_tcp_accept.listen_sk->sk_data_ready( > - tp->defer_tcp_accept.listen_sk, 0); > + tdai->listen_sk->sk_data_ready(tdai->listen_sk, 0); > > - sock_put(tp->defer_tcp_accept.listen_sk); > + sock_put(tdai->listen_sk); > sock_put(sk); > - tp->defer_tcp_accept.listen_sk = NULL; > - tp->defer_tcp_accept.request = NULL; > - } else if (hasfin || > - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { > + tdai->listen_sk = NULL; > + tdai->request = NULL; uh-oh... > + } else if (hasfin || tdai->listen_sk->sk_state != TCP_LISTEN) { > + bh_unlock_sock(tdai->listen_sk); > tcp_reset(sk); > return -1; > } > + bh_unlock_sock(tdai->listen_sk); ...which will crash in here. NAK. > } > return 0; > } > > > -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 23:22 ` Ilpo Järvinen 2008-06-03 23:54 ` Joe Perches @ 2008-06-04 2:54 ` Patrick McManus 2008-06-04 6:42 ` Ilpo Järvinen 2008-06-05 14:22 ` Ingo Molnar 2008-06-10 22:32 ` David Miller 3 siblings, 1 reply; 35+ messages in thread From: Patrick McManus @ 2008-06-04 2:54 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote: == > -- > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > It seems that replacement of DA code also moved parts outside > of appropriate locking. The Ingo's problem seems to come from > the fact that two flows could now race in > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can > leave dangling socks around which won't resolve themselves > without stimuli from outside (e.g., external RST would help > I think). Ilpo, has anyone told you today that you rock? Well allow me - you rock. > do_rcv() clearly has the listening socket locked in the non-DA case, and in the DA case it is the 'child' ESTABLISHED socket that is locked - leaving the accept queue unprotected. So simple. -Pat ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-04 2:54 ` Patrick McManus @ 2008-06-04 6:42 ` Ilpo Järvinen 0 siblings, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-04 6:42 UTC (permalink / raw) To: Patrick McManus Cc: David Miller, mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 1045 bytes --] On Tue, 3 Jun 2008, Patrick McManus wrote: > On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote: > == > > -- > > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > > > It seems that replacement of DA code also moved parts outside > > of appropriate locking. The Ingo's problem seems to come from > > the fact that two flows could now race in > > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can > > leave dangling socks around which won't resolve themselves > > without stimuli from outside (e.g., external RST would help > > I think). > do_rcv() clearly has the listening socket locked in the non-DA case, and > in the DA case it is the 'child' ESTABLISHED socket that is locked - > leaving the accept queue unprotected. So simple. It also well explains why Ingo finally got the KERNEL: assertion (!sk->sk_ack_backlog) which is adjusted in the very same reqsk_queue_add making it and the actual queue come out of sync. ...too bad this has no relevance to Håkon's case, so more digging is necessary... :-) -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 23:22 ` Ilpo Järvinen 2008-06-03 23:54 ` Joe Perches 2008-06-04 2:54 ` Patrick McManus @ 2008-06-05 14:22 ` Ingo Molnar 2008-06-05 18:00 ` Ilpo Järvinen 2008-06-10 22:32 ` David Miller 3 siblings, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-06-05 14:22 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk i unapplied the reverts then applied your fix, and got this hang after 1.5 hours of testing: titan:~> netstat -nt Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 65032 10.0.1.14:39454 10.0.1.19:3632 ESTABLISHED 10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with 16 CPUs. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-05 14:22 ` Ingo Molnar @ 2008-06-05 18:00 ` Ilpo Järvinen 2008-06-05 21:13 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-05 18:00 UTC (permalink / raw) To: Ingo Molnar Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 669 bytes --] On Thu, 5 Jun 2008, Ingo Molnar wrote: > > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > i unapplied the reverts then applied your fix, and got this hang after > 1.5 hours of testing: > > titan:~> netstat -nt > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State > tcp 0 65032 10.0.1.14:39454 10.0.1.19:3632 ESTABLISHED > > 10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with > 16 CPUs. Did 10.0.1.19 too have the fix (the receiving end is the important one to have this fix)? -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-05 18:00 ` Ilpo Järvinen @ 2008-06-05 21:13 ` Ilpo Järvinen 2008-06-05 23:29 ` Patrick McManus 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-05 21:13 UTC (permalink / raw) To: Ingo Molnar Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 11941 bytes --] On Thu, 5 Jun 2008, Ilpo Järvinen wrote: > On Thu, 5 Jun 2008, Ingo Molnar wrote: > > > > > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > > > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > > > i unapplied the reverts then applied your fix, and got this hang after > > 1.5 hours of testing: > > > > titan:~> netstat -nt > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State > > tcp 0 65032 10.0.1.14:39454 10.0.1.19:3632 ESTABLISHED > > > > 10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with > > 16 CPUs. > > Did 10.0.1.19 too have the fix (the receiving end is the important > one to have this fix)? Since you're probably going to tell that it was, here's a try which just reverts the most questionable commit out of those three DA related changes that were included in the first revert you found to be working (the other two are quite useful fixes). I'm out of new ideas what could be still wrong (I got confused and lost track number of times while I tried to verify socket locking today and probably don't have more time for that now)... Unless somebody else (Patrick?) comes up with something quickly, I propose you Ingo try if this smaller revert is enough to solve it and we leave those two fixes there and postpone the rest. If it isn't enough then things would again get quite interesting but that's unlikely to happen. -- i. -- [PATCH] tcp: revert DEFER_ACCEPT updates - process as established Reverts "[TCP]: TCP_DEFER_ACCEPT updates - process as established" (ec3c0982a2dd1e6) and a fix made into it ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"; 9ae27e0adbf471c7). Ingo Molnar found out that TCP flows got stuck into ESTABLISHED state without receiving process. I tried to fix the locking for listen_sk: http://marc.info/?l=linux-kernel&m=121253537018216&w=2 ...but Ingo's tests still got stuck TCP flow with that fix. Two other DA related fixes which were among the reverted commits in the first revert attempt are not reverted here. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/linux/tcp.h | 7 ------ include/net/request_sock.h | 4 +- include/net/tcp.h | 1 - net/ipv4/inet_connection_sock.c | 11 +++++++-- net/ipv4/tcp.c | 18 +++++++++------ net/ipv4/tcp_input.c | 45 --------------------------------------- net/ipv4/tcp_ipv4.c | 8 ------- net/ipv4/tcp_minisocks.c | 32 ++++++++++----------------- net/ipv4/tcp_timer.c | 5 ---- 9 files changed, 33 insertions(+), 98 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 9881295..07e79bd 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } -struct tcp_deferred_accept_info { - struct sock *listen_sk; - struct request_sock *request; -}; - struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -379,8 +374,6 @@ struct tcp_sock { unsigned int keepalive_time; /* time before keep alive takes place */ unsigned int keepalive_intvl; /* time interval between keep alive probes */ - struct tcp_deferred_accept_info defer_tcp_accept; - unsigned long last_synq_overflow; /* Receiver side RTT estimation */ diff --git a/include/net/request_sock.h b/include/net/request_sock.h index b220b5f..0c96e7b 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -115,8 +115,8 @@ struct request_sock_queue { struct request_sock *rskq_accept_head; struct request_sock *rskq_accept_tail; rwlock_t syn_wait_lock; - u16 rskq_defer_accept; - /* 2 bytes hole, try to pack */ + u8 rskq_defer_accept; + /* 3 bytes hole, try to pack */ struct listen_sock *listen_opt; }; diff --git a/include/net/tcp.h b/include/net/tcp.h index 633147c..981d5ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_KEEPINTVL 32767 #define MAX_TCP_KEEPCNT 127 #define MAX_TCP_SYNCNT 127 -#define MAX_TCP_ACCEPT_DEFERRED 65535 #define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */ diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 828ea21..045e799 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, struct inet_connection_sock *icsk = inet_csk(parent); struct request_sock_queue *queue = &icsk->icsk_accept_queue; struct listen_sock *lopt = queue->listen_opt; - int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int thresh = max_retries; unsigned long now = jiffies; struct request_sock **reqp, *req; int i, budget; @@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, } } + if (queue->rskq_defer_accept) + max_retries = queue->rskq_defer_accept; + budget = 2 * (lopt->nr_table_entries / (timeout / interval)); i = lopt->clock_hand; @@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, reqp=&lopt->syn_table[i]; while ((req = *reqp) != NULL) { if (time_after_eq(now, req->expires)) { - if (req->retrans < thresh && - !req->rsk_ops->rtx_syn_ack(parent, req)) { + if ((req->retrans < (inet_rsk(req)->acked ? max_retries : thresh)) && + (inet_rsk(req)->acked || + !req->rsk_ops->rtx_syn_ack(parent, req))) { unsigned long timeo; if (req->retrans++ == 0) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ab66683..fc54a48 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2112,12 +2112,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: - if (val < 0) { - err = -EINVAL; - } else { - if (val > MAX_TCP_ACCEPT_DEFERRED) - val = MAX_TCP_ACCEPT_DEFERRED; - icsk->icsk_accept_queue.rskq_defer_accept = val; + icsk->icsk_accept_queue.rskq_defer_accept = 0; + if (val > 0) { + /* Translate value in seconds to number of + * retransmits */ + while (icsk->icsk_accept_queue.rskq_defer_accept < 32 && + val > ((TCP_TIMEOUT_INIT / HZ) << + icsk->icsk_accept_queue.rskq_defer_accept)) + icsk->icsk_accept_queue.rskq_defer_accept++; + icsk->icsk_accept_queue.rskq_defer_accept++; } break; @@ -2299,7 +2302,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, val = (val ? : sysctl_tcp_fin_timeout) / HZ; break; case TCP_DEFER_ACCEPT: - val = icsk->icsk_accept_queue.rskq_defer_accept; + val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 : + ((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1)); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1322fa1..fb1ab3f 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4557,49 +4557,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) } } -static int tcp_defer_accept_check(struct sock *sk) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (tp->defer_tcp_accept.request) { - int queued_data = tp->rcv_nxt - tp->copied_seq; - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? - tcp_hdr((struct sk_buff *) - sk->sk_receive_queue.prev)->fin : 0; - - if (queued_data && hasfin) - queued_data--; - - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { - if (sock_flag(sk, SOCK_KEEPOPEN)) { - inet_csk_reset_keepalive_timer(sk, - keepalive_time_when(tp)); - } else { - inet_csk_delete_keepalive_timer(sk); - } - - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); - - tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); - - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { - tcp_reset(sk); - return -1; - } - } - return 0; -} - static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen) { struct tcp_sock *tp = tcp_sk(sk); @@ -4974,8 +4931,6 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - - tcp_defer_accept_check(sk); return 0; csum_error: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index cd601a8..b698b5b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk) sk->sk_sndmsg_page = NULL; } - if (tp->defer_tcp_accept.request) { - reqsk_free(tp->defer_tcp_accept.request); - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } - atomic_dec(&tcp_sockets_allocated); return 0; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 019c8c1..8245247 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, does sequence test, SYN is truncated, and thus we consider it a bare ACK. - Both ends (listening sockets) accept the new incoming - connection and try to talk to each other. 8-) + If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this + bare ACK. Otherwise, we create an established connection. Both + ends (listening sockets) accept the new incoming connection and try + to talk to each other. 8-) Note: This case is both harmless, and rare. Possibility is about the same as us discovering intelligent life on another plant tomorrow. @@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, if (!(flg & TCP_FLAG_ACK)) return NULL; + /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */ + if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && + TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { + inet_rsk(req)->acked = 1; + return NULL; + } + /* OK, ACK is valid, create big socket and * feed this segment to it. It will repeat all * the tests. THIS SEGMENT MUST MOVE SOCKET TO @@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, inet_csk_reqsk_queue_unlink(sk, req, prev); inet_csk_reqsk_queue_removed(sk, req); - if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && - TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { - - /* the accept queue handling is done is est recv slow - * path so lets make sure to start there - */ - tcp_sk(child)->pred_flags = 0; - sock_hold(sk); - sock_hold(child); - tcp_sk(child)->defer_tcp_accept.listen_sk = sk; - tcp_sk(child)->defer_tcp_accept.request = req; - - inet_csk_reset_keepalive_timer(child, - inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ); - } else { - inet_csk_reqsk_queue_add(sk, req, child); - } - + inet_csk_reqsk_queue_add(sk, req, child); return child; listen_overflow: diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4de68cf..63ed9d6 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data) goto death; } - if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { - tcp_send_active_reset(sk, GFP_ATOMIC); - goto death; - } - if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE) goto out; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-05 21:13 ` Ilpo Järvinen @ 2008-06-05 23:29 ` Patrick McManus 2008-06-06 10:03 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Patrick McManus @ 2008-06-05 23:29 UTC (permalink / raw) To: Ilpo Järvinen Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol On Fri, 2008-06-06 at 00:13 +0300, Ilpo Järvinen wrote: > > I'm out of new ideas what could be still wrong (I got confused and > lost > track number of times while I tried to verify socket locking today and > probably don't have more time for that now)... Unless somebody else > (Patrick?) comes up with something quickly, Sorry, I don't see anything - it seems to boil down to the same code in the DA and non-DA case as far as I can tell, but after a while all the twisty passages seem to look alike. If Ingo confirms that the recv end was running the locking patch code, it would be interesting to just confirm the sysreq+t looks the same as before - it is possible the patch turned the race into a non-obvious deadlock. I'm sure your smaller revert will make the problem go away just as the larger one did, fwiw. The other odd thing is that Ingo did a lot of experimentation and was only making this happen on localhost before (though I agree there is nothing inherent about that lock and localhost) - isn't it odd that the first trigger of it now is between two hosts? What do you make of that? ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-05 23:29 ` Patrick McManus @ 2008-06-06 10:03 ` Ilpo Järvinen 2008-06-06 17:11 ` Patrick McManus 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 10:03 UTC (permalink / raw) To: Patrick McManus Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 1779 bytes --] On Thu, 5 Jun 2008, Patrick McManus wrote: > On Fri, 2008-06-06 at 00:13 +0300, Ilpo Järvinen wrote: > > > > > I'm out of new ideas what could be still wrong (I got confused and > > lost > > track number of times while I tried to verify socket locking today and > > probably don't have more time for that now)... Unless somebody else > > (Patrick?) comes up with something quickly, > > Sorry, I don't see anything - it seems to boil down to the same code in > the DA and non-DA case as far as I can tell, but after a while all the > twisty passages seem to look alike. :-) This Ingo's testcase should anyway be quite "simple", I mean that distcc shouldn't do anything unexpected in a sense it shouldn't abort the flows by not sending data, close the listening socket or other things like that. > If Ingo confirms that the recv end was running the locking patch code, > it would be interesting to just confirm the sysreq+t looks the same as > before - it is possible the patch turned the race into a non-obvious > deadlock. ...Yes, but we want that from the receiver's host rather than from the sender end. Also checking that sender is still doing window probes once per 2min is probably worth of it though a change in that is quite unlikely. > I'm sure your smaller revert will make the problem go away just as the > larger one did, fwiw. I'd very much except it to. > The other odd thing is that Ingo did a lot of experimentation and was > only making this happen on localhost before (though I agree there is > nothing inherent about that lock and localhost) - isn't it odd that the > first trigger of it now is between two hosts? What do you make of that? ...No, it occurred couple of times in the past between host as well, nothing new in that. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 10:03 ` Ilpo Järvinen @ 2008-06-06 17:11 ` Patrick McManus 2008-06-06 17:33 ` Ingo Molnar 2008-06-06 18:25 ` Ilpo Järvinen 0 siblings, 2 replies; 35+ messages in thread From: Patrick McManus @ 2008-06-06 17:11 UTC (permalink / raw) To: Ilpo Järvinen Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol > This Ingo's testcase should anyway be quite "simple", I mean that distcc > shouldn't do anything unexpected in a sense it shouldn't abort the flows > by not sending data, close the listening socket or other things like that. > maybe - I've noted that I can get the distcc server to crash with just a little fuzz (telnet to it and close the telnet) - but it is true I haven't seen anything odd using the distcc client. Anyhow, my news is that using rc5 I have managed to reproduce it on localhost - so it isn't just ingo anymore ! ;) I didn't have a 16 cpu machine at my disposal, so I was concerned I wouldn't be able to make it happen - but I setup a 64 bit kvm image with -smp 4, running on my core2-duo and created a makefile with 3 src files that I just compiled as "while true; do make -j3 all; done" - the makefile uses distcc to localhost and has intentionally broken dependencies so it just keeps recompiling stuff. The input files are approximately 135k, 98k, and 16k after running gcc -E on them (which I what I assume distcc does before putting them down the socket). On rc5 I could get the lockup in under 20 minutes.. usually 10. I think I did it 4 times. My compile test is probably a better trigger than the kernel compile because the distcc connects are never staggered like they would be in a large directory of files. (3 files, -j4). When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the error at all in the first 90 minutes of testing. I'll let the test run and update the list. I'm holding out hope that Ingo's report did not have the locking patch on the distcc server end - because it certainly makes a difference for me. -Patrick ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 17:11 ` Patrick McManus @ 2008-06-06 17:33 ` Ingo Molnar 2008-06-06 18:19 ` Ilpo Järvinen 2008-06-06 18:25 ` Ilpo Järvinen 1 sibling, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-06-06 17:33 UTC (permalink / raw) To: Patrick McManus Cc: Ilpo Järvinen, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol * Patrick McManus <mcmanus@ducksong.com> wrote: > When I apply the locking patch you (Ilpo) wrote, I cannot reproduce > the error at all in the first 90 minutes of testing. I'll let the test > run and update the list. > > I'm holding out hope that Ingo's report did not have the locking patch > on the distcc server end - because it certainly makes a difference for > me. Hm, the distcc server had the full 3-patch-revert from Ilpo, was that supposed to fix the problem too, indirectly? The box is running that 3-patch revert right now as well: phoenix:~> uptime 19:20:28 up 9:58, 2 users, load average: 7.75, 13.88, 30.95 phoenix:~> uname -a Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux ... and i never saw a single hang today in the 10 hours of uptime this box has. (and it built a good 500 kernel today) Nor any hang yesterday, and that was a good 500 kernels too. You can see it that the box built more than two thousand kernels in the past few days alone, so it's a rather busy little bee. The other testboxes built even more kernels - a quad box built and booted 2500 kernels: #define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008" and i never saw a hang on that box either. a third box has: titan:~> uname -a Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6 19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux (this is the one that showed the hang for the first time) The total count of kernel bootups i did this week for -tip QA was somewhere between five and ten thousand random build+bootups - and the only time i got a hang was when i removed the 3-patch-revert intentionally, on one of the boxes. Maybe that 3-patch-revert just makes this locking bug a bit less likely to trigger, by accident? Out tip test-setup is specialized to find arch/x86 and scheduler bugs, not primarily to find networking bugs. (but at this test volume, and given that it makes use of distcc, it will trigger them too.) i have a rather accurate timeline of when the hang first occured, do we know the timeline of the introduction of the locking bug by any chance? Which commit introduced it? (Ilpo's commit log does not say it) Your test results are compelling nevertheless so i'll do a retest in any case, with all boxes either running an older kernel or a kernel with the locking fix. Ingo ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 17:33 ` Ingo Molnar @ 2008-06-06 18:19 ` Ilpo Järvinen 2008-06-06 18:39 ` Ingo Molnar 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 18:19 UTC (permalink / raw) To: Ingo Molnar Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol ...I kind of fail to follow in general in this mail which patch have been tested and where and when... But I understand that it's just due to number of tests & hosts & kernels & what-else you use and know by heart (and we don't do all that well :-)). But I'll try to still sort it out below... On Fri, 6 Jun 2008, Ingo Molnar wrote: > > * Patrick McManus <mcmanus@ducksong.com> wrote: > > > When I apply the locking patch you (Ilpo) wrote, I cannot reproduce > > the error at all in the first 90 minutes of testing. I'll let the test > > run and update the list. > > > > I'm holding out hope that Ingo's report did not have the locking patch > > on the distcc server end - because it certainly makes a difference for > > me. > > Hm, the distcc server had the full 3-patch-revert from Ilpo, was that > supposed to fix the problem too, indirectly? Yes, the problematic outside of locking portion shouldn't be there without those DA changes. > The box is running that 3-patch revert right now as well: > > phoenix:~> uptime > 19:20:28 up 9:58, 2 users, load average: 7.75, 13.88, 30.95 > phoenix:~> uname -a > Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux > > ... and i never saw a single hang today in the 10 hours of uptime this > box has. (and it built a good 500 kernel today) Nor any hang yesterday, > and that was a good 500 kernels too. > > You can see it that the box built more than two thousand kernels in the > past few days alone, so it's a rather busy little bee. The other > testboxes built even more kernels - a quad box built and booted 2500 > kernels: > > #define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008" > > and i never saw a hang on that box either. > > a third box has: > > titan:~> uname -a > Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6 > 19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux > > (this is the one that showed the hang for the first time) > > The total count of kernel bootups i did this week for -tip QA was > somewhere between five and ten thousand random build+bootups - and the > only time i got a hang was when i removed the 3-patch-revert > intentionally, on one of the boxes. ...and you added the locking fix there instead? Or was this a removal? > Maybe that 3-patch-revert just makes this locking bug a bit less likely > to trigger, by accident? No, part of the DEFER_ACCEPT stuff was postponed in 2.6.25..2.6.26-rc1 timeframe (ec3c0982a2dd1e671bad8e9d26c28dcba0039d87) so that one portion of it ended up being added outside of the socket lock of the listening socket, while touching its datastructures. Without ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 the deferred accept related things happen earlier, ie., while we still are under the lock of the listening socket. So that particular locking bug was _introduced_ by that ec3c change, not made more likely or so. ...Of course software is known to have bugs, so we might always be (un?)lucky and hit another one and confuse... :-) > Out tip test-setup is specialized to find > arch/x86 and scheduler bugs, not primarily to find networking bugs. (but > at this test volume, and given that it makes use of distcc, it will > trigger them too.) It seems to work quite well actually for this kind of networking related bugs too which hardly depend on network at all :-). > i have a rather accurate timeline of when the hang first occured, do we > know the timeline of the introduction of the locking bug by any chance? > Which commit introduced it? (Ilpo's commit log does not say it) Ah, sorry I forgot to add that one there, it was sent quite late in the night and I just couldn't get sleep until sending the fix... :-) It was one of the reverted ones that did it: ec3c0982a2dd1e671bad8e9d26c28dcba0039d87. > Your test results are compelling nevertheless so i'll do a retest in any > case, with all boxes either running an older kernel or a kernel with the > locking fix. If you want an older kernel, you would have to go basically to 2.6.25 or so. To summarize. Both 3changes+1fix revert (you refer to it only as 3-patch revert) _and_ the locking fix I made should fix the problem (obviously they exclude each other). ...And end which is significant is the one which has LISTENing sockets (please keep this in mind if you still get the hang and provide some info). -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 18:19 ` Ilpo Järvinen @ 2008-06-06 18:39 ` Ingo Molnar 2008-06-06 19:49 ` Ilpo Järvinen 2008-06-06 20:08 ` Patrick McManus 0 siblings, 2 replies; 35+ messages in thread From: Ingo Molnar @ 2008-06-06 18:39 UTC (permalink / raw) To: Ilpo Järvinen Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > If you want an older kernel, you would have to go basically to 2.6.25 > or so. correct, that's what i use as fallback, some distro kernel which is 2.6.25 or older. but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the locking problem - so 2.6.25 is affected as well? This is a significant question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on the 16-way box. (all other build-boxes have 2.6.24 or older as a fallback kernel) > To summarize. Both 3changes+1fix revert (you refer to it only as > 3-patch revert) _and_ the locking fix I made should fix the problem > (obviously they exclude each other). ...And end which is significant > is the one which has LISTENing sockets (please keep this in mind if > you still get the hang and provide some info). ok. For completeness, let me repeat the patch i referred to as the '3-patch-revert' below. (which indeed is 3+1 as you note) this is the patch that appears to be working empirically. (Disclaimer: it might just hide the problem, change timings, have a lucky code layout, etc.) Ingo -----------------> commit 9e5b6ca8fa8e07aceaddbfb0f2a324449e5ef014 Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Date: Sat May 31 15:18:56 2008 +0300 tcp: revert DEFER_ACCEPT modifications On Sat, 31 May 2008, Ilpo Järvinen wrote: > On Sat, 31 May 2008, Ingo Molnar wrote: > > > ok, once i added back the localhost distcc component and the hung kernel > > build + stuck TCP socket bug happened again overnight: > > > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State > > tcp 72187 0 10.0.1.14:3632 10.0.1.14:47910 ESTABLISHED > > tcp 0 174464 10.0.1.14:47910 10.0.1.14:3632 ESTABLISHED > > > > so it seems distcc over localhost was the aspect that made it fail. > > > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ > > sides of the connection. But that is just a theory - it could be timing, > > etc. > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were > some post 2.6.25 changes into it)? Hmm, if so, please try this revert below (I hope I got everything that is related there, gitk is currently broken for me so that viewing many commits quickly is a pain, so I might have accidently missed something)... Reverts these commits: [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack [TCP]: TCP_DEFER_ACCEPT updates - process as established tcp: Fix slab corruption with ipv6 and tcp6fuzz ..., ie., 539fae89bebd16ebeafd57a87169bc56eb530d76, e4c78840284f3f51b1896cf3936d60a6033c4d2c, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 18e62e3..b31b6b7 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) return (struct tcp_request_sock *)req; } -struct tcp_deferred_accept_info { - struct sock *listen_sk; - struct request_sock *request; -}; - struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; @@ -379,8 +374,6 @@ struct tcp_sock { unsigned int keepalive_intvl; /* time interval between keep alive probes */ int linger2; - struct tcp_deferred_accept_info defer_tcp_accept; - unsigned long last_synq_overflow; u32 tso_deferred; diff --git a/include/net/request_sock.h b/include/net/request_sock.h index b220b5f..0c96e7b 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -115,8 +115,8 @@ struct request_sock_queue { struct request_sock *rskq_accept_head; struct request_sock *rskq_accept_tail; rwlock_t syn_wait_lock; - u16 rskq_defer_accept; - /* 2 bytes hole, try to pack */ + u8 rskq_defer_accept; + /* 3 bytes hole, try to pack */ struct listen_sock *listen_opt; }; diff --git a/include/net/tcp.h b/include/net/tcp.h index 633147c..981d5ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo); #define MAX_TCP_KEEPINTVL 32767 #define MAX_TCP_KEEPCNT 127 #define MAX_TCP_SYNCNT 127 -#define MAX_TCP_ACCEPT_DEFERRED 65535 #define TCP_SYNQ_INTERVAL (HZ/5) /* Period of SYNACK timer */ diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 828ea21..ec83448 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, struct inet_connection_sock *icsk = inet_csk(parent); struct request_sock_queue *queue = &icsk->icsk_accept_queue; struct listen_sock *lopt = queue->listen_opt; - int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries; + int thresh = max_retries; unsigned long now = jiffies; struct request_sock **reqp, *req; int i, budget; @@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, } } + if (queue->rskq_defer_accept) + max_retries = queue->rskq_defer_accept; + budget = 2 * (lopt->nr_table_entries / (timeout / interval)); i = lopt->clock_hand; @@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent, reqp=&lopt->syn_table[i]; while ((req = *reqp) != NULL) { if (time_after_eq(now, req->expires)) { - if (req->retrans < thresh && - !req->rsk_ops->rtx_syn_ack(parent, req)) { + if ((req->retrans < thresh || + (inet_rsk(req)->acked && req->retrans < max_retries)) + && !req->rsk_ops->rtx_syn_ack(parent, req)) { unsigned long timeo; if (req->retrans++ == 0) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ab66683..fc54a48 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2112,12 +2112,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level, break; case TCP_DEFER_ACCEPT: - if (val < 0) { - err = -EINVAL; - } else { - if (val > MAX_TCP_ACCEPT_DEFERRED) - val = MAX_TCP_ACCEPT_DEFERRED; - icsk->icsk_accept_queue.rskq_defer_accept = val; + icsk->icsk_accept_queue.rskq_defer_accept = 0; + if (val > 0) { + /* Translate value in seconds to number of + * retransmits */ + while (icsk->icsk_accept_queue.rskq_defer_accept < 32 && + val > ((TCP_TIMEOUT_INIT / HZ) << + icsk->icsk_accept_queue.rskq_defer_accept)) + icsk->icsk_accept_queue.rskq_defer_accept++; + icsk->icsk_accept_queue.rskq_defer_accept++; } break; @@ -2299,7 +2302,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level, val = (val ? : sysctl_tcp_fin_timeout) / HZ; break; case TCP_DEFER_ACCEPT: - val = icsk->icsk_accept_queue.rskq_defer_accept; + val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 : + ((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1)); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index eba873e..cad73b7 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4541,49 +4541,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th) } } -static int tcp_defer_accept_check(struct sock *sk) -{ - struct tcp_sock *tp = tcp_sk(sk); - - if (tp->defer_tcp_accept.request) { - int queued_data = tp->rcv_nxt - tp->copied_seq; - int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? - tcp_hdr((struct sk_buff *) - sk->sk_receive_queue.prev)->fin : 0; - - if (queued_data && hasfin) - queued_data--; - - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { - if (sock_flag(sk, SOCK_KEEPOPEN)) { - inet_csk_reset_keepalive_timer(sk, - keepalive_time_when(tp)); - } else { - inet_csk_delete_keepalive_timer(sk); - } - - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); - - tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); - - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { - tcp_reset(sk); - return -1; - } - } - return 0; -} - static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen) { struct tcp_sock *tp = tcp_sk(sk); @@ -4944,8 +4901,6 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - - tcp_defer_accept_check(sk); return 0; csum_error: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index cd601a8..b698b5b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk) sk->sk_sndmsg_page = NULL; } - if (tp->defer_tcp_accept.request) { - reqsk_free(tp->defer_tcp_accept.request); - sock_put(tp->defer_tcp_accept.listen_sk); - sock_put(sk); - tp->defer_tcp_accept.listen_sk = NULL; - tp->defer_tcp_accept.request = NULL; - } - atomic_dec(&tcp_sockets_allocated); return 0; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 019c8c1..8245247 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, does sequence test, SYN is truncated, and thus we consider it a bare ACK. - Both ends (listening sockets) accept the new incoming - connection and try to talk to each other. 8-) + If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this + bare ACK. Otherwise, we create an established connection. Both + ends (listening sockets) accept the new incoming connection and try + to talk to each other. 8-) Note: This case is both harmless, and rare. Possibility is about the same as us discovering intelligent life on another plant tomorrow. @@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, if (!(flg & TCP_FLAG_ACK)) return NULL; + /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */ + if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && + TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { + inet_rsk(req)->acked = 1; + return NULL; + } + /* OK, ACK is valid, create big socket and * feed this segment to it. It will repeat all * the tests. THIS SEGMENT MUST MOVE SOCKET TO @@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb, inet_csk_reqsk_queue_unlink(sk, req, prev); inet_csk_reqsk_queue_removed(sk, req); - if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept && - TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) { - - /* the accept queue handling is done is est recv slow - * path so lets make sure to start there - */ - tcp_sk(child)->pred_flags = 0; - sock_hold(sk); - sock_hold(child); - tcp_sk(child)->defer_tcp_accept.listen_sk = sk; - tcp_sk(child)->defer_tcp_accept.request = req; - - inet_csk_reset_keepalive_timer(child, - inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ); - } else { - inet_csk_reqsk_queue_add(sk, req, child); - } - + inet_csk_reqsk_queue_add(sk, req, child); return child; listen_overflow: diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4de68cf..63ed9d6 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data) goto death; } - if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { - tcp_send_active_reset(sk, GFP_ATOMIC); - goto death; - } - if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE) goto out; ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 18:39 ` Ingo Molnar @ 2008-06-06 19:49 ` Ilpo Järvinen 2008-06-06 20:08 ` Patrick McManus 1 sibling, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 19:49 UTC (permalink / raw) To: Ingo Molnar Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 3643 bytes --] On Fri, 6 Jun 2008, Ingo Molnar wrote: > > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > If you want an older kernel, you would have to go basically to 2.6.25 > > or so. > > correct, that's what i use as fallback, some distro kernel which is > 2.6.25 or older. > > but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the > locking problem - so 2.6.25 is affected as well? No, you're probably just falling into a git-describe trap I also used to fall: ijjarvin@pointhope:~/linux/mainline$ git-log -n 1 --pretty=oneline ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.25 | cat - ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 [TCP]: TCP_DEFER_ACCEPT updates - process as established ijjarvin@pointhope:~/linux/mainline$ git-log -n 1 --pretty=oneline ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.26-rc1 | cat - ijjarvin@pointhope:~/linux/mainline$ git-describe ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 v2.6.25-rc6-475-gec3c098 ijjarvin@pointhope:~/linux/mainline$ The git-describe is not the way one can determine into which mainline tag a commit was included, it basically just provides the closest tag among ancestors, which can be a vastly different one and has _no_ relation whatsoever to the tag we'd desire to get. In here, Dave had net-2.6 based on 2.5.25-rc6ish (or alternatively last merge to net-2.6 from Linus' tree's content came from that point of time), but Linus did the merge from 2.6.25 but git-describe won't look anything that happens after the asked commit. This is similar to the bisect-lands-lower-tag-than-select-good-commit-was "mystery" that was recently discussed extensively, again the Makefile only tracks ancestors, not the future. If somebody knows a trivial command to get that future information (to where merged info), I'd pretty interested to hear. > This is a significant > question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on > the 16-way box. (all other build-boxes have 2.6.24 or older as a > fallback kernel) Please do get the receiver state if you still see such problem with it, it is also relevant but it a different problem then (I'm yet to analyze the data Håkan was collecting, dl it already by didn't even look into that yet). ...Or also if you see stuck TCPs with other cases I've told should fix it: 1. 2.6.25 (pre-ec3c to be accurate) 2. 3+1 revert 3. ec3c+locking fix (this is the most unsure one because it still would have the reversed socket lock taking order though nothing bad has been found by some review neither by me nor Patrick) Please collect at least /proc/net/tcp and the netstat -np, if there's process associated to the flow with _Recv-Q_ (in localhost case there are two of them, the other with Send-Q), also where the process is waiting is useful. Hopefully clear enough now... :-) > > To summarize. Both 3changes+1fix revert (you refer to it only as > > 3-patch revert) _and_ the locking fix I made should fix the problem > > (obviously they exclude each other). ...And end which is significant > > is the one which has LISTENing sockets (please keep this in mind if > > you still get the hang and provide some info). > > ok. > > For completeness, let me repeat the patch i referred to as the > '3-patch-revert' below. (which indeed is 3+1 as you note) ...I know because there never have been any 3-patch-revert made... :-) > this is the patch that appears to be working empirically. (Disclaimer: > it might just hide the problem, change timings, have a lucky code > layout, etc.) Sure, but the revert also removes the obvious locking problem that was introduced in ec3c. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 18:39 ` Ingo Molnar 2008-06-06 19:49 ` Ilpo Järvinen @ 2008-06-06 20:08 ` Patrick McManus 2008-06-06 21:12 ` Ilpo Järvinen 2008-06-10 22:49 ` David Miller 1 sibling, 2 replies; 35+ messages in thread From: Patrick McManus @ 2008-06-06 20:08 UTC (permalink / raw) To: Ingo Molnar Cc: Ilpo Järvinen, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol This is all a bit confusing, but here are the conclusions I have drawn. There definitely is a problem with the locking of the DA commit ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1 but it never appeared in 25. It exists in pretty much the same form in rc5 (there was 1 patch to it over that time to fix a different problem). We're certain this code has a problem with the accept queue both because of code inspection and the fact that Ingo can back it out (as the significant part of the 3-patch revert) and the problem goes away in his testing. I have run tests that can reproduce the hung socket with distcc over localhost using 26-rc5. I can also apparently cure it using the locking fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of rc5 +lockpatch is at 4.5+ hrs and counting without failures, it fails 6 times an hour with vanilla rc5) Based on all of that, the right thing to do seems to be to apply the lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything - just fix the code and I'll send Ilpo and Ingo cookies at Christmas time for being great guys. Alternatively, Ingo could run the distcc servers and clients on -tip with the lockpatch (nothing reverted) for more testing. The only lingering problem is Ingo's report yesterday http://marc.info/?l=linux-netdev&m=121267587715976&w=2 of a distcc hang. In this one it was not over localhost and the distcc server had the ec3c DA changes totally reverted. (The server is really the only stack that matters in this case - the client is not impacted by the DA changes). This has to be a different issue, because the ec3c code we're talking about here wasn't on the server at all. As Ilpo mentions, Hakon is beleived to have a different problem and maybe you've tripped over that too? If we're sure of that conclusion we should just take Ilpo's DA patch as that will narrow the field for finding Hakon's issue. Its just with all of these data points I'm not sure if I'm reaching the right conclusion. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 20:08 ` Patrick McManus @ 2008-06-06 21:12 ` Ilpo Järvinen 2008-06-06 21:23 ` Arjan van de Ven 2008-06-10 22:49 ` David Miller 1 sibling, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 21:12 UTC (permalink / raw) To: Patrick McManus, Arjan van de Ven Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 4077 bytes --] ...added Arjan. On Fri, 6 Jun 2008, Patrick McManus wrote: > This is all a bit confusing, but here are the conclusions I have drawn. Your observations here match what I've understood :-). > There definitely is a problem with the locking of the DA commit > ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1 > but it never appeared in 25. It exists in pretty much the same form in > rc5 (there was 1 patch to it over that time to fix a different problem). > > We're certain this code has a problem with the accept queue both because > of code inspection and the fact that Ingo can back it out (as the > significant part of the 3-patch revert) and the problem goes away in his > testing. Problems were at least these: - Accept queue addition was racy and could leave dangling items - Dangling items caused inconsistent sk_ack_backlog - Checking for still in LISTEN state was racy, could be changed after the check was made (shouldn't happen with distcc though) I didn't read ->sk_data_ready that carefully, it could have some additional problems that are not listed (but they all should be fixed by the added locking anyway). AFAICT, rest of that ec3c change is safe wrt. locking, just holding sk is enough for the rest and those bits mostly shouldn't anyway be executed with a distcc setup. > I have run tests that can reproduce the hung socket with distcc over > localhost using 26-rc5. I can also apparently cure it using the locking > fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of rc5 > +lockpatch is at 4.5+ hrs and counting without failures, it fails 6 > times an hour with vanilla rc5) > > Based on all of that, the right thing to do seems to be to apply the > lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything - > just fix the code and I'll send Ilpo and Ingo cookies at Christmas time > for being great guys. Alternatively, Ingo could run the distcc servers > and clients on -tip with the lockpatch (nothing reverted) for more > testing. Anyway, we still would have an option to revert both the DA change + the locking fix later if the problem is still clearly more likely than with stable-2.6.25. > The only lingering problem is Ingo's report yesterday > http://marc.info/?l=linux-netdev&m=121267587715976&w=2 > of a distcc hang. In this one it was not over localhost and the distcc > server had the ec3c DA changes totally reverted. (The server is really > the only stack that matters in this case - the client is not impacted by > the DA changes). It definately didn't fit to picture that well if we would be talking just a single bug here. ...I wish Ingo would have provided the receiver state already then. :-) > This has to be a different issue, because the ec3c code > we're talking about here wasn't on the server at all. As Ilpo mentions, > Hakon is beleived to have a different problem and maybe you've tripped > over that too? ...The Håkon's case is definately different thing, also the symptoms are quite different because there's no deadlock at all but the TCP flow eventually dies, I don't yet know with what timescale that dying happens. Only common denominator actually was this receiver process missing, though it provably still was there. Besides, I don't know how long Ingo waited in this case until concluding that the TCP was stuck again? > If we're sure of that conclusion we should just take Ilpo's DA patch as > that will narrow the field for finding Hakon's issue. Its just with all > of these data points I'm not sure if I'm reaching the right conclusion. Lets widen the scope to two to three bugs then, one down already... In case you missed btw, also Arjan reported some problem quite early, but in his case claws mua+imap was the workload, so I doubt that DEFER_ACCEPT would be involved but who knows without strace, here: http://marc.info/?l=linux-kernel&m=121182171000434&w=2 Arjan, can you please check if your workload uses setsockopt TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case is different from Ingo's. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 21:12 ` Ilpo Järvinen @ 2008-06-06 21:23 ` Arjan van de Ven 2008-06-06 21:28 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: Arjan van de Ven @ 2008-06-06 21:23 UTC (permalink / raw) To: Ilpo Järvinen Cc: Patrick McManus, Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol On Sat, 7 Jun 2008 00:12:55 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Arjan, can you please check if your workload uses setsockopt > TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case > is different from Ingo's. at this point the observed hung I had seems to be the app doing something bad (it's the app that hung) so for now please disregard my previous comments on this topic. > > -- If you want to reach me at my work email, use arjan@linux.intel.com For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 21:23 ` Arjan van de Ven @ 2008-06-06 21:28 ` Ilpo Järvinen 0 siblings, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 21:28 UTC (permalink / raw) To: Arjan van de Ven Cc: Patrick McManus, Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 530 bytes --] On Fri, 6 Jun 2008, Arjan van de Ven wrote: > On Sat, 7 Jun 2008 00:12:55 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > Arjan, can you please check if your workload uses setsockopt > > TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case > > is different from Ingo's. > > at this point the observed hung I had seems to be the app doing > something bad (it's the app that hung) so for now please disregard my > previous comments on this topic. ...Ok, thanks for letting us know. -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 20:08 ` Patrick McManus 2008-06-06 21:12 ` Ilpo Järvinen @ 2008-06-10 22:49 ` David Miller 1 sibling, 0 replies; 35+ messages in thread From: David Miller @ 2008-06-10 22:49 UTC (permalink / raw) To: mcmanus Cc: mingo, ilpo.jarvinen, peterz, linux-kernel, netdev, rjw, akpm, johnpol From: Patrick McManus <mcmanus@ducksong.com> Date: Fri, 06 Jun 2008 16:08:57 -0400 > If we're sure of that conclusion we should just take Ilpo's DA patch as > that will narrow the field for finding Hakon's issue. While the conclusion might be accurate, Ilpo's patch adds a lock ordering problem that might introduce potential deadlocks (worst case) or at least some lockdep warnings. Both of which are unacceptable. I personally don't see any clear solution, and because this new DA code moves an operation into a code block where the locking order pretty much has to be inverted, I don't see any real short term solution other than a revert of this particular DA commit. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-06 17:11 ` Patrick McManus 2008-06-06 17:33 ` Ingo Molnar @ 2008-06-06 18:25 ` Ilpo Järvinen 1 sibling, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-06 18:25 UTC (permalink / raw) To: Patrick McManus Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol On Fri, 6 Jun 2008, Patrick McManus wrote: > > This Ingo's testcase should anyway be quite "simple", I mean that distcc > > shouldn't do anything unexpected in a sense it shouldn't abort the flows > > by not sending data, close the listening socket or other things like that. > > maybe - I've noted that I can get the distcc server to crash with just a > little fuzz (telnet to it and close the telnet) - but it is true I > haven't seen anything odd using the distcc client. In addition I think I've also seen some bits floating around that occassionally distcc does something weird in a correct setup too. I briefly looked how distcc behaved while doing the stress_accept. Distcc basically seems to have n processes each accept()ing and some kind of memleak killer by limiting number of successive accepts then exit, while the parent who did the listen is only periodically (had some sleep(1)s) collecting dead ones & respawning them. > Anyhow, my news is that using rc5 I have managed to reproduce it on > localhost - so it isn't just ingo anymore ! ;) Also Peter Z has reported it earlier, it was distcc+localhost for him as well. > and has intentionally broken dependencies so it just keeps recompiling > stuff. ...Trying to invent perpetual motion machine? :-/ > The input files are > approximately 135k, 98k, and 16k after running gcc -E on them (which I > what I assume distcc does before putting them down the socket). > > On rc5 I could get the lockup in under 20 minutes.. usually 10. I think > I did it 4 times. My compile test is probably a better trigger than the > kernel compile because the distcc connects are never staggered like they > would be in a large directory of files. (3 files, -j4). It could be even easier if you make next in path gcc to play with nice, trying a number of different values might reveal some really fast to reproduce scenario. > When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the > error at all in the first 90 minutes of testing. I'll let the test run > and update the list. At least it helps some :-), like it should. > I'm holding out hope that Ingo's report did not have the locking patch > on the distcc server end - because it certainly makes a difference for > me. ...He had some issue with different versions being deployed at least in the past, and I failed to follow his latest answer :-). -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 23:22 ` Ilpo Järvinen ` (2 preceding siblings ...) 2008-06-05 14:22 ` Ingo Molnar @ 2008-06-10 22:32 ` David Miller 2008-06-11 13:10 ` Patrick McManus 2008-06-11 15:13 ` Ilpo Järvinen 3 siblings, 2 replies; 35+ messages in thread From: David Miller @ 2008-06-10 22:32 UTC (permalink / raw) To: ilpo.jarvinen Cc: mingo, mcmanus, peterz, linux-kernel, netdev, rjw, akpm, johnpol From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Wed, 4 Jun 2008 02:22:16 +0300 (EEST) > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > It seems that replacement of DA code also moved parts outside > of appropriate locking. The Ingo's problem seems to come from > the fact that two flows could now race in > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can > leave dangling socks around which won't resolve themselves > without stimuli from outside (e.g., external RST would help > I think). > > Then some details I'm not too sure of: > I guess we want to put listen_sk->sk_state checking under the > lock as well. I've not evaluated if ->sk_data_ready too > requires locking but assumed it does. > > I'm by no means familiar with all locking variants, requirements, > etc. > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> I took a close look at this, it seems this patch adds an ABBA deadlock. But I might be wrong. Normally the locking order is: listen_sk --> some_child_sk And this can be seen by the code paths that flow down to tcp_child_process() in net/ipv4/tcp_minisocks.c (via tcp_v4_do_rcv() for example). However here in this patch we will lock in the: child_sk --> listen_sk order. Unless... these defer accepted sockets live outside of the normal socket collection found by tcp_v{4,6}_hnd_req(). If that is the case, that ought to make this locking order OK but I fear that lockdep will likely complain because it has no way to see this distinction. If we cannot find a simple and easy way to deal with this locking problem, I am reverting the defer-accept changes entirely. It's not the end of the world if this feature has to be deferred to 2.6.27 and this bug has been known for several weeks already. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-10 22:32 ` David Miller @ 2008-06-11 13:10 ` Patrick McManus 2008-06-11 15:13 ` Ilpo Järvinen 1 sibling, 0 replies; 35+ messages in thread From: Patrick McManus @ 2008-06-11 13:10 UTC (permalink / raw) To: David Miller Cc: ilpo.jarvinen, mingo, peterz, linux-kernel, netdev, rjw, akpm, johnpol, vgusev On Tue, 2008-06-10 at 15:32 -0700, David Miller wrote: > If we cannot find a simple and easy way to deal with this locking > problem, I am reverting the defer-accept changes entirely. It's not > the end of the world if this feature has to be deferred to 2.6.27 based on this, plus the comments from vitaliy it is clear that moving the DA socket to established is pretty tricky. I agree I should address these things in the >= 27 time frame. imho we don't need to revert the first two patches: e4c78840284f3f51b1896cf3936d60a6033c4d2c and 539fae89bebd16ebeafd57a87169bc56eb530d76 .. they have nothing to do with rearranging states or the locking and they do fix the more notable DA behavior issues. -Patrick ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-10 22:32 ` David Miller 2008-06-11 13:10 ` Patrick McManus @ 2008-06-11 15:13 ` Ilpo Järvinen 1 sibling, 0 replies; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-11 15:13 UTC (permalink / raw) To: David Miller Cc: mingo, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol [-- Attachment #1: Type: TEXT/PLAIN, Size: 5129 bytes --] On Tue, 10 Jun 2008, David Miller wrote: > From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> > Date: Wed, 4 Jun 2008 02:22:16 +0300 (EEST) > > > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk > > > > It seems that replacement of DA code also moved parts outside > > of appropriate locking. The Ingo's problem seems to come from > > the fact that two flows could now race in > > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can > > leave dangling socks around which won't resolve themselves > > without stimuli from outside (e.g., external RST would help > > I think). > > > > Then some details I'm not too sure of: > > I guess we want to put listen_sk->sk_state checking under the > > lock as well. I've not evaluated if ->sk_data_ready too > > requires locking but assumed it does. > > > > I'm by no means familiar with all locking variants, requirements, > > etc. > > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > > I took a close look at this, it seems this patch adds > an ABBA deadlock. But I might be wrong. > > Normally the locking order is: > > listen_sk --> some_child_sk > > And this can be seen by the code paths that flow down > to tcp_child_process() in net/ipv4/tcp_minisocks.c (via > tcp_v4_do_rcv() for example). > > However here in this patch we will lock in the: > > child_sk --> listen_sk > > order. > > Unless... these defer accepted sockets live outside of the > normal socket collection found by tcp_v{4,6}_hnd_req(). All these collection/queues make the analysis a bit complex :-), but I think I finally got a hold of it anyway and my analysis agrees with yours. To me it seems that when DAed TCP sk moves to established, there's no longer a connection from listen_sk to child_sk, we keep them alive by incrementing those refcounts and the child sk is responsible about the binding between them (until the DA gets resolution). But the child_sk locking during the creation is also a suspect, but only after tcp_rcv_established() part can run for the child sock. Am I right that after tcp_v4_syn_recv_sock() it is possible to get tcp_rcv_established() to execute for the created child? It's called by this path: tcp_v4_hnd_req -> tcp_check_req -> tcp_v4_syn_recv_sock (inet_csk(sk)->icsk_af_ops->syn_recv_sock) ...and the listen_sk part acquires that child's lock only after that in tcp_v4_hnd_req(). ...Allowing it to deadlock during that short window! ...But from that point onward nothing in listen_sk needs to lock the child. Then this connection from listen_sk to child_sk comes back once we've reinserted the DAed child_sk back to the queue (in that racy part I was trying to fix) but at that point of time we won't ever need to lock in child_sk -> listen_sk order again because we've already passed that point. ...But like you, I don't understand all of it that well... Btw, your sk_callback_lock notes explained some mystery part for me as I came across with it too while looking around... :-) > If that is the case, that ought to make this locking order OK > but I fear that lockdep will likely complain because it has > no way to see this distinction. I definately agree that the locks are taken in different order no matter what (I actually referring to that already in the first mail about the deadlock) so probably lockdep is not going to be too friendly :-), whether it could cause a real deadlock, I was not that sure at that point of time. > If we cannot find a simple and easy way to deal with this locking > problem, I am reverting the defer-accept changes entirely. To avoid all problem, I was already thinking of another approach (though my time constraints hardly allows finishing it any time soon): Because DA code resides quite late on the TCP path it would be quite easy to do some preparatory work, drop child sk's lock and re-acquire both locks in the usual order (listen->child) but that would require handling correctly reset & other things that could intervene (luckily that pesky userspace isn't there yet to mess around so not that many things can happen :-)). But ipv6 seems to do some additional processing after tcp_rcv_established() which I'd expect to choke if child sk's lock was dropped there for a moment, while ipv4 part seems quite doable. I don't even know what exactly are the requirements for that ipv6 part (see the stuff after ipv6_pktoptions label in tcp_v6_do_rcv). ...I tried to do this but came across those two things mentioned above. IMHO, changing locking this late in the release cycle would be quite risky anyway... And we would also be fiddling with TCP state machine. > It's not the end of the world if this feature has to be deferred to > 2.6.27 Agreed. Especially in the light of the another issue that has been raised. > and this bug has been known for several weeks already. ...That's partially because Ingo didn't even test my fix on the receiver which got stuck but used 2.6.25 and got some other bug which looked alike but couldn't be the same because these DA problems weren't in 2.6.25. What could I've done for that :-). -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-03 21:46 ` Ilpo Järvinen 2008-06-03 22:01 ` Ilpo Järvinen @ 2008-06-04 7:23 ` Ingo Molnar 2008-06-04 18:24 ` David Miller 1 sibling, 1 reply; 35+ messages in thread From: Ingo Molnar @ 2008-06-04 7:23 UTC (permalink / raw) To: Ilpo Järvinen Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov, Patrick McManus * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > i'll queue up your reverts for testing in -tip. > > > > update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely > > fixed the hangs! > > ...It wasn't exactly out-of-tree, Evgeniy fixed a problem that was > found in "TCP_DEFER_ACCEPT updates - process as established", perhaps > it just wasn't in your testing tree yet. out of the -tip tree :-) The -tip tree has 75+ topic branches at the moment, but TCP topics are not in its scope - so any TCP change is "out of tree" for the -tip tree. People got confused in the past when they saw similar test patches show up in sched.git and x86.git before, so we wanted to make it very clear in -tip (with is the successor of sched.git, x86.git and a couple of other git trees) that these are commits we dont want to push anywhere. Commits in tip/out-of-tree dont get propagated into the tip/auto-*-next topic branches that linux-next and -mm picks up, they are purely a courtesy to help the testing/fixing of bugs in subsystems that are maintained in other git trees. See attached below the current shortlog of the tip/out-of-tree topic branch - it contains changes all around the tree for various things that we triggered in -tip and are not yet upstream or are in flight somewhere in another git tree. > > Here is the testing i did: > > > > first i ran about 500+ successful iterations on the affected > > testboxes with your revert patch applied, on multiple systems. > > Are you sure this is enough to conclude the results? Seems quite small > number to me to rule out luck. Especially considering that it was some > amount of time in the tree already until you noticed it for the first > time. a full day of testing on a testsystem with 500 random kernel builds and bootups (the kernel build done on the testsystem utilizing distcc and make -j100, so it's rather heavy and parallel TCP traffic per iteration) with no hang, compared to the same system with your reverts not applied that hung after an hour with 20-30 iterations. And that count increased to 1000 successful test iterations since yesterday. So i think yes, it seems rather conclusive, given the circumstances ;-) These random kernel boots found many 'impossible to trigger' bugs and races in the past. The reason for its race finding capability is the timing randomness of the resulting random kernel image: the delays caused by random combination of debugging facilities, build variants, kernel subsystem variants we have. This -tip qa method - as a side-effect of its coverage testing - simulates timing variantions that are otherwise only observable via hardware variations. I.e. this is not the same kernel booted up a 1000 times - that would be a very narrow test. This is 1000 _different_ kernels built and booted up. Each kernel having subtly different timings and ordering. And it's more than just externally injected random kernel: the test-system itself builds its "next version" (and uses the network for that as well), so it's a self-hosting recursive random test in essence. This method is also amazingly good at finding compiler/linker trouble: it found 3-4 real gcc bugs so far. (For example i triggered an ancient bug in gcc 4.0.2 just yesterday. For the record, the testsystem with the TCP hang utilizes gcc-4.2.2.) > > so i hereby conclude that your revert works :) I've repeated the > > commit below that resolves this nasty regression. > > ...I couldn't immediately find anything obviously wrong with those > changes but the patch below might be worth of a try (without the > revert of course). If it ever spits out that WARN_ON for you, we were > playing with fire too much and it's better to return on the safe side > there... i'll queue it up for testing, but no promises about speedy action here - the test cycle is really long with this bug. Ingo ------{ tip/out-of-tree shortlog: }-----------> Alexander van Heukelum (1): uml: cleanup: use def_bool in Kconfig files Bjorn Helgaas (1): PNPACPI: use _CRS IRQ descriptor length for _SRS Ilpo Järvinen (1): tcp: revert DEFER_ACCEPT modifications Ingo Molnar (7): video/dvb: fix MEDIA_TUNER && FW_LOADER build error dvb: input layer dependencies fixes drivers/media/video build fix for modular builds drivers/watchdog/geodewdt.c: build fix USB: fix build bug in USB_ISIGHTFW acpi-acpi_numa_init-build-fix acpi: fix drivers/acpi/glue.c build error Michael Krufky (1): dib7000p: fix dib7000p_attach when !CONFIG_DVB_DIB7000P Russ Anderson (1): acpi: fix boot breakage on Altix Yinghai Lu (2): net: use numa_node in net_devcice->dev instead of parent ide: use dev_to_node instead of pcibus_to_node ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-04 7:23 ` Ingo Molnar @ 2008-06-04 18:24 ` David Miller 2008-06-04 20:56 ` Ilpo Järvinen 0 siblings, 1 reply; 35+ messages in thread From: David Miller @ 2008-06-04 18:24 UTC (permalink / raw) To: mingo Cc: ilpo.jarvinen, peterz, linux-kernel, netdev, rjw, akpm, johnpol, mcmanus From: Ingo Molnar <mingo@elte.hu> Date: Wed, 4 Jun 2008 09:23:11 +0200 > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > ...I couldn't immediately find anything obviously wrong with those > > changes but the patch below might be worth of a try (without the > > revert of course). If it ever spits out that WARN_ON for you, we were > > playing with fire too much and it's better to return on the safe side > > there... > > i'll queue it up for testing, but no promises about speedy action here - > the test cycle is really long with this bug. Ilpo posted another patch which fixes a locking bug in the code, please test with that patch. I include it below so that you know exactly which one I am referring to. The quicker you test this, the faster I can merge it to Linus and get the bug fixed for good. [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk It seems that replacement of DA code also moved parts outside of appropriate locking. The Ingo's problem seems to come from the fact that two flows could now race in (inet_csk_)reqsk_queue_add corrupting the queue. ...This can leave dangling socks around which won't resolve themselves without stimuli from outside (e.g., external RST would help I think). Then some details I'm not too sure of: I guess we want to put listen_sk->sk_state checking under the lock as well. I've not evaluated if ->sk_data_ready too requires locking but assumed it does. I'm by no means familiar with all locking variants, requirements, etc. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 23 +++++++++++++---------- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c9454f0..d21d2b9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); if (tp->defer_tcp_accept.request) { + struct sock *listen_sk = tp->defer_tcp_accept.listen_sk; int queued_data = tp->rcv_nxt - tp->copied_seq; int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ? tcp_hdr((struct sk_buff *) @@ -4570,8 +4571,9 @@ static int tcp_defer_accept_check(struct sock *sk) if (queued_data && hasfin) queued_data--; - if (queued_data && - tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) { + bh_lock_sock(listen_sk); + + if (queued_data && listen_sk->sk_state == TCP_LISTEN) { if (sock_flag(sk, SOCK_KEEPOPEN)) { inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp)); @@ -4579,23 +4581,24 @@ static int tcp_defer_accept_check(struct sock *sk) inet_csk_delete_keepalive_timer(sk); } - inet_csk_reqsk_queue_add( - tp->defer_tcp_accept.listen_sk, - tp->defer_tcp_accept.request, - sk); + inet_csk_reqsk_queue_add(listen_sk, + tp->defer_tcp_accept.request, + sk); tp->defer_tcp_accept.listen_sk->sk_data_ready( - tp->defer_tcp_accept.listen_sk, 0); + listen_sk, 0); - sock_put(tp->defer_tcp_accept.listen_sk); + sock_put(listen_sk); sock_put(sk); tp->defer_tcp_accept.listen_sk = NULL; tp->defer_tcp_accept.request = NULL; - } else if (hasfin || - tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) { + } else if (hasfin || listen_sk->sk_state != TCP_LISTEN) { + bh_unlock_sock(listen_sk); tcp_reset(sk); return -1; } + + bh_unlock_sock(listen_sk); } return 0; } ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-04 18:24 ` David Miller @ 2008-06-04 20:56 ` Ilpo Järvinen 2008-06-04 21:55 ` David Miller 0 siblings, 1 reply; 35+ messages in thread From: Ilpo Järvinen @ 2008-06-04 20:56 UTC (permalink / raw) To: David Miller Cc: mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol, mcmanus [-- Attachment #1: Type: TEXT/PLAIN, Size: 1021 bytes --] On Wed, 4 Jun 2008, David Miller wrote: > From: Ingo Molnar <mingo@elte.hu> > Date: Wed, 4 Jun 2008 09:23:11 +0200 > > > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote: > > > > > ...I couldn't immediately find anything obviously wrong with those > > > changes but the patch below might be worth of a try (without the > > > revert of course). If it ever spits out that WARN_ON for you, we were > > > playing with fire too much and it's better to return on the safe side > > > there... > > > > i'll queue it up for testing, but no promises about speedy action here - > > the test cycle is really long with this bug. > > Ilpo posted another patch which fixes a locking bug in the > code, please test with that patch. I include it below so > that you know exactly which one I am referring to. Do you know if TCP ever needs to lock the child sk again (after creating it and it has started to live on its own) from a context that has already locked the listen_sk (it could deadlock after this change)? -- i. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ 2008-06-04 20:56 ` Ilpo Järvinen @ 2008-06-04 21:55 ` David Miller 0 siblings, 0 replies; 35+ messages in thread From: David Miller @ 2008-06-04 21:55 UTC (permalink / raw) To: ilpo.jarvinen Cc: mingo, peterz, linux-kernel, netdev, rjw, akpm, johnpol, mcmanus From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Wed, 4 Jun 2008 23:56:34 +0300 (EEST) > Do you know if TCP ever needs to lock the child sk again (after creating > it and it has started to live on its own) from a context that has already > locked the listen_sk (it could deadlock after this change)? I don't know, and since I'm getting back onto an airplane again tomorrow morning I'm the last person who is likely do be able to do an audit of these cases for you :-) ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2008-06-11 15:13 UTC | newest] Thread overview: 35+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-11 15:06 [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ Alexey Kuznetsov -- strict thread matches above, loose matches on Subject: below -- 2008-05-29 8:45 Ingo Molnar 2008-05-29 11:14 ` Ilpo Järvinen 2008-05-29 11:22 ` Ingo Molnar 2008-05-30 18:18 ` Ingo Molnar 2008-05-31 6:09 ` Ingo Molnar 2008-05-31 11:46 ` Ilpo Järvinen 2008-05-31 12:18 ` Ilpo Järvinen 2008-05-31 12:54 ` Ingo Molnar 2008-05-31 12:58 ` Ilpo Järvinen 2008-05-31 16:35 ` Ingo Molnar 2008-06-03 9:40 ` [fixed] [patch] " Ingo Molnar 2008-06-03 14:41 ` Patrick McManus 2008-06-03 21:46 ` Ilpo Järvinen 2008-06-03 22:01 ` Ilpo Järvinen 2008-06-03 22:03 ` David Miller 2008-06-03 22:10 ` Ilpo Järvinen 2008-06-03 23:22 ` Ilpo Järvinen 2008-06-03 23:54 ` Joe Perches 2008-06-04 6:25 ` Ilpo Järvinen 2008-06-04 2:54 ` Patrick McManus 2008-06-04 6:42 ` Ilpo Järvinen 2008-06-05 14:22 ` Ingo Molnar 2008-06-05 18:00 ` Ilpo Järvinen 2008-06-05 21:13 ` Ilpo Järvinen 2008-06-05 23:29 ` Patrick McManus 2008-06-06 10:03 ` Ilpo Järvinen 2008-06-06 17:11 ` Patrick McManus 2008-06-06 17:33 ` Ingo Molnar 2008-06-06 18:19 ` Ilpo Järvinen 2008-06-06 18:39 ` Ingo Molnar 2008-06-06 19:49 ` Ilpo Järvinen 2008-06-06 20:08 ` Patrick McManus 2008-06-06 21:12 ` Ilpo Järvinen 2008-06-06 21:23 ` Arjan van de Ven 2008-06-06 21:28 ` Ilpo Järvinen 2008-06-10 22:49 ` David Miller 2008-06-06 18:25 ` Ilpo Järvinen 2008-06-10 22:32 ` David Miller 2008-06-11 13:10 ` Patrick McManus 2008-06-11 15:13 ` Ilpo Järvinen 2008-06-04 7:23 ` Ingo Molnar 2008-06-04 18:24 ` David Miller 2008-06-04 20:56 ` Ilpo Järvinen 2008-06-04 21:55 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).