Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
@ 2008-06-11 15:06 Alexey Kuznetsov
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kuznetsov @ 2008-06-11 15:06 UTC (permalink / raw)
  To: davem, mcmanus, netdev

Hello!

> On Tue, 2008-06-10 at 15:32 -0700, David Miller wrote:
> 
> I took a close look at this, it seems this patch adds
> an ABBA deadlock.  But I might be wrong.

Yes, this is not a real deadlock. But also it is not even a lock.
bh_lock_sock on listening socket is not enough to lock it.

Seems, the solution would be forcing bh_lock_sock() to protect accept_queue,
which is a mess. Or adding a separate spinlock, like it was done with
syn_wait_lock. Also not good.

Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
@ 2008-05-29  8:45 Ingo Molnar
  2008-05-29 11:14 ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-29  8:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> in an overnight -tip testruns that is based on recent -git i got two 
> stuck TCP connections:
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             State      
> tcp        0 174592 10.0.1.14:58015             10.0.1.14:3632              ESTABLISHED 
> tcp    72134      0 10.0.1.14:3632              10.0.1.14:58015             ESTABLISHED 

update: in the past 5 days of -tip testing i've gathered about 10 
randconfig kernel configs that all produced such failures.

Since the bug itself is very elusive (it takes up to 50 boot + 
kernel-rebuild-via-distccc iterations to trigger) bisection was still 
not an option - but with 10 configs statistical analysis of the configs 
is now possible.

I made a histogram of all kernel options present in those configs, and 
one networking related kernel option stood out:

      5 CONFIG_TCP_CONG_ADVANCED=y
      6 CONFIG_INET_TCP_DIAG=y
      6 CONFIG_TCP_MD5SIG=y
      9 CONFIG_TCP_CONG_CUBIC=y

that code is called in the bootlogs:

> [   13.279410] calling  cubictcp_register+0x0/0x80
> [   13.279412] TCP cubic registered

the likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig 
runs is 75%. The likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in 
10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say 
it with a 95% confidence that the presence of this option correlates to 
the hung sockets.

i have started testing this theory now, via the patch below, which turns 
off TCP_CONG_CUBIC. It will take about 50 bootups on the affected 
testsystems to confirm. (it will take a couple of hours today as not all 
testsystems show these hung socket symptoms)

distributions enable TCP_CONG_CUBIC by default:

  $ grep CUBIC /boot/config-2.6.24.7-92.fc8
  CONFIG_TCP_CONG_CUBIC=y
  CONFIG_DEFAULT_CUBIC=y

which would explain why Arjan and Peter triggered similar hangs as well.

	Ingo

---------------------->
Subject: qa: no TCP_CONG_CUBIC
From: Ingo Molnar <mingo@elte.hu>
Date: Thu May 29 09:45:51 CEST 2008

---
 net/ipv4/Kconfig |    4 ++++
 1 file changed, 4 insertions(+)

Index: tip/net/ipv4/Kconfig
===================================================================
--- tip.orig/net/ipv4/Kconfig
+++ tip/net/ipv4/Kconfig
@@ -454,6 +454,8 @@ config TCP_CONG_BIC
 config TCP_CONG_CUBIC
 	tristate "CUBIC TCP"
 	default y
+	depends on BROKEN_BOOT_ALLOWED
+	select BROKEN_BOOT
 	---help---
 	This is version 2.0 of BIC-TCP which uses a cubic growth function
 	among other techniques.
@@ -608,6 +610,8 @@ endif
 config TCP_CONG_CUBIC
 	tristate
 	depends on !TCP_CONG_ADVANCED
+	depends on BROKEN_BOOT_ALLOWED
+	select BROKEN_BOOT
 	default y

 config DEFAULT_TCP_CONG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-29  8:45 Ingo Molnar
@ 2008-05-29 11:14 ` Ilpo Järvinen
  2008-05-29 11:22   ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-05-29 11:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton

On Thu, 29 May 2008, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > in an overnight -tip testruns that is based on recent -git i got two 
> > stuck TCP connections:
> > 
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address               Foreign Address             State      
> > tcp        0 174592 10.0.1.14:58015             10.0.1.14:3632              ESTABLISHED 
> > tcp    72134      0 10.0.1.14:3632              10.0.1.14:58015             ESTABLISHED 
> 
> update: in the past 5 days of -tip testing i've gathered about 10 
> randconfig kernel configs that all produced such failures.

...I tried yesterday some accept (& read some) & close/exit type 
stressing but I couldn't get it to show up (though I'll try longer 
time later on and also fault style exiting).

> Since the bug itself is very elusive (it takes up to 50 boot + 
> kernel-rebuild-via-distccc iterations to trigger) bisection was still 
> not an option - but with 10 configs statistical analysis of the configs 
> is now possible.
> 
> I made a histogram of all kernel options present in those configs, and 
> one networking related kernel option stood out:
> 
>       5 CONFIG_TCP_CONG_ADVANCED=y
>       6 CONFIG_INET_TCP_DIAG=y
>       6 CONFIG_TCP_MD5SIG=y
>       9 CONFIG_TCP_CONG_CUBIC=y
> 
> that code is called in the bootlogs:
> 
> > [   13.279410] calling  cubictcp_register+0x0/0x80
> > [   13.279412] TCP cubic registered
> 
> the likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig 
> runs is 75%. The likelyhood of CONFIG_TCP_CONG_CUBIC=y being enabled in 
> 10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say 
> it with a 95% confidence that the presence of this option correlates to 
> the hung sockets.

Do I understand you correctly... it doesn't explain the tenth case out 
of ten but just nine of them?

> i have started testing this theory now, via the patch below, which turns 
> off TCP_CONG_CUBIC. It will take about 50 bootups on the affected 
> testsystems to confirm. (it will take a couple of hours today as not all 
> testsystems show these hung socket symptoms)
> 
> distributions enable TCP_CONG_CUBIC by default:
> 
>   $ grep CUBIC /boot/config-2.6.24.7-92.fc8
>   CONFIG_TCP_CONG_CUBIC=y
>   CONFIG_DEFAULT_CUBIC=y
> 
> which would explain why Arjan and Peter triggered similar hangs as well.

Main problem with this explanation is that congestion control modules are 
only in use when TCP is in ESTABLISHED and transmitting normally, while it 
has nothing to do how we enter or leave ESTABLISHED.

But if it's really that the process who owned the connection already went 
away, I think we should end up into tcp_close() which changes the state 
from established (and send RST too if there's still data to be received, 
which would be picked up by the other end and that end would no longer 
keep established either). ...A failure to send the reset would show up in 
LINUX_MIB_TCPABORTFAILED. Because both ends remain in established, it kind 
of excludes the possibility that something would have accidently allowed 
the Recv-Q end to return back to established too (due to some bug).

To me there are mainly two weird things:
1) Why we see orphaning with data in the first place (I think distcc would 
be interested to read everything, unless some worker crashed in early...
Though some timeout in distcc could explain it as well but I don't know 
too well how distcc does everything)...
2) Why the connection is still in ESTABLISHED when it was orphaned and 
it has some data to receive... If it had some unread data, should be in 
CLOSE or in FIN_WAIT1 otherwise.

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-29 11:14 ` Ilpo Järvinen
@ 2008-05-29 11:22   ` Ingo Molnar
  2008-05-30 18:18     ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-29 11:22 UTC (permalink / raw)
  To: Ilpo J?rvinen
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton


after about 50 bootups i got a hung test again:

titan:~/tip> netstat -nt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             
State
tcp        0      0 10.0.1.14:22                10.0.1.16:58062  ESTABLISHED
tcp        0      0 10.0.1.14:22                10.0.1.16:60109  ESTABLISHED
tcp        0  86368 10.0.1.14:43914             10.0.1.16:3632   ESTABLISHED

and this time with CUBIC_TCP disabled - so that was a red herring.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-29 11:22   ` Ingo Molnar
@ 2008-05-30 18:18     ` Ingo Molnar
  2008-05-31  6:09       ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-30 18:18 UTC (permalink / raw)
  To: Ilpo J?rvinen
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton

* Ingo Molnar <mingo@elte.hu> wrote:

> after about 50 bootups i got a hung test again:
> 
> titan:~/tip> netstat -nt
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             
> State
> tcp        0      0 10.0.1.14:22                10.0.1.16:58062  ESTABLISHED
> tcp        0      0 10.0.1.14:22                10.0.1.16:60109  ESTABLISHED
> tcp        0  86368 10.0.1.14:43914             10.0.1.16:3632   ESTABLISHED
> 
> and this time with CUBIC_TCP disabled - so that was a red herring.

ah, in retrospect i realized that this test had one flaw: some of the 
systems i the build cluster already ran a newer kernel and hence were 
targets for this bug.

so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and rebooted 
the cluster boxes into 2.6.25, and the hung sockets are now gone. (about 
150 successful iterations)

i did another change as well: i removed the localhost distcc component. 
I'll reinstate that now to make sure it's really related to 
TCP_CONG_CUBIC and not to localhost networking.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-30 18:18     ` Ingo Molnar
@ 2008-05-31  6:09       ` Ingo Molnar
  2008-05-31 11:46         ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-31  6:09 UTC (permalink / raw)
  To: Ilpo J?rvinen
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> ah, in retrospect i realized that this test had one flaw: some of the 
> systems i the build cluster already ran a newer kernel and hence were 
> targets for this bug.
> 
> so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and 
> rebooted the cluster boxes into 2.6.25, and the hung sockets are now 
> gone. (about 150 successful iterations)
> 
> i did another change as well: i removed the localhost distcc 
> component. I'll reinstate that now to make sure it's really related to 
> TCP_CONG_CUBIC and not to localhost networking.

ok, once i added back the localhost distcc component and the hung kernel 
build + stuck TCP socket bug happened again overnight:

 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address               Foreign Address             State
 tcp    72187      0 10.0.1.14:3632              10.0.1.14:47910             ESTABLISHED
 tcp        0 174464 10.0.1.14:47910             10.0.1.14:3632              ESTABLISHED

so it seems distcc over localhost was the aspect that made it fail.

_Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ 
sides of the connection. But that is just a theory - it could be timing, 
etc.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31  6:09       ` Ingo Molnar
@ 2008-05-31 11:46         ` Ilpo Järvinen
  2008-05-31 12:18           ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-05-31 11:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton,
	Evgeniy Polyakov

On Sat, 31 May 2008, Ingo Molnar wrote:

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > ah, in retrospect i realized that this test had one flaw: some of the 
> > systems i the build cluster already ran a newer kernel and hence were 
> > targets for this bug.
> > 
> > so i turned off CONFIG_TCP_CONG_CUBIC on all the testboxes and 
> > rebooted the cluster boxes into 2.6.25, and the hung sockets are now 
> > gone. (about 150 successful iterations)
> > 
> > i did another change as well: i removed the localhost distcc 
> > component. I'll reinstate that now to make sure it's really related to 
> > TCP_CONG_CUBIC and not to localhost networking.
> 
> ok, once i added back the localhost distcc component and the hung kernel 
> build + stuck TCP socket bug happened again overnight:
> 
>  Active Internet connections (w/o servers)
>  Proto Recv-Q Send-Q Local Address               Foreign Address             State
>  tcp    72187      0 10.0.1.14:3632              10.0.1.14:47910             ESTABLISHED
>  tcp        0 174464 10.0.1.14:47910             10.0.1.14:3632              ESTABLISHED
> 
> so it seems distcc over localhost was the aspect that made it fail.
> 
> _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ 
> sides of the connection. But that is just a theory - it could be timing, 
> etc.

Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
some post 2.6.25 changes into it)?

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31 11:46         ` Ilpo Järvinen
@ 2008-05-31 12:18           ` Ilpo Järvinen
  2008-05-31 12:54             ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-05-31 12:18 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Netdev, David S. Miller, Rafael J. Wysocki, Andrew Morton,
	Evgeniy Polyakov

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11376 bytes --]

On Sat, 31 May 2008, Ilpo Järvinen wrote:

> On Sat, 31 May 2008, Ingo Molnar wrote:
> 
> > ok, once i added back the localhost distcc component and the hung kernel 
> > build + stuck TCP socket bug happened again overnight:
> > 
> >  Active Internet connections (w/o servers)
> >  Proto Recv-Q Send-Q Local Address               Foreign Address             State
> >  tcp    72187      0 10.0.1.14:3632              10.0.1.14:47910             ESTABLISHED
> >  tcp        0 174464 10.0.1.14:47910             10.0.1.14:3632              ESTABLISHED
> > 
> > so it seems distcc over localhost was the aspect that made it fail.
> > 
> > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_ 
> > sides of the connection. But that is just a theory - it could be timing, 
> > etc.
> 
> Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
> some post 2.6.25 changes into it)?

Hmm, if so, please try this revert below (I hope I got everything that is 
related there, gitk is currently broken for me so that viewing many 
commits quickly is a pain, so I might have accidently missed something)...

-- 
 i.

--
[PATCH] tcp: Reverts DEFER_ACCEPT modifications

Reverts these commits:
 [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t
 [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack
 [TCP]: TCP_DEFER_ACCEPT updates - process as established
 tcp: Fix slab corruption with ipv6 and tcp6fuzz

..., ie.,
 539fae89bebd16ebeafd57a87169bc56eb530d76,
 e4c78840284f3f51b1896cf3936d60a6033c4d2c,
 ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and
 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/tcp.h             |    7 ------
 include/net/request_sock.h      |    4 +-
 include/net/tcp.h               |    1 -
 net/ipv4/inet_connection_sock.c |   11 +++++++--
 net/ipv4/tcp.c                  |   18 +++++++++------
 net/ipv4/tcp_input.c            |   45 ---------------------------------------
 net/ipv4/tcp_ipv4.c             |    8 -------
 net/ipv4/tcp_minisocks.c        |   32 ++++++++++-----------------
 net/ipv4/tcp_timer.c            |    5 ----
 9 files changed, 33 insertions(+), 98 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 18e62e3..b31b6b7 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
 	return (struct tcp_request_sock *)req;
 }
 
-struct tcp_deferred_accept_info {
-	struct sock *listen_sk;
-	struct request_sock *request;
-};
-
 struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
@@ -379,8 +374,6 @@ struct tcp_sock {
 	unsigned int		keepalive_intvl;  /* time interval between keep alive probes */
 	int			linger2;
 
-	struct tcp_deferred_accept_info defer_tcp_accept;
-
 	unsigned long last_synq_overflow; 
 
 	u32	tso_deferred;
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index b220b5f..0c96e7b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -115,8 +115,8 @@ struct request_sock_queue {
 	struct request_sock	*rskq_accept_head;
 	struct request_sock	*rskq_accept_tail;
 	rwlock_t		syn_wait_lock;
-	u16			rskq_defer_accept;
-	/* 2 bytes hole, try to pack */
+	u8			rskq_defer_accept;
+	/* 3 bytes hole, try to pack */
 	struct listen_sock	*listen_opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 633147c..981d5ba 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define MAX_TCP_KEEPINTVL	32767
 #define MAX_TCP_KEEPCNT		127
 #define MAX_TCP_SYNCNT		127
-#define MAX_TCP_ACCEPT_DEFERRED 65535
 
 #define TCP_SYNQ_INTERVAL	(HZ/5)	/* Period of SYNACK timer */
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 828ea21..ec83448 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 	struct inet_connection_sock *icsk = inet_csk(parent);
 	struct request_sock_queue *queue = &icsk->icsk_accept_queue;
 	struct listen_sock *lopt = queue->listen_opt;
-	int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int thresh = max_retries;
 	unsigned long now = jiffies;
 	struct request_sock **reqp, *req;
 	int i, budget;
@@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
+	if (queue->rskq_defer_accept)
+		max_retries = queue->rskq_defer_accept;
+
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if (req->retrans < thresh &&
-				    !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				if ((req->retrans < thresh ||
+				     (inet_rsk(req)->acked && req->retrans < max_retries))
+				    && !req->rsk_ops->rtx_syn_ack(parent, req)) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f886531..a0deead 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2105,12 +2105,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_DEFER_ACCEPT:
-		if (val < 0) {
-			err = -EINVAL;
-		} else {
-			if (val > MAX_TCP_ACCEPT_DEFERRED)
-				val = MAX_TCP_ACCEPT_DEFERRED;
-			icsk->icsk_accept_queue.rskq_defer_accept = val;
+		icsk->icsk_accept_queue.rskq_defer_accept = 0;
+		if (val > 0) {
+			/* Translate value in seconds to number of
+			 * retransmits */
+			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
+			       val > ((TCP_TIMEOUT_INIT / HZ) <<
+				       icsk->icsk_accept_queue.rskq_defer_accept))
+				icsk->icsk_accept_queue.rskq_defer_accept++;
+			icsk->icsk_accept_queue.rskq_defer_accept++;
 		}
 		break;
 
@@ -2292,7 +2295,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			val = (val ? : sysctl_tcp_fin_timeout) / HZ;
 		break;
 	case TCP_DEFER_ACCEPT:
-		val = icsk->icsk_accept_queue.rskq_defer_accept;
+		val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 :
+			((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1));
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b54d9d3..ddc2e89 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4532,49 +4532,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
 	}
 }
 
-static int tcp_defer_accept_check(struct sock *sk)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tp->defer_tcp_accept.request) {
-		int queued_data =  tp->rcv_nxt - tp->copied_seq;
-		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
-			tcp_hdr((struct sk_buff *)
-				sk->sk_receive_queue.prev)->fin : 0;
-
-		if (queued_data && hasfin)
-			queued_data--;
-
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
-			if (sock_flag(sk, SOCK_KEEPOPEN)) {
-				inet_csk_reset_keepalive_timer(sk,
-							       keepalive_time_when(tp));
-			} else {
-				inet_csk_delete_keepalive_timer(sk);
-			}
-
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
-
-			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
-
-			sock_put(tp->defer_tcp_accept.listen_sk);
-			sock_put(sk);
-			tp->defer_tcp_accept.listen_sk = NULL;
-			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
-			tcp_reset(sk);
-			return -1;
-		}
-	}
-	return 0;
-}
-
 static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4935,8 +4892,6 @@ step5:
 
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
-
-	tcp_defer_accept_check(sk);
 	return 0;
 
 csum_error:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cd601a8..b698b5b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	if (tp->defer_tcp_accept.request) {
-		reqsk_free(tp->defer_tcp_accept.request);
-		sock_put(tp->defer_tcp_accept.listen_sk);
-		sock_put(sk);
-		tp->defer_tcp_accept.listen_sk = NULL;
-		tp->defer_tcp_accept.request = NULL;
-	}
-
 	atomic_dec(&tcp_sockets_allocated);
 
 	return 0;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 019c8c1..8245247 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	   does sequence test, SYN is truncated, and thus we consider
 	   it a bare ACK.
 
-	   Both ends (listening sockets) accept the new incoming
-	   connection and try to talk to each other. 8-)
+	   If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this
+	   bare ACK.  Otherwise, we create an established connection.  Both
+	   ends (listening sockets) accept the new incoming connection and try
+	   to talk to each other. 8-)
 
 	   Note: This case is both harmless, and rare.  Possibility is about the
 	   same as us discovering intelligent life on another plant tomorrow.
@@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		if (!(flg & TCP_FLAG_ACK))
 			return NULL;
 
+		/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
+		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+			inet_rsk(req)->acked = 1;
+			return NULL;
+		}
+
 		/* OK, ACK is valid, create big socket and
 		 * feed this segment to it. It will repeat all
 		 * the tests. THIS SEGMENT MUST MOVE SOCKET TO
@@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		inet_csk_reqsk_queue_unlink(sk, req, prev);
 		inet_csk_reqsk_queue_removed(sk, req);
 
-		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-
-			/* the accept queue handling is done is est recv slow
-			 * path so lets make sure to start there
-			 */
-			tcp_sk(child)->pred_flags = 0;
-			sock_hold(sk);
-			sock_hold(child);
-			tcp_sk(child)->defer_tcp_accept.listen_sk = sk;
-			tcp_sk(child)->defer_tcp_accept.request = req;
-
-			inet_csk_reset_keepalive_timer(child,
-						       inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ);
-		} else {
-			inet_csk_reqsk_queue_add(sk, req, child);
-		}
-
+		inet_csk_reqsk_queue_add(sk, req, child);
 		return child;
 
 	listen_overflow:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4de68cf..63ed9d6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data)
 		goto death;
 	}
 
-	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
-		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
-	}
-
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-- 
1.5.5.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31 12:18           ` Ilpo Järvinen
@ 2008-05-31 12:54             ` Ingo Molnar
  2008-05-31 12:58               ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-31 12:54 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov


* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there 
> > were some post 2.6.25 changes into it)?
> 
> Hmm, if so, please try this revert below (I hope I got everything that 
> is related there, gitk is currently broken for me so that viewing many 
> commits quickly is a pain, so I might have accidently missed 
> something)...

what is the easiest way to figure out whether my version of distcc 
enables TCP_DEFER_ACCEPT?

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31 12:54             ` Ingo Molnar
@ 2008-05-31 12:58               ` Ilpo Järvinen
  2008-05-31 16:35                 ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-05-31 12:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov

[-- Attachment #1: Type: TEXT/PLAIN, Size: 781 bytes --]

On Sat, 31 May 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there 
> > > were some post 2.6.25 changes into it)?
> > 
> > Hmm, if so, please try this revert below (I hope I got everything that 
> > is related there, gitk is currently broken for me so that viewing many 
> > commits quickly is a pain, so I might have accidently missed 
> > something)...
> 
> what is the easiest way to figure out whether my version of distcc 
> enables TCP_DEFER_ACCEPT?

strace distccd (the listening party is the interesting one here, not the 
distcc)?

...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, sizeof(val)) 
seems to be the magic trick that is interestion here.

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31 12:58               ` Ilpo Järvinen
@ 2008-05-31 16:35                 ` Ingo Molnar
  2008-06-03  9:40                   ` [fixed] [patch] " Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-05-31 16:35 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov


* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> > what is the easiest way to figure out whether my version of distcc 
> > enables TCP_DEFER_ACCEPT?
> 
> strace distccd (the listening party is the interesting one here, not 
> the distcc)?
> 
> ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, sizeof(val)) 
> seems to be the magic trick that is interestion here.

seems to be used:

 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
 22003 listen(4, 10)                     = 0
 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0

i'll queue up your reverts for testing in -tip.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-05-31 16:35                 ` Ingo Molnar
@ 2008-06-03  9:40                   ` Ingo Molnar
  2008-06-03 14:41                     ` Patrick McManus
  2008-06-03 21:46                     ` Ilpo Järvinen
  0 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2008-06-03  9:40 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov


* Ingo Molnar <mingo@elte.hu> wrote:

> > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, 
> > sizeof(val)) seems to be the magic trick that is interestion here.
> 
> seems to be used:
> 
>  22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
>  22003 listen(4, 10)                     = 0
>  22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0
> 
> i'll queue up your reverts for testing in -tip.

update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely 
fixed the hangs!

Here is the testing i did:

first i ran about 500+ successful iterations on the affected testboxes 
with your revert patch applied, on multiple systems. Then today, without 
changing anything else on one of the testsystems i reverted your revert 
on that single system. After about an hour of testing, in 20 iterations 
i got a hang again over localhost:

 titan:~> netstat -nt
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address               Foreign Address             State
 tcp        0 174592 10.0.1.14:34710             10.0.1.14:3632              ESTABLISHED
 tcp    72145      0 10.0.1.14:3632              10.0.1.14:34710             ESTABLISHED

so i hereby conclude that your revert works :) I've repeated the commit 
below that resolves this nasty regression.

phew ...

	Ingo

---------------->
commit dad98991c137c089d64a3057dddc3a12bd6866b0
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date:   Sat May 31 15:18:56 2008 +0300

    tcp: revert DEFER_ACCEPT modifications
    
    On Sat, 31 May 2008, Ilpo Järvinen wrote:
    
    > On Sat, 31 May 2008, Ingo Molnar wrote:
    >
    > > ok, once i added back the localhost distcc component and the hung kernel
    > > build + stuck TCP socket bug happened again overnight:
    > >
    > >  Active Internet connections (w/o servers)
    > >  Proto Recv-Q Send-Q Local Address               Foreign Address             State
    > >  tcp    72187      0 10.0.1.14:3632              10.0.1.14:47910             ESTABLISHED
    > >  tcp        0 174464 10.0.1.14:47910             10.0.1.14:3632              ESTABLISHED
    > >
    > > so it seems distcc over localhost was the aspect that made it fail.
    > >
    > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_
    > > sides of the connection. But that is just a theory - it could be timing,
    > > etc.
    >
    > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
    > some post 2.6.25 changes into it)?
    
    Hmm, if so, please try this revert below (I hope I got everything that is
    related there, gitk is currently broken for me so that viewing many
    commits quickly is a pain, so I might have accidently missed something)...
    
    Reverts these commits:
     [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t
     [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack
     [TCP]: TCP_DEFER_ACCEPT updates - process as established
     tcp: Fix slab corruption with ipv6 and tcp6fuzz
    
    ..., ie.,
     539fae89bebd16ebeafd57a87169bc56eb530d76,
     e4c78840284f3f51b1896cf3936d60a6033c4d2c,
     ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and
     9ae27e0adbf471c7a6b80102e38e1d5a346b3b38.
    
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 18e62e3..b31b6b7 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
 	return (struct tcp_request_sock *)req;
 }
 
-struct tcp_deferred_accept_info {
-	struct sock *listen_sk;
-	struct request_sock *request;
-};
-
 struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
@@ -379,8 +374,6 @@ struct tcp_sock {
 	unsigned int		keepalive_intvl;  /* time interval between keep alive probes */
 	int			linger2;
 
-	struct tcp_deferred_accept_info defer_tcp_accept;
-
 	unsigned long last_synq_overflow; 
 
 	u32	tso_deferred;
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index b220b5f..0c96e7b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -115,8 +115,8 @@ struct request_sock_queue {
 	struct request_sock	*rskq_accept_head;
 	struct request_sock	*rskq_accept_tail;
 	rwlock_t		syn_wait_lock;
-	u16			rskq_defer_accept;
-	/* 2 bytes hole, try to pack */
+	u8			rskq_defer_accept;
+	/* 3 bytes hole, try to pack */
 	struct listen_sock	*listen_opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 633147c..981d5ba 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define MAX_TCP_KEEPINTVL	32767
 #define MAX_TCP_KEEPCNT		127
 #define MAX_TCP_SYNCNT		127
-#define MAX_TCP_ACCEPT_DEFERRED 65535
 
 #define TCP_SYNQ_INTERVAL	(HZ/5)	/* Period of SYNACK timer */
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 828ea21..ec83448 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 	struct inet_connection_sock *icsk = inet_csk(parent);
 	struct request_sock_queue *queue = &icsk->icsk_accept_queue;
 	struct listen_sock *lopt = queue->listen_opt;
-	int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int thresh = max_retries;
 	unsigned long now = jiffies;
 	struct request_sock **reqp, *req;
 	int i, budget;
@@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
+	if (queue->rskq_defer_accept)
+		max_retries = queue->rskq_defer_accept;
+
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if (req->retrans < thresh &&
-				    !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				if ((req->retrans < thresh ||
+				     (inet_rsk(req)->acked && req->retrans < max_retries))
+				    && !req->rsk_ops->rtx_syn_ack(parent, req)) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f886531..a0deead 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2105,12 +2105,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_DEFER_ACCEPT:
-		if (val < 0) {
-			err = -EINVAL;
-		} else {
-			if (val > MAX_TCP_ACCEPT_DEFERRED)
-				val = MAX_TCP_ACCEPT_DEFERRED;
-			icsk->icsk_accept_queue.rskq_defer_accept = val;
+		icsk->icsk_accept_queue.rskq_defer_accept = 0;
+		if (val > 0) {
+			/* Translate value in seconds to number of
+			 * retransmits */
+			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
+			       val > ((TCP_TIMEOUT_INIT / HZ) <<
+				       icsk->icsk_accept_queue.rskq_defer_accept))
+				icsk->icsk_accept_queue.rskq_defer_accept++;
+			icsk->icsk_accept_queue.rskq_defer_accept++;
 		}
 		break;
 
@@ -2292,7 +2295,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			val = (val ? : sysctl_tcp_fin_timeout) / HZ;
 		break;
 	case TCP_DEFER_ACCEPT:
-		val = icsk->icsk_accept_queue.rskq_defer_accept;
+		val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 :
+			((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1));
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b54d9d3..ddc2e89 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4532,49 +4532,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
 	}
 }
 
-static int tcp_defer_accept_check(struct sock *sk)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tp->defer_tcp_accept.request) {
-		int queued_data =  tp->rcv_nxt - tp->copied_seq;
-		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
-			tcp_hdr((struct sk_buff *)
-				sk->sk_receive_queue.prev)->fin : 0;
-
-		if (queued_data && hasfin)
-			queued_data--;
-
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
-			if (sock_flag(sk, SOCK_KEEPOPEN)) {
-				inet_csk_reset_keepalive_timer(sk,
-							       keepalive_time_when(tp));
-			} else {
-				inet_csk_delete_keepalive_timer(sk);
-			}
-
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
-
-			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
-
-			sock_put(tp->defer_tcp_accept.listen_sk);
-			sock_put(sk);
-			tp->defer_tcp_accept.listen_sk = NULL;
-			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
-			tcp_reset(sk);
-			return -1;
-		}
-	}
-	return 0;
-}
-
 static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4935,8 +4892,6 @@ step5:
 
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
-
-	tcp_defer_accept_check(sk);
 	return 0;
 
 csum_error:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cd601a8..b698b5b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	if (tp->defer_tcp_accept.request) {
-		reqsk_free(tp->defer_tcp_accept.request);
-		sock_put(tp->defer_tcp_accept.listen_sk);
-		sock_put(sk);
-		tp->defer_tcp_accept.listen_sk = NULL;
-		tp->defer_tcp_accept.request = NULL;
-	}
-
 	atomic_dec(&tcp_sockets_allocated);
 
 	return 0;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 019c8c1..8245247 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	   does sequence test, SYN is truncated, and thus we consider
 	   it a bare ACK.
 
-	   Both ends (listening sockets) accept the new incoming
-	   connection and try to talk to each other. 8-)
+	   If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this
+	   bare ACK.  Otherwise, we create an established connection.  Both
+	   ends (listening sockets) accept the new incoming connection and try
+	   to talk to each other. 8-)
 
 	   Note: This case is both harmless, and rare.  Possibility is about the
 	   same as us discovering intelligent life on another plant tomorrow.
@@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		if (!(flg & TCP_FLAG_ACK))
 			return NULL;
 
+		/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
+		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+			inet_rsk(req)->acked = 1;
+			return NULL;
+		}
+
 		/* OK, ACK is valid, create big socket and
 		 * feed this segment to it. It will repeat all
 		 * the tests. THIS SEGMENT MUST MOVE SOCKET TO
@@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		inet_csk_reqsk_queue_unlink(sk, req, prev);
 		inet_csk_reqsk_queue_removed(sk, req);
 
-		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-
-			/* the accept queue handling is done is est recv slow
-			 * path so lets make sure to start there
-			 */
-			tcp_sk(child)->pred_flags = 0;
-			sock_hold(sk);
-			sock_hold(child);
-			tcp_sk(child)->defer_tcp_accept.listen_sk = sk;
-			tcp_sk(child)->defer_tcp_accept.request = req;
-
-			inet_csk_reset_keepalive_timer(child,
-						       inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ);
-		} else {
-			inet_csk_reqsk_queue_add(sk, req, child);
-		}
-
+		inet_csk_reqsk_queue_add(sk, req, child);
 		return child;
 
 	listen_overflow:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4de68cf..63ed9d6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data)
 		goto death;
 	}
 
-	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
-		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
-	}
-
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03  9:40                   ` [fixed] [patch] " Ingo Molnar
@ 2008-06-03 14:41                     ` Patrick McManus
  2008-06-03 21:46                     ` Ilpo Järvinen
  1 sibling, 0 replies; 35+ messages in thread
From: Patrick McManus @ 2008-06-03 14:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilpo Järvinen, Peter Zijlstra, LKML, Netdev, David S. Miller,
	Rafael J. Wysocki, Andrew Morton, Evgeniy Polyakov

On Tue, 2008-06-03 at 11:40 +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, 
> > > sizeof(val)) seems to be the magic trick that is interestion here.
> > 
> > seems to be used:
> > 
> >  22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
> >  22003 listen(4, 10)                     = 0
> >  22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0
> > 
> > i'll queue up your reverts for testing in -tip.
> 
> update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely 
> fixed the hangs!
> 
> Here is the testing i did:


hm.

Here's a theory - the DA code put the socket in the accept queue after
data has been queued on the socket, instead of the other way around
which is normal.

Could this be confusing the select() mechanism?

-Pat









^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03  9:40                   ` [fixed] [patch] " Ingo Molnar
  2008-06-03 14:41                     ` Patrick McManus
@ 2008-06-03 21:46                     ` Ilpo Järvinen
  2008-06-03 22:01                       ` Ilpo Järvinen
  2008-06-04  7:23                       ` Ingo Molnar
  1 sibling, 2 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-03 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov, Patrick McManus

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3897 bytes --]

On Tue, 3 Jun 2008, Ingo Molnar wrote:

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, 
> > > sizeof(val)) seems to be the magic trick that is interestion here.
> > 
> > seems to be used:
> > 
> >  22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62
> >  22003 listen(4, 10)                     = 0
> >  22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0
> > 
> > i'll queue up your reverts for testing in -tip.
> 
> update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely 
> fixed the hangs!

...It wasn't exactly out-of-tree, Evgeniy fixed a problem that was found
in "TCP_DEFER_ACCEPT updates - process as established", perhaps it just 
wasn't in your testing tree yet.

$ git-log -n 1 --pretty=full 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 | 
grep "Commit:"
Commit: David S. Miller <davem@davemloft.net>

> Here is the testing i did:
> 
> first i ran about 500+ successful iterations on the affected testboxes 
> with your revert patch applied, on multiple systems.

Are you sure this is enough to conclude the results? Seems quite small 
number to me to rule out luck. Especially considering that it was some 
amount of time in the tree already until you noticed it for the first 
time.

Anyway, nice that it seems to be helping. It was almost the only 
possibility on TCP side, I don't think there were any other state machine 
related changes. So it wasn't just "random revert" in that sense like you 
were implying :-), I just didn't have any theory how it would cause the 
problem... ...I even first disregarded DA that because of timeline 
in-exactness and because I wrongly assumed that distcc probably won't use 
it anyway, but then I checked later on and found out that it was present 
at least in the source I had lying around.

Anyway, it might be that the revert was a bit overkill, I'm not fully sure 
if 539fae and e4c7884 need to be reverted to fix it since main changes 
are in ec3c098. I just didn't want to take chances at first and put them 
all to the revert list.

> Then today, without 
> changing anything else on one of the testsystems i reverted your revert 
> on that single system. After about an hour of testing, in 20 iterations 
> i got a hang again over localhost:
> 
>  titan:~> netstat -nt
>  Active Internet connections (w/o servers)
>  Proto Recv-Q Send-Q Local Address               Foreign Address             State
>  tcp        0 174592 10.0.1.14:34710             10.0.1.14:3632              ESTABLISHED
>  tcp    72145      0 10.0.1.14:3632              10.0.1.14:34710             ESTABLISHED
> 
> so i hereby conclude that your revert works :) I've repeated the commit 
> below that resolves this nasty regression.

...I couldn't immediately find anything obviously wrong with those changes 
but the patch below might be worth of a try (without the revert of 
course). If it ever spits out that WARN_ON for you, we were playing with 
fire too much and it's better to return on the safe side there...

-- 
 i.

[PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on

If header prediction is turned on under some circumstances,
DA can deadlock though I have great trouble in figuring out
how it could ever happen while ending up into that else
branch (but I've been wrong before as well :-)).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9454f0..0d9a3fe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4595,6 +4595,9 @@ static int tcp_defer_accept_check(struct sock *sk)
 			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
 			tcp_reset(sk);
 			return -1;
+		} else {
+			WARN_ON(tp->pred_flags);
+			tp->pred_flags = 0;
 		}
 	}
 	return 0;
-- 
1.5.2.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 21:46                     ` Ilpo Järvinen
@ 2008-06-03 22:01                       ` Ilpo Järvinen
  2008-06-03 22:03                         ` David Miller
  2008-06-04  7:23                       ` Ingo Molnar
  1 sibling, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-03 22:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov, Patrick McManus

[-- Attachment #1: Type: TEXT/PLAIN, Size: 647 bytes --]

On Wed, 4 Jun 2008, Ilpo Järvinen wrote:

> ...I couldn't immediately find anything obviously wrong with those changes 
> but the patch below might be worth of a try (without the revert of 
> course). If it ever spits out that WARN_ON for you, we were playing with 
> fire too much and it's better to return on the safe side there...


> [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on
> 
> If header prediction is turned on under some circumstances,
> DA can deadlock though I have great trouble in figuring out

...Nah, keepalive timer would then eventually kill it then, so no
deadlock seems possible through that one.

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 22:01                       ` Ilpo Järvinen
@ 2008-06-03 22:03                         ` David Miller
  2008-06-03 22:10                           ` Ilpo Järvinen
  2008-06-03 23:22                           ` Ilpo Järvinen
  0 siblings, 2 replies; 35+ messages in thread
From: David Miller @ 2008-06-03 22:03 UTC (permalink / raw)
  To: ilpo.jarvinen
  Cc: mingo, peterz, linux-kernel, netdev, rjw, akpm, johnpol, mcmanus

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST)

> On Wed, 4 Jun 2008, Ilpo Järvinen wrote:
> 
> > ...I couldn't immediately find anything obviously wrong with those changes 
> > but the patch below might be worth of a try (without the revert of 
> > course). If it ever spits out that WARN_ON for you, we were playing with 
> > fire too much and it's better to return on the safe side there...
> 
> 
> > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on
> > 
> > If header prediction is turned on under some circumstances,
> > DA can deadlock though I have great trouble in figuring out
> 
> ...Nah, keepalive timer would then eventually kill it then, so no
> deadlock seems possible through that one.

Keepalive is very long, it might still "seem" like a deadlock for
someone without much patience :-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 22:03                         ` David Miller
@ 2008-06-03 22:10                           ` Ilpo Järvinen
  2008-06-03 23:22                           ` Ilpo Järvinen
  1 sibling, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-03 22:10 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol, mcmanus

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1161 bytes --]

On Tue, 3 Jun 2008, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST)
> 
> > On Wed, 4 Jun 2008, Ilpo Järvinen wrote:
> > 
> > > ...I couldn't immediately find anything obviously wrong with those changes 
> > > but the patch below might be worth of a try (without the revert of 
> > > course). If it ever spits out that WARN_ON for you, we were playing with 
> > > fire too much and it's better to return on the safe side there...
> > 
> > 
> > > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on
> > > 
> > > If header prediction is turned on under some circumstances,
> > > DA can deadlock though I have great trouble in figuring out
> > 
> > ...Nah, keepalive timer would then eventually kill it then, so no
> > deadlock seems possible through that one.
> 
> Keepalive is very long, it might still "seem" like a deadlock for
> someone without much patience :-)

Sure, but I think Ingo kept it once hang for hours, ~8h or so...
...Though I'm not sure if that qualifies as a proof of Ingo's patience 
but rather shows his willingness to help solving the matter... ;-)

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 22:03                         ` David Miller
  2008-06-03 22:10                           ` Ilpo Järvinen
@ 2008-06-03 23:22                           ` Ilpo Järvinen
  2008-06-03 23:54                             ` Joe Perches
                                               ` (3 more replies)
  1 sibling, 4 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-03 23:22 UTC (permalink / raw)
  To: David Miller, mingo, mcmanus, peterz
  Cc: LKML, Netdev, rjw, Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3981 bytes --]

On Tue, 3 Jun 2008, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Wed, 4 Jun 2008 01:01:25 +0300 (EEST)
> 
> > On Wed, 4 Jun 2008, Ilpo Järvinen wrote:
> > 
> > > ...I couldn't immediately find anything obviously wrong with those changes 
> > > but the patch below might be worth of a try (without the revert of 
> > > course). If it ever spits out that WARN_ON for you, we were playing with 
> > > fire too much and it's better to return on the safe side there...
> > 
> > 
> > > [PATCH] tcp DEFER_ACCEPT: see if header prediction got turned on
> > > 
> > > If header prediction is turned on under some circumstances,
> > > DA can deadlock though I have great trouble in figuring out
> > 
> > ...Nah, keepalive timer would then eventually kill it then, so no
> > deadlock seems possible through that one.
> 
> Keepalive is very long, it might still "seem" like a deadlock for
> someone without much patience :-)

I think we want that clearing there, it's better to be safe than sorry 
there and to not put any trust on the keepalive thingie which tears down 
rather than results in a connection.

But here's somewhat more likely explanation... Only compile tested...
It probably needs some commenting from people who understand locking 
variants & details (I don't).


-- 
 i.


--
[PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk

It seems that replacement of DA code also moved parts outside
of appropriate locking. The Ingo's problem seems to come from
the fact that two flows could now race in
(inet_csk_)reqsk_queue_add corrupting the queue. ...This can
leave dangling socks around which won't resolve themselves
without stimuli from outside (e.g., external RST would help
I think).

Then some details I'm not too sure of:
I guess we want to put listen_sk->sk_state checking under the
lock as well. I've not evaluated if ->sk_data_ready too
requires locking but assumed it does.

I'm by no means familiar with all locking variants, requirements,
etc.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9454f0..d21d2b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	if (tp->defer_tcp_accept.request) {
+		struct sock *listen_sk = tp->defer_tcp_accept.listen_sk;
 		int queued_data =  tp->rcv_nxt - tp->copied_seq;
 		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
 			tcp_hdr((struct sk_buff *)
@@ -4570,8 +4571,9 @@ static int tcp_defer_accept_check(struct sock *sk)
 		if (queued_data && hasfin)
 			queued_data--;
 
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
+		bh_lock_sock(listen_sk);
+
+		if (queued_data && listen_sk->sk_state == TCP_LISTEN) {
 			if (sock_flag(sk, SOCK_KEEPOPEN)) {
 				inet_csk_reset_keepalive_timer(sk,
 							       keepalive_time_when(tp));
@@ -4579,23 +4581,24 @@ static int tcp_defer_accept_check(struct sock *sk)
 				inet_csk_delete_keepalive_timer(sk);
 			}
 
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
+			inet_csk_reqsk_queue_add(listen_sk,
+						 tp->defer_tcp_accept.request,
+						 sk);
 
 			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
+						listen_sk, 0);
 
-			sock_put(tp->defer_tcp_accept.listen_sk);
+			sock_put(listen_sk);
 			sock_put(sk);
 			tp->defer_tcp_accept.listen_sk = NULL;
 			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
+		} else if (hasfin || listen_sk->sk_state != TCP_LISTEN) {
+			bh_unlock_sock(listen_sk);
 			tcp_reset(sk);
 			return -1;
 		}
+
+		bh_unlock_sock(listen_sk);
 	}
 	return 0;
 }
-- 
1.5.2.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 23:22                           ` Ilpo Järvinen
@ 2008-06-03 23:54                             ` Joe Perches
  2008-06-04  6:25                               ` Ilpo Järvinen
  2008-06-04  2:54                             ` Patrick McManus
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Joe Perches @ 2008-06-03 23:54 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, mingo, mcmanus, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote:
> But here's somewhat more likely explanation... Only compile tested...
> It probably needs some commenting from people who understand locking 
> variants & details (I don't).
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9454f0..d21d2b9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk)
>  	struct tcp_sock *tp = tcp_sk(sk);
>  
>  	if (tp->defer_tcp_accept.request) {
> +		struct sock *listen_sk = tp->defer_tcp_accept.listen_sk;

Not commenting on the locking, but I think the code
would be clearer if tp->defer_tcp_accept was used in
a temporary instead.

Perhaps:

 net/ipv4/tcp_input.c |   33 ++++++++++++++++-----------------
 1 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b54d9d3..f846e11 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4535,18 +4535,18 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
 static int tcp_defer_accept_check(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tp->defer_tcp_accept.request) {
-		int queued_data =  tp->rcv_nxt - tp->copied_seq;
-		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
+	struct tcp_deferred_accept_info *tdai = &tp->defer_tcp_accept;
+	if (tdai->request) {
+		int queued_data = tp->rcv_nxt - tp->copied_seq;
+		int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ?
 			tcp_hdr((struct sk_buff *)
 				sk->sk_receive_queue.prev)->fin : 0;
 
 		if (queued_data && hasfin)
 			queued_data--;
 
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
+		bh_lock_sock(tdai->listen_sk);
+		if (queued_data && tdai->listen_sk->sk_state == TCP_LISTEN) {
 			if (sock_flag(sk, SOCK_KEEPOPEN)) {
 				inet_csk_reset_keepalive_timer(sk,
 							       keepalive_time_when(tp));
@@ -4554,23 +4554,22 @@ static int tcp_defer_accept_check(struct sock *sk)
 				inet_csk_delete_keepalive_timer(sk);
 			}
 
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
+			inet_csk_reqsk_queue_add(tdai->listen_sk,
+						 tdai->request,
+						 sk);
 
-			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
+			tdai->listen_sk->sk_data_ready(tdai->listen_sk, 0);
 
-			sock_put(tp->defer_tcp_accept.listen_sk);
+			sock_put(tdai->listen_sk);
 			sock_put(sk);
-			tp->defer_tcp_accept.listen_sk = NULL;
-			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
+			tdai->listen_sk = NULL;
+			tdai->request = NULL;
+		} else if (hasfin || tdai->listen_sk->sk_state != TCP_LISTEN) {
+			bh_unlock_sock(tdai->listen_sk);
 			tcp_reset(sk);
 			return -1;
 		}
+		bh_unlock_sock(tdai->listen_sk);
 	}
 	return 0;
 }



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 23:54                             ` Joe Perches
@ 2008-06-04  6:25                               ` Ilpo Järvinen
  0 siblings, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-04  6:25 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, mingo, mcmanus, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3532 bytes --]

On Tue, 3 Jun 2008, Joe Perches wrote:

> On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote:
> > But here's somewhat more likely explanation... Only compile tested...
> > It probably needs some commenting from people who understand locking 
> > variants & details (I don't).
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index c9454f0..d21d2b9 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk)
> >  	struct tcp_sock *tp = tcp_sk(sk);
> >  
> >  	if (tp->defer_tcp_accept.request) {
> > +		struct sock *listen_sk = tp->defer_tcp_accept.listen_sk;
> 
> Not commenting on the locking, but I think the code
> would be clearer if tp->defer_tcp_accept was used in
> a temporary instead.

Only fixes to net-2.6 now, as agreed. ...So I try to leave syntax cleanup 
to net-next.

Besides, there's a trap which makes your patch a lot worse than
mine... ;-) I too once tried a version without temporary until
noticed that trap.

> Perhaps:
> 
>  net/ipv4/tcp_input.c |   33 ++++++++++++++++-----------------
>  1 files changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index b54d9d3..f846e11 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4535,18 +4535,18 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
>  static int tcp_defer_accept_check(struct sock *sk)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> -
> -	if (tp->defer_tcp_accept.request) {
> -		int queued_data =  tp->rcv_nxt - tp->copied_seq;
> -		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
> +	struct tcp_deferred_accept_info *tdai = &tp->defer_tcp_accept;
> +	if (tdai->request) {
> +		int queued_data = tp->rcv_nxt - tp->copied_seq;
> +		int hasfin = !skb_queue_empty(&sk->sk_receive_queue) ?
>  			tcp_hdr((struct sk_buff *)
>  				sk->sk_receive_queue.prev)->fin : 0;
>  
>  		if (queued_data && hasfin)
>  			queued_data--;
>  
> -		if (queued_data &&
> -		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
> +		bh_lock_sock(tdai->listen_sk);
> +		if (queued_data && tdai->listen_sk->sk_state == TCP_LISTEN) {
>  			if (sock_flag(sk, SOCK_KEEPOPEN)) {
>  				inet_csk_reset_keepalive_timer(sk,
>  							       keepalive_time_when(tp));
> @@ -4554,23 +4554,22 @@ static int tcp_defer_accept_check(struct sock *sk)
>  				inet_csk_delete_keepalive_timer(sk);
>  			}
>  
> -			inet_csk_reqsk_queue_add(
> -				tp->defer_tcp_accept.listen_sk,
> -				tp->defer_tcp_accept.request,
> -				sk);
> +			inet_csk_reqsk_queue_add(tdai->listen_sk,
> +						 tdai->request,
> +						 sk);
>  
> -			tp->defer_tcp_accept.listen_sk->sk_data_ready(
> -				tp->defer_tcp_accept.listen_sk, 0);
> +			tdai->listen_sk->sk_data_ready(tdai->listen_sk, 0);
>  
> -			sock_put(tp->defer_tcp_accept.listen_sk);
> +			sock_put(tdai->listen_sk);
>  			sock_put(sk);
> -			tp->defer_tcp_accept.listen_sk = NULL;
> -			tp->defer_tcp_accept.request = NULL;
> -		} else if (hasfin ||
> -			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
> +			tdai->listen_sk = NULL;
> +			tdai->request = NULL;

uh-oh...

> +		} else if (hasfin || tdai->listen_sk->sk_state != TCP_LISTEN) {
> +			bh_unlock_sock(tdai->listen_sk);
>  			tcp_reset(sk);
>  			return -1;
>  		}
> +		bh_unlock_sock(tdai->listen_sk);

...which will crash in here. 

NAK.

>  	}
>  	return 0;
>  }
> 
> 
> 

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 23:22                           ` Ilpo Järvinen
  2008-06-03 23:54                             ` Joe Perches
@ 2008-06-04  2:54                             ` Patrick McManus
  2008-06-04  6:42                               ` Ilpo Järvinen
  2008-06-05 14:22                             ` Ingo Molnar
  2008-06-10 22:32                             ` David Miller
  3 siblings, 1 reply; 35+ messages in thread
From: Patrick McManus @ 2008-06-04  2:54 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, mingo, peterz, LKML, Netdev, rjw, Andrew Morton,
	johnpol

On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote:
==
> --
> [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> 
> It seems that replacement of DA code also moved parts outside
> of appropriate locking. The Ingo's problem seems to come from
> the fact that two flows could now race in
> (inet_csk_)reqsk_queue_add corrupting the queue. ...This can
> leave dangling socks around which won't resolve themselves
> without stimuli from outside (e.g., external RST would help
> I think).


Ilpo, has anyone told you today that you rock? Well allow me - you
rock.
> 
do_rcv() clearly has the listening socket locked in the non-DA case, and
in the DA case it is the 'child' ESTABLISHED socket that is locked -
leaving the accept queue unprotected. So simple.

-Pat



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-04  2:54                             ` Patrick McManus
@ 2008-06-04  6:42                               ` Ilpo Järvinen
  0 siblings, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-04  6:42 UTC (permalink / raw)
  To: Patrick McManus
  Cc: David Miller, mingo, peterz, LKML, Netdev, rjw, Andrew Morton,
	johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1045 bytes --]

On Tue, 3 Jun 2008, Patrick McManus wrote:

> On Wed, 2008-06-04 at 02:22 +0300, Ilpo Järvinen wrote:
> ==
> > --
> > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> > 
> > It seems that replacement of DA code also moved parts outside
> > of appropriate locking. The Ingo's problem seems to come from
> > the fact that two flows could now race in
> > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can
> > leave dangling socks around which won't resolve themselves
> > without stimuli from outside (e.g., external RST would help
> > I think).

> do_rcv() clearly has the listening socket locked in the non-DA case, and
> in the DA case it is the 'child' ESTABLISHED socket that is locked -
> leaving the accept queue unprotected. So simple.

It also well explains why Ingo finally got the KERNEL: assertion 
(!sk->sk_ack_backlog) which is adjusted in the very same reqsk_queue_add 
making it and the actual queue come out of sync.

...too bad this has no relevance to Håkon's case, so more digging
is necessary... :-)

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 23:22                           ` Ilpo Järvinen
  2008-06-03 23:54                             ` Joe Perches
  2008-06-04  2:54                             ` Patrick McManus
@ 2008-06-05 14:22                             ` Ingo Molnar
  2008-06-05 18:00                               ` Ilpo Järvinen
  2008-06-10 22:32                             ` David Miller
  3 siblings, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-06-05 14:22 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton,
	johnpol


* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk

i unapplied the reverts then applied your fix, and got this hang after 
1.5 hours of testing:

 titan:~> netstat -nt
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address           Foreign Address   State
 tcp        0  65032 10.0.1.14:39454         10.0.1.19:3632    ESTABLISHED

10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with 
16 CPUs.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-05 14:22                             ` Ingo Molnar
@ 2008-06-05 18:00                               ` Ilpo Järvinen
  2008-06-05 21:13                                 ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-05 18:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton,
	johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 669 bytes --]

On Thu, 5 Jun 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> 
> i unapplied the reverts then applied your fix, and got this hang after 
> 1.5 hours of testing:
> 
>  titan:~> netstat -nt
>  Active Internet connections (w/o servers)
>  Proto Recv-Q Send-Q Local Address           Foreign Address   State
>  tcp        0  65032 10.0.1.14:39454         10.0.1.19:3632    ESTABLISHED
> 
> 10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with 
> 16 CPUs.

Did 10.0.1.19 too have the fix (the receiving end is the important 
one to have this fix)?

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-05 18:00                               ` Ilpo Järvinen
@ 2008-06-05 21:13                                 ` Ilpo Järvinen
  2008-06-05 23:29                                   ` Patrick McManus
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-05 21:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Miller, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton,
	johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11941 bytes --]

On Thu, 5 Jun 2008, Ilpo Järvinen wrote:

> On Thu, 5 Jun 2008, Ingo Molnar wrote:
> 
> > 
> > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > 
> > > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> > 
> > i unapplied the reverts then applied your fix, and got this hang after 
> > 1.5 hours of testing:
> > 
> >  titan:~> netstat -nt
> >  Active Internet connections (w/o servers)
> >  Proto Recv-Q Send-Q Local Address           Foreign Address   State
> >  tcp        0  65032 10.0.1.14:39454         10.0.1.19:3632    ESTABLISHED
> > 
> > 10.0.1.14 is this box, Core2Duo dual-core. 10.0.1.19 is another box with 
> > 16 CPUs.
> 
> Did 10.0.1.19 too have the fix (the receiving end is the important 
> one to have this fix)?

Since you're probably going to tell that it was, here's a try which just 
reverts the most questionable commit out of those three DA related 
changes that were included in the first revert you found to be working 
(the other two are quite useful fixes).

I'm out of new ideas what could be still wrong (I got confused and lost 
track number of times while I tried to verify socket locking today and 
probably don't have more time for that now)... Unless somebody else 
(Patrick?) comes up with something quickly, I propose you Ingo try if
this smaller revert is enough to solve it and we leave those two fixes 
there and postpone the rest. If it isn't enough then things would again 
get quite interesting but that's unlikely to happen.

-- 
 i.

--
[PATCH] tcp: revert DEFER_ACCEPT updates - process as established

Reverts "[TCP]: TCP_DEFER_ACCEPT updates - process as established"
(ec3c0982a2dd1e6) and a fix made into it ("tcp: Fix slab corruption
with ipv6 and tcp6fuzz"; 9ae27e0adbf471c7).

Ingo Molnar found out that TCP flows got stuck into ESTABLISHED
state without receiving process. I tried to fix the locking for
listen_sk:
  http://marc.info/?l=linux-kernel&m=121253537018216&w=2
...but Ingo's tests still got stuck TCP flow with that fix.

Two other DA related fixes which were among the reverted commits
in the first revert attempt are not reverted here.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/tcp.h             |    7 ------
 include/net/request_sock.h      |    4 +-
 include/net/tcp.h               |    1 -
 net/ipv4/inet_connection_sock.c |   11 +++++++--
 net/ipv4/tcp.c                  |   18 +++++++++------
 net/ipv4/tcp_input.c            |   45 ---------------------------------------
 net/ipv4/tcp_ipv4.c             |    8 -------
 net/ipv4/tcp_minisocks.c        |   32 ++++++++++-----------------
 net/ipv4/tcp_timer.c            |    5 ----
 9 files changed, 33 insertions(+), 98 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 9881295..07e79bd 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
 	return (struct tcp_request_sock *)req;
 }
 
-struct tcp_deferred_accept_info {
-	struct sock *listen_sk;
-	struct request_sock *request;
-};
-
 struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
@@ -379,8 +374,6 @@ struct tcp_sock {
 	unsigned int		keepalive_time;	  /* time before keep alive takes place */
 	unsigned int		keepalive_intvl;  /* time interval between keep alive probes */
 
-	struct tcp_deferred_accept_info defer_tcp_accept;
-
 	unsigned long last_synq_overflow; 
 
 /* Receiver side RTT estimation */
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index b220b5f..0c96e7b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -115,8 +115,8 @@ struct request_sock_queue {
 	struct request_sock	*rskq_accept_head;
 	struct request_sock	*rskq_accept_tail;
 	rwlock_t		syn_wait_lock;
-	u16			rskq_defer_accept;
-	/* 2 bytes hole, try to pack */
+	u8			rskq_defer_accept;
+	/* 3 bytes hole, try to pack */
 	struct listen_sock	*listen_opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 633147c..981d5ba 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define MAX_TCP_KEEPINTVL	32767
 #define MAX_TCP_KEEPCNT		127
 #define MAX_TCP_SYNCNT		127
-#define MAX_TCP_ACCEPT_DEFERRED 65535
 
 #define TCP_SYNQ_INTERVAL	(HZ/5)	/* Period of SYNACK timer */
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 828ea21..045e799 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 	struct inet_connection_sock *icsk = inet_csk(parent);
 	struct request_sock_queue *queue = &icsk->icsk_accept_queue;
 	struct listen_sock *lopt = queue->listen_opt;
-	int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int thresh = max_retries;
 	unsigned long now = jiffies;
 	struct request_sock **reqp, *req;
 	int i, budget;
@@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
+	if (queue->rskq_defer_accept)
+		max_retries = queue->rskq_defer_accept;
+
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if (req->retrans < thresh &&
-				    !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				if ((req->retrans < (inet_rsk(req)->acked ? max_retries : thresh)) &&
+				    (inet_rsk(req)->acked ||
+				     !req->rsk_ops->rtx_syn_ack(parent, req))) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ab66683..fc54a48 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2112,12 +2112,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_DEFER_ACCEPT:
-		if (val < 0) {
-			err = -EINVAL;
-		} else {
-			if (val > MAX_TCP_ACCEPT_DEFERRED)
-				val = MAX_TCP_ACCEPT_DEFERRED;
-			icsk->icsk_accept_queue.rskq_defer_accept = val;
+		icsk->icsk_accept_queue.rskq_defer_accept = 0;
+		if (val > 0) {
+			/* Translate value in seconds to number of
+			 * retransmits */
+			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
+			       val > ((TCP_TIMEOUT_INIT / HZ) <<
+				       icsk->icsk_accept_queue.rskq_defer_accept))
+				icsk->icsk_accept_queue.rskq_defer_accept++;
+			icsk->icsk_accept_queue.rskq_defer_accept++;
 		}
 		break;
 
@@ -2299,7 +2302,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			val = (val ? : sysctl_tcp_fin_timeout) / HZ;
 		break;
 	case TCP_DEFER_ACCEPT:
-		val = icsk->icsk_accept_queue.rskq_defer_accept;
+		val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 :
+			((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1));
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1322fa1..fb1ab3f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4557,49 +4557,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
 	}
 }
 
-static int tcp_defer_accept_check(struct sock *sk)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tp->defer_tcp_accept.request) {
-		int queued_data =  tp->rcv_nxt - tp->copied_seq;
-		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
-			tcp_hdr((struct sk_buff *)
-				sk->sk_receive_queue.prev)->fin : 0;
-
-		if (queued_data && hasfin)
-			queued_data--;
-
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
-			if (sock_flag(sk, SOCK_KEEPOPEN)) {
-				inet_csk_reset_keepalive_timer(sk,
-							       keepalive_time_when(tp));
-			} else {
-				inet_csk_delete_keepalive_timer(sk);
-			}
-
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
-
-			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
-
-			sock_put(tp->defer_tcp_accept.listen_sk);
-			sock_put(sk);
-			tp->defer_tcp_accept.listen_sk = NULL;
-			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
-			tcp_reset(sk);
-			return -1;
-		}
-	}
-	return 0;
-}
-
 static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4974,8 +4931,6 @@ step5:
 
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
-
-	tcp_defer_accept_check(sk);
 	return 0;
 
 csum_error:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cd601a8..b698b5b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	if (tp->defer_tcp_accept.request) {
-		reqsk_free(tp->defer_tcp_accept.request);
-		sock_put(tp->defer_tcp_accept.listen_sk);
-		sock_put(sk);
-		tp->defer_tcp_accept.listen_sk = NULL;
-		tp->defer_tcp_accept.request = NULL;
-	}
-
 	atomic_dec(&tcp_sockets_allocated);
 
 	return 0;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 019c8c1..8245247 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	   does sequence test, SYN is truncated, and thus we consider
 	   it a bare ACK.
 
-	   Both ends (listening sockets) accept the new incoming
-	   connection and try to talk to each other. 8-)
+	   If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this
+	   bare ACK.  Otherwise, we create an established connection.  Both
+	   ends (listening sockets) accept the new incoming connection and try
+	   to talk to each other. 8-)
 
 	   Note: This case is both harmless, and rare.  Possibility is about the
 	   same as us discovering intelligent life on another plant tomorrow.
@@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		if (!(flg & TCP_FLAG_ACK))
 			return NULL;
 
+		/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
+		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+			inet_rsk(req)->acked = 1;
+			return NULL;
+		}
+
 		/* OK, ACK is valid, create big socket and
 		 * feed this segment to it. It will repeat all
 		 * the tests. THIS SEGMENT MUST MOVE SOCKET TO
@@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		inet_csk_reqsk_queue_unlink(sk, req, prev);
 		inet_csk_reqsk_queue_removed(sk, req);
 
-		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-
-			/* the accept queue handling is done is est recv slow
-			 * path so lets make sure to start there
-			 */
-			tcp_sk(child)->pred_flags = 0;
-			sock_hold(sk);
-			sock_hold(child);
-			tcp_sk(child)->defer_tcp_accept.listen_sk = sk;
-			tcp_sk(child)->defer_tcp_accept.request = req;
-
-			inet_csk_reset_keepalive_timer(child,
-						       inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ);
-		} else {
-			inet_csk_reqsk_queue_add(sk, req, child);
-		}
-
+		inet_csk_reqsk_queue_add(sk, req, child);
 		return child;
 
 	listen_overflow:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4de68cf..63ed9d6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data)
 		goto death;
 	}
 
-	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
-		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
-	}
-
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 
-- 
1.5.2.2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-05 21:13                                 ` Ilpo Järvinen
@ 2008-06-05 23:29                                   ` Patrick McManus
  2008-06-06 10:03                                     ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick McManus @ 2008-06-05 23:29 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

On Fri, 2008-06-06 at 00:13 +0300, Ilpo Järvinen wrote:

> 
> I'm out of new ideas what could be still wrong (I got confused and
> lost 
> track number of times while I tried to verify socket locking today and 
> probably don't have more time for that now)... Unless somebody else 
> (Patrick?) comes up with something quickly,

Sorry, I don't see anything - it seems to boil down to the same code in
the DA and non-DA case as far as I can tell, but after a while all the
twisty passages seem to look alike.

If Ingo confirms that the recv end was running the locking patch code,
it would be interesting to just confirm the sysreq+t looks the same as
before - it is possible the patch turned the race into a non-obvious
deadlock.

I'm sure your smaller revert will make the problem go away just as the
larger one did, fwiw. 

The other odd thing is that Ingo did a lot of experimentation and was
only making this happen on localhost before (though I agree there is
nothing inherent about that lock and localhost) - isn't it odd that the
first trigger of it now is between two hosts? What do you make of that?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-05 23:29                                   ` Patrick McManus
@ 2008-06-06 10:03                                     ` Ilpo Järvinen
  2008-06-06 17:11                                       ` Patrick McManus
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 10:03 UTC (permalink / raw)
  To: Patrick McManus
  Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1779 bytes --]

On Thu, 5 Jun 2008, Patrick McManus wrote:

> On Fri, 2008-06-06 at 00:13 +0300, Ilpo Järvinen wrote:
> 
> > 
> > I'm out of new ideas what could be still wrong (I got confused and
> > lost 
> > track number of times while I tried to verify socket locking today and 
> > probably don't have more time for that now)... Unless somebody else 
> > (Patrick?) comes up with something quickly,
> 
> Sorry, I don't see anything - it seems to boil down to the same code in
> the DA and non-DA case as far as I can tell, but after a while all the
> twisty passages seem to look alike.

:-)
 
This Ingo's testcase should anyway be quite "simple", I mean that distcc 
shouldn't do anything unexpected in a sense it shouldn't abort the flows 
by not sending data, close the listening socket or other things like that.

> If Ingo confirms that the recv end was running the locking patch code,
> it would be interesting to just confirm the sysreq+t looks the same as
> before - it is possible the patch turned the race into a non-obvious
> deadlock.

...Yes, but we want that from the receiver's host rather than from the 
sender end. Also checking that sender is still doing window probes once 
per 2min is probably worth of it though a change in that is quite 
unlikely.

> I'm sure your smaller revert will make the problem go away just as the
> larger one did, fwiw. 

I'd very much except it to.

> The other odd thing is that Ingo did a lot of experimentation and was
> only making this happen on localhost before (though I agree there is
> nothing inherent about that lock and localhost) - isn't it odd that the
> first trigger of it now is between two hosts? What do you make of that?

...No, it occurred couple of times in the past between host as well, 
nothing new in that.

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 10:03                                     ` Ilpo Järvinen
@ 2008-06-06 17:11                                       ` Patrick McManus
  2008-06-06 17:33                                         ` Ingo Molnar
  2008-06-06 18:25                                         ` Ilpo Järvinen
  0 siblings, 2 replies; 35+ messages in thread
From: Patrick McManus @ 2008-06-06 17:11 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

> This Ingo's testcase should anyway be quite "simple", I mean that distcc 
> shouldn't do anything unexpected in a sense it shouldn't abort the flows 
> by not sending data, close the listening socket or other things like that.
> 

maybe - I've noted that I can get the distcc server to crash with just a
little fuzz (telnet to it and close the telnet) - but it is true I
haven't seen anything odd using the distcc client.

Anyhow, my news is that using rc5 I have managed to reproduce it on
localhost - so it isn't just ingo anymore ! ;)

I didn't have a 16 cpu machine at my disposal, so I was concerned I
wouldn't be able to make it happen - but I setup a 64 bit kvm image with
-smp 4, running on my core2-duo and created a makefile with 3 src files
that I just compiled as "while true; do make -j3 all; done" - the
makefile uses distcc to localhost and has intentionally broken
dependencies so it just keeps recompiling stuff. The input files are
approximately 135k, 98k, and 16k after running gcc -E on them (which I
what I assume distcc does before putting them down the socket).

On rc5 I could get the lockup in under 20 minutes.. usually 10. I think
I did it 4 times. My compile test is probably a better trigger than the
kernel compile because the distcc connects are never staggered like they
would be in a large directory of files. (3 files, -j4).

When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the
error at all in the first 90 minutes of testing. I'll let the test run
and update the list.

I'm holding out hope that Ingo's report did not have the locking patch
on the distcc server end - because it certainly makes a difference for 
me.

-Patrick

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 17:11                                       ` Patrick McManus
@ 2008-06-06 17:33                                         ` Ingo Molnar
  2008-06-06 18:19                                           ` Ilpo Järvinen
  2008-06-06 18:25                                         ` Ilpo Järvinen
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-06-06 17:33 UTC (permalink / raw)
  To: Patrick McManus
  Cc: Ilpo Järvinen, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

* Patrick McManus <mcmanus@ducksong.com> wrote:

> When I apply the locking patch you (Ilpo) wrote, I cannot reproduce 
> the error at all in the first 90 minutes of testing. I'll let the test 
> run and update the list.
> 
> I'm holding out hope that Ingo's report did not have the locking patch 
> on the distcc server end - because it certainly makes a difference for 
> me.

Hm, the distcc server had the full 3-patch-revert from Ilpo, was that 
supposed to fix the problem too, indirectly?

The box is running that 3-patch revert right now as well:

 phoenix:~> uptime
  19:20:28 up  9:58,  2 users,  load average: 7.75, 13.88, 30.95
 phoenix:~> uname -a
 Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux

... and i never saw a single hang today in the 10 hours of uptime this 
box has. (and it built a good 500 kernel today) Nor any hang yesterday, 
and that was a good 500 kernels too.

You can see it that the box built more than two thousand kernels in the 
past few days alone, so it's a rather busy little bee. The other 
testboxes built even more kernels - a quad box built and booted 2500 
kernels:

 #define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008"

and i never saw a hang on that box either. 

a third box has:

 titan:~> uname -a
 Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6 
 19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux

(this is the one that showed the hang for the first time)

The total count of kernel bootups i did this week for -tip QA was 
somewhere between five and ten thousand random build+bootups - and the 
only time i got a hang was when i removed the 3-patch-revert 
intentionally, on one of the boxes.

Maybe that 3-patch-revert just makes this locking bug a bit less likely 
to trigger, by accident? Out tip test-setup is specialized to find 
arch/x86 and scheduler bugs, not primarily to find networking bugs. (but 
at this test volume, and given that it makes use of distcc, it will 
trigger them too.)

i have a rather accurate timeline of when the hang first occured, do we 
know the timeline of the introduction of the locking bug by any chance? 
Which commit introduced it? (Ilpo's commit log does not say it)

Your test results are compelling nevertheless so i'll do a retest in any 
case, with all boxes either running an older kernel or a kernel with the 
locking fix.

	Ingo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 17:33                                         ` Ingo Molnar
@ 2008-06-06 18:19                                           ` Ilpo Järvinen
  2008-06-06 18:39                                             ` Ingo Molnar
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

...I kind of fail to follow in general in this mail which patch have been 
tested and where and when... But I understand that it's just due to number 
of tests & hosts & kernels & what-else you use and know by heart (and we 
don't do all that well :-)). But I'll try to still sort it out below...

On Fri, 6 Jun 2008, Ingo Molnar wrote:

> 
> * Patrick McManus <mcmanus@ducksong.com> wrote:
> 
> > When I apply the locking patch you (Ilpo) wrote, I cannot reproduce 
> > the error at all in the first 90 minutes of testing. I'll let the test 
> > run and update the list.
> > 
> > I'm holding out hope that Ingo's report did not have the locking patch 
> > on the distcc server end - because it certainly makes a difference for 
> > me.
> 
> Hm, the distcc server had the full 3-patch-revert from Ilpo, was that 
> supposed to fix the problem too, indirectly?

Yes, the problematic outside of locking portion shouldn't be there
without those DA changes.

> The box is running that 3-patch revert right now as well:
> 
>  phoenix:~> uptime
>   19:20:28 up  9:58,  2 users,  load average: 7.75, 13.88, 30.95
>  phoenix:~> uname -a
>  Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> ... and i never saw a single hang today in the 10 hours of uptime this 
> box has. (and it built a good 500 kernel today) Nor any hang yesterday, 
> and that was a good 500 kernels too.
> 
> You can see it that the box built more than two thousand kernels in the 
> past few days alone, so it's a rather busy little bee. The other 
> testboxes built even more kernels - a quad box built and booted 2500 
> kernels:
> 
>  #define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008"
> 
> and i never saw a hang on that box either. 
> 
> a third box has:
> 
>  titan:~> uname -a
>  Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6 
>  19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> (this is the one that showed the hang for the first time)
> 
> The total count of kernel bootups i did this week for -tip QA was 
> somewhere between five and ten thousand random build+bootups - and the 
> only time i got a hang was when i removed the 3-patch-revert 
> intentionally, on one of the boxes.

...and you added the locking fix there instead? Or was this a removal?

> Maybe that 3-patch-revert just makes this locking bug a bit less likely 
> to trigger, by accident?

No, part of the DEFER_ACCEPT stuff was postponed in 2.6.25..2.6.26-rc1 
timeframe (ec3c0982a2dd1e671bad8e9d26c28dcba0039d87) so that one portion 
of it ended up being added outside of the socket lock of the listening 
socket, while touching its datastructures. Without 
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 the deferred accept related 
things happen earlier, ie., while we still are under the lock of the 
listening socket. So that particular locking bug was _introduced_ by that 
ec3c change, not made more likely or so.

...Of course software is known to have bugs, so we might always be 
(un?)lucky and hit another one and confuse... :-)

> Out tip test-setup is specialized to find 
> arch/x86 and scheduler bugs, not primarily to find networking bugs. (but 
> at this test volume, and given that it makes use of distcc, it will 
> trigger them too.)

It seems to work quite well actually for this kind of networking related 
bugs too which hardly depend on network at all :-).

> i have a rather accurate timeline of when the hang first occured, do we 
> know the timeline of the introduction of the locking bug by any chance? 
> Which commit introduced it? (Ilpo's commit log does not say it)

Ah, sorry I forgot to add that one there, it was sent quite late in the 
night and I just couldn't get sleep until sending the fix... :-) It was 
one of the reverted ones that did it:
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87.

> Your test results are compelling nevertheless so i'll do a retest in any 
> case, with all boxes either running an older kernel or a kernel with the 
> locking fix.

If you want an older kernel, you would have to go basically to 2.6.25 or 
so.

To summarize. Both 3changes+1fix revert (you refer to it only as 3-patch 
revert) _and_ the locking fix I made should fix the problem (obviously 
they exclude each other). ...And end which is significant is the one which 
has LISTENing sockets (please keep this in mind if you still get the hang
and provide some info).

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 18:19                                           ` Ilpo Järvinen
@ 2008-06-06 18:39                                             ` Ingo Molnar
  2008-06-06 19:49                                               ` Ilpo Järvinen
  2008-06-06 20:08                                               ` Patrick McManus
  0 siblings, 2 replies; 35+ messages in thread
From: Ingo Molnar @ 2008-06-06 18:39 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol


* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> If you want an older kernel, you would have to go basically to 2.6.25 
> or so.

correct, that's what i use as fallback, some distro kernel which is 
2.6.25 or older.

but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the 
locking problem - so 2.6.25 is affected as well? This is a significant 
question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on 
the 16-way box. (all other build-boxes have 2.6.24 or older as a 
fallback kernel)

> To summarize. Both 3changes+1fix revert (you refer to it only as 
> 3-patch revert) _and_ the locking fix I made should fix the problem 
> (obviously they exclude each other). ...And end which is significant 
> is the one which has LISTENing sockets (please keep this in mind if 
> you still get the hang and provide some info).

ok.

For completeness, let me repeat the patch i referred to as the 
'3-patch-revert' below. (which indeed is 3+1 as you note)

this is the patch that appears to be working empirically. (Disclaimer: 
it might just hide the problem, change timings, have a lucky code 
layout, etc.)

	Ingo

----------------->
commit 9e5b6ca8fa8e07aceaddbfb0f2a324449e5ef014
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date:   Sat May 31 15:18:56 2008 +0300

    tcp: revert DEFER_ACCEPT modifications
    
    On Sat, 31 May 2008, Ilpo Järvinen wrote:
    
    > On Sat, 31 May 2008, Ingo Molnar wrote:
    >
    > > ok, once i added back the localhost distcc component and the hung kernel
    > > build + stuck TCP socket bug happened again overnight:
    > >
    > >  Active Internet connections (w/o servers)
    > >  Proto Recv-Q Send-Q Local Address               Foreign Address             State
    > >  tcp    72187      0 10.0.1.14:3632              10.0.1.14:47910             ESTABLISHED
    > >  tcp        0 174464 10.0.1.14:47910             10.0.1.14:3632              ESTABLISHED
    > >
    > > so it seems distcc over localhost was the aspect that made it fail.
    > >
    > > _Perhaps_ what matters is to have the new post-rc3 TCP code on _both_
    > > sides of the connection. But that is just a theory - it could be timing,
    > > etc.
    >
    > Btw, does your distcc perhaps happen enable TCP_DEFER_ACCEPT (there were
    > some post 2.6.25 changes into it)?
    
    Hmm, if so, please try this revert below (I hope I got everything that is
    related there, gitk is currently broken for me so that viewing many
    commits quickly is a pain, so I might have accidently missed something)...
    
    Reverts these commits:
     [TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_t
     [TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack
     [TCP]: TCP_DEFER_ACCEPT updates - process as established
     tcp: Fix slab corruption with ipv6 and tcp6fuzz
    
    ..., ie.,
     539fae89bebd16ebeafd57a87169bc56eb530d76,
     e4c78840284f3f51b1896cf3936d60a6033c4d2c,
     ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 and
     9ae27e0adbf471c7a6b80102e38e1d5a346b3b38.
    
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 18e62e3..b31b6b7 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -239,11 +239,6 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
 	return (struct tcp_request_sock *)req;
 }
 
-struct tcp_deferred_accept_info {
-	struct sock *listen_sk;
-	struct request_sock *request;
-};
-
 struct tcp_sock {
 	/* inet_connection_sock has to be the first member of tcp_sock */
 	struct inet_connection_sock	inet_conn;
@@ -379,8 +374,6 @@ struct tcp_sock {
 	unsigned int		keepalive_intvl;  /* time interval between keep alive probes */
 	int			linger2;
 
-	struct tcp_deferred_accept_info defer_tcp_accept;
-
 	unsigned long last_synq_overflow; 
 
 	u32	tso_deferred;
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index b220b5f..0c96e7b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -115,8 +115,8 @@ struct request_sock_queue {
 	struct request_sock	*rskq_accept_head;
 	struct request_sock	*rskq_accept_tail;
 	rwlock_t		syn_wait_lock;
-	u16			rskq_defer_accept;
-	/* 2 bytes hole, try to pack */
+	u8			rskq_defer_accept;
+	/* 3 bytes hole, try to pack */
 	struct listen_sock	*listen_opt;
 };
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 633147c..981d5ba 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -139,7 +139,6 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define MAX_TCP_KEEPINTVL	32767
 #define MAX_TCP_KEEPCNT		127
 #define MAX_TCP_SYNCNT		127
-#define MAX_TCP_ACCEPT_DEFERRED 65535
 
 #define TCP_SYNQ_INTERVAL	(HZ/5)	/* Period of SYNACK timer */
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 828ea21..ec83448 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -419,7 +419,8 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 	struct inet_connection_sock *icsk = inet_csk(parent);
 	struct request_sock_queue *queue = &icsk->icsk_accept_queue;
 	struct listen_sock *lopt = queue->listen_opt;
-	int thresh = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int max_retries = icsk->icsk_syn_retries ? : sysctl_tcp_synack_retries;
+	int thresh = max_retries;
 	unsigned long now = jiffies;
 	struct request_sock **reqp, *req;
 	int i, budget;
@@ -455,6 +456,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		}
 	}
 
+	if (queue->rskq_defer_accept)
+		max_retries = queue->rskq_defer_accept;
+
 	budget = 2 * (lopt->nr_table_entries / (timeout / interval));
 	i = lopt->clock_hand;
 
@@ -462,8 +466,9 @@ void inet_csk_reqsk_queue_prune(struct sock *parent,
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if (req->retrans < thresh &&
-				    !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				if ((req->retrans < thresh ||
+				     (inet_rsk(req)->acked && req->retrans < max_retries))
+				    && !req->rsk_ops->rtx_syn_ack(parent, req)) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ab66683..fc54a48 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2112,12 +2112,15 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_DEFER_ACCEPT:
-		if (val < 0) {
-			err = -EINVAL;
-		} else {
-			if (val > MAX_TCP_ACCEPT_DEFERRED)
-				val = MAX_TCP_ACCEPT_DEFERRED;
-			icsk->icsk_accept_queue.rskq_defer_accept = val;
+		icsk->icsk_accept_queue.rskq_defer_accept = 0;
+		if (val > 0) {
+			/* Translate value in seconds to number of
+			 * retransmits */
+			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
+			       val > ((TCP_TIMEOUT_INIT / HZ) <<
+				       icsk->icsk_accept_queue.rskq_defer_accept))
+				icsk->icsk_accept_queue.rskq_defer_accept++;
+			icsk->icsk_accept_queue.rskq_defer_accept++;
 		}
 		break;
 
@@ -2299,7 +2302,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 			val = (val ? : sysctl_tcp_fin_timeout) / HZ;
 		break;
 	case TCP_DEFER_ACCEPT:
-		val = icsk->icsk_accept_queue.rskq_defer_accept;
+		val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 :
+			((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1));
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index eba873e..cad73b7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4541,49 +4541,6 @@ static void tcp_urg(struct sock *sk, struct sk_buff *skb, struct tcphdr *th)
 	}
 }
 
-static int tcp_defer_accept_check(struct sock *sk)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tp->defer_tcp_accept.request) {
-		int queued_data =  tp->rcv_nxt - tp->copied_seq;
-		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
-			tcp_hdr((struct sk_buff *)
-				sk->sk_receive_queue.prev)->fin : 0;
-
-		if (queued_data && hasfin)
-			queued_data--;
-
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
-			if (sock_flag(sk, SOCK_KEEPOPEN)) {
-				inet_csk_reset_keepalive_timer(sk,
-							       keepalive_time_when(tp));
-			} else {
-				inet_csk_delete_keepalive_timer(sk);
-			}
-
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
-
-			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
-
-			sock_put(tp->defer_tcp_accept.listen_sk);
-			sock_put(sk);
-			tp->defer_tcp_accept.listen_sk = NULL;
-			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
-			tcp_reset(sk);
-			return -1;
-		}
-	}
-	return 0;
-}
-
 static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
@@ -4944,8 +4901,6 @@ step5:
 
 	tcp_data_snd_check(sk);
 	tcp_ack_snd_check(sk);
-
-	tcp_defer_accept_check(sk);
 	return 0;
 
 csum_error:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index cd601a8..b698b5b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1918,14 +1918,6 @@ int tcp_v4_destroy_sock(struct sock *sk)
 		sk->sk_sndmsg_page = NULL;
 	}
 
-	if (tp->defer_tcp_accept.request) {
-		reqsk_free(tp->defer_tcp_accept.request);
-		sock_put(tp->defer_tcp_accept.listen_sk);
-		sock_put(sk);
-		tp->defer_tcp_accept.listen_sk = NULL;
-		tp->defer_tcp_accept.request = NULL;
-	}
-
 	atomic_dec(&tcp_sockets_allocated);
 
 	return 0;
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 019c8c1..8245247 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -571,8 +571,10 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 	   does sequence test, SYN is truncated, and thus we consider
 	   it a bare ACK.
 
-	   Both ends (listening sockets) accept the new incoming
-	   connection and try to talk to each other. 8-)
+	   If icsk->icsk_accept_queue.rskq_defer_accept, we silently drop this
+	   bare ACK.  Otherwise, we create an established connection.  Both
+	   ends (listening sockets) accept the new incoming connection and try
+	   to talk to each other. 8-)
 
 	   Note: This case is both harmless, and rare.  Possibility is about the
 	   same as us discovering intelligent life on another plant tomorrow.
@@ -640,6 +642,13 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		if (!(flg & TCP_FLAG_ACK))
 			return NULL;
 
+		/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
+		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+			inet_rsk(req)->acked = 1;
+			return NULL;
+		}
+
 		/* OK, ACK is valid, create big socket and
 		 * feed this segment to it. It will repeat all
 		 * the tests. THIS SEGMENT MUST MOVE SOCKET TO
@@ -678,24 +687,7 @@ struct sock *tcp_check_req(struct sock *sk,struct sk_buff *skb,
 		inet_csk_reqsk_queue_unlink(sk, req, prev);
 		inet_csk_reqsk_queue_removed(sk, req);
 
-		if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
-		    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-
-			/* the accept queue handling is done is est recv slow
-			 * path so lets make sure to start there
-			 */
-			tcp_sk(child)->pred_flags = 0;
-			sock_hold(sk);
-			sock_hold(child);
-			tcp_sk(child)->defer_tcp_accept.listen_sk = sk;
-			tcp_sk(child)->defer_tcp_accept.request = req;
-
-			inet_csk_reset_keepalive_timer(child,
-						       inet_csk(sk)->icsk_accept_queue.rskq_defer_accept * HZ);
-		} else {
-			inet_csk_reqsk_queue_add(sk, req, child);
-		}
-
+		inet_csk_reqsk_queue_add(sk, req, child);
 		return child;
 
 	listen_overflow:
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 4de68cf..63ed9d6 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -489,11 +489,6 @@ static void tcp_keepalive_timer (unsigned long data)
 		goto death;
 	}
 
-	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
-		tcp_send_active_reset(sk, GFP_ATOMIC);
-		goto death;
-	}
-
 	if (!sock_flag(sk, SOCK_KEEPOPEN) || sk->sk_state == TCP_CLOSE)
 		goto out;
 

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 18:39                                             ` Ingo Molnar
@ 2008-06-06 19:49                                               ` Ilpo Järvinen
  2008-06-06 20:08                                               ` Patrick McManus
  1 sibling, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 19:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Patrick McManus, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3643 bytes --]

On Fri, 6 Jun 2008, Ingo Molnar wrote:

> 
> * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> > If you want an older kernel, you would have to go basically to 2.6.25 
> > or so.
> 
> correct, that's what i use as fallback, some distro kernel which is 
> 2.6.25 or older.
> 
> but i'm confused a bit, you say v2.6.25-rc6-475-gec3c098 introduced the 
> locking problem - so 2.6.25 is affected as well?

No, you're probably just falling into a git-describe trap I also used
to fall:

ijjarvin@pointhope:~/linux/mainline$ git-log -n 1 --pretty=oneline 
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.25 | cat -
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 [TCP]: TCP_DEFER_ACCEPT updates - 
process as established
ijjarvin@pointhope:~/linux/mainline$ git-log -n 1 --pretty=oneline 
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ^v2.6.26-rc1 | cat -
ijjarvin@pointhope:~/linux/mainline$ git-describe 
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
v2.6.25-rc6-475-gec3c098
ijjarvin@pointhope:~/linux/mainline$ 

The git-describe is not the way one can determine into which mainline
tag a commit was included, it basically just provides the closest tag 
among ancestors, which can be a vastly different one and has _no_ 
relation whatsoever to the tag we'd desire to get. In here, Dave had 
net-2.6 based on 2.5.25-rc6ish (or alternatively last merge to net-2.6 
from Linus' tree's content came from that point of time), but Linus did 
the merge from 2.6.25 but git-describe won't look anything that happens 
after the asked commit. This is similar to the 
bisect-lands-lower-tag-than-select-good-commit-was "mystery" that was 
recently discussed extensively, again the Makefile only tracks ancestors, 
not the future.

If somebody knows a trivial command to get that future information (to
where merged info), I'd pretty interested to hear.

> This is a significant 
> question because the fallback kernel is kernel-2.6.25.3-18.fc9.x86_64 on 
> the 16-way box. (all other build-boxes have 2.6.24 or older as a 
> fallback kernel)

Please do get the receiver state if you still see such problem with it, 
it is also relevant but it a different problem then (I'm yet to analyze 
the data Håkan was collecting, dl it already by didn't even look into 
that yet).

...Or also if you see stuck TCPs with other cases I've told should fix it:

1. 2.6.25 (pre-ec3c to be accurate)
2. 3+1 revert
3. ec3c+locking fix (this is the most unsure one because it still would 
have the reversed socket lock taking order though nothing bad has been 
found by some review neither by me nor Patrick)

Please collect at least /proc/net/tcp and the netstat -np, if there's 
process associated to the flow with _Recv-Q_ (in localhost case there 
are two of them, the other with Send-Q), also where the process is
waiting is useful. Hopefully clear enough now... :-)

> > To summarize. Both 3changes+1fix revert (you refer to it only as 
> > 3-patch revert) _and_ the locking fix I made should fix the problem 
> > (obviously they exclude each other). ...And end which is significant 
> > is the one which has LISTENing sockets (please keep this in mind if 
> > you still get the hang and provide some info).
> 
> ok.
> 
> For completeness, let me repeat the patch i referred to as the 
> '3-patch-revert' below. (which indeed is 3+1 as you note)

...I know because there never have been any 3-patch-revert made... :-)

> this is the patch that appears to be working empirically. (Disclaimer: 
> it might just hide the problem, change timings, have a lucky code 
> layout, etc.)

Sure, but the revert also removes the obvious locking problem that was 
introduced in ec3c.

--
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 18:39                                             ` Ingo Molnar
  2008-06-06 19:49                                               ` Ilpo Järvinen
@ 2008-06-06 20:08                                               ` Patrick McManus
  2008-06-06 21:12                                                 ` Ilpo Järvinen
  2008-06-10 22:49                                                 ` David Miller
  1 sibling, 2 replies; 35+ messages in thread
From: Patrick McManus @ 2008-06-06 20:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ilpo Järvinen, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

This is all a bit confusing, but here are the conclusions I have drawn.

There definitely is a problem with the locking of the DA commit
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1
but it never appeared in 25. It exists in pretty much the same form in
rc5 (there was 1 patch to it over that time to fix a different problem).

We're certain this code has a problem with the accept queue both because
of code inspection and the fact that Ingo can back it out (as the
significant part of the 3-patch revert) and the problem goes away in his
testing.

I have run tests that can reproduce the hung socket with distcc over
localhost using 26-rc5. I can also apparently cure it using the locking
fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of rc5
+lockpatch is at 4.5+ hrs and counting without failures, it fails 6
times an hour with vanilla rc5)

Based on all of that, the right thing to do seems to be to apply the
lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything -
just fix the code and I'll send Ilpo and Ingo cookies at Christmas time
for being  great guys. Alternatively, Ingo could run the distcc servers
and clients on -tip with the lockpatch (nothing reverted) for more
testing.

The only lingering problem is Ingo's report yesterday
http://marc.info/?l=linux-netdev&m=121267587715976&w=2
of a distcc hang. In this one it was not over localhost and the distcc
server had the ec3c DA changes totally reverted. (The server is really
the only stack that matters in this case - the client is not impacted by
the DA changes). This has to be a different issue, because the ec3c code
we're talking about here wasn't on the server at all. As Ilpo mentions,
Hakon is beleived to have a different problem and maybe you've tripped
over that too?

If we're sure of that conclusion we should just take Ilpo's DA patch as
that will narrow the field for finding Hakon's issue. Its just with all
of these data points I'm not sure if I'm reaching the right conclusion.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 20:08                                               ` Patrick McManus
@ 2008-06-06 21:12                                                 ` Ilpo Järvinen
  2008-06-06 21:23                                                   ` Arjan van de Ven
  2008-06-10 22:49                                                 ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 21:12 UTC (permalink / raw)
  To: Patrick McManus, Arjan van de Ven
  Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4077 bytes --]

...added Arjan.

On Fri, 6 Jun 2008, Patrick McManus wrote:

> This is all a bit confusing, but here are the conclusions I have drawn.

Your observations here match what I've understood :-).

> There definitely is a problem with the locking of the DA commit
> ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1
> but it never appeared in 25. It exists in pretty much the same form in
> rc5 (there was 1 patch to it over that time to fix a different problem).
>
> We're certain this code has a problem with the accept queue both because
> of code inspection and the fact that Ingo can back it out (as the
> significant part of the 3-patch revert) and the problem goes away in his
> testing.

Problems were at least these:
- Accept queue addition was racy and could leave dangling items
- Dangling items caused inconsistent sk_ack_backlog
- Checking for still in LISTEN state was racy, could be changed after
  the check was made (shouldn't happen with distcc though)

I didn't read ->sk_data_ready that carefully, it could have some 
additional problems that are not listed (but they all should be fixed
by the added locking anyway).

AFAICT, rest of that ec3c change is safe wrt. locking, just holding sk is 
enough for the rest and those bits mostly shouldn't anyway be executed 
with a distcc setup.

> I have run tests that can reproduce the hung socket with distcc over
> localhost using 26-rc5. I can also apparently cure it using the locking
> fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of rc5
> +lockpatch is at 4.5+ hrs and counting without failures, it fails 6
> times an hour with vanilla rc5)
>
> Based on all of that, the right thing to do seems to be to apply the
> lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything -
> just fix the code and I'll send Ilpo and Ingo cookies at Christmas time
> for being  great guys. Alternatively, Ingo could run the distcc servers
> and clients on -tip with the lockpatch (nothing reverted) for more
> testing.

Anyway, we still would have an option to revert both the DA change + the 
locking fix later if the problem is still clearly more likely than with 
stable-2.6.25.

> The only lingering problem is Ingo's report yesterday
> http://marc.info/?l=linux-netdev&m=121267587715976&w=2
> of a distcc hang. In this one it was not over localhost and the distcc
> server had the ec3c DA changes totally reverted. (The server is really
> the only stack that matters in this case - the client is not impacted by
> the DA changes).

It definately didn't fit to picture that well if we would be talking just 
a single bug here.

...I wish Ingo would have provided the receiver state already then. :-)

> This has to be a different issue, because the ec3c code
> we're talking about here wasn't on the server at all. As Ilpo mentions,
> Hakon is beleived to have a different problem and maybe you've tripped
> over that too?

...The Håkon's case is definately different thing, also the symptoms 
are quite different because there's no deadlock at all but the TCP flow 
eventually dies, I don't yet know with what timescale that dying happens.
Only common denominator actually was this receiver process missing, though 
it provably still was there.

Besides, I don't know how long Ingo waited in this case until concluding 
that the TCP was stuck again?

> If we're sure of that conclusion we should just take Ilpo's DA patch as
> that will narrow the field for finding Hakon's issue. Its just with all
> of these data points I'm not sure if I'm reaching the right conclusion.

Lets widen the scope to two to three bugs then, one down already...

In case you missed btw, also Arjan reported some problem quite early, but 
in his case claws mua+imap was the workload, so I doubt that DEFER_ACCEPT 
would be involved but who knows without strace, here:
  http://marc.info/?l=linux-kernel&m=121182171000434&w=2

Arjan, can you please check if your workload uses setsockopt 
TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case
is different from Ingo's.


-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 21:12                                                 ` Ilpo Järvinen
@ 2008-06-06 21:23                                                   ` Arjan van de Ven
  2008-06-06 21:28                                                     ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: Arjan van de Ven @ 2008-06-06 21:23 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Patrick McManus, Ingo Molnar, David Miller, peterz, LKML, Netdev,
	rjw, Andrew Morton, johnpol

On Sat, 7 Jun 2008 00:12:55 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> Arjan, can you please check if your workload uses setsockopt 
> TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case
> is different from Ingo's.

at this point the observed hung I had seems to be the app doing
something bad (it's the app that hung) so for now please disregard my
previous comments on this topic.

> 
> 


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 21:23                                                   ` Arjan van de Ven
@ 2008-06-06 21:28                                                     ` Ilpo Järvinen
  0 siblings, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 21:28 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Patrick McManus, Ingo Molnar, David Miller, peterz, LKML, Netdev,
	rjw, Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 530 bytes --]

On Fri, 6 Jun 2008, Arjan van de Ven wrote:

> On Sat, 7 Jun 2008 00:12:55 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
> > Arjan, can you please check if your workload uses setsockopt 
> > TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case
> > is different from Ingo's.
> 
> at this point the observed hung I had seems to be the app doing
> something bad (it's the app that hung) so for now please disregard my
> previous comments on this topic.

...Ok, thanks for letting us know.

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 20:08                                               ` Patrick McManus
  2008-06-06 21:12                                                 ` Ilpo Järvinen
@ 2008-06-10 22:49                                                 ` David Miller
  1 sibling, 0 replies; 35+ messages in thread
From: David Miller @ 2008-06-10 22:49 UTC (permalink / raw)
  To: mcmanus
  Cc: mingo, ilpo.jarvinen, peterz, linux-kernel, netdev, rjw, akpm,
	johnpol

From: Patrick McManus <mcmanus@ducksong.com>
Date: Fri, 06 Jun 2008 16:08:57 -0400

> If we're sure of that conclusion we should just take Ilpo's DA patch as
> that will narrow the field for finding Hakon's issue.

While the conclusion might be accurate, Ilpo's patch adds a lock
ordering problem that might introduce potential deadlocks (worst
case) or at least some lockdep warnings.  Both of which are
unacceptable.

I personally don't see any clear solution, and because this new
DA code moves an operation into a code block where the locking
order pretty much has to be inverted, I don't see any real short
term solution other than a revert of this particular DA commit.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-06 17:11                                       ` Patrick McManus
  2008-06-06 17:33                                         ` Ingo Molnar
@ 2008-06-06 18:25                                         ` Ilpo Järvinen
  1 sibling, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-06 18:25 UTC (permalink / raw)
  To: Patrick McManus
  Cc: Ingo Molnar, David Miller, peterz, LKML, Netdev, rjw,
	Andrew Morton, johnpol

On Fri, 6 Jun 2008, Patrick McManus wrote:

> > This Ingo's testcase should anyway be quite "simple", I mean that distcc 
> > shouldn't do anything unexpected in a sense it shouldn't abort the flows 
> > by not sending data, close the listening socket or other things like that.
> 
> maybe - I've noted that I can get the distcc server to crash with just a
> little fuzz (telnet to it and close the telnet) - but it is true I
> haven't seen anything odd using the distcc client.

In addition I think I've also seen some bits floating around that 
occassionally distcc does something weird in a correct setup too.

I briefly looked how distcc behaved while doing the stress_accept. Distcc 
basically seems to have n processes each accept()ing and some kind of 
memleak killer by limiting number of successive accepts then exit, while 
the parent who did the listen is only periodically (had some sleep(1)s) 
collecting dead ones & respawning them.

> Anyhow, my news is that using rc5 I have managed to reproduce it on
> localhost - so it isn't just ingo anymore ! ;)

Also Peter Z has reported it earlier, it was distcc+localhost for him as 
well.

> and has intentionally broken dependencies so it just keeps recompiling 
> stuff.

...Trying to invent perpetual motion machine? :-/

> The input files are
> approximately 135k, 98k, and 16k after running gcc -E on them (which I
> what I assume distcc does before putting them down the socket).
>
> On rc5 I could get the lockup in under 20 minutes.. usually 10. I think
> I did it 4 times. My compile test is probably a better trigger than the
> kernel compile because the distcc connects are never staggered like they
> would be in a large directory of files. (3 files, -j4).

It could be even easier if you make next in path gcc to play with 
nice, trying a number of different values might reveal some really fast 
to reproduce scenario.

> When I apply the locking patch you (Ilpo) wrote, I cannot reproduce the
> error at all in the first 90 minutes of testing. I'll let the test run
> and update the list.

At least it helps some :-), like it should.

> I'm holding out hope that Ingo's report did not have the locking patch
> on the distcc server end - because it certainly makes a difference for 
> me.

...He had some issue with different versions being deployed at least in 
the past, and I failed to follow his latest answer :-).

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 23:22                           ` Ilpo Järvinen
                                               ` (2 preceding siblings ...)
  2008-06-05 14:22                             ` Ingo Molnar
@ 2008-06-10 22:32                             ` David Miller
  2008-06-11 13:10                               ` Patrick McManus
  2008-06-11 15:13                               ` Ilpo Järvinen
  3 siblings, 2 replies; 35+ messages in thread
From: David Miller @ 2008-06-10 22:32 UTC (permalink / raw)
  To: ilpo.jarvinen
  Cc: mingo, mcmanus, peterz, linux-kernel, netdev, rjw, akpm, johnpol

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 4 Jun 2008 02:22:16 +0300 (EEST)

> [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> 
> It seems that replacement of DA code also moved parts outside
> of appropriate locking. The Ingo's problem seems to come from
> the fact that two flows could now race in
> (inet_csk_)reqsk_queue_add corrupting the queue. ...This can
> leave dangling socks around which won't resolve themselves
> without stimuli from outside (e.g., external RST would help
> I think).
> 
> Then some details I'm not too sure of:
> I guess we want to put listen_sk->sk_state checking under the
> lock as well. I've not evaluated if ->sk_data_ready too
> requires locking but assumed it does.
> 
> I'm by no means familiar with all locking variants, requirements,
> etc.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

I took a close look at this, it seems this patch adds
an ABBA deadlock.  But I might be wrong.

Normally the locking order is:

	listen_sk --> some_child_sk

And this can be seen by the code paths that flow down
to tcp_child_process() in net/ipv4/tcp_minisocks.c (via
tcp_v4_do_rcv() for example).

However here in this patch we will lock in the:

	child_sk --> listen_sk

order.

Unless... these defer accepted sockets live outside of the
normal socket collection found by tcp_v{4,6}_hnd_req().  If
that is the case, that ought to make this locking order OK
but I fear that lockdep will likely complain because it has
no way to see this distinction.

If we cannot find a simple and easy way to deal with this locking
problem, I am reverting the defer-accept changes entirely.  It's not
the end of the world if this feature has to be deferred to 2.6.27
and this bug has been known for several weeks already.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-10 22:32                             ` David Miller
@ 2008-06-11 13:10                               ` Patrick McManus
  2008-06-11 15:13                               ` Ilpo Järvinen
  1 sibling, 0 replies; 35+ messages in thread
From: Patrick McManus @ 2008-06-11 13:10 UTC (permalink / raw)
  To: David Miller
  Cc: ilpo.jarvinen, mingo, peterz, linux-kernel, netdev, rjw, akpm,
	johnpol, vgusev

On Tue, 2008-06-10 at 15:32 -0700, David Miller wrote:

> If we cannot find a simple and easy way to deal with this locking
> problem, I am reverting the defer-accept changes entirely.  It's not
> the end of the world if this feature has to be deferred to 2.6.27

based on this, plus the comments from vitaliy it is clear that moving
the DA socket to established is pretty tricky. I agree I should address
these things in the >= 27 time frame.

imho we don't need to revert the first two patches:
e4c78840284f3f51b1896cf3936d60a6033c4d2c and
539fae89bebd16ebeafd57a87169bc56eb530d76 .. they have nothing to do with
rearranging states or the locking and they do fix the more notable DA
behavior issues.

-Patrick

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-10 22:32                             ` David Miller
  2008-06-11 13:10                               ` Patrick McManus
@ 2008-06-11 15:13                               ` Ilpo Järvinen
  1 sibling, 0 replies; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-11 15:13 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, mcmanus, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5129 bytes --]

On Tue, 10 Jun 2008, David Miller wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Wed, 4 Jun 2008 02:22:16 +0300 (EEST)
> 
> > [PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk
> > 
> > It seems that replacement of DA code also moved parts outside
> > of appropriate locking. The Ingo's problem seems to come from
> > the fact that two flows could now race in
> > (inet_csk_)reqsk_queue_add corrupting the queue. ...This can
> > leave dangling socks around which won't resolve themselves
> > without stimuli from outside (e.g., external RST would help
> > I think).
> > 
> > Then some details I'm not too sure of:
> > I guess we want to put listen_sk->sk_state checking under the
> > lock as well. I've not evaluated if ->sk_data_ready too
> > requires locking but assumed it does.
> > 
> > I'm by no means familiar with all locking variants, requirements,
> > etc.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> I took a close look at this, it seems this patch adds
> an ABBA deadlock.  But I might be wrong.
> 
> Normally the locking order is:
> 
> 	listen_sk --> some_child_sk
> 
> And this can be seen by the code paths that flow down
> to tcp_child_process() in net/ipv4/tcp_minisocks.c (via
> tcp_v4_do_rcv() for example).
> 
> However here in this patch we will lock in the:
> 
> 	child_sk --> listen_sk
> 
> order.
>
> Unless... these defer accepted sockets live outside of the
> normal socket collection found by tcp_v{4,6}_hnd_req().

All these collection/queues make the analysis a bit complex :-), but
I think I finally got a hold of it anyway and my analysis agrees with 
yours. To me it seems that when DAed TCP sk moves to established, there's 
no longer a connection from listen_sk to child_sk, we keep them alive by 
incrementing those refcounts and the child sk is responsible about the 
binding between them (until the DA gets resolution). But the child_sk 
locking during the creation is also a suspect, but only after 
tcp_rcv_established() part can run for the child sock.

Am I right that after tcp_v4_syn_recv_sock() it is possible to get 
tcp_rcv_established() to execute for the created child? It's called 
by this path:
  tcp_v4_hnd_req ->
    tcp_check_req -> 
      tcp_v4_syn_recv_sock (inet_csk(sk)->icsk_af_ops->syn_recv_sock)
...and the listen_sk part acquires that child's lock only after that in 
tcp_v4_hnd_req(). ...Allowing it to deadlock during that short window!

...But from that point onward nothing in listen_sk needs to lock the 
child. Then this connection from listen_sk to child_sk comes back once 
we've reinserted the DAed child_sk back to the queue (in that racy part I 
was trying to fix) but at that point of time we won't ever need to lock in 
child_sk -> listen_sk order again because we've already passed that 
point.

...But like you, I don't understand all of it that well... Btw, your 
sk_callback_lock notes explained some mystery part for me as I came
across with it too while looking around... :-)

> If that is the case, that ought to make this locking order OK
> but I fear that lockdep will likely complain because it has
> no way to see this distinction.

I definately agree that the locks are taken in different order no matter 
what (I actually referring to that already in the first mail about the 
deadlock) so probably lockdep is not going to be too friendly :-), whether 
it could cause a real deadlock, I was not that sure at that point of time.

> If we cannot find a simple and easy way to deal with this locking
> problem, I am reverting the defer-accept changes entirely.

To avoid all problem, I was already thinking of another approach (though 
my time constraints hardly allows finishing it any time soon):

Because DA code resides quite late on the TCP path it would be quite easy
to do some preparatory work, drop child sk's lock and re-acquire both 
locks in the usual order (listen->child) but that would require handling 
correctly reset & other things that could intervene (luckily that pesky 
userspace isn't there yet to mess around so not that many things can 
happen :-)). But ipv6 seems to do some additional processing after 
tcp_rcv_established() which I'd expect to choke if child sk's lock was 
dropped there for a moment, while ipv4 part seems quite doable. I don't 
even know what exactly are the requirements for that ipv6 part (see the 
stuff after ipv6_pktoptions label in tcp_v6_do_rcv). ...I tried to do
this but came across those two things mentioned above.

IMHO, changing locking this late in the release cycle would be quite
risky anyway... And we would also be fiddling with TCP state machine.

> It's not the end of the world if this feature has to be deferred to 
> 2.6.27

Agreed. Especially in the light of the another issue that has been 
raised.

> and this bug has been known for several weeks already.

...That's partially because Ingo didn't even test my fix on the receiver 
which got stuck but used 2.6.25 and got some other bug which looked alike 
but couldn't be the same because these DA problems weren't in 2.6.25. What 
could I've done for that :-).

-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-03 21:46                     ` Ilpo Järvinen
  2008-06-03 22:01                       ` Ilpo Järvinen
@ 2008-06-04  7:23                       ` Ingo Molnar
  2008-06-04 18:24                         ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Molnar @ 2008-06-04  7:23 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Peter Zijlstra, LKML, Netdev, David S. Miller, Rafael J. Wysocki,
	Andrew Morton, Evgeniy Polyakov, Patrick McManus

* Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:

> > > i'll queue up your reverts for testing in -tip.
> > 
> > update: your 3 reverts in tip/out-of-tree [commit dad98991c] definitely 
> > fixed the hangs!
> 
> ...It wasn't exactly out-of-tree, Evgeniy fixed a problem that was 
> found in "TCP_DEFER_ACCEPT updates - process as established", perhaps 
> it just wasn't in your testing tree yet.

out of the -tip tree :-) The -tip tree has 75+ topic branches at the 
moment, but TCP topics are not in its scope - so any TCP change is "out 
of tree" for the -tip tree.

People got confused in the past when they saw similar test patches show 
up in sched.git and x86.git before, so we wanted to make it very clear 
in -tip (with is the successor of sched.git, x86.git and a couple of 
other git trees) that these are commits we dont want to push anywhere. 

Commits in tip/out-of-tree dont get propagated into the tip/auto-*-next 
topic branches that linux-next and -mm picks up, they are purely a 
courtesy to help the testing/fixing of bugs in subsystems that are 
maintained in other git trees.

See attached below the current shortlog of the tip/out-of-tree topic 
branch - it contains changes all around the tree for various things that 
we triggered in -tip and are not yet upstream or are in flight somewhere 
in another git tree.

> > Here is the testing i did:
> > 
> > first i ran about 500+ successful iterations on the affected 
> > testboxes with your revert patch applied, on multiple systems.
> 
> Are you sure this is enough to conclude the results? Seems quite small 
> number to me to rule out luck. Especially considering that it was some 
> amount of time in the tree already until you noticed it for the first 
> time.

a full day of testing on a testsystem with 500 random kernel builds and 
bootups (the kernel build done on the testsystem utilizing distcc and 
make -j100, so it's rather heavy and parallel TCP traffic per iteration) 
with no hang, compared to the same system with your reverts not applied 
that hung after an hour with 20-30 iterations.

And that count increased to 1000 successful test iterations since 
yesterday.

So i think yes, it seems rather conclusive, given the circumstances ;-) 

These random kernel boots found many 'impossible to trigger' bugs and 
races in the past. The reason for its race finding capability is the 
timing randomness of the resulting random kernel image: the delays 
caused by random combination of debugging facilities, build variants, 
kernel subsystem variants we have. This -tip qa method - as a 
side-effect of its coverage testing - simulates timing variantions that 
are otherwise only observable via hardware variations.

I.e. this is not the same kernel booted up a 1000 times - that would be 
a very narrow test. This is 1000 _different_ kernels built and booted 
up. Each kernel having subtly different timings and ordering. And it's 
more than just externally injected random kernel: the test-system itself 
builds its "next version" (and uses the network for that as well), so 
it's a self-hosting recursive random test in essence.

This method is also amazingly good at finding compiler/linker trouble: 
it found 3-4 real gcc bugs so far. (For example i triggered an ancient 
bug in gcc 4.0.2 just yesterday. For the record, the testsystem with the 
TCP hang utilizes gcc-4.2.2.)

> > so i hereby conclude that your revert works :) I've repeated the 
> > commit below that resolves this nasty regression.
> 
> ...I couldn't immediately find anything obviously wrong with those 
> changes but the patch below might be worth of a try (without the 
> revert of course). If it ever spits out that WARN_ON for you, we were 
> playing with fire too much and it's better to return on the safe side 
> there...

i'll queue it up for testing, but no promises about speedy action here - 
the test cycle is really long with this bug.

	Ingo

------{ tip/out-of-tree shortlog: }----------->

Alexander van Heukelum (1):
      uml: cleanup: use def_bool in Kconfig files

Bjorn Helgaas (1):
      PNPACPI: use _CRS IRQ descriptor length for _SRS

Ilpo Järvinen (1):
      tcp: revert DEFER_ACCEPT modifications

Ingo Molnar (7):
      video/dvb: fix MEDIA_TUNER && FW_LOADER build error
      dvb: input layer dependencies fixes
      drivers/media/video build fix for modular builds
      drivers/watchdog/geodewdt.c: build fix
      USB: fix build bug in USB_ISIGHTFW
      acpi-acpi_numa_init-build-fix
      acpi: fix drivers/acpi/glue.c build error

Michael Krufky (1):
      dib7000p: fix dib7000p_attach when !CONFIG_DVB_DIB7000P

Russ Anderson (1):
      acpi: fix boot breakage on Altix

Yinghai Lu (2):
      net: use numa_node in net_devcice->dev instead of parent
      ide: use dev_to_node instead of pcibus_to_node

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-04  7:23                       ` Ingo Molnar
@ 2008-06-04 18:24                         ` David Miller
  2008-06-04 20:56                           ` Ilpo Järvinen
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2008-06-04 18:24 UTC (permalink / raw)
  To: mingo
  Cc: ilpo.jarvinen, peterz, linux-kernel, netdev, rjw, akpm, johnpol,
	mcmanus

From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 4 Jun 2008 09:23:11 +0200

> * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> > ...I couldn't immediately find anything obviously wrong with those 
> > changes but the patch below might be worth of a try (without the 
> > revert of course). If it ever spits out that WARN_ON for you, we were 
> > playing with fire too much and it's better to return on the safe side 
> > there...
> 
> i'll queue it up for testing, but no promises about speedy action here - 
> the test cycle is really long with this bug.

Ilpo posted another patch which fixes a locking bug in the
code, please test with that patch.  I include it below so
that you know exactly which one I am referring to.

The quicker you test this, the faster I can merge it to
Linus and get the bug fixed for good.

[PATCH] tcp DEFER_ACCEPT: fix racy access to listen_sk

It seems that replacement of DA code also moved parts outside
of appropriate locking. The Ingo's problem seems to come from
the fact that two flows could now race in
(inet_csk_)reqsk_queue_add corrupting the queue. ...This can
leave dangling socks around which won't resolve themselves
without stimuli from outside (e.g., external RST would help
I think).

Then some details I'm not too sure of:
I guess we want to put listen_sk->sk_state checking under the
lock as well. I've not evaluated if ->sk_data_ready too
requires locking but assumed it does.

I'm by no means familiar with all locking variants, requirements,
etc.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c9454f0..d21d2b9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4562,6 +4562,7 @@ static int tcp_defer_accept_check(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	if (tp->defer_tcp_accept.request) {
+		struct sock *listen_sk = tp->defer_tcp_accept.listen_sk;
 		int queued_data =  tp->rcv_nxt - tp->copied_seq;
 		int hasfin =  !skb_queue_empty(&sk->sk_receive_queue) ?
 			tcp_hdr((struct sk_buff *)
@@ -4570,8 +4571,9 @@ static int tcp_defer_accept_check(struct sock *sk)
 		if (queued_data && hasfin)
 			queued_data--;
 
-		if (queued_data &&
-		    tp->defer_tcp_accept.listen_sk->sk_state == TCP_LISTEN) {
+		bh_lock_sock(listen_sk);
+
+		if (queued_data && listen_sk->sk_state == TCP_LISTEN) {
 			if (sock_flag(sk, SOCK_KEEPOPEN)) {
 				inet_csk_reset_keepalive_timer(sk,
 							       keepalive_time_when(tp));
@@ -4579,23 +4581,24 @@ static int tcp_defer_accept_check(struct sock *sk)
 				inet_csk_delete_keepalive_timer(sk);
 			}
 
-			inet_csk_reqsk_queue_add(
-				tp->defer_tcp_accept.listen_sk,
-				tp->defer_tcp_accept.request,
-				sk);
+			inet_csk_reqsk_queue_add(listen_sk,
+						 tp->defer_tcp_accept.request,
+						 sk);
 
 			tp->defer_tcp_accept.listen_sk->sk_data_ready(
-				tp->defer_tcp_accept.listen_sk, 0);
+						listen_sk, 0);
 
-			sock_put(tp->defer_tcp_accept.listen_sk);
+			sock_put(listen_sk);
 			sock_put(sk);
 			tp->defer_tcp_accept.listen_sk = NULL;
 			tp->defer_tcp_accept.request = NULL;
-		} else if (hasfin ||
-			   tp->defer_tcp_accept.listen_sk->sk_state != TCP_LISTEN) {
+		} else if (hasfin || listen_sk->sk_state != TCP_LISTEN) {
+			bh_unlock_sock(listen_sk);
 			tcp_reset(sk);
 			return -1;
 		}
+
+		bh_unlock_sock(listen_sk);
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-04 18:24                         ` David Miller
@ 2008-06-04 20:56                           ` Ilpo Järvinen
  2008-06-04 21:55                             ` David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Ilpo Järvinen @ 2008-06-04 20:56 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, peterz, LKML, Netdev, rjw, Andrew Morton, johnpol, mcmanus

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1021 bytes --]

On Wed, 4 Jun 2008, David Miller wrote:

> From: Ingo Molnar <mingo@elte.hu>
> Date: Wed, 4 Jun 2008 09:23:11 +0200
> 
> > * Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > > ...I couldn't immediately find anything obviously wrong with those 
> > > changes but the patch below might be worth of a try (without the 
> > > revert of course). If it ever spits out that WARN_ON for you, we were 
> > > playing with fire too much and it's better to return on the safe side 
> > > there...
> > 
> > i'll queue it up for testing, but no promises about speedy action here - 
> > the test cycle is really long with this bug.
> 
> Ilpo posted another patch which fixes a locking bug in the
> code, please test with that patch.  I include it below so
> that you know exactly which one I am referring to.

Do you know if TCP ever needs to lock the child sk again (after creating 
it and it has started to live on its own) from a context that has already 
locked the listen_sk (it could deadlock after this change)?


-- 
 i.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
  2008-06-04 20:56                           ` Ilpo Järvinen
@ 2008-06-04 21:55                             ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2008-06-04 21:55 UTC (permalink / raw)
  To: ilpo.jarvinen
  Cc: mingo, peterz, linux-kernel, netdev, rjw, akpm, johnpol, mcmanus

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 4 Jun 2008 23:56:34 +0300 (EEST)

> Do you know if TCP ever needs to lock the child sk again (after creating 
> it and it has started to live on its own) from a context that has already 
> locked the listen_sk (it could deadlock after this change)?

I don't know, and since I'm getting back onto an airplane again tomorrow
morning I'm the last person who is likely do be able to do an audit
of these cases for you :-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2008-06-11 15:13 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-11 15:06 [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ Alexey Kuznetsov
  -- strict thread matches above, loose matches on Subject: below --
2008-05-29  8:45 Ingo Molnar
2008-05-29 11:14 ` Ilpo Järvinen
2008-05-29 11:22   ` Ingo Molnar
2008-05-30 18:18     ` Ingo Molnar
2008-05-31  6:09       ` Ingo Molnar
2008-05-31 11:46         ` Ilpo Järvinen
2008-05-31 12:18           ` Ilpo Järvinen
2008-05-31 12:54             ` Ingo Molnar
2008-05-31 12:58               ` Ilpo Järvinen
2008-05-31 16:35                 ` Ingo Molnar
2008-06-03  9:40                   ` [fixed] [patch] " Ingo Molnar
2008-06-03 14:41                     ` Patrick McManus
2008-06-03 21:46                     ` Ilpo Järvinen
2008-06-03 22:01                       ` Ilpo Järvinen
2008-06-03 22:03                         ` David Miller
2008-06-03 22:10                           ` Ilpo Järvinen
2008-06-03 23:22                           ` Ilpo Järvinen
2008-06-03 23:54                             ` Joe Perches
2008-06-04  6:25                               ` Ilpo Järvinen
2008-06-04  2:54                             ` Patrick McManus
2008-06-04  6:42                               ` Ilpo Järvinen
2008-06-05 14:22                             ` Ingo Molnar
2008-06-05 18:00                               ` Ilpo Järvinen
2008-06-05 21:13                                 ` Ilpo Järvinen
2008-06-05 23:29                                   ` Patrick McManus
2008-06-06 10:03                                     ` Ilpo Järvinen
2008-06-06 17:11                                       ` Patrick McManus
2008-06-06 17:33                                         ` Ingo Molnar
2008-06-06 18:19                                           ` Ilpo Järvinen
2008-06-06 18:39                                             ` Ingo Molnar
2008-06-06 19:49                                               ` Ilpo Järvinen
2008-06-06 20:08                                               ` Patrick McManus
2008-06-06 21:12                                                 ` Ilpo Järvinen
2008-06-06 21:23                                                   ` Arjan van de Ven
2008-06-06 21:28                                                     ` Ilpo Järvinen
2008-06-10 22:49                                                 ` David Miller
2008-06-06 18:25                                         ` Ilpo Järvinen
2008-06-10 22:32                             ` David Miller
2008-06-11 13:10                               ` Patrick McManus
2008-06-11 15:13                               ` Ilpo Järvinen
2008-06-04  7:23                       ` Ingo Molnar
2008-06-04 18:24                         ` David Miller
2008-06-04 20:56                           ` Ilpo Järvinen
2008-06-04 21:55                             ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).