Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-20 22:31 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <g2z46c8cb3e1004201517i5641a75cze2ec5bd33e81fb0f@mail.gmail.com>

Le mercredi 21 avril 2010 à 03:17 +0500, Gaspar Chilingarov a écrit :
> [1.] Large amount of outgoing tcp connections fail to bind properly to
> their ip/ports
> 
> [2.] Full description of the problem/report:
> 
> I'm trying to establish huge amount of outgoing tcp connections (over
> several 100000-s) on a single machine. I need to test load a server,
> which could process that amount of connections :)
> 
> The number of connections which are possible to establish from single
> ip is regulated by
> net.ipv4.ip_local_port_range = 32768    61000, which gives 28232 connections.
> 
> Good. I expect that each socket is identified on a local side as
> unique pair of local_ip:local_port .
> Thus I've added some more IP addresses (say 10) to the machine
> (aliases to the same network interface).
> I expect to be able to establish 10 times more connections than before
> (I know about file descriptor limits, system limit of total number of
> file descriptors and  so on - which are tuned to high values already).
> 
> And the fun part begins -
> I have 28232 on a first source IP (all in established state, say
> 10.0.0.10) and now I'm trying to establish one more connection with
> nc, specifying 10.0.0.11 as a source IP -- and getting "unable to bind
> error"
> 
> Notes about example;
> 10.0.0.1:8192 is a server which just accepts a connections and listens
> forever on them. It's in erlang and it can handle great loads -- so
> there is not problems on that side.
> Using the same script I was able to establish more than 20.000
> connections without any problems (having a standard local port range
> set)
> 
> 
> To make experiment easily reproducible I've done the following things:
> 
> Decrease number of local ports available to 1001 -
> net.ipv4.ip_local_port_range = 60000    61000
> 
> I have script like this (writing from memory)
> 
> #!/bin/sh
> 
> I=0
> 
> IP=10.0.0.10
> 
> # connection stats before run
> netstat -n | grep ESTABLISHED | fgrep "$IP" | wc -l
> 
> while [ $I -le 1000 ]; do
> 
> # run nc in background, supress any output
> nc -s $IP 10.0.0.1 8192 > /dev/null 2>&1 &
> 
> I=$(($I + 1))
> 
> done
> 
> # connection stats after run
> netstat -n | grep ESTABLISHED | fgrep "$IP" | wc -l
> 
> 
> EVEN on the first run I get only 990 successful connections! something
> fails, strange ....
> 
> nc 10.0.0.1 8192 fails with error "unable to bind" and establishes
> connection only from 5-10 try.
> 
> Ook, well, run this script again, get all possible 1001 connections
> and than change source IP to 10.0.0.11
> 
> If you run in several times you will get the following numbers of
> established connections about each run (for given source IP)
> ~650, ~870, 950,980,990,995,995, 1000 and several runs to get 1001.
> 
> Then if you change IP to the next available and run it again - you
> will get practically the same numbers and this continues for 3-th,
> 4th, 5-th and other IP's.
> 
> 
> As a programmer, I feel that there is some hash table for
> local_ip:local_port pairs in the kernel (may be also incorporating
> PID), which has a collisions and
> in case of collision it just fails to reserve/bind this pair for the
> socket.  I hope I'm right, but I've failed to find where the
> allocation is done :)
> In case if PID does not change (i've tried to run tests from primitive
> client in erlang as well -- you get much more worse picture and
> getting new socket becomes just impossible).
> 
> I think that even in case if there is one port available for that IP
> -- it should be possible to bind (even if the kernel should do the
> full scan on local port range to find that unused port).
> 
> 
> I would be grateful for hints where to look in the source -- may be I
> can produce some working patches for it.
> 
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> does not matter, i think.
> 
> [4.] Kernel version (from /proc/version):
> Ubuntu Karmic Koala on amd64 with latest shipped kernel.
> Linux version 2.6.31-21-generic (buildd@yellow) (gcc version 4.4.1
> (Ubuntu 4.4.1-4ubuntu9) ) #59-Ubuntu SMP Wed Mar 24 07:28:27 UTC 2010
> 
> 
> [5.] Output of Oops.. message (if applicable) with symbolic information
>     resolved (see Documentation/oops-tracing.txt)
> n/a
> 
> [6.] A small shell script or example program which triggers the
>     problem (if possible)
> [7.] Environment
> 
> 
> Thanks in advance,
> Gaspar
> 

Its doable, only if you bind() your sockets before connect()

For each socket, you'll need to chose an (local IP, local port) not
already in use.

kernel wont magically select one source IP from the pool you have.




^ permalink raw reply

* [PATCH net-next-2.6] net: sk_sleep() helper
From: Eric Dumazet @ 2010-04-20 23:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1271199845.16881.586.camel@edumazet-laptop>

Le mercredi 14 avril 2010 à 01:04 +0200, Eric Dumazet a écrit :

> I am now working on sk_callback_lock case, to speedup
> sock_def_readable(), sock_def_write_space() in typical cases
> (SOCK_FASYNC not set)
> 
> Instead of using rcu on whole "struct socket", my plan is to use a small
> structure :
> 
> struct wait_queue_head_rcu {
> 	wait_queue_head_t wait;
> 	struct rcu_head	  rcu;
> } ____cacheline_aligned_in_smp;
> 
> and make sk->sk_sleep points to this 'wait' field.
> 

I am preparing this now, and submit this preliminary patch.

Thanks !

[PATCH net-next-2.6] net: sk_sleep() helper

Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep wont be a field directly
available.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/atm/atmtcp.c            |    6 +-
 drivers/net/macvtap.c           |    4 -
 drivers/net/tun.c               |    4 -
 drivers/scsi/iscsi_tcp.c        |    4 -
 include/net/sock.h              |   10 +++-
 include/net/tcp.h               |    2 
 net/atm/common.c                |   12 ++---
 net/atm/signaling.c             |    2 
 net/atm/svc.c                   |   62 +++++++++++++++---------------
 net/ax25/af_ax25.c              |    8 +--
 net/bluetooth/af_bluetooth.c    |    6 +-
 net/bluetooth/bnep/core.c       |    8 +--
 net/bluetooth/bnep/netdev.c     |    6 +-
 net/bluetooth/cmtp/cmtp.h       |    2 
 net/bluetooth/cmtp/core.c       |    4 -
 net/bluetooth/hidp/core.c       |   10 ++--
 net/bluetooth/hidp/hidp.h       |    4 -
 net/bluetooth/l2cap.c           |    4 -
 net/bluetooth/rfcomm/sock.c     |    8 +--
 net/bluetooth/sco.c             |    4 -
 net/caif/caif_socket.c          |    2 
 net/core/datagram.c             |    6 +-
 net/core/sock.c                 |   16 +++----
 net/core/stream.c               |   16 +++----
 net/dccp/output.c               |    6 +-
 net/dccp/proto.c                |    2 
 net/decnet/af_decnet.c          |   26 ++++++------
 net/ipv4/af_inet.c              |    6 +-
 net/ipv4/inet_connection_sock.c |    4 -
 net/ipv4/tcp.c                  |    2 
 net/irda/af_irda.c              |   14 +++---
 net/iucv/af_iucv.c              |   12 ++---
 net/llc/af_llc.c                |   12 ++---
 net/netfilter/ipvs/ip_vs_sync.c |    2 
 net/netrom/af_netrom.c          |    8 +--
 net/rds/af_rds.c                |    2 
 net/rds/rds.h                   |    2 
 net/rds/recv.c                  |    2 
 net/rds/send.c                  |    2 
 net/rose/af_rose.c              |    8 +--
 net/rxrpc/af_rxrpc.c            |    4 -
 net/sctp/socket.c               |   20 ++++-----
 net/sunrpc/svcsock.c            |   24 +++++------
 net/tipc/socket.c               |   26 ++++++------
 net/unix/af_unix.c              |   10 ++--
 net/x25/af_x25.c                |    8 +--
 46 files changed, 208 insertions(+), 204 deletions(-)

diff --git a/drivers/atm/atmtcp.c b/drivers/atm/atmtcp.c
index b867121..b910181 100644
--- a/drivers/atm/atmtcp.c
+++ b/drivers/atm/atmtcp.c
@@ -68,7 +68,7 @@ static int atmtcp_send_control(struct atm_vcc *vcc,int type,
 	*(struct atm_vcc **) &new_msg->vcc = vcc;
 	old_test = test_bit(flag,&vcc->flags);
 	out_vcc->push(out_vcc,skb);
-	add_wait_queue(sk_atm(vcc)->sk_sleep, &wait);
+	add_wait_queue(sk_sleep(sk_atm(vcc)), &wait);
 	while (test_bit(flag,&vcc->flags) == old_test) {
 		mb();
 		out_vcc = PRIV(vcc->dev) ? PRIV(vcc->dev)->vcc : NULL;
@@ -80,7 +80,7 @@ static int atmtcp_send_control(struct atm_vcc *vcc,int type,
 		schedule();
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk_atm(vcc)->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk_atm(vcc)), &wait);
 	return error;
 }
 
@@ -105,7 +105,7 @@ static int atmtcp_recv_control(const struct atmtcp_control *msg)
 		    msg->type);
 		return -EINVAL;
 	}
-	wake_up(sk_atm(vcc)->sk_sleep);
+	wake_up(sk_sleep(sk_atm(vcc)));
 	return 0;
 }
 
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index abba3cc..85d6420 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -246,8 +246,8 @@ static void macvtap_sock_write_space(struct sock *sk)
 	    !test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
 		return;
 
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible_poll(sk->sk_sleep, POLLOUT | POLLWRNORM | POLLWRBAND);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible_poll(sk_sleep(sk), POLLOUT | POLLWRNORM | POLLWRBAND);
 }
 
 static int macvtap_open(struct inode *inode, struct file *file)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..20a1793 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -868,8 +868,8 @@ static void tun_sock_write_space(struct sock *sk)
 	if (!test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
 		return;
 
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible_sync_poll(sk->sk_sleep, POLLOUT |
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible_sync_poll(sk_sleep(sk), POLLOUT |
 						POLLWRNORM | POLLWRBAND);
 
 	tun = tun_sk(sk)->tun;
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 0ee725c..9eae04a 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -599,9 +599,9 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
-	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+	if (sk_sleep(sock->sk) && waitqueue_active(sk_sleep(sock->sk))) {
 		sock->sk->sk_err = EIO;
-		wake_up_interruptible(sock->sk->sk_sleep);
+		wake_up_interruptible(sk_sleep(sock->sk));
 	}
 
 	iscsi_conn_stop(cls_conn, flag);
diff --git a/include/net/sock.h b/include/net/sock.h
index 56df440..8ab0514 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1160,6 +1160,10 @@ static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 	sk->sk_socket = sock;
 }
 
+static inline wait_queue_head_t *sk_sleep(struct sock *sk)
+{
+	return sk->sk_sleep;
+}
 /* Detach socket from process context.
  * Announce socket dead, detach it from wait queue and inode.
  * Note that parent inode held reference count on this struct sock,
@@ -1346,8 +1350,8 @@ static inline int sk_has_allocations(const struct sock *sk)
  *   tp->rcv_nxt check   sock_def_readable
  *   ...                 {
  *   schedule               ...
- *                          if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
- *                              wake_up_interruptible(sk->sk_sleep)
+ *                          if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+ *                              wake_up_interruptible(sk_sleep(sk))
  *                          ...
  *                       }
  *
@@ -1368,7 +1372,7 @@ static inline int sk_has_sleeper(struct sock *sk)
 	 * This memory barrier is paired in the sock_poll_wait.
 	 */
 	smp_mb__after_lock();
-	return sk->sk_sleep && waitqueue_active(sk->sk_sleep);
+	return sk_sleep(sk) && waitqueue_active(sk_sleep(sk));
 }
 
 /**
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70c5159..b7d83d2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -939,7 +939,7 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 
 		tp->ucopy.memory = 0;
 	} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
-		wake_up_interruptible_sync_poll(sk->sk_sleep,
+		wake_up_interruptible_sync_poll(sk_sleep(sk),
 					   POLLIN | POLLRDNORM | POLLRDBAND);
 		if (!inet_csk_ack_scheduled(sk))
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
diff --git a/net/atm/common.c b/net/atm/common.c
index 97ed94a..e3e10e6 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -92,7 +92,7 @@ static void vcc_def_wakeup(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);
 	if (sk_has_sleeper(sk))
-		wake_up(sk->sk_sleep);
+		wake_up(sk_sleep(sk));
 	read_unlock(&sk->sk_callback_lock);
 }
 
@@ -110,7 +110,7 @@ static void vcc_write_space(struct sock *sk)
 
 	if (vcc_writable(sk)) {
 		if (sk_has_sleeper(sk))
-			wake_up_interruptible(sk->sk_sleep);
+			wake_up_interruptible(sk_sleep(sk));
 
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	}
@@ -549,7 +549,7 @@ int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 	}
 
 	eff = (size+3) & ~3; /* align to word boundary */
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	error = 0;
 	while (!(skb = alloc_tx(vcc, eff))) {
 		if (m->msg_flags & MSG_DONTWAIT) {
@@ -568,9 +568,9 @@ int vcc_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m,
 			send_sig(SIGPIPE, current, 0);
 			break;
 		}
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (error)
 		goto out;
 	skb->dev = NULL; /* for paths shared with net_device interfaces */
@@ -595,7 +595,7 @@ unsigned int vcc_poll(struct file *file, struct socket *sock, poll_table *wait)
 	struct atm_vcc *vcc;
 	unsigned int mask;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	vcc = ATM_SD(sock);
diff --git a/net/atm/signaling.c b/net/atm/signaling.c
index 6ba6e46..509c8ac 100644
--- a/net/atm/signaling.c
+++ b/net/atm/signaling.c
@@ -131,7 +131,7 @@ static int sigd_send(struct atm_vcc *vcc, struct sk_buff *skb)
 		}
 		sk->sk_ack_backlog++;
 		skb_queue_tail(&sk->sk_receive_queue, skb);
-		pr_debug("waking sk->sk_sleep 0x%p\n", sk->sk_sleep);
+		pr_debug("waking sk_sleep(sk) 0x%p\n", sk_sleep(sk));
 		sk->sk_state_change(sk);
 as_indicate_complete:
 		release_sock(sk);
diff --git a/net/atm/svc.c b/net/atm/svc.c
index 3ba9a45..754ee47 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -49,14 +49,14 @@ static void svc_disconnect(struct atm_vcc *vcc)
 
 	pr_debug("%p\n", vcc);
 	if (test_bit(ATM_VF_REGIS, &vcc->flags)) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 		sigd_enq(vcc, as_close, NULL, NULL, NULL);
 		while (!test_bit(ATM_VF_RELEASED, &vcc->flags) && sigd) {
 			schedule();
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_UNINTERRUPTIBLE);
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 	}
 	/* beware - socket is still in use by atmsigd until the last
 	   as_indicate has been answered */
@@ -125,13 +125,13 @@ static int svc_bind(struct socket *sock, struct sockaddr *sockaddr,
 	}
 	vcc->local = *addr;
 	set_bit(ATM_VF_WAITING, &vcc->flags);
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	sigd_enq(vcc, as_bind, NULL, NULL, &vcc->local);
 	while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
 		schedule();
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	clear_bit(ATM_VF_REGIS, &vcc->flags); /* doesn't count */
 	if (!sigd) {
 		error = -EUNATCH;
@@ -201,10 +201,10 @@ static int svc_connect(struct socket *sock, struct sockaddr *sockaddr,
 		}
 		vcc->remote = *addr;
 		set_bit(ATM_VF_WAITING, &vcc->flags);
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		sigd_enq(vcc, as_connect, NULL, NULL, &vcc->remote);
 		if (flags & O_NONBLOCK) {
-			finish_wait(sk->sk_sleep, &wait);
+			finish_wait(sk_sleep(sk), &wait);
 			sock->state = SS_CONNECTING;
 			error = -EINPROGRESS;
 			goto out;
@@ -213,7 +213,7 @@ static int svc_connect(struct socket *sock, struct sockaddr *sockaddr,
 		while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
 			schedule();
 			if (!signal_pending(current)) {
-				prepare_to_wait(sk->sk_sleep, &wait,
+				prepare_to_wait(sk_sleep(sk), &wait,
 						TASK_INTERRUPTIBLE);
 				continue;
 			}
@@ -232,14 +232,14 @@ static int svc_connect(struct socket *sock, struct sockaddr *sockaddr,
 			 */
 			sigd_enq(vcc, as_close, NULL, NULL, NULL);
 			while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
-				prepare_to_wait(sk->sk_sleep, &wait,
+				prepare_to_wait(sk_sleep(sk), &wait,
 						TASK_INTERRUPTIBLE);
 				schedule();
 			}
 			if (!sk->sk_err)
 				while (!test_bit(ATM_VF_RELEASED, &vcc->flags) &&
 				       sigd) {
-					prepare_to_wait(sk->sk_sleep, &wait,
+					prepare_to_wait(sk_sleep(sk), &wait,
 							TASK_INTERRUPTIBLE);
 					schedule();
 				}
@@ -250,7 +250,7 @@ static int svc_connect(struct socket *sock, struct sockaddr *sockaddr,
 			error = -EINTR;
 			break;
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 		if (error)
 			goto out;
 		if (!sigd) {
@@ -302,13 +302,13 @@ static int svc_listen(struct socket *sock, int backlog)
 		goto out;
 	}
 	set_bit(ATM_VF_WAITING, &vcc->flags);
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	sigd_enq(vcc, as_listen, NULL, NULL, &vcc->local);
 	while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
 		schedule();
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (!sigd) {
 		error = -EUNATCH;
 		goto out;
@@ -343,7 +343,7 @@ static int svc_accept(struct socket *sock, struct socket *newsock, int flags)
 	while (1) {
 		DEFINE_WAIT(wait);
 
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		while (!(skb = skb_dequeue(&sk->sk_receive_queue)) &&
 		       sigd) {
 			if (test_bit(ATM_VF_RELEASED, &old_vcc->flags))
@@ -363,10 +363,10 @@ static int svc_accept(struct socket *sock, struct socket *newsock, int flags)
 				error = -ERESTARTSYS;
 				break;
 			}
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_INTERRUPTIBLE);
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 		if (error)
 			goto out;
 		if (!skb) {
@@ -392,17 +392,17 @@ static int svc_accept(struct socket *sock, struct socket *newsock, int flags)
 		}
 		/* wait should be short, so we ignore the non-blocking flag */
 		set_bit(ATM_VF_WAITING, &new_vcc->flags);
-		prepare_to_wait(sk_atm(new_vcc)->sk_sleep, &wait,
+		prepare_to_wait(sk_sleep(sk_atm(new_vcc)), &wait,
 				TASK_UNINTERRUPTIBLE);
 		sigd_enq(new_vcc, as_accept, old_vcc, NULL, NULL);
 		while (test_bit(ATM_VF_WAITING, &new_vcc->flags) && sigd) {
 			release_sock(sk);
 			schedule();
 			lock_sock(sk);
-			prepare_to_wait(sk_atm(new_vcc)->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk_atm(new_vcc)), &wait,
 					TASK_UNINTERRUPTIBLE);
 		}
-		finish_wait(sk_atm(new_vcc)->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk_atm(new_vcc)), &wait);
 		if (!sigd) {
 			error = -EUNATCH;
 			goto out;
@@ -438,14 +438,14 @@ int svc_change_qos(struct atm_vcc *vcc, struct atm_qos *qos)
 	DEFINE_WAIT(wait);
 
 	set_bit(ATM_VF_WAITING, &vcc->flags);
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	sigd_enq2(vcc, as_modify, NULL, NULL, &vcc->local, qos, 0);
 	while (test_bit(ATM_VF_WAITING, &vcc->flags) &&
 	       !test_bit(ATM_VF_RELEASED, &vcc->flags) && sigd) {
 		schedule();
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_UNINTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (!sigd)
 		return -EUNATCH;
 	return -sk->sk_err;
@@ -534,20 +534,20 @@ static int svc_addparty(struct socket *sock, struct sockaddr *sockaddr,
 
 	lock_sock(sk);
 	set_bit(ATM_VF_WAITING, &vcc->flags);
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	sigd_enq(vcc, as_addparty, NULL, NULL,
 		 (struct sockaddr_atmsvc *) sockaddr);
 	if (flags & O_NONBLOCK) {
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 		error = -EINPROGRESS;
 		goto out;
 	}
 	pr_debug("added wait queue\n");
 	while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
 		schedule();
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	error = xchg(&sk->sk_err_soft, 0);
 out:
 	release_sock(sk);
@@ -563,13 +563,13 @@ static int svc_dropparty(struct socket *sock, int ep_ref)
 
 	lock_sock(sk);
 	set_bit(ATM_VF_WAITING, &vcc->flags);
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	sigd_enq2(vcc, as_dropparty, NULL, NULL, NULL, NULL, ep_ref);
 	while (test_bit(ATM_VF_WAITING, &vcc->flags) && sigd) {
 		schedule();
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (!sigd) {
 		error = -EUNATCH;
 		goto out;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 65c5801..cfdfd7e 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1281,7 +1281,7 @@ static int __must_check ax25_connect(struct socket *sock,
 		DEFINE_WAIT(wait);
 
 		for (;;) {
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_INTERRUPTIBLE);
 			if (sk->sk_state != TCP_SYN_SENT)
 				break;
@@ -1294,7 +1294,7 @@ static int __must_check ax25_connect(struct socket *sock,
 			err = -ERESTARTSYS;
 			break;
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 
 		if (err)
 			goto out_release;
@@ -1346,7 +1346,7 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags)
 	 *	hooked into the SABM we saved
 	 */
 	for (;;) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		skb = skb_dequeue(&sk->sk_receive_queue);
 		if (skb)
 			break;
@@ -1364,7 +1364,7 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags)
 		err = -ERESTARTSYS;
 		break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 
 	if (err)
 		goto out;
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 404a850..421c45b 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -288,7 +288,7 @@ unsigned int bt_sock_poll(struct file * file, struct socket *sock, poll_table *w
 
 	BT_DBG("sock %p, sk %p", sock, sk);
 
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 
 	if (sk->sk_state == BT_LISTEN)
 		return bt_accept_poll(sk);
@@ -378,7 +378,7 @@ int bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo)
 
 	BT_DBG("sk %p", sk);
 
-	add_wait_queue(sk->sk_sleep, &wait);
+	add_wait_queue(sk_sleep(sk), &wait);
 	while (sk->sk_state != state) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -401,7 +401,7 @@ int bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo)
 			break;
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 	return err;
 }
 EXPORT_SYMBOL(bt_sock_wait_state);
diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c
index 8062dad..f10b41f 100644
--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -474,7 +474,7 @@ static int bnep_session(void *arg)
 	set_user_nice(current, -15);
 
 	init_waitqueue_entry(&wait, current);
-	add_wait_queue(sk->sk_sleep, &wait);
+	add_wait_queue(sk_sleep(sk), &wait);
 	while (!atomic_read(&s->killed)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -496,7 +496,7 @@ static int bnep_session(void *arg)
 		schedule();
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	/* Cleanup session */
 	down_write(&bnep_session_sem);
@@ -507,7 +507,7 @@ static int bnep_session(void *arg)
 	/* Wakeup user-space polling for socket errors */
 	s->sock->sk->sk_err = EUNATCH;
 
-	wake_up_interruptible(s->sock->sk->sk_sleep);
+	wake_up_interruptible(sk_sleep(s->sock->sk));
 
 	/* Release the socket */
 	fput(s->sock->file);
@@ -638,7 +638,7 @@ int bnep_del_connection(struct bnep_conndel_req *req)
 
 		/* Kill session thread */
 		atomic_inc(&s->killed);
-		wake_up_interruptible(s->sock->sk->sk_sleep);
+		wake_up_interruptible(sk_sleep(s->sock->sk));
 	} else
 		err = -ENOENT;
 
diff --git a/net/bluetooth/bnep/netdev.c b/net/bluetooth/bnep/netdev.c
index d48b33f..0faad5c 100644
--- a/net/bluetooth/bnep/netdev.c
+++ b/net/bluetooth/bnep/netdev.c
@@ -109,7 +109,7 @@ static void bnep_net_set_mc_list(struct net_device *dev)
 	}
 
 	skb_queue_tail(&sk->sk_write_queue, skb);
-	wake_up_interruptible(sk->sk_sleep);
+	wake_up_interruptible(sk_sleep(sk));
 #endif
 }
 
@@ -193,11 +193,11 @@ static netdev_tx_t bnep_net_xmit(struct sk_buff *skb,
 	/*
 	 * We cannot send L2CAP packets from here as we are potentially in a bh.
 	 * So we have to queue them and wake up session thread which is sleeping
-	 * on the sk->sk_sleep.
+	 * on the sk_sleep(sk).
 	 */
 	dev->trans_start = jiffies;
 	skb_queue_tail(&sk->sk_write_queue, skb);
-	wake_up_interruptible(sk->sk_sleep);
+	wake_up_interruptible(sk_sleep(sk));
 
 	if (skb_queue_len(&sk->sk_write_queue) >= BNEP_TX_QUEUE_LEN) {
 		BT_DBG("tx queue is full");
diff --git a/net/bluetooth/cmtp/cmtp.h b/net/bluetooth/cmtp/cmtp.h
index e4663aa..785e79e 100644
--- a/net/bluetooth/cmtp/cmtp.h
+++ b/net/bluetooth/cmtp/cmtp.h
@@ -125,7 +125,7 @@ static inline void cmtp_schedule(struct cmtp_session *session)
 {
 	struct sock *sk = session->sock->sk;
 
-	wake_up_interruptible(sk->sk_sleep);
+	wake_up_interruptible(sk_sleep(sk));
 }
 
 /* CMTP init defines */
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 0073ec8..d4c6af0 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -284,7 +284,7 @@ static int cmtp_session(void *arg)
 	set_user_nice(current, -15);
 
 	init_waitqueue_entry(&wait, current);
-	add_wait_queue(sk->sk_sleep, &wait);
+	add_wait_queue(sk_sleep(sk), &wait);
 	while (!atomic_read(&session->terminate)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -301,7 +301,7 @@ static int cmtp_session(void *arg)
 		schedule();
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	down_write(&cmtp_session_sem);
 
diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 280529a..bfe641b 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -561,8 +561,8 @@ static int hidp_session(void *arg)
 
 	init_waitqueue_entry(&ctrl_wait, current);
 	init_waitqueue_entry(&intr_wait, current);
-	add_wait_queue(ctrl_sk->sk_sleep, &ctrl_wait);
-	add_wait_queue(intr_sk->sk_sleep, &intr_wait);
+	add_wait_queue(sk_sleep(ctrl_sk), &ctrl_wait);
+	add_wait_queue(sk_sleep(intr_sk), &intr_wait);
 	while (!atomic_read(&session->terminate)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -584,8 +584,8 @@ static int hidp_session(void *arg)
 		schedule();
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(intr_sk->sk_sleep, &intr_wait);
-	remove_wait_queue(ctrl_sk->sk_sleep, &ctrl_wait);
+	remove_wait_queue(sk_sleep(intr_sk), &intr_wait);
+	remove_wait_queue(sk_sleep(ctrl_sk), &ctrl_wait);
 
 	down_write(&hidp_session_sem);
 
@@ -609,7 +609,7 @@ static int hidp_session(void *arg)
 
 	fput(session->intr_sock->file);
 
-	wait_event_timeout(*(ctrl_sk->sk_sleep),
+	wait_event_timeout(*(sk_sleep(ctrl_sk)),
 		(ctrl_sk->sk_state == BT_CLOSED), msecs_to_jiffies(500));
 
 	fput(session->ctrl_sock->file);
diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h
index a4e215d..8d934a1 100644
--- a/net/bluetooth/hidp/hidp.h
+++ b/net/bluetooth/hidp/hidp.h
@@ -164,8 +164,8 @@ static inline void hidp_schedule(struct hidp_session *session)
 	struct sock *ctrl_sk = session->ctrl_sock->sk;
 	struct sock *intr_sk = session->intr_sock->sk;
 
-	wake_up_interruptible(ctrl_sk->sk_sleep);
-	wake_up_interruptible(intr_sk->sk_sleep);
+	wake_up_interruptible(sk_sleep(ctrl_sk));
+	wake_up_interruptible(sk_sleep(intr_sk));
 }
 
 /* HIDP init defines */
diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index 99d68c3..c1e60ee 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1147,7 +1147,7 @@ static int l2cap_sock_accept(struct socket *sock, struct socket *newsock, int fl
 	BT_DBG("sk %p timeo %ld", sk, timeo);
 
 	/* Wait for an incoming connection. (wake-one). */
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	while (!(nsk = bt_accept_dequeue(sk, newsock))) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!timeo) {
@@ -1170,7 +1170,7 @@ static int l2cap_sock_accept(struct socket *sock, struct socket *newsock, int fl
 		}
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	if (err)
 		goto done;
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 8ed3c37..43fbf6b 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -503,7 +503,7 @@ static int rfcomm_sock_accept(struct socket *sock, struct socket *newsock, int f
 	BT_DBG("sk %p timeo %ld", sk, timeo);
 
 	/* Wait for an incoming connection. (wake-one). */
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	while (!(nsk = bt_accept_dequeue(sk, newsock))) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!timeo) {
@@ -526,7 +526,7 @@ static int rfcomm_sock_accept(struct socket *sock, struct socket *newsock, int f
 		}
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	if (err)
 		goto done;
@@ -621,7 +621,7 @@ static long rfcomm_sock_data_wait(struct sock *sk, long timeo)
 {
 	DECLARE_WAITQUEUE(wait, current);
 
-	add_wait_queue(sk->sk_sleep, &wait);
+	add_wait_queue(sk_sleep(sk), &wait);
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
 
@@ -640,7 +640,7 @@ static long rfcomm_sock_data_wait(struct sock *sk, long timeo)
 	}
 
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 	return timeo;
 }
 
diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index ca6b2ad..b406d3e 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -567,7 +567,7 @@ static int sco_sock_accept(struct socket *sock, struct socket *newsock, int flag
 	BT_DBG("sk %p timeo %ld", sk, timeo);
 
 	/* Wait for an incoming connection. (wake-one). */
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	while (!(ch = bt_accept_dequeue(sk, newsock))) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!timeo) {
@@ -590,7 +590,7 @@ static int sco_sock_accept(struct socket *sock, struct socket *newsock, int flag
 		}
 	}
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	if (err)
 		goto done;
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index cdf62b9..90317e7 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -689,7 +689,7 @@ static unsigned int caif_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	struct caifsock *cf_sk = container_of(sk, struct caifsock, sk);
 	u32 mask = 0;
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 	lock_sock(&(cf_sk->sk));
 	if (!STATE_IS_OPEN(cf_sk)) {
 		if (!STATE_IS_PENDING(cf_sk))
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 2dccd4e..5574a5d 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -86,7 +86,7 @@ static int wait_for_packet(struct sock *sk, int *err, long *timeo_p)
 	int error;
 	DEFINE_WAIT_FUNC(wait, receiver_wake_function);
 
-	prepare_to_wait_exclusive(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 	/* Socket errors? */
 	error = sock_error(sk);
@@ -115,7 +115,7 @@ static int wait_for_packet(struct sock *sk, int *err, long *timeo_p)
 	error = 0;
 	*timeo_p = schedule_timeout(*timeo_p);
 out:
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return error;
 interrupted:
 	error = sock_intr_errno(*timeo_p);
@@ -726,7 +726,7 @@ unsigned int datagram_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	unsigned int mask;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	/* exceptional events? */
diff --git a/net/core/sock.c b/net/core/sock.c
index 7effa1e..58ebd14 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1395,7 +1395,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 		if (signal_pending(current))
 			break;
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)
 			break;
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
@@ -1404,7 +1404,7 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 			break;
 		timeo = schedule_timeout(timeo);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return timeo;
 }
 
@@ -1570,11 +1570,11 @@ int sk_wait_data(struct sock *sk, long *timeo)
 	int rc;
 	DEFINE_WAIT(wait);
 
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 	rc = sk_wait_event(sk, timeo, !skb_queue_empty(&sk->sk_receive_queue));
 	clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return rc;
 }
 EXPORT_SYMBOL(sk_wait_data);
@@ -1798,7 +1798,7 @@ static void sock_def_wakeup(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);
 	if (sk_has_sleeper(sk))
-		wake_up_interruptible_all(sk->sk_sleep);
+		wake_up_interruptible_all(sk_sleep(sk));
 	read_unlock(&sk->sk_callback_lock);
 }
 
@@ -1806,7 +1806,7 @@ static void sock_def_error_report(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);
 	if (sk_has_sleeper(sk))
-		wake_up_interruptible_poll(sk->sk_sleep, POLLERR);
+		wake_up_interruptible_poll(sk_sleep(sk), POLLERR);
 	sk_wake_async(sk, SOCK_WAKE_IO, POLL_ERR);
 	read_unlock(&sk->sk_callback_lock);
 }
@@ -1815,7 +1815,7 @@ static void sock_def_readable(struct sock *sk, int len)
 {
 	read_lock(&sk->sk_callback_lock);
 	if (sk_has_sleeper(sk))
-		wake_up_interruptible_sync_poll(sk->sk_sleep, POLLIN |
+		wake_up_interruptible_sync_poll(sk_sleep(sk), POLLIN |
 						POLLRDNORM | POLLRDBAND);
 	sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
 	read_unlock(&sk->sk_callback_lock);
@@ -1830,7 +1830,7 @@ static void sock_def_write_space(struct sock *sk)
 	 */
 	if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
 		if (sk_has_sleeper(sk))
-			wake_up_interruptible_sync_poll(sk->sk_sleep, POLLOUT |
+			wake_up_interruptible_sync_poll(sk_sleep(sk), POLLOUT |
 						POLLWRNORM | POLLWRBAND);
 
 		/* Should agree with poll, otherwise some programs break */
diff --git a/net/core/stream.c b/net/core/stream.c
index a37debf..7b3c3f3 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -32,8 +32,8 @@ void sk_stream_write_space(struct sock *sk)
 	if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) && sock) {
 		clear_bit(SOCK_NOSPACE, &sock->flags);
 
-		if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-			wake_up_interruptible_poll(sk->sk_sleep, POLLOUT |
+		if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+			wake_up_interruptible_poll(sk_sleep(sk), POLLOUT |
 						POLLWRNORM | POLLWRBAND);
 		if (sock->fasync_list && !(sk->sk_shutdown & SEND_SHUTDOWN))
 			sock_wake_async(sock, SOCK_WAKE_SPACE, POLL_OUT);
@@ -66,13 +66,13 @@ int sk_stream_wait_connect(struct sock *sk, long *timeo_p)
 		if (signal_pending(tsk))
 			return sock_intr_errno(*timeo_p);
 
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		sk->sk_write_pending++;
 		done = sk_wait_event(sk, timeo_p,
 				     !sk->sk_err &&
 				     !((1 << sk->sk_state) &
 				       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)));
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 		sk->sk_write_pending--;
 	} while (!done);
 	return 0;
@@ -96,13 +96,13 @@ void sk_stream_wait_close(struct sock *sk, long timeout)
 		DEFINE_WAIT(wait);
 
 		do {
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_INTERRUPTIBLE);
 			if (sk_wait_event(sk, &timeout, !sk_stream_closing(sk)))
 				break;
 		} while (!signal_pending(current) && timeout);
 
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 	}
 }
 
@@ -126,7 +126,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 	while (1) {
 		set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags);
 
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 			goto do_error;
@@ -157,7 +157,7 @@ int sk_stream_wait_memory(struct sock *sk, long *timeo_p)
 		*timeo_p = current_timeo;
 	}
 out:
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return err;
 
 do_error:
diff --git a/net/dccp/output.c b/net/dccp/output.c
index e98b65e..2d3dcb3 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -198,7 +198,7 @@ void dccp_write_space(struct sock *sk)
 	read_lock(&sk->sk_callback_lock);
 
 	if (sk_has_sleeper(sk))
-		wake_up_interruptible(sk->sk_sleep);
+		wake_up_interruptible(sk_sleep(sk));
 	/* Should agree with poll, otherwise some programs break */
 	if (sock_writeable(sk))
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
@@ -225,7 +225,7 @@ static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb, int delay)
 		dccp_pr_debug("delayed send by %d msec\n", delay);
 		jiffdelay = msecs_to_jiffies(delay);
 
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 		sk->sk_write_pending++;
 		release_sock(sk);
@@ -241,7 +241,7 @@ static int dccp_wait_for_ccid(struct sock *sk, struct sk_buff *skb, int delay)
 		rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
 	} while ((delay = rc) > 0);
 out:
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return rc;
 
 do_error:
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index a0e38d8..b03ecf6 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -312,7 +312,7 @@ unsigned int dccp_poll(struct file *file, struct socket *sock,
 	unsigned int mask;
 	struct sock *sk = sock->sk;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	if (sk->sk_state == DCCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 55e3b6b..d6b93d1 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -832,7 +832,7 @@ static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation)
 	scp->segsize_loc = dst_metric(__sk_dst_get(sk), RTAX_ADVMSS);
 	dn_send_conn_conf(sk, allocation);
 
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	for(;;) {
 		release_sock(sk);
 		if (scp->state == DN_CC)
@@ -850,9 +850,9 @@ static int dn_confirm_accept(struct sock *sk, long *timeo, gfp_t allocation)
 		err = -EAGAIN;
 		if (!*timeo)
 			break;
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (err == 0) {
 		sk->sk_socket->state = SS_CONNECTED;
 	} else if (scp->state != DN_CC) {
@@ -873,7 +873,7 @@ static int dn_wait_run(struct sock *sk, long *timeo)
 	if (!*timeo)
 		return -EALREADY;
 
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	for(;;) {
 		release_sock(sk);
 		if (scp->state == DN_CI || scp->state == DN_CC)
@@ -891,9 +891,9 @@ static int dn_wait_run(struct sock *sk, long *timeo)
 		err = -ETIMEDOUT;
 		if (!*timeo)
 			break;
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 out:
 	if (err == 0) {
 		sk->sk_socket->state = SS_CONNECTED;
@@ -1040,7 +1040,7 @@ static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo)
 	struct sk_buff *skb = NULL;
 	int err = 0;
 
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	for(;;) {
 		release_sock(sk);
 		skb = skb_dequeue(&sk->sk_receive_queue);
@@ -1060,9 +1060,9 @@ static struct sk_buff *dn_wait_for_connect(struct sock *sk, long *timeo)
 		err = -EAGAIN;
 		if (!*timeo)
 			break;
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 
 	return skb == NULL ? ERR_PTR(err) : skb;
 }
@@ -1746,11 +1746,11 @@ static int dn_recvmsg(struct kiocb *iocb, struct socket *sock,
 			goto out;
 		}
 
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 		sk_wait_event(sk, &timeo, dn_data_ready(sk, queue, flags, target));
 		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 	}
 
 	skb_queue_walk_safe(queue, skb, n) {
@@ -2003,12 +2003,12 @@ static int dn_sendmsg(struct kiocb *iocb, struct socket *sock,
 				goto out;
 			}
 
-			prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+			prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 			set_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 			sk_wait_event(sk, &timeo,
 				      !dn_queue_too_long(scp, queue, flags));
 			clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
-			finish_wait(sk->sk_sleep, &wait);
+			finish_wait(sk_sleep(sk), &wait);
 			continue;
 		}
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c5376c7..5ca7290 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -548,7 +548,7 @@ static long inet_wait_for_connect(struct sock *sk, long timeo)
 {
 	DEFINE_WAIT(wait);
 
-	prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 	/* Basic assumption: if someone sets sk->sk_err, he _must_
 	 * change state of the socket from TCP_SYN_*.
@@ -561,9 +561,9 @@ static long inet_wait_for_connect(struct sock *sk, long timeo)
 		lock_sock(sk);
 		if (signal_pending(current) || !timeo)
 			break;
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return timeo;
 }
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8da6429..e0a3e35 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -234,7 +234,7 @@ static int inet_csk_wait_for_connect(struct sock *sk, long timeo)
 	 * having to remove and re-insert us on the wait queue.
 	 */
 	for (;;) {
-		prepare_to_wait_exclusive(sk->sk_sleep, &wait,
+		prepare_to_wait_exclusive(sk_sleep(sk), &wait,
 					  TASK_INTERRUPTIBLE);
 		release_sock(sk);
 		if (reqsk_queue_empty(&icsk->icsk_accept_queue))
@@ -253,7 +253,7 @@ static int inet_csk_wait_for_connect(struct sock *sk, long timeo)
 		if (!timeo)
 			break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return err;
 }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0f8caf6..7720833 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -378,7 +378,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	struct sock *sk = sock->sk;
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	if (sk->sk_state == TCP_LISTEN)
 		return inet_csk_listen_poll(sk);
 
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 2a4efce..79986a6 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -347,7 +347,7 @@ static void irda_flow_indication(void *instance, void *sap, LOCAL_FLOW flow)
 		self->tx_flow = flow;
 		IRDA_DEBUG(1, "%s(), IrTTP wants us to start again\n",
 			   __func__);
-		wake_up_interruptible(sk->sk_sleep);
+		wake_up_interruptible(sk_sleep(sk));
 		break;
 	default:
 		IRDA_DEBUG(0, "%s(), Unknown flow command!\n", __func__);
@@ -900,7 +900,7 @@ static int irda_accept(struct socket *sock, struct socket *newsock, int flags)
 		if (flags & O_NONBLOCK)
 			goto out;
 
-		err = wait_event_interruptible(*(sk->sk_sleep),
+		err = wait_event_interruptible(*(sk_sleep(sk)),
 					skb_peek(&sk->sk_receive_queue));
 		if (err)
 			goto out;
@@ -1066,7 +1066,7 @@ static int irda_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto out;
 
 	err = -ERESTARTSYS;
-	if (wait_event_interruptible(*(sk->sk_sleep),
+	if (wait_event_interruptible(*(sk_sleep(sk)),
 				     (sk->sk_state != TCP_SYN_SENT)))
 		goto out;
 
@@ -1318,7 +1318,7 @@ static int irda_sendmsg(struct kiocb *iocb, struct socket *sock,
 
 	/* Check if IrTTP is wants us to slow down */
 
-	if (wait_event_interruptible(*(sk->sk_sleep),
+	if (wait_event_interruptible(*(sk_sleep(sk)),
 	    (self->tx_flow != FLOW_STOP  ||  sk->sk_state != TCP_ESTABLISHED))) {
 		err = -ERESTARTSYS;
 		goto out;
@@ -1477,7 +1477,7 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 			if (copied >= target)
 				break;
 
-			prepare_to_wait_exclusive(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+			prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 			/*
 			 *	POSIX 1003.1g mandates this order.
@@ -1497,7 +1497,7 @@ static int irda_recvmsg_stream(struct kiocb *iocb, struct socket *sock,
 				/* Wait process until data arrives */
 				schedule();
 
-			finish_wait(sk->sk_sleep, &wait);
+			finish_wait(sk_sleep(sk), &wait);
 
 			if (err)
 				goto out;
@@ -1787,7 +1787,7 @@ static unsigned int irda_poll(struct file * file, struct socket *sock,
 	IRDA_DEBUG(4, "%s()\n", __func__);
 
 	lock_kernel();
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	/* Exceptional events? */
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index c18286a..9636b7d 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -59,7 +59,7 @@ do {									\
 	DEFINE_WAIT(__wait);						\
 	long __timeo = timeo;						\
 	ret = 0;							\
-	prepare_to_wait(sk->sk_sleep, &__wait, TASK_INTERRUPTIBLE);	\
+	prepare_to_wait(sk_sleep(sk), &__wait, TASK_INTERRUPTIBLE);	\
 	while (!(condition)) {						\
 		if (!__timeo) {						\
 			ret = -EAGAIN;					\
@@ -76,7 +76,7 @@ do {									\
 		if (ret)						\
 			break;						\
 	}								\
-	finish_wait(sk->sk_sleep, &__wait);				\
+	finish_wait(sk_sleep(sk), &__wait);				\
 } while (0)
 
 #define iucv_sock_wait(sk, condition, timeo)				\
@@ -307,7 +307,7 @@ static void iucv_sock_wake_msglim(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);
 	if (sk_has_sleeper(sk))
-		wake_up_interruptible_all(sk->sk_sleep);
+		wake_up_interruptible_all(sk_sleep(sk));
 	sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	read_unlock(&sk->sk_callback_lock);
 }
@@ -795,7 +795,7 @@ static int iucv_sock_accept(struct socket *sock, struct socket *newsock,
 	timeo = sock_rcvtimeo(sk, flags & O_NONBLOCK);
 
 	/* Wait for an incoming connection */
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	while (!(nsk = iucv_accept_dequeue(sk, newsock))) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!timeo) {
@@ -819,7 +819,7 @@ static int iucv_sock_accept(struct socket *sock, struct socket *newsock,
 	}
 
 	set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 
 	if (err)
 		goto done;
@@ -1269,7 +1269,7 @@ unsigned int iucv_sock_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	unsigned int mask = 0;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 
 	if (sk->sk_state == IUCV_LISTEN)
 		return iucv_accept_poll(sk);
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 2db6a9f..023ba82 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -536,7 +536,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
 	int rc = 0;
 
 	while (1) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		if (sk_wait_event(sk, &timeout, sk->sk_state == TCP_CLOSE))
 			break;
 		rc = -ERESTARTSYS;
@@ -547,7 +547,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long timeout)
 			break;
 		rc = 0;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return rc;
 }
 
@@ -556,13 +556,13 @@ static int llc_ui_wait_for_conn(struct sock *sk, long timeout)
 	DEFINE_WAIT(wait);
 
 	while (1) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		if (sk_wait_event(sk, &timeout, sk->sk_state != TCP_SYN_SENT))
 			break;
 		if (signal_pending(current) || !timeout)
 			break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return timeout;
 }
 
@@ -573,7 +573,7 @@ static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout)
 	int rc;
 
 	while (1) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		rc = 0;
 		if (sk_wait_event(sk, &timeout,
 				  (sk->sk_shutdown & RCV_SHUTDOWN) ||
@@ -588,7 +588,7 @@ static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout)
 		if (!timeout)
 			break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return rc;
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 8fb0ae6..7ba0693 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -802,7 +802,7 @@ static int sync_thread_backup(void *data)
 		ip_vs_backup_mcast_ifn, ip_vs_backup_syncid);
 
 	while (!kthread_should_stop()) {
-		wait_event_interruptible(*tinfo->sock->sk->sk_sleep,
+		wait_event_interruptible(*sk_sleep(tinfo->sock->sk),
 			 !skb_queue_empty(&tinfo->sock->sk->sk_receive_queue)
 			 || kthread_should_stop());
 
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index fa07f04..06cb027 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -739,7 +739,7 @@ static int nr_connect(struct socket *sock, struct sockaddr *uaddr,
 		DEFINE_WAIT(wait);
 
 		for (;;) {
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_INTERRUPTIBLE);
 			if (sk->sk_state != TCP_SYN_SENT)
 				break;
@@ -752,7 +752,7 @@ static int nr_connect(struct socket *sock, struct sockaddr *uaddr,
 			err = -ERESTARTSYS;
 			break;
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 		if (err)
 			goto out_release;
 	}
@@ -798,7 +798,7 @@ static int nr_accept(struct socket *sock, struct socket *newsock, int flags)
 	 *	hooked into the SABM we saved
 	 */
 	for (;;) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		skb = skb_dequeue(&sk->sk_receive_queue);
 		if (skb)
 			break;
@@ -816,7 +816,7 @@ static int nr_accept(struct socket *sock, struct socket *newsock, int flags)
 		err = -ERESTARTSYS;
 		break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (err)
 		goto out_release;
 
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 7919a9e..aebfecb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -158,7 +158,7 @@ static unsigned int rds_poll(struct file *file, struct socket *sock,
 	unsigned int mask = 0;
 	unsigned long flags;
 
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 
 	if (rs->rs_seen_congestion)
 		poll_wait(file, &rds_poll_waitq, wait);
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 4bec6e2..c224b5b 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -492,7 +492,7 @@ void rds_sock_put(struct rds_sock *rs);
 void rds_wake_sk_sleep(struct rds_sock *rs);
 static inline void __rds_wake_sk_sleep(struct sock *sk)
 {
-	wait_queue_head_t *waitq = sk->sk_sleep;
+	wait_queue_head_t *waitq = sk_sleep(sk);
 
 	if (!sock_flag(sk, SOCK_DEAD) && waitq)
 		wake_up(waitq);
diff --git a/net/rds/recv.c b/net/rds/recv.c
index e2a2b93..795a00b 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -432,7 +432,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 				break;
 			}
 
-			timeo = wait_event_interruptible_timeout(*sk->sk_sleep,
+			timeo = wait_event_interruptible_timeout(*sk_sleep(sk),
 					(!list_empty(&rs->rs_notify_queue) ||
 					 rs->rs_cong_notify ||
 					 rds_next_incoming(rs, &inc)), timeo);
diff --git a/net/rds/send.c b/net/rds/send.c
index 53d6795..9c1c6bc 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -915,7 +915,7 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 			goto out;
 		}
 
-		timeo = wait_event_interruptible_timeout(*sk->sk_sleep,
+		timeo = wait_event_interruptible_timeout(*sk_sleep(sk),
 					rds_send_queue_rm(rs, conn, rm,
 							  rs->rs_bound_port,
 							  dport,
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 4fb711a..8e45e76 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -845,7 +845,7 @@ rose_try_next_neigh:
 		DEFINE_WAIT(wait);
 
 		for (;;) {
-			prepare_to_wait(sk->sk_sleep, &wait,
+			prepare_to_wait(sk_sleep(sk), &wait,
 					TASK_INTERRUPTIBLE);
 			if (sk->sk_state != TCP_SYN_SENT)
 				break;
@@ -858,7 +858,7 @@ rose_try_next_neigh:
 			err = -ERESTARTSYS;
 			break;
 		}
-		finish_wait(sk->sk_sleep, &wait);
+		finish_wait(sk_sleep(sk), &wait);
 
 		if (err)
 			goto out_release;
@@ -911,7 +911,7 @@ static int rose_accept(struct socket *sock, struct socket *newsock, int flags)
 	 *	hooked into the SABM we saved
 	 */
 	for (;;) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 		skb = skb_dequeue(&sk->sk_receive_queue);
 		if (skb)
@@ -930,7 +930,7 @@ static int rose_accept(struct socket *sock, struct socket *newsock, int flags)
 		err = -ERESTARTSYS;
 		break;
 	}
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	if (err)
 		goto out_release;
 
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index c060095..c432d76 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -65,7 +65,7 @@ static void rxrpc_write_space(struct sock *sk)
 	read_lock(&sk->sk_callback_lock);
 	if (rxrpc_writable(sk)) {
 		if (sk_has_sleeper(sk))
-			wake_up_interruptible(sk->sk_sleep);
+			wake_up_interruptible(sk_sleep(sk));
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	}
 	read_unlock(&sk->sk_callback_lock);
@@ -589,7 +589,7 @@ static unsigned int rxrpc_poll(struct file *file, struct socket *sock,
 	unsigned int mask;
 	struct sock *sk = sock->sk;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	/* the socket is readable if there are any messages waiting on the Rx
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c194127..f34adcc 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -5702,7 +5702,7 @@ unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	struct sctp_sock *sp = sctp_sk(sk);
 	unsigned int mask;
 
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 
 	/* A TCP-style listening socket becomes readable when the accept queue
 	 * is not empty.
@@ -5943,7 +5943,7 @@ static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p)
 	int error;
 	DEFINE_WAIT(wait);
 
-	prepare_to_wait_exclusive(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+	prepare_to_wait_exclusive(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 	/* Socket errors? */
 	error = sock_error(sk);
@@ -5980,14 +5980,14 @@ static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p)
 	sctp_lock_sock(sk);
 
 ready:
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	return 0;
 
 interrupted:
 	error = sock_intr_errno(*timeo_p);
 
 out:
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	*err = error;
 	return error;
 }
@@ -6061,8 +6061,8 @@ static void __sctp_write_space(struct sctp_association *asoc)
 			wake_up_interruptible(&asoc->wait);
 
 		if (sctp_writeable(sk)) {
-			if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-				wake_up_interruptible(sk->sk_sleep);
+			if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+				wake_up_interruptible(sk_sleep(sk));
 
 			/* Note that we try to include the Async I/O support
 			 * here by modeling from the current TCP/UDP code.
@@ -6296,7 +6296,7 @@ static int sctp_wait_for_accept(struct sock *sk, long timeo)
 
 
 	for (;;) {
-		prepare_to_wait_exclusive(sk->sk_sleep, &wait,
+		prepare_to_wait_exclusive(sk_sleep(sk), &wait,
 					  TASK_INTERRUPTIBLE);
 
 		if (list_empty(&ep->asocs)) {
@@ -6322,7 +6322,7 @@ static int sctp_wait_for_accept(struct sock *sk, long timeo)
 			break;
 	}
 
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 
 	return err;
 }
@@ -6332,7 +6332,7 @@ static void sctp_wait_for_close(struct sock *sk, long timeout)
 	DEFINE_WAIT(wait);
 
 	do {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 		if (list_empty(&sctp_sk(sk)->ep->asocs))
 			break;
 		sctp_release_sock(sk);
@@ -6340,7 +6340,7 @@ static void sctp_wait_for_close(struct sock *sk, long timeout)
 		sctp_lock_sock(sk);
 	} while (!signal_pending(current) && timeout);
 
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 }
 
 static void sctp_skb_set_owner_r_frag(struct sk_buff *skb, struct sock *sk)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index a29f259..ce0d5b3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -419,8 +419,8 @@ static void svc_udp_data_ready(struct sock *sk, int count)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible(sk->sk_sleep);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible(sk_sleep(sk));
 }
 
 /*
@@ -436,10 +436,10 @@ static void svc_write_space(struct sock *sk)
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) {
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk))) {
 		dprintk("RPC svc_write_space: someone sleeping on %p\n",
 		       svsk);
-		wake_up_interruptible(sk->sk_sleep);
+		wake_up_interruptible(sk_sleep(sk));
 	}
 }
 
@@ -757,8 +757,8 @@ static void svc_tcp_listen_data_ready(struct sock *sk, int count_unused)
 			printk("svc: socket %p: no user data\n", sk);
 	}
 
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible_all(sk->sk_sleep);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible_all(sk_sleep(sk));
 }
 
 /*
@@ -777,8 +777,8 @@ static void svc_tcp_state_change(struct sock *sk)
 		set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible_all(sk->sk_sleep);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible_all(sk_sleep(sk));
 }
 
 static void svc_tcp_data_ready(struct sock *sk, int count)
@@ -791,8 +791,8 @@ static void svc_tcp_data_ready(struct sock *sk, int count)
 		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible(sk->sk_sleep);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible(sk_sleep(sk));
 }
 
 /*
@@ -1494,8 +1494,8 @@ static void svc_sock_detach(struct svc_xprt *xprt)
 	sk->sk_data_ready = svsk->sk_odata;
 	sk->sk_write_space = svsk->sk_owspace;
 
-	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible(sk->sk_sleep);
+	if (sk_sleep(sk) && waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible(sk_sleep(sk));
 }
 
 /*
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index cfb20b8..66e889b 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -446,7 +446,7 @@ static unsigned int poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	u32 mask;
 
-	poll_wait(file, sk->sk_sleep, wait);
+	poll_wait(file, sk_sleep(sk), wait);
 
 	if (!skb_queue_empty(&sk->sk_receive_queue) ||
 	    (sock->state == SS_UNCONNECTED) ||
@@ -591,7 +591,7 @@ static int send_msg(struct kiocb *iocb, struct socket *sock,
 			break;
 		}
 		release_sock(sk);
-		res = wait_event_interruptible(*sk->sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(sk),
 					       !tport->congested);
 		lock_sock(sk);
 		if (res)
@@ -650,7 +650,7 @@ static int send_packet(struct kiocb *iocb, struct socket *sock,
 			break;
 		}
 		release_sock(sk);
-		res = wait_event_interruptible(*sk->sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(sk),
 			(!tport->congested || !tport->connected));
 		lock_sock(sk);
 		if (res)
@@ -931,7 +931,7 @@ restart:
 			goto exit;
 		}
 		release_sock(sk);
-		res = wait_event_interruptible(*sk->sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(sk),
 			(!skb_queue_empty(&sk->sk_receive_queue) ||
 			 (sock->state == SS_DISCONNECTING)));
 		lock_sock(sk);
@@ -1064,7 +1064,7 @@ restart:
 			goto exit;
 		}
 		release_sock(sk);
-		res = wait_event_interruptible(*sk->sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(sk),
 			(!skb_queue_empty(&sk->sk_receive_queue) ||
 			 (sock->state == SS_DISCONNECTING)));
 		lock_sock(sk);
@@ -1271,8 +1271,8 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 		tipc_disconnect_port(tipc_sk_port(sk));
 	}
 
-	if (waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible(sk->sk_sleep);
+	if (waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible(sk_sleep(sk));
 	return TIPC_OK;
 }
 
@@ -1343,8 +1343,8 @@ static void wakeupdispatch(struct tipc_port *tport)
 {
 	struct sock *sk = (struct sock *)tport->usr_handle;
 
-	if (waitqueue_active(sk->sk_sleep))
-		wake_up_interruptible(sk->sk_sleep);
+	if (waitqueue_active(sk_sleep(sk)))
+		wake_up_interruptible(sk_sleep(sk));
 }
 
 /**
@@ -1426,7 +1426,7 @@ static int connect(struct socket *sock, struct sockaddr *dest, int destlen,
 	/* Wait until an 'ACK' or 'RST' arrives, or a timeout occurs */
 
 	release_sock(sk);
-	res = wait_event_interruptible_timeout(*sk->sk_sleep,
+	res = wait_event_interruptible_timeout(*sk_sleep(sk),
 			(!skb_queue_empty(&sk->sk_receive_queue) ||
 			(sock->state != SS_CONNECTING)),
 			sk->sk_rcvtimeo);
@@ -1521,7 +1521,7 @@ static int accept(struct socket *sock, struct socket *new_sock, int flags)
 			goto exit;
 		}
 		release_sock(sk);
-		res = wait_event_interruptible(*sk->sk_sleep,
+		res = wait_event_interruptible(*sk_sleep(sk),
 				(!skb_queue_empty(&sk->sk_receive_queue)));
 		lock_sock(sk);
 		if (res)
@@ -1632,8 +1632,8 @@ restart:
 		/* Discard any unreceived messages; wake up sleeping tasks */
 
 		discard_rx_queue(sk);
-		if (waitqueue_active(sk->sk_sleep))
-			wake_up_interruptible(sk->sk_sleep);
+		if (waitqueue_active(sk_sleep(sk)))
+			wake_up_interruptible(sk_sleep(sk));
 		res = 0;
 		break;
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3d9122e..87c0360 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -316,7 +316,7 @@ static void unix_write_space(struct sock *sk)
 	read_lock(&sk->sk_callback_lock);
 	if (unix_writable(sk)) {
 		if (sk_has_sleeper(sk))
-			wake_up_interruptible_sync(sk->sk_sleep);
+			wake_up_interruptible_sync(sk_sleep(sk));
 		sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
 	}
 	read_unlock(&sk->sk_callback_lock);
@@ -1736,7 +1736,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo)
 	unix_state_lock(sk);
 
 	for (;;) {
-		prepare_to_wait(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
+		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
 
 		if (!skb_queue_empty(&sk->sk_receive_queue) ||
 		    sk->sk_err ||
@@ -1752,7 +1752,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo)
 		clear_bit(SOCK_ASYNC_WAITDATA, &sk->sk_socket->flags);
 	}
 
-	finish_wait(sk->sk_sleep, &wait);
+	finish_wait(sk_sleep(sk), &wait);
 	unix_state_unlock(sk);
 	return timeo;
 }
@@ -1991,7 +1991,7 @@ static unsigned int unix_poll(struct file *file, struct socket *sock, poll_table
 	struct sock *sk = sock->sk;
 	unsigned int mask;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	/* exceptional events? */
@@ -2028,7 +2028,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk, *other;
 	unsigned int mask, writable;
 
-	sock_poll_wait(file, sk->sk_sleep, wait);
+	sock_poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
 	/* exceptional events? */
diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
index cbddd0c..6cffbc4 100644
--- a/net/x25/af_x25.c
+++ b/net/x25/af_x25.c
@@ -718,7 +718,7 @@ static int x25_wait_for_connection_establishment(struct sock *sk)
 	DECLARE_WAITQUEUE(wait, current);
 	int rc;
 
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	for (;;) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		rc = -ERESTARTSYS;
@@ -738,7 +738,7 @@ static int x25_wait_for_connection_establishment(struct sock *sk)
 			break;
 	}
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 	return rc;
 }
 
@@ -838,7 +838,7 @@ static int x25_wait_for_data(struct sock *sk, long timeout)
 	DECLARE_WAITQUEUE(wait, current);
 	int rc = 0;
 
-	add_wait_queue_exclusive(sk->sk_sleep, &wait);
+	add_wait_queue_exclusive(sk_sleep(sk), &wait);
 	for (;;) {
 		__set_current_state(TASK_INTERRUPTIBLE);
 		if (sk->sk_shutdown & RCV_SHUTDOWN)
@@ -858,7 +858,7 @@ static int x25_wait_for_data(struct sock *sk, long timeout)
 			break;
 	}
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(sk->sk_sleep, &wait);
+	remove_wait_queue(sk_sleep(sk), &wait);
 	return rc;
 }
 



^ permalink raw reply related

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of  outgoing connections (unable to bind ... )
From: Ben Greear @ 2010-04-20 23:07 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <g2z46c8cb3e1004201517i5641a75cze2ec5bd33e81fb0f@mail.gmail.com>

On 04/20/2010 03:17 PM, Gaspar Chilingarov wrote:

> I would be grateful for hints where to look in the source -- may be I
> can produce some working patches for it.

I've opened 40,000 connections to/from my machine (over external interfaces)
on a slightly hacked .31 kernel.  This is 80,000 sockets total.  I bind
to local IP and port range, as well as SO_BINDTODEVICE.

I'm using a 64-bit system with 12GB of RAM, quad-core i7 3.3Ghz, etc.

It takes a lot of RAM to do this but you can probably use less RAM in user-space
than I am.  Be sure to set your socket buffer sizes small.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1271802676.7895.457.camel@edumazet-laptop>

>
> Its doable, only if you bind() your sockets before connect()
>
> For each socket, you'll need to chose an (local IP, local port) not
> already in use.
>
> kernel wont magically select one source IP from the pool you have.
>

I expect that I should select source IP and kernel should find some
unused local_port for me.

I'm binding before the connect, of course ;)

-- 
Gaspar Chilingarov

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:20 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4BCE33B9.8050101@candelatech.com>

Opening that amount of incoming connections in not a problem at all.

I'm speaking of outgoing connections.

What kind of hack do you have on your kernel?

Thanks in advance,
Gaspar

2010/4/21 Ben Greear <greearb@candelatech.com>:
> On 04/20/2010 03:17 PM, Gaspar Chilingarov wrote:
>
>> I would be grateful for hints where to look in the source -- may be I
>> can produce some working patches for it.
>
> I've opened 40,000 connections to/from my machine (over external interfaces)
> on a slightly hacked .31 kernel.  This is 80,000 sockets total.  I bind
> to local IP and port range, as well as SO_BINDTODEVICE.
>
> I'm using a 64-bit system with 12GB of RAM, quad-core i7 3.3Ghz, etc.
>
> It takes a lot of RAM to do this but you can probably use less RAM in
> user-space
> than I am.  Be sure to set your socket buffer sizes small.
>
> Thanks,
> Ben
>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:20 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4BCE33B9.8050101@candelatech.com>

Opening that amount of incoming connections in not a problem at all.

I'm speaking of outgoing connections.

What kind of hack do you have on your kernel?

Thanks in advance,
Gaspar

2010/4/21 Ben Greear <greearb@candelatech.com>:
> On 04/20/2010 03:17 PM, Gaspar Chilingarov wrote:
>
>> I would be grateful for hints where to look in the source -- may be I
>> can produce some working patches for it.
>
> I've opened 40,000 connections to/from my machine (over external interfaces)
> on a slightly hacked .31 kernel.  This is 80,000 sockets total.  I bind
> to local IP and port range, as well as SO_BINDTODEVICE.
>
> I'm using a 64-bit system with 12GB of RAM, quad-core i7 3.3Ghz, etc.
>
> It takes a lot of RAM to do this but you can probably use less RAM in
> user-space
> than I am.  Be sure to set your socket buffer sizes small.
>
> Thanks,
> Ben
>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>

-- 
Gaspar Chilingarov

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Ben Greear @ 2010-04-20 23:30 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <u2t46c8cb3e1004201620tf9fce087ibae421759f9e1e84@mail.gmail.com>

On 04/20/2010 04:20 PM, Gaspar Chilingarov wrote:
> Opening that amount of incoming connections in not a problem at all.
>
> I'm speaking of outgoing connections.
>
> What kind of hack do you have on your kernel?

send-to-self logic and similar..but nothing that should be
better at doing lots of sockets than the default kernel.

I have 40,000 out-going connections to myself..basically I'm
acting as both client and server.  So, I'd expect you to be
able to do around 80,000 connections to some external system on
similar hardware and efficient client software.  At least with
my software, I'd need more RAM to scale much beyond that, but I'm
trying to generate traffic on these sockets (which requires user-space
buffers per socket in my implementation), and I have a decent bit
of overhead gathering stats and such.  A simpler application with
less overhead should do more sockets.

Be sure to use lots of processes so you don't end up with
a poll loop with 80,000 fd entries.  I found that 10k sockets
per process was a good upper limit..but it's not a hard
upper limit.

And be sure to do the bind logic correctly as Eric pointed out in an
earlier email.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:35 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4BCE392F.60104@candelatech.com>

> I have 40,000 out-going connections to myself..basically I'm
> acting as both client and server.  So, I'd expect you to be

What is yours

sysctl -a | grep local_port_range

?


>
> Be sure to use lots of processes so you don't end up with
> a poll loop with 80,000 fd entries.  I found that 10k sockets
> per process was a good upper limit..but it's not a hard
> upper limit.

sure -- it's a next step :)

>
> And be sure to do the bind logic correctly as Eric pointed out in an
> earlier email.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>



-- 
Gaspar Chilingarov

^ permalink raw reply

* Re: [PATCHv2 1/7] X25: Add if_x25.h and x25 to device identifiers
From: David Miller @ 2010-04-20 23:35 UTC (permalink / raw)
  To: andrew.hendry; +Cc: netdev
In-Reply-To: <1271719717.2802.77.camel@ibex>

From: Andrew Hendry <andrew.hendry@gmail.com>
Date: Tue, 20 Apr 2010 09:28:37 +1000

> diff --git a/include/linux/if_x25.h b/include/linux/if_x25.h
> new file mode 100644
> index 0000000..897765f
> --- /dev/null
> +++ b/include/linux/if_x25.h
> @@ -0,0 +1,26 @@
> +/*
> + *  Linux X.25 packet to device interface

Headers meant to be used by userspace must be added
to the include/linux/Kbuild file.

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: consistent rxhash
From: Changli Gao @ 2010-04-20 23:35 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, franco, therbert, netdev
In-Reply-To: <20100420.144106.118596093.davem@davemloft.net>

On Wed, Apr 21, 2010 at 5:41 AM, David Miller <davem@davemloft.net> wrote:
> Eric, do you remember that "TCP friends" rough patch I sent you last
> year that essentailly made TCP sockets over loopback behave like
> AF_UNIX ones and just queue the SKBs directly to the destination
> socket without doing any protocol work?

I think it will break some benchmark tools. The loopback device is for
testing networking protocol stacks, so we shouldn't bypass the
protocol processing. And anyone who has a performance problem of
loopback device should turn to UNIX domain socket.

For routers, how about letting users choose whether RPS mixes layer 4 info in?

-- 
Regards，
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH net-next-2.6] rps: consistent rxhash
From: David Miller @ 2010-04-20 23:38 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, franco, therbert, netdev
In-Reply-To: <v2s412e6f7f1004201635m5258e777oac49fea8f9625bcf@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 21 Apr 2010 07:35:48 +0800

> On Wed, Apr 21, 2010 at 5:41 AM, David Miller <davem@davemloft.net> wrote:
>> Eric, do you remember that "TCP friends" rough patch I sent you last
>> year that essentailly made TCP sockets over loopback behave like
>> AF_UNIX ones and just queue the SKBs directly to the destination
>> socket without doing any protocol work?
> 
> I think it will break some benchmark tools.

Other systems already do this optimization, so if things break, this
breakage is already pervasive.

We should be able to tell people that they can use TCP solely in their
applications and it will perform optimally regardless of transport.

People already code their applications this way, and ignoring
this issue would just makes us stupid.

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_sleep() helper
From: David Miller @ 2010-04-20 23:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1271804631.7895.519.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 01:03:51 +0200

> [PATCH net-next-2.6] net: sk_sleep() helper

Looks good, applied, thanks Eric.

^ permalink raw reply

* crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-20 23:40 UTC (permalink / raw)
  To: lists.linux-foundation.org, netdev; +Cc: kaber, davem

Hi

I got this crash on 2.6.34-rc4 when using it as a bridge. The crash is not 
reproducible. There are two interfaces, eth0 (sun-hme) and eth1 (tg3) and 
there is a bridge between them.

The crash was in inlined function skb_drop_list, inlined from 
skb_drop_fraglist from skb_release_data. The "list" pointer was 0x18.

This is backtrace:
skb_release_data+7c/e0
__kfree_skb+c/c0
dev_hard_start_xmit+288/3c0
sch_direct_xmit+12c/1c0
dev_queue_xmit+3fc/520
br_dev_queue_push_xmit+60/80
br_handle_frame_finish+100/1e00
br_handle_frame+168/240
netif_receive_skb+274/480
process_backlog+64/c0
net_rx_action+cc/180
__do_softirq+94/120

Settings for for eth0 and eth1 are:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

For br0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off

(there is no netfilter compiled on that machine)

I'm curious --- this code path in dev_hard_start_xmit is taken only if GSO 
is used. From the trace, it can be seen that a packet was received by one 
nic and forwarded by the bridge to the other nic. How could GSO be used in 
this scenario?

---

Reviewing the code further, I found one very weird commit 
572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What it 
does, it changes the semantics of ndo_hard_start_xmit(). Prior to the 
patch, the meaning was --- return zero (NETDEV_TX_OK) --- the skb is 
consumed by the driver. Returns non-zero --- the skb is left owned by the 
caller. The patch changes it to return other flags in bits 4-7 and changes 
the consumed/returned logic.

The problem is that there is still plenty of code that compares it against 
NETDEV_TX_OK to find out if the skb was consumed.

I don't know if this caused my crash (it is not reproducible), but the 
patch is buggy and dangerous.

Handling of the return codes is inconsistent:
* most callers use dev_xmit_complete(int rc) { return rc < 
NET_XMIT_MASK(0x0f); } to test if the skb was consumed. This seems to be 
the method to be used intended by the developers. (why not <= 0x0f ?)

* dev_hard_start_xmit uses rc == NETDEV_TX_OK || rc & ~NETDEV_TX_MASK 
(~0xf0) to test if the skb is consumed. This is differs from 
dev_xmit_complete for some values.

net/core/netpoll.c : at two places, it compares the return value of 
ndo_start_xmit with NETDEV_TX_OK
net/sched/sch_teql.c : compares with NETDEV_TX_OK
drivers/net/wan/dlci.c : ignores the return value of ndo_start_xmit

--- the above inconsistencies lead to situations where both the caller and 
the callee think that they own the skb, and the result is memory 
corruption.

If we grep for places that assume that the packet was consumed if the 
return code is NETDEV_TX_OK, there are even more:
drivers/net/qla3xxx.c
drivers/net/chelsio/sge.c
drivers/net/wireless/hostap/hostap_80211_tx.c
drivers/net/wireless/ipw2x00/libipw_tx.c
drivers/net/sfc/selftest.c
net/mac80211/tx.c
--- Not all these cases are bugs, because sometimes the return value is 
self-generated by the driver. But it is just confusing and it may trigger 
bugs in the future.

---

I'd recommend to revert 572a9d7b6fc7f20f573664063324c086be310c42 because 
it made several bugs (the code that compared the return value against 
NETDEV_TX_OK was correct before and is buggy after the patch). 
Furthermore, there may be hidden bugs, if someone compares the return 
value with 0 and frees skb based on this comparison, it is impossible to 
find it with grep, yet changing the meaning of return values would make a 
bug here.

Mikulas

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-20 23:42 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <v2n46c8cb3e1004201618s37e9f0c6j7527e59673a34322@mail.gmail.com>

Le mercredi 21 avril 2010 à 04:18 +0500, Gaspar Chilingarov a écrit :
> >
> > Its doable, only if you bind() your sockets before connect()
> >
> > For each socket, you'll need to chose an (local IP, local port) not
> > already in use.
> >
> > kernel wont magically select one source IP from the pool you have.
> >
> 
> I expect that I should select source IP and kernel should find some
> unused local_port for me.
> 
> I'm binding before the connect, of course ;)
> 
> 

Yes, you bind the IP, but let kernel choose the port.

In this case, kernel is a bit dumb (or too smart ?)

If you want to check source, its in file net/ipv4/inet_connection_sock.c

function inet_csk_get_port()

you can remove the 

if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) { 
	spin_unlock(&head->lock); 
	snum = smallest_rover; 
	goto have_snum;
}

It will solve your problem (but bind() will probably be slow)




^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-20 23:45 UTC (permalink / raw)
  To: mpatocka; +Cc: lists.linux-foundation.org, netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004201834190.29017@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 19:40:56 -0400 (EDT)

> Reviewing the code further, I found one very weird commit
> 572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What
> it does, it changes the semantics of ndo_hard_start_xmit(). Prior to
> the patch, the meaning was --- return zero (NETDEV_TX_OK) --- the
> skb is consumed by the driver. Returns non-zero --- the skb is left
> owned by the caller.  The patch changes it to return other flags in
> bits 4-7 and changes the consumed/returned logic.
> 
> The problem is that there is still plenty of code that compares it
> against NETDEV_TX_OK to find out if the skb was consumed.

Drivers are not supposed to return those new flag bits, the new flag
bits as return values exist only in the packet scheduler path.

We're not reverting the commit you mention, we're going to fix
whatever bug exists instead.

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-20 23:48 UTC (permalink / raw)
  To: mpatocka; +Cc: lists.linux-foundation.org, netdev, kaber
In-Reply-To: <20100420.164505.34362726.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 20 Apr 2010 16:45:05 -0700 (PDT)

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:40:56 -0400 (EDT)
> 
>> Reviewing the code further, I found one very weird commit
>> 572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What
>> it does, it changes the semantics of ndo_hard_start_xmit(). Prior to
>> the patch, the meaning was --- return zero (NETDEV_TX_OK) --- the
>> skb is consumed by the driver. Returns non-zero --- the skb is left
>> owned by the caller.  The patch changes it to return other flags in
>> bits 4-7 and changes the consumed/returned logic.
>> 
>> The problem is that there is still plenty of code that compares it
>> against NETDEV_TX_OK to find out if the skb was consumed.
> 
> Drivers are not supposed to return those new flag bits, the new flag
> bits as return values exist only in the packet scheduler path.

And BTW, NETDEV_TX_OK is only ever returned by itself, the
flag bits only get set when a non-NETDEV_TX_OK value is returned.

So we really haven't changed semantics at all, NETDEV_TX_OK (which is
zero) and non-zero are the two valid return value cases.

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Ben Greear @ 2010-04-20 23:49 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: netdev
In-Reply-To: <y2p46c8cb3e1004201635s8f84672dq90d3115dca04a1ed@mail.gmail.com>

On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
> sysctl -a | grep local_port_range

[root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
net.ipv4.ip_local_port_range = 10000	61000

I'm explicitly binding to local ports as well as local IPs, btw.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-20 23:57 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4BCE3D8D.3030500@candelatech.com>

Ohhh, see my bug report again, please :)

The problem arises when used port count is close to or more than the
number of local ports available on the system (in your case 40000
client connections is smaller than 51000 of available ports).

I think that if you choose to go with default settings and try to bind
half of your ports to one local IP and half of your ports - to another
one -- you will hit this bug as well.

If you explicitly bind to the local ports as well - it _may_ solve a
problem - I will this today (it's just a workaround, but in fact not
the solution :), but if you let kernel automatically find local ports
- you will get the errors.

/Gaspar

2010/4/21 Ben Greear <greearb@candelatech.com>:
> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
>>
>> sysctl -a | grep local_port_range
>
> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
> net.ipv4.ip_local_port_range = 10000    61000
>
> I'm explicitly binding to local ports as well as local IPs, btw.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>

-- 
Gaspar Chilingarov

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-20 23:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.164505.34362726.davem@davemloft.net>

On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:40:56 -0400 (EDT)
> 
> > Reviewing the code further, I found one very weird commit
> > 572a9d7b6fc7f20f573664063324c086be310c42 committed to 2.6.33. What
> > it does, it changes the semantics of ndo_hard_start_xmit(). Prior to
> > the patch, the meaning was --- return zero (NETDEV_TX_OK) --- the
> > skb is consumed by the driver. Returns non-zero --- the skb is left
> > owned by the caller.  The patch changes it to return other flags in
> > bits 4-7 and changes the consumed/returned logic.
> > 
> > The problem is that there is still plenty of code that compares it
> > against NETDEV_TX_OK to find out if the skb was consumed.
> 
> Drivers are not supposed to return those new flag bits, the new flag
> bits as return values exist only in the packet scheduler path.

If they are not supposed to return it, why do you test it in 
dev_hard_start_xmit?

Define clearly what the return status is allowed (and maybe enforce it 
with BUG()s) and follow that definition.

Currently, there is no specification and the code is unintelligible crap: 
it first masks the value with ~0xf0 to test if the skb is consumed 
(dev_hard_start_xmit) and then compares it with 0x0f for the same test 
(dev_queue_xmit).

> We're not reverting the commit you mention, we're going to fix
> whatever bug exists instead.

So have a fun searching for them :-/

Mikulas

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: David Miller @ 2010-04-21  0:00 UTC (permalink / raw)
  To: mpatocka; +Cc: netdev, kaber
In-Reply-To: <Pine.LNX.4.64.1004201947470.29017@hs20-bc2-1.build.redhat.com>

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 20 Apr 2010 19:57:45 -0400 (EDT)

>> We're not reverting the commit you mention, we're going to fix
>> whatever bug exists instead.
> 
> So have a fun searching for them :-/

When you report a problem and your knee-jerk reaction is "revert this
commit", expect some push back.

We'll instead analyze this bug and take the time to figure out the
most appropriate course of action.

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  0:05 UTC (permalink / raw)
  To: Ben Greear, David Miller; +Cc: Gaspar Chilingarov, netdev
In-Reply-To: <4BCE3D8D.3030500@candelatech.com>

Le mardi 20 avril 2010 à 16:49 -0700, Ben Greear a écrit :
> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
> > sysctl -a | grep local_port_range
> 
> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
> net.ipv4.ip_local_port_range = 10000	61000
> 
> I'm explicitly binding to local ports as well as local IPs, btw.
> 

I believe the bsockets 'optimization' is a bug, we should remove it.

This is a stable candidate (2.6.30+)

[PATCH net-next-2.6] tcp: remove bsockets count

Counting number of bound sockets to avoid a loop is buggy, since we cant
know how many IP addresses are in use. When threshold is reached, we try
5 random slots and can fail while there are plenty available ports.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/inet_hashtables.h   |    2 --
 net/ipv4/inet_connection_sock.c |    5 -----
 net/ipv4/inet_hashtables.c      |    5 -----
 3 files changed, 12 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 74358d1..e0f3a05 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -150,8 +150,6 @@ struct inet_hashinfo {
 	 */
 	struct inet_listen_hashbucket	listening_hash[INET_LHTABLE_SIZE]
 					____cacheline_aligned_in_smp;
-
-	atomic_t			bsockets;
 };
 
 static inline struct inet_ehash_bucket *inet_ehash_bucket(
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8da6429..0bbfd00 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -119,11 +119,6 @@ again:
 					    (tb->num_owners < smallest_size || smallest_size == -1)) {
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
-						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
-						}
 					}
 					goto next;
 				}
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2b79377..4bc921f 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -62,8 +62,6 @@ void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 
-	atomic_inc(&hashinfo->bsockets);
-
 	inet_sk(sk)->inet_num = snum;
 	sk_add_bind_node(sk, &tb->owners);
 	tb->num_owners++;
@@ -81,8 +79,6 @@ static void __inet_put_port(struct sock *sk)
 	struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
 	struct inet_bind_bucket *tb;
 
-	atomic_dec(&hashinfo->bsockets);
-
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
 	__sk_del_bind_node(sk);
@@ -551,7 +547,6 @@ void inet_hashinfo_init(struct inet_hashinfo *h)
 {
 	int i;
 
-	atomic_set(&h->bsockets, 0);
 	for (i = 0; i < INET_LHTABLE_SIZE; i++) {
 		spin_lock_init(&h->listening_hash[i].lock);
 		INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,



^ permalink raw reply related

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Gaspar Chilingarov @ 2010-04-21  0:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ben Greear, David Miller, netdev
In-Reply-To: <1271808314.7895.614.camel@edumazet-laptop>

2010/4/21 Eric Dumazet <eric.dumazet@gmail.com>:
> Le mardi 20 avril 2010 à 16:49 -0700, Ben Greear a écrit :
>> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
>> > sysctl -a | grep local_port_range
>>
>> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
>> net.ipv4.ip_local_port_range = 10000  61000
>>
>> I'm explicitly binding to local ports as well as local IPs, btw.
>>
>
> I believe the bsockets 'optimization' is a bug, we should remove it.
>
> This is a stable candidate (2.6.30+)
>
> [PATCH net-next-2.6] tcp: remove bsockets count
>
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
>

Thank you a lot for the patch - I will try it.

In FreeBSD I was able to add about 32 C classes (8192 ips) on the
single interface (never tried to do that in Linux yet :) - so you
really never know how much IP's are there available. Tens and even up
to hundred IPs on the single machine are not that usual in hosting
environment at all.

/Gaspar

^ permalink raw reply

* Re: crash with bridge and inconsistent handling of NETDEV_TX_OK
From: Mikulas Patocka @ 2010-04-21  0:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, kaber
In-Reply-To: <20100420.170038.130590455.davem@davemloft.net>

On Tue, 20 Apr 2010, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Tue, 20 Apr 2010 19:57:45 -0400 (EDT)
> 
> >> We're not reverting the commit you mention, we're going to fix
> >> whatever bug exists instead.
> > 
> > So have a fun searching for them :-/
> 
> When you report a problem and your knee-jerk reaction is "revert this
> commit", expect some push back.

I tried to analyze skb handling and couldn't check correctness of this 
commit. I really think it should be reverted and applied later, when the 
author does it correctly (for example, one bit in the return value to 
indicate consumed/returned skb, other unimportant informational bits).

> We'll instead analyze this bug and take the time to figure out the
> most appropriate course of action.

And do you have any idea what caused it? Why is it using GSO on bridging?

Mikulas

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: David Miller @ 2010-04-21  0:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: greearb, gasparch, netdev
In-Reply-To: <1271808314.7895.614.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 02:05:14 +0200

> [PATCH net-next-2.6] tcp: remove bsockets count
> 
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Hmmm, yes indeed it seems there are improper assumptions made by
this scheme.

This is a tricky area, so I'll wait for some test results from the
reporter and study the code over a few times myself to make sure
we get this right.

^ permalink raw reply

* Re: PROBLEM: Linux kernel 2.6.31 IPv4 TCP fails to open huge amount of outgoing connections (unable to bind ... )
From: Eric Dumazet @ 2010-04-21  0:14 UTC (permalink / raw)
  To: Gaspar Chilingarov; +Cc: Ben Greear, netdev
In-Reply-To: <n2g46c8cb3e1004201657t50599d7em5535f2f12af96b59@mail.gmail.com>

Le mercredi 21 avril 2010 à 04:57 +0500, Gaspar Chilingarov a écrit :
> Ohhh, see my bug report again, please :)
> 
> The problem arises when used port count is close to or more than the
> number of local ports available on the system (in your case 40000
> client connections is smaller than 51000 of available ports).
> 
> I think that if you choose to go with default settings and try to bind
> half of your ports to one local IP and half of your ports - to another
> one -- you will hit this bug as well.
> 
> If you explicitly bind to the local ports as well - it _may_ solve a
> problem - I will this today (it's just a workaround, but in fact not
> the solution :), but if you let kernel automatically find local ports
> - you will get the errors.

Yes, I posted a preliminar patch to solve this problem, please test
it ;)

But beware bind(port=0) time might be long. This is why high perf
programs prefer to give the exact port at bind(IP, port) time.

Algo will scan the whole range to find the port used by minimum number
of connection. Then it'll check if the selected IP was already in use.

If not -> ok, port found

If yes -> redo five time the same loop (I guess this redo should be
changed too, to use something else.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox