* [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
@ 2015-06-06 15:53 Eric Dumazet
2015-06-06 16:38 ` Maciej Żenczykowski
2015-06-07 2:30 ` Neal Cardwell
0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2015-06-06 15:53 UTC
To: David Miller; +Cc: netdev, Michael Kerrisk, Maciej Żenczykowski
From: Eric Dumazet <edumazet@google.com>
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).
As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge of finding an available
port.
But the kernel does not know yet if this socket is going to be a listener
or be connected, so it has very limited choices (no full knowledge of the
final 4-tuple for a connect()).
With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.
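To put a number on that range, here is a minimal C sketch (not part of the
patch) that counts the ports in an ip_local_port_range style string. The
helper names ephemeral_port_count() and current_ephemeral_port_count() are
made up for illustration; the only assumption about the system is that
/proc/sys/net/ipv4/ip_local_port_range holds two integers, low and high,
forming an inclusive range.

```c
#include <stdio.h>

/* Hypothetical helper: parse "low high" (the format used by
 * /proc/sys/net/ipv4/ip_local_port_range) and return how many ports the
 * inclusive range contains, or -1 if the text is malformed. */
int ephemeral_port_count(const char *text)
{
    int low, high;

    if (sscanf(text, "%d %d", &low, &high) != 2)
        return -1;
    if (low < 1 || high > 65535 || low > high)
        return -1;
    return high - low + 1;
}

/* Hypothetical helper: same computation on the live sysctl value.
 * Returns -1 if the file cannot be opened or parsed. */
int current_ephemeral_port_count(void)
{
    char buf[64];
    int count = -1;
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        count = ephemeral_port_count(buf);
    fclose(f);
    return count;
}
```

For instance, a range written as "32768 61000" contains 28233 ports, the
same order of magnitude as the "about 32K ports" figure above.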
This patch adds a new SOL_IP socket option, asking the kernel to ignore
the 0 port provided by the application in bind(IP, port=0) and only
remember the given IP address.
The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.
This new feature is available for both IPv4 and IPv6.
Tested:
Wrote a test program and checked its behavior.
strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() shows that the port is still 0 right after bind()
but properly allocated after connect().
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
setsockopt(6, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(6, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.4")}, 16) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(35032), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
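The sequence in the strace output above could be written in C roughly as
follows. This is a sketch, not the test program used for the patch (which
is not shown in this mail). connect_with_source_ip() and local_port() are
hypothetical helper names, and the fallback #define only matters when the
libc headers do not know the option yet; its value is the one this patch
adds to include/uapi/linux/in.h.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24 /* value taken from this patch */
#endif

/* Return the local port of fd in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}

/* Bind to src_ip with port 0 but ask the kernel to defer port selection
 * to connect(). If port_after_bind is not NULL it receives the local port
 * seen right after bind(). Returns the connected socket, or -1 on error. */
int connect_with_source_ip(const char *src_ip, const struct sockaddr_in *dst,
                           int *port_after_bind)
{
    struct sockaddr_in src;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = htons(0);
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1)
        goto fail;

    /* IPPROTO_IP has the same value as SOL_IP on Linux. */
    if (setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &one, sizeof(one)) < 0)
        goto fail;
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        goto fail;
    if (port_after_bind)
        *port_after_bind = local_port(fd);
    if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

With a listener reachable at dst, a call such as
connect_with_source_ip("127.0.0.2", &dst, &p) is expected to leave p at 0
and return a socket whose local_port() is non-zero, matching the two
getsockname() results in the trace.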
I was able to bind()/connect() a million concurrent sockets, instead of
~32000 before patch.
lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets
lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
Check that a given source port is indeed used by many different
connections :
lpaa23:~# ss -t src :40000 | head -10
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.2:40000 127.0.202.33:44983
ESTAB 0 0 127.0.0.2:40000 127.2.27.240:44983
ESTAB 0 0 127.0.0.2:40000 127.2.98.5:44983
ESTAB 0 0 127.0.0.2:40000 127.0.124.196:44983
ESTAB 0 0 127.0.0.2:40000 127.2.139.38:44983
ESTAB 0 0 127.0.0.2:40000 127.1.59.80:44983
ESTAB 0 0 127.0.0.2:40000 127.3.6.228:44983
ESTAB 0 0 127.0.0.2:40000 127.0.38.53:44983
ESTAB 0 0 127.0.0.2:40000 127.1.197.10:44983
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/inet_sock.h | 1 +
include/uapi/linux/in.h | 1 +
net/ipv4/af_inet.c | 3 ++-
net/ipv4/ip_sockglue.c | 7 +++++++
4 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index b6c3737da4e9..47eb67b08abd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -187,6 +187,7 @@ struct inet_sock {
transparent:1,
mc_all:1,
nodefrag:1;
+ __u8 bind_address_no_port:1;
__u8 rcv_tos;
__u8 convert_csum;
int uc_index;
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 641338bef651..83d6236a2f08 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -112,6 +112,7 @@ struct in_addr {
#define IP_MINTTL 21
#define IP_NODEFRAG 22
#define IP_CHECKSUM 23
+#define IP_BIND_ADDRESS_NO_PORT 24
/* IP_MTU_DISCOVER values */
#define IP_PMTUDISC_DONT 0 /* Never send DF frames */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6ad0f7a711c9..cc858ef44451 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -488,7 +488,8 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		inet->inet_saddr = 0;  /* Use device */
 	/* Make sure we are allowed to bind here. */
-	if (sk->sk_prot->get_port(sk, snum)) {
+	if ((snum || !inet->bind_address_no_port) &&
+	    sk->sk_prot->get_port(sk, snum)) {
 		inet->inet_saddr = inet->inet_rcv_saddr = 0;
 		err = -EADDRINUSE;
 		goto out_release_sock;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 7cfb0893f263..04ae2992a5cd 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -582,6 +582,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
case IP_TRANSPARENT:
case IP_MINTTL:
case IP_NODEFRAG:
+ case IP_BIND_ADDRESS_NO_PORT:
case IP_UNICAST_IF:
case IP_MULTICAST_TTL:
case IP_MULTICAST_ALL:
@@ -732,6 +733,9 @@ static int do_ip_setsockopt(struct sock *sk, int level,
}
inet->nodefrag = val ? 1 : 0;
break;
+ case IP_BIND_ADDRESS_NO_PORT:
+ inet->bind_address_no_port = val ? 1 : 0;
+ break;
case IP_MTU_DISCOVER:
if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_OMIT)
goto e_inval;
@@ -1324,6 +1328,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_NODEFRAG:
val = inet->nodefrag;
break;
+ case IP_BIND_ADDRESS_NO_PORT:
+ val = inet->bind_address_no_port;
+ break;
case IP_MTU_DISCOVER:
val = inet->pmtudisc;
break;
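From userspace, the setsockopt()/getsockopt() hunks above mean the option
behaves as a boolean: any non-zero value is stored as 1 and read back as 1.
A small sketch of such a round trip follows. The wrapper names are
hypothetical, the fallback #define uses the value from this patch, and the
behaviour assumes a kernel that carries the patch.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24 /* value taken from this patch */
#endif

/* Hypothetical wrapper: set the option; returns 0 on success, -1 on error. */
int set_bind_address_no_port(int fd, int value)
{
    /* IPPROTO_IP has the same value as SOL_IP on Linux. */
    return setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                      &value, sizeof(value));
}

/* Hypothetical wrapper: read the option back; returns 0 or 1,
 * or -1 if getsockopt() fails (for example on an older kernel). */
int get_bind_address_no_port(int fd)
{
    int value = -1;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &value, &len) < 0)
        return -1;
    return value;
}
```

Setting the option to 5 and reading it back should therefore give 1, not 5,
because do_ip_setsockopt() stores "val ? 1 : 0" in a one-bit field.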
* Re: [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: Maciej Żenczykowski @ 2015-06-06 16:38 UTC
To: Eric Dumazet; +Cc: David Miller, netdev, Michael Kerrisk
Hmm, I certainly like this.
So IMHO this is indeed much better than a sysctl to select a magic
port to ignore during a bind call (previous internal patchset),
although it does use up one more bit per socket (and one more syscall
per connect).
---
Thinking about this some more, I think it might be possible to make
this behaviour automatic in certain cases.
The new socket bit has 2 different meanings, depending on whether a
port is already allocated or not:
  - if a port is not yet allocated, it governs whether bind(port=0) will
    allocate a port.
  - if a port is already allocated, it flags whether it was autoallocated.
(obviously could also just use 2 bits instead of 1)
bind(with port=0)
  - if the flag is set, doesn't select a port [ie. this patch]
  - if the flag wasn't set, selects a port, sets the flag
getsockname()
  - if a port has been selected and the flag is set, clears the flag
    [we've now revealed the port to userspace so can no longer change it]
connect()
  - if a port has already been selected and the flag is (still) set,
    release the port
    [side note: in order to prevent spurious failures it's possible you
    would have to release the port after allocating the new 4-tuple, so
    that if that fails, you can still use the pre-allocated port]
    [perhaps after successful connect() or listen() the flag should always
    be clear(ed)]
End result:
bind(port=0) connect() without an interleaved getsockname() gets
this ephemeral-port-saving behaviour without userspace changes.
Obviously this is a fair bit of jumping through hoops - but it does
have the benefit of improving ephemeral port use even for unmodified
applications.
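To make the proposed transitions easier to follow, here is a small
userspace model of them in C. It is only a sketch of the idea described
above, not kernel code: struct sock_model and the model_*() functions are
invented for illustration, port selection is reduced to a counter, and
clearing the flag after connect() follows the bracketed "perhaps" note.

```c
#include <stdbool.h>

/* Hypothetical model of one socket under the proposal. */
struct sock_model {
    bool flag;        /* the single bit with the two meanings above */
    bool has_port;    /* a local port is currently allocated */
    int  port_allocs; /* how many times a port was selected */
};

/* bind(port=0) */
void model_bind_port0(struct sock_model *s)
{
    if (s->flag)
        return;          /* flag set: do not select a port (this patch) */
    s->has_port = true;  /* flag clear: select a port ... */
    s->port_allocs++;
    s->flag = true;      /* ... and remember it was autoallocated */
}

/* getsockname(): the port becomes visible to userspace, so pin it. */
void model_getsockname(struct sock_model *s)
{
    if (s->has_port && s->flag)
        s->flag = false;
}

/* connect(): returns true when the port was chosen at connect() time,
 * i.e. with knowledge of the full 4-tuple. The real thing would release
 * the old port only after the new allocation succeeded. */
bool model_connect(struct sock_model *s)
{
    bool pick_now = !s->has_port || s->flag;

    if (pick_now) {
        s->has_port = true;
        s->port_allocs++;
    }
    s->flag = false;     /* optional step from the "perhaps" note */
    return pick_now;
}
```

In this model an unmodified bind(0) + connect() sequence selects a port
twice (port_allocs ends at 2), while inserting getsockname() between the
two calls pins the first port and model_connect() returns false.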
---
Less well thought out musings, maybe untenable:
Or perhaps the already existing SOCK_BINDPORT_LOCK could be abused somehow...
The setsockopt could set (and clear) that flag instead of the new bit?
Obviously the setsockopt would only allow changing the flag if port is
still unallocated.
And in bind() that flag being set would prevent automatic allocation
of a port if port=0 was asked for?
Not sure if saving a bit in the socket is worth these additional extra hoops.
* Re: [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: Eric Dumazet @ 2015-06-06 17:39 UTC
To: Maciej Żenczykowski; +Cc: David Miller, netdev, Michael Kerrisk
On Sat, 2015-06-06 at 18:38 +0200, Maciej Żenczykowski wrote:
> Hmm, I certainly like this.
>
> So IMHO this is indeed much better than a sysctl to select a magic
> port to ignore during a bind call (previous internal patchset),
> although it does use up one more bit per socket (and one more syscall
> per connect).
>
> ---
>
> Thinking about this some more, I think it might be possible to make
> this behaviour automatic in certain cases.
>
> The new socket bit has 2 different meanings, depending on whether a
> port is already allocated or not.
> if a port is not yet allocated, it governs whether bind(port=0) will
> allocate a port.
> if a port is already allocated, it flags whether it was autoallocated
> (obviously could also just use 2 bits instead of 1)
>
> bind(with port=0)
> if the flag is set, doesn't select a port [ie. this patch]
> if the flag wasn't set, selects a port, sets the flag
But this is the problematic part with multi-threaded applications,
and on servers where all ephemeral ports are already in use by at least
one socket.
Also think about cohabitation with applications not yet using this
knowledge (let's say they use bind(0), getsockname(), connect()).
My patch allows bind(0) to succeed even if all ports are in use.
Then connect() is almost guaranteed to succeed, unless this host already
has ~32000 sessions with the exact same
(source_ip, destination_ip, destination_port) 3-tuple.
connect() can already return EADDRINUSE for this case.
(Some applications tried SO_REUSEADDR or SO_REUSEPORT to get rid of the
problem, with no great success.)
So I am not sure what the 'automatic' stuff would provide anyway?
Selecting a port is quite expensive because of all the spinlocks and
lookups, so doing this twice automatically would add a significant cost.
Thanks.
* Re: [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: David Miller @ 2015-06-06 19:40 UTC
To: zenczykowski; +Cc: eric.dumazet, netdev, mtk.manpages
Please do not top-post.
* Re: [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: Neal Cardwell @ 2015-06-07 2:30 UTC
To: Eric Dumazet
Cc: David Miller, netdev, Michael Kerrisk, Maciej Żenczykowski
On Sat, Jun 6, 2015 at 11:53 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
...
> This patch adds a new SOL_IP socket option, asking kernel to ignore
> the 0 port provided by application in bind(IP, port=0) and only
> remember the given IP address.
...
> This new feature is available for both IPv4 and IPv6.
I like this a lot. This addresses a very serious gap in the sockets
API, and should be very useful. The comment mentions that this is
available for IPv6. From skimming the change I would have expected
inet6_bind() would need a change analogous to the change in
inet_bind()? Was there a missing "git add", or is a change to
inet6_bind() somehow not needed?
Thanks!
neal
* Re: [PATCH net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: Eric Dumazet @ 2015-06-07 3:08 UTC
To: Neal Cardwell
Cc: David Miller, netdev, Michael Kerrisk, Maciej Żenczykowski
On Sat, 2015-06-06 at 22:30 -0400, Neal Cardwell wrote:
> I like this a lot. This addresses a very serious gap in the sockets
> API, and should be very useful. The comment mentions that this is
> available for IPv6. From skimming the change I would have expected
> inet6_bind() would need a change analogous to the change in
> inet_bind()? Was there a missing "git add", or is a change to
> inet6_bind() somehow not needed?
Hmm... No I totally missed inet6_bind(), thanks for spotting this.
I'll add an IPv6 mode to my test program before sending a v2.
Thanks Neal.
* [PATCH v2 net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: Eric Dumazet @ 2015-06-07 4:17 UTC
To: Neal Cardwell
Cc: David Miller, netdev, Michael Kerrisk, Maciej Żenczykowski
From: Eric Dumazet <edumazet@google.com>
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).
As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge of finding an available
port.
But the kernel does not know yet if this socket is going to be a listener
or be connected, so it has very limited choices (no full knowledge of the
final 4-tuple for a connect()).
With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.
This patch adds a new SOL_IP socket option, asking the kernel to ignore
the 0 port provided by the application in bind(IP, port=0) and only
remember the given IP address.
The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.
This new feature is available for both IPv4 and IPv6 (Thanks Neal)
Tested:
Wrote a test program and checked its behavior on IPv4 and IPv6.
strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() shows that the port is still 0 right after bind()
but properly allocated after connect().
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
IPv6 test :
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.
lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets
lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
Check that a given source port is indeed used by many different
connections :
lpaa23:~# ss -t src :40000 | head -10
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 127.0.0.2:40000 127.0.202.33:44983
ESTAB 0 0 127.0.0.2:40000 127.2.27.240:44983
ESTAB 0 0 127.0.0.2:40000 127.2.98.5:44983
ESTAB 0 0 127.0.0.2:40000 127.0.124.196:44983
ESTAB 0 0 127.0.0.2:40000 127.2.139.38:44983
ESTAB 0 0 127.0.0.2:40000 127.1.59.80:44983
ESTAB 0 0 127.0.0.2:40000 127.3.6.228:44983
ESTAB 0 0 127.0.0.2:40000 127.0.38.53:44983
ESTAB 0 0 127.0.0.2:40000 127.1.197.10:44983
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: really implements IPv6 part, thanks Neal !
include/net/inet_sock.h | 1 +
include/uapi/linux/in.h | 1 +
net/ipv4/af_inet.c | 3 ++-
net/ipv4/ip_sockglue.c | 7 +++++++
net/ipv6/af_inet6.c | 3 ++-
5 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index b6c3737da4e9..47eb67b08abd 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -187,6 +187,7 @@ struct inet_sock {
transparent:1,
mc_all:1,
nodefrag:1;
+ __u8 bind_address_no_port:1;
__u8 rcv_tos;
__u8 convert_csum;
int uc_index;
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 641338bef651..83d6236a2f08 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -112,6 +112,7 @@ struct in_addr {
#define IP_MINTTL 21
#define IP_NODEFRAG 22
#define IP_CHECKSUM 23
+#define IP_BIND_ADDRESS_NO_PORT 24
/* IP_MTU_DISCOVER values */
#define IP_PMTUDISC_DONT 0 /* Never send DF frames */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6ad0f7a711c9..cc858ef44451 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -488,7 +488,8 @@ int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		inet->inet_saddr = 0;  /* Use device */
 	/* Make sure we are allowed to bind here. */
-	if (sk->sk_prot->get_port(sk, snum)) {
+	if ((snum || !inet->bind_address_no_port) &&
+	    sk->sk_prot->get_port(sk, snum)) {
 		inet->inet_saddr = inet->inet_rcv_saddr = 0;
 		err = -EADDRINUSE;
 		goto out_release_sock;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 7cfb0893f263..04ae2992a5cd 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -582,6 +582,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
case IP_TRANSPARENT:
case IP_MINTTL:
case IP_NODEFRAG:
+ case IP_BIND_ADDRESS_NO_PORT:
case IP_UNICAST_IF:
case IP_MULTICAST_TTL:
case IP_MULTICAST_ALL:
@@ -732,6 +733,9 @@ static int do_ip_setsockopt(struct sock *sk, int level,
}
inet->nodefrag = val ? 1 : 0;
break;
+ case IP_BIND_ADDRESS_NO_PORT:
+ inet->bind_address_no_port = val ? 1 : 0;
+ break;
case IP_MTU_DISCOVER:
if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_OMIT)
goto e_inval;
@@ -1324,6 +1328,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_NODEFRAG:
val = inet->nodefrag;
break;
+ case IP_BIND_ADDRESS_NO_PORT:
+ val = inet->bind_address_no_port;
+ break;
case IP_MTU_DISCOVER:
val = inet->pmtudisc;
break;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index f3866c0b6cfe..7de52b65173f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -362,7 +362,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		np->saddr = addr->sin6_addr;
 	/* Make sure we are allowed to bind here. */
-	if (sk->sk_prot->get_port(sk, snum)) {
+	if ((snum || !inet->bind_address_no_port) &&
+	    sk->sk_prot->get_port(sk, snum)) {
 		inet_reset_saddr(sk);
 		err = -EADDRINUSE;
 		goto out;
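The condition added to both inet_bind() and inet6_bind() can be read as a
small truth table. The function below is a userspace model written for
illustration only (bind_calls_get_port() is a made-up name, not a kernel
symbol); it returns whether the protocol's get_port() hook would be invoked
for a given requested port and flag value.

```c
#include <stdbool.h>

/* Model of "(snum || !inet->bind_address_no_port)": get_port() is skipped
 * only when the caller asked for port 0 AND the socket has the new
 * bind_address_no_port bit set. */
bool bind_calls_get_port(unsigned short snum, bool bind_address_no_port)
{
    return snum || !bind_address_no_port;
}
```

So an explicit port is always checked and reserved at bind() time, with or
without the option, and bind(port=0) keeps its old behaviour unless the
application opted in through IP_BIND_ADDRESS_NO_PORT.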
* Re: [PATCH v2 net-next] inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
From: David Miller @ 2015-06-07 6:57 UTC
To: eric.dumazet; +Cc: ncardwell, netdev, mtk.manpages, maze
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 06 Jun 2015 21:17:57 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> When an application needs to force a source IP on an active TCP socket
> it has to use bind(IP, port=x).
>
> As most applications do not want to deal with already used ports, x is
> often set to 0, meaning the kernel is in charge to find an available
> port.
> But kernel does not know yet if this socket is going to be a listener or
> be connected.
> It has very limited choices (no full knowledge of final 4-tuple for a
> connect())
>
> With limited ephemeral port range (about 32K ports), it is very easy to
> fill the space.
>
> This patch adds a new SOL_IP socket option, asking kernel to ignore
> the 0 port provided by application in bind(IP, port=0) and only
> remember the given IP address.
>
> The port will be automatically chosen at connect() time, in a way
> that allows sharing a source port as long as the 4-tuples are unique.
>
> This new feature is available for both IPv4 and IPv6 (Thanks Neal)
>
> Tested:
...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Looks good, applied, thanks Eric.