* [PATCH net-next v2 0/2] tcp: Update bind bucket state on port release
@ 2025-08-21 11:09 Jakub Sitnicki
From: Jakub Sitnicki @ 2025-08-21 11:09 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Kuniyuki Iwashima,
Neal Cardwell, Paolo Abeni, kernel-team, Lee Valentine
TL;DR
-----
This is another take on addressing the issue we raised earlier [1]. This
time around, instead of trying to relax the bind-conflict checks in
connect(), we attempt to fix the TCP bind bucket state accounting.

The goal of this patch set is to make the bind buckets return to the
"port reusable by ephemeral connections" state once all sockets blocking
the port from reuse get unhashed.
Changelog
---------
Changes in v2:
- Rename the inet_sock flag from LAZY_BIND to AUTOBIND (Eric)
- Clear the AUTOBIND flag on disconnect path (Eric)
- Add a test to cover the disconnect case (Eric)
- Link to RFC v1: https://lore.kernel.org/r/20250808-update-bind-bucket-state-on-unhash-v1-0-faf85099d61b@cloudflare.com
Situation
---------
We observe the following scenario in production:
                                                inet_bind_bucket
                                                state for port 54321
                                                --------------------
                                                (bucket doesn't exist)
// Process A opens a long-lived connection:
s1 = socket(AF_INET, SOCK_STREAM)
s1.setsockopt(IP_BIND_ADDRESS_NO_PORT)
s1.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
s1.bind(192.0.2.10, 0)
s1.connect(192.51.100.1, 443)
                                                tb->fastreuse = -1
                                                tb->fastreuseport = -1
s1.getsockname() -> 192.0.2.10:54321
s1.send()
s1.recv()
// ... s1 stays open.

// Process B opens a short-lived connection:
s2 = socket(AF_INET, SOCK_STREAM)
s2.setsockopt(SO_REUSEADDR)
s2.bind(192.0.2.20, 0)
                                                tb->fastreuse = 0
                                                tb->fastreuseport = 0
s2.connect(192.51.100.2, 53)
s2.getsockname() -> 192.0.2.20:54321
s2.send()
s2.recv()
s2.close()
                                                // bucket remains in this
                                                // state even though port
                                                // was released by s2
                                                tb->fastreuse = 0
                                                tb->fastreuseport = 0

// Process A attempts to open another connection
// when there is connection pressure from
// 192.0.2.30:54000..54500 to 192.51.100.1:443.
// Assume only port 54321 is still available.
s3 = socket(AF_INET, SOCK_STREAM)
s3.setsockopt(IP_BIND_ADDRESS_NO_PORT)
s3.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
s3.bind(192.0.2.30, 0)
s3.connect(192.51.100.1, 443) -> EADDRNOTAVAIL (99)
Problem
-------
We end up in a state where Process A can't reuse ephemeral port 54321 for
as long as there are sockets, like s1, that keep the bind bucket alive. The
bucket does not return to "reusable" state even when all sockets which
blocked it from reuse, like s2, are gone.
The ephemeral port becomes available for use again only after all sockets
bound to it are gone and the bind bucket is destroyed.
Programs which behave like Process B in this scenario - that is, binding
to an IP address without setting IP_BIND_ADDRESS_NO_PORT - might be
considered poorly written. However, the reality is that such
implementations are not actually uncommon. Trying to fix each and every
such program is like playing whack-a-mole.
For instance, it could be any software using Golang's net.Dialer with
LocalAddr provided:
  dialer := &net.Dialer{
          LocalAddr: &net.TCPAddr{IP: srcIP},
  }
  conn, err := dialer.Dial("tcp4", dialTarget)
Or even a ubiquitous tool like dig when using a specific local address:
$ dig -b 127.1.1.1 +tcp +short example.com
Hence, we are proposing a systematic fix in the network stack itself.
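For illustration, the stuck-state lifecycle above can be condensed into
a toy model (plain Python, not kernel code; the field name fastreuse
mirrors struct inet_bind_bucket, everything else is invented for the
sketch):

```python
# Toy model of one port's inet_bind_bucket state (not kernel code).
# fastreuse semantics: -1 = only connect()-time (autobound) owners, the
# port remains usable for ephemeral selection; >= 0 = an explicit bind()
# was seen, so connect() skips this bucket when picking a source port.

class BindBucket:
    def __init__(self):
        self.fastreuse = -1   # fresh bucket created by connect()
        self.owners = set()

    def autobind(self, sock):
        # Port picked by connect(); does not change the state.
        self.owners.add(sock)

    def explicit_bind(self, sock):
        # Userspace bind() to this port blocks ephemeral reuse.
        self.fastreuse = 0
        self.owners.add(sock)

    def release(self, sock):
        # Current kernels: the state is never recalculated here.
        self.owners.discard(sock)

    def usable_for_connect(self):
        return self.fastreuse == -1

bucket = BindBucket()
bucket.autobind("s1")        # Process A: long-lived connection
bucket.explicit_bind("s2")   # Process B: short-lived, plain bind()
bucket.release("s2")         # s2 closes, but...
print(bucket.usable_for_connect())   # False: port stays blocked while s1 lives
```

Once explicit_bind() drives fastreuse to >= 0, nothing on the release
path revisits that decision; this is the accounting gap the series
closes.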
Solution
--------
Please see the description in patch 1.
[1] https://lore.kernel.org/r/20250714-connect-port-search-harder-v3-0-b1a41f249865@cloudflare.com
Reported-by: Lee Valentine <lvalentine@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Jakub Sitnicki (2):
tcp: Update bind bucket state on port release
selftests/net: Test tcp port reuse after unbinding a socket
 include/net/inet_connection_sock.h           |   5 +-
 include/net/inet_hashtables.h                |   2 +
 include/net/inet_sock.h                      |   2 +
 include/net/inet_timewait_sock.h             |   3 +-
 include/net/tcp.h                            |  15 ++
 net/ipv4/inet_connection_sock.c              |  12 +-
 net/ipv4/inet_hashtables.c                   |  32 +++-
 net/ipv4/inet_timewait_sock.c                |   1 +
 tools/testing/selftests/net/Makefile         |   1 +
 tools/testing/selftests/net/tcp_port_share.c | 258 +++++++++++++++++++++++++++
 10 files changed, 323 insertions(+), 8 deletions(-)
* [PATCH net-next v2 1/2] tcp: Update bind bucket state on port release
From: Jakub Sitnicki @ 2025-08-21 11:09 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Kuniyuki Iwashima,
Neal Cardwell, Paolo Abeni, kernel-team, Lee Valentine
Currently, when an inet_bind_bucket enters a state where fastreuse >= 0 or
fastreuseport >= 0, after a socket explicitly binds to a port, it stays in
that state until all associated sockets are removed and the bucket is
destroyed.
In this state, the bucket is skipped during ephemeral port selection in
connect(). For applications using a small ephemeral port range (via
IP_LOCAL_PORT_RANGE option), this can lead to quicker port exhaustion
because "blocked" buckets remain excluded from reuse.
The reason for not updating the bucket state on port release is unclear. It
may have been a performance trade-off to avoid scanning bucket owners, or
simply an oversight.
Address it by recalculating the bind bucket state when a socket releases a
port. To minimize overhead, use a divide-and-conquer strategy: duplicate
the (fastreuse, fastreuseport) state in each inet_bind2_bucket. On port
release, we only need to scan the relevant port-addr bucket, and the
overall port bucket state can be derived from those.
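As a rough sketch of that divide-and-conquer shape (plain Python, not
kernel code; the dict-based socket records and helper names are invented
for illustration):

```python
# Sketch of the two-level state recalculation described above (not
# kernel code). `tb` is the per-port bucket, each `tb2` a per-(port,
# address) bucket. (-1, -1) means "does not block ephemeral reuse".

def recalc_tb2(tb2_owners):
    """On port release: if every remaining owner of this port-addr
    bucket was autobound at connect() time, the bucket no longer
    blocks ephemeral reuse."""
    if all(sock["autobind"] for sock in tb2_owners):
        return (-1, -1)
    return None  # keep the current state

def recalc_tb(tb2_states):
    """The port bucket is reusable only if every port-addr bucket is,
    so its state can be derived without scanning all sockets."""
    if all(state == (-1, -1) for state in tb2_states):
        return (-1, -1)
    return None

# Port 54321: only the autobound s1 remains in the 192.0.2.10 bucket.
tb2_a = [{"autobind": True}]
state_a = recalc_tb2(tb2_a)   # scan touched just one socket
print(recalc_tb([state_a]))   # (-1, -1): port reusable again
```

The point of the split is cost: releasing a port only walks the owners
of one port-addr bucket, not every socket bound to the port.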
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_connection_sock.h |  5 +++--
 include/net/inet_hashtables.h      |  2 ++
 include/net/inet_sock.h            |  2 ++
 include/net/inet_timewait_sock.h   |  3 ++-
 include/net/tcp.h                  | 15 +++++++++++++++
 net/ipv4/inet_connection_sock.c    | 12 ++++++++----
 net/ipv4/inet_hashtables.c         | 32 +++++++++++++++++++++++++++++++-
 net/ipv4/inet_timewait_sock.c      |  1 +
 8 files changed, 64 insertions(+), 8 deletions(-)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 1735db332aab..072347f16483 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -322,8 +322,9 @@ int inet_csk_listen_start(struct sock *sk);
 void inet_csk_listen_stop(struct sock *sk);
 
 /* update the fast reuse flag when adding a socket */
-void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
-                               struct sock *sk);
+void inet_csk_update_fastreuse(const struct sock *sk,
+                               struct inet_bind_bucket *tb,
+                               struct inet_bind2_bucket *tb2);
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 19dbd9081d5a..d6676746dabf 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -108,6 +108,8 @@ struct inet_bind2_bucket {
        struct hlist_node bhash_node;
        /* List of sockets hashed to this bucket */
        struct hlist_head owners;
+       signed char fastreuse;
+       signed char fastreuseport;
 };
static inline struct net *ib_net(const struct inet_bind_bucket *ib)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 1086256549fa..9614d0430471 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -279,6 +279,8 @@ enum {
        INET_FLAGS_RTALERT_ISOLATE = 28,
        INET_FLAGS_SNDFLOW = 29,
        INET_FLAGS_RTALERT = 30,
+       /* socket bound to a port at connect() time */
+       INET_FLAGS_AUTOBIND = 31,
 };
/* cmsg flags for inet */
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 67a313575780..ec99176d576f 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -70,7 +70,8 @@ struct inet_timewait_sock {
        unsigned int    tw_transparent : 1,
                        tw_flowlabel : 20,
                        tw_usec_ts : 1,
-                       tw_pad : 2, /* 2 bits hole */
+                       tw_autobind : 1,
+                       tw_pad : 1, /* 1 bit hole */
                        tw_tos : 8;
        u32 tw_txhash;
        u32 tw_priority;
u32 tw_priority;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2936b8175950..c4bb6e56a668 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2225,6 +2225,21 @@ static inline bool inet_sk_transparent(const struct sock *sk)
        return inet_test_bit(TRANSPARENT, sk);
 }
 
+/**
+ * inet_sk_autobind - Check if socket was bound to a port at connect() time.
+ * @sk: &struct inet_connection_sock or &struct inet_timewait_sock
+ */
+static inline bool inet_sk_autobind(const struct sock *sk)
+{
+       switch (sk->sk_state) {
+       case TCP_TIME_WAIT:
+               return inet_twsk(sk)->tw_autobind;
+       case TCP_NEW_SYN_RECV:
+               return false; /* n/a to request sock */
+       }
+       return inet_test_bit(AUTOBIND, sk);
+}
+
 /* Determines whether this is a thin stream (which may suffer from
  * increased latency). Used to trigger latency-reducing mechanisms.
  */
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 0ef1eacd539d..34e4fe0c7b4b 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -423,7 +423,7 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 }
 
 static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
-                                    struct sock *sk)
+                                    const struct sock *sk)
 {
        if (tb->fastreuseport <= 0)
                return 0;
@@ -453,8 +453,9 @@ static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
                                     ipv6_only_sock(sk), true, false);
 }
 
-void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
-                              struct sock *sk)
+void inet_csk_update_fastreuse(const struct sock *sk,
+                              struct inet_bind_bucket *tb,
+                              struct inet_bind2_bucket *tb2)
 {
        bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 
@@ -501,6 +502,9 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
                        tb->fastreuseport = 0;
                }
        }
+
+       tb2->fastreuse = tb->fastreuse;
+       tb2->fastreuseport = tb->fastreuseport;
 }
/* Obtain a reference to a local port for the given sock,
@@ -582,7 +586,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
        }
 
 success:
-       inet_csk_update_fastreuse(tb, sk);
+       inet_csk_update_fastreuse(sk, tb, tb2);
 
        if (!inet_csk(sk)->icsk_bind_hash)
                inet_bind_hash(sk, tb, tb2, port);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ceeeec9b7290..f644ffe43018 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -87,10 +87,22 @@ struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep,
  */
 void inet_bind_bucket_destroy(struct inet_bind_bucket *tb)
 {
+       const struct inet_bind2_bucket *tb2;
+
        if (hlist_empty(&tb->bhash2)) {
                hlist_del_rcu(&tb->node);
                kfree_rcu(tb, rcu);
+               return;
+       }
+
+       if (tb->fastreuse == -1 && tb->fastreuseport == -1)
+               return;
+       hlist_for_each_entry(tb2, &tb->bhash2, bhash_node) {
+               if (tb2->fastreuse != -1 || tb2->fastreuseport != -1)
+                       return;
        }
+       tb->fastreuse = -1;
+       tb->fastreuseport = -1;
 }
bool inet_bind_bucket_match(const struct inet_bind_bucket *tb, const struct net *net,
@@ -121,6 +133,8 @@ static void inet_bind2_bucket_init(struct inet_bind2_bucket *tb2,
 #else
        tb2->rcv_saddr = sk->sk_rcv_saddr;
 #endif
+       tb2->fastreuse = 0;
+       tb2->fastreuseport = 0;
        INIT_HLIST_HEAD(&tb2->owners);
        hlist_add_head(&tb2->node, &head->chain);
        hlist_add_head(&tb2->bhash_node, &tb->bhash2);
@@ -143,11 +157,23 @@ struct inet_bind2_bucket *inet_bind2_bucket_create(struct kmem_cache *cachep,
 /* Caller must hold hashbucket lock for this tb with local BH disabled */
 void inet_bind2_bucket_destroy(struct kmem_cache *cachep, struct inet_bind2_bucket *tb)
 {
+       const struct sock *sk;
+
        if (hlist_empty(&tb->owners)) {
                __hlist_del(&tb->node);
                __hlist_del(&tb->bhash_node);
                kmem_cache_free(cachep, tb);
+               return;
+       }
+
+       if (tb->fastreuse == -1 && tb->fastreuseport == -1)
+               return;
+       sk_for_each_bound(sk, &tb->owners) {
+               if (!inet_sk_autobind(sk))
+                       return;
        }
+       tb->fastreuse = -1;
+       tb->fastreuseport = -1;
 }
static bool inet_bind2_bucket_addr_match(const struct inet_bind2_bucket *tb2,
@@ -191,6 +217,7 @@ static void __inet_put_port(struct sock *sk)
        tb = inet_csk(sk)->icsk_bind_hash;
        inet_csk(sk)->icsk_bind_hash = NULL;
        inet_sk(sk)->inet_num = 0;
+       inet_clear_bit(AUTOBIND, sk);
 
        spin_lock(&head2->lock);
        if (inet_csk(sk)->icsk_bind2_hash) {
@@ -277,7 +304,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child)
                }
        }
        if (update_fastreuse)
-               inet_csk_update_fastreuse(tb, child);
+               inet_csk_update_fastreuse(child, tb, tb2);
        inet_bind_hash(child, tb, tb2, port);
        spin_unlock(&head2->lock);
        spin_unlock(&head->lock);
@@ -1136,6 +1163,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
                                                     head2, tb, sk);
                        if (!tb2)
                                goto error;
+                       tb2->fastreuse = -1;
+                       tb2->fastreuseport = -1;
                }
 
        /* Here we want to add a little bit of randomness to the next source
@@ -1148,6 +1177,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
        /* Head lock still held and bh's disabled */
        inet_bind_hash(sk, tb, tb2, port);
+       inet_set_bit(AUTOBIND, sk);
 
        if (sk_unhashed(sk)) {
                inet_sk(sk)->inet_sport = htons(port);
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 875ff923a8ed..0150f5697040 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -206,6 +206,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
        tw->tw_hash = sk->sk_hash;
        tw->tw_ipv6only = 0;
        tw->tw_transparent = inet_test_bit(TRANSPARENT, sk);
+       tw->tw_autobind = inet_test_bit(AUTOBIND, sk);
        tw->tw_prot = sk->sk_prot_creator;
        atomic64_set(&tw->tw_cookie, atomic64_read(&sk->sk_cookie));
        twsk_net_set(tw, sock_net(sk));
--
2.43.0
* [PATCH net-next v2 2/2] selftests/net: Test tcp port reuse after unbinding a socket
From: Jakub Sitnicki @ 2025-08-21 11:09 UTC
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Kuniyuki Iwashima,
Neal Cardwell, Paolo Abeni, kernel-team, Lee Valentine
Exercise the scenario described in detail in the cover letter:
1) socket A: connect() from ephemeral port X
2) socket B: explicitly bind() to port X
3) check that port X is now excluded from ephemeral ports
4) close socket B to release the port bind
5) socket C: connect() from ephemeral port X
As well as a corner case to test that the autobind flag is cleared:
1) connect() from ephemeral port X
2) disconnect the socket with connect(AF_UNSPEC)
3) bind() it explicitly to port X
4) check that port X is now excluded from ephemeral ports
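The flag lifecycle behind this corner case can be sketched as follows
(plain Python model, not kernel code; class and field names are invented
for the sketch):

```python
# Logic model of the AUTOBIND flag lifecycle (not kernel code).

class Sock:
    def __init__(self):
        self.autobind = False    # models INET_FLAGS_AUTOBIND

    def connect(self):
        # connect() picks an ephemeral port and marks the socket.
        self.autobind = True

    def disconnect(self):
        # connect(AF_UNSPEC) releases the port, so the flag must be
        # cleared; otherwise the later explicit bind() would be
        # mistaken for an autobind.
        self.autobind = False

    def bind(self):
        # Explicit bind() to the same port; autobind stays False.
        pass

def port_blocked(owners):
    # A port-addr bucket blocks ephemeral reuse unless all of its
    # owners were autobound at connect() time.
    return not all(s.autobind for s in owners)

s = Sock()
s.connect()
s.disconnect()
s.bind()
print(port_blocked([s]))   # True: the explicit bind blocks port reuse
```

If disconnect() did not clear the flag, the socket would still look
autobound after step 3 and the port would wrongly stay reusable.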
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/net/Makefile         |   1 +
 tools/testing/selftests/net/tcp_port_share.c | 258 +++++++++++++++++++++++++++
 2 files changed, 259 insertions(+)
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b31a71f2b372..b317ec5e6aec 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -117,6 +117,7 @@ TEST_GEN_FILES += tfo
TEST_PROGS += tfo_passive.sh
TEST_PROGS += broadcast_pmtu.sh
TEST_PROGS += ipv6_force_forwarding.sh
+TEST_GEN_PROGS += tcp_port_share
# YNL files, must be before "include ..lib.mk"
YNL_GEN_FILES := busy_poller netlink-dumps
diff --git a/tools/testing/selftests/net/tcp_port_share.c b/tools/testing/selftests/net/tcp_port_share.c
new file mode 100644
index 000000000000..4c39d599dfce
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_port_share.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2025 Cloudflare, Inc.
+
+/* Tests for TCP port sharing (bind bucket reuse). */
+
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdlib.h>
+
+#include "../kselftest_harness.h"
+
+#define DST_PORT 30000
+#define SRC_PORT 40000
+
+struct sockaddr_inet {
+       union {
+               struct sockaddr_storage ss;
+               struct sockaddr_in6 v6;
+               struct sockaddr_in v4;
+               struct sockaddr sa;
+       };
+       socklen_t len;
+       char str[INET6_ADDRSTRLEN + __builtin_strlen("[]:65535") + 1];
+};
+
+const int one = 1;
+
+static int disconnect(int fd)
+{
+       return connect(fd, &(struct sockaddr){ AF_UNSPEC }, sizeof(struct sockaddr));
+}
+
+static int getsockname_port(int fd)
+{
+       struct sockaddr_inet addr = {};
+       int err;
+
+       addr.len = sizeof(addr);
+       err = getsockname(fd, &addr.sa, &addr.len);
+       if (err)
+               return -1;
+
+       switch (addr.sa.sa_family) {
+       case AF_INET:
+               return ntohs(addr.v4.sin_port);
+       case AF_INET6:
+               return ntohs(addr.v6.sin6_port);
+       default:
+               errno = EAFNOSUPPORT;
+               return -1;
+       }
+}
+
+static void make_inet_addr(int af, const char *ip, __u16 port,
+                          struct sockaddr_inet *addr)
+{
+       const char *fmt = "";
+
+       memset(addr, 0, sizeof(*addr));
+
+       switch (af) {
+       case AF_INET:
+               addr->len = sizeof(addr->v4);
+               addr->v4.sin_family = af;
+               addr->v4.sin_port = htons(port);
+               inet_pton(af, ip, &addr->v4.sin_addr);
+               fmt = "%s:%hu";
+               break;
+       case AF_INET6:
+               addr->len = sizeof(addr->v6);
+               addr->v6.sin6_family = af;
+               addr->v6.sin6_port = htons(port);
+               inet_pton(af, ip, &addr->v6.sin6_addr);
+               fmt = "[%s]:%hu";
+               break;
+       }
+
+       snprintf(addr->str, sizeof(addr->str), fmt, ip, port);
+}
+
+FIXTURE(tcp_port_share) {};
+
+FIXTURE_VARIANT(tcp_port_share) {
+       int domain;
+       /* IP to listen on and connect to */
+       const char *dst_ip;
+       /* Primary IP to connect from */
+       const char *src1_ip;
+       /* Secondary IP to connect from */
+       const char *src2_ip;
+       /* IP to bind to in order to block the source port */
+       const char *bind_ip;
+};
+
+FIXTURE_VARIANT_ADD(tcp_port_share, ipv4) {
+       .domain = AF_INET,
+       .dst_ip = "127.0.0.1",
+       .src1_ip = "127.1.1.1",
+       .src2_ip = "127.2.2.2",
+       .bind_ip = "127.3.3.3",
+};
+
+FIXTURE_VARIANT_ADD(tcp_port_share, ipv6) {
+       .domain = AF_INET6,
+       .dst_ip = "::1",
+       .src1_ip = "2001:db8::1",
+       .src2_ip = "2001:db8::2",
+       .bind_ip = "2001:db8::3",
+};
+
+FIXTURE_SETUP(tcp_port_share)
+{
+       int sc;
+
+       ASSERT_EQ(unshare(CLONE_NEWNET), 0);
+       ASSERT_EQ(system("ip link set dev lo up"), 0);
+       ASSERT_EQ(system("ip addr add dev lo 2001:db8::1/32 nodad"), 0);
+       ASSERT_EQ(system("ip addr add dev lo 2001:db8::2/32 nodad"), 0);
+       ASSERT_EQ(system("ip addr add dev lo 2001:db8::3/32 nodad"), 0);
+
+       sc = open("/proc/sys/net/ipv4/ip_local_port_range", O_WRONLY);
+       ASSERT_GE(sc, 0);
+       ASSERT_GT(dprintf(sc, "%hu %hu\n", SRC_PORT, SRC_PORT), 0);
+       ASSERT_EQ(close(sc), 0);
+}
+
+FIXTURE_TEARDOWN(tcp_port_share) {}
+
+/* Verify that an ephemeral port becomes available again after the socket
+ * bound to it and blocking it from reuse is closed.
+ */
+TEST_F(tcp_port_share, can_reuse_port_after_bind_and_close)
+{
+       const typeof(variant) v = variant;
+       struct sockaddr_inet addr;
+       int c1, c2, ln, pb;
+
+       /* Listen on <dst_ip>:<DST_PORT> */
+       ln = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(ln, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(ln, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       ASSERT_EQ(bind(ln, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+       ASSERT_EQ(listen(ln, 2), 0);
+
+       /* Connect from <src1_ip>:<SRC_PORT> */
+       c1 = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(c1, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(c1, SOL_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->src1_ip, 0, &addr);
+       ASSERT_EQ(bind(c1, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       ASSERT_EQ(connect(c1, &addr.sa, addr.len), 0) TH_LOG("connect(%s): %m", addr.str);
+       ASSERT_EQ(getsockname_port(c1), SRC_PORT);
+
+       /* Bind to <bind_ip>:<SRC_PORT>. Block the port from reuse. */
+       pb = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(pb, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(pb, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->bind_ip, SRC_PORT, &addr);
+       ASSERT_EQ(bind(pb, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       /* Try to connect from <src2_ip>:<SRC_PORT>. Expect failure. */
+       c2 = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(c2, 0) TH_LOG("socket");
+       ASSERT_EQ(setsockopt(c2, SOL_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->src2_ip, 0, &addr);
+       ASSERT_EQ(bind(c2, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       ASSERT_EQ(connect(c2, &addr.sa, addr.len), -1) TH_LOG("connect(%s)", addr.str);
+       ASSERT_EQ(errno, EADDRNOTAVAIL) TH_LOG("%m");
+
+       /* Unbind from <bind_ip>:<SRC_PORT>. Unblock the port for reuse. */
+       ASSERT_EQ(close(pb), 0);
+
+       /* Connect again from <src2_ip>:<SRC_PORT> */
+       EXPECT_EQ(connect(c2, &addr.sa, addr.len), 0) TH_LOG("connect(%s): %m", addr.str);
+       EXPECT_EQ(getsockname_port(c2), SRC_PORT);
+
+       ASSERT_EQ(close(c2), 0);
+       ASSERT_EQ(close(c1), 0);
+       ASSERT_EQ(close(ln), 0);
+}
+
+/* Verify that a socket auto-bound during connect() blocks port reuse after
+ * disconnect (connect(AF_UNSPEC)) followed by an explicit port bind().
+ */
+TEST_F(tcp_port_share, port_block_after_disconnect)
+{
+       const typeof(variant) v = variant;
+       struct sockaddr_inet addr;
+       int c1, c2, ln, pb;
+
+       /* Listen on <dst_ip>:<DST_PORT> */
+       ln = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(ln, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(ln, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       ASSERT_EQ(bind(ln, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+       ASSERT_EQ(listen(ln, 2), 0);
+
+       /* Connect from <src1_ip>:<SRC_PORT> */
+       c1 = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(c1, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(c1, SOL_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->src1_ip, 0, &addr);
+       ASSERT_EQ(bind(c1, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       ASSERT_EQ(connect(c1, &addr.sa, addr.len), 0) TH_LOG("connect(%s): %m", addr.str);
+       ASSERT_EQ(getsockname_port(c1), SRC_PORT);
+
+       /* Disconnect the socket and bind it to <bind_ip>:<SRC_PORT> to block the port */
+       ASSERT_EQ(disconnect(c1), 0) TH_LOG("disconnect: %m");
+       ASSERT_EQ(setsockopt(c1, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->bind_ip, SRC_PORT, &addr);
+       ASSERT_EQ(bind(c1, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       /* Trigger port-addr bucket state update with another bind() and close() */
+       pb = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(pb, 0) TH_LOG("socket(): %m");
+       ASSERT_EQ(setsockopt(pb, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->bind_ip, SRC_PORT, &addr);
+       ASSERT_EQ(bind(pb, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       ASSERT_EQ(close(pb), 0);
+
+       /* Connect from <src2_ip>:<SRC_PORT>. Expect failure. */
+       c2 = socket(v->domain, SOCK_STREAM, 0);
+       ASSERT_GE(c2, 0) TH_LOG("socket: %m");
+       ASSERT_EQ(setsockopt(c2, SOL_IP, IP_BIND_ADDRESS_NO_PORT, &one, sizeof(one)), 0);
+
+       make_inet_addr(v->domain, v->src2_ip, 0, &addr);
+       ASSERT_EQ(bind(c2, &addr.sa, addr.len), 0) TH_LOG("bind(%s): %m", addr.str);
+
+       make_inet_addr(v->domain, v->dst_ip, DST_PORT, &addr);
+       EXPECT_EQ(connect(c2, &addr.sa, addr.len), -1) TH_LOG("connect(%s)", addr.str);
+       EXPECT_EQ(errno, EADDRNOTAVAIL) TH_LOG("%m");
+
+       ASSERT_EQ(close(c2), 0);
+       ASSERT_EQ(close(c1), 0);
+       ASSERT_EQ(close(ln), 0);
+}
+
+TEST_HARNESS_MAIN
--
2.43.0
* Re: [PATCH net-next v2 1/2] tcp: Update bind bucket state on port release
From: Kuniyuki Iwashima @ 2025-08-22 3:58 UTC
To: Jakub Sitnicki
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Neal Cardwell, Paolo Abeni, kernel-team, Lee Valentine
On Thu, Aug 21, 2025 at 4:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Currently, when an inet_bind_bucket enters a state where fastreuse >= 0 or
> fastreuseport >= 0, after a socket explicitly binds to a port, it stays in
> that state until all associated sockets are removed and the bucket is
> destroyed.
>
> In this state, the bucket is skipped during ephemeral port selection in
> connect(). For applications using a small ephemeral port range (via
> IP_LOCAL_PORT_RANGE option), this can lead to quicker port exhaustion
> because "blocked" buckets remain excluded from reuse.
>
> The reason for not updating the bucket state on port release is unclear. It
> may have been a performance trade-off to avoid scanning bucket owners, or
> simply an oversight.
>
> Address it by recalculating the bind bucket state when a socket releases a
> port. To minimize overhead, use a divide-and-conquer strategy: duplicate
> the (fastreuse, fastreuseport) state in each inet_bind2_bucket. On port
> release, we only need to scan the relevant port-addr bucket, and the
> overall port bucket state can be derived from those.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> include/net/inet_connection_sock.h | 5 +++--
> include/net/inet_hashtables.h | 2 ++
> include/net/inet_sock.h | 2 ++
> include/net/inet_timewait_sock.h | 3 ++-
> include/net/tcp.h | 15 +++++++++++++++
> net/ipv4/inet_connection_sock.c | 12 ++++++++----
> net/ipv4/inet_hashtables.c | 32 +++++++++++++++++++++++++++++++-
> net/ipv4/inet_timewait_sock.c | 1 +
> 8 files changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 1735db332aab..072347f16483 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -322,8 +322,9 @@ int inet_csk_listen_start(struct sock *sk);
> void inet_csk_listen_stop(struct sock *sk);
>
> /* update the fast reuse flag when adding a socket */
> -void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
> - struct sock *sk);
> +void inet_csk_update_fastreuse(const struct sock *sk,
> + struct inet_bind_bucket *tb,
> + struct inet_bind2_bucket *tb2);
>
> struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
>
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 19dbd9081d5a..d6676746dabf 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -108,6 +108,8 @@ struct inet_bind2_bucket {
> struct hlist_node bhash_node;
> /* List of sockets hashed to this bucket */
> struct hlist_head owners;
> + signed char fastreuse;
> + signed char fastreuseport;
> };
>
> static inline struct net *ib_net(const struct inet_bind_bucket *ib)
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 1086256549fa..9614d0430471 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -279,6 +279,8 @@ enum {
> INET_FLAGS_RTALERT_ISOLATE = 28,
> INET_FLAGS_SNDFLOW = 29,
> INET_FLAGS_RTALERT = 30,
> + /* socket bound to a port at connect() time */
> + INET_FLAGS_AUTOBIND = 31,
AUTOBIND sounds like inet_autobind() was called.

__inet_bind() saves similar flags in sk->sk_userlocks and it has 3 bits
available.

How about flagging SOCK_BINDPORT_CONNECT in sk->sk_userlocks?
> };
>
> /* cmsg flags for inet */
> diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
> index 67a313575780..ec99176d576f 100644
> --- a/include/net/inet_timewait_sock.h
> +++ b/include/net/inet_timewait_sock.h
> @@ -70,7 +70,8 @@ struct inet_timewait_sock {
> unsigned int tw_transparent : 1,
> tw_flowlabel : 20,
> tw_usec_ts : 1,
> - tw_pad : 2, /* 2 bits hole */
> + tw_autobind : 1,
> + tw_pad : 1, /* 1 bit hole */
> tw_tos : 8;
> u32 tw_txhash;
> u32 tw_priority;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2936b8175950..c4bb6e56a668 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2225,6 +2225,21 @@ static inline bool inet_sk_transparent(const struct sock *sk)
> return inet_test_bit(TRANSPARENT, sk);
> }
>
> +/**
> + * inet_sk_autobind - Check if socket was bound to a port at connect() time.
> + * @sk: &struct inet_connection_sock or &struct inet_timewait_sock
> + */
> +static inline bool inet_sk_autobind(const struct sock *sk)
> +{
> + switch (sk->sk_state) {
> + case TCP_TIME_WAIT:
> + return inet_twsk(sk)->tw_autobind;
> + case TCP_NEW_SYN_RECV:
> + return false; /* n/a to request sock */
This never happens. Maybe remove the case
or add DEBUG_NET_WARN_ON_ONCE(1)?
> + }
> + return inet_test_bit(AUTOBIND, sk);
> +}
> +	}
> +	return inet_test_bit(AUTOBIND, sk);
> +}

[...]
> --- a/net/ipv4/inet_timewait_sock.c
> +++ b/net/ipv4/inet_timewait_sock.c
> @@ -206,6 +206,7 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
> tw->tw_hash = sk->sk_hash;
> tw->tw_ipv6only = 0;
> tw->tw_transparent = inet_test_bit(TRANSPARENT, sk);
> + tw->tw_autobind = inet_test_bit(AUTOBIND, sk);
> tw->tw_prot = sk->sk_prot_creator;
> atomic64_set(&tw->tw_cookie, atomic64_read(&sk->sk_cookie));
> twsk_net_set(tw, sock_net(sk));
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next v2 1/2] tcp: Update bind bucket state on port release
2025-08-22 3:58 ` Kuniyuki Iwashima
@ 2025-08-22 13:37 ` Jakub Sitnicki
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Sitnicki @ 2025-08-22 13:37 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Neal Cardwell, Paolo Abeni, kernel-team, Lee Valentine
On Thu, Aug 21, 2025 at 08:58 PM -07, Kuniyuki Iwashima wrote:
> On Thu, Aug 21, 2025 at 4:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Currently, when an inet_bind_bucket enters a state where fastreuse >= 0 or
>> fastreuseport >= 0, after a socket explicitly binds to a port, it stays in
>> that state until all associated sockets are removed and the bucket is
>> destroyed.
>>
>> In this state, the bucket is skipped during ephemeral port selection in
>> connect(). For applications using a small ephemeral port range (via
>> IP_LOCAL_PORT_RANGE option), this can lead to quicker port exhaustion
>> because "blocked" buckets remain excluded from reuse.
>>
>> The reason for not updating the bucket state on port release is unclear. It
>> may have been a performance trade-off to avoid scanning bucket owners, or
>> simply an oversight.
>>
>> Address it by recalculating the bind bucket state when a socket releases a
>> port. To minimize overhead, use a divide-and-conquer strategy: duplicate
>> the (fastreuse, fastreuseport) state in each inet_bind2_bucket. On port
>> release, we only need to scan the relevant port-addr bucket, and the
>> overall port bucket state can be derived from those.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>> include/net/inet_connection_sock.h | 5 +++--
>> include/net/inet_hashtables.h | 2 ++
>> include/net/inet_sock.h | 2 ++
>> include/net/inet_timewait_sock.h | 3 ++-
>> include/net/tcp.h | 15 +++++++++++++++
>> net/ipv4/inet_connection_sock.c | 12 ++++++++----
>> net/ipv4/inet_hashtables.c | 32 +++++++++++++++++++++++++++++++-
>> net/ipv4/inet_timewait_sock.c | 1 +
>> 8 files changed, 64 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 1735db332aab..072347f16483 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -322,8 +322,9 @@ int inet_csk_listen_start(struct sock *sk);
>> void inet_csk_listen_stop(struct sock *sk);
>>
>> /* update the fast reuse flag when adding a socket */
>> -void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
>> - struct sock *sk);
>> +void inet_csk_update_fastreuse(const struct sock *sk,
>> + struct inet_bind_bucket *tb,
>> + struct inet_bind2_bucket *tb2);
>>
>> struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
>>
>> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
>> index 19dbd9081d5a..d6676746dabf 100644
>> --- a/include/net/inet_hashtables.h
>> +++ b/include/net/inet_hashtables.h
>> @@ -108,6 +108,8 @@ struct inet_bind2_bucket {
>> struct hlist_node bhash_node;
>> /* List of sockets hashed to this bucket */
>> struct hlist_head owners;
>> + signed char fastreuse;
>> + signed char fastreuseport;
>> };
>>
>> static inline struct net *ib_net(const struct inet_bind_bucket *ib)
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index 1086256549fa..9614d0430471 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -279,6 +279,8 @@ enum {
>> INET_FLAGS_RTALERT_ISOLATE = 28,
>> INET_FLAGS_SNDFLOW = 29,
>> INET_FLAGS_RTALERT = 30,
>> + /* socket bound to a port at connect() time */
>> + INET_FLAGS_AUTOBIND = 31,
>
> AUTOBIND sounds like inet_autobind() was called.
That was intentional. I was going for an analogy to
inet_dgram_connect->inet_autobind, but I see how it can also be
confusing.
> __inet_bind() saves similar flags in sk->sk_userlocks and
> it has 3 bits available.
>
> How about flagging SOCK_BINDPORT_CONNECT in
> sk->sk_userlocks ?
I was on the fence about whether to put the bit flag in sk_userlocks or
inet_flags. Treating it as a variant of the BINDPORT lock also makes sense.
>> };
>>
>> /* cmsg flags for inet */
>> diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
>> index 67a313575780..ec99176d576f 100644
>> --- a/include/net/inet_timewait_sock.h
>> +++ b/include/net/inet_timewait_sock.h
>> @@ -70,7 +70,8 @@ struct inet_timewait_sock {
>> unsigned int tw_transparent : 1,
>> tw_flowlabel : 20,
>> tw_usec_ts : 1,
>> - tw_pad : 2, /* 2 bits hole */
>> + tw_autobind : 1,
>> + tw_pad : 1, /* 1 bit hole */
>> tw_tos : 8;
>> u32 tw_txhash;
>> u32 tw_priority;
>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>> index 2936b8175950..c4bb6e56a668 100644
>> --- a/include/net/tcp.h
>> +++ b/include/net/tcp.h
>> @@ -2225,6 +2225,21 @@ static inline bool inet_sk_transparent(const struct sock *sk)
>> return inet_test_bit(TRANSPARENT, sk);
>> }
>>
>> +/**
>> + * inet_sk_autobind - Check if socket was bound to a port at connect() time.
>> + * @sk: &struct inet_connection_sock or &struct inet_timewait_sock
>> + */
>> +static inline bool inet_sk_autobind(const struct sock *sk)
>> +{
>> + switch (sk->sk_state) {
>> + case TCP_TIME_WAIT:
>> + return inet_twsk(sk)->tw_autobind;
>> + case TCP_NEW_SYN_RECV:
>> + return false; /* n/a to request sock */
>
> This never happens. Maybe remove the case
> or add DEBUG_NET_WARN_ON_ONCE(1) ?
Will probably just remove it.
Thanks for reviewing!
>> + }
>> + return inet_test_bit(AUTOBIND, sk);
>> +}
>> +
[...]
Thread overview: 5+ messages
2025-08-21 11:09 [PATCH net-next v2 0/2] tcp: Update bind bucket state on port release Jakub Sitnicki
2025-08-21 11:09 ` [PATCH net-next v2 1/2] " Jakub Sitnicki
2025-08-22 3:58 ` Kuniyuki Iwashima
2025-08-22 13:37 ` Jakub Sitnicki
2025-08-21 11:09 ` [PATCH net-next v2 2/2] selftests/net: Test tcp port reuse after unbinding a socket Jakub Sitnicki