* [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: xuanqiang.luo @ 2026-06-26 9:38 UTC
To: Eric Dumazet, Neal Cardwell, netdev
Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
Simon Horman, luoxuanqiang

From: luoxuanqiang <luoxuanqiang@kylinos.cn>

IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
range on a per-socket basis. __inet_hash_connect() already treats such a
range as an explicit application partition and scans it with step 1 [1].

Do the same in inet_csk_find_open_port(): when a socket-local range is set,
walk the whole selected range instead of first splitting it by parity.
Keep the existing step-2 parity behavior for sockets using the netns range,
so the default bind/connect separation remains unchanged.

[1] https://lore.kernel.org/r/20231214192939.1962891-3-edumazet@google.com

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: luoxuanqiang <luoxuanqiang@kylinos.cn>
---
net/ipv4/inet_connection_sock.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 56902bba54838..ad8af70c92ca3 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -323,13 +323,16 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 	struct inet_bind2_bucket *tb2;
 	struct inet_bind_bucket *tb;
 	u32 remaining, offset;
+	bool local_ports;
 	bool relax = false;
+	int step;
 
 	l3mdev = inet_sk_bound_l3mdev(sk);
 ports_exhausted:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
-	inet_sk_get_local_port_range(sk, &low, &high);
+	local_ports = inet_sk_get_local_port_range(sk, &low, &high);
+	step = local_ports ? 1 : 2;
 	high++; /* [32768, 60999] -> [32768, 61000[ */
 	if (high - low < 4)
 		attempt_half = 0;
@@ -342,18 +345,19 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 			low = half;
 	}
 	remaining = high - low;
-	if (likely(remaining > 1))
+	if (!local_ports && remaining > 1)
 		remaining &= ~1U;
 
 	offset = get_random_u32_below(remaining);
 	/* __inet_hash_connect() favors ports having @low parity
 	 * We do the opposite to not pollute connect() users.
 	 */
-	offset |= 1U;
+	if (!local_ports)
+		offset |= 1U;
 
 other_parity_scan:
 	port = low + offset;
-	for (i = 0; i < remaining; i += 2, port += 2) {
+	for (i = 0; i < remaining; i += step, port += step) {
 		if (unlikely(port >= high))
 			port -= remaining;
 		if (inet_is_local_reserved_port(net, port))
@@ -384,9 +388,11 @@ inet_csk_find_open_port(const struct sock *sk, struct inet_bind_bucket **tb_ret,
 		cond_resched();
 	}
 
-	offset--;
-	if (!(offset & 1))
-		goto other_parity_scan;
+	if (!local_ports) {
+		offset--;
+		if (!(offset & 1))
+			goto other_parity_scan;
+	}
 
 	if (attempt_half == 1) {
 		/* OK we now try the upper half of the range */
--
2.43.0
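
For readers following the scan-order discussion, here is a minimal userspace model of the candidate walk after this patch. It is a sketch, not kernel code: scan_order() is a hypothetical helper that keeps only the offset/step/wrap arithmetic from the diff and drops the reserved-port check, bucket lookups, locking and the attempt_half split; its offset argument stands in for get_random_u32_below(remaining).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the candidate-port walk in the
 * patched inet_csk_find_open_port(). Only the offset/step/wrap
 * arithmetic is kept: no reserved-port check, no bucket lookup, no
 * locking and no attempt_half split. @high is inclusive, as returned
 * by inet_sk_get_local_port_range(); @offset stands in for
 * get_random_u32_below(remaining). Visited ports are stored in @out
 * (up to @cap entries); the return value is how many were visited.
 */
static size_t scan_order(int low, int high, uint32_t offset,
			 bool local_ports, int *out, size_t cap)
{
	int step = local_ports ? 1 : 2;
	uint32_t remaining, i;
	size_t n = 0;
	int port;

	high++;				/* [low, high] -> [low, high[ */
	remaining = high - low;
	if (!local_ports && remaining > 1)
		remaining &= ~1U;	/* parity scan wants an even span */
	assert(remaining > 0 && offset < remaining);

	if (!local_ports)
		offset |= 1U;		/* first pass: parity opposite to @low */

other_parity_scan:
	port = low + offset;
	for (i = 0; i < remaining; i += step, port += step) {
		if (port >= high)
			port -= remaining;	/* wrap back into the range */
		if (n < cap)
			out[n] = port;
		n++;
	}

	if (!local_ports) {
		offset--;
		if (!(offset & 1))
			goto other_parity_scan;	/* second pass: @low parity */
	}
	return n;
}
```

With low=10, high=19 and offset=4, the netns-range mode visits 15 17 19 11 13 and then 14 16 18 10 12, while the socket-local mode visits 14 15 16 17 18 19 10 11 12 13. Both cover the same ten ports; only the order differs.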

* Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: Kuniyuki Iwashima @ 2026-06-26 23:40 UTC
To: xuanqiang.luo
Cc: Eric Dumazet, Neal Cardwell, netdev, David S . Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, luoxuanqiang
On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
>
> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
>
> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
> range on a per-socket basis. __inet_hash_connect() already treats such a
> range as an explicit application partition and scans it with step 1 [1].
>
> Do the same in inet_csk_find_open_port():

What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
without IP_BIND_ADDRESS_NO_PORT ?

> when a socket-local range is set,
> walk the whole selected range instead of first splitting it by parity.
> Keep the existing step-2 parity behavior for sockets using the netns range,
> so the default bind/connect separation remains unchanged.

* Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: luoxuanqiang @ 2026-06-27 1:59 UTC
To: Kuniyuki Iwashima
Cc: Eric Dumazet, Neal Cardwell, netdev, David S . Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, luoxuanqiang
> On Jun 27, 2026, at 07:40, Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
>>
>> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
>>
>> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
>> range on a per-socket basis. __inet_hash_connect() already treats such a
>> range as an explicit application partition and scans it with step 1 [1].
>>
>> Do the same in inet_csk_find_open_port():
>
> What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
> without IP_BIND_ADDRESS_NO_PORT ?

Hi Kuniyuki,

Thanks for the question!

The use case is when an application wants to restrict ephemeral port
allocation to a socket-local IP_LOCAL_PORT_RANGE, but still needs
bind(..., 0) to allocate and reserve a local port immediately.

IP_BIND_ADDRESS_NO_PORT is useful when the application can defer port
allocation until connect(), but it changes this behavior: bind(..., 0)
does not reserve a port in that case. So it is not a replacement for
applications that need the local port before connect(), for example to
publish it to another component or set up local policy.

This patch is also intended to keep the bind(..., 0) path consistent with
Eric's earlier change in __inet_hash_connect().

Thanks,
Xuanqiang
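
A minimal userspace sketch of the use case described above, assuming a kernel that supports IP_LOCAL_PORT_RANGE (added in Linux 6.3) and a netns ephemeral range that covers the example range 40000-40099. bind_in_local_range() is a hypothetical helper; the fallback define of 51 mirrors <linux/in.h> for libcs that do not expose the option, and the range is packed into a u32 with the low port in bits 0-15 and the high port in bits 16-31.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_LOCAL_PORT_RANGE
#define IP_LOCAL_PORT_RANGE 51	/* value from <linux/in.h> (Linux 6.3+) */
#endif

/*
 * Hypothetical helper: restrict the socket's ephemeral range to
 * [lo, hi] with IP_LOCAL_PORT_RANGE, then bind(, 0) WITHOUT
 * IP_BIND_ADDRESS_NO_PORT so the kernel picks and reserves a port right
 * away. Returns the reserved port and stores the socket in *fd_out, or
 * returns -1 (and leaves *fd_out alone) if setsockopt() or bind() fails.
 * The caller must close(*fd_out) to release the port.
 */
static int bind_in_local_range(uint16_t lo, uint16_t hi, int *fd_out)
{
	uint32_t range = ((uint32_t)hi << 16) | lo;	/* hi: bits 16-31 */
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int fd;

	assert(lo <= hi);
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_IP, IP_LOCAL_PORT_RANGE,
		       &range, sizeof(range)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(0);	/* ask the kernel to choose */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;

	/* The port is already reserved here, before any connect(). */
	if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
		goto fail;

	*fd_out = fd;
	return ntohs(addr.sin_port);
fail:
	close(fd);
	return -1;
}
```

On a supporting kernel, getsockname() reports a non-zero port inside [40000, 40099] right after bind(). Setting IP_BIND_ADDRESS_NO_PORT before bind() would instead leave the port at 0 until connect().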

* Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: Kuniyuki Iwashima @ 2026-06-29 18:21 UTC
To: luoxuanqiang
Cc: Eric Dumazet, Neal Cardwell, netdev, David S . Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, luoxuanqiang
On Fri, Jun 26, 2026 at 7:00 PM luoxuanqiang <xuanqiang.luo@linux.dev> wrote:
> > On Jun 27, 2026, at 07:40, Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
> >>
> >> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
> >>
> >> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
> >> range on a per-socket basis. __inet_hash_connect() already treats such a
> >> range as an explicit application partition and scans it with step 1 [1].
> >>
> >> Do the same in inet_csk_find_open_port():
> >
> > What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
> > without IP_BIND_ADDRESS_NO_PORT ?
> Hi Kuniyuki,
>
> Thanks for the question!
>
> The use case is when an application wants to restrict ephemeral port
> allocation to a socket-local IP_LOCAL_PORT_RANGE, but still needs
> bind(..., 0) to allocate and reserve a local port immediately.

IP_LOCAL_PORT_RANGE was introduced for connect().

Unlike connect(), bind() occupies the port without SO_REUSEADDR/PORT,
so I don't think the step 1 or 2 makes any difference.
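
To illustrate the exclusivity point above, here is a minimal sketch with two TCP sockets on 127.0.0.1 and neither SO_REUSEADDR nor SO_REUSEPORT set. bind_loopback() and second_bind_errno() are hypothetical helpers. Because a second bind() to an already-bound port is expected to fail with EADDRINUSE, the scan step only affects the order in which free ports are tried, not which ports a bind() may end up holding.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket bound to 127.0.0.1:@port, or -1. */
static int bind_loopback(uint16_t port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		int saved = errno;	/* keep bind()'s errno across close() */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/*
 * bind(, 0) on socket A, then bind socket B to A's port with no reuse
 * options. Returns B's bind() errno (expected EADDRINUSE), 0 if B's
 * bind() unexpectedly succeeded, or -1 if setting up A failed.
 */
static int second_bind_errno(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int a, b, ret;

	a = bind_loopback(0);
	if (a < 0)
		return -1;
	if (getsockname(a, (struct sockaddr *)&addr, &len) < 0) {
		close(a);
		return -1;
	}
	assert(ntohs(addr.sin_port) != 0);	/* bind(, 0) reserved a port */

	b = bind_loopback(ntohs(addr.sin_port));
	ret = (b < 0) ? errno : 0;
	if (b >= 0)
		close(b);
	close(a);
	return ret;
}
```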

* Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
From: luoxuanqiang @ 2026-06-30 5:56 UTC
To: Kuniyuki Iwashima
Cc: Eric Dumazet, Neal Cardwell, netdev, David S . Miller,
Jakub Kicinski, Paolo Abeni, Simon Horman, luoxuanqiang
On 2026/6/30 02:21, Kuniyuki Iwashima wrote:
> On Fri, Jun 26, 2026 at 7:00 PM luoxuanqiang <xuanqiang.luo@linux.dev> wrote:
>>> On Jun 27, 2026, at 07:40, Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>
>>> On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
>>>> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
>>>>
>>>> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
>>>> range on a per-socket basis. __inet_hash_connect() already treats such a
>>>> range as an explicit application partition and scans it with step 1 [1].
>>>>
>>>> Do the same in inet_csk_find_open_port():
>>> What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
>>> without IP_BIND_ADDRESS_NO_PORT ?
>> Hi Kuniyuki,
>>
>> Thanks for the question!
>>
>> The use case is when an application wants to restrict ephemeral port
>> allocation to a socket-local IP_LOCAL_PORT_RANGE, but still needs
>> bind(..., 0) to allocate and reserve a local port immediately.
> IP_LOCAL_PORT_RANGE was introduced for connect().
>
> Unlike connect(), bind() occupies the port without SO_REUSEADDR/PORT,
> so I don't think the step 1 or 2 makes any difference.
>

Hi Kuniyuki,

That's a fair point: bind() takes exclusive ownership of the port
without SO_REUSEADDR/PORT, so the parity split only changes the scan
order, not the set of ports bind() can pick. Correctness-wise there is
no difference between step 1 and step 2 here.

There are a few smaller things that made me think it is still
worth aligning the two paths, though:

- inet_csk_find_open_port() already consumes the narrowed range from
IP_LOCAL_PORT_RANGE since commit 91d0b78c5177f ("inet: Add
IP_LOCAL_PORT_RANGE socket option"), so the bind path isn't
insulated from the option. It just didn't pick up the relaxed scan
step that __inet_hash_connect() got in commit 207184853dbdb
("tcp/dccp: change source port selection at connect() time"). Eric
even noted at the end of that commit:

"A similar change can be done in inet_csk_find_open_port() if
needed."

So this felt more like completing the companion change than adding
something new.

- The entropy argument from commit 207184853dbdb applies equally here:
the parity split drops one bit of the 16-bit sport for RSS hashing.
Whether the port came from connect() or bind() doesn't matter to the
NIC, and losing that bit hurts more when the application has already
shrunk its port space with IP_LOCAL_PORT_RANGE.

- When IP_BIND_ADDRESS_NO_PORT is not set, plain bind(, 0) reserves a
port immediately through inet_csk_find_open_port(). If the same
application also uses connect() on other sockets within the same
IP_LOCAL_PORT_RANGE, connect() now scans the full range while bind()
still tries ports of the opposite parity to @low first, so the
parity heuristic works against
itself inside the application's own partition.
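
The first point relies on how the per-socket range is narrowed before either path scans it. Below is a userspace model of that logic, based on a reading of mainline's inet_sk_get_local_port_range() after it started returning a bool; treat it as an approximation and verify against the kernel source. model_local_port_range() is a hypothetical name, and the real helper reads the netns range and the socket's option value itself rather than taking them as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of inet_sk_get_local_port_range(). @sk_range uses
 * the IP_LOCAL_PORT_RANGE packing (low in bits 0-15, high in bits
 * 16-31; 0 means "not set"). [net_lo, net_hi] is the netns range.
 */
static bool model_local_port_range(int net_lo, int net_hi, uint32_t sk_range,
				   int *low, int *high)
{
	int lo = net_lo, hi = net_hi;
	bool local_range = false;

	assert(net_lo <= net_hi);
	if (sk_range) {
		int sk_lo = sk_range & 0xffff;
		int sk_hi = sk_range >> 16;

		/* Each bound is taken only if it lies inside the netns range. */
		if (lo <= sk_lo && sk_lo <= hi)
			lo = sk_lo;
		if (lo <= sk_hi && sk_hi <= hi)
			hi = sk_hi;
		local_range = true;	/* becomes local_ports in the patch */
	}
	*low = lo;
	*high = hi;
	return local_range;
}
```

With the default netns range 32768-60999, a socket range of 40000-40099 is used as-is; a socket range whose low end (1000) lies outside the netns range keeps the netns low bound 32768 but still narrows the high bound to 40099. In both cases the model returns true, which is what selects step 1 in the patch.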

Does that make sense, or am I over-thinking the consistency angle?

Thanks,
Xuanqiang