Netdev List
From: luoxuanqiang <xuanqiang.luo@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	netdev@vger.kernel.org, "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	luoxuanqiang <luoxuanqiang@kylinos.cn>
Subject: Re: [PATCH net-next v1] tcp/dccp: avoid parity split for socket-local bind range
Date: Tue, 30 Jun 2026 13:56:37 +0800
Message-ID: <3b746d04-33f3-4d86-bc8c-292a341557ac@linux.dev>
In-Reply-To: <CAAVpQUA+yKA773dtgr2E=mu4SbMCjWZk83sFYrWS6f1sDphdRQ@mail.gmail.com>


On 2026/6/30 02:21, Kuniyuki Iwashima wrote:
> On Fri, Jun 26, 2026 at 7:00 PM luoxuanqiang <xuanqiang.luo@linux.dev> wrote:
>>> On June 27, 2026, at 07:40, Kuniyuki Iwashima <kuniyu@google.com> wrote:
>>>
>>> On Fri, Jun 26, 2026 at 2:40 AM <xuanqiang.luo@linux.dev> wrote:
>>>> From: luoxuanqiang <luoxuanqiang@kylinos.cn>
>>>>
>>>> IP_LOCAL_PORT_RANGE lets applications override the netns ephemeral port
>>>> range on a per-socket basis.  __inet_hash_connect() already treats such a
>>>> range as an explicit application partition and scans it with step 1 [1].
>>>>
>>>> Do the same in inet_csk_find_open_port():
>>> What's the use case of IP_LOCAL_PORT_RANGE + bind(, 0)
>>> without IP_BIND_ADDRESS_NO_PORT ?
>> Hi Kuniyuki,
>>
>> Thanks for the question!
>>
>> The use case is when an application wants to restrict ephemeral port
>> allocation to a socket-local IP_LOCAL_PORT_RANGE, but still needs
>> bind(..., 0) to allocate and reserve a local port immediately.
> IP_LOCAL_PORT_RANGE was introduced for connect().
>
> Unlike connect(), bind() occupies the port without SO_REUSEADDR/PORT,
> so I don't think the step 1 or 2 makes any difference.
>
Hi Kuniyuki,

That's a fair point: bind() takes exclusive ownership of the port
without SO_REUSEADDR/PORT, so the parity split only changes the scan
order, not the set of ports bind() can pick.  Correctness-wise there is
no difference between step 1 and step 2 here.
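
To make the "same set, different order" point concrete, here is a rough
userspace model of the two scan strategies.  It is only a sketch, not the
kernel code: the function names are made up for illustration, the window
is assumed to hold an even number of ports, and the low bound is assumed
to be even so that the first parity pass covers the odd ports.

```python
def scan_order_step2(low, high, offset):
    """Order in which a two-pass parity scan visits [low, high].

    Simplified model: the first pass walks ports whose parity differs
    from `low`, the second pass walks the remaining parity.
    """
    span = high - low + 1
    if span > 1:
        span &= ~1              # model assumes an even-sized window
    offset = (offset % span) | 1  # force the first pass onto the "other" parity
    order = []
    for start in (offset, offset - 1):
        port = low + start
        for _ in range(0, span, 2):
            if port >= low + span:
                port -= span    # wrap around inside the window
            order.append(port)
            port += 2
    return order


def scan_order_step1(low, high, offset):
    """Order in which a plain step-1 scan visits [low, high]."""
    span = high - low + 1
    return [low + (offset + i) % span for i in range(span)]


if __name__ == "__main__":
    print(scan_order_step2(40000, 40007, 2))
    print(scan_order_step1(40000, 40007, 2))
```

Both functions return every port in the window exactly once; only the
order differs, which is why the choice of step does not change what
bind() is able to pick.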

There are a few smaller things that made me think it is still worth
aligning the two paths, though:

- inet_csk_find_open_port() already consumes the narrowed range from
   IP_LOCAL_PORT_RANGE since commit 91d0b78c5177f ("inet: Add
   IP_LOCAL_PORT_RANGE socket option"), so the bind path isn't
   insulated from the option.  It just didn't pick up the relaxed scan
   step that __inet_hash_connect() got in commit 207184853dbdb
   ("tcp/dccp: change source port selection at connect() time").  Eric
   even noted at the end of that commit:

     "A similar change can be done in inet_csk_find_open_port() if
      needed."

   So this felt more like completing the companion change than adding
   something new.

- The entropy argument from commit 207184853dbdb applies equally here:
   the parity split drops one bit of the 16-bit sport for RSS hashing.
   Whether the port came from connect() or bind() doesn't matter to the
   NIC, and losing that bit hurts more when the application has already
   shrunk its port space with IP_LOCAL_PORT_RANGE.

- When IP_BIND_ADDRESS_NO_PORT is not set, plain bind(, 0) reserves a
   port immediately through inet_csk_find_open_port().  If the same
   application also uses connect() on other sockets within the same
   IP_LOCAL_PORT_RANGE, connect() now scans the full range while bind()
   still biases toward odd ports, so the parity heuristic works against
   itself inside the application's own partition.
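
As a back-of-the-envelope illustration of the entropy item above, the
sketch below compares a uniform pick over a whole narrowed range with a
pick restricted to one parity.  It is idealized: it assumes the choice is
uniform and that the first parity pass always finds a free port, and the
helper names are hypothetical.

```python
import math


def selection_bits(ports):
    """Bits of entropy in a uniform pick from `ports`."""
    return math.log2(len(set(ports)))


def first_pass_ports(low, high):
    """Ports in [low, high] whose parity differs from `low`."""
    return [p for p in range(low, high + 1) if (p - low) & 1]


if __name__ == "__main__":
    low, high = 40000, 40007    # an 8-port socket-local range, chosen arbitrarily
    full = selection_bits(range(low, high + 1))
    half = selection_bits(first_pass_ports(low, high))
    print(f"full range: {full} bits, one parity: {half} bits")
```

With an 8-port range the full scan has 3 bits to work with and the
single-parity pass has 2, so the narrower the range, the larger the
share of the available entropy that one bit represents.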

Does that make sense, or am I over-thinking the consistency angle?

Thanks,
Xuanqiang

>> IP_BIND_ADDRESS_NO_PORT is useful when the application can defer port
>> allocation until connect(), but it changes this behavior: bind(..., 0)
>> does not reserve a port in that case. So it is not a replacement for
>> applications that need the local port before connect(), for example to
>> publish it to another component or set up local policy.
>>
>> This patch is also intended to keep the bind(..., 0) path consistent with
>> Eric's earlier change in __inet_hash_connect().
>>
>> Thanks,
>> Xuanqiang
