All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Eric Dumazet <edumazet@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Neal Cardwell <ncardwell@google.com>,
	 Kuniyuki Iwashima <kuniyu@google.com>
Cc: netdev@vger.kernel.org,  kernel-team@cloudflare.com,
	 Lee Valentine <lvalentine@cloudflare.com>
Subject: Re: [PATCH net-next v2 1/2] tcp: Consider every port when connecting with IP_LOCAL_PORT_RANGE
Date: Tue, 08 Jul 2025 11:13:33 +0200	[thread overview]
Message-ID: <878qkzgi4i.fsf@cloudflare.com> (raw)
In-Reply-To: <20250703-connect-port-search-harder-v2-1-d51bce6bd0a6@cloudflare.com> (Jakub Sitnicki's message of "Thu, 3 Jul 2025 17:59:36 +0200")

On Thu, Jul 03, 2025 at 05:59 PM +02, Jakub Sitnicki wrote:
> Situation
> ---------
>
> We observe the following scenario in production:
>
>                                                   inet_bind_bucket
>                                                 state for port 54321
>                                                 --------------------
>
>                                                 (bucket doesn't exist)
>
> // Process A opens a long-lived connection:
> s1 = socket(AF_INET, SOCK_STREAM)
> s1.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s1.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
> s1.bind(192.0.2.10, 0)
> s1.connect(192.51.100.1, 443)
>                                                 tb->reuse = -1
>                                                 tb->reuseport = -1
> s1.getsockname() -> 192.0.2.10:54321
> s1.send()
> s1.recv()
> // ... s1 stays open.
>
> // Process B opens a short-lived connection:
> s2 = socket(AF_INET, SOCK_STREAM)
> s2.setsockopt(SO_REUSEADDR)
> s2.bind(192.0.2.20, 0)
>                                                 tb->reuse = 0
>                                                 tb->reuseport = 0
> s2.connect(192.51.100.2, 53)
> s2.getsockname() -> 192.0.2.20:54321
> s2.send()
> s2.recv()
> s2.close()
>
>                                                 // bucket remains in this
>                                                 // state even though port
>                                                 // was released by s2
>                                                 tb->reuse = 0
>                                                 tb->reuseport = 0
>
> // Process A attempts to open another connection
> // when there is connection pressure from
> // 192.0.2.30:54000..54500 to 192.51.100.1:443.
> // Assume only port 54321 is still available.
>
> s3 = socket(AF_INET, SOCK_STREAM)
> s3.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s3.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
> s3.bind(192.0.2.30, 0)
> s3.connect(192.51.100.1, 443) -> EADDRNOTAVAIL (99)
>
> Problem
> -------
>
> We end up in a state where Process A can't reuse ephemeral port 54321 for
> as long as there are sockets, like s1, that keep the bind bucket alive. The
> bucket does not return to "reusable" state even when all sockets which
> blocked it from reuse, like s2, are gone.
>
> The ephemeral port becomes available for use again only after all sockets
> bound to it are gone and the bind bucket is destroyed.
>
> Programs which behave like Process B in this scenario - that is, binding to
> an IP address without setting IP_BIND_ADDRESS_NO_PORT - might be considered
> poorly written. However, the reality is that such implementation is not
> actually uncommon. Trying to fix each and every such program is like
> playing whack-a-mole.
>
> For instance, it could be any software using Golang's net.Dialer with
> LocalAddr provided:
>
>         dialer := &net.Dialer{
>                 LocalAddr: &net.TCPAddr{IP: srcIP},
>         }
>         conn, err := dialer.Dial("tcp4", dialTarget)
>
> Or even a ubiquitous tool like dig when using a specific local address:
>
>         $ dig -b 127.1.1.1 +tcp +short example.com
>
> Hence, we are proposing a systematic fix in the network stack itself.
>
> Solution
> --------
>
> If there is no IP address conflict with any socket bound to a given local
> port, then from the protocol's perspective, the port can be safely shared.
>
> With that in mind, modify the port search during connect(), that is
> __inet_hash_connect, to consider all bind buckets (ports) when looking for
> a local port for egress.
>
> To achieve this, add an extra walk over bhash2 buckets for the port to
> check for IP conflicts. The additional walk is not free, so perform it only
> once per port - during the second phase of conflict checking, when the
> bhash bucket is locked.
>
> We enable this changed behavior only if the IP_LOCAL_PORT_RANGE socket
> option is set. The rationale is that users are likely to care about using
> every possible local port only when they have deliberately constrained the
> ephemeral port range.

I was wondering what the maintainers' outlook on this change is:

Does this sound acceptable?
Or should we start looking in a different direction?

Thanks,
-jkbs

[...]

  parent reply	other threads:[~2025-07-08  9:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-03 15:59 [PATCH net-next v2 1/2] tcp: Consider every port when connecting with IP_LOCAL_PORT_RANGE Jakub Sitnicki
2025-07-03 15:59 ` [PATCH net-next v2 2/2] selftests/net: Cover port sharing scenarios " Jakub Sitnicki
2025-07-08  9:13 ` Jakub Sitnicki [this message]
2025-07-08  9:35   ` [PATCH net-next v2 1/2] tcp: Consider every port when connecting " Eric Dumazet
2025-07-08 11:38 ` Eric Dumazet
2025-07-08 14:35   ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878qkzgi4i.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=lvalentine@cloudflare.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.