All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org,  "David S. Miller" <davem@davemloft.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Kuniyuki Iwashima <kuniyu@google.com>,
	 Neal Cardwell <ncardwell@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	kernel-team@cloudflare.com,
	 Lee Valentine <lvalentine@cloudflare.com>
Subject: Re: [PATCH RFC net-next 1/2] tcp: Update bind bucket state on port release
Date: Fri, 08 Aug 2025 14:06:25 +0200	[thread overview]
Message-ID: <87cy9681by.fsf@cloudflare.com> (raw)
In-Reply-To: <CANn89iJd1Wsc552rYjSQSKXMZ92PmU0NczJp+Y-0n07Njaoc8A@mail.gmail.com> (Eric Dumazet's message of "Fri, 8 Aug 2025 04:43:57 -0700")

On Fri, Aug 08, 2025 at 04:43 AM -07, Eric Dumazet wrote:
> On Fri, Aug 8, 2025 at 2:10 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Currently, when an inet_bind_bucket enters a state where fastreuse >= 0 or
>> fastreuseport >= 0, after a socket explicitly binds to a port, it stays in
>> that state until all associated sockets are removed and the bucket is
>> destroyed.
>>
>> In this state, the bucket is skipped during ephemeral port selection in
>> connect(). For applications using a small ephemeral port range (via
>> IP_LOCAL_PORT_RANGE option), this can lead to quicker port exhaustion
>> because "blocked" buckets remain excluded from reuse.
>>
>> The reason for not updating the bucket state on port release is unclear. It
>> may have been a performance trade-off to avoid scanning bucket owners, or
>> simply an oversight.
>>
>> Address it by recalculating the bind bucket state when a socket releases a
>> port. To minimize overhead, use a divide-and-conquer strategy: duplicate
>> the (fastreuse, fastreuseport) state in each inet_bind2_bucket. On port
>> release, we only need to scan the relevant port-addr bucket, and the
>> overall port bucket state can be derived from those.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/net/inet_connection_sock.h |  5 +++--
>>  include/net/inet_hashtables.h      |  2 ++
>>  include/net/inet_sock.h            |  2 ++
>>  include/net/inet_timewait_sock.h   |  3 ++-
>>  include/net/tcp.h                  | 12 ++++++++++++
>>  net/ipv4/inet_connection_sock.c    | 12 ++++++++----
>>  net/ipv4/inet_hashtables.c         | 31 ++++++++++++++++++++++++++++++-
>>  net/ipv4/inet_timewait_sock.c      |  1 +
>>  8 files changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 1735db332aab..072347f16483 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -322,8 +322,9 @@ int inet_csk_listen_start(struct sock *sk);
>>  void inet_csk_listen_stop(struct sock *sk);
>>
>>  /* update the fast reuse flag when adding a socket */
>> -void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
>> -                              struct sock *sk);
>> +void inet_csk_update_fastreuse(const struct sock *sk,
>> +                              struct inet_bind_bucket *tb,
>> +                              struct inet_bind2_bucket *tb2);
>>
>>  struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
>>
>> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
>> index 19dbd9081d5a..d6676746dabf 100644
>> --- a/include/net/inet_hashtables.h
>> +++ b/include/net/inet_hashtables.h
>> @@ -108,6 +108,8 @@ struct inet_bind2_bucket {
>>         struct hlist_node       bhash_node;
>>         /* List of sockets hashed to this bucket */
>>         struct hlist_head       owners;
>> +       signed char             fastreuse;
>> +       signed char             fastreuseport;
>>  };
>>
>>  static inline struct net *ib_net(const struct inet_bind_bucket *ib)
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index 1086256549fa..73f1dbc1a04b 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -279,6 +279,8 @@ enum {
>>         INET_FLAGS_RTALERT_ISOLATE = 28,
>>         INET_FLAGS_SNDFLOW      = 29,
>>         INET_FLAGS_RTALERT      = 30,
>> +       /* socket bound to a port at connect() time */
>> +       INET_FLAGS_LAZY_BIND    = 31,
>
> I am not a huge fan of this name. I think we already use something
> like autobind.

Now that I think of it - it is just another autobind path. Will change.

> I have not seen where you clear this bit, once it has been set, it
> sticks forever ?
>
> Perhaps add in the selftest something to call tcp_disconnect() :)
>
> fd = socket()
> connect(fd ...) // this sets the 'autobind' bit
> connect(fd ... AF_UNSPEC ..)  // disconnects
> // reuse fd
> bind(fd, .... port=X)
> connect(fd ...) // after this point 'autobind' should not be set.

You're right. That's not handled correctly at all. The bit should be
cleared on disconnect. Completely missed that scenario.

Thanks for reviewing!

  reply	other threads:[~2025-08-08 12:06 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-08  9:10 [PATCH RFC net-next 0/2] tcp: Update bind bucket state on port release Jakub Sitnicki
2025-08-08  9:10 ` [PATCH RFC net-next 1/2] " Jakub Sitnicki
2025-08-08 11:43   ` Eric Dumazet
2025-08-08 12:06     ` Jakub Sitnicki [this message]
2025-08-08  9:10 ` [PATCH RFC net-next 2/2] selftests/net: Test tcp port reuse after unbinding a socket Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cy9681by.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=lvalentine@cloudflare.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.