From: hawk@kernel.org
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	shuah@kernel.org, linux-kselftest@vger.kernel.org,
	hawk@kernel.org, ivan@cloudflare.com, kernel-team@cloudflare.com
Subject: [RFC PATCH net-next 1/4] ipv4: make inet_addr_lst hash table size configurable
Date: Tue, 31 Mar 2026 23:07:36 +0200
Message-ID: <20260331210739.3998753-2-hawk@kernel.org>
In-Reply-To: <20260331210739.3998753-1-hawk@kernel.org>

From: Jesper Dangaard Brouer <hawk@kernel.org>

On servers with many IPv4 addresses, __ip_dev_find() becomes visible in
perf profiles on the unconnected UDP sendmsg path. The call chain is:

  udpv6_sendmsg / udp_sendmsg
    ip_route_output_flow
      ip_route_output_key_hash_rcu
        __ip_dev_find              <-- source address validation

__ip_dev_find() calls inet_lookup_ifaddr_rcu() which walks a hash chain
in inet_addr_lst. With the current fixed table size of 256 buckets, a
host with ~700 IPv4 addresses averages ~2.7 entries per chain, adding
unnecessary cache misses under RCU on every unconnected send.

Add CONFIG_INET_ADDR_HASH_BUCKETS (default 256, range 64-16384, EXPERT)
so hosts with many addresses can size the table appropriately. A value
that is not a power of 2 is rounded up to the next power of 2 at compile
time via order_base_2(). Memory cost is one hlist_head (a single
pointer) per bucket per net namespace.

Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 net/ipv4/Kconfig   | 16 ++++++++++++++++
 net/ipv4/devinet.c |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index df922f9f5289..3c5e5e74b3e4 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -402,6 +402,22 @@ config INET_IPCOMP
 
 	  If unsure, say Y.
 
+config INET_ADDR_HASH_BUCKETS
+	int "IPv4 address hash table size" if EXPERT
+	range 64 16384
+	default 256
+	help
+	  Number of hash buckets for looking up local IPv4 addresses,
+	  e.g. during route output to validate the source address via
+	  __ip_dev_find().  Rounded up to the nearest power of 2.
+
+	  Hosts with many IPv4 addresses benefit from a larger table to reduce
+	  hash chain lengths. This is particularly relevant when sending using
+	  unconnected UDP sockets.
+
+	  The default of 256 is fine for most systems.  A value of 1024
+	  suits hosts with ~500+ addresses.
+
 config INET_TABLE_PERTURB_ORDER
 	int "INET: Source port perturbation table size (as power of 2)" if EXPERT
 	default 16
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 58fe7cb69545..9e3da06fb618 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -108,7 +108,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
 	[IFA_PROTO]		= { .type = NLA_U8 },
 };
 
-#define IN4_ADDR_HSIZE_SHIFT	8
+#define IN4_ADDR_HSIZE_SHIFT	order_base_2(CONFIG_INET_ADDR_HASH_BUCKETS)
 #define IN4_ADDR_HSIZE		(1U << IN4_ADDR_HSIZE_SHIFT)
 
 static u32 inet_addr_hash(const struct net *net, __be32 addr)
-- 
2.43.0


Thread overview: 6+ messages
2026-03-31 21:07 [RFC PATCH net-next 0/4] ipv4/ipv6: local address lookup scaling hawk
2026-03-31 21:07 ` hawk [this message]
2026-03-31 21:07 ` [RFC PATCH net-next 2/4] ipv6: make inet6_addr_lst hash table size configurable hawk
2026-03-31 21:07 ` [RFC PATCH net-next 3/4] ipv4: convert inet_addr_lst to rhltable for dynamic resizing hawk
2026-03-31 21:07 ` [RFC PATCH net-next 4/4] selftests: net: add IPv4 address lookup stress test hawk
2026-04-03 22:35 ` [RFC PATCH net-next 0/4] ipv4/ipv6: local address lookup scaling David Ahern