From: hawk@kernel.org
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
shuah@kernel.org, linux-kselftest@vger.kernel.org,
hawk@kernel.org, ivan@cloudflare.com, kernel-team@cloudflare.com
Subject: [RFC PATCH net-next 1/4] ipv4: make inet_addr_lst hash table size configurable
Date: Tue, 31 Mar 2026 23:07:36 +0200
Message-ID: <20260331210739.3998753-2-hawk@kernel.org>
In-Reply-To: <20260331210739.3998753-1-hawk@kernel.org>
From: Jesper Dangaard Brouer <hawk@kernel.org>
On servers with many IPv4 addresses, __ip_dev_find() becomes visible in
perf profiles on the unconnected UDP sendmsg path. The call chain is:
  udpv6_sendmsg / udp_sendmsg
    ip_route_output_flow
      ip_route_output_key_hash_rcu
        __ip_dev_find    <-- source address validation
__ip_dev_find() calls inet_lookup_ifaddr_rcu(), which walks a hash chain
in inet_addr_lst. With the current fixed table size of 256 buckets, a
host with ~700 IPv4 addresses averages ~2.7 entries per chain (700 / 256),
adding unnecessary cache misses under RCU on every unconnected send.
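
For illustration only (not part of the patch), the chain-length figure
above is just the load factor of the table. A minimal userspace sketch,
assuming the hash spreads addresses uniformly across buckets; the helper
names avg_chain_len() and print_chain_table() are hypothetical and do not
exist in the kernel:

```c
#include <assert.h>
#include <stdio.h>

/* Load factor: average number of addresses per bucket, assuming the
 * hash distributes addresses uniformly (an idealization).
 */
static double avg_chain_len(unsigned int nr_addrs, unsigned int nr_buckets)
{
	assert(nr_buckets > 0);
	return (double)nr_addrs / (double)nr_buckets;
}

/* Print the load factor for a few table sizes within the proposed
 * 64-16384 range.
 */
static void print_chain_table(unsigned int nr_addrs)
{
	static const unsigned int sizes[] = { 64, 256, 1024, 4096, 16384 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%5u buckets: %.2f entries/chain\n",
		       sizes[i], avg_chain_len(nr_addrs, sizes[i]));
}
```

Calling print_chain_table(700) shows the average dropping from about 2.73
entries per chain at 256 buckets to about 0.68 at 1024 buckets. Real chain
lengths depend on how inet_addr_hash() distributes the actual addresses, so
this is only a rough estimate.
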
Add CONFIG_INET_ADDR_HASH_BUCKETS (default 256, range 64-16384, EXPERT)
so hosts with many addresses can size the table appropriately. The value
is rounded up to the nearest power of 2 at compile time via
order_base_2(). Memory cost is one hlist_head pointer per bucket per net
namespace.
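
For illustration only (not part of the patch), the rounding and the memory
cost can be sketched in userspace as below. order_base_2_sketch() is a
hypothetical stand-in for the kernel's order_base_2() macro, and
hash_buckets() and table_bytes() are likewise made-up helper names; the
sketch assumes struct hlist_head holds a single pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's order_base_2(): returns
 * ceil(log2(n)), with n == 0 and n == 1 both yielding 0.
 */
static unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int order = 0;

	while (order < 8 * sizeof(n) - 1 && (1UL << order) < n)
		order++;
	return order;
}

/* Effective bucket count for a configured value: the configured value
 * rounded up to a power of 2, mirroring IN4_ADDR_HSIZE in the patch.
 */
static unsigned long hash_buckets(unsigned long configured)
{
	/* Same bounds as the Kconfig "range 64 16384". */
	assert(configured >= 64 && configured <= 16384);
	return 1UL << order_base_2_sketch(configured);
}

/* Table size in bytes per net namespace: one pointer-sized
 * hlist_head per bucket.
 */
static size_t table_bytes(unsigned long configured)
{
	return hash_buckets(configured) * sizeof(void *);
}
```

Under these assumptions, a configured value of 1000 yields 1024 buckets,
and on a build with 8-byte pointers the table costs 2 KiB per net namespace
at the default of 256 buckets and 128 KiB at the maximum of 16384.
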
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
net/ipv4/Kconfig | 16 ++++++++++++++++
net/ipv4/devinet.c | 2 +-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index df922f9f5289..3c5e5e74b3e4 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -402,6 +402,22 @@ config INET_IPCOMP
If unsure, say Y.
+config INET_ADDR_HASH_BUCKETS
+ int "IPv4 address hash table size" if EXPERT
+ range 64 16384
+ default 256
+ help
+ Number of hash buckets for looking up local IPv4 addresses,
+ e.g. during route output to validate the source address via
+ __ip_dev_find(). Rounded up to the nearest power of 2.
+
+ Hosts with many IPv4 addresses benefit from a larger table to reduce
+ hash chain lengths. This is particularly relevant when sending using
+ unconnected UDP sockets.
+
+ The default of 256 is fine for most systems. A value of 1024
+ suits hosts with ~500+ addresses.
+
config INET_TABLE_PERTURB_ORDER
int "INET: Source port perturbation table size (as power of 2)" if EXPERT
default 16
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 58fe7cb69545..9e3da06fb618 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -108,7 +108,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
[IFA_PROTO] = { .type = NLA_U8 },
};
-#define IN4_ADDR_HSIZE_SHIFT 8
+#define IN4_ADDR_HSIZE_SHIFT order_base_2(CONFIG_INET_ADDR_HASH_BUCKETS)
#define IN4_ADDR_HSIZE (1U << IN4_ADDR_HSIZE_SHIFT)
static u32 inet_addr_hash(const struct net *net, __be32 addr)
--
2.43.0
Thread overview: 6+ messages
2026-03-31 21:07 [RFC PATCH net-next 0/4] ipv4/ipv6: local address lookup scaling hawk
2026-03-31 21:07 ` hawk [this message]
2026-03-31 21:07 ` [RFC PATCH net-next 2/4] ipv6: make inet6_addr_lst hash table size configurable hawk
2026-03-31 21:07 ` [RFC PATCH net-next 3/4] ipv4: convert inet_addr_lst to rhltable for dynamic resizing hawk
2026-03-31 21:07 ` [RFC PATCH net-next 4/4] selftests: net: add IPv4 address lookup stress test hawk
2026-04-03 22:35 ` [RFC PATCH net-next 0/4] ipv4/ipv6: local address lookup scaling David Ahern