* [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables
@ 2025-11-05 16:47 Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 01/11] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl Florian Westphal
` (10 more replies)
0 siblings, 11 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
This patch set moves nf_conntrack to use separate pernet hash tables.
Only partially tested, not yet for merge.
Not converted here and left for a later date:
- we still use a global array of locks to guard hash tables.
- only one (global) hash secret
- memory accounting is not resolved here: after this patchset, a
single netns can hog system memory as there are no limits (anymore).
This will need memcg integration for nf_conn structs, but the patchset
that does it isn't complete yet.
An alternative would be to keep the constraint added in the first patch,
i.e. let nf_conntrack_max be bound by the init_net setting
(and then maybe remove the constraint later).
- ovs/tc 100% untested.
- bpf ct helpers untested.
Patches 1 and 2 are not dependent on the rest and could go in
regardless of the rest of the series (after more testing).
The only upside of this patchset that I can see is faster conntrack dumps:
we no longer need to skip foreign netns-owned entries.
The downside is the extra indirection required for the pernet table
accesses.
We also need to be careful on lookup and iteration: we can still
observe foreign netns conntracks due to SLAB_TYPESAFE_BY_RCU reuse.
This is also the reason why nf_conn retains the struct net pointer;
I think that's better than going back to pernet slab caches.
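To illustrate the SLAB_TYPESAFE_BY_RCU point, here is a minimal userspace
sketch of the "take a reference, then re-check the key" pattern that a lookup
needs when objects can be recycled for another netns while a reader still
holds a pointer. Everything in it (struct obj, obj_get_unless_zero(),
lookup_checked(), the integer netns_id and tuple fields) is a hypothetical
stand-in and not the conntrack code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a recyclable, refcounted object. */
struct obj {
	atomic_uint refcnt;	/* 0 means the object is being freed/reused */
	int netns_id;		/* stand-in for the struct net pointer */
	int tuple;		/* stand-in for the lookup key */
};

/* Take a reference only if the object is still alive. */
static bool obj_get_unless_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* on failure, 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Model only: drops a reference, does not free anything. */
static void obj_put(struct obj *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
}

/*
 * 'cand' was found by walking a hash chain. It may have been recycled
 * for another netns in the meantime, so the key (including the netns)
 * is compared again once the reference is held.
 */
static struct obj *lookup_checked(struct obj *cand, int netns_id, int tuple)
{
	assert(cand != NULL);

	if (!obj_get_unless_zero(cand))
		return NULL;

	if (cand->netns_id != netns_id || cand->tuple != tuple) {
		obj_put(cand);
		return NULL;
	}
	return cand;
}
```

The netns comparison in the re-check is only possible because the object
itself records which netns it belongs to.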
Florian Westphal (10):
netfilter: conntrack: don't schedule gc worker when table is empty
tests: netfilter: conntrack_resize: prepare for pernet conntrack table
netfilter: conntrack: pass pointer to buckets instead of index
netfilter: conntrack: split hashtable auto-size to helper function
netfilter: conntrack: move nf_conntrack_hash to struct net
netfilter: conntrack: init and start independent gc workers when needed
netfilter: conntrack: make nf_conntrack hash table pernet
netfilter: conntrack: delay conntrack hashtable allocation until needed
netfilter: conntrack: allow non-init-net to change table size
netfilter: nf_nat: make bysource hash table pernet
lvxiafei (1):
netfilter: netns nf_conntrack: per-netns
net.netfilter.nf_conntrack_max sysctl
.../networking/nf_conntrack-sysctl.rst | 8 +-
include/net/netfilter/nf_conntrack.h | 42 +-
include/net/netfilter/nf_conntrack_core.h | 1 -
include/net/netns/conntrack.h | 3 +
net/netfilter/nf_conntrack_bpf.c | 5 +
net/netfilter/nf_conntrack_core.c | 368 +++++++++++-------
net/netfilter/nf_conntrack_expect.c | 2 +-
net/netfilter/nf_conntrack_netlink.c | 30 +-
net/netfilter/nf_conntrack_proto.c | 6 +-
net/netfilter/nf_conntrack_standalone.c | 42 +-
net/netfilter/nf_nat_core.c | 100 +++--
net/openvswitch/conntrack.c | 6 +
net/sched/act_connmark.c | 6 +
net/sched/act_ct.c | 7 +
net/sched/act_ctinfo.c | 7 +
.../net/netfilter/conntrack_resize.sh | 26 +-
16 files changed, 434 insertions(+), 225 deletions(-)
--
2.51.0
* [RFC nf-next 01/11] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
@ 2025-11-05 16:47 ` Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 02/11] netfilter: conntrack: don't schedule gc worker when table is empty Florian Westphal
` (9 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
From: lvxiafei <lvxiafei@sensetime.com>
Support per-netns net.netfilter.nf_conntrack_max settings so that
ct_count can be limited more flexibly in different netns. A new netns
starts with the init_net value as its default.
A value set in a non-init netns only takes effect up to the init_net
limit: the effective limit is the smaller of the two.
Signed-off-by: lvxiafei <lvxiafei@sensetime.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
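Not part of the patch: a small userspace sketch of the clamping done by
nf_conntrack_max(net) and of the 95% early-drop threshold computed in
gc_worker(). The names ns_limits, effective_ct_max() and
early_drop_threshold() are hypothetical stand-ins for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-netns value (net->ct.sysctl_max). */
struct ns_limits {
	unsigned int sysctl_max;
};

/* Mirrors min(init_net.ct.sysctl_max, net->ct.sysctl_max). */
static unsigned int effective_ct_max(const struct ns_limits *init_ns,
				     const struct ns_limits *ns)
{
	assert(init_ns && ns);

	return ns->sysctl_max < init_ns->sysctl_max ?
	       ns->sysctl_max : init_ns->sysctl_max;
}

/* Same integer arithmetic as gc_worker(): divide first, then multiply. */
static unsigned int early_drop_threshold(unsigned int ct_max)
{
	return ct_max / 100u * 95u;
}
```

Because the division comes first, a limit below 100 yields a threshold of 0,
and gc_worker() treats nf_conntrack_max95 == 0 as "skip the early-drop check".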
.../networking/nf_conntrack-sysctl.rst | 7 +++++--
include/net/netfilter/nf_conntrack.h | 10 +++++++++-
include/net/netns/conntrack.h | 1 +
net/netfilter/nf_conntrack_core.c | 20 ++++++++++---------
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nf_conntrack_standalone.c | 7 ++++---
6 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index 35f889259fcd..eaf11ec1f4dc 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -92,13 +92,16 @@ nf_conntrack_log_invalid - INTEGER
Log invalid packets of a type specified by value.
nf_conntrack_max - INTEGER
- Maximum number of allowed connection tracking entries. This value is set
- to nf_conntrack_buckets by default.
+ Maximum number of allowed connection tracking entries per netns.
+ This value is set to nf_conntrack_buckets by default.
+
Note that connection tracking entries are added to the table twice -- once
for the original direction and once for the reply direction (i.e., with
the reversed address). This means that with default settings a maxed-out
table will have a average hash chain length of 2, not 1.
+ The limit of a non-init netns cannot exceed the init_net limit.
+
nf_conntrack_tcp_be_liberal - BOOLEAN
- 0 - disabled (default)
- not 0 - enabled
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index aa0a7c82199e..d404e1352737 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -329,7 +329,6 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
extern struct hlist_nulls_head *nf_conntrack_hash;
extern unsigned int nf_conntrack_htable_size;
extern seqcount_spinlock_t nf_conntrack_generation;
-extern unsigned int nf_conntrack_max;
/* must be called with rcu read lock held */
static inline void
@@ -369,6 +368,15 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net)
return net_generic(net, nf_conntrack_net_id);
}
+static inline unsigned int nf_conntrack_max(const struct net *net)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+ return min(init_net.ct.sysctl_max, net->ct.sysctl_max);
+#else
+ return 0;
+#endif
+}
+
int nf_ct_skb_network_trim(struct sk_buff *skb, int family);
int nf_ct_handle_fragments(struct net *net, struct sk_buff *skb,
u16 zone, u8 family, u8 *proto, u16 *mru);
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index ab74b5ed0b01..2e7707b7d349 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -89,6 +89,7 @@ struct netns_ct {
u8 sysctl_acct;
u8 sysctl_tstamp;
u8 sysctl_checksum;
+ unsigned int sysctl_max;
struct ip_conntrack_stat __percpu *stat;
struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 0b95f226f211..210792a2275d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -202,8 +202,6 @@ static void nf_conntrack_all_unlock(void)
unsigned int nf_conntrack_htable_size __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
-unsigned int nf_conntrack_max __read_mostly;
-EXPORT_SYMBOL_GPL(nf_conntrack_max);
seqcount_spinlock_t nf_conntrack_generation __read_mostly;
static siphash_aligned_key_t nf_conntrack_hash_rnd;
@@ -1512,7 +1510,7 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
static void gc_worker(struct work_struct *work)
{
- unsigned int i, hashsz, nf_conntrack_max95 = 0;
+ unsigned int i, hashsz;
u32 end_time, start_time = nfct_time_stamp;
struct conntrack_gc_work *gc_work;
unsigned int expired_count = 0;
@@ -1523,8 +1521,6 @@ static void gc_worker(struct work_struct *work)
gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
i = gc_work->next_bucket;
- if (gc_work->early_drop)
- nf_conntrack_max95 = nf_conntrack_max / 100u * 95u;
if (i == 0) {
gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT;
@@ -1552,6 +1548,7 @@ static void gc_worker(struct work_struct *work)
}
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
+ unsigned int nf_conntrack_max95 = 0;
struct nf_conntrack_net *cnet;
struct net *net;
long expires;
@@ -1581,11 +1578,14 @@ static void gc_worker(struct work_struct *work)
expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
expires = (expires - (long)next_run) / ++count;
next_run += expires;
+ net = nf_ct_net(tmp);
+
+ if (gc_work->early_drop)
+ nf_conntrack_max95 = nf_conntrack_max(net) / 100u * 95u;
if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
continue;
- net = nf_ct_net(tmp);
cnet = nf_ct_pernet(net);
if (atomic_read(&cnet->count) < nf_conntrack_max95)
continue;
@@ -1662,13 +1662,15 @@ __nf_conntrack_alloc(struct net *net,
gfp_t gfp, u32 hash)
{
struct nf_conntrack_net *cnet = nf_ct_pernet(net);
- unsigned int ct_count;
+ unsigned int ct_max, ct_count;
struct nf_conn *ct;
+ ct_max = nf_conntrack_max(net);
+
/* We don't want any race condition at early drop stage */
ct_count = atomic_inc_return(&cnet->count);
- if (unlikely(ct_count > nf_conntrack_max)) {
+ if (unlikely(ct_count > ct_max)) {
if (!early_drop(net, hash)) {
if (!conntrack_gc_work.early_drop)
conntrack_gc_work.early_drop = true;
@@ -2663,7 +2665,7 @@ int nf_conntrack_init_start(void)
if (!nf_conntrack_hash)
return -ENOMEM;
- nf_conntrack_max = max_factor * nf_conntrack_htable_size;
+ init_net.ct.sysctl_max = max_factor * nf_conntrack_htable_size;
nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
sizeof(struct nf_conn),
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 3a04665adf99..df243d494afd 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2590,7 +2590,7 @@ ctnetlink_stat_ct_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type,
if (nla_put_be32(skb, CTA_STATS_GLOBAL_ENTRIES, htonl(nr_conntracks)))
goto nla_put_failure;
- if (nla_put_be32(skb, CTA_STATS_GLOBAL_MAX_ENTRIES, htonl(nf_conntrack_max)))
+ if (nla_put_be32(skb, CTA_STATS_GLOBAL_MAX_ENTRIES, htonl(nf_conntrack_max(net))))
goto nla_put_failure;
nlmsg_end(skb, nlh);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 207b240b14e5..787c506c15bd 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -644,7 +644,7 @@ enum nf_ct_sysctl_index {
static struct ctl_table nf_ct_sysctl_table[] = {
[NF_SYSCTL_CT_MAX] = {
.procname = "nf_conntrack_max",
- .data = &nf_conntrack_max,
+ .data = &init_net.ct.sysctl_max,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
@@ -925,7 +925,7 @@ static struct ctl_table nf_ct_sysctl_table[] = {
static struct ctl_table nf_ct_netfilter_table[] = {
{
.procname = "nf_conntrack_max",
- .data = &nf_conntrack_max,
+ .data = &init_net.ct.sysctl_max,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
@@ -1017,6 +1017,7 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
table[NF_SYSCTL_CT_COUNT].data = &cnet->count;
table[NF_SYSCTL_CT_CHECKSUM].data = &net->ct.sysctl_checksum;
+ table[NF_SYSCTL_CT_MAX].data = &net->ct.sysctl_max;
table[NF_SYSCTL_CT_LOG_INVALID].data = &net->ct.sysctl_log_invalid;
table[NF_SYSCTL_CT_ACCT].data = &net->ct.sysctl_acct;
#ifdef CONFIG_NF_CONNTRACK_EVENTS
@@ -1040,7 +1041,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
/* Don't allow non-init_net ns to alter global sysctls */
if (!net_eq(&init_net, net)) {
- table[NF_SYSCTL_CT_MAX].mode = 0444;
table[NF_SYSCTL_CT_EXPECT_MAX].mode = 0444;
table[NF_SYSCTL_CT_BUCKETS].mode = 0444;
}
@@ -1092,6 +1092,7 @@ static int nf_conntrack_pernet_init(struct net *net)
int ret;
net->ct.sysctl_checksum = 1;
+ net->ct.sysctl_max = init_net.ct.sysctl_max;
ret = nf_conntrack_standalone_init_sysctl(net);
if (ret < 0)
--
2.51.0
* [RFC nf-next 02/11] netfilter: conntrack: don't schedule gc worker when table is empty
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 01/11] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl Florian Westphal
@ 2025-11-05 16:47 ` Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 03/11] tests: netfilter: conntrack_resize: prepare for pernet conntrack table Florian Westphal
` (8 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
No need to wake up every minute when there are no entries.
Instead of doing a scan at least once a minute, check if the worker
is pending (it's expected to be, except on an idle system) and queue it
if it's not.
In case the gc worker was executing at the time of the check (meaning it
wasn't pending), the gc worker should re-run at the newly computed next_run
interval. Switch it to mod_delayed_work() to allow this.
While at it, get rid of the 'exiting' toggle:
use disable_delayed_work_sync() instead of the 'cancel_' version at exit
time to prevent rearming.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_conntrack_core.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 210792a2275d..fa6e5047d15b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -69,7 +69,6 @@ struct conntrack_gc_work {
u32 avg_timeout;
u32 count;
u32 start_time;
- bool exiting;
bool early_drop;
};
@@ -91,7 +90,7 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
* allowing non-idle machines to wakeup more often when needed.
*/
#define GC_SCAN_INITIAL_COUNT 100
-#define GC_SCAN_INTERVAL_INIT GC_SCAN_INTERVAL_MAX
+#define GC_SCAN_INTERVAL_INIT (GC_SCAN_INTERVAL_MAX / 2)
#define GC_SCAN_MAX_DURATION msecs_to_jiffies(10)
#define GC_SCAN_EXPIRED_MAX (64000u / HZ)
@@ -1639,19 +1638,17 @@ static void gc_worker(struct work_struct *work)
next_run = 1;
early_exit:
- if (gc_work->exiting)
- return;
-
if (next_run)
gc_work->early_drop = false;
- queue_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
+ if (gc_work->count > GC_SCAN_INITIAL_COUNT || gc_work->next_bucket > 0)
+ mod_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
}
static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
{
+ /* work is started on first conntrack allocation. */
INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
- gc_work->exiting = false;
}
static struct nf_conn *
@@ -1709,6 +1706,15 @@ __nf_conntrack_alloc(struct net *net,
* this is inserted in any list.
*/
refcount_set(&ct->ct_general.use, 0);
+
+ /* Re-arm gc_work if needed, but do not modify
+ * in case it was already pending.
+ */
+ if (unlikely(!delayed_work_pending(&conntrack_gc_work.dwork)))
+ queue_delayed_work(system_power_efficient_wq,
+ &conntrack_gc_work.dwork,
+ GC_SCAN_INTERVAL_INIT);
+
return ct;
out:
atomic_dec(&cnet->count);
@@ -2458,13 +2464,12 @@ static int kill_all(struct nf_conn *i, void *data)
void nf_conntrack_cleanup_start(void)
{
cleanup_nf_conntrack_bpf();
- conntrack_gc_work.exiting = true;
}
void nf_conntrack_cleanup_end(void)
{
RCU_INIT_POINTER(nf_ct_hook, NULL);
- cancel_delayed_work_sync(&conntrack_gc_work.dwork);
+ disable_delayed_work_sync(&conntrack_gc_work.dwork);
kvfree(nf_conntrack_hash);
nf_conntrack_proto_fini();
@@ -2687,7 +2692,6 @@ int nf_conntrack_init_start(void)
goto err_proto;
conntrack_gc_work_init(&conntrack_gc_work);
- queue_delayed_work(system_power_efficient_wq, &conntrack_gc_work.dwork, HZ);
ret = register_nf_conntrack_bpf();
if (ret < 0)
--
2.51.0
* [RFC nf-next 03/11] tests: netfilter: conntrack_resize: prepare for pernet conntrack table
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 01/11] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 02/11] netfilter: conntrack: don't schedule gc worker when table is empty Florian Westphal
@ 2025-11-05 16:47 ` Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 04/11] netfilter: conntrack: pass pointer to buckets instead of index Florian Westphal
` (7 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
The test_conntrack_max_limit subtest will fail once we have pernet
tables: each netns can then set its own limit, not bound by the init_net
max setting.
Also, because the ct hashtable is allocated on demand,
net.netfilter.nf_conntrack_buckets will be 0 until the first user enables
conntrack, so don't try to reset this value to 0 when that was the
original value.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../net/netfilter/conntrack_resize.sh | 26 +++++--------------
1 file changed, 6 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/conntrack_resize.sh b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
index 615fe3c6f405..c155de936287 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_resize.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
@@ -35,7 +35,7 @@ cleanup() {
# restore original sysctl setting
sysctl -q net.netfilter.nf_conntrack_max=$init_net_max
- sysctl -q net.netfilter.nf_conntrack_buckets=$ct_buckets
+ [ "$ct_buckets" -gt 0 ] && sysctl -q net.netfilter.nf_conntrack_buckets=$ct_buckets
}
trap cleanup EXIT
@@ -90,9 +90,11 @@ ctresize() {
local duration="$1"
local now=$(date +%s)
local end=$((now + duration))
+ local rnd
while [ $now -lt $end ]; do
- sysctl -q net.netfilter.nf_conntrack_buckets=$RANDOM
+ rnd=$((RANDOM+1))
+ sysctl -q net.netfilter.nf_conntrack_buckets=$rnd
now=$(date +%s)
done
}
@@ -434,18 +436,6 @@ check_sysctl_immutable()
return 1
}
-test_conntrack_max_limit()
-{
- sysctl -q net.netfilter.nf_conntrack_max=100
- insert_ctnetlink "$nsclient1" 101
-
- # check netns is clamped by init_net, i.e., either netns follows
- # init_net value, or a higher pernet limit (compared to init_net) is ignored.
- check_ctcount "$nsclient1" 100 "netns conntrack_max is init_net bound"
-
- sysctl -q net.netfilter.nf_conntrack_max=$init_net_max
-}
-
test_conntrack_disable()
{
local timeout=2
@@ -476,15 +466,12 @@ check_max_alias 262000
setup_ns nsclient1 nsclient2
# check this only works from init_net
-for n in netfilter.nf_conntrack_buckets netfilter.nf_conntrack_expect_max net.nf_conntrack_max;do
- check_sysctl_immutable "$nsclient1" "net.$n" 1
-done
+check_sysctl_immutable "$nsclient1" "net.netfilter.nf_conntrack_expect_max" 1
# won't work on older kernels. If it works, check that the netns obeys the limit
if check_sysctl_immutable "$nsclient1" net.netfilter.nf_conntrack_max 0;then
# subtest: if pernet is changeable, check that reducing it in pernet
- # limits the pernet entries. Inverse, pernet clamped by a lower init_net
- # setting, is already checked by "test_conntrack_max_limit" test.
+ # limits the pernet entries.
ip netns exec "$nsclient1" sysctl -q net.netfilter.nf_conntrack_max=1
insert_ctnetlink "$nsclient1" 2
@@ -507,7 +494,6 @@ done
tmpfile=$(mktemp)
tmpfile_proc=$(mktemp)
tmpfile_uniq=$(mktemp)
-test_conntrack_max_limit
test_dump_all
test_floodresize_all
test_conntrack_disable
--
2.51.0
* [RFC nf-next 04/11] netfilter: conntrack: pass pointer to buckets instead of index
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
` (2 preceding siblings ...)
2025-11-05 16:47 ` [RFC nf-next 03/11] tests: netfilter: conntrack_resize: prepare for pernet conntrack table Florian Westphal
@ 2025-11-05 16:47 ` Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 05/11] netfilter: conntrack: split hashtable auto-size to helper function Florian Westphal
` (6 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
This is a preparation patch to ease later conversion to pernet table.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
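Not part of the patch: a userspace sketch of why passing bucket heads instead
of indices helps. Once the insert helper receives pointers, it no longer
needs to know which table the buckets live in, so the caller is free to pick
a global or a per-netns table. The types and helpers below (struct node,
struct bucket, bucket_add_head(), insert_pair()) are simplified, hypothetical
stand-ins for the hlist_nulls machinery.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct bucket {
	struct node *first;
};

/* Push a node at the front of one bucket's chain. */
static void bucket_add_head(struct node *n, struct bucket *head)
{
	n->next = head->first;
	head->first = n;
}

/*
 * Insert both directions of an entry. The helper only sees bucket
 * pointers; which table they belong to is the caller's decision.
 */
static void insert_pair(struct node *orig, struct node *repl,
			struct bucket *head_orig, struct bucket *head_repl)
{
	assert(orig && repl && head_orig && head_repl);

	bucket_add_head(orig, head_orig);
	bucket_add_head(repl, head_repl);
}
```

A caller would resolve the indices itself, for example
insert_pair(&o, &r, &table[hash], &table[reply_hash]), with table being
whichever hash table applies.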
net/netfilter/nf_conntrack_core.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index fa6e5047d15b..fc9312bfa616 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -824,13 +824,13 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
static void __nf_conntrack_hash_insert(struct nf_conn *ct,
- unsigned int hash,
- unsigned int reply_hash)
+ struct hlist_nulls_head *head_orig,
+ struct hlist_nulls_head *head_repl)
{
hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
- &nf_conntrack_hash[hash]);
+ head_orig);
hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
- &nf_conntrack_hash[reply_hash]);
+ head_repl);
}
static bool nf_ct_ext_valid_pre(const struct nf_ct_ext *ext)
@@ -926,7 +926,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
smp_wmb();
/* The caller holds a reference to this object */
refcount_set(&ct->ct_general.use, 2);
- __nf_conntrack_hash_insert(ct, hash, reply_hash);
+ __nf_conntrack_hash_insert(ct,
+ &nf_conntrack_hash[hash],
+ &nf_conntrack_hash[reply_hash]);
nf_conntrack_double_unlock(hash, reply_hash);
NF_CT_STAT_INC(net, insert);
local_bh_enable();
@@ -1302,7 +1304,9 @@ __nf_conntrack_confirm(struct sk_buff *skb)
* setting ct->timeout. The RCU barriers guarantee that no other CPU
* can find the conntrack before the above stores are visible.
*/
- __nf_conntrack_hash_insert(ct, hash, reply_hash);
+ __nf_conntrack_hash_insert(ct,
+ &nf_conntrack_hash[hash],
+ &nf_conntrack_hash[reply_hash]);
/* IPS_CONFIRMED unset means 'ct not (yet) in hash', conntrack lookups
* skip entries that lack this bit. This happens when a CPU is looking
--
2.51.0
* [RFC nf-next 05/11] netfilter: conntrack: split hashtable auto-size to helper function
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
` (3 preceding siblings ...)
2025-11-05 16:47 ` [RFC nf-next 04/11] netfilter: conntrack: pass pointer to buckets instead of index Florian Westphal
@ 2025-11-05 16:47 ` Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net Florian Westphal
` (5 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:47 UTC
To: netfilter-devel; +Cc: pablo
Split the 'figure out a good default hash table size' logic into a
new function. Later patches will no longer do the allocation right away,
but will still do the initial size computation.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
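Not part of the patch: a userspace sketch of the sizing heuristic, handy for
checking what a given amount of RAM maps to. It assumes the final clamp is
meant to apply to the computed size. ht_autosize() and its parameters are
hypothetical; the kernel helper reads totalram_pages(), PAGE_SIZE and
BITS_PER_LONG directly, and this sketch uses 64-bit arithmetic throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1024ull * 1024ull * 1024ull)

/*
 * nr_pages:      total RAM in pages
 * page_size:     bytes per page
 * bits_per_long: 32 or 64
 * head_size:     size of one bucket head (one pointer in the kernel)
 */
static unsigned int ht_autosize(uint64_t nr_pages, uint64_t page_size,
				unsigned int bits_per_long, size_t head_size)
{
	uint64_t ht_size;

	assert(page_size > 0 && head_size > 0);

	/* 1/16384 of memory, expressed as a number of bucket heads */
	ht_size = nr_pages * page_size / 16384 / head_size;

	if (bits_per_long >= 64 && nr_pages > 4 * (GIB / page_size))
		ht_size = 262144;	/* more than 4 GiB on 64-bit */
	else if (nr_pages > GIB / page_size)
		ht_size = 65536;	/* more than 1 GiB */

	if (ht_size < 1024)
		ht_size = 1024;		/* lower bound for small systems */

	return (unsigned int)ht_size;
}
```

With 4 KiB pages and 8-byte heads, 512 MiB of RAM gives 4096 buckets, while
anything above 4 GiB on a 64-bit system gives 262144.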
net/netfilter/nf_conntrack_core.c | 33 ++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index fc9312bfa616..1f938ef8e59a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2636,9 +2636,27 @@ int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
return nf_conntrack_hash_resize(hashsize);
}
-int nf_conntrack_init_start(void)
+static unsigned int nf_conntrack_htable_autosize(void)
{
unsigned long nr_pages = totalram_pages();
+ unsigned int ht_size;
+
+ ht_size = (((nr_pages << PAGE_SHIFT) / 16384)
+ / sizeof(struct hlist_head));
+ if (BITS_PER_LONG >= 64 &&
+ nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
+ ht_size = 262144;
+ else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
+ ht_size = 65536;
+
+ if (ht_size < 1024)
+ ht_size = 1024;
+
+ return ht_size;
+}
+
+int nf_conntrack_init_start(void)
+{
int max_factor = 8;
int ret = -ENOMEM;
int i;
@@ -2650,17 +2668,8 @@ int nf_conntrack_init_start(void)
spin_lock_init(&nf_conntrack_locks[i]);
if (!nf_conntrack_htable_size) {
- nf_conntrack_htable_size
- = (((nr_pages << PAGE_SHIFT) / 16384)
- / sizeof(struct hlist_head));
- if (BITS_PER_LONG >= 64 &&
- nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
- nf_conntrack_htable_size = 262144;
- else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
- nf_conntrack_htable_size = 65536;
-
- if (nf_conntrack_htable_size < 1024)
- nf_conntrack_htable_size = 1024;
+ nf_conntrack_htable_size = nf_conntrack_htable_autosize();
+
/* Use a max. factor of one by default to keep the average
* hash chain length at 2 entries. Each entry has to be added
* twice (once for original direction, once for reply).
--
2.51.0
* [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
` (4 preceding siblings ...)
2025-11-05 16:47 ` [RFC nf-next 05/11] netfilter: conntrack: split hashtable auto-size to helper function Florian Westphal
@ 2025-11-05 16:48 ` Florian Westphal
2025-11-07 14:03 ` kernel test robot
2025-11-05 16:48 ` [RFC nf-next 07/11] netfilter: conntrack: init and start independent gc workers when needed Florian Westphal
` (4 subsequent siblings)
10 siblings, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:48 UTC
To: netfilter-devel; +Cc: pablo
This preparation change moves nf_conntrack_hash to pernet scope,
but only the init_net one is allocated.
net->ct.nf_conntrack_hash aliases init_net.ct.nf_conntrack_hash.
Without this, the actual pernet conversion patch would grow too large.
For now:
1. nf_conntrack_get_ht() returns the init_net table, not the pernet one.
2. Same for __nf_conntrack_confirm().
This is because a hash resize would otherwise result in a use-after-free
due to a stale net->ct.nf_conntrack_hash.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
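Not part of the patch: a userspace model of the generation-counter retry loop
that nf_conntrack_get_ht() uses to read the table pointer and its size as a
consistent pair. It uses C11 atomics in place of the kernel seqcount API and
assumes a single writer at a time. struct table_ref, table_publish() and
table_snapshot() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct table_ref {
	atomic_uint generation;	/* odd while an update is in progress */
	_Atomic(void *) buckets;
	atomic_uint size;
};

/* Writer side: callers must serialize among themselves. */
static void table_publish(struct table_ref *t, void *buckets,
			  unsigned int size)
{
	atomic_fetch_add(&t->generation, 1);	/* becomes odd */
	atomic_store(&t->buckets, buckets);
	atomic_store(&t->size, size);
	atomic_fetch_add(&t->generation, 1);	/* even again */
}

/* Reader side: retry until pointer and size come from the same update. */
static void table_snapshot(struct table_ref *t, void **buckets,
			   unsigned int *size)
{
	unsigned int seq;

	assert(t && buckets && size);

	do {
		/* wait until no update is in flight */
		while ((seq = atomic_load(&t->generation)) & 1u)
			;
		*buckets = atomic_load(&t->buckets);
		*size = atomic_load(&t->size);
	} while (atomic_load(&t->generation) != seq);
}
```

The loop only guarantees that the pointer and the size belong together. It
says nothing about how long the returned table stays valid; in the kernel
that is covered by the RCU read lock the caller must hold.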
include/net/netfilter/nf_conntrack.h | 6 +-
include/net/netns/conntrack.h | 1 +
net/netfilter/nf_conntrack_core.c | 114 +++++++++++++-----------
net/netfilter/nf_conntrack_netlink.c | 3 +-
net/netfilter/nf_conntrack_standalone.c | 8 +-
5 files changed, 74 insertions(+), 58 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d404e1352737..a90654bb2410 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -332,7 +332,8 @@ extern seqcount_spinlock_t nf_conntrack_generation;
/* must be called with rcu read lock held */
static inline void
-nf_conntrack_get_ht(struct hlist_nulls_head **hash, unsigned int *hsize)
+nf_conntrack_get_ht(struct net *net, struct hlist_nulls_head **hash,
+ unsigned int *hsize)
{
struct hlist_nulls_head *hptr;
unsigned int sequence, hsz;
@@ -340,7 +341,8 @@ nf_conntrack_get_ht(struct hlist_nulls_head **hash, unsigned int *hsize)
do {
sequence = read_seqcount_begin(&nf_conntrack_generation);
hsz = nf_conntrack_htable_size;
- hptr = nf_conntrack_hash;
+ hptr = net->ct.nf_conntrack_hash;
+ hptr = init_net.ct.nf_conntrack_hash;
} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
*hash = hptr;
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 2e7707b7d349..96b326bc1cd7 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -92,6 +92,7 @@ struct netns_ct {
unsigned int sysctl_max;
struct ip_conntrack_stat __percpu *stat;
+ struct hlist_nulls_head *nf_conntrack_hash;
struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb;
struct nf_ip_net nf_ct_proto;
#if defined(CONFIG_NF_CONNTRACK_LABELS)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1f938ef8e59a..f2ff0e70f5ab 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -60,9 +60,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
-struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
-EXPORT_SYMBOL_GPL(nf_conntrack_hash);
-
struct conntrack_gc_work {
struct delayed_work dwork;
u32 next_bucket;
@@ -738,7 +735,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
unsigned int bucket, hsize;
begin:
- nf_conntrack_get_ht(&ct_hash, &hsize);
+ nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
bucket = reciprocal_scale(hash, hsize);
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
@@ -853,7 +850,7 @@ static bool nf_ct_ext_valid_post(struct nf_ct_ext *ext)
if (ext->gen_id != atomic_read(&nf_conntrack_ext_genid))
return false;
- /* inserted into conntrack table, nf_ct_iterate_cleanup()
+ /* inserted into conntrack table, nf_ct_iterate_cleanup_net()
* will find it. Disable nf_ct_ext_find() id check.
*/
WRITE_ONCE(ext->gen_id, 0);
@@ -867,6 +864,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
struct net *net = nf_ct_net(ct);
unsigned int hash, reply_hash;
struct nf_conntrack_tuple_hash *h;
+ struct hlist_nulls_head *ct_hash;
struct hlist_nulls_node *n;
unsigned int max_chainlen;
unsigned int chainlen = 0;
@@ -889,10 +887,11 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY));
} while (nf_conntrack_double_lock(hash, reply_hash, sequence));
+ ct_hash = init_net.ct.nf_conntrack_hash;
max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN);
/* See if there's one in the list already, including reverse */
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode) {
+ hlist_nulls_for_each_entry(h, n, &ct_hash[hash], hnnode) {
if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
zone, net))
goto out;
@@ -903,7 +902,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
chainlen = 0;
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode) {
+ hlist_nulls_for_each_entry(h, n, &ct_hash[reply_hash], hnnode) {
if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
zone, net))
goto out;
@@ -926,9 +925,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
smp_wmb();
/* The caller holds a reference to this object */
refcount_set(&ct->ct_general.use, 2);
- __nf_conntrack_hash_insert(ct,
- &nf_conntrack_hash[hash],
- &nf_conntrack_hash[reply_hash]);
+ __nf_conntrack_hash_insert(ct, &ct_hash[hash], &ct_hash[reply_hash]);
nf_conntrack_double_unlock(hash, reply_hash);
NF_CT_STAT_INC(net, insert);
local_bh_enable();
@@ -1084,16 +1081,19 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
struct nf_conn *loser_ct = (struct nf_conn *)skb_nfct(skb);
const struct nf_conntrack_zone *zone;
struct nf_conntrack_tuple_hash *h;
+ struct hlist_nulls_head *ct_hash;
struct hlist_nulls_node *n;
struct net *net;
zone = nf_ct_zone(loser_ct);
net = nf_ct_net(loser_ct);
+ ct_hash = init_net.ct.nf_conntrack_hash;
+
/* Reply direction must never result in a clash, unless both origin
* and reply tuples are identical.
*/
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[repl_idx], hnnode) {
+ hlist_nulls_for_each_entry(h, n, &ct_hash[repl_idx], hnnode) {
if (nf_ct_key_equal(h,
&loser_ct->tuplehash[IP_CT_DIR_REPLY].tuple,
zone, net))
@@ -1119,7 +1119,7 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
hlist_nulls_add_fake(&loser_ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
hlist_nulls_add_head_rcu(&loser_ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
- &nf_conntrack_hash[repl_idx]);
+ &ct_hash[repl_idx]);
/* confirmed bit must be set after hlist add, not before:
* loser_ct can still be visible to other cpu due to
* SLAB_TYPESAFE_BY_RCU.
@@ -1205,6 +1205,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
const struct nf_conntrack_zone *zone;
unsigned int hash, reply_hash;
struct nf_conntrack_tuple_hash *h;
+ struct hlist_nulls_head *ct_hash;
struct nf_conn *ct;
struct nf_conn_help *help;
struct hlist_nulls_node *n;
@@ -1235,6 +1236,8 @@ __nf_conntrack_confirm(struct sk_buff *skb)
nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY));
} while (nf_conntrack_double_lock(hash, reply_hash, sequence));
+ ct_hash = init_net.ct.nf_conntrack_hash;
+
/* We're not in hash table, and we refuse to set up related
* connections for unconfirmed conns. But packet copies and
* REJECT will give spurious warnings here.
@@ -1271,7 +1274,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
/* See if there's one in the list already, including reverse:
NAT could have grabbed it without realizing, since we're
not in the hash. If there is, we lost race. */
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode) {
+ hlist_nulls_for_each_entry(h, n, &ct_hash[hash], hnnode) {
if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
zone, net))
goto out;
@@ -1280,7 +1283,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
}
chainlen = 0;
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode) {
+ hlist_nulls_for_each_entry(h, n, &ct_hash[reply_hash], hnnode) {
if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
zone, net))
goto out;
@@ -1304,9 +1307,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
* setting ct->timeout. The RCU barriers guarantee that no other CPU
* can find the conntrack before the above stores are visible.
*/
- __nf_conntrack_hash_insert(ct,
- &nf_conntrack_hash[hash],
- &nf_conntrack_hash[reply_hash]);
+ __nf_conntrack_hash_insert(ct, &ct_hash[hash], &ct_hash[reply_hash]);
/* IPS_CONFIRMED unset means 'ct not (yet) in hash', conntrack lookups
* skip entries that lack this bit. This happens when a CPU is looking
@@ -1366,7 +1367,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
rcu_read_lock();
begin:
- nf_conntrack_get_ht(&ct_hash, &hsize);
+ nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
hash = __hash_conntrack(net, tuple, nf_ct_zone_id(zone, IP_CT_DIR_REPLY), hsize);
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
@@ -1473,7 +1474,7 @@ static noinline int early_drop(struct net *net, unsigned int hash)
unsigned int hsize, drops;
rcu_read_lock();
- nf_conntrack_get_ht(&ct_hash, &hsize);
+ nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
if (!i)
bucket = reciprocal_scale(hash, hsize);
else
@@ -1544,7 +1545,7 @@ static void gc_worker(struct work_struct *work)
rcu_read_lock();
- nf_conntrack_get_ht(&ct_hash, &hashsz);
+ nf_conntrack_get_ht(&init_net, &ct_hash, &hashsz);
if (i >= hashsz) {
rcu_read_unlock();
break;
@@ -2327,8 +2328,9 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
spinlock_t *lockp;
for (; *bucket < nf_conntrack_htable_size; (*bucket)++) {
- struct hlist_nulls_head *hslot = &nf_conntrack_hash[*bucket];
+ struct hlist_nulls_head *hslot;
+ hslot = &init_net.ct.nf_conntrack_hash[*bucket];
if (hlist_nulls_empty(hslot))
continue;
@@ -2351,8 +2353,7 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
*/
ct = nf_ct_tuplehash_to_ctrack(h);
- if (iter_data->net &&
- !net_eq(iter_data->net, nf_ct_net(ct)))
+ if (!net_eq(iter_data->net, nf_ct_net(ct)))
continue;
if (iter(ct, iter_data->data))
@@ -2371,14 +2372,19 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
return ct;
}
-static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data),
- const struct nf_ct_iter_data *iter_data)
+void nf_ct_iterate_cleanup_net(int (*iter)(struct nf_conn *i, void *data),
+ const struct nf_ct_iter_data *iter_data)
{
+ struct net *net = iter_data->net;
+ struct nf_conntrack_net *cnet = nf_ct_pernet(net);
unsigned int bucket = 0;
struct nf_conn *ct;
might_sleep();
+ if (atomic_read(&cnet->count) == 0)
+ return;
+
mutex_lock(&nf_conntrack_mutex);
while ((ct = get_next_corpse(iter, iter_data, &bucket)) != NULL) {
/* Time to push up daises... */
@@ -2389,20 +2395,6 @@ static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data),
}
mutex_unlock(&nf_conntrack_mutex);
}
-
-void nf_ct_iterate_cleanup_net(int (*iter)(struct nf_conn *i, void *data),
- const struct nf_ct_iter_data *iter_data)
-{
- struct net *net = iter_data->net;
- struct nf_conntrack_net *cnet = nf_ct_pernet(net);
-
- might_sleep();
-
- if (atomic_read(&cnet->count) == 0)
- return;
-
- nf_ct_iterate_cleanup(iter, iter_data);
-}
EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup_net);
/**
@@ -2410,16 +2402,18 @@ EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup_net);
* @iter: callback to invoke for each conntrack
* @data: data to pass to @iter
*
- * Like nf_ct_iterate_cleanup, but first marks conntracks on the
- * unconfirmed list as dying (so they will not be inserted into
- * main table).
+ * Like nf_ct_iterate_cleanup_net, but bumps extension genid so
+ * extensions with stale data will not be accessible for conntracks not yet
+ * confirmed to main table.
*
* Can only be called in module exit path.
*/
void
nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
{
- struct nf_ct_iter_data iter_data = {};
+ struct nf_ct_iter_data iter_data = {
+ .data = data,
+ };
struct net *net;
down_read(&net_rwsem);
@@ -2429,6 +2423,8 @@ nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
if (atomic_read(&cnet->count) == 0)
continue;
nf_queue_nf_hook_drop(net);
+ iter_data.net = net;
+ nf_ct_iterate_cleanup_net(iter, &iter_data);
}
up_read(&net_rwsem);
@@ -2447,8 +2443,14 @@ nf_ct_iterate_destroy(int (*iter)(struct nf_conn *i, void *data), void *data)
synchronize_net();
nf_ct_ext_bump_genid();
- iter_data.data = data;
- nf_ct_iterate_cleanup(iter, &iter_data);
+
+ down_read(&net_rwsem);
+ for_each_net(net) {
+ iter_data.net = net;
+ nf_ct_iterate_cleanup_net(iter, &iter_data);
+ }
+
+ up_read(&net_rwsem);
/* Another cpu might be in a rcu read section with
* rcu protected pointer cleared in iter callback
@@ -2474,7 +2476,6 @@ void nf_conntrack_cleanup_end(void)
{
RCU_INIT_POINTER(nf_ct_hook, NULL);
disable_delayed_work_sync(&conntrack_gc_work.dwork);
- kvfree(nf_conntrack_hash);
nf_conntrack_proto_fini();
nf_conntrack_helper_fini();
@@ -2587,10 +2588,10 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
*/
for (i = 0; i < nf_conntrack_htable_size; i++) {
- while (!hlist_nulls_empty(&nf_conntrack_hash[i])) {
+ while (!hlist_nulls_empty(&init_net.ct.nf_conntrack_hash[i])) {
unsigned int zone_id;
- h = hlist_nulls_entry(nf_conntrack_hash[i].first,
+ h = hlist_nulls_entry(init_net.ct.nf_conntrack_hash[i].first,
struct nf_conntrack_tuple_hash, hnnode);
ct = nf_ct_tuplehash_to_ctrack(h);
hlist_nulls_del_rcu(&h->hnnode);
@@ -2601,9 +2602,11 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
}
}
- old_hash = nf_conntrack_hash;
- nf_conntrack_hash = hash;
+ old_size = nf_conntrack_htable_size;
+ old_hash = init_net.ct.nf_conntrack_hash;
+
+ init_net.ct.nf_conntrack_hash = hash;
nf_conntrack_htable_size = hashsize;
write_seqcount_end(&nf_conntrack_generation);
@@ -2626,7 +2629,7 @@ int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
return -EOPNOTSUPP;
/* On boot, we can set this without any fancy locking. */
- if (!nf_conntrack_hash)
+ if (!init_net.ct.nf_conntrack_hash)
return param_set_uint(val, kp);
rc = kstrtouint(val, 0, &hashsize);
@@ -2679,8 +2682,8 @@ int nf_conntrack_init_start(void)
max_factor = 1;
}
- nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
- if (!nf_conntrack_hash)
+ init_net.ct.nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
+ if (!init_net.ct.nf_conntrack_hash)
return -ENOMEM;
init_net.ct.sysctl_max = max_factor * nf_conntrack_htable_size;
@@ -2722,7 +2725,7 @@ int nf_conntrack_init_start(void)
err_expect:
kmem_cache_destroy(nf_conntrack_cachep);
err_cachep:
- kvfree(nf_conntrack_hash);
+ kvfree(init_net.ct.nf_conntrack_hash);
return ret;
}
@@ -2779,6 +2782,9 @@ int nf_conntrack_init_net(struct net *net)
nf_conntrack_ecache_pernet_init(net);
nf_conntrack_proto_pernet_init(net);
+ if (!net_eq(net, &init_net))
+ net->ct.nf_conntrack_hash = init_net.ct.nf_conntrack_hash;
+
return 0;
err_expect:
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index df243d494afd..068e831545ec 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1244,7 +1244,8 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
spin_unlock(lockp);
goto out;
}
- hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
+ hlist_nulls_for_each_entry(h, n,
+ &init_net.ct.nf_conntrack_hash[cb->args[0]],
hnnode) {
ct = nf_ct_tuplehash_to_ctrack(h);
if (nf_ct_is_expired(ct)) {
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 787c506c15bd..e610a0887cc2 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -155,7 +155,7 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
st->time_now = ktime_get_real_ns();
rcu_read_lock();
- nf_conntrack_get_ht(&st->hash, &st->htable_size);
+ nf_conntrack_get_ht(&init_net, &st->hash, &st->htable_size);
if (*pos == 0) {
st->skip_elems = 0;
@@ -1131,6 +1131,12 @@ static void nf_conntrack_pernet_exit(struct list_head *net_exit_list)
nf_conntrack_fini_net(net);
nf_conntrack_cleanup_net_list(net_exit_list);
+
+ list_for_each_entry(net, net_exit_list, exit_list) {
+ if (net_eq(net, &init_net))
+ kvfree(net->ct.nf_conntrack_hash);
+ net->ct.nf_conntrack_hash = NULL;
+ }
}
static struct pernet_operations nf_conntrack_net_ops = {
--
2.51.0
* [RFC nf-next 07/11] netfilter: conntrack: init and start independent gc workers when needed
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
` (5 preceding siblings ...)
2025-11-05 16:48 ` [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net Florian Westphal
@ 2025-11-05 16:48 ` Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 08/11] netfilter: conntrack: make nf_conntrack hash table pernet Florian Westphal
` (3 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:48 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo
Next step in pernet conversion: make the gc worker pernet.
Because net->ct.nf_conntrack_hash still aliases the init_net table, this
patch makes little sense on its own; it is just preparation work to keep
this change separate.
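For readers following along, here is a rough userspace sketch of the pattern this patch relies on: the gc state embeds the work item and records its owning namespace, so the worker callback can recover both from the work pointer alone. All names below (struct work, struct netns, gc_work_init, gc_worker_owner) are made-up stand-ins for illustration, not the kernel types or helpers; the real code uses struct delayed_work, possible_net_t and read_pnet()/write_pnet().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not kernel types. */
struct work { int pending; };
struct netns { int id; };

struct gc_work {
	struct work dwork;		/* embedded work item passed to the callback */
	struct netns *net;		/* owning namespace, set once at init time */
	unsigned int next_bucket;	/* where the next scan would resume */
};

/* Same idea as the kernel macro: step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void gc_work_init(struct gc_work *gc, struct netns *net)
{
	gc->dwork.pending = 0;
	gc->net = net;
	gc->next_bucket = 0;
}

/* The callback only sees the embedded work item, like a workqueue function. */
static struct netns *gc_worker_owner(struct work *w)
{
	struct gc_work *gc = container_of(w, struct gc_work, dwork);

	assert(gc->net != NULL);
	return gc->net;
}
```

With one gc_work instance per namespace, each worker only needs its own work pointer to find the table it is responsible for; no global gc state is required.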
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_conntrack.h | 21 ++++++++
include/net/netfilter/nf_conntrack_core.h | 1 -
net/netfilter/nf_conntrack_core.c | 58 +++++++++++------------
net/netfilter/nf_conntrack_standalone.c | 7 ++-
4 files changed, 55 insertions(+), 32 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index a90654bb2410..d3e419c08cc1 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -47,6 +47,26 @@ struct nf_conntrack_net_ecache {
struct hlist_nulls_head dying_list;
};
+/**
+ * struct conntrack_gc_work - gc state
+ * @dwork: delayed GC work item
+ * @net: net namespace the gc worker belongs to
+ * @next_bucket: next conntrack hash bucket to work on
+ * @avg_timeout: average timeout of conntracks seen
+ * @count: non-expired conntracks seen
+ * @start_time: nfct_time_stamp taken on start of work function
+ * @early_drop: remove non-assured and closing conntracks too
+ */
+struct conntrack_gc_work {
+ struct delayed_work dwork;
+ possible_net_t net;
+ u32 next_bucket;
+ u32 avg_timeout;
+ u32 count;
+ u32 start_time;
+ bool early_drop;
+};
+
struct nf_conntrack_net {
/* only used when new connection is allocated: */
atomic_t count;
@@ -62,6 +82,7 @@ struct nf_conntrack_net {
#ifdef CONFIG_NF_CONNTRACK_EVENTS
struct nf_conntrack_net_ecache ecache;
#endif
+ struct conntrack_gc_work gc_work;
};
#include <linux/types.h>
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 3384859a8921..eb6e05c654b2 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -35,7 +35,6 @@ int nf_conntrack_proto_init(void);
void nf_conntrack_proto_fini(void);
int nf_conntrack_init_start(void);
-void nf_conntrack_cleanup_start(void);
void nf_conntrack_init_end(void);
void nf_conntrack_cleanup_end(void);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f2ff0e70f5ab..b2f0dffb7f79 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -60,15 +60,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
-struct conntrack_gc_work {
- struct delayed_work dwork;
- u32 next_bucket;
- u32 avg_timeout;
- u32 count;
- u32 start_time;
- bool early_drop;
-};
-
static __read_mostly struct kmem_cache *nf_conntrack_cachep;
static DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
static __read_mostly bool nf_conntrack_locks_all;
@@ -95,8 +86,6 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
#define MIN_CHAINLEN 50u
#define MAX_CHAINLEN (80u - MIN_CHAINLEN)
-static struct conntrack_gc_work conntrack_gc_work;
-
void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
{
/* 1) Acquire the lock */
@@ -1518,11 +1507,15 @@ static void gc_worker(struct work_struct *work)
u32 end_time, start_time = nfct_time_stamp;
struct conntrack_gc_work *gc_work;
unsigned int expired_count = 0;
+ struct nf_conntrack_net *cnet;
unsigned long next_run;
+ struct net *net;
s32 delta_time;
long count;
gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
+ net = read_pnet(&gc_work->net);
+ cnet = nf_ct_pernet(net);
i = gc_work->next_bucket;
@@ -1545,7 +1538,7 @@ static void gc_worker(struct work_struct *work)
rcu_read_lock();
- nf_conntrack_get_ht(&init_net, &ct_hash, &hashsz);
+ nf_conntrack_get_ht(net, &ct_hash, &hashsz);
if (i >= hashsz) {
rcu_read_unlock();
break;
@@ -1553,8 +1546,6 @@ static void gc_worker(struct work_struct *work)
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
unsigned int nf_conntrack_max95 = 0;
- struct nf_conntrack_net *cnet;
- struct net *net;
long expires;
tmp = nf_ct_tuplehash_to_ctrack(h);
@@ -1573,6 +1564,9 @@ static void gc_worker(struct work_struct *work)
goto early_exit;
}
+ if (!net_eq(net, nf_ct_net(tmp)))
+ break;
+
if (nf_ct_is_expired(tmp)) {
nf_ct_gc_expired(tmp);
expired_count++;
@@ -1582,7 +1576,6 @@ static void gc_worker(struct work_struct *work)
expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
expires = (expires - (long)next_run) / ++count;
next_run += expires;
- net = nf_ct_net(tmp);
if (gc_work->early_drop)
nf_conntrack_max95 = nf_conntrack_max(net) / 100u * 95u;
@@ -1590,7 +1583,6 @@ static void gc_worker(struct work_struct *work)
if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
continue;
- cnet = nf_ct_pernet(net);
if (atomic_read(&cnet->count) < nf_conntrack_max95)
continue;
@@ -1601,6 +1593,11 @@ static void gc_worker(struct work_struct *work)
/* load ->status after refcount increase */
smp_acquire__after_ctrl_dep();
+ if (!net_eq(net, nf_ct_net(tmp))) {
+ nf_ct_put(tmp);
+ break;
+ }
+
if (gc_worker_skip_ct(tmp)) {
nf_ct_put(tmp);
continue;
@@ -1650,10 +1647,19 @@ static void gc_worker(struct work_struct *work)
mod_delayed_work(system_power_efficient_wq, &gc_work->dwork, next_run);
}
-static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
+static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work, struct net *net)
{
/* work is started on first conntrack allocation. */
INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
+ write_pnet(&gc_work->net, net);
+}
+
+static void gc_set_early_drop(struct net *net)
+{
+ struct nf_conntrack_net *n = nf_ct_pernet(net);
+
+ if (!n->gc_work.early_drop)
+ n->gc_work.early_drop = true;
}
static struct nf_conn *
@@ -1674,8 +1680,7 @@ __nf_conntrack_alloc(struct net *net,
if (unlikely(ct_count > ct_max)) {
if (!early_drop(net, hash)) {
- if (!conntrack_gc_work.early_drop)
- conntrack_gc_work.early_drop = true;
+ gc_set_early_drop(net);
atomic_dec(&cnet->count);
if (net == &init_net)
net_warn_ratelimited("nf_conntrack: table full, dropping packet\n");
@@ -1715,9 +1720,9 @@ __nf_conntrack_alloc(struct net *net,
/* Re-arm gc_work if needed, but do not modify
* in case it was already pending.
*/
- if (unlikely(!delayed_work_pending(&conntrack_gc_work.dwork)))
+ if (unlikely(!delayed_work_pending(&cnet->gc_work.dwork)))
queue_delayed_work(system_power_efficient_wq,
- &conntrack_gc_work.dwork,
+ &cnet->gc_work.dwork,
GC_SCAN_INTERVAL_INIT);
return ct;
@@ -2467,15 +2472,9 @@ static int kill_all(struct nf_conn *i, void *data)
return 1;
}
-void nf_conntrack_cleanup_start(void)
-{
- cleanup_nf_conntrack_bpf();
-}
-
void nf_conntrack_cleanup_end(void)
{
RCU_INIT_POINTER(nf_ct_hook, NULL);
- disable_delayed_work_sync(&conntrack_gc_work.dwork);
nf_conntrack_proto_fini();
nf_conntrack_helper_fini();
@@ -2707,8 +2706,6 @@ int nf_conntrack_init_start(void)
if (ret < 0)
goto err_proto;
- conntrack_gc_work_init(&conntrack_gc_work);
-
ret = register_nf_conntrack_bpf();
if (ret < 0)
goto err_kfunc;
@@ -2716,7 +2713,6 @@ int nf_conntrack_init_start(void)
return 0;
err_kfunc:
- cancel_delayed_work_sync(&conntrack_gc_work.dwork);
nf_conntrack_proto_fini();
err_proto:
nf_conntrack_helper_fini();
@@ -2785,6 +2781,8 @@ int nf_conntrack_init_net(struct net *net)
if (!net_eq(net, &init_net))
net->ct.nf_conntrack_hash = init_net.ct.nf_conntrack_hash;
+ conntrack_gc_work_init(&cnet->gc_work, net);
+
return 0;
err_expect:
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index e610a0887cc2..e980213ef602 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -16,6 +16,7 @@
#include <net/netfilter/nf_log.h>
#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/netfilter/nf_conntrack_core.h>
#include <net/netfilter/nf_conntrack_l4proto.h>
#include <net/netfilter/nf_conntrack_expect.h>
@@ -1080,9 +1081,13 @@ static void nf_conntrack_standalone_fini_sysctl(struct net *net)
static void nf_conntrack_fini_net(struct net *net)
{
+ struct nf_conntrack_net *ctnet = nf_ct_pernet(net);
+
if (enable_hooks)
nf_ct_netns_put(net, NFPROTO_INET);
+ disable_delayed_work_sync(&ctnet->gc_work.dwork);
+
nf_conntrack_standalone_fini_proc(net);
nf_conntrack_standalone_fini_sysctl(net);
}
@@ -1186,7 +1191,7 @@ static int __init nf_conntrack_standalone_init(void)
static void __exit nf_conntrack_standalone_fini(void)
{
- nf_conntrack_cleanup_start();
+ cleanup_nf_conntrack_bpf();
unregister_pernet_subsys(&nf_conntrack_net_ops);
#ifdef CONFIG_SYSCTL
unregister_net_sysctl_table(nf_ct_netfilter_header);
--
2.51.0
* [RFC nf-next 08/11] netfilter: conntrack: make nf_conntrack hash table pernet
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
` (6 preceding siblings ...)
2025-11-05 16:48 ` [RFC nf-next 07/11] netfilter: conntrack: init and start independent gc workers when needed Florian Westphal
@ 2025-11-05 16:48 ` Florian Westphal
2025-11-07 16:05 ` kernel test robot
2025-11-05 16:48 ` [RFC nf-next 09/11] netfilter: conntrack: delay conntrack hashtable allocation until needed Florian Westphal
` (2 subsequent siblings)
10 siblings, 1 reply; 14+ messages in thread
From: Florian Westphal @ 2025-11-05 16:48 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo
Make net->ct.nf_conntrack_hash distinct for each netns.
Allocation is done from nf_conntrack_init_net().
This is not optimal, as we register conntrack hooks on demand.
A followup patch will delay the allocation until after conntrack
functionality is requested by userspace.
The earlier hack to prefer init_net.ct.nf_conntrack_hash is removed.
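As a rough illustration of the sizing rule that nf_conntrack_init_net() applies in this patch, the sketch below picks the table size and entry limit for a namespace. It assumes the logic as it appears in the diff: every netns starts from the boot-time autosize value with a factor of 1, and only init_net with a user-supplied size keeps the old factor of 8. struct ct_limits and ct_pick_limits() are hypothetical names, and any rounding done by nf_ct_alloc_hashtable() is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type for this sketch, not a kernel structure. */
struct ct_limits {
	unsigned int htable_size;	/* number of hash buckets */
	unsigned int max_entries;	/* conntrack entry limit for the netns */
};

static struct ct_limits ct_pick_limits(bool is_init_net,
				       unsigned int user_size,
				       unsigned int autosize)
{
	struct ct_limits l;
	unsigned int max_factor = 1;	/* default: limit equals bucket count */
	unsigned int ht_size = autosize;

	assert(autosize > 0);

	/* Only init_net honours an explicit size, and then uses factor 8. */
	if (is_init_net && user_size) {
		ht_size = user_size;
		max_factor = 8;
	}

	l.htable_size = ht_size;
	l.max_entries = ht_size * max_factor;
	return l;
}
```

For example, with an autosize of 65536, a non-init netns gets 65536 buckets and a limit of 65536 entries, while init_net loaded with a hash size of 16384 gets 16384 buckets and a limit of 131072.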
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_conntrack.h | 12 +--
include/net/netns/conntrack.h | 3 +-
net/netfilter/nf_conntrack_core.c | 120 ++++++++++++++----------
net/netfilter/nf_conntrack_expect.c | 2 +-
net/netfilter/nf_conntrack_netlink.c | 8 +-
net/netfilter/nf_conntrack_proto.c | 2 +-
net/netfilter/nf_conntrack_standalone.c | 34 +++----
net/netfilter/nf_nat_core.c | 2 +-
8 files changed, 99 insertions(+), 84 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index d3e419c08cc1..e6c3a7dba8dd 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -77,6 +77,7 @@ struct nf_conntrack_net {
unsigned int users6;
unsigned int users_bridge;
#ifdef CONFIG_SYSCTL
+ unsigned int htable_size_user;
struct ctl_table_header *sysctl_header;
#endif
#ifdef CONFIG_NF_CONNTRACK_EVENTS
@@ -345,10 +346,8 @@ static inline bool nf_ct_should_gc(const struct nf_conn *ct)
struct kernel_param;
int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp);
-int nf_conntrack_hash_resize(unsigned int hashsize);
+int nf_conntrack_hash_resize(struct net *net, unsigned int hashsize);
-extern struct hlist_nulls_head *nf_conntrack_hash;
-extern unsigned int nf_conntrack_htable_size;
extern seqcount_spinlock_t nf_conntrack_generation;
/* must be called with rcu read lock held */
@@ -361,9 +360,8 @@ nf_conntrack_get_ht(struct net *net, struct hlist_nulls_head **hash,
do {
sequence = read_seqcount_begin(&nf_conntrack_generation);
- hsz = nf_conntrack_htable_size;
+ hsz = net->ct.nf_conntrack_htable_size;
hptr = net->ct.nf_conntrack_hash;
- hptr = init_net.ct.nf_conntrack_hash;
} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
*hash = hptr;
@@ -394,9 +392,7 @@ static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net)
static inline unsigned int nf_conntrack_max(const struct net *net)
{
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
- return min(init_net.ct.sysctl_max, net->ct.sysctl_max);
-#else
- return 0;
+ return net->ct.nf_conntrack_max;
#endif
}
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 96b326bc1cd7..32c9e6ee9c2c 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -89,10 +89,11 @@ struct netns_ct {
u8 sysctl_acct;
u8 sysctl_tstamp;
u8 sysctl_checksum;
- unsigned int sysctl_max;
+ unsigned int nf_conntrack_htable_size;
struct ip_conntrack_stat __percpu *stat;
struct hlist_nulls_head *nf_conntrack_hash;
+ unsigned int nf_conntrack_max;
struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb;
struct nf_ip_net nf_ct_proto;
#if defined(CONFIG_NF_CONNTRACK_LABELS)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index b2f0dffb7f79..bbe195f34904 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -63,6 +63,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
static __read_mostly struct kmem_cache *nf_conntrack_cachep;
static DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
static __read_mostly bool nf_conntrack_locks_all;
+static unsigned int conntrack_htable_autosize __ro_after_init;
/* serialize hash resizes and nf_ct_iterate_cleanup */
static DEFINE_MUTEX(nf_conntrack_mutex);
@@ -184,9 +185,6 @@ static void nf_conntrack_all_unlock(void)
spin_unlock(&nf_conntrack_locks_all_lock);
}
-unsigned int nf_conntrack_htable_size __read_mostly;
-EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
-
seqcount_spinlock_t nf_conntrack_generation __read_mostly;
static siphash_aligned_key_t nf_conntrack_hash_rnd;
@@ -208,9 +206,9 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
&key);
}
-static u32 scale_hash(u32 hash)
+static u32 scale_hash(const struct net *net, u32 hash)
{
- return reciprocal_scale(hash, nf_conntrack_htable_size);
+ return reciprocal_scale(hash, net->ct.nf_conntrack_htable_size);
}
static u32 __hash_conntrack(const struct net *net,
@@ -225,7 +223,7 @@ static u32 hash_conntrack(const struct net *net,
const struct nf_conntrack_tuple *tuple,
unsigned int zoneid)
{
- return scale_hash(hash_conntrack_raw(tuple, zoneid, net));
+ return scale_hash(net, hash_conntrack_raw(tuple, zoneid, net));
}
static bool nf_ct_get_tuple_ports(const struct sk_buff *skb,
@@ -722,9 +720,12 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
struct hlist_nulls_head *ct_hash;
struct hlist_nulls_node *n;
unsigned int bucket, hsize;
+ bool restart;
begin:
- nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
+ restart = false;
+
+ nf_conntrack_get_ht(net, &ct_hash, &hsize);
bucket = reciprocal_scale(hash, hsize);
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
@@ -738,13 +739,19 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
if (nf_ct_key_equal(h, tuple, zone, net))
return h;
+
+ if (net_eq(net, nf_ct_net(ct)))
+ continue;
+
+ restart = true;
+ break;
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
- if (get_nulls_value(n) != bucket) {
+ if (restart || get_nulls_value(n) != bucket) {
NF_CT_STAT_INC_ATOMIC(net, search_restart);
goto begin;
}
@@ -876,7 +883,7 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY));
} while (nf_conntrack_double_lock(hash, reply_hash, sequence));
- ct_hash = init_net.ct.nf_conntrack_hash;
+ ct_hash = net->ct.nf_conntrack_hash;
max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN);
/* See if there's one in the list already, including reverse */
@@ -1077,7 +1084,7 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
zone = nf_ct_zone(loser_ct);
net = nf_ct_net(loser_ct);
- ct_hash = init_net.ct.nf_conntrack_hash;
+ ct_hash = net->ct.nf_conntrack_hash;
/* Reply direction must never result in a clash, unless both origin
* and reply tuples are identical.
@@ -1219,13 +1226,13 @@ __nf_conntrack_confirm(struct sk_buff *skb)
sequence = read_seqcount_begin(&nf_conntrack_generation);
/* reuse the hash saved before */
hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
- hash = scale_hash(hash);
+ hash = scale_hash(net, hash);
reply_hash = hash_conntrack(net,
&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY));
} while (nf_conntrack_double_lock(hash, reply_hash, sequence));
- ct_hash = init_net.ct.nf_conntrack_hash;
+ ct_hash = net->ct.nf_conntrack_hash;
/* We're not in hash table, and we refuse to set up related
* connections for unconfirmed conns. But packet copies and
@@ -1356,7 +1363,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
rcu_read_lock();
begin:
- nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
+ nf_conntrack_get_ht(net, &ct_hash, &hsize);
hash = __hash_conntrack(net, tuple, nf_ct_zone_id(zone, IP_CT_DIR_REPLY), hsize);
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
@@ -1463,7 +1470,7 @@ static noinline int early_drop(struct net *net, unsigned int hash)
unsigned int hsize, drops;
rcu_read_lock();
- nf_conntrack_get_ht(&init_net, &ct_hash, &hsize);
+ nf_conntrack_get_ht(net, &ct_hash, &hsize);
if (!i)
bucket = reciprocal_scale(hash, hsize);
else
@@ -2328,14 +2335,17 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
const struct nf_ct_iter_data *iter_data, unsigned int *bucket)
{
struct nf_conntrack_tuple_hash *h;
+ struct net *net = iter_data->net;
struct nf_conn *ct;
struct hlist_nulls_node *n;
+ unsigned int htable_size;
spinlock_t *lockp;
- for (; *bucket < nf_conntrack_htable_size; (*bucket)++) {
+ htable_size = net->ct.nf_conntrack_htable_size;
+ for (; *bucket < htable_size; (*bucket)++) {
struct hlist_nulls_head *hslot;
- hslot = &init_net.ct.nf_conntrack_hash[*bucket];
+ hslot = &net->ct.nf_conntrack_hash[*bucket];
if (hlist_nulls_empty(hslot))
continue;
@@ -2543,8 +2553,8 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
if (nr_slots > (INT_MAX / sizeof(struct hlist_nulls_head)))
return NULL;
- hash = kvcalloc(nr_slots, sizeof(struct hlist_nulls_head), GFP_KERNEL);
-
+ hash = kvcalloc(nr_slots, sizeof(struct hlist_nulls_head),
+ GFP_KERNEL_ACCOUNT);
if (hash && nulls)
for (i = 0; i < nr_slots; i++)
INIT_HLIST_NULLS_HEAD(&hash[i], i);
@@ -2553,7 +2563,7 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls)
}
EXPORT_SYMBOL_GPL(nf_ct_alloc_hashtable);
-int nf_conntrack_hash_resize(unsigned int hashsize)
+int nf_conntrack_hash_resize(struct net *net, unsigned int hashsize)
{
int i, bucket;
unsigned int old_size;
@@ -2569,7 +2579,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
return -ENOMEM;
mutex_lock(&nf_conntrack_mutex);
- old_size = nf_conntrack_htable_size;
+ old_size = net->ct.nf_conntrack_htable_size;
if (old_size == hashsize) {
mutex_unlock(&nf_conntrack_mutex);
kvfree(hash);
@@ -2586,11 +2596,12 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
* though since that required taking the locks.
*/
- for (i = 0; i < nf_conntrack_htable_size; i++) {
- while (!hlist_nulls_empty(&init_net.ct.nf_conntrack_hash[i])) {
+ old_size = net->ct.nf_conntrack_htable_size;
+ for (i = 0; i < old_size; i++) {
+ while (!hlist_nulls_empty(&net->ct.nf_conntrack_hash[i])) {
unsigned int zone_id;
- h = hlist_nulls_entry(init_net.ct.nf_conntrack_hash[i].first,
+ h = hlist_nulls_entry(net->ct.nf_conntrack_hash[i].first,
struct nf_conntrack_tuple_hash, hnnode);
ct = nf_ct_tuplehash_to_ctrack(h);
hlist_nulls_del_rcu(&h->hnnode);
@@ -2602,11 +2613,11 @@ int nf_conntrack_hash_resize(unsigned int hashsize)
}
}
- old_size = nf_conntrack_htable_size;
- old_hash = init_net.ct.nf_conntrack_hash;
+ old_size = net->ct.nf_conntrack_htable_size;
+ old_hash = net->ct.nf_conntrack_hash;
- init_net.ct.nf_conntrack_hash = hash;
- nf_conntrack_htable_size = hashsize;
+ net->ct.nf_conntrack_hash = hash;
+ net->ct.nf_conntrack_htable_size = hashsize;
write_seqcount_end(&nf_conntrack_generation);
nf_conntrack_all_unlock();
@@ -2635,7 +2646,7 @@ int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
if (rc)
return rc;
- return nf_conntrack_hash_resize(hashsize);
+ return nf_conntrack_hash_resize(&init_net, hashsize);
}
static unsigned int nf_conntrack_htable_autosize(void)
@@ -2651,7 +2662,7 @@ static unsigned int nf_conntrack_htable_autosize(void)
else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
ht_size = 65536;
- if (nf_conntrack_htable_size < 1024)
+ if (ht_size < 1024)
ht_size = 1024;
return ht_size;
@@ -2659,7 +2670,6 @@ static unsigned int nf_conntrack_htable_autosize(void)
int nf_conntrack_init_start(void)
{
- int max_factor = 8;
int ret = -ENOMEM;
int i;
@@ -2669,23 +2679,7 @@ int nf_conntrack_init_start(void)
for (i = 0; i < CONNTRACK_LOCKS; i++)
spin_lock_init(&nf_conntrack_locks[i]);
- if (!nf_conntrack_htable_size) {
- nf_conntrack_htable_size = nf_conntrack_htable_autosize();
-
- /* Use a max. factor of one by default to keep the average
- * hash chain length at 2 entries. Each entry has to be added
- * twice (once for original direction, once for reply).
- * When a table size is given we use the old value of 8 to
- * avoid implicit reduction of the max entries setting.
- */
- max_factor = 1;
- }
-
- init_net.ct.nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
- if (!init_net.ct.nf_conntrack_hash)
- return -ENOMEM;
-
- init_net.ct.sysctl_max = max_factor * nf_conntrack_htable_size;
+ conntrack_htable_autosize = nf_conntrack_htable_autosize();
nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
sizeof(struct nf_conn),
@@ -2721,7 +2715,6 @@ int nf_conntrack_init_start(void)
err_expect:
kmem_cache_destroy(nf_conntrack_cachep);
err_cachep:
- kvfree(init_net.ct.nf_conntrack_hash);
return ret;
}
@@ -2759,7 +2752,31 @@ void nf_conntrack_init_end(void)
int nf_conntrack_init_net(struct net *net)
{
struct nf_conntrack_net *cnet = nf_ct_pernet(net);
+ unsigned int ht_size = conntrack_htable_autosize;
int ret = -ENOMEM;
+ int max_factor = 1;
+
+ net->ct.nf_conntrack_max = conntrack_htable_autosize;
+
+ if (&init_net == net &&
+ init_net.ct.nf_conntrack_htable_size) {
+ /* Use a max. factor of one by default to keep the average
+ * hash chain length at 2 entries. Each entry has to be added
+ * twice (once for original direction, once for reply).
+ * When a table size is given we use the old value of 8 to
+ * avoid implicit reduction of the max entries setting.
+ */
+ ht_size = init_net.ct.nf_conntrack_htable_size;
+ max_factor = 8;
+ }
+
+ net->ct.nf_conntrack_hash = nf_ct_alloc_hashtable(&ht_size, 1);
+ if (!net->ct.nf_conntrack_hash)
+ return ret;
+
+ net->ct.nf_conntrack_htable_size = ht_size;
+ cnet->htable_size_user = ht_size;
+ net->ct.nf_conntrack_max = ht_size * max_factor;
BUILD_BUG_ON(IP_CT_UNTRACKED == IP_CT_NUMBER);
BUILD_BUG_ON_NOT_POWER_OF_2(CONNTRACK_LOCKS);
@@ -2767,7 +2784,7 @@ int nf_conntrack_init_net(struct net *net)
net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
if (!net->ct.stat)
- return ret;
+ goto err_stat;
ret = nf_conntrack_expect_pernet_init(net);
if (ret < 0)
@@ -2778,15 +2795,14 @@ int nf_conntrack_init_net(struct net *net)
nf_conntrack_ecache_pernet_init(net);
nf_conntrack_proto_pernet_init(net);
- if (!net_eq(net, &init_net))
- net->ct.nf_conntrack_hash = init_net.ct.nf_conntrack_hash;
-
conntrack_gc_work_init(&cnet->gc_work, net);
return 0;
err_expect:
free_percpu(net->ct.stat);
+err_stat:
+ kvfree(net->ct.nf_conntrack_hash);
return ret;
}
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index cfc2daa3fc7f..a3b9539e1036 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -717,7 +717,7 @@ void nf_conntrack_expect_pernet_fini(struct net *net)
int nf_conntrack_expect_init(void)
{
if (!nf_ct_expect_hsize) {
- nf_ct_expect_hsize = nf_conntrack_htable_size / 256;
+ nf_ct_expect_hsize = init_net.ct.nf_conntrack_htable_size / 256;
if (!nf_ct_expect_hsize)
nf_ct_expect_hsize = 1;
}
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 068e831545ec..d707ef7e2d75 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1221,6 +1221,7 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
unsigned long last_id = cb->args[1];
struct nf_conntrack_tuple_hash *h;
struct hlist_nulls_node *n;
+ unsigned int htable_size;
struct nf_conn *nf_ct_evict[8];
struct nf_conn *ct;
int res, i;
@@ -1229,7 +1230,8 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
i = 0;
local_bh_disable();
- for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
+ htable_size = net->ct.nf_conntrack_htable_size;
+ for (; cb->args[0] < htable_size; cb->args[0]++) {
restart:
while (i) {
i--;
@@ -1240,12 +1242,12 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
nf_conntrack_lock(lockp);
- if (cb->args[0] >= nf_conntrack_htable_size) {
+ if (cb->args[0] >= htable_size) {
spin_unlock(lockp);
goto out;
}
hlist_nulls_for_each_entry(h, n,
- &init_net.ct.nf_conntrack_hash[cb->args[0]],
+ &net->ct.nf_conntrack_hash[cb->args[0]],
hnnode) {
ct = nf_ct_tuplehash_to_ctrack(h);
if (nf_ct_is_expired(ct)) {
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index bc1d96686b9c..01eec82f4cba 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -687,7 +687,7 @@ void nf_conntrack_proto_pernet_init(struct net *net)
}
module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint,
- &nf_conntrack_htable_size, 0600);
+ &init_net.ct.nf_conntrack_htable_size, 0600);
MODULE_ALIAS("ip_conntrack");
MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET));
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index e980213ef602..f31c95b77041 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -156,7 +156,7 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos)
st->time_now = ktime_get_real_ns();
rcu_read_lock();
- nf_conntrack_get_ht(&init_net, &st->hash, &st->htable_size);
+ nf_conntrack_get_ht(net, &st->hash, &st->htable_size);
if (*pos == 0) {
st->skip_elems = 0;
@@ -536,27 +536,27 @@ EXPORT_SYMBOL_GPL(nf_conntrack_count);
/* Sysctl support */
#ifdef CONFIG_SYSCTL
-/* size the user *wants to set */
-static unsigned int nf_conntrack_htable_size_user __read_mostly;
-
static int
nf_conntrack_hash_sysctl(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ struct net *net = table->extra1;
+ struct nf_conntrack_net *cnet;
+ unsigned int size_old;
int ret;
- /* module_param hashsize could have changed value */
- nf_conntrack_htable_size_user = nf_conntrack_htable_size;
+ cnet = nf_ct_pernet(net);
+ size_old = net->ct.nf_conntrack_htable_size;
ret = proc_dointvec(table, write, buffer, lenp, ppos);
if (ret < 0 || !write)
return ret;
/* update ret, we might not be able to satisfy request */
- ret = nf_conntrack_hash_resize(nf_conntrack_htable_size_user);
+ ret = nf_conntrack_hash_resize(net, cnet->htable_size_user);
/* update it to the actual value used by conntrack */
- nf_conntrack_htable_size_user = nf_conntrack_htable_size;
+ cnet->htable_size_user = net->ct.nf_conntrack_htable_size;
return ret;
}
@@ -645,7 +645,7 @@ enum nf_ct_sysctl_index {
static struct ctl_table nf_ct_sysctl_table[] = {
[NF_SYSCTL_CT_MAX] = {
.procname = "nf_conntrack_max",
- .data = &init_net.ct.sysctl_max,
+ .data = &init_net.ct.nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
@@ -660,10 +660,12 @@ static struct ctl_table nf_ct_sysctl_table[] = {
},
[NF_SYSCTL_CT_BUCKETS] = {
.procname = "nf_conntrack_buckets",
- .data = &nf_conntrack_htable_size_user,
+ .data = &init_net.ct.nf_conntrack_htable_size,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = nf_conntrack_hash_sysctl,
+ .extra1 = &init_net,
+ .maxlen = sizeof(unsigned int),
},
[NF_SYSCTL_CT_CHECKSUM] = {
.procname = "nf_conntrack_checksum",
@@ -926,7 +928,7 @@ static struct ctl_table nf_ct_sysctl_table[] = {
static struct ctl_table nf_ct_netfilter_table[] = {
{
.procname = "nf_conntrack_max",
- .data = &init_net.ct.sysctl_max,
+ .data = &init_net.ct.nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
@@ -1017,8 +1019,10 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
return -ENOMEM;
table[NF_SYSCTL_CT_COUNT].data = &cnet->count;
+ table[NF_SYSCTL_CT_BUCKETS].data = &cnet->htable_size_user;
+ table[NF_SYSCTL_CT_BUCKETS].extra1 = net;
table[NF_SYSCTL_CT_CHECKSUM].data = &net->ct.sysctl_checksum;
- table[NF_SYSCTL_CT_MAX].data = &net->ct.sysctl_max;
+ table[NF_SYSCTL_CT_MAX].data = &net->ct.nf_conntrack_max;
table[NF_SYSCTL_CT_LOG_INVALID].data = &net->ct.sysctl_log_invalid;
table[NF_SYSCTL_CT_ACCT].data = &net->ct.sysctl_acct;
#ifdef CONFIG_NF_CONNTRACK_EVENTS
@@ -1097,7 +1101,6 @@ static int nf_conntrack_pernet_init(struct net *net)
int ret;
net->ct.sysctl_checksum = 1;
- net->ct.sysctl_max = init_net.ct.sysctl_max;
ret = nf_conntrack_standalone_init_sysctl(net);
if (ret < 0)
@@ -1138,8 +1141,7 @@ static void nf_conntrack_pernet_exit(struct list_head *net_exit_list)
nf_conntrack_cleanup_net_list(net_exit_list);
list_for_each_entry(net, net_exit_list, exit_list) {
- if (net_eq(net, &init_net))
- kvfree(net->ct.nf_conntrack_hash);
+ kvfree(net->ct.nf_conntrack_hash);
net->ct.nf_conntrack_hash = NULL;
}
}
@@ -1167,8 +1169,6 @@ static int __init nf_conntrack_standalone_init(void)
ret = -ENOMEM;
goto out_sysctl;
}
-
- nf_conntrack_htable_size_user = nf_conntrack_htable_size;
#endif
nf_conntrack_init_end();
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 78a61dac4ade..2e660f4d4ac1 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -1330,7 +1330,7 @@ static int __init nf_nat_init(void)
int ret, i;
/* Leave them the same for the moment. */
- nf_nat_htable_size = nf_conntrack_htable_size;
+ nf_nat_htable_size = init_net.ct.nf_conntrack_htable_size;
if (nf_nat_htable_size < CONNTRACK_LOCKS)
nf_nat_htable_size = CONNTRACK_LOCKS;
--
2.51.0
* [RFC nf-next 09/11] netfilter: conntrack: delay conntrack hashtable allocation until needed
From: Florian Westphal @ 2025-11-05 16:48 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo
Don't allocate the hashtable at netns init time.
Delay this until userspace requests it.
For netfilter users (iptables, nftables), do it before we register the
first conntrack hooks.
The table is allocated in any of these cases:
1. ctnetlink tries to insert an entry
2. sysctl is used to reallocate the table (reallocation == allocation)
3. conntrack base hooks get registered for the first time
TC and OVS need special handling: call the new init helper where needed.
Hashtable release happens at netns exit time.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
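For readers who want the locking pattern without the kernel context, here
is a minimal userspace sketch of the allocate-on-first-use scheme described
above. It is not the kernel code: struct ct_ns, ct_ns_hash_init() and
ct_ns_lookup() are hypothetical names, a pthread mutex stands in for
nf_conntrack_mutex, and calloc() stands in for nf_ct_alloc_hashtable().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-netns conntrack state. */
struct ct_ns {
	pthread_mutex_t lock;	/* plays the role of nf_conntrack_mutex */
	_Atomic(void **) hash;	/* NULL until the table is first needed */
	unsigned int size;	/* requested bucket count */
};

/* Allocate the table on first use. Returns 0 on success, -1 on failure. */
static int ct_ns_hash_init(struct ct_ns *ns)
{
	void **tbl;
	int err = 0;

	/* Fast path: table already present, no lock taken. */
	if (atomic_load_explicit(&ns->hash, memory_order_acquire))
		return 0;

	pthread_mutex_lock(&ns->lock);
	/* Re-check under the lock: another caller may have won the race. */
	if (!atomic_load_explicit(&ns->hash, memory_order_relaxed)) {
		assert(ns->size > 0);
		tbl = calloc(ns->size, sizeof(*tbl));
		if (tbl)
			atomic_store_explicit(&ns->hash, tbl,
					      memory_order_release);
		else
			err = -1;
	}
	pthread_mutex_unlock(&ns->lock);
	return err;
}

/* Lookups must tolerate a table that was never allocated. */
static void *ct_ns_lookup(struct ct_ns *ns, unsigned int bucket)
{
	void **tbl = atomic_load_explicit(&ns->hash, memory_order_acquire);

	if (!tbl)
		return NULL;	/* nothing can be tracked yet */
	return tbl[bucket % ns->size];
}
```

Every entry point that may need the table (hook registration, ctnetlink
insert, tc/ovs action setup) would call the init helper first; repeated
calls are cheap because of the unlocked fast path.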
include/net/netfilter/nf_conntrack.h | 1 +
net/netfilter/nf_conntrack_bpf.c | 5 +++
net/netfilter/nf_conntrack_core.c | 57 ++++++++++++++++++++++++++--
net/netfilter/nf_conntrack_netlink.c | 19 +++++++---
net/netfilter/nf_conntrack_proto.c | 4 ++
net/openvswitch/conntrack.c | 6 +++
net/sched/act_connmark.c | 6 +++
net/sched/act_ct.c | 7 ++++
net/sched/act_ctinfo.c | 7 ++++
9 files changed, 102 insertions(+), 10 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index e6c3a7dba8dd..7212bcaab02f 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -347,6 +347,7 @@ struct kernel_param;
int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp);
int nf_conntrack_hash_resize(struct net *net, unsigned int hashsize);
+int nf_conntrack_hash_init(struct net *net);
extern seqcount_spinlock_t nf_conntrack_generation;
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 4a136fc3a9c0..545ba9b70286 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -421,6 +421,11 @@ __bpf_kfunc struct nf_conn *bpf_ct_insert_entry(struct nf_conn___init *nfct_i)
struct nf_conn *nfct = (struct nf_conn *)nfct_i;
int err;
+ if (!READ_ONCE(nf_ct_net(nfct)->ct.nf_conntrack_hash)) {
+ nf_conntrack_free(nfct);
+ return NULL;
+ }
+
if (!nf_ct_is_confirmed(nfct))
nfct->timeout += nfct_time_stamp;
nfct->status |= IPS_CONFIRMED;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index bbe195f34904..6e69b52572b5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -728,6 +728,9 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
nf_conntrack_get_ht(net, &ct_hash, &hsize);
bucket = reciprocal_scale(hash, hsize);
+ if (unlikely(!ct_hash))
+ return NULL;
+
hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
struct nf_conn *ct;
@@ -2579,6 +2582,14 @@ int nf_conntrack_hash_resize(struct net *net, unsigned int hashsize)
return -ENOMEM;
mutex_lock(&nf_conntrack_mutex);
+
+ if (!net->ct.nf_conntrack_hash) {
+ net->ct.nf_conntrack_hash = hash;
+ net->ct.nf_conntrack_htable_size = hashsize;
+ mutex_unlock(&nf_conntrack_mutex);
+ return 0;
+ }
+
old_size = net->ct.nf_conntrack_htable_size;
if (old_size == hashsize) {
mutex_unlock(&nf_conntrack_mutex);
@@ -2630,6 +2641,48 @@ int nf_conntrack_hash_resize(struct net *net, unsigned int hashsize)
return 0;
}
+/**
+ * nf_conntrack_hash_init - allocate initial conntrack table
+ *
+ * @net: network namespace to operate in
+ *
+ * In order to not waste memory, the hash table is not allocated
+ * on network namespace initialisation, but when userspace requests
+ * the functionality.
+ *
+ * Memory is released from the pernet_ops exit handler.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int nf_conntrack_hash_init(struct net *net)
+{
+ int err = 0;
+
+ if (net->ct.nf_conntrack_hash)
+ return 0;
+
+ mutex_lock(&nf_conntrack_mutex);
+ if (!net->ct.nf_conntrack_hash) {
+ unsigned int size = READ_ONCE(net->ct.nf_conntrack_htable_size);
+ struct hlist_nulls_head *hash;
+
+ hash = nf_ct_alloc_hashtable(&size, 1);
+ if (hash) {
+ struct nf_conntrack_net *cnet = nf_ct_pernet(net);
+
+ net->ct.nf_conntrack_hash = hash;
+ net->ct.nf_conntrack_htable_size = size;
+ cnet->htable_size_user = size;
+ } else {
+ err = -ENOMEM;
+ }
+ }
+
+ mutex_unlock(&nf_conntrack_mutex);
+ return err;
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_hash_init);
+
int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
{
unsigned int hashsize;
@@ -2770,10 +2823,6 @@ int nf_conntrack_init_net(struct net *net)
max_factor = 8;
}
- net->ct.nf_conntrack_hash = nf_ct_alloc_hashtable(&ht_size, 1);
- if (!net->ct.nf_conntrack_hash)
- return ret;
-
net->ct.nf_conntrack_htable_size = ht_size;
cnet->htable_size_user = ht_size;
net->ct.nf_conntrack_max = ht_size * max_factor;
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index d707ef7e2d75..47a966a15f07 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1227,6 +1227,9 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
int res, i;
spinlock_t *lockp;
+ if (!net->ct.nf_conntrack_hash)
+ return skb->len;
+
i = 0;
local_bh_disable();
@@ -2379,17 +2382,21 @@ ctnetlink_create_conntrack(struct net *net,
if (tstamp)
tstamp->start = ktime_get_real_ns();
- err = nf_conntrack_hash_check_insert(ct);
+ rcu_read_unlock();
+
+ err = nf_conntrack_hash_init(net);
if (err < 0)
goto err3;
- rcu_read_unlock();
+ err = nf_conntrack_hash_check_insert(ct);
+ if (err < 0) {
+err3:
+ if (ct->master)
+ nf_ct_put(ct->master);
+ goto err1;
+ }
return ct;
-
-err3:
- if (ct->master)
- nf_ct_put(ct->master);
err2:
rcu_read_unlock();
err1:
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 01eec82f4cba..be78cbe5b50b 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -580,6 +580,10 @@ int nf_ct_netns_get(struct net *net, u8 nfproto)
{
int err;
+ err = nf_conntrack_hash_init(net);
+ if (err)
+ return err;
+
switch (nfproto) {
case NFPROTO_INET:
err = nf_ct_netns_inet_get(net);
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e573e9221302..a5c1f575bbc1 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -1402,6 +1402,12 @@ int ovs_ct_copy_action(struct net *net, const struct nlattr *attr,
if (err)
return err;
+ err = nf_conntrack_hash_init(net);
+ if (err) {
+ OVS_NLERR(log, "Failed to allocate conntrack table");
+ return err;
+ }
+
/* Set up template for tracking connections in specific zones. */
ct_info.ct = nf_ct_tmpl_alloc(net, &ct_info.zone, GFP_KERNEL);
if (!ct_info.ct) {
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 3e89927d7116..fbb296d19d87 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -121,6 +121,12 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
if (!tb[TCA_CONNMARK_PARMS])
return -EINVAL;
+ err = nf_conntrack_hash_init(net);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Cannot allocate conntrack table");
+ return err;
+ }
+
nparms = kzalloc(sizeof(*nparms), GFP_KERNEL);
if (!nparms)
return -ENOMEM;
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 6749a4a9a9cd..12799e5ac056 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1366,6 +1366,13 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
NL_SET_ERR_MSG_MOD(extack, "Missing required ct parameters");
return -EINVAL;
}
+
+ err = nf_conntrack_hash_init(net);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Cannot allocate conntrack table");
+ return err;
+ }
+
parm = nla_data(tb[TCA_CT_PARMS]);
index = parm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 71efe04d00b5..cc959bfe9abd 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -181,6 +181,13 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
"Missing required TCA_CTINFO_ACT attribute");
return -EINVAL;
}
+
+ err = nf_conntrack_hash_init(net);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Cannot allocate conntrack table");
+ return err;
+ }
+
actparm = nla_data(tb[TCA_CTINFO_ACT]);
/* do some basic validation here before dynamically allocating things */
--
2.51.0
* [RFC nf-next 10/11] netfilter: conntrack: allow non-init-net to change table size
From: Florian Westphal @ 2025-11-05 16:48 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo
This removes the init_net restriction on the bucket sysctl, i.e.
table size can be altered in a netns.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
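A possible way to exercise this from a process that already runs inside
the target netns is sketched below. The helper names are hypothetical;
the only assumption about the kernel side is that the sysctl lives at
/proc/sys/net/netfilter/nf_conntrack_buckets and that the handler stores
back the size actually in use, so the value is read back after the write
instead of trusting the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Write an unsigned value to a sysctl-style file. Returns 0 or -errno. */
static int sysctl_write_uint(const char *path, unsigned int val)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "%u\n", val) < 0)
		err = -EIO;
	/* The data reaches the file on close, so check that result too. */
	if (fclose(f) != 0 && !err)
		err = -errno;
	return err;
}

/* Read an unsigned value back. Returns 0, -errno, or -EINVAL. */
static int sysctl_read_uint(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -errno;
	n = fscanf(f, "%u", val);
	fclose(f);
	return n == 1 ? 0 : -EINVAL;
}

/* Request 'want' buckets, then report the size found in the file. */
static int set_buckets(const char *path, unsigned int want,
		       unsigned int *got)
{
	int err;

	assert(path && got);
	err = sysctl_write_uint(path, want);
	if (err)
		return err;
	return sysctl_read_uint(path, got);
}
```

Calling set_buckets("/proc/sys/net/netfilter/nf_conntrack_buckets", 4096,
&got) in a non-init netns would be expected to fail on a kernel without
this patch, since the file is read-only there.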
Documentation/networking/nf_conntrack-sysctl.rst | 1 -
net/netfilter/nf_conntrack_standalone.c | 2 --
2 files changed, 3 deletions(-)
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index eaf11ec1f4dc..a684d7664501 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -19,7 +19,6 @@ nf_conntrack_buckets - INTEGER
loading, the default size is calculated by dividing total memory
by 16384 to determine the number of buckets. The hash table will
never have fewer than 1024 and never more than 262144 buckets.
- This sysctl is only writeable in the initial net namespace.
nf_conntrack_checksum - BOOLEAN
- 0 - disabled
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index f31c95b77041..9170761caf96 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -660,7 +660,6 @@ static struct ctl_table nf_ct_sysctl_table[] = {
},
[NF_SYSCTL_CT_BUCKETS] = {
.procname = "nf_conntrack_buckets",
- .data = &init_net.ct.nf_conntrack_htable_size,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = nf_conntrack_hash_sysctl,
@@ -1047,7 +1046,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
/* Don't allow non-init_net ns to alter global sysctls */
if (!net_eq(&init_net, net)) {
table[NF_SYSCTL_CT_EXPECT_MAX].mode = 0444;
- table[NF_SYSCTL_CT_BUCKETS].mode = 0444;
}
cnet->sysctl_header = register_net_sysctl_sz(net, "net/netfilter",
--
2.51.0
* [RFC nf-next 11/11] netfilter: nf_nat: make bysource hash table pernet
From: Florian Westphal @ 2025-11-05 16:48 UTC (permalink / raw)
To: netfilter-devel; +Cc: pablo
Improve netns isolation by providing each net namespace
with its own table.
The table is allocated when the namespace first requests NAT
functionality.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
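The bucket and lock selection used for the bysource table can be
illustrated with a small standalone sketch. scale() uses the same
arithmetic as the kernel's reciprocal_scale(); SKETCH_LOCKS is assumed to
match CONNTRACK_LOCKS (1024 at the time of writing), and the struct and
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors CONNTRACK_LOCKS in the kernel tree. */
#define SKETCH_LOCKS 1024u

/* Map a 32-bit hash to [0, n) without a division. */
static uint32_t scale(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* NAT table size follows the conntrack size, but never below the lock count. */
static uint32_t nat_table_size(uint32_t ct_size)
{
	return ct_size < SKETCH_LOCKS ? SKETCH_LOCKS : ct_size;
}

struct bucket_ref {
	uint32_t bucket;	/* index into the per-netns bysource table */
	uint32_t lock;		/* index into the (still global) lock array */
};

static struct bucket_ref nat_bucket(uint32_t hash, uint32_t ct_size)
{
	uint32_t size = nat_table_size(ct_size);
	struct bucket_ref r;

	r.bucket = scale(hash, size);
	assert(r.bucket < size);
	r.lock = r.bucket % SKETCH_LOCKS;
	return r;
}
```

With a per-netns siphash key the hash input no longer needs a netns mix-in
value, which is why net_mix is dropped from the hashed struct in the diff
below.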
net/netfilter/nf_nat_core.c | 100 ++++++++++++++++++++++++++----------
1 file changed, 74 insertions(+), 26 deletions(-)
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 2e660f4d4ac1..2add90e3d636 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -35,10 +35,6 @@ static spinlock_t nf_nat_locks[CONNTRACK_LOCKS];
static DEFINE_MUTEX(nf_nat_proto_mutex);
static unsigned int nat_net_id __read_mostly;
-static struct hlist_head *nf_nat_bysource __read_mostly;
-static unsigned int nf_nat_htable_size __read_mostly;
-static siphash_aligned_key_t nf_nat_hash_rnd;
-
struct nf_nat_lookup_hook_priv {
struct nf_hook_entries __rcu *entries;
@@ -51,9 +47,18 @@ struct nf_nat_hooks_net {
};
struct nat_net {
+ struct hlist_head *nf_nat_bysource;
+ unsigned int nf_nat_htable_size;
+ siphash_key_t hash_rnd;
+
struct nf_nat_hooks_net nat_proto_net[NFPROTO_NUMPROTO];
};
+static struct nat_net *nf_nat_get_pernet(const struct net *net)
+{
+ return net_generic(net, nat_net_id);
+}
+
#ifdef CONFIG_XFRM
static void nf_nat_ipv4_decode_session(struct sk_buff *skb,
const struct nf_conn *ct,
@@ -153,30 +158,27 @@ hash_by_src(const struct net *net,
const struct nf_conntrack_zone *zone,
const struct nf_conntrack_tuple *tuple)
{
+ struct nat_net *nat_pernet = nf_nat_get_pernet(net);
unsigned int hash;
struct {
struct nf_conntrack_man src;
- u32 net_mix;
u32 protonum;
u32 zone;
} __aligned(SIPHASH_ALIGNMENT) combined;
- get_random_once(&nf_nat_hash_rnd, sizeof(nf_nat_hash_rnd));
-
memset(&combined, 0, sizeof(combined));
/* Original src, to ensure we map it consistently if poss. */
combined.src = tuple->src;
- combined.net_mix = net_hash_mix(net);
combined.protonum = tuple->dst.protonum;
/* Zone ID can be used provided its valid for both directions */
if (zone->dir == NF_CT_DEFAULT_ZONE_DIR)
combined.zone = zone->id;
- hash = siphash(&combined, sizeof(combined), &nf_nat_hash_rnd);
+ hash = siphash(&combined, sizeof(combined), &nat_pernet->hash_rnd);
- return reciprocal_scale(hash, nf_nat_htable_size);
+ return reciprocal_scale(hash, nat_pernet->nf_nat_htable_size);
}
/**
@@ -481,10 +483,12 @@ find_appropriate_src(struct net *net,
struct nf_conntrack_tuple *result,
const struct nf_nat_range2 *range)
{
+ struct nat_net *nat_pernet = nf_nat_get_pernet(net);
unsigned int h = hash_by_src(net, zone, tuple);
const struct nf_conn *ct;
- hlist_for_each_entry_rcu(ct, &nf_nat_bysource[h], nat_bysource) {
+ hlist_for_each_entry_rcu(ct, &nat_pernet->nf_nat_bysource[h],
+ nat_bysource) {
if (same_src(ct, tuple) &&
net_eq(net, nf_ct_net(ct)) &&
nf_ct_zone_equal(ct, zone, IP_CT_DIR_ORIGINAL)) {
@@ -826,6 +830,7 @@ nf_nat_setup_info(struct nf_conn *ct,
}
if (maniptype == NF_NAT_MANIP_SRC) {
+ struct nat_net *nat_net = nf_nat_get_pernet(net);
unsigned int srchash;
spinlock_t *lock;
@@ -834,7 +839,7 @@ nf_nat_setup_info(struct nf_conn *ct,
lock = &nf_nat_locks[srchash % CONNTRACK_LOCKS];
spin_lock_bh(lock);
hlist_add_head_rcu(&ct->nat_bysource,
- &nf_nat_bysource[srchash]);
+ &nat_net->nf_nat_bysource[srchash]);
spin_unlock_bh(lock);
}
@@ -1189,6 +1194,22 @@ static struct nf_ct_helper_expectfn follow_master_nat = {
.expectfn = nf_nat_follow_master,
};
+static bool nf_nat_alloc_bysource(struct nat_net *nat_net, unsigned int size)
+{
+ struct hlist_head *nf_nat_bysource;
+
+ nf_nat_bysource = nf_ct_alloc_hashtable(&size, 0);
+ if (!nf_nat_bysource)
+ return false;
+
+ get_random_bytes_wait(&nat_net->hash_rnd,
+ sizeof(nat_net->hash_rnd));
+
+ nat_net->nf_nat_bysource = nf_nat_bysource;
+ nat_net->nf_nat_htable_size = size;
+ return true;
+}
+
int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
const struct nf_hook_ops *orig_nat_ops, unsigned int ops_count)
{
@@ -1215,6 +1236,13 @@ int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
return -EINVAL;
mutex_lock(&nf_nat_proto_mutex);
+
+ if (!nat_net->nf_nat_bysource &&
+ !nf_nat_alloc_bysource(nat_net, net->ct.nf_conntrack_htable_size)) {
+ mutex_unlock(&nf_nat_proto_mutex);
+ return -ENOMEM;
+ }
+
if (!nat_proto_net->nat_hook_ops) {
WARN_ON(nat_proto_net->users != 0);
@@ -1312,8 +1340,41 @@ void nf_nat_unregister_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops,
mutex_unlock(&nf_nat_proto_mutex);
}
+static int __net_init nf_nat_net_init(struct net *net)
+{
+ unsigned int nf_nat_htable_size;
+
+ /* Leave them the same for the moment. */
+ nf_nat_htable_size = net->ct.nf_conntrack_htable_size;
+ if (nf_nat_htable_size < CONNTRACK_LOCKS)
+ nf_nat_htable_size = CONNTRACK_LOCKS;
+
+ return 0;
+}
+
+static void __net_exit nf_nat_net_exit_batch(struct list_head *net_exit_list)
+{
+ struct nf_nat_proto_clean clean = {};
+ struct net *net;
+
+ /* all nat hooks must have been removed at this point */
+ list_for_each_entry(net, net_exit_list, exit_list) {
+ struct nat_net *nat_net = nf_nat_get_pernet(net);
+ struct nf_ct_iter_data iter_data = {
+ .data = &clean,
+ .net = net,
+ };
+
+ nf_ct_iterate_cleanup_net(nf_nat_proto_clean, &iter_data);
+
+ kvfree(nat_net->nf_nat_bysource);
+ }
+}
+
static struct pernet_operations nat_net_ops = {
.id = &nat_net_id,
+ .init = nf_nat_net_init,
+ .exit_batch = nf_nat_net_exit_batch,
.size = sizeof(struct nat_net),
};
@@ -1329,23 +1390,12 @@ static int __init nf_nat_init(void)
{
int ret, i;
- /* Leave them the same for the moment. */
- nf_nat_htable_size = init_net.ct.nf_conntrack_htable_size;
- if (nf_nat_htable_size < CONNTRACK_LOCKS)
- nf_nat_htable_size = CONNTRACK_LOCKS;
-
- nf_nat_bysource = nf_ct_alloc_hashtable(&nf_nat_htable_size, 0);
- if (!nf_nat_bysource)
- return -ENOMEM;
-
for (i = 0; i < CONNTRACK_LOCKS; i++)
spin_lock_init(&nf_nat_locks[i]);
ret = register_pernet_subsys(&nat_net_ops);
- if (ret < 0) {
- kvfree(nf_nat_bysource);
+ if (ret < 0)
return ret;
- }
nf_ct_helper_expectfn_register(&follow_master_nat);
@@ -1358,7 +1408,6 @@ static int __init nf_nat_init(void)
nf_ct_helper_expectfn_unregister(&follow_master_nat);
synchronize_net();
unregister_pernet_subsys(&nat_net_ops);
- kvfree(nf_nat_bysource);
}
return ret;
@@ -1374,7 +1423,6 @@ static void __exit nf_nat_cleanup(void)
RCU_INIT_POINTER(nf_nat_hook, NULL);
synchronize_net();
- kvfree(nf_nat_bysource);
unregister_pernet_subsys(&nat_net_ops);
}
--
2.51.0
* Re: [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net
From: kernel test robot @ 2025-11-07 14:03 UTC (permalink / raw)
To: Florian Westphal; +Cc: oe-kbuild-all
Hi Florian,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on next-20251105]
[cannot apply to nf-next/master netfilter-nf/main linus/master horms-ipvs/master v6.18-rc4 v6.18-rc3 v6.18-rc2 v6.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Florian-Westphal/netfilter-netns-nf_conntrack-per-netns-net-netfilter-nf_conntrack_max-sysctl/20251106-014030
base: next-20251105
patch link: https://lore.kernel.org/r/20251105164805.3992-7-fw%40strlen.de
patch subject: [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net
config: s390-randconfig-001-20251107 (https://download.01.org/0day-ci/archive/20251107/202511072102.lOOFMVU8-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251107/202511072102.lOOFMVU8-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511072102.lOOFMVU8-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/net/netfilter/nf_conntrack_zones.h:6,
from net/openvswitch/flow.c:38:
include/net/netfilter/nf_conntrack.h: In function 'nf_conntrack_get_ht':
>> include/net/netfilter/nf_conntrack.h:344:13: error: 'struct net' has no member named 'ct'
hptr = net->ct.nf_conntrack_hash;
^~
include/net/netfilter/nf_conntrack.h:345:18: error: 'struct net' has no member named 'ct'
hptr = init_net.ct.nf_conntrack_hash;
^
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for OF_GPIO
Depends on [n]: GPIOLIB [=y] && OF [=y] && HAS_IOMEM [=n]
Selected by [m]:
- REGULATOR_RT5133 [=m] && REGULATOR [=y] && I2C [=m] && GPIOLIB [=y] && OF [=y]
vim +344 include/net/netfilter/nf_conntrack.h
332
333 /* must be called with rcu read lock held */
334 static inline void
335 nf_conntrack_get_ht(struct net *net, struct hlist_nulls_head **hash,
336 unsigned int *hsize)
337 {
338 struct hlist_nulls_head *hptr;
339 unsigned int sequence, hsz;
340
341 do {
342 sequence = read_seqcount_begin(&nf_conntrack_generation);
343 hsz = nf_conntrack_htable_size;
> 344 hptr = net->ct.nf_conntrack_hash;
345 hptr = init_net.ct.nf_conntrack_hash;
346 } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
347
348 *hash = hptr;
349 *hsize = hsz;
350 }
351
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC nf-next 08/11] netfilter: conntrack: make nf_conntrack hash table pernet
From: kernel test robot @ 2025-11-07 16:05 UTC (permalink / raw)
To: Florian Westphal; +Cc: oe-kbuild-all
Hi Florian,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on next-20251105]
[cannot apply to nf-next/master netfilter-nf/main linus/master horms-ipvs/master v6.18-rc4 v6.18-rc3 v6.18-rc2 v6.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Florian-Westphal/netfilter-netns-nf_conntrack-per-netns-net-netfilter-nf_conntrack_max-sysctl/20251106-014030
base: next-20251105
patch link: https://lore.kernel.org/r/20251105164805.3992-9-fw%40strlen.de
patch subject: [RFC nf-next 08/11] netfilter: conntrack: make nf_conntrack hash table pernet
config: s390-randconfig-001-20251107 (https://download.01.org/0day-ci/archive/20251107/202511072353.gCeACgfc-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251107/202511072353.gCeACgfc-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511072353.gCeACgfc-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from include/net/netfilter/nf_conntrack_zones.h:6,
from net/openvswitch/flow.c:38:
include/net/netfilter/nf_conntrack.h: In function 'nf_conntrack_get_ht':
include/net/netfilter/nf_conntrack.h:363:12: error: 'struct net' has no member named 'ct'
hsz = net->ct.nf_conntrack_htable_size;
^~
include/net/netfilter/nf_conntrack.h:364:13: error: 'struct net' has no member named 'ct'
hptr = net->ct.nf_conntrack_hash;
^~
include/net/netfilter/nf_conntrack.h: In function 'nf_conntrack_max':
>> include/net/netfilter/nf_conntrack.h:397:1: warning: no return statement in function returning non-void [-Wreturn-type]
}
^
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for OF_GPIO
Depends on [n]: GPIOLIB [=y] && OF [=y] && HAS_IOMEM [=n]
Selected by [m]:
- REGULATOR_RT5133 [=m] && REGULATOR [=y] && I2C [=m] && GPIOLIB [=y] && OF [=y]
vim +397 include/net/netfilter/nf_conntrack.h
9fb9cbb1082d6b Yasuyuki Kozakai 2005-11-09 352
92e47ba8839bac Liping Zhang 2016-08-13 353 /* must be called with rcu read lock held */
92e47ba8839bac Liping Zhang 2016-08-13 354 static inline void
eca18b74271e4e Florian Westphal 2025-11-05 355 nf_conntrack_get_ht(struct net *net, struct hlist_nulls_head **hash,
eca18b74271e4e Florian Westphal 2025-11-05 356 unsigned int *hsize)
92e47ba8839bac Liping Zhang 2016-08-13 357 {
92e47ba8839bac Liping Zhang 2016-08-13 358 struct hlist_nulls_head *hptr;
92e47ba8839bac Liping Zhang 2016-08-13 359 unsigned int sequence, hsz;
92e47ba8839bac Liping Zhang 2016-08-13 360
92e47ba8839bac Liping Zhang 2016-08-13 361 do {
92e47ba8839bac Liping Zhang 2016-08-13 362 sequence = read_seqcount_begin(&nf_conntrack_generation);
193325b9852e29 Florian Westphal 2025-11-05 @363 hsz = net->ct.nf_conntrack_htable_size;
eca18b74271e4e Florian Westphal 2025-11-05 364 hptr = net->ct.nf_conntrack_hash;
92e47ba8839bac Liping Zhang 2016-08-13 365 } while (read_seqcount_retry(&nf_conntrack_generation, sequence));
92e47ba8839bac Liping Zhang 2016-08-13 366
92e47ba8839bac Liping Zhang 2016-08-13 367 *hash = hptr;
92e47ba8839bac Liping Zhang 2016-08-13 368 *hsize = hsz;
92e47ba8839bac Liping Zhang 2016-08-13 369 }
92e47ba8839bac Liping Zhang 2016-08-13 370
308ac9143ee220 Daniel Borkmann 2015-08-08 371 struct nf_conn *nf_ct_tmpl_alloc(struct net *net,
308ac9143ee220 Daniel Borkmann 2015-08-08 372 const struct nf_conntrack_zone *zone,
308ac9143ee220 Daniel Borkmann 2015-08-08 373 gfp_t flags);
9cf94eab8b309e Daniel Borkmann 2015-08-31 374 void nf_ct_tmpl_free(struct nf_conn *tmpl);
e53376bef2cd97 Pablo Neira Ayuso 2014-02-03 375
3c79107631db1f Florian Westphal 2019-04-01 376 u32 nf_ct_get_id(const struct nf_conn *ct);
c53bd0e96662c2 Florian Westphal 2021-04-12 377 u32 nf_conntrack_count(const struct net *net);
3c79107631db1f Florian Westphal 2019-04-01 378
c74454fadd5ea6 Florian Westphal 2017-01-23 379 static inline void
c74454fadd5ea6 Florian Westphal 2017-01-23 380 nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info)
c74454fadd5ea6 Florian Westphal 2017-01-23 381 {
261db6c2fbd64a Jeremy Sowden 2019-09-13 382 skb_set_nfct(skb, (unsigned long)ct | info);
c74454fadd5ea6 Florian Westphal 2017-01-23 383 }
c74454fadd5ea6 Florian Westphal 2017-01-23 384
0418b989a46788 Pablo Neira Ayuso 2021-06-02 385 extern unsigned int nf_conntrack_net_id;
0418b989a46788 Pablo Neira Ayuso 2021-06-02 386
0418b989a46788 Pablo Neira Ayuso 2021-06-02 387 static inline struct nf_conntrack_net *nf_ct_pernet(const struct net *net)
0418b989a46788 Pablo Neira Ayuso 2021-06-02 388 {
0418b989a46788 Pablo Neira Ayuso 2021-06-02 389 return net_generic(net, nf_conntrack_net_id);
0418b989a46788 Pablo Neira Ayuso 2021-06-02 390 }
0418b989a46788 Pablo Neira Ayuso 2021-06-02 391
e22987c91161c3 lvxiafei 2025-11-05 392 static inline unsigned int nf_conntrack_max(const struct net *net)
e22987c91161c3 lvxiafei 2025-11-05 393 {
e22987c91161c3 lvxiafei 2025-11-05 394 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
193325b9852e29 Florian Westphal 2025-11-05 395 return net->ct.nf_conntrack_max;
e22987c91161c3 lvxiafei 2025-11-05 396 #endif
e22987c91161c3 lvxiafei 2025-11-05 @397 }
e22987c91161c3 lvxiafei 2025-11-05 398
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 14+ messages
2025-11-05 16:47 [RFC nf-next 00/11] netfilter: conntrack: pernet hash tables Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 01/11] netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_max sysctl Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 02/11] netfilter: conntrack: don't schedule gc worker when table is empty Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 03/11] tests: netfilter: conntrack_resize: prepare for pernet conntrack table Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 04/11] netfilter: conntrack: pass pointer to buckets instead of index Florian Westphal
2025-11-05 16:47 ` [RFC nf-next 05/11] netfilter: conntrack: split hashtable auto-size to helper function Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 06/11] netfilter: conntrack: move nf_conntrack_hash to struct net Florian Westphal
2025-11-07 14:03 ` kernel test robot
2025-11-05 16:48 ` [RFC nf-next 07/11] netfilter: conntrack: init and start independent gc workers when needed Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 08/11] netfilter: conntrack: make nf_conntrack hash table pernet Florian Westphal
2025-11-07 16:05 ` kernel test robot
2025-11-05 16:48 ` [RFC nf-next 09/11] netfilter: conntrack: delay conntrack hashtable allocation until needed Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 10/11] netfilter: conntrack: allow non-init-net to change table size Florian Westphal
2025-11-05 16:48 ` [RFC nf-next 11/11] netfilter: nf_nat: make bysource hash table pernet Florian Westphal