* [PATCH v2 net-next 0/7] netfilter: updates for net-next
@ 2025-09-02 13:35 Florian Westphal
0 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2025-09-02 13:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
v2: drop patch 5, to be routed via net tree. No other changes.
Hi,
The following patchset contains Netfilter fixes for *net-next*:
1) prefer vmalloc_array in ebtables, from Qianfeng Rong.
2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.
3+4) Get rid of GFP_ATOMIC in transaction object allocations; these
cause silly failures with large sets under memory pressure. From
myself.
5) Remove test for AVX cpu feature in nftables pipapo set type,
testing for AVX2 feature is sufficient.
6) Unexport a few functions in nf_reject infra: no external callers.
7) Extend payload offset to u16, this was restricted to values <=255
so far, from Fernando Fernandez Mancera.
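Item 2 above swaps open-coded arithmetic for the kernel's csum_replace4() helper. As a rough userspace illustration of the incremental checksum update that such a helper performs (the RFC 1624 approach), and not the kernel implementation, the sketch below updates a 16-bit Internet checksum after a 32-bit field changes. The names fold(), csum_full() and csum_update32() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits using end-around carry. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (checksum slot must be 0). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~fold(sum);
}

/*
 * Incremental update: new = ~(~old + ~from + to), computed in
 * one's-complement arithmetic over the two 16-bit halves of the field.
 */
static uint16_t csum_update32(uint16_t check, uint32_t from, uint32_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~(from >> 16);
    sum += (uint16_t)~(from & 0xffff);
    sum += to >> 16;
    sum += to & 0xffff;
    return (uint16_t)~fold(sum);
}
```

The point of the incremental form is that only the old and new field values are needed, so the rest of the header never has to be read again.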
Please, pull these changes from:
The following changes since commit cd8a4cfa6bb43a441901e82f5c222dddc75a18a3:
Merge branch 'e-switch-vport-sharing-delegation' (2025-09-02 15:18:19 +0200)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-25-09-02
for you to fetch changes up to 077dc4a275790b09e8a2ce80822ba8970e9dfb99:
netfilter: nft_payload: extend offset to 65535 bytes (2025-09-02 15:28:18 +0200)
----------------------------------------------------------------
netfilter pull request nf-next-25-09-02
----------------------------------------------------------------
Christophe Leroy (1):
netfilter: nft_payload: Use csum_replace4() instead of opencoding
Fernando Fernandez Mancera (1):
netfilter: nft_payload: extend offset to 65535 bytes
Florian Westphal (4):
netfilter: nf_tables: allow iter callbacks to sleep
netfilter: nf_tables: all transaction allocations can now sleep
netfilter: nft_set_pipapo: remove redundant test for avx feature bit
netfilter: nf_reject: remove unneeded exports
Qianfeng Rong (1):
netfilter: ebtables: Use vmalloc_array() to improve code
include/net/netfilter/ipv4/nf_reject.h | 8 ---
include/net/netfilter/ipv6/nf_reject.h | 10 ----
include/net/netfilter/nf_tables.h | 2 +
include/net/netfilter/nf_tables_core.h | 2 +-
net/bridge/netfilter/ebtables.c | 14 ++---
net/ipv4/netfilter/nf_reject_ipv4.c | 27 +++++----
net/ipv6/netfilter/nf_reject_ipv6.c | 37 ++++++++----
net/netfilter/nf_tables_api.c | 47 +++++++---------
net/netfilter/nft_payload.c | 20 ++++---
net/netfilter/nft_set_hash.c | 100 ++++++++++++++++++++++++++++++++-
net/netfilter/nft_set_pipapo.c | 3 +-
net/netfilter/nft_set_pipapo_avx2.c | 2 +-
net/netfilter/nft_set_rbtree.c | 35 +++++++++---
13 files changed, 209 insertions(+), 98 deletions(-)
* [PATCH v2 net-next 0/7] netfilter: updates for net-next
@ 2026-01-29 10:54 Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
v2: discard buggy nfqueue patch, no other changes.
The following patchset contains Netfilter updates for *net-next*:
Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
infrastructure. Patch 5 extends test coverage for this.
From Lorenzo Bianconi.
Patch 6 removes a duplicated helper from the xt_time extension; an
existing helper can be used instead. From Jinjie Ruan.
Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
verdict processing. Previously a list walk was required because the
design assumed in-order verdicts. From Scott Mitchell.
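To illustrate why a hash table helps with out-of-order verdicts, here is a minimal userspace sketch, assuming a fixed-size chained hash table keyed by packet id. It is not the kernel's rhashtable API; struct entry, struct queue, NBUCKETS and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u /* hypothetical bucket count, power of two */

struct entry {
    uint32_t id;        /* packet id the verdict will refer to */
    struct entry *next; /* chain within one bucket */
};

struct queue {
    struct entry *bucket[NBUCKETS];
};

static unsigned int bucket_of(uint32_t id)
{
    return id & (NBUCKETS - 1);
}

/* Queue a packet that is waiting for a verdict. */
static void queue_add(struct queue *q, struct entry *e)
{
    unsigned int b = bucket_of(e->id);

    e->next = q->bucket[b];
    q->bucket[b] = e;
}

/* Find and unlink the entry for id; only one bucket chain is walked. */
static struct entry *queue_take(struct queue *q, uint32_t id)
{
    struct entry **pp = &q->bucket[bucket_of(id)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct entry *e = *pp;

            *pp = e->next;
            e->next = NULL;
            return e;
        }
    }
    return NULL;
}
```

With a single list, a verdict for a packet queued late has to walk past every earlier entry; here the walk is limited to the entries that share a bucket.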
Please, pull these changes from:
The following changes since commit aba0138eb7d72fec755a985fae42a54b7ff147a8:
net: ethernet: neterion: s2io: remove unused driver (2026-01-28 20:08:07 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-26-01-29
for you to fetch changes up to e19079adcd26a25d7d3e586b1837493361fdf8b6:
netfilter: nfnetlink_queue: optimize verdict lookup with hash table (2026-01-29 09:52:07 +0100)
----------------------------------------------------------------
netfilter pull request nf-next-26-01-29
----------------------------------------------------------------
Jinjie Ruan (1):
netfilter: xt_time: use is_leap_year() helper
Lorenzo Bianconi (5):
netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
netfilter: flowtable: Add IP6IP6 rx sw acceleration
netfilter: flowtable: Add IP6IP6 tx sw acceleration
selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest
Scott Mitchell (1):
netfilter: nfnetlink_queue: optimize verdict lookup with hash table
include/net/netfilter/nf_queue.h | 3 +
net/ipv6/ip6_tunnel.c | 27 ++
net/netfilter/nf_flow_table_ip.c | 243 +++++++++++++++---
net/netfilter/nfnetlink_queue.c | 146 ++++++++---
net/netfilter/xt_time.c | 8 +-
.../selftests/net/netfilter/nft_flowtable.sh | 62 ++++-
6 files changed, 408 insertions(+), 81 deletions(-)
--
2.52.0
* [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 14:10 ` patchwork-bot+netdevbpf
2026-01-29 10:54 ` [PATCH v2 net-next 2/7] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
` (6 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Pass the nf_flowtable_ctx struct pointer to nf_flow_ip4_tunnel_proto() and
nf_flow_skb_encap_protocol() instead of a separate offset pointer. This is
a preliminary patch for IP6IP6 flowtable acceleration, since
nf_flowtable_ctx will be used to store IP6IP6 tunnel info.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_flow_table_ip.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 11da560f38bf..283b3fe61919 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -295,15 +295,16 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
return NF_STOLEN;
}
-static bool nf_flow_ip4_tunnel_proto(struct sk_buff *skb, u32 *psize)
+static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb)
{
struct iphdr *iph;
u16 size;
- if (!pskb_may_pull(skb, sizeof(*iph) + *psize))
+ if (!pskb_may_pull(skb, sizeof(*iph) + ctx->offset))
return false;
- iph = (struct iphdr *)(skb_network_header(skb) + *psize);
+ iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
size = iph->ihl << 2;
if (ip_is_fragment(iph) || unlikely(ip_has_options(size)))
@@ -313,7 +314,7 @@ static bool nf_flow_ip4_tunnel_proto(struct sk_buff *skb, u32 *psize)
return false;
if (iph->protocol == IPPROTO_IPIP)
- *psize += size;
+ ctx->offset += size;
return true;
}
@@ -329,8 +330,8 @@ static void nf_flow_ip4_tunnel_pop(struct sk_buff *skb)
skb_reset_network_header(skb);
}
-static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
- u32 *offset)
+static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb, __be16 proto)
{
__be16 inner_proto = skb->protocol;
struct vlan_ethhdr *veth;
@@ -343,7 +344,7 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
veth = (struct vlan_ethhdr *)skb_mac_header(skb);
if (veth->h_vlan_encapsulated_proto == proto) {
- *offset += VLAN_HLEN;
+ ctx->offset += VLAN_HLEN;
inner_proto = proto;
ret = true;
}
@@ -351,14 +352,14 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
case htons(ETH_P_PPP_SES):
if (nf_flow_pppoe_proto(skb, &inner_proto) &&
inner_proto == proto) {
- *offset += PPPOE_SES_HLEN;
+ ctx->offset += PPPOE_SES_HLEN;
ret = true;
}
break;
}
if (inner_proto == htons(ETH_P_IP))
- ret = nf_flow_ip4_tunnel_proto(skb, offset);
+ ret = nf_flow_ip4_tunnel_proto(ctx, skb);
return ret;
}
@@ -416,7 +417,7 @@ nf_flow_offload_lookup(struct nf_flowtable_ctx *ctx,
{
struct flow_offload_tuple tuple = {};
- if (!nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &ctx->offset))
+ if (!nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IP)))
return NULL;
if (nf_flow_tuple_ip(ctx, skb, &tuple) < 0)
@@ -897,7 +898,7 @@ nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
struct flow_offload_tuple tuple = {};
if (skb->protocol != htons(ETH_P_IPV6) &&
- !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IPV6), &ctx->offset))
+ !nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
return NULL;
if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
--
2.52.0
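
The refactoring in the patch above can be pictured with a small userspace sketch: each parsing stage receives one context pointer and advances a shared offset, rather than threading a bare u32 pointer through every call. This is only an illustration under simplified assumptions (raw byte buffer, no fragment check); struct parse_ctx, skip_vlan() and skip_ipip_outer() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_HLEN  4  /* one 802.1Q tag */
#define PROTO_IPIP 4  /* IPv4-in-IPv4 protocol number */

/* Hypothetical stand-in for struct nf_flowtable_ctx. */
struct parse_ctx {
    uint32_t offset; /* bytes of encapsulation skipped so far */
};

/* Stage 1: account for a VLAN tag that precedes the IP header. */
static void skip_vlan(struct parse_ctx *ctx)
{
    ctx->offset += VLAN_HLEN;
}

/*
 * Stage 2: inspect the IPv4 header at ctx->offset. Headers with options
 * are rejected; if the payload is IPIP the offset moves past the header.
 */
static bool skip_ipip_outer(struct parse_ctx *ctx, const uint8_t *pkt,
                            size_t len)
{
    size_t ihl;

    if (ctx->offset > len || len - ctx->offset < 20)
        return false;
    ihl = (size_t)(pkt[ctx->offset] & 0x0f) * 4;
    if (ihl != 20)
        return false;
    if (pkt[ctx->offset + 9] == PROTO_IPIP)
        ctx->offset += ihl;
    return true;
}
```

Because every stage takes the same context, later patches can add fields to it without touching the signatures again.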
* [PATCH v2 net-next 2/7] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 3/7] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Add tunnel hdr_size and proto fields to the nf_flowtable_ctx struct to
store the IP tunnel header size and protocol found during IPIP and IP6IP6
tunnel sw offload decapsulation. This avoids recomputing them during
tunnel header pop, since this is constant for IPv6.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_flow_table_ip.c | 41 +++++++++++++++++++-------------
1 file changed, 25 insertions(+), 16 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 283b3fe61919..ddfaddfa57be 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -144,6 +144,18 @@ static bool ip_has_options(unsigned int thoff)
return thoff != sizeof(struct iphdr);
}
+struct nf_flowtable_ctx {
+ const struct net_device *in;
+ u32 offset;
+ u32 hdrsize;
+ struct {
+ /* Tunnel IP header size */
+ u32 hdr_size;
+ /* IP tunnel protocol */
+ u8 proto;
+ } tun;
+};
+
static void nf_flow_tuple_encap(struct sk_buff *skb,
struct flow_offload_tuple *tuple)
{
@@ -186,12 +198,6 @@ static void nf_flow_tuple_encap(struct sk_buff *skb,
}
}
-struct nf_flowtable_ctx {
- const struct net_device *in;
- u32 offset;
- u32 hdrsize;
-};
-
static int nf_flow_tuple_ip(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
struct flow_offload_tuple *tuple)
{
@@ -313,20 +319,22 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
if (iph->ttl <= 1)
return false;
- if (iph->protocol == IPPROTO_IPIP)
+ if (iph->protocol == IPPROTO_IPIP) {
+ ctx->tun.proto = IPPROTO_IPIP;
+ ctx->tun.hdr_size = size;
ctx->offset += size;
+ }
return true;
}
-static void nf_flow_ip4_tunnel_pop(struct sk_buff *skb)
+static void nf_flow_ip4_tunnel_pop(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb)
{
- struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
-
- if (iph->protocol != IPPROTO_IPIP)
+ if (ctx->tun.proto != IPPROTO_IPIP)
return;
- skb_pull(skb, iph->ihl << 2);
+ skb_pull(skb, ctx->tun.hdr_size);
skb_reset_network_header(skb);
}
@@ -364,7 +372,8 @@ static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
return ret;
}
-static void nf_flow_encap_pop(struct sk_buff *skb,
+static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb,
struct flow_offload_tuple_rhash *tuplehash)
{
struct vlan_hdr *vlan_hdr;
@@ -391,7 +400,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
}
if (skb->protocol == htons(ETH_P_IP))
- nf_flow_ip4_tunnel_pop(skb);
+ nf_flow_ip4_tunnel_pop(ctx, skb);
}
struct nf_flow_xmit {
@@ -461,7 +470,7 @@ static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
flow_offload_refresh(flow_table, flow, false);
- nf_flow_encap_pop(skb, tuplehash);
+ nf_flow_encap_pop(ctx, skb, tuplehash);
thoff -= ctx->offset;
iph = ip_hdr(skb);
@@ -876,7 +885,7 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
flow_offload_refresh(flow_table, flow, false);
- nf_flow_encap_pop(skb, tuplehash);
+ nf_flow_encap_pop(ctx, skb, tuplehash);
ip6h = ipv6_hdr(skb);
nf_flow_nat_ipv6(flow, skb, dir, ip6h);
--
2.52.0
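
A minimal userspace sketch of the caching idea in the patch above: the parse step records the tunnel protocol and header size once, and the pop step relies only on those cached values instead of re-reading the header. struct tun_ctx, tun_record_ipv4() and tun_pop() are hypothetical names used for illustration; the byte offsets follow the standard IPv4 header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPIP 4 /* IPv4-in-IPv4 protocol number */

/* Hypothetical stand-in for the 'tun' member added to nf_flowtable_ctx. */
struct tun_ctx {
    uint32_t hdr_size; /* cached outer header size in bytes */
    uint8_t proto;     /* cached tunnel protocol, 0 if none */
};

/* Parse step: remember the outer IPv4 header size if it carries IPIP. */
static void tun_record_ipv4(struct tun_ctx *ctx, const uint8_t *outer)
{
    if (outer[9] == PROTO_IPIP) {
        ctx->proto = PROTO_IPIP;
        ctx->hdr_size = (uint32_t)(outer[0] & 0x0f) * 4;
    }
}

/* Pop step: uses only the cached values, the header is not read again. */
static const uint8_t *tun_pop(const struct tun_ctx *ctx, const uint8_t *pkt)
{
    if (ctx->proto != PROTO_IPIP)
        return pkt;
    return pkt + ctx->hdr_size;
}
```

The pop step stays correct even if the header bytes are no longer in the state they were in at parse time, which is the property the cached fields provide.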
* [PATCH v2 net-next 3/7] netfilter: flowtable: Add IP6IP6 rx sw acceleration
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 2/7] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 4/7] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Introduce sw acceleration for rx path of IP6IP6 tunnels relying on the
netfilter flowtable infrastructure. Subsequent patches will add sw
acceleration for IP6IP6 tunnels tx path.
IP6IP6 rx sw acceleration can be tested by running the following scenario
where the traffic is forwarded between two NICs (eth0 and eth1) and an
IP6IP6 tunnel is used to access a remote site (using eth1 as the underlay
device):
ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (2001:db8:3::2)
$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet6 2001:db8:1::2/64 scope global nodad
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet6 2001:db8:2::1/64 scope global nodad
valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc::
inet6 2002:db8:1::1/64 scope global nodad
valid_lft forever preferred_lft forever
$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium
$nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
Reproducing the scenario described above using veths, I got the following
results:
- TCP stream received from the IP6IP6 tunnel:
- net-next (baseline): ~81 Gbps
- net-next + IP6IP6 flowtable support: ~112 Gbps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv6/ip6_tunnel.c | 27 +++++++++++
net/netfilter/nf_flow_table_ip.c | 83 +++++++++++++++++++++++++++-----
2 files changed, 97 insertions(+), 13 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c1f39735a236..f68f6f110a3e 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1828,6 +1828,32 @@ int ip6_tnl_encap_setup(struct ip6_tnl *t,
}
EXPORT_SYMBOL_GPL(ip6_tnl_encap_setup);
+static int ip6_tnl_fill_forward_path(struct net_device_path_ctx *ctx,
+ struct net_device_path *path)
+{
+ struct ip6_tnl *t = netdev_priv(ctx->dev);
+ struct flowi6 fl6 = {
+ .daddr = t->parms.raddr,
+ };
+ struct dst_entry *dst;
+ int err;
+
+ dst = ip6_route_output(dev_net(ctx->dev), NULL, &fl6);
+ if (!dst->error) {
+ path->type = DEV_PATH_TUN;
+ path->tun.src_v6 = t->parms.laddr;
+ path->tun.dst_v6 = t->parms.raddr;
+ path->tun.l3_proto = IPPROTO_IPV6;
+ path->dev = ctx->dev;
+ ctx->dev = dst->dev;
+ }
+
+ err = dst->error;
+ dst_release(dst);
+
+ return err;
+}
+
static const struct net_device_ops ip6_tnl_netdev_ops = {
.ndo_init = ip6_tnl_dev_init,
.ndo_uninit = ip6_tnl_dev_uninit,
@@ -1836,6 +1862,7 @@ static const struct net_device_ops ip6_tnl_netdev_ops = {
.ndo_change_mtu = ip6_tnl_change_mtu,
.ndo_get_stats64 = dev_get_tstats64,
.ndo_get_iflink = ip6_tnl_get_iflink,
+ .ndo_fill_forward_path = ip6_tnl_fill_forward_path,
};
#define IPXIPX_FEATURES (NETIF_F_SG | \
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index ddfaddfa57be..51c64b3d4e50 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -156,12 +156,14 @@ struct nf_flowtable_ctx {
} tun;
};
-static void nf_flow_tuple_encap(struct sk_buff *skb,
+static void nf_flow_tuple_encap(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb,
struct flow_offload_tuple *tuple)
{
__be16 inner_proto = skb->protocol;
struct vlan_ethhdr *veth;
struct pppoe_hdr *phdr;
+ struct ipv6hdr *ip6h;
struct iphdr *iph;
u16 offset = 0;
int i = 0;
@@ -188,13 +190,25 @@ static void nf_flow_tuple_encap(struct sk_buff *skb,
break;
}
- if (inner_proto == htons(ETH_P_IP)) {
+ switch (inner_proto) {
+ case htons(ETH_P_IP):
iph = (struct iphdr *)(skb_network_header(skb) + offset);
- if (iph->protocol == IPPROTO_IPIP) {
+ if (ctx->tun.proto == IPPROTO_IPIP) {
tuple->tun.dst_v4.s_addr = iph->daddr;
tuple->tun.src_v4.s_addr = iph->saddr;
tuple->tun.l3_proto = IPPROTO_IPIP;
}
+ break;
+ case htons(ETH_P_IPV6):
+ ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
+ if (ctx->tun.proto == IPPROTO_IPV6) {
+ tuple->tun.dst_v6 = ip6h->daddr;
+ tuple->tun.src_v6 = ip6h->saddr;
+ tuple->tun.l3_proto = IPPROTO_IPV6;
+ }
+ break;
+ default:
+ break;
}
}
@@ -265,7 +279,7 @@ static int nf_flow_tuple_ip(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
tuple->l3proto = AF_INET;
tuple->l4proto = ipproto;
tuple->iifidx = ctx->in->ifindex;
- nf_flow_tuple_encap(skb, tuple);
+ nf_flow_tuple_encap(ctx, skb, tuple);
return 0;
}
@@ -328,10 +342,45 @@ static bool nf_flow_ip4_tunnel_proto(struct nf_flowtable_ctx *ctx,
return true;
}
-static void nf_flow_ip4_tunnel_pop(struct nf_flowtable_ctx *ctx,
- struct sk_buff *skb)
+static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb)
{
- if (ctx->tun.proto != IPPROTO_IPIP)
+#if IS_ENABLED(CONFIG_IPV6)
+ struct ipv6hdr *ip6h, _ip6h;
+ __be16 frag_off;
+ u8 nexthdr;
+ int hdrlen;
+
+ ip6h = skb_header_pointer(skb, ctx->offset, sizeof(*ip6h), &_ip6h);
+ if (!ip6h)
+ return false;
+
+ if (ip6h->hop_limit <= 1)
+ return false;
+
+ nexthdr = ip6h->nexthdr;
+ hdrlen = ipv6_skip_exthdr(skb, sizeof(*ip6h) + ctx->offset, &nexthdr,
+ &frag_off);
+ if (hdrlen < 0)
+ return false;
+
+ if (nexthdr == IPPROTO_IPV6) {
+ ctx->tun.hdr_size = hdrlen;
+ ctx->tun.proto = IPPROTO_IPV6;
+ }
+ ctx->offset += ctx->tun.hdr_size;
+
+ return true;
+#else
+ return false;
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+}
+
+static void nf_flow_ip_tunnel_pop(struct nf_flowtable_ctx *ctx,
+ struct sk_buff *skb)
+{
+ if (ctx->tun.proto != IPPROTO_IPIP &&
+ ctx->tun.proto != IPPROTO_IPV6)
return;
skb_pull(skb, ctx->tun.hdr_size);
@@ -366,8 +415,16 @@ static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
break;
}
- if (inner_proto == htons(ETH_P_IP))
+ switch (inner_proto) {
+ case htons(ETH_P_IP):
ret = nf_flow_ip4_tunnel_proto(ctx, skb);
+ break;
+ case htons(ETH_P_IPV6):
+ ret = nf_flow_ip6_tunnel_proto(ctx, skb);
+ break;
+ default:
+ break;
+ }
return ret;
}
@@ -399,8 +456,9 @@ static void nf_flow_encap_pop(struct nf_flowtable_ctx *ctx,
}
}
- if (skb->protocol == htons(ETH_P_IP))
- nf_flow_ip4_tunnel_pop(ctx, skb);
+ if (skb->protocol == htons(ETH_P_IP) ||
+ skb->protocol == htons(ETH_P_IPV6))
+ nf_flow_ip_tunnel_pop(ctx, skb);
}
struct nf_flow_xmit {
@@ -848,7 +906,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
tuple->l3proto = AF_INET6;
tuple->l4proto = nexthdr;
tuple->iifidx = ctx->in->ifindex;
- nf_flow_tuple_encap(skb, tuple);
+ nf_flow_tuple_encap(ctx, skb, tuple);
return 0;
}
@@ -906,8 +964,7 @@ nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
{
struct flow_offload_tuple tuple = {};
- if (skb->protocol != htons(ETH_P_IPV6) &&
- !nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
+ if (!nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
return NULL;
if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
--
2.52.0
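
The rx path in the patch above needs the size of the outer IPv6 header including any extension headers before it can tell whether the payload is another IPv6 packet. The following userspace sketch shows that idea in simplified form. It is not ipv6_skip_exthdr(): it only walks hop-by-hop, routing and destination-options headers and ignores fragment and authentication headers. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN    40
#define NEXTHDR_HOP      0
#define NEXTHDR_ROUTING 43
#define NEXTHDR_DEST    60
#define PROTO_IPV6      41 /* IPv6-in-IPv6 */

/*
 * Walk extension headers starting at 'start'. On success return the
 * offset of the first non-extension header and store its type in
 * *nexthdr; return -1 if the packet is too short.
 */
static int skip_exthdrs(const uint8_t *pkt, size_t len, size_t start,
                        uint8_t *nexthdr)
{
    size_t off = start;
    uint8_t nh = *nexthdr;

    while (nh == NEXTHDR_HOP || nh == NEXTHDR_ROUTING ||
           nh == NEXTHDR_DEST) {
        size_t ext_len;

        if (len < 2 || off > len - 2)
            return -1;
        /* Length field counts 8-byte units beyond the first 8 bytes. */
        ext_len = ((size_t)pkt[off + 1] + 1) * 8;
        if (ext_len > len - off)
            return -1;
        nh = pkt[off];
        off += ext_len;
    }
    *nexthdr = nh;
    return (int)off;
}

/* Outer header size if pkt is IPv6-in-IPv6, 0 if not, -1 if malformed. */
static int ip6ip6_outer_len(const uint8_t *pkt, size_t len)
{
    uint8_t nh;
    int end;

    if (len < IPV6_HDR_LEN)
        return -1;
    nh = pkt[6]; /* Next Header field of the fixed IPv6 header */
    end = skip_exthdrs(pkt, len, IPV6_HDR_LEN, &nh);
    if (end < 0)
        return -1;
    return nh == PROTO_IPV6 ? end : 0;
}
```

A tunnel endpoint that inserts an 8-byte destination-options header yields 48 bytes here rather than the fixed 40, which is why the size is computed per packet.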
* [PATCH v2 net-next 4/7] netfilter: flowtable: Add IP6IP6 tx sw acceleration
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
` (2 preceding siblings ...)
2026-01-29 10:54 ` [PATCH v2 net-next 3/7] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 5/7] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Introduce sw acceleration for tx path of IP6IP6 tunnels relying on the
netfilter flowtable infrastructure.
IP6IP6 tx sw acceleration can be tested by running the following scenario
where the traffic is forwarded between two NICs (eth0 and eth1) and an
IP6IP6 tunnel is used to access a remote site (using eth1 as the underlay
device):
ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (2001:db8:3::2)
$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet6 2001:db8:1::2/64 scope global nodad
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet6 2001:db8:2::1/64 scope global nodad
valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc::
inet6 2002:db8:1::1/64 scope global nodad
valid_lft forever preferred_lft forever
$ip -6 route show
2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium
$nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
Reproducing the scenario described above using veths, I got the following
results:
- TCP stream received from the IP6IP6 tunnel:
- net-next (baseline): ~93 Gbps
- net-next + IP6IP6 flowtable support: ~98 Gbps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_flow_table_ip.c | 108 ++++++++++++++++++++++++++++++-
1 file changed, 106 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 51c64b3d4e50..3fdb10d9bf7f 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -14,6 +14,7 @@
#include <net/ip.h>
#include <net/ipv6.h>
#include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
#include <net/neighbour.h>
#include <net/netfilter/nf_flow_table.h>
#include <net/netfilter/nf_conntrack_acct.h>
@@ -637,6 +638,97 @@ static int nf_flow_tunnel_v4_push(struct net *net, struct sk_buff *skb,
return 0;
}
+struct ipv6_tel_txoption {
+ struct ipv6_txoptions ops;
+ __u8 dst_opt[8];
+};
+
+static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
+ struct flow_offload_tuple *tuple,
+ struct in6_addr **ip6_daddr,
+ int encap_limit)
+{
+ struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
+ u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
+ struct rtable *rt = dst_rtable(tuple->dst_cache);
+ __u8 dsfield = ipv6_get_dsfield(ip6h);
+ struct flowi6 fl6 = {
+ .daddr = tuple->tun.src_v6,
+ .saddr = tuple->tun.dst_v6,
+ .flowi6_proto = proto,
+ };
+ int err, mtu;
+ u32 headroom;
+
+ err = iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6);
+ if (err)
+ return err;
+
+ skb_set_inner_ipproto(skb, proto);
+ headroom = sizeof(*ip6h) + LL_RESERVED_SPACE(rt->dst.dev) +
+ rt->dst.header_len;
+ if (encap_limit)
+ headroom += 8;
+ err = skb_cow_head(skb, headroom);
+ if (err)
+ return err;
+
+ skb_scrub_packet(skb, true);
+ mtu = dst_mtu(&rt->dst) - sizeof(*ip6h);
+ if (encap_limit)
+ mtu -= 8;
+ mtu = max(mtu, IPV6_MIN_MTU);
+ skb_dst_update_pmtu_no_confirm(skb, mtu);
+
+ if (encap_limit > 0) {
+ struct ipv6_tel_txoption opt = {
+ .dst_opt[2] = IPV6_TLV_TNL_ENCAP_LIMIT,
+ .dst_opt[3] = 1,
+ .dst_opt[4] = encap_limit,
+ .dst_opt[5] = IPV6_TLV_PADN,
+ .dst_opt[6] = 1,
+ };
+ struct ipv6_opt_hdr *hopt;
+
+ opt.ops.dst1opt = (struct ipv6_opt_hdr *)opt.dst_opt;
+ opt.ops.opt_nflen = 8;
+
+ hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
+ memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
+ hopt->nexthdr = IPPROTO_IPV6;
+ proto = NEXTHDR_DEST;
+ }
+
+ skb_push(skb, sizeof(*ip6h));
+ skb_reset_network_header(skb);
+
+ ip6h = ipv6_hdr(skb);
+ ip6_flow_hdr(ip6h, dsfield,
+ ip6_make_flowlabel(net, skb, fl6.flowlabel, true, &fl6));
+ ip6h->hop_limit = hop_limit;
+ ip6h->nexthdr = proto;
+ ip6h->daddr = tuple->tun.src_v6;
+ ip6h->saddr = tuple->tun.dst_v6;
+ ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(*ip6h));
+ IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);
+
+ *ip6_daddr = &tuple->tun.src_v6;
+
+ return 0;
+}
+
+static int nf_flow_tunnel_v6_push(struct net *net, struct sk_buff *skb,
+ struct flow_offload_tuple *tuple,
+ struct in6_addr **ip6_daddr,
+ int encap_limit)
+{
+ if (tuple->tun_num)
+ return nf_flow_tunnel_ip6ip6_push(net, skb, tuple, ip6_daddr,
+ encap_limit);
+
+ return 0;
+}
+
static int nf_flow_encap_push(struct sk_buff *skb,
struct flow_offload_tuple *tuple)
{
@@ -914,7 +1006,7 @@ static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
struct nf_flowtable *flow_table,
struct flow_offload_tuple_rhash *tuplehash,
- struct sk_buff *skb)
+ struct sk_buff *skb, int encap_limit)
{
enum flow_offload_tuple_dir dir;
struct flow_offload *flow;
@@ -925,6 +1017,12 @@ static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
+ if (flow->tuplehash[!dir].tuple.tun_num) {
+ mtu -= sizeof(*ip6h);
+ if (encap_limit > 0)
+ mtu -= 8; /* encap limit option */
+ }
+
if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
return 0;
@@ -977,6 +1075,7 @@ unsigned int
nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
const struct nf_hook_state *state)
{
+ int encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
struct flow_offload_tuple_rhash *tuplehash;
struct nf_flowtable *flow_table = priv;
struct flow_offload_tuple *other_tuple;
@@ -995,7 +1094,8 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
if (tuplehash == NULL)
return NF_ACCEPT;
- ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
+ ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
+ encap_limit);
if (ret < 0)
return NF_DROP;
else if (ret == 0)
@@ -1014,6 +1114,10 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
other_tuple = &flow->tuplehash[!dir].tuple;
ip6_daddr = &other_tuple->src_v6;
+ if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
+ &ip6_daddr, encap_limit) < 0)
+ return NF_DROP;
+
if (nf_flow_encap_push(skb, other_tuple) < 0)
return NF_DROP;
--
2.52.0
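
The tx path in the patch above pushes an optional 8-byte destination-options header carrying the tunnel encapsulation limit and shrinks the usable MTU accordingly. The userspace sketch below mirrors that byte layout and the MTU arithmetic under simplified assumptions. The helper names are hypothetical, and the option type values (4 for the encapsulation limit per RFC 2473, 1 for PadN) are written out as local constants rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN         40
#define IPV6_MIN_MTU       1280
#define PROTO_IPV6           41
#define TLV_TNL_ENCAP_LIMIT   4 /* RFC 2473 option type */
#define TLV_PADN              1
#define ENCAP_OPT_LEN         8 /* whole destination-options header */

/* Fill an 8-byte destination-options header with the encap limit. */
static void build_encap_limit_opt(uint8_t opt[ENCAP_OPT_LEN], uint8_t limit)
{
    memset(opt, 0, ENCAP_OPT_LEN);
    opt[0] = PROTO_IPV6;          /* next header: the inner IPv6 packet */
    opt[1] = 0;                   /* no 8-byte units beyond the first */
    opt[2] = TLV_TNL_ENCAP_LIMIT; /* option type */
    opt[3] = 1;                   /* option data length */
    opt[4] = limit;               /* remaining nesting levels allowed */
    opt[5] = TLV_PADN;            /* pad to an 8-byte boundary */
    opt[6] = 1;                   /* one byte of padding follows */
}

/* MTU left for the inner packet, never below the IPv6 minimum. */
static int tunnel_payload_mtu(int path_mtu, int encap_limit)
{
    int mtu = path_mtu - IPV6_HDR_LEN;

    if (encap_limit > 0)
        mtu -= ENCAP_OPT_LEN;
    return mtu < IPV6_MIN_MTU ? IPV6_MIN_MTU : mtu;
}
```

With a 1500-byte path MTU, the inner packet can use 1460 bytes without the option and 1452 bytes with it.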
* [PATCH v2 net-next 5/7] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
` (3 preceding siblings ...)
2026-01-29 10:54 ` [PATCH v2 net-next 4/7] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 6/7] netfilter: xt_time: use is_leap_year() helper Florian Westphal
` (2 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Lorenzo Bianconi <lorenzo@kernel.org>
Similar to IPIP, introduce a specific selftest for IP6IP6 flowtable SW
acceleration in nft_flowtable.sh.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../selftests/net/netfilter/nft_flowtable.sh | 62 ++++++++++++++++---
1 file changed, 53 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/nft_flowtable.sh b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
index a68bc882fa4e..14d7f67715ed 100755
--- a/tools/testing/selftests/net/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
@@ -592,16 +592,28 @@ ip -net "$nsr1" link set tun0 up
ip -net "$nsr1" addr add 192.168.100.1/24 dev tun0
ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
+ip -net "$nsr1" link add name tun6 type ip6tnl local fee1:2::1 remote fee1:2::2
+ip -net "$nsr1" link set tun6 up
+ip -net "$nsr1" addr add fee1:3::1/64 dev tun6 nodad
+
ip -net "$nsr2" link add name tun0 type ipip local 192.168.10.2 remote 192.168.10.1
ip -net "$nsr2" link set tun0 up
ip -net "$nsr2" addr add 192.168.100.2/24 dev tun0
ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
+ip -net "$nsr2" link add name tun6 type ip6tnl local fee1:2::2 remote fee1:2::1
+ip -net "$nsr2" link set tun6 up
+ip -net "$nsr2" addr add fee1:3::2/64 dev tun6 nodad
+
ip -net "$nsr1" route change default via 192.168.100.2
ip -net "$nsr2" route change default via 192.168.100.1
+ip -6 -net "$nsr1" route change default via fee1:3::2
+ip -6 -net "$nsr2" route change default via fee1:3::1
ip -net "$ns2" route add default via 10.0.2.1
+ip -6 -net "$ns2" route add default via dead:2::1
ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0 accept'
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun6 accept'
ip netns exec "$nsr1" nft -a insert rule inet filter forward \
'meta oif "veth0" tcp sport 12345 ct mark set 1 flow add @f1 counter name routed_repl accept'
@@ -611,28 +623,51 @@ if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel"; then
ret=1
fi
+if test_tcp_forwarding "$ns1" "$ns2" 1 6 "[dead:2::99]" 12345; then
+ echo "PASS: flow offload for ns1/ns2 IP6IP6 tunnel"
+else
+ echo "FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel" 1>&2
+ ip netns exec "$nsr1" nft list ruleset
+ ret=1
+fi
+
# Create vlan tagged devices for IPIP traffic.
ip -net "$nsr1" link add link veth1 name veth1.10 type vlan id 10
ip -net "$nsr1" link set veth1.10 up
ip -net "$nsr1" addr add 192.168.20.1/24 dev veth1.10
+ip -net "$nsr1" addr add fee1:4::1/64 dev veth1.10 nodad
ip netns exec "$nsr1" sysctl net.ipv4.conf.veth1/10.forwarding=1 > /dev/null
ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif veth1.10 accept'
-ip -net "$nsr1" link add name tun1 type ipip local 192.168.20.1 remote 192.168.20.2
-ip -net "$nsr1" link set tun1 up
-ip -net "$nsr1" addr add 192.168.200.1/24 dev tun1
+
+ip -net "$nsr1" link add name tun0.10 type ipip local 192.168.20.1 remote 192.168.20.2
+ip -net "$nsr1" link set tun0.10 up
+ip -net "$nsr1" addr add 192.168.200.1/24 dev tun0.10
ip -net "$nsr1" route change default via 192.168.200.2
-ip netns exec "$nsr1" sysctl net.ipv4.conf.tun1.forwarding=1 > /dev/null
-ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun1 accept'
+ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0.10 accept'
+
+ip -net "$nsr1" link add name tun6.10 type ip6tnl local fee1:4::1 remote fee1:4::2
+ip -net "$nsr1" link set tun6.10 up
+ip -net "$nsr1" addr add fee1:5::1/64 dev tun6.10 nodad
+ip -6 -net "$nsr1" route change default via fee1:5::2
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun6.10 accept'
ip -net "$nsr2" link add link veth0 name veth0.10 type vlan id 10
ip -net "$nsr2" link set veth0.10 up
ip -net "$nsr2" addr add 192.168.20.2/24 dev veth0.10
+ip -net "$nsr2" addr add fee1:4::2/64 dev veth0.10 nodad
ip netns exec "$nsr2" sysctl net.ipv4.conf.veth0/10.forwarding=1 > /dev/null
-ip -net "$nsr2" link add name tun1 type ipip local 192.168.20.2 remote 192.168.20.1
-ip -net "$nsr2" link set tun1 up
-ip -net "$nsr2" addr add 192.168.200.2/24 dev tun1
+
+ip -net "$nsr2" link add name tun0.10 type ipip local 192.168.20.2 remote 192.168.20.1
+ip -net "$nsr2" link set tun0.10 up
+ip -net "$nsr2" addr add 192.168.200.2/24 dev tun0.10
ip -net "$nsr2" route change default via 192.168.200.1
-ip netns exec "$nsr2" sysctl net.ipv4.conf.tun1.forwarding=1 > /dev/null
+ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
+
+ip -net "$nsr2" link add name tun6.10 type ip6tnl local fee1:4::2 remote fee1:4::1
+ip -net "$nsr2" link set tun6.10 up
+ip -net "$nsr2" addr add fee1:5::2/64 dev tun6.10 nodad
+ip -6 -net "$nsr2" route change default via fee1:5::1
if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel over vlan"; then
echo "FAIL: flow offload for ns1/ns2 with IPIP tunnel over vlan" 1>&2
@@ -640,10 +675,19 @@ if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel over vlan"; then
ret=1
fi
+if test_tcp_forwarding "$ns1" "$ns2" 1 6 "[dead:2::99]" 12345; then
+ echo "PASS: flow offload for ns1/ns2 IP6IP6 tunnel over vlan"
+else
+ echo "FAIL: flow offload for ns1/ns2 with IP6IP6 tunnel over vlan" 1>&2
+ ip netns exec "$nsr1" nft list ruleset
+ ret=1
+fi
+
# Restore the previous configuration
ip -net "$nsr1" route change default via 192.168.10.2
ip -net "$nsr2" route change default via 192.168.10.1
ip -net "$ns2" route del default via 10.0.2.1
+ip -6 -net "$ns2" route del default via dead:2::1
}
# Another test:
--
2.52.0
* [PATCH v2 net-next 6/7] netfilter: xt_time: use is_leap_year() helper
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
` (4 preceding siblings ...)
2026-01-29 10:54 ` [PATCH v2 net-next 5/7] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 7/7] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
2026-01-30 16:12 ` [PATCH v2 net-next 0/7] netfilter: updates for net-next Jakub Kicinski
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Jinjie Ruan <ruanjinjie@huawei.com>
Use the is_leap_year() helper from rtc.h instead of
open-coding it.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
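For reference, the open-coded check removed below implements the usual
Gregorian rule. The following is a standalone userspace sketch, not the
kernel code: is_leap_sketch(), days_in_year_sketch() and
leap_selfcheck() are hypothetical names, and the expression simply
mirrors the is_leap() body that this patch deletes from xt_time.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the is_leap() expression removed
 * from xt_time.c: divisible by 4, except centuries not divisible by 400.
 */
static bool is_leap_sketch(unsigned int y)
{
	return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

/* Number of days in year y under the same rule. */
static unsigned int days_in_year_sketch(unsigned int y)
{
	return is_leap_sketch(y) ? 366 : 365;
}

/* Small self-check covering the century corner cases. */
static void leap_selfcheck(void)
{
	assert(is_leap_sketch(2000));	/* divisible by 400 */
	assert(!is_leap_sketch(1900));	/* century, not divisible by 400 */
	assert(days_in_year_sketch(2024) == 366);
}
```

Calling leap_selfcheck() from a test harness exercises the cases where a
plain "y % 4 == 0" test would give the wrong answer.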
net/netfilter/xt_time.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/xt_time.c b/net/netfilter/xt_time.c
index 6aa12d0f54e2..00319d2a54da 100644
--- a/net/netfilter/xt_time.c
+++ b/net/netfilter/xt_time.c
@@ -14,6 +14,7 @@
#include <linux/ktime.h>
#include <linux/module.h>
+#include <linux/rtc.h>
#include <linux/skbuff.h>
#include <linux/types.h>
#include <linux/netfilter/x_tables.h>
@@ -64,11 +65,6 @@ static const u_int16_t days_since_epoch[] = {
3287, 2922, 2557, 2191, 1826, 1461, 1096, 730, 365, 0,
};
-static inline bool is_leap(unsigned int y)
-{
- return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
-}
-
/*
* Each network packet has a (nano)seconds-since-the-epoch (SSTE) timestamp.
* Since we match against days and daytime, the SSTE value needs to be
@@ -138,7 +134,7 @@ static void localtime_3(struct xtm *r, time64_t time)
* (A different approach to use would be to subtract a monthlength
* from w repeatedly while counting.)
*/
- if (is_leap(year)) {
+ if (is_leap_year(year)) {
/* use days_since_leapyear[] in a leap year */
for (i = ARRAY_SIZE(days_since_leapyear) - 1;
i > 0 && days_since_leapyear[i] > w; --i)
--
2.52.0
* [PATCH v2 net-next 7/7] netfilter: nfnetlink_queue: optimize verdict lookup with hash table
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
` (5 preceding siblings ...)
2026-01-29 10:54 ` [PATCH v2 net-next 6/7] netfilter: xt_time: use is_leap_year() helper Florian Westphal
@ 2026-01-29 10:54 ` Florian Westphal
2026-01-30 16:12 ` [PATCH v2 net-next 0/7] netfilter: updates for net-next Jakub Kicinski
7 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2026-01-29 10:54 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Scott Mitchell <scott.k.mitch1@gmail.com>
The current implementation uses a linear list to find queued packets by
ID when processing verdicts from userspace. With large queue depths and
out-of-order verdicting, this O(n) lookup becomes a significant
bottleneck, causing userspace verdict processing to dominate CPU time.
Replace the linear search with a hash table for O(1) average-case
packet lookup by ID. A single global rhashtable spans all network
namespaces: its bucket memory is attributed to the kernel rather than
to any namespace, but the table size is subject to a fixed upper bound.
Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
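To illustrate the idea of a composite (netns, queue_num, packet_id) key,
here is a minimal userspace sketch. It is an assumption-laden stand-in,
not the patch itself: it uses a fixed bucket array with chaining and a
simple FNV-1a-style mix where the patch uses a resizable rhashtable and
jhash2, a plain integer replaces struct net *, and all sketch_* names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKETS 64u	/* power of two; fixed size, unlike rhashtable */

struct sketch_entry {
	struct sketch_entry *next;	/* bucket chain */
	uint32_t netns;			/* stand-in for struct net * */
	uint32_t packet_id;
	uint16_t queue_num;
};

static struct sketch_entry *sketch_table[SKETCH_BUCKETS];

/* Mix all three key fields so equal ids in different queues or
 * namespaces can land in different buckets.
 */
static uint32_t sketch_hash(uint32_t netns, uint16_t queue_num,
			    uint32_t packet_id)
{
	uint32_t h = 2166136261u;

	h = (h ^ netns) * 16777619u;
	h = (h ^ queue_num) * 16777619u;
	h = (h ^ packet_id) * 16777619u;
	return h & (SKETCH_BUCKETS - 1);
}

/* Link the entry at the head of its bucket chain. */
static void sketch_insert(struct sketch_entry *e)
{
	uint32_t b;

	assert(e != NULL);
	b = sketch_hash(e->netns, e->queue_num, e->packet_id);
	e->next = sketch_table[b];
	sketch_table[b] = e;
}

/* Find the entry matching the full composite key and unlink it;
 * returns NULL if no such entry is queued.
 */
static struct sketch_entry *sketch_find_dequeue(uint32_t netns,
						uint16_t queue_num,
						uint32_t packet_id)
{
	struct sketch_entry **pp;

	pp = &sketch_table[sketch_hash(netns, queue_num, packet_id)];
	for (; *pp; pp = &(*pp)->next) {
		struct sketch_entry *e = *pp;

		if (e->netns == netns && e->queue_num == queue_num &&
		    e->packet_id == packet_id) {
			*pp = e->next;
			e->next = NULL;
			return e;
		}
	}
	return NULL;
}
```

The comparison checks every key field, so a hash collision between two
queues or namespaces cannot hand back the wrong packet. Locking is
omitted here; in the patch the lookup runs under the per-queue lock.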
include/net/netfilter/nf_queue.h | 3 +
net/netfilter/nfnetlink_queue.c | 146 ++++++++++++++++++++++++-------
2 files changed, 119 insertions(+), 30 deletions(-)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 4aeffddb7586..e6803831d6af 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -6,11 +6,13 @@
#include <linux/ipv6.h>
#include <linux/jhash.h>
#include <linux/netfilter.h>
+#include <linux/rhashtable-types.h>
#include <linux/skbuff.h>
/* Each queued (to userspace) skbuff has one of these. */
struct nf_queue_entry {
struct list_head list;
+ struct rhash_head hash_node;
struct sk_buff *skb;
unsigned int id;
unsigned int hook_index; /* index in hook_entries->hook[] */
@@ -20,6 +22,7 @@ struct nf_queue_entry {
#endif
struct nf_hook_state state;
u16 size; /* sizeof(entry) + saved route keys */
+ u16 queue_num;
/* extra space to store route keys */
};
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8fa0807973c9..671b52c652ef 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -30,6 +30,8 @@
#include <linux/netfilter/nf_conntrack_common.h>
#include <linux/list.h>
#include <linux/cgroup-defs.h>
+#include <linux/rhashtable.h>
+#include <linux/jhash.h>
#include <net/gso.h>
#include <net/sock.h>
#include <net/tcp_states.h>
@@ -47,6 +49,8 @@
#endif
#define NFQNL_QMAX_DEFAULT 1024
+#define NFQNL_HASH_MIN 1024
+#define NFQNL_HASH_MAX 1048576
/* We're using struct nlattr which has 16bit nla_len. Note that nla_len
* includes the header length. Thus, the maximum packet length that we
@@ -56,6 +60,26 @@
*/
#define NFQNL_MAX_COPY_RANGE (0xffff - NLA_HDRLEN)
+/* Composite key for packet lookup: (net, queue_num, packet_id) */
+struct nfqnl_packet_key {
+ possible_net_t net;
+ u32 packet_id;
+ u16 queue_num;
+} __aligned(sizeof(u32)); /* jhash2 requires 32-bit alignment */
+
+/* Global rhashtable - one for entire system, all netns */
+static struct rhashtable nfqnl_packet_map __read_mostly;
+
+/* Helper to initialize composite key */
+static inline void nfqnl_init_key(struct nfqnl_packet_key *key,
+ struct net *net, u32 packet_id, u16 queue_num)
+{
+ memset(key, 0, sizeof(*key));
+ write_pnet(&key->net, net);
+ key->packet_id = packet_id;
+ key->queue_num = queue_num;
+}
+
struct nfqnl_instance {
struct hlist_node hlist; /* global list of queues */
struct rcu_head rcu;
@@ -100,6 +124,39 @@ static inline u_int8_t instance_hashfn(u_int16_t queue_num)
return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
}
+/* Extract composite key from nf_queue_entry for hashing */
+static u32 nfqnl_packet_obj_hashfn(const void *data, u32 len, u32 seed)
+{
+ const struct nf_queue_entry *entry = data;
+ struct nfqnl_packet_key key;
+
+ nfqnl_init_key(&key, entry->state.net, entry->id, entry->queue_num);
+
+ return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), seed);
+}
+
+/* Compare stack-allocated key against entry */
+static int nfqnl_packet_obj_cmpfn(struct rhashtable_compare_arg *arg,
+ const void *obj)
+{
+ const struct nfqnl_packet_key *key = arg->key;
+ const struct nf_queue_entry *entry = obj;
+
+ return !net_eq(entry->state.net, read_pnet(&key->net)) ||
+ entry->queue_num != key->queue_num ||
+ entry->id != key->packet_id;
+}
+
+static const struct rhashtable_params nfqnl_rhashtable_params = {
+ .head_offset = offsetof(struct nf_queue_entry, hash_node),
+ .key_len = sizeof(struct nfqnl_packet_key),
+ .obj_hashfn = nfqnl_packet_obj_hashfn,
+ .obj_cmpfn = nfqnl_packet_obj_cmpfn,
+ .automatic_shrinking = true,
+ .min_size = NFQNL_HASH_MIN,
+ .max_size = NFQNL_HASH_MAX,
+};
+
static struct nfqnl_instance *
instance_lookup(struct nfnl_queue_net *q, u_int16_t queue_num)
{
@@ -188,33 +245,45 @@ instance_destroy(struct nfnl_queue_net *q, struct nfqnl_instance *inst)
spin_unlock(&q->instances_lock);
}
-static inline void
+static int
__enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
- list_add_tail(&entry->list, &queue->queue_list);
- queue->queue_total++;
+ int err;
+
+ entry->queue_num = queue->queue_num;
+
+ err = rhashtable_insert_fast(&nfqnl_packet_map, &entry->hash_node,
+ nfqnl_rhashtable_params);
+ if (unlikely(err))
+ return err;
+
+ list_add_tail(&entry->list, &queue->queue_list);
+ queue->queue_total++;
+
+ return 0;
}
static void
__dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
+ rhashtable_remove_fast(&nfqnl_packet_map, &entry->hash_node,
+ nfqnl_rhashtable_params);
list_del(&entry->list);
queue->queue_total--;
}
static struct nf_queue_entry *
-find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
+find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id,
+ struct net *net)
{
- struct nf_queue_entry *entry = NULL, *i;
+ struct nfqnl_packet_key key;
+ struct nf_queue_entry *entry;
- spin_lock_bh(&queue->lock);
+ nfqnl_init_key(&key, net, id, queue->queue_num);
- list_for_each_entry(i, &queue->queue_list, list) {
- if (i->id == id) {
- entry = i;
- break;
- }
- }
+ spin_lock_bh(&queue->lock);
+ entry = rhashtable_lookup_fast(&nfqnl_packet_map, &key,
+ nfqnl_rhashtable_params);
if (entry)
__dequeue_entry(queue, entry);
@@ -404,8 +473,7 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
spin_lock_bh(&queue->lock);
list_for_each_entry_safe(entry, next, &queue->queue_list, list) {
if (!cmpfn || cmpfn(entry, data)) {
- list_del(&entry->list);
- queue->queue_total--;
+ __dequeue_entry(queue, entry);
nfqnl_reinject(entry, NF_DROP);
}
}
@@ -885,23 +953,23 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
if (nf_ct_drop_unconfirmed(entry))
goto err_out_free_nskb;
- if (queue->queue_total >= queue->queue_maxlen) {
- if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
- failopen = 1;
- err = 0;
- } else {
- queue->queue_dropped++;
- net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
- queue->queue_total);
- }
- goto err_out_free_nskb;
- }
+ if (queue->queue_total >= queue->queue_maxlen)
+ goto err_out_queue_drop;
+
entry->id = ++queue->id_sequence;
*packet_id_ptr = htonl(entry->id);
+ /* Insert into hash BEFORE unicast. If failure don't send to userspace. */
+ err = __enqueue_entry(queue, entry);
+ if (unlikely(err))
+ goto err_out_queue_drop;
+
/* nfnetlink_unicast will either free the nskb or add it to a socket */
err = nfnetlink_unicast(nskb, net, queue->peer_portid);
if (err < 0) {
+ /* Unicast failed - remove entry we just inserted */
+ __dequeue_entry(queue, entry);
+
if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
failopen = 1;
err = 0;
@@ -911,11 +979,22 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
goto err_out_unlock;
}
- __enqueue_entry(queue, entry);
-
spin_unlock_bh(&queue->lock);
return 0;
+err_out_queue_drop:
+ if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
+ failopen = 1;
+ err = 0;
+ } else {
+ queue->queue_dropped++;
+
+ if (queue->queue_total >= queue->queue_maxlen)
+ net_warn_ratelimited("nf_queue: full at %d entries, dropping packets(s)\n",
+ queue->queue_total);
+ else
+ net_warn_ratelimited("nf_queue: hash insert failed: %d\n", err);
+ }
err_out_free_nskb:
kfree_skb(nskb);
err_out_unlock:
@@ -1427,7 +1506,7 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info,
verdict = ntohl(vhdr->verdict);
- entry = find_dequeue_entry(queue, ntohl(vhdr->id));
+ entry = find_dequeue_entry(queue, ntohl(vhdr->id), info->net);
if (entry == NULL)
return -ENOENT;
@@ -1774,10 +1853,14 @@ static int __init nfnetlink_queue_init(void)
{
int status;
+ status = rhashtable_init(&nfqnl_packet_map, &nfqnl_rhashtable_params);
+ if (status < 0)
+ return status;
+
status = register_pernet_subsys(&nfnl_queue_net_ops);
if (status < 0) {
pr_err("failed to register pernet ops\n");
- goto out;
+ goto cleanup_rhashtable;
}
netlink_register_notifier(&nfqnl_rtnl_notifier);
@@ -1802,7 +1885,8 @@ static int __init nfnetlink_queue_init(void)
cleanup_netlink_notifier:
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
unregister_pernet_subsys(&nfnl_queue_net_ops);
-out:
+cleanup_rhashtable:
+ rhashtable_destroy(&nfqnl_packet_map);
return status;
}
@@ -1814,6 +1898,8 @@ static void __exit nfnetlink_queue_fini(void)
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
unregister_pernet_subsys(&nfnl_queue_net_ops);
+ rhashtable_destroy(&nfqnl_packet_map);
+
rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
--
2.52.0
* Re: [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
2026-01-29 10:54 ` [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
@ 2026-01-29 14:10 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 16+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-01-29 14:10 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
Hello:
This series was applied to netdev/net-next.git (main)
by Florian Westphal <fw@strlen.de>:
On Thu, 29 Jan 2026 11:54:21 +0100 you wrote:
> From: Lorenzo Bianconi <lorenzo@kernel.org>
>
> Rely on nf_flowtable_ctx struct pointer in nf_flow_ip4_tunnel_proto and
> nf_flow_skb_encap_protocol routine signature. This is a preliminary patch
> to introduce IP6IP6 flowtable acceleration since nf_flowtable_ctx will
> be used to store IP6IP6 tunnel info.
>
> [...]
Here is the summary with links:
- [v2,net-next,1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature
https://git.kernel.org/netdev/net-next/c/baa501b12a48
- [v2,net-next,2/7] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct
https://git.kernel.org/netdev/net-next/c/c64436daf675
- [v2,net-next,3/7] netfilter: flowtable: Add IP6IP6 rx sw acceleration
https://git.kernel.org/netdev/net-next/c/d98103575dcd
- [v2,net-next,4/7] netfilter: flowtable: Add IP6IP6 tx sw acceleration
https://git.kernel.org/netdev/net-next/c/93cf357fa797
- [v2,net-next,5/7] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest
https://git.kernel.org/netdev/net-next/c/5e5180352193
- [v2,net-next,6/7] netfilter: xt_time: use is_leap_year() helper
https://git.kernel.org/netdev/net-next/c/77fd1b4c6e08
- [v2,net-next,7/7] netfilter: nfnetlink_queue: optimize verdict lookup with hash table
https://git.kernel.org/netdev/net-next/c/e19079adcd26
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
` (6 preceding siblings ...)
2026-01-29 10:54 ` [PATCH v2 net-next 7/7] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
@ 2026-01-30 16:12 ` Jakub Kicinski
2026-01-30 19:09 ` Florian Westphal
7 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-01-30 16:12 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
netfilter-devel, pablo
On Thu, 29 Jan 2026 11:54:20 +0100 Florian Westphal wrote:
> v2: discard buggy nfqueue patch, no other changes.
>
> The following patchset contains Netfilter updates for *net-next*:
>
> Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
> infrastructure. Patch 5 extends test coverage for this.
> From Lorenzo Bianconi.
>
> Patch 6 removes a duplicated helper from xt_time extension, we can
> use an existing helper for this, from Jinjie Ruan.
>
> Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
> verdict processing. Before this, a list walk was required due to the
> in-order design assumption.
Hi Florian, some more KASAN today:
https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/496421/vm-crash-thr0-0
[ 1144.170509][ T12] ==================================================================
[ 1144.170759][ T12] BUG: KASAN: slab-use-after-free in idr_for_each+0x1c1/0x1f0
[ 1144.170922][ T12] Read of size 8 at addr ff11000012a16a70 by task kworker/u16:0/12
[ 1144.171079][ T12]
[ 1144.171133][ T12] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc7-virtme #1 PREEMPT(full)
[ 1144.171137][ T12] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1144.171139][ T12] Workqueue: netns cleanup_net
[ 1144.171145][ T12] Call Trace:
[ 1144.171147][ T12] <TASK>
[ 1144.171149][ T12] dump_stack_lvl+0x6f/0xa0
[ 1144.171154][ T12] print_address_description.constprop.0+0x6e/0x300
[ 1144.171159][ T12] print_report+0xfc/0x1fb
[ 1144.171161][ T12] ? idr_for_each+0x1c1/0x1f0
[ 1144.171163][ T12] ? __virt_addr_valid+0x1da/0x430
[ 1144.171167][ T12] ? idr_for_each+0x1c1/0x1f0
[ 1144.171168][ T12] kasan_report+0xe8/0x120
[ 1144.171172][ T12] ? idr_for_each+0x1c1/0x1f0
[ 1144.171174][ T12] ? rtnl_net_notifyid+0x1a0/0x1a0
[ 1144.171176][ T12] idr_for_each+0x1c1/0x1f0
[ 1144.171178][ T12] ? idr_find+0x70/0x70
[ 1144.171180][ T12] ? __lock_release.isra.0+0x59/0x170
[ 1144.171184][ T12] ? __up_write+0x283/0x4f0
[ 1144.171185][ T12] ? cleanup_net+0x1f2/0x810
[ 1144.171187][ T12] cleanup_net+0x260/0x810
[ 1144.171188][ T12] ? lock_acquire.part.0+0xbc/0x260
[ 1144.171190][ T12] ? process_one_work+0xd16/0x1390
[ 1144.171193][ T12] ? net_passive_dec+0x190/0x190
[ 1144.171194][ T12] ? rcu_is_watching+0x15/0xd0
[ 1144.171197][ T12] ? process_one_work+0xd16/0x1390
[ 1144.171198][ T12] ? lock_acquire+0x10a/0x150
[ 1144.171199][ T12] ? rcu_is_watching+0x15/0xd0
[ 1144.171201][ T12] process_one_work+0xd57/0x1390
[ 1144.171204][ T12] ? pwq_dec_nr_in_flight+0x700/0x700
[ 1144.171205][ T12] ? lock_acquire.part.0+0xbc/0x260
[ 1144.171208][ T12] ? assign_work+0x152/0x380
[ 1144.171209][ T12] worker_thread+0x4d6/0xd40
[ 1144.171212][ T12] ? process_one_work+0x1390/0x1390
[ 1144.171213][ T12] kthread+0x355/0x5b0
[ 1144.171215][ T12] ? kthread_is_per_cpu+0xe0/0xe0
[ 1144.171217][ T12] ? __lock_release.isra.0+0x59/0x170
[ 1144.171219][ T12] ? rcu_is_watching+0x15/0xd0
[ 1144.171220][ T12] ? kthread_is_per_cpu+0xe0/0xe0
[ 1144.171221][ T12] ret_from_fork+0x3fb/0x510
[ 1144.171225][ T12] ? arch_exit_to_user_mode_prepare.isra.0+0x140/0x140
[ 1144.171228][ T12] ? __switch_to+0x53c/0xd00
[ 1144.171230][ T12] ? kthread_is_per_cpu+0xe0/0xe0
[ 1144.171231][ T12] ret_from_fork_asm+0x11/0x20
[ 1144.171235][ T12] </TASK>
[ 1144.171236][ T12]
[ 1144.175222][ T12] Allocated by task 32108:
[ 1144.175317][ T12] kasan_save_stack+0x30/0x50
[ 1144.175407][ T12] kasan_save_track+0x14/0x30
[ 1144.175493][ T12] __kasan_slab_alloc+0x5f/0x70
[ 1144.175580][ T12] kmem_cache_alloc_noprof+0x226/0x6e0
[ 1144.175675][ T12] radix_tree_node_alloc.constprop.0+0x176/0x340
[ 1144.175790][ T12] idr_get_free+0x326/0x840
[ 1144.175878][ T12] idr_alloc_u32+0x14a/0x2e0
[ 1144.175966][ T12] idr_alloc+0x7d/0xc0
[ 1144.176033][ T12] peernet2id_alloc+0x22c/0x340
[ 1144.176122][ T12] __dev_change_net_namespace+0x8e5/0x1980
[ 1144.176232][ T12] do_setlink.isra.0+0x211/0x25d0
[ 1144.176325][ T12] rtnl_newlink+0x75c/0xe90
[ 1144.176416][ T12] rtnetlink_rcv_msg+0x6fe/0xb90
[ 1144.176503][ T12] netlink_rcv_skb+0x123/0x380
[ 1144.176590][ T12] netlink_unicast+0x4a3/0x770
[ 1144.176678][ T12] netlink_sendmsg+0x735/0xc60
[ 1144.176767][ T12] ____sys_sendmsg+0x419/0x850
[ 1144.176852][ T12] ___sys_sendmsg+0xfd/0x180
[ 1144.176943][ T12] __sys_sendmsg+0x124/0x1c0
[ 1144.177031][ T12] do_syscall_64+0xbd/0xfc0
[ 1144.177118][ T12] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 1144.177225][ T12]
[ 1144.177268][ T12] Freed by task 12:
[ 1144.177335][ T12] kasan_save_stack+0x30/0x50
[ 1144.177422][ T12] kasan_save_track+0x14/0x30
[ 1144.177507][ T12] kasan_save_free_info+0x3b/0x60
[ 1144.177599][ T12] __kasan_slab_free+0x43/0x70
[ 1144.177690][ T12] kmem_cache_free+0xfe/0x5e0
[ 1144.177780][ T12] rcu_do_batch+0x28b/0xfe0
[ 1144.177873][ T12] rcu_core+0x2b4/0x5f0
[ 1144.177944][ T12] handle_softirqs+0x1d7/0x840
[ 1144.178030][ T12] irq_exit_rcu+0xa2/0xf0
[ 1144.178095][ T12] sysvec_apic_timer_interrupt+0x9d/0xe0
[ 1144.178188][ T12] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 1144.178295][ T12]
[ 1144.178340][ T12] Last potentially related work creation:
[ 1144.178425][ T12] kasan_save_stack+0x30/0x50
[ 1144.178513][ T12] kasan_record_aux_stack+0x8c/0xa0
[ 1144.178599][ T12] __call_rcu_common.constprop.0+0xa6/0xa00
[ 1144.178705][ T12] delete_node+0x198/0x810
[ 1144.178792][ T12] radix_tree_delete_item+0xc5/0x1b0
[ 1144.178889][ T12] unhash_nsid_callback+0xb4/0x100
[ 1144.178972][ T12] idr_for_each+0x108/0x1f0
[ 1144.179057][ T12] cleanup_net+0x260/0x810
[ 1144.179141][ T12] process_one_work+0xd57/0x1390
[ 1144.179224][ T12] worker_thread+0x4d6/0xd40
[ 1144.179308][ T12] kthread+0x355/0x5b0
[ 1144.179372][ T12] ret_from_fork+0x3fb/0x510
[ 1144.179462][ T12] ret_from_fork_asm+0x11/0x20
[ 1144.179552][ T12]
[ 1144.179597][ T12] The buggy address belongs to the object at ff11000012a16a38
[ 1144.179597][ T12] which belongs to the cache radix_tree_node of size 576
[ 1144.179825][ T12] The buggy address is located 56 bytes inside of
[ 1144.179825][ T12] freed 576-byte region [ff11000012a16a38, ff11000012a16c78)
[ 1144.180038][ T12]
[ 1144.180085][ T12] The buggy address belongs to the physical page:
[ 1144.180191][ T12] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xff11000012a17848 pfn:0x12a14
[ 1144.180374][ T12] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 1144.180506][ T12] flags: 0x80000000000240(workingset|head|node=0|zone=1)
[ 1144.180617][ T12] page_type: f5(slab)
[ 1144.180689][ T12] raw: 0080000000000240 ff11000001043700 ffd40000004ab810 ffd40000008b6d10
[ 1144.180850][ T12] raw: ff11000012a17848 000000000016000e 00000000f5000000 0000000000000000
[ 1144.180998][ T12] head: 0080000000000240 ff11000001043700 ffd40000004ab810 ffd40000008b6d10
[ 1144.181151][ T12] head: ff11000012a17848 000000000016000e 00000000f5000000 0000000000000000
[ 1144.181305][ T12] head: 0080000000000002 ffd40000004a8501 00000000ffffffff 00000000ffffffff
[ 1144.181459][ T12] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1144.181612][ T12] page dumped because: kasan: bad access detected
[ 1144.181723][ T12]
[ 1144.181770][ T12] Memory state around the buggy address:
[ 1144.181853][ T12] ff11000012a16900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1144.181993][ T12] ff11000012a16980: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[ 1144.182116][ T12] >ff11000012a16a00: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb
[ 1144.182238][ T12] ^
[ 1144.182362][ T12] ff11000012a16a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1144.182497][ T12] ff11000012a16b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1144.182618][ T12] ==================================================================
[ 1144.182760][ T12] Disabling lock debugging due to kernel taint
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-30 16:12 ` [PATCH v2 net-next 0/7] netfilter: updates for net-next Jakub Kicinski
@ 2026-01-30 19:09 ` Florian Westphal
2026-01-30 20:23 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2026-01-30 19:09 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
netfilter-devel, pablo
Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 29 Jan 2026 11:54:20 +0100 Florian Westphal wrote:
> > v2: discard buggy nfqueue patch, no other changes.
> >
> > The following patchset contains Netfilter updates for *net-next*:
> >
> > Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
> > infrastructure. Patch 5 extends test coverage for this.
> > From Lorenzo Bianconi.
> >
> > Patch 6 removes a duplicated helper from xt_time extension, we can
> > use an existing helper for this, from Jinjie Ruan.
> >
> > Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
> > verdict processing. Before this, a list walk was required due to the
> > in-order design assumption.
>
> Hi Florian, some more KASAN today:
>
> https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/496421/vm-crash-thr0-0
> [ 1144.170509][ T12] ==================================================================
> [ 1144.170759][ T12] BUG: KASAN: slab-use-after-free in idr_for_each+0x1c1/0x1f0
> [ 1144.170922][ T12] Read of size 8 at addr ff11000012a16a70 by task kworker/u16:0/12
> [ 1144.171079][ T12]
> [ 1144.171133][ T12] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc7-virtme #1 PREEMPT(full)
> [ 1144.171137][ T12] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 1144.171139][ T12] Workqueue: netns cleanup_net
> [ 1144.171145][ T12] Call Trace:
> [ 1144.171147][ T12] <TASK>
> [ 1144.171149][ T12] dump_stack_lvl+0x6f/0xa0
> [ 1144.171154][ T12] print_address_description.constprop.0+0x6e/0x300
> [ 1144.171159][ T12] print_report+0xfc/0x1fb
> [ 1144.171161][ T12] ? idr_for_each+0x1c1/0x1f0
> [ 1144.171163][ T12] ? __virt_addr_valid+0x1da/0x430
> [ 1144.171167][ T12] ? idr_for_each+0x1c1/0x1f0
> [ 1144.171168][ T12] kasan_report+0xe8/0x120
Sigh. Doesn't ring a bell, I will have a look.
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-30 19:09 ` Florian Westphal
@ 2026-01-30 20:23 ` Eric Dumazet
2026-01-31 1:01 ` Jakub Kicinski
2026-01-31 21:00 ` Florian Westphal
0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2026-01-30 20:23 UTC (permalink / raw)
To: Florian Westphal
Cc: Jakub Kicinski, netdev, Paolo Abeni, David S. Miller,
netfilter-devel, pablo
On Fri, Jan 30, 2026 at 8:09 PM Florian Westphal <fw@strlen.de> wrote:
>
> Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 29 Jan 2026 11:54:20 +0100 Florian Westphal wrote:
> > > v2: discard buggy nfqueue patch, no other changes.
> > >
> > > The following patchset contains Netfilter updates for *net-next*:
> > >
> > > Patches 1 to 4 add IP6IP6 tunneling acceleration to the flowtable
> > > infrastructure. Patch 5 extends test coverage for this.
> > > From Lorenzo Bianconi.
> > >
> > > Patch 6 removes a duplicated helper from xt_time extension, we can
> > > use an existing helper for this, from Jinjie Ruan.
> > >
> > > Patch 7 adds an rhashtable to nfnetlink_queue to speed up out-of-order
> > > verdict processing. Before this, a list walk was required due to the
> > > in-order design assumption.
> >
> > Hi Florian, some more KASAN today:
> >
> > https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/496421/vm-crash-thr0-0
>
> > [ 1144.170509][ T12] ==================================================================
> > [ 1144.170759][ T12] BUG: KASAN: slab-use-after-free in idr_for_each+0x1c1/0x1f0
> > [ 1144.170922][ T12] Read of size 8 at addr ff11000012a16a70 by task kworker/u16:0/12
> > [ 1144.171079][ T12]
> > [ 1144.171133][ T12] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc7-virtme #1 PREEMPT(full)
> > [ 1144.171137][ T12] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 1144.171139][ T12] Workqueue: netns cleanup_net
> > [ 1144.171145][ T12] Call Trace:
> > [ 1144.171147][ T12] <TASK>
> > [ 1144.171149][ T12] dump_stack_lvl+0x6f/0xa0
> > [ 1144.171154][ T12] print_address_description.constprop.0+0x6e/0x300
> > [ 1144.171159][ T12] print_report+0xfc/0x1fb
> > [ 1144.171161][ T12] ? idr_for_each+0x1c1/0x1f0
> > [ 1144.171163][ T12] ? __virt_addr_valid+0x1da/0x430
> > [ 1144.171167][ T12] ? idr_for_each+0x1c1/0x1f0
> > [ 1144.171168][ T12] kasan_report+0xe8/0x120
>
> Sigh. Doesn't ring a bell, I will have a look.
Could this be related to "netns: optimize netns cleaning by batching
unhash_nsid calls" ?
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-30 20:23 ` Eric Dumazet
@ 2026-01-31 1:01 ` Jakub Kicinski
2026-01-31 21:00 ` Florian Westphal
1 sibling, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-01-31 1:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: Florian Westphal, netdev, Paolo Abeni, David S. Miller,
netfilter-devel, pablo
On Fri, 30 Jan 2026 21:23:23 +0100 Eric Dumazet wrote:
> > Sigh. Doesn't ring a bell, I will have a look.
>
> Could this be related to "netns: optimize netns cleaning by batching
> unhash_nsid calls" ?
Ah yes, that makes more sense. Thanks!
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-30 20:23 ` Eric Dumazet
2026-01-31 1:01 ` Jakub Kicinski
@ 2026-01-31 21:00 ` Florian Westphal
2026-01-31 21:17 ` Jakub Kicinski
1 sibling, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2026-01-31 21:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jakub Kicinski, netdev, Paolo Abeni, David S. Miller,
netfilter-devel, pablo
Eric Dumazet <edumazet@google.com> wrote:
> > > Hi Florian, some more KASAN today:
> > >
> > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/nf-dbg/results/496421/vm-crash-thr0-0
> >
> > > [ 1144.170509][ T12] ==================================================================
> > > [ 1144.170759][ T12] BUG: KASAN: slab-use-after-free in idr_for_each+0x1c1/0x1f0
> > > [ 1144.170922][ T12] Read of size 8 at addr ff11000012a16a70 by task kworker/u16:0/12
> > > [ 1144.171079][ T12]
> > > [ 1144.171133][ T12] CPU: 1 UID: 0 PID: 12 Comm: kworker/u16:0 Not tainted 6.19.0-rc7-virtme #1 PREEMPT(full)
> > > [ 1144.171137][ T12] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > [ 1144.171139][ T12] Workqueue: netns cleanup_net
> > > [ 1144.171145][ T12] Call Trace:
> > > [ 1144.171147][ T12] <TASK>
> > > [ 1144.171149][ T12] dump_stack_lvl+0x6f/0xa0
> > > [ 1144.171154][ T12] print_address_description.constprop.0+0x6e/0x300
> > > [ 1144.171159][ T12] print_report+0xfc/0x1fb
> > > [ 1144.171161][ T12] ? idr_for_each+0x1c1/0x1f0
> > > [ 1144.171163][ T12] ? __virt_addr_valid+0x1da/0x430
> > > [ 1144.171167][ T12] ? idr_for_each+0x1c1/0x1f0
> > > [ 1144.171168][ T12] kasan_report+0xe8/0x120
> >
> > Sigh. Doesn't ring a bell, I will have a look.
>
> Could this be related to "netns: optimize netns cleaning by batching
> unhash_nsid calls" ?
Thanks Eric, that seems plausible.
Did not yet have much luck with reproducing this so far, I will
look at this in more detail on Monday.
* Re: [PATCH v2 net-next 0/7] netfilter: updates for net-next
2026-01-31 21:00 ` Florian Westphal
@ 2026-01-31 21:17 ` Jakub Kicinski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-01-31 21:17 UTC (permalink / raw)
To: Florian Westphal
Cc: Eric Dumazet, netdev, Paolo Abeni, David S. Miller,
netfilter-devel, pablo
On Sat, 31 Jan 2026 22:00:03 +0100 Florian Westphal wrote:
> Eric Dumazet <edumazet@google.com> wrote:
> > > Sigh. Doesn't ring a bell, I will have a look.
> >
> > Could this be related to "netns: optimize netns cleaning by batching
> > unhash_nsid calls" ?
>
> Thanks Eric, that seems plausible.
>
> Did not yet have much luck with reproducing this so far, I will
> look at this in more detail on Monday.
To be clear -- the patch Eric pointed out is _not_ merged yet.
It was pending in the test branch but it's not in net-next.
Thread overview: 16+ messages
2026-01-29 10:54 [PATCH v2 net-next 0/7] netfilter: updates for net-next Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 1/7] netfilter: Add ctx pointer in nf_flow_skb_encap_protocol/nf_flow_ip4_tunnel_proto signature Florian Westphal
2026-01-29 14:10 ` patchwork-bot+netdevbpf
2026-01-29 10:54 ` [PATCH v2 net-next 2/7] netfilter: Introduce tunnel metadata info in nf_flowtable_ctx struct Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 3/7] netfilter: flowtable: Add IP6IP6 rx sw acceleration Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 4/7] netfilter: flowtable: Add IP6IP6 tx " Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 5/7] selftests: netfilter: nft_flowtable.sh: Add IP6IP6 flowtable selftest Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 6/7] netfilter: xt_time: use is_leap_year() helper Florian Westphal
2026-01-29 10:54 ` [PATCH v2 net-next 7/7] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Florian Westphal
2026-01-30 16:12 ` [PATCH v2 net-next 0/7] netfilter: updates for net-next Jakub Kicinski
2026-01-30 19:09 ` Florian Westphal
2026-01-30 20:23 ` Eric Dumazet
2026-01-31 1:01 ` Jakub Kicinski
2026-01-31 21:00 ` Florian Westphal
2026-01-31 21:17 ` Jakub Kicinski
-- strict thread matches above, loose matches on Subject: below --
2025-09-02 13:35 Florian Westphal