* [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration
@ 2026-05-05 14:49 Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 1/4] net: netfilter: Add ether_type to net_device_path_ctx Lorenzo Bianconi
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Lorenzo Bianconi @ 2026-05-05 14:49 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Felix Fietkau, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, David Ahern,
Ido Schimmel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Shuah Khan
Cc: linux-arm-kernel, linux-mediatek, netdev, netfilter-devel,
coreteam, linux-kselftest, Lorenzo Bianconi
Similar to IPIP and IP6I6 tunnels, introduce sw acceleration for IPv4 over
IPv6 tunnels in the netfilter flowtable infrastructure.
---
Lorenzo Bianconi (4):
net: netfilter: Add ether_type to net_device_path_ctx
net: netfilter: Add encap_proto to flow_offload_tunnel
net: netfilter: Add IPv4 over IPv6 tunnel flowtable acceleration
selftests: netfilter: nft_flowtable.sh: Add IPv4 over IPv6 flowtable selftest
drivers/net/ethernet/airoha/airoha_ppe.c | 14 ++-
drivers/net/ethernet/mediatek/mtk_ppe_offload.c | 13 ++-
include/linux/netdevice.h | 5 +-
include/net/netfilter/nf_flow_table.h | 1 +
net/core/dev.c | 6 +-
net/ipv4/ipip.c | 1 +
net/ipv6/ip6_tunnel.c | 6 +-
net/netfilter/nf_flow_table_core.c | 14 ++-
net/netfilter/nf_flow_table_ip.c | 129 ++++++++++++++++-----
net/netfilter/nf_flow_table_path.c | 16 +--
.../selftests/net/netfilter/nft_flowtable.sh | 26 +++++
11 files changed, 174 insertions(+), 57 deletions(-)
---
base-commit: c1e5127b577c6b88fa48e532616932ae978528d5
change-id: 20260505-b4-flowtable-sw-accel-ip6ip-7101034cd147
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH nf-next 1/4] net: netfilter: Add ether_type to net_device_path_ctx
2026-05-05 14:49 [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration Lorenzo Bianconi
@ 2026-05-05 14:49 ` Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 2/4] net: netfilter: Add encap_proto to flow_offload_tunnel Lorenzo Bianconi
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Bianconi @ 2026-05-05 14:49 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Felix Fietkau, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, David Ahern,
Ido Schimmel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Shuah Khan
Cc: linux-arm-kernel, linux-mediatek, netdev, netfilter-devel,
coreteam, linux-kselftest, Lorenzo Bianconi
Add an ether_type field to struct net_device_path_ctx to allow IPv6
tunnel drivers to select the appropriate L3 protocol based on the
encapsulated traffic.
Update the airoha and mtk Ethernet drivers to use the new
dev_fill_forward_path() signature.
This is a preliminary patch to enable sw flowtable acceleration for
IPv4 over IPv6 tunnels.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/airoha/airoha_ppe.c | 14 +++++++++-----
drivers/net/ethernet/mediatek/mtk_ppe_offload.c | 13 ++++++++-----
include/linux/netdevice.h | 4 +++-
net/core/dev.c | 6 ++++--
net/ipv6/ip6_tunnel.c | 5 ++++-
net/netfilter/nf_flow_table_path.c | 8 +++++---
6 files changed, 33 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 26da519236bf..c5eccb3a43a1 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -245,7 +245,8 @@ static int airoha_ppe_flow_mangle_ipv4(const struct flow_action_entry *act,
return 0;
}
-static int airoha_ppe_get_wdma_info(struct net_device *dev, const u8 *addr,
+static int airoha_ppe_get_wdma_info(struct net_device *dev,
+ const u8 *addr, __be16 ether_type,
struct airoha_wdma_info *info)
{
struct net_device_path_stack stack;
@@ -256,7 +257,7 @@ static int airoha_ppe_get_wdma_info(struct net_device *dev, const u8 *addr,
return -ENODEV;
rcu_read_lock();
- err = dev_fill_forward_path(dev, addr, &stack);
+ err = dev_fill_forward_path(dev, addr, ether_type, &stack);
rcu_read_unlock();
if (err)
return err;
@@ -300,7 +301,7 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth,
struct airoha_foe_entry *hwe,
struct net_device *dev, int type,
struct airoha_flow_data *data,
- int l4proto)
+ __be16 ether_type, int l4proto)
{
u32 qdata = FIELD_PREP(AIROHA_FOE_SHAPER_ID, 0x7f), ports_pad, val;
int wlan_etype = -EINVAL, dsa_port = airoha_get_dsa_port(&dev);
@@ -322,7 +323,8 @@ static int airoha_ppe_foe_entry_prepare(struct airoha_eth *eth,
if (dev) {
struct airoha_wdma_info info = {};
- if (!airoha_ppe_get_wdma_info(dev, data->eth.h_dest, &info)) {
+ if (!airoha_ppe_get_wdma_info(dev, data->eth.h_dest,
+ ether_type, &info)) {
val |= FIELD_PREP(AIROHA_FOE_IB2_NBQ, info.idx) |
FIELD_PREP(AIROHA_FOE_IB2_PSE_PORT,
FE_PSE_PORT_CDM4);
@@ -1047,6 +1049,7 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
struct flow_action_entry *act;
struct airoha_foe_entry hwe;
int err, i, offload_type;
+ __be16 ether_type = 0;
u16 addr_type = 0;
u8 l4proto = 0;
@@ -1073,6 +1076,7 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
struct flow_match_basic match;
flow_rule_match_basic(rule, &match);
+ ether_type = match.key->n_proto;
l4proto = match.key->ip_proto;
} else {
return -EOPNOTSUPP;
@@ -1143,7 +1147,7 @@ static int airoha_ppe_flow_offload_replace(struct airoha_eth *eth,
return -EINVAL;
err = airoha_ppe_foe_entry_prepare(eth, &hwe, odev, offload_type,
- &data, l4proto);
+ &data, ether_type, l4proto);
if (err)
return err;
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
index cc8c4ef8038f..2601c17b29c8 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -89,7 +89,8 @@ mtk_flow_offload_mangle_eth(const struct flow_action_entry *act, void *eth)
}
static int
-mtk_flow_get_wdma_info(struct net_device *dev, const u8 *addr, struct mtk_wdma_info *info)
+mtk_flow_get_wdma_info(struct net_device *dev, const u8 *addr,
+ __be16 ether_type, struct mtk_wdma_info *info)
{
struct net_device_path_stack stack;
struct net_device_path *path;
@@ -102,7 +103,7 @@ mtk_flow_get_wdma_info(struct net_device *dev, const u8 *addr, struct mtk_wdma_i
return -1;
rcu_read_lock();
- err = dev_fill_forward_path(dev, addr, &stack);
+ err = dev_fill_forward_path(dev, addr, ether_type, &stack);
rcu_read_unlock();
if (err)
return err;
@@ -190,12 +191,12 @@ mtk_flow_get_dsa_port(struct net_device **dev)
static int
mtk_flow_set_output_device(struct mtk_eth *eth, struct mtk_foe_entry *foe,
struct net_device *dev, const u8 *dest_mac,
- int *wed_index)
+ __be16 ether_type, int *wed_index)
{
struct mtk_wdma_info info = {};
int pse_port, dsa_port, queue;
- if (mtk_flow_get_wdma_info(dev, dest_mac, &info) == 0) {
+ if (mtk_flow_get_wdma_info(dev, dest_mac, ether_type, &info) == 0) {
mtk_foe_entry_set_wdma(eth, foe, info.wdma_idx, info.queue,
info.bss, info.wcid, info.amsdu);
if (mtk_is_netsys_v2_or_greater(eth)) {
@@ -273,6 +274,7 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
struct mtk_flow_data data = {};
struct mtk_foe_entry foe;
struct mtk_flow_entry *entry;
+ __be16 ether_type = 0;
int offload_type = 0;
int wed_index = -1;
u16 addr_type = 0;
@@ -319,6 +321,7 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
struct flow_match_basic match;
flow_rule_match_basic(rule, &match);
+ ether_type = match.key->n_proto;
l4proto = match.key->ip_proto;
} else {
return -EOPNOTSUPP;
@@ -481,7 +484,7 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
mtk_foe_entry_set_pppoe(eth, &foe, data.pppoe.sid);
err = mtk_flow_set_output_device(eth, &foe, odev, data.eth.h_dest,
- &wed_index);
+ ether_type, &wed_index);
if (err)
return err;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 744ffa243501..85bd9d46b5a0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -938,6 +938,7 @@ struct net_device_path_stack {
struct net_device_path_ctx {
const struct net_device *dev;
u8 daddr[ETH_ALEN];
+ __be16 ether_type;
int num_vlans;
struct {
@@ -3391,7 +3392,8 @@ void dev_remove_offload(struct packet_offload *po);
int dev_get_iflink(const struct net_device *dev);
int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
-int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
+int dev_fill_forward_path(const struct net_device *dev,
+ const u8 *daddr, __be16 ether_type,
struct net_device_path_stack *stack);
struct net_device *dev_get_by_name(struct net *net, const char *name);
struct net_device *dev_get_by_name_rcu(struct net *net, const char *name);
diff --git a/net/core/dev.c b/net/core/dev.c
index 06c195906231..5f6171c08849 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -750,12 +750,14 @@ static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
return &stack->path[k];
}
-int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
+int dev_fill_forward_path(const struct net_device *dev,
+ const u8 *daddr, __be16 ether_type,
struct net_device_path_stack *stack)
{
const struct net_device *last_dev;
struct net_device_path_ctx ctx = {
- .dev = dev,
+ .dev = dev,
+ .ether_type = ether_type,
};
struct net_device_path *path;
int ret = 0;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c468c83af0f2..3d64e672eeee 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1851,7 +1851,10 @@ static int ip6_tnl_fill_forward_path(struct net_device_path_ctx *ctx,
path->type = DEV_PATH_TUN;
path->tun.src_v6 = t->parms.laddr;
path->tun.dst_v6 = t->parms.raddr;
- path->tun.l3_proto = IPPROTO_IPV6;
+ if (ctx->ether_type == cpu_to_be16(ETH_P_IP))
+ path->tun.l3_proto = IPPROTO_IPIP;
+ else
+ path->tun.l3_proto = IPPROTO_IPV6;
path->dev = ctx->dev;
ctx->dev = dst->dev;
}
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index 6bb9579dcc2a..df4e180ed3c2 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -45,7 +45,8 @@ static bool nft_is_valid_ether_device(const struct net_device *dev)
static int nft_dev_fill_forward_path(const struct nf_flow_route *route,
const struct dst_entry *dst_cache,
const struct nf_conn *ct,
- enum ip_conntrack_dir dir, u8 *ha,
+ enum ip_conntrack_dir dir,
+ u8 *ha, __be16 ether_type,
struct net_device_path_stack *stack)
{
const void *daddr = &ct->tuplehash[!dir].tuple.src.u3;
@@ -70,7 +71,7 @@ static int nft_dev_fill_forward_path(const struct nf_flow_route *route,
return -1;
out:
- return dev_fill_forward_path(dev, ha, stack);
+ return dev_fill_forward_path(dev, ha, ether_type, stack);
}
struct nft_forward_info {
@@ -248,7 +249,8 @@ static void nft_dev_forward_path(const struct nft_pktinfo *pkt,
unsigned char ha[ETH_ALEN];
int i;
- if (nft_dev_fill_forward_path(route, dst, ct, dir, ha, &stack) >= 0)
+ if (nft_dev_fill_forward_path(route, dst, ct, dir, ha, pkt->ethertype,
+ &stack) >= 0)
nft_dev_path_info(&stack, &info, ha, &ft->data);
if (info.outdev)
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH nf-next 2/4] net: netfilter: Add encap_proto to flow_offload_tunnel
2026-05-05 14:49 [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 1/4] net: netfilter: Add ether_type to net_device_path_ctx Lorenzo Bianconi
@ 2026-05-05 14:49 ` Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 3/4] net: netfilter: Add IPv4 over IPv6 tunnel flowtable acceleration Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 4/4] selftests: netfilter: nft_flowtable.sh: Add IPv4 over IPv6 flowtable selftest Lorenzo Bianconi
3 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Bianconi @ 2026-05-05 14:49 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Felix Fietkau, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, David Ahern,
Ido Schimmel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Shuah Khan
Cc: linux-arm-kernel, linux-mediatek, netdev, netfilter-devel,
coreteam, linux-kselftest, Lorenzo Bianconi
Add encap_proto (AF_INET or AF_INET6) to struct flow_offload_tunnel
to allow its use as part of the hash table key during flowtable entry
lookup.
This is a preliminary change to support IPv4 over IPv6 tunneling via
the flowtable infrastructure for software acceleration.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/linux/netdevice.h | 1 +
include/net/netfilter/nf_flow_table.h | 1 +
net/ipv4/ipip.c | 1 +
net/ipv6/ip6_tunnel.c | 1 +
net/netfilter/nf_flow_table_ip.c | 2 ++
net/netfilter/nf_flow_table_path.c | 2 ++
6 files changed, 8 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 85bd9d46b5a0..02f593397fad 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -902,6 +902,7 @@ struct net_device_path {
};
u8 l3_proto;
+ u8 encap_proto;
} tun;
struct {
enum {
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index b09c11c048d5..96e8ecf0f530 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -118,6 +118,7 @@ struct flow_offload_tunnel {
};
u8 l3_proto;
+ u8 encap_proto;
};
struct flow_offload_tuple {
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index ff95b1b9908e..5425af051d5a 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -369,6 +369,7 @@ static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
path->tun.src_v4.s_addr = tiph->saddr;
path->tun.dst_v4.s_addr = tiph->daddr;
path->tun.l3_proto = IPPROTO_IPIP;
+ path->tun.encap_proto = AF_INET;
path->dev = ctx->dev;
ctx->dev = rt->dst.dev;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3d64e672eeee..c99ed41bfc99 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1851,6 +1851,7 @@ static int ip6_tnl_fill_forward_path(struct net_device_path_ctx *ctx,
path->type = DEV_PATH_TUN;
path->tun.src_v6 = t->parms.laddr;
path->tun.dst_v6 = t->parms.raddr;
+ path->tun.encap_proto = AF_INET6;
if (ctx->ether_type == cpu_to_be16(ETH_P_IP))
path->tun.l3_proto = IPPROTO_IPIP;
else
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index fd56d663cb5b..9efd76b57847 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -198,6 +198,7 @@ static void nf_flow_tuple_encap(struct nf_flowtable_ctx *ctx,
tuple->tun.dst_v4.s_addr = iph->daddr;
tuple->tun.src_v4.s_addr = iph->saddr;
tuple->tun.l3_proto = IPPROTO_IPIP;
+ tuple->tun.encap_proto = AF_INET;
}
break;
case htons(ETH_P_IPV6):
@@ -206,6 +207,7 @@ static void nf_flow_tuple_encap(struct nf_flowtable_ctx *ctx,
tuple->tun.dst_v6 = ip6h->daddr;
tuple->tun.src_v6 = ip6h->saddr;
tuple->tun.l3_proto = IPPROTO_IPV6;
+ tuple->tun.encap_proto = AF_INET6;
}
break;
default:
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index df4e180ed3c2..5a5774d9b6f5 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -127,6 +127,7 @@ static void nft_dev_path_info(const struct net_device_path_stack *stack,
info->tun.src_v6 = path->tun.src_v6;
info->tun.dst_v6 = path->tun.dst_v6;
info->tun.l3_proto = path->tun.l3_proto;
+ info->tun.encap_proto = path->tun.encap_proto;
info->num_tuns++;
} else {
if (info->num_encaps >= NF_FLOW_TABLE_ENCAP_MAX) {
@@ -270,6 +271,7 @@ static void nft_dev_forward_path(const struct nft_pktinfo *pkt,
route->tuple[!dir].in.tun.src_v6 = info.tun.dst_v6;
route->tuple[!dir].in.tun.dst_v6 = info.tun.src_v6;
route->tuple[!dir].in.tun.l3_proto = info.tun.l3_proto;
+ route->tuple[!dir].in.tun.encap_proto = info.tun.encap_proto;
route->tuple[!dir].in.num_tuns = info.num_tuns;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH nf-next 3/4] net: netfilter: Add IPv4 over IPv6 tunnel flowtable acceleration
2026-05-05 14:49 [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 1/4] net: netfilter: Add ether_type to net_device_path_ctx Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 2/4] net: netfilter: Add encap_proto to flow_offload_tunnel Lorenzo Bianconi
@ 2026-05-05 14:49 ` Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 4/4] selftests: netfilter: nft_flowtable.sh: Add IPv4 over IPv6 flowtable selftest Lorenzo Bianconi
3 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Bianconi @ 2026-05-05 14:49 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Felix Fietkau, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, David Ahern,
Ido Schimmel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Shuah Khan
Cc: linux-arm-kernel, linux-mediatek, netdev, netfilter-devel,
coreteam, linux-kselftest, Lorenzo Bianconi
Introduce sw flowtable acceleration for the TX/RX paths of
IPv4 over IPv6 tunnels, relying on the netfilter flowtable
infrastructure.
The feature can be tested with a forwarding scenario between two
NICs (eth0 and eth1), where an IPv4 over IPv6 tunnel is used to
reach a remote site via eth1 as the underlay device:
ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
[IP configuration]
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 scope global eth0
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet6 2001:db8:2::1/64 scope global nodad
valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr ce9c:2940:7dcc::
inet 192.168.100.1/24 scope global tun0
valid_lft forever preferred_lft forever
$ ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
$ ip -6 route show
2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium
default via 2002:db8:1::2 dev tun0 metric 1024 pref medium
$ nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
When reproducing this scenario using veth interfaces, the following
results were observed:
- TCP stream received from IPv4 over IPv6 tunnel:
- net-next (baseline): ~126 Gbps
- net-next + IP6IP flowtable support: ~140 Gbps
- TCP stream transmitted to IPv4 over IPv6 tunnel:
- net-next (baseline): ~127 Gbps
- net-next + IP6IP flowtable support: ~141 Gbps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
net/netfilter/nf_flow_table_core.c | 14 ++--
net/netfilter/nf_flow_table_ip.c | 127 +++++++++++++++++++++++++++----------
net/netfilter/nf_flow_table_path.c | 6 +-
3 files changed, 107 insertions(+), 40 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 2c4140e6f53c..53fea3da0747 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -76,9 +76,11 @@ struct flow_offload *flow_offload_alloc(struct nf_conn *ct)
}
EXPORT_SYMBOL_GPL(flow_offload_alloc);
-static u32 flow_offload_dst_cookie(struct flow_offload_tuple *flow_tuple)
+static u32 flow_offload_dst_cookie(struct flow_offload_tuple *flow_tuple,
+ u8 tun_encap_proto)
{
- if (flow_tuple->l3proto == NFPROTO_IPV6)
+ if (flow_tuple->l3proto == NFPROTO_IPV6 ||
+ tun_encap_proto == NFPROTO_IPV6)
return rt6_get_cookie(dst_rt6_info(flow_tuple->dst_cache));
return 0;
@@ -134,10 +136,14 @@ static int flow_offload_fill_route(struct flow_offload *flow,
dst_release(dst);
break;
case FLOW_OFFLOAD_XMIT_XFRM:
- case FLOW_OFFLOAD_XMIT_NEIGH:
+ case FLOW_OFFLOAD_XMIT_NEIGH: {
+ u8 encap_proto = route->tuple[!dir].in.tun.encap_proto;
+
flow_tuple->ifidx = route->tuple[dir].out.ifindex;
flow_tuple->dst_cache = dst;
- flow_tuple->dst_cookie = flow_offload_dst_cookie(flow_tuple);
+ flow_tuple->dst_cookie = flow_offload_dst_cookie(flow_tuple,
+ encap_proto);
+ }
break;
default:
WARN_ON_ONCE(1);
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 9efd76b57847..70d7b5a200e2 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -191,27 +191,27 @@ static void nf_flow_tuple_encap(struct nf_flowtable_ctx *ctx,
break;
}
- switch (inner_proto) {
- case htons(ETH_P_IP):
- iph = (struct iphdr *)(skb_network_header(skb) + offset);
- if (ctx->tun.proto == IPPROTO_IPIP) {
+ if (ctx->tun.proto == IPPROTO_IPIP || ctx->tun.proto == IPPROTO_IPV6) {
+ switch (inner_proto) {
+ case htons(ETH_P_IP):
+ iph = (struct iphdr *)(skb_network_header(skb) +
+ offset);
tuple->tun.dst_v4.s_addr = iph->daddr;
tuple->tun.src_v4.s_addr = iph->saddr;
- tuple->tun.l3_proto = IPPROTO_IPIP;
+ tuple->tun.l3_proto = ctx->tun.proto;
tuple->tun.encap_proto = AF_INET;
- }
- break;
- case htons(ETH_P_IPV6):
- ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
- if (ctx->tun.proto == IPPROTO_IPV6) {
+ break;
+ case htons(ETH_P_IPV6):
+ ip6h = (struct ipv6hdr *)(skb_network_header(skb) +
+ offset);
tuple->tun.dst_v6 = ip6h->daddr;
tuple->tun.src_v6 = ip6h->saddr;
- tuple->tun.l3_proto = IPPROTO_IPV6;
+ tuple->tun.l3_proto = ctx->tun.proto;
tuple->tun.encap_proto = AF_INET6;
+ break;
+ default:
+ break;
}
- break;
- default:
- break;
}
}
@@ -367,9 +367,9 @@ static bool nf_flow_ip6_tunnel_proto(struct nf_flowtable_ctx *ctx,
if (hdrlen < 0)
return false;
- if (nexthdr == IPPROTO_IPV6) {
+ if (nexthdr == IPPROTO_IPIP || nexthdr == IPPROTO_IPV6) {
ctx->tun.hdr_size = hdrlen;
- ctx->tun.proto = IPPROTO_IPV6;
+ ctx->tun.proto = nexthdr;
}
ctx->offset += ctx->tun.hdr_size;
@@ -388,6 +388,10 @@ static void nf_flow_ip_tunnel_pop(struct nf_flowtable_ctx *ctx,
skb_pull(skb, ctx->tun.hdr_size);
skb_reset_network_header(skb);
+ if (ctx->tun.proto == IPPROTO_IPIP)
+ skb->protocol = htons(ETH_P_IP);
+ else
+ skb->protocol = htons(ETH_P_IPV6);
}
static bool nf_flow_skb_encap_protocol(struct nf_flowtable_ctx *ctx,
@@ -650,18 +654,29 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
struct in6_addr **ip6_daddr,
int encap_limit)
{
- struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb);
- u8 hop_limit = ip6h->hop_limit, proto = IPPROTO_IPV6;
struct rtable *rt = dst_rtable(tuple->dst_cache);
- __u8 dsfield = ipv6_get_dsfield(ip6h);
+ u8 hop_limit, proto = tuple->tun.l3_proto;
struct flowi6 fl6 = {
.daddr = tuple->tun.src_v6,
.saddr = tuple->tun.dst_v6,
.flowi6_proto = proto,
};
+ struct ipv6hdr *ip6h;
+ __u8 dsfield;
int err, mtu;
u32 headroom;
+ if (tuple->tun.l3_proto == IPPROTO_IPIP) {
+ struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
+
+ dsfield = ipv4_get_dsfield(iph);
+ hop_limit = iph->ttl;
+ } else {
+ ip6h = (struct ipv6hdr *)skb_network_header(skb);
+ dsfield = ipv6_get_dsfield(ip6h);
+ hop_limit = ip6h->hop_limit;
+ }
+
err = iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6);
if (err)
return err;
@@ -697,12 +712,13 @@ static int nf_flow_tunnel_ip6ip6_push(struct net *net, struct sk_buff *skb,
hopt = skb_push(skb, ipv6_optlen(opt.ops.dst1opt));
memcpy(hopt, opt.ops.dst1opt, ipv6_optlen(opt.ops.dst1opt));
- hopt->nexthdr = IPPROTO_IPV6;
+ hopt->nexthdr = proto;
proto = NEXTHDR_DEST;
}
skb_push(skb, sizeof(*ip6h));
skb_reset_network_header(skb);
+ skb->protocol = htons(ETH_P_IPV6);
ip6h = ipv6_hdr(skb);
ip6_flow_hdr(ip6h, dsfield,
@@ -767,6 +783,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
.in = state->in,
};
struct nf_flow_xmit xmit = {};
+ struct in6_addr *ip6_daddr;
struct flow_offload *flow;
struct neighbour *neigh;
struct rtable *rt;
@@ -796,28 +813,50 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
other_tuple = &flow->tuplehash[!dir].tuple;
ip_daddr = other_tuple->src_v4.s_addr;
- if (nf_flow_tunnel_v4_push(state->net, skb, other_tuple, &ip_daddr) < 0)
+ if (other_tuple->tun.encap_proto == AF_INET6) {
+ if (nf_flow_tunnel_v6_push(state->net, skb, other_tuple,
+ &ip6_daddr,
+ IPV6_DEFAULT_TNL_ENCAP_LIMIT) < 0)
+ return NF_DROP;
+ } else if (nf_flow_tunnel_v4_push(state->net, skb, other_tuple,
+ &ip_daddr) < 0) {
return NF_DROP;
+ }
if (nf_flow_encap_push(skb, other_tuple) < 0)
return NF_DROP;
switch (tuplehash->tuple.xmit_type) {
- case FLOW_OFFLOAD_XMIT_NEIGH:
- rt = dst_rtable(tuplehash->tuple.dst_cache);
+ case FLOW_OFFLOAD_XMIT_NEIGH: {
+ struct dst_entry *dst;
+
xmit.outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.ifidx);
if (!xmit.outdev) {
flow_offload_teardown(flow);
return NF_DROP;
}
- neigh = ip_neigh_gw4(rt->dst.dev, rt_nexthop(rt, ip_daddr));
+ if (other_tuple->tun.encap_proto == AF_INET6 ||
+ ctx.tun.proto == IPPROTO_IPV6) {
+ struct rt6_info *rt6;
+
+ rt6 = dst_rt6_info(tuplehash->tuple.dst_cache);
+ neigh = ip_neigh_gw6(rt6->dst.dev,
+ rt6_nexthop(rt6, ip6_daddr));
+ dst = &rt6->dst;
+ } else {
+ rt = dst_rtable(tuplehash->tuple.dst_cache);
+ neigh = ip_neigh_gw4(rt->dst.dev,
+ rt_nexthop(rt, ip_daddr));
+ dst = &rt->dst;
+ }
if (IS_ERR(neigh)) {
flow_offload_teardown(flow);
return NF_DROP;
}
xmit.dest = neigh->ha;
- skb_dst_set_noref(skb, &rt->dst);
+ skb_dst_set_noref(skb, dst);
break;
+ }
case FLOW_OFFLOAD_XMIT_DIRECT:
xmit.outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.out.ifidx);
if (!xmit.outdev) {
@@ -1068,8 +1107,12 @@ nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
if (!nf_flow_skb_encap_protocol(ctx, skb, htons(ETH_P_IPV6)))
return NULL;
- if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
+ if (ctx->tun.proto == IPPROTO_IPIP) {
+ if (nf_flow_tuple_ip(ctx, skb, &tuple) < 0)
+ return NULL;
+ } else if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0) {
return NULL;
+ }
return flow_offload_lookup(flow_table, &tuple);
}
@@ -1097,8 +1140,11 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
if (tuplehash == NULL)
return NF_ACCEPT;
- ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb,
- encap_limit);
+ if (ctx.tun.proto == IPPROTO_IPIP)
+ ret = nf_flow_offload_forward(&ctx, flow_table, tuplehash, skb);
+ else
+ ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash,
+ skb, encap_limit);
if (ret < 0)
return NF_DROP;
else if (ret == 0)
@@ -1125,21 +1171,38 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
return NF_DROP;
switch (tuplehash->tuple.xmit_type) {
- case FLOW_OFFLOAD_XMIT_NEIGH:
- rt = dst_rt6_info(tuplehash->tuple.dst_cache);
+ case FLOW_OFFLOAD_XMIT_NEIGH: {
+ struct dst_entry *dst;
+
xmit.outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.ifidx);
if (!xmit.outdev) {
flow_offload_teardown(flow);
return NF_DROP;
}
- neigh = ip_neigh_gw6(rt->dst.dev, rt6_nexthop(rt, ip6_daddr));
+ if (other_tuple->tun.encap_proto == AF_INET ||
+ ctx.tun.proto == IPPROTO_IPIP) {
+ __be32 ip_daddr = other_tuple->src_v4.s_addr;
+ struct rtable *rt4;
+
+ skb->protocol = htons(ETH_P_IP);
+ rt4 = dst_rtable(tuplehash->tuple.dst_cache);
+ neigh = ip_neigh_gw4(rt4->dst.dev,
+ rt_nexthop(rt4, ip_daddr));
+ dst = &rt4->dst;
+ } else {
+ rt = dst_rt6_info(tuplehash->tuple.dst_cache);
+ neigh = ip_neigh_gw6(rt->dst.dev,
+ rt6_nexthop(rt, ip6_daddr));
+ dst = &rt->dst;
+ }
if (IS_ERR(neigh)) {
flow_offload_teardown(flow);
return NF_DROP;
}
xmit.dest = neigh->ha;
- skb_dst_set_noref(skb, &rt->dst);
+ skb_dst_set_noref(skb, dst);
break;
+ }
case FLOW_OFFLOAD_XMIT_DIRECT:
xmit.outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.out.ifidx);
if (!xmit.outdev) {
diff --git a/net/netfilter/nf_flow_table_path.c b/net/netfilter/nf_flow_table_path.c
index 5a5774d9b6f5..74b6f5ea35f9 100644
--- a/net/netfilter/nf_flow_table_path.c
+++ b/net/netfilter/nf_flow_table_path.c
@@ -209,12 +209,11 @@ static int nft_flow_tunnel_update_route(const struct nft_pktinfo *pkt,
struct dst_entry *tun_dst = NULL;
struct flowi fl = {};
- switch (nft_pf(pkt)) {
+ switch (tun->encap_proto) {
case NFPROTO_IPV4:
fl.u.ip4.daddr = tun->dst_v4.s_addr;
fl.u.ip4.saddr = tun->src_v4.s_addr;
fl.u.ip4.flowi4_iif = nft_in(pkt)->ifindex;
- fl.u.ip4.flowi4_dscp = ip4h_dscp(ip_hdr(pkt->skb));
fl.u.ip4.flowi4_mark = pkt->skb->mark;
fl.u.ip4.flowi4_flags = FLOWI_FLAG_ANYSRC;
break;
@@ -222,13 +221,12 @@ static int nft_flow_tunnel_update_route(const struct nft_pktinfo *pkt,
fl.u.ip6.daddr = tun->dst_v6;
fl.u.ip6.saddr = tun->src_v6;
fl.u.ip6.flowi6_iif = nft_in(pkt)->ifindex;
- fl.u.ip6.flowlabel = ip6_flowinfo(ipv6_hdr(pkt->skb));
fl.u.ip6.flowi6_mark = pkt->skb->mark;
fl.u.ip6.flowi6_flags = FLOWI_FLAG_ANYSRC;
break;
}
- nf_route(nft_net(pkt), &tun_dst, &fl, false, nft_pf(pkt));
+ nf_route(nft_net(pkt), &tun_dst, &fl, false, tun->encap_proto);
if (!tun_dst)
return -ENOENT;
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH nf-next 4/4] selftests: netfilter: nft_flowtable.sh: Add IPv4 over IPv6 flowtable selftest
2026-05-05 14:49 [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration Lorenzo Bianconi
` (2 preceding siblings ...)
2026-05-05 14:49 ` [PATCH nf-next 3/4] net: netfilter: Add IPv4 over IPv6 tunnel flowtable acceleration Lorenzo Bianconi
@ 2026-05-05 14:49 ` Lorenzo Bianconi
3 siblings, 0 replies; 5+ messages in thread
From: Lorenzo Bianconi @ 2026-05-05 14:49 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Felix Fietkau, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, David Ahern,
Ido Schimmel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Shuah Khan
Cc: linux-arm-kernel, linux-mediatek, netdev, netfilter-devel,
coreteam, linux-kselftest, Lorenzo Bianconi
Similar to IPIP and IP6IP6, introduce specific selftest for IPv4 over IPv6
flowtable sw acceleration in nft_flowtable.sh
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
.../selftests/net/netfilter/nft_flowtable.sh | 26 ++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/tools/testing/selftests/net/netfilter/nft_flowtable.sh b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
index 7a34ef468975..d9995549f8a6 100755
--- a/tools/testing/selftests/net/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
@@ -594,7 +594,9 @@ ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
ip -net "$nsr1" link add name tun6 type ip6tnl local fee1:2::1 remote fee1:2::2
ip -net "$nsr1" link set tun6 up
+ip -net "$nsr1" addr add 192.168.210.1/24 dev tun6
ip -net "$nsr1" addr add fee1:3::1/64 dev tun6 nodad
+ip netns exec "$nsr1" sysctl net.ipv4.conf.tun6.forwarding=1 > /dev/null
ip -net "$nsr2" link add name tun0 type ipip local 192.168.10.2 remote 192.168.10.1
ip -net "$nsr2" link set tun0 up
@@ -603,7 +605,9 @@ ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
ip -net "$nsr2" link add name tun6 type ip6tnl local fee1:2::2 remote fee1:2::1 || ret=1
ip -net "$nsr2" link set tun6 up
+ip -net "$nsr2" addr add 192.168.210.2/24 dev tun6
ip -net "$nsr2" addr add fee1:3::2/64 dev tun6 nodad
+ip netns exec "$nsr2" sysctl net.ipv4.conf.tun6.forwarding=1 > /dev/null
ip -net "$nsr1" route change default via 192.168.100.2
ip -net "$nsr2" route change default via 192.168.100.1
@@ -636,6 +640,15 @@ else
ret=1
fi
+ip -net "$nsr1" route change default via 192.168.210.2
+ip -net "$nsr2" route change default via 192.168.210.1
+
+if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IP6IP4 tunnel"; then
+ echo "FAIL: flow offload for ns1/ns2 with IP6IP4 tunnel" 1>&2
+ ip netns exec "$nsr1" nft list ruleset
+ ret=1
+fi
+
# Create vlan tagged devices for IPIP traffic.
ip -net "$nsr1" link add link veth1 name veth1.10 type vlan id 10
ip -net "$nsr1" link set veth1.10 up
@@ -653,7 +666,9 @@ ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0.10 a
ip -net "$nsr1" link add name tun6.10 type ip6tnl local fee1:4::1 remote fee1:4::2
ip -net "$nsr1" link set tun6.10 up
+ip -net "$nsr1" addr add 192.168.220.1/24 dev tun6.10
ip -net "$nsr1" addr add fee1:5::1/64 dev tun6.10 nodad
+ip netns exec "$nsr1" sysctl net.ipv4.conf.tun6/10.forwarding=1 > /dev/null
ip -6 -net "$nsr1" route delete default
ip -6 -net "$nsr1" route add default via fee1:5::2
ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun6.10 accept'
@@ -672,7 +687,9 @@ ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0/10.forwarding=1 > /dev/null
ip -net "$nsr2" link add name tun6.10 type ip6tnl local fee1:4::2 remote fee1:4::1 || ret=1
ip -net "$nsr2" link set tun6.10 up
+ip -net "$nsr2" addr add 192.168.220.2/24 dev tun6.10
ip -net "$nsr2" addr add fee1:5::2/64 dev tun6.10 nodad
+ip netns exec "$nsr2" sysctl net.ipv4.conf.tun6/10.forwarding=1 > /dev/null
ip -6 -net "$nsr2" route delete default
ip -6 -net "$nsr2" route add default via fee1:5::1
@@ -690,6 +707,15 @@ else
ret=1
fi
+ip -net "$nsr1" route change default via 192.168.220.2
+ip -net "$nsr2" route change default via 192.168.220.1
+
+if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IP6IP4 tunnel over vlan"; then
+ echo "FAIL: flow offload for ns1/ns2 with IP6IP4 tunnel over vlan" 1>&2
+ ip netns exec "$nsr1" nft list ruleset
+ ret=1
+fi
+
# Restore the previous configuration
ip -net "$nsr1" route change default via 192.168.10.2
ip -net "$nsr2" route change default via 192.168.10.1
--
2.54.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-05 14:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-05 14:49 [PATCH nf-next 0/4] Add IPv4 over IPv6 flowtable SW acceleration Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 1/4] net: netfilter: Add ether_type to net_device_path_ctx Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 2/4] net: netfilter: Add encap_proto to flow_offload_tunnel Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 3/4] net: netfilter: Add IPv4 over IPv6 tunnel flowtable acceleration Lorenzo Bianconi
2026-05-05 14:49 ` [PATCH nf-next 4/4] selftests: netfilter: nft_flowtable.sh: Add IPv4 over IPv6 flowtable selftest Lorenzo Bianconi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox