* [PATCH nf-next v3 0/2] Add IPIP flowtable SW acceleration
@ 2025-07-03 14:16 Lorenzo Bianconi
2025-07-03 14:16 ` [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration Lorenzo Bianconi
2025-07-03 14:16 ` [PATCH nf-next v3 2/2] selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest Lorenzo Bianconi
0 siblings, 2 replies; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-03 14:16 UTC (permalink / raw)
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik,
Shuah Khan
Cc: netdev, netfilter-devel, coreteam, linux-kselftest,
Lorenzo Bianconi
Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
infrastructure.
---
Changes in v3:
- Add outer IP header sanity checks
- target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kernel.org
Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kernel.org
---
Lorenzo Bianconi (2):
net: netfilter: Add IPIP flowtable SW acceleration
selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
net/ipv4/ipip.c | 21 ++++++++++++
net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++--
.../selftests/net/netfilter/nft_flowtable.sh | 40 ++++++++++++++++++++++
3 files changed, 93 insertions(+), 2 deletions(-)
---
base-commit: 8b98f34ce1d8c520403362cb785231f9898eb3ff
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067
Best regards,
--
Lorenzo Bianconi <lorenzo@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-03 14:16 [PATCH nf-next v3 0/2] Add IPIP flowtable SW acceleration Lorenzo Bianconi
@ 2025-07-03 14:16 ` Lorenzo Bianconi
2025-07-03 14:35 ` Pablo Neira Ayuso
2025-07-03 14:16 ` [PATCH nf-next v3 2/2] selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest Lorenzo Bianconi
1 sibling, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-03 14:16 UTC (permalink / raw)
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik,
Shuah Khan
Cc: netdev, netfilter-devel, coreteam, linux-kselftest,
Lorenzo Bianconi
Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
infrastructure.
IPIP SW acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):
ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.2/24 scope global eth0
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.1/24 scope global eth1
valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 192.168.1.1 peer 192.168.1.2
inet 192.168.100.1/24 scope global tun0
valid_lft forever preferred_lft forever
$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
$nft list ruleset
table inet filter {
flowtable ft {
hook ingress priority filter
devices = { eth0, eth1 }
}
chain forward {
type filter hook forward priority filter; policy accept;
meta l4proto { tcp, udp } flow add @ft
}
}
Reproducing the scenario described above using veths I got the following
results:
- TCP stream transmitted into the IPIP tunnel:
- net-next: ~41Gbps
- net-next + IPIP flowtable support: ~40Gbps
- TCP stream received from the IPIP tunnel:
- net-next: ~35Gbps
- net-next + IPIP flowtable support: ~49Gbps
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
net/ipv4/ipip.c | 21 +++++++++++++++++++++
net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
return ip_tunnel_ctl(dev, p, cmd);
}
+static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
+ struct net_device_path *path)
+{
+ struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
+ const struct iphdr *tiph = &tunnel->parms.iph;
+ struct rtable *rt;
+
+ rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
+ RT_SCOPE_UNIVERSE);
+ if (IS_ERR(rt))
+ return PTR_ERR(rt);
+
+ path->type = DEV_PATH_ETHERNET;
+ path->dev = ctx->dev;
+ ctx->dev = rt->dst.dev;
+ ip_rt_put(rt);
+
+ return 0;
+}
+
static const struct net_device_ops ipip_netdev_ops = {
.ndo_init = ipip_tunnel_init,
.ndo_uninit = ip_tunnel_uninit,
@@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
.ndo_get_stats64 = dev_get_tstats64,
.ndo_get_iflink = ip_tunnel_get_iflink,
.ndo_tunnel_ctl = ipip_tunnel_ctl,
+ .ndo_fill_forward_path = ipip_fill_forward_path,
};
#define IPIP_FEATURES (NETIF_F_SG | \
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
return NF_STOLEN;
}
+static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
+{
+ struct iphdr *iph;
+
+ if (!pskb_may_pull(skb, sizeof(*iph)))
+ return false;
+
+ iph = (struct iphdr *)skb_network_header(skb);
+ *size = iph->ihl << 2;
+
+ if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
+ return false;
+
+ if (iph->ttl <= 1)
+ return false;
+
+ return iph->protocol == IPPROTO_IPIP;
+}
+
static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
u32 *offset)
{
struct vlan_ethhdr *veth;
__be16 inner_proto;
+ u16 size;
switch (skb->protocol) {
+ case htons(ETH_P_IP):
+ if (nf_flow_ip4_encap_proto(skb, &size))
+ *offset += size;
+ return true;
case htons(ETH_P_8021Q):
if (!pskb_may_pull(skb, skb_mac_offset(skb) + sizeof(*veth)))
return false;
@@ -310,6 +334,7 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
struct flow_offload_tuple_rhash *tuplehash)
{
struct vlan_hdr *vlan_hdr;
+ u16 size;
int i;
for (i = 0; i < tuplehash->tuple.encap_num; i++) {
@@ -331,6 +356,12 @@ static void nf_flow_encap_pop(struct sk_buff *skb,
break;
}
}
+
+ if (skb->protocol == htons(ETH_P_IP) &&
+ nf_flow_ip4_encap_proto(skb, &size)) {
+ skb_pull(skb, size);
+ skb_reset_network_header(skb);
+ }
}
static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
@@ -357,8 +388,7 @@ nf_flow_offload_lookup(struct nf_flowtable_ctx *ctx,
{
struct flow_offload_tuple tuple = {};
- if (skb->protocol != htons(ETH_P_IP) &&
- !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &ctx->offset))
+ if (!nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &ctx->offset))
return NULL;
if (nf_flow_tuple_ip(ctx, skb, &tuple) < 0)
--
2.50.0
* [PATCH nf-next v3 2/2] selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
2025-07-03 14:16 [PATCH nf-next v3 0/2] Add IPIP flowtable SW acceleration Lorenzo Bianconi
2025-07-03 14:16 ` [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration Lorenzo Bianconi
@ 2025-07-03 14:16 ` Lorenzo Bianconi
1 sibling, 0 replies; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-03 14:16 UTC (permalink / raw)
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Pablo Neira Ayuso, Jozsef Kadlecsik,
Shuah Khan
Cc: netdev, netfilter-devel, coreteam, linux-kselftest,
Lorenzo Bianconi
Introduce a dedicated selftest for IPIP flowtable SW acceleration in
nft_flowtable.sh.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
.../selftests/net/netfilter/nft_flowtable.sh | 40 ++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/tools/testing/selftests/net/netfilter/nft_flowtable.sh b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
index a4ee5496f2a17cedf1ee71214397012c7906650f..d1c9d3eeda2c9874008f9d6de6cabaabea79b9fb 100755
--- a/tools/testing/selftests/net/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/net/netfilter/nft_flowtable.sh
@@ -519,6 +519,44 @@ if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 ""; then
ip netns exec "$nsr1" nft list ruleset
fi
+# IPIP tunnel test:
+# Add IPIP tunnel interfaces and check flowtable acceleration.
+test_ipip() {
+if ! ip -net "$nsr1" link add name tun0 type ipip \
+ local 192.168.10.1 remote 192.168.10.2 >/dev/null;then
+ echo "SKIP: could not add ipip tunnel"
+ [ "$ret" -eq 0 ] && ret=$ksft_skip
+ return
+fi
+ip -net "$nsr1" link set tun0 up
+ip -net "$nsr1" addr add 192.168.100.1/24 dev tun0
+ip netns exec "$nsr1" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
+
+ip -net "$nsr2" link add name tun0 type ipip local 192.168.10.2 remote 192.168.10.1
+ip -net "$nsr2" link set tun0 up
+ip -net "$nsr2" addr add 192.168.100.2/24 dev tun0
+ip netns exec "$nsr2" sysctl net.ipv4.conf.tun0.forwarding=1 > /dev/null
+
+ip -net "$nsr1" route change default via 192.168.100.2
+ip -net "$nsr2" route change default via 192.168.100.1
+ip -net "$ns2" route add default via 10.0.2.1
+
+ip netns exec "$nsr1" nft -a insert rule inet filter forward 'meta oif tun0 accept'
+ip netns exec "$nsr1" nft -a insert rule inet filter forward \
+ 'meta oif "veth0" tcp sport 12345 ct mark set 1 flow add @f1 counter name routed_repl accept'
+
+if ! test_tcp_forwarding_nat "$ns1" "$ns2" 1 "IPIP tunnel"; then
+ echo "FAIL: flow offload for ns1/ns2 with IPIP tunnel" 1>&2
+ ip netns exec "$nsr1" nft list ruleset
+ ret=1
+fi
+
+# Restore the previous configuration
+ip -net "$nsr1" route change default via 192.168.10.2
+ip -net "$nsr2" route change default via 192.168.10.1
+ip -net "$ns2" route del default via 10.0.2.1
+}
+
# Another test:
# Add bridge interface br0 to Router1, with NAT enabled.
test_bridge() {
@@ -604,6 +642,8 @@ ip -net "$nsr1" addr add dead:1::1/64 dev veth0 nodad
ip -net "$nsr1" link set up dev veth0
}
+test_ipip
+
test_bridge
KEY_SHA="0x"$(ps -af | sha1sum | cut -d " " -f 1)
--
2.50.0
* Re: [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-03 14:16 ` [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration Lorenzo Bianconi
@ 2025-07-03 14:35 ` Pablo Neira Ayuso
2025-07-04 13:00 ` Lorenzo Bianconi
0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-03 14:35 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jozsef Kadlecsik, Shuah Khan, netdev,
netfilter-devel, coreteam, linux-kselftest
On Thu, Jul 03, 2025 at 04:16:02PM +0200, Lorenzo Bianconi wrote:
> Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
> infrastructure.
> IPIP SW acceleration can be tested running the following scenario where
> the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
> tunnel is used to access a remote site (using eth1 as the underlay device):
Question below.
> ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
>
> $ip addr show
> 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> inet 192.168.0.2/24 scope global eth0
> valid_lft forever preferred_lft forever
> 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> inet 192.168.1.1/24 scope global eth1
> valid_lft forever preferred_lft forever
> 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
> link/ipip 192.168.1.1 peer 192.168.1.2
> inet 192.168.100.1/24 scope global tun0
> valid_lft forever preferred_lft forever
>
> $ip route show
> default via 192.168.100.2 dev tun0
> 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
> 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
> 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
>
> $nft list ruleset
> table inet filter {
> flowtable ft {
> hook ingress priority filter
> devices = { eth0, eth1 }
> }
>
> chain forward {
> type filter hook forward priority filter; policy accept;
> meta l4proto { tcp, udp } flow add @ft
> }
> }
>
> Reproducing the scenario described above using veths I got the following
> results:
> - TCP stream transmitted into the IPIP tunnel:
> - net-next: ~41Gbps
> - net-next + IPIP flowtable support: ~40Gbps
^^^^^^^^^
no gain on tx side.
> - TCP stream received from the IPIP tunnel:
> - net-next: ~35Gbps
> - net-next + IPIP flowtable support: ~49Gbps
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> net/ipv4/ipip.c | 21 +++++++++++++++++++++
> net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
> 2 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
> --- a/net/ipv4/ipip.c
> +++ b/net/ipv4/ipip.c
> @@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
> return ip_tunnel_ctl(dev, p, cmd);
> }
>
> +static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
> + struct net_device_path *path)
> +{
> + struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
> + const struct iphdr *tiph = &tunnel->parms.iph;
> + struct rtable *rt;
> +
> + rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
> + RT_SCOPE_UNIVERSE);
> + if (IS_ERR(rt))
> + return PTR_ERR(rt);
> +
> + path->type = DEV_PATH_ETHERNET;
> + path->dev = ctx->dev;
> + ctx->dev = rt->dst.dev;
> + ip_rt_put(rt);
> +
> + return 0;
> +}
> +
> static const struct net_device_ops ipip_netdev_ops = {
> .ndo_init = ipip_tunnel_init,
> .ndo_uninit = ip_tunnel_uninit,
> @@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
> .ndo_get_stats64 = dev_get_tstats64,
> .ndo_get_iflink = ip_tunnel_get_iflink,
> .ndo_tunnel_ctl = ipip_tunnel_ctl,
> + .ndo_fill_forward_path = ipip_fill_forward_path,
> };
>
> #define IPIP_FEATURES (NETIF_F_SG | \
> diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
> --- a/net/netfilter/nf_flow_table_ip.c
> +++ b/net/netfilter/nf_flow_table_ip.c
> @@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
> return NF_STOLEN;
> }
>
> +static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
> +{
> + struct iphdr *iph;
> +
> + if (!pskb_may_pull(skb, sizeof(*iph)))
> + return false;
> +
> + iph = (struct iphdr *)skb_network_header(skb);
> + *size = iph->ihl << 2;
> +
> + if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
> + return false;
> +
> + if (iph->ttl <= 1)
> + return false;
> +
> + return iph->protocol == IPPROTO_IPIP;
Once the flow is in the flowtable, it is possible to inject traffic
with forged outer IP header, this is only looking at the inner IP
header.
* Re: [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-03 14:35 ` Pablo Neira Ayuso
@ 2025-07-04 13:00 ` Lorenzo Bianconi
2025-07-07 19:58 ` Pablo Neira Ayuso
0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-04 13:00 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Lorenzo Bianconi, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jozsef Kadlecsik,
Shuah Khan, netdev, netfilter-devel, coreteam, linux-kselftest
> On Thu, Jul 03, 2025 at 04:16:02PM +0200, Lorenzo Bianconi wrote:
> > Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
> > infrastructure.
> > IPIP SW acceleration can be tested running the following scenario where
> > the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
> > tunnel is used to access a remote site (using eth1 as the underlay device):
>
> Question below.
>
> > ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
> >
> > $ip addr show
> > 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > inet 192.168.0.2/24 scope global eth0
> > valid_lft forever preferred_lft forever
> > 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > inet 192.168.1.1/24 scope global eth1
> > valid_lft forever preferred_lft forever
> > 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
> > link/ipip 192.168.1.1 peer 192.168.1.2
> > inet 192.168.100.1/24 scope global tun0
> > valid_lft forever preferred_lft forever
> >
> > $ip route show
> > default via 192.168.100.2 dev tun0
> > 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
> > 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
> > 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
> >
> > $nft list ruleset
> > table inet filter {
> > flowtable ft {
> > hook ingress priority filter
> > devices = { eth0, eth1 }
> > }
> >
> > chain forward {
> > type filter hook forward priority filter; policy accept;
> > meta l4proto { tcp, udp } flow add @ft
> > }
> > }
> >
> > Reproducing the scenario described above using veths I got the following
> > results:
> > - TCP stream transmitted into the IPIP tunnel:
> > - net-next: ~41Gbps
> > - net-next + IPIP flowtable support: ~40Gbps
> ^^^^^^^^^
> no gain on tx side.
In this case the IPIP flowtable acceleration is effective just on the ACK
packets so I guess it is expected we have ~ the same results. The real gain is
when the TCP stream is from the tunnel net_device to the NIC one.
>
> > - TCP stream received from the IPIP tunnel:
> > - net-next: ~35Gbps
> > - net-next + IPIP flowtable support: ~49Gbps
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > net/ipv4/ipip.c | 21 +++++++++++++++++++++
> > net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
> > 2 files changed, 53 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> > index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
> > --- a/net/ipv4/ipip.c
> > +++ b/net/ipv4/ipip.c
> > @@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
> > return ip_tunnel_ctl(dev, p, cmd);
> > }
> >
> > +static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
> > + struct net_device_path *path)
> > +{
> > + struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
> > + const struct iphdr *tiph = &tunnel->parms.iph;
> > + struct rtable *rt;
> > +
> > + rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
> > + RT_SCOPE_UNIVERSE);
> > + if (IS_ERR(rt))
> > + return PTR_ERR(rt);
> > +
> > + path->type = DEV_PATH_ETHERNET;
> > + path->dev = ctx->dev;
> > + ctx->dev = rt->dst.dev;
> > + ip_rt_put(rt);
> > +
> > + return 0;
> > +}
> > +
> > static const struct net_device_ops ipip_netdev_ops = {
> > .ndo_init = ipip_tunnel_init,
> > .ndo_uninit = ip_tunnel_uninit,
> > @@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
> > .ndo_get_stats64 = dev_get_tstats64,
> > .ndo_get_iflink = ip_tunnel_get_iflink,
> > .ndo_tunnel_ctl = ipip_tunnel_ctl,
> > + .ndo_fill_forward_path = ipip_fill_forward_path,
> > };
> >
> > #define IPIP_FEATURES (NETIF_F_SG | \
> > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
> > --- a/net/netfilter/nf_flow_table_ip.c
> > +++ b/net/netfilter/nf_flow_table_ip.c
> > @@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
> > return NF_STOLEN;
> > }
> >
> > +static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
> > +{
> > + struct iphdr *iph;
> > +
> > + if (!pskb_may_pull(skb, sizeof(*iph)))
> > + return false;
> > +
> > + iph = (struct iphdr *)skb_network_header(skb);
> > + *size = iph->ihl << 2;
> > +
> > + if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
> > + return false;
> > +
> > + if (iph->ttl <= 1)
> > + return false;
> > +
> > + return iph->protocol == IPPROTO_IPIP;
>
what kind of sanity checks are we supposed to perform? Something similar to
what we have in ip_rcv_core()?
> Once the flow is in the flowtable, it is possible to inject traffic
> with forged outer IP header, this is only looking at the inner IP
> header.
what is the difference with the plain IP/TCP use-case?
Regards,
Lorenzo
>
* Re: [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-04 13:00 ` Lorenzo Bianconi
@ 2025-07-07 19:58 ` Pablo Neira Ayuso
2025-07-08 7:58 ` Lorenzo Bianconi
0 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-07 19:58 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jozsef Kadlecsik,
Shuah Khan, netdev, netfilter-devel, coreteam, linux-kselftest
On Fri, Jul 04, 2025 at 03:00:40PM +0200, Lorenzo Bianconi wrote:
> > On Thu, Jul 03, 2025 at 04:16:02PM +0200, Lorenzo Bianconi wrote:
> > > Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
> > > infrastructure.
> > > IPIP SW acceleration can be tested running the following scenario where
> > > the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
> > > tunnel is used to access a remote site (using eth1 as the underlay device):
> >
> > Question below.
> >
> > > ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
> > >
> > > $ip addr show
> > > 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > inet 192.168.0.2/24 scope global eth0
> > > valid_lft forever preferred_lft forever
> > > 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > inet 192.168.1.1/24 scope global eth1
> > > valid_lft forever preferred_lft forever
> > > 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
> > > link/ipip 192.168.1.1 peer 192.168.1.2
> > > inet 192.168.100.1/24 scope global tun0
> > > valid_lft forever preferred_lft forever
> > >
> > > $ip route show
> > > default via 192.168.100.2 dev tun0
> > > 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
> > > 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
> > > 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
> > >
> > > $nft list ruleset
> > > table inet filter {
> > > flowtable ft {
> > > hook ingress priority filter
> > > devices = { eth0, eth1 }
> > > }
> > >
> > > chain forward {
> > > type filter hook forward priority filter; policy accept;
> > > meta l4proto { tcp, udp } flow add @ft
> > > }
> > > }
> > >
> > > Reproducing the scenario described above using veths I got the following
> > > results:
> > > - TCP stream transmitted into the IPIP tunnel:
> > > - net-next: ~41Gbps
> > > - net-next + IPIP flowtable support: ~40Gbps
> > ^^^^^^^^^
> > no gain on tx side.
>
> > In this case the IPIP flowtable acceleration is effective just on the ACK
> packets so I guess it is expected we have ~ the same results. The real gain is
> when the TCP stream is from the tunnel net_device to the NIC one.
That is, only rx side follows the flowtable datapath.
> > > - TCP stream received from the IPIP tunnel:
> > > - net-next: ~35Gbps
> > > - net-next + IPIP flowtable support: ~49Gbps
> > >
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > ---
> > > net/ipv4/ipip.c | 21 +++++++++++++++++++++
> > > net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
> > > 2 files changed, 53 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> > > index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
> > > --- a/net/ipv4/ipip.c
> > > +++ b/net/ipv4/ipip.c
> > > @@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
> > > return ip_tunnel_ctl(dev, p, cmd);
> > > }
> > >
> > > +static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
> > > + struct net_device_path *path)
> > > +{
> > > + struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
> > > + const struct iphdr *tiph = &tunnel->parms.iph;
> > > + struct rtable *rt;
> > > +
> > > + rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
> > > + RT_SCOPE_UNIVERSE);
> > > + if (IS_ERR(rt))
> > > + return PTR_ERR(rt);
> > > +
> > > + path->type = DEV_PATH_ETHERNET;
> > > + path->dev = ctx->dev;
> > > + ctx->dev = rt->dst.dev;
> > > + ip_rt_put(rt);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static const struct net_device_ops ipip_netdev_ops = {
> > > .ndo_init = ipip_tunnel_init,
> > > .ndo_uninit = ip_tunnel_uninit,
> > > @@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
> > > .ndo_get_stats64 = dev_get_tstats64,
> > > .ndo_get_iflink = ip_tunnel_get_iflink,
> > > .ndo_tunnel_ctl = ipip_tunnel_ctl,
> > > + .ndo_fill_forward_path = ipip_fill_forward_path,
> > > };
> > >
> > > #define IPIP_FEATURES (NETIF_F_SG | \
> > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
> > > --- a/net/netfilter/nf_flow_table_ip.c
> > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > @@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
> > > return NF_STOLEN;
> > > }
> > >
> > > +static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
> > > +{
> > > + struct iphdr *iph;
> > > +
> > > + if (!pskb_may_pull(skb, sizeof(*iph)))
> > > + return false;
> > > +
> > > + iph = (struct iphdr *)skb_network_header(skb);
> > > + *size = iph->ihl << 2;
> > > +
> > > + if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
> > > + return false;
> > > +
> > > + if (iph->ttl <= 1)
> > > + return false;
> > > +
> > > + return iph->protocol == IPPROTO_IPIP;
> >
>
> what kind of sanity checks are we supposed to perform? Something similar to
> what we have in ip_rcv_core()?
I am not referring to sanity checks.
VLAN/PPP ID (layer 2 encapsulation) is part of the lookup in the
flowtable, why IPIP (layer 3 tunnel) does not get the same handling?
> > Once the flow is in the flowtable, it is possible to inject traffic
> > with forged outer IP header, this is only looking at the inner IP
> > header.
>
> what is the difference with the plain IP/TCP use-case?
Not referring to the generic packet forging scenario. I refer to the
scenario that would allow to forward packets for any IPIP outer header
given the inner header finds a matching in the flowtable. I think that
needs to be sorted out.
* Re: [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-07 19:58 ` Pablo Neira Ayuso
@ 2025-07-08 7:58 ` Lorenzo Bianconi
2025-07-14 19:06 ` Lorenzo Bianconi
0 siblings, 1 reply; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-08 7:58 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Lorenzo Bianconi, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jozsef Kadlecsik,
Shuah Khan, netdev, netfilter-devel, coreteam, linux-kselftest
[-- Attachment #1: Type: text/plain, Size: 6893 bytes --]
On Jul 07, Pablo Neira Ayuso wrote:
> On Fri, Jul 04, 2025 at 03:00:40PM +0200, Lorenzo Bianconi wrote:
> > > On Thu, Jul 03, 2025 at 04:16:02PM +0200, Lorenzo Bianconi wrote:
> > > > Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
> > > > infrastructure.
> > > > IPIP SW acceleration can be tested running the following scenario where
> > > > the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
> > > > tunnel is used to access a remote site (using eth1 as the underlay device):
> > >
> > > Question below.
> > >
> > > > ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
> > > >
> > > > $ip addr show
> > > > 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > > link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > > inet 192.168.0.2/24 scope global eth0
> > > > valid_lft forever preferred_lft forever
> > > > 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > > link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > > inet 192.168.1.1/24 scope global eth1
> > > > valid_lft forever preferred_lft forever
> > > > 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
> > > > link/ipip 192.168.1.1 peer 192.168.1.2
> > > > inet 192.168.100.1/24 scope global tun0
> > > > valid_lft forever preferred_lft forever
> > > >
> > > > $ip route show
> > > > default via 192.168.100.2 dev tun0
> > > > 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
> > > > 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
> > > > 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
> > > >
> > > > $nft list ruleset
> > > > table inet filter {
> > > > flowtable ft {
> > > > hook ingress priority filter
> > > > devices = { eth0, eth1 }
> > > > }
> > > >
> > > > chain forward {
> > > > type filter hook forward priority filter; policy accept;
> > > > meta l4proto { tcp, udp } flow add @ft
> > > > }
> > > > }
> > > >
> > > > Reproducing the scenario described above using veths I got the following
> > > > results:
> > > > - TCP stream transmitted into the IPIP tunnel:
> > > > - net-next: ~41Gbps
> > > > - net-next + IPIP flowtable support: ~40Gbps
> > > ^^^^^^^^^
> > > no gain on tx side.
> >
> > In this case the IPIP flowtable acceleration is effective just on the ACK
> > packets so I guess it is expected we have ~ the same results. The real gain is
> > when the TCP stream is from the tunnel net_device to the NIC one.
>
> That is, only rx side follows the flowtable datapath.
>
> > > > - TCP stream received from the IPIP tunnel:
> > > > - net-next: ~35Gbps
> > > > - net-next + IPIP flowtable support: ~49Gbps
> > > >
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > ---
> > > > net/ipv4/ipip.c | 21 +++++++++++++++++++++
> > > > net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
> > > > 2 files changed, 53 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> > > > index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
> > > > --- a/net/ipv4/ipip.c
> > > > +++ b/net/ipv4/ipip.c
> > > > @@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
> > > > return ip_tunnel_ctl(dev, p, cmd);
> > > > }
> > > >
> > > > +static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
> > > > + struct net_device_path *path)
> > > > +{
> > > > + struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
> > > > + const struct iphdr *tiph = &tunnel->parms.iph;
> > > > + struct rtable *rt;
> > > > +
> > > > + rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
> > > > + RT_SCOPE_UNIVERSE);
> > > > + if (IS_ERR(rt))
> > > > + return PTR_ERR(rt);
> > > > +
> > > > + path->type = DEV_PATH_ETHERNET;
> > > > + path->dev = ctx->dev;
> > > > + ctx->dev = rt->dst.dev;
> > > > + ip_rt_put(rt);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static const struct net_device_ops ipip_netdev_ops = {
> > > > .ndo_init = ipip_tunnel_init,
> > > > .ndo_uninit = ip_tunnel_uninit,
> > > > @@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
> > > > .ndo_get_stats64 = dev_get_tstats64,
> > > > .ndo_get_iflink = ip_tunnel_get_iflink,
> > > > .ndo_tunnel_ctl = ipip_tunnel_ctl,
> > > > + .ndo_fill_forward_path = ipip_fill_forward_path,
> > > > };
> > > >
> > > > #define IPIP_FEATURES (NETIF_F_SG | \
> > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > > index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
> > > > --- a/net/netfilter/nf_flow_table_ip.c
> > > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > > @@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
> > > > return NF_STOLEN;
> > > > }
> > > >
> > > > +static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
> > > > +{
> > > > + struct iphdr *iph;
> > > > +
> > > > + if (!pskb_may_pull(skb, sizeof(*iph)))
> > > > + return false;
> > > > +
> > > > + iph = (struct iphdr *)skb_network_header(skb);
> > > > + *size = iph->ihl << 2;
> > > > +
> > > > + if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
> > > > + return false;
> > > > +
> > > > + if (iph->ttl <= 1)
> > > > + return false;
> > > > +
> > > > + return iph->protocol == IPPROTO_IPIP;
> > >
> >
> > what kind of sanity checks are we supposed to perform? Something similar to
> > what we have in ip_rcv_core()?
>
> I am not referring to sanity checks.
>
> VLAN/PPP ID (layer 2 encapsulation) is part of the lookup in the
> flowtable, why IPIP (layer 3 tunnel) does not get the same handling?
ack, right. Do you have any suggestion about which field (or combination
of fields) we could use from the outer IP header, similar to the VLAN/PPP
encapsulation?
>
> > > Once the flow is in the flowtable, it is possible to inject traffic
> > > with forged outer IP header, this is only looking at the inner IP
> > > header.
> >
> > what is the difference with the plain IP/TCP use-case?
>
> Not referring to the generic packet forging scenario. I refer to the
> scenario that would allow forwarding packets with any IPIP outer header,
> given that the inner header finds a match in the flowtable. I think that
> needs to be sorted out.
ack.
Regards,
Lorenzo
* Re: [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration
2025-07-08 7:58 ` Lorenzo Bianconi
@ 2025-07-14 19:06 ` Lorenzo Bianconi
0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Bianconi @ 2025-07-14 19:06 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Lorenzo Bianconi, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jozsef Kadlecsik,
Shuah Khan, netdev, netfilter-devel, coreteam, linux-kselftest
> On Jul 07, Pablo Neira Ayuso wrote:
> > On Fri, Jul 04, 2025 at 03:00:40PM +0200, Lorenzo Bianconi wrote:
> > > > On Thu, Jul 03, 2025 at 04:16:02PM +0200, Lorenzo Bianconi wrote:
> > > > > Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
> > > > > infrastructure.
> > > > > IPIP SW acceleration can be tested running the following scenario where
> > > > > the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
> > > > > tunnel is used to access a remote site (using eth1 as the underlay device):
> > > >
> > > > Question below.
> > > >
> > > > > ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)
> > > > >
> > > > > $ip addr show
> > > > > 6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > > > link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > > > inet 192.168.0.2/24 scope global eth0
> > > > > valid_lft forever preferred_lft forever
> > > > > 7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> > > > > link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
> > > > > inet 192.168.1.1/24 scope global eth1
> > > > > valid_lft forever preferred_lft forever
> > > > > 8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
> > > > > link/ipip 192.168.1.1 peer 192.168.1.2
> > > > > inet 192.168.100.1/24 scope global tun0
> > > > > valid_lft forever preferred_lft forever
> > > > >
> > > > > $ip route show
> > > > > default via 192.168.100.2 dev tun0
> > > > > 192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
> > > > > 192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
> > > > > 192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1
> > > > >
> > > > > $nft list ruleset
> > > > > table inet filter {
> > > > > flowtable ft {
> > > > > hook ingress priority filter
> > > > > devices = { eth0, eth1 }
> > > > > }
> > > > >
> > > > > chain forward {
> > > > > type filter hook forward priority filter; policy accept;
> > > > > meta l4proto { tcp, udp } flow add @ft
> > > > > }
> > > > > }
> > > > >
> > > > > Reproducing the scenario described above using veths I got the following
> > > > > results:
> > > > > - TCP stream transmitted into the IPIP tunnel:
> > > > > - net-next: ~41Gbps
> > > > > - net-next + IPIP flowtable support: ~40Gbps
> > > > ^^^^^^^^^
> > > > no gain on tx side.
> > >
> > > In this case the IPIP flowtable acceleration is effective just on the ACK
> > > packets, so I guess it is expected that we see ~ the same results. The real gain
> > > is when the TCP stream flows from the tunnel net_device to the NIC one.
> >
> > That is, only rx side follows the flowtable datapath.
> >
> > > > > - TCP stream received from the IPIP tunnel:
> > > > > - net-next: ~35Gbps
> > > > > - net-next + IPIP flowtable support: ~49Gbps
> > > > >
> > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > > ---
> > > > > net/ipv4/ipip.c | 21 +++++++++++++++++++++
> > > > > net/netfilter/nf_flow_table_ip.c | 34 ++++++++++++++++++++++++++++++++--
> > > > > 2 files changed, 53 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> > > > > index 3e03af073a1ccc3d7597a998a515b6cfdded40b5..05fb1c859170d74009d693bc8513183bdec3ff90 100644
> > > > > --- a/net/ipv4/ipip.c
> > > > > +++ b/net/ipv4/ipip.c
> > > > > @@ -353,6 +353,26 @@ ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
> > > > > return ip_tunnel_ctl(dev, p, cmd);
> > > > > }
> > > > >
> > > > > +static int ipip_fill_forward_path(struct net_device_path_ctx *ctx,
> > > > > + struct net_device_path *path)
> > > > > +{
> > > > > + struct ip_tunnel *tunnel = netdev_priv(ctx->dev);
> > > > > + const struct iphdr *tiph = &tunnel->parms.iph;
> > > > > + struct rtable *rt;
> > > > > +
> > > > > + rt = ip_route_output(dev_net(ctx->dev), tiph->daddr, 0, 0, 0,
> > > > > + RT_SCOPE_UNIVERSE);
> > > > > + if (IS_ERR(rt))
> > > > > + return PTR_ERR(rt);
> > > > > +
> > > > > + path->type = DEV_PATH_ETHERNET;
> > > > > + path->dev = ctx->dev;
> > > > > + ctx->dev = rt->dst.dev;
> > > > > + ip_rt_put(rt);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static const struct net_device_ops ipip_netdev_ops = {
> > > > > .ndo_init = ipip_tunnel_init,
> > > > > .ndo_uninit = ip_tunnel_uninit,
> > > > > @@ -362,6 +382,7 @@ static const struct net_device_ops ipip_netdev_ops = {
> > > > > .ndo_get_stats64 = dev_get_tstats64,
> > > > > .ndo_get_iflink = ip_tunnel_get_iflink,
> > > > > .ndo_tunnel_ctl = ipip_tunnel_ctl,
> > > > > + .ndo_fill_forward_path = ipip_fill_forward_path,
> > > > > };
> > > > >
> > > > > #define IPIP_FEATURES (NETIF_F_SG | \
> > > > > diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
> > > > > index 8cd4cf7ae21120f1057c4fce5aaca4e3152ae76d..6b55e00b1022f0a2b02d9bfd1bd34bb55c1b83f7 100644
> > > > > --- a/net/netfilter/nf_flow_table_ip.c
> > > > > +++ b/net/netfilter/nf_flow_table_ip.c
> > > > > @@ -277,13 +277,37 @@ static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
> > > > > return NF_STOLEN;
> > > > > }
> > > > >
> > > > > +static bool nf_flow_ip4_encap_proto(struct sk_buff *skb, u16 *size)
> > > > > +{
> > > > > + struct iphdr *iph;
> > > > > +
> > > > > + if (!pskb_may_pull(skb, sizeof(*iph)))
> > > > > + return false;
> > > > > +
> > > > > + iph = (struct iphdr *)skb_network_header(skb);
> > > > > + *size = iph->ihl << 2;
> > > > > +
> > > > > + if (ip_is_fragment(iph) || unlikely(ip_has_options(*size)))
> > > > > + return false;
> > > > > +
> > > > > + if (iph->ttl <= 1)
> > > > > + return false;
> > > > > +
> > > > > + return iph->protocol == IPPROTO_IPIP;
> > > >
> > >
> > > what kind of sanity checks are we supposed to perform? Something similar to
> > > what we have in ip_rcv_core()?
> >
> > I am not referring to sanity checks.
> >
> > VLAN/PPP ID (layer 2 encapsulation) is part of the lookup in the
> > flowtable, why IPIP (layer 3 tunnel) does not get the same handling?
>
> ack, right. Do you have any suggestion about which field (or combination
> of fields) we could use from the outer IP header, similar to the VLAN/PPP
> encapsulation?
What about a hash computed over some of the outer IP header fields (e.g. the
IP saddr and daddr)?
Regards,
Lorenzo
>
> >
> > > > Once the flow is in the flowtable, it is possible to inject traffic
> > > > with forged outer IP header, this is only looking at the inner IP
> > > > header.
> > >
> > > what is the difference with the plain IP/TCP use-case?
> >
> > Not referring to the generic packet forging scenario. I refer to the
> > scenario that would allow forwarding packets with any IPIP outer header,
> > given that the inner header finds a match in the flowtable. I think that
> > needs to be sorted out.
>
> ack.
>
> Regards,
> Lorenzo
Thread overview: 8+ messages
2025-07-03 14:16 [PATCH nf-next v3 0/2] Add IPIP flowtable SW acceleratio Lorenzo Bianconi
2025-07-03 14:16 ` [PATCH nf-next v3 1/2] net: netfilter: Add IPIP flowtable SW acceleration Lorenzo Bianconi
2025-07-03 14:35 ` Pablo Neira Ayuso
2025-07-04 13:00 ` Lorenzo Bianconi
2025-07-07 19:58 ` Pablo Neira Ayuso
2025-07-08 7:58 ` Lorenzo Bianconi
2025-07-14 19:06 ` Lorenzo Bianconi
2025-07-03 14:16 ` [PATCH nf-next v3 2/2] selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest Lorenzo Bianconi