[PATCH] xfrm: Add pre-encap fragmentation for packet offload

* [PATCH] xfrm: Add pre-encap fragmentation for packet offload
@ 2024-11-24  9:35 Ilia Lin
  2024-11-24 12:04 ` Leon Romanovsky
  2024-11-26 12:51 ` Steffen Klassert
  0 siblings, 2 replies; 12+ messages in thread
From: Ilia Lin @ 2024-11-24  9:35 UTC (permalink / raw)
  To: steffen.klassert, leonro, herbert, davem, dsahern, edumazet, kuba,
	pabeni, horms
  Cc: netdev, linux-kernel

In packet offload mode, raw packets are sent to the NIC and do not
return to the network stack. If a packet exceeds the MTU after
encapsulation, the NIC hardware may not be able to fragment the
final packet.
Add mandatory pre-encapsulation fragmentation for both IPv4 and
IPv6 when tunnel mode with packet offload is configured on the
state.

Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
---
 net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
 net/ipv6/xfrm6_output.c |  8 ++++++--
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 3cff51ba72bb0..a4271e0dd51bb 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -14,17 +14,44 @@
 #include <net/xfrm.h>
 #include <net/icmp.h>
 
+static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	return xfrm_output(sk, skb);
+}
+
 static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-#ifdef CONFIG_NETFILTER
-	struct xfrm_state *x = skb_dst(skb)->xfrm;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
+	unsigned int mtu;
+	bool toobig;
 
+#ifdef CONFIG_NETFILTER
 	if (!x) {
 		IPCB(skb)->flags |= IPSKB_REROUTED;
 		return dst_output(net, sk, skb);
 	}
 #endif
 
+	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+		goto skip_frag;
+
+	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+
+	toobig = skb->len > mtu && !skb_is_gso(skb);
+
+	if (!skb->ignore_df && toobig && skb->sk) {
+		xfrm_local_error(skb, mtu);
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	if (toobig) {
+		IPCB(skb)->frag_max_size = mtu;
+		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
+	}
+
+skip_frag:
 	return xfrm_output(sk, skb);
 }
 
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 5f7b1fdbffe62..fdd2f2f5adc71 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -75,10 +75,14 @@ static int __xfrm6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (x->props.mode != XFRM_MODE_TUNNEL)
 		goto skip_frag;
 
-	if (skb->protocol == htons(ETH_P_IPV6))
+	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET) {
+		mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+		IP6CB(skb)->frag_max_size = mtu;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
 		mtu = ip6_skb_dst_mtu(skb);
-	else
+	} else {
 		mtu = dst_mtu(skb_dst(skb));
+	}
 
 	toobig = skb->len > mtu && !skb_is_gso(skb);
 
-- 
2.25.1


* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-24  9:35 [PATCH] xfrm: Add pre-encap fragmentation for packet offload Ilia Lin
@ 2024-11-24 12:04 ` Leon Romanovsky
  2024-11-25  9:26   ` Ilia Lin
  2024-11-26 12:51 ` Steffen Klassert
  1 sibling, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-24 12:04 UTC (permalink / raw)
  To: Ilia Lin
  Cc: steffen.klassert, herbert, davem, dsahern, edumazet, kuba, pabeni,
	horms, netdev, linux-kernel

On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> In packet offload mode the raw packets will be sent to the NiC,
> and will not return to the Network Stack. In event of crossing
> the MTU size after the encapsulation, the NiC HW may not be
> able to fragment the final packet.

Yes, HW doesn't know how to handle these packets.

> Adding mandatory pre-encapsulation fragmentation for both
> IPv4 and IPv6, if tunnel mode with packet offload is configured
> on the state.

I was under the impression that xfrm_dev_offload_ok() is responsible
for preventing fragmentation.
https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410

Thanks

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-24 12:04 ` Leon Romanovsky
@ 2024-11-25  9:26   ` Ilia Lin
  2024-11-25 19:43     ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Ilia Lin @ 2024-11-25  9:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ilia Lin, steffen.klassert, herbert, davem, dsahern, edumazet,
	kuba, pabeni, horms, netdev, linux-kernel

On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > In packet offload mode the raw packets will be sent to the NiC,
> > and will not return to the Network Stack. In event of crossing
> > the MTU size after the encapsulation, the NiC HW may not be
> > able to fragment the final packet.
>
> Yes, HW doesn't know how to handle these packets.
>
> > Adding mandatory pre-encapsulation fragmentation for both
> > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > on the state.
>
> I was under impression is that xfrm_dev_offload_ok() is responsible to
> prevent fragmentation.
> https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410

With my change we can either support inner fragmentation or prevent it,
depending on the network device driver implementation.

>
> Thanks

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-25  9:26   ` Ilia Lin
@ 2024-11-25 19:43     ` Leon Romanovsky
  2024-11-26  7:48       ` Ilia Lin
       [not found]       ` <CA+5LGR0e677wm5zEx9yYZDtsCUL6etMoRB2yF9o5msqdVOWU8w@mail.gmail.com>
  0 siblings, 2 replies; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-25 19:43 UTC (permalink / raw)
  To: Ilia Lin
  Cc: steffen.klassert, herbert, davem, dsahern, edumazet, kuba, pabeni,
	horms, netdev, linux-kernel

On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > In packet offload mode the raw packets will be sent to the NiC,
> > > and will not return to the Network Stack. In event of crossing
> > > the MTU size after the encapsulation, the NiC HW may not be
> > > able to fragment the final packet.
> >
> > Yes, HW doesn't know how to handle these packets.
> >
> > > Adding mandatory pre-encapsulation fragmentation for both
> > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > on the state.
> >
> > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > prevent fragmentation.
> > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> 
> With my change we can both support inner fragmentation or prevent it,
> depending on the network device driver implementation.

The thing is that fragmentation isn't desirable. Why doesn't PMTU
take the headers into account, so that we can rely on existing code
and not add extra logic for packet offload?

Thanks

> 
> >
> > Thanks

* [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-25 19:43     ` Leon Romanovsky
@ 2024-11-26  7:48       ` Ilia Lin
       [not found]       ` <CA+5LGR0e677wm5zEx9yYZDtsCUL6etMoRB2yF9o5msqdVOWU8w@mail.gmail.com>
  1 sibling, 0 replies; 12+ messages in thread
From: Ilia Lin @ 2024-11-26  7:48 UTC (permalink / raw)
  To: leon
  Cc: davem, dsahern, edumazet, herbert, horms, ilia.lin, kuba,
	linux-kernel, netdev, pabeni, steffen.klassert

On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > In packet offload mode the raw packets will be sent to the NiC,
> > > > and will not return to the Network Stack. In event of crossing
> > > > the MTU size after the encapsulation, the NiC HW may not be
> > > > able to fragment the final packet.
> > >
> > > Yes, HW doesn't know how to handle these packets.
> > >
> > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > on the state.
> > >
> > > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > > prevent fragmentation.
> > > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> >
> > With my change we can both support inner fragmentation or prevent it,
> > depending on the network device driver implementation.
>
> The thing is that fragmentation isn't desirable thing. Why didn't PMTU
> take into account headers so we can rely on existing code and do not add
> extra logic for packet offload?

I agree that PMTU is the preferred option, but the packets may be routed
from a host behind the VPN that is unaware it is transmitting into an
IPsec tunnel and therefore does not account for the extra headers.
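
To put rough numbers on the point above, here is a minimal sketch of the
size arithmetic. The overhead figures are illustrative assumptions (IPv4
outer header, ESP with an 8-byte IV and a 16-byte ICV, as with AES-GCM);
the struct and function names are hypothetical, and ESP padding for
alignment is ignored, so the real value returned by xfrm_state_mtu() can
differ.

```c
#include <assert.h>

/* Illustrative per-packet overhead of IPv4 ESP tunnel mode, in bytes.
 * These are assumed example values, not taken from the patch. */
struct esp_overhead {
	unsigned int outer_ip_hdr; /* outer IPv4 header without options */
	unsigned int esp_hdr;      /* SPI + sequence number */
	unsigned int iv;           /* per-packet IV */
	unsigned int trailer;      /* pad length + next header */
	unsigned int icv;          /* integrity check value */
};

const struct esp_overhead example_overhead = { 20, 8, 8, 2, 16 };

/* Sum of all bytes added around the inner packet (padding ignored). */
unsigned int esp_total_overhead(const struct esp_overhead *o)
{
	return o->outer_ip_hdr + o->esp_hdr + o->iv + o->trailer + o->icv;
}

/* Largest inner packet that still fits in link_mtu after encapsulation. */
unsigned int max_inner_len(unsigned int link_mtu, const struct esp_overhead *o)
{
	unsigned int overhead = esp_total_overhead(o);

	assert(link_mtu > overhead);
	return link_mtu - overhead;
}

/* Non-zero if the encapsulated packet would exceed link_mtu. */
int exceeds_mtu_after_encap(unsigned int inner_len, unsigned int link_mtu,
			    const struct esp_overhead *o)
{
	return inner_len + esp_total_overhead(o) > link_mtu;
}
```

Under these assumptions, a host that sends full 1500-byte packets toward
a gateway with a 1500-byte link MTU produces packets that are too big
once the tunnel headers are added, while anything up to 1446 bytes fits.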

>
> Thanks
>
> >
> > >
> > > Thanks

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
       [not found]       ` <CA+5LGR0e677wm5zEx9yYZDtsCUL6etMoRB2yF9o5msqdVOWU8w@mail.gmail.com>
@ 2024-11-26  8:35         ` Leon Romanovsky
  2024-11-26 12:59           ` Steffen Klassert
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-26  8:35 UTC (permalink / raw)
  To: Ilia Lin, Steffen Klassert
  Cc: herbert, David Miller, dsahern, edumazet, kuba, pabeni, horms,
	netdev, open list

On Tue, Nov 26, 2024 at 09:09:03AM +0200, Ilia Lin wrote:
> On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > > In packet offload mode the raw packets will be sent to the NiC,
> > > > > and will not return to the Network Stack. In event of crossing
> > > > > the MTU size after the encapsulation, the NiC HW may not be
> > > > > able to fragment the final packet.
> > > >
> > > > Yes, HW doesn't know how to handle these packets.
> > > >
> > > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > > on the state.
> > > >
> > > > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > > > prevent fragmentation.
> > > >
> https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> > >
> > > With my change we can both support inner fragmentation or prevent it,
> > > depending on the network device driver implementation.
> >
> > The thing is that fragmentation isn't desirable thing. Why didn't PMTU
> > take into account headers so we can rely on existing code and do not add
> > extra logic for packet offload?
> 
> I agree that PMTU is preferred option, but the packets may be routed from
> a host behind the VPN, which is unaware that it transmits into an IPsec
> tunnel,
> and therefore will not count on the extra headers.

My basic web search shows that PMTU works correctly for IPsec tunnels too.

Steffen, do we need a special case for packet offload here? My preference
is to have as few special cases for packet offload as possible.

Thanks

> >
> > Thanks
> >
> > >
> > > >
> > > > Thanks

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-24  9:35 [PATCH] xfrm: Add pre-encap fragmentation for packet offload Ilia Lin
  2024-11-24 12:04 ` Leon Romanovsky
@ 2024-11-26 12:51 ` Steffen Klassert
  2024-11-26 13:22   ` Leon Romanovsky
  1 sibling, 1 reply; 12+ messages in thread
From: Steffen Klassert @ 2024-11-26 12:51 UTC (permalink / raw)
  To: Ilia Lin
  Cc: leonro, herbert, davem, dsahern, edumazet, kuba, pabeni, horms,
	netdev, linux-kernel

On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> In packet offload mode the raw packets will be sent to the NiC,
> and will not return to the Network Stack. In event of crossing
> the MTU size after the encapsulation, the NiC HW may not be
> able to fragment the final packet.
> Adding mandatory pre-encapsulation fragmentation for both
> IPv4 and IPv6, if tunnel mode with packet offload is configured
> on the state.
> 
> Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
> ---
>  net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
>  net/ipv6/xfrm6_output.c |  8 ++++++--
>  2 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
> index 3cff51ba72bb0..a4271e0dd51bb 100644
> --- a/net/ipv4/xfrm4_output.c
> +++ b/net/ipv4/xfrm4_output.c
> @@ -14,17 +14,44 @@
>  #include <net/xfrm.h>
>  #include <net/icmp.h>
>  
> +static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
> +{
> +	return xfrm_output(sk, skb);
> +}
> +
>  static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  {
> -#ifdef CONFIG_NETFILTER
> -	struct xfrm_state *x = skb_dst(skb)->xfrm;
> +	struct dst_entry *dst = skb_dst(skb);
> +	struct xfrm_state *x = dst->xfrm;
> +	unsigned int mtu;
> +	bool toobig;
>  
> +#ifdef CONFIG_NETFILTER
>  	if (!x) {
>  		IPCB(skb)->flags |= IPSKB_REROUTED;
>  		return dst_output(net, sk, skb);
>  	}
>  #endif
>  
> +	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
> +		goto skip_frag;
> +
> +	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
> +
> +	toobig = skb->len > mtu && !skb_is_gso(skb);
> +
> +	if (!skb->ignore_df && toobig && skb->sk) {
> +		xfrm_local_error(skb, mtu);
> +		kfree_skb(skb);
> +		return -EMSGSIZE;
> +	}
> +
> +	if (toobig) {
> +		IPCB(skb)->frag_max_size = mtu;
> +		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
> +	}

This would fragment the packet even if the DF bit is set.

Please no further packet offload stuff in generic code.
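
The DF concern can be illustrated with a small userspace model of the
branch conditions quoted above. This is only a sketch: struct pkt_model,
enum verdict and patch_decision() are hypothetical stand-ins for the
sk_buff fields and code paths in the hunk, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the packet state the quoted hunk reads. */
struct pkt_model {
	unsigned int len; /* skb->len */
	bool df;          /* DF bit in the inner IPv4 header */
	bool ignore_df;   /* skb->ignore_df */
	bool has_sock;    /* skb->sk != NULL (locally generated) */
	bool gso;         /* skb_is_gso(skb) */
};

enum verdict { PASS_THROUGH, LOCAL_ERROR, FRAGMENT };

/* Mirrors the order of the checks in the proposed __xfrm4_output(). */
enum verdict patch_decision(const struct pkt_model *p, unsigned int mtu)
{
	bool toobig = p->len > mtu && !p->gso;

	if (!p->ignore_df && toobig && p->has_sock)
		return LOCAL_ERROR; /* xfrm_local_error() + -EMSGSIZE */

	if (toobig)
		return FRAGMENT;    /* p->df is never consulted here */

	return PASS_THROUGH;
}
```

In this model, a forwarded packet (no socket) that is too big takes the
FRAGMENT branch whether or not DF is set, because none of the conditions
look at the DF bit.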

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-26  8:35         ` Leon Romanovsky
@ 2024-11-26 12:59           ` Steffen Klassert
  2024-11-26 13:21             ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Klassert @ 2024-11-26 12:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ilia Lin, herbert, David Miller, dsahern, edumazet, kuba, pabeni,
	horms, netdev, open list

On Tue, Nov 26, 2024 at 10:35:13AM +0200, Leon Romanovsky wrote:
> On Tue, Nov 26, 2024 at 09:09:03AM +0200, Ilia Lin wrote:
> > On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > > > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > > > In packet offload mode the raw packets will be sent to the NiC,
> > > > > > and will not return to the Network Stack. In event of crossing
> > > > > > the MTU size after the encapsulation, the NiC HW may not be
> > > > > > able to fragment the final packet.
> > > > >
> > > > > Yes, HW doesn't know how to handle these packets.
> > > > >
> > > > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > > > on the state.
> > > > >
> > > > > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > > > > prevent fragmentation.
> > > > >
> > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> > > >
> > > > With my change we can both support inner fragmentation or prevent it,
> > > > depending on the network device driver implementation.
> > >
> > > The thing is that fragmentation isn't desirable thing. Why didn't PMTU
> > > take into account headers so we can rely on existing code and do not add
> > > extra logic for packet offload?
> > 
> > I agree that PMTU is preferred option, but the packets may be routed from
> > a host behind the VPN, which is unaware that it transmits into an IPsec
> > tunnel,
> > and therefore will not count on the extra headers.
> 
> My basic web search shows that PMTU works correctly for IPsec tunnels too.

Yes, at least for SW and crypto offload, IPsec PMTU works correctly.

> 
> Steffen, do we need special case for packet offload here? My preference is
> to make sure that we will have as less possible special cases for packet
> offload.

It looks like the problem with packet offload is that packets
bigger than the MTU are dropped before the PMTU signaling
is handled.

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-26 12:59           ` Steffen Klassert
@ 2024-11-26 13:21             ` Leon Romanovsky
  2024-11-28  9:25               ` Steffen Klassert
  0 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-26 13:21 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Ilia Lin, herbert, David Miller, dsahern, edumazet, kuba, pabeni,
	horms, netdev, open list

On Tue, Nov 26, 2024 at 01:59:31PM +0100, Steffen Klassert wrote:
> On Tue, Nov 26, 2024 at 10:35:13AM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 26, 2024 at 09:09:03AM +0200, Ilia Lin wrote:
> > > On Mon, Nov 25, 2024 at 9:43 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Mon, Nov 25, 2024 at 11:26:14AM +0200, Ilia Lin wrote:
> > > > > On Sun, Nov 24, 2024 at 2:04 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > > > > > > In packet offload mode the raw packets will be sent to the NiC,
> > > > > > > and will not return to the Network Stack. In event of crossing
> > > > > > > the MTU size after the encapsulation, the NiC HW may not be
> > > > > > > able to fragment the final packet.
> > > > > >
> > > > > > Yes, HW doesn't know how to handle these packets.
> > > > > >
> > > > > > > Adding mandatory pre-encapsulation fragmentation for both
> > > > > > > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > > > > > > on the state.
> > > > > >
> > > > > > I was under impression is that xfrm_dev_offload_ok() is responsible to
> > > > > > prevent fragmentation.
> > > > > >
> > > https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410
> > > > >
> > > > > With my change we can both support inner fragmentation or prevent it,
> > > > > depending on the network device driver implementation.
> > > >
> > > > The thing is that fragmentation isn't desirable thing. Why didn't PMTU
> > > > take into account headers so we can rely on existing code and do not add
> > > > extra logic for packet offload?
> > > 
> > > I agree that PMTU is preferred option, but the packets may be routed from
> > > a host behind the VPN, which is unaware that it transmits into an IPsec
> > > tunnel,
> > > and therefore will not count on the extra headers.
> > 
> > My basic web search shows that PMTU works correctly for IPsec tunnels too.
> 
> Yes, at least SW and crypto offload IPsec PMTU works correctly.
> 
> > 
> > Steffen, do we need special case for packet offload here? My preference is
> > to make sure that we will have as less possible special cases for packet
> > offload.
> 
> Looks like the problem on packet offload is that packets
> bigger than MTU size are dropped before the PMTU signaling
> is handled.

But PMTU should be less than or equal to MTU, even before the first
packet is sent. Otherwise, the very first packet would be fragmented.

Thanks


* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-26 12:51 ` Steffen Klassert
@ 2024-11-26 13:22   ` Leon Romanovsky
  0 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-26 13:22 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Ilia Lin, herbert, davem, dsahern, edumazet, kuba, pabeni, horms,
	netdev, linux-kernel

On Tue, Nov 26, 2024 at 01:51:42PM +0100, Steffen Klassert wrote:
> On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> > In packet offload mode the raw packets will be sent to the NiC,
> > and will not return to the Network Stack. In event of crossing
> > the MTU size after the encapsulation, the NiC HW may not be
> > able to fragment the final packet.
> > Adding mandatory pre-encapsulation fragmentation for both
> > IPv4 and IPv6, if tunnel mode with packet offload is configured
> > on the state.
> > 
> > Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
> > ---
> >  net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
> >  net/ipv6/xfrm6_output.c |  8 ++++++--
> >  2 files changed, 35 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
> > index 3cff51ba72bb0..a4271e0dd51bb 100644
> > --- a/net/ipv4/xfrm4_output.c
> > +++ b/net/ipv4/xfrm4_output.c
> > @@ -14,17 +14,44 @@
> >  #include <net/xfrm.h>
> >  #include <net/icmp.h>
> >  
> > +static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
> > +{
> > +	return xfrm_output(sk, skb);
> > +}
> > +
> >  static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
> >  {
> > -#ifdef CONFIG_NETFILTER
> > -	struct xfrm_state *x = skb_dst(skb)->xfrm;
> > +	struct dst_entry *dst = skb_dst(skb);
> > +	struct xfrm_state *x = dst->xfrm;
> > +	unsigned int mtu;
> > +	bool toobig;
> >  
> > +#ifdef CONFIG_NETFILTER
> >  	if (!x) {
> >  		IPCB(skb)->flags |= IPSKB_REROUTED;
> >  		return dst_output(net, sk, skb);
> >  	}
> >  #endif
> >  
> > +	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
> > +		goto skip_frag;
> > +
> > +	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
> > +
> > +	toobig = skb->len > mtu && !skb_is_gso(skb);
> > +
> > +	if (!skb->ignore_df && toobig && skb->sk) {
> > +		xfrm_local_error(skb, mtu);
> > +		kfree_skb(skb);
> > +		return -EMSGSIZE;
> > +	}
> > +
> > +	if (toobig) {
> > +		IPCB(skb)->frag_max_size = mtu;
> > +		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
> > +	}
> 
> This would fragment the packet even if the DF bit is set.
> 
> Please no further packet offload stuff in generic code.

+ 100000

Thanks

> 

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-26 13:21             ` Leon Romanovsky
@ 2024-11-28  9:25               ` Steffen Klassert
  2024-11-28 12:14                 ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Steffen Klassert @ 2024-11-28  9:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ilia Lin, herbert, David Miller, dsahern, edumazet, kuba, pabeni,
	horms, netdev, open list

On Tue, Nov 26, 2024 at 03:21:45PM +0200, Leon Romanovsky wrote:
> On Tue, Nov 26, 2024 at 01:59:31PM +0100, Steffen Klassert wrote:
> > On Tue, Nov 26, 2024 at 10:35:13AM +0200, Leon Romanovsky wrote:
> > > 
> > > Steffen, do we need special case for packet offload here? My preference is
> > > to make sure that we will have as less possible special cases for packet
> > > offload.
> > 
> > Looks like the problem on packet offload is that packets
> > bigger than MTU size are dropped before the PMTU signaling
> > is handled.
> 
> But PMTU should be less or equal to MTU, even before first packet was
> sent. Otherwise already first packet will be fragmented.

Actually, I meant PMTU. With packet offload, we just drop packets bigger
than the PMTU. We need to make sure that xfrm{4,6}_tunnel_check_size
is called. This will either fragment or do PMTU signaling.
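
As a rough illustration of "either fragment or do PMTU signaling", here
is a simplified userspace model of that decision for IPv4. It is a
sketch under assumptions, not the kernel function: struct size_check_pkt,
enum size_action and check_size_model() are hypothetical names, and the
real code handles more cases (GSO, for example).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the packet state relevant to the size check. */
struct size_check_pkt {
	unsigned int len; /* packet length before encapsulation */
	bool df;          /* DF bit in the inner IPv4 header */
	bool ignore_df;   /* sender allowed local fragmentation */
	bool has_sock;    /* locally generated (has an owning socket) */
};

enum size_action {
	SEND_AS_IS,              /* fits, nothing to do */
	ALLOW_FRAGMENT,          /* too big, but fragmentation is permitted */
	SIGNAL_LOCAL_ERROR,      /* too big, DF set: report to local socket */
	SIGNAL_ICMP_FRAG_NEEDED  /* too big, DF set: ICMP back to the sender */
};

enum size_action check_size_model(const struct size_check_pkt *p,
				  unsigned int mtu)
{
	if (p->len <= mtu)
		return SEND_AS_IS;

	/* Without DF (or with ignore_df) the packet may be fragmented. */
	if (!p->df || p->ignore_df)
		return ALLOW_FRAGMENT;

	/* With DF set, the sender is told the usable MTU and the packet
	 * is rejected rather than silently dropped. */
	return p->has_sock ? SIGNAL_LOCAL_ERROR : SIGNAL_ICMP_FRAG_NEEDED;
}
```

The point of the model is that an oversized packet always produces either
fragments or a signal to the sender, whereas the behavior described for
packet offload is a silent drop.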

* Re: [PATCH] xfrm: Add pre-encap fragmentation for packet offload
  2024-11-28  9:25               ` Steffen Klassert
@ 2024-11-28 12:14                 ` Leon Romanovsky
  0 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2024-11-28 12:14 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Ilia Lin, herbert, David Miller, dsahern, edumazet, kuba, pabeni,
	horms, netdev, open list

On Thu, Nov 28, 2024 at 10:25:23AM +0100, Steffen Klassert wrote:
> On Tue, Nov 26, 2024 at 03:21:45PM +0200, Leon Romanovsky wrote:
> > On Tue, Nov 26, 2024 at 01:59:31PM +0100, Steffen Klassert wrote:
> > > On Tue, Nov 26, 2024 at 10:35:13AM +0200, Leon Romanovsky wrote:
> > > > 
> > > > Steffen, do we need special case for packet offload here? My preference is
> > > > to make sure that we will have as less possible special cases for packet
> > > > offload.
> > > 
> > > Looks like the problem on packet offload is that packets
> > > bigger than MTU size are dropped before the PMTU signaling
> > > is handled.
> > 
> > But PMTU should be less or equal to MTU, even before first packet was
> > sent. Otherwise already first packet will be fragmented.
> 
> Actually, I meant PMTU. With packet offload, we just drop packets bigger
> than PMTU. We need to make sure that xfrm{4,6}_tunnel_check_size
> is called. This will either fragment or do PMTU signaling.

Right, I'll check it next week (the change is clear, but I need some
time to set up a test environment).

Thanks
