* [RFC] l2tp: avoid checksum offload for fragmented packets
@ 2013-05-16 17:04 Tom Parkin
  2013-05-16 17:04 ` Tom Parkin
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Parkin @ 2013-05-16 17:04 UTC (permalink / raw)
  To: netdev; +Cc: jchapman, Tom Parkin

UDP-encapsulated L2TP data is currently subject to checksum errors if hardware
checksum offload is used and the packet is fragmented.

This patch works around this issue by falling back to software checksumming if
the packet looks like it might fragment.

The problem with this approach (and the "RFC" part) is that we don't know how
big the IP header will be for the packet.  If we guess at the worst-case
scenario of 60 bytes we can be sure to avoid checksum errors, but we may also
be using expensive software checksums in some cases where we don't really need
to.

Any suggestions on a better approach to solving this problem?  Should we be
using socket corking similar to the UDP code so that we can examine whether
the IP layer fragmented the packet after calling ip_append_data?

Tom Parkin (1):
  l2tp: avoid checksum offload for fragmented packets

 net/l2tp/l2tp_core.c |   53 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

-- 
1.7.9.5

* [RFC] l2tp: avoid checksum offload for fragmented packets
  2013-05-16 17:04 [RFC] l2tp: avoid checksum offload for fragmented packets Tom Parkin
@ 2013-05-16 17:04 ` Tom Parkin
  2013-05-27 18:58   ` Benjamin LaHaise
  0 siblings, 1 reply; 4+ messages in thread
From: Tom Parkin @ 2013-05-16 17:04 UTC (permalink / raw)
  To: netdev; +Cc: jchapman, Tom Parkin

Hardware offload for UDP datagram checksum calculation doesn't work with
fragmented IP packets -- the device will note the fragmentation and leave the
UDP checksum well alone.

As such, if we expect the L2TP packet to be fragmented by the IP layer we need
to perform the UDP checksum ourselves in software (ref: net/ipv4/udp.c).

This change modifies the L2TP xmit path to fall back to software checksum
calculation if the L2TP packet + IP header exceeds the tunnel device MTU.
Since we don't know what the IP header length will be a priori, we assume the
worst-case of 60 bytes.  This will likely result in unnecessary software
checksumming when packet sizes approach the MTU since it's probably not common
to be using the full IP header.

An alternative approach is to mimic UDP and use socket corking to allow us to
pass the skb to the IP layer prior to finally pushing the button on xmit.
This lets IP do its fragmentation before we authorise the packet send,
allowing us to check whether the packet was actually fragmented by IP or not.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
---
 net/l2tp/l2tp_core.c |   53 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 6984c3a..bc10658 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1133,6 +1133,33 @@ static void l2tp_xmit_ipv6_csum(struct sock *sk, struct sk_buff *skb,
 }
 #endif
 
+static void l2tp_xmit_ipv4_csum(struct sock *sk, struct sk_buff *skb,
+				int udp_len)
+{
+	struct inet_sock *inet = inet_sk(sk);
+	struct udphdr *uh = udp_hdr(skb);
+
+	if (!skb_dst(skb) || !skb_dst(skb)->dev ||
+	    !(skb_dst(skb)->dev->features & NETIF_F_V4_CSUM) ||
+	    (udp_len + 60) > dst_mtu(skb_dst(skb))) {
+		__wsum csum;
+		skb->ip_summed = CHECKSUM_COMPLETE;
+		csum = skb_checksum(skb, 0, udp_len, 0);
+		uh->check = csum_tcpudp_magic(inet->inet_saddr,
+				inet->inet_daddr,
+				udp_len, IPPROTO_UDP, csum);
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+	} else {
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		uh->check = ~csum_tcpudp_magic(inet->inet_saddr,
+				inet->inet_daddr,
+				udp_len, IPPROTO_UDP, 0);
+	}
+}
+
 /* If caller requires the skb to have a ppp header, the header must be
  * inserted in the skb data before calling this function.
  */
@@ -1197,30 +1224,14 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len
 		uh->check = 0;
 
 		/* Calculate UDP checksum if configured to do so */
+		if (sk->sk_no_check == UDP_CSUM_NOXMIT)
+			skb->ip_summed = CHECKSUM_NONE;
 #if IS_ENABLED(CONFIG_IPV6)
-		if (sk->sk_family == PF_INET6)
+		else if (sk->sk_family == PF_INET6)
 			l2tp_xmit_ipv6_csum(sk, skb, udp_len);
-		else
 #endif
-		if (sk->sk_no_check == UDP_CSUM_NOXMIT)
-			skb->ip_summed = CHECKSUM_NONE;
-		else if ((skb_dst(skb) && skb_dst(skb)->dev) &&
-			 (!(skb_dst(skb)->dev->features & NETIF_F_V4_CSUM))) {
-			skb->ip_summed = CHECKSUM_COMPLETE;
-			csum = skb_checksum(skb, 0, udp_len, 0);
-			uh->check = csum_tcpudp_magic(inet->inet_saddr,
-						      inet->inet_daddr,
-						      udp_len, IPPROTO_UDP, csum);
-			if (uh->check == 0)
-				uh->check = CSUM_MANGLED_0;
-		} else {
-			skb->ip_summed = CHECKSUM_PARTIAL;
-			skb->csum_start = skb_transport_header(skb) - skb->head;
-			skb->csum_offset = offsetof(struct udphdr, check);
-			uh->check = ~csum_tcpudp_magic(inet->inet_saddr,
-						       inet->inet_daddr,
-						       udp_len, IPPROTO_UDP, 0);
-		}
+		else
+			l2tp_xmit_ipv4_csum(sk, skb, udp_len);
 		break;
 
 	case L2TP_ENCAPTYPE_IP:
-- 
1.7.9.5

* Re: [RFC] l2tp: avoid checksum offload for fragmented packets
  2013-05-16 17:04 ` Tom Parkin
@ 2013-05-27 18:58   ` Benjamin LaHaise
  2013-05-29 17:15     ` Tom Parkin
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin LaHaise @ 2013-05-27 18:58 UTC (permalink / raw)
  To: Tom Parkin; +Cc: netdev, jchapman

On Thu, May 16, 2013 at 06:04:20PM +0100, Tom Parkin wrote:
> Hardware offload for UDP datagram checksum calculation doesn't work with
> fragmented IP packets -- the device will note the fragmentation and leave the
> UDP checksum well alone.

> As such, if we expect the L2TP packet to be fragmented by the IP layer we need
> to perform the UDP checksum ourselves in software (ref: net/ipv4/udp.c).

Hrm, indeed.

> This change modifies the L2TP xmit path to fall back to software checksum
> calculation if the L2TP packet + IP header exceeds the tunnel device MTU.
> Since we don't know what the IP header length will be a priori, we assume the
> worst-case of 60 bytes.  This will likely result in unnecessary software
> checksumming when packet sizes approach the MTU since it's probably not common
> to be using the full IP header.

Using the worst case value of 60 is a poor choice for many users of L2TP --
plenty of the wholesale ISP services in the world use PPPoE transport
sessions to ISPs, using frames with headers of ethernet(14) + IP(20) + UDP(8) +
L2TP(6) = 48 (this setup is used by a number of large telcos here in Canada).
This will result in spurious use of software checksumming over links that
are provisioned with the minimum usable MTU (which is common with this kind
of link).  Please make the code calculate the correct size of the added
headers to avoid unexpected CPU overhead.
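
To make the arithmetic above concrete, here is a minimal userspace sketch
(not kernel code, and not part of the patch).  The names encap_overhead() and
spurious_sw_csum() are hypothetical; the header sizes are the ones quoted
above, and 60 is the worst-case IPv4 header guess used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Header sizes in bytes, as quoted in the review above. */
enum {
	ETH_HDR      = 14,
	IPV4_HDR_MIN = 20,	/* IPv4 header without options */
	IPV4_HDR_MAX = 60,	/* worst-case guess used by the patch */
	UDP_HDR      = 8,
	L2TP_HDR     = 6,
};

/* Total encapsulation overhead for the common no-options case. */
static unsigned int encap_overhead(void)
{
	return ETH_HDR + IPV4_HDR_MIN + UDP_HDR + L2TP_HDR;
}

/*
 * Hypothetical helper: true when the 60-byte guess would force a software
 * checksum although the packet, with its real IP header length, still fits
 * in the MTU and would not be fragmented.
 */
static bool spurious_sw_csum(unsigned int udp_len, unsigned int real_ip_hdr,
			     unsigned int mtu)
{
	bool guess_frags, really_frags;

	assert(real_ip_hdr >= IPV4_HDR_MIN && real_ip_hdr <= IPV4_HDR_MAX);
	guess_frags = udp_len + IPV4_HDR_MAX > mtu;
	really_frags = udp_len + real_ip_hdr > mtu;
	return guess_frags && !really_frags;
}
```

Under these assumptions, with a 20-byte IP header and a 1500-byte MTU, every
UDP length from 1441 to 1480 bytes would be checksummed in software without
needing to be.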

> An alternative approach is to mimic UDP and use socket corking to allow us to
> pass the skb to the IP layer prior to finally pushing the button on xmit.
> This lets IP do its fragmentation before we authorise the packet send,
> allowing us to check whether the packet was actually fragmented by IP or not.

That is probably undesirable from a CPU usage point of view.  Ideally, the 
kernel's L2TP stack should generate ICMP frag needed messages for such 
frames to avoid the fragmentation overhead (ipip is one such tunnelling 
protocol that does this; there are others).

> Signed-off-by: Tom Parkin <tparkin@katalix.com>

Nacked-by: Benjamin LaHaise at least until the IPv6 issue (see below) is fixed
at the bare minimum.

> ---
>  net/l2tp/l2tp_core.c |   53 ++++++++++++++++++++++++++++++--------------------
>  1 file changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 6984c3a..bc10658 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c

...

> @@ -1197,30 +1224,14 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len
>  		uh->check = 0;
>  
>  		/* Calculate UDP checksum if configured to do so */
> +		if (sk->sk_no_check == UDP_CSUM_NOXMIT)
> +			skb->ip_summed = CHECKSUM_NONE;
>  #if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == PF_INET6)
> +		else if (sk->sk_family == PF_INET6)
>  			l2tp_xmit_ipv6_csum(sk, skb, udp_len);
> -		else
...

The last time I checked, for IPv6 UDP packets, the checksum MUST always be 
calculated (RFC 2460).  If this has changed, you'll also need to update the 
IPv6 UDP receive path to allow rx packets with a zero checksum, as I believe 
they are noisily dropped at present.
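
For reference, a minimal userspace sketch of the checksum rule being
discussed, assuming the caller has already laid out the pseudo-header, UDP
header and payload in one buffer.  inet_sum() and udp_csum_finish() are
hypothetical names, not kernel functions; the final step mirrors what the
patch does with CSUM_MANGLED_0, so that a computed checksum of zero is never
put on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of big-endian 16-bit words, folded to 16 bits
 * (RFC 1071 style).  A 32-bit accumulator is enough for buffers up to
 * 64 KiB, which covers any single UDP datagram.
 */
static uint32_t inet_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* pad a trailing odd byte with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}

/*
 * Complement the folded sum and replace a result of zero with 0xffff,
 * so the transmitted checksum field is never zero.
 */
static uint16_t udp_csum_finish(uint32_t folded_sum)
{
	uint16_t check = (uint16_t)~folded_sum;

	return check == 0 ? 0xffff : check;
}
```

In one's-complement arithmetic 0x0000 and 0xffff both represent zero, so the
substitution does not change whether the receiver's verification succeeds.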

		-ben
-- 
"Thought is the essence of where you are now."

* Re: [RFC] l2tp: avoid checksum offload for fragmented packets
  2013-05-27 18:58   ` Benjamin LaHaise
@ 2013-05-29 17:15     ` Tom Parkin
  0 siblings, 0 replies; 4+ messages in thread
From: Tom Parkin @ 2013-05-29 17:15 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev, jchapman

Thanks, Ben.

On Mon, May 27, 2013 at 02:58:29PM -0400, Benjamin LaHaise wrote:
> > This change modifies the L2TP xmit path to fall back to software checksum
> > calculation if the L2TP packet + IP header exceeds the tunnel device MTU.
> > Since we don't know what the IP header length will be a priori, we assume the
> > worst-case of 60 bytes.  This will likely result in unnecessary software
> > checksumming when packet sizes approach the MTU since it's probably not common
> > to be using the full IP header.
> 
> Using the worst case value of 60 is a poor choice for many users of L2TP --
> plenty of the wholesale ISP services in the world use PPPoE transport
> sessions to ISPs, using frames with headers of ethernet(14) + IP(20) + UDP(8) +
> L2TP(6) = 48 (this setup is used by a number of large telcos here in Canada).
> This will result in spurious use of software checksumming over links that
> are provisioned with the minimum usable MTU (which is common with this kind
> of link).  Please make the code calculate the correct size of the added
> headers to avoid unexpected CPU overhead.

Yes, I agree.  I wasn't sure whether direct calculation of the IP
header length would be acceptable, or whether there was another
mechanism available that I should be making use of.

I'll respin this patch with direct calculations rather than the worst-case
guess.
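
As a rough illustration of what a direct calculation could look like, here is
a userspace sketch only.  ipv4_hdr_len() and will_fragment() are hypothetical
names, and where the respun patch would actually obtain the option length
from is not shown here; it is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV4_BASE_HDR	20u	/* fixed part of the IPv4 header */
#define IPV4_MAX_OPTS	40u	/* options may add at most 40 bytes */

/*
 * IPv4 header length for a given option length.  Options are padded to a
 * multiple of 4 bytes because the header length field counts 32-bit words.
 */
static unsigned int ipv4_hdr_len(unsigned int optlen)
{
	assert(optlen <= IPV4_MAX_OPTS);
	return IPV4_BASE_HDR + ((optlen + 3u) & ~3u);
}

/* True if UDP header + payload plus the real IP header exceeds the MTU. */
static bool will_fragment(unsigned int udp_len, unsigned int optlen,
			  unsigned int mtu)
{
	return udp_len + ipv4_hdr_len(optlen) > mtu;
}
```

With no IP options this only reports fragmentation once udp_len exceeds
mtu - 20, instead of mtu - 60 as with the worst-case guess.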

> > An alternative approach is to mimic UDP and use socket corking to allow us to
> > pass the skb to the IP layer prior to finally pushing the button on xmit.
> > This lets IP do its fragmentation before we authorise the packet send,
> > allowing us to check whether the packet was actually fragmented by IP or not.
> 
> That is probably undesirable from a CPU usage point of view.  Ideally, the 
> kernel's L2TP stack should generate ICMP frag needed messages for such 
> frames to avoid the fragmentation overhead (ipip is one such tunnelling 
> protocol that does this; there are others).

I agree.  That sounds like a better overall approach.  Perhaps we
could look at fixing up the immediate issue with a patch similar to
this (with your review comments resolved), and then add support for
ICMP frag needed messages as a further piece of work?

> > @@ -1197,30 +1224,14 @@ int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len
> >  		uh->check = 0;
> >  
> >  		/* Calculate UDP checksum if configured to do so */
> > +		if (sk->sk_no_check == UDP_CSUM_NOXMIT)
> > +			skb->ip_summed = CHECKSUM_NONE;
> >  #if IS_ENABLED(CONFIG_IPV6)
> > -		if (sk->sk_family == PF_INET6)
> > +		else if (sk->sk_family == PF_INET6)
> >  			l2tp_xmit_ipv6_csum(sk, skb, udp_len);
> > -		else
> ...
> 
> The last time I checked, for IPv6 UDP packets, the checksum MUST always be 
> calculated (RFC 2460).  If this has changed, you'll also need to update the 
> IPv6 UDP receive path to allow rx packets with a zero checksum, as I believe 
> they are noisily dropped at present.

Good catch, I'll fix that by reverting this chunk.

-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
