netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v5] xen-netback: fix fragment detection in checksum setup
@ 2013-12-03 17:39 Paul Durrant
  2013-12-05  6:28 ` annie li
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Paul Durrant @ 2013-12-03 17:39 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: Paul Durrant, Zoltan Kiss, Wei Liu, Ian Campbell, David Vrabel,
	David Miller

The code to detect fragments in checksum_setup() was missing for IPv4 and
too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
with a fragment header even if they are not a fragment - i.e. offset is zero,
and M bit is not set).

This patch also incorporates a fix to callers of maybe_pull_tail() where
skb->network_header was being erroneously added to the length argument.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
cc: David Miller <davem@davemloft.net>
---
v5
- Return an error from maybe_pull_tail() to distiguish issues due to the
  frontend from those due to the backend.

v4
- Re-work maybe_pull_tail() to have a failure path and update mac and
  network header pointers on success.
- Re-work callers of maybe_pull_tail() to handle failures and allow for
  movement of skb header pointers.
- Incorporate fixes from patch "Fix pull size in checksum_setup_ip" since
  it also modifies callers of maybe_pull_tail().
- Remove error messages in checksum routines that could be triggered by frontends

v3
- Use defined values for 'fragment offset' and 'more fragments'

v2
- Added comments noting what fragment/offset masks mean

 drivers/net/xen-netback/netback.c |  236 +++++++++++++++++++++----------------
 include/linux/ipv6.h              |    1 +
 include/net/ipv6.h                |    3 +-
 3 files changed, 140 insertions(+), 100 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 64f0e0d..acf1392 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1149,49 +1149,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
+static inline int maybe_pull_tail(struct sk_buff *skb, unsigned int len,
+				  unsigned int max)
 {
-	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
-		/* If we need to pullup then pullup to the max, so we
-		 * won't need to do it again.
-		 */
-		int target = min_t(int, skb->len, MAX_TCP_HEADER);
-		__pskb_pull_tail(skb, target - skb_headlen(skb));
-	}
+	if (skb_headlen(skb) >= len)
+		return 0;
+
+	/* If we need to pullup then pullup to the max, so we
+	 * won't need to do it again.
+	 */
+	if (max > skb->len)
+		max = skb->len;
+
+	if (__pskb_pull_tail(skb, max - skb_headlen(skb)) == NULL)
+		return -ENOMEM;
+
+	if (skb_headlen(skb) < len)
+		return -EPROTO;
+
+	return 0;
 }
 
+/* This value should be large enough to cover a tagged ethernet header plus
+ * maximally sized IP and TCP or UDP headers.
+ */
+#define MAX_IP_HDR_LEN 128
+
 static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
 			     int recalculate_partial_csum)
 {
-	struct iphdr *iph = (void *)skb->data;
-	unsigned int header_size;
 	unsigned int off;
-	int err = -EPROTO;
+	bool fragment;
+	int err;
+
+	fragment = false;
+
+	err = maybe_pull_tail(skb,
+			      sizeof(struct iphdr),
+			      MAX_IP_HDR_LEN);
+	if (err < 0)
+		goto out;
 
-	off = sizeof(struct iphdr);
+	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
+		fragment = true;
 
-	header_size = skb->network_header + off + MAX_IPOPTLEN;
-	maybe_pull_tail(skb, header_size);
+	off = ip_hdrlen(skb);
 
-	off = iph->ihl * 4;
+	err = -EPROTO;
 
-	switch (iph->protocol) {
+	switch (ip_hdr(skb)->protocol) {
 	case IPPROTO_TCP:
 		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct tcphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
-			struct tcphdr *tcph = tcp_hdr(skb);
-
-			header_size = skb->network_header +
-				off +
-				sizeof(struct tcphdr);
-			maybe_pull_tail(skb, header_size);
-
-			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - off,
-							 IPPROTO_TCP, 0);
+			err = maybe_pull_tail(skb,
+					      off + sizeof(struct tcphdr),
+					      MAX_IP_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			tcp_hdr(skb)->check =
+				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
+						   ip_hdr(skb)->daddr,
+						   skb->len - off,
+						   IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
@@ -1200,24 +1223,20 @@ static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
 			goto out;
 
 		if (recalculate_partial_csum) {
-			struct udphdr *udph = udp_hdr(skb);
-
-			header_size = skb->network_header +
-				off +
-				sizeof(struct udphdr);
-			maybe_pull_tail(skb, header_size);
-
-			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - off,
-							 IPPROTO_UDP, 0);
+			err = maybe_pull_tail(skb,
+					      off + sizeof(struct udphdr),
+					      MAX_IP_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			udp_hdr(skb)->check =
+				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
+						   ip_hdr(skb)->daddr,
+						   skb->len - off,
+						   IPPROTO_UDP, 0);
 		}
 		break;
 	default:
-		if (net_ratelimit())
-			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, "
-				   "dropping a protocol %d packet\n",
-				   iph->protocol);
 		goto out;
 	}
 
@@ -1227,75 +1246,99 @@ out:
 	return err;
 }
 
+/* This value should be large enough to cover a tagged ethernet header plus
+ * an IPv6 header, all options, and a maximal TCP or UDP header.
+ */
+#define MAX_IPV6_HDR_LEN 256
+
+#define OPT_HDR(type, skb, off) \
+	(type *)(skb_network_header(skb) + (off))
+
 static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
 			       int recalculate_partial_csum)
 {
-	int err = -EPROTO;
-	struct ipv6hdr *ipv6h = (void *)skb->data;
+	int err;
 	u8 nexthdr;
-	unsigned int header_size;
 	unsigned int off;
+	unsigned int len;
 	bool fragment;
 	bool done;
 
+	fragment = false;
 	done = false;
 
 	off = sizeof(struct ipv6hdr);
 
-	header_size = skb->network_header + off;
-	maybe_pull_tail(skb, header_size);
+	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
+	if (err < 0)
+		goto out;
 
-	nexthdr = ipv6h->nexthdr;
+	nexthdr = ipv6_hdr(skb)->nexthdr;
 
-	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
-	       !done) {
+	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
+	while (off <= len && !done) {
 		switch (nexthdr) {
 		case IPPROTO_DSTOPTS:
 		case IPPROTO_HOPOPTS:
 		case IPPROTO_ROUTING: {
-			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
+			struct ipv6_opt_hdr *hp;
 
-			header_size = skb->network_header +
-				off +
-				sizeof(struct ipv6_opt_hdr);
-			maybe_pull_tail(skb, header_size);
+			err = maybe_pull_tail(skb,
+					      off +
+					      sizeof(struct ipv6_opt_hdr),
+					      MAX_IPV6_HDR_LEN);
+			if (err < 0)
+				goto out;
 
+			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
 			nexthdr = hp->nexthdr;
 			off += ipv6_optlen(hp);
 			break;
 		}
 		case IPPROTO_AH: {
-			struct ip_auth_hdr *hp = (void *)(skb->data + off);
+			struct ip_auth_hdr *hp;
+
+			err = maybe_pull_tail(skb,
+					      off +
+					      sizeof(struct ip_auth_hdr),
+					      MAX_IPV6_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
+			nexthdr = hp->nexthdr;
+			off += ipv6_authlen(hp);
+			break;
+		}
+		case IPPROTO_FRAGMENT: {
+			struct frag_hdr *hp;
 
-			header_size = skb->network_header +
-				off +
-				sizeof(struct ip_auth_hdr);
-			maybe_pull_tail(skb, header_size);
+			err = maybe_pull_tail(skb,
+					      off +
+					      sizeof(struct frag_hdr),
+					      MAX_IPV6_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			hp = OPT_HDR(struct frag_hdr, skb, off);
+
+			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
+				fragment = true;
 
 			nexthdr = hp->nexthdr;
-			off += (hp->hdrlen+2)<<2;
+			off += sizeof(struct frag_hdr);
 			break;
 		}
-		case IPPROTO_FRAGMENT:
-			fragment = true;
-			/* fall through */
 		default:
 			done = true;
 			break;
 		}
 	}
 
-	if (!done) {
-		if (net_ratelimit())
-			netdev_err(vif->dev, "Failed to parse packet header\n");
-		goto out;
-	}
+	err = -EPROTO;
 
-	if (fragment) {
-		if (net_ratelimit())
-			netdev_err(vif->dev, "Packet is a fragment!\n");
+	if (!done || fragment)
 		goto out;
-	}
 
 	switch (nexthdr) {
 	case IPPROTO_TCP:
@@ -1304,17 +1347,17 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
 			goto out;
 
 		if (recalculate_partial_csum) {
-			struct tcphdr *tcph = tcp_hdr(skb);
-
-			header_size = skb->network_header +
-				off +
-				sizeof(struct tcphdr);
-			maybe_pull_tail(skb, header_size);
-
-			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
-						       &ipv6h->daddr,
-						       skb->len - off,
-						       IPPROTO_TCP, 0);
+			err = maybe_pull_tail(skb,
+					      off + sizeof(struct tcphdr),
+					      MAX_IPV6_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			tcp_hdr(skb)->check =
+				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+						 &ipv6_hdr(skb)->daddr,
+						 skb->len - off,
+						 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
@@ -1323,25 +1366,20 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
 			goto out;
 
 		if (recalculate_partial_csum) {
-			struct udphdr *udph = udp_hdr(skb);
-
-			header_size = skb->network_header +
-				off +
-				sizeof(struct udphdr);
-			maybe_pull_tail(skb, header_size);
-
-			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
-						       &ipv6h->daddr,
-						       skb->len - off,
-						       IPPROTO_UDP, 0);
+			err = maybe_pull_tail(skb,
+					      off + sizeof(struct udphdr),
+					      MAX_IPV6_HDR_LEN);
+			if (err < 0)
+				goto out;
+
+			udp_hdr(skb)->check =
+				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+						 &ipv6_hdr(skb)->daddr,
+						 skb->len - off,
+						 IPPROTO_UDP, 0);
 		}
 		break;
 	default:
-		if (net_ratelimit())
-			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, "
-				   "dropping a protocol %d packet\n",
-				   nexthdr);
 		goto out;
 	}
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 5d89d1b..c56c350 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -4,6 +4,7 @@
 #include <uapi/linux/ipv6.h>
 
 #define ipv6_optlen(p)  (((p)->hdrlen+1) << 3)
+#define ipv6_authlen(p) (((p)->hdrlen+2) << 2)
 /*
  * This structure contains configuration options per IPv6 link.
  */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index eb198ac..488316e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -110,7 +110,8 @@ struct frag_hdr {
 	__be32	identification;
 };
 
-#define	IP6_MF	0x0001
+#define	IP6_MF		0x0001
+#define	IP6_OFFSET	0xFFF8
 
 #include <net/sock.h>
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-03 17:39 [PATCH net v5] xen-netback: fix fragment detection in checksum setup Paul Durrant
@ 2013-12-05  6:28 ` annie li
  2013-12-05  9:08   ` Paul Durrant
  2013-12-05 14:53 ` Wei Liu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: annie li @ 2013-12-05  6:28 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, netdev, Zoltan Kiss, Wei Liu, Ian Campbell,
	David Vrabel, David Miller


On 2013/12/4 1:39, Paul Durrant wrote:

>   
> +/* This value should be large enough to cover a tagged ethernet header plus
> + * an IPv6 header, all options, and a maximal TCP or UDP header.
> + */
> +#define MAX_IPV6_HDR_LEN 256
> +
> +#define OPT_HDR(type, skb, off) \
> +	(type *)(skb_network_header(skb) + (off))
> +
>   static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>   			       int recalculate_partial_csum)
>   {
> -	int err = -EPROTO;
> -	struct ipv6hdr *ipv6h = (void *)skb->data;
> +	int err;
>   	u8 nexthdr;
> -	unsigned int header_size;
>   	unsigned int off;
> +	unsigned int len;
>   	bool fragment;
>   	bool done;
>   
> +	fragment = false;
>   	done = false;
>   
>   	off = sizeof(struct ipv6hdr);
>   
> -	header_size = skb->network_header + off;
> -	maybe_pull_tail(skb, header_size);
> +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
> +	if (err < 0)
> +		goto out;
>   
> -	nexthdr = ipv6h->nexthdr;
> +	nexthdr = ipv6_hdr(skb)->nexthdr;
>   
> -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> -	       !done) {
> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> +	while (off <= len && !done) {
>   		switch (nexthdr) {
>   		case IPPROTO_DSTOPTS:
>   		case IPPROTO_HOPOPTS:
>   		case IPPROTO_ROUTING: {
> -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +			struct ipv6_opt_hdr *hp;
>   
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct ipv6_opt_hdr);
> -			maybe_pull_tail(skb, header_size);
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct ipv6_opt_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
>   
> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
>   			nexthdr = hp->nexthdr;
>   			off += ipv6_optlen(hp);
>   			break;
>   		}
>   		case IPPROTO_AH: {
> -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +			struct ip_auth_hdr *hp;
> +
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct ip_auth_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_authlen(hp);
> +			break;
> +		}
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp;
>   
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct ip_auth_hdr);
> -			maybe_pull_tail(skb, header_size);
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct frag_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct frag_hdr, skb, off);
> +
> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
> +				fragment = true;
>   
>   			nexthdr = hp->nexthdr;
> -			off += (hp->hdrlen+2)<<2;
> +			off += sizeof(struct frag_hdr);
>   			break;
>   		}
> -		case IPPROTO_FRAGMENT:
> -			fragment = true;
> -			/* fall through */

About IPv6 extension headers, should "Encapsulating Security Payload" be 
processed too?(See rfc2460 and rfc2406) Is there any concern to ignore 
it here?

Thanks
Annie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-05  6:28 ` annie li
@ 2013-12-05  9:08   ` Paul Durrant
  2013-12-05 10:00     ` [Xen-devel] " annie li
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Durrant @ 2013-12-05  9:08 UTC (permalink / raw)
  To: annie li
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Zoltan Kiss,
	Wei Liu, Ian Campbell, David Vrabel, David Miller

> -----Original Message-----
> From: annie li [mailto:annie.li@oracle.com]
> Sent: 05 December 2013 06:28
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Zoltan Kiss; Wei Liu;
> Ian Campbell; David Vrabel; David Miller
> Subject: Re: [PATCH net v5] xen-netback: fix fragment detection in checksum
> setup
> 
> 
> On 2013/12/4 1:39, Paul Durrant wrote:
> 
> >
> > +/* This value should be large enough to cover a tagged ethernet header
> plus
> > + * an IPv6 header, all options, and a maximal TCP or UDP header.
> > + */
> > +#define MAX_IPV6_HDR_LEN 256
> > +
> > +#define OPT_HDR(type, skb, off) \
> > +	(type *)(skb_network_header(skb) + (off))
> > +
> >   static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> >   			       int recalculate_partial_csum)
> >   {
> > -	int err = -EPROTO;
> > -	struct ipv6hdr *ipv6h = (void *)skb->data;
> > +	int err;
> >   	u8 nexthdr;
> > -	unsigned int header_size;
> >   	unsigned int off;
> > +	unsigned int len;
> >   	bool fragment;
> >   	bool done;
> >
> > +	fragment = false;
> >   	done = false;
> >
> >   	off = sizeof(struct ipv6hdr);
> >
> > -	header_size = skb->network_header + off;
> > -	maybe_pull_tail(skb, header_size);
> > +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
> > +	if (err < 0)
> > +		goto out;
> >
> > -	nexthdr = ipv6h->nexthdr;
> > +	nexthdr = ipv6_hdr(skb)->nexthdr;
> >
> > -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> &&
> > -	       !done) {
> > +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> > +	while (off <= len && !done) {
> >   		switch (nexthdr) {
> >   		case IPPROTO_DSTOPTS:
> >   		case IPPROTO_HOPOPTS:
> >   		case IPPROTO_ROUTING: {
> > -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +			struct ipv6_opt_hdr *hp;
> >
> > -			header_size = skb->network_header +
> > -				off +
> > -				sizeof(struct ipv6_opt_hdr);
> > -			maybe_pull_tail(skb, header_size);
> > +			err = maybe_pull_tail(skb,
> > +					      off +
> > +					      sizeof(struct ipv6_opt_hdr),
> > +					      MAX_IPV6_HDR_LEN);
> > +			if (err < 0)
> > +				goto out;
> >
> > +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
> >   			nexthdr = hp->nexthdr;
> >   			off += ipv6_optlen(hp);
> >   			break;
> >   		}
> >   		case IPPROTO_AH: {
> > -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +			struct ip_auth_hdr *hp;
> > +
> > +			err = maybe_pull_tail(skb,
> > +					      off +
> > +					      sizeof(struct ip_auth_hdr),
> > +					      MAX_IPV6_HDR_LEN);
> > +			if (err < 0)
> > +				goto out;
> > +
> > +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
> > +			nexthdr = hp->nexthdr;
> > +			off += ipv6_authlen(hp);
> > +			break;
> > +		}
> > +		case IPPROTO_FRAGMENT: {
> > +			struct frag_hdr *hp;
> >
> > -			header_size = skb->network_header +
> > -				off +
> > -				sizeof(struct ip_auth_hdr);
> > -			maybe_pull_tail(skb, header_size);
> > +			err = maybe_pull_tail(skb,
> > +					      off +
> > +					      sizeof(struct frag_hdr),
> > +					      MAX_IPV6_HDR_LEN);
> > +			if (err < 0)
> > +				goto out;
> > +
> > +			hp = OPT_HDR(struct frag_hdr, skb, off);
> > +
> > +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
> > +				fragment = true;
> >
> >   			nexthdr = hp->nexthdr;
> > -			off += (hp->hdrlen+2)<<2;
> > +			off += sizeof(struct frag_hdr);
> >   			break;
> >   		}
> > -		case IPPROTO_FRAGMENT:
> > -			fragment = true;
> > -			/* fall through */
> 
> About IPv6 extension headers, should "Encapsulating Security Payload" be
> processed too?(See rfc2460 and rfc2406) Is there any concern to ignore
> it here?
> 

It's deliberately ignored. If we have an ESP header then the TCP/UDP header and payload will be encrypted so you can't do checksum offload, hence this function should return an error.

  Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-05  9:08   ` Paul Durrant
@ 2013-12-05 10:00     ` annie li
  2013-12-05 10:45       ` Paul Durrant
  0 siblings, 1 reply; 15+ messages in thread
From: annie li @ 2013-12-05 10:00 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Ian Campbell, netdev@vger.kernel.org,
	xen-devel@lists.xen.org, David Vrabel, Zoltan Kiss, David Miller


On 2013/12/5 17:08, Paul Durrant wrote:
>> -----Original Message-----
>> From: annie li [mailto:annie.li@oracle.com]
>> Sent: 05 December 2013 06:28
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Zoltan Kiss; Wei Liu;
>> Ian Campbell; David Vrabel; David Miller
>> Subject: Re: [PATCH net v5] xen-netback: fix fragment detection in checksum
>> setup
>>
>>
>> On 2013/12/4 1:39, Paul Durrant wrote:
>>
>>> +/* This value should be large enough to cover a tagged ethernet header
>> plus
>>> + * an IPv6 header, all options, and a maximal TCP or UDP header.
>>> + */
>>> +#define MAX_IPV6_HDR_LEN 256
>>> +
>>> +#define OPT_HDR(type, skb, off) \
>>> +	(type *)(skb_network_header(skb) + (off))
>>> +
>>>    static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>>>    			       int recalculate_partial_csum)
>>>    {
>>> -	int err = -EPROTO;
>>> -	struct ipv6hdr *ipv6h = (void *)skb->data;
>>> +	int err;
>>>    	u8 nexthdr;
>>> -	unsigned int header_size;
>>>    	unsigned int off;
>>> +	unsigned int len;
>>>    	bool fragment;
>>>    	bool done;
>>>
>>> +	fragment = false;
>>>    	done = false;
>>>
>>>    	off = sizeof(struct ipv6hdr);
>>>
>>> -	header_size = skb->network_header + off;
>>> -	maybe_pull_tail(skb, header_size);
>>> +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
>>> +	if (err < 0)
>>> +		goto out;
>>>
>>> -	nexthdr = ipv6h->nexthdr;
>>> +	nexthdr = ipv6_hdr(skb)->nexthdr;
>>>
>>> -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
>> &&
>>> -	       !done) {
>>> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
>>> +	while (off <= len && !done) {
>>>    		switch (nexthdr) {
>>>    		case IPPROTO_DSTOPTS:
>>>    		case IPPROTO_HOPOPTS:
>>>    		case IPPROTO_ROUTING: {
>>> -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
>>> +			struct ipv6_opt_hdr *hp;
>>>
>>> -			header_size = skb->network_header +
>>> -				off +
>>> -				sizeof(struct ipv6_opt_hdr);
>>> -			maybe_pull_tail(skb, header_size);
>>> +			err = maybe_pull_tail(skb,
>>> +					      off +
>>> +					      sizeof(struct ipv6_opt_hdr),
>>> +					      MAX_IPV6_HDR_LEN);
>>> +			if (err < 0)
>>> +				goto out;
>>>
>>> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
>>>    			nexthdr = hp->nexthdr;
>>>    			off += ipv6_optlen(hp);
>>>    			break;
>>>    		}
>>>    		case IPPROTO_AH: {
>>> -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
>>> +			struct ip_auth_hdr *hp;
>>> +
>>> +			err = maybe_pull_tail(skb,
>>> +					      off +
>>> +					      sizeof(struct ip_auth_hdr),
>>> +					      MAX_IPV6_HDR_LEN);
>>> +			if (err < 0)
>>> +				goto out;
>>> +
>>> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
>>> +			nexthdr = hp->nexthdr;
>>> +			off += ipv6_authlen(hp);
>>> +			break;
>>> +		}
>>> +		case IPPROTO_FRAGMENT: {
>>> +			struct frag_hdr *hp;
>>>
>>> -			header_size = skb->network_header +
>>> -				off +
>>> -				sizeof(struct ip_auth_hdr);
>>> -			maybe_pull_tail(skb, header_size);
>>> +			err = maybe_pull_tail(skb,
>>> +					      off +
>>> +					      sizeof(struct frag_hdr),
>>> +					      MAX_IPV6_HDR_LEN);
>>> +			if (err < 0)
>>> +				goto out;
>>> +
>>> +			hp = OPT_HDR(struct frag_hdr, skb, off);
>>> +
>>> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
>>> +				fragment = true;
>>>
>>>    			nexthdr = hp->nexthdr;
>>> -			off += (hp->hdrlen+2)<<2;
>>> +			off += sizeof(struct frag_hdr);
>>>    			break;
>>>    		}
>>> -		case IPPROTO_FRAGMENT:
>>> -			fragment = true;
>>> -			/* fall through */
>> About IPv6 extension headers, should "Encapsulating Security Payload" be
>> processed too?(See rfc2460 and rfc2406) Is there any concern to ignore
>> it here?
>>
> It's deliberately ignored. If we have an ESP header then the TCP/UDP header and payload will be encrypted so you can't do checksum offload, hence this function should return an error.
>

Got it. But from current code, if ESP extension header exists, switch 
(nexthdr) will go into default case directly, and done is set with true, 
then the code quits switch and while loop. After that, it will go into 
next switch and then the default case since the nexthdr is ipv6 ESP 
extension header, not TCP/UDP header. Then netdev_err probably prints 
out improper log.

Thanks
Annie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-05 10:00     ` [Xen-devel] " annie li
@ 2013-12-05 10:45       ` Paul Durrant
  2013-12-05 13:11         ` annie li
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Durrant @ 2013-12-05 10:45 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, Ian Campbell, netdev@vger.kernel.org,
	xen-devel@lists.xen.org, David Vrabel, Zoltan Kiss, David Miller

> -----Original Message-----
> From: annie li [mailto:annie.li@oracle.com]
> Sent: 05 December 2013 10:01
> To: Paul Durrant
> Cc: Wei Liu; Ian Campbell; netdev@vger.kernel.org; xen-devel@lists.xen.org;
> David Vrabel; Zoltan Kiss; David Miller
> Subject: Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection
> in checksum setup
> 
> 
> On 2013/12/5 17:08, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: annie li [mailto:annie.li@oracle.com]
> >> Sent: 05 December 2013 06:28
> >> To: Paul Durrant
> >> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Zoltan Kiss; Wei
> Liu;
> >> Ian Campbell; David Vrabel; David Miller
> >> Subject: Re: [PATCH net v5] xen-netback: fix fragment detection in
> checksum
> >> setup
> >>
> >>
> >> On 2013/12/4 1:39, Paul Durrant wrote:
> >>
> >>> +/* This value should be large enough to cover a tagged ethernet
> header
> >> plus
> >>> + * an IPv6 header, all options, and a maximal TCP or UDP header.
> >>> + */
> >>> +#define MAX_IPV6_HDR_LEN 256
> >>> +
> >>> +#define OPT_HDR(type, skb, off) \
> >>> +	(type *)(skb_network_header(skb) + (off))
> >>> +
> >>>    static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> >>>    			       int recalculate_partial_csum)
> >>>    {
> >>> -	int err = -EPROTO;
> >>> -	struct ipv6hdr *ipv6h = (void *)skb->data;
> >>> +	int err;
> >>>    	u8 nexthdr;
> >>> -	unsigned int header_size;
> >>>    	unsigned int off;
> >>> +	unsigned int len;
> >>>    	bool fragment;
> >>>    	bool done;
> >>>
> >>> +	fragment = false;
> >>>    	done = false;
> >>>
> >>>    	off = sizeof(struct ipv6hdr);
> >>>
> >>> -	header_size = skb->network_header + off;
> >>> -	maybe_pull_tail(skb, header_size);
> >>> +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
> >>> +	if (err < 0)
> >>> +		goto out;
> >>>
> >>> -	nexthdr = ipv6h->nexthdr;
> >>> +	nexthdr = ipv6_hdr(skb)->nexthdr;
> >>>
> >>> -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> >> &&
> >>> -	       !done) {
> >>> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> >>> +	while (off <= len && !done) {
> >>>    		switch (nexthdr) {
> >>>    		case IPPROTO_DSTOPTS:
> >>>    		case IPPROTO_HOPOPTS:
> >>>    		case IPPROTO_ROUTING: {
> >>> -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> >>> +			struct ipv6_opt_hdr *hp;
> >>>
> >>> -			header_size = skb->network_header +
> >>> -				off +
> >>> -				sizeof(struct ipv6_opt_hdr);
> >>> -			maybe_pull_tail(skb, header_size);
> >>> +			err = maybe_pull_tail(skb,
> >>> +					      off +
> >>> +					      sizeof(struct ipv6_opt_hdr),
> >>> +					      MAX_IPV6_HDR_LEN);
> >>> +			if (err < 0)
> >>> +				goto out;
> >>>
> >>> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
> >>>    			nexthdr = hp->nexthdr;
> >>>    			off += ipv6_optlen(hp);
> >>>    			break;
> >>>    		}
> >>>    		case IPPROTO_AH: {
> >>> -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> >>> +			struct ip_auth_hdr *hp;
> >>> +
> >>> +			err = maybe_pull_tail(skb,
> >>> +					      off +
> >>> +					      sizeof(struct ip_auth_hdr),
> >>> +					      MAX_IPV6_HDR_LEN);
> >>> +			if (err < 0)
> >>> +				goto out;
> >>> +
> >>> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
> >>> +			nexthdr = hp->nexthdr;
> >>> +			off += ipv6_authlen(hp);
> >>> +			break;
> >>> +		}
> >>> +		case IPPROTO_FRAGMENT: {
> >>> +			struct frag_hdr *hp;
> >>>
> >>> -			header_size = skb->network_header +
> >>> -				off +
> >>> -				sizeof(struct ip_auth_hdr);
> >>> -			maybe_pull_tail(skb, header_size);
> >>> +			err = maybe_pull_tail(skb,
> >>> +					      off +
> >>> +					      sizeof(struct frag_hdr),
> >>> +					      MAX_IPV6_HDR_LEN);
> >>> +			if (err < 0)
> >>> +				goto out;
> >>> +
> >>> +			hp = OPT_HDR(struct frag_hdr, skb, off);
> >>> +
> >>> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
> >>> +				fragment = true;
> >>>
> >>>    			nexthdr = hp->nexthdr;
> >>> -			off += (hp->hdrlen+2)<<2;
> >>> +			off += sizeof(struct frag_hdr);
> >>>    			break;
> >>>    		}
> >>> -		case IPPROTO_FRAGMENT:
> >>> -			fragment = true;
> >>> -			/* fall through */
> >> About IPv6 extension headers, should "Encapsulating Security Payload"
> be
> >> processed too?(See rfc2460 and rfc2406) Is there any concern to ignore
> >> it here?
> >>
> > It's deliberately ignored. If we have an ESP header then the TCP/UDP
> header and payload will be encrypted so you can't do checksum offload,
> hence this function should return an error.
> >
> 
> Got it. But from current code, if ESP extension header exists, switch
> (nexthdr) will go into default case directly, and done is set with true,
> then the code quits switch and while loop.

That's correct. ESP is considered a terminating header.

> After that, it will go into
> next switch and then the default case since the nexthdr is ipv6 ESP
> extension header, not TCP/UDP header. Then netdev_err probably prints
> out improper log.
> 

Which netdev_err? This patch should remove all calls to netdev_err from this codepath. Did I miss one?

  Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-05 10:45       ` Paul Durrant
@ 2013-12-05 13:11         ` annie li
  0 siblings, 0 replies; 15+ messages in thread
From: annie li @ 2013-12-05 13:11 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, Ian Campbell, netdev@vger.kernel.org,
	xen-devel@lists.xen.org, David Vrabel, Zoltan Kiss, David Miller


On 2013-12-5 18:45, Paul Durrant wrote:
>> -----Original Message-----
>> From: annie li [mailto:annie.li@oracle.com]
>> Sent: 05 December 2013 10:01
>> To: Paul Durrant
>> Cc: Wei Liu; Ian Campbell; netdev@vger.kernel.org; xen-devel@lists.xen.org;
>> David Vrabel; Zoltan Kiss; David Miller
>> Subject: Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection
>> in checksum setup
>>
>>
>> On 2013/12/5 17:08, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: annie li [mailto:annie.li@oracle.com]
>>>> Sent: 05 December 2013 06:28
>>>> To: Paul Durrant
>>>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Zoltan Kiss; Wei
>> Liu;
>>>> Ian Campbell; David Vrabel; David Miller
>>>> Subject: Re: [PATCH net v5] xen-netback: fix fragment detection in
>> checksum
>>>> setup
>>>>
>>>>
>>>> On 2013/12/4 1:39, Paul Durrant wrote:
>>>>
>>>>> +/* This value should be large enough to cover a tagged ethernet
>> header
>>>> plus
>>>>> + * an IPv6 header, all options, and a maximal TCP or UDP header.
>>>>> + */
>>>>> +#define MAX_IPV6_HDR_LEN 256
>>>>> +
>>>>> +#define OPT_HDR(type, skb, off) \
>>>>> +	(type *)(skb_network_header(skb) + (off))
>>>>> +
>>>>>     static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>>>>>     			       int recalculate_partial_csum)
>>>>>     {
>>>>> -	int err = -EPROTO;
>>>>> -	struct ipv6hdr *ipv6h = (void *)skb->data;
>>>>> +	int err;
>>>>>     	u8 nexthdr;
>>>>> -	unsigned int header_size;
>>>>>     	unsigned int off;
>>>>> +	unsigned int len;
>>>>>     	bool fragment;
>>>>>     	bool done;
>>>>>
>>>>> +	fragment = false;
>>>>>     	done = false;
>>>>>
>>>>>     	off = sizeof(struct ipv6hdr);
>>>>>
>>>>> -	header_size = skb->network_header + off;
>>>>> -	maybe_pull_tail(skb, header_size);
>>>>> +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
>>>>> +	if (err < 0)
>>>>> +		goto out;
>>>>>
>>>>> -	nexthdr = ipv6h->nexthdr;
>>>>> +	nexthdr = ipv6_hdr(skb)->nexthdr;
>>>>>
>>>>> -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
>>>> &&
>>>>> -	       !done) {
>>>>> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
>>>>> +	while (off <= len && !done) {
>>>>>     		switch (nexthdr) {
>>>>>     		case IPPROTO_DSTOPTS:
>>>>>     		case IPPROTO_HOPOPTS:
>>>>>     		case IPPROTO_ROUTING: {
>>>>> -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
>>>>> +			struct ipv6_opt_hdr *hp;
>>>>>
>>>>> -			header_size = skb->network_header +
>>>>> -				off +
>>>>> -				sizeof(struct ipv6_opt_hdr);
>>>>> -			maybe_pull_tail(skb, header_size);
>>>>> +			err = maybe_pull_tail(skb,
>>>>> +					      off +
>>>>> +					      sizeof(struct ipv6_opt_hdr),
>>>>> +					      MAX_IPV6_HDR_LEN);
>>>>> +			if (err < 0)
>>>>> +				goto out;
>>>>>
>>>>> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
>>>>>     			nexthdr = hp->nexthdr;
>>>>>     			off += ipv6_optlen(hp);
>>>>>     			break;
>>>>>     		}
>>>>>     		case IPPROTO_AH: {
>>>>> -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
>>>>> +			struct ip_auth_hdr *hp;
>>>>> +
>>>>> +			err = maybe_pull_tail(skb,
>>>>> +					      off +
>>>>> +					      sizeof(struct ip_auth_hdr),
>>>>> +					      MAX_IPV6_HDR_LEN);
>>>>> +			if (err < 0)
>>>>> +				goto out;
>>>>> +
>>>>> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
>>>>> +			nexthdr = hp->nexthdr;
>>>>> +			off += ipv6_authlen(hp);
>>>>> +			break;
>>>>> +		}
>>>>> +		case IPPROTO_FRAGMENT: {
>>>>> +			struct frag_hdr *hp;
>>>>>
>>>>> -			header_size = skb->network_header +
>>>>> -				off +
>>>>> -				sizeof(struct ip_auth_hdr);
>>>>> -			maybe_pull_tail(skb, header_size);
>>>>> +			err = maybe_pull_tail(skb,
>>>>> +					      off +
>>>>> +					      sizeof(struct frag_hdr),
>>>>> +					      MAX_IPV6_HDR_LEN);
>>>>> +			if (err < 0)
>>>>> +				goto out;
>>>>> +
>>>>> +			hp = OPT_HDR(struct frag_hdr, skb, off);
>>>>> +
>>>>> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
>>>>> +				fragment = true;
>>>>>
>>>>>     			nexthdr = hp->nexthdr;
>>>>> -			off += (hp->hdrlen+2)<<2;
>>>>> +			off += sizeof(struct frag_hdr);
>>>>>     			break;
>>>>>     		}
>>>>> -		case IPPROTO_FRAGMENT:
>>>>> -			fragment = true;
>>>>> -			/* fall through */
>>>> About IPv6 extension headers, should "Encapsulating Security Payload"
>> be
>>>> processed too?(See rfc2460 and rfc2406) Is there any concern to ignore
>>>> it here?
>>>>
>>> It's deliberately ignored. If we have an ESP header then the TCP/UDP
>> header and payload will be encrypted so you can't do checksum offload,
>> hence this function should return an error.
>> Got it. But from current code, if ESP extension header exists, switch
>> (nexthdr) will go into default case directly, and done is set with true,
>> then the code quits switch and while loop.
> That's correct. ESP is considered a terminating header.
>
>> After that, it will go into
>> next switch and then the default case since the nexthdr is ipv6 ESP
>> extension header, not TCP/UDP header. Then netdev_err probably prints
>> out improper log.
>>
> Which netdev_err? This patch should remove all calls to netdev_err from this codepath. Did I miss one?

You are right! I mixed up this patch and original source code.

Thanks
Annie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-03 17:39 [PATCH net v5] xen-netback: fix fragment detection in checksum setup Paul Durrant
  2013-12-05  6:28 ` annie li
@ 2013-12-05 14:53 ` Wei Liu
  2013-12-06  1:32 ` David Miller
  2013-12-10 16:11 ` [Xen-devel] " Jan Beulich
  3 siblings, 0 replies; 15+ messages in thread
From: Wei Liu @ 2013-12-05 14:53 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, netdev, Zoltan Kiss, Wei Liu, Ian Campbell,
	David Vrabel, David Miller

On Tue, Dec 03, 2013 at 05:39:29PM +0000, Paul Durrant wrote:
> The code to detect fragments in checksum_setup() was missing for IPv4 and
> too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
> with a fragment header even if they are not a fragment - i.e. offset is zero,
> and M bit is not set).
> 
> This patch also incorporates a fix to callers of maybe_pull_tail() where
> skb->network_header was being erroneously added to the length argument.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> cc: David Miller <davem@davemloft.net>

Acked-by: Wei Liu <wei.liu2@citrix.com>

Thanks
Wei.

> ---
> v5
> - Return an error from maybe_pull_tail() to distiguish issues due to the
>   frontend from those due to the backend.
> 
> v4
> - Re-work maybe_pull_tail() to have a failure path and update mac and
>   network header pointers on success.
> - Re-work callers of maybe_pull_tail() to handle failures and allow for
>   movement of skb header pointers.
> - Incorporate fixes from patch "Fix pull size in checksum_setup_ip" since
>   it also modifies callers of maybe_pull_tail().
> - Remove error messages in checksum routines that could be triggered by frontends
> 
> v3
> - Use defined values for 'fragment offset' and 'more fragments'
> 
> v2
> - Added comments noting what fragment/offset masks mean
> 
>  drivers/net/xen-netback/netback.c |  236 +++++++++++++++++++++----------------
>  include/linux/ipv6.h              |    1 +
>  include/net/ipv6.h                |    3 +-
>  3 files changed, 140 insertions(+), 100 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 64f0e0d..acf1392 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1149,49 +1149,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
>  	return 0;
>  }
>  
> -static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> +static inline int maybe_pull_tail(struct sk_buff *skb, unsigned int len,
> +				  unsigned int max)
>  {
> -	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> -		/* If we need to pullup then pullup to the max, so we
> -		 * won't need to do it again.
> -		 */
> -		int target = min_t(int, skb->len, MAX_TCP_HEADER);
> -		__pskb_pull_tail(skb, target - skb_headlen(skb));
> -	}
> +	if (skb_headlen(skb) >= len)
> +		return 0;
> +
> +	/* If we need to pullup then pullup to the max, so we
> +	 * won't need to do it again.
> +	 */
> +	if (max > skb->len)
> +		max = skb->len;
> +
> +	if (__pskb_pull_tail(skb, max - skb_headlen(skb)) == NULL)
> +		return -ENOMEM;
> +
> +	if (skb_headlen(skb) < len)
> +		return -EPROTO;
> +
> +	return 0;
>  }
>  
> +/* This value should be large enough to cover a tagged ethernet header plus
> + * maximally sized IP and TCP or UDP headers.
> + */
> +#define MAX_IP_HDR_LEN 128
> +
>  static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
>  			     int recalculate_partial_csum)
>  {
> -	struct iphdr *iph = (void *)skb->data;
> -	unsigned int header_size;
>  	unsigned int off;
> -	int err = -EPROTO;
> +	bool fragment;
> +	int err;
> +
> +	fragment = false;
> +
> +	err = maybe_pull_tail(skb,
> +			      sizeof(struct iphdr),
> +			      MAX_IP_HDR_LEN);
> +	if (err < 0)
> +		goto out;
>  
> -	off = sizeof(struct iphdr);
> +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
> +		fragment = true;
>  
> -	header_size = skb->network_header + off + MAX_IPOPTLEN;
> -	maybe_pull_tail(skb, header_size);
> +	off = ip_hdrlen(skb);
>  
> -	off = iph->ihl * 4;
> +	err = -EPROTO;
>  
> -	switch (iph->protocol) {
> +	switch (ip_hdr(skb)->protocol) {
>  	case IPPROTO_TCP:
>  		if (!skb_partial_csum_set(skb, off,
>  					  offsetof(struct tcphdr, check)))
>  			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct tcphdr *tcph = tcp_hdr(skb);
> -
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct tcphdr);
> -			maybe_pull_tail(skb, header_size);
> -
> -			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> -							 skb->len - off,
> -							 IPPROTO_TCP, 0);
> +			err = maybe_pull_tail(skb,
> +					      off + sizeof(struct tcphdr),
> +					      MAX_IP_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			tcp_hdr(skb)->check =
> +				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> +						   ip_hdr(skb)->daddr,
> +						   skb->len - off,
> +						   IPPROTO_TCP, 0);
>  		}
>  		break;
>  	case IPPROTO_UDP:
> @@ -1200,24 +1223,20 @@ static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
>  			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct udphdr *udph = udp_hdr(skb);
> -
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct udphdr);
> -			maybe_pull_tail(skb, header_size);
> -
> -			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> -							 skb->len - off,
> -							 IPPROTO_UDP, 0);
> +			err = maybe_pull_tail(skb,
> +					      off + sizeof(struct udphdr),
> +					      MAX_IP_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			udp_hdr(skb)->check =
> +				~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> +						   ip_hdr(skb)->daddr,
> +						   skb->len - off,
> +						   IPPROTO_UDP, 0);
>  		}
>  		break;
>  	default:
> -		if (net_ratelimit())
> -			netdev_err(vif->dev,
> -				   "Attempting to checksum a non-TCP/UDP packet, "
> -				   "dropping a protocol %d packet\n",
> -				   iph->protocol);
>  		goto out;
>  	}
>  
> @@ -1227,75 +1246,99 @@ out:
>  	return err;
>  }
>  
> +/* This value should be large enough to cover a tagged ethernet header plus
> + * an IPv6 header, all options, and a maximal TCP or UDP header.
> + */
> +#define MAX_IPV6_HDR_LEN 256
> +
> +#define OPT_HDR(type, skb, off) \
> +	(type *)(skb_network_header(skb) + (off))
> +
>  static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>  			       int recalculate_partial_csum)
>  {
> -	int err = -EPROTO;
> -	struct ipv6hdr *ipv6h = (void *)skb->data;
> +	int err;
>  	u8 nexthdr;
> -	unsigned int header_size;
>  	unsigned int off;
> +	unsigned int len;
>  	bool fragment;
>  	bool done;
>  
> +	fragment = false;
>  	done = false;
>  
>  	off = sizeof(struct ipv6hdr);
>  
> -	header_size = skb->network_header + off;
> -	maybe_pull_tail(skb, header_size);
> +	err = maybe_pull_tail(skb, off, MAX_IPV6_HDR_LEN);
> +	if (err < 0)
> +		goto out;
>  
> -	nexthdr = ipv6h->nexthdr;
> +	nexthdr = ipv6_hdr(skb)->nexthdr;
>  
> -	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> -	       !done) {
> +	len = sizeof(struct ipv6hdr) + ntohs(ipv6_hdr(skb)->payload_len);
> +	while (off <= len && !done) {
>  		switch (nexthdr) {
>  		case IPPROTO_DSTOPTS:
>  		case IPPROTO_HOPOPTS:
>  		case IPPROTO_ROUTING: {
> -			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +			struct ipv6_opt_hdr *hp;
>  
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct ipv6_opt_hdr);
> -			maybe_pull_tail(skb, header_size);
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct ipv6_opt_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
>  
> +			hp = OPT_HDR(struct ipv6_opt_hdr, skb, off);
>  			nexthdr = hp->nexthdr;
>  			off += ipv6_optlen(hp);
>  			break;
>  		}
>  		case IPPROTO_AH: {
> -			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +			struct ip_auth_hdr *hp;
> +
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct ip_auth_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct ip_auth_hdr, skb, off);
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_authlen(hp);
> +			break;
> +		}
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp;
>  
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct ip_auth_hdr);
> -			maybe_pull_tail(skb, header_size);
> +			err = maybe_pull_tail(skb,
> +					      off +
> +					      sizeof(struct frag_hdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			hp = OPT_HDR(struct frag_hdr, skb, off);
> +
> +			if (hp->frag_off & htons(IP6_OFFSET | IP6_MF))
> +				fragment = true;
>  
>  			nexthdr = hp->nexthdr;
> -			off += (hp->hdrlen+2)<<2;
> +			off += sizeof(struct frag_hdr);
>  			break;
>  		}
> -		case IPPROTO_FRAGMENT:
> -			fragment = true;
> -			/* fall through */
>  		default:
>  			done = true;
>  			break;
>  		}
>  	}
>  
> -	if (!done) {
> -		if (net_ratelimit())
> -			netdev_err(vif->dev, "Failed to parse packet header\n");
> -		goto out;
> -	}
> +	err = -EPROTO;
>  
> -	if (fragment) {
> -		if (net_ratelimit())
> -			netdev_err(vif->dev, "Packet is a fragment!\n");
> +	if (!done || fragment)
>  		goto out;
> -	}
>  
>  	switch (nexthdr) {
>  	case IPPROTO_TCP:
> @@ -1304,17 +1347,17 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>  			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct tcphdr *tcph = tcp_hdr(skb);
> -
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct tcphdr);
> -			maybe_pull_tail(skb, header_size);
> -
> -			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> -						       &ipv6h->daddr,
> -						       skb->len - off,
> -						       IPPROTO_TCP, 0);
> +			err = maybe_pull_tail(skb,
> +					      off + sizeof(struct tcphdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			tcp_hdr(skb)->check =
> +				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> +						 &ipv6_hdr(skb)->daddr,
> +						 skb->len - off,
> +						 IPPROTO_TCP, 0);
>  		}
>  		break;
>  	case IPPROTO_UDP:
> @@ -1323,25 +1366,20 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
>  			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct udphdr *udph = udp_hdr(skb);
> -
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct udphdr);
> -			maybe_pull_tail(skb, header_size);
> -
> -			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> -						       &ipv6h->daddr,
> -						       skb->len - off,
> -						       IPPROTO_UDP, 0);
> +			err = maybe_pull_tail(skb,
> +					      off + sizeof(struct udphdr),
> +					      MAX_IPV6_HDR_LEN);
> +			if (err < 0)
> +				goto out;
> +
> +			udp_hdr(skb)->check =
> +				~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> +						 &ipv6_hdr(skb)->daddr,
> +						 skb->len - off,
> +						 IPPROTO_UDP, 0);
>  		}
>  		break;
>  	default:
> -		if (net_ratelimit())
> -			netdev_err(vif->dev,
> -				   "Attempting to checksum a non-TCP/UDP packet, "
> -				   "dropping a protocol %d packet\n",
> -				   nexthdr);
>  		goto out;
>  	}
>  
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 5d89d1b..c56c350 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -4,6 +4,7 @@
>  #include <uapi/linux/ipv6.h>
>  
>  #define ipv6_optlen(p)  (((p)->hdrlen+1) << 3)
> +#define ipv6_authlen(p) (((p)->hdrlen+2) << 2)
>  /*
>   * This structure contains configuration options per IPv6 link.
>   */
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index eb198ac..488316e 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -110,7 +110,8 @@ struct frag_hdr {
>  	__be32	identification;
>  };
>  
> -#define	IP6_MF	0x0001
> +#define	IP6_MF		0x0001
> +#define	IP6_OFFSET	0xFFF8
>  
>  #include <net/sock.h>
>  
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-03 17:39 [PATCH net v5] xen-netback: fix fragment detection in checksum setup Paul Durrant
  2013-12-05  6:28 ` annie li
  2013-12-05 14:53 ` Wei Liu
@ 2013-12-06  1:32 ` David Miller
  2013-12-10 16:11 ` [Xen-devel] " Jan Beulich
  3 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2013-12-06  1:32 UTC (permalink / raw)
  To: paul.durrant
  Cc: xen-devel, netdev, zoltan.kiss, wei.liu2, ian.campbell,
	david.vrabel

From: Paul Durrant <paul.durrant@citrix.com>
Date: Tue, 3 Dec 2013 17:39:29 +0000

> The code to detect fragments in checksum_setup() was missing for IPv4 and
> too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
> with a fragment header even if they are not a fragment - i.e. offset is zero,
> and M bit is not set).
> 
> This patch also incorporates a fix to callers of maybe_pull_tail() where
> skb->network_header was being erroneously added to the length argument.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-03 17:39 [PATCH net v5] xen-netback: fix fragment detection in checksum setup Paul Durrant
                   ` (2 preceding siblings ...)
  2013-12-06  1:32 ` David Miller
@ 2013-12-10 16:11 ` Jan Beulich
  2013-12-10 16:24   ` Paul Durrant
  3 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2013-12-10 16:11 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss, David Miller,
	xen-devel, netdev

>>> On 03.12.13 at 18:39, Paul Durrant <paul.durrant@citrix.com> wrote:
>  static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
>  			     int recalculate_partial_csum)
>  {
> -	struct iphdr *iph = (void *)skb->data;
> -	unsigned int header_size;
>  	unsigned int off;
> -	int err = -EPROTO;
> +	bool fragment;
> +	int err;
> +
> +	fragment = false;
> +
> +	err = maybe_pull_tail(skb,
> +			      sizeof(struct iphdr),
> +			      MAX_IP_HDR_LEN);
> +	if (err < 0)
> +		goto out;
>  
> -	off = sizeof(struct iphdr);
> +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
> +		fragment = true;

You don't seem to be using "fragment" anywhere.

>  
> -	header_size = skb->network_header + off + MAX_IPOPTLEN;
> -	maybe_pull_tail(skb, header_size);
> +	off = ip_hdrlen(skb);
>  
> -	off = iph->ihl * 4;
> +	err = -EPROTO;
>  
> -	switch (iph->protocol) {
> +	switch (ip_hdr(skb)->protocol) {
>  	case IPPROTO_TCP:
>  		if (!skb_partial_csum_set(skb, off,
>  					  offsetof(struct tcphdr, check)))
>  			goto out;
>  
>  		if (recalculate_partial_csum) {
> -			struct tcphdr *tcph = tcp_hdr(skb);
> -
> -			header_size = skb->network_header +
> -				off +
> -				sizeof(struct tcphdr);
> -			maybe_pull_tail(skb, header_size);
> -
> -			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> -							 skb->len - off,
> -							 IPPROTO_TCP, 0);
> +			err = maybe_pull_tail(skb,
> +					      off + sizeof(struct tcphdr),
> +					      MAX_IP_HDR_LEN);

Is it really necessary/worthwhile to specify MAX_IP_HDR_LEN
here? Other than in the IPv6 case you're not risking to need
another pull if you simply used off + sizeof(struct tcphdr) instead.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 16:11 ` [Xen-devel] " Jan Beulich
@ 2013-12-10 16:24   ` Paul Durrant
  2013-12-10 16:58     ` Jan Beulich
  2013-12-10 17:12     ` Eric Dumazet
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Durrant @ 2013-12-10 16:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss, David Miller,
	xen-devel, netdev@vger.kernel.org

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 10 December 2013 16:12
> To: Paul Durrant
> Cc: David Vrabel; Ian Campbell; Wei Liu; Zoltan Kiss; David Miller; xen-devel;
> netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection
> in checksum setup
> 
> >>> On 03.12.13 at 18:39, Paul Durrant <paul.durrant@citrix.com> wrote:
> >  static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> >  			     int recalculate_partial_csum)
> >  {
> > -	struct iphdr *iph = (void *)skb->data;
> > -	unsigned int header_size;
> >  	unsigned int off;
> > -	int err = -EPROTO;
> > +	bool fragment;
> > +	int err;
> > +
> > +	fragment = false;
> > +
> > +	err = maybe_pull_tail(skb,
> > +			      sizeof(struct iphdr),
> > +			      MAX_IP_HDR_LEN);
> > +	if (err < 0)
> > +		goto out;
> >
> > -	off = sizeof(struct iphdr);
> > +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
> > +		fragment = true;
> 
> You don't seem to be using "fragment" anywhere.
> 
> >
> > -	header_size = skb->network_header + off + MAX_IPOPTLEN;
> > -	maybe_pull_tail(skb, header_size);
> > +	off = ip_hdrlen(skb);
> >
> > -	off = iph->ihl * 4;
> > +	err = -EPROTO;
> >
> > -	switch (iph->protocol) {
> > +	switch (ip_hdr(skb)->protocol) {
> >  	case IPPROTO_TCP:
> >  		if (!skb_partial_csum_set(skb, off,
> >  					  offsetof(struct tcphdr, check)))
> >  			goto out;
> >
> >  		if (recalculate_partial_csum) {
> > -			struct tcphdr *tcph = tcp_hdr(skb);
> > -
> > -			header_size = skb->network_header +
> > -				off +
> > -				sizeof(struct tcphdr);
> > -			maybe_pull_tail(skb, header_size);
> > -
> > -			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph-
> >daddr,
> > -							 skb->len - off,
> > -							 IPPROTO_TCP, 0);
> > +			err = maybe_pull_tail(skb,
> > +					      off + sizeof(struct tcphdr),
> > +					      MAX_IP_HDR_LEN);
> 
> Is it really necessary/worthwhile to specify MAX_IP_HDR_LEN
> here? Other than in the IPv6 case you're not risking to need
> another pull if you simply used off + sizeof(struct tcphdr) instead.
> 

Yes, I guess that's true but if we decide to pull up at all then is it harmful to pull more than we absolutely need?

  Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 16:24   ` Paul Durrant
@ 2013-12-10 16:58     ` Jan Beulich
  2013-12-10 17:29       ` Paul Durrant
  2013-12-10 17:12     ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2013-12-10 16:58 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss, David Miller,
	xen-devel, netdev@vger.kernel.org

>>> On 10.12.13 at 17:24, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 10 December 2013 16:12
>> To: Paul Durrant
>> Cc: David Vrabel; Ian Campbell; Wei Liu; Zoltan Kiss; David Miller; 
> xen-devel;
>> netdev@vger.kernel.org 
>> Subject: Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection
>> in checksum setup
>> 
>> >>> On 03.12.13 at 18:39, Paul Durrant <paul.durrant@citrix.com> wrote:
>> >  static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
>> >  			     int recalculate_partial_csum)
>> >  {
>> > -	struct iphdr *iph = (void *)skb->data;
>> > -	unsigned int header_size;
>> >  	unsigned int off;
>> > -	int err = -EPROTO;
>> > +	bool fragment;
>> > +	int err;
>> > +
>> > +	fragment = false;
>> > +
>> > +	err = maybe_pull_tail(skb,
>> > +			      sizeof(struct iphdr),
>> > +			      MAX_IP_HDR_LEN);
>> > +	if (err < 0)
>> > +		goto out;
>> >
>> > -	off = sizeof(struct iphdr);
>> > +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
>> > +		fragment = true;
>> 
>> You don't seem to be using "fragment" anywhere.
>> 
>> >
>> > -	header_size = skb->network_header + off + MAX_IPOPTLEN;
>> > -	maybe_pull_tail(skb, header_size);
>> > +	off = ip_hdrlen(skb);
>> >
>> > -	off = iph->ihl * 4;
>> > +	err = -EPROTO;
>> >
>> > -	switch (iph->protocol) {
>> > +	switch (ip_hdr(skb)->protocol) {
>> >  	case IPPROTO_TCP:
>> >  		if (!skb_partial_csum_set(skb, off,
>> >  					  offsetof(struct tcphdr, check)))
>> >  			goto out;
>> >
>> >  		if (recalculate_partial_csum) {
>> > -			struct tcphdr *tcph = tcp_hdr(skb);
>> > -
>> > -			header_size = skb->network_header +
>> > -				off +
>> > -				sizeof(struct tcphdr);
>> > -			maybe_pull_tail(skb, header_size);
>> > -
>> > -			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph-
>> >daddr,
>> > -							 skb->len - off,
>> > -							 IPPROTO_TCP, 0);
>> > +			err = maybe_pull_tail(skb,
>> > +					      off + sizeof(struct tcphdr),
>> > +					      MAX_IP_HDR_LEN);
>> 
>> Is it really necessary/worthwhile to specify MAX_IP_HDR_LEN
>> here? Other than in the IPv6 case you're not risking to need
>> another pull if you simply used off + sizeof(struct tcphdr) instead.
>> 
> 
> Yes, I guess that's true but if we decide to pull up at all then is it 
> harmful to pull more than we absolutely need?

_If_ we manage to pull anything here, it means we weren't able to
pull up to the max anyway, so it seems a little odd to try again.

Another question: Don't the skb_partial_csum_set() calls require
the respective pulls to have happened already?

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 16:24   ` Paul Durrant
  2013-12-10 16:58     ` Jan Beulich
@ 2013-12-10 17:12     ` Eric Dumazet
  2013-12-11  9:47       ` Ian Campbell
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2013-12-10 17:12 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Jan Beulich, David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss,
	David Miller, xen-devel, netdev@vger.kernel.org

On Tue, 2013-12-10 at 16:24 +0000, Paul Durrant wrote:

> Yes, I guess that's true but if we decide to pull up at all then is it harmful to pull more than we absolutely need?

Pulling too many bytes might hurt aggregation.

When one skb is merged to another one (skb_try_coalesce()), if you have
payload in skb->head, we cannot free the head, and need one additional
'frag'

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 16:58     ` Jan Beulich
@ 2013-12-10 17:29       ` Paul Durrant
  2013-12-11  7:27         ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Durrant @ 2013-12-10 17:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss, David Miller,
	xen-devel, netdev@vger.kernel.org

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Jan Beulich
> Sent: 10 December 2013 16:58
> To: Paul Durrant
> Cc: David Vrabel; Ian Campbell; Wei Liu; Zoltan Kiss; David Miller; xen-devel;
> netdev@vger.kernel.org
> Subject: RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection
> in checksum setup
> 
> >>> On 10.12.13 at 17:24, Paul Durrant <Paul.Durrant@citrix.com> wrote:
> >>  -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: 10 December 2013 16:12
> >> To: Paul Durrant
> >> Cc: David Vrabel; Ian Campbell; Wei Liu; Zoltan Kiss; David Miller;
> > xen-devel;
> >> netdev@vger.kernel.org
> >> Subject: Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment
> detection
> >> in checksum setup
> >>
> >> >>> On 03.12.13 at 18:39, Paul Durrant <paul.durrant@citrix.com> wrote:
> >> >  static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> >> >  			     int recalculate_partial_csum)
> >> >  {
> >> > -	struct iphdr *iph = (void *)skb->data;
> >> > -	unsigned int header_size;
> >> >  	unsigned int off;
> >> > -	int err = -EPROTO;
> >> > +	bool fragment;
> >> > +	int err;
> >> > +
> >> > +	fragment = false;
> >> > +
> >> > +	err = maybe_pull_tail(skb,
> >> > +			      sizeof(struct iphdr),
> >> > +			      MAX_IP_HDR_LEN);
> >> > +	if (err < 0)
> >> > +		goto out;
> >> >
> >> > -	off = sizeof(struct iphdr);
> >> > +	if (ip_hdr(skb)->frag_off & htons(IP_OFFSET | IP_MF))
> >> > +		fragment = true;
> >>
> >> You don't seem to be using "fragment" anywhere.
> >>
> >> >
> >> > -	header_size = skb->network_header + off + MAX_IPOPTLEN;
> >> > -	maybe_pull_tail(skb, header_size);
> >> > +	off = ip_hdrlen(skb);
> >> >
> >> > -	off = iph->ihl * 4;
> >> > +	err = -EPROTO;
> >> >
> >> > -	switch (iph->protocol) {
> >> > +	switch (ip_hdr(skb)->protocol) {
> >> >  	case IPPROTO_TCP:
> >> >  		if (!skb_partial_csum_set(skb, off,
> >> >  					  offsetof(struct tcphdr, check)))
> >> >  			goto out;
> >> >
> >> >  		if (recalculate_partial_csum) {
> >> > -			struct tcphdr *tcph = tcp_hdr(skb);
> >> > -
> >> > -			header_size = skb->network_header +
> >> > -				off +
> >> > -				sizeof(struct tcphdr);
> >> > -			maybe_pull_tail(skb, header_size);
> >> > -
> >> > -			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph-
> >> >daddr,
> >> > -							 skb->len - off,
> >> > -							 IPPROTO_TCP, 0);
> >> > +			err = maybe_pull_tail(skb,
> >> > +					      off + sizeof(struct tcphdr),
> >> > +					      MAX_IP_HDR_LEN);
> >>
> >> Is it really necessary/worthwhile to specify MAX_IP_HDR_LEN
> >> here? Other than in the IPv6 case you're not risking to need
> >> another pull if you simply used off + sizeof(struct tcphdr) instead.
> >>
> >
> > Yes, I guess that's true but if we decide to pull up at all then is it
> > harmful to pull more than we absolutely need?
> 
> _If_ we manage to pull anything here, it means we weren't able to
> pull up to the max anyway, so it seems a little odd to try again.
> 

No, it may mean that we've not tried to pull at all until we get to this point because the IP header was already in the linear area. The whole point of the max argument is that we only try once.

> Another question: Don't the skb_partial_csum_set() calls require
> the respective pulls to have happened already?
> 

I'm not aware of that requirement. There's no specific comment to that effect that I can see and the code suggests that we only need to have pulled up as far as the end of the checksum location itself.

  Paul

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 17:29       ` Paul Durrant
@ 2013-12-11  7:27         ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2013-12-11  7:27 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Vrabel, Ian Campbell, Wei Liu, Zoltan Kiss, David Miller,
	xen-devel, netdev@vger.kernel.org

>>> On 10.12.13 at 18:29, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> Another question: Don't the skb_partial_csum_set() calls require
>> the respective pulls to have happened already?
>> 
> 
> I'm not aware of that requirement. There's no specific comment to that 
> effect that I can see and the code suggests that we only need to have pulled 
> up as far as the end of the checksum location itself.

Which we didn't - the previous pull only covered the IP header (and
in the IPv4 case even thereof only the basic, fixed size part).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Xen-devel] [PATCH net v5] xen-netback: fix fragment detection in checksum setup
  2013-12-10 17:12     ` Eric Dumazet
@ 2013-12-11  9:47       ` Ian Campbell
  0 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2013-12-11  9:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul Durrant, Jan Beulich, David Vrabel, Wei Liu, Zoltan Kiss,
	David Miller, xen-devel, netdev@vger.kernel.org

On Tue, 2013-12-10 at 09:12 -0800, Eric Dumazet wrote:
> On Tue, 2013-12-10 at 16:24 +0000, Paul Durrant wrote:
> 
> > Yes, I guess that's true but if we decide to pull up at all then is it harmful to pull more than we absolutely need?
> 
> Pulling too many bytes might hurt aggregation.
> 
> When one skb is merged to another one (skb_try_coalesce()), if you have
> payload in skb->head, we cannot free the head, and need one additional
> 'frag'

Does that ever happen to skb's which are received by a NIC and passed up
to the network stack (including forwarding by bridging of netfilter)? I
thought this path only happened for locally generated traffic.

If it can happen for received skb's then that seems like a pretty solid
reason not to over estimate in the pull up phase.

Ian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-12-11  9:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-03 17:39 [PATCH net v5] xen-netback: fix fragment detection in checksum setup Paul Durrant
2013-12-05  6:28 ` annie li
2013-12-05  9:08   ` Paul Durrant
2013-12-05 10:00     ` [Xen-devel] " annie li
2013-12-05 10:45       ` Paul Durrant
2013-12-05 13:11         ` annie li
2013-12-05 14:53 ` Wei Liu
2013-12-06  1:32 ` David Miller
2013-12-10 16:11 ` [Xen-devel] " Jan Beulich
2013-12-10 16:24   ` Paul Durrant
2013-12-10 16:58     ` Jan Beulich
2013-12-10 17:29       ` Paul Durrant
2013-12-11  7:27         ` Jan Beulich
2013-12-10 17:12     ` Eric Dumazet
2013-12-11  9:47       ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).