netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/3] net: Remote checksum offload for VXLAN
@ 2014-11-24 23:52 Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 1/3] net: Add remcsum_adjust as common function for remote checksum offload Tom Herbert
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tom Herbert @ 2014-11-24 23:52 UTC (permalink / raw)
  To: davem, netdev

This patch set adds support for remote checksum offload in VXLAN.

The remote checksum offload is generalized by creating a common
function (remcsum_adjust) that does the work of modifying the
checksum in remote checksum offload. This function can be called
from the normal or the GRO path. GUE is modified to use this function.

To support RCO in VXLAN we use the 9th bit in the reserved
flags to indicate remote checksum offload. The start and offset
values are encoded in a compressed form in the low order (reserved)
byte of the vni field.
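
For illustration only, the flag handling described above can be sketched
in userspace C. The two constants mirror VXLAN_F_REQ and VXLAN_F_RCO from
patch 3/3, written here in host byte order; the helper names
(vxlan_flags_ok, bit_from_msb) are hypothetical and not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-order copies of the flag values used in patch 3/3. */
#define F_REQ 0x08000000u	/* required VXLAN flag */
#define F_RCO 0x00400000u	/* proposed remote checksum offload flag */

/* Hypothetical helper: accept a flags word only if the required flag is
 * set and no bit outside the known set is present.
 */
static bool vxlan_flags_ok(uint32_t flags)
{
	return (flags & F_REQ) == F_REQ && !(flags & ~(F_REQ | F_RCO));
}

/* Hypothetical helper: position of a single-bit mask, counting the most
 * significant bit of the 32-bit flags word as bit 0.
 */
static int bit_from_msb(uint32_t mask)
{
	int pos = 0;

	while (mask && !(mask & 0x80000000u)) {
		mask <<= 1;
		pos++;
	}
	return pos;
}
```

With this numbering, bit_from_msb(F_RCO) evaluates to 9, which is the
"9th bit" referred to above when counting from zero.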

Remote checksum offload is described in
https://tools.ietf.org/html/draft-herbert-remotecsumoffload-01

Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

With UDP checksums and Remote Checksum Offload
  IPv4
      Client
        11.84% CPU utilization
      Server
        12.96% CPU utilization
      9197 Mbps
  IPv6
      Client
        12.46% CPU utilization
      Server
        14.48% CPU utilization
      8963 Mbps

With UDP checksums, no remote checksum offload
  IPv4
      Client
        15.67% CPU utilization
      Server
        14.83% CPU utilization
      9094 Mbps
  IPv6
      Client
        16.21% CPU utilization
      Server
        14.32% CPU utilization
      9058 Mbps
 
No UDP checksums
  IPv4
      Client
        15.03% CPU utilization
      Server
        23.09% CPU utilization
      9089 Mbps
  IPv6
      Client
        16.18% CPU utilization
      Server
        26.57% CPU utilization
      8954 Mbps

Tom Herbert (3):
  net: Add remcsum_adjust as common function for remote checksum offload
  gue: Call remcsum_adjust
  vxlan: Remote checksum offload

 drivers/net/vxlan.c          | 188 +++++++++++++++++++++++++++++++++++++++++--
 include/net/checksum.h       |  16 ++++
 include/net/vxlan.h          |   2 +
 include/uapi/linux/if_link.h |   1 +
 net/ipv4/fou.c               |  84 ++++---------------
 5 files changed, 216 insertions(+), 75 deletions(-)

-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 1/3] net: Add remcsum_adjust as common function for remote checksum offload
  2014-11-24 23:52 [PATCH net-next 0/3] net: Remote checksum offload for VXLAN Tom Herbert
@ 2014-11-24 23:52 ` Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 2/3] gue: Call remcsum_adjust Tom Herbert
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Tom Herbert @ 2014-11-24 23:52 UTC (permalink / raw)
  To: davem, netdev

This function does the work to update a checksum field as part of
remote checksum offload.

remcsum_adjust does the following:

1) Subtract out the calculated checksum from the beginning of the
   packet (ptr arg) to the start offset.
2) Adjust the checksum field indicated by offset based on the modified
   checksum value from above step.
3) Return the difference between the old checksum field value and the
   new one. The caller will use this to update skb->csum and NAPI csum.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/checksum.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
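
The three steps in the commit message above can be sketched in userspace
as follows. This is a simplified model, not the kernel code: it reads and
writes 16-bit words in big-endian order, and every name in it
(csum_add32, csum_sub32, csum_fold16, csum_partial_be,
remcsum_adjust_sketch) is hypothetical. It assumes the checksum field
holds the pseudo-header seed on entry, as with CHECKSUM_PARTIAL.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit ones-complement add with end-around carry. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint32_t r = a + b;

	return r + (r < b);
}

/* Ones-complement subtract: add the complement. */
static uint32_t csum_sub32(uint32_t a, uint32_t b)
{
	return csum_add32(a, ~b);
}

/* Fold to 16 bits and complement, like the kernel's csum_fold(). */
static uint16_t csum_fold16(uint32_t s)
{
	s = (s & 0xffffu) + (s >> 16);
	s = (s & 0xffffu) + (s >> 16);
	return (uint16_t)~s;
}

/* Sum len bytes as big-endian 16-bit words (odd tail is zero padded). */
static uint32_t csum_partial_be(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = csum_add32(sum, (uint32_t)((p[i] << 8) | p[i + 1]));
	if (len & 1)
		sum = csum_add32(sum, (uint32_t)(p[len - 1] << 8));
	return sum;
}

/* csum covers ptr..end of packet; start and offset are relative to ptr. */
static uint32_t remcsum_adjust_sketch(uint8_t *ptr, uint32_t csum,
				      size_t start, size_t offset)
{
	uint16_t old_field = (uint16_t)((ptr[offset] << 8) | ptr[offset + 1]);
	uint16_t new_field;

	/* Step 1: subtract out the checksum of the bytes before start */
	csum = csum_sub32(csum, csum_partial_be(ptr, start, 0));

	/* Step 2: store the derived checksum in the packet */
	new_field = csum_fold16(csum);
	ptr[offset] = (uint8_t)(new_field >> 8);
	ptr[offset + 1] = (uint8_t)(new_field & 0xff);

	/* Step 3: return the difference for the caller to apply */
	return csum_sub32(new_field, old_field);
}
```

After the call, summing the bytes from start to the end of the packet
together with the original seed should fold to zero, and adding the
returned delta to the original full-packet sum should match a fresh sum
over the modified packet.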

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 6465bae..e339a95 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -151,4 +151,20 @@ static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
 				 (__force __be32)to, pseudohdr);
 }
 
+static inline __wsum remcsum_adjust(void *ptr, __wsum csum,
+				    int start, int offset)
+{
+	__sum16 *psum = (__sum16 *)(ptr + offset);
+	__wsum delta;
+
+	/* Subtract out checksum up to start */
+	csum = csum_sub(csum, csum_partial(ptr, start, 0));
+
+	/* Set derived checksum in packet */
+	delta = csum_sub(csum_fold(csum), *psum);
+	*psum = csum_fold(csum);
+
+	return delta;
+}
+
 #endif
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 2/3] gue: Call remcsum_adjust
  2014-11-24 23:52 [PATCH net-next 0/3] net: Remote checksum offload for VXLAN Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 1/3] net: Add remcsum_adjust as common function for remote checksum offload Tom Herbert
@ 2014-11-24 23:52 ` Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 3/3] vxlan: Remote checksum offload Tom Herbert
  2014-11-25 18:50 ` [PATCH net-next 0/3] net: Remote checksum offload for VXLAN David Miller
  3 siblings, 0 replies; 8+ messages in thread
From: Tom Herbert @ 2014-11-24 23:52 UTC (permalink / raw)
  To: davem, netdev

Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/fou.c | 84 ++++++++++++----------------------------------------------
 1 file changed, 17 insertions(+), 67 deletions(-)
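
The optimization removed here relied on the fact that a correct IPv4
header sums to zero in ones-complement arithmetic, so it could be skipped
when summing up to start. A minimal userspace sketch of that property is
below; ones_sum and ones_sum_skip_ipv4 are hypothetical names, and words
are read big-endian for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Folded (uncomplemented) 16-bit ones-complement sum of len bytes. */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)(p[len - 1] << 8);
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Same sum, but skipping a leading IPv4 header of ihl * 4 bytes, in the
 * spirit of the removed "poffset = ip->ihl * 4" shortcut.
 */
static uint16_t ones_sum_skip_ipv4(const uint8_t *p, size_t len)
{
	size_t ihl_bytes = (size_t)(p[0] & 0x0f) * 4;

	if (ihl_bytes > len)
		return ones_sum(p, len);
	return ones_sum(p + ihl_bytes, len - ihl_bytes);
}
```

For a header whose checksum is valid, ones_sum() over the header alone
gives 0xffff (ones-complement zero), so both functions agree on any
buffer whose remaining bytes are not all zero. The saving is one short
header per packet, which matches the remark that it is not much of a win.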

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 3dfe982..b986298 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -64,15 +64,13 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 }
 
 static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr,
-				  void *data, int hdrlen, u8 ipproto)
+				  void *data, size_t hdrlen, u8 ipproto)
 {
 	__be16 *pd = data;
-	u16 start = ntohs(pd[0]);
-	u16 offset = ntohs(pd[1]);
-	u16 poffset = 0;
-	u16 plen;
-	__wsum csum, delta;
-	__sum16 *psum;
+	size_t start = ntohs(pd[0]);
+	size_t offset = ntohs(pd[1]);
+	size_t plen = hdrlen + max_t(size_t, offset + sizeof(u16), start);
+	__wsum delta;
 
 	if (skb->remcsum_offload) {
 		/* Already processed in GRO path */
@@ -80,35 +78,15 @@ static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr,
 		return guehdr;
 	}
 
-	if (start > skb->len - hdrlen ||
-	    offset > skb->len - hdrlen - sizeof(u16))
-		return NULL;
-
-	if (unlikely(skb->ip_summed != CHECKSUM_COMPLETE))
-		__skb_checksum_complete(skb);
-
-	plen = hdrlen + offset + sizeof(u16);
 	if (!pskb_may_pull(skb, plen))
 		return NULL;
 	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
 
-	if (ipproto == IPPROTO_IP && sizeof(struct iphdr) < plen) {
-		struct iphdr *ip = (struct iphdr *)(skb->data + hdrlen);
-
-		/* If next header happens to be IP we can skip that for the
-		 * checksum calculation since the IP header checksum is zero
-		 * if correct.
-		 */
-		poffset = ip->ihl * 4;
-	}
-
-	csum = csum_sub(skb->csum, skb_checksum(skb, poffset + hdrlen,
-						start - poffset - hdrlen, 0));
+	if (unlikely(skb->ip_summed != CHECKSUM_COMPLETE))
+		__skb_checksum_complete(skb);
 
-	/* Set derived checksum in packet */
-	psum = (__sum16 *)(skb->data + hdrlen + offset);
-	delta = csum_sub(csum_fold(csum), *psum);
-	*psum = csum_fold(csum);
+	delta = remcsum_adjust((void *)guehdr + hdrlen,
+			       skb->csum, start, offset);
 
 	/* Adjust skb->csum since we changed the packet */
 	skb->csum = csum_add(skb->csum, delta);
@@ -158,9 +136,6 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 
 	ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
 
-	/* Pull UDP header now, skb->data points to guehdr */
-	__skb_pull(skb, sizeof(struct udphdr));
-
 	/* Pull csum through the guehdr now . This can be used if
 	 * there is a remote checksum offload.
 	 */
@@ -188,7 +163,7 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (unlikely(guehdr->control))
 		return gue_control_message(skb, guehdr);
 
-	__skb_pull(skb, hdrlen);
+	__skb_pull(skb, sizeof(struct udphdr) + hdrlen);
 	skb_reset_transport_header(skb);
 
 	return -guehdr->proto_ctype;
@@ -248,24 +223,17 @@ static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 				      size_t hdrlen, u8 ipproto)
 {
 	__be16 *pd = data;
-	u16 start = ntohs(pd[0]);
-	u16 offset = ntohs(pd[1]);
-	u16 poffset = 0;
-	u16 plen;
-	void *ptr;
-	__wsum csum, delta;
-	__sum16 *psum;
+	size_t start = ntohs(pd[0]);
+	size_t offset = ntohs(pd[1]);
+	size_t plen = hdrlen + max_t(size_t, offset + sizeof(u16), start);
+	__wsum delta;
 
 	if (skb->remcsum_offload)
 		return guehdr;
 
-	if (start > skb_gro_len(skb) - hdrlen ||
-	    offset > skb_gro_len(skb) - hdrlen - sizeof(u16) ||
-	    !NAPI_GRO_CB(skb)->csum_valid || skb->remcsum_offload)
+	if (!NAPI_GRO_CB(skb)->csum_valid)
 		return NULL;
 
-	plen = hdrlen + offset + sizeof(u16);
-
 	/* Pull checksum that will be written */
 	if (skb_gro_header_hard(skb, off + plen)) {
 		guehdr = skb_gro_header_slow(skb, off + plen, off);
@@ -273,26 +241,8 @@ static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 			return NULL;
 	}
 
-	ptr = (void *)guehdr + hdrlen;
-
-	if (ipproto == IPPROTO_IP &&
-	    (hdrlen + sizeof(struct iphdr) < plen)) {
-		struct iphdr *ip = (struct iphdr *)(ptr + hdrlen);
-
-		/* If next header happens to be IP we can skip
-		 * that for the checksum calculation since the
-		 * IP header checksum is zero if correct.
-		 */
-		poffset = ip->ihl * 4;
-	}
-
-	csum = csum_sub(NAPI_GRO_CB(skb)->csum,
-			csum_partial(ptr + poffset, start - poffset, 0));
-
-	/* Set derived checksum in packet */
-	psum = (__sum16 *)(ptr + offset);
-	delta = csum_sub(csum_fold(csum), *psum);
-	*psum = csum_fold(csum);
+	delta = remcsum_adjust((void *)guehdr + hdrlen,
+			       NAPI_GRO_CB(skb)->csum, start, offset);
 
 	/* Adjust skb->csum since we changed the packet */
 	skb->csum = csum_add(skb->csum, delta);
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 3/3] vxlan: Remote checksum offload
  2014-11-24 23:52 [PATCH net-next 0/3] net: Remote checksum offload for VXLAN Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 1/3] net: Add remcsum_adjust as common function for remote checksum offload Tom Herbert
  2014-11-24 23:52 ` [PATCH net-next 2/3] gue: Call remcsum_adjust Tom Herbert
@ 2014-11-24 23:52 ` Tom Herbert
  2014-11-25  1:06   ` Jesse Gross
  2014-11-25 18:50 ` [PATCH net-next 0/3] net: Remote checksum offload for VXLAN David Miller
  3 siblings, 1 reply; 8+ messages in thread
From: Tom Herbert @ 2014-11-24 23:52 UTC (permalink / raw)
  To: davem, netdev

Add support for remote checksum offload in VXLAN. This commandeers a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.

Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/vxlan.c          | 188 +++++++++++++++++++++++++++++++++++++++++--
 include/net/vxlan.h          |   2 +
 include/uapi/linux/if_link.h |   1 +
 3 files changed, 183 insertions(+), 8 deletions(-)
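
The encoding described in the commit message above can be sketched in
userspace like this. The constants follow VXLAN_RCO_MASK, VXLAN_RCO_UDP
and VXLAN_RCO_SHIFT from the patch; rco_encode and rco_decode are
hypothetical helpers, and the 6 and 16 byte field offsets stand in for
offsetof(struct udphdr, check) and offsetof(struct tcphdr, check).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RCO_MASK	0x7fu	/* low seven bits hold start >> 1 */
#define RCO_UDP		0x80u	/* set: UDP checksum field, clear: TCP */
#define RCO_SHIFT	1
#define UDP_CHECK_OFF	6	/* offsetof(struct udphdr, check) */
#define TCP_CHECK_OFF	16	/* offsetof(struct tcphdr, check) */

/* Pack start and offset into the low byte of the VNI word. Returns false
 * if the pair cannot be represented (odd start, start above 254, or a
 * checksum field that is neither the TCP nor the UDP one).
 */
static bool rco_encode(size_t start, size_t offset, uint8_t *out)
{
	if (start > (RCO_MASK << RCO_SHIFT) ||
	    (start & ((1u << RCO_SHIFT) - 1)) || offset < start)
		return false;

	if (offset - start == UDP_CHECK_OFF)
		*out = (uint8_t)((start >> RCO_SHIFT) | RCO_UDP);
	else if (offset - start == TCP_CHECK_OFF)
		*out = (uint8_t)(start >> RCO_SHIFT);
	else
		return false;

	return true;
}

/* Recover start and offset from the low byte of the VNI word. */
static void rco_decode(uint8_t data, size_t *start, size_t *offset)
{
	*start = (size_t)(data & RCO_MASK) << RCO_SHIFT;
	*offset = *start + ((data & RCO_UDP) ? UDP_CHECK_OFF : TCP_CHECK_OFF);
}
```

As an example, an inner Ethernet header (14 bytes) followed by an IPv4
header without options (20 bytes) gives start = 34; for TCP this encodes
as 0x11 and for UDP as 0x91.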

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e9f81d4..763f95e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -65,7 +65,17 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 #define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
 
-#define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
+#define VXLAN_F_REQ	htonl(0x08000000) /* Required VXLAN flag */
+#define VXLAN_F_RCO	htonl(0x00400000) /* Remote checksum offload */
+
+#define VXLAN_MANDATORY_FLAGS (VXLAN_F_REQ)
+#define VXLAN_ALL_FLAGS (VXLAN_F_REQ | VXLAN_F_RCO)
+
+#define VXLAN_RCO_MASK	0x7f	/* Last byte of vni field */
+#define VXLAN_RCO_UDP	0x80	/* Indicate UDP RCO (TCP when not set) */
+#define VXLAN_RCO_SHIFT	1	/* Left shift of start */
+#define VXLAN_RCO_SHIFT_MASK ((1 << VXLAN_RCO_SHIFT) - 1)
+#define VXLAN_MAX_REMCSUM_START (VXLAN_RCO_MASK << VXLAN_RCO_SHIFT)
 
 /* UDP port for VXLAN traffic.
  * The IANA assigned port is 4789, but the Linux default is 8472
@@ -545,6 +555,46 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	return 1;
 }
 
+static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
+					  unsigned int off,
+					  struct vxlanhdr *vh, size_t hdrlen,
+					  u32 data)
+{
+	size_t start, offset, plen;
+	__wsum delta;
+
+	if (skb->remcsum_offload)
+		return vh;
+
+	if (!NAPI_GRO_CB(skb)->csum_valid)
+		return NULL;
+
+	start = (data & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
+	offset = start + ((data & VXLAN_RCO_UDP) ?
+			  offsetof(struct udphdr, check) :
+			  offsetof(struct tcphdr, check));
+
+	plen = hdrlen + offset + sizeof(u16);
+
+	/* Pull checksum that will be written */
+	if (skb_gro_header_hard(skb, off + plen)) {
+		vh = skb_gro_header_slow(skb, off + plen, off);
+		if (!vh)
+			return NULL;
+	}
+
+	delta = remcsum_adjust((void *)vh + hdrlen,
+			       NAPI_GRO_CB(skb)->csum, start, offset);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+	NAPI_GRO_CB(skb)->csum = csum_add(NAPI_GRO_CB(skb)->csum, delta);
+
+	skb->remcsum_offload = 1;
+
+	return vh;
+}
+
 static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
 	struct sk_buff *p, **pp = NULL;
@@ -566,6 +616,14 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
 	skb_gro_postpull_rcsum(skb, vh, sizeof(struct vxlanhdr));
 
+	if (vh->vx_flags & VXLAN_F_RCO) {
+		vh = vxlan_gro_remcsum(skb, off_vx, vh, sizeof(struct vxlanhdr),
+				       ntohl(vh->vx_vni));
+
+		if (!vh)
+			goto out;
+	}
+
 	off_eth = skb_gro_offset(skb);
 	hlen = off_eth + sizeof(*eh);
 	eh   = skb_gro_header_fast(skb, off_eth);
@@ -1131,6 +1189,42 @@ static void vxlan_igmp_leave(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
+				      size_t hdrlen, u32 data)
+{
+	size_t start, offset, plen;
+	__wsum delta;
+
+	if (skb->remcsum_offload) {
+		/* Already processed in GRO path */
+		skb->remcsum_offload = 0;
+		return vh;
+	}
+
+	start = (data & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
+	offset = start + ((data & VXLAN_RCO_UDP) ?
+			  offsetof(struct udphdr, check) :
+			  offsetof(struct tcphdr, check));
+
+	plen = hdrlen + offset + sizeof(u16);
+
+	if (!pskb_may_pull(skb, plen))
+		return NULL;
+
+	vh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
+
+	if (unlikely(skb->ip_summed != CHECKSUM_COMPLETE))
+		__skb_checksum_complete(skb);
+
+	delta = remcsum_adjust((void *)vh + hdrlen,
+			       skb->csum, start, offset);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+
+	return vh;
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -1143,8 +1237,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	/* Return packets with reserved bits set */
 	vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
-	if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
-	    (vxh->vx_vni & htonl(0xff))) {
+	if ((vxh->vx_flags & VXLAN_MANDATORY_FLAGS) != VXLAN_MANDATORY_FLAGS ||
+	    (vxh->vx_flags & ~VXLAN_ALL_FLAGS)) {
 		netdev_dbg(skb->dev, "invalid vxlan flags=%#x vni=%#x\n",
 			   ntohl(vxh->vx_flags), ntohl(vxh->vx_vni));
 		goto error;
@@ -1152,6 +1246,14 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
+	vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
+
+	if (vxh->vx_flags & VXLAN_F_RCO) {
+		vxh = vxlan_remcsum(skb, vxh, sizeof(struct vxlanhdr),
+				    ntohl(vxh->vx_vni));
+		if (!vxh)
+			goto drop;
+	}
 
 	vs = rcu_dereference_sk_user_data(sk);
 	if (!vs)
@@ -1577,8 +1679,23 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
+	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	u16 hdrlen = sizeof(struct vxlanhdr);
+
+	if ((vs->flags & VXLAN_F_REMCSUM) &&
+	    skb->ip_summed == CHECKSUM_PARTIAL) {
+		int csum_start = skb_checksum_start_offset(skb);
+
+		if (csum_start <= VXLAN_MAX_REMCSUM_START &&
+		    !(csum_start & VXLAN_RCO_SHIFT_MASK) &&
+		    (skb->csum_offset == offsetof(struct udphdr, check) ||
+		     skb->csum_offset == offsetof(struct tcphdr, check))) {
+			udp_sum = false;
+			type |= SKB_GSO_TUNNEL_REMCSUM;
+		}
+	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	skb = iptunnel_handle_offloads(skb, udp_sum, type);
 	if (IS_ERR(skb))
 		return -EINVAL;
 
@@ -1598,9 +1715,25 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 		return -ENOMEM;
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_flags = VXLAN_MANDATORY_FLAGS;
 	vxh->vx_vni = vni;
 
+	if (type & SKB_GSO_TUNNEL_REMCSUM) {
+		u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
+			   VXLAN_RCO_SHIFT;
+
+		if (skb->csum_offset == offsetof(struct udphdr, check))
+			data |= VXLAN_RCO_UDP;
+
+		vxh->vx_vni |= htonl(data);
+		vxh->vx_flags |= VXLAN_F_RCO;
+
+		if (!skb_is_gso(skb)) {
+			skb->ip_summed = CHECKSUM_NONE;
+			skb->encapsulation = 0;
+		}
+	}
+
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
@@ -1618,8 +1751,23 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
+	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	u16 hdrlen = sizeof(struct vxlanhdr);
+
+	if ((vs->flags & VXLAN_F_REMCSUM) &&
+	    skb->ip_summed == CHECKSUM_PARTIAL) {
+		int csum_start = skb_checksum_start_offset(skb);
+
+		if (csum_start <= VXLAN_MAX_REMCSUM_START &&
+		    !(csum_start & VXLAN_RCO_SHIFT_MASK) &&
+		    (skb->csum_offset == offsetof(struct udphdr, check) ||
+		     skb->csum_offset == offsetof(struct tcphdr, check))) {
+			udp_sum = false;
+			type |= SKB_GSO_TUNNEL_REMCSUM;
+		}
+	}
 
-	skb = udp_tunnel_handle_offloads(skb, udp_sum);
+	skb = iptunnel_handle_offloads(skb, udp_sum, type);
 	if (IS_ERR(skb))
 		return -EINVAL;
 
@@ -1637,9 +1785,25 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 		return -ENOMEM;
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_flags = VXLAN_MANDATORY_FLAGS;
 	vxh->vx_vni = vni;
 
+	if (type & SKB_GSO_TUNNEL_REMCSUM) {
+		u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
+			   VXLAN_RCO_SHIFT;
+
+		if (skb->csum_offset == offsetof(struct udphdr, check))
+			data |= VXLAN_RCO_UDP;
+
+		vxh->vx_vni |= htonl(data);
+		vxh->vx_flags |= VXLAN_F_RCO;
+
+		if (!skb_is_gso(skb)) {
+			skb->ip_summed = CHECKSUM_NONE;
+			skb->encapsulation = 0;
+		}
+	}
+
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
@@ -2229,6 +2393,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_REMCSUM]	= { .type = NLA_U8 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2350,6 +2515,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	atomic_set(&vs->refcnt, 1);
 	vs->rcv = rcv;
 	vs->data = data;
+	vs->flags = flags;
 
 	/* Initialize the vxlan udp offloads structure */
 	vs->udp_offloads.port = port;
@@ -2547,6 +2713,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_REMCSUM] && nla_get_u8(data[IFLA_VXLAN_REMCSUM]))
+		vxlan->flags |= VXLAN_F_REMCSUM;
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
@@ -2615,6 +2784,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_UDP_CSUM */
 		nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_UDP_ZERO_CSUM6_TX */
 		nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_UDP_ZERO_CSUM6_RX */
+		nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_REMCSUM */
 		0;
 }
 
@@ -2680,7 +2850,9 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u8(skb, IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 			!!(vxlan->flags & VXLAN_F_UDP_ZERO_CSUM6_TX)) ||
 	    nla_put_u8(skb, IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
-			!!(vxlan->flags & VXLAN_F_UDP_ZERO_CSUM6_RX)))
+			!!(vxlan->flags & VXLAN_F_UDP_ZERO_CSUM6_RX)) ||
+	    nla_put_u8(skb, IFLA_VXLAN_REMCSUM,
+			!!(vxlan->flags & VXLAN_F_REMCSUM)))
 		goto nla_put_failure;
 
 	if (nla_put(skb, IFLA_VXLAN_PORT_RANGE, sizeof(ports), &ports))
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 57cccd0..652e0dc 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -28,6 +28,7 @@ struct vxlan_sock {
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 	atomic_t	  refcnt;
 	struct udp_offload udp_offloads;
+	u32		  flags;
 };
 
 #define VXLAN_F_LEARN			0x01
@@ -39,6 +40,7 @@ struct vxlan_sock {
 #define VXLAN_F_UDP_CSUM		0x40
 #define VXLAN_F_UDP_ZERO_CSUM6_TX	0x80
 #define VXLAN_F_UDP_ZERO_CSUM6_RX	0x100
+#define VXLAN_F_REMCSUM			0x200
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 				  vxlan_rcv_t *rcv, void *data,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 7072d83..d9e31e2 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -353,6 +353,7 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_REMCSUM,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
-- 
2.1.0.rc2.206.gedb03e5

* Re: [PATCH net-next 3/3] vxlan: Remote checksum offload
  2014-11-24 23:52 ` [PATCH net-next 3/3] vxlan: Remote checksum offload Tom Herbert
@ 2014-11-25  1:06   ` Jesse Gross
  2014-11-25  2:50     ` Tom Herbert
  0 siblings, 1 reply; 8+ messages in thread
From: Jesse Gross @ 2014-11-25  1:06 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev

On Mon, Nov 24, 2014 at 3:52 PM, Tom Herbert <therbert@google.com> wrote:
> Add support for remote checksum offload in VXLAN. This commandeers a
> reserved bit to indicate that RCO is being done, and uses the low order
> reserved eight bits of the VNI to hold the start and offset values in a
> compressed manner.

Why do you think that this is OK for you to do? It's clear that there
is no consensus for this (and in fact there are other proposals that
use that bit in a different way).

* Re: [PATCH net-next 3/3] vxlan: Remote checksum offload
  2014-11-25  1:06   ` Jesse Gross
@ 2014-11-25  2:50     ` Tom Herbert
  2014-11-25 18:54       ` Jesse Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Herbert @ 2014-11-25  2:50 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, netdev

On Mon, Nov 24, 2014 at 5:06 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Nov 24, 2014 at 3:52 PM, Tom Herbert <therbert@google.com> wrote:
>> Add support for remote checksum offload in VXLAN. This commandeers a
>> reserved bit to indicate that RCO is being done, and uses the low order
>> reserved eight bits of the VNI to hold the start and offset values in a
>> compressed manner.
>
> Why do you think that this is OK for you to do? It's clear that there
> is no consensus for this (and in fact there are other proposals that
> use that bit in a different way).

I asked on the nvo3 list (which I believe is the appropriate forum) what
the best way to do this is but haven't gotten any response. I will ask
again; I would assume that an implementation and data in hand
might be a better basis for discussion.

The flag bit is currently unused in the Linux implementation, so I
don't think it can break anything as of now. I suppose we could make
RCO for VXLAN a config option and possibly change to use a different
bit if consensus is reached on the right approach in the future.

Tom

* Re: [PATCH net-next 0/3] net: Remote checksum offload for VXLAN
  2014-11-24 23:52 [PATCH net-next 0/3] net: Remote checksum offload for VXLAN Tom Herbert
                   ` (2 preceding siblings ...)
  2014-11-24 23:52 ` [PATCH net-next 3/3] vxlan: Remote checksum offload Tom Herbert
@ 2014-11-25 18:50 ` David Miller
  3 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2014-11-25 18:50 UTC (permalink / raw)
  To: therbert; +Cc: netdev

From: Tom Herbert <therbert@google.com>
Date: Mon, 24 Nov 2014 15:52:27 -0800

> This patch set adds support for remote checksum offload in VXLAN.
> 
> The remote checksum offload is generalized by creating a common
> function (remcsum_adjust) that does the work of modifying the
> checksum in remote checksum offload. This function can be called
> from the normal or the GRO path. GUE is modified to use this function.
> 
> To support RCO in VXLAN we use the 9th bit in the reserved
> flags to indicate remote checksum offload. The start and offset
> values are encoded in a compressed form in the low order (reserved)
> byte of the vni field.
> 
> Remote checksum offload is described in
> https://tools.ietf.org/html/draft-herbert-remotecsumoffload-01
> 
> Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).

What to do with the reserved bit seems to still be up in the air,
so I've marked this series as 'deferred'.

* Re: [PATCH net-next 3/3] vxlan: Remote checksum offload
  2014-11-25  2:50     ` Tom Herbert
@ 2014-11-25 18:54       ` Jesse Gross
  0 siblings, 0 replies; 8+ messages in thread
From: Jesse Gross @ 2014-11-25 18:54 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev

On Mon, Nov 24, 2014 at 6:50 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Nov 24, 2014 at 5:06 PM, Jesse Gross <jesse@nicira.com> wrote:
>> On Mon, Nov 24, 2014 at 3:52 PM, Tom Herbert <therbert@google.com> wrote:
>>> Add support for remote checksum offload in VXLAN. This commandeers a
>>> reserved bit to indicate that RCO is being done, and uses the low order
>>> reserved eight bits of the VNI to hold the start and offset values in a
>>> compressed manner.
>>
>> Why do you think that this is OK for you to do? It's clear that there
>> is no consensus for this (and in fact there are other proposals that
>> use that bit in a different way).
>
> I asked on the nvo3 list (which I believe is the appropriate forum) what
> the best way to do this is but haven't gotten any response. I will ask
> again; I would assume that an implementation and data in hand
> might be a better basis for discussion.
>
> The flag bit is currently unused in the Linux implementation, so I
> don't think it can break anything as of now. I suppose we could make
> RCO for VXLAN a config option and possibly change to use a different
> bit if consensus is reached on the right approach in the future.

This will definitely break things if this is applied now and the bit
is later used for a different purpose, as there will be
no way to update existing deployments.

There are a ton of conflicting proposals in this space so I think
there are only two possible solutions at this point:
 * Potentially support all of them and choose a variant at runtime
through a series of configuration options. This seems ugly,
particularly for GRO.
 * Stick to the version described in the RFC.

I don't think the third alternative of protocol design by order of
patch submission is viable.
