[PATCH net-next v2 0/2] NSH and VxLAN-GPE

* [PATCH net-next v2 0/2] NSH and VxLAN-GPE
@ 2016-02-11 19:57 Brian Russell
  2016-02-11 19:57 ` [PATCH net-next v2 1/2] nsh: encapsulation module Brian Russell
  2016-02-11 19:57 ` [PATCH net-next v2 2/2] vxlan: support GPE/NSH Brian Russell
  0 siblings, 2 replies; 10+ messages in thread
From: Brian Russell @ 2016-02-11 19:57 UTC
  To: netdev

These patches add a new module to support encap/decap of Network
Service Header (NSH) as defined in:

https://tools.ietf.org/html/draft-ietf-sfc-nsh-01

Both NSH Type 1 and Type 2 metadata are supported with a simple registration
hook to allow listeners to register to see packets with Type 1 or a specific
class of Type 2 metadata. NSH could be added to packets sent over a variety
of link types, e.g. VxLAN, GRE, Ethernet.
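
As an illustration only (not part of the patches), the 8-byte base header plus
service path header can be sketched in userspace C. The struct and function
names below are hypothetical; the field layout, the 24-bit SPI / 8-bit SI
split and the Type 1 length of 6 words follow the header description in
include/net/nsh.h from patch 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* htonl, ntohl */

#define NSH_MD_TYPE_1	1
#define NSH_LEN_TYPE_1	6	/* 2 words of fixed header + 4 context words */

/* Hypothetical userspace mirror of the base + service path header. */
struct nsh_hdr {
	uint8_t  base_flags;	/* Ver, O and C bits */
	uint8_t  length;	/* total header length in 4-byte words */
	uint8_t  md_type;
	uint8_t  next_proto;
	uint32_t sp_header;	/* network order: SPI (24 bits) | SI (8 bits) */
};

/* Fill a Type 1 header; returns -1 if the SPI does not fit in 24 bits. */
static int nsh_hdr_fill(struct nsh_hdr *h, uint32_t spi, uint8_t si, uint8_t np)
{
	if (spi >= (1u << 24))
		return -1;

	memset(h, 0, sizeof(*h));	/* Ver 0, OAM 0, no critical TLV */
	h->length = NSH_LEN_TYPE_1;
	h->md_type = NSH_MD_TYPE_1;
	h->next_proto = np;
	h->sp_header = htonl((spi << 8) | si);
	return 0;
}

/* Extract the Service Path ID (upper 24 bits of the SP header). */
static uint32_t nsh_hdr_spi(const struct nsh_hdr *h)
{
	return ntohl(h->sp_header) >> 8;
}

/* Extract the Service Index (lower 8 bits of the SP header). */
static uint8_t nsh_hdr_si(const struct nsh_hdr *h)
{
	return ntohl(h->sp_header) & 0xff;
}
```

A caller would fill the header with nsh_hdr_fill(&h, spi, si, 1) for an IPv4
payload and read the fields back with nsh_hdr_spi() and nsh_hdr_si(); a
round trip can be checked with assert().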

Also included is an extension to VxLAN to handle the Generic Protocol
Extension (GPE) as defined in:

https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01

This allows multi-protocol encapsulation over VxLAN, so IPv4, IPv6, MPLS
and NSH encapsulated packets can be sent and received in addition to Ethernet
frames. Non-Ethernet frames are sent to the default destination, which
requires that the remote option is specified when creating the VxLAN device.
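
A rough userspace sketch of the protocol dispatch this implies is shown here.
It is not taken from patch 2: the function names are hypothetical and the
Next Protocol numbers are assumed from the VxLAN-GPE draft, so they should be
checked against the draft revision in use. Only the NSH EtherType (0x894F)
comes from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Next Protocol values, assumed from the VxLAN-GPE draft. */
enum gpe_next_proto {
	GPE_NP_IPV4 = 1,
	GPE_NP_IPV6 = 2,
	GPE_NP_ETH  = 3,
	GPE_NP_NSH  = 4,
	GPE_NP_MPLS = 5,
};

/* Return the EtherType for a GPE Next Protocol value, or 0 if unknown. */
static uint16_t gpe_np_to_ethertype(uint8_t np)
{
	switch (np) {
	case GPE_NP_IPV4: return 0x0800;
	case GPE_NP_IPV6: return 0x86DD;
	case GPE_NP_ETH:  return 0x6558;	/* transparent Ethernet bridging */
	case GPE_NP_NSH:  return 0x894F;	/* ETH_P_NSH as added by patch 1 */
	case GPE_NP_MPLS: return 0x8847;	/* MPLS unicast */
	default:          return 0;
	}
}

/* Per the cover letter, non-Ethernet frames go to the default destination. */
static int gpe_uses_default_remote(uint8_t np)
{
	return np != GPE_NP_ETH;
}
```

Returning 0 for an unknown value lets the receive path drop the packet
rather than guess at the inner protocol.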

I've tested this using a netfilter module that encapsulates some app-specific
metadata in NSH Type 2 and sends it over VxLAN, and a listener module that
receives the corresponding decapsulated metadata.

I'm also submitting a corresponding patch for iproute2 to add the gpe option
to the "ip link add type vxlan" command.

v2 - fix copyright notices and tidy up use of types

Brian Russell (2):
  nsh: encapsulation module
  vxlan: support GPE/NSH

 drivers/net/vxlan.c           | 139 ++++++++++++++--
 include/net/nsh.h             | 161 +++++++++++++++++++
 include/net/vxlan.h           |  40 ++++-
 include/uapi/linux/if_ether.h |   1 +
 include/uapi/linux/if_link.h  |   1 +
 net/ipv4/Kconfig              |  10 ++
 net/ipv4/Makefile             |   1 +
 net/ipv4/nsh.c                | 365 ++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 704 insertions(+), 14 deletions(-)
 create mode 100644 include/net/nsh.h
 create mode 100644 net/ipv4/nsh.c

-- 
2.1.4

* [PATCH net-next v2 1/2] nsh: encapsulation module
  2016-02-11 19:57 [PATCH net-next v2 0/2] NSH and VxLAN-GPE Brian Russell
@ 2016-02-11 19:57 ` Brian Russell
  2016-02-15 17:01   ` Jiri Benc
  2016-02-17  3:31   ` Alexei Starovoitov
  2016-02-11 19:57 ` [PATCH net-next v2 2/2] vxlan: support GPE/NSH Brian Russell
  1 sibling, 2 replies; 10+ messages in thread
From: Brian Russell @ 2016-02-11 19:57 UTC
  To: netdev

Support encap/decap of Network Service Header (NSH) as defined in
https://tools.ietf.org/html/draft-ietf-sfc-nsh-01

Includes support for Type 1 and Type 2 metadata and a simple registration
for listeners to see decapsulated packets based on the Type/Class.
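
For illustration only, a Type 2 metadata walk can be sketched in userspace C
as below. This is not the code in the patch: the names are hypothetical, the
bounds checks are an assumption about what a robust parser would want, and
the 5-bit Len field and the C bit in the Type byte follow the TLV diagram in
include/net/nsh.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSH_TYPE_CRIT	0x80	/* C bit, MSB of the Type byte */

/* Hypothetical decoded view of one Type 2 TLV. */
struct nsh_tlv {
	uint16_t tlv_class;
	uint8_t  type;
	uint8_t  crit;
	uint8_t  len;		/* metadata length in 4-byte words */
	const uint8_t *data;	/* points into the caller's buffer */
};

/*
 * Walk `words` 4-byte words of Type 2 metadata starting at buf.
 * Returns the number of TLVs stored in out[], or -1 if there are more
 * than `max` TLVs or a TLV claims more data than remains.
 */
static int nsh_walk_tlvs(const uint8_t *buf, size_t words,
			 struct nsh_tlv *out, size_t max)
{
	size_t n = 0;

	while (words > 0) {
		uint8_t len;

		if (n >= max)		/* out[] holds exactly max entries */
			return -1;

		len = buf[3] & 0x1f;	/* low 5 bits; top 3 are reserved */
		words--;		/* consume the TLV header word */
		if (len > words)	/* metadata would overrun the header */
			return -1;

		out[n].tlv_class = (uint16_t)((buf[0] << 8) | buf[1]);
		out[n].crit = (buf[2] & NSH_TYPE_CRIT) ? 1 : 0;
		out[n].type = buf[2] & 0x7f;
		out[n].len = len;
		out[n].data = buf + 4;

		buf += 4 * (1 + (size_t)len);
		words -= len;
		n++;
	}
	return (int)n;
}
```

The walk leaves the metadata bytes in place and only records pointers, which
matches the no-allocation approach described for nsh_decap().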

Signed-off-by: Brian Russell <brussell@brocade.com>
---
 include/net/nsh.h             | 161 +++++++++++++++++++
 include/uapi/linux/if_ether.h |   1 +
 net/ipv4/Kconfig              |  10 ++
 net/ipv4/Makefile             |   1 +
 net/ipv4/nsh.c                | 365 ++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 538 insertions(+)
 create mode 100644 include/net/nsh.h
 create mode 100644 net/ipv4/nsh.c

diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 0000000..8abf5f5
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,161 @@
+/*
+ * Network Service Header (NSH) inserted onto encapsulated packets
+ * or frames to realize service function paths.
+ * NSH also provides a mechanism for metadata exchange along the
+ * instantiated service path.
+ *
+ * https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
+ *
+ * Copyright (c) 2016 by Brocade Communications Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __NET_NSH_H
+#define __NET_NSH_H
+
+#include <linux/types.h>
+#include <linux/skbuff.h>
+
+/*
+ * NSH Base Header + Service Path Header
+ *
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |Ver|O|C|R|R|R|R|R|R|   Length  |    MD Type    | Next Protocol |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |          Service Path ID                      | Service Index |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Ver - Version, set to 0
+ * O - Indicates payload is OAM.
+ * C - Indicates critical metadata TLV is present (must be 0 for MD type 1).
+ * Length - total header length in 4-byte words.
+ * MD Type - Metadata type
+ *           Type 1 - 4 mandatory 4 byte context headers.
+ *           Type 2 - 0 or more var length context headers.
+ * Next Protocol - protocol type of original packet.
+ * Service Path ID (SPI) - identifies a service path. Participating nodes
+ *                         MUST use this identifier for Service Function
+ *                         Path selection.
+ * Service Index (SI) - provides location within the SFP.
+ */
+#define NSH_BF_VER0     0
+#define NSH_BF_VER_MASK 0xc0
+#define NSH_BF_OAM      BIT(5)
+#define NSH_BF_CRIT     BIT(4)
+#define NSH_N_SPI       (1u << 24)
+#define NSH_SPI_MASK    ((NSH_N_SPI-1) << 8)
+#define NSH_N_SI        (1u << 8)
+#define NSH_SI_MASK     (NSH_N_SI-1)
+
+#define NSH_MD_TYPE_1   1
+#define NSH_MD_TYPE_2   2
+
+#define NSH_NEXT_PROTO_IPv4 1
+#define NSH_NEXT_PROTO_IPv6 2
+#define NSH_NEXT_PROTO_ETH  3
+
+#define NSH_LEN_TYPE_1     6
+#define NSH_LEN_TYPE_2_MIN 2
+
+struct nsh_base {
+	u8 base_flags;
+	u8 length;
+	u8 md_type;
+	u8 next_proto;
+};
+
+struct nsh_header {
+	struct nsh_base base;
+	__be32 sp_header;
+};
+
+/*
+ * When the Base Header specifies MD Type 1, four 4-byte Context Headers
+ * MUST be added immediately following the Service Path Header. Thus length
+ * in the base header is set to 6.
+ * Context Headers that carry no metadata MUST be set to zero.
+ */
+#define NSH_MD_TYPE_1_NUM_HDRS 4
+
+struct nsh_md_type_1 {
+	__be32 ctx_hdr1;
+	__be32 ctx_hdr2;
+	__be32 ctx_hdr3;
+	__be32 ctx_hdr4;
+};
+
+/*
+ * When the Base Header specifies MD Type 2, zero or more variable
+ * length Context Headers follow the Service Path Header.
+ *
+ *     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *     |          TLV Class            |C|    Type     |R|R|R|   Len   |
+ *     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *     |                      Variable Metadata                        |
+ *     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * TLV Class - Scope of class (e.g. may be vendor or standards body).
+ * Type - Specific type of information within the scope of given class.
+ *        C bit (MSB) indicates criticality. When set, receiver must process.
+ * Len - Length of variable metadata in 4-byte words.
+ */
+#define NSH_TYPE_CRIT BIT(7)
+
+struct nsh_md_type_2 {
+	__be16 tlv_class;
+	u8 tlv_type;
+	u8 length;
+};
+
+/*
+ * Context header for encap/decap.
+ */
+#define NSH_MD_CLASS_TYPE_1 USHRT_MAX
+#define NSH_MD_TYPE_TYPE_1  U8_MAX
+#define NSH_MD_LEN_TYPE_1   4
+
+struct nsh_metadata {
+	u16 class;
+	u8 crit;
+	u8 type;
+	u8 len;  /* 4 byte words */
+	void *data;
+};
+
+/*
+ * Parse NSH header and notify registered listeners about any metadata.
+ */
+int nsh_decap(struct sk_buff *skb,
+	      u32 *spi,
+	      u8 *si,
+	      u8 *np);
+
+/*
+ * Add NSH header.
+ */
+int nsh_encap(struct sk_buff *skb,
+	      u32 spi,
+	      u8 si,
+	      u8 np,
+	      unsigned int num_ctx_hdrs,
+	      struct nsh_metadata *ctx_hdrs);
+
+
+/* Register hooks to be informed of nsh metadata of specified class */
+struct nsh_listener {
+	struct list_head list;
+	u16 class;
+	unsigned char max_ctx_hdrs;
+	int (*notify)(struct sk_buff *skb,
+		      u32 service_path_id,
+		      u8 service_index,
+		      u8 next_proto,
+		      struct nsh_metadata *ctx_hdrs,
+		      unsigned int num_ctx_hdrs);
+};
+
+int nsh_register_listener(struct nsh_listener *listener);
+int nsh_unregister_listener(struct nsh_listener *listener);
+#endif /* __NET_NSH_H */
diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index ea9221b..eb512b1 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -91,6 +91,7 @@
 #define ETH_P_TDLS	0x890D          /* TDLS */
 #define ETH_P_FIP	0x8914		/* FCoE Initialization Protocol */
 #define ETH_P_80221	0x8917		/* IEEE 802.21 Media Independent Handover Protocol */
+#define ETH_P_NSH       0x894F          /* Network Service Header */
 #define ETH_P_LOOPBACK	0x9000		/* Ethernet loopback packet, per IEEE 802.3 */
 #define ETH_P_QINQ1	0x9100		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
 #define ETH_P_QINQ2	0x9200		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 7758247..37c8c23 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -212,6 +212,16 @@ config NET_IPGRE_BROADCAST
 	  Network), but can be distributed all over the Internet. If you want
 	  to do that, say Y here and to "IP multicast routing" below.
 
+config NET_NSH
+        tristate 'Network Service Header Encapsulation'
+        help
+          Network Service Header (NSH) inserted onto
+          encapsulated packets or frames to realize service function paths.
+          NSH also provides a mechanism for metadata exchange along the
+          instantiated service path.
+
+          To compile it as a module, choose M here.  If unsure, say N.
+
 config IP_MROUTE
 	bool "IP: multicast routing"
 	depends on IP_MULTICAST
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 62c049b..46d65f8 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -24,6 +24,7 @@ gre-y := gre_demux.o
 obj-$(CONFIG_NET_FOU) += fou.o
 obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o
 obj-$(CONFIG_NET_IPGRE) += ip_gre.o
+obj-$(CONFIG_NET_NSH) += nsh.o
 obj-$(CONFIG_NET_UDP_TUNNEL) += udp_tunnel.o
 obj-$(CONFIG_NET_IPVTI) += ip_vti.o
 obj-$(CONFIG_SYN_COOKIES) += syncookies.o
diff --git a/net/ipv4/nsh.c b/net/ipv4/nsh.c
new file mode 100644
index 0000000..331ea5e
--- /dev/null
+++ b/net/ipv4/nsh.c
@@ -0,0 +1,365 @@
+/*
+ * Network Service Header (NSH) inserted onto encapsulated packets
+ * or frames to realize service function paths.
+ * NSH also provides a mechanism for metadata exchange along the
+ * instantiated service path.
+ *
+ * https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
+ *
+ * Copyright (c) 2016 by Brocade Communications Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <net/nsh.h>
+
+static struct list_head nsh_listeners;
+static DEFINE_MUTEX(nsh_listener_mutex);
+static struct nsh_metadata *decap_ctx_hdrs;
+static unsigned char limit_ctx_hdrs = 10;
+module_param_named(nsh_hdrs, limit_ctx_hdrs, byte, 0444);
+MODULE_PARM_DESC(nsh_hdrs, "Maximum NSH metadata headers per packet");
+
+int nsh_register_listener(struct nsh_listener *listener)
+{
+	if (listener->max_ctx_hdrs > limit_ctx_hdrs)
+		return -ENOMEM;
+
+	mutex_lock(&nsh_listener_mutex);
+	list_add(&listener->list, &nsh_listeners);
+	mutex_unlock(&nsh_listener_mutex);
+	return 0;
+}
+EXPORT_SYMBOL(nsh_register_listener);
+
+int nsh_unregister_listener(struct nsh_listener *listener)
+{
+	mutex_lock(&nsh_listener_mutex);
+	list_del(&listener->list);
+	mutex_unlock(&nsh_listener_mutex);
+	return 0;
+}
+EXPORT_SYMBOL(nsh_unregister_listener);
+
+static int
+notify_listeners(struct sk_buff *skb,
+		 u32 service_path_id,
+		 u8 service_index,
+		 u8 next_proto,
+		 struct nsh_metadata *ctx_hdrs,
+		 unsigned int num_ctx_hdrs)
+{
+	struct nsh_listener *listener;
+	int i, err = 0;
+
+	mutex_lock(&nsh_listener_mutex);
+	list_for_each_entry(listener, &nsh_listeners, list) {
+		for (i = 0; i < num_ctx_hdrs; i++)
+			if (listener->class == ctx_hdrs[i].class) {
+				err = listener->notify(skb,
+						       service_path_id,
+						       service_index,
+						       next_proto,
+						       ctx_hdrs,
+						       num_ctx_hdrs);
+				if (err < 0) {
+					mutex_unlock(&nsh_listener_mutex);
+					return err;
+				}
+				break;
+			}
+	}
+	mutex_unlock(&nsh_listener_mutex);
+	return 0;
+}
+
+static int
+type_1_decap(struct sk_buff *skb,
+	     struct nsh_md_type_1 *md,
+	     unsigned int max_ctx_hdrs,
+	     struct nsh_metadata *ctx_hdrs,
+	     unsigned int *num_ctx_hdrs)
+{
+	int i;
+	u32 *data =  &md->ctx_hdr1;
+
+	if (max_ctx_hdrs == 0)
+		return -ENOMEM;
+
+	ctx_hdrs[0].class = NSH_MD_CLASS_TYPE_1;
+	ctx_hdrs[0].type = NSH_MD_TYPE_TYPE_1;
+	ctx_hdrs[0].len = NSH_MD_LEN_TYPE_1;
+	ctx_hdrs[0].data = data;
+
+	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++, data++)
+		*data = ntohl(*data);
+
+	*num_ctx_hdrs = 1;
+
+	return 0;
+}
+
+static int
+type_2_decap(struct sk_buff *skb,
+	     struct nsh_md_type_2 *md,
+	     u8 md_len,
+	     unsigned int max_ctx_hdrs,
+	     struct nsh_metadata *ctx_hdrs,
+	     unsigned int *num_ctx_hdrs)
+{
+	u32 *data;
+	int i = 0, j;
+
+	while (md_len > 0) {
+		if (i >= max_ctx_hdrs)
+			return -ENOMEM;
+
+		ctx_hdrs[i].class = ntohs(md->tlv_class);
+		ctx_hdrs[i].type = md->tlv_type;
+		if (ctx_hdrs[i].type & NSH_TYPE_CRIT) {
+			ctx_hdrs[i].type &= ~NSH_TYPE_CRIT;
+			ctx_hdrs[i].crit = 1;
+		}
+		ctx_hdrs[i].len = md->length;
+
+		data = (u32 *) ++md;
+		md_len--;
+
+		ctx_hdrs[i].data = data;
+
+		for (j = 0; j < ctx_hdrs[i].len; j++)
+			data[j] = ntohl(data[j]);
+
+		md = (struct nsh_md_type_2 *)&data[j];
+		md_len -= j;
+		i++;
+	}
+	*num_ctx_hdrs = i;
+
+	return 0;
+}
+
+/* Parse NSH header.
+ *
+ * No additional memory is allocated. Context header data is pointed
+ * to in the buffer payload. Context headers and skb are passed to anyone
+ * who has registered interest in the class(es) of metadata received.
+ *
+ * Returns the total number of 4 byte words in the NSH headers, <0 on failure.
+ */
+int nsh_decap(struct sk_buff *skb,
+	      u32 *spi,
+	      u8 *si,
+	      u8 *np)
+{
+	struct nsh_header *nsh = (struct nsh_header *)skb->data;
+	struct nsh_base *base = &nsh->base;
+	unsigned int max_ctx_hdrs = limit_ctx_hdrs;
+	unsigned int num_ctx_hdrs;
+	u32 service_path_id;
+	u8 service_index;
+	u8 next_proto;
+	u32 sph;
+	u8 md_type;
+	u8 hdrlen; /* 4 byte words */
+	unsigned int len; /* bytes */
+	int err;
+
+	hdrlen = base->length;
+	len = hdrlen * sizeof(u32);
+
+	if (unlikely(!pskb_may_pull(skb, len)))
+		return -ENOMEM;
+
+	skb_pull_rcsum(skb, len);
+
+	if (((base->base_flags & NSH_BF_VER_MASK) >> 6) != NSH_BF_VER0)
+		return -EINVAL;
+
+	next_proto = base->next_proto;
+
+	switch (next_proto) {
+	case NSH_NEXT_PROTO_IPv4:
+		skb->protocol = htons(ETH_P_IP);
+		break;
+	case NSH_NEXT_PROTO_IPv6:
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
+	case NSH_NEXT_PROTO_ETH:
+		skb->protocol = htons(ETH_P_TEB);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (np)
+		*np = next_proto;
+
+	md_type = base->md_type;
+
+	switch (md_type) {
+	case NSH_MD_TYPE_1:
+		if (hdrlen != NSH_LEN_TYPE_1)
+			return -EINVAL;
+		err = type_1_decap(skb, (struct nsh_md_type_1 *) ++nsh,
+				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
+		break;
+	case NSH_MD_TYPE_2:
+		if (hdrlen < NSH_LEN_TYPE_2_MIN)
+			return -EINVAL;
+		err = type_2_decap(skb, (struct nsh_md_type_2 *) ++nsh,
+				   hdrlen - NSH_LEN_TYPE_2_MIN,
+				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (err < 0)
+		return err;
+
+	sph = ntohl(nsh->sp_header);
+	service_path_id = (sph & NSH_SPI_MASK) >> 8;
+	service_index = sph & NSH_SI_MASK;
+
+	if (spi)
+		*spi = service_path_id;
+	if (si)
+		*si = service_index;
+
+	err = notify_listeners(skb, service_path_id,
+			       service_index, next_proto,
+			       decap_ctx_hdrs, num_ctx_hdrs);
+	if (err < 0)
+		return err;
+
+	return hdrlen;
+}
+EXPORT_SYMBOL_GPL(nsh_decap);
+
+static void
+type_1_encap(u32 *data_out,
+	     struct nsh_metadata *ctx_hdrs)
+{
+	int i;
+	u32 *data_in = (u32 *)ctx_hdrs[0].data;
+
+	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++)
+		data_out[i] = htonl(data_in[i]);
+}
+
+static void
+type_2_encap(struct nsh_md_type_2 *md,
+	     unsigned int num_ctx_hdrs,
+	     struct nsh_metadata *ctx_hdrs)
+{
+	int i, j;
+	u32 *data_in, *data_out;
+
+	for (i = 0; i < num_ctx_hdrs; i++) {
+		md->tlv_class = htons(ctx_hdrs[i].class);
+		md->tlv_type = ctx_hdrs[i].type;
+		if (ctx_hdrs[i].crit)
+			md->tlv_type |= NSH_TYPE_CRIT;
+		md->length = ctx_hdrs[i].len;
+
+		data_out = (u32 *) ++md;
+		data_in = (u32 *)ctx_hdrs[i].data;
+
+		for (j = 0; j < ctx_hdrs[i].len; j++)
+			data_out[j] = htonl(data_in[j]);
+
+		md = (struct nsh_md_type_2 *)&data_out[j];
+	}
+}
+
+/* Add NSH header.
+ */
+int nsh_encap(struct sk_buff *skb,
+	      u32 spi,
+	      u8 si,
+	      u8 np,
+	      unsigned int num_ctx_hdrs,
+	      struct nsh_metadata *ctx_hdrs)
+{
+	bool has_t1 = false, has_t2 = false;
+	bool has_crit = false;
+	unsigned int headroom = sizeof(struct nsh_header);
+	struct nsh_header *nsh;
+	struct nsh_base *base;
+	int i;
+	int err;
+
+	if (np != NSH_NEXT_PROTO_IPv4 &&
+	    np != NSH_NEXT_PROTO_IPv6 &&
+	    np != NSH_NEXT_PROTO_ETH)
+		return -EINVAL;
+
+	if (spi >= NSH_N_SPI)
+		return -EINVAL;
+
+	for (i = 0; i < num_ctx_hdrs; i++) {
+		if (ctx_hdrs[i].class == NSH_MD_CLASS_TYPE_1) {
+			if (num_ctx_hdrs != 1)
+				return -EINVAL;
+			headroom += NSH_MD_LEN_TYPE_1 * sizeof(u32);
+			has_t1 |= true;
+		} else {
+			headroom += ctx_hdrs[i].len * sizeof(u32) +
+				sizeof(struct nsh_md_type_2);
+			has_t2 |= true;
+			has_crit |= ctx_hdrs[i].type & NSH_TYPE_CRIT;
+		}
+
+		if (has_t1 && has_t2)
+			return -EINVAL;
+	}
+
+	err = skb_cow_head(skb, headroom);
+	if (err)
+		return err;
+
+	nsh = (struct nsh_header *)__skb_push(skb, headroom);
+
+	base = &nsh->base;
+	base->base_flags = has_crit ? NSH_BF_CRIT : 0; /* Ver 0, OAM 0 */
+	base->length = headroom / sizeof(u32);
+	base->md_type = has_t1 ? NSH_MD_TYPE_1 : NSH_MD_TYPE_2;
+	base->next_proto = np;
+
+	nsh->sp_header = htonl((spi << 8) | si);
+
+	if (has_t1)
+		type_1_encap((u32 *) ++nsh, ctx_hdrs);
+	else
+		type_2_encap((struct nsh_md_type_2 *) ++nsh, num_ctx_hdrs,
+			     ctx_hdrs);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nsh_encap);
+
+static int __init nsh_init(void)
+{
+	INIT_LIST_HEAD(&nsh_listeners);
+
+	decap_ctx_hdrs = kmalloc_array(limit_ctx_hdrs, sizeof(*decap_ctx_hdrs),
+				       GFP_KERNEL);
+	if (!decap_ctx_hdrs)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit nsh_exit(void)
+{
+	kfree(decap_ctx_hdrs);
+}
+
+module_init(nsh_init);
+module_exit(nsh_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Brian Russell <brussell@brocade.com>");
+MODULE_DESCRIPTION("Network Service Header Encap/Decap");
-- 
2.1.4

* Re: [PATCH net-next v2 1/2] nsh: encapsulation module
  2016-02-11 19:57 ` [PATCH net-next v2 1/2] nsh: encapsulation module Brian Russell
@ 2016-02-15 17:01   ` Jiri Benc
  2016-03-01 11:11     ` Brian Russell
  2016-02-17  3:31   ` Alexei Starovoitov
  1 sibling, 1 reply; 10+ messages in thread
From: Jiri Benc @ 2016-02-15 17:01 UTC
  To: Brian Russell; +Cc: netdev

On Thu, 11 Feb 2016 19:57:05 +0000, Brian Russell wrote:
> --- /dev/null
> +++ b/net/ipv4/nsh.c
> @@ -0,0 +1,365 @@
> +/*
> + * Network Service Header (NSH) inserted onto encapsulated packets
> + * or frames to realize service function paths.
> + * NSH also provides a mechanism for metadata exchange along the
> + * instantiated service path.
> + *
> + * https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
> + *
> + * Copyright (c) 2016 by Brocade Communications Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +#include <linux/module.h>
> +#include <net/nsh.h>
> +
> +static struct list_head nsh_listeners;
> +static DEFINE_MUTEX(nsh_listener_mutex);
> +static struct nsh_metadata *decap_ctx_hdrs;
> +static unsigned char limit_ctx_hdrs = 10;
> +module_param_named(nsh_hdrs, limit_ctx_hdrs, byte, 0444);
> +MODULE_PARM_DESC(nsh_hdrs, "Maximum NSH metadata headers per packet");

No module parameters, please. Especially not for something like
encapsulation where multiple users will want different settings.

> +
> +int nsh_register_listener(struct nsh_listener *listener)
> +{
> +	if (listener->max_ctx_hdrs > limit_ctx_hdrs)
> +		return -ENOMEM;
> +
> +	mutex_lock(&nsh_listener_mutex);
> +	list_add(&listener->list, &nsh_listeners);
> +	mutex_unlock(&nsh_listener_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL(nsh_register_listener);
> +
> +int nsh_unregister_listener(struct nsh_listener *listener)
> +{
> +	mutex_lock(&nsh_listener_mutex);
> +	list_del(&listener->list);
> +	mutex_unlock(&nsh_listener_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL(nsh_unregister_listener);

I'd like to see how this listener stuff is used. Please do not submit
patches adding API without actual users. It's hard (or, in this case,
I'd say even impossible) to properly review this without seeing how it
is used.

> +
> +static int
> +notify_listeners(struct sk_buff *skb,

Please do not break lines between the return type and name of the function.

> +		 u32 service_path_id,
> +		 u8 service_index,
> +		 u8 next_proto,
> +		 struct nsh_metadata *ctx_hdrs,
> +		 unsigned int num_ctx_hdrs)
> +{
> +	struct nsh_listener *listener;
> +	int i, err = 0;
> +
> +	mutex_lock(&nsh_listener_mutex);
> +	list_for_each_entry(listener, &nsh_listeners, list) {
> +		for (i = 0; i < num_ctx_hdrs; i++)
> +			if (listener->class == ctx_hdrs[i].class) {
> +				err = listener->notify(skb,
> +						       service_path_id,
> +						       service_index,
> +						       next_proto,
> +						       ctx_hdrs,
> +						       num_ctx_hdrs);
> +				if (err < 0) {
> +					mutex_unlock(&nsh_listener_mutex);
> +					return err;
> +				}
> +				break;
> +			}
> +	}
> +	mutex_unlock(&nsh_listener_mutex);
> +	return 0;
> +}
> +
> +static int
> +type_1_decap(struct sk_buff *skb,
> +	     struct nsh_md_type_1 *md,
> +	     unsigned int max_ctx_hdrs,
> +	     struct nsh_metadata *ctx_hdrs,
> +	     unsigned int *num_ctx_hdrs)
> +{
> +	int i;
> +	u32 *data =  &md->ctx_hdr1;
> +
> +	if (max_ctx_hdrs == 0)
> +		return -ENOMEM;
> +
> +	ctx_hdrs[0].class = NSH_MD_CLASS_TYPE_1;
> +	ctx_hdrs[0].type = NSH_MD_TYPE_TYPE_1;
> +	ctx_hdrs[0].len = NSH_MD_LEN_TYPE_1;
> +	ctx_hdrs[0].data = data;
> +
> +	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++, data++)
> +		*data = ntohl(*data);
> +
> +	*num_ctx_hdrs = 1;
> +
> +	return 0;
> +}
> +
> +static int
> +type_2_decap(struct sk_buff *skb,
> +	     struct nsh_md_type_2 *md,
> +	     u8 md_len,
> +	     unsigned int max_ctx_hdrs,
> +	     struct nsh_metadata *ctx_hdrs,
> +	     unsigned int *num_ctx_hdrs)
> +{
> +	u32 *data;
> +	int i = 0, j;
> +
> +	while (md_len > 0) {
> +		if (i >= max_ctx_hdrs)
> +			return -ENOMEM;
> +
> +		ctx_hdrs[i].class = ntohs(md->tlv_class);
> +		ctx_hdrs[i].type = md->tlv_type;
> +		if (ctx_hdrs[i].type & NSH_TYPE_CRIT) {
> +			ctx_hdrs[i].type &= ~NSH_TYPE_CRIT;
> +			ctx_hdrs[i].crit = 1;
> +		}
> +		ctx_hdrs[i].len = md->length;
> +
> +		data = (u32 *) ++md;
> +		md_len--;
> +
> +		ctx_hdrs[i].data = data;
> +
> +		for (j = 0; j < ctx_hdrs[i].len; j++)
> +			data[j] = ntohl(data[j]);
> +
> +		md = (struct nsh_md_type_2 *)&data[j];
> +		md_len -= j;
> +		i++;
> +	}
> +	*num_ctx_hdrs = i;
> +
> +	return 0;
> +}
> +
> +/* Parse NSH header.
> + *
> + * No additional memory is allocated. Context header data is pointed
> + * to in the buffer payload. Context headers and skb are passed to anyone
> + * who has registered interest in the class(es) of metadata received.
> + *
> + * Returns the total number of 4 byte words in the NSH headers, <0 on failure.
> + */
> +int nsh_decap(struct sk_buff *skb,
> +	      u32 *spi,
> +	      u8 *si,
> +	      u8 *np)

No reason to have each parameter on a separate line, please put as many
of them as fit the 80 chars limit on the same line.

> +{
> +	struct nsh_header *nsh = (struct nsh_header *)skb->data;
> +	struct nsh_base *base = &nsh->base;
> +	unsigned int max_ctx_hdrs = limit_ctx_hdrs;
> +	unsigned int num_ctx_hdrs;
> +	u32 service_path_id;
> +	u8 service_index;
> +	u8 next_proto;
> +	u32 sph;
> +	u8 md_type;
> +	u8 hdrlen; /* 4 byte words */
> +	unsigned int len; /* bytes */
> +	int err;
> +
> +	hdrlen = base->length;
> +	len = hdrlen * sizeof(u32);
> +
> +	if (unlikely(!pskb_may_pull(skb, len)))
> +		return -ENOMEM;
> +
> +	skb_pull_rcsum(skb, len);
> +
> +	if (((base->base_flags & NSH_BF_VER_MASK) >> 6) != NSH_BF_VER0)
> +		return -EINVAL;
> +
> +	next_proto = base->next_proto;
> +
> +	switch (next_proto) {
> +	case NSH_NEXT_PROTO_IPv4:
> +		skb->protocol = htons(ETH_P_IP);
> +		break;
> +	case NSH_NEXT_PROTO_IPv6:
> +		skb->protocol = htons(ETH_P_IPV6);
> +		break;
> +	case NSH_NEXT_PROTO_ETH:
> +		skb->protocol = htons(ETH_P_TEB);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (np)
> +		*np = next_proto;
> +
> +	md_type = base->md_type;
> +
> +	switch (md_type) {
> +	case NSH_MD_TYPE_1:
> +		if (hdrlen != NSH_LEN_TYPE_1)
> +			return -EINVAL;
> +		err = type_1_decap(skb, (struct nsh_md_type_1 *) ++nsh,
> +				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
> +		break;
> +	case NSH_MD_TYPE_2:
> +		if (hdrlen < NSH_LEN_TYPE_2_MIN)
> +			return -EINVAL;
> +		err = type_2_decap(skb, (struct nsh_md_type_2 *) ++nsh,
> +				   hdrlen - NSH_LEN_TYPE_2_MIN,
> +				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	if (err < 0)
> +		return err;
> +
> +	sph = ntohl(nsh->sp_header);
> +	service_path_id = (sph & NSH_SPI_MASK) >> 8;
> +	service_index = sph & NSH_SI_MASK;
> +
> +	if (spi)
> +		*spi = service_path_id;
> +	if (si)
> +		*si = service_index;
> +
> +	err = notify_listeners(skb, service_path_id,
> +			       service_index, next_proto,
> +			       decap_ctx_hdrs, num_ctx_hdrs);
> +	if (err < 0)
> +		return err;
> +
> +	return hdrlen;
> +}
> +EXPORT_SYMBOL_GPL(nsh_decap);
> +
> +static void
> +type_1_encap(u32 *data_out,
> +	     struct nsh_metadata *ctx_hdrs)
> +{
> +	int i;
> +	u32 *data_in = (u32 *)ctx_hdrs[0].data;
> +
> +	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++)
> +		data_out[i] = htonl(data_in[i]);
> +}
> +
> +static void
> +type_2_encap(struct nsh_md_type_2 *md,
> +	     unsigned int num_ctx_hdrs,
> +	     struct nsh_metadata *ctx_hdrs)
> +{
> +	int i, j;
> +	u32 *data_in, *data_out;
> +
> +	for (i = 0; i < num_ctx_hdrs; i++) {
> +		md->tlv_class = htons(ctx_hdrs[i].class);
> +		md->tlv_type = ctx_hdrs[i].type;
> +		if (ctx_hdrs[i].crit)
> +			md->tlv_type |= NSH_TYPE_CRIT;
> +		md->length = ctx_hdrs[i].len;
> +
> +		data_out = (u32 *) ++md;
> +		data_in = (u32 *)ctx_hdrs[i].data;
> +
> +		for (j = 0; j < ctx_hdrs[i].len; j++)
> +			data_out[j] = htonl(data_in[j]);
> +
> +		md = (struct nsh_md_type_2 *)&data_out[j];
> +	}
> +}
> +
> +/* Add NSH header.
> + */
> +int nsh_encap(struct sk_buff *skb,
> +	      u32 spi,
> +	      u8 si,
> +	      u8 np,
> +	      unsigned int num_ctx_hdrs,
> +	      struct nsh_metadata *ctx_hdrs)
> +{
> +	bool has_t1 = false, has_t2 = false;
> +	bool has_crit = false;
> +	unsigned int headroom = sizeof(struct nsh_header);
> +	struct nsh_header *nsh;
> +	struct nsh_base *base;
> +	int i;
> +	int err;
> +
> +	if (np != NSH_NEXT_PROTO_IPv4 &&
> +	    np != NSH_NEXT_PROTO_IPv6 &&
> +	    np != NSH_NEXT_PROTO_ETH)
> +		return -EINVAL;
> +
> +	if (spi >= NSH_N_SPI)
> +		return -EINVAL;
> +
> +	for (i = 0; i < num_ctx_hdrs; i++) {
> +		if (ctx_hdrs[i].class == NSH_MD_CLASS_TYPE_1) {
> +			if (num_ctx_hdrs != 1)
> +				return -EINVAL;
> +			headroom += NSH_MD_LEN_TYPE_1 * sizeof(u32);
> +			has_t1 |= true;
> +		} else {
> +			headroom += ctx_hdrs[i].len * sizeof(u32) +
> +				sizeof(struct nsh_md_type_2);
> +			has_t2 |= true;
> +			has_crit |= ctx_hdrs[i].type & NSH_TYPE_CRIT;
> +		}
> +
> +		if (has_t1 && has_t2)
> +			return -EINVAL;
> +	}
> +
> +	err = skb_cow_head(skb, headroom);
> +	if (err)
> +		return err;
> +
> +	nsh = (struct nsh_header *)__skb_push(skb, headroom);
> +
> +	base = &nsh->base;
> +	base->base_flags = has_crit ? NSH_BF_CRIT : 0; /* Ver 0, OAM 0 */
> +	base->length = headroom / sizeof(u32);
> +	base->md_type = has_t1 ? NSH_MD_TYPE_1 : NSH_MD_TYPE_2;
> +	base->next_proto = np;
> +
> +	nsh->sp_header = htonl((spi << 8) | si);
> +
> +	if (has_t1)
> +		type_1_encap((u32 *) ++nsh, ctx_hdrs);
> +	else
> +		type_2_encap((struct nsh_md_type_2 *) ++nsh, num_ctx_hdrs,
> +			     ctx_hdrs);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(nsh_encap);

Again, no user, no idea whether something is missing or wrong.

 Jiri

-- 
Jiri Benc

* Re: [PATCH net-next v2 1/2] nsh: encapsulation module
  2016-02-15 17:01   ` Jiri Benc
@ 2016-03-01 11:11     ` Brian Russell
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Russell @ 2016-03-01 11:11 UTC
  To: Jiri Benc; +Cc: netdev

On 15/02/16 17:01, Jiri Benc wrote:
> On Thu, 11 Feb 2016 19:57:05 +0000, Brian Russell wrote:
>> --- /dev/null
>> +++ b/net/ipv4/nsh.c
>> @@ -0,0 +1,365 @@
>> +/*
>> + * Network Service Header (NSH) inserted onto encapsulated packets
>> + * or frames to realize service function paths.
>> + * NSH also provides a mechanism for metadata exchange along the
>> + * instantiated service path.
>> + *
>> + * https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
>> + *
>> + * Copyright (c) 2016 by Brocade Communications Systems, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +#include <linux/module.h>
>> +#include <net/nsh.h>
>> +
>> +static struct list_head nsh_listeners;
>> +static DEFINE_MUTEX(nsh_listener_mutex);
>> +static struct nsh_metadata *decap_ctx_hdrs;
>> +static unsigned char limit_ctx_hdrs = 10;
>> +module_param_named(nsh_hdrs, limit_ctx_hdrs, byte, 0444);
>> +MODULE_PARM_DESC(nsh_hdrs, "Maximum NSH metadata headers per packet");
> 
> No module parameters, please. Especially not for something like
> encapsulation where multiple users will want different settings.
>

Ok.
 
>> +
>> +int nsh_register_listener(struct nsh_listener *listener)
>> +{
>> +	if (listener->max_ctx_hdrs > limit_ctx_hdrs)
>> +		return -ENOMEM;
>> +
>> +	mutex_lock(&nsh_listener_mutex);
>> +	list_add(&listener->list, &nsh_listeners);
>> +	mutex_unlock(&nsh_listener_mutex);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(nsh_register_listener);
>> +
>> +int nsh_unregister_listener(struct nsh_listener *listener)
>> +{
>> +	mutex_lock(&nsh_listener_mutex);
>> +	list_del(&listener->list);
>> +	mutex_unlock(&nsh_listener_mutex);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(nsh_unregister_listener);
> 
> I'd like to see how this listener stuff is used. Please do not submit
> patches adding API without actual users. It's hard (or, in this case,
> I'd say even impossible) to properly review this without seeing how it
> is used.
> 

I've added a new module to the next patch iteration that uses it.

>> +
>> +static int
>> +notify_listeners(struct sk_buff *skb,
> 
> Please do not break lines between the return type and name of the function.
> 

Ok.

>> +		 u32 service_path_id,
>> +		 u8 service_index,
>> +		 u8 next_proto,
>> +		 struct nsh_metadata *ctx_hdrs,
>> +		 unsigned int num_ctx_hdrs)
>> +{
>> +	struct nsh_listener *listener;
>> +	int i, err = 0;
>> +
>> +	mutex_lock(&nsh_listener_mutex);
>> +	list_for_each_entry(listener, &nsh_listeners, list) {
>> +		for (i = 0; i < num_ctx_hdrs; i++)
>> +			if (listener->class == ctx_hdrs[i].class) {
>> +				err = listener->notify(skb,
>> +						       service_path_id,
>> +						       service_index,
>> +						       next_proto,
>> +						       ctx_hdrs,
>> +						       num_ctx_hdrs);
>> +				if (err < 0) {
>> +					mutex_unlock(&nsh_listener_mutex);
>> +					return err;
>> +				}
>> +				break;
>> +			}
>> +	}
>> +	mutex_unlock(&nsh_listener_mutex);
>> +	return 0;
>> +}
>> +
>> +static int
>> +type_1_decap(struct sk_buff *skb,
>> +	     struct nsh_md_type_1 *md,
>> +	     unsigned int max_ctx_hdrs,
>> +	     struct nsh_metadata *ctx_hdrs,
>> +	     unsigned int *num_ctx_hdrs)
>> +{
>> +	int i;
>> +	u32 *data =  &md->ctx_hdr1;
>> +
>> +	if (max_ctx_hdrs == 0)
>> +		return -ENOMEM;
>> +
>> +	ctx_hdrs[0].class = NSH_MD_CLASS_TYPE_1;
>> +	ctx_hdrs[0].type = NSH_MD_TYPE_TYPE_1;
>> +	ctx_hdrs[0].len = NSH_MD_LEN_TYPE_1;
>> +	ctx_hdrs[0].data = data;
>> +
>> +	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++, data++)
>> +		*data = ntohl(*data);
>> +
>> +	*num_ctx_hdrs = 1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int
>> +type_2_decap(struct sk_buff *skb,
>> +	     struct nsh_md_type_2 *md,
>> +	     u8 md_len,
>> +	     unsigned int max_ctx_hdrs,
>> +	     struct nsh_metadata *ctx_hdrs,
>> +	     unsigned int *num_ctx_hdrs)
>> +{
>> +	u32 *data;
>> +	int i = 0, j;
>> +
>> +	while (md_len > 0) {
>> +		if (i > max_ctx_hdrs)
>> +			return -ENOMEM;
>> +
>> +		ctx_hdrs[i].class = ntohs(md->tlv_class);
>> +		ctx_hdrs[i].type = md->tlv_type;
>> +		if (ctx_hdrs[i].type & NSH_TYPE_CRIT) {
>> +			ctx_hdrs[i].type &= ~NSH_TYPE_CRIT;
>> +			ctx_hdrs[i].crit = 1;
>> +		}
>> +		ctx_hdrs[i].len = md->length;
>> +
>> +		data = (u32 *) ++md;
>> +		md_len--;
>> +
>> +		ctx_hdrs[i].data = data;
>> +
>> +		for (j = 0; j < ctx_hdrs[i].len; j++)
>> +			data[j] = ntohl(data[j]);
>> +
>> +		md = (struct nsh_md_type_2 *)&data[j];
>> +		md_len -= j;
>> +		i++;
>> +	}
>> +	*num_ctx_hdrs = i;
>> +
>> +	return 0;
>> +}
>> +
>> +/* Parse NSH header.
>> + *
>> + * No additional memory is allocated. Context header data is pointed
>> + * to in the buffer payload. Context headers and skb are passed to anyone
>> + * who has registered interest in the class(es) of metadata received.
>> + *
>> + * Returns the total number of 4 byte words in the NSH headers, <0 on failure.
>> + */
>> +int nsh_decap(struct sk_buff *skb,
>> +	      u32 *spi,
>> +	      u8 *si,
>> +	      u8 *np)
> 
> No reason to have each parameter on a separate line, please put as many
> of them as fit the 80 chars limit on the same line.
> 

Ok.

>> +{
>> +	struct nsh_header *nsh = (struct nsh_header *)skb->data;
>> +	struct nsh_base *base = &nsh->base;
>> +	unsigned int max_ctx_hdrs = limit_ctx_hdrs;
>> +	unsigned int num_ctx_hdrs;
>> +	u32 service_path_id;
>> +	u8 service_index;
>> +	u8 next_proto;
>> +	u32 sph;
>> +	u8 md_type;
>> +	u8 hdrlen; /* 4 byte words */
>> +	unsigned int len; /* bytes */
>> +	int err;
>> +
>> +	hdrlen = base->length;
>> +	len = hdrlen * sizeof(u32);
>> +
>> +	if (unlikely(!pskb_may_pull(skb, len)))
>> +		return -ENOMEM;
>> +
>> +	skb_pull_rcsum(skb, len);
>> +
>> +	if (((base->base_flags & NSH_BF_VER_MASK) >> 6) != NSH_BF_VER0)
>> +		return -EINVAL;
>> +
>> +	next_proto = base->next_proto;
>> +
>> +	switch (next_proto) {
>> +	case NSH_NEXT_PROTO_IPv4:
>> +		skb->protocol = htons(ETH_P_IP);
>> +		break;
>> +	case NSH_NEXT_PROTO_IPv6:
>> +		skb->protocol = htons(ETH_P_IPV6);
>> +		break;
>> +	case NSH_NEXT_PROTO_ETH:
>> +		skb->protocol = htons(ETH_P_TEB);
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (np)
>> +		*np = next_proto;
>> +
>> +	md_type = base->md_type;
>> +
>> +	switch (md_type) {
>> +	case NSH_MD_TYPE_1:
>> +		if (hdrlen != NSH_LEN_TYPE_1)
>> +			return -EINVAL;
>> +		err = type_1_decap(skb, (struct nsh_md_type_1 *) ++nsh,
>> +				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
>> +		break;
>> +	case NSH_MD_TYPE_2:
>> +		if (hdrlen < NSH_LEN_TYPE_2_MIN)
>> +			return -EINVAL;
>> +		err = type_2_decap(skb, (struct nsh_md_type_2 *) ++nsh,
>> +				   hdrlen - NSH_LEN_TYPE_2_MIN,
>> +				   max_ctx_hdrs, decap_ctx_hdrs, &num_ctx_hdrs);
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (err < 0)
>> +		return err;
>> +
>> +	sph = ntohl(nsh->sp_header);
>> +	service_path_id = (sph & NSH_SPI_MASK) >> 8;
>> +	service_index = sph & NSH_SI_MASK;
>> +
>> +	if (spi)
>> +		*spi = service_path_id;
>> +	if (si)
>> +		*si = service_index;
>> +
>> +	err = notify_listeners(skb, service_path_id,
>> +			       service_index, next_proto,
>> +			       decap_ctx_hdrs, num_ctx_hdrs);
>> +	if (err < 0)
>> +		return err;
>> +
>> +	return hdrlen;
>> +}
>> +EXPORT_SYMBOL_GPL(nsh_decap);
>> +
>> +static void
>> +type_1_encap(u32 *data_out,
>> +	     struct nsh_metadata *ctx_hdrs)
>> +{
>> +	int i;
>> +	u32 *data_in = (u32 *)ctx_hdrs[0].data;
>> +
>> +	for (i = 0; i < NSH_MD_TYPE_1_NUM_HDRS; i++)
>> +		data_out[i] = htonl(data_in[i]);
>> +}
>> +
>> +static void
>> +type_2_encap(struct nsh_md_type_2 *md,
>> +	     unsigned int num_ctx_hdrs,
>> +	     struct nsh_metadata *ctx_hdrs)
>> +{
>> +	int i, j;
>> +	u32 *data_in, *data_out;
>> +
>> +	for (i = 0; i < num_ctx_hdrs; i++) {
>> +		md->tlv_class = htons(ctx_hdrs[i].class);
>> +		md->tlv_type = ctx_hdrs[i].type;
>> +		if (ctx_hdrs[i].crit)
>> +			md->tlv_type |= NSH_TYPE_CRIT;
>> +		md->length = ctx_hdrs[i].len;
>> +
>> +		data_out = (u32 *) ++md;
>> +		data_in = (u32 *)ctx_hdrs[i].data;
>> +
>> +		for (j = 0; j < ctx_hdrs[i].len; j++)
>> +			data_out[j] = htonl(data_in[j]);
>> +
>> +		md = (struct nsh_md_type_2 *)&data_out[j];
>> +	}
>> +}
>> +
>> +/* Add NSH header.
>> + */
>> +int nsh_encap(struct sk_buff *skb,
>> +	      u32 spi,
>> +	      u8 si,
>> +	      u8 np,
>> +	      unsigned int num_ctx_hdrs,
>> +	      struct nsh_metadata *ctx_hdrs)
>> +{
>> +	bool has_t1 = false, has_t2 = false;
>> +	bool has_crit = false;
>> +	unsigned int headroom = sizeof(struct nsh_header);
>> +	struct nsh_header *nsh;
>> +	struct nsh_base *base;
>> +	int i;
>> +	int err;
>> +
>> +	if (np != NSH_NEXT_PROTO_IPv4 &&
>> +	    np != NSH_NEXT_PROTO_IPv6 &&
>> +	    np != NSH_NEXT_PROTO_ETH)
>> +		return -EINVAL;
>> +
>> +	if (spi >= NSH_N_SPI)
>> +		return -EINVAL;
>> +
>> +	for (i = 0; i < num_ctx_hdrs; i++) {
>> +		if (ctx_hdrs[i].class == NSH_MD_CLASS_TYPE_1) {
>> +			if (num_ctx_hdrs != 1)
>> +				return -EINVAL;
>> +			headroom += NSH_MD_LEN_TYPE_1 * sizeof(u32);
>> +			has_t1 |= true;
>> +		} else {
>> +			headroom += ctx_hdrs[i].len * sizeof(u32) +
>> +				sizeof(struct nsh_md_type_2);
>> +			has_t2 |= true;
>> +			has_crit |= ctx_hdrs[i].type & NSH_TYPE_CRIT;
>> +		}
>> +
>> +		if (has_t1 && has_t2)
>> +			return -EINVAL;
>> +	}
>> +
>> +	err = skb_cow_head(skb, headroom);
>> +	if (err)
>> +		return err;
>> +
>> +	nsh = (struct nsh_header *)__skb_push(skb, headroom);
>> +
>> +	base = &nsh->base;
>> +	base->base_flags = has_crit ? NSH_BF_CRIT : 0; /* Ver 0, OAM 0 */
>> +	base->length = headroom / sizeof(u32);
>> +	base->md_type = has_t1 ? NSH_MD_TYPE_1 : NSH_MD_TYPE_2;
>> +	base->next_proto = np;
>> +
>> +	nsh->sp_header = htonl((spi << 8) | si);
>> +
>> +	if (has_t1)
>> +		type_1_encap((u32 *) ++nsh, ctx_hdrs);
>> +	else
>> +		type_2_encap((struct nsh_md_type_2 *) ++nsh, num_ctx_hdrs,
>> +			     ctx_hdrs);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(nsh_encap);
> 
> Again, no user, no idea whether something is missing or wrong.
> 

Next patch iteration has example use added.

Thanks,

Brian


* Re: [PATCH net-next v2 1/2] nsh: encapsulation module
  2016-02-11 19:57 ` [PATCH net-next v2 1/2] nsh: encapsulation module Brian Russell
  2016-02-15 17:01   ` Jiri Benc
@ 2016-02-17  3:31   ` Alexei Starovoitov
  2016-03-01 11:11     ` Brian Russell
  1 sibling, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2016-02-17  3:31 UTC (permalink / raw)
  To: Brian Russell; +Cc: netdev

On Thu, Feb 11, 2016 at 07:57:05PM +0000, Brian Russell wrote:
> Support encap/decap of Network Service Header (NSH) as defined in
> https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
> 
> Includes support for Type 1 and Type 2 metadata and a simple registration
> for listeners to see decapsulated packets based on the Type/Class.
> 
> Signed-off-by: Brian Russell <brussell@brocade.com>
...
> +int nsh_register_listener(struct nsh_listener *listener)
> +{
> +	if (listener->max_ctx_hdrs > limit_ctx_hdrs)
> +		return -ENOMEM;
> +
> +	mutex_lock(&nsh_listener_mutex);
> +	list_add(&listener->list, &nsh_listeners);
> +	mutex_unlock(&nsh_listener_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL(nsh_register_listener);
> +EXPORT_SYMBOL(nsh_unregister_listener);

looks like this patch doesn't actually implement the protocol,
but rather provides a placeholder for out of tree modules?


* Re: [PATCH net-next v2 1/2] nsh: encapsulation module
  2016-02-17  3:31   ` Alexei Starovoitov
@ 2016-03-01 11:11     ` Brian Russell
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Russell @ 2016-03-01 11:11 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev

On 17/02/16 03:31, Alexei Starovoitov wrote:
> On Thu, Feb 11, 2016 at 07:57:05PM +0000, Brian Russell wrote:
>> Support encap/decap of Network Service Header (NSH) as defined in
>> https://tools.ietf.org/html/draft-ietf-sfc-nsh-01
>>
>> Includes support for Type 1 and Type 2 metadata and a simple registration
>> for listeners to see decapsulated packets based on the Type/Class.
>>
>> Signed-off-by: Brian Russell <brussell@brocade.com>
> ...
>> +int nsh_register_listener(struct nsh_listener *listener)
>> +{
>> +	if (listener->max_ctx_hdrs > limit_ctx_hdrs)
>> +		return -ENOMEM;
>> +
>> +	mutex_lock(&nsh_listener_mutex);
>> +	list_add(&listener->list, &nsh_listeners);
>> +	mutex_unlock(&nsh_listener_mutex);
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL(nsh_register_listener);
>> +EXPORT_SYMBOL(nsh_unregister_listener);
> 
> looks like this patch doesn't actually implement the protocol,
> but rather provides a placeholder for out of tree modules?
> 

It implements the protocol in terms of the NSH base and service path
headers, and it decaps the metadata if present. However, the actual
interpretation of that metadata is left to registered listeners, which
might be out-of-tree modules. The NSH standard defines the mechanism to
carry metadata but does not limit how it is used.
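
As a rough illustration of the service path header handling described
above, here is a minimal userspace sketch of how the 24-bit service path
ID (SPI) and 8-bit service index (SI) share one 32-bit word, mirroring
the shifts used in nsh_encap() and nsh_decap(). The helper names
nsh_sp_pack() and nsh_sp_unpack() are hypothetical, and the mask values
are assumptions: the patch uses NSH_SPI_MASK, NSH_SI_MASK and NSH_N_SPI,
but their definitions are not quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Assumed values; the real definitions live in include/net/nsh.h,
 * which is not quoted in this thread.
 */
#define NSH_SPI_MASK	0xffffff00u	/* upper 24 bits: service path ID */
#define NSH_SI_MASK	0x000000ffu	/* lower 8 bits: service index */
#define NSH_N_SPI	(1u << 24)	/* number of valid SPI values */

/* Build the on-wire (network byte order) service path header.
 * Returns 0 on success, -1 if spi does not fit in 24 bits.
 */
static int nsh_sp_pack(uint32_t spi, uint8_t si, uint32_t *wire)
{
	assert(wire);

	if (spi >= NSH_N_SPI)
		return -1;

	*wire = htonl((spi << 8) | si);
	return 0;
}

/* Split an on-wire service path header back into SPI and SI. */
static void nsh_sp_unpack(uint32_t wire, uint32_t *spi, uint8_t *si)
{
	uint32_t sph = ntohl(wire);

	assert(spi && si);

	*spi = (sph & NSH_SPI_MASK) >> 8;
	*si = (uint8_t)(sph & NSH_SI_MASK);
}
```

Packing and then unpacking returns the original SPI and SI. Anything
beyond these two fields, i.e. the metadata itself, is what the patch
hands to registered listeners.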

Thanks,

Brian


* [PATCH net-next v2 2/2] vxlan: support GPE/NSH
  2016-02-11 19:57 [PATCH net-next v2 0/2] NSH and VxLAN-GPE Brian Russell
  2016-02-11 19:57 ` [PATCH net-next v2 1/2] nsh: encapsulation module Brian Russell
@ 2016-02-11 19:57 ` Brian Russell
  2016-02-15 16:49   ` Jiri Benc
  1 sibling, 1 reply; 10+ messages in thread
From: Brian Russell @ 2016-02-11 19:57 UTC (permalink / raw)
  To: netdev

Support the Generic Protocol Extension to VxLAN which extends VxLAN to
allow multi-protocol encapsulation. IPv4, IPv6, MPLS unicast and
NSH encapsulated packets can be sent and received in addition to ethernet
frames. As defined in:

https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01

Signed-off-by: Brian Russell <brussell@brocade.com>
---
 drivers/net/vxlan.c          | 139 +++++++++++++++++++++++++++++++++++++++----
 include/net/vxlan.h          |  40 ++++++++++++-
 include/uapi/linux/if_link.h |   1 +
 3 files changed, 166 insertions(+), 14 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ebf57d9..e6a6bfb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -50,6 +50,7 @@
 #include <net/ip6_checksum.h>
 #endif
 #include <net/dst_metadata.h>
+#include <net/nsh.h>
 
 #define VXLAN_VERSION	"0.1"
 
@@ -1168,14 +1169,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 	if (!vxlan)
 		goto drop;
 
-	skb_reset_mac_header(skb);
 	skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
-	skb->protocol = eth_type_trans(skb, vxlan->dev);
-	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
-
-	/* Ignore packet loops (and multicast echo) */
-	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
-		goto drop;
 
 	/* Get data from the outer IP header */
 	if (vxlan_get_sk_family(vs) == AF_INET) {
@@ -1195,13 +1189,57 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 		tun_dst = NULL;
 	}
 
+	switch (md->gpe_np) {
+	case VXLAN_GPE_NP_IPv4:
+		skb->protocol = htons(ETH_P_IP);
+		goto skip_l2;
+#if IS_ENABLED(CONFIG_IPV6)
+	case VXLAN_GPE_NP_IPv6:
+		skb->protocol = htons(ETH_P_IPV6);
+		goto skip_l2;
+#endif
+#if IS_ENABLED(CONFIG_MPLS)
+	case VXLAN_GPE_NP_MPLS:
+		skb->protocol = htons(ETH_P_MPLS_UC);
+		goto skip_l2;
+#endif
+#if IS_ENABLED(CONFIG_NET_NSH)
+	case VXLAN_GPE_NP_NSH:
+		{
+			u8 next_proto;
+
+			if (nsh_decap(skb, NULL, NULL, &next_proto) < 0)
+				goto drop;
+
+			if (next_proto != NSH_NEXT_PROTO_ETH)
+				goto skip_l2;
+		}
+		break;
+#endif
+	case VXLAN_GPE_NP_ETH:
+		/* GPE with next proto eth is equivalent to vanilla vxlan. */
+	default:
+		break;
+	}
+
+	skb_reset_mac_header(skb);
+	skb->protocol = eth_type_trans(skb, vxlan->dev);
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+	/* Ignore packet loops (and multicast echo) */
+	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+		goto drop;
+
 	if ((vxlan->flags & VXLAN_F_LEARN) &&
 	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
 		goto drop;
 
+skip_l2:
 	skb_reset_network_header(skb);
+
 	/* In flow-based mode, GBP is carried in dst_metadata */
-	if (!(vs->flags & VXLAN_F_COLLECT_METADATA))
+	if (!(vs->flags & VXLAN_F_COLLECT_METADATA) &&
+	    !(vs->flags & VXLAN_F_GPE))
 		skb->mark = md->gbp;
 
 	if (oip6)
@@ -1252,6 +1290,10 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	struct vxlan_metadata _md;
 	struct vxlan_metadata *md = &_md;
 
+	vs = rcu_dereference_sk_user_data(sk);
+	if (!vs)
+		goto drop;
+
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
 		goto error;
@@ -1267,14 +1309,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
+	/* If GPE, protocol will be set once next proto examined. */
+	if (iptunnel_pull_header(skb, VXLAN_HLEN,
+				 vs->flags & VXLAN_F_GPE ?
+				 htons(ETH_P_IP) : htons(ETH_P_TEB)))
 		goto drop;
 	vxh = (struct vxlanhdr *)(udp_hdr(skb) + 1);
 
-	vs = rcu_dereference_sk_user_data(sk);
-	if (!vs)
-		goto drop;
-
 	if ((flags & VXLAN_HF_RCO) && (vs->flags & VXLAN_F_REMCSUM_RX)) {
 		vxh = vxlan_remcsum(skb, vxh, sizeof(struct vxlanhdr), vni,
 				    !!(vs->flags & VXLAN_F_REMCSUM_NOPARTIAL));
@@ -1318,6 +1359,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		flags &= ~VXLAN_GBP_USED_BITS;
 	}
 
+	if (vs->flags & VXLAN_F_GPE) {
+		/* Next protocol is required */
+		if (!(flags & VXLAN_HF_GPE_NP))
+			goto bad_flags;
+
+		md->gpe_np = flags & VXLAN_GPE_NP_MASK;
+
+		flags &= ~VXLAN_GPE_USED_BITS;
+	}
+
 	if (flags || vni & ~VXLAN_VNI_MASK) {
 		/* If there are any unprocessed flags remaining treat
 		 * this as a malformed packet. This behavior diverges from
@@ -1664,6 +1715,37 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static void vxlan_build_gpe_hdr(struct vxlanhdr *vxh, __be16 proto)
+{
+	u32 next_proto;
+
+	switch (proto) {
+#if IS_ENABLED(CONFIG_NET_NSH)
+	case htons(ETH_P_NSH):
+		next_proto = VXLAN_GPE_NP_NSH;
+		break;
+#endif
+	case htons(ETH_P_IP):
+		next_proto = VXLAN_GPE_NP_IPv4;
+		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case htons(ETH_P_IPV6):
+		next_proto = VXLAN_GPE_NP_IPv6;
+		break;
+#endif
+#if IS_ENABLED(CONFIG_MPLS)
+	case htons(ETH_P_MPLS_UC):
+		next_proto = VXLAN_GPE_NP_MPLS;
+		break;
+#endif
+	default:
+		next_proto = VXLAN_GPE_NP_ETH;
+		break;
+	}
+
+	vxh->vx_flags |= htonl(VXLAN_HF_GPE_NP | next_proto);
+}
+
 static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, u32 vxflags,
 				struct vxlan_metadata *md)
 {
@@ -1750,6 +1832,9 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 	if (vxflags & VXLAN_F_GBP)
 		vxlan_build_gbp_hdr(vxh, vxflags, md);
 
+	if (vxflags & VXLAN_F_GPE)
+		vxlan_build_gpe_hdr(vxh, skb->protocol);
+
 	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 	return 0;
 }
@@ -2073,6 +2158,26 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vxlan_rdst *rdst, *fdst = NULL;
 	struct vxlan_fdb *f;
 
+	if (vxlan->flags & VXLAN_F_GPE) {
+		switch (skb->protocol) {
+#if IS_ENABLED(CONFIG_NET_NSH)
+		case htons(ETH_P_NSH):
+#endif
+#if IS_ENABLED(CONFIG_IPV6)
+		case htons(ETH_P_IPV6):
+#endif
+#if IS_ENABLED(CONFIG_MPLS)
+		case htons(ETH_P_MPLS_UC):
+#endif
+		case htons(ETH_P_IP):
+			vxlan_xmit_one(skb, dev, &vxlan->default_dst, false);
+			return NETDEV_TX_OK;
+		default:
+			/* Assume L2 and look for FDB entry */
+			break;
+		}
+	}
+
 	info = skb_tunnel_info(skb);
 
 	skb_reset_mac_header(skb);
@@ -2474,6 +2579,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_REMCSUM_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_REMCSUM_RX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_GBP]	= { .type = NLA_FLAG, },
+	[IFLA_VXLAN_GPE]	= { .type = NLA_FLAG, },
 	[IFLA_VXLAN_REMCSUM_NOPARTIAL]	= { .type = NLA_FLAG },
 };
 
@@ -2892,6 +2998,9 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
 	if (data[IFLA_VXLAN_GBP])
 		conf.flags |= VXLAN_F_GBP;
 
+	if (data[IFLA_VXLAN_GPE])
+		conf.flags |= VXLAN_F_GPE;
+
 	if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
 		conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
 
@@ -3033,6 +3142,10 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_flag(skb, IFLA_VXLAN_GBP))
 		goto nla_put_failure;
 
+	if (vxlan->flags & VXLAN_F_GPE &&
+	    nla_put_flag(skb, IFLA_VXLAN_GPE))
+		goto nla_put_failure;
+
 	if (vxlan->flags & VXLAN_F_REMCSUM_NOPARTIAL &&
 	    nla_put_flag(skb, IFLA_VXLAN_REMCSUM_NOPARTIAL))
 		goto nla_put_failure;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 25bd919..7886296 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -121,8 +121,44 @@ struct vxlanhdr_gbp {
 
 struct vxlan_metadata {
 	u32		gbp;
+	u8              gpe_np;
 };
 
+/*
+ * VXLAN Generic Protocol Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|Ver|I|P|R|O|       Reserved                |Next Protocol  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Ver            Version, initially 0
+ * I = 1	  VXLAN Network Identifier (VNI) present
+ * P = 1          Next Protocol field is present
+ * O = 1          OAM
+ * Next Protocol  Indicates the protocol header immediately following
+ *                the VXLAN GPE header.
+ *
+ * https://tools.ietf.org/html/draft-ietf-nvo3-vxlan-gpe-01
+ *
+ * Use struct vxlanhdr above with some extra defines:
+ */
+#define VXLAN_HF_GPE_OAM BIT(25) /* GPE OAM bit */
+#define VXLAN_HF_GPE_NP  BIT(26) /* GPE protocol bit */
+
+#define VXLAN_GPE_NP_MASK (0xFF)
+
+#define VXLAN_GPE_NP_IPv4 0x1
+#define VXLAN_GPE_NP_IPv6 0x2
+#define VXLAN_GPE_NP_ETH  0x3
+#define VXLAN_GPE_NP_NSH  0x4
+#define VXLAN_GPE_NP_MPLS  0x5
+
+#define VXLAN_GPE_USED_BITS (VXLAN_HF_GPE_NP  | \
+			     VXLAN_HF_GPE_OAM | \
+			     VXLAN_GPE_NP_MASK)
+
+
 /* per UDP socket information */
 struct vxlan_sock {
 	struct hlist_node hlist;
@@ -204,6 +240,7 @@ struct vxlan_dev {
 #define VXLAN_F_GBP			0x800
 #define VXLAN_F_REMCSUM_NOPARTIAL	0x1000
 #define VXLAN_F_COLLECT_METADATA	0x2000
+#define VXLAN_F_GPE			0x4000
 
 /* Flags that are used in the receive path. These flags must match in
  * order for a socket to be shareable
@@ -212,7 +249,8 @@ struct vxlan_dev {
 					 VXLAN_F_UDP_ZERO_CSUM6_RX |	\
 					 VXLAN_F_REMCSUM_RX |		\
 					 VXLAN_F_REMCSUM_NOPARTIAL |	\
-					 VXLAN_F_COLLECT_METADATA)
+					 VXLAN_F_COLLECT_METADATA |     \
+					 VXLAN_F_GPE)
 
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
 				    u8 name_assign_type, struct vxlan_config *conf);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d452cea..e8d74a5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -456,6 +456,7 @@ enum {
 	IFLA_VXLAN_GBP,
 	IFLA_VXLAN_REMCSUM_NOPARTIAL,
 	IFLA_VXLAN_COLLECT_METADATA,
+	IFLA_VXLAN_GPE,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
-- 
2.1.4


* Re: [PATCH net-next v2 2/2] vxlan: support GPE/NSH
  2016-02-11 19:57 ` [PATCH net-next v2 2/2] vxlan: support GPE/NSH Brian Russell
@ 2016-02-15 16:49   ` Jiri Benc
  2016-03-01 11:10     ` Brian Russell
  0 siblings, 1 reply; 10+ messages in thread
From: Jiri Benc @ 2016-02-15 16:49 UTC (permalink / raw)
  To: Brian Russell; +Cc: netdev

On Thu, 11 Feb 2016 19:57:06 +0000, Brian Russell wrote:
> +skip_l2:
>  	skb_reset_network_header(skb);
> +
>  	/* In flow-based mode, GBP is carried in dst_metadata */
> -	if (!(vs->flags & VXLAN_F_COLLECT_METADATA))
> +	if (!(vs->flags & VXLAN_F_COLLECT_METADATA) &&
> +	    !(vs->flags & VXLAN_F_GPE))
>  		skb->mark = md->gbp;

This is completely wrong. You cannot return a packet with garbage in
place of the Ethernet header from an ARPHRD_ETHER interface. For proper
VXLAN-GPE support, the vxlan interface needs to be in L3 mode, e.g.
ARPHRD_NONE.
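
To make the L2/L3 distinction concrete, here is a small userspace sketch
that classifies a GPE Next Protocol value by whether the inner packet
starts with an Ethernet header. The Next Protocol numbers are the ones
defined in the patch; the EtherType values are the standard ones from
linux/if_ether.h. The helper gpe_classify() and struct gpe_payload are
hypothetical names used only for illustration, and NSH is deliberately
left unhandled because its inner protocol depends on the NSH header
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Next Protocol values as defined in the patch under review */
#define VXLAN_GPE_NP_IPv4	0x1
#define VXLAN_GPE_NP_IPv6	0x2
#define VXLAN_GPE_NP_ETH	0x3
#define VXLAN_GPE_NP_NSH	0x4
#define VXLAN_GPE_NP_MPLS	0x5

/* EtherType values in host byte order, as in linux/if_ether.h */
#define ETH_P_IP	0x0800
#define ETH_P_IPV6	0x86DD
#define ETH_P_TEB	0x6558
#define ETH_P_MPLS_UC	0x8847

struct gpe_payload {
	uint16_t ethertype;	/* host byte order */
	bool has_l2;		/* true if an inner Ethernet header follows */
};

/* Returns 0 and fills *out for payloads this sketch understands,
 * -1 otherwise (including NSH, which needs its own parsing).
 */
static int gpe_classify(uint8_t next_proto, struct gpe_payload *out)
{
	assert(out != NULL);

	switch (next_proto) {
	case VXLAN_GPE_NP_IPv4:
		out->ethertype = ETH_P_IP;
		out->has_l2 = false;
		return 0;
	case VXLAN_GPE_NP_IPv6:
		out->ethertype = ETH_P_IPV6;
		out->has_l2 = false;
		return 0;
	case VXLAN_GPE_NP_MPLS:
		out->ethertype = ETH_P_MPLS_UC;
		out->has_l2 = false;
		return 0;
	case VXLAN_GPE_NP_ETH:
		out->ethertype = ETH_P_TEB;
		out->has_l2 = true;
		return 0;
	default:
		return -1;
	}
}
```

Only the VXLAN_GPE_NP_ETH case yields a frame that an ARPHRD_ETHER
device can legitimately deliver; every has_l2 == false case is a bare
L3 packet, which is why an L3-mode interface is needed.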

To support L3 mode, the vxlan driver needs *tons* of cleanups (or tons
of duplicate code). This is exactly what I've done and what I'm in the
process of merging. The number of patches is too big to be submitted as
a single patchset, hence I'm submitting in parts. The first one has
already been merged (net-next commit 19f76f63507f). For the full code,
look at: https://github.com/jbenc/linux-vxlan/commits/master

Comments are welcome.

 Jiri

-- 
Jiri Benc


* Re: [PATCH net-next v2 2/2] vxlan: support GPE/NSH
  2016-02-15 16:49   ` Jiri Benc
@ 2016-03-01 11:10     ` Brian Russell
  2016-03-01 18:20       ` Jiri Benc
  0 siblings, 1 reply; 10+ messages in thread
From: Brian Russell @ 2016-03-01 11:10 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev


On 15/02/16 16:49, Jiri Benc wrote:
> On Thu, 11 Feb 2016 19:57:06 +0000, Brian Russell wrote:
>> +skip_l2:
>>  	skb_reset_network_header(skb);
>> +
>>  	/* In flow-based mode, GBP is carried in dst_metadata */
>> -	if (!(vs->flags & VXLAN_F_COLLECT_METADATA))
>> +	if (!(vs->flags & VXLAN_F_COLLECT_METADATA) &&
>> +	    !(vs->flags & VXLAN_F_GPE))
>>  		skb->mark = md->gbp;
> 
> This is completely wrong. You cannot return a packet with garbage in
> place of the Ethernet header from an ARPHRD_ETHER interface. For proper
> VXLAN-GPE support, the vxlan interface needs to be in L3 mode, e.g.
> ARPHRD_NONE.
> 

Yes, I see that, thanks for the clarification. (I was "getting away
with" the broken code in my test as I was diverting packets bound for
another interface onto the vxlan via a netfilter target.)

> To support L3 mode, the vxlan driver needs *tons* of cleanups (or tons
> of duplicate code). This is exactly what I've done and what I'm in the
> process of merging. The number of patches is too big to be submitted as
> a single patchset, hence I'm submitting in parts. The first one has
> already been merged (net-next commit 19f76f63507f). For the full code,
> look at: https://github.com/jbenc/linux-vxlan/commits/master
> 
> Comments are welcome.
> 

This looks great, so I'll drop my vxlan-gpe patch. I'd like to add the
NSH capability on top of your patchset, which I see is currently under
review. Or did you have plans to roll out NSH soon also?

Thanks,

Brian

>  Jiri
> 


* Re: [PATCH net-next v2 2/2] vxlan: support GPE/NSH
  2016-03-01 11:10     ` Brian Russell
@ 2016-03-01 18:20       ` Jiri Benc
  0 siblings, 0 replies; 10+ messages in thread
From: Jiri Benc @ 2016-03-01 18:20 UTC (permalink / raw)
  To: Brian Russell; +Cc: netdev

On Tue, 1 Mar 2016 11:10:40 +0000, Brian Russell wrote:
> I'd like to add the NSH capability on top of your patchset which I see
> is currently under review. Or did you have plans to roll out NSH soon
> also?

If you have something ready, send it to netdev. I want to add NSH
support but if others write that for me, I won't complain :-)

Adding NSH support to VXLAN-GPE itself is trivial, it's two lines. The
hard part is properly integrating NSH support to tc and openvswitch.

 Jiri

