[GIT net-next] Open vSwitch

netdev.vger.kernel.org archive mirror

* [GIT net-next] Open vSwitch
@ 2012-11-29 18:35 Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 1/7] openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets Jesse Gross
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC
  To: David Miller; +Cc: netdev, dev

This series of improvements for 3.8/net-next contains four components:
 * Support for modifying IPv6 headers
 * Support for matching and setting skb->mark for better integration with
   things like iptables
 * Ability to recognize the EtherType for RARP packets
 * Two small performance enhancements

Moving ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts.  I left them unresolved but can do the merge if you want.  The
conflicts are:
 * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
   exthdrs_core.c.  Both should stay.
 * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
   after this patch was written.  The IPVS code uses the old constant name
   IP6T_FH_F_FRAG in two places; it has been renamed to IP6_FH_F_FRAG.

The following changes since commit d04d382980c86bdee9960c3eb157a73f8ed230cc:

  openvswitch: Store flow key len if ARP opcode is not request or reply. (2012-10-30 17:17:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master

for you to fetch changes up to 92eb1d477145b2e7780b5002e856f70b8c3d74da:

  openvswitch: Use RCU callback when detaching netdevices. (2012-11-28 14:04:34 -0800)

----------------------------------------------------------------
Ansis Atteka (3):
      ipv6: improve ipv6_find_hdr() to skip empty routing headers
      openvswitch: add ipv6 'set' action
      openvswitch: add skb mark matching and set action

Jesse Gross (2):
      ipv6: Move ipv6_find_hdr() out of Netfilter code.
      openvswitch: Use RCU callback when detaching netdevices.

Mehak Mahajan (1):
      openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets.

Shan Wei (1):
      net: openvswitch: use this_cpu_ptr per-cpu helper

 include/linux/netfilter_ipv6/ip6_tables.h |    9 ---
 include/linux/openvswitch.h               |    1 +
 include/net/ipv6.h                        |   10 +++
 net/ipv6/exthdrs_core.c                   |  123 +++++++++++++++++++++++++++++
 net/ipv6/netfilter/ip6_tables.c           |  103 ------------------------
 net/netfilter/xt_HMARK.c                  |    8 +-
 net/openvswitch/actions.c                 |   97 +++++++++++++++++++++++
 net/openvswitch/datapath.c                |   27 ++++++-
 net/openvswitch/flow.c                    |   28 ++++++-
 net/openvswitch/flow.h                    |    8 +-
 net/openvswitch/vport-netdev.c            |   14 +++-
 net/openvswitch/vport-netdev.h            |    3 +
 net/openvswitch/vport.c                   |    5 +-
 13 files changed, 304 insertions(+), 132 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/7] openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets.
  2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
@ 2012-11-29 18:35 ` Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 2/7] ipv6: Move ipv6_find_hdr() out of Netfilter code Jesse Gross
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC
  To: David Miller; +Cc: netdev, dev, Mehak Mahajan

From: Mehak Mahajan <mmahajan@nicira.com>

With this commit, OVS matches the fields of RARP packets (EtherType
0x8035) in the same way as those of ARP packets.

Signed-off-by: Mehak Mahajan <mmahajan@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 net/openvswitch/flow.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 733cbf4..e6ce902 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -689,7 +689,8 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
 			}
 		}
 
-	} else if (key->eth.type == htons(ETH_P_ARP) && arphdr_ok(skb)) {
+	} else if ((key->eth.type == htons(ETH_P_ARP) ||
+		   key->eth.type == htons(ETH_P_RARP)) && arphdr_ok(skb)) {
 		struct arp_eth_header *arp;
 
 		arp = (struct arp_eth_header *)skb_network_header(skb);
@@ -1086,7 +1087,8 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 			if (err)
 				return err;
 		}
-	} else if (swkey->eth.type == htons(ETH_P_ARP)) {
+	} else if (swkey->eth.type == htons(ETH_P_ARP) ||
+		   swkey->eth.type == htons(ETH_P_RARP)) {
 		const struct ovs_key_arp *arp_key;
 
 		if (!(attrs & (1 << OVS_KEY_ATTR_ARP)))
@@ -1222,7 +1224,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
 		ipv6_key->ipv6_tclass = swkey->ip.tos;
 		ipv6_key->ipv6_hlimit = swkey->ip.ttl;
 		ipv6_key->ipv6_frag = swkey->ip.frag;
-	} else if (swkey->eth.type == htons(ETH_P_ARP)) {
+	} else if (swkey->eth.type == htons(ETH_P_ARP) ||
+		   swkey->eth.type == htons(ETH_P_RARP)) {
 		struct ovs_key_arp *arp_key;
 
 		nla = nla_reserve(skb, OVS_KEY_ATTR_ARP, sizeof(*arp_key));
-- 
1.7.9.5

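For reference, a minimal user-space sketch of the EtherType test this patch adds in three places (flow extraction, netlink parsing and netlink serialization). It assumes, as RFC 903 specifies, that RARP reuses the ARP packet layout, which is why the existing ARP parsing can be shared. The macro and function names are hypothetical; only the EtherType values (0x0806 for ETH_P_ARP, 0x8035 for ETH_P_RARP) come from the kernel.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdbool.h>
#include <stdint.h>

/* EtherType values; the kernel calls these ETH_P_ARP and ETH_P_RARP. */
#define ETHERTYPE_ARP_VALUE  0x0806
#define ETHERTYPE_RARP_VALUE 0x8035

/*
 * Hypothetical helper: returns true if a frame with this EtherType
 * (given in network byte order, like key->eth.type) should be parsed
 * with the ARP header layout.  Before the patch only ARP matched.
 */
static bool uses_arp_layout(uint16_t eth_type_be)
{
	return eth_type_be == htons(ETHERTYPE_ARP_VALUE) ||
	       eth_type_be == htons(ETHERTYPE_RARP_VALUE);
}
```

Comparing against htons() constants keeps the value in network byte order, mirroring how the patch compares key->eth.type against htons(ETH_P_RARP).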

* [PATCH net-next 2/7] ipv6: Move ipv6_find_hdr() out of Netfilter code.
  2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 1/7] openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets Jesse Gross
@ 2012-11-29 18:35 ` Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers Jesse Gross
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC
  To: David Miller; +Cc: netdev, dev

Open vSwitch will soon also use ipv6_find_hdr(), so this patch moves it
out of Netfilter-specific code into a more common location.

Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/netfilter_ipv6/ip6_tables.h |    9 ---
 include/net/ipv6.h                        |    9 +++
 net/ipv6/exthdrs_core.c                   |  103 +++++++++++++++++++++++++++++
 net/ipv6/netfilter/ip6_tables.c           |  103 -----------------------------
 net/netfilter/xt_HMARK.c                  |    8 +--
 5 files changed, 116 insertions(+), 116 deletions(-)

diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
index 5f84c62..610208b 100644
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -47,15 +47,6 @@ ip6t_ext_hdr(u8 nexthdr)
 	       (nexthdr == IPPROTO_DSTOPTS);
 }
 
-enum {
-	IP6T_FH_F_FRAG	= (1 << 0),
-	IP6T_FH_F_AUTH	= (1 << 1),
-};
-
-/* find specified header and get offset to it */
-extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-			 int target, unsigned short *fragoff, int *fragflg);
-
 #ifdef CONFIG_COMPAT
 #include <net/compat.h>
 
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 979bf6c..b2f0cfb 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -630,6 +630,15 @@ extern int			ipv6_skip_exthdr(const struct sk_buff *, int start,
 
 extern bool			ipv6_ext_hdr(u8 nexthdr);
 
+enum {
+	IP6_FH_F_FRAG	= (1 << 0),
+	IP6_FH_F_AUTH	= (1 << 1),
+};
+
+/* find specified header and get offset to it */
+extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
+			 int target, unsigned short *fragoff, int *fragflg);
+
 extern int ipv6_find_tlv(struct sk_buff *skb, int offset, int type);
 
 extern struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index f73d59a..8ea253a 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -111,3 +111,106 @@ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
 	return start;
 }
 EXPORT_SYMBOL(ipv6_skip_exthdr);
+
+/*
+ * find the offset to specified header or the protocol number of last header
+ * if target < 0. "last header" is transport protocol header, ESP, or
+ * "No next header".
+ *
+ * Note that *offset is used as input/output parameter. an if it is not zero,
+ * then it must be a valid offset to an inner IPv6 header. This can be used
+ * to explore inner IPv6 header, eg. ICMPv6 error messages.
+ *
+ * If target header is found, its offset is set in *offset and return protocol
+ * number. Otherwise, return -1.
+ *
+ * If the first fragment doesn't contain the final protocol header or
+ * NEXTHDR_NONE it is considered invalid.
+ *
+ * Note that non-1st fragment is special case that "the protocol number
+ * of last header" is "next header" field in Fragment header. In this case,
+ * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
+ * isn't NULL.
+ *
+ * if flags is not NULL and it's a fragment, then the frag flag IP6_FH_F_FRAG
+ * will be set. If it's an AH header, the IP6_FH_F_AUTH flag is set and
+ * target < 0, then this function will stop at the AH header.
+ */
+int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
+		  int target, unsigned short *fragoff, int *flags)
+{
+	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
+	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+	unsigned int len;
+
+	if (fragoff)
+		*fragoff = 0;
+
+	if (*offset) {
+		struct ipv6hdr _ip6, *ip6;
+
+		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
+		if (!ip6 || (ip6->version != 6)) {
+			printk(KERN_ERR "IPv6 header not found\n");
+			return -EBADMSG;
+		}
+		start = *offset + sizeof(struct ipv6hdr);
+		nexthdr = ip6->nexthdr;
+	}
+	len = skb->len - start;
+
+	while (nexthdr != target) {
+		struct ipv6_opt_hdr _hdr, *hp;
+		unsigned int hdrlen;
+
+		if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) {
+			if (target < 0)
+				break;
+			return -ENOENT;
+		}
+
+		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
+		if (hp == NULL)
+			return -EBADMSG;
+		if (nexthdr == NEXTHDR_FRAGMENT) {
+			unsigned short _frag_off;
+			__be16 *fp;
+
+			if (flags)	/* Indicate that this is a fragment */
+				*flags |= IP6_FH_F_FRAG;
+			fp = skb_header_pointer(skb,
+						start+offsetof(struct frag_hdr,
+							       frag_off),
+						sizeof(_frag_off),
+						&_frag_off);
+			if (fp == NULL)
+				return -EBADMSG;
+
+			_frag_off = ntohs(*fp) & ~0x7;
+			if (_frag_off) {
+				if (target < 0 &&
+				    ((!ipv6_ext_hdr(hp->nexthdr)) ||
+				     hp->nexthdr == NEXTHDR_NONE)) {
+					if (fragoff)
+						*fragoff = _frag_off;
+					return hp->nexthdr;
+				}
+				return -ENOENT;
+			}
+			hdrlen = 8;
+		} else if (nexthdr == NEXTHDR_AUTH) {
+			if (flags && (*flags & IP6_FH_F_AUTH) && (target < 0))
+				break;
+			hdrlen = (hp->hdrlen + 2) << 2;
+		} else
+			hdrlen = ipv6_optlen(hp);
+
+		nexthdr = hp->nexthdr;
+		len -= hdrlen;
+		start += hdrlen;
+	}
+
+	*offset = start;
+	return nexthdr;
+}
+EXPORT_SYMBOL(ipv6_find_hdr);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index d7cb045..1ce4f15 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -2273,112 +2273,9 @@ static void __exit ip6_tables_fini(void)
 	unregister_pernet_subsys(&ip6_tables_net_ops);
 }
 
-/*
- * find the offset to specified header or the protocol number of last header
- * if target < 0. "last header" is transport protocol header, ESP, or
- * "No next header".
- *
- * Note that *offset is used as input/output parameter. an if it is not zero,
- * then it must be a valid offset to an inner IPv6 header. This can be used
- * to explore inner IPv6 header, eg. ICMPv6 error messages.
- *
- * If target header is found, its offset is set in *offset and return protocol
- * number. Otherwise, return -1.
- *
- * If the first fragment doesn't contain the final protocol header or
- * NEXTHDR_NONE it is considered invalid.
- *
- * Note that non-1st fragment is special case that "the protocol number
- * of last header" is "next header" field in Fragment header. In this case,
- * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
- * isn't NULL.
- *
- * if flags is not NULL and it's a fragment, then the frag flag IP6T_FH_F_FRAG
- * will be set. If it's an AH header, the IP6T_FH_F_AUTH flag is set and
- * target < 0, then this function will stop at the AH header.
- */
-int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
-		  int target, unsigned short *fragoff, int *flags)
-{
-	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
-	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
-	unsigned int len;
-
-	if (fragoff)
-		*fragoff = 0;
-
-	if (*offset) {
-		struct ipv6hdr _ip6, *ip6;
-
-		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
-		if (!ip6 || (ip6->version != 6)) {
-			printk(KERN_ERR "IPv6 header not found\n");
-			return -EBADMSG;
-		}
-		start = *offset + sizeof(struct ipv6hdr);
-		nexthdr = ip6->nexthdr;
-	}
-	len = skb->len - start;
-
-	while (nexthdr != target) {
-		struct ipv6_opt_hdr _hdr, *hp;
-		unsigned int hdrlen;
-
-		if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) {
-			if (target < 0)
-				break;
-			return -ENOENT;
-		}
-
-		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
-		if (hp == NULL)
-			return -EBADMSG;
-		if (nexthdr == NEXTHDR_FRAGMENT) {
-			unsigned short _frag_off;
-			__be16 *fp;
-
-			if (flags)	/* Indicate that this is a fragment */
-				*flags |= IP6T_FH_F_FRAG;
-			fp = skb_header_pointer(skb,
-						start+offsetof(struct frag_hdr,
-							       frag_off),
-						sizeof(_frag_off),
-						&_frag_off);
-			if (fp == NULL)
-				return -EBADMSG;
-
-			_frag_off = ntohs(*fp) & ~0x7;
-			if (_frag_off) {
-				if (target < 0 &&
-				    ((!ipv6_ext_hdr(hp->nexthdr)) ||
-				     hp->nexthdr == NEXTHDR_NONE)) {
-					if (fragoff)
-						*fragoff = _frag_off;
-					return hp->nexthdr;
-				}
-				return -ENOENT;
-			}
-			hdrlen = 8;
-		} else if (nexthdr == NEXTHDR_AUTH) {
-			if (flags && (*flags & IP6T_FH_F_AUTH) && (target < 0))
-				break;
-			hdrlen = (hp->hdrlen + 2) << 2;
-		} else
-			hdrlen = ipv6_optlen(hp);
-
-		nexthdr = hp->nexthdr;
-		len -= hdrlen;
-		start += hdrlen;
-	}
-
-	*offset = start;
-	return nexthdr;
-}
-
 EXPORT_SYMBOL(ip6t_register_table);
 EXPORT_SYMBOL(ip6t_unregister_table);
 EXPORT_SYMBOL(ip6t_do_table);
-EXPORT_SYMBOL(ipv6_find_hdr);
 
 module_init(ip6_tables_init);
 module_exit(ip6_tables_fini);
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
index 1686ca1..73b73f6 100644
--- a/net/netfilter/xt_HMARK.c
+++ b/net/netfilter/xt_HMARK.c
@@ -167,7 +167,7 @@ hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
 			  const struct xt_hmark_info *info)
 {
 	struct ipv6hdr *ip6, _ip6;
-	int flag = IP6T_FH_F_AUTH;
+	int flag = IP6_FH_F_AUTH;
 	unsigned int nhoff = 0;
 	u16 fragoff = 0;
 	int nexthdr;
@@ -177,7 +177,7 @@ hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
 	if (nexthdr < 0)
 		return 0;
 	/* No need to check for icmp errors on fragments */
-	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+	if ((flag & IP6_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
 		goto noicmp;
 	/* Use inner header in case of ICMP errors */
 	if (get_inner6_hdr(skb, &nhoff)) {
@@ -185,7 +185,7 @@ hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
 		if (ip6 == NULL)
 			return -1;
 		/* If AH present, use SPI like in ESP. */
-		flag = IP6T_FH_F_AUTH;
+		flag = IP6_FH_F_AUTH;
 		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
 		if (nexthdr < 0)
 			return -1;
@@ -201,7 +201,7 @@ noicmp:
 	if (t->proto == IPPROTO_ICMPV6)
 		return 0;
 
-	if (flag & IP6T_FH_F_FRAG)
+	if (flag & IP6_FH_F_FRAG)
 		return 0;
 
 	hmark_set_tuple_ports(skb, nhoff, t, info);
-- 
1.7.9.5

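To illustrate what the moved ipv6_find_hdr() does, here is a simplified user-space sketch of its extension-header walk over a raw IPv6 packet buffer. It is an approximation, not the kernel function: it takes a plain byte array instead of an sk_buff, ignores the *fragoff and *flags outputs (so non-first fragments and the IP6_FH_F_AUTH early stop are not handled), and returns -1 for every error instead of -ENOENT/-EBADMSG. The names find_hdr() and HDR_* are hypothetical; the next-header numbers and the length rules (8 bytes for a Fragment header, (hdrlen + 2) * 4 for AH, (hdrlen + 1) * 8 otherwise) follow the code above.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA next-header values (same numbers as the kernel's NEXTHDR_*). */
enum {
	HDR_HOP      = 0,
	HDR_ROUTING  = 43,
	HDR_FRAGMENT = 44,
	HDR_AUTH     = 51,
	HDR_NONE     = 59,
	HDR_DEST     = 60,
};

/* Same set of values that the kernel's ipv6_ext_hdr() accepts. */
static int is_ext_hdr(uint8_t nh)
{
	return nh == HDR_HOP || nh == HDR_ROUTING || nh == HDR_FRAGMENT ||
	       nh == HDR_AUTH || nh == HDR_NONE || nh == HDR_DEST;
}

/*
 * Hypothetical sketch: pkt points at the fixed 40-byte IPv6 header.
 * Returns the protocol number of the first header equal to target (or
 * of the last header if target < 0) and stores its offset in *offset;
 * returns -1 if the target is not found or the packet is truncated.
 */
static int find_hdr(const uint8_t *pkt, size_t len, int target,
		    size_t *offset)
{
	size_t start = 40;
	uint8_t nexthdr;

	if (len < 40)
		return -1;
	nexthdr = pkt[6];               /* Next Header of the fixed header */

	while (nexthdr != target) {
		size_t hdrlen;

		if (!is_ext_hdr(nexthdr) || nexthdr == HDR_NONE) {
			if (target < 0)
				break;          /* reached the last header */
			return -1;
		}
		if (start + 2 > len)
			return -1;              /* truncated extension header */

		if (nexthdr == HDR_FRAGMENT)
			hdrlen = 8;
		else if (nexthdr == HDR_AUTH)
			hdrlen = ((size_t)pkt[start + 1] + 2) << 2;
		else
			hdrlen = ((size_t)pkt[start + 1] + 1) << 3;

		nexthdr = pkt[start];           /* byte 0: next header */
		start += hdrlen;
	}

	*offset = start;
	return nexthdr;
}
```

For a packet with one hop-by-hop options header followed by TCP, find_hdr(pkt, len, -1, &off) returns 6 (IPPROTO_TCP) with off == 48: 40 bytes of fixed header plus 8 bytes of options.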

* [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 1/7] openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 2/7] ipv6: Move ipv6_find_hdr() out of Netfilter code Jesse Gross
@ 2012-11-29 18:35 ` Jesse Gross
  2012-12-03 14:04   ` Pablo Neira Ayuso
       [not found] ` <1354214149-33651-1-git-send-email-jesse@nicira.com>
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC
  To: David Miller; +Cc: netdev, dev, Ansis Atteka

From: Ansis Atteka <aatteka@nicira.com>

This patch extends ipv6_find_hdr() so that it can skip routing headers
whose segments_left field is 0. This is required to correctly handle
packets with multiple routing headers when changing IPv6 addresses.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/net/ipv6.h      |    5 +++--
 net/ipv6/exthdrs_core.c |   36 ++++++++++++++++++++++++++++--------
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b2f0cfb..acbd8e0 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -631,8 +631,9 @@ extern int			ipv6_skip_exthdr(const struct sk_buff *, int start,
 extern bool			ipv6_ext_hdr(u8 nexthdr);
 
 enum {
-	IP6_FH_F_FRAG	= (1 << 0),
-	IP6_FH_F_AUTH	= (1 << 1),
+	IP6_FH_F_FRAG		= (1 << 0),
+	IP6_FH_F_AUTH		= (1 << 1),
+	IP6_FH_F_SKIP_RH	= (1 << 2),
 };
 
 /* find specified header and get offset to it */
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 8ea253a..11b4e29 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -132,9 +132,11 @@ EXPORT_SYMBOL(ipv6_skip_exthdr);
  * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
  * isn't NULL.
  *
- * if flags is not NULL and it's a fragment, then the frag flag IP6_FH_F_FRAG
- * will be set. If it's an AH header, the IP6_FH_F_AUTH flag is set and
- * target < 0, then this function will stop at the AH header.
+ * if flags is not NULL and it's a fragment, then the frag flag
+ * IP6_FH_F_FRAG will be set. If it's an AH header, the
+ * IP6_FH_F_AUTH flag is set and target < 0, then this function will
+ * stop at the AH header. If IP6_FH_F_SKIP_RH flag was passed, then this
+ * function will skip all those routing headers, where segements_left was 0.
  */
 int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		  int target, unsigned short *fragoff, int *flags)
@@ -142,6 +144,7 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
 	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
 	unsigned int len;
+	bool found;
 
 	if (fragoff)
 		*fragoff = 0;
@@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 	}
 	len = skb->len - start;
 
-	while (nexthdr != target) {
+	do {
 		struct ipv6_opt_hdr _hdr, *hp;
 		unsigned int hdrlen;
+		found = (nexthdr == target);
 
 		if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) {
 			if (target < 0)
@@ -172,6 +176,20 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
 		if (hp == NULL)
 			return -EBADMSG;
+
+		if (nexthdr == NEXTHDR_ROUTING) {
+			struct ipv6_rt_hdr _rh, *rh;
+
+			rh = skb_header_pointer(skb, start, sizeof(_rh),
+						&_rh);
+			if (rh == NULL)
+				return -EBADMSG;
+
+			if (flags && (*flags & IP6_FH_F_SKIP_RH) &&
+			    rh->segments_left == 0)
+				found = false;
+		}
+
 		if (nexthdr == NEXTHDR_FRAGMENT) {
 			unsigned short _frag_off;
 			__be16 *fp;
@@ -205,10 +223,12 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		} else
 			hdrlen = ipv6_optlen(hp);
 
-		nexthdr = hp->nexthdr;
-		len -= hdrlen;
-		start += hdrlen;
-	}
+		if (!found) {
+			nexthdr = hp->nexthdr;
+			len -= hdrlen;
+			start += hdrlen;
+		}
+	} while (!found);
 
 	*offset = start;
 	return nexthdr;
-- 
1.7.9.5

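A reduced sketch of the IP6_FH_F_SKIP_RH behaviour described above: when several routing headers are present, routing headers whose segments_left is 0 are passed over and the walk stops at the first non-empty one. This hypothetical helper is not the patched kernel function; it only walks hop-by-hop, routing and destination-options headers (all sized (hdrlen + 1) * 8 bytes) and, following the RFC 2460 routing header layout, reads segments_left from byte 3.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA next-header numbers (NEXTHDR_HOP, NEXTHDR_ROUTING, NEXTHDR_DEST). */
enum { EH_HOP = 0, EH_ROUTING = 43, EH_DEST = 60 };

/*
 * Hypothetical helper: return the offset of the first routing header
 * whose segments_left is non-zero, or -1 if there is none.  pkt starts
 * at the fixed 40-byte IPv6 header.  Each walked header has next-header
 * in byte 0 and hdrlen in byte 1; a routing header has segments_left in
 * byte 3.
 */
static long find_active_rh(const uint8_t *pkt, size_t len)
{
	size_t start = 40;
	uint8_t nexthdr;

	if (len < 40)
		return -1;
	nexthdr = pkt[6];

	while (nexthdr == EH_HOP || nexthdr == EH_ROUTING ||
	       nexthdr == EH_DEST) {
		if (start + 4 > len)
			return -1;              /* truncated header */
		if (nexthdr == EH_ROUTING && pkt[start + 3] != 0)
			return (long)start;     /* non-empty: stop here */
		/* Empty routing header or options header: skip it. */
		nexthdr = pkt[start];
		start += ((size_t)pkt[start + 1] + 1) << 3;
	}
	return -1;
}
```

With two 8-byte routing headers at offsets 40 (segments_left = 0) and 48 (segments_left = 1), find_active_rh() returns 48 rather than 40.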

* Re: [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-11-29 18:35 ` [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers Jesse Gross
@ 2012-12-03 14:04   ` Pablo Neira Ayuso
  2012-12-03 17:28     ` Jesse Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-03 14:04 UTC
  To: Jesse Gross; +Cc: David Miller, netdev, dev, Ansis Atteka

On Thu, Nov 29, 2012 at 10:35:45AM -0800, Jesse Gross wrote:
> From: Ansis Atteka <aatteka@nicira.com>
> 
> This patch prepares ipv6_find_hdr() function so that it could be
> able to skip routing headers, where segements_left is 0. This is
> required to handle multiple routing header case correctly when
> changing IPv6 addresses.
> 
> Signed-off-by: Ansis Atteka <aatteka@nicira.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
>  include/net/ipv6.h      |    5 +++--
>  net/ipv6/exthdrs_core.c |   36 ++++++++++++++++++++++++++++--------
>  2 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index b2f0cfb..acbd8e0 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -631,8 +631,9 @@ extern int			ipv6_skip_exthdr(const struct sk_buff *, int start,
>  extern bool			ipv6_ext_hdr(u8 nexthdr);
>  
>  enum {
> -	IP6_FH_F_FRAG	= (1 << 0),
> -	IP6_FH_F_AUTH	= (1 << 1),
> +	IP6_FH_F_FRAG		= (1 << 0),
> +	IP6_FH_F_AUTH		= (1 << 1),
> +	IP6_FH_F_SKIP_RH	= (1 << 2),
>  };
>  
>  /* find specified header and get offset to it */
> diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
> index 8ea253a..11b4e29 100644
> --- a/net/ipv6/exthdrs_core.c
> +++ b/net/ipv6/exthdrs_core.c
> @@ -132,9 +132,11 @@ EXPORT_SYMBOL(ipv6_skip_exthdr);
>   * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
>   * isn't NULL.
>   *
> - * if flags is not NULL and it's a fragment, then the frag flag IP6_FH_F_FRAG
> - * will be set. If it's an AH header, the IP6_FH_F_AUTH flag is set and
> - * target < 0, then this function will stop at the AH header.
> + * if flags is not NULL and it's a fragment, then the frag flag
> + * IP6_FH_F_FRAG will be set. If it's an AH header, the
> + * IP6_FH_F_AUTH flag is set and target < 0, then this function will
> + * stop at the AH header. If IP6_FH_F_SKIP_RH flag was passed, then this
> + * function will skip all those routing headers, where segements_left was 0.
>   */
>  int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>  		  int target, unsigned short *fragoff, int *flags)
> @@ -142,6 +144,7 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>  	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
>  	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
>  	unsigned int len;
> +	bool found;
>  
>  	if (fragoff)
>  		*fragoff = 0;
> @@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>  	}
>  	len = skb->len - start;
>  
> -	while (nexthdr != target) {

If the offset is set as parameter via ipv6_find_hdr, we now are always
entering the loop even if we found the target header we're looking
for, before that didn't happen.

Something seems wrong here to me.

> +	do {
>  		struct ipv6_opt_hdr _hdr, *hp;
>  		unsigned int hdrlen;
> +		found = (nexthdr == target);
>  
>  		if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) {
>  			if (target < 0)
> @@ -172,6 +176,20 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>  		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
>  		if (hp == NULL)
>  			return -EBADMSG;
> +
> +		if (nexthdr == NEXTHDR_ROUTING) {
> +			struct ipv6_rt_hdr _rh, *rh;
> +
> +			rh = skb_header_pointer(skb, start, sizeof(_rh),
> +						&_rh);
> +			if (rh == NULL)
> +				return -EBADMSG;
> +
> +			if (flags && (*flags & IP6_FH_F_SKIP_RH) &&
> +			    rh->segments_left == 0)
> +				found = false;
> +		}
> +
>  		if (nexthdr == NEXTHDR_FRAGMENT) {
>  			unsigned short _frag_off;
>  			__be16 *fp;
> @@ -205,10 +223,12 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>  		} else
>  			hdrlen = ipv6_optlen(hp);
>  
> -		nexthdr = hp->nexthdr;
> -		len -= hdrlen;
> -		start += hdrlen;
> -	}
> +		if (!found) {
> +			nexthdr = hp->nexthdr;
> +			len -= hdrlen;
> +			start += hdrlen;
> +		}
> +	} while (!found);
>  
>  	*offset = start;
>  	return nexthdr;
> -- 
> 1.7.9.5


* Re: [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-12-03 14:04   ` Pablo Neira Ayuso
@ 2012-12-03 17:28     ` Jesse Gross
  2012-12-03 18:06       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2012-12-03 17:28 UTC
  To: Pablo Neira Ayuso
  Cc: dev, netdev,
	David Miller, Ansis Atteka

On Mon, Dec 3, 2012 at 6:04 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Nov 29, 2012 at 10:35:45AM -0800, Jesse Gross wrote:
>> @@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>>       }
>>       len = skb->len - start;
>>
>> -     while (nexthdr != target) {
>
> If the offset is set as parameter via ipv6_find_hdr, we now are always
> entering the loop even if we found the target header we're looking
> for, before that didn't happen.
>
> Something seems wrong here to me.

If the target header is a routing header then you might still need to
continue searching because the first one that you see could be empty.


* Re: [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-12-03 17:28     ` Jesse Gross
@ 2012-12-03 18:06       ` Pablo Neira Ayuso
  2012-12-04 18:15         ` Jesse Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Pablo Neira Ayuso @ 2012-12-03 18:06 UTC
  To: Jesse Gross; +Cc: David Miller, netdev, dev, Ansis Atteka

On Mon, Dec 03, 2012 at 09:28:55AM -0800, Jesse Gross wrote:
> On Mon, Dec 3, 2012 at 6:04 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Thu, Nov 29, 2012 at 10:35:45AM -0800, Jesse Gross wrote:
> >> @@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
> >>       }
> >>       len = skb->len - start;
> >>
> >> -     while (nexthdr != target) {
> >
> > If the offset is set as parameter via ipv6_find_hdr, we now are always
> > entering the loop even if we found the target header we're looking
> > for, before that didn't happen.
> >
> > Something seems wrong here to me.
> 
> If the target header is a routing header then you might still need to
> continue searching because the first one that you see could be empty.

OK, but if it's not a routing header what we're searching for (which
seems to be the case of netfilter/IPVS) we waste way more cycles on
copying the IPv6 header again and with way more things that are
completely useless.


* Re: [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-12-03 18:06       ` Pablo Neira Ayuso
@ 2012-12-04 18:15         ` Jesse Gross
  2012-12-04 20:47           ` Ansis Atteka
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2012-12-04 18:15 UTC
  To: Pablo Neira Ayuso; +Cc: David Miller, netdev, dev, Ansis Atteka

On Mon, Dec 3, 2012 at 10:06 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 03, 2012 at 09:28:55AM -0800, Jesse Gross wrote:
>> On Mon, Dec 3, 2012 at 6:04 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> > On Thu, Nov 29, 2012 at 10:35:45AM -0800, Jesse Gross wrote:
>> >> @@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>> >>       }
>> >>       len = skb->len - start;
>> >>
>> >> -     while (nexthdr != target) {
>> >
>> > If the offset is set as parameter via ipv6_find_hdr, we now are always
>> > entering the loop even if we found the target header we're looking
>> > for, before that didn't happen.
>> >
>> > Something seems wrong here to me.
>>
>> If the target header is a routing header then you might still need to
>> continue searching because the first one that you see could be empty.
>
> OK, but if it's not a routing header what we're searching for (which
> seems to be the case of netfilter/IPVS) we waste way more cycles on
> copying the IPv6 header again and with way more things that are
> completely useless.

We could add a check to short circuit this but it seems like a
premature optimization to me.

Ansis, can you comment?


* Re: [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers
  2012-12-04 18:15         ` Jesse Gross
@ 2012-12-04 20:47           ` Ansis Atteka
  0 siblings, 0 replies; 18+ messages in thread
From: Ansis Atteka @ 2012-12-04 20:47 UTC
  To: Jesse Gross; +Cc: Pablo Neira Ayuso, David Miller, netdev, dev

On Tue, Dec 4, 2012 at 10:15 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Dec 3, 2012 at 10:06 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> On Mon, Dec 03, 2012 at 09:28:55AM -0800, Jesse Gross wrote:
>>> On Mon, Dec 3, 2012 at 6:04 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>>> > On Thu, Nov 29, 2012 at 10:35:45AM -0800, Jesse Gross wrote:
>>> >> @@ -159,9 +162,10 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
>>> >>       }
>>> >>       len = skb->len - start;
>>> >>
>>> >> -     while (nexthdr != target) {
>>> >
>>> > If the offset is set as parameter via ipv6_find_hdr, we now are always
>>> > entering the loop even if we found the target header we're looking
>>> > for, before that didn't happen.
>>> >
>>> > Something seems wrong here to me.
>>>
>>> If the target header is a routing header then you might still need to
>>> continue searching because the first one that you see could be empty.
>>
>> OK, but if it's not a routing header what we're searching for (which
>> seems to be the case of netfilter/IPVS) we waste way more cycles on
>> copying the IPv6 header again and with way more things that are
>> completely useless.
>
> We could add a check to short circuit this but it seems like a
> premature optimization to me.
>
> Ansis, can you comment?

Pablo, I think that the only concern you have here is about
optimizations, right?

We could either:
1. add an "if" statement that terminates loop early; or
2. switch back to "while () {}" loop and change condition from
"nexthdr != target" to something like "nexthdr != target || (nexthdr
== NEXTHDR_ROUTING && flags && (*flags & IP6_FH_F_SKIP_RH))".

This function seemed like a good candidate to extend for this. I
think that iptables could make use of it too (to figure out whether L4
checksums need to be updated in the presence of routing headers and NAT).

Thanks,
Ansis

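A sketch of the loop condition Ansis proposes as option 2, written as a stand-alone predicate. With it, callers that search for a non-routing header (Netfilter, IPVS) skip the loop entirely when nexthdr already equals target, as before the patch; only a routing-header search with IP6_FH_F_SKIP_RH still enters the loop, where the segments_left check decides whether to stop. The constants are redefined locally with the values from the patch so the sketch compiles outside the kernel; the function name need_loop() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the patch / IANA, redefined so this compiles stand-alone. */
#define NEXTHDR_ROUTING  43
#define IP6_FH_F_SKIP_RH (1 << 2)

/*
 * Option 2 as a predicate: keep walking while the current header is not
 * the target, or while it is a routing header and the caller asked to
 * skip empty ones (the segments_left test then happens in the loop body).
 */
static bool need_loop(uint8_t nexthdr, int target, const int *flags)
{
	return nexthdr != target ||
	       (nexthdr == NEXTHDR_ROUTING && flags != NULL &&
		(*flags & IP6_FH_F_SKIP_RH));
}
```

For example, need_loop(6, 6, &flags) is false even with IP6_FH_F_SKIP_RH set, so a TCP lookup does no extra work, while need_loop(43, 43, &flags) is true and the body must still inspect segments_left.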

[parent not found: <1354214149-33651-1-git-send-email-jesse@nicira.com>]

* [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
       [not found] ` <1354214149-33651-1-git-send-email-jesse@nicira.com>
@ 2012-11-29 18:35   ` Jesse Gross
       [not found]     ` <1354214149-33651-5-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  2012-11-29 18:35   ` [PATCH net-next 5/7] net: openvswitch: use this_cpu_ptr per-cpu helper Jesse Gross
  1 sibling, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

This patch adds IPv6 'set' action functionality. It allows changing
the traffic class, flow label, hop limit, and IPv6 source and destination
address fields.

Signed-off-by: Ansis Atteka <aatteka-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/actions.c  |   93 ++++++++++++++++++++++++++++++++++++++++++++
 net/openvswitch/datapath.c |   20 ++++++++++
 2 files changed, 113 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 0811447..a58ed27 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -28,6 +28,7 @@
 #include <linux/if_arp.h>
 #include <linux/if_vlan.h>
 #include <net/ip.h>
+#include <net/ipv6.h>
 #include <net/checksum.h>
 #include <net/dsfield.h>
 
@@ -162,6 +163,53 @@ static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
 	*addr = new_addr;
 }
 
+static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
+				 __be32 addr[4], const __be32 new_addr[4])
+{
+	int transport_len = skb->len - skb_transport_offset(skb);
+
+	if (l4_proto == IPPROTO_TCP) {
+		if (likely(transport_len >= sizeof(struct tcphdr)))
+			inet_proto_csum_replace16(&tcp_hdr(skb)->check, skb,
+						  addr, new_addr, 1);
+	} else if (l4_proto == IPPROTO_UDP) {
+		if (likely(transport_len >= sizeof(struct udphdr))) {
+			struct udphdr *uh = udp_hdr(skb);
+
+			if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+				inet_proto_csum_replace16(&uh->check, skb,
+							  addr, new_addr, 1);
+				if (!uh->check)
+					uh->check = CSUM_MANGLED_0;
+			}
+		}
+	}
+}
+
+static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
+			  __be32 addr[4], const __be32 new_addr[4],
+			  bool recalculate_csum)
+{
+	if (recalculate_csum)
+		update_ipv6_checksum(skb, l4_proto, addr, new_addr);
+
+	skb->rxhash = 0;
+	memcpy(addr, new_addr, sizeof(__be32[4]));
+}
+
+static void set_ipv6_tc(struct ipv6hdr *nh, u8 tc)
+{
+	nh->priority = tc >> 4;
+	nh->flow_lbl[0] = (nh->flow_lbl[0] & 0x0F) | ((tc & 0x0F) << 4);
+}
+
+static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl)
+{
+	nh->flow_lbl[0] = (nh->flow_lbl[0] & 0xF0) | (fl & 0x000F0000) >> 16;
+	nh->flow_lbl[1] = (fl & 0x0000FF00) >> 8;
+	nh->flow_lbl[2] = fl & 0x000000FF;
+}
+
 static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl)
 {
 	csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
@@ -195,6 +243,47 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *ipv4_key)
 	return 0;
 }
 
+static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
+{
+	struct ipv6hdr *nh;
+	int err;
+	__be32 *saddr;
+	__be32 *daddr;
+
+	err = make_writable(skb, skb_network_offset(skb) +
+			    sizeof(struct ipv6hdr));
+	if (unlikely(err))
+		return err;
+
+	nh = ipv6_hdr(skb);
+	saddr = (__be32 *)&nh->saddr;
+	daddr = (__be32 *)&nh->daddr;
+
+	if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src)))
+		set_ipv6_addr(skb, ipv6_key->ipv6_proto, saddr,
+			      ipv6_key->ipv6_src, true);
+
+	if (memcmp(ipv6_key->ipv6_dst, daddr, sizeof(ipv6_key->ipv6_dst))) {
+		unsigned int offset = 0;
+		int flags = IP6_FH_F_SKIP_RH;
+		bool recalc_csum = true;
+
+		if (ipv6_ext_hdr(nh->nexthdr))
+			recalc_csum = ipv6_find_hdr(skb, &offset,
+						    NEXTHDR_ROUTING, NULL,
+						    &flags) != NEXTHDR_ROUTING;
+
+		set_ipv6_addr(skb, ipv6_key->ipv6_proto, daddr,
+			      ipv6_key->ipv6_dst, recalc_csum);
+	}
+
+	set_ipv6_tc(nh, ipv6_key->ipv6_tclass);
+	set_ipv6_fl(nh, ntohl(ipv6_key->ipv6_label));
+	nh->hop_limit = ipv6_key->ipv6_hlimit;
+
+	return 0;
+}
+
 /* Must follow make_writable() since that can move the skb data. */
 static void set_tp_port(struct sk_buff *skb, __be16 *port,
 			 __be16 new_port, __sum16 *check)
@@ -347,6 +436,10 @@ static int execute_set_action(struct sk_buff *skb,
 		err = set_ipv4(skb, nla_data(nested_attr));
 		break;
 
+	case OVS_KEY_ATTR_IPV6:
+		err = set_ipv6(skb, nla_data(nested_attr));
+		break;
+
 	case OVS_KEY_ATTR_TCP:
 		err = set_tcp(skb, nla_data(nested_attr));
 		break;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 4c4b62c..fd4a6a4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -479,6 +479,7 @@ static int validate_set(const struct nlattr *a,
 
 	switch (key_type) {
 	const struct ovs_key_ipv4 *ipv4_key;
+	const struct ovs_key_ipv6 *ipv6_key;
 
 	case OVS_KEY_ATTR_PRIORITY:
 	case OVS_KEY_ATTR_ETHERNET:
@@ -500,6 +501,25 @@ static int validate_set(const struct nlattr *a,
 
 		break;
 
+	case OVS_KEY_ATTR_IPV6:
+		if (flow_key->eth.type != htons(ETH_P_IPV6))
+			return -EINVAL;
+
+		if (!flow_key->ip.proto)
+			return -EINVAL;
+
+		ipv6_key = nla_data(ovs_key);
+		if (ipv6_key->ipv6_proto != flow_key->ip.proto)
+			return -EINVAL;
+
+		if (ipv6_key->ipv6_frag != flow_key->ip.frag)
+			return -EINVAL;
+
+		if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000)
+			return -EINVAL;
+
+		break;
+
 	case OVS_KEY_ATTR_TCP:
 		if (flow_key->ip.proto != IPPROTO_TCP)
 			return -EINVAL;
-- 
1.7.9.5
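
A userspace sketch of the incremental update that `update_ipv6_checksum()` delegates to `inet_proto_csum_replace16()`: each 16-bit word of the old address is subtracted from the checksum and each word of the new one added, in one's complement arithmetic (RFC 1624, eqn. 3). The helper names are hypothetical, and the sketch models only the checksum field, not the `skb->csum` bookkeeping the kernel helper also does. The source address is always part of the TCP/UDP pseudo-header; for the destination, RFC 2460 specifies that the pseudo-header uses the final destination, which is why `set_ipv6()` skips the update when a non-empty routing header is present.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* Add the big-endian 16-bit words of 'p' to the accumulator. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* pad the odd byte */
	return sum;
}

/* Checksum computed from scratch over a buffer. */
static uint16_t csum_full(const uint8_t *p, size_t len)
{
	return (uint16_t)~csum_fold32(csum_add_bytes(0, p, len));
}

/* Incrementally replace a 16-byte address: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_replace16_sketch(uint16_t check,
				      const uint8_t old_addr[16],
				      const uint8_t new_addr[16])
{
	uint32_t sum = (uint16_t)~check;

	for (int i = 0; i < 16; i += 2) {
		uint16_t m_old = (uint16_t)((old_addr[i] << 8) | old_addr[i + 1]);
		uint16_t m_new = (uint16_t)((new_addr[i] << 8) | new_addr[i + 1]);

		sum += (uint16_t)~m_old;
		sum += m_new;
	}
	return (uint16_t)~csum_fold32(sum);
}

/* UDP: a computed zero is stored as 0xFFFF, since 0 means "no checksum". */
static uint16_t udp_mangle_zero(uint16_t check)
{
	return check ? check : 0xFFFF;
}
```

Replacing an address this way gives the same result as recomputing over the whole buffer, without touching the payload; `udp_mangle_zero()` plays the role of the patch's `CSUM_MANGLED_0` substitution.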

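The first 32 bits of the IPv6 header hold the version (4 bits), traffic class (8 bits) and flow label (20 bits), so the traffic class straddles a byte boundary. That is why `set_ipv6_tc()` writes both `nh->priority` and the high nibble of `flow_lbl[0]`, and why `validate_set()` rejects labels with any of the top 12 bits set. The sketch below does the same packing on a raw 4-byte buffer instead of `struct ipv6hdr` (whose bitfield order depends on endianness); all helper names are hypothetical.

```c
#include <stdint.h>

/* w[0..3]: version(4) | traffic class(8) | flow label(20), network order. */

static void hdr_set_tc(uint8_t w[4], uint8_t tc)
{
	/* Keep the version nibble, store the TC high nibble. */
	w[0] = (uint8_t)((w[0] & 0xF0) | (tc >> 4));
	/* TC low nibble shares byte 1 with the top 4 flow-label bits. */
	w[1] = (uint8_t)((w[1] & 0x0F) | ((tc & 0x0F) << 4));
}

static void hdr_set_fl(uint8_t w[4], uint32_t fl)
{
	w[1] = (uint8_t)((w[1] & 0xF0) | ((fl >> 16) & 0x0F));
	w[2] = (uint8_t)((fl >> 8) & 0xFF);
	w[3] = (uint8_t)(fl & 0xFF);
}

static uint8_t hdr_get_tc(const uint8_t w[4])
{
	return (uint8_t)(((w[0] & 0x0F) << 4) | (w[1] >> 4));
}

static uint32_t hdr_get_fl(const uint8_t w[4])
{
	return ((uint32_t)(w[1] & 0x0F) << 16) | ((uint32_t)w[2] << 8) | w[3];
}

/* Same rule as validate_set(): a flow label must fit in 20 bits. */
static int fl_valid(uint32_t fl)
{
	return (fl & 0xFFF00000u) == 0;
}
```

Each setter masks the nibble it shares with the other field, so the traffic class and flow label can be written in either order without clobbering each other or the version.
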

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
       [not found]     ` <1354214149-33651-5-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
@ 2012-12-12  3:14       ` Tom Herbert
       [not found]         ` <CA+mtBx-Zf9FNf11H9RM12etHnJ1bPpM_Eyc4mR7E6xsb7sUP2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2012-12-12  3:14 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller

> This patch adds ipv6 set action functionality. It allows to change
> traffic class, flow label, hop-limit, ipv6 source and destination
> address fields.
>
I have to wonder about these patches and the underlying design
direction.  Aren't these sorts of things, and more, already implemented
by iptables, but in a modular and extensible fashion?  Has there been
any thought given to hooking OVS into iptables to leverage all the
existing functionality?

Thanks,
Tom

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
       [not found]         ` <CA+mtBx-Zf9FNf11H9RM12etHnJ1bPpM_Eyc4mR7E6xsb7sUP2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-12 18:17           ` Jesse Gross
       [not found]             ` <CAEP_g=-1aWGsjR55AaD6sLLt4QzbYgUs-3hfNNONrrf8MDwSyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2012-12-12 18:17 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller

On Tue, Dec 11, 2012 at 7:14 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> This patch adds ipv6 set action functionality. It allows to change
>> traffic class, flow label, hop-limit, ipv6 source and destination
>> address fields.
>>
> I have to wonder about these patches and the underlying design
> direction.  Aren't these sorts of things, and more, already implemented
> by iptables, but in a modular and extensible fashion?  Has there been
> any thought given to hooking OVS into iptables to leverage all the
> existing functionality?

At an implementation level, the goal is definitely to share as much
code as possible.  Some of that was obviously done to support this
patch and I'm sure there are more areas where it could be taken
further.

At a more conceptual level we've explored this path a number of times
and it's never been attractive since it has a tendency to drag more
OVS code into other parts of the kernel and generally make things
worse for everybody.  Of course, it's hard to say without knowing what
you're thinking.  Do you have a specific proposal?

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
       [not found]             ` <CAEP_g=-1aWGsjR55AaD6sLLt4QzbYgUs-3hfNNONrrf8MDwSyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-12 18:38               ` Tom Herbert
       [not found]                 ` <CA+mtBx-84PQoHmauNpN4vYLWXcJdESMMep849DQcUAjkmC7PXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2012-12-12 18:38 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller, Mike Waychison

> At an implementation level, the goal is definitely to share as much
> code as possible.  Some of that was obviously done to support this
> patch and I'm sure there are more areas where it could be taken
> further.
>
> At a more conceptual level we've explored this path a number of times
> and it's never been attractive since it has a tendency to drag more
> OVS code into other parts of the kernel and generally make things
> worse for everybody.  Of course, it's hard to say without knowing what
> you're thinking.  Do you have a specific proposal?

Where is the line drawn?  Is the intent that, over the next five years,
functionality will be added in ad hoc increments to give OVS the same
functionality as iptables, tc, and routing?  Are we going to have
things like NAT, stateful firewalls, and DDoS mechanisms implemented
in OVS?  (We already have people proposing such things!)

* Re: [PATCH net-next 4/7] openvswitch: add ipv6 'set' action
       [not found]                 ` <CA+mtBx-84PQoHmauNpN4vYLWXcJdESMMep849DQcUAjkmC7PXQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-12 19:17                   ` Jesse Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-12-12 19:17 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	David Miller, Mike Waychison

On Wed, Dec 12, 2012 at 10:38 AM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> At an implementation level, the goal is definitely to share as much
>> code as possible.  Some of that was obviously done to support this
>> patch and I'm sure there are more areas where it could be taken
>> further.
>>
>> At a more conceptual level we've explored this path a number of times
>> and it's never been attractive since it has a tendency to drag more
>> OVS code into other parts of the kernel and generally make things
>> worse for everybody.  Of course, it's hard to say without knowing what
>> you're thinking.  Do you have a specific proposal?
>
> Where is the line drawn?  Is the intent that, over the next five years,
> functionality will be added in ad hoc increments to give OVS the same
> functionality as iptables, tc, and routing?  Are we going to have
> things like NAT, stateful firewalls, and DDoS mechanisms implemented
> in OVS?  (We already have people proposing such things!)

Definitely no to all of the above. (As an aside, years ago there was
NAT functionality in a precursor to OVS.  Everybody hated it and was
very happy when it was removed, so I wouldn't worry about that type of
thing popping up in OVS any time soon.)

The design of OVS works pretty well for the types of stateless
operations that are currently implemented because those map nicely to
flows that userspace can use to program in a fairly clean and powerful
manner.  This is much less true for things like stateful rules, QoS,
DPI, etc. because you either want to look at more information than
would usually be considered a flow or have state that changes very
quickly.  In these cases, the data plane needs to take action on its
own and the interaction with userspace is more akin to configuration
than programming.

As these types of features come up, I think you will start to see more
integration with netfilter and other tools (in fact, there are several
examples of this already - OVS QoS uses tc, the ability to interact
with skb->mark was added recently, and Pravin has been doing a lot of
work to refactor and integrate with the upstream tunneling code).
There are some definite tradeoffs to doing it this way, mostly in the
area of state management, so I don't think that it's feasible to
switch wholesale over to this model.  However, if we're careful then I
think it's possible to get the best of both worlds.

* [PATCH net-next 5/7] net: openvswitch: use this_cpu_ptr per-cpu helper
       [not found] ` <1354214149-33651-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  2012-11-29 18:35   ` [PATCH net-next 4/7] openvswitch: add ipv6 'set' action Jesse Gross
@ 2012-11-29 18:35   ` Jesse Gross
  1 sibling, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Shan Wei

From: Shan Wei <davidshan-1Nz4purKYjRBDgjK7y7TUQ@public.gmane.org>

Use the faster this_cpu_ptr() instead of per_cpu_ptr(p, smp_processor_id()).

Signed-off-by: Shan Wei <davidshan-1Nz4purKYjRBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c |    4 ++--
 net/openvswitch/vport.c    |    5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index fd4a6a4..7b1d6d2 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -208,7 +208,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
 	int error;
 	int key_len;
 
-	stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
+	stats = this_cpu_ptr(dp->stats_percpu);
 
 	/* Extract flow from 'skb' into 'key'. */
 	error = ovs_flow_extract(skb, p->port_no, &key, &key_len);
@@ -282,7 +282,7 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 	return 0;
 
 err:
-	stats = per_cpu_ptr(dp->stats_percpu, smp_processor_id());
+	stats = this_cpu_ptr(dp->stats_percpu);
 
 	u64_stats_update_begin(&stats->sync);
 	stats->n_lost++;
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 03779e8..70af0be 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -333,8 +333,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
 {
 	struct vport_percpu_stats *stats;
 
-	stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
-
+	stats = this_cpu_ptr(vport->percpu_stats);
 	u64_stats_update_begin(&stats->sync);
 	stats->rx_packets++;
 	stats->rx_bytes += skb->len;
@@ -359,7 +358,7 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
 	if (likely(sent)) {
 		struct vport_percpu_stats *stats;
 
-		stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());
+		stats = this_cpu_ptr(vport->percpu_stats);
 
 		u64_stats_update_begin(&stats->sync);
 		stats->tx_packets++;
-- 
1.7.9.5
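
The patch only changes how each CPU's slot is located; the statistics themselves stay per-CPU. A userspace model of that pattern, assuming an explicit `cpu` argument in place of the implicit lookup that `this_cpu_ptr()` performs (all names are hypothetical, and the `u64_stats_update_begin()` sequence protection is not modeled): writers touch only their own slot, so the hot path needs no shared lock, and readers sum every slot when reporting.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

struct pcpu_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

/* One slot per CPU; each writer only ever updates its own slot. */
static struct pcpu_stats stats[NR_CPUS_SKETCH];

/* Stand-in for this_cpu_ptr(): resolve the current CPU's slot. */
static struct pcpu_stats *this_cpu_stats(int cpu)
{
	return &stats[cpu];
}

/* Hot path: no lock, because no other CPU writes this slot. */
static void record_rx(int cpu, uint32_t len)
{
	struct pcpu_stats *s = this_cpu_stats(cpu);

	s->rx_packets++;
	s->rx_bytes += len;
}

/* Slow path: fold all per-CPU slots into one total for reporting. */
static struct pcpu_stats total_stats(void)
{
	struct pcpu_stats t = {0, 0};

	for (int i = 0; i < NR_CPUS_SKETCH; i++) {
		t.rx_packets += stats[i].rx_packets;
		t.rx_bytes += stats[i].rx_bytes;
	}
	return t;
}
```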

* [PATCH net-next 6/7] openvswitch: add skb mark matching and set action
  2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
                   ` (3 preceding siblings ...)
       [not found] ` <1354214149-33651-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
@ 2012-11-29 18:35 ` Jesse Gross
  2012-11-29 18:35 ` [PATCH net-next 7/7] openvswitch: Use RCU callback when detaching netdevices Jesse Gross
  2012-11-30 17:03 ` [GIT net-next] Open vSwitch David Miller
  6 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, dev, Ansis Atteka

From: Ansis Atteka <aatteka@nicira.com>

This patch adds support for matching on the skb mark and for setting it with a set action.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/openvswitch.h |    1 +
 net/openvswitch/actions.c   |    4 ++++
 net/openvswitch/datapath.c  |    3 +++
 net/openvswitch/flow.c      |   19 ++++++++++++++++++-
 net/openvswitch/flow.h      |    8 +++++---
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index eb1efa5..d42e174 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -243,6 +243,7 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
 	OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
 	OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
+	OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
 	__OVS_KEY_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index a58ed27..ac2defe 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -428,6 +428,10 @@ static int execute_set_action(struct sk_buff *skb,
 		skb->priority = nla_get_u32(nested_attr);
 		break;
 
+	case OVS_KEY_ATTR_SKB_MARK:
+		skb->mark = nla_get_u32(nested_attr);
+		break;
+
 	case OVS_KEY_ATTR_ETHERNET:
 		err = set_eth_addr(skb, nla_data(nested_attr));
 		break;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 7b1d6d2..f996db3 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -482,6 +482,7 @@ static int validate_set(const struct nlattr *a,
 	const struct ovs_key_ipv6 *ipv6_key;
 
 	case OVS_KEY_ATTR_PRIORITY:
+	case OVS_KEY_ATTR_SKB_MARK:
 	case OVS_KEY_ATTR_ETHERNET:
 		break;
 
@@ -695,6 +696,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err_flow_free;
 
 	err = ovs_flow_metadata_from_nlattrs(&flow->key.phy.priority,
+					     &flow->key.phy.skb_mark,
 					     &flow->key.phy.in_port,
 					     a[OVS_PACKET_ATTR_KEY]);
 	if (err)
@@ -714,6 +716,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	OVS_CB(packet)->flow = flow;
 	packet->priority = flow->key.phy.priority;
+	packet->mark = flow->key.phy.skb_mark;
 
 	rcu_read_lock();
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index e6ce902..c3294ce 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -604,6 +604,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
 
 	key->phy.priority = skb->priority;
 	key->phy.in_port = in_port;
+	key->phy.skb_mark = skb->mark;
 
 	skb_reset_mac_header(skb);
 
@@ -803,6 +804,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ENCAP] = -1,
 	[OVS_KEY_ATTR_PRIORITY] = sizeof(u32),
 	[OVS_KEY_ATTR_IN_PORT] = sizeof(u32),
+	[OVS_KEY_ATTR_SKB_MARK] = sizeof(u32),
 	[OVS_KEY_ATTR_ETHERNET] = sizeof(struct ovs_key_ethernet),
 	[OVS_KEY_ATTR_VLAN] = sizeof(__be16),
 	[OVS_KEY_ATTR_ETHERTYPE] = sizeof(__be16),
@@ -988,6 +990,10 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 	} else {
 		swkey->phy.in_port = DP_MAX_PORTS;
 	}
+	if (attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) {
+		swkey->phy.skb_mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]);
+		attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK);
+	}
 
 	/* Data attributes. */
 	if (!(attrs & (1 << OVS_KEY_ATTR_ETHERNET)))
@@ -1115,6 +1121,8 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 
 /**
  * ovs_flow_metadata_from_nlattrs - parses Netlink attributes into a flow key.
+ * @priority: receives the skb priority
+ * @mark: receives the skb mark
  * @in_port: receives the extracted input port.
  * @key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
  * sequence.
@@ -1124,7 +1132,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
  * get the metadata, that is, the parts of the flow key that cannot be
  * extracted from the packet itself.
  */
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u32 *mark, u16 *in_port,
 			       const struct nlattr *attr)
 {
 	const struct nlattr *nla;
@@ -1132,6 +1140,7 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
 
 	*in_port = DP_MAX_PORTS;
 	*priority = 0;
+	*mark = 0;
 
 	nla_for_each_nested(nla, attr, rem) {
 		int type = nla_type(nla);
@@ -1150,6 +1159,10 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
 					return -EINVAL;
 				*in_port = nla_get_u32(nla);
 				break;
+
+			case OVS_KEY_ATTR_SKB_MARK:
+				*mark = nla_get_u32(nla);
+				break;
 			}
 		}
 	}
@@ -1171,6 +1184,10 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
 	    nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port))
 		goto nla_put_failure;
 
+	if (swkey->phy.skb_mark &&
+	    nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, swkey->phy.skb_mark))
+		goto nla_put_failure;
+
 	nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
 	if (!nla)
 		goto nla_put_failure;
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 14a324e..a7bb60f 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -43,6 +43,7 @@ struct sw_flow_actions {
 struct sw_flow_key {
 	struct {
 		u32	priority;	/* Packet QoS priority. */
+		u32	skb_mark;	/* SKB mark. */
 		u16	in_port;	/* Input switch port (or DP_MAX_PORTS). */
 	} phy;
 	struct {
@@ -144,6 +145,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
  *                         ------  ---  ------  -----
  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
  *  OVS_KEY_ATTR_IN_PORT       4    --     4      8
+ *  OVS_KEY_ATTR_SKB_MARK      4    --     4      8
  *  OVS_KEY_ATTR_ETHERNET     12    --     4     16
  *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8  (outer VLAN ethertype)
  *  OVS_KEY_ATTR_8021Q         4    --     4      8
@@ -153,14 +155,14 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
  *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
  *  OVS_KEY_ATTR_ND           28    --     4     32
  *  -------------------------------------------------
- *  total                                       144
+ *  total                                       152
  */
-#define FLOW_BUFSIZE 144
+#define FLOW_BUFSIZE 152
 
 int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
 int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		      const struct nlattr *);
-int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
+int ovs_flow_metadata_from_nlattrs(u32 *priority, u32 *mark, u16 *in_port,
 			       const struct nlattr *);
 
 #define MAX_ACTIONS_BUFSIZE    (16 * 1024)
-- 
1.7.9.5
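
The `FLOW_BUFSIZE` bump follows from Netlink attribute sizing: each attribute costs a 4-byte header plus its payload, rounded up to a 4-byte boundary, so a `u32` mark adds 4 + 4 = 8 bytes and the total goes from 144 to 152. A small check of that arithmetic, with `SK_`-prefixed macros that mirror `NLA_ALIGN()`/`NLA_HDRLEN` from the Netlink uapi header and a hypothetical `nla_total_size_sketch()` helper:

```c
#include <stdint.h>

/* Local copy of the Netlink attribute header layout (struct nlattr). */
struct nla_hdr_sketch {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Same alignment rules as NLA_ALIGNTO/NLA_ALIGN/NLA_HDRLEN. */
#define SK_NLA_ALIGNTO		4
#define SK_NLA_ALIGN(len)	(((len) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN		((int)SK_NLA_ALIGN(sizeof(struct nla_hdr_sketch)))

/* Header plus payload, rounded up to the next 4-byte boundary. */
static int nla_total_size_sketch(int payload)
{
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}
```

These totals match the "total" column of the table in flow.h, for example 8 for the 2-byte `OVS_KEY_ATTR_ETHERTYPE` and 32 for the 28-byte `OVS_KEY_ATTR_ND`.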

* [PATCH net-next 7/7] openvswitch: Use RCU callback when detaching netdevices.
  2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
                   ` (4 preceding siblings ...)
  2012-11-29 18:35 ` [PATCH net-next 6/7] openvswitch: add skb mark matching and set action Jesse Gross
@ 2012-11-29 18:35 ` Jesse Gross
  2012-11-30 17:03 ` [GIT net-next] Open vSwitch David Miller
  6 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2012-11-29 18:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, dev, Justin Pettit

Currently, each time a device is detached from an OVS datapath
we call synchronize_rcu() before freeing the associated data structures.
However, if a bridge is deleted (which detaches all ports) while
many devices are connected, this can cause a long delay.  This
patch switches to call_rcu() so that the cost is batched together
rather than paid once per port.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 net/openvswitch/vport-netdev.c |   14 ++++++++++----
 net/openvswitch/vport-netdev.h |    3 +++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index a903348..a9327e2 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -114,6 +114,15 @@ error:
 	return ERR_PTR(err);
 }
 
+static void free_port_rcu(struct rcu_head *rcu)
+{
+	struct netdev_vport *netdev_vport = container_of(rcu,
+					struct netdev_vport, rcu);
+
+	dev_put(netdev_vport->dev);
+	ovs_vport_free(vport_from_priv(netdev_vport));
+}
+
 static void netdev_destroy(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
@@ -122,10 +131,7 @@ static void netdev_destroy(struct vport *vport)
 	netdev_rx_handler_unregister(netdev_vport->dev);
 	dev_set_promiscuity(netdev_vport->dev, -1);
 
-	synchronize_rcu();
-
-	dev_put(netdev_vport->dev);
-	ovs_vport_free(vport);
+	call_rcu(&netdev_vport->rcu, free_port_rcu);
 }
 
 const char *ovs_netdev_get_name(const struct vport *vport)
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index f7072a2..6478079 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -20,12 +20,15 @@
 #define VPORT_NETDEV_H 1
 
 #include <linux/netdevice.h>
+#include <linux/rcupdate.h>
 
 #include "vport.h"
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
 struct netdev_vport {
+	struct rcu_head rcu;
+
 	struct net_device *dev;
 };
 
-- 
1.7.9.5

* Re: [GIT net-next] Open vSwitch
From: David Miller @ 2012-11-30 17:03 UTC
  To: jesse; +Cc: netdev, dev

From: Jesse Gross <jesse@nicira.com>
Date: Thu, 29 Nov 2012 10:35:42 -0800

> This series of improvements for 3.8/net-next contains four components:
>  * Support for modifying IPv6 headers
>  * Support for matching and setting skb->mark for better integration with
>    things like iptables
>  * Ability to recognize the EtherType for RARP packets
>  * Two small performance enhancements
> 
> The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
> conflicts.  I left it as is but can do the merge if you want.  The conflicts
> are:
>  * ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
>    exthdrs_core.c.  Both should stay.
>  * A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
>    after this patch.  The IPVS user has two instances of the old constant
>    name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.

Pulled, thanks Jesse.

The merge conflict directions were particularly helpful.

If you ever do the merge yourself (I'm ambivalent about where you or I
do it), make sure you force the merge commit message to have a
description of the conflict resolution similarly to what you provided
here.

Thanks again.

Thread overview: 18+ messages
2012-11-29 18:35 [GIT net-next] Open vSwitch Jesse Gross
2012-11-29 18:35 ` [PATCH net-next 1/7] openvswitch: Process RARP packets with ethertype 0x8035 similar to ARP packets Jesse Gross
2012-11-29 18:35 ` [PATCH net-next 2/7] ipv6: Move ipv6_find_hdr() out of Netfilter code Jesse Gross
2012-11-29 18:35 ` [PATCH net-next 3/7] ipv6: improve ipv6_find_hdr() to skip empty routing headers Jesse Gross
2012-12-03 14:04   ` Pablo Neira Ayuso
2012-12-03 17:28     ` Jesse Gross
2012-12-03 18:06       ` Pablo Neira Ayuso
2012-12-04 18:15         ` Jesse Gross
2012-12-04 20:47           ` Ansis Atteka
2012-11-29 18:35   ` [PATCH net-next 4/7] openvswitch: add ipv6 'set' action Jesse Gross
2012-12-12  3:14       ` Tom Herbert
2012-12-12 18:17           ` Jesse Gross
2012-12-12 18:38               ` Tom Herbert
2012-12-12 19:17                   ` Jesse Gross
2012-11-29 18:35   ` [PATCH net-next 5/7] net: openvswitch: use this_cpu_ptr per-cpu helper Jesse Gross
2012-11-29 18:35 ` [PATCH net-next 6/7] openvswitch: add skb mark matching and set action Jesse Gross
2012-11-29 18:35 ` [PATCH net-next 7/7] openvswitch: Use RCU callback when detaching netdevices Jesse Gross
2012-11-30 17:03 ` [GIT net-next] Open vSwitch David Miller
