[PATCH RFC 0/9] vti4: prepare namespace and interfamily support.

* [PATCH RFC 0/9] vti4: prepare namespace and interfamily support.
@ 2013-12-05 12:00 Steffen Klassert
  2013-12-05 12:01 ` [PATCH RFC 1/9] xfrm4: Add IPsec protocol multiplexer Steffen Klassert
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:00 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

This patchset prepares vti4 for proper namespace and interfamily support.
It is based on the net tree because net-next is still too far behind the
mainline.

Currently the receive hook is in the middle of the decapsulation
process: some of the header pointers still point into the IPsec packet,
others already point into the decapsulated packet. This makes it
very inflexible, and proper namespace and interfamily support can't
be done as it is.

The patchset implements an IPsec protocol multiplexer, so that vti
can register its own receive path hooks. Further, it makes the i_key
usable for vti and changes the vti4 code to do the following:

vti uses the IPsec protocol multiplexer to register its
own receive side hooks for ESP and AH.
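
To illustrate the multiplexer idea, here is a minimal userspace sketch,
not the kernel code from patch 1. It models only the priority-ordered
handler chain and the "first handler that returns 0 claims the packet"
rule. The names struct pkt, struct proto_handler, handler_register,
handler_deregister, dispatch, vti_like_rcv and plain_rcv are made up for
this example, and the RCU and mutex protection of the real list is left
out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet: tunnel_id > 0 means "belongs to a vti tunnel". */
struct pkt {
	int tunnel_id;
};

/* Field names mirror struct xfrm4_protocol; the rest is a model. */
struct proto_handler {
	int (*handler)(struct pkt *p);	/* returns 0 if it took the packet */
	struct proto_handler *next;
	int priority;
};

/* Insert sorted by descending priority, refuse a duplicate priority. */
static int handler_register(struct proto_handler **head,
			    struct proto_handler *h)
{
	struct proto_handler **pprev, *t;

	assert(h != NULL);
	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t->priority < h->priority)
			break;
		if (t->priority == h->priority)
			return -1;
	}
	h->next = *pprev;
	*pprev = h;
	return 0;
}

/* Unlink a handler; returns -1 if it was not on the chain. */
static int handler_deregister(struct proto_handler **head,
			      struct proto_handler *h)
{
	struct proto_handler **pprev, *t;

	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t == h) {
			*pprev = h->next;
			return 0;
		}
	}
	return -1;
}

/* Walk the chain in priority order. For the sake of the example the
 * priority of the claiming handler is returned, -1 if nobody took it. */
static int dispatch(struct proto_handler *head, struct pkt *p)
{
	struct proto_handler *h;

	for (h = head; h != NULL; h = h->next)
		if (!h->handler(p))
			return h->priority;
	return -1;
}

/* Stand-in for a vti receive hook: only claims tunnel packets. */
static int vti_like_rcv(struct pkt *p)
{
	return p->tunnel_id > 0 ? 0 : -1;
}

/* Stand-in for the plain ESP/AH receive path: claims everything. */
static int plain_rcv(struct pkt *p)
{
	(void)p;
	return 0;
}
```

With a handler at priority 100 using vti_like_rcv and one at priority 0
using plain_rcv, a packet that matches a tunnel is taken by the first
one and everything else falls through to the second.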

Vti does the following on receive side:

1. Do an input policy check for the IPsec packet we received.
   This is required because this packet could already have been
   processed by IPsec (tunnel in tunnel or a block policy
   is present), so an inbound policy check is needed.

2. Clean the skb to not leak information on namespace
   transitions.

3. Mark the packet with the i_key. The policy and the state
   must match this key now. Policy and state belong to the vti
   namespace and policy enforcement is done at the later layers.

4. Call the generic xfrm layer to do decryption and decapsulation.

5. Wait for a callback from the xfrm layer to properly update
   the device statistics.
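
Step 5 can be sketched in userspace as follows. This is a model of the
accounting that the receive callback in patch 8 (vti_rcv_cb) does, under
the assumption that err is 0 on success, -EINPROGRESS while the crypto
layer still owns the packet, and any other value on a drop. The names
struct rx_stats_model and rcv_cb_model are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device counters touched on receive. */
struct rx_stats_model {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t rx_errors;
	uint64_t rx_dropped;
};

/* Account one callback from the transform layer. Always returns 0,
 * as the callback in the patch does once the tunnel matched. */
static int rcv_cb_model(struct rx_stats_model *st, unsigned int len, int err)
{
	assert(st != NULL);

	if (!err) {
		/* Decapsulation finished: count the packet. */
		st->rx_packets++;
		st->rx_bytes += len;
		return 0;
	}

	/* Asynchronous crypto: nothing to count yet. */
	if (err == -EINPROGRESS)
		return 0;

	/* Any other error means the packet was dropped. */
	st->rx_errors++;
	st->rx_dropped++;
	return 0;
}
```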

On transmit side:

1. Mark the packet with the o_key. The policy and the state
   must match this key now.

2. Do an xfrm_lookup on the original packet with the mark applied.

3. Check if we got an IPsec route.

4. Clean the skb to not leak information on namespace
   transitions.

5. Attach the dst_entry we got from the xfrm_lookup to the skb.

6. Call dst_output to do the IPsec processing.

7. Do the device statistics.
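
The effect of marking with separate keys can be sketched like this. It
is a simplified userspace model, assuming the usual value/mask
comparison for marks and ignoring the byte order conversion of the keys.
struct mark_model, struct vti_keys_model, mark_matches, mark_for_rx and
mark_for_tx are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mark selector: value and mask. */
struct mark_model {
	uint32_t v;
	uint32_t m;
};

/* Hypothetical per-tunnel keys, already in host byte order. */
struct vti_keys_model {
	uint32_t i_key;
	uint32_t o_key;
};

/* A policy or state matches when the masked packet mark equals v. */
static int mark_matches(const struct mark_model *want, uint32_t pkt_mark)
{
	assert(want != NULL);
	return (pkt_mark & want->m) == want->v;
}

/* Receive side: the packet is marked with the i_key. */
static uint32_t mark_for_rx(const struct vti_keys_model *k)
{
	return k->i_key;
}

/* Transmit side: the packet is marked with the o_key. */
static uint32_t mark_for_tx(const struct vti_keys_model *k)
{
	return k->o_key;
}
```

With i_key = 10 and o_key = 20, an output policy carrying mark 20 only
matches packets on the transmit side, and an input policy carrying mark
10 only matches packets on the receive side.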

It has not had much testing so far. I can set up the vti tunnels and ping
through them. That's all I tried, so test carefully ;-).

I'll start to care about the ipv6 side after I've got some feedback on this.


* [PATCH RFC 1/9] xfrm4: Add IPsec protocol multiplexer
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
@ 2013-12-05 12:01 ` Steffen Klassert
  2013-12-05 12:01 ` [PATCH RFC 2/9] esp4: Use the IPsec protocol multiplexer API Steffen Klassert
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:01 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

This patch adds an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h        |   14 ++++
 net/ipv4/Makefile         |    2 +-
 net/ipv4/xfrm4_protocol.c |  205 +++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_input.c     |    9 +-
 4 files changed, 227 insertions(+), 3 deletions(-)
 create mode 100644 net/ipv4/xfrm4_protocol.c

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 6b82fdf..7f71462 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1348,6 +1348,17 @@ struct xfrm_algo_desc {
 	struct sadb_alg desc;
 };
 
+
+/* XFRM protocol handlers.  */
+struct xfrm4_protocol {
+	int (*handler)(struct sk_buff *skb);
+	int (*cb_handler)(struct sk_buff *skb, int err);
+	int (*err_handler)(struct sk_buff *skb, u32 info);
+
+	struct xfrm4_protocol __rcu *next;
+	int priority;
+};
+
 /* XFRM tunnel handlers.  */
 struct xfrm_tunnel {
 	int (*handler)(struct sk_buff *skb);
@@ -1503,6 +1514,9 @@ int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 int xfrm4_output(struct sk_buff *skb);
 int xfrm4_output_finish(struct sk_buff *skb);
+void xfrm4_rcv_cb(struct sk_buff *skb, u8 protocol, int err);
+int xfrm4_protocol_register(struct xfrm4_protocol *handler, unsigned char protocol);
+int xfrm4_protocol_deregister(struct xfrm4_protocol *handler, unsigned char protocol);
 int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 4b81e91..4a73d5c 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -55,4 +55,4 @@ obj-$(CONFIG_MEMCG_KMEM) += tcp_memcontrol.o
 obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
 
 obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \
-		      xfrm4_output.o
+		      xfrm4_output.o xfrm4_protocol.o
diff --git a/net/ipv4/xfrm4_protocol.c b/net/ipv4/xfrm4_protocol.c
new file mode 100644
index 0000000..8a9e0d7
--- /dev/null
+++ b/net/ipv4/xfrm4_protocol.c
@@ -0,0 +1,205 @@
+/* xfrm4_protocol.c - Generic xfrm protocol multiplexer.
+ *
+ * Copyright (C) 2013 secunet Security Networks AG
+ *
+ * Author:
+ * Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * Based on:
+ * net/ipv4/tunnel4.c
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/init.h>
+#include <linux/mutex.h>
+#include <linux/skbuff.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <net/xfrm.h>
+
+static struct xfrm4_protocol __rcu *esp4_handlers __read_mostly;
+static struct xfrm4_protocol __rcu *ah4_handlers __read_mostly;
+static DEFINE_MUTEX(xfrm4_protocol_mutex);
+
+static inline struct xfrm4_protocol __rcu **proto_handlers(u8 protocol)
+{
+	switch (protocol) {
+	case IPPROTO_ESP:
+		return &esp4_handlers;
+	case IPPROTO_AH:
+		return &ah4_handlers;
+	}
+
+	return NULL;
+}
+
+#define for_each_protocol_rcu(head, handler)		\
+	for (handler = rcu_dereference(head);		\
+	     handler != NULL;				\
+	     handler = rcu_dereference(handler->next))	\
+
+void xfrm4_rcv_cb(struct sk_buff *skb, u8 protocol, int err)
+{
+	struct xfrm4_protocol *handler;
+
+	for_each_protocol_rcu(*proto_handlers(protocol), handler)
+		if (!handler->cb_handler(skb, err))
+			return;
+}
+EXPORT_SYMBOL(xfrm4_rcv_cb);
+
+static int xfrm4_esp_rcv(struct sk_buff *skb)
+{
+	struct xfrm4_protocol *handler;
+
+	for_each_protocol_rcu(esp4_handlers, handler)
+		if (!handler->handler(skb))
+			return 0;
+
+	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static void xfrm4_esp_err(struct sk_buff *skb, u32 info)
+{
+	struct xfrm4_protocol *handler;
+
+	for_each_protocol_rcu(esp4_handlers, handler)
+		if (!handler->err_handler(skb, info))
+			break;
+}
+
+static int xfrm4_ah_rcv(struct sk_buff *skb)
+{
+	struct xfrm4_protocol *handler;
+
+	for_each_protocol_rcu(ah4_handlers, handler)
+		if (!handler->handler(skb))
+			return 0;
+
+	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
+
+	kfree_skb(skb);
+	return 0;
+}
+
+static void xfrm4_ah_err(struct sk_buff *skb, u32 info)
+{
+	struct xfrm4_protocol *handler;
+
+	for_each_protocol_rcu(ah4_handlers, handler)
+		if (!handler->err_handler(skb, info))
+			break;
+}
+
+static const struct net_protocol esp4_protocol = {
+	.handler	=	xfrm4_esp_rcv,
+	.err_handler	=	xfrm4_esp_err,
+	.no_policy	=	1,
+	.netns_ok	=	1,
+};
+
+static const struct net_protocol ah4_protocol = {
+	.handler	=	xfrm4_ah_rcv,
+	.err_handler	=	xfrm4_ah_err,
+	.no_policy	=	1,
+	.netns_ok	=	1,
+};
+
+static inline const struct net_protocol *netproto(unsigned char protocol)
+{
+	switch (protocol) {
+	case IPPROTO_ESP:
+		return &esp4_protocol;
+	case IPPROTO_AH:
+		return &ah4_protocol;
+	}
+
+	return NULL;
+}
+
+int xfrm4_protocol_register(struct xfrm4_protocol *handler,
+			    unsigned char protocol)
+{
+	struct xfrm4_protocol __rcu **pprev;
+	struct xfrm4_protocol *t;
+	bool add_netproto = false;
+
+	int ret = -EEXIST;
+	int priority = handler->priority;
+
+	if (!rcu_dereference(*proto_handlers(protocol)))
+		add_netproto = true;
+
+	mutex_lock(&xfrm4_protocol_mutex);
+
+	for (pprev = proto_handlers(protocol);
+	     (t = rcu_dereference_protected(*pprev,
+			lockdep_is_held(&xfrm4_protocol_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t->priority < priority)
+			break;
+		if (t->priority == priority)
+			goto err;
+	}
+
+	handler->next = *pprev;
+	rcu_assign_pointer(*pprev, handler);
+
+	ret = 0;
+
+err:
+	mutex_unlock(&xfrm4_protocol_mutex);
+
+	if (add_netproto) {
+		if (inet_add_protocol(netproto(protocol), protocol)) {
+			pr_err("%s: can't add protocol\n", __func__);
+			ret = -EAGAIN;
+		}
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(xfrm4_protocol_register);
+
+int xfrm4_protocol_deregister(struct xfrm4_protocol *handler,
+			      unsigned char protocol)
+{
+	struct xfrm4_protocol __rcu **pprev;
+	struct xfrm4_protocol *t;
+	int ret = -ENOENT;
+
+	mutex_lock(&xfrm4_protocol_mutex);
+
+	for (pprev = proto_handlers(protocol);
+	     (t = rcu_dereference_protected(*pprev,
+			lockdep_is_held(&xfrm4_protocol_mutex))) != NULL;
+	     pprev = &t->next) {
+		if (t == handler) {
+			*pprev = handler->next;
+			ret = 0;
+			break;
+		}
+	}
+
+	mutex_unlock(&xfrm4_protocol_mutex);
+
+	if (!rcu_dereference(*proto_handlers(protocol))) {
+		if (inet_del_protocol(netproto(protocol), protocol) < 0) {
+			pr_err("%s: can't remove protocol\n", __func__);
+			ret = -EAGAIN;
+		}
+	}
+
+	synchronize_net();
+
+	return ret;
+}
+EXPORT_SYMBOL(xfrm4_protocol_deregister);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 8884399..dc3066e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -108,7 +108,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 	int err;
 	__be32 seq;
 	__be32 seq_hi;
-	struct xfrm_state *x;
+	struct xfrm_state *x = NULL;
 	xfrm_address_t *daddr;
 	struct xfrm_mode *inner_mode;
 	unsigned int family;
@@ -199,8 +199,10 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 		nexthdr = x->type->input(x, skb);
 
-		if (nexthdr == -EINPROGRESS)
+		if (nexthdr == -EINPROGRESS) {
+			xfrm4_rcv_cb(skb, x->type->proto, nexthdr);
 			return 0;
+		}
 
 resume:
 		spin_lock(&x->lock);
@@ -263,6 +265,8 @@ resume:
 		}
 	} while (!err);
 
+	xfrm4_rcv_cb(skb, x->type->proto, 0);
+
 	nf_reset(skb);
 
 	if (decaps) {
@@ -276,6 +280,7 @@ resume:
 drop_unlock:
 	spin_unlock(&x->lock);
 drop:
+	xfrm4_rcv_cb(skb, x ? x->type->proto : nexthdr, -1);
 	kfree_skb(skb);
 	return 0;
 }
-- 
1.7.9.5


* [PATCH RFC 2/9] esp4: Use the IPsec protocol multiplexer API
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
  2013-12-05 12:01 ` [PATCH RFC 1/9] xfrm4: Add IPsec protocol multiplexer Steffen Klassert
@ 2013-12-05 12:01 ` Steffen Klassert
  2013-12-05 12:02 ` [PATCH RFC 3/9] esp4: Export esp4_err Steffen Klassert
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:01 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

Switch esp4 to use the new IPsec protocol multiplexer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c |   25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 7785b28..d1a4ccd 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -473,7 +473,7 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
-static void esp4_err(struct sk_buff *skb, u32 info)
+static int esp4_err(struct sk_buff *skb, u32 info)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = (const struct iphdr *)skb->data;
@@ -483,23 +483,25 @@ static void esp4_err(struct sk_buff *skb, u32 info)
 	switch (icmp_hdr(skb)->type) {
 	case ICMP_DEST_UNREACH:
 		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
-			return;
+			return 0;
 	case ICMP_REDIRECT:
 		break;
 	default:
-		return;
+		return 0;
 	}
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      esph->spi, IPPROTO_ESP, AF_INET);
 	if (!x)
-		return;
+		return 0;
 
 	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
 		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_ESP, 0);
 	else
 		ipv4_redirect(skb, net, 0, 0, IPPROTO_ESP, 0);
 	xfrm_state_put(x);
+
+	return 0;
 }
 
 static void esp_destroy(struct xfrm_state *x)
@@ -672,6 +674,11 @@ error:
 	return err;
 }
 
+static int esp4_rcv_cb(struct sk_buff *skb, int err)
+{
+	return 0;
+}
+
 static const struct xfrm_type esp_type =
 {
 	.description	= "ESP4",
@@ -685,11 +692,11 @@ static const struct xfrm_type esp_type =
 	.output		= esp_output
 };
 
-static const struct net_protocol esp4_protocol = {
+static struct xfrm4_protocol esp4_protocol = {
 	.handler	=	xfrm4_rcv,
+	.cb_handler	=	esp4_rcv_cb,
 	.err_handler	=	esp4_err,
-	.no_policy	=	1,
-	.netns_ok	=	1,
+	.priority	=	0,
 };
 
 static int __init esp4_init(void)
@@ -698,7 +705,7 @@ static int __init esp4_init(void)
 		pr_info("%s: can't add xfrm type\n", __func__);
 		return -EAGAIN;
 	}
-	if (inet_add_protocol(&esp4_protocol, IPPROTO_ESP) < 0) {
+	if (xfrm4_protocol_register(&esp4_protocol, IPPROTO_ESP) < 0) {
 		pr_info("%s: can't add protocol\n", __func__);
 		xfrm_unregister_type(&esp_type, AF_INET);
 		return -EAGAIN;
@@ -708,7 +715,7 @@ static int __init esp4_init(void)
 
 static void __exit esp4_fini(void)
 {
-	if (inet_del_protocol(&esp4_protocol, IPPROTO_ESP) < 0)
+	if (xfrm4_protocol_deregister(&esp4_protocol, IPPROTO_ESP) < 0)
 		pr_info("%s: can't remove protocol\n", __func__);
 	if (xfrm_unregister_type(&esp_type, AF_INET) < 0)
 		pr_info("%s: can't remove xfrm type\n", __func__);
-- 
1.7.9.5


* [PATCH RFC 3/9] esp4: Export esp4_err
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
  2013-12-05 12:01 ` [PATCH RFC 1/9] xfrm4: Add IPsec protocol multiplexer Steffen Klassert
  2013-12-05 12:01 ` [PATCH RFC 2/9] esp4: Use the IPsec protocol multiplexer API Steffen Klassert
@ 2013-12-05 12:02 ` Steffen Klassert
  2013-12-05 12:02 ` [PATCH RFC 4/9] ah4: Use the IPsec protocol multiplexer API Steffen Klassert
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:02 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

esp4_err can be shared with the upcoming vti esp handler.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/esp.h |    2 ++
 net/ipv4/esp4.c   |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/esp.h b/include/net/esp.h
index a43be85..e239597 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -5,6 +5,8 @@
 
 struct ip_esp_hdr;
 
+int esp4_err(struct sk_buff *skb, u32 info);
+
 static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb)
 {
 	return (struct ip_esp_hdr *)skb_transport_header(skb);
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index d1a4ccd..1e72fe0 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -473,7 +473,7 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
 		 net_adj) & ~(blksize - 1)) + net_adj - 2;
 }
 
-static int esp4_err(struct sk_buff *skb, u32 info)
+int esp4_err(struct sk_buff *skb, u32 info)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = (const struct iphdr *)skb->data;
@@ -503,6 +503,7 @@ static int esp4_err(struct sk_buff *skb, u32 info)
 
 	return 0;
 }
+EXPORT_SYMBOL(esp4_err);
 
 static void esp_destroy(struct xfrm_state *x)
 {
-- 
1.7.9.5


* [PATCH RFC 4/9] ah4: Use the IPsec protocol multiplexer API
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (2 preceding siblings ...)
  2013-12-05 12:02 ` [PATCH RFC 3/9] esp4: Export esp4_err Steffen Klassert
@ 2013-12-05 12:02 ` Steffen Klassert
  2013-12-05 12:03 ` [PATCH RFC 5/9] ah4: Export ah4_err Steffen Klassert
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:02 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

Switch ah4 to use the new IPsec protocol multiplexer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ah4.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 7179026..ea30e4e 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -397,7 +397,7 @@ out:
 	return err;
 }
 
-static void ah4_err(struct sk_buff *skb, u32 info)
+static int ah4_err(struct sk_buff *skb, u32 info)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = (const struct iphdr *)skb->data;
@@ -407,23 +407,25 @@ static void ah4_err(struct sk_buff *skb, u32 info)
 	switch (icmp_hdr(skb)->type) {
 	case ICMP_DEST_UNREACH:
 		if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
-			return;
+			return 0;
 	case ICMP_REDIRECT:
 		break;
 	default:
-		return;
+		return 0;
 	}
 
 	x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
 			      ah->spi, IPPROTO_AH, AF_INET);
 	if (!x)
-		return;
+		return 0;
 
 	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
 		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_AH, 0);
 	else
 		ipv4_redirect(skb, net, 0, 0, IPPROTO_AH, 0);
 	xfrm_state_put(x);
+
+	return 0;
 }
 
 static int ah_init_state(struct xfrm_state *x)
@@ -505,6 +507,10 @@ static void ah_destroy(struct xfrm_state *x)
 	kfree(ahp);
 }
 
+static int ah4_rcv_cb(struct sk_buff *skb, int err)
+{
+	return 0;
+}
 
 static const struct xfrm_type ah_type =
 {
@@ -518,11 +524,11 @@ static const struct xfrm_type ah_type =
 	.output		= ah_output
 };
 
-static const struct net_protocol ah4_protocol = {
+static struct xfrm4_protocol ah4_protocol = {
 	.handler	=	xfrm4_rcv,
+	.cb_handler	=	ah4_rcv_cb,
 	.err_handler	=	ah4_err,
-	.no_policy	=	1,
-	.netns_ok	=	1,
+	.priority	=	0,
 };
 
 static int __init ah4_init(void)
@@ -531,7 +537,7 @@ static int __init ah4_init(void)
 		pr_info("%s: can't add xfrm type\n", __func__);
 		return -EAGAIN;
 	}
-	if (inet_add_protocol(&ah4_protocol, IPPROTO_AH) < 0) {
+	if (xfrm4_protocol_register(&ah4_protocol, IPPROTO_AH) < 0) {
 		pr_info("%s: can't add protocol\n", __func__);
 		xfrm_unregister_type(&ah_type, AF_INET);
 		return -EAGAIN;
@@ -541,7 +547,7 @@ static int __init ah4_init(void)
 
 static void __exit ah4_fini(void)
 {
-	if (inet_del_protocol(&ah4_protocol, IPPROTO_AH) < 0)
+	if (xfrm4_protocol_deregister(&ah4_protocol, IPPROTO_AH) < 0)
 		pr_info("%s: can't remove protocol\n", __func__);
 	if (xfrm_unregister_type(&ah_type, AF_INET) < 0)
 		pr_info("%s: can't remove xfrm type\n", __func__);
-- 
1.7.9.5


* [PATCH RFC 5/9] ah4: Export ah4_err
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (3 preceding siblings ...)
  2013-12-05 12:02 ` [PATCH RFC 4/9] ah4: Use the IPsec protocol multiplexer API Steffen Klassert
@ 2013-12-05 12:03 ` Steffen Klassert
  2013-12-05 12:03 ` [PATCH RFC 6/9] xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer Steffen Klassert
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:03 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

ah4_err can be shared with the upcoming vti ah handler.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/ah.h |    2 ++
 net/ipv4/ah4.c   |    3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/ah.h b/include/net/ah.h
index ca95b98..6aa0f43 100644
--- a/include/net/ah.h
+++ b/include/net/ah.h
@@ -6,6 +6,8 @@
 /* This is the maximum truncated ICV length that we know of. */
 #define MAX_AH_AUTH_LEN	64
 
+int ah4_err(struct sk_buff *skb, u32 info);
+
 struct crypto_ahash;
 
 struct ah_data {
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index ea30e4e..4374671 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -397,7 +397,7 @@ out:
 	return err;
 }
 
-static int ah4_err(struct sk_buff *skb, u32 info)
+int ah4_err(struct sk_buff *skb, u32 info)
 {
 	struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = (const struct iphdr *)skb->data;
@@ -427,6 +427,7 @@ static int ah4_err(struct sk_buff *skb, u32 info)
 
 	return 0;
 }
+EXPORT_SYMBOL(ah4_err);
 
 static int ah_init_state(struct xfrm_state *x)
 {
-- 
1.7.9.5


* [PATCH RFC 6/9] xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (4 preceding siblings ...)
  2013-12-05 12:03 ` [PATCH RFC 5/9] ah4: Export ah4_err Steffen Klassert
@ 2013-12-05 12:03 ` Steffen Klassert
  2013-12-05 12:04 ` [PATCH RFC 7/9] ip_tunnel: Make vti work with i_key set Steffen Klassert
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:03 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

IPsec vti_rcv needs to remember the tunnel pointer so that
it can be checked later in the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer.
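
The layout idea can be sketched in userspace as follows: every
stage-specific view of the scratch area starts with the same prefix, so
a pointer stored through the tunnel view survives while a later stage
uses its own view. This is only a model. The kernel casts skb->cb
instead of using a union, and the names struct tunnel_model, struct
tunnel_cb, struct spi_cb, union cb_area, remember_tunnel, set_spi_info
and recall_tunnel, as well as the sizes, are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ip_tunnel. */
struct tunnel_model {
	int id;
};

/* Common prefix, playing the role of struct xfrm_tunnel_skb_cb. */
struct tunnel_cb {
	char ip_parm[24];	/* stands in for the inet/inet6 parm union */
	struct tunnel_model *tunnel;
};

/* A stage-specific view that embeds the common prefix first. */
struct spi_cb {
	struct tunnel_cb header;
	unsigned int daddroff;
	unsigned int family;
};

/* One scratch area seen through different views, like skb->cb. */
union cb_area {
	char raw[48];
	struct tunnel_cb tunnel;
	struct spi_cb spi;
};

/* The prefix must sit at offset 0 and the view must fit the area. */
_Static_assert(offsetof(struct spi_cb, header) == 0, "prefix not first");
_Static_assert(sizeof(struct spi_cb) <= 48, "view larger than cb area");

/* Receive hook: store the tunnel through the common prefix. */
static void remember_tunnel(union cb_area *cb, struct tunnel_model *t)
{
	cb->tunnel.tunnel = t;
}

/* A later stage fills in its own fields behind the prefix. */
static void set_spi_info(union cb_area *cb, unsigned int off,
			 unsigned int family)
{
	cb->spi.daddroff = off;
	cb->spi.family = family;
}

/* Callback: read the tunnel back through the common prefix. */
static struct tunnel_model *recall_tunnel(const union cb_area *cb)
{
	assert(cb != NULL);
	return cb->tunnel.tunnel;
}
```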

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h |   26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 7f71462..8295b4c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -600,16 +600,24 @@ struct xfrm_mgr {
 int xfrm_register_km(struct xfrm_mgr *km);
 int xfrm_unregister_km(struct xfrm_mgr *km);
 
+struct xfrm_tunnel_skb_cb {
+	union {
+		struct inet_skb_parm h4;
+		struct inet6_skb_parm h6;
+        } header;
+
+	struct ip_tunnel *tunnel;
+};
+
+#define XFRM_TUNNEL_SKB_CB(__skb) ((struct xfrm_tunnel_skb_cb *)&((__skb)->cb[0]))
+
 /*
  * This structure is used for the duration where packets are being
  * transformed by IPsec.  As soon as the packet leaves IPsec the
  * area beyond the generic IP part may be overwritten.
  */
 struct xfrm_skb_cb {
-	union {
-		struct inet_skb_parm h4;
-		struct inet6_skb_parm h6;
-        } header;
+	struct xfrm_tunnel_skb_cb header;
 
         /* Sequence number for replay protection. */
 	union {
@@ -631,10 +639,7 @@ struct xfrm_skb_cb {
  * to transmit header information to the mode input/output functions.
  */
 struct xfrm_mode_skb_cb {
-	union {
-		struct inet_skb_parm h4;
-		struct inet6_skb_parm h6;
-	} header;
+	struct xfrm_tunnel_skb_cb header;
 
 	/* Copied from header for IPv4, always set to zero and DF for IPv6. */
 	__be16 id;
@@ -666,10 +671,7 @@ struct xfrm_mode_skb_cb {
  * related information.
  */
 struct xfrm_spi_skb_cb {
-	union {
-		struct inet_skb_parm h4;
-		struct inet6_skb_parm h6;
-	} header;
+	struct xfrm_tunnel_skb_cb header;
 
 	unsigned int daddroff;
 	unsigned int family;
-- 
1.7.9.5


* [PATCH RFC 7/9] ip_tunnel: Make vti work with i_key set
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (5 preceding siblings ...)
  2013-12-05 12:03 ` [PATCH RFC 6/9] xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer Steffen Klassert
@ 2013-12-05 12:04 ` Steffen Klassert
  2013-12-05 12:05 ` [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook Steffen Klassert
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:04 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

Vti uses the o_key to mark packets that were transmitted or received
by a vti interface. Unfortunately we can't apply different marks
to inbound and outbound packets with only one key available. Vti interfaces
typically use wildcard selectors for vti IPsec policies. On forwarding,
the same output policy will match for both directions. This generates
a loop between the IPsec gateways until the ttl of the packet is
exceeded.

The gre i_key/o_key are usually there to find the right gre tunnel
during a lookup. When vti uses the i_key to mark packets, the tunnel
lookup does not work any more because vti does not use the gre keys
as a hash key for the lookup.

This patch works around this by not including the i_key when computing
the hash for the tunnel lookup in case of vti tunnels.

With this we have separate keys available for the transmitting and
receiving side of the vti interface.
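
A userspace sketch of the bucket selection described above. The
condition is the one added to ip_bucket() in the diff below. The flag
values, the table size, struct parms_model, toy_hash and bucket_model
are stand-ins chosen for this example and are not the kernel's
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TUNNEL_KEY	0x1u	/* hypothetical flag value */
#define MODEL_ISVTI		0x2u	/* hypothetical flag value */
#define MODEL_HASH_SIZE		128u	/* hypothetical table size */

/* Hypothetical subset of the tunnel parameters used for hashing. */
struct parms_model {
	uint32_t i_key;
	uint32_t i_flags;
	uint32_t remote;
};

/* Toy hash, only meant to spread (key, remote) over the table. */
static unsigned int toy_hash(uint32_t key, uint32_t remote)
{
	uint32_t v = key ^ remote;

	v ^= v >> 16;
	v *= 0x45d9f3bu;
	v ^= v >> 16;
	return v % MODEL_HASH_SIZE;
}

/* Pick the bucket; vti tunnels without TUNNEL_KEY ignore the i_key. */
static unsigned int bucket_model(const struct parms_model *p)
{
	uint32_t i_key;

	assert(p != NULL);
	i_key = p->i_key;

	if (!(p->i_flags & MODEL_TUNNEL_KEY) && (p->i_flags & MODEL_ISVTI))
		i_key = 0;

	return toy_hash(i_key, p->remote);
}
```

Two vti tunnels to the same remote with different i_key values end up in
the same bucket, so the lookup on receive does not depend on the value
that is used as the mark.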

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_tunnel.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 90ff957..4bc7a6e 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -228,13 +228,17 @@ static struct hlist_head *ip_bucket(struct ip_tunnel_net *itn,
 {
 	unsigned int h;
 	__be32 remote;
+	__be32 i_key = parms->i_key;

 	if (parms->iph.daddr && !ipv4_is_multicast(parms->iph.daddr))
 		remote = parms->iph.daddr;
 	else
 		remote = 0;

-	h = ip_tunnel_hash(itn, parms->i_key, remote);
+	if (!(parms->i_flags & TUNNEL_KEY) && (parms->i_flags & VTI_ISVTI))
+		i_key = 0;
+
+	h = ip_tunnel_hash(itn, i_key, remote);
 	return &itn->tunnels[h];
 }

-- 
1.7.9.5


* [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook.
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (6 preceding siblings ...)
  2013-12-05 12:04 ` [PATCH RFC 7/9] ip_tunnel: Make vti work with i_key set Steffen Klassert
@ 2013-12-05 12:05 ` Steffen Klassert
  2013-12-12 16:26   ` Nicolas Dichtel
  2013-12-05 12:05 ` [PATCH RFC 9/9] xfrm4: Remove xfrm_tunnel_notifier Steffen Klassert
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:05 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

With this patch, vti uses the IPsec protocol multiplexer to
register its own receive side hooks for ESP and AH.

Vti now does the following on receive side:

1. Do an input policy check for the IPsec packet we received.
   This is required because this packet could already have been
   processed by IPsec, so an inbound policy check is needed.

2. Clean the skb to not leak information on namespace
   transitions.

3. Mark the packet with the i_key. The policy and the state
   must match this key now. Policy and state belong to the vti
   namespace and policy enforcement is done at the further layers.

4. Call the generic xfrm layer to do decryption and decapsulation.

5. Wait for a callback from the xfrm layer to properly update
   the device statistics.

On transmit side:

1. Mark the packet with the o_key. The policy and the state
   must match this key now.

2. Do an xfrm_lookup on the original packet with the mark applied.

3. Check if we got an IPsec route.

4. Clean the skb to not leak information on namespace
   transitions.

5. Attach the dst_entry we got from the xfrm_lookup to the skb.

6. Call dst_output to do the IPsec processing.

7. Do the device statistics.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_vti.c |  124 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 52b802a..fe3d3ab 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -41,6 +41,8 @@
 #include <net/ip_tunnels.h>
 #include <net/inet_ecn.h>
 #include <net/xfrm.h>
+#include <net/esp.h>
+#include <net/ah.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -49,7 +51,6 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
 static int vti_net_id __read_mostly;
 static int vti_tunnel_init(struct net_device *dev);
 
-/* We dont digest the packet therefore let the packet pass */
 static int vti_rcv(struct sk_buff *skb)
 {
 	struct ip_tunnel *tunnel;
@@ -60,48 +61,72 @@ static int vti_rcv(struct sk_buff *skb)
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
 				  iph->saddr, iph->daddr, 0);
 	if (tunnel != NULL) {
-		struct pcpu_tstats *tstats;
-		u32 oldmark = skb->mark;
-		int ret;
+		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
+			goto drop;
+
+		XFRM_TUNNEL_SKB_CB(skb)->tunnel = tunnel;
+
+		/* Partially clear the buffer, the rest is done by xfrm_input. */
+		if (!net_eq(tunnel->net, dev_net(tunnel->dev)))
+			skb_orphan(skb);
+		skb->tstamp.tv64 = 0;
+		skb->pkt_type = PACKET_HOST;
+		skb->skb_iif = 0;
+		nf_reset_trace(skb);
+		secpath_reset(skb);
+
+		skb->mark = be32_to_cpu(tunnel->parms.i_key);
+		skb->dev = tunnel->dev;
+
+		return xfrm4_rcv(skb);
+	}
+
+	return -1;
+drop:
+	kfree_skb(skb);
+	return 0;
+}
 
+static int vti_rcv_cb(struct sk_buff *skb, int err)
+{
+	struct net_device *dev = skb->dev;
+	struct pcpu_tstats *tstats;
+	struct ip_tunnel *tunnel = XFRM_TUNNEL_SKB_CB(skb)->tunnel;
+
+	if (!tunnel || tunnel != netdev_priv(dev))
+		return -1;
 
-		/* temporarily mark the skb with the tunnel o_key, to
-		 * only match policies with this mark.
-		 */
-		skb->mark = be32_to_cpu(tunnel->parms.o_key);
-		ret = xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb);
-		skb->mark = oldmark;
-		if (!ret)
-			return -1;
+	tstats = this_cpu_ptr(dev->tstats);
 
-		tstats = this_cpu_ptr(tunnel->dev->tstats);
+	if (!err) {
 		u64_stats_update_begin(&tstats->syncp);
 		tstats->rx_packets++;
 		tstats->rx_bytes += skb->len;
 		u64_stats_update_end(&tstats->syncp);
 
-		secpath_reset(skb);
-		skb->dev = tunnel->dev;
-		return 1;
+		return 0;
 	}
 
-	return -1;
+	if (err == -EINPROGRESS)
+		return 0;
+
+	dev->stats.rx_errors++;
+	dev->stats.rx_dropped++;
+
+	return 0;
 }
 
 /* This function assumes it is being called from dev_queue_xmit()
  * and that skb is filled properly by that function.
  */
-
 static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct iphdr  *tiph = &tunnel->parms.iph;
 	u8     tos;
 	struct rtable *rt;		/* Route to the other host */
 	struct net_device *tdev;	/* Device to other host */
 	struct iphdr  *old_iph = ip_hdr(skb);
-	__be32 dst = tiph->daddr;
-	struct flowi4 fl4;
+	struct flowi fl;
 	int err;
 
 	if (skb->protocol != htons(ETH_P_IP))
@@ -109,17 +134,17 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	tos = old_iph->tos;
 
-	memset(&fl4, 0, sizeof(fl4));
-	flowi4_init_output(&fl4, tunnel->parms.link,
-			   be32_to_cpu(tunnel->parms.o_key), RT_TOS(tos),
-			   RT_SCOPE_UNIVERSE,
-			   IPPROTO_IPIP, 0,
-			   dst, tiph->saddr, 0, 0);
-	rt = ip_route_output_key(dev_net(dev), &fl4);
+	memset(&fl, 0, sizeof(fl));
+	skb->mark = be32_to_cpu(tunnel->parms.o_key);
+	xfrm_decode_session(skb, &fl, AF_INET);
+
+	dst_hold(skb_dst(skb));
+	rt = (struct rtable *)xfrm_lookup(tunnel->net, skb_dst(skb), &fl, NULL, 0);
 	if (IS_ERR(rt)) {
 		dev->stats.tx_carrier_errors++;
 		goto tx_error_icmp;
 	}
+
 	/* if there is no transform then this tunnel is not functional.
 	 * Or if the xfrm is not mode tunnel.
 	 */
@@ -147,9 +172,8 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-	skb_dst_drop(skb);
+	skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev)));
 	skb_dst_set(skb, &rt->dst);
-	nf_reset(skb);
 	skb->dev = skb_dst(skb)->dev;
 
 	err = dst_output(skb);
@@ -181,12 +205,13 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 			return -EINVAL;
 	}
 
+	p.i_flags |= VTI_ISVTI;
 	err = ip_tunnel_ioctl(dev, &p, cmd);
 	if (err)
 		return err;
 
 	if (cmd != SIOCDELTUNNEL) {
-		p.i_flags |= GRE_KEY | VTI_ISVTI;
+		p.i_flags |= GRE_KEY;
 		p.o_flags |= GRE_KEY;
 	}
 
@@ -241,9 +266,18 @@ static void __net_init vti_fb_tunnel_init(struct net_device *dev)
 	iph->ihl		= 5;
 }
 
-static struct xfrm_tunnel_notifier vti_handler __read_mostly = {
+static struct xfrm4_protocol vti_esp4_protocol __read_mostly = {
+	.handler	=	vti_rcv,
+	.cb_handler	=	vti_rcv_cb,
+	.err_handler	=	esp4_err,
+	.priority	=	100,
+};
+
+static struct xfrm4_protocol vti_ah4_protocol __read_mostly = {
 	.handler	=	vti_rcv,
-	.priority	=	1,
+	.cb_handler	=	vti_rcv_cb,
+	.err_handler	=	ah4_err,
+	.priority	=	100,
 };
 
 static int __net_init vti_init_net(struct net *net)
@@ -287,6 +321,8 @@ static void vti_netlink_parms(struct nlattr *data[],
 	if (!data)
 		return;
 
+	parms->i_flags = VTI_ISVTI;
+
 	if (data[IFLA_VTI_LINK])
 		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
 
@@ -382,12 +418,24 @@ static int __init vti_init(void)
 	err = register_pernet_device(&vti_net_ops);
 	if (err < 0)
 		return err;
-	err = xfrm4_mode_tunnel_input_register(&vti_handler);
+	err = xfrm4_protocol_register(&vti_esp4_protocol, IPPROTO_ESP);
+	if (err < 0) {
+		unregister_pernet_device(&vti_net_ops);
+		pr_info("vti init: can't register tunnel\n");
+
+		return err;
+	}
+
+	err = xfrm4_protocol_register(&vti_ah4_protocol, IPPROTO_AH);
 	if (err < 0) {
+		xfrm4_protocol_deregister(&vti_esp4_protocol, IPPROTO_ESP);
 		unregister_pernet_device(&vti_net_ops);
 		pr_info("vti init: can't register tunnel\n");
+
+		return err;
 	}
 
+
 	err = rtnl_link_register(&vti_link_ops);
 	if (err < 0)
 		goto rtnl_link_failed;
@@ -395,7 +443,8 @@ static int __init vti_init(void)
 	return err;
 
 rtnl_link_failed:
-	xfrm4_mode_tunnel_input_deregister(&vti_handler);
+	xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
+	xfrm4_protocol_deregister(&vti_esp4_protocol, IPPROTO_ESP);
 	unregister_pernet_device(&vti_net_ops);
 	return err;
 }
@@ -403,8 +452,11 @@ rtnl_link_failed:
 static void __exit vti_fini(void)
 {
 	rtnl_link_unregister(&vti_link_ops);
-	if (xfrm4_mode_tunnel_input_deregister(&vti_handler))
+	if (xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH))
 		pr_info("vti close: can't deregister tunnel\n");
+	if (xfrm4_protocol_deregister(&vti_esp4_protocol, IPPROTO_ESP))
+		pr_info("vti close: can't deregister tunnel\n");
+
 
 	unregister_pernet_device(&vti_net_ops);
 }
-- 
1.7.9.5
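
The error handling in vti_init() above repeats its cleanup calls in each
failure branch. As a minimal userspace sketch, the same unwinding could
also be written as a single goto ladder. The names `model_init`,
`fake_register` and the `REG_*` bits are hypothetical stand-ins, not kernel
API, and this is one possible arrangement rather than what the patch does:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bits, one per thing vti_init() registers. */
enum {
	REG_PERNET = 1 << 0,
	REG_ESP    = 1 << 1,
	REG_AH     = 1 << 2,
	REG_LINK   = 1 << 3,
};

/* Pretend to register one component. Fails with -EBUSY (an arbitrary
 * choice for this sketch) when its bit is set in fail_mask.
 */
static int fake_register(unsigned int *state, unsigned int bit,
			 unsigned int fail_mask)
{
	assert(state);
	if (fail_mask & bit)
		return -EBUSY;
	*state |= bit;
	return 0;
}

/* Register everything in order; on failure undo the completed steps
 * in reverse order so that none of the bits stay set.
 */
static int model_init(unsigned int *state, unsigned int fail_mask)
{
	int err;

	err = fake_register(state, REG_PERNET, fail_mask);
	if (err < 0)
		return err;
	err = fake_register(state, REG_ESP, fail_mask);
	if (err < 0)
		goto esp_failed;
	err = fake_register(state, REG_AH, fail_mask);
	if (err < 0)
		goto ah_failed;
	err = fake_register(state, REG_LINK, fail_mask);
	if (err < 0)
		goto link_failed;

	return 0;

link_failed:
	*state &= ~(unsigned int)REG_AH;
ah_failed:
	*state &= ~(unsigned int)REG_ESP;
esp_failed:
	*state &= ~(unsigned int)REG_PERNET;
	return err;
}
```

With the ladder, each new registration needs only one new label, and the
teardown order stays the exact reverse of the setup order.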


* [PATCH RFC 9/9] xfrm4: Remove xfrm_tunnel_notifier
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (7 preceding siblings ...)
  2013-12-05 12:05 ` [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook Steffen Klassert
@ 2013-12-05 12:05 ` Steffen Klassert
  2013-12-05 17:27 ` [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Stephen Hemminger
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-05 12:05 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault, Saurabh Mohan

xfrm_tunnel_notifier was used by vti and has been replaced by the IPsec
protocol multiplexer hooks. It is now unused, so remove it.
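
For readers who have not looked at the code being removed: the notifier
kept its handlers in a singly linked list ordered by ascending priority and
refused a second handler with the same priority. A minimal userspace sketch
of that scheme, without the mutex and RCU handling of the real code, might
look like this (`struct handler`, `handler_register` and
`handler_deregister` are hypothetical names, not kernel API):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the removed notifier type. */
struct handler {
	int (*fn)(void *pkt);
	struct handler *next;
	int priority;
};

/* Insert h so that the list stays sorted by ascending priority.
 * Returns 0 on success, -EEXIST if that priority is already taken.
 */
static int handler_register(struct handler **head, struct handler *h)
{
	struct handler **pprev;
	struct handler *t;

	assert(head && h);

	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t->priority > h->priority)
			break;
		if (t->priority == h->priority)
			return -EEXIST;
	}

	h->next = *pprev;
	*pprev = h;
	return 0;
}

/* Unlink h. Returns 0 on success, -ENOENT if h was not registered. */
static int handler_deregister(struct handler **head, struct handler *h)
{
	struct handler **pprev;
	struct handler *t;

	assert(head && h);

	for (pprev = head; (t = *pprev) != NULL; pprev = &t->next) {
		if (t == h) {
			*pprev = h->next;
			return 0;
		}
	}
	return -ENOENT;
}
```

Walking the list through a pointer to the previous link means the head
needs no special case on insert or removal.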

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/xfrm4_mode_tunnel.c |   68 ------------------------------------------
 1 file changed, 68 deletions(-)

diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 31b1815..05f2b48 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -15,65 +15,6 @@
 #include <net/ip.h>
 #include <net/xfrm.h>
 
-/* Informational hook. The decap is still done here. */
-static struct xfrm_tunnel_notifier __rcu *rcv_notify_handlers __read_mostly;
-static DEFINE_MUTEX(xfrm4_mode_tunnel_input_mutex);
-
-int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler)
-{
-	struct xfrm_tunnel_notifier __rcu **pprev;
-	struct xfrm_tunnel_notifier *t;
-	int ret = -EEXIST;
-	int priority = handler->priority;
-
-	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
-
-	for (pprev = &rcv_notify_handlers;
-	     (t = rcu_dereference_protected(*pprev,
-	     lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
-	     pprev = &t->next) {
-		if (t->priority > priority)
-			break;
-		if (t->priority == priority)
-			goto err;
-
-	}
-
-	handler->next = *pprev;
-	rcu_assign_pointer(*pprev, handler);
-
-	ret = 0;
-
-err:
-	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_register);
-
-int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler)
-{
-	struct xfrm_tunnel_notifier __rcu **pprev;
-	struct xfrm_tunnel_notifier *t;
-	int ret = -ENOENT;
-
-	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
-	for (pprev = &rcv_notify_handlers;
-	     (t = rcu_dereference_protected(*pprev,
-	     lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
-	     pprev = &t->next) {
-		if (t == handler) {
-			*pprev = handler->next;
-			ret = 0;
-			break;
-		}
-	}
-	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
-	synchronize_net();
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_deregister);
-
 static inline void ipip_ecn_decapsulate(struct sk_buff *skb)
 {
 	struct iphdr *inner_iph = ipip_hdr(skb);
@@ -127,14 +68,8 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
-#define for_each_input_rcu(head, handler)	\
-	for (handler = rcu_dereference(head);	\
-	     handler != NULL;			\
-	     handler = rcu_dereference(handler->next))
-
 static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
-	struct xfrm_tunnel_notifier *handler;
 	int err = -EINVAL;
 
 	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
@@ -143,9 +78,6 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 		goto out;
 
-	for_each_input_rcu(rcv_notify_handlers, handler)
-		handler->handler(skb);
-
 	err = skb_unclone(skb, GFP_ATOMIC);
 	if (err)
 		goto out;
-- 
1.7.9.5


* Re: [PATCH RFC 0/9] vti4: prepare namespace and interfamily support.
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (8 preceding siblings ...)
  2013-12-05 12:05 ` [PATCH RFC 9/9] xfrm4: Remove xfrm_tunnel_notifier Steffen Klassert
@ 2013-12-05 17:27 ` Stephen Hemminger
  2013-12-06 20:20 ` David Miller
  2013-12-09  9:17 ` Christophe Gouault
  11 siblings, 0 replies; 15+ messages in thread
From: Stephen Hemminger @ 2013-12-05 17:27 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Christophe Gouault, Saurabh Mohan

On Thu, 5 Dec 2013 13:00:28 +0100
Steffen Klassert <steffen.klassert@secunet.com> wrote:

> This patchset prepares vti4 for proper namespace and interfamily support.
> It is based on the net tree because net-next is still too far behind the
> mainline.

Then ask David to merge net -> net-next. That is the standard method
to submit new feature patches that overlap.


* Re: [PATCH RFC 0/9] vti4: prepare namespace and interfamily support.
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (9 preceding siblings ...)
  2013-12-05 17:27 ` [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Stephen Hemminger
@ 2013-12-06 20:20 ` David Miller
  2013-12-09  9:17 ` Christophe Gouault
  11 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2013-12-06 20:20 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev, christophe.gouault, saurabh.mohan

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 5 Dec 2013 13:00:28 +0100

> The patchset implements an IPsec protocol multiplexer, so that vti
> can register its own receive path hooks. Further, it makes the i_key
> usable for vti and changes the vti4 code to do the following:
>
> vti uses the IPsec protocol multiplexer to register its
> own receive side hooks for ESP and AH.

I have no fundamental objections to this series, looks good.


* Re: [PATCH RFC 0/9] vti4: prepare namespace and interfamily support.
  2013-12-05 12:00 [PATCH RFC 0/9] vti4: prepare namespace and interfamily support Steffen Klassert
                   ` (10 preceding siblings ...)
  2013-12-06 20:20 ` David Miller
@ 2013-12-09  9:17 ` Christophe Gouault
  11 siblings, 0 replies; 15+ messages in thread
From: Christophe Gouault @ 2013-12-09  9:17 UTC (permalink / raw)
  To: Steffen Klassert, netdev; +Cc: Saurabh Mohan

Hi Steffen,

On 12/05/2013 01:00 PM, Steffen Klassert wrote:
> This patchset prepares vti4 for proper namespace and interfamily support.
> It is based on the net tree because net-next is still too far behind the
> mainline.
[...]
>
> It has not had much testing so far. I can set up the vti tunnels and ping
> through them. That's all I tried, so test carefully ;-).

This series of patches looks good and seems to address all identified
limitations of the current vti implementation.

I'll try to run deeper tests this week. Thank you for working on 
improving vti.

Best Regards,
Christophe

> I'll start to care about the ipv6 side after I've got some feedback on this.
>


* Re: [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook.
  2013-12-05 12:05 ` [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook Steffen Klassert
@ 2013-12-12 16:26   ` Nicolas Dichtel
  2013-12-13  9:56     ` Steffen Klassert
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Dichtel @ 2013-12-12 16:26 UTC (permalink / raw)
  To: Steffen Klassert, netdev; +Cc: Christophe Gouault, Saurabh Mohan

On 05/12/2013 13:05, Steffen Klassert wrote:
> With this patch, vti uses the IPsec protocol multiplexer to
> register its own receive side hooks for ESP and AH.
>
> Vti now does the following on receive side:
>
> 1. Do an input policy check for the IPsec packet we received.
>     This is required because this packet could be already
>     processed by IPsec, so an inbound policy check is needed.
>
> 2. Clean the skb to not leak information on namespace
>     transitions.
>
> 3. Mark the packet with the i_key. The policy and the state
>     must match this key now. Policy and state belong to the vti
>     namespace and policy enforcement is done at the further layers.
>
> 4. Call the generic xfrm layer to do decryption and decapsulation.
>
> 5. Wait for a callback from the xfrm layer to properly update
>     the device statistics.
>
> On transmit side:
>
> 1. Mark the packet with the o_key. The policy and the state
>     must match this key now.
>
> 2. Do a xfrm_lookup on the original packet with the mark applied.
>
> 3. Check if we got an IPsec route.
>
> 4. Clean the skb to not leak information on namespace
>     transitions.
>
> 5. Attach the dst_entry we got from the xfrm_lookup to the skb.
>
> 6. Call dst_output to do the IPsec processing.
>
> 7. Do the device statistics.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>   net/ipv4/ip_vti.c |  124 +++++++++++++++++++++++++++++++++++++----------------
>   1 file changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
> index 52b802a..fe3d3ab 100644
> --- a/net/ipv4/ip_vti.c
> +++ b/net/ipv4/ip_vti.c
> @@ -41,6 +41,8 @@
>   #include <net/ip_tunnels.h>
>   #include <net/inet_ecn.h>
>   #include <net/xfrm.h>
> +#include <net/esp.h>
> +#include <net/ah.h>
>   #include <net/net_namespace.h>
>   #include <net/netns/generic.h>
>
> @@ -49,7 +51,6 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
>   static int vti_net_id __read_mostly;
>   static int vti_tunnel_init(struct net_device *dev);
>
> -/* We dont digest the packet therefore let the packet pass */
>   static int vti_rcv(struct sk_buff *skb)
>   {
>   	struct ip_tunnel *tunnel;
> @@ -60,48 +61,72 @@ static int vti_rcv(struct sk_buff *skb)
>   	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
>   				  iph->saddr, iph->daddr, 0);
>   	if (tunnel != NULL) {
> -		struct pcpu_tstats *tstats;
> -		u32 oldmark = skb->mark;
> -		int ret;
> +		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
> +			goto drop;
> +
> +		XFRM_TUNNEL_SKB_CB(skb)->tunnel = tunnel;
> +
> +		/* Partially clear the buffer, the rest is done by xfrm_input. */
> +		if (!net_eq(tunnel->net, dev_net(tunnel->dev)))
> +			skb_orphan(skb);
> +		skb->tstamp.tv64 = 0;
> +		skb->pkt_type = PACKET_HOST;
> +		skb->skb_iif = 0;
> +		nf_reset_trace(skb);
> +		secpath_reset(skb);
Wouldn't it be better to call skb_scrub_packet() here (adding a new
argument to skip some operations, if necessary)?
skb_scrub_packet() ensures that all needed operations are performed when
crossing a netns/tunnel boundary.
It also eases maintenance and makes the code more consistent: when a bug is
fixed in skb_scrub_packet(), all users benefit from it (e.g. 239c78db9c41 "net:
clear local_df when passing skb between namespaces"); otherwise we will always
forget some users.
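
To make the maintenance argument concrete, here is a minimal userspace
sketch with a single scrub helper shared by a receive and a transmit path.
Everything in it is a hypothetical model: `struct pkt` is a heavily reduced
stand-in for struct sk_buff, and `pkt_scrub`, `tunnel_rx` and `tunnel_tx`
are not kernel functions. The fields are the ones reset by the open-coded
sequence quoted above, plus local_df from the commit mentioned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_HOST 0	/* hypothetical stand-in for PACKET_HOST */

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct pkt {
	void *owner;		/* socket that owns the buffer, if any */
	int64_t tstamp;
	int pkt_type;
	int iif;
	uint32_t mark;
	bool local_df;
};

/* Single place that resets per-packet state at a tunnel boundary.
 * xnet is true when the packet also crosses a namespace boundary.
 */
static void pkt_scrub(struct pkt *p, bool xnet)
{
	assert(p);

	if (xnet)
		p->owner = NULL;
	p->tstamp = 0;
	p->pkt_type = PKT_HOST;
	p->iif = 0;
	p->local_df = false;	/* a fix added here reaches every caller */
}

/* Receive side: scrub first, then mark with the tunnel's input key. */
static void tunnel_rx(struct pkt *p, bool xnet, uint32_t i_key)
{
	pkt_scrub(p, xnet);
	p->mark = i_key;
}

/* Transmit side: uses the same helper, so it cannot drift from rx. */
static void tunnel_tx(struct pkt *p, bool xnet)
{
	pkt_scrub(p, xnet);
}
```

If the local_df reset were missing, adding it to pkt_scrub() alone would
cover both paths; with open-coded resets each path would need its own fix.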

Nicolas


* Re: [PATCH RFC 8/9] vti: Update the ipv4 side to use it's own receive hook.
  2013-12-12 16:26   ` Nicolas Dichtel
@ 2013-12-13  9:56     ` Steffen Klassert
  0 siblings, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2013-12-13  9:56 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, Christophe Gouault, Saurabh Mohan

On Thu, Dec 12, 2013 at 05:26:55PM +0100, Nicolas Dichtel wrote:
> On 05/12/2013 13:05, Steffen Klassert wrote:
> >
> >-/* We dont digest the packet therefore let the packet pass */
> >  static int vti_rcv(struct sk_buff *skb)
> >  {
> >  	struct ip_tunnel *tunnel;
> >@@ -60,48 +61,72 @@ static int vti_rcv(struct sk_buff *skb)
> >  	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
> >  				  iph->saddr, iph->daddr, 0);
> >  	if (tunnel != NULL) {
> >-		struct pcpu_tstats *tstats;
> >-		u32 oldmark = skb->mark;
> >-		int ret;
> >+		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
> >+			goto drop;
> >+
> >+		XFRM_TUNNEL_SKB_CB(skb)->tunnel = tunnel;
> >+
> >+		/* Partially clear the buffer, the rest is done by xfrm_input. */
> >+		if (!net_eq(tunnel->net, dev_net(tunnel->dev)))
> >+			skb_orphan(skb);
> >+		skb->tstamp.tv64 = 0;
> >+		skb->pkt_type = PACKET_HOST;
> >+		skb->skb_iif = 0;
> >+		nf_reset_trace(skb);
> >+		secpath_reset(skb);
> Is it not better to call skb_scrub_packet() (if necessary adding a new
> argument to skip some operations)?

Yes, it looks like we can simply use skb_scrub_packet(). xfrm_input()
will then do nf_reset() and skb_dst_drop() a second time, but that
should not do much harm.

I'll incorporate this into the v2 patchset.

Thanks!

