[0/12] Trying to merge xfrm input path before I got side-tracked...

netdev.vger.kernel.org archive mirror

* [0/12] Trying to merge xfrm input path before I got side-tracked...
@ 2007-10-16 14:18 Herbert Xu
  2007-10-16 14:33 ` [PATCH 1/12] [IPSEC]: Fix pure tunnel modes involving IPv6 Herbert Xu
                   ` (13 more replies)
  0 siblings, 14 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:18 UTC
  To: David S. Miller, netdev; +Cc: Patrick McHardy

Hi Dave:

I was well on my way to merging the xfrm input path before I got
side-tracked by inter-family transforms :)

Anyway, here's a dump of what I've got.  The one noteworthy bit
is the patch to reinject transport mode packets through netif_rx
rather than calling netfilter directly.

Everything else is pretty mundane (modulo any bugs of course).

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH 1/12] [IPSEC]: Fix pure tunnel modes involving IPv6
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 2/12] [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input Herbert Xu
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Fix pure tunnel modes involving IPv6

I noticed that my recent patch broke 6-on-4 pure IPsec tunnels (the ones
that are only used for incompressible IPsec packets).  Subsequent reviews
show that I broke 6-on-6 pure tunnels more than three years ago and nobody
ever noticed. I suppose everyone must be testing 6-on-6 IPComp with large
pings which are very compressible :)

This patch fixes both cases.
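
As a rough illustration of what the two one-line fixes return, here is a minimal user-space sketch, not kernel code. It assumes the network header is available as a plain byte array. The helper names `ipv4_protocol` and `ipv6_next_header` are hypothetical stand-ins for `ip_hdr(skb)->protocol` and `skb_network_header(skb)[IP6CB(skb)->nhoff]`, and the `nhoff` parameter is supplied by hand here instead of coming from the skb control block.

```c
#include <stddef.h>
#include <stdint.h>

/* Protocol numbers as assigned by IANA. */
enum { PROTO_IPIP = 4, PROTO_IPV6 = 41 };

/* Byte offset of the protocol field in an IPv4 header. */
#define IPV4_PROTOCOL_OFFSET 9
/* Byte offset of the next-header field in a bare IPv6 header. */
#define IPV6_NEXTHDR_OFFSET 6

/* Hypothetical stand-in for ip_hdr(skb)->protocol. */
static int ipv4_protocol(const uint8_t *network_header)
{
	return network_header[IPV4_PROTOCOL_OFFSET];
}

/* Hypothetical stand-in for skb_network_header(skb)[IP6CB(skb)->nhoff]. */
static int ipv6_next_header(const uint8_t *network_header, size_t nhoff)
{
	return network_header[nhoff];
}
```

With an IPv6 packet inside an IPv4 tunnel, the first helper yields 41 instead of a hard-coded constant, so the inner protocol is reported to the caller.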

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/xfrm4_tunnel.c |    2 +-
 net/ipv6/xfrm6_tunnel.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
index 1312417..83e9580 100644
--- a/net/ipv4/xfrm4_tunnel.c
+++ b/net/ipv4/xfrm4_tunnel.c
@@ -18,7 +18,7 @@ static int ipip_output(struct xfrm_state *x, struct sk_buff *skb)
 
 static int ipip_xfrm_rcv(struct xfrm_state *x, struct sk_buff *skb)
 {
-	return IPPROTO_IP;
+	return ip_hdr(skb)->protocol;
 }
 
 static int ipip_init_state(struct xfrm_state *x)
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 3f8a3ab..6c67ac1 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -248,7 +248,7 @@ static int xfrm6_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 
 static int xfrm6_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
-	return 0;
+	return skb_network_header(skb)[IP6CB(skb)->nhoff];
 }
 
 static int xfrm6_tunnel_rcv(struct sk_buff *skb)


* [PATCH 2/12] [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
  2007-10-16 14:33 ` [PATCH 1/12] [IPSEC]: Fix pure tunnel modes involving IPv6 Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 3/12] [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi Herbert Xu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input

This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into
xfrm4_tunnel.  This change is in line with what IPv6 does and will allow
us to merge the two input functions.
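
The calling convention this introduces can be sketched in user space as follows. This is only a model under stated assumptions: `parse_spi_from_packet` is a hypothetical stand-in for `xfrm_parse_spi` that pretends the packet carries a fixed SPI, and `input_key` models just the first few lines of the reworked `xfrm4_rcv_encap`.

```c
#include <stdint.h>

enum { PROTO_ESP = 50 };

/* The pair that ends up being used for the state lookup. */
struct lookup_key {
	int proto;
	uint32_t spi;
};

/* Hypothetical stand-in for xfrm_parse_spi(): a fixed SPI "from the packet". */
static uint32_t parse_spi_from_packet(void)
{
	return 0x1234;
}

/*
 * Models the start of the input function after the patch: the caller
 * passes nexthdr and, optionally, an SPI.  A zero SPI means "parse it
 * from the packet"; a non-zero SPI is used as given.
 */
static struct lookup_key input_key(int nexthdr, uint32_t spi)
{
	struct lookup_key key;

	if (!spi)
		spi = parse_spi_from_packet();

	key.proto = nexthdr;
	key.spi = spi;
	return key;
}
```

An ESP caller would pass a zero SPI and let the function parse it, while the tunnel handler passes the outer source address as the SPI, so the input function no longer needs a tunnel-specific parsing step.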

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/net/xfrm.h      |    8 ++++++++
 net/ipv4/xfrm4_input.c  |   36 +++++++++++-------------------------
 net/ipv4/xfrm4_tunnel.c |    9 +++++++--
 3 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 0e84484..680739f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1046,7 +1046,15 @@ extern void xfrm_replay_notify(struct xfrm_state *x, int event);
 extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
 extern int xfrm_init_state(struct xfrm_state *x);
 extern int xfrm_output(struct sk_buff *skb);
+extern int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
+			   int encap_type);
 extern int xfrm4_rcv(struct sk_buff *skb);
+
+static inline int xfrm4_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
+{
+	return xfrm4_rcv_encap(skb, nexthdr, spi, 0);
+}
+
 extern int xfrm4_output(struct sk_buff *skb);
 extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index e9bbfde..5cb0b59 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -16,19 +16,6 @@
 #include <net/ip.h>
 #include <net/xfrm.h>
 
-static int xfrm4_parse_spi(struct sk_buff *skb, u8 nexthdr, __be32 *spi, __be32 *seq)
-{
-	switch (nexthdr) {
-	case IPPROTO_IPIP:
-	case IPPROTO_IPV6:
-		*spi = ip_hdr(skb)->saddr;
-		*seq = 0;
-		return 0;
-	}
-
-	return xfrm_parse_spi(skb, nexthdr, spi, seq);
-}
-
 #ifdef CONFIG_NETFILTER
 static inline int xfrm4_rcv_encap_finish(struct sk_buff *skb)
 {
@@ -46,28 +33,29 @@ drop:
 }
 #endif
 
-static int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type)
+int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
+		    int encap_type)
 {
-	__be32 spi, seq;
+	int err;
+	__be32 seq;
 	struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH];
 	struct xfrm_state *x;
 	int xfrm_nr = 0;
 	int decaps = 0;
-	int err = xfrm4_parse_spi(skb, ip_hdr(skb)->protocol, &spi, &seq);
 	unsigned int nhoff = offsetof(struct iphdr, protocol);
 
-	if (err != 0)
+	seq = 0;
+	if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0)
 		goto drop;
 
 	do {
 		const struct iphdr *iph = ip_hdr(skb);
-		int nexthdr;
 
 		if (xfrm_nr == XFRM_MAX_DEPTH)
 			goto drop;
 
 		x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, spi,
-				iph->protocol != IPPROTO_IPV6 ? iph->protocol : IPPROTO_IPIP, AF_INET);
+				      nexthdr, AF_INET);
 		if (x == NULL)
 			goto drop;
 
@@ -111,7 +99,7 @@ static int xfrm4_rcv_encap(struct sk_buff *skb, __u16 encap_type)
 			break;
 		}
 
-		err = xfrm_parse_spi(skb, ip_hdr(skb)->protocol, &spi, &seq);
+		err = xfrm_parse_spi(skb, nexthdr, &spi, &seq);
 		if (err < 0)
 			goto drop;
 	} while (!err);
@@ -165,6 +153,7 @@ drop:
 	kfree_skb(skb);
 	return 0;
 }
+EXPORT_SYMBOL(xfrm4_rcv_encap);
 
 /* If it's a keepalive packet, then just eat it.
  * If it's an encapsulated packet, then pass it to the
@@ -252,11 +241,8 @@ int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb)
 	__skb_pull(skb, len);
 	skb_reset_transport_header(skb);
 
-	/* modify the protocol (it's ESP!) */
-	iph->protocol = IPPROTO_ESP;
-
 	/* process ESP */
-	ret = xfrm4_rcv_encap(skb, encap_type);
+	ret = xfrm4_rcv_encap(skb, IPPROTO_ESP, 0, encap_type);
 	return ret;
 
 drop:
@@ -266,7 +252,7 @@ drop:
 
 int xfrm4_rcv(struct sk_buff *skb)
 {
-	return xfrm4_rcv_encap(skb, 0);
+	return xfrm4_rcv_spi(skb, ip_hdr(skb)->protocol, 0);
 }
 
 EXPORT_SYMBOL(xfrm4_rcv);
diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
index 83e9580..3268451 100644
--- a/net/ipv4/xfrm4_tunnel.c
+++ b/net/ipv4/xfrm4_tunnel.c
@@ -48,20 +48,25 @@ static struct xfrm_type ipip_type = {
 	.output		= ipip_output
 };
 
+static int xfrm_tunnel_rcv(struct sk_buff *skb)
+{
+	return xfrm4_rcv_spi(skb, IPPROTO_IP, ip_hdr(skb)->saddr);
+}
+
 static int xfrm_tunnel_err(struct sk_buff *skb, u32 info)
 {
 	return -ENOENT;
 }
 
 static struct xfrm_tunnel xfrm_tunnel_handler = {
-	.handler	=	xfrm4_rcv,
+	.handler	=	xfrm_tunnel_rcv,
 	.err_handler	=	xfrm_tunnel_err,
 	.priority	=	2,
 };
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 static struct xfrm_tunnel xfrm64_tunnel_handler = {
-	.handler	=	xfrm4_rcv,
+	.handler	=	xfrm_tunnel_rcv,
 	.err_handler	=	xfrm_tunnel_err,
 	.priority	=	2,
 };


* [PATCH 3/12] [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
  2007-10-16 14:33 ` [PATCH 1/12] [IPSEC]: Fix pure tunnel modes involving IPv6 Herbert Xu
  2007-10-16 14:33 ` [PATCH 2/12] [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 4/12] [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi Herbert Xu
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi

Currently xfrm6_rcv_spi gets the nexthdr value itself from the packet.
This means that we need to fix up the value in case we have a 4-on-6
tunnel.  Moving this logic into the caller simplifies things and allows
us to merge the code with IPv4.
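
The difference between the old fix-up and the new convention can be sketched like this. The function names are hypothetical; only the conditional expression in `lookup_proto_before` mirrors the line that the patch removes.

```c
/* Protocol numbers as assigned by IANA. */
enum { PROTO_IPIP = 4, PROTO_IPV6 = 41, PROTO_ESP = 50 };

/* Before: the callee read nexthdr from the packet and patched up 4-on-6. */
static int lookup_proto_before(int nexthdr_in_packet)
{
	return nexthdr_in_packet != PROTO_IPIP ? nexthdr_in_packet
					       : PROTO_IPV6;
}

/* After: the callee uses whatever the caller passed, unchanged. */
static int lookup_proto_after(int nexthdr_from_caller)
{
	return nexthdr_from_caller;
}

/* Hypothetical tunnel handler: it already knows which value to look up. */
static int tunnel_handler_proto(void)
{
	return lookup_proto_after(PROTO_IPV6);
}
```

Both versions end up looking up the same protocol for a 4-on-6 tunnel, but in the second one the knowledge lives in the tunnel handler and the shared function has no family-specific branch.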

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/net/xfrm.h      |    2 +-
 net/ipv6/xfrm6_input.c  |    9 ++++-----
 net/ipv6/xfrm6_tunnel.c |    2 +-
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 680739f..d8974ca 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1058,7 +1058,7 @@ static inline int xfrm4_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 extern int xfrm4_output(struct sk_buff *skb);
 extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
-extern int xfrm6_rcv_spi(struct sk_buff *skb, __be32 spi);
+extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 extern int xfrm6_rcv(struct sk_buff *skb);
 extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 			    xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 02f69e5..596a730 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -16,7 +16,7 @@
 #include <net/ipv6.h>
 #include <net/xfrm.h>
 
-int xfrm6_rcv_spi(struct sk_buff *skb, __be32 spi)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	int err;
 	__be32 seq;
@@ -24,11 +24,9 @@ int xfrm6_rcv_spi(struct sk_buff *skb, __be32 spi)
 	struct xfrm_state *x;
 	int xfrm_nr = 0;
 	int decaps = 0;
-	int nexthdr;
 	unsigned int nhoff;
 
 	nhoff = IP6CB(skb)->nhoff;
-	nexthdr = skb_network_header(skb)[nhoff];
 
 	seq = 0;
 	if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0)
@@ -41,7 +39,7 @@ int xfrm6_rcv_spi(struct sk_buff *skb, __be32 spi)
 			goto drop;
 
 		x = xfrm_state_lookup((xfrm_address_t *)&iph->daddr, spi,
-				nexthdr != IPPROTO_IPIP ? nexthdr : IPPROTO_IPV6, AF_INET6);
+				      nexthdr, AF_INET6);
 		if (x == NULL)
 			goto drop;
 		spin_lock(&x->lock);
@@ -135,7 +133,8 @@ EXPORT_SYMBOL(xfrm6_rcv_spi);
 
 int xfrm6_rcv(struct sk_buff *skb)
 {
-	return xfrm6_rcv_spi(skb, 0);
+	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
+			     0);
 }
 
 EXPORT_SYMBOL(xfrm6_rcv);
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 6c67ac1..fae90ff 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -257,7 +257,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup((xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, spi) > 0 ? : 0;
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi) > 0 ? : 0;
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,


* [PATCH 4/12] [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (2 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 3/12] [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 5/12] [IPSEC]: Fix length check in xfrm_parse_spi Herbert Xu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi

Not every transform needs to zap ip_summed.  For example, a pure tunnel
mode encapsulation does not affect the hardware checksum at all.  In fact,
every algorithm (that needs this) other than AH6 already does its own
ip_summed zapping.

This patch moves the zapping into AH6 which is in line with what IPv4 does.

Possible future optimisation: Checksum the data as we copy them in IPComp.
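
A small user-space model of the idea, assuming a simplified packet that only tracks its checksum state. `SUM_NONE` and `SUM_HW` are hypothetical stand-ins for the kernel's `CHECKSUM_*` values, and the `*_model` functions are illustrative, not the real input handlers.

```c
/* Hypothetical stand-ins for the kernel's CHECKSUM_* values. */
enum { SUM_NONE = 0, SUM_HW = 1 };

struct packet {
	int ip_summed;
};

/* Models AH input after the patch: the transform zaps the state itself. */
static void ah_input_model(struct packet *p)
{
	p->ip_summed = SUM_NONE;
}

/* Models pure tunnel input: the hardware checksum stays valid. */
static void tunnel_input_model(struct packet *p)
{
	(void)p;
}

/* Models the shared receive path after the patch: no unconditional zap. */
static void rcv_model(struct packet *p, void (*input)(struct packet *))
{
	input(p);
}
```

In this model a packet that only passes through the tunnel handler keeps its hardware checksum state, which is the case the unconditional zapping used to penalise.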

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv6/ah6.c         |    2 ++
 net/ipv6/xfrm6_input.c |    1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index f9f6891..a8221d1 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -344,6 +344,8 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff *skb)
 	    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 		goto out;
 
+	skb->ip_summed = CHECKSUM_NONE;
+
 	hdr_len = skb->data - skb_network_header(skb);
 	ah = (struct ip_auth_hdr *)skb->data;
 	ahp = x->data;
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 596a730..b1201c3 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -97,7 +97,6 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 	memcpy(skb->sp->xvec + skb->sp->len, xfrm_vec,
 	       xfrm_nr * sizeof(xfrm_vec[0]));
 	skb->sp->len += xfrm_nr;
-	skb->ip_summed = CHECKSUM_NONE;
 
 	nf_reset(skb);
 


* [PATCH 5/12] [IPSEC]: Fix length check in xfrm_parse_spi
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (3 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 4/12] [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 6/12] [IPSEC]: Move type and mode map into xfrm_state.c Herbert Xu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Fix length check in xfrm_parse_spi

Currently xfrm_parse_spi requires there to be 16 bytes for AH and ESP.
In contrived cases there may not actually be 16 bytes there since the
respective header sizes are less than that (12 and 8 currently).

This patch changes the test to use the actual header length instead of 16.
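
The effect of the change can be sketched in user space as below. The two structs are assumed mirrors of the fixed parts of the AH and ESP headers (12 and 8 bytes on common ABIs), and `header_fits` is a hypothetical helper that models only the length check, not the actual SPI extraction.

```c
#include <stddef.h>
#include <stdint.h>

/* Protocol numbers as assigned by IANA. */
enum { PROTO_ESP = 50, PROTO_AH = 51 };

/* Assumed user-space mirrors of the fixed parts of the two headers. */
struct auth_hdr_model {
	uint8_t  nexthdr;
	uint8_t  hdrlen;
	uint16_t reserved;
	uint32_t spi;
	uint32_t seq_no;
};

struct esp_hdr_model {
	uint32_t spi;
	uint32_t seq_no;
};

/*
 * Returns 0 if a full header is available, -1 if the packet is too
 * short, and 1 for protocols this sketch does not parse.
 */
static int header_fits(int nexthdr, size_t available)
{
	size_t hlen;

	switch (nexthdr) {
	case PROTO_AH:
		hlen = sizeof(struct auth_hdr_model);
		break;
	case PROTO_ESP:
		hlen = sizeof(struct esp_hdr_model);
		break;
	default:
		return 1;
	}

	return available >= hlen ? 0 : -1;
}
```

A packet carrying exactly one 8-byte ESP header passes this check, whereas a fixed 16-byte requirement would have rejected it.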

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/xfrm/xfrm_input.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 113f444..cb97fda 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -49,13 +49,16 @@ EXPORT_SYMBOL(secpath_dup);
 int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, __be32 *spi, __be32 *seq)
 {
 	int offset, offset_seq;
+	int hlen;
 
 	switch (nexthdr) {
 	case IPPROTO_AH:
+		hlen = sizeof(struct ip_auth_hdr);
 		offset = offsetof(struct ip_auth_hdr, spi);
 		offset_seq = offsetof(struct ip_auth_hdr, seq_no);
 		break;
 	case IPPROTO_ESP:
+		hlen = sizeof(struct ip_esp_hdr);
 		offset = offsetof(struct ip_esp_hdr, spi);
 		offset_seq = offsetof(struct ip_esp_hdr, seq_no);
 		break;
@@ -69,7 +72,7 @@ int xfrm_parse_spi(struct sk_buff *skb, u8 nexthdr, __be32 *spi, __be32 *seq)
 		return 1;
 	}
 
-	if (!pskb_may_pull(skb, 16))
+	if (!pskb_may_pull(skb, hlen))
 		return -EINVAL;
 
 	*spi = *(__be32*)(skb_transport_header(skb) + offset);


* [PATCH 6/12] [IPSEC]: Move type and mode map into xfrm_state.c
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (4 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 5/12] [IPSEC]: Fix length check in xfrm_parse_spi Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size Herbert Xu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Move type and mode map into xfrm_state.c

The type and mode maps are only used by SAs, not policies.  So it makes
sense to move them from xfrm_policy.c into xfrm_state.c.  This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.

The only other change I've made in the move is to get rid of the casts
on the request_module call for types.  They're unnecessary because C
will promote them to ints anyway.
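
The point about the casts can be checked with a small user-space sketch. `format_type_alias` is a hypothetical helper that uses `snprintf` in place of `request_module`; it assumes a platform where `int` is wider than `unsigned short`, which holds on the usual targets.

```c
#include <stdio.h>

/*
 * Formats the module alias without any casts.  Arguments passed through
 * "..." undergo the default argument promotions, so the unsigned short
 * and unsigned char values arrive as int and match %d.
 */
static int format_type_alias(char *buf, size_t len,
			     unsigned short family, unsigned char proto)
{
	return snprintf(buf, len, "xfrm-type-%d-%d", family, proto);
}
```

Calling it with family 2 and protocol 50 produces the string "xfrm-type-2-50", the same as it would with explicit `(int)` casts.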

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/net/xfrm.h     |    8 --
 net/xfrm/xfrm_policy.c |  173 -------------------------------------------------
 net/xfrm/xfrm_state.c  |  170 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+), 179 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index d8974ca..7f156a0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -228,8 +228,6 @@ struct xfrm_type;
 struct xfrm_dst;
 struct xfrm_policy_afinfo {
 	unsigned short		family;
-	struct xfrm_type	*type_map[IPPROTO_MAX];
-	struct xfrm_mode	*mode_map[XFRM_MODE_MAX];
 	struct dst_ops		*dst_ops;
 	void			(*garbage_collect)(void);
 	int			(*dst_lookup)(struct xfrm_dst **dst, struct flowi *fl);
@@ -256,6 +254,8 @@ extern int __xfrm_state_delete(struct xfrm_state *x);
 
 struct xfrm_state_afinfo {
 	unsigned short		family;
+	struct xfrm_type	*type_map[IPPROTO_MAX];
+	struct xfrm_mode	*mode_map[XFRM_MODE_MAX];
 	int			(*init_flags)(struct xfrm_state *x);
 	void			(*init_tempsel)(struct xfrm_state *x, struct flowi *fl,
 						struct xfrm_tmpl *tmpl,
@@ -295,8 +295,6 @@ struct xfrm_type
 
 extern int xfrm_register_type(struct xfrm_type *type, unsigned short family);
 extern int xfrm_unregister_type(struct xfrm_type *type, unsigned short family);
-extern struct xfrm_type *xfrm_get_type(u8 proto, unsigned short family);
-extern void xfrm_put_type(struct xfrm_type *type);
 
 struct xfrm_mode {
 	int (*input)(struct xfrm_state *x, struct sk_buff *skb);
@@ -320,8 +318,6 @@ struct xfrm_mode {
 
 extern int xfrm_register_mode(struct xfrm_mode *mode, int family);
 extern int xfrm_unregister_mode(struct xfrm_mode *mode, int family);
-extern struct xfrm_mode *xfrm_get_mode(unsigned int encap, int family);
-extern void xfrm_put_mode(struct xfrm_mode *mode);
 
 struct xfrm_tmpl
 {
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index af27c19..ca24c90 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -49,8 +49,6 @@ static DEFINE_SPINLOCK(xfrm_policy_gc_lock);
 
 static struct xfrm_policy_afinfo *xfrm_policy_get_afinfo(unsigned short family);
 static void xfrm_policy_put_afinfo(struct xfrm_policy_afinfo *afinfo);
-static struct xfrm_policy_afinfo *xfrm_policy_lock_afinfo(unsigned int family);
-static void xfrm_policy_unlock_afinfo(struct xfrm_policy_afinfo *afinfo);
 
 static inline int
 __xfrm4_selector_match(struct xfrm_selector *sel, struct flowi *fl)
@@ -86,72 +84,6 @@ int xfrm_selector_match(struct xfrm_selector *sel, struct flowi *fl,
 	return 0;
 }
 
-int xfrm_register_type(struct xfrm_type *type, unsigned short family)
-{
-	struct xfrm_policy_afinfo *afinfo = xfrm_policy_lock_afinfo(family);
-	struct xfrm_type **typemap;
-	int err = 0;
-
-	if (unlikely(afinfo == NULL))
-		return -EAFNOSUPPORT;
-	typemap = afinfo->type_map;
-
-	if (likely(typemap[type->proto] == NULL))
-		typemap[type->proto] = type;
-	else
-		err = -EEXIST;
-	xfrm_policy_unlock_afinfo(afinfo);
-	return err;
-}
-EXPORT_SYMBOL(xfrm_register_type);
-
-int xfrm_unregister_type(struct xfrm_type *type, unsigned short family)
-{
-	struct xfrm_policy_afinfo *afinfo = xfrm_policy_lock_afinfo(family);
-	struct xfrm_type **typemap;
-	int err = 0;
-
-	if (unlikely(afinfo == NULL))
-		return -EAFNOSUPPORT;
-	typemap = afinfo->type_map;
-
-	if (unlikely(typemap[type->proto] != type))
-		err = -ENOENT;
-	else
-		typemap[type->proto] = NULL;
-	xfrm_policy_unlock_afinfo(afinfo);
-	return err;
-}
-EXPORT_SYMBOL(xfrm_unregister_type);
-
-struct xfrm_type *xfrm_get_type(u8 proto, unsigned short family)
-{
-	struct xfrm_policy_afinfo *afinfo;
-	struct xfrm_type **typemap;
-	struct xfrm_type *type;
-	int modload_attempted = 0;
-
-retry:
-	afinfo = xfrm_policy_get_afinfo(family);
-	if (unlikely(afinfo == NULL))
-		return NULL;
-	typemap = afinfo->type_map;
-
-	type = typemap[proto];
-	if (unlikely(type && !try_module_get(type->owner)))
-		type = NULL;
-	if (!type && !modload_attempted) {
-		xfrm_policy_put_afinfo(afinfo);
-		request_module("xfrm-type-%d-%d",
-			       (int) family, (int) proto);
-		modload_attempted = 1;
-		goto retry;
-	}
-
-	xfrm_policy_put_afinfo(afinfo);
-	return type;
-}
-
 int xfrm_dst_lookup(struct xfrm_dst **dst, struct flowi *fl,
 		    unsigned short family)
 {
@@ -170,94 +102,6 @@ int xfrm_dst_lookup(struct xfrm_dst **dst, struct flowi *fl,
 }
 EXPORT_SYMBOL(xfrm_dst_lookup);
 
-void xfrm_put_type(struct xfrm_type *type)
-{
-	module_put(type->owner);
-}
-
-int xfrm_register_mode(struct xfrm_mode *mode, int family)
-{
-	struct xfrm_policy_afinfo *afinfo;
-	struct xfrm_mode **modemap;
-	int err;
-
-	if (unlikely(mode->encap >= XFRM_MODE_MAX))
-		return -EINVAL;
-
-	afinfo = xfrm_policy_lock_afinfo(family);
-	if (unlikely(afinfo == NULL))
-		return -EAFNOSUPPORT;
-
-	err = -EEXIST;
-	modemap = afinfo->mode_map;
-	if (likely(modemap[mode->encap] == NULL)) {
-		modemap[mode->encap] = mode;
-		err = 0;
-	}
-
-	xfrm_policy_unlock_afinfo(afinfo);
-	return err;
-}
-EXPORT_SYMBOL(xfrm_register_mode);
-
-int xfrm_unregister_mode(struct xfrm_mode *mode, int family)
-{
-	struct xfrm_policy_afinfo *afinfo;
-	struct xfrm_mode **modemap;
-	int err;
-
-	if (unlikely(mode->encap >= XFRM_MODE_MAX))
-		return -EINVAL;
-
-	afinfo = xfrm_policy_lock_afinfo(family);
-	if (unlikely(afinfo == NULL))
-		return -EAFNOSUPPORT;
-
-	err = -ENOENT;
-	modemap = afinfo->mode_map;
-	if (likely(modemap[mode->encap] == mode)) {
-		modemap[mode->encap] = NULL;
-		err = 0;
-	}
-
-	xfrm_policy_unlock_afinfo(afinfo);
-	return err;
-}
-EXPORT_SYMBOL(xfrm_unregister_mode);
-
-struct xfrm_mode *xfrm_get_mode(unsigned int encap, int family)
-{
-	struct xfrm_policy_afinfo *afinfo;
-	struct xfrm_mode *mode;
-	int modload_attempted = 0;
-
-	if (unlikely(encap >= XFRM_MODE_MAX))
-		return NULL;
-
-retry:
-	afinfo = xfrm_policy_get_afinfo(family);
-	if (unlikely(afinfo == NULL))
-		return NULL;
-
-	mode = afinfo->mode_map[encap];
-	if (unlikely(mode && !try_module_get(mode->owner)))
-		mode = NULL;
-	if (!mode && !modload_attempted) {
-		xfrm_policy_put_afinfo(afinfo);
-		request_module("xfrm-mode-%d-%d", family, encap);
-		modload_attempted = 1;
-		goto retry;
-	}
-
-	xfrm_policy_put_afinfo(afinfo);
-	return mode;
-}
-
-void xfrm_put_mode(struct xfrm_mode *mode)
-{
-	module_put(mode->owner);
-}
-
 static inline unsigned long make_jiffies(long secs)
 {
 	if (secs >= (MAX_SCHEDULE_TIMEOUT-1)/HZ)
@@ -2213,23 +2057,6 @@ static void xfrm_policy_put_afinfo(struct xfrm_policy_afinfo *afinfo)
 	read_unlock(&xfrm_policy_afinfo_lock);
 }
 
-static struct xfrm_policy_afinfo *xfrm_policy_lock_afinfo(unsigned int family)
-{
-	struct xfrm_policy_afinfo *afinfo;
-	if (unlikely(family >= NPROTO))
-		return NULL;
-	write_lock_bh(&xfrm_policy_afinfo_lock);
-	afinfo = xfrm_policy_afinfo[family];
-	if (unlikely(!afinfo))
-		write_unlock_bh(&xfrm_policy_afinfo_lock);
-	return afinfo;
-}
-
-static void xfrm_policy_unlock_afinfo(struct xfrm_policy_afinfo *afinfo)
-{
-	write_unlock_bh(&xfrm_policy_afinfo_lock);
-}
-
 static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 344f0a6..dc438f2 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -187,6 +187,176 @@ int __xfrm_state_delete(struct xfrm_state *x);
 int km_query(struct xfrm_state *x, struct xfrm_tmpl *t, struct xfrm_policy *pol);
 void km_state_expired(struct xfrm_state *x, int hard, u32 pid);
 
+static struct xfrm_state_afinfo *xfrm_state_lock_afinfo(unsigned int family)
+{
+	struct xfrm_state_afinfo *afinfo;
+	if (unlikely(family >= NPROTO))
+		return NULL;
+	write_lock_bh(&xfrm_state_afinfo_lock);
+	afinfo = xfrm_state_afinfo[family];
+	if (unlikely(!afinfo))
+		write_unlock_bh(&xfrm_state_afinfo_lock);
+	return afinfo;
+}
+
+static void xfrm_state_unlock_afinfo(struct xfrm_state_afinfo *afinfo)
+{
+	write_unlock_bh(&xfrm_state_afinfo_lock);
+}
+
+int xfrm_register_type(struct xfrm_type *type, unsigned short family)
+{
+	struct xfrm_state_afinfo *afinfo = xfrm_state_lock_afinfo(family);
+	struct xfrm_type **typemap;
+	int err = 0;
+
+	if (unlikely(afinfo == NULL))
+		return -EAFNOSUPPORT;
+	typemap = afinfo->type_map;
+
+	if (likely(typemap[type->proto] == NULL))
+		typemap[type->proto] = type;
+	else
+		err = -EEXIST;
+	xfrm_state_unlock_afinfo(afinfo);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_register_type);
+
+int xfrm_unregister_type(struct xfrm_type *type, unsigned short family)
+{
+	struct xfrm_state_afinfo *afinfo = xfrm_state_lock_afinfo(family);
+	struct xfrm_type **typemap;
+	int err = 0;
+
+	if (unlikely(afinfo == NULL))
+		return -EAFNOSUPPORT;
+	typemap = afinfo->type_map;
+
+	if (unlikely(typemap[type->proto] != type))
+		err = -ENOENT;
+	else
+		typemap[type->proto] = NULL;
+	xfrm_state_unlock_afinfo(afinfo);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_unregister_type);
+
+static struct xfrm_type *xfrm_get_type(u8 proto, unsigned short family)
+{
+	struct xfrm_state_afinfo *afinfo;
+	struct xfrm_type **typemap;
+	struct xfrm_type *type;
+	int modload_attempted = 0;
+
+retry:
+	afinfo = xfrm_state_get_afinfo(family);
+	if (unlikely(afinfo == NULL))
+		return NULL;
+	typemap = afinfo->type_map;
+
+	type = typemap[proto];
+	if (unlikely(type && !try_module_get(type->owner)))
+		type = NULL;
+	if (!type && !modload_attempted) {
+		xfrm_state_put_afinfo(afinfo);
+		request_module("xfrm-type-%d-%d", family, proto);
+		modload_attempted = 1;
+		goto retry;
+	}
+
+	xfrm_state_put_afinfo(afinfo);
+	return type;
+}
+
+static void xfrm_put_type(struct xfrm_type *type)
+{
+	module_put(type->owner);
+}
+
+int xfrm_register_mode(struct xfrm_mode *mode, int family)
+{
+	struct xfrm_state_afinfo *afinfo;
+	struct xfrm_mode **modemap;
+	int err;
+
+	if (unlikely(mode->encap >= XFRM_MODE_MAX))
+		return -EINVAL;
+
+	afinfo = xfrm_state_lock_afinfo(family);
+	if (unlikely(afinfo == NULL))
+		return -EAFNOSUPPORT;
+
+	err = -EEXIST;
+	modemap = afinfo->mode_map;
+	if (likely(modemap[mode->encap] == NULL)) {
+		modemap[mode->encap] = mode;
+		err = 0;
+	}
+
+	xfrm_state_unlock_afinfo(afinfo);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_register_mode);
+
+int xfrm_unregister_mode(struct xfrm_mode *mode, int family)
+{
+	struct xfrm_state_afinfo *afinfo;
+	struct xfrm_mode **modemap;
+	int err;
+
+	if (unlikely(mode->encap >= XFRM_MODE_MAX))
+		return -EINVAL;
+
+	afinfo = xfrm_state_lock_afinfo(family);
+	if (unlikely(afinfo == NULL))
+		return -EAFNOSUPPORT;
+
+	err = -ENOENT;
+	modemap = afinfo->mode_map;
+	if (likely(modemap[mode->encap] == mode)) {
+		modemap[mode->encap] = NULL;
+		err = 0;
+	}
+
+	xfrm_state_unlock_afinfo(afinfo);
+	return err;
+}
+EXPORT_SYMBOL(xfrm_unregister_mode);
+
+static struct xfrm_mode *xfrm_get_mode(unsigned int encap, int family)
+{
+	struct xfrm_state_afinfo *afinfo;
+	struct xfrm_mode *mode;
+	int modload_attempted = 0;
+
+	if (unlikely(encap >= XFRM_MODE_MAX))
+		return NULL;
+
+retry:
+	afinfo = xfrm_state_get_afinfo(family);
+	if (unlikely(afinfo == NULL))
+		return NULL;
+
+	mode = afinfo->mode_map[encap];
+	if (unlikely(mode && !try_module_get(mode->owner)))
+		mode = NULL;
+	if (!mode && !modload_attempted) {
+		xfrm_state_put_afinfo(afinfo);
+		request_module("xfrm-mode-%d-%d", family, encap);
+		modload_attempted = 1;
+		goto retry;
+	}
+
+	xfrm_state_put_afinfo(afinfo);
+	return mode;
+}
+
+static void xfrm_put_mode(struct xfrm_mode *mode)
+{
+	module_put(mode->owner);
+}
+
 static void xfrm_state_gc_destroy(struct xfrm_state *x)
 {
 	del_timer_sync(&x->timer);


* [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (5 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 6/12] [IPSEC]: Move type and mode map into xfrm_state.c Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 8/12] [IPSEC]: Store afinfo pointer in xfrm_mode Herbert Xu
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Remove xfrmX_tunnel_check_size

These functions have always caused trouble by sending ICMP errors back
to the local host, which was totally confused about how to deal with them.
Most often this ended in a downward spiral that only finishes when the
MTU is so small that you can't send packets out anymore.

They're also wrong now that we have inter-family transforms.  They'll
end up trying to shove an IPv4 packet into the IPv6 ICMP stack and vice
versa.

In fact, I've just realised that they are totally unnecessary.  The reason
is that whoever calls us should have already checked the MTU.  In particular,
there are two cases:

1) The packet is forwarded in which case the forwarding function would've
performed the check.

2) The packet is local in which case whoever generated it should've checked.
If they didn't check then us sending back an ICMP error wouldn't do any good
anyway since the next time they transmit they'll still get it wrong.

So the only time this function has an effect is when the MTU happens to
change between the caller checking it and us checking it.  This is useless
because if we did catch such a change there's nothing stopping a further
MTU change between us checking it and the packet actually getting to the
device.

Indeed, at the bottom of the stack there will be another check by either
ip_output or ip6_output that would catch such an MTU change.

In that case we would not be able to send an ICMP error back to the original
sender, but that's a general problem of our IPsec stack which we might solve
one day.  In any case, the benefit of having xfrmX_tunnel_check_size is
now outweighed by its problems.

Therefore this patch removes them both.
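
The argument about the MTU changing between checks can be modelled in user space. Everything here is hypothetical: `path_model` holds a single MTU value, `caller_check` stands for the check done by the forwarding code or the local sender, and `output_check` stands for the final check in ip_output or ip6_output.

```c
#include <stddef.h>

/* Hypothetical path state; the MTU may change between any two checks. */
struct path_model {
	size_t mtu;
};

/* Models the check done by the forwarding code or the local sender. */
static int caller_check(const struct path_model *p, size_t len)
{
	return len <= p->mtu;
}

/* Models the final check at the bottom of the stack. */
static int output_check(const struct path_model *p, size_t len)
{
	return len <= p->mtu;
}

/*
 * Returns 1 if the packet would be sent and 0 if one of the checks
 * rejects it.  mtu_after_caller_check models an MTU change that happens
 * after the caller has already checked.
 */
static int send_model(struct path_model *p, size_t len,
		      size_t mtu_after_caller_check)
{
	if (!caller_check(p, len))
		return 0;

	p->mtu = mtu_after_caller_check;

	return output_check(p, len);
}
```

In this model an intermediate check between the two adds nothing: an oversized packet is rejected by the caller, and an MTU that shrinks afterwards is still caught by the final check.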

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/xfrm4_output.c |   31 -------------------------------
 net/ipv6/xfrm6_output.c |   26 --------------------------
 2 files changed, 57 deletions(-)

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index a4edd66..ba26490 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -17,42 +17,11 @@
 #include <net/xfrm.h>
 #include <net/icmp.h>

-static int xfrm4_tunnel_check_size(struct sk_buff *skb)
-{
-	int mtu, ret = 0;
-	struct dst_entry *dst;
-
-	if (IPCB(skb)->flags & IPSKB_XFRM_TUNNEL_SIZE)
-		goto out;
-
-	IPCB(skb)->flags |= IPSKB_XFRM_TUNNEL_SIZE;
-
-	if (!(ip_hdr(skb)->frag_off & htons(IP_DF)) || skb->local_df)
-		goto out;
-
-	dst = skb->dst;
-	mtu = dst_mtu(dst);
-	if (skb->len > mtu) {
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
-		ret = -EMSGSIZE;
-	}
-out:
-	return ret;
-}
-
 static inline int xfrm4_output_one(struct sk_buff *skb)
 {
-	struct dst_entry *dst = skb->dst;
-	struct xfrm_state *x = dst->xfrm;
 	struct iphdr *iph;
 	int err;

-	if (x->props.mode == XFRM_MODE_TUNNEL) {
-		err = xfrm4_tunnel_check_size(skb);
-		if (err)
-			goto error_nolock;
-	}
-
 	err = xfrm_output(skb);
 	if (err)
 		goto error_nolock;
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index a5a32c1..4fb477a 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -25,37 +25,11 @@ int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,

 EXPORT_SYMBOL(xfrm6_find_1stfragopt);

-static int xfrm6_tunnel_check_size(struct sk_buff *skb)
-{
-	int mtu, ret = 0;
-	struct dst_entry *dst = skb->dst;
-
-	mtu = dst_mtu(dst);
-	if (mtu < IPV6_MIN_MTU)
-		mtu = IPV6_MIN_MTU;
-
-	if (skb->len > mtu) {
-		skb->dev = dst->dev;
-		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
-		ret = -EMSGSIZE;
-	}
-
-	return ret;
-}
-
 static inline int xfrm6_output_one(struct sk_buff *skb)
 {
-	struct dst_entry *dst = skb->dst;
-	struct xfrm_state *x = dst->xfrm;
 	struct ipv6hdr *iph;
 	int err;

-	if (x->props.mode == XFRM_MODE_TUNNEL) {
-		err = xfrm6_tunnel_check_size(skb);
-		if (err)
-			goto error_nolock;
-	}
-
 	err = xfrm_output(skb);
 	if (err)
 		goto error_nolock;


* [PATCH 8/12] [IPSEC]: Store afinfo pointer in xfrm_mode
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (6 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 9/12] [IPSEC]: Use the top IPv4 route's peer instead of the bottom Herbert Xu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC (permalink / raw)
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Store afinfo pointer in xfrm_mode

It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family.  Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get them more easily via the state
itself.

This patch adds an xfrm_state_afinfo pointer to xfrm_mode (since modes are
address-specific) and changes the policy code to use it.  I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
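As a rough user-space model of the idea (not the kernel code itself), the
sketch below shows a mode recording a back-pointer to its family at
registration time while pinning the family's module.  All toy_* names are
hypothetical, and toy_module_get/toy_module_put only imitate what
try_module_get/module_put do with a plain counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MODE_MAX 4

/* Hypothetical stand-in for struct module: just a reference count. */
struct toy_module {
	int refcnt;
	int dying;	/* non-zero once unloading has begun */
};

struct toy_mode;

/* Per-family operations, loosely modelled on xfrm_state_afinfo. */
struct toy_afinfo {
	unsigned int family;
	struct toy_module *owner;
	struct toy_mode *mode_map[TOY_MODE_MAX];
	int (*output)(int pkt);
};

/* Loosely modelled on xfrm_mode, with a back-pointer to its family. */
struct toy_mode {
	unsigned int encap;
	struct toy_afinfo *afinfo;
};

/* Toy counterpart of try_module_get(): fails once the module is dying. */
int toy_module_get(struct toy_module *m)
{
	if (m->dying)
		return 0;
	m->refcnt++;
	return 1;
}

/* Toy counterpart of module_put(). */
void toy_module_put(struct toy_module *m)
{
	m->refcnt--;
}

/* Register a mode: pin the family's module and record the back-pointer. */
int toy_register_mode(struct toy_afinfo *afinfo, struct toy_mode *mode)
{
	if (mode->encap >= TOY_MODE_MAX)
		return -EINVAL;
	if (afinfo->mode_map[mode->encap])
		return -EEXIST;
	if (!toy_module_get(afinfo->owner))
		return -ENOENT;

	mode->afinfo = afinfo;
	afinfo->mode_map[mode->encap] = mode;
	return 0;
}

/* Unregister a mode and drop the reference taken at registration. */
int toy_unregister_mode(struct toy_afinfo *afinfo, struct toy_mode *mode)
{
	if (mode->encap >= TOY_MODE_MAX ||
	    afinfo->mode_map[mode->encap] != mode)
		return -ENOENT;

	afinfo->mode_map[mode->encap] = NULL;
	toy_module_put(mode->afinfo->owner);
	mode->afinfo = NULL;
	return 0;
}

/* Example family output function, used only to show the lookup. */
int toy_output4(int pkt)
{
	return pkt + 4;
}
```

Once a mode is registered, a caller holding only the mode can reach the
family output function as mode->afinfo->output without a separate lookup.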

 include/net/xfrm.h      |    6 +++---
 net/ipv4/xfrm4_policy.c |   13 +------------
 net/ipv4/xfrm4_state.c  |    1 +
 net/ipv6/xfrm6_policy.c |   14 +-------------
 net/ipv6/xfrm6_state.c  |    1 +
 net/xfrm/xfrm_state.c   |   26 +++++++++++++++++---------
 6 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 7f156a0..a9e8247 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -253,7 +253,8 @@ extern void km_state_expired(struct xfrm_state *x, int hard, u32 pid);
 extern int __xfrm_state_delete(struct xfrm_state *x);
 
 struct xfrm_state_afinfo {
-	unsigned short		family;
+	unsigned int		family;
+	struct module		*owner;
 	struct xfrm_type	*type_map[IPPROTO_MAX];
 	struct xfrm_mode	*mode_map[XFRM_MODE_MAX];
 	int			(*init_flags)(struct xfrm_state *x);
@@ -267,8 +268,6 @@ struct xfrm_state_afinfo {
 
 extern int xfrm_state_register_afinfo(struct xfrm_state_afinfo *afinfo);
 extern int xfrm_state_unregister_afinfo(struct xfrm_state_afinfo *afinfo);
-extern struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family);
-extern void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo);
 
 extern void xfrm_state_delete_tunnel(struct xfrm_state *x);
 
@@ -312,6 +311,7 @@ struct xfrm_mode {
 	 */
 	int (*output)(struct xfrm_state *x,struct sk_buff *skb);
 
+	struct xfrm_state_afinfo *afinfo;
 	struct module *owner;
 	unsigned int encap;
 };
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 329825c..bd07a98 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -151,7 +151,6 @@ __xfrm4_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
 	i = 0;
 	for (; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) {
 		struct xfrm_dst *x = (struct xfrm_dst*)dst_prev;
-		struct xfrm_state_afinfo *afinfo;
 		x->u.rt.fl = *fl;
 
 		dst_prev->xfrm = xfrm[i++];
@@ -169,17 +168,7 @@ __xfrm4_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
 		/* Copy neighbout for reachability confirmation */
 		dst_prev->neighbour	= neigh_clone(rt->u.dst.neighbour);
 		dst_prev->input		= rt->u.dst.input;
-		/* XXX: When IPv6 module can be unloaded, we should manage reference
-		 * to xfrm6_output in afinfo->output. Miyazawa
-		 * */
-		afinfo = xfrm_state_get_afinfo(dst_prev->xfrm->props.family);
-		if (!afinfo) {
-			dst = *dst_p;
-			err = -EAFNOSUPPORT;
-			goto error;
-		}
-		dst_prev->output = afinfo->output;
-		xfrm_state_put_afinfo(afinfo);
+		dst_prev->output = dst_prev->xfrm->mode->afinfo->output;
 		if (dst_prev->xfrm->props.family == AF_INET && rt->peer)
 			atomic_inc(&rt->peer->refcnt);
 		x->u.rt.peer = rt->peer;
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 93e2c06..13d54a1 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -49,6 +49,7 @@ __xfrm4_init_tempsel(struct xfrm_state *x, struct flowi *fl,
 
 static struct xfrm_state_afinfo xfrm4_state_afinfo = {
 	.family			= AF_INET,
+	.owner			= THIS_MODULE,
 	.init_flags		= xfrm4_init_flags,
 	.init_tempsel		= __xfrm4_init_tempsel,
 	.output			= xfrm4_output,
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 15aa4c5..e473708 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -215,7 +215,6 @@ __xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
 	i = 0;
 	for (; dst_prev != &rt->u.dst; dst_prev = dst_prev->child) {
 		struct xfrm_dst *x = (struct xfrm_dst*)dst_prev;
-		struct xfrm_state_afinfo *afinfo;
 
 		dst_prev->xfrm = xfrm[i++];
 		dst_prev->dev = rt->u.dst.dev;
@@ -232,18 +231,7 @@ __xfrm6_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
 		/* Copy neighbour for reachability confirmation */
 		dst_prev->neighbour	= neigh_clone(rt->u.dst.neighbour);
 		dst_prev->input		= rt->u.dst.input;
-		/* XXX: When IPv4 is implemented as module and can be unloaded,
-		 * we should manage reference to xfrm4_output in afinfo->output.
-		 * Miyazawa
-		 */
-		afinfo = xfrm_state_get_afinfo(dst_prev->xfrm->props.family);
-		if (!afinfo) {
-			dst = *dst_p;
-			goto error;
-		}
-
-		dst_prev->output = afinfo->output;
-		xfrm_state_put_afinfo(afinfo);
+		dst_prev->output = dst_prev->xfrm->mode->afinfo->output;
 		/* Sheit... I remember I did this right. Apparently,
 		 * it was magically lost, so this code needs audit */
 		x->u.rt6.rt6i_flags    = rt0->rt6i_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL);
diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
index cdadb48..8bb9b08 100644
--- a/net/ipv6/xfrm6_state.c
+++ b/net/ipv6/xfrm6_state.c
@@ -168,6 +168,7 @@ __xfrm6_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n)
 
 static struct xfrm_state_afinfo xfrm6_state_afinfo = {
 	.family			= AF_INET6,
+	.owner			= THIS_MODULE,
 	.init_tempsel		= __xfrm6_init_tempsel,
 	.tmpl_sort		= __xfrm6_tmpl_sort,
 	.state_sort		= __xfrm6_state_sort,
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index dc438f2..48b4a06 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -57,6 +57,9 @@ static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024;
 static unsigned int xfrm_state_num;
 static unsigned int xfrm_state_genid;
 
+static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned int family);
+static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo);
+
 static inline unsigned int xfrm_dst_hash(xfrm_address_t *daddr,
 					 xfrm_address_t *saddr,
 					 u32 reqid,
@@ -289,11 +292,18 @@ int xfrm_register_mode(struct xfrm_mode *mode, int family)
 
 	err = -EEXIST;
 	modemap = afinfo->mode_map;
-	if (likely(modemap[mode->encap] == NULL)) {
-		modemap[mode->encap] = mode;
-		err = 0;
-	}
+	if (modemap[mode->encap])
+		goto out;
 
+	err = -ENOENT;
+	if (!try_module_get(afinfo->owner))
+		goto out;
+
+	mode->afinfo = afinfo;
+	modemap[mode->encap] = mode;
+	err = 0;
+
+out:
 	xfrm_state_unlock_afinfo(afinfo);
 	return err;
 }
@@ -316,6 +326,7 @@ int xfrm_unregister_mode(struct xfrm_mode *mode, int family)
 	modemap = afinfo->mode_map;
 	if (likely(modemap[mode->encap] == mode)) {
 		modemap[mode->encap] = NULL;
+		module_put(mode->afinfo->owner);
 		err = 0;
 	}
 
@@ -1869,7 +1880,7 @@ int xfrm_state_unregister_afinfo(struct xfrm_state_afinfo *afinfo)
 }
 EXPORT_SYMBOL(xfrm_state_unregister_afinfo);
 
-struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family)
+static struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned int family)
 {
 	struct xfrm_state_afinfo *afinfo;
 	if (unlikely(family >= NPROTO))
@@ -1881,14 +1892,11 @@ struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned short family)
 	return afinfo;
 }
 
-void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo)
+static void xfrm_state_put_afinfo(struct xfrm_state_afinfo *afinfo)
 {
 	read_unlock(&xfrm_state_afinfo_lock);
 }
 
-EXPORT_SYMBOL(xfrm_state_get_afinfo);
-EXPORT_SYMBOL(xfrm_state_put_afinfo);
-
 /* Temporarily located here until net/xfrm/xfrm_tunnel.c is created */
 void xfrm_state_delete_tunnel(struct xfrm_state *x)
 {


* [PATCH 9/12] [IPSEC]: Use the top IPv4 route's peer instead of the bottom
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (7 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 8/12] [IPSEC]: Store afinfo pointer in xfrm_mode Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 10/12] [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP Herbert Xu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC (permalink / raw)
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Use the top IPv4 route's peer instead of the bottom

For IPv4 we were using the bottom route's peer instead of the top one.
This is wrong because the peer is only used by TCP to keep track of
information about the TCP destination address which certainly does not
live in the bottom route.

This patch fixes that, which allows us to get rid of the family check
since the bottom route could be IPv6 while the top one must always
be IPv4.

I've also changed the other fields which are IPv4-specific to get the
info from the top route instead of potentially bogus data from the
bottom route.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
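A simplified user-space sketch of the resulting rule, under the assumption
that a peer is just a reference-counted object: the IPv4 bundle entry takes
its peer and gateway from the top route and drops the reference without any
family check.  The toy_* types and functions are hypothetical and carry only
the fields needed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inet_peer with a plain reference count. */
struct toy_peer {
	int refcnt;
};

/* Hypothetical route: only the fields this sketch needs. */
struct toy_route {
	struct toy_peer *peer;	/* NULL when the route has no peer */
	unsigned int gateway;
};

/* Per-transform entry in the bundle, loosely modelled on xfrm_dst. */
struct toy_xdst {
	struct toy_peer *peer;
	unsigned int gateway;
};

/*
 * Fill an IPv4 bundle entry from the top route (rt0), which is always
 * IPv4 here.  The bottom route may belong to another family, so nothing
 * IPv4-specific is read from it.
 */
void toy_fill_xdst(struct toy_xdst *x, const struct toy_route *rt0)
{
	if (rt0->peer)
		rt0->peer->refcnt++;
	x->peer = rt0->peer;
	x->gateway = rt0->gateway;
}

/* Drop the peer reference; no family check is needed any more. */
void toy_destroy_xdst(struct toy_xdst *x)
{
	if (x->peer)
		x->peer->refcnt--;
	x->peer = NULL;
}
```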

 net/ipv4/xfrm4_policy.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index bd07a98..f9b4e4f 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -169,16 +169,16 @@ __xfrm4_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int
 		dst_prev->neighbour	= neigh_clone(rt->u.dst.neighbour);
 		dst_prev->input		= rt->u.dst.input;
 		dst_prev->output = dst_prev->xfrm->mode->afinfo->output;
-		if (dst_prev->xfrm->props.family == AF_INET && rt->peer)
-			atomic_inc(&rt->peer->refcnt);
-		x->u.rt.peer = rt->peer;
+		if (rt0->peer)
+			atomic_inc(&rt0->peer->refcnt);
+		x->u.rt.peer = rt0->peer;
 		/* Sheit... I remember I did this right. Apparently,
 		 * it was magically lost, so this code needs audit */
 		x->u.rt.rt_flags = rt0->rt_flags&(RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL);
-		x->u.rt.rt_type = rt->rt_type;
+		x->u.rt.rt_type = rt0->rt_type;
 		x->u.rt.rt_src = rt0->rt_src;
 		x->u.rt.rt_dst = rt0->rt_dst;
-		x->u.rt.rt_gateway = rt->rt_gateway;
+		x->u.rt.rt_gateway = rt0->rt_gateway;
 		x->u.rt.rt_spec_dst = rt0->rt_spec_dst;
 		x->u.rt.idev = rt0->idev;
 		in_dev_hold(rt0->idev);
@@ -280,7 +280,7 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
 
 	if (likely(xdst->u.rt.idev))
 		in_dev_put(xdst->u.rt.idev);
-	if (dst->xfrm && dst->xfrm->props.family == AF_INET && likely(xdst->u.rt.peer))
+	if (likely(xdst->u.rt.peer))
 		inet_putpeer(xdst->u.rt.peer);
 	xfrm_dst_destroy(xdst);
 }


* [PATCH 10/12] [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (8 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 9/12] [IPSEC]: Use the top IPv4 route's peer instead of the bottom Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 14:33 ` [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input Herbert Xu
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC (permalink / raw)
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP

Combining RO and AH/ESP/IPCOMP does not make sense.  So this patch adds a
check in the state initialisation function to prevent this.

This allows us to safely remove the mode input function of RO since it
can never be called anymore.  Indeed, if somehow it does get called we'll
know about it through an OOPS instead of it slipping past silently.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
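The intended behaviour of the new check can be sketched in user-space C as
below.  This is only a model: the toy_* names and enum values are
hypothetical (the kernel's XFRM_MODE_* constants may have different values),
and the function returns the computed length rather than storing it in the
state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mode constants, named after the XFRM_MODE_* ones. */
enum toy_mode {
	TOY_MODE_TRANSPORT,
	TOY_MODE_TUNNEL,
	TOY_MODE_ROUTEOPTIMIZATION,
	TOY_MODE_BEET,
};

#define TOY_IPV6_HDR_LEN 40	/* size of the fixed IPv6 header in bytes */

/*
 * Compute the header overhead for a state, or reject the mode.
 * base_len is the protocol's own header length (AH, ESP or IPComp).
 * Returns the total length, or -EINVAL for modes that make no sense
 * with these protocols, such as route optimisation.
 */
int toy_header_len(enum toy_mode mode, int base_len)
{
	switch (mode) {
	case TOY_MODE_BEET:
	case TOY_MODE_TRANSPORT:
		return base_len;
	case TOY_MODE_TUNNEL:
		return base_len + TOY_IPV6_HDR_LEN;
	default:
		return -EINVAL;
	}
}
```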

 net/ipv6/ah6.c           |    9 ++++++++-
 net/ipv6/esp6.c          |    9 ++++++++-
 net/ipv6/ipcomp6.c       |    9 ++++++++-
 net/ipv6/xfrm6_mode_ro.c |    9 ---------
 4 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index a8221d1..67cd066 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -477,8 +477,16 @@ static int ah6_init_state(struct xfrm_state *x)
 
 	x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
 					  ahp->icv_trunc_len);
-	if (x->props.mode == XFRM_MODE_TUNNEL)
+	switch (x->props.mode) {
+	case XFRM_MODE_BEET:
+	case XFRM_MODE_TRANSPORT:
+		break;
+	case XFRM_MODE_TUNNEL:
 		x->props.header_len += sizeof(struct ipv6hdr);
+		break;
+	default:
+		goto error;
+	}
 	x->data = ahp;
 
 	return 0;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 9eb9285..b071543 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -354,8 +354,16 @@ static int esp6_init_state(struct xfrm_state *x)
 				    (x->ealg->alg_key_len + 7) / 8))
 		goto error;
 	x->props.header_len = sizeof(struct ip_esp_hdr) + esp->conf.ivlen;
-	if (x->props.mode == XFRM_MODE_TUNNEL)
+	switch (x->props.mode) {
+	case XFRM_MODE_BEET:
+	case XFRM_MODE_TRANSPORT:
+		break;
+	case XFRM_MODE_TUNNEL:
 		x->props.header_len += sizeof(struct ipv6hdr);
+		break;
+	default:
+		goto error;
+	}
 	x->data = esp;
 	return 0;
 
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 28fc8ed..80ef2a1 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -411,8 +411,16 @@ static int ipcomp6_init_state(struct xfrm_state *x)
 		goto out;
 
 	x->props.header_len = 0;
-	if (x->props.mode == XFRM_MODE_TUNNEL)
+	switch (x->props.mode) {
+	case XFRM_MODE_BEET:
+	case XFRM_MODE_TRANSPORT:
+		break;
+	case XFRM_MODE_TUNNEL:
 		x->props.header_len += sizeof(struct ipv6hdr);
+		break;
+	default:
+		goto error;
+	}
 
 	mutex_lock(&ipcomp6_resource_mutex);
 	if (!ipcomp6_alloc_scratches())
diff --git a/net/ipv6/xfrm6_mode_ro.c b/net/ipv6/xfrm6_mode_ro.c
index 957ae36..a7bc8c6 100644
--- a/net/ipv6/xfrm6_mode_ro.c
+++ b/net/ipv6/xfrm6_mode_ro.c
@@ -58,16 +58,7 @@ static int xfrm6_ro_output(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
-/*
- * Do nothing about routing optimization header unlike IPsec.
- */
-static int xfrm6_ro_input(struct xfrm_state *x, struct sk_buff *skb)
-{
-	return 0;
-}
-
 static struct xfrm_mode xfrm6_ro_mode = {
-	.input = xfrm6_ro_input,
 	.output = xfrm6_ro_output,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_ROUTEOPTIMIZATION,


* [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (9 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 10/12] [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
  2007-10-16 15:05   ` YOSHIFUJI Hideaki / 吉藤英明
  2007-10-16 14:33 ` [PATCH 12/12] [NET]: Add netif_rerx_secpath Herbert Xu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC (permalink / raw)
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[IPSEC]: Reinject packet instead of calling netfilter directly on input

Currently we call netfilter directly on input after a series of transport
mode transforms (and BEET but that's a separate bug).  This is inconsistent
because other parts of the stack such as AF_PACKET cannot see the decapsulated
packet.  In fact this is a common complaint about the Linux IPsec stack.

Another problem is that there is a potential for stack overflow if we
encounter a DNAT rule which turns a foreign packet into a local one that
contains another transport mode SA.

This patch introduces a major behavioural change by reinjecting the
packet instead of calling netfilter directly.

This solves both of the aforementioned problems.

It is still inconsistent with how we do things on output since we don't
pass things through AF_PACKET there either but the same inconsistency
exists for tunnel mode too so it's not a new problem.

To make things easier I've added a new function called netif_rerx which
resets netfilter and the dst before reinjecting the packet using netif_rx.
This can be used by other tunnel code as well.

I haven't added a reinject function for RO mode since it can never be
called on that path, and if it ever is we want to know about it through
an OOPS.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
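The shape of the change can be modelled in user-space C as below.  This is a
sketch under simplifying assumptions, not the kernel implementation: toy_skb
reduces a packet to a few flags and lengths, toy_netif_rx only marks the
packet as queued, and the toy reinject callbacks take just the packet
(the real callback also receives the xfrm_state).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified packet buffer. */
struct toy_skb {
	int has_dst;		/* cached route attached? */
	int has_nfct;		/* netfilter state attached? */
	unsigned int hdr_len;	/* network header bytes already pulled */
	unsigned int len;	/* bytes from the current data pointer */
	unsigned int tot_len;	/* length field in the toy IP header */
	int queued;		/* set once the packet is on the rx queue */
};

/* Stand-in for netif_rx(): just mark the packet as queued. */
int toy_netif_rx(struct toy_skb *skb)
{
	skb->queued = 1;
	return 0;
}

/* Modelled on netif_rerx: drop netfilter state and the route, requeue. */
int toy_netif_rerx(struct toy_skb *skb)
{
	skb->has_nfct = 0;
	skb->has_dst = 0;
	return toy_netif_rx(skb);
}

/* Each mode supplies its own way of putting a packet back into the stack. */
struct toy_mode {
	int (*reinject)(struct toy_skb *skb);
};

/* Tunnel mode: the inner packet is already complete. */
int toy_tunnel_reinject(struct toy_skb *skb)
{
	return toy_netif_rerx(skb);
}

/* Transport mode: restore the header and fix its length field first. */
int toy_transport_reinject(struct toy_skb *skb)
{
	skb->len += skb->hdr_len;	/* like pushing back to the header */
	skb->hdr_len = 0;
	skb->tot_len = skb->len;
	return toy_netif_rerx(skb);
}

/* The input path no longer branches on the mode; it just dispatches. */
int toy_input_finish(const struct toy_mode *mode, struct toy_skb *skb)
{
	return mode->reinject(skb);
}
```

Because every mode ends in the same requeue step, the decapsulated packet
re-enters the stack from the top in all cases instead of jumping straight
into a netfilter hook.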

 include/linux/netdevice.h       |    1 +
 include/net/xfrm.h              |    8 ++++++++
 net/core/dev.c                  |   12 ++++++++++++
 net/ipv4/xfrm4_input.c          |   24 ++----------------------
 net/ipv4/xfrm4_mode_beet.c      |    7 +++++++
 net/ipv4/xfrm4_mode_transport.c |   11 +++++++++++
 net/ipv4/xfrm4_mode_tunnel.c    |    7 +++++++
 net/ipv6/xfrm6_input.c          |   23 ++---------------------
 net/ipv6/xfrm6_mode_beet.c      |    7 +++++++
 net/ipv6/xfrm6_mode_transport.c |   10 ++++++++++
 net/ipv6/xfrm6_mode_tunnel.c    |    7 +++++++
 11 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 39dd83b..097f911 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1039,6 +1039,7 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
 #define HAVE_NETIF_RX 1
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
+extern int		netif_rerx(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern int		netif_receive_skb(struct sk_buff *skb);
 extern int		dev_valid_name(const char *name);
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index a9e8247..e5ae5fa 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -311,6 +311,14 @@ struct xfrm_mode {
 	 */
 	int (*output)(struct xfrm_state *x,struct sk_buff *skb);
 
+	/*
+	 * Reinject packet into stack.
+	 *
+	 * On entry, the packet is in the state as on exit from the
+	 * input function above.
+	 */
+	int (*reinject)(struct xfrm_state *x,struct sk_buff *skb);
+
 	struct xfrm_state_afinfo *afinfo;
 	struct module *owner;
 	unsigned int encap;
diff --git a/net/core/dev.c b/net/core/dev.c
index 38b03da..b753ec8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1808,6 +1808,18 @@ int netif_rx_ni(struct sk_buff *skb)
 
 EXPORT_SYMBOL(netif_rx_ni);
 
+/* Reinject a packet that has previously been processed, e.g., by tunneling. */
+int netif_rerx(struct sk_buff *skb)
+{
+	nf_reset(skb);
+
+	dst_release(skb->dst);
+	skb->dst = NULL;
+
+	return netif_rx(skb);
+}
+EXPORT_SYMBOL(netif_rerx);
+
 static inline struct net_device *skb_bond(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 5cb0b59..f5576d5 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -41,7 +41,6 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 	struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH];
 	struct xfrm_state *x;
 	int xfrm_nr = 0;
-	int decaps = 0;
 	unsigned int nhoff = offsetof(struct iphdr, protocol);
 
 	seq = 0;
@@ -95,7 +94,6 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 			goto drop;
 
 		if (x->props.mode == XFRM_MODE_TUNNEL) {
-			decaps = 1;
 			break;
 		}
 
@@ -122,26 +120,8 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
 	       xfrm_nr * sizeof(xfrm_vec[0]));
 	skb->sp->len += xfrm_nr;
 
-	nf_reset(skb);
-
-	if (decaps) {
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		netif_rx(skb);
-		return 0;
-	} else {
-#ifdef CONFIG_NETFILTER
-		__skb_push(skb, skb->data - skb_network_header(skb));
-		ip_hdr(skb)->tot_len = htons(skb->len);
-		ip_send_check(ip_hdr(skb));
-
-		NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
-			xfrm4_rcv_encap_finish);
-		return 0;
-#else
-		return -ip_hdr(skb)->protocol;
-#endif
-	}
+	x->mode->reinject(x, skb);
+	return 0;
 
 drop_unlock:
 	spin_unlock(&x->lock);
diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c
index 73d2338..012ae98 100644
--- a/net/ipv4/xfrm4_mode_beet.c
+++ b/net/ipv4/xfrm4_mode_beet.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dst.h>
@@ -109,9 +110,15 @@ out:
 	return err;
 }
 
+static int xfrm4_beet_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm4_beet_mode = {
 	.input = xfrm4_beet_input,
 	.output = xfrm4_beet_output,
+	.reinject = xfrm4_beet_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_BEET,
 };
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index fd840c7..602418b 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -7,6 +7,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dst.h>
@@ -54,9 +55,19 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
+static int xfrm4_transport_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	__skb_push(skb, skb->data - skb_network_header(skb));
+	ip_hdr(skb)->tot_len = htons(skb->len);
+	ip_send_check(ip_hdr(skb));
+
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm4_transport_mode = {
 	.input = xfrm4_transport_input,
 	.output = xfrm4_transport_output,
+	.reinject = xfrm4_transport_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_TRANSPORT,
 };
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 1ae9d32..780908a 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -7,6 +7,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dst.h>
@@ -134,9 +135,15 @@ out:
 	return err;
 }
 
+static int xfrm4_tunnel_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm4_tunnel_mode = {
 	.input = xfrm4_tunnel_input,
 	.output = xfrm4_tunnel_output,
+	.reinject = xfrm4_tunnel_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_TUNNEL,
 };
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index b1201c3..1347e0a 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -23,7 +23,6 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 	struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH];
 	struct xfrm_state *x;
 	int xfrm_nr = 0;
-	int decaps = 0;
 	unsigned int nhoff;
 
 	nhoff = IP6CB(skb)->nhoff;
@@ -72,7 +71,6 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 			goto drop;
 
 		if (x->props.mode == XFRM_MODE_TUNNEL) { /* XXX */
-			decaps = 1;
 			break;
 		}
 
@@ -98,25 +96,8 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 	       xfrm_nr * sizeof(xfrm_vec[0]));
 	skb->sp->len += xfrm_nr;
 
-	nf_reset(skb);
-
-	if (decaps) {
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		netif_rx(skb);
-		return -1;
-	} else {
-#ifdef CONFIG_NETFILTER
-		ipv6_hdr(skb)->payload_len = htons(skb->len);
-		__skb_push(skb, skb->data - skb_network_header(skb));
-
-		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
-			ip6_rcv_finish);
-		return -1;
-#else
-		return 1;
-#endif
-	}
+	x->mode->reinject(x, skb);
+	return -1;
 
 drop_unlock:
 	spin_unlock(&x->lock);
diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
index 13bb1e8..17622cf 100644
--- a/net/ipv6/xfrm6_mode_beet.c
+++ b/net/ipv6/xfrm6_mode_beet.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dsfield.h>
@@ -74,9 +75,15 @@ out:
 	return err;
 }
 
+static int xfrm6_beet_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm6_beet_mode = {
 	.input = xfrm6_beet_input,
 	.output = xfrm6_beet_output,
+	.reinject = xfrm6_beet_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_BEET,
 };
diff --git a/net/ipv6/xfrm6_mode_transport.c b/net/ipv6/xfrm6_mode_transport.c
index 4e34410..e165442 100644
--- a/net/ipv6/xfrm6_mode_transport.c
+++ b/net/ipv6/xfrm6_mode_transport.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dst.h>
@@ -59,9 +60,18 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
+static int xfrm6_transport_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	ipv6_hdr(skb)->payload_len = htons(skb->len);
+	__skb_push(skb, skb->data - skb_network_header(skb));
+
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm6_transport_mode = {
 	.input = xfrm6_transport_input,
 	.output = xfrm6_transport_output,
+	.reinject = xfrm6_transport_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_TRANSPORT,
 };
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index ea22838..1329d6a 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <linux/stringify.h>
 #include <net/dsfield.h>
@@ -113,9 +114,15 @@ out:
 	return err;
 }
 
+static int xfrm6_tunnel_reinject(struct xfrm_state *x, struct sk_buff *skb)
+{
+	return netif_rerx(skb);
+}
+
 static struct xfrm_mode xfrm6_tunnel_mode = {
 	.input = xfrm6_tunnel_input,
 	.output = xfrm6_tunnel_output,
+	.reinject = xfrm6_tunnel_reinject,
 	.owner = THIS_MODULE,
 	.encap = XFRM_MODE_TUNNEL,
 };


* [PATCH 12/12] [NET]: Add netif_rerx_secpath
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (10 preceding siblings ...)
  2007-10-16 14:33 ` [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input Herbert Xu
@ 2007-10-16 14:33 ` Herbert Xu
       [not found] ` <E1IhnHJ-0003A4-00@gondolin.me.apana.org.au>
  2007-10-16 14:39 ` [0/12] Trying to merge xfrm input path before I got side-tracked Patrick McHardy
  13 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:33 UTC (permalink / raw)
  To: David S. Miller, netdev, Patrick McHardy, Herbert Xu

[NET]: Add netif_rerx_secpath

This patch follows on from the netif_rerx addition.  A number of tunnels
reinject packets back into the stack in the same way as netif_rerx.  They
also need to reset the security path since they're not part of the IPsec stack.

This patch creates the netif_rerx_secpath function which resets the security
path before calling netif_rerx.  It also uses it in the appropriate places.

The only spot of note is ipmr.c where we didn't reset the security path
before.  However, that is clearly an oversight since PIM is certainly not
part of the IPsec stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
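A small user-space model of the layering, assuming a packet reduced to a few
flags: the secpath variant clears the security path and then defers to the
plain reinject helper.  The toy_* names are hypothetical and toy_netif_rerx
only imitates the helper added in the previous patch.

```c
#include <assert.h>

/* Hypothetical, heavily simplified packet buffer. */
struct toy_skb {
	int has_dst;		/* cached route attached? */
	int has_nfct;		/* netfilter state attached? */
	int has_secpath;	/* record of IPsec transforms applied so far */
	int queued;		/* set once the packet is on the rx queue */
};

/* Modelled on netif_rerx: drop netfilter state and the route, requeue. */
int toy_netif_rerx(struct toy_skb *skb)
{
	skb->has_nfct = 0;
	skb->has_dst = 0;
	skb->queued = 1;	/* stands in for netif_rx() */
	return 0;
}

/* Modelled on netif_rerx_secpath: also forget the security path. */
int toy_netif_rerx_secpath(struct toy_skb *skb)
{
	skb->has_secpath = 0;
	return toy_netif_rerx(skb);
}
```

IPsec code would keep using the plain helper so that the security path
survives reinjection, while non-IPsec tunnels would use the secpath variant.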

 drivers/net/veth.c        |    7 +------
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |    9 +++++++++
 net/ipv4/ip_gre.c         |    9 ++-------
 net/ipv4/ipip.c           |    7 +------
 net/ipv4/ipmr.c           |   10 ++--------
 net/ipv6/ip6_tunnel.c     |    6 +-----
 net/ipv6/sit.c            |    5 +----
 8 files changed, 18 insertions(+), 36 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index fdd1e03..a19bc0c 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -14,7 +14,6 @@
 #include <linux/etherdevice.h>
 
 #include <net/dst.h>
-#include <net/xfrm.h>
 #include <net/veth.h>
 
 #define DRV_NAME	"veth"
@@ -172,11 +171,7 @@ static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (dev->features & NETIF_F_NO_CSUM)
 		skb->ip_summed = rcv_priv->ip_summed;
 
-	dst_release(skb->dst);
-	skb->dst = NULL;
 	skb->mark = 0;
-	secpath_reset(skb);
-	nf_reset(skb);
 
 	length = skb->len;
 
@@ -187,7 +182,7 @@ static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	stats->rx_bytes += length;
 	stats->rx_packets++;
 
-	netif_rx(skb);
+	netif_rerx_secpath(skb);
 	return 0;
 
 outf:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 097f911..e19c696 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1040,6 +1040,7 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
 extern int		netif_rerx(struct sk_buff *skb);
+extern int		netif_rerx_secpath(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
 extern int		netif_receive_skb(struct sk_buff *skb);
 extern int		dev_valid_name(const char *name);
diff --git a/net/core/dev.c b/net/core/dev.c
index b753ec8..202c69a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -113,6 +113,7 @@
 #include <linux/delay.h>
 #include <net/wext.h>
 #include <net/iw_handler.h>
+#include <net/xfrm.h>
 #include <asm/current.h>
 #include <linux/audit.h>
 #include <linux/dmaengine.h>
@@ -1820,6 +1821,14 @@ int netif_rerx(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_rerx);
 
+/* Reinject a packet without keeping the secpath. */
+int netif_rerx_secpath(struct sk_buff *skb)
+{
+	secpath_reset(skb);
+	return netif_rerx(skb);
+}
+EXPORT_SYMBOL(netif_rerx_secpath);
+
 static inline struct net_device *skb_bond(struct sk_buff *skb)
 {
 	struct net_device *dev = skb->dev;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index f151900..5882eaf 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -38,7 +38,7 @@
 #include <net/checksum.h>
 #include <net/dsfield.h>
 #include <net/inet_ecn.h>
-#include <net/xfrm.h>
+#include <net/route.h>
 
 #ifdef CONFIG_IPV6
 #include <net/ipv6.h>
@@ -599,8 +599,6 @@ static int ipgre_rcv(struct sk_buff *skb)
 
 	read_lock(&ipgre_lock);
 	if ((tunnel = ipgre_tunnel_lookup(iph->saddr, iph->daddr, key)) != NULL) {
-		secpath_reset(skb);
-
 		skb->protocol = *(__be16*)(h + 2);
 		/* WCCP version 1 and 2 protocol decoding.
 		 * - Change protocol to IP
@@ -646,11 +644,8 @@ static int ipgre_rcv(struct sk_buff *skb)
 		tunnel->stat.rx_packets++;
 		tunnel->stat.rx_bytes += skb->len;
 		skb->dev = tunnel->dev;
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		nf_reset(skb);
 		ipgre_ecn_decapsulate(iph, skb);
-		netif_rx(skb);
+		netif_rerx_secpath(skb);
 		read_unlock(&ipgre_lock);
 		return(0);
 	}
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 5cd5bbe..cc78c8f 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -476,8 +476,6 @@ static int ipip_rcv(struct sk_buff *skb)
 			return 0;
 		}
 
-		secpath_reset(skb);
-
 		skb->mac_header = skb->network_header;
 		skb_reset_network_header(skb);
 		skb->protocol = htons(ETH_P_IP);
@@ -486,11 +484,8 @@ static int ipip_rcv(struct sk_buff *skb)
 		tunnel->stat.rx_packets++;
 		tunnel->stat.rx_bytes += skb->len;
 		skb->dev = tunnel->dev;
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
-		netif_rx(skb);
+		netif_rerx_secpath(skb);
 		read_unlock(&ipip_lock);
 		return 0;
 	}
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 37bb497..f0ad033 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1483,12 +1483,9 @@ int pim_rcv_v1(struct sk_buff * skb)
 	skb->protocol = htons(ETH_P_IP);
 	skb->ip_summed = 0;
 	skb->pkt_type = PACKET_HOST;
-	dst_release(skb->dst);
-	skb->dst = NULL;
 	((struct net_device_stats*)netdev_priv(reg_dev))->rx_bytes += skb->len;
 	((struct net_device_stats*)netdev_priv(reg_dev))->rx_packets++;
-	nf_reset(skb);
-	netif_rx(skb);
+	netif_rerx_secpath(skb);
 	dev_put(reg_dev);
 	return 0;
  drop:
@@ -1539,12 +1536,9 @@ static int pim_rcv(struct sk_buff * skb)
 	skb->protocol = htons(ETH_P_IP);
 	skb->ip_summed = 0;
 	skb->pkt_type = PACKET_HOST;
-	dst_release(skb->dst);
 	((struct net_device_stats*)netdev_priv(reg_dev))->rx_bytes += skb->len;
 	((struct net_device_stats*)netdev_priv(reg_dev))->rx_packets++;
-	skb->dst = NULL;
-	nf_reset(skb);
-	netif_rx(skb);
+	netif_rerx_secpath(skb);
 	dev_put(reg_dev);
 	return 0;
  drop:
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 2320cc2..2746ce0 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -699,22 +699,18 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol,
 			read_unlock(&ip6_tnl_lock);
 			goto discard;
 		}
-		secpath_reset(skb);
 		skb->mac_header = skb->network_header;
 		skb_reset_network_header(skb);
 		skb->protocol = htons(protocol);
 		skb->pkt_type = PACKET_HOST;
 		memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
 		skb->dev = t->dev;
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		nf_reset(skb);
 
 		dscp_ecn_decapsulate(t, ipv6h, skb);
 
 		t->stat.rx_packets++;
 		t->stat.rx_bytes += skb->len;
-		netif_rx(skb);
+		netif_rerx_secpath(skb);
 		read_unlock(&ip6_tnl_lock);
 		return 0;
 	}
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 466657a..08081ef 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -385,11 +385,8 @@ static int ipip6_rcv(struct sk_buff *skb)
 		tunnel->stat.rx_packets++;
 		tunnel->stat.rx_bytes += skb->len;
 		skb->dev = tunnel->dev;
-		dst_release(skb->dst);
-		skb->dst = NULL;
-		nf_reset(skb);
 		ipip6_ecn_decapsulate(iph, skb);
-		netif_rx(skb);
+		netif_rerx_secpath(skb);
 		read_unlock(&ipip6_lock);
 		return 0;
 	}
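
The layering in this patch (netif_rerx_secpath() resets the secpath and then defers to netif_rerx(), which resets netfilter state and the dst before calling netif_rx()) can be illustrated with a small userspace model. This is only a sketch: struct mock_skb and the mock_* helpers are hypothetical stand-ins, not the kernel's struct sk_buff or its real functions, and where the kernel helpers release references this model merely clears pointers.

```c
/* Userspace model only; nothing here is kernel code. */
#include <assert.h>
#include <stddef.h>

struct mock_skb {
	void *dst;	/* stands in for the cached route (skb->dst) */
	void *nfct;	/* stands in for the state nf_reset() clears */
	void *sp;	/* stands in for the IPsec secpath (skb->sp) */
	int queued;	/* set once the packet reaches the mock netif_rx() */
};

/* Stand-in for netif_rx(): just record that the packet was queued. */
static int mock_netif_rx(struct mock_skb *skb)
{
	skb->queued = 1;
	return 0;
}

/* Models netif_rerx(): forget netfilter state and the dst, then requeue. */
static int mock_netif_rerx(struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->nfct = NULL;
	skb->dst = NULL;
	return mock_netif_rx(skb);
}

/* Models netif_rerx_secpath(): additionally forget the secpath. */
static int mock_netif_rerx_secpath(struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->sp = NULL;
	return mock_netif_rerx(skb);
}
```

In this model a tunnel receive path would call mock_netif_rerx_secpath(), while a caller that needs the secpath to survive reinjection would call mock_netif_rerx() directly.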


* Re: [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size
       [not found] ` <E1IhnHJ-0003A4-00@gondolin.me.apana.org.au>
@ 2007-10-16 14:38   ` Patrick McHardy
  2007-10-16 15:17     ` Herbert Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2007-10-16 14:38 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

Herbert Xu wrote:
> [IPSEC]: Remove xfrmX_tunnel_check_size
> 
> These functions have always been causing trouble by sending ICMP errors
> back to the local host which was totally confused about how to deal with
> it and most often ended up causing a downward spiral which only finishes
> when the MTU is so small that you can't send packets out anymore.
> 
> They're also wrong now that we have inter-family transforms.  They'll
> end up trying to shove an IPv4 packet into the IPv6 ICMP stack and vice
> versa.
> 
> In fact, I've just realised that they are totally unnecessary.  The reason
> is that whoever calls us should have already checked the MTU.  In particular,
> there are two cases:
> 
> 1) The packet is forwarded in which case the forwarding function would've
> performed the check.
> 
> 2) The packet is local in which case whoever generated it should've checked.
> If they didn't check then us sending back an ICMP error wouldn't do any good
> anyway since the next time they transmit they'll still get it wrong.
> 
> So the only time this function has an effect is when the MTU happens to
> change between the caller checking it and us checking it.  This is useless
> because if we did catch such a change there's nothing stopping a further
> MTU change between us checking it and the packet actually getting to the
> device.


That's true, but for the first case we actually have something in the
stack doing that, which is NAT and routing by fwmark. Maybe netfilter
should just send an ICMP error back, that would also solve the problem
of silently dropped packets when rerouting to an unreachable
destination.
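
The objection to case 1 can be sketched in userspace C. This is a simplified model under stated assumptions, not kernel code: struct mock_route, struct mock_pkt and every function below are hypothetical, and rerouting is reduced to swapping a route pointer after the forward-time MTU check has already passed.

```c
/* Userspace model only; nothing here is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_route {
	unsigned int mtu;
};

struct mock_pkt {
	unsigned int len;
	const struct mock_route *rt;
};

/* The size check the forwarding path is assumed to perform. */
static bool fits_route(const struct mock_pkt *p)
{
	assert(p != NULL && p->rt != NULL);
	return p->len <= p->rt->mtu;
}

/* Stands in for NAT or fwmark-based rerouting after the forward check. */
static void reroute(struct mock_pkt *p, const struct mock_route *new_rt)
{
	assert(p != NULL && new_rt != NULL);
	p->rt = new_rt;
}

/*
 * Returns true when a packet that passed the forward-time check no
 * longer fits once it has been moved to new_rt.
 */
static bool oversized_after_reroute(struct mock_pkt *p,
				    const struct mock_route *new_rt)
{
	if (!fits_route(p))
		return false;	/* already rejected by the forward check */
	reroute(p, new_rt);
	return !fits_route(p);
}
```

A packet of 1450 bytes checked against a 1500-byte route and then rerouted onto a 1400-byte route is the situation this models: the earlier check passed, yet the packet is too large for the route it finally takes.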


* Re: [0/12] Trying to merge xfrm input path before I got side-tracked...
  2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
                   ` (12 preceding siblings ...)
       [not found] ` <E1IhnHJ-0003A4-00@gondolin.me.apana.org.au>
@ 2007-10-16 14:39 ` Patrick McHardy
  2007-10-16 14:44   ` Herbert Xu
  13 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2007-10-16 14:39 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

Herbert Xu wrote:
> Hi Dave:
> 
> I was well on my way to merging the xfrm input path before I got
> side-tracked by inter-family transforms :)
> 
> Anyway, here's a dump of what I've got.  The one note-worthy bit
> is the patch to reinject transport mode packets through netif_rx
> rather than calling netfilter directly.
> 
> Everything else is pretty mundane (modulo any bugs of course).


I think your patches didn't make it to netdev because the last
address missed a closing '>'.


* Re: [0/12] Trying to merge xfrm input path before I got side-tracked...
  2007-10-16 14:39 ` [0/12] Trying to merge xfrm input path before I got side-tracked Patrick McHardy
@ 2007-10-16 14:44   ` Herbert Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 14:44 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netdev

On Tue, Oct 16, 2007 at 04:39:56PM +0200, Patrick McHardy wrote:
> 
> I think your patches didn't make it to netdev because the last
> address missed a closing '>'.

I think so too.  I've just resent them to netdev only but
currently my mail server is being spammed so it's not sending
anything out.

I suppose you'll see them when you see this message :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input
  2007-10-16 14:33 ` [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input Herbert Xu
@ 2007-10-16 15:05   ` YOSHIFUJI Hideaki / 吉藤英明
  2007-10-16 15:12     ` Herbert Xu
  0 siblings, 1 reply; 19+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2007-10-16 15:05 UTC (permalink / raw)
  To: herbert; +Cc: davem, netdev, kaber, yoshfuji, kozakai

Herbert,

I think with this change, we parse extension headers twice.
We really do not want to do this.
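
One way to picture the concern is a userspace sketch that counts extension header parses for a transport mode packet. Everything here is a hypothetical model rather than the kernel's IPv6 input path: the header chain is reduced to an array of next-header values, and reinjection is modelled as walking the headers in front of ESP a second time.

```c
/* Userspace model only; nothing here is kernel code. */
#include <assert.h>
#include <stddef.h>

enum { HDR_HOPOPTS = 0, HDR_TCP = 6, HDR_ESP = 50, HDR_DSTOPTS = 60 };

static int is_ext_header(int nexthdr)
{
	return nexthdr == HDR_HOPOPTS || nexthdr == HDR_DSTOPTS;
}

/*
 * Walk chain[start..n) and count the extension headers parsed.  The walk
 * stops at the first non-extension header; its index is stored in *stop.
 */
static int walk_ext_headers(const int *chain, size_t n, size_t start,
			    size_t *stop)
{
	int parsed = 0;
	size_t i = start;

	assert(chain != NULL && stop != NULL);
	while (i < n && is_ext_header(chain[i])) {
		parsed++;
		i++;
	}
	*stop = i;
	return parsed;
}

/*
 * Total extension header parses for one packet.  With reinject != 0 the
 * headers in front of ESP are walked again after ESP has been removed;
 * with reinject == 0 processing simply continues after ESP.
 */
static int count_ext_parses(const int *chain, size_t n, int reinject)
{
	size_t stop, unused;
	int parsed = walk_ext_headers(chain, n, 0, &stop);

	if (stop >= n || chain[stop] != HDR_ESP)
		return parsed;	/* no transform, single pass */

	if (reinject)
		parsed += walk_ext_headers(chain, stop, 0, &unused);

	parsed += walk_ext_headers(chain, n, stop + 1, &stop);
	return parsed;
}
```

For a chain of hop-by-hop options, destination options, ESP and TCP, the model counts two parses without reinjection and four with it.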

--yoshfuji

In article <E1IhnTz-0003HT-00@gondolin.me.apana.org.au> (at Tue, 16 Oct 2007 22:33:19 +0800), Herbert Xu <herbert@gondor.apana.org.au> says:

> [IPSEC]: Reinject packet instead of calling netfilter directly on input
> 
> Currently we call netfilter directly on input after a series of transport
> mode transforms (and BEET but that's a separate bug).  This is inconsistent
> because other parts of the stack such as AF_PACKET cannot see the decapsulated
> packet.  In fact this is a common complaint about the Linux IPsec stack.
> 
> Another problem is that there is a potential for stack overflow if we
> encounter a DNAT rule which turns a foreign packet into a local one that
> contains another transport mode SA.
> 
> This patch introduces a major behavioural change by reinjecting the
> packet instead of calling netfilter directly.
> 
> This solves both of the aforementioned problems.
> 
> It is still inconsistent with how we do things on output since we don't
> pass things through AF_PACKET there either but the same inconsistency
> exists for tunnel mode too so it's not a new problem.
> 
> To make things easier I've added a new function called netif_rerx which
> resets netfilter and the dst before reinjecting the packet using netif_rx.
> This can be used by other tunnel code as well.
> 
> I haven't added a reinject function for RO mode since it can never be
> called on that path and if it does we want to know about it through an
> OOPS.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  include/linux/netdevice.h       |    1 +
>  include/net/xfrm.h              |    8 ++++++++
>  net/core/dev.c                  |   12 ++++++++++++
>  net/ipv4/xfrm4_input.c          |   24 ++----------------------
>  net/ipv4/xfrm4_mode_beet.c      |    7 +++++++
>  net/ipv4/xfrm4_mode_transport.c |   11 +++++++++++
>  net/ipv4/xfrm4_mode_tunnel.c    |    7 +++++++
>  net/ipv6/xfrm6_input.c          |   23 ++---------------------
>  net/ipv6/xfrm6_mode_beet.c      |    7 +++++++
>  net/ipv6/xfrm6_mode_transport.c |   10 ++++++++++
>  net/ipv6/xfrm6_mode_tunnel.c    |    7 +++++++
>  11 files changed, 74 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 39dd83b..097f911 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1039,6 +1039,7 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
>  #define HAVE_NETIF_RX 1
>  extern int		netif_rx(struct sk_buff *skb);
>  extern int		netif_rx_ni(struct sk_buff *skb);
> +extern int		netif_rerx(struct sk_buff *skb);
>  #define HAVE_NETIF_RECEIVE_SKB 1
>  extern int		netif_receive_skb(struct sk_buff *skb);
>  extern int		dev_valid_name(const char *name);
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index a9e8247..e5ae5fa 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -311,6 +311,14 @@ struct xfrm_mode {
>  	 */
>  	int (*output)(struct xfrm_state *x,struct sk_buff *skb);
>  
> +	/*
> +	 * Reinject packet into stack.
> +	 *
> +	 * On entry, the packet is in the state as on exit from the
> +	 * input function above.
> +	 */
> +	int (*reinject)(struct xfrm_state *x,struct sk_buff *skb);
> +
>  	struct xfrm_state_afinfo *afinfo;
>  	struct module *owner;
>  	unsigned int encap;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 38b03da..b753ec8 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1808,6 +1808,18 @@ int netif_rx_ni(struct sk_buff *skb)
>  
>  EXPORT_SYMBOL(netif_rx_ni);
>  
> +/* Reinject a packet that has previously been processed, e.g., by tunneling. */
> +int netif_rerx(struct sk_buff *skb)
> +{
> +	nf_reset(skb);
> +
> +	dst_release(skb->dst);
> +	skb->dst = NULL;
> +
> +	return netif_rx(skb);
> +}
> +EXPORT_SYMBOL(netif_rerx);
> +
>  static inline struct net_device *skb_bond(struct sk_buff *skb)
>  {
>  	struct net_device *dev = skb->dev;
> diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
> index 5cb0b59..f5576d5 100644
> --- a/net/ipv4/xfrm4_input.c
> +++ b/net/ipv4/xfrm4_input.c
> @@ -41,7 +41,6 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
>  	struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH];
>  	struct xfrm_state *x;
>  	int xfrm_nr = 0;
> -	int decaps = 0;
>  	unsigned int nhoff = offsetof(struct iphdr, protocol);
>  
>  	seq = 0;
> @@ -95,7 +94,6 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
>  			goto drop;
>  
>  		if (x->props.mode == XFRM_MODE_TUNNEL) {
> -			decaps = 1;
>  			break;
>  		}
>  
> @@ -122,26 +120,8 @@ int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi,
>  	       xfrm_nr * sizeof(xfrm_vec[0]));
>  	skb->sp->len += xfrm_nr;
>  
> -	nf_reset(skb);
> -
> -	if (decaps) {
> -		dst_release(skb->dst);
> -		skb->dst = NULL;
> -		netif_rx(skb);
> -		return 0;
> -	} else {
> -#ifdef CONFIG_NETFILTER
> -		__skb_push(skb, skb->data - skb_network_header(skb));
> -		ip_hdr(skb)->tot_len = htons(skb->len);
> -		ip_send_check(ip_hdr(skb));
> -
> -		NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
> -			xfrm4_rcv_encap_finish);
> -		return 0;
> -#else
> -		return -ip_hdr(skb)->protocol;
> -#endif
> -	}
> +	x->mode->reinject(x, skb);
> +	return 0;
>  
>  drop_unlock:
>  	spin_unlock(&x->lock);
> diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c
> index 73d2338..012ae98 100644
> --- a/net/ipv4/xfrm4_mode_beet.c
> +++ b/net/ipv4/xfrm4_mode_beet.c
> @@ -11,6 +11,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dst.h>
> @@ -109,9 +110,15 @@ out:
>  	return err;
>  }
>  
> +static int xfrm4_beet_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm4_beet_mode = {
>  	.input = xfrm4_beet_input,
>  	.output = xfrm4_beet_output,
> +	.reinject = xfrm4_beet_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_BEET,
>  };
> diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
> index fd840c7..602418b 100644
> --- a/net/ipv4/xfrm4_mode_transport.c
> +++ b/net/ipv4/xfrm4_mode_transport.c
> @@ -7,6 +7,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dst.h>
> @@ -54,9 +55,19 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
>  	return 0;
>  }
>  
> +static int xfrm4_transport_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	__skb_push(skb, skb->data - skb_network_header(skb));
> +	ip_hdr(skb)->tot_len = htons(skb->len);
> +	ip_send_check(ip_hdr(skb));
> +
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm4_transport_mode = {
>  	.input = xfrm4_transport_input,
>  	.output = xfrm4_transport_output,
> +	.reinject = xfrm4_transport_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_TRANSPORT,
>  };
> diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
> index 1ae9d32..780908a 100644
> --- a/net/ipv4/xfrm4_mode_tunnel.c
> +++ b/net/ipv4/xfrm4_mode_tunnel.c
> @@ -7,6 +7,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dst.h>
> @@ -134,9 +135,15 @@ out:
>  	return err;
>  }
>  
> +static int xfrm4_tunnel_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm4_tunnel_mode = {
>  	.input = xfrm4_tunnel_input,
>  	.output = xfrm4_tunnel_output,
> +	.reinject = xfrm4_tunnel_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_TUNNEL,
>  };
> diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
> index b1201c3..1347e0a 100644
> --- a/net/ipv6/xfrm6_input.c
> +++ b/net/ipv6/xfrm6_input.c
> @@ -23,7 +23,6 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
>  	struct xfrm_state *xfrm_vec[XFRM_MAX_DEPTH];
>  	struct xfrm_state *x;
>  	int xfrm_nr = 0;
> -	int decaps = 0;
>  	unsigned int nhoff;
>  
>  	nhoff = IP6CB(skb)->nhoff;
> @@ -72,7 +71,6 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
>  			goto drop;
>  
>  		if (x->props.mode == XFRM_MODE_TUNNEL) { /* XXX */
> -			decaps = 1;
>  			break;
>  		}
>  
> @@ -98,25 +96,8 @@ int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
>  	       xfrm_nr * sizeof(xfrm_vec[0]));
>  	skb->sp->len += xfrm_nr;
>  
> -	nf_reset(skb);
> -
> -	if (decaps) {
> -		dst_release(skb->dst);
> -		skb->dst = NULL;
> -		netif_rx(skb);
> -		return -1;
> -	} else {
> -#ifdef CONFIG_NETFILTER
> -		ipv6_hdr(skb)->payload_len = htons(skb->len);
> -		__skb_push(skb, skb->data - skb_network_header(skb));
> -
> -		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
> -			ip6_rcv_finish);
> -		return -1;
> -#else
> -		return 1;
> -#endif
> -	}
> +	x->mode->reinject(x, skb);
> +	return -1;
>  
>  drop_unlock:
>  	spin_unlock(&x->lock);
> diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
> index 13bb1e8..17622cf 100644
> --- a/net/ipv6/xfrm6_mode_beet.c
> +++ b/net/ipv6/xfrm6_mode_beet.c
> @@ -11,6 +11,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dsfield.h>
> @@ -74,9 +75,15 @@ out:
>  	return err;
>  }
>  
> +static int xfrm6_beet_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm6_beet_mode = {
>  	.input = xfrm6_beet_input,
>  	.output = xfrm6_beet_output,
> +	.reinject = xfrm6_beet_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_BEET,
>  };
> diff --git a/net/ipv6/xfrm6_mode_transport.c b/net/ipv6/xfrm6_mode_transport.c
> index 4e34410..e165442 100644
> --- a/net/ipv6/xfrm6_mode_transport.c
> +++ b/net/ipv6/xfrm6_mode_transport.c
> @@ -8,6 +8,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dst.h>
> @@ -59,9 +60,18 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
>  	return 0;
>  }
>  
> +static int xfrm6_transport_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	ipv6_hdr(skb)->payload_len = htons(skb->len);
> +	__skb_push(skb, skb->data - skb_network_header(skb));
> +
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm6_transport_mode = {
>  	.input = xfrm6_transport_input,
>  	.output = xfrm6_transport_output,
> +	.reinject = xfrm6_transport_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_TRANSPORT,
>  };
> diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
> index ea22838..1329d6a 100644
> --- a/net/ipv6/xfrm6_mode_tunnel.c
> +++ b/net/ipv6/xfrm6_mode_tunnel.c
> @@ -8,6 +8,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> +#include <linux/netdevice.h>
>  #include <linux/skbuff.h>
>  #include <linux/stringify.h>
>  #include <net/dsfield.h>
> @@ -113,9 +114,15 @@ out:
>  	return err;
>  }
>  
> +static int xfrm6_tunnel_reinject(struct xfrm_state *x, struct sk_buff *skb)
> +{
> +	return netif_rerx(skb);
> +}
> +
>  static struct xfrm_mode xfrm6_tunnel_mode = {
>  	.input = xfrm6_tunnel_input,
>  	.output = xfrm6_tunnel_output,
> +	.reinject = xfrm6_tunnel_reinject,
>  	.owner = THIS_MODULE,
>  	.encap = XFRM_MODE_TUNNEL,
>  };


* Re: [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input
  2007-10-16 15:05   ` YOSHIFUJI Hideaki / 吉藤英明
@ 2007-10-16 15:12     ` Herbert Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 15:12 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: davem, netdev, kaber, kozakai

On Wed, Oct 17, 2007 at 12:05:47AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> 
> I think with this change, we parse extension headers twice.
> We really do not want to do this.

Good point.  I'll need to think of some other way to do this then.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size
  2007-10-16 14:38   ` [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size Patrick McHardy
@ 2007-10-16 15:17     ` Herbert Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2007-10-16 15:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netdev

On Tue, Oct 16, 2007 at 04:38:04PM +0200, Patrick McHardy wrote:
>
> That's true, but for the first case we actually have something in the
> stack doing that, which is NAT and routing by fwmark. Maybe netfilter
> should just send an ICMP error back, that would also solve the problem
> of silently dropped packets when rerouting to an unreachable
> destination.

Crap, NAT is now my bane :)

OK Dave, please scratch everything starting from patch 7.
The first 6 patches should be OK though, unless something
else comes up :)

Patrick, my plan to solve this is to move the POST_ROUTING
calls up one level.  So we'd call them from ip_forward_finish
and from where we currently call dst_output.

Let me play with this and see how it turns out.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Thread overview: 19+ messages
2007-10-16 14:18 [0/12] Trying to merge xfrm input path before I got side-tracked Herbert Xu
2007-10-16 14:33 ` [PATCH 1/12] [IPSEC]: Fix pure tunnel modes involving IPv6 Herbert Xu
2007-10-16 14:33 ` [PATCH 2/12] [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input Herbert Xu
2007-10-16 14:33 ` [PATCH 3/12] [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi Herbert Xu
2007-10-16 14:33 ` [PATCH 4/12] [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi Herbert Xu
2007-10-16 14:33 ` [PATCH 5/12] [IPSEC]: Fix length check in xfrm_parse_spi Herbert Xu
2007-10-16 14:33 ` [PATCH 6/12] [IPSEC]: Move type and mode map into xfrm_state.c Herbert Xu
2007-10-16 14:33 ` [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size Herbert Xu
2007-10-16 14:33 ` [PATCH 8/12] [IPSEC]: Store afinfo pointer in xfrm_mode Herbert Xu
2007-10-16 14:33 ` [PATCH 9/12] [IPSEC]: Use the top IPv4 route's peer instead of the bottom Herbert Xu
2007-10-16 14:33 ` [PATCH 10/12] [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP Herbert Xu
2007-10-16 14:33 ` [PATCH 11/12] [IPSEC]: Reinject packet instead of calling netfilter directly on input Herbert Xu
2007-10-16 15:05   ` YOSHIFUJI Hideaki / 吉藤英明
2007-10-16 15:12     ` Herbert Xu
2007-10-16 14:33 ` [PATCH 12/12] [NET]: Add netif_rerx_secpath Herbert Xu
     [not found] ` <E1IhnHJ-0003A4-00@gondolin.me.apana.org.au>
2007-10-16 14:38   ` [PATCH 7/12] [IPSEC]: Remove xfrmX_tunnel_check_size Patrick McHardy
2007-10-16 15:17     ` Herbert Xu
2007-10-16 14:39 ` [0/12] Trying to merge xfrm input path before I got side-tracked Patrick McHardy
2007-10-16 14:44   ` Herbert Xu
