[PATCH 00/13]: Netfilter IPsec support

netdev.vger.kernel.org archive mirror

* [PATCH 00/13]: Netfilter IPsec support
@ 2005-11-20 16:31 Patrick McHardy
  2005-11-20 16:31 ` [PATCH 01/13]: [NETFILTER]: Remove okfn usage in ip_vs_core.c Patrick McHardy
                   ` (14 more replies)
  0 siblings, 15 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

This is the latest netfilter/IPsec patchset. Its purpose is to make
IPsec look as much like a normal tunnel device to netfilter as possible
and to enable NAT support.

It consists of basically five parts:

- output hooks:

Currently on the output path netfilter sees the plain text packet in
LOCAL_OUT and FORWARD and the encapsulated packet in POST_ROUTING.
For connection tracking and NAT the plain text packets need to be
visible on POST_ROUTING and the encapsulated packets on LOCAL_OUT as
well. The patchset adds two new functions, ip_dst_output and
ip6_dst_output, that call the appropriate netfilter hooks for the
plain text packet before encapsulation and for the encapsulated
packets once for each tunnel mode transform.

- input hooks:

The input path is already mostly symmetrical to the output path with
the new output hooks, except for one case: if the innermost transform
uses transport mode, the decapsulated packets will not hit netfilter
again. New hooks are added to xfrm{4,6}_input to handle this case.
The hooks are only called if the _last_ transform is transport mode,
otherwise decapsulated transport mode packets are not visible to
netfilter.

- policy lookups after NAT:

When NAT changes a packet it already calls ip_route_me_harder, which
reroutes the packet and does a new policy lookup. It only looks at
the IP addresses, however; changing the port numbers requires a new
policy lookup as well. It also doesn't reroute in POST_ROUTING, since
the packet has already been routed. To behave more like a regular
tunnel device, a policy lookup is now also done after SNAT and the
packet is passed to dst_output again if the lookup yielded a new
policy.

- policy checks after NAT:

Policy checks look up the policy of the decapsulated packet and check
that all decapsulations match what has been specified by the policy.
If the packet has been NATed before the policy checks, the policy lookup
might return a different policy from what was actually used. To handle
this, a new function, nf_nat_decode_session, reconstructs a struct flowi
for the original packet, which is then used for policy lookups.

- policy match:

To allow matching on the policy or the decapsulations done on the input
path, a new match is added. It can be used to replace rules like
"-i ipsec0" or "-o ipsec0" which were commonly used with KLIPS, but can
also be used for more fine-grained filtering.
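
As a rough illustration of the output hooks part, here is a small
user-space model of the control flow in __ip_dst_output/ip_dst_output
from patch 05. It is only a sketch: all model_* names, the trace
letters and the struct layout are made up for the example, and it
assumes that an xfrm output function pops skb->dst to the next entry
before returning NET_XMIT_BYPASS and that the final route runs
POST_ROUTING itself, as ip_output does after patch 02.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* User-space model only: hypothetical names, not kernel APIs. */
enum { MODEL_TRANSPORT = 0, MODEL_TUNNEL = 1 };
enum { MODEL_OK = 0, MODEL_BYPASS = 4 };	/* value is arbitrary here */

struct model_skb {
	const int *modes;	/* transform modes, in the order they are applied */
	size_t n;		/* number of transforms in the bundle */
	size_t pos;		/* next transform; pos == n means plain route */
	char trace[64];		/* one character per event, NUL terminated */
};

/* Append one event: L = LOCAL_OUT, P = POST_ROUTING, T/t = tunnel or
 * transport mode transform applied, X = handed to the device. */
static void model_log(struct model_skb *skb, char event)
{
	size_t len = strlen(skb->trace);

	assert(len + 1 < sizeof(skb->trace));
	skb->trace[len] = event;
	skb->trace[len + 1] = '\0';
}

/* Stand-in for skb->dst->output(): a transform is applied and the caller
 * is asked to continue, or the final route runs POST_ROUTING and sends. */
static int model_output(struct model_skb *skb)
{
	if (skb->pos < skb->n) {
		model_log(skb, skb->modes[skb->pos] == MODEL_TUNNEL ? 'T' : 't');
		skb->pos++;
		return MODEL_BYPASS;
	}
	model_log(skb, 'P');
	model_log(skb, 'X');
	return MODEL_OK;
}

static int model_ip_dst_output(struct model_skb *skb);

/* Mirrors the loop in __ip_dst_output(): keep going while the next
 * transform is transport mode, then re-enter LOCAL_OUT. */
static int model_ip_dst_output_inner(struct model_skb *skb)
{
	int err;

	do {
		err = model_output(skb);
		if (err != MODEL_BYPASS)
			return err;
	} while (skb->pos < skb->n &&
		 skb->modes[skb->pos] == MODEL_TRANSPORT);

	model_log(skb, 'L');
	return model_ip_dst_output(skb);
}

/* Mirrors ip_dst_output(): POST_ROUTING only if transforms remain. */
static int model_ip_dst_output(struct model_skb *skb)
{
	if (skb->pos < skb->n) {
		model_log(skb, 'P');
		return model_ip_dst_output_inner(skb);
	}
	return model_output(skb);	/* plain dst_output() path */
}

/* Locally generated packet: LOCAL_OUT first, then the output path. */
static int model_local_out(struct model_skb *skb)
{
	model_log(skb, 'L');
	return model_ip_dst_output(skb);
}
```

For a bundle with a single tunnel mode transform, calling
model_local_out() leaves the trace "LPTLPX": the plain text packet
passes LOCAL_OUT and POST_ROUTING, is encapsulated, and the
encapsulated packet passes LOCAL_OUT and POST_ROUTING again before it
is sent.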

Changes since the last post:

- updated to apply to latest kernel
- the deferred fragmentation patch has been replaced by a new patch
  which moves the POST_ROUTING hook before fragmentation
- added missing EXPORT_SYMBOL(xfrm_decode_session) for IPv6
- moving nf_reset from ip_local_deliver_finish to the upper protocols
  has been split into a separate patch, unnecessary nf_reset's on
  paths where the packet is dropped have been removed and a missing
  nf_reset before icmp_send in ip_local_deliver_finish has been added.
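
The effect of moving the POST_ROUTING hook before fragmentation,
mentioned in the list above, can be illustrated with a small user-space
model. This is a sketch under simplifying assumptions, not the kernel
code: the model_* names are invented for the example, fragments are
plain MTU-sized chunks without per-fragment headers, and the TSO/UFO
checks from ip_finish_output are left out.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: hypothetical names, not kernel APIs. */
struct model_stats {
	int hook_calls;		/* how often the POST_ROUTING hook ran */
	size_t hook_len;	/* length seen by the most recent hook call */
	int frames_sent;	/* frames handed to the device */
};

static void model_post_routing_hook(struct model_stats *st, size_t len)
{
	st->hook_calls++;
	st->hook_len = len;
}

/* Send one frame, optionally passing it through the hook first. */
static void model_send(struct model_stats *st, size_t len, int run_hook)
{
	if (run_hook)
		model_post_routing_hook(st, len);
	st->frames_sent++;
}

/* Split the packet into chunks of at most mtu bytes and send each one. */
static void model_fragment(struct model_stats *st, size_t len, size_t mtu,
			   int hook_per_fragment)
{
	while (len > 0) {
		size_t chunk = len < mtu ? len : mtu;

		model_send(st, chunk, hook_per_fragment);
		len -= chunk;
	}
}

/* Old ordering: fragment first, so the hook sees every fragment. */
static void model_ip_output_old(struct model_stats *st, size_t len, size_t mtu)
{
	assert(mtu > 0);
	if (len > mtu)
		model_fragment(st, len, mtu, 1);
	else
		model_send(st, len, 1);
}

/* New ordering: the hook sees the entire packet once, then it is split. */
static void model_ip_output_new(struct model_stats *st, size_t len, size_t mtu)
{
	assert(mtu > 0);
	model_post_routing_hook(st, len);
	if (len > mtu)
		model_fragment(st, len, mtu, 0);
	else
		model_send(st, len, 0);
}
```

With a 3000 byte packet and an MTU of 1500, the old ordering in this
model runs the hook twice on 1500 byte pieces, while the new ordering
runs it once on the full 3000 bytes; both send two frames.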

The patches are now in a state in which I think they could be merged in
the net-2.6.16 tree. Unfortunately cloning the tree fails for me, so
they are still based on Linus's tree, but I don't think there are any
changes in net-2.6.16 yet which conflict.

The patches are also available in a git-tree:

http://people.netfilter.org/kaber/nf-2.6-xfrm.git/

[NETFILTER]: Remove okfn usage in ip_vs_core.c
[NETFILTER]: Call POST_ROUTING hook before fragmentation
[IPV4]: Replace dst_output by ip_dst_output
[IPV6]: Replace dst_output by ip6_dst_output
[IPV4/6]: Netfilter IPsec output hooks
[IPV4/6]: Netfilter IPsec input hooks
[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
[NETFILTER]: Use conntrack information to determine if packet was NATed
[NETFILTER]: Redo policy lookups after NAT when neccessary
[NETFILTER]: Keep the conntrack reference until after policy checks
[NETFILTER]: Handle NAT in IPsec policy checks
[NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack
[NETFILTER]: Add ipt_policy/ip6t_policy matches

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/13]: [NETFILTER]: Remove okfn usage in ip_vs_core.c
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 02/13]: [NETFILTER]: Call POST_ROUTING hook before fragmentation Patrick McHardy
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Remove okfn usage in ip_vs_core.c

okfn should only be used from different contexts to avoid deep call stacks,
i.e. by nf_queue.

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit ebb0baec0a5e909d4acf16a15601f013093fefb3
tree da877af1ad83666e8ccd54dd94da485444e6b37b
parent b286e39207237e2f6929959372bf66d9a8d05a82
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:48:33 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:48:33 +0100

 net/ipv4/ipvs/ip_vs_core.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 1a0843c..b63bb28 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -532,11 +532,8 @@ static unsigned int ip_vs_post_routing(u
 {
 	if (!((*pskb)->ipvs_property))
 		return NF_ACCEPT;
-
 	/* The packet was sent from IPVS, exit this chain */
-	(*okfn)(*pskb);
-
-	return NF_STOLEN;
+	return NF_STOP;
 }
 
 u16 ip_vs_checksum_complete(struct sk_buff *skb, int offset)

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/13]: [NETFILTER]: Call POST_ROUTING hook before fragmentation
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
  2005-11-20 16:31 ` [PATCH 01/13]: [NETFILTER]: Remove okfn usage in ip_vs_core.c Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 03/13]: [IPV4]: Replace dst_output by ip_dst_output Patrick McHardy
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Call POST_ROUTING hook before fragmentation

Call POST_ROUTING hook before fragmentation to get rid of the okfn use
in ip_refrag and save the useless fragmentation/defragmentation step
when NAT is used.

The patch introduces one user-visible change, the POSTROUTING chain
in the mangle table gets entire packets, not fragments, which should
simplify use of the MARK and CLASSIFY targets for queueing as a nice
side-effect.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit d3c70d774e32c4d6f4cc6b8b0b73678aa14a9932
tree 16b82a0bedbb29b6a709140a7047ce2d52bb776f
parent ebb0baec0a5e909d4acf16a15601f013093fefb3
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:11 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:11 +0100

 include/net/ip.h                               |    1 -
 net/ipv4/ip_output.c                           |   30 +++++++++++-------------
 net/ipv4/netfilter/ip_conntrack_standalone.c   |   26 +--------------------
 net/ipv4/netfilter/ip_nat_standalone.c         |   17 --------------
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   26 +--------------------
 5 files changed, 16 insertions(+), 84 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index e4563bb..9f09882 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -310,7 +310,6 @@ enum ip_defrag_users
 	IP_DEFRAG_CALL_RA_CHAIN,
 	IP_DEFRAG_CONNTRACK_IN,
 	IP_DEFRAG_CONNTRACK_OUT,
-	IP_DEFRAG_NAT_OUT,
 	IP_DEFRAG_VS_IN,
 	IP_DEFRAG_VS_OUT,
 	IP_DEFRAG_VS_FWD
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 11c2f68..946e812 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -202,13 +202,11 @@ static inline int ip_finish_output2(stru
 
 static inline int ip_finish_output(struct sk_buff *skb)
 {
-	struct net_device *dev = skb->dst->dev;
-
-	skb->dev = dev;
-	skb->protocol = htons(ETH_P_IP);
-
-	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, dev,
-		       ip_finish_output2);
+	if (skb->len > dst_mtu(skb->dst) &&
+	    !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
+		return ip_fragment(skb, ip_finish_output2);
+	else
+		return ip_finish_output2(skb);
 }
 
 int ip_mc_output(struct sk_buff *skb)
@@ -265,21 +263,21 @@ int ip_mc_output(struct sk_buff *skb)
 				newskb->dev, ip_dev_loopback_xmit);
 	}
 
-	if (skb->len > dst_mtu(&rt->u.dst))
-		return ip_fragment(skb, ip_finish_output);
-	else
-		return ip_finish_output(skb);
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dev,
+		       ip_finish_output);
 }
 
 int ip_output(struct sk_buff *skb)
 {
+	struct net_device *dev = skb->dst->dev;
+
 	IP_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
 
-	if (skb->len > dst_mtu(skb->dst) &&
-		!(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
-		return ip_fragment(skb, ip_finish_output);
-	else
-		return ip_finish_output(skb);
+	skb->dev = dev;
+	skb->protocol = htons(ETH_P_IP);
+
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, dev,
+		       ip_finish_output);
 }
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
diff --git a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
index dd476b1..13ed18a 100644
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c
@@ -450,30 +450,6 @@ static unsigned int ip_conntrack_defrag(
 	return NF_ACCEPT;
 }
 
-static unsigned int ip_refrag(unsigned int hooknum,
-			      struct sk_buff **pskb,
-			      const struct net_device *in,
-			      const struct net_device *out,
-			      int (*okfn)(struct sk_buff *))
-{
-	struct rtable *rt = (struct rtable *)(*pskb)->dst;
-
-	/* We've seen it coming out the other side: confirm */
-	if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
-		return NF_DROP;
-
-	/* Local packets are never produced too large for their
-	   interface.  We degfragment them at LOCAL_OUT, however,
-	   so we have to refragment them here. */
-	if ((*pskb)->len > dst_mtu(&rt->u.dst) &&
-	    !skb_shinfo(*pskb)->tso_size) {
-		/* No hook can be after us, so this should be OK. */
-		ip_fragment(*pskb, okfn);
-		return NF_STOLEN;
-	}
-	return NF_ACCEPT;
-}
-
 static unsigned int ip_conntrack_local(unsigned int hooknum,
 				       struct sk_buff **pskb,
 				       const struct net_device *in,
@@ -543,7 +519,7 @@ static struct nf_hook_ops ip_conntrack_h
 
 /* Refragmenter; last chance. */
 static struct nf_hook_ops ip_conntrack_out_ops = {
-	.hook		= ip_refrag,
+	.hook		= ip_confirm,
 	.owner		= THIS_MODULE,
 	.pf		= PF_INET,
 	.hooknum	= NF_IP_POST_ROUTING,
diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
index 30cd4e1..f04111f 100644
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -190,23 +190,6 @@ ip_nat_out(unsigned int hooknum,
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	/* We can hit fragment here; forwarded packets get
-	   defragmented by connection tracking coming in, then
-	   fragmented (grr) by the forward code.
-
-	   In future: If we have nfct != NULL, AND we have NAT
-	   initialized, AND there is no helper, then we can do full
-	   NAPT on the head, and IP-address-only NAT on the rest.
-
-	   I'm starting to have nightmares about fragments.  */
-
-	if ((*pskb)->nh.iph->frag_off & htons(IP_MF|IP_OFFSET)) {
-		*pskb = ip_ct_gather_frags(*pskb, IP_DEFRAG_NAT_OUT);
-
-		if (!*pskb)
-			return NF_STOLEN;
-	}
-
 	return ip_nat_fn(hooknum, pskb, in, out, okfn);
 }
 
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 8202c1c..818ba72 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -180,30 +180,6 @@ static unsigned int ipv4_conntrack_defra
 	return NF_ACCEPT;
 }
 
-static unsigned int ipv4_refrag(unsigned int hooknum,
-				struct sk_buff **pskb,
-				const struct net_device *in,
-				const struct net_device *out,
-				int (*okfn)(struct sk_buff *))
-{
-	struct rtable *rt = (struct rtable *)(*pskb)->dst;
-
-	/* We've seen it coming out the other side: confirm */
-	if (ipv4_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
-		return NF_DROP;
-
-	/* Local packets are never produced too large for their
-	   interface.  We degfragment them at LOCAL_OUT, however,
-	   so we have to refragment them here. */
-	if ((*pskb)->len > dst_mtu(&rt->u.dst) &&
-	    !skb_shinfo(*pskb)->tso_size) {
-		/* No hook can be after us, so this should be OK. */
-		ip_fragment(*pskb, okfn);
-		return NF_STOLEN;
-	}
-	return NF_ACCEPT;
-}
-
 static unsigned int ipv4_conntrack_in(unsigned int hooknum,
 				      struct sk_buff **pskb,
 				      const struct net_device *in,
@@ -283,7 +259,7 @@ static struct nf_hook_ops ipv4_conntrack
 
 /* Refragmenter; last chance. */
 static struct nf_hook_ops ipv4_conntrack_out_ops = {
-	.hook		= ipv4_refrag,
+	.hook		= ipv4_confirm,
 	.owner		= THIS_MODULE,
 	.pf		= PF_INET,
 	.hooknum	= NF_IP_POST_ROUTING,

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/13]: [IPV4]: Replace dst_output by ip_dst_output
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
  2005-11-20 16:31 ` [PATCH 01/13]: [NETFILTER]: Remove okfn usage in ip_vs_core.c Patrick McHardy
  2005-11-20 16:31 ` [PATCH 02/13]: [NETFILTER]: Call POST_ROUTING hook before fragmentation Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 04/13]: [IPV6]: Replace dst_output by ip6_dst_output Patrick McHardy
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[IPV4]: Replace dst_output by ip_dst_output

Preparation for netfilter IPsec support.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 4eb320a6444a9035da8a83e4886b3691a2ea98f7
tree d31f7b331e06e1e598593c4095be7713e6fd3ba0
parent d3c70d774e32c4d6f4cc6b8b0b73678aa14a9932
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:28 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:28 +0100

 include/net/dst.h               |    2 ++
 include/net/ipip.h              |    2 +-
 net/ipv4/igmp.c                 |    4 ++--
 net/ipv4/ip_forward.c           |    2 +-
 net/ipv4/ip_output.c            |    6 +++---
 net/ipv4/ipmr.c                 |    2 +-
 net/ipv4/ipvs/ip_vs_xmit.c      |    2 +-
 net/ipv4/netfilter/ipt_REJECT.c |    2 +-
 net/ipv4/raw.c                  |    2 +-
 9 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 6c196a5..07f552b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -236,6 +236,8 @@ static inline int dst_output(struct sk_b
 	}
 }
 
+#define ip_dst_output	dst_output
+
 /* Input packet from network to transport.  */
 static inline int dst_input(struct sk_buff *skb)
 {
diff --git a/include/net/ipip.h b/include/net/ipip.h
index f490c3c..b267496 100644
--- a/include/net/ipip.h
+++ b/include/net/ipip.h
@@ -34,7 +34,7 @@ struct ip_tunnel
 	ip_select_ident(iph, &rt->u.dst, NULL);				\
 	ip_send_check(iph);						\
 									\
-	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, dst_output);\
+	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, ip_dst_output);\
 	if (err == NET_XMIT_SUCCESS || err == NET_XMIT_CN) {		\
 		stats->tx_bytes += pkt_len;				\
 		stats->tx_packets++;					\
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index c04607b..55779dc 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -343,7 +343,7 @@ static int igmpv3_sendpack(struct sk_buf
 	pig->csum = ip_compute_csum((void *)skb->h.igmph, igmplen);
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 static int grec_size(struct ip_mc_list *pmc, int type, int gdel, int sdel)
@@ -674,7 +674,7 @@ static int igmp_send_report(struct in_de
 	ih->csum=ip_compute_csum((void *)ih, sizeof(struct igmphdr));
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 static void igmp_gq_timer_expire(unsigned long data)
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 0923add..486355d 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -51,7 +51,7 @@ static inline int ip_forward_finish(stru
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
-	return dst_output(skb);
+	return ip_dst_output(skb);
 }
 
 int ip_forward(struct sk_buff *skb)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 946e812..2c91f03 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -155,7 +155,7 @@ int ip_build_and_send_pkt(struct sk_buff
 
 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 EXPORT_SYMBOL_GPL(ip_build_and_send_pkt);
@@ -360,7 +360,7 @@ packet_routed:
 	skb->priority = sk->sk_priority;
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 
 no_route:
 	IP_INC_STATS(IPSTATS_MIB_OUTNOROUTES);
@@ -1251,7 +1251,7 @@ int ip_push_pending_frames(struct sock *
 
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, 
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 302b7eb..40af27f 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1125,7 +1125,7 @@ static inline int ipmr_forward_finish(st
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
-	return dst_output(skb);
+	return ip_dst_output(skb);
 }
 
 /*
diff --git a/net/ipv4/ipvs/ip_vs_xmit.c b/net/ipv4/ipvs/ip_vs_xmit.c
index 3b87482..b66c3da 100644
--- a/net/ipv4/ipvs/ip_vs_xmit.c
+++ b/net/ipv4/ipvs/ip_vs_xmit.c
@@ -130,7 +130,7 @@ do {							\
 	(skb)->ipvs_property = 1;			\
 	(skb)->ip_summed = CHECKSUM_NONE;		\
 	NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, (skb), NULL,	\
-		(rt)->u.dst.dev, dst_output);		\
+		(rt)->u.dst.dev, ip_dst_output);	\
 } while (0)
 
 
diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
index f057025..55b601b 100644
--- a/net/ipv4/netfilter/ipt_REJECT.c
+++ b/net/ipv4/netfilter/ipt_REJECT.c
@@ -220,7 +220,7 @@ static void send_reset(struct sk_buff *o
 	nf_ct_attach(nskb, oldskb);
 
 	NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, nskb, NULL, nskb->dst->dev,
-		dst_output);
+		ip_dst_output);
 	return;
 
  free_nskb:
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4b0d7e4..421538a 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -313,7 +313,7 @@ static int raw_send_hdrinc(struct sock *
 	}
 
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		      dst_output);
+		      ip_dst_output);
 	if (err > 0)
 		err = inet->recverr ? net_xmit_errno(err) : 0;
 	if (err)

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/13]: [IPV6]: Replace dst_output by ip6_dst_output
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (2 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 03/13]: [IPV4]: Replace dst_output by ip_dst_output Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks Patrick McHardy
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[IPV6]: Replace dst_output by ip6_dst_output

Preparation for netfilter IPsec support.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 73f59ffcebcd0a08f6a405c8522074e8b5892b73
tree 4be1e3bb174f611fa57ee6e1b8d9187e784c85ad
parent 4eb320a6444a9035da8a83e4886b3691a2ea98f7
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:40 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:40 +0100

 include/net/dst.h                |    1 +
 net/ipv6/ip6_input.c             |    4 ++--
 net/ipv6/ip6_output.c            |    7 ++++---
 net/ipv6/ip6_tunnel.c            |    2 +-
 net/ipv6/ndisc.c                 |    8 ++++----
 net/ipv6/netfilter/ip6t_REJECT.c |    2 +-
 net/ipv6/raw.c                   |    2 +-
 7 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 07f552b..4886f25 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -237,6 +237,7 @@ static inline int dst_output(struct sk_b
 }
 
 #define ip_dst_output	dst_output
+#define ip6_dst_output	dst_output
 
 /* Input packet from network to transport.  */
 static inline int dst_input(struct sk_buff *skb)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index a6026d2..33d3b0e 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -255,9 +255,9 @@ int ip6_mc_input(struct sk_buff *skb)
 			
 			if (deliver) {
 				skb2 = skb_clone(skb, GFP_ATOMIC);
-				dst_output(skb2);
+				ip6_dst_output(skb2);
 			} else {
-				dst_output(skb);
+				ip6_dst_output(skb);
 				return 0;
 			}
 		}
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c1fa693..c270798 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -230,7 +230,7 @@ int ip6_xmit(struct sock *sk, struct sk_
 	if ((skb->len <= mtu) || ipfragok) {
 		IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
 		return NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev,
-				dst_output);
+				ip6_dst_output);
 	}
 
 	if (net_ratelimit())
@@ -308,7 +308,7 @@ static int ip6_call_ra_chain(struct sk_b
 
 static inline int ip6_forward_finish(struct sk_buff *skb)
 {
-	return dst_output(skb);
+	return ip6_dst_output(skb);
 }
 
 int ip6_forward(struct sk_buff *skb)
@@ -1181,7 +1181,8 @@ int ip6_push_pending_frames(struct sock 
 
 	skb->dst = dst_clone(&rt->u.dst);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);	
-	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, skb->dst->dev, dst_output);
+	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, skb->dst->dev,
+	              ip6_dst_output);
 	if (err) {
 		if (err > 0)
 			err = np->recverr ? net_xmit_errno(err) : 0;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index e315d0f..7c01eaf 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -746,7 +746,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, str
 	nf_reset(skb);
 	pkt_len = skb->len;
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, 
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip6_dst_output);
 
 	if (err == NET_XMIT_SUCCESS || err == NET_XMIT_CN) {
 		stats->tx_bytes += pkt_len;
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 305d9ee..287170f 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -499,7 +499,7 @@ static void ndisc_send_na(struct net_dev
 	skb->dst = dst;
 	idev = in6_dev_get(dst->dev);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
-	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, dst_output);
+	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, ip6_dst_output);
 	if (!err) {
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTNEIGHBORADVERTISEMENTS);
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTMSGS);
@@ -582,7 +582,7 @@ void ndisc_send_ns(struct net_device *de
 	skb->dst = dst;
 	idev = in6_dev_get(dst->dev);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
-	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, dst_output);
+	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, ip6_dst_output);
 	if (!err) {
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTNEIGHBORSOLICITS);
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTMSGS);
@@ -654,7 +654,7 @@ void ndisc_send_rs(struct net_device *de
 	skb->dst = dst;
 	idev = in6_dev_get(dst->dev);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);	
-	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, dst_output);
+	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev, ip6_dst_output);
 	if (!err) {
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTROUTERSOLICITS);
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTMSGS);
@@ -1438,7 +1438,7 @@ void ndisc_send_redirect(struct sk_buff 
 	buff->dst = dst;
 	idev = in6_dev_get(dst->dev);
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);
-	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, buff, NULL, dst->dev, dst_output);
+	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, buff, NULL, dst->dev, ip6_dst_output);
 	if (!err) {
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTREDIRECTS);
 		ICMP6_INC_STATS(idev, ICMP6_MIB_OUTMSGS);
diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c
index b03e87a..8579eaa 100644
--- a/net/ipv6/netfilter/ip6t_REJECT.c
+++ b/net/ipv6/netfilter/ip6t_REJECT.c
@@ -161,7 +161,7 @@ static void send_reset(struct sk_buff *o
 						   sizeof(struct tcphdr), 0));
 
 	NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, nskb, NULL, nskb->dst->dev,
-		dst_output);
+		ip6_dst_output);
 }
 
 static inline void
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8e9628f..e7fd031 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -571,7 +571,7 @@ static int rawv6_send_hdrinc(struct sock
 
 	IP6_INC_STATS(IPSTATS_MIB_OUTREQUESTS);		
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		      dst_output);
+		      ip6_dst_output);
 	if (err > 0)
 		err = np->recverr ? net_xmit_errno(err) : 0;
 	if (err)

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (3 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 04/13]: [IPV6]: Replace dst_output by ip6_dst_output Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-22  4:40   ` Herbert Xu
  2005-11-20 16:31 ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks Patrick McHardy
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[IPV4/6]: Netfilter IPsec output hooks

Add alternative ip_dst_output/ip6_dst_output functions to call netfilter
hooks between xfrm transforms. Packets visit the FORWARD/LOCAL_OUT and
POST_ROUTING hook before encapsulation and the LOCAL_OUT and POST_ROUTING
hook after each tunnel mode transform.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit b847425c693f43a63137d18e36e5c8cf0187c175
tree a811e8c573150bc279a9df53958270f25cb531bc
parent 73f59ffcebcd0a08f6a405c8522074e8b5892b73
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:58 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:49:58 +0100

 include/net/dst.h       |    5 +++++
 net/ipv4/netfilter.c    |   31 ++++++++++++++++++++++++++++++-
 net/ipv4/xfrm4_output.c |    1 +
 net/ipv6/netfilter.c    |   29 +++++++++++++++++++++++++++++
 net/ipv6/xfrm6_output.c |    1 +
 5 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 4886f25..7eadd0c 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -236,8 +236,13 @@ static inline int dst_output(struct sk_b
 	}
 }
 
+#if defined(CONFIG_XFRM) && defined(CONFIG_NETFILTER)
+extern int ip_dst_output(struct sk_buff *skb);
+extern int ip6_dst_output(struct sk_buff *skb);
+#else
 #define ip_dst_output	dst_output
 #define ip6_dst_output	dst_output
+#endif
 
 /* Input packet from network to transport.  */
 static inline int dst_input(struct sk_buff *skb)
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index ae0779d..b93e7cd 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -10,8 +10,9 @@
 #include <linux/tcp.h>
 #include <linux/udp.h>
 #include <linux/icmp.h>
-#include <net/route.h>
 #include <linux/ip.h>
+#include <net/route.h>
+#include <net/xfrm.h>
 
 /* route_me_harder function, used by iptable_nat, iptable_mangle + ip_queue */
 int ip_route_me_harder(struct sk_buff **pskb)
@@ -78,6 +79,34 @@ int ip_route_me_harder(struct sk_buff **
 }
 EXPORT_SYMBOL(ip_route_me_harder);
 
+#ifdef CONFIG_XFRM
+static inline int __ip_dst_output(struct sk_buff *skb)
+{
+	int err;
+
+	do {
+		err = skb->dst->output(skb);
+
+		if (likely(err == 0))
+			return err;
+		if (unlikely(err != NET_XMIT_BYPASS))
+			return err;
+	} while (skb->dst->xfrm && !skb->dst->xfrm->props.mode);
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dst->dev,
+	               ip_dst_output);
+}
+
+int ip_dst_output(struct sk_buff *skb)
+{
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+		               skb->dst->dev, __ip_dst_output);
+	return dst_output(skb);
+}
+EXPORT_SYMBOL(ip_dst_output);
+#endif /* CONFIG_XFRM */
+
 /*
  * Extra routing may needed on local out, as the QUEUE target never
  * returns control to the table.
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 66620a9..c135746 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -133,6 +133,7 @@ int xfrm4_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	nf_reset(skb);
 	err = NET_XMIT_BYPASS;
 
 out_exit:
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index f8626eb..06b275e 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -10,6 +10,7 @@
 #include <net/dst.h>
 #include <net/ipv6.h>
 #include <net/ip6_route.h>
+#include <net/xfrm.h>
 
 int ip6_route_me_harder(struct sk_buff *skb)
 {
@@ -41,6 +42,34 @@ int ip6_route_me_harder(struct sk_buff *
 }
 EXPORT_SYMBOL(ip6_route_me_harder);
 
+#ifdef CONFIG_XFRM
+static inline int __ip6_dst_output(struct sk_buff *skb)
+{
+	int err;
+	
+	do {
+		err = skb->dst->output(skb);
+
+		if (likely(err == 0))
+			return err;
+		if (unlikely(err != NET_XMIT_BYPASS))
+			return err;
+	} while (skb->dst->xfrm && !skb->dst->xfrm->props.mode);
+
+	return NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, skb->dst->dev,
+	               ip6_dst_output);
+}
+
+int ip6_dst_output(struct sk_buff *skb)
+{
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL,
+		               skb->dst->dev, __ip6_dst_output);
+	return dst_output(skb);
+}
+EXPORT_SYMBOL(ip6_dst_output);
+#endif /* CONFIG_XFRM */
+
 /*
  * Extra routing may needed on local out, as the QUEUE target never
  * returns control to the table.
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 6b98677..a566d25 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -132,6 +132,7 @@ int xfrm6_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	nf_reset(skb);
 	err = NET_XMIT_BYPASS;
 
 out_exit:

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-20 16:31 ` [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks Patrick McHardy
@ 2005-11-22  4:40   ` Herbert Xu
  2005-11-22  4:53     ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-22  4:40 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Sun, Nov 20, 2005 at 04:31:34PM +0000, Patrick McHardy wrote:
>
> diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
> index ae0779d..b93e7cd 100644
> --- a/net/ipv4/netfilter.c
> +++ b/net/ipv4/netfilter.c
> @@ -78,6 +79,34 @@ int ip_route_me_harder(struct sk_buff **
>  }
>  EXPORT_SYMBOL(ip_route_me_harder);
>  
> +#ifdef CONFIG_XFRM
> +static inline int __ip_dst_output(struct sk_buff *skb)

I'd like to suggest an alternative way of doing this that

1) Keeps this XFRM stuff in xfrm*.c.
2) Removes the need for ip_dst_output.

Please see the attached patch.

> +	do {
> +		err = skb->dst->output(skb);
> +
> +		if (likely(err == 0))
> +			return err;
> +		if (unlikely(err != NET_XMIT_BYPASS))
> +			return err;
> +	} while (skb->dst->xfrm && !skb->dst->xfrm->props.mode);
> +
> +	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dst->dev,
> +	               ip_dst_output);

The idea is simply to put this stuff in xfrm[46]_output directly.
So for your patch you would simply need to add the two NF_HOOK
calls at the beginning and end of xfrm[46]_output once they've
been modified in the way I outline below.

> diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
> index 66620a9..c135746 100644
> --- a/net/ipv4/xfrm4_output.c
> +++ b/net/ipv4/xfrm4_output.c
> @@ -133,6 +133,7 @@ int xfrm4_output(struct sk_buff *skb)
>  		err = -EHOSTUNREACH;
>  		goto error_nolock;
>  	}
> +	nf_reset(skb);
>  	err = NET_XMIT_BYPASS;

Shouldn't this sit after POST_ROUTING, i.e., once the connection
has been confirmed?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/ip.h b/include/net/ip.h
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -113,26 +113,31 @@ int xfrm4_output(struct sk_buff *skb)
 			goto error_nolock;
 	}
 
-	spin_lock_bh(&x->lock);
-	err = xfrm_state_check(x, skb);
-	if (err)
-		goto error;
-
-	xfrm4_encap(skb);
-
-	err = x->type->output(x, skb);
-	if (err)
-		goto error;
+	do {
+		spin_lock_bh(&x->lock);
+		err = xfrm_state_check(x, skb);
+		if (err)
+			goto error;
+
+		xfrm4_encap(skb);
+
+		err = x->type->output(x, skb);
+		if (err)
+			goto error;
 
-	x->curlft.bytes += skb->len;
-	x->curlft.packets++;
+		x->curlft.bytes += skb->len;
+		x->curlft.packets++;
 
-	spin_unlock_bh(&x->lock);
+		spin_unlock_bh(&x->lock);
 	
-	if (!(skb->dst = dst_pop(dst))) {
-		err = -EHOSTUNREACH;
-		goto error_nolock;
-	}
+		if (!(skb->dst = dst_pop(dst))) {
+			err = -EHOSTUNREACH;
+			goto error_nolock;
+		}
+		dst = skb->dst;
+		x = dst->xfrm;
+	} while (x && !x->props.mode);
+
 	err = NET_XMIT_BYPASS;
 
 out_exit:
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -110,28 +110,33 @@ int xfrm6_output(struct sk_buff *skb)
 			goto error_nolock;
 	}
 
-	spin_lock_bh(&x->lock);
-	err = xfrm_state_check(x, skb);
-	if (err)
-		goto error;
-
-	xfrm6_encap(skb);
-
-	err = x->type->output(x, skb);
-	if (err)
-		goto error;
-
-	x->curlft.bytes += skb->len;
-	x->curlft.packets++;
-
-	spin_unlock_bh(&x->lock);
+	do {
+		spin_lock_bh(&x->lock);
+		err = xfrm_state_check(x, skb);
+		if (err)
+			goto error;
+
+		xfrm6_encap(skb);
+
+		err = x->type->output(x, skb);
+		if (err)
+			goto error;
+
+		x->curlft.bytes += skb->len;
+		x->curlft.packets++;
+
+		spin_unlock_bh(&x->lock);
+
+		skb->nh.raw = skb->data;
+		
+		if (!(skb->dst = dst_pop(dst))) {
+			err = -EHOSTUNREACH;
+			goto error_nolock;
+		}
+		dst = skb->dst;
+		x = dst->xfrm;
+	} while (x && !x->props.mode);
 
-	skb->nh.raw = skb->data;
-	
-	if (!(skb->dst = dst_pop(dst))) {
-		err = -EHOSTUNREACH;
-		goto error_nolock;
-	}
 	err = NET_XMIT_BYPASS;
 
 out_exit:

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22  4:40   ` Herbert Xu
@ 2005-11-22  4:53     ` Patrick McHardy
  2005-11-22  5:13       ` Patrick McHardy
  2005-11-22 10:30       ` Herbert Xu
  0 siblings, 2 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-22  4:53 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
> On Sun, Nov 20, 2005 at 04:31:34PM +0000, Patrick McHardy wrote:
> 
>>diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
>>index ae0779d..b93e7cd 100644
>>--- a/net/ipv4/netfilter.c
>>+++ b/net/ipv4/netfilter.c
>>@@ -78,6 +79,34 @@ int ip_route_me_harder(struct sk_buff **
>> }
>> EXPORT_SYMBOL(ip_route_me_harder);
>> 
>>+#ifdef CONFIG_XFRM
>>+static inline int __ip_dst_output(struct sk_buff *skb)
> 
> 
> I'd like to suggest an alternative way of doing this that
> 
> 1) Keeps this XFRM stuff in xfrm*.c.
> 2) Removes the need for ip_dst_output.
> 
> Please see the attached patch.
> 
> 
>>+	do {
>>+		err = skb->dst->output(skb);
>>+
>>+		if (likely(err == 0))
>>+			return err;
>>+		if (unlikely(err != NET_XMIT_BYPASS))
>>+			return err;
>>+	} while (skb->dst->xfrm && !skb->dst->xfrm->props.mode);
>>+
>>+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dst->dev,
>>+	               ip_dst_output);
> 
> 
> The idea is simply to put this stuff in xfrm[46]_output directly.
> So for your patch you would simply need to add the two NF_HOOK
> calls at the beginning and end of xfrm[46]_output once they've
> been modified in the way I outline below.

This looks nice, but placing the hooks at the end of the xfrm[46]
functions doesn't work with queueing without recursively calling
dst_output (as okfn) since we have to provide an okfn but also
have to return ownership of the skb back to dst_output.

>>diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
>>index 66620a9..c135746 100644
>>--- a/net/ipv4/xfrm4_output.c
>>+++ b/net/ipv4/xfrm4_output.c
>>@@ -133,6 +133,7 @@ int xfrm4_output(struct sk_buff *skb)
>> 		err = -EHOSTUNREACH;
>> 		goto error_nolock;
>> 	}
>>+	nf_reset(skb);
>> 	err = NET_XMIT_BYPASS;
> 
> 
> Shouldn't this sit after POST_ROUTING, i.e., once the connection
> has been confirmed?

This is after POST_ROUTING :) POST_ROUTING is called before the
transforms and LOCAL_OUT afterwards. But it could be moved to the
ip/ip6_dst_output functions to avoid unnecessarily trying to reset
the skb for transport mode transforms.

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22  4:53     ` Patrick McHardy
@ 2005-11-22  5:13       ` Patrick McHardy
  2005-11-22 10:30       ` Herbert Xu
  1 sibling, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-22  5:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Patrick McHardy wrote:
> Herbert Xu wrote:
> 
>> On Sun, Nov 20, 2005 at 04:31:34PM +0000, Patrick McHardy wrote:
>>
>>> diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
>>> index ae0779d..b93e7cd 100644
>>> --- a/net/ipv4/netfilter.c
>>> +++ b/net/ipv4/netfilter.c
>>> @@ -78,6 +79,34 @@ int ip_route_me_harder(struct sk_buff **
>>> }
>>> EXPORT_SYMBOL(ip_route_me_harder);
>>>
>>> +#ifdef CONFIG_XFRM
>>> +static inline int __ip_dst_output(struct sk_buff *skb)
>>
>>
>>
>> I'd like to suggest an alternative way of doing this that
>>
>> 1) Keeps this XFRM stuff in xfrm*.c.
>> 2) Removes the need for ip_dst_output.
>>
>> Please see the attached patch.
>>
>>
>>> +    do {
>>> +        err = skb->dst->output(skb);
>>> +
>>> +        if (likely(err == 0))
>>> +            return err;
>>> +        if (unlikely(err != NET_XMIT_BYPASS))
>>> +            return err;
>>> +    } while (skb->dst->xfrm && !skb->dst->xfrm->props.mode);
>>> +
>>> +    return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, skb->dst->dev,
>>> +                   ip_dst_output);
>>
>>
>>
>> The idea is simply to put this stuff in xfrm[46]_output directly.
>> So for your patch you would simply need to add the two NF_HOOK
>> calls at the beginning and end of xfrm[46]_output once they've
>> been modified in the way I outline below.
> 
> 
> This looks nice, but placing the hooks at the end of the xfrm[46]
> functions doesn't work with queueing without recursively calling
> dst_output (as okfn) since we have to provide an okfn but also
> have to return ownership of the skb back to dst_output.

I should add that the same applies to ip_dst_output/__ip_dst_output of
course, which is why they do call themselves recursively. But since
__ip_dst_output is an inline function and is not called through the
function pointer except from a different context (ip_queue), the
compiler does a good job at eliminating the recursion for the
inlined version. This probably wouldn't work if we used a recursive
dst_output call in xfrm[46]_output. But your patch looks like a nice
idea anyway, so how about we move the looping to xfrm[46]_output and
keep ip/ip6_dst_output for the hooks?

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22  4:53     ` Patrick McHardy
  2005-11-22  5:13       ` Patrick McHardy
@ 2005-11-22 10:30       ` Herbert Xu
  2005-11-22 10:31         ` Herbert Xu
  1 sibling, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-22 10:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Tue, Nov 22, 2005 at 05:53:35AM +0100, Patrick McHardy wrote:
> 
> This looks nice, but placing the hooks at the end of the xfrm[46]
> functions doesn't work with queueing without recursively calling
> dst_output (as okfn) since we have to provide an okfn but also
> have to return ownership of the skb back to dst_output.

This patch (on top of the last one) is what we could do to eliminate
the need to return control to dst_output.  The only reason for
dst_output to exist is to handle compilers that can't optimise away
tail calls.  So if we are going to rely on the compiler to do away
with tail calls (ip_dst_output <-> __ip_dst_output), then we might
as well get rid of the loop in dst_output.
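
To illustrate what the loop in dst_output buys us, here is a minimal
userspace sketch of the same trampoline pattern. Everything in it is a
stand-in, not kernel code: struct pkt, XMIT_BYPASS, transform_output,
final_output and trampoline_output are hypothetical names invented for
this example. Each output function returns a "bypass" code instead of
calling the next one, so the stack depth stays constant even if the
compiler does no tail call optimisation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the role NET_XMIT_BYPASS plays in the thread. */
#define XMIT_BYPASS 4

struct pkt;
typedef int (*output_fn)(struct pkt *);

/* Hypothetical packet: 'output' plays the role of skb->dst->output. */
struct pkt {
	output_fn output;
	int layers_left;	/* transforms still to apply */
	int calls;		/* how many output functions have run */
};

/* Last stage: the packet leaves the stack, so report success. */
static int final_output(struct pkt *p)
{
	p->calls++;
	return 0;
}

/* One transform: do the work, advance to the next stage, ask to be re-dispatched. */
static int transform_output(struct pkt *p)
{
	p->calls++;
	if (--p->layers_left == 0)
		p->output = final_output;	/* comparable to popping to the next dst */
	return XMIT_BYPASS;
}

/* Trampoline: keep dispatching until something other than BYPASS comes back. */
static int trampoline_output(struct pkt *p)
{
	int err;

	assert(p != NULL && p->output != NULL);
	for (;;) {
		err = p->output(p);
		if (err != XMIT_BYPASS)
			return err;
	}
}
```

With three layers the trampoline dispatches four times (three transforms
plus the final output) and never nests one output call inside another.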

BTW, I killed the corresponding inline (which would have gone onto
xfrm[46]_output_finish) because the compiler should be able to
optimise it into a tail call.  If it doesn't, then we're in trouble
anyway since it won't be able to optimise away the call to dst_output.

> This is after POST_ROUTING :) POST_ROUTING is called before the
> transforms and LOCAL_OUT afterwards. But it could be moved to the
> ip/ip6_dst_output functions to avoid unnecessarily trying to reset
> the skb for transport mode transforms.

You're absolutely right.  I somehow mistook LOCAL_OUT for POST_ROUTING :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/dst.h b/include/net/dst.h
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -224,16 +224,7 @@ static inline void dst_set_expires(struc
 /* Output packet to network from transport.  */
 static inline int dst_output(struct sk_buff *skb)
 {
-	int err;
-
-	for (;;) {
-		err = skb->dst->output(skb);
-
-		if (likely(err == 0))
-			return err;
-		if (unlikely(err != NET_XMIT_BYPASS))
-			return err;
-	}
+	return skb->dst->output(skb);
 }
 
 /* Input packet from network to transport.  */
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -10,6 +10,7 @@
 
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
@@ -95,7 +96,7 @@ out:
 	return ret;
 }
 
-int xfrm4_output(struct sk_buff *skb)
+static int xfrm4_output_finish(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -138,13 +139,20 @@ int xfrm4_output(struct sk_buff *skb)
 		x = dst->xfrm;
 	} while (x && !x->props.mode);
 
-	err = NET_XMIT_BYPASS;
+	nf_reset(skb);
+	
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, dst->dev,
+		       dst_output);
 
-out_exit:
-	return err;
 error:
 	spin_unlock_bh(&x->lock);
 error_nolock:
 	kfree_skb(skb);
-	goto out_exit;
+	return err;
+}
+
+int xfrm4_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm4_output_finish);
 }
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -12,6 +12,7 @@
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
 #include <linux/icmpv6.h>
+#include <linux/netfilter_ipv6.h>
 #include <net/dsfield.h>
 #include <net/inet_ecn.h>
 #include <net/ipv6.h>
@@ -92,7 +93,7 @@ static int xfrm6_tunnel_check_size(struc
 	return ret;
 }
 
-int xfrm6_output(struct sk_buff *skb)
+static int xfrm6_output_finish(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -137,13 +138,20 @@ int xfrm6_output(struct sk_buff *skb)
 		x = dst->xfrm;
 	} while (x && !x->props.mode);
 
-	err = NET_XMIT_BYPASS;
+	nf_reset(skb);
+	
+	return NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, dst->dev,
+		       dst_output);
 
-out_exit:
-	return err;
 error:
 	spin_unlock_bh(&x->lock);
 error_nolock:
 	kfree_skb(skb);
-	goto out_exit;
+	return err;
+}
+
+int xfrm6_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm6_output_finish);
 }

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22 10:30       ` Herbert Xu
@ 2005-11-22 10:31         ` Herbert Xu
  2005-11-22 12:13           ` Herbert Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-22 10:31 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Tue, Nov 22, 2005 at 09:30:38PM +1100, herbert wrote:
>
> the need to return control to dst_output.  The only reason for
> dst_output to exist is to handle compilers that can't optimise away
> tail calls.  So if we are going to rely on the compiler to do away
> with tail calls (ip_dst_output <-> __ip_dst_output), then we might
> as well get rid of the loop in dst_output.

Unfortunately it looks like gcc 3.3.5 at least is too dumb to optimise
it away.  I think we'll need a better strategy.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22 10:31         ` Herbert Xu
@ 2005-11-22 12:13           ` Herbert Xu
  2005-11-28  1:07             ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-22 12:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Tue, Nov 22, 2005 at 09:31:39PM +1100, herbert wrote:
> 
> Unfortunately it looks like gcc 3.3.5 at least is too dumb to optimise
> it away.  I think we'll need a better strategy.

OK, the idea is still the same: Move the loop from dst_output into
xfrm4_output/xfrm6_output since they're the only ones who need it.

In order to avoid the tail call issue, I've added the inline function
nf_hook which is nf_hook_slow plus the empty list check.
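
The calling convention can be sketched in userspace roughly as follows.
This is only an illustration under assumptions: hook_list, run_hooks,
run_hooks_slow, send_pkt and the two sample hooks are hypothetical names,
and the verdict handling is far simpler than what netfilter really does.
The point is that the inline wrapper returns 1 when the caller should go
on and process the packet itself, so the caller can continue in a loop
instead of passing a continuation that would be called recursively.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet with two flags so the outcome can be inspected. */
struct pkt {
	int dropped;
	int delivered;
};

/* A hook returns 1 to accept the packet and 0 to drop it (simplified). */
typedef int (*hook_fn)(struct pkt *);

#define MAX_HOOKS 4
struct hook_list {
	hook_fn fns[MAX_HOOKS];
	int n;
};

/* Slow path: walk the registered hooks; a drop consumes the packet. */
static int run_hooks_slow(const struct hook_list *l, struct pkt *p)
{
	int i;

	for (i = 0; i < l->n; i++) {
		if (!l->fns[i](p)) {
			p->dropped = 1;
			return -1;	/* consumed, the caller must stop */
		}
	}
	return 1;			/* accepted, the caller continues */
}

/* Inline fast path: an empty list means "accepted" without a function call. */
static inline int run_hooks(const struct hook_list *l, struct pkt *p)
{
	assert(l != NULL && p != NULL);
	if (l->n == 0)
		return 1;
	return run_hooks_slow(l, p);
}

static int deliver(struct pkt *p)
{
	p->delivered++;
	return 0;
}

/* Caller side: only a return value of 1 means "go on and deliver". */
static int send_pkt(const struct hook_list *l, struct pkt *p)
{
	int ret = run_hooks(l, p);

	if (ret == 1)
		ret = deliver(p);
	return ret;
}

static int accept_all(struct pkt *p) { (void)p; return 1; }
static int drop_all(struct pkt *p) { (void)p; return 0; }
```

Because the decision comes back as a return value, a caller that wants to
loop over several stages can test for 1 and carry on in the same frame.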

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -168,6 +168,37 @@ void nf_log_packet(int pf,
 		   const struct net_device *out,
 		   struct nf_loginfo *li,
 		   const char *fmt, ...);
+
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
+		 struct net_device *indev, struct net_device *outdev,
+		 int (*okfn)(struct sk_buff *), int thresh);
+
+/**
+ *	nf_hook_thresh - call a netfilter hook
+ *	
+ *	Returns 1 if the hook has allowed the packet to pass.  The function
+ *	okfn must be invoked by the caller in this case.  Any other return
+ *	value indicates the packet has been consumed by the hook.
+ */
+static inline int nf_hook_thresh(int pf, unsigned int hook,
+				 struct sk_buff **pskb,
+				 struct net_device *indev,
+				 struct net_device *outdev,
+				 int (*okfn)(struct sk_buff *), int thresh)
+{
+#ifndef CONFIG_NETFILTER_DEBUG
+	if (list_empty(&nf_hooks[pf][hook]))
+		return 1;
+#endif
+	return nf_hook_slow(pf, hook, pskb, indev, outdev, okfn, thresh);
+}
+
+static inline int nf_hook(int pf, unsigned int hook, struct sk_buff **pskb,
+			  struct net_device *indev, struct net_device *outdev,
+			  int (*okfn)(struct sk_buff *))
+{
+	return nf_hook_thresh(pf, hook, pskb, indev, outdev, okfn, INT_MIN);
+}
                    
 /* Activate hook; either okfn or kfree_skb called, unless a hook
    returns NF_STOLEN (in which case, it's up to the hook to deal with
@@ -188,35 +219,17 @@ void nf_log_packet(int pf,
 
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
-#ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			       \
-({int __ret;								       \
-if ((__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN)) == 1) \
-	__ret = (okfn)(skb);						       \
-__ret;})
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	       \
-({int __ret;								       \
-if ((__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)  \
-	__ret = (okfn)(skb);						       \
-__ret;})
-#else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			       \
-({int __ret;								       \
-if (list_empty(&nf_hooks[pf][hook]) ||					       \
-    (__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN)) == 1) \
-	__ret = (okfn)(skb);						       \
-__ret;})
+
+/* HX: It's slightly less gross now. */
+
 #define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	       \
 ({int __ret;								       \
-if (list_empty(&nf_hooks[pf][hook]) ||					       \
-    (__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)  \
+if ((__ret=nf_hook_thresh(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)\
 	__ret = (okfn)(skb);						       \
 __ret;})
-#endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
-		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
+	NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, INT_MIN)
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
diff --git a/include/net/dst.h b/include/net/dst.h
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -224,16 +224,7 @@ static inline void dst_set_expires(struc
 /* Output packet to network from transport.  */
 static inline int dst_output(struct sk_buff *skb)
 {
-	int err;
-
-	for (;;) {
-		err = skb->dst->output(skb);
-
-		if (likely(err == 0))
-			return err;
-		if (unlikely(err != NET_XMIT_BYPASS))
-			return err;
-	}
+	return skb->dst->output(skb);
 }
 
 /* Input packet from network to transport.  */
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -8,8 +8,10 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/compiler.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
@@ -95,7 +97,7 @@ out:
 	return ret;
 }
 
-int xfrm4_output(struct sk_buff *skb)
+static int xfrm4_output_one(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -138,7 +140,7 @@ int xfrm4_output(struct sk_buff *skb)
 		x = dst->xfrm;
 	} while (x && !x->props.mode);
 
-	err = NET_XMIT_BYPASS;
+	err = 0;
 
 out_exit:
 	return err;
@@ -148,3 +150,34 @@ error_nolock:
 	kfree_skb(skb);
 	goto out_exit;
 }
+
+static int xfrm4_output_finish(struct sk_buff *skb)
+{
+	int err;
+
+	while (likely((err = xfrm4_output_one(skb)) == 0)) {
+		nf_reset(skb);
+
+		err = nf_hook(PF_INET, NF_IP_LOCAL_OUT, &skb, NULL,
+			      skb->dst->dev, dst_output);
+		if (unlikely(err != 1))
+			break;
+
+		err = 0;
+		if (!skb->dst->xfrm)
+			break;
+
+		err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL,
+			      skb->dst->dev, xfrm4_output_finish);
+		if (unlikely(err != 1))
+			break;
+	}
+
+	return err;
+}
+
+int xfrm4_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm4_output_finish);
+}
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -9,9 +9,11 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/compiler.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
 #include <linux/icmpv6.h>
+#include <linux/netfilter_ipv6.h>
 #include <net/dsfield.h>
 #include <net/inet_ecn.h>
 #include <net/ipv6.h>
@@ -92,7 +94,7 @@ static int xfrm6_tunnel_check_size(struc
 	return ret;
 }
 
-int xfrm6_output(struct sk_buff *skb)
+static int xfrm6_output_one(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -137,7 +139,7 @@ int xfrm6_output(struct sk_buff *skb)
 		x = dst->xfrm;
 	} while (x && !x->props.mode);
 
-	err = NET_XMIT_BYPASS;
+	err = 0;
 
 out_exit:
 	return err;
@@ -147,3 +149,34 @@ error_nolock:
 	kfree_skb(skb);
 	goto out_exit;
 }
+
+static int xfrm6_output_finish(struct sk_buff *skb)
+{
+	int err;
+
+	while (likely((err = xfrm6_output_one(skb)) == 0)) {
+		nf_reset(skb);
+	
+		err = nf_hook(PF_INET6, NF_IP6_LOCAL_OUT, &skb, NULL,
+			      skb->dst->dev, dst_output);
+		if (unlikely(err != 1))
+			break;
+
+		err = 0;
+		if (!skb->dst->xfrm)
+			break;
+
+		err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL,
+			      skb->dst->dev, xfrm6_output_finish);
+		if (unlikely(err != 1))
+			break;
+	}
+
+	return err;
+}
+
+int xfrm6_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm6_output_finish);
+}

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-22 12:13           ` Herbert Xu
@ 2005-11-28  1:07             ` Patrick McHardy
  2005-11-28  4:56               ` Herbert Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-11-28  1:07 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

Herbert Xu wrote:
> On Tue, Nov 22, 2005 at 09:31:39PM +1100, herbert wrote:
> 
>>Unfortunately it looks like gcc 3.3.5 at least is too dumb to optimise
>>it away.  I think we'll need a better strategy.
> 
> 
> OK, the idea is still the same: Move the loop from dst_output into
> xfrm4_output/xfrm6_output since they're the only ones who need it.
> 
> In order to avoid the tail call issue, I've added the inline function
> nf_hook which is nf_hook_slow plus the empty list check.

Thanks, this looks great. I've changed it to only call the hooks
before tunnel mode transforms and added a missing dst_output call
for the final packet.
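
To make the resulting hook order easier to check, here is a small
userspace sketch that only records the sequence of events for a given
bundle. It is an assumption-laden model, not the patch itself:
trace_output, emit and the event letters are made up for this example,
and hook verdicts and error paths are left out entirely. 'P' stands for
POST_ROUTING, 'X' for one transform, 'L' for LOCAL_OUT and 'D' for the
final dst_output call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Matches the props.mode test in the thread: 0 is transport, non-zero is tunnel. */
enum xfrm_mode { MODE_TRANSPORT = 0, MODE_TUNNEL = 1 };

/* Append one event letter to the NUL-terminated trace buffer. */
static void emit(char *trace, size_t cap, char ev)
{
	size_t len = strlen(trace);

	assert(len + 1 < cap);
	trace[len] = ev;
	trace[len + 1] = '\0';
}

/*
 * Record the events for a bundle of n transforms (hypothetical model):
 * POST_ROUTING on the plain text packet, then after each transform the
 * LOCAL_OUT/POST_ROUTING pair only if the next transform is tunnel mode,
 * and LOCAL_OUT plus the final output once no transform is left.
 */
static void trace_output(const enum xfrm_mode *modes, size_t n,
			 char *trace, size_t cap)
{
	size_t i;

	assert(modes != NULL && n > 0 && cap > 0);
	trace[0] = '\0';
	emit(trace, cap, 'P');

	for (i = 0; i < n; i++) {
		int last = (i + 1 == n);

		emit(trace, cap, 'X');
		if (last || modes[i + 1] == MODE_TUNNEL) {
			emit(trace, cap, 'L');
			if (last) {
				emit(trace, cap, 'D');
				break;
			}
			emit(trace, cap, 'P');
		}
	}
}
```

For a bundle of tunnel, transport, tunnel this model yields "PXXLPXLD":
the transport mode transform in the middle is applied without the packet
visiting any hook in between.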

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 8459 bytes --]

[XFRM4/6]: Netfilter IPsec output hooks

Call netfilter hooks before IPsec transforms. Packets visit the FORWARD/LOCAL_OUT
and POST_ROUTING hooks before the first encapsulation and the LOCAL_OUT and
POST_ROUTING hooks before each following tunnel mode transform.

Based in large part on a patch by Herbert Xu <herbert@gondor.apana.org.au>
that hides everything within xfrm{4,6}_output.c. Original description:
-
Move the loop from dst_output into xfrm4_output/xfrm6_output since they're
the only ones who need it.

In order to avoid the tail call issue, I've added the inline function
nf_hook which is nf_hook_slow plus the empty list check.
-

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 7abb84c6c3916fc365051a090c752db682b022ab
tree 31b5c4089aaf23cd2c44516f95fddf2158d0fd70
parent b47e6dc58fa6342f2403a32dd1060bc8b1cef56b
author Patrick McHardy <kaber@trash.net> Mon, 28 Nov 2005 01:56:11 +0100
committer Patrick McHardy <kaber@trash.net> Mon, 28 Nov 2005 01:56:11 +0100

 include/linux/netfilter.h |   61 +++++++++++++++++++++++++++------------------
 include/net/dst.h         |   11 +-------
 net/ipv4/xfrm4_output.c   |   39 +++++++++++++++++++++++++++--
 net/ipv6/xfrm6_output.c   |   39 +++++++++++++++++++++++++++--
 4 files changed, 112 insertions(+), 38 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index be365e7..79bb977 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -168,6 +168,37 @@ void nf_log_packet(int pf,
 		   const struct net_device *out,
 		   struct nf_loginfo *li,
 		   const char *fmt, ...);
+
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
+		 struct net_device *indev, struct net_device *outdev,
+		 int (*okfn)(struct sk_buff *), int thresh);
+
+/**
+ *	nf_hook_thresh - call a netfilter hook
+ *	
+ *	Returns 1 if the hook has allowed the packet to pass.  The function
+ *	okfn must be invoked by the caller in this case.  Any other return
+ *	value indicates the packet has been consumed by the hook.
+ */
+static inline int nf_hook_thresh(int pf, unsigned int hook,
+				 struct sk_buff **pskb,
+				 struct net_device *indev,
+				 struct net_device *outdev,
+				 int (*okfn)(struct sk_buff *), int thresh)
+{
+#ifndef CONFIG_NETFILTER_DEBUG
+	if (list_empty(&nf_hooks[pf][hook]))
+		return 1;
+#endif
+	return nf_hook_slow(pf, hook, pskb, indev, outdev, okfn, thresh);
+}
+
+static inline int nf_hook(int pf, unsigned int hook, struct sk_buff **pskb,
+			  struct net_device *indev, struct net_device *outdev,
+			  int (*okfn)(struct sk_buff *))
+{
+	return nf_hook_thresh(pf, hook, pskb, indev, outdev, okfn, INT_MIN);
+}
                    
 /* Activate hook; either okfn or kfree_skb called, unless a hook
    returns NF_STOLEN (in which case, it's up to the hook to deal with
@@ -188,35 +219,17 @@ void nf_log_packet(int pf,
 
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
-#ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			       \
-({int __ret;								       \
-if ((__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN)) == 1) \
-	__ret = (okfn)(skb);						       \
-__ret;})
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	       \
-({int __ret;								       \
-if ((__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)  \
-	__ret = (okfn)(skb);						       \
-__ret;})
-#else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			       \
-({int __ret;								       \
-if (list_empty(&nf_hooks[pf][hook]) ||					       \
-    (__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN)) == 1) \
-	__ret = (okfn)(skb);						       \
-__ret;})
+
+/* HX: It's slightly less gross now. */
+
 #define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	       \
 ({int __ret;								       \
-if (list_empty(&nf_hooks[pf][hook]) ||					       \
-    (__ret=nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)  \
+if ((__ret=nf_hook_thresh(pf, hook, &(skb), indev, outdev, okfn, thresh)) == 1)\
 	__ret = (okfn)(skb);						       \
 __ret;})
-#endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
-		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
+	NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, INT_MIN)
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
diff --git a/include/net/dst.h b/include/net/dst.h
index 6c196a5..e641dd2 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -224,16 +224,7 @@ static inline void dst_set_expires(struc
 /* Output packet to network from transport.  */
 static inline int dst_output(struct sk_buff *skb)
 {
-	int err;
-
-	for (;;) {
-		err = skb->dst->output(skb);
-
-		if (likely(err == 0))
-			return err;
-		if (unlikely(err != NET_XMIT_BYPASS))
-			return err;
-	}
+	return skb->dst->output(skb);
 }
 
 /* Input packet from network to transport.  */
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 66620a9..67b9483 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -8,8 +8,10 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/compiler.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
@@ -95,7 +97,7 @@ out:
 	return ret;
 }
 
-int xfrm4_output(struct sk_buff *skb)
+static int xfrm4_output_one(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -133,7 +135,7 @@ int xfrm4_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	err = NET_XMIT_BYPASS;
+	err = 0;
 
 out_exit:
 	return err;
@@ -143,3 +145,36 @@ error_nolock:
 	kfree_skb(skb);
 	goto out_exit;
 }
+
+static int xfrm4_output_finish(struct sk_buff *skb)
+{
+	int err;
+
+	while (likely((err = xfrm4_output_one(skb)) == 0)) {
+		if (!skb->dst->xfrm || skb->dst->xfrm->props.mode) {
+			nf_reset(skb);
+			err = nf_hook(PF_INET, NF_IP_LOCAL_OUT, &skb, NULL,
+				      skb->dst->dev, dst_output);
+			if (unlikely(err != 1))
+				break;
+
+			if (!skb->dst->xfrm) {
+				err = dst_output(skb);
+				break;
+			}
+
+			err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL,
+				      skb->dst->dev, xfrm4_output_finish);
+			if (unlikely(err != 1))
+				break;
+		}
+	}
+
+	return err;
+}
+
+int xfrm4_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm4_output_finish);
+}
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 6b98677..69ee2bf 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -9,9 +9,11 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/compiler.h>
 #include <linux/skbuff.h>
 #include <linux/spinlock.h>
 #include <linux/icmpv6.h>
+#include <linux/netfilter_ipv6.h>
 #include <net/dsfield.h>
 #include <net/inet_ecn.h>
 #include <net/ipv6.h>
@@ -92,7 +94,7 @@ static int xfrm6_tunnel_check_size(struc
 	return ret;
 }
 
-int xfrm6_output(struct sk_buff *skb)
+static int xfrm6_output_one(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb->dst;
 	struct xfrm_state *x = dst->xfrm;
@@ -132,7 +134,7 @@ int xfrm6_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	err = NET_XMIT_BYPASS;
+	err = 0;
 
 out_exit:
 	return err;
@@ -142,3 +144,36 @@ error_nolock:
 	kfree_skb(skb);
 	goto out_exit;
 }
+
+static int xfrm6_output_finish(struct sk_buff *skb)
+{
+	int err;
+
+	while (likely((err = xfrm6_output_one(skb)) == 0)) {
+		if (!skb->dst->xfrm || skb->dst->xfrm->props.mode) {
+			nf_reset(skb);
+			err = nf_hook(PF_INET6, NF_IP6_LOCAL_OUT, &skb, NULL,
+				      skb->dst->dev, dst_output);
+			if (unlikely(err != 1))
+				break;
+
+			if (!skb->dst->xfrm) {
+				err = dst_output(skb);
+				break;
+			}
+
+			err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL,
+				      skb->dst->dev, xfrm6_output_finish);
+			if (unlikely(err != 1))
+				break;
+		}
+	}
+
+	return err;
+}
+
+int xfrm6_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev,
+		       xfrm6_output_finish);
+}

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-28  1:07             ` Patrick McHardy
@ 2005-11-28  4:56               ` Herbert Xu
  2005-11-28 12:25                 ` Patrick McHardy
  2005-12-04 22:09                 ` Patrick McHardy
  0 siblings, 2 replies; 59+ messages in thread
From: Herbert Xu @ 2005-11-28  4:56 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Mon, Nov 28, 2005 at 02:07:03AM +0100, Patrick McHardy wrote:
> 
> Thanks, this looks great. I've changed it to only call the hooks

Glad you liked it :)

> before tunnel mode transforms and added a missing dst_output call
> for the final packet.

This shouldn't be necessary if you apply it on top of my previous
patch which made xfrm[46]_output process the first SA and all subsequent
transport mode SAs.  I've included that patch here again.

I think it still makes sense to do that because this corresponds
with the usual representation of an IPsec connection and it
simplifies the handling of netfilter hooks.
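
The grouping described above can be sketched in user space as follows.
This is only an illustrative model, not kernel code: `enum sa_mode`,
`sas_consumed()` and `count_groups()` are hypothetical names. The only
thing taken from the patch is the loop condition
`while (x && !x->props.mode)`, where a zero mode means transport mode.

```c
#include <stddef.h>

/* Hypothetical stand-in for x->props.mode: 0 is transport, non-zero tunnel. */
enum sa_mode { SA_TRANSPORT = 0, SA_TUNNEL = 1 };

/*
 * Number of SAs a single output call would apply, starting at bundle[pos]:
 * always the first SA, then every directly following transport mode SA.
 * Models `do { ... } while (x && !x->props.mode);`.
 */
size_t sas_consumed(const enum sa_mode *bundle, size_t n, size_t pos)
{
	size_t used = 0;

	if (pos >= n)
		return 0;
	do {
		used++;	/* apply bundle[pos + used - 1] */
	} while (pos + used < n && bundle[pos + used] == SA_TRANSPORT);

	return used;
}

/* Number of output calls needed to work through the whole bundle. */
size_t count_groups(const enum sa_mode *bundle, size_t n)
{
	size_t pos = 0, groups = 0;

	while (pos < n) {
		pos += sas_consumed(bundle, n, pos);
		groups++;
	}
	return groups;
}
```

In this model a bundle of tunnel, transport, transport, tunnel, transport
is handled in two calls, so the packet would only be exposed between the
two groups and after the last one.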

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/net/ip.h b/include/net/ip.h
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -113,26 +113,31 @@ int xfrm4_output(struct sk_buff *skb)
 			goto error_nolock;
 	}
 
-	spin_lock_bh(&x->lock);
-	err = xfrm_state_check(x, skb);
-	if (err)
-		goto error;
-
-	xfrm4_encap(skb);
-
-	err = x->type->output(x, skb);
-	if (err)
-		goto error;
+	do {
+		spin_lock_bh(&x->lock);
+		err = xfrm_state_check(x, skb);
+		if (err)
+			goto error;
+
+		xfrm4_encap(skb);
+
+		err = x->type->output(x, skb);
+		if (err)
+			goto error;
 
-	x->curlft.bytes += skb->len;
-	x->curlft.packets++;
+		x->curlft.bytes += skb->len;
+		x->curlft.packets++;
 
-	spin_unlock_bh(&x->lock);
+		spin_unlock_bh(&x->lock);
 	
-	if (!(skb->dst = dst_pop(dst))) {
-		err = -EHOSTUNREACH;
-		goto error_nolock;
-	}
+		if (!(skb->dst = dst_pop(dst))) {
+			err = -EHOSTUNREACH;
+			goto error_nolock;
+		}
+		dst = skb->dst;
+		x = dst->xfrm;
+	} while (x && !x->props.mode);
+
 	err = NET_XMIT_BYPASS;
 
 out_exit:
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -110,28 +110,33 @@ int xfrm6_output(struct sk_buff *skb)
 			goto error_nolock;
 	}
 
-	spin_lock_bh(&x->lock);
-	err = xfrm_state_check(x, skb);
-	if (err)
-		goto error;
-
-	xfrm6_encap(skb);
-
-	err = x->type->output(x, skb);
-	if (err)
-		goto error;
-
-	x->curlft.bytes += skb->len;
-	x->curlft.packets++;
-
-	spin_unlock_bh(&x->lock);
+	do {
+		spin_lock_bh(&x->lock);
+		err = xfrm_state_check(x, skb);
+		if (err)
+			goto error;
+
+		xfrm6_encap(skb);
+
+		err = x->type->output(x, skb);
+		if (err)
+			goto error;
+
+		x->curlft.bytes += skb->len;
+		x->curlft.packets++;
+
+		spin_unlock_bh(&x->lock);
+
+		skb->nh.raw = skb->data;
+		
+		if (!(skb->dst = dst_pop(dst))) {
+			err = -EHOSTUNREACH;
+			goto error_nolock;
+		}
+		dst = skb->dst;
+		x = dst->xfrm;
+	} while (x && !x->props.mode);
 
-	skb->nh.raw = skb->data;
-	
-	if (!(skb->dst = dst_pop(dst))) {
-		err = -EHOSTUNREACH;
-		goto error_nolock;
-	}
 	err = NET_XMIT_BYPASS;
 
 out_exit:

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-28  4:56               ` Herbert Xu
@ 2005-11-28 12:25                 ` Patrick McHardy
  2005-12-04 22:09                 ` Patrick McHardy
  1 sibling, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-28 12:25 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
> On Mon, Nov 28, 2005 at 02:07:03AM +0100, Patrick McHardy wrote:
> 
>>Thanks, this looks great. I've changed it to only call the hooks
> 
> 
> Glad you liked it :)
> 
> 
>>before tunnel mode transforms and added a missing dst_output call
>>for the final packet.
> 
> 
> This shouldn't be necessary if you apply it on top of my previous
> patch which made xfrm[46]_output process the first SA and all subsequent
> transport mode SAs.  I've included that patch here again.
> 
> I think it still makes sense to do that because this corresponds
> with the usual representation of an IPsec connection and it
> simplifies the handling of netfilter hooks.

I agree, I missed that your patch was based on that one. Let me have
another look :)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-11-28  4:56               ` Herbert Xu
  2005-11-28 12:25                 ` Patrick McHardy
@ 2005-12-04 22:09                 ` Patrick McHardy
  2005-12-04 22:15                   ` Herbert Xu
  1 sibling, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-12-04 22:09 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
>>before tunnel mode transforms and added a missing dst_output call
>>for the final packet.
> 
> This shouldn't be necessary if you apply it on top of my previous
> patch which made xfrm[46]_output process the first SA and all subsequent
> transport mode SAs.  I've included that patch here again.

Thanks, I've added the correct patch now :) Unless I missed something,
it was still missing a call to dst_output after the last transform
in xfrm4_output_finish, unless we keep the loop in dst_output.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks
  2005-12-04 22:09                 ` Patrick McHardy
@ 2005-12-04 22:15                   ` Herbert Xu
  0 siblings, 0 replies; 59+ messages in thread
From: Herbert Xu @ 2005-12-04 22:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Sun, Dec 04, 2005 at 11:09:09PM +0100, Patrick McHardy wrote:
> 
> Thanks, I've added the correct patch now :) Unless I missed something,
> it was still missing a call to dst_output after the last transform
> in xfrm4_output_finish, unless we keep the loop in dst_output.

Good catch.  The lines

		err = 0;
		if (!skb->dst->xfrm)
			break;

should read

		if (!skb->dst->xfrm)
			return dst_output(skb);
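
The difference between the two variants can be shown with a small
user-space model. Everything here is hypothetical (`struct pkt`,
`output_one()`, `dst_output_model()`); it only mimics the control flow
of the loop, assuming the packet has at least one group of transforms
left, and is not the actual patch.

```c
/* Hypothetical packet: groups of transforms still to apply, and a sent flag. */
struct pkt {
	int remaining;
	int sent;
};

/* Stand-in for xfrm4_output_one(): applies one group, returns 0 on success. */
static int output_one(struct pkt *p)
{
	p->remaining--;
	return 0;
}

/* Stand-in for dst_output(): hands the finished packet to the device. */
static int dst_output_model(struct pkt *p)
{
	p->sent = 1;
	return 0;
}

/* Variant with `break`: the loop ends but nobody transmits the packet. */
int finish_with_break(struct pkt *p)
{
	int err;

	while ((err = output_one(p)) == 0) {
		err = 0;
		if (p->remaining == 0)
			break;
	}
	return err;
}

/* Variant with `return dst_output(skb)`: the final packet is sent. */
int finish_with_dst_output(struct pkt *p)
{
	int err;

	while ((err = output_one(p)) == 0) {
		if (p->remaining == 0)
			return dst_output_model(p);
	}
	return err;
}
```

Both variants return 0 in this model, which is why the missing
transmission is easy to overlook.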

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (4 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-21  4:42   ` Yasuyuki KOZAKAI
                     ` (3 more replies)
  2005-11-20 16:31 ` [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder Patrick McHardy
                   ` (8 subsequent siblings)
  14 siblings, 4 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[IPV4/6]: Netfilter IPsec input hooks

When the innermost transform uses transport mode the decapsulated packet
is not visible to netfilter. Pass the packet through the PRE_ROUTING and
LOCAL_IN hooks again before handing it to upper layer protocols to make
netfilter-visibility symmetrical to the output path.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 08cf39d5d7d8b942431a6529daa3ab69ecfb34b5
tree 6f2a1a85f915b1b6f89ad50cf3d8855f21a561b6
parent b847425c693f43a63137d18e36e5c8cf0187c175
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:50:22 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 21:50:22 +0100

 include/linux/netfilter_ipv4.h |    2 +-
 include/net/ipv6.h             |    2 ++
 net/ipv4/netfilter.c           |   20 ++++++++++++++++++++
 net/ipv4/xfrm4_input.c         |   14 ++++++++++++++
 net/ipv6/ip6_input.c           |    2 +-
 net/ipv6/xfrm6_input.c         |   13 +++++++++++++
 6 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index fdc4a95..e9103fe 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -79,7 +79,7 @@ enum nf_ip_hook_priorities {
 
 #ifdef __KERNEL__
 extern int ip_route_me_harder(struct sk_buff **pskb);
-
+extern int ip_xfrm_transport_hook(struct sk_buff *skb);
 #endif /*__KERNEL__*/
 
 #endif /*__LINUX_IP_NETFILTER_H*/
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 6addb4d..4fbfe43 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -414,6 +414,8 @@ extern int			ipv6_rcv(struct sk_buff *sk
 					 struct packet_type *pt,
 					 struct net_device *orig_dev);
 
+extern int			ip6_rcv_finish(struct sk_buff *skb);
+
 /*
  *	upper-layer output functions
  */
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index b93e7cd..3c39296 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -105,6 +105,26 @@ int ip_dst_output(struct sk_buff *skb)
 	return dst_output(skb);
 }
 EXPORT_SYMBOL(ip_dst_output);
+
+/*
+ * okfn for transport mode xfrm_input.c hook. Basically a copy of
+ * ip_rcv_finish without statistics and option parsing.
+ */
+int ip_xfrm_transport_hook(struct sk_buff *skb)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (likely(skb->dst == NULL)) {
+		int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
+		                         skb->dev);
+		if (unlikely(err))
+			goto drop;
+	}
+	return dst_input(skb);
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
 #endif /* CONFIG_XFRM */
 
 /*
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 2d3849c..d90cd93 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -11,6 +11,8 @@
 
 #include <linux/module.h>
 #include <linux/string.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
@@ -137,6 +139,8 @@ int xfrm4_rcv_encap(struct sk_buff *skb,
 	memcpy(skb->sp->x+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(struct sec_decap_state));
 	skb->sp->len += xfrm_nr;
 
+	nf_reset(skb);
+
 	if (decaps) {
 		if (!(skb->dev->flags&IFF_LOOPBACK)) {
 			dst_release(skb->dst);
@@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb,
 		netif_rx(skb);
 		return 0;
 	} else {
+#ifdef CONFIG_NETFILTER
+		__skb_push(skb, skb->data - skb->nh.raw);
+		skb->nh.iph->tot_len = htons(skb->len);
+		ip_send_check(skb->nh.iph);
+
+		NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+		        ip_xfrm_transport_hook);
+		return 0;
+#else
 		return -skb->nh.iph->protocol;
+#endif
 	}
 
 drop_unlock:
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 33d3b0e..e84b3cd 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -48,7 +48,7 @@
 
 
 
-static inline int ip6_rcv_finish( struct sk_buff *skb) 
+inline int ip6_rcv_finish(struct sk_buff *skb) 
 {
 	if (skb->dst == NULL)
 		ip6_route_input(skb);
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 28c29d7..9987416 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -11,6 +11,8 @@
 
 #include <linux/module.h>
 #include <linux/string.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv6.h>
 #include <net/dsfield.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
@@ -121,6 +123,8 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
 	skb->sp->len += xfrm_nr;
 	skb->ip_summed = CHECKSUM_NONE;
 
+	nf_reset(skb);
+
 	if (decaps) {
 		if (!(skb->dev->flags&IFF_LOOPBACK)) {
 			dst_release(skb->dst);
@@ -129,7 +133,16 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
 		netif_rx(skb);
 		return -1;
 	} else {
+#ifdef CONFIG_NETFILTER
+		skb->nh.ipv6h->payload_len = htons(skb->len);
+		__skb_push(skb, skb->data - skb->nh.raw);
+
+		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
+		        ip6_rcv_finish);
+		return -1;
+#else
 		return 1;
+#endif
 	}
 
 drop_unlock:

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-20 16:31 ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks Patrick McHardy
@ 2005-11-21  4:42   ` Yasuyuki KOZAKAI
       [not found]   ` <200511210442.jAL4gPoO001846@toshiba.co.jp>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 59+ messages in thread
From: Yasuyuki KOZAKAI @ 2005-11-21  4:42 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, davem

Hi, Patrick,

From: Patrick McHardy <kaber@trash.net>
Date: Sun, 20 Nov 2005 17:31:36 +0100

> [IPV4/6]: Netfilter IPsec input hooks
> 
> When the innermost transform uses transport mode the decapsulated packet
> is not visible to netfilter. Pass the packet through the PRE_ROUTING and
> LOCAL_IN hooks again before handing it to upper layer protocols to make
> netfilter-visibility symmetrical to the output path.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

First, I could now agree to using the same names for the hooks before and
after xfrm processing, if keeping compatibility is more important than
avoiding difficulty of use. I still think it is confusing to pass packets
before and after xfrm to the same hook, and I believe it would be ideal to
use different names for them. But if, as you say, people can configure this
correctly, then my concern might be a needless fear.

Next, about re-visiting the stack: I'm beginning to understand your patch.
Re-routing after DNAT on the IPv4 input path looks natural. That is really
needed if packets have been mangled.

But is there any reason that we have to allow packets to re-visit
ip_local_deliver_finish() when they have not been mangled
at PRE_ROUTING? I think not. This situation is like ip_nat_out().

The same can be said about the IPv6 input path. If packets have not been
mangled (this is the ordinary case, because ip6tables has neither NAT nor a
target module to mangle addresses directly), they don't have to be re-routed
and don't have to re-visit ip6_input_finish().

On the other hand, if their addresses have been mangled, it is necessary to
re-route. I agree with re-visiting ip6_input_finish() in this case.

Then why don't we make xfrm{4,6}_rcv_spi() return 0 (1 for IPv6)
so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
headers if packets have not been mangled? Is this difficult or impossible to
implement?

This can solve the issue of processing statistics and IPv6 extension headers
twice, because ip_local_deliver_finish()/ip6_input_finish() can
continue to process extension headers after ESP/AH in the ordinary case.

In the special case where some code mangles IPv6 addresses, it is up to that
code to take care of the possibility of re-visiting ip6_input_finish().

What do you think?

# Please note that these are just my opinions and other USAGI guys might have
# other opinions.

Regards,

-- Yasuyuki Kozakai

^ permalink raw reply	[flat|nested] 59+ messages in thread

[parent not found: <200511210442.jAL4gPoO001846@toshiba.co.jp>]

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
       [not found]   ` <200511210442.jAL4gPoO001846@toshiba.co.jp>
@ 2005-11-21  6:52     ` Patrick McHardy
  2005-11-21  7:00       ` David S. Miller
                         ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-21  6:52 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: netdev, netfilter-devel, davem

Yasuyuki KOZAKAI wrote:
> First, I could now agree to using the same names for the hooks before and
> after xfrm processing, if keeping compatibility is more important than
> avoiding difficulty of use. I still think it is confusing to pass packets
> before and after xfrm to the same hook, and I believe it would be ideal to
> use different names for them. But if, as you say, people can configure this
> correctly, then my concern might be a needless fear.

I don't see why it is confusing. Plain text packets are visible before
encapsulation (and they have to be because we don't necessarily know
if packets will be encapsulated at the time the hooks are called in
case the policy lookup after NAT returns a policy), plain text packets
are visible after decapsulation. With different hooks we can't have
symmetrical behaviour because of the case I mentioned above, and that
would be confusing IMO.

> Next, about re-visiting the stack: I'm beginning to understand your patch.
> Re-routing after DNAT on the IPv4 input path looks natural. That is really
> needed if packets have been mangled.
> 
> But is there any reason that we have to allow packets to re-visit
> ip_local_deliver_finish() when they have not been mangled
> at PRE_ROUTING? I think not. This situation is like ip_nat_out().

My patches don't change when and if packets will reach
ip_local_deliver_finish(), they just add a possibility for rerouting.
Currently the transforms are called from ip_local_deliver_finish() and
in transport mode the decapsulated packet continues its path in
ip_local_deliver_finish(), with my patches they will go through two
netfilter hooks and continue the exact same codepath, given that
they are not NATed to a foreign destination IP on their way.
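
As a rough user-space illustration of the path just described (a sketch
under stated assumptions, not the kernel code): the decapsulated
transport mode packet passes a PRE_ROUTING step, is routed, and only
leaves the local delivery path if that step rewrote the destination to a
foreign address. All names below (`struct pkt_in`, `input_after_decap()`,
`dnat_to_other_host()`, the two addresses) are hypothetical.

```c
#include <stddef.h>

#define LOCAL_ADDR 0x0a000001u	/* hypothetical local address, 10.0.0.1 */
#define OTHER_ADDR 0x0a000002u	/* hypothetical foreign address, 10.0.0.2 */

enum in_path { PATH_LOCAL_DELIVER, PATH_FORWARD };

struct pkt_in {
	unsigned int daddr;	/* destination address after decapsulation */
};

/* A PRE_ROUTING step may rewrite the packet, e.g. DNAT. */
typedef void (*prerouting_fn)(struct pkt_in *p);

/* Example step: DNAT the packet to another host. */
void dnat_to_other_host(struct pkt_in *p)
{
	p->daddr = OTHER_ADDR;
}

/*
 * Model of the added input hooks: run PRE_ROUTING, then decide by route
 * lookup. A packet still addressed to us would pass LOCAL_IN and continue
 * with local delivery as before; anything else is forwarded.
 */
enum in_path input_after_decap(struct pkt_in *p, unsigned int local_addr,
			       prerouting_fn prerouting)
{
	if (prerouting)
		prerouting(p);

	if (p->daddr == local_addr)
		return PATH_LOCAL_DELIVER;
	return PATH_FORWARD;
}
```

Without a rewriting step the model always ends in local delivery, which
matches the claim that unmangled packets continue on the same code path.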

> The same can be said about the IPv6 input path. If packets have not been
> mangled (this is the ordinary case, because ip6tables has neither NAT nor a
> target module to mangle addresses directly), they don't have to be re-routed
> and don't have to re-visit ip6_input_finish().
> 
> On the other hand, if their addresses have been mangled, it is necessary to
> re-route. I agree with re-visiting ip6_input_finish() in this case.

Same goes for ip6_input_finish as for ip_local_deliver_finish(),
the packet would continue its path there anyway. Do you actually
mean ip6_rcv_finish()?

> Then why don't we make xfrm{4,6}_rcv_spi() return 0 (1 for IPv6)
> so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
> headers if packets have not been mangled? Is this difficult or impossible to
> implement?

I'm not sure I understand. Do you propose to check if the packet was
mangled after the PRE_ROUTING hook (if it was not stolen or queued)
and if not return directly to ip6_input_finish()? Where would the
LOCAL_IN hook be called?

> This can solve the issue of processing statistics and IPv6 extension headers
> twice, because ip_local_deliver_finish()/ip6_input_finish() can
> continue to process extension headers after ESP/AH in the ordinary case.

AFAICT statistics are not affected by my patches, except for the
iptables counters. The double parsing of extension headers is indeed
a problem I forgot about, it looks like we need to carry nhoff around
if it can't be derived from the packet.

Regards
Patrick

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-21  6:52     ` Patrick McHardy
@ 2005-11-21  7:00       ` David S. Miller
  2005-11-21  7:47         ` Herbert Xu
  2005-11-21 16:52         ` Patrick McHardy
  2005-11-21 10:53       ` Yasuyuki KOZAKAI
       [not found]       ` <200511211053.jALAro04019574@toshiba.co.jp>
  2 siblings, 2 replies; 59+ messages in thread
From: David S. Miller @ 2005-11-21  7:00 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, yasuyuki.kozakai

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 21 Nov 2005 07:52:36 +0100

> I don't see why it is confusing. Plain text packets are visible before
> encapsulation (and they have to be because we don't necessarily know
> if packets will be encapsulated at the time the hooks are called in
> case the policy lookup after NAT returns a policy), plain text packets
> are visible after decapsulation. With different hooks we can't have
> symmetrical behaviour because of the case I mentioned above, and that
> would be confusing IMO.

I think this is a very important point.

I can see no serious argument against this behavior, especially for
output.  On input, there is an argument of paranoia about seeing
plaintext packets, but an administrator could do this anyway with
tcpdump or custom kernel module if this system is the decapsulation
point.

I've read over Patrick's two most recent postings of these patches
and I think they are generally sane and I cannot find any holes in
them.  Herbert brought up the legitimate concern about defragmentation,
but I think that's a detail and does not take away from the structural
soundness of Patrick's approach.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-21  7:00       ` David S. Miller
@ 2005-11-21  7:47         ` Herbert Xu
  2005-11-21 16:52         ` Patrick McHardy
  1 sibling, 0 replies; 59+ messages in thread
From: Herbert Xu @ 2005-11-21  7:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, kaber, yasuyuki.kozakai

David S. Miller <davem@davemloft.net> wrote:
> 
> I've read over Patrick's two most recent postings of these patches
> and I think they are generally sane and I cannot find any holes in
> them.  Herbert brought up the legitimate concern about defragmentation,
> but I think that's a detail and does not take away from the structural
> soundness of Patrick's approach.

Yes I agree completely.  The new IPsec stack has been around for three
years and it's about time that we have proper netfilter support for it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-21  7:00       ` David S. Miller
  2005-11-21  7:47         ` Herbert Xu
@ 2005-11-21 16:52         ` Patrick McHardy
  1 sibling, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-21 16:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, yasuyuki.kozakai

David S. Miller wrote:
> I've read over Patrick's two most recent postings of these patches
> and I think they are generally sane and I cannot find any holes in
> them.  Herbert brought up the legitimate concern about defragmentation,
> but I think that's a detail and does not take away from the structural
> soundness of Patrick's approach.

I think we implicitly agreed on moving the POST_ROUTING hook before
fragmentation and changing the user-visible behaviour of the mangle
POSTROUTING chain. At least neither Harald nor Rusty objected to
the patch :)
^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-21  6:52     ` Patrick McHardy
  2005-11-21  7:00       ` David S. Miller
@ 2005-11-21 10:53       ` Yasuyuki KOZAKAI
       [not found]       ` <200511211053.jALAro04019574@toshiba.co.jp>
  2 siblings, 0 replies; 59+ messages in thread
From: Yasuyuki KOZAKAI @ 2005-11-21 10:53 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, davem, yasuyuki.kozakai


Hi,

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 21 Nov 2005 07:52:36 +0100

> I don't see why it is confusing. Plain text packets are visible before
> encapsulation (and they have to be because we don't necessarily know
> if packets will be encapsulated at the time the hooks are called in
> case the policy lookup after NAT returns a policy), plain text packets
> are visible after decapsulation. With different hooks we can't have
> symmetrical behaviour because of the case I mentioned above, and that
> would be confusing IMO.

Well, what I worried about was just ease of use, not internal processing.
I doubted whether users could correctly configure IPsec and packet filtering.
For example, just doing "iptables -P INPUT DROP; iptables -A INPUT -p esp -j ACCEPT"
will drop all input packets, which differs from the behavior of the current
kernel. So I imagined many people would ask "Why?".

But as you said in another mail, this is probably a needless fear of mine, isn't it?
As you know, I'm basically a worrier. :)

> > The same can be said about the IPv6 input path. If packets have not been
> > mangled (this is the ordinary case, because ip6tables has neither NAT nor a
> > target module to mangle addresses directly), they don't have to be re-routed
> > and don't have to re-visit ip6_input_finish().
> > 
> > On the other hand, if their addresses have been mangled, it is necessary to
> > re-route. I agree with re-visiting ip6_input_finish() in this case.
> 
> Same goes for ip6_input_finish as for ip_local_deliver_finish(),
> the packet would continue its path there anyway. Do you actually
> mean ip6_rcv_finish()?

No, I mean ip6_input_finish(). Calling ip6_input_finish() twice causes
problems when processing IPv6 extension headers. This is a point of difference
between IPv4 and IPv6.

Please note that I don't mean the IPv4 processing in your patch has a problem.
I think it will work. What I wanted was just to avoid double processing
of extension headers and to keep IPv4/IPv6 behavior in sync as far as possible.

> > Then why don't we make xfrm{4,6}_rcv_spi() return 0 (1 for IPv6)
> > so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
> > headers if packets have not been mangled? Is this difficult or impossible to
> > implement?
> 
> I'm not sure I understand. Do you propose to check if the packet was
> mangled after the PRE_ROUTING hook (if it was not stolen or queued)
> and if not return directly to ip6_input_finish()?

Yes.

>                                                    Where would the
> LOCAL_IN hook be called?

Ah, indeed. But we can add it just before returning directly to
ip6_input_finish().

> > This can solve the issue of processing statistics and IPv6 extension headers
> > twice, because ip_local_deliver_finish()/ip6_input_finish() can
> > continue to process extension headers after ESP/AH in the ordinary case.
> 
> AFAICT statistics are not affected by my patches, except for the
> iptables counters. The double parsing of extension headers is indeed
> a problem I forgot about, it looks like we need to carry nhoff around
> if it can't be derived from the packet.

Of course, that is one solution.

-- Yasuyuki Kozakai

^ permalink raw reply	[flat|nested] 59+ messages in thread

[parent not found: <200511211053.jALAro04019574@toshiba.co.jp>]

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
       [not found]       ` <200511211053.jALAro04019574@toshiba.co.jp>
@ 2005-11-21 16:34         ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-21 16:34 UTC (permalink / raw)
  To: Yasuyuki KOZAKAI; +Cc: netdev, netfilter-devel, davem

Yasuyuki KOZAKAI wrote:
>>I don't see why it is confusing. Plain text packets are visible before
>>encapsulation (and they have to be because we don't necessarily know
>>if packets will be encapsulated at the time the hooks are called in
>>case the policy lookup after NAT returns a policy), plain text packets
>>are visible after decapsulation. With different hooks we can't have
>>symmetrical behaviour because of the case I mentioned above, and that
>>would be confusing IMO.
> 
> Well, what I worried about was just ease of use, not internal processing.
> I doubted whether users could correctly configure IPsec and packet filtering.
> For example, just doing "iptables -P INPUT DROP; iptables -A INPUT -p esp -j ACCEPT"
> will drop all input packets, which differs from the behavior of the current
> kernel. So I imagined many people would ask "Why?".
> 
> But as you said in another mail, this is probably a needless fear of mine, isn't it?
> As you know, I'm basically a worrier. :)

:) I think it's OK since in tunnel mode the decapsulated packets already
need to be allowed by the ruleset, so I think it's not too confusing to
expect the same in transport mode.
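
To make the ruleset concern concrete, here is a small user-space model.
It is a sketch with hypothetical names (`struct rule`, `input_chain()`,
`transport_mode_delivered()`) and a deliberately simplified matcher that
only looks at the protocol number; it is not how iptables evaluates
rules internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* IP protocol numbers used in the example. */
enum { PROTO_TCP = 6, PROTO_ESP = 50 };

/* Hypothetical rule: match on protocol, then accept or drop. */
struct rule {
	int proto;
	bool accept;
};

/* First matching rule wins; otherwise the chain policy applies. */
bool input_chain(const struct rule *rules, size_t n, bool policy_accept,
		 int proto)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].proto == proto)
			return rules[i].accept;
	return policy_accept;
}

/*
 * With the input hooks, a transport mode packet is seen twice: once as
 * ESP and once more as the decapsulated inner protocol. It is delivered
 * only if both passes are accepted.
 */
bool transport_mode_delivered(const struct rule *rules, size_t n,
			      bool policy_accept, int inner_proto)
{
	if (!input_chain(rules, n, policy_accept, PROTO_ESP))
		return false;
	return input_chain(rules, n, policy_accept, inner_proto);
}
```

With a DROP policy and only an ESP accept rule, the inner TCP packet is
dropped on its second pass; adding a rule for the inner protocol lets it
through, just as a tunnel mode setup already requires.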

>>>Then why don't we make xfrm{4,6}_rcv_spi() return 0 (1 for IPv6)
>>>so that ip_local_deliver_finish()/ip6_input_finish() can continue to process
>>>headers if packets have not been mangled? Is this difficult or impossible to
>>>implement?
>>
>>I'm not sure I understand. Do you propose to check if the packet was
>>mangled after the PRE_ROUTING hook (if it was not stolen or queued)
>>and if not return directly to ip6_input_finish()?
> 
> Yes.
> 
>> Where would the LOCAL_IN hook be called?
> 
> Ah, indeed. But we can add it just before returning directly to
> ip6_input_finish().

Please see my mail to Kazunori. The hooks take ownership of the
skb, changing this would become pretty ugly because of queued
or stolen packets. And it would still leave one path where packets
are not directly returned, so I think the double-parsing should
be handled otherwise.
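
The ownership problem can be sketched like this. The model below is an
assumption-laden illustration: `enum model_verdict`, `model_nf_hook()`
and `caller_owns_skb()` are hypothetical, and the return values are
modeled on the `err != 1` checks in the output patch of this series (1
means the caller continues, anything else means the skb is gone).

```c
#include <errno.h>

/* Hypothetical verdicts a hook chain can end with. */
enum model_verdict { V_ACCEPT, V_DROP, V_STOLEN, V_QUEUE };

/*
 * Model of the nf_hook() return convention assumed here:
 *   1        packet accepted, the caller must continue processing it
 *   -EPERM   packet dropped and freed
 *   0        packet stolen or queued, someone else owns it now
 */
int model_nf_hook(enum model_verdict v)
{
	switch (v) {
	case V_ACCEPT:
		return 1;
	case V_DROP:
		return -EPERM;
	default:
		return 0;
	}
}

/* The caller may only look at the skb again if the hook returned 1. */
int caller_owns_skb(int hook_ret)
{
	return hook_ret == 1;
}
```

A check for "was the packet mangled?" after the hook would therefore
only be safe on the accept path. Queued packets come back later through
the okfn, so at least that path could not return directly to the caller.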

>>AFAICT statistics are not affected by my patches, except for the
>>iptables counters. The double parsing of extension headers is indeed
>>a problem I forgot about, it looks like we need to carry nhoff around
>>if it can't be derived from the packet.
> 
> Of course, that is one solution.

I'm going to try that if I can't come up with something better.

^ permalink raw reply	[flat|nested] 59+ messages in thread

[parent not found: <438185ED.3050005@miyazawa.org>]

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
       [not found]   ` <438185ED.3050005@miyazawa.org>
@ 2005-11-21  8:50     ` YOSHIFUJI Hideaki / 吉藤英明
  2005-11-21 16:29       ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-11-21  8:50 UTC (permalink / raw)
  To: kazunori, kaber; +Cc: netdev, netfilter-devel, davem

Hello.

In article <438185ED.3050005@miyazawa.org> (at Mon, 21 Nov 2005 17:31:41 +0900), Kazunori Miyazawa <kazunori@miyazawa.org> says:

> Your ip_xfrm_transport_hook is a good idea, I think.
> 
> We could call ip6_rcv_finish if the netfilter changed the addresses
> or otherwise we can continue the loop to avoid the cost in a similar
> way because we can know the change with checking skb->dst.

Well, I agree.

In article <20051120163135.16666.76993.sendpatchset@localhost.localdomain> (at Sun, 20 Nov 2005 17:31:36 +0100), Patrick McHardy <kaber@trash.net> says:

> diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
> index b93e7cd..3c39296 100644
> --- a/net/ipv4/netfilter.c
> +++ b/net/ipv4/netfilter.c
> @@ -105,6 +105,26 @@ int ip_dst_output(struct sk_buff *skb)
>  	return dst_output(skb);
>  }
>  EXPORT_SYMBOL(ip_dst_output);
> +
> +/*
> + * okfn for transport mode xfrm_input.c hook. Basically a copy of
> + * ip_rcv_finish without statistics and option parsing.
> + */
> +int ip_xfrm_transport_hook(struct sk_buff *skb)
> +{
> +	struct iphdr *iph = skb->nh.iph;
> +
> +	if (likely(skb->dst == NULL)) {
> +		int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
> +		                         skb->dev);
> +		if (unlikely(err))
> +			goto drop;
> +	}
> +	return dst_input(skb);
> +drop:
> +	kfree_skb(skb);
> +	return NET_RX_DROP;
> +}
>  #endif /* CONFIG_XFRM */
>  
:
> @@ -129,7 +133,16 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
>  		netif_rx(skb);
>  		return -1;
>  	} else {
> +#ifdef CONFIG_NETFILTER
> +		skb->nh.ipv6h->payload_len = htons(skb->len);
> +		__skb_push(skb, skb->data - skb->nh.raw);
> +
> +		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
> +		        ip6_rcv_finish);
> +		return -1;
> +#else
>  		return 1;
> +#endif
>  	}
>  

Probably, we can do similarly for ipv6; e.g.:

int ip6_xfrm_transport_hook(struct sk_buff *skb)
{
#if 0 /* We NEVER support NAT. :-) */
     if (likely(skb->dst == NULL)) {
            int err = ip6_route_input()
            if (unlikely(err))
                     goto drop;
     }
#endif
     __skb_pull(skb, skb->h.raw - skb->nh.raw);
     return NET_RX_SUCCESS;
drop:
     kfree_skb(skb);
     return NET_RX_DROP;
}

:

      } else {
#ifdef CONFIG_NETFILTER
             skb->nh.ipv6h->payload_len = htons(skb->len);
	     skb->h.raw = skb->data;
             __skb_push(skb, skb->data - skb->nh.raw);

             if (NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
                         ip6_xfrm_transport_hook) == NET_RX_DROP)
                 return -1;
#endif
             return 1;
      }

Then, we can continue parsing extension headers, I think.

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-21  8:50     ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-11-21 16:29       ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-21 16:29 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev, netfilter-devel, kazunori, davem

YOSHIFUJI Hideaki / 吉藤英明 wrote:
> Hello.
> 
> In article <438185ED.3050005@miyazawa.org> (at Mon, 21 Nov 2005 17:31:41 +0900), Kazunori Miyazawa <kazunori@miyazawa.org> says:
> 
> 
>>Your ip_xfrm_transport_hook is a good idea, I think.
>>
>>We could call ip6_rcv_finish if the netfilter changed the addresses
>>or otherwise we can continue the loop to avoid the cost in a similar
>>way because we can know the change with checking skb->dst.
> 
> 
> Well, I agree.
> 
> Probably, we can do similarly for ipv6; e.g.:
> 
> int ip6_xfrm_transport_hook(struct sk_buff *skb)
> {
> #if 0 /* We NEVER support NAT. :-) */
>      if (likely(skb->dst == NULL)) {
>             int err = ip6_route_input()
>             if (unlikely(err))
>                      goto drop;
>      }
> #endif
>      __skb_pull(skb, skb->h.raw - skb->nh.raw);
>      return NET_RX_SUCCESS;
> drop:
>      kfree_skb(skb);
>      return NET_RX_DROP;
> }
> 
> :
> 
>       } else {
> #ifdef CONFIG_NETFILTER
>              skb->nh.ipv6h->payload_len = htons(skb->len);
> 	     skb->h.raw = skb->data;
>              __skb_push(skb, skb->data - skb->nh.raw);
> 
>              if (NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
>                          ip6_xfrm_transport_hook) == NET_RX_DROP)
>                  return -1;
> #endif
>              return 1;
>       }
> 
> Then, we can continue parsing extension headers, I think.

Is it the rerouting you're concerned about? It will usually not
happen because skb->dst is not NULL. It's needed for NFQUEUE:
packets can be changed in userspace and need rerouting afterwards.
In any case there would still be one path on which extension
headers would be parsed twice, so I'm going to look into different
ways to fix that.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-20 16:31 ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks Patrick McHardy
                     ` (2 preceding siblings ...)
       [not found]   ` <438185ED.3050005@miyazawa.org>
@ 2005-12-01  1:27   ` Herbert Xu
  2005-12-04 22:06     ` Patrick McHardy
  3 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-12-01  1:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Sun, Nov 20, 2005 at 04:31:36PM +0000, Patrick McHardy wrote:
>
> @@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb,
>  		netif_rx(skb);
>  		return 0;
>  	} else {
> +#ifdef CONFIG_NETFILTER
> +		__skb_push(skb, skb->data - skb->nh.raw);
> +		skb->nh.iph->tot_len = htons(skb->len);
> +		ip_send_check(skb->nh.iph);
> +
> +		NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
> +		        ip_xfrm_transport_hook);
> +		return 0;
> +#else
>  		return -skb->nh.iph->protocol;
> +#endif

I'm worried about this bit.  This looks like it'll go back to the top
of the IP stack with the existing call chain.  So the call chain could
grow as the number of transforms increases.

Perhaps we need to play a dst_input/netif_rx trick here.

Actually, was there a problem with your original netif_rx approach
apart from the issue with double counting?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-01  1:27   ` Herbert Xu
@ 2005-12-04 22:06     ` Patrick McHardy
  2005-12-04 22:10       ` Herbert Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-12-04 22:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
> On Sun, Nov 20, 2005 at 04:31:36PM +0000, Patrick McHardy wrote:
> 
>>@@ -145,7 +149,17 @@ int xfrm4_rcv_encap(struct sk_buff *skb,
>> 		netif_rx(skb);
>> 		return 0;
>> 	} else {
>>+#ifdef CONFIG_NETFILTER
>>+		__skb_push(skb, skb->data - skb->nh.raw);
>>+		skb->nh.iph->tot_len = htons(skb->len);
>>+		ip_send_check(skb->nh.iph);
>>+
>>+		NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
>>+		        ip_xfrm_transport_hook);
>>+		return 0;
>>+#else
>> 		return -skb->nh.iph->protocol;
>>+#endif
> 
> 
> I'm worried about this bit.  This looks like it'll go back to the top
> of the IP stack with the existing call chain.  So the call chain could
> grow as the number of transforms increases.

It's not so bad. It adds ip_xfrm_transport_hook and
ip_local_deliver_finish to the call stack, but since two subsequent
transport mode SAs are always processed at once it can't take this
path again without calling netif_rx in between.

> Perhaps we need to play a dst_input/netif_rx trick here.
> 
> Actually, was there a problem with your original netif_rx approach
> apart from the issue with double counting?

Besides the double counting, packets also appear on the packet sockets
after transport mode decapsulation with the original approach. For
IPv6 there's also the double-parsing of extension header issue.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-04 22:06     ` Patrick McHardy
@ 2005-12-04 22:10       ` Herbert Xu
  2005-12-04 22:49         ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-12-04 22:10 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Sun, Dec 04, 2005 at 11:06:02PM +0100, Patrick McHardy wrote:
>
> >I'm worried about this bit.  This looks like it'll go back to the top
> >of the IP stack with the existing call chain.  So the call chain could
> >grow as the number of transforms increases.
> 
> It's not so bad. It adds ip_xfrm_transport_hook and
> ip_local_deliver_finish to the call stack, but since two subsequent
> transport mode SAs are always processed at once it can't take this
> path again without calling netif_rx in between.

If there is a DNAT in the way, this will jump to the very start of
the stack.  So if we have a hostile IPsec peer, and the DNAT rules
are such that this can occur, then we could be in trouble (especially
because policy/selector verification does not occur until all IPsec
has been done so we can't check inner address validity at this point).
 
> Besides the double counting, packets also appear on the packet sockets
> after transport mode decapsulation with the original approach. For
> IPv6 there's also the double-parsing of extension header issue.

Having the packets appear twice on AF_PACKET is probably desirable :)

I'll need to think about the double-parsing though.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-04 22:10       ` Herbert Xu
@ 2005-12-04 22:49         ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-12-04 22:49 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
> On Sun, Dec 04, 2005 at 11:06:02PM +0100, Patrick McHardy wrote:
> 
> If there is a DNAT in the way, this will jump to the very start of
> the stack.  So if we have a hostile IPsec peer, and the DNAT rules
> are such that this can occur, then we could be in trouble (especially
> because policy/selector verification does not occur until all IPsec
> has been done so we can't check inner address validity at this point).

We could return NET_XMIT_BYPASS from ip_xfrm_transport_hook(), although
it looks a bit ugly to use NET_XMIT* on the input path.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (5 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-28 21:06   ` Herbert Xu
  2005-11-20 16:31 ` [PATCH 08/13]: [NETFILTER]: Use conntrack information to determine if packet was NATed Patrick McHardy
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder

ip_route_me_harder doesn't use the port numbers for the xfrm lookup and
uses ip_route_input for non-local addresses, which doesn't do an xfrm
lookup; ip6_route_me_harder doesn't do an xfrm lookup at all.

Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.
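
For illustration, the gating described above can be modeled in plain
userspace C. This is only a sketch under stated assumptions: struct
model_skb and the model_* helpers are hypothetical stand-ins, not kernel
APIs, and only the flag value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as in the patch; everything else here is a hypothetical model. */
#define IPSKB_XFRM_TRANSFORMED 16

struct model_skb {
	uint16_t cb_flags;   /* stands in for IPCB(skb)->flags */
	bool     decodable;  /* stands in for xfrm_decode_session() == 0 */
	int      lookups;    /* counts simulated xfrm_lookup() calls */
};

/* Models xfrm4_output() marking the packet as transformed. */
static void model_xfrm_output(struct model_skb *skb)
{
	assert(skb != NULL);
	skb->cb_flags |= IPSKB_XFRM_TRANSFORMED;
}

/* Models the lookup added to ip_route_me_harder(); returns true if a
 * policy lookup was performed for this packet. */
static bool model_route_me_harder(struct model_skb *skb)
{
	assert(skb != NULL);
	if (skb->cb_flags & IPSKB_XFRM_TRANSFORMED)
		return false;  /* already transformed, skip the lookup */
	if (!skb->decodable)
		return false;  /* no flow key could be built */
	skb->lookups++;
	return true;
}
```

In this model a packet that has already passed through the output
transform is not looked up a second time when it is rerouted.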

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit ffa4445cd4284d3d9b688c80f5a3b9f8b26d59e6
tree 3edbdce75cc680c51e38697d45479dbfd4404452
parent 08cf39d5d7d8b942431a6529daa3ab69ecfb34b5
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:05:08 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:05:08 +0100

 include/linux/ipv6.h    |    2 ++
 include/net/ip.h        |    1 +
 include/net/xfrm.h      |    2 +-
 net/ipv4/netfilter.c    |    9 ++++++++-
 net/ipv4/xfrm4_output.c |    1 +
 net/ipv6/netfilter.c    |    8 +++++++-
 net/ipv6/xfrm6_output.c |    1 +
 net/xfrm/xfrm_policy.c  |    9 +++++----
 8 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index e0b9227..d7b3fac 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -190,6 +190,8 @@ struct inet6_skb_parm {
 	__u16			srcrt;
 	__u16			dst1;
 	__u16			lastopt;
+	__u16			flags;
+#define IP6SKB_XFRM_TRANSFORMED	1
 };
 
 #define IP6CB(skb)	((struct inet6_skb_parm*)((skb)->cb))
diff --git a/include/net/ip.h b/include/net/ip.h
index 9f09882..377036b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -45,6 +45,7 @@ struct inet_skb_parm
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
 #define IPSKB_XFRM_TUNNEL_SIZE	8
+#define IPSKB_XFRM_TRANSFORMED	16
 };
 
 struct ipcm_cookie
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 5beae1c..19d6aa0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -644,7 +644,7 @@ static inline int xfrm6_policy_check(str
 	return xfrm_policy_check(sk, dir, skb, AF_INET6);
 }
 
-
+extern int xfrm_decode_session(struct sk_buff *skb, struct flowi *fl, unsigned short family);
 extern int __xfrm_route_forward(struct sk_buff *skb, unsigned short family);
 
 static inline int xfrm_route_forward(struct sk_buff *skb, unsigned short family)
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 3c39296..db330b6 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -13,6 +13,7 @@
 #include <linux/ip.h>
 #include <net/route.h>
 #include <net/xfrm.h>
+#include <net/ip.h>
 
 /* route_me_harder function, used by iptable_nat, iptable_mangle + ip_queue */
 int ip_route_me_harder(struct sk_buff **pskb)
@@ -34,7 +35,6 @@ int ip_route_me_harder(struct sk_buff **
 #ifdef CONFIG_IP_ROUTE_FWMARK
 		fl.nl_u.ip4_u.fwmark = (*pskb)->nfmark;
 #endif
-		fl.proto = iph->protocol;
 		if (ip_route_output_key(&rt, &fl) != 0)
 			return -1;
 
@@ -61,6 +61,13 @@ int ip_route_me_harder(struct sk_buff **
 	if ((*pskb)->dst->error)
 		return -1;
 
+#ifdef CONFIG_XFRM
+	if (!(IPCB(*pskb)->flags & IPSKB_XFRM_TRANSFORMED) &&
+	    xfrm_decode_session(*pskb, &fl, AF_INET) == 0)
+		if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 0))
+			return -1;
+#endif
+
 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
 	if (skb_headroom(*pskb) < hh_len) {
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index c135746..9e49eeb 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -133,6 +133,7 @@ int xfrm4_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	nf_reset(skb);
 	err = NET_XMIT_BYPASS;
 
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 06b275e..8bc6305 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -22,7 +22,6 @@ int ip6_route_me_harder(struct sk_buff *
 		{ .ip6_u =
 		  { .daddr = iph->daddr,
 		    .saddr = iph->saddr, } },
-		.proto = iph->nexthdr,
 	};
 
 	dst = ip6_route_output(skb->sk, &fl);
@@ -34,6 +33,13 @@ int ip6_route_me_harder(struct sk_buff *
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_XFRM
+	if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) &&
+	    xfrm_decode_session(skb, &fl, AF_INET6) == 0)
+		if (xfrm_lookup(&skb->dst, &fl, skb->sk, 0))
+			return -1;
+#endif
+
 	/* Drop old route. */
 	dst_release(skb->dst);
 
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index a566d25..929e4eb 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -132,6 +132,7 @@ int xfrm6_output(struct sk_buff *skb)
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IP6CB(skb)->flags |= IP6SKB_XFRM_TRANSFORMED;
 	nf_reset(skb);
 	err = NET_XMIT_BYPASS;
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 0db9e57..e441f35 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -906,8 +906,8 @@ xfrm_policy_ok(struct xfrm_tmpl *tmpl, s
 	return start;
 }
 
-static int
-_decode_session(struct sk_buff *skb, struct flowi *fl, unsigned short family)
+int xfrm_decode_session(struct sk_buff *skb, struct flowi *fl,
+                        unsigned short family)
 {
 	struct xfrm_policy_afinfo *afinfo = xfrm_policy_get_afinfo(family);
 
@@ -918,6 +918,7 @@ _decode_session(struct sk_buff *skb, str
 	xfrm_policy_put_afinfo(afinfo);
 	return 0;
 }
+EXPORT_SYMBOL(xfrm_decode_session);
 
 static inline int secpath_has_tunnel(struct sec_path *sp, int k)
 {
@@ -935,7 +936,7 @@ int __xfrm_policy_check(struct sock *sk,
 	struct xfrm_policy *pol;
 	struct flowi fl;
 
-	if (_decode_session(skb, &fl, family) < 0)
+	if (xfrm_decode_session(skb, &fl, family) < 0)
 		return 0;
 
 	/* First, check used SA against their selectors. */
@@ -1007,7 +1008,7 @@ int __xfrm_route_forward(struct sk_buff 
 {
 	struct flowi fl;
 
-	if (_decode_session(skb, &fl, family) < 0)
+	if (xfrm_decode_session(skb, &fl, family) < 0)
 		return 0;
 
 	return xfrm_lookup(&skb->dst, &fl, NULL, 0) == 0;

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-20 16:31 ` [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder Patrick McHardy
@ 2005-11-28 21:06   ` Herbert Xu
  2005-11-29  7:02     ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-28 21:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Sun, Nov 20, 2005 at 04:31:37PM +0000, Patrick McHardy wrote:
>
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 9f09882..377036b 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -45,6 +45,7 @@ struct inet_skb_parm
>  #define IPSKB_TRANSLATED	2
>  #define IPSKB_FORWARDED		4
>  #define IPSKB_XFRM_TUNNEL_SIZE	8
> +#define IPSKB_XFRM_TRANSFORMED	16
>  };

My only question about this patch is where should we clear these flags?
For instance, when ipip/gre transmits a packet, should this flag (and
perhaps other flags here) be cleared?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-28 21:06   ` Herbert Xu
@ 2005-11-29  7:02     ` Patrick McHardy
  2005-11-29  7:34       ` Herbert Xu
  0 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-11-29  7:02 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netfilter-devel, davem

Herbert Xu wrote:
> On Sun, Nov 20, 2005 at 04:31:37PM +0000, Patrick McHardy wrote:
> 
>>diff --git a/include/net/ip.h b/include/net/ip.h
>>index 9f09882..377036b 100644
>>--- a/include/net/ip.h
>>+++ b/include/net/ip.h
>>@@ -45,6 +45,7 @@ struct inet_skb_parm
>> #define IPSKB_TRANSLATED	2
>> #define IPSKB_FORWARDED		4
>> #define IPSKB_XFRM_TUNNEL_SIZE	8
>>+#define IPSKB_XFRM_TRANSFORMED	16
>> };
> 
> 
> My only question about this patch is where should we clear these flags?
> For instance, when ipip/gre transmits a packet, should this flag (and
> perhaps other flags here) be cleared?


Good point. This specific flag should be cleared when a packet
(re-)enters the IP stack; I guess by definition of the cb, this
holds for the other flags as well. Looking at the other flags:

- IPSKB_MASQUERADED is unused
- IPSKB_TRANSLATED is unused
- IPSKB_FORWARDED is used by ipmr in a way that looks broken,
   it expects the flags on the input path to be the same it set
   on the output path.
- IPSKB_XFRM_TUNNEL_SIZE should be cleared when a packet enters
   the IP stack

It seems in most places where only IPCB(skb)->opt is cleared
the entire CB should be cleared. A couple of spots also look
completely unnecessary, for example all places clearing the CB
before passing the packet to netif_rx. I would expect the next
user to be responsible for clearing the space he needs if
necessary.
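
A minimal userspace sketch of the difference between clearing only the
options and clearing the entire CB follows. struct model_cb and both
helpers are hypothetical; the 12-byte options field is just a
placeholder, and only the flag values match include/net/ip.h as quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPSKB_FORWARDED        4
#define IPSKB_XFRM_TRANSFORMED 16

/* Toy stand-in for struct inet_skb_parm as stored in skb->cb. */
struct model_cb {
	uint8_t  opt[12];  /* placeholder for the IP options state */
	uint16_t flags;
};

/* Clearing only the options leaves stale flags behind. */
static void model_clear_opt_only(struct model_cb *cb)
{
	assert(cb != NULL);
	memset(cb->opt, 0, sizeof(cb->opt));
}

/* Clearing the whole control block also resets the flags. */
static void model_clear_cb(struct model_cb *cb)
{
	assert(cb != NULL);
	memset(cb, 0, sizeof(*cb));
}
```

With the first helper a flag such as IPSKB_XFRM_TRANSFORMED would
survive re-entry into the IP stack; with the second it would not.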

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-29  7:02     ` Patrick McHardy
@ 2005-11-29  7:34       ` Herbert Xu
  2005-11-29  7:49         ` David S. Miller
  0 siblings, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-29  7:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

On Tue, Nov 29, 2005 at 08:02:34AM +0100, Patrick McHardy wrote:
> 
> - IPSKB_MASQUERADED is unused
> - IPSKB_TRANSLATED is unused
> - IPSKB_FORWARDED is used by ipmr in a way that looks broken,
>   it expects the flags on the input path to be the same it set
>   on the output path.
> - IPSKB_XFRM_TUNNEL_SIZE should be cleared when a packet enters
>   the IP stack

Yes that looks correct.

> It seems in most places where only IPCB(skb)->opt is cleared
> the entire CB should be cleared. A couple of spots also look
> completely unnecessary, for example all places clearing the CB
> before passing the packet to netif_rx. I would expect the next
> user to be responsible for clearing the space he needs if
> necessary.

Agreed.  However, it seems that ip_rcv() only clears the CB options
if ihl is greater than 5.  So until that's changed the people feeding
netif_rx will have to clear the CB.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-29  7:34       ` Herbert Xu
@ 2005-11-29  7:49         ` David S. Miller
  2005-11-29 11:31           ` Herbert Xu
  0 siblings, 1 reply; 59+ messages in thread
From: David S. Miller @ 2005-11-29  7:49 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel, kaber

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 29 Nov 2005 18:34:41 +1100

> On Tue, Nov 29, 2005 at 08:02:34AM +0100, Patrick McHardy wrote:
> > It seems in most places where only IPCB(skb)->opt is cleared
> > the entire CB should be cleared. A couple of spots also look
> > completely unnecessary, for example all places clearing the CB
> > before passing the packet to netif_rx. I would expect the next
> > user to be responsible for clearing the space he needs if
> > necessary.
> 
> Agreed.  However, it seems that ip_rcv() only clears the CB options
> if ihl is greater than 5.  So until that's changed the people feeding
> netif_rx will have to clear the CB.

I wonder if that stuff can be simplified somehow.

We only use those options in three ways:

1) To process early in input via ip_options_compile()
   and the source route check in ip_rcv_finish()

2) To do forwarding processing on options in ip_forward_finish()
   and the multicast equivalent in ipmr.c

3) For locally destined packets, when the options are to be
   passed to the user via a recvmsg() CMSG.

Well, there is a 4th, which is what we're talking about here,
which is all of the zero'ing out of the thing during encapsulation
which is mostly a waste.

I think #1 and #2 can be handled by an on-stack copy of "struct
ip_options" high enough in the call chain, but #3 is a bit less
trivial to cope with like that.

It would be nice to kill the IPCB() copy, and give us 12 bytes back in
skb->cb[] :-)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
  2005-11-29  7:49         ` David S. Miller
@ 2005-11-29 11:31           ` Herbert Xu
  0 siblings, 0 replies; 59+ messages in thread
From: Herbert Xu @ 2005-11-29 11:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, kaber

On Mon, Nov 28, 2005 at 11:49:47PM -0800, David S. Miller wrote:
> 
> I think #1 and #2 can be handled by an on-stack copy of "struct
> ip_options" high enough in the call chain, but #3 is a bit less
> trivial to cope with like that.
> 
> It would be nice to kill the IPCB() copy, and give us 12 bytes back in
> skb->cb[] :-)

That would be great.  I'm glad that Patrick's patch set is bringing
out potential ways of shrinking sk_buff.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 08/13]: [NETFILTER]: Use conntrack information to determine if packet was NATed
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (6 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary Patrick McHardy
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Use conntrack information to determine if packet was NATed

Preparation for full IPsec support for NAT:

Use conntrack information instead of saving and comparing the
addresses to determine if a packet was NATed and needs to be rerouted,
to make it easier to extend the key.
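
The idea can be sketched outside the kernel as follows. Without NAT the
reply tuple is the exact inverse of the original tuple, so any mismatch
between one direction and the inverse of the other indicates a rewritten
address. The model_* types below are simplified, hypothetical stand-ins
for struct ip_conntrack and its tuplehash array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_dir { MODEL_DIR_ORIGINAL = 0, MODEL_DIR_REPLY = 1 };

struct model_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
};

/* tuple[MODEL_DIR_ORIGINAL] is what the initiator sent,
 * tuple[MODEL_DIR_REPLY] is what reply packets are expected to carry. */
struct model_ct {
	struct model_tuple tuple[2];
};

/* True if the source address seen in direction dir was rewritten. */
static bool model_src_rewritten(const struct model_ct *ct, enum model_dir dir)
{
	assert(ct != NULL);
	return ct->tuple[dir].src_ip != ct->tuple[!dir].dst_ip;
}

/* True if the destination address seen in direction dir was rewritten. */
static bool model_dst_rewritten(const struct model_ct *ct, enum model_dir dir)
{
	assert(ct != NULL);
	return ct->tuple[dir].dst_ip != ct->tuple[!dir].src_ip;
}
```

Because the decision is taken from the tuples rather than from saved
header fields, further tuple members can be added to the comparison
without touching the packet.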

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 864cef1bc7d011f4a07b01043786107ad570c820
tree dbf671924ab6eb7bd87baaae9118bb57d7b3ab8e
parent ffa4445cd4284d3d9b688c80f5a3b9f8b26d59e6
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:05:26 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:05:26 +0100

 net/ipv4/netfilter/ip_nat_standalone.c |   34 ++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
index f04111f..1bb5089 100644
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -162,18 +162,20 @@ ip_nat_in(unsigned int hooknum,
           const struct net_device *out,
           int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct ip_conntrack *ct;
+	enum ip_conntrack_info ctinfo;
 	unsigned int ret;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-	        || (*pskb)->nh.iph->daddr != daddr)) {
-		dst_release((*pskb)->dst);
-		(*pskb)->dst = NULL;
+	    && (ct = ip_conntrack_get(*pskb, &ctinfo)) != NULL) {
+		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+
+		if (ct->tuplehash[dir].tuple.src.ip !=
+		    ct->tuplehash[!dir].tuple.dst.ip) {
+			dst_release((*pskb)->dst);
+			(*pskb)->dst = NULL;
+		}
 	}
 	return ret;
 }
@@ -200,7 +202,8 @@ ip_nat_local_fn(unsigned int hooknum,
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct ip_conntrack *ct;
+	enum ip_conntrack_info ctinfo;
 	unsigned int ret;
 
 	/* root is playing with raw sockets. */
@@ -208,14 +211,15 @@ ip_nat_local_fn(unsigned int hooknum,
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && (ct = ip_conntrack_get(*pskb, &ctinfo)) != NULL) {
+		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+
+		if (ct->tuplehash[dir].tuple.dst.ip !=
+		    ct->tuplehash[!dir].tuple.src.ip)
+			return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	}
 	return ret;
 }
 

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (7 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 08/13]: [NETFILTER]: Use conntrack information to determine if packet was NATed Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:43   ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 10/13]: [NETFILTER]: Keep the conntrack reference until after policy checks Patrick McHardy
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Redo policy lookups after NAT when neccessary

When NAT changes the key used for the xfrm lookup it needs to be done again.
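
As a rough illustration of why the port has to be part of the check: a
mapping that keeps the address but changes the port is invisible to an
address-only comparison, yet it can select a different policy. The
model_* names are hypothetical, and the port fields stand in for the
protocol-specific u.all member of the conntrack tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_ptuple {
	uint32_t src_ip;
	uint16_t src_port;
	uint32_t dst_ip;
	uint16_t dst_port;
};

/* Index 0 is the original direction, index 1 the reply direction. */
struct model_pct {
	struct model_ptuple tuple[2];
};

/* Address-only comparison, as used before the key was extended. */
static bool model_src_addr_changed(const struct model_pct *ct, int dir)
{
	assert(ct != NULL && (dir == 0 || dir == 1));
	return ct->tuple[dir].src_ip != ct->tuple[!dir].dst_ip;
}

/* Address-and-port comparison: true if either part of the source key
 * differs from the inverse of the other direction. */
static bool model_src_key_changed(const struct model_pct *ct, int dir)
{
	assert(ct != NULL && (dir == 0 || dir == 1));
	return ct->tuple[dir].src_ip != ct->tuple[!dir].dst_ip ||
	       ct->tuple[dir].src_port != ct->tuple[!dir].dst_port;
}
```

Only the second helper reports a change for a port-only mapping, which
is the case where the policy lookup has to be redone.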

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 8cb6cfa80dd5dc4da1de280a0278746c262a2d8d
tree efffac2335bd21d0a0c2aa848df544002e4316f4
parent 864cef1bc7d011f4a07b01043786107ad570c820
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:07:34 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:07:34 +0100

 include/net/dst.h                      |    1 +
 net/ipv4/ip_output.c                   |    7 ++++++-
 net/ipv4/netfilter.c                   |    2 +-
 net/ipv4/netfilter/ip_nat_standalone.c |   27 +++++++++++++++++++++++++--
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 7eadd0c..4630e17 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -237,6 +237,7 @@ static inline int dst_output(struct sk_b
 }
 
 #if defined(CONFIG_XFRM) && defined(CONFIG_NETFILTER)
+extern int __ip_dst_output(struct sk_buff *skb);
 extern int ip_dst_output(struct sk_buff *skb);
 extern int ip6_dst_output(struct sk_buff *skb);
 #else
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 2c91f03..6836389 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -195,13 +195,18 @@ static inline int ip_finish_output2(stru
 		return dst->neighbour->output(skb);
 
 	if (net_ratelimit())
-		printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
+		printk(KERN_DEBUG "ip_finish_output3: No header cache and no neighbour!\n");
 	kfree_skb(skb);
 	return -EINVAL;
 }
 
 static inline int ip_finish_output(struct sk_buff *skb)
 {
+#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
+	/* Policy lookup after SNAT yielded a new policy */
+	if (skb->dst->xfrm != NULL)
+		return __ip_dst_output(skb);
+#endif
 	if (skb->len > dst_mtu(skb->dst) &&
 	    !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size))
 		return ip_fragment(skb, ip_finish_output2);
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index db330b6..8fda96a 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -87,7 +87,7 @@ int ip_route_me_harder(struct sk_buff **
 EXPORT_SYMBOL(ip_route_me_harder);
 
 #ifdef CONFIG_XFRM
-static inline int __ip_dst_output(struct sk_buff *skb)
+inline int __ip_dst_output(struct sk_buff *skb)
 {
 	int err;
 
diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
index 1bb5089..b518697 100644
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -187,12 +187,30 @@ ip_nat_out(unsigned int hooknum,
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct ip_conntrack *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && (ct = ip_conntrack_get(*pskb, &ctinfo)) != NULL) {
+		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+
+		if (ct->tuplehash[dir].tuple.src.ip !=
+		    ct->tuplehash[!dir].tuple.dst.ip
+#ifdef CONFIG_XFRM
+		    || ct->tuplehash[dir].tuple.src.u.all !=
+		       ct->tuplehash[!dir].tuple.dst.u.all
+#endif
+		    )
+			return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	}
+	return ret;
 }
 
 static unsigned int
@@ -217,7 +235,12 @@ ip_nat_local_fn(unsigned int hooknum,
 		enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 
 		if (ct->tuplehash[dir].tuple.dst.ip !=
-		    ct->tuplehash[!dir].tuple.src.ip)
+		    ct->tuplehash[!dir].tuple.src.ip
+#ifdef CONFIG_XFRM
+		    || ct->tuplehash[dir].tuple.dst.u.all !=
+		       ct->tuplehash[!dir].tuple.src.u.all
+#endif
+		    )
 			return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
 	}
 	return ret;


* Re: [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary
  2005-11-20 16:31 ` [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary Patrick McHardy
@ 2005-11-20 16:43   ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:43 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel

Patrick McHardy wrote:
> [NETFILTER]: Redo policy lookups after NAT when neccessary
> 
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -195,13 +195,18 @@ static inline int ip_finish_output2(stru
>  		return dst->neighbour->output(skb);
>  
>  	if (net_ratelimit())
> -		printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
> +		printk(KERN_DEBUG "ip_finish_output3: No header cache and no neighbour!\n");
>  	kfree_skb(skb);
>  	return -EINVAL;
>  }

Damn it... if you apply the patches, please edit out this chunk;
it's a remnant from an earlier series.

Thanks.


* [PATCH 10/13]: [NETFILTER]: Keep the conntrack reference until after policy checks
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (8 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 11/13]: [NETFILTER]: Handle NAT in IPsec " Patrick McHardy
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Keep the conntrack reference until after policy checks

Keep the conntrack reference until policy checks have been performed, as
needed for IPsec NAT support. The reference needs to be dropped before a
packet is queued, to avoid pinning the conntrack module so that it cannot
be unloaded, and before icmp_send is called, to avoid having the reference
attached manually to the outgoing ICMP packet and the conntrack entry
confirmed when the ICMP packet leaves the machine.
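
The ordering described above can be sketched in plain userspace C. This is
only a rough model, not the kernel code: the model_* types and functions are
hypothetical stand-ins for struct sk_buff, nf_reset() and the xfrm policy
check, reduced to the one property that matters here (the check runs while
the conntrack reference is still attached, and the reference is gone
before the packet is handed on).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, reduced to what the
 * ordering argument needs. These are not the real definitions. */
struct model_conntrack {
	int refcount;
};

struct model_skb {
	struct model_conntrack *nfct;	/* conntrack reference, may be NULL */
};

/* Models nf_reset(): drop the packet's conntrack reference, if any. */
static void model_nf_reset(struct model_skb *skb)
{
	if (skb->nfct != NULL) {
		skb->nfct->refcount--;
		skb->nfct = NULL;
	}
}

/* Models a policy check that wants to consult conntrack state.
 * Returns 1 if the conntrack entry was still attached, 0 otherwise. */
static int model_policy_check(const struct model_skb *skb)
{
	return skb->nfct != NULL;
}

/* Ordering after the patch: check first, release afterwards. */
static int model_deliver(struct model_skb *skb)
{
	int saw_conntrack;

	assert(skb != NULL);
	saw_conntrack = model_policy_check(skb);
	model_nf_reset(skb);	/* released before queueing to a socket */
	return saw_conntrack;
}
```

Calling model_deliver() on a packet that carries a reference reports that
the check saw the conntrack entry and leaves the packet without a
reference. With the old ordering (reset first) the check would never see it.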

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 3f615f37e68903f0eea66f5b242bfbb1875ee204
tree 4c3f9966a01f44784b9cd42e8f7091dbbc09fe5a
parent 8cb6cfa80dd5dc4da1de280a0278746c262a2d8d
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:08:45 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:08:45 +0100

 net/dccp/ipv4.c     |    1 +
 net/ipv4/ip_input.c |   15 +++++++--------
 net/ipv4/raw.c      |    1 +
 net/ipv4/tcp_ipv4.c |    1 +
 net/ipv4/udp.c      |    2 ++
 net/sctp/input.c    |    1 +
 6 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ca03521..0030923 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -1147,6 +1147,7 @@ int dccp_v4_rcv(struct sk_buff *skb)
 		dccp_pr_debug("xfrm4_policy_check failed\n");
 		goto discard_and_relse;
 	}
+	nf_reset(skb);
 
         if (sk_filter(sk, skb, 0)) {
 		dccp_pr_debug("sk_filter failed\n");
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 473d0f2..1757778 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -203,10 +203,6 @@ static inline int ip_local_deliver_finis
 
 	__skb_pull(skb, ihl);
 
-	/* Free reference early: we don't need it any more, and it may
-           hold ip_conntrack module loaded indefinitely. */
-	nf_reset(skb);
-
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
 
@@ -231,10 +227,12 @@ static inline int ip_local_deliver_finis
 		if ((ipprot = rcu_dereference(inet_protos[hash])) != NULL) {
 			int ret;
 
-			if (!ipprot->no_policy &&
-			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-				kfree_skb(skb);
-				goto out;
+			if (!ipprot->no_policy) {
+				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+					kfree_skb(skb);
+					goto out;
+				}
+				nf_reset(skb);
 			}
 			ret = ipprot->handler(skb);
 			if (ret < 0) {
@@ -246,6 +244,7 @@ static inline int ip_local_deliver_finis
 			if (!raw_sk) {
 				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 					IP_INC_STATS_BH(IPSTATS_MIB_INUNKNOWNPROTOS);
+					nf_reset(skb);
 					icmp_send(skb, ICMP_DEST_UNREACH,
 						  ICMP_PROT_UNREACH, 0);
 				}
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 421538a..8251a28 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -255,6 +255,7 @@ int raw_rcv(struct sock *sk, struct sk_b
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
+	nf_reset(skb);
 
 	skb_push(skb, skb->data - skb->nh.raw);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4d5021e..66300f9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1238,6 +1238,7 @@ process:
 
 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
+	nf_reset(skb);
 
 	if (sk_filter(sk, skb, 0))
 		goto discard_and_relse;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2422a5f..483ca69 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1001,6 +1001,7 @@ static int udp_queue_rcv_skb(struct sock
 		kfree_skb(skb);
 		return -1;
 	}
+	nf_reset(skb);
 
 	if (up->encap_type) {
 		/*
@@ -1163,6 +1164,7 @@ int udp_rcv(struct sk_buff *skb)
 
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
+	nf_reset(skb);
 
 	/* No socket. Drop packet silently, if checksum is wrong */
 	if (udp_checksum_complete(skb))
diff --git a/net/sctp/input.c b/net/sctp/input.c
index b24ff2c..100f577 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -225,6 +225,7 @@ int sctp_rcv(struct sk_buff *skb)
 
 	if (!xfrm_policy_check(sk, XFRM_POLICY_IN, skb, family))
 		goto discard_release;
+	nf_reset(skb);
 
 	ret = sk_filter(sk, skb, 1);
 	if (ret)


* [PATCH 11/13]: [NETFILTER]: Handle NAT in IPsec policy checks
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (9 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 10/13]: [NETFILTER]: Keep the conntrack reference until after policy checks Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 12/13]: [NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack Patrick McHardy
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Handle NAT in IPsec policy checks

Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi
of the original packet from the conntrack information for IPsec policy
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 8b46eb2d8365ab18cc965f37681033162a834fe5
tree 5e8b46d5acd6fa4b1445f181fcf8bce94cade5b0
parent 3f615f37e68903f0eea66f5b242bfbb1875ee204
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:09:15 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:09:15 +0100

 include/linux/netfilter.h              |   16 ++++++++++
 net/ipv4/netfilter.c                   |    3 ++
 net/ipv4/netfilter/ip_nat_standalone.c |   50 +++++++++++++++++++++++++++++++-
 net/xfrm/xfrm_policy.c                 |    2 +
 4 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index be365e7..cc9ee97 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -261,6 +261,20 @@ struct nf_queue_rerouter {
 extern int nf_register_queue_rerouter(int pf, struct nf_queue_rerouter *rer);
 extern int nf_unregister_queue_rerouter(int pf);
 
+#include <net/flow.h>
+extern void (*ip_nat_decode_session)(struct sk_buff *, struct flowi *);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+	void (*decodefn)(struct sk_buff *, struct flowi *);
+
+	if (family == AF_INET && (decodefn = ip_nat_decode_session) != NULL)
+		decodefn(skb, fl);
+#endif
+}
+
 #ifdef CONFIG_PROC_FS
 #include <linux/proc_fs.h>
 extern struct proc_dir_entry *proc_net_netfilter;
@@ -269,6 +283,8 @@ extern struct proc_dir_entry *proc_net_n
 #else /* !CONFIG_NETFILTER */
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family) {}
 #endif /*CONFIG_NETFILTER*/
 
 #endif /*__KERNEL__*/
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index 8fda96a..b66856d 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -134,6 +134,9 @@ drop:
 }
 #endif /* CONFIG_XFRM */
 
+void (*ip_nat_decode_session)(struct sk_buff *, struct flowi *);
+EXPORT_SYMBOL(ip_nat_decode_session);
+
 /*
  * Extra routing may needed on local out, as the QUEUE target never
  * returns control to the table.
diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
index b518697..8b8a1f0 100644
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -55,6 +55,44 @@
 			         : ((hooknum) == NF_IP_LOCAL_IN ? "LOCAL_IN"  \
 				    : "*ERROR*")))
 
+#ifdef CONFIG_XFRM
+static void nat_decode_session(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_conntrack_tuple *t;
+	enum ip_conntrack_info ctinfo;
+	enum ip_conntrack_dir dir;
+	unsigned long statusbit;
+
+	ct = ip_conntrack_get(skb, &ctinfo);
+	if (ct == NULL)
+		return;
+	dir = CTINFO2DIR(ctinfo);
+	t = &ct->tuplehash[dir].tuple;
+
+	if (dir == IP_CT_DIR_ORIGINAL)
+		statusbit = IPS_DST_NAT;
+	else
+		statusbit = IPS_SRC_NAT;
+
+	if (ct->status & statusbit) {
+		fl->fl4_dst = t->dst.ip;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP)
+			fl->fl_ip_dport = t->dst.u.tcp.port;
+	}
+
+	statusbit ^= IPS_NAT_MASK;
+
+	if (ct->status & statusbit) {
+		fl->fl4_src = t->src.ip;
+		if (t->dst.protonum == IPPROTO_TCP ||
+		    t->dst.protonum == IPPROTO_UDP)
+			fl->fl_ip_sport = t->src.u.tcp.port;
+	}
+}
+#endif
+		
 static unsigned int
 ip_nat_fn(unsigned int hooknum,
 	  struct sk_buff **pskb,
@@ -330,10 +368,14 @@ static int init_or_cleanup(int init)
 
 	if (!init) goto cleanup;
 
+#ifdef CONFIG_XFRM
+	BUG_ON(ip_nat_decode_session != NULL);
+	ip_nat_decode_session = nat_decode_session;
+#endif
 	ret = ip_nat_rule_init();
 	if (ret < 0) {
 		printk("ip_nat_init: can't setup rules.\n");
-		goto cleanup_nothing;
+		goto cleanup_decode_session;
 	}
 	ret = nf_register_hook(&ip_nat_in_ops);
 	if (ret < 0) {
@@ -381,7 +423,11 @@ static int init_or_cleanup(int init)
 	nf_unregister_hook(&ip_nat_in_ops);
  cleanup_rule_init:
 	ip_nat_rule_cleanup();
- cleanup_nothing:
+ cleanup_decode_session:
+#ifdef CONFIG_XFRM
+	ip_nat_decode_session = NULL;
+	synchronize_net();
+#endif
 	return ret;
 }
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index e441f35..8351bb0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -22,6 +22,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <linux/module.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
@@ -938,6 +939,7 @@ int __xfrm_policy_check(struct sock *sk,
 
 	if (xfrm_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);
 
 	/* First, check used SA against their selectors. */
 	if (skb->sp) {


* [PATCH 12/13]: [NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (10 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 11/13]: [NETFILTER]: Handle NAT in IPsec " Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
  2005-11-20 16:31 ` [PATCH 13/13]: [NETFILTER]: Add ipt_policy/ip6t_policy matches Patrick McHardy
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 055c50b770e63ced784808ae22ef339724b1a44c
tree b8dc07727bb80b83c5b236f4157ed588927f46da
parent 8b46eb2d8365ab18cc965f37681033162a834fe5
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:09:28 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:09:28 +0100

 include/linux/netfilter_ipv6/ip6_tables.h |    4 ++++
 net/ipv6/netfilter/ip6_tables.c           |   18 ++++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/linux/netfilter_ipv6/ip6_tables.h
index 2efc046..1e11010 100644
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -476,6 +476,10 @@ extern int ip6t_ext_hdr(u8 nexthdr);
 extern int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 			 u8 target);
 
+extern int ip6_masked_addrcmp(const struct in6_addr *addr1,
+			      const struct in6_addr *mask,
+			      const struct in6_addr *addr2);
+
 #define IP6T_ALIGN(s) (((s) + (__alignof__(struct ip6t_entry)-1)) & ~(__alignof__(struct ip6t_entry)-1))
 
 #endif /*__KERNEL__*/
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 7d49222..71a80e0 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -128,13 +128,14 @@ static LIST_HEAD(ip6t_tables);
 #define up(x) do { printk("UP:%u:" #x "\n", __LINE__); up(x); } while(0)
 #endif
 
-static int ip6_masked_addrcmp(struct in6_addr addr1, struct in6_addr mask,
-			      struct in6_addr addr2)
+int
+ip6_masked_addrcmp(const struct in6_addr *addr1, const struct in6_addr *mask,
+                   const struct in6_addr *addr2)
 {
 	int i;
 	for( i = 0; i < 16; i++){
-		if((addr1.s6_addr[i] & mask.s6_addr[i]) != 
-		   (addr2.s6_addr[i] & mask.s6_addr[i]))
+		if((addr1->s6_addr[i] & mask->s6_addr[i]) != 
+		   (addr2->s6_addr[i] & mask->s6_addr[i]))
 			return 1;
 	}
 	return 0;
@@ -168,10 +169,10 @@ ip6_packet_match(const struct sk_buff *s
 
 #define FWINV(bool,invflg) ((bool) ^ !!(ip6info->invflags & invflg))
 
-	if (FWINV(ip6_masked_addrcmp(ipv6->saddr,ip6info->smsk,ip6info->src),
-		  IP6T_INV_SRCIP)
-	    || FWINV(ip6_masked_addrcmp(ipv6->daddr,ip6info->dmsk,ip6info->dst),
-		     IP6T_INV_DSTIP)) {
+	if (FWINV(ip6_masked_addrcmp(&ipv6->saddr, &ip6info->smsk,
+	                             &ip6info->src), IP6T_INV_SRCIP)
+	    || FWINV(ip6_masked_addrcmp(&ipv6->daddr, &ip6info->dmsk,
+	                                &ip6info->dst), IP6T_INV_DSTIP)) {
 		dprintf("Source or dest mismatch.\n");
 /*
 		dprintf("SRC: %u. Mask: %u. Target: %u.%s\n", ip->saddr,
@@ -2094,6 +2095,7 @@ EXPORT_SYMBOL(ip6t_register_target);
 EXPORT_SYMBOL(ip6t_unregister_target);
 EXPORT_SYMBOL(ip6t_ext_hdr);
 EXPORT_SYMBOL(ipv6_find_hdr);
+EXPORT_SYMBOL(ip6_masked_addrcmp);
 
 module_init(init);
 module_exit(fini);


* [PATCH 13/13]: [NETFILTER]: Add ipt_policy/ip6t_policy matches
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (11 preceding siblings ...)
  2005-11-20 16:31 ` [PATCH 12/13]: [NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack Patrick McHardy
@ 2005-11-20 16:31 ` Patrick McHardy
       [not found] ` <200511201902.10179.lists@naasa.net>
  2005-11-22 22:34 ` David S. Miller
  14 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 16:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, Patrick McHardy

[NETFILTER]: Add ipt_policy/ip6t_policy matches

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit ff88b88efc987d1267eccf01e16880458d189a25
tree 53c34259c195cf64903940f151becd967bcce74d
parent 055c50b770e63ced784808ae22ef339724b1a44c
author Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:10:02 +0100
committer Patrick McHardy <kaber@trash.net> Sat, 19 Nov 2005 22:10:02 +0100

 include/linux/netfilter_ipv4/ipt_policy.h  |   52 ++++++++
 include/linux/netfilter_ipv6/ip6t_policy.h |   52 ++++++++
 net/ipv4/netfilter/Kconfig                 |   10 ++
 net/ipv4/netfilter/Makefile                |    1 
 net/ipv4/netfilter/ipt_policy.c            |  170 +++++++++++++++++++++++++++
 net/ipv6/netfilter/Kconfig                 |   10 ++
 net/ipv6/netfilter/Makefile                |    1 
 net/ipv6/netfilter/ip6t_policy.c           |  175 ++++++++++++++++++++++++++++
 8 files changed, 471 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ipt_policy.h b/include/linux/netfilter_ipv4/ipt_policy.h
new file mode 100644
index 0000000..7fd1bec
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_policy.h
@@ -0,0 +1,52 @@
+#ifndef _IPT_POLICY_H
+#define _IPT_POLICY_H
+
+#define IPT_POLICY_MAX_ELEM	4
+
+enum ipt_policy_flags
+{
+	IPT_POLICY_MATCH_IN	= 0x1,
+	IPT_POLICY_MATCH_OUT	= 0x2,
+	IPT_POLICY_MATCH_NONE	= 0x4,
+	IPT_POLICY_MATCH_STRICT	= 0x8,
+};
+
+enum ipt_policy_modes
+{
+	IPT_POLICY_MODE_TRANSPORT,
+	IPT_POLICY_MODE_TUNNEL
+};
+
+struct ipt_policy_spec
+{
+	u_int8_t	saddr:1,
+			daddr:1,
+			proto:1,
+			mode:1,
+			spi:1,
+			reqid:1;
+};
+
+struct ipt_policy_elem
+{
+	u_int32_t	saddr;
+	u_int32_t	smask;
+	u_int32_t	daddr;
+	u_int32_t	dmask;
+	u_int32_t	spi;
+	u_int32_t	reqid;
+	u_int8_t	proto;
+	u_int8_t	mode;
+
+	struct ipt_policy_spec	match;
+	struct ipt_policy_spec	invert;
+};
+
+struct ipt_policy_info
+{
+	struct ipt_policy_elem pol[IPT_POLICY_MAX_ELEM];
+	u_int16_t flags;
+	u_int16_t len;
+};
+
+#endif /* _IPT_POLICY_H */
diff --git a/include/linux/netfilter_ipv6/ip6t_policy.h b/include/linux/netfilter_ipv6/ip6t_policy.h
new file mode 100644
index 0000000..5a93afc
--- /dev/null
+++ b/include/linux/netfilter_ipv6/ip6t_policy.h
@@ -0,0 +1,52 @@
+#ifndef _IP6T_POLICY_H
+#define _IP6T_POLICY_H
+
+#define IP6T_POLICY_MAX_ELEM	4
+
+enum ip6t_policy_flags
+{
+	IP6T_POLICY_MATCH_IN		= 0x1,
+	IP6T_POLICY_MATCH_OUT		= 0x2,
+	IP6T_POLICY_MATCH_NONE		= 0x4,
+	IP6T_POLICY_MATCH_STRICT	= 0x8,
+};
+
+enum ip6t_policy_modes
+{
+	IP6T_POLICY_MODE_TRANSPORT,
+	IP6T_POLICY_MODE_TUNNEL
+};
+
+struct ip6t_policy_spec
+{
+	u_int8_t	saddr:1,
+			daddr:1,
+			proto:1,
+			mode:1,
+			spi:1,
+			reqid:1;
+};
+
+struct ip6t_policy_elem
+{
+	struct in6_addr	saddr;
+	struct in6_addr	smask;
+	struct in6_addr	daddr;
+	struct in6_addr	dmask;
+	u_int32_t	spi;
+	u_int32_t	reqid;
+	u_int8_t	proto;
+	u_int8_t	mode;
+
+	struct ip6t_policy_spec	match;
+	struct ip6t_policy_spec	invert;
+};
+
+struct ip6t_policy_info
+{
+	struct ip6t_policy_elem pol[IP6T_POLICY_MAX_ELEM];
+	u_int16_t flags;
+	u_int16_t len;
+};
+
+#endif /* _IP6T_POLICY_H */
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 9d3c8b5..5e8189f 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -487,6 +487,16 @@ config IP_NF_MATCH_STRING
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_MATCH_POLICY
+       tristate "IPsec policy match support"
+       depends on IP_NF_IPTABLES && XFRM
+       help
+         Policy matching allows you to match packets based on the
+         IPsec policy that was used during decapsulation/will
+         be used during encapsulation.
+
+         To compile it as a module, choose M here.  If unsure, say N.
+
 # `filter', generic and specific targets
 config IP_NF_FILTER
 	tristate "Packet filtering"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 058c48e..2202517 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_IP_NF_MATCH_TCPMSS) += ipt_
 obj-$(CONFIG_IP_NF_MATCH_REALM) += ipt_realm.o
 obj-$(CONFIG_IP_NF_MATCH_ADDRTYPE) += ipt_addrtype.o
 obj-$(CONFIG_IP_NF_MATCH_PHYSDEV) += ipt_physdev.o
+obj-$(CONFIG_IP_NF_MATCH_POLICY) += ipt_policy.o
 obj-$(CONFIG_IP_NF_MATCH_COMMENT) += ipt_comment.o
 obj-$(CONFIG_IP_NF_MATCH_STRING) += ipt_string.o
 
diff --git a/net/ipv4/netfilter/ipt_policy.c b/net/ipv4/netfilter/ipt_policy.c
new file mode 100644
index 0000000..709debc
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_policy.c
@@ -0,0 +1,170 @@
+/* IP tables module for matching IPsec policy
+ *
+ * Copyright (c) 2004,2005 Patrick McHardy, <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/xfrm.h>
+
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter_ipv4/ipt_policy.h>
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("IPtables IPsec policy matching module");
+MODULE_LICENSE("GPL");
+
+
+static inline int
+match_xfrm_state(struct xfrm_state *x, const struct ipt_policy_elem *e)
+{
+#define MATCH(x,y)	(!e->match.x || ((e->x == (y)) ^ e->invert.x))
+
+	return MATCH(saddr, x->props.saddr.a4 & e->smask) &&
+	       MATCH(daddr, x->id.daddr.a4 & e->dmask) &&
+	       MATCH(proto, x->id.proto) &&
+	       MATCH(mode, x->props.mode) &&
+	       MATCH(spi, x->id.spi) &&
+	       MATCH(reqid, x->props.reqid);
+}
+
+static int
+match_policy_in(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct sec_path *sp = skb->sp;
+	int strict = info->flags & IPT_POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (sp == NULL)
+		return -1;
+	if (strict && info->len != sp->len)
+		return 0;
+
+	for (i = sp->len - 1; i >= 0; i--) {
+		pos = strict ? i - sp->len + 1 : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(sp->x[i].xvec, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int
+match_policy_out(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct dst_entry *dst = skb->dst;
+	int strict = info->flags & IPT_POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (dst->xfrm == NULL)
+		return -1;
+
+	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
+		pos = strict ? i : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(dst->xfrm, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int match(const struct sk_buff *skb,
+                 const struct net_device *in,
+                 const struct net_device *out,
+                 const void *matchinfo, int offset, int *hotdrop)
+{
+	const struct ipt_policy_info *info = matchinfo;
+	int ret;
+
+	if (info->flags & IPT_POLICY_MATCH_IN)
+		ret = match_policy_in(skb, info);
+	else
+		ret = match_policy_out(skb, info);
+
+	if (ret < 0)
+		ret = info->flags & IPT_POLICY_MATCH_NONE ? 1 : 0;
+	else if (info->flags & IPT_POLICY_MATCH_NONE)
+		ret = 0;
+
+	return ret;
+}
+
+static int checkentry(const char *tablename, const struct ipt_ip *ip,
+                      void *matchinfo, unsigned int matchsize,
+                      unsigned int hook_mask)
+{
+	struct ipt_policy_info *info = matchinfo;
+
+	if (matchsize != IPT_ALIGN(sizeof(*info))) {
+		printk(KERN_ERR "ipt_policy: matchsize %u != %zu\n",
+		       matchsize, IPT_ALIGN(sizeof(*info)));
+		return 0;
+	}
+	if (!(info->flags & (IPT_POLICY_MATCH_IN|IPT_POLICY_MATCH_OUT))) {
+		printk(KERN_ERR "ipt_policy: neither incoming nor "
+		                "outgoing policy selected\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN)
+	    && info->flags & IPT_POLICY_MATCH_OUT) {
+		printk(KERN_ERR "ipt_policy: output policy not valid in "
+		                "PRE_ROUTING and INPUT\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT)
+	    && info->flags & IPT_POLICY_MATCH_IN) {
+		printk(KERN_ERR "ipt_policy: input policy not valid in "
+		                "POST_ROUTING and OUTPUT\n");
+		return 0;
+	}
+	if (info->len > IPT_POLICY_MAX_ELEM) {
+		printk(KERN_ERR "ipt_policy: too many policy elements\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static struct ipt_match policy_match = {
+	.name		= "policy",
+	.match		= match,
+	.checkentry 	= checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	return ipt_register_match(&policy_match);
+}
+
+static void __exit fini(void)
+{
+	ipt_unregister_match(&policy_match);
+}
+
+module_init(init);
+module_exit(fini);
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 060d612..96eae96 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -179,6 +179,16 @@ config IP6_NF_MATCH_PHYSDEV
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP6_NF_MATCH_POLICY
+	tristate "IPsec policy match support"
+	depends on IP6_NF_IPTABLES && XFRM
+	help
+	  Policy matching allows you to match packets based on the
+	  IPsec policy that was used during decapsulation/will
+	  be used during encapsulation.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 # The targets
 config IP6_NF_FILTER
 	tristate "Packet filtering"
diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile
index 9ab5b2c..c0c809b 100644
--- a/net/ipv6/netfilter/Makefile
+++ b/net/ipv6/netfilter/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_
 obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o
 obj-$(CONFIG_IP6_NF_MATCH_FRAG) += ip6t_frag.o
 obj-$(CONFIG_IP6_NF_MATCH_AHESP) += ip6t_esp.o ip6t_ah.o
+obj-$(CONFIG_IP6_NF_MATCH_POLICY) += ip6t_policy.o
 obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o
 obj-$(CONFIG_IP6_NF_MATCH_MULTIPORT) += ip6t_multiport.o
 obj-$(CONFIG_IP6_NF_MATCH_OWNER) += ip6t_owner.o
diff --git a/net/ipv6/netfilter/ip6t_policy.c b/net/ipv6/netfilter/ip6t_policy.c
new file mode 100644
index 0000000..13fedad
--- /dev/null
+++ b/net/ipv6/netfilter/ip6t_policy.c
@@ -0,0 +1,175 @@
+/* IP tables module for matching IPsec policy
+ *
+ * Copyright (c) 2004,2005 Patrick McHardy, <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/xfrm.h>
+
+#include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <linux/netfilter_ipv6/ip6t_policy.h>
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("IPtables IPsec policy matching module");
+MODULE_LICENSE("GPL");
+
+
+static inline int
+match_xfrm_state(struct xfrm_state *x, const struct ip6t_policy_elem *e)
+{
+#define MATCH_ADDR(x,y,z)	(!e->match.x || \
+				 ((ip6_masked_addrcmp((z), &e->y, &e->x)) == 0) ^ e->invert.x)
+#define MATCH(x,y)		(!e->match.x || ((e->x == (y)) ^ e->invert.x))
+	
+	return MATCH_ADDR(saddr, smask, (struct in6_addr *)&x->props.saddr.a6) &&
+	       MATCH_ADDR(daddr, dmask, (struct in6_addr *)&x->id.daddr.a6) &&
+	       MATCH(proto, x->id.proto) &&
+	       MATCH(mode, x->props.mode) &&
+	       MATCH(spi, x->id.spi) &&
+	       MATCH(reqid, x->props.reqid);
+}
+
+static int
+match_policy_in(const struct sk_buff *skb, const struct ip6t_policy_info *info)
+{
+	const struct ip6t_policy_elem *e;
+	struct sec_path *sp = skb->sp;
+	int strict = info->flags & IP6T_POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (sp == NULL)
+		return -1;
+	if (strict && info->len != sp->len)
+		return 0;
+
+	for (i = sp->len - 1; i >= 0; i--) {
+		pos = strict ? i - sp->len + 1 : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(sp->x[i].xvec, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int
+match_policy_out(const struct sk_buff *skb, const struct ip6t_policy_info *info)
+{
+	const struct ip6t_policy_elem *e;
+	struct dst_entry *dst = skb->dst;
+	int strict = info->flags & IP6T_POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (dst->xfrm == NULL)
+		return -1;
+
+	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
+		pos = strict ? i : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(dst->xfrm, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int match(const struct sk_buff *skb,
+                 const struct net_device *in,
+                 const struct net_device *out,
+                 const void *matchinfo,
+		 int offset,
+		 unsigned int protoff,
+		 int *hotdrop)
+{
+	const struct ip6t_policy_info *info = matchinfo;
+	int ret;
+
+	if (info->flags & IP6T_POLICY_MATCH_IN)
+		ret = match_policy_in(skb, info);
+	else
+		ret = match_policy_out(skb, info);
+
+	if (ret < 0)
+		ret = info->flags & IP6T_POLICY_MATCH_NONE ? 1 : 0;
+	else if (info->flags & IP6T_POLICY_MATCH_NONE)
+		ret = 0;
+
+	return ret;
+}
+
+static int checkentry(const char *tablename, const struct ip6t_ip6 *ip,
+                      void *matchinfo, unsigned int matchsize,
+                      unsigned int hook_mask)
+{
+	struct ip6t_policy_info *info = matchinfo;
+
+	if (matchsize != IP6T_ALIGN(sizeof(*info))) {
+		printk(KERN_ERR "ip6t_policy: matchsize %u != %zu\n",
+		       matchsize, IP6T_ALIGN(sizeof(*info)));
+		return 0;
+	}
+	if (!(info->flags & (IP6T_POLICY_MATCH_IN|IP6T_POLICY_MATCH_OUT))) {
+		printk(KERN_ERR "ip6t_policy: neither incoming nor "
+		                "outgoing policy selected\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP6_PRE_ROUTING | 1 << NF_IP6_LOCAL_IN)
+	    && info->flags & IP6T_POLICY_MATCH_OUT) {
+		printk(KERN_ERR "ip6t_policy: output policy not valid in "
+		                "PRE_ROUTING and INPUT\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP6_POST_ROUTING | 1 << NF_IP6_LOCAL_OUT)
+	    && info->flags & IP6T_POLICY_MATCH_IN) {
+		printk(KERN_ERR "ip6t_policy: input policy not valid in "
+		                "POST_ROUTING and OUTPUT\n");
+		return 0;
+	}
+	if (info->len > IP6T_POLICY_MAX_ELEM) {
+		printk(KERN_ERR "ip6t_policy: too many policy elements\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static struct ip6t_match policy_match = {
+	.name		= "policy",
+	.match		= match,
+	.checkentry 	= checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	return ip6t_register_match(&policy_match);
+}
+
+static void __exit fini(void)
+{
+	ip6t_unregister_match(&policy_match);
+}
+
+module_init(init);
+module_exit(fini);


[parent not found: <200511201902.10179.lists@naasa.net>]

* Re: [PATCH 00/13]: Netfilter IPsec support
       [not found] ` <200511201902.10179.lists@naasa.net>
@ 2005-11-20 18:07   ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-20 18:07 UTC (permalink / raw)
  To: jplatte; +Cc: netdev, netfilter-devel, davem

Joerg Platte wrote:
> On Sunday, 20 November 2005 17:31, Patrick McHardy wrote:
> Hi!
> 
>>- policy lookups after NAT:
>>
>>When NAT changes a packet it already calls ip_route_me_harder, which
>>reroutes the packet and does a new policy lookup. It only looks at
>>the IP addresses, however; changing the port numbers requires a new
>>policy lookup as well. It also doesn't reroute in POST_ROUTING, since
>>the packet has already been routed. To behave more like a regular
>>tunnel device a policy lookup is now also done after SNAT and the
>>packet is passed to dst_output again if the lookup yielded a new
>>policy.
> 
> I suppose this is the reason why masqueraded packets leave a recent kernel 
> unencrypted, even if they would match the policy. It's still not implemented 
> in mainline. Am I right? If yes, I hope your patches will be merged as soon 
> as possible :-)

You're right, that's the reason. Since the patches touch quite a lot of
code they won't make it in 2.6.15, though.
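
To illustrate why a second lookup after SNAT matters, here is a small user-space sketch. It does not use the kernel's flow or xfrm selector structures; `struct flow`, `struct policy`, `policy_lookup` and `lookup_after_snat` are hypothetical names, and the exact-match-with-wildcard-port rule is an assumption made to keep the example short. The point is only that a lookup done before the source address and port are rewritten can miss a policy that the rewritten packet matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key; not the kernel's struct flowi. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical policy selector; not the kernel's struct xfrm_selector. */
struct policy {
	uint32_t saddr, daddr;	/* exact-match addresses */
	uint16_t sport, dport;	/* 0 acts as a wildcard */
	int id;
};

static int port_matches(uint16_t sel, uint16_t port)
{
	return sel == 0 || sel == port;
}

/* Returns the first matching policy, or NULL if the packet would leave unencrypted. */
static const struct policy *policy_lookup(const struct policy *tbl, size_t n,
					  const struct flow *fl)
{
	assert(tbl != NULL && fl != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct policy *p = &tbl[i];

		if (p->saddr == fl->saddr && p->daddr == fl->daddr &&
		    port_matches(p->sport, fl->sport) &&
		    port_matches(p->dport, fl->dport))
			return p;
	}
	return NULL;
}

/* Models SNAT followed by the extra lookup: rewrite the source, then look up again. */
static const struct policy *lookup_after_snat(const struct policy *tbl, size_t n,
					      struct flow *fl, uint32_t new_saddr,
					      uint16_t new_sport)
{
	fl->saddr = new_saddr;
	fl->sport = new_sport;
	return policy_lookup(tbl, n, fl);
}
```

In this model a masqueraded packet only finds its policy once the lookup sees the translated source, and a change of port alone is enough to change the result.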


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
                   ` (13 preceding siblings ...)
       [not found] ` <200511201902.10179.lists@naasa.net>
@ 2005-11-22 22:34 ` David S. Miller
  2005-11-22 22:38   ` YOSHIFUJI Hideaki / 吉藤英明
  2005-11-23  1:17   ` Patrick McHardy
  14 siblings, 2 replies; 59+ messages in thread
From: David S. Miller @ 2005-11-22 22:34 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel

From: Patrick McHardy <kaber@trash.net>
Date: Sun, 20 Nov 2005 17:31:28 +0100

> This is the latest netfilter/IPsec patchset. Its purpose is to make
> IPsec look as much like a normal tunnel device to netfilter as possible
> and to enable NAT support.

I think there are some of these patches that we can merge in
right now into net-2.6.16...

I want to do this so that Patrick doesn't have to repost
13 or so patches every time one of the parts still under
discussion gets changed.

Actually, it seems the only part under discussion is how to
avoid extension header reparsing and routing re-lookups on
the ipv6 side.  That could be fixed by a follow-on patch and
is not 100% necessary for initial integration in my opinion.

Can I get agreement on that?  Patrick sends me a dump of the
current state of his patch set right now, we put that into
net-2.6.16, and fix problems with follow-on patches.

Ok?


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-22 22:34 ` David S. Miller
@ 2005-11-22 22:38   ` YOSHIFUJI Hideaki / 吉藤英明
  2005-11-23  1:20     ` Patrick McHardy
  2005-11-23  1:17   ` Patrick McHardy
  1 sibling, 1 reply; 59+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-11-22 22:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, kaber

In article <20051122.143438.84749134.davem@davemloft.net> (at Tue, 22 Nov 2005 14:34:38 -0800 (PST)), "David S. Miller" <davem@davemloft.net> says:

> I want to do this so that Patrick doesn't have to repost
> 13 or so patches every time one of the parts still under
> discussion gets changed.
> 
> Actually, it seems the only part under discussion is how to
> avoid extension header reparsing and routing re-lookups on
> the ipv6 side.  That could be fixed by a follow-on patch and
> is not 100% necessary for initial integration in my opinion.
> 
> Can I get agreement on that?  Patrick sends me a dump of the
> current state of his patch set right now, we put that into
> net-2.6.16, and fix problems with follow-on patches.
> 
> Ok?

I believe he can manage these patches in his tree,
but anyway...

Well, it is very important to fix the packet processing
path issues (including the extension header issue)
before we merge it to the mainline. Definitely.
What I want to ensure is that they will not reach the mainline
tree without resolving those issues.

Thank you.

--yoshfuji


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-22 22:38   ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-11-23  1:20     ` Patrick McHardy
  0 siblings, 0 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-23  1:20 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev, netfilter-devel, davem, kaber

YOSHIFUJI Hideaki / 吉藤英明 wrote:
> I believe he can manage these patches in his tree,
> but anyway...

It is kind of annoying redoing 13 patches for every small change.

> Well, it is very important to fix the packet processing
> path issues (including the extension header issue)
> before we merge it to the mainline. Definitely.
> What I want to ensure is that they will not reach the mainline
> tree without resolving those issues.

How about we put the IPv4 patches in now and I continue working
on the IPv6 side? The main approach is not going to change,
and I think users would appreciate having them available in
2.6.15.


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-22 22:34 ` David S. Miller
  2005-11-22 22:38   ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-11-23  1:17   ` Patrick McHardy
  2005-11-23  1:35     ` Herbert Xu
  2005-11-23  3:35     ` David S. Miller
  1 sibling, 2 replies; 59+ messages in thread
From: Patrick McHardy @ 2005-11-23  1:17 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel

David S. Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Sun, 20 Nov 2005 17:31:28 +0100
> 
> 
>>This is the latest netfilter/IPsec patchset. Its purpose is to make
>>IPsec look as much like a normal tunnel device to netfilter as possible
>>and to enable NAT support.
> 
> 
> I think there are some of these patches that we can merge in
> right now into net-2.6.16...
> 
> I want to do this so that Patrick doesn't have to repost
> 13 or so patches every time one of the parts still under
> discussion gets changed.
 >
> Actually, it seems the only part under discussion is how to
> avoid extension header reparsing and routing re-lookups on
> the ipv6 side.  That could be fixed by a follow-on patch and
> is not 100% necessary for initial integration in my opinion.
> 
> Can I get agreement on that?  Patrick sends me a dump of the
> current state of his patch set right now, we put that into
> net-2.6.16, and fix problems with follow-on patches.
> 
> Ok?

I would appreciate that, but I want to have a closer look
at Herbert's patches first. Unfortunately it's late and I have
to get up early, so it's going to take me a day.


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-23  1:17   ` Patrick McHardy
@ 2005-11-23  1:35     ` Herbert Xu
  2005-11-23  3:36       ` David S. Miller
  2005-11-23  3:35     ` David S. Miller
  1 sibling, 1 reply; 59+ messages in thread
From: Herbert Xu @ 2005-11-23  1:35 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, davem

Patrick McHardy <kaber@trash.net> wrote:
> 
> I would appreciate that, but I want to have a closer look
> at Herbert's patches first. Unfortunately it's late and I have
> to get up early, so it's going to take me a day.

How about merging the patches that everybody has agreed on first?

So far, I haven't seen any objections to patches 1 and 2 so they
can go in straight away.  They don't even touch IPv6.

Patches 3-6 could become redundant if you go with my suggestion.

The rest of them I haven't read yet so can't comment :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-23  1:35     ` Herbert Xu
@ 2005-11-23  3:36       ` David S. Miller
  2005-11-23  4:47         ` Herbert Xu
  2005-11-23  4:52         ` Yasuyuki KOZAKAI
  0 siblings, 2 replies; 59+ messages in thread
From: David S. Miller @ 2005-11-23  3:36 UTC (permalink / raw)
  To: herbert; +Cc: netdev, netfilter-devel, kaber

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 23 Nov 2005 12:35:53 +1100

> How about merging the patches that everybody has agreed on first?
> 
> So far, I haven't seen any objections to patches 1 and 2 so they
> can go in straight away.  They don't even touch IPv6.
> 
> Patches 3-6 could become redundant if you go with my suggestion.
> 
> The rest of them I haven't read yet so can't comment :)

View the net-2.6.16 GIT tree as a sort of playpen, much like
-mm, until we get close to the real 2.6.16 upstream development
opening up.

I rebase all the time, and I can pluck out and change patches
at will.


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-23  3:36       ` David S. Miller
@ 2005-11-23  4:47         ` Herbert Xu
  2005-11-23  4:52         ` Yasuyuki KOZAKAI
  1 sibling, 0 replies; 59+ messages in thread
From: Herbert Xu @ 2005-11-23  4:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, netfilter-devel, kaber

On Tue, Nov 22, 2005 at 07:36:31PM -0800, David S. Miller wrote:
> 
> View the net-2.6.16 GIT tree as a sort of playpen, much like
> -mm, until we get close to the real 2.6.16 upstream development
> opening up.
> 
> I rebase all the time, and I can pluck out and change patches
> at will.

Sure, I have no objections to merging the patch now and then adding
clean-ups later.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-23  3:36       ` David S. Miller
  2005-11-23  4:47         ` Herbert Xu
@ 2005-11-23  4:52         ` Yasuyuki KOZAKAI
  1 sibling, 0 replies; 59+ messages in thread
From: Yasuyuki KOZAKAI @ 2005-11-23  4:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, netfilter-devel, herbert, kaber

From: "David S. Miller" <davem@davemloft.net>
Date: Tue, 22 Nov 2005 19:36:31 -0800 (PST)

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 23 Nov 2005 12:35:53 +1100
> 
> > How about merging the patches that everybody has agreed on first?
> > 
> > So far, I haven't seen any objections to patches 1 and 2 so they
> > can go in straight away.  They don't even touch IPv6.
> > 
> > Patches 3-6 could become redundant if you go with my suggestion.
> > 
> > The rest of them I haven't read yet so can't comment :)
> 
> View the net-2.6.16 GIT tree as a sort of playpen, much like
> -mm, until we get close to the real 2.6.16 upstream development
> opening up.
> 
> I rebase all the time, and I can pluck out and change patches
> at will.

Then I agree. It can increase the number of testers, I think.

-- Yasuyuki Kozakai


* Re: [PATCH 00/13]: Netfilter IPsec support
  2005-11-23  1:17   ` Patrick McHardy
  2005-11-23  1:35     ` Herbert Xu
@ 2005-11-23  3:35     ` David S. Miller
  1 sibling, 0 replies; 59+ messages in thread
From: David S. Miller @ 2005-11-23  3:35 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel

From: Patrick McHardy <kaber@trash.net>
Date: Wed, 23 Nov 2005 02:17:14 +0100

> I would appreciate that, but I want to have a closer look
> at Herbert's patches first. Unfortunately it's late and I have
> to get up early, so it's going to take me a day.

Take your time :)


[parent not found: <4381F4C7.9070903@trash.net>]

[parent not found: <43826F77.7040502@miyazawa.org>]

[parent not found: <438270F2.3000603@trash.net>]

* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
       [not found]   ` <438270F2.3000603@trash.net>
@ 2005-11-23 10:38     ` YOSHIFUJI Hideaki / 吉藤英明
  2005-12-18 14:27       ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-11-23 10:38 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, davem, kozakai, kazunori

Hello.

In article <438270F2.3000603@trash.net> (at Tue, 22 Nov 2005 02:14:26 +0100), Patrick McHardy <kaber@trash.net> says:

> The easiest way would be to store nhoff somewhere in the skb and
> use it to continue at the next header. But I still hope there is
> a way without keeping data in the skb.

We've coded this up.

Though we still have another issue (the call chain issue) to resolve,
we're getting closer to the goal,
i.e. we should continue the loop for the common case.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8c5d600..1101851 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -272,6 +272,9 @@ struct sk_buff {
 	void			(*destructor)(struct sk_buff *skb);
 #ifdef CONFIG_NETFILTER
 	__u32			nfmark;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	unsigned int		nf_nhoff;
+#endif
 	struct nf_conntrack	*nfct;
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct sk_buff		*nfct_reasm;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 44b979a..0531d0a 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -463,6 +463,8 @@ extern int			ip6_dst_lookup(struct sock 
 extern int			ip6_output(struct sk_buff *skb);
 extern int			ip6_forward(struct sk_buff *skb);
 extern int			ip6_input(struct sk_buff *skb);
+extern int			ip6_input_finish2(struct sk_buff *skb,
+						  unsigned int nhoff);
 extern int			ip6_mc_input(struct sk_buff *skb);
 
 /*
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index e84b3cd..cd0606a 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -134,31 +134,13 @@ out:
  *	Deliver the packet to the host
  */
 
-
-static inline int ip6_input_finish(struct sk_buff *skb)
+int ip6_input_finish2(struct sk_buff *skb, unsigned int nhoff)
 {
 	struct inet6_protocol *ipprot;
 	struct sock *raw_sk;
-	unsigned int nhoff;
-	int nexthdr;
+	unsigned int nexthdr;
 	u8 hash;
 
-	skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr);
-
-	/*
-	 *	Parse extension headers
-	 */
-
-	nexthdr = skb->nh.ipv6h->nexthdr;
-	nhoff = offsetof(struct ipv6hdr, nexthdr);
-
-	/* Skip hop-by-hop options, they are already parsed. */
-	if (nexthdr == NEXTHDR_HOP) {
-		nhoff = sizeof(struct ipv6hdr);
-		nexthdr = skb->h.raw[0];
-		skb->h.raw += (skb->h.raw[1]+1)<<3;
-	}
-
 	rcu_read_lock();
 resubmit:
 	if (!pskb_pull(skb, skb->h.raw - skb->data))
@@ -221,6 +203,26 @@ discard:
 	return 0;
 }
 
+static inline int ip6_input_finish(struct sk_buff *skb)
+{
+	unsigned int nhoff;
+
+	skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr);
+
+	/*
+	 *	Parse extension headers
+	 */
+
+	nhoff = offsetof(struct ipv6hdr, nexthdr);
+
+	/* Skip hop-by-hop options, they are already parsed. */
+	if (skb->nh.ipv6h->nexthdr == NEXTHDR_HOP) {
+		nhoff = sizeof(struct ipv6hdr);
+		skb->h.raw += (skb->h.raw[1]+1)<<3;
+	}
+
+	return ip6_input_finish2(skb, nhoff);
+}
 
 int ip6_input(struct sk_buff *skb)
 {
diff --git a/net/ipv6/ipv6_syms.c b/net/ipv6/ipv6_syms.c
index 1648278..6051783 100644
--- a/net/ipv6/ipv6_syms.c
+++ b/net/ipv6/ipv6_syms.c
@@ -15,6 +15,7 @@ EXPORT_SYMBOL(ndisc_mc_map);
 EXPORT_SYMBOL(register_inet6addr_notifier);
 EXPORT_SYMBOL(unregister_inet6addr_notifier);
 EXPORT_SYMBOL(ip6_route_output);
+EXPORT_SYMBOL(ip6_input_finish2);
 EXPORT_SYMBOL(addrconf_lock);
 EXPORT_SYMBOL(ipv6_setsockopt);
 EXPORT_SYMBOL(ipv6_getsockopt);
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 9987416..2e3b28d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -9,6 +9,7 @@
  *		IPv6 support
  */
 
+#include <linux/config.h>
 #include <linux/module.h>
 #include <linux/string.h>
 #include <linux/netfilter.h>
@@ -17,6 +18,7 @@
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
+#include <net/ip6_route.h>
 #include <net/xfrm.h>
 
 static inline void ipip6_ecn_decapsulate(struct sk_buff *skb)
@@ -28,6 +30,25 @@ static inline void ipip6_ecn_decapsulate
 		IP6_ECN_set_ce(inner_iph);
 }
 
+#ifdef CONFIG_NETFILTER
+static inline int xfrm6_rcv_spi_finish2(struct sk_buff *skb)
+{
+	__skb_pull(skb, skb->h.raw - skb->nh.raw);
+	return ip6_input_finish2(skb, skb->nf_nhoff);
+}
+
+static inline int xfrm6_rcv_spi_finish(struct sk_buff *skb)
+{
+	if (skb->dst == NULL) {
+		ip6_route_input(skb);
+		return dst_input(skb);
+	}
+
+	return NF_HOOK(PF_INET6, NF_IP6_LOCAL_IN, skb, skb->dev, NULL,
+		       xfrm6_rcv_spi_finish2);
+}
+#endif
+
 int xfrm6_rcv_spi(struct sk_buff **pskb, unsigned int *nhoffp, u32 spi)
 {
 	struct sk_buff *skb = *pskb;
@@ -136,9 +157,10 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
 #ifdef CONFIG_NETFILTER
 		skb->nh.ipv6h->payload_len = htons(skb->len);
 		__skb_push(skb, skb->data - skb->nh.raw);
+		skb->nf_nhoff = nhoff;
 
 		NF_HOOK(PF_INET6, NF_IP6_PRE_ROUTING, skb, skb->dev, NULL,
-		        ip6_rcv_finish);
+		        xfrm6_rcv_spi_finish);
 		return -1;
 #else
 		return 1;

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA


* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-11-23 10:38     ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks YOSHIFUJI Hideaki / 吉藤英明
@ 2005-12-18 14:27       ` Patrick McHardy
  2005-12-18 15:15         ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-12-18 14:27 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: netdev, netfilter-devel, davem, kozakai, kazunori

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <438270F2.3000603@trash.net> (at Tue, 22 Nov 2005 02:14:26 +0100), Patrick McHardy <kaber@trash.net> says:
> 
> 
>>The easiest way would be to store nhoff somewhere in the skb and
>>use it to continue at the next header. But I still hope there is
>>a way without keeping data in the skb.
> 
> 
> We've coded this up.

How about this patch instead? It eliminates the nhoffp argument
to IPv6 protocol handlers by storing it in the IP6CB, which allows
ip6_input_finish to be called a second time and skip already
parsed headers, and also gets rid of the manual hopopts skipping.
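
The idea can be sketched in user space as follows. This is not the kernel code: `struct pkt`, `pkt_init`, `parse_one` and `input_finish` are hypothetical names, only destination options headers are recognised as extension headers, and the `budget` argument is an assumed device that models the detour through a netfilter hook by interrupting the walk. Because the next-header offset lives in the packet's own state rather than in a local variable, a second call resumes where the first one stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN	40u	/* fixed IPv6 header length */
#define NEXTHDR_OFF	6u	/* offset of nexthdr in the fixed header */
#define NEXTHDR_DEST	60u	/* destination options header */

/* Hypothetical packet state; nhoff plays the role of IP6CB(skb)->nhoff. */
struct pkt {
	const uint8_t *nh;	/* start of the IPv6 header */
	size_t len;		/* total bytes available at nh */
	size_t h;		/* offset of the header to parse next */
	uint32_t nhoff;		/* offset of the byte naming that header's type */
};

static void pkt_init(struct pkt *p, const uint8_t *buf, size_t len)
{
	assert(len >= IPV6_HDR_LEN);
	p->nh = buf;
	p->len = len;
	p->h = IPV6_HDR_LEN;
	p->nhoff = NEXTHDR_OFF;
}

/* Parse one header: 1 = extension header consumed, 0 = transport reached, -1 = malformed. */
static int parse_one(struct pkt *p)
{
	size_t hdrlen;

	assert(p->nhoff < p->len);
	if (p->nh[p->nhoff] != NEXTHDR_DEST)
		return 0;
	if (p->h + 2 > p->len)
		return -1;
	hdrlen = ((size_t)p->nh[p->h + 1] + 1) << 3;
	if (p->h + hdrlen > p->len)
		return -1;
	/* The first byte of an extension header names the header after it. */
	p->nhoff = (uint32_t)p->h;
	p->h += hdrlen;
	return 1;
}

/*
 * Walk the header chain. Returns the transport protocol number, -1 on a
 * malformed packet, or -2 if interrupted after `budget` extension headers
 * (budget <= 0 means no limit).
 */
static int input_finish(struct pkt *p, int budget)
{
	for (;;) {
		int ret = parse_one(p);

		if (ret < 0)
			return -1;
		if (ret == 0)
			return p->nh[p->nhoff];
		if (budget > 0 && --budget == 0)
			return -2;
	}
}
```

Calling `input_finish` once with a budget of 1 and then again without a limit parses each header exactly once, which is the property the IP6CB change is after.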

> Though we still have another issue (the call chain issue) to resolve,
> we're getting closer to the goal,
> i.e. we should continue the loop for the common case.

I'll look into this now.

[-- Attachment #2: 04.diff --]
[-- Type: text/x-patch, Size: 12821 bytes --]

[IPv6]: Move nextheader offset to the IP6CB

Move nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip already
parsed headers. As a nice side-effect this gets rid of the manual
hopopts skipping.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 6678dcab480a4c410fe7984c43a9d1830c9b9912
tree 5eb3a8f6f5379c5f2b91af6d37dab9ca880681ae
parent 4cd76cb0703ffcb89d6c49e4d9171d7b6a116cd9
author Patrick McHardy <kaber@trash.net> Sun, 18 Dec 2005 06:52:02 +0100
committer Patrick McHardy <kaber@trash.net> Sun, 18 Dec 2005 06:52:02 +0100

 include/linux/ipv6.h    |    1 +
 include/net/protocol.h  |    2 +-
 include/net/xfrm.h      |    6 +++---
 net/dccp/ipv6.c         |    2 +-
 net/ipv6/exthdrs.c      |   19 ++++++++++++-------
 net/ipv6/icmp.c         |    4 ++--
 net/ipv6/ip6_input.c    |   21 ++++++---------------
 net/ipv6/ip6_tunnel.c   |    2 +-
 net/ipv6/reassembly.c   |   11 +++++------
 net/ipv6/tcp_ipv6.c     |    2 +-
 net/ipv6/udp.c          |    2 +-
 net/ipv6/xfrm6_input.c  |    8 ++++----
 net/ipv6/xfrm6_tunnel.c |    6 +++---
 net/sctp/ipv6.c         |    2 +-
 14 files changed, 42 insertions(+), 46 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index a0d0489..04bd248 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -190,6 +190,7 @@ struct inet6_skb_parm {
 	__u16			srcrt;
 	__u16			dst1;
 	__u16			lastopt;
+	__u32			nhoff;
 };
 
 #define IP6CB(skb)	((struct inet6_skb_parm*)((skb)->cb))
diff --git a/include/net/protocol.h b/include/net/protocol.h
index a29cb29..7006a25 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -43,7 +43,7 @@ struct net_protocol {
 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
 struct inet6_protocol 
 {
-	int	(*handler)(struct sk_buff **skb, unsigned int *nhoffp);
+	int	(*handler)(struct sk_buff **skb);
 
 	void	(*err_handler)(struct sk_buff *skb,
 			       struct inet6_skb_parm *opt,
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index f327cb3..1b2876c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -830,7 +830,7 @@ struct xfrm_tunnel {
 };
 
 struct xfrm6_tunnel {
-	int (*handler)(struct sk_buff **pskb, unsigned int *nhoffp);
+	int (*handler)(struct sk_buff **pskb);
 	void (*err_handler)(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			    int type, int code, int offset, __u32 info);
 };
@@ -867,8 +867,8 @@ extern int xfrm4_rcv(struct sk_buff *skb
 extern int xfrm4_output(struct sk_buff *skb);
 extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler);
 extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler);
-extern int xfrm6_rcv_spi(struct sk_buff **pskb, unsigned int *nhoffp, u32 spi);
-extern int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);
+extern int xfrm6_rcv_spi(struct sk_buff **pskb, u32 spi);
+extern int xfrm6_rcv(struct sk_buff **pskb);
 extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler);
 extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler);
 extern u32 xfrm6_tunnel_alloc_spi(xfrm_address_t *saddr);
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 599b0be..67518c5 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -1027,7 +1027,7 @@ discard:
 	return 0;
 }
 
-static int dccp_v6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int dccp_v6_rcv(struct sk_buff **pskb)
 {
 	const struct dccp_hdr *dh;
 	struct sk_buff *skb = *pskb;
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 113374d..2a1e7e4 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -152,7 +152,7 @@ static struct tlvtype_proc tlvprocdestop
 	{-1,			NULL}
 };
 
-static int ipv6_destopt_rcv(struct sk_buff **skbp, unsigned int *nhoffp)
+static int ipv6_destopt_rcv(struct sk_buff **skbp)
 {
 	struct sk_buff *skb = *skbp;
 	struct inet6_skb_parm *opt = IP6CB(skb);
@@ -169,7 +169,7 @@ static int ipv6_destopt_rcv(struct sk_bu
 
 	if (ip6_parse_tlv(tlvprocdestopt_lst, skb)) {
 		skb->h.raw += ((skb->h.raw[1]+1)<<3);
-		*nhoffp = opt->dst1;
+		opt->nhoff = opt->dst1;
 		return 1;
 	}
 
@@ -192,7 +192,7 @@ void __init ipv6_destopt_init(void)
   NONE header. No data in packet.
  ********************************/
 
-static int ipv6_nodata_rcv(struct sk_buff **skbp, unsigned int *nhoffp)
+static int ipv6_nodata_rcv(struct sk_buff **skbp)
 {
 	struct sk_buff *skb = *skbp;
 
@@ -215,7 +215,7 @@ void __init ipv6_nodata_init(void)
   Routing header.
  ********************************/
 
-static int ipv6_rthdr_rcv(struct sk_buff **skbp, unsigned int *nhoffp)
+static int ipv6_rthdr_rcv(struct sk_buff **skbp)
 {
 	struct sk_buff *skb = *skbp;
 	struct inet6_skb_parm *opt = IP6CB(skb);
@@ -249,7 +249,7 @@ looped_back:
 		skb->h.raw += (hdr->hdrlen + 1) << 3;
 		opt->dst0 = opt->dst1;
 		opt->dst1 = 0;
-		*nhoffp = (&hdr->nexthdr) - skb->nh.raw;
+		opt->nhoff = (&hdr->nexthdr) - skb->nh.raw;
 		return 1;
 	}
 
@@ -487,9 +487,14 @@ static struct tlvtype_proc tlvprochopopt
 
 int ipv6_parse_hopopts(struct sk_buff *skb, int nhoff)
 {
-	IP6CB(skb)->hop = sizeof(struct ipv6hdr);
-	if (ip6_parse_tlv(tlvprochopopt_lst, skb))
+	struct inet6_skb_parm *opt = IP6CB(skb);
+
+	opt->hop = sizeof(struct ipv6hdr);
+	if (ip6_parse_tlv(tlvprochopopt_lst, skb)) {
+		skb->h.raw += (skb->h.raw[1]+1)<<3;
+		opt->nhoff = sizeof(struct ipv6hdr);
 		return sizeof(struct ipv6hdr);
+	}
 	return -1;
 }
 
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 34a3322..d415c00 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -79,7 +79,7 @@ DEFINE_SNMP_STAT(struct icmpv6_mib, icmp
 static DEFINE_PER_CPU(struct socket *, __icmpv6_socket) = NULL;
 #define icmpv6_socket	__get_cpu_var(__icmpv6_socket)
 
-static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp);
+static int icmpv6_rcv(struct sk_buff **pskb);
 
 static struct inet6_protocol icmpv6_protocol = {
 	.handler	=	icmpv6_rcv,
@@ -569,7 +569,7 @@ static void icmpv6_notify(struct sk_buff
  *	Handle icmp messages
  */
 
-static int icmpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int icmpv6_rcv(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct net_device *dev = skb->dev;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index a6026d2..13d7241 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -97,6 +97,9 @@ int ipv6_rcv(struct sk_buff *skb, struct
 	if (hdr->version != 6)
 		goto err;
 
+	skb->h.raw = (u8 *)(hdr + 1);
+	IP6CB(skb)->nhoff = offsetof(struct ipv6hdr, nexthdr);
+
 	pkt_len = ntohs(hdr->payload_len);
 
 	/* pkt_len may be zero if Jumbo payload option is present */
@@ -111,8 +114,7 @@ int ipv6_rcv(struct sk_buff *skb, struct
 	}
 
 	if (hdr->nexthdr == NEXTHDR_HOP) {
-		skb->h.raw = (u8*)(hdr+1);
-		if (ipv6_parse_hopopts(skb, offsetof(struct ipv6hdr, nexthdr)) < 0) {
+		if (ipv6_parse_hopopts(skb, IP6CB(skb)->nhoff) < 0) {
 			IP6_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
 			return 0;
 		}
@@ -143,26 +145,15 @@ static inline int ip6_input_finish(struc
 	int nexthdr;
 	u8 hash;
 
-	skb->h.raw = skb->nh.raw + sizeof(struct ipv6hdr);
-
 	/*
 	 *	Parse extension headers
 	 */
 
-	nexthdr = skb->nh.ipv6h->nexthdr;
-	nhoff = offsetof(struct ipv6hdr, nexthdr);
-
-	/* Skip hop-by-hop options, they are already parsed. */
-	if (nexthdr == NEXTHDR_HOP) {
-		nhoff = sizeof(struct ipv6hdr);
-		nexthdr = skb->h.raw[0];
-		skb->h.raw += (skb->h.raw[1]+1)<<3;
-	}
-
 	rcu_read_lock();
 resubmit:
 	if (!pskb_pull(skb, skb->h.raw - skb->data))
 		goto discard;
+	nhoff = IP6CB(skb)->nhoff;
 	nexthdr = skb->nh.raw[nhoff];
 
 	raw_sk = sk_head(&raw_v6_htable[nexthdr & (MAX_INET_PROTOS - 1)]);
@@ -194,7 +185,7 @@ resubmit:
 		    !xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) 
 			goto discard;
 		
-		ret = ipprot->handler(&skb, &nhoff);
+		ret = ipprot->handler(&skb);
 		if (ret > 0)
 			goto resubmit;
 		else if (ret == 0)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index e315d0f..f079621 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -510,7 +510,7 @@ static inline void ip6ip6_ecn_decapsulat
  **/
 
 static int 
-ip6ip6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+ip6ip6_rcv(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct ipv6hdr *ipv6h;
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 5d316cb..15e1456 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -581,7 +581,6 @@ err:
  *	the last and the first frames arrived and all the bits are here.
  */
 static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff **skb_in,
-			  unsigned int *nhoffp,
 			  struct net_device *dev)
 {
 	struct sk_buff *fp, *head = fq->fragments;
@@ -654,6 +653,7 @@ static int ip6_frag_reasm(struct frag_qu
 	head->dev = dev;
 	skb_set_timestamp(head, &fq->stamp);
 	head->nh.ipv6h->payload_len = htons(payload_len);
+	IP6CB(head)->nhoff = nhoff;
 
 	*skb_in = head;
 
@@ -663,7 +663,6 @@ static int ip6_frag_reasm(struct frag_qu
 
 	IP6_INC_STATS_BH(IPSTATS_MIB_REASMOKS);
 	fq->fragments = NULL;
-	*nhoffp = nhoff;
 	return 1;
 
 out_oversize:
@@ -678,7 +677,7 @@ out_fail:
 	return -1;
 }
 
-static int ipv6_frag_rcv(struct sk_buff **skbp, unsigned int *nhoffp)
+static int ipv6_frag_rcv(struct sk_buff **skbp)
 {
 	struct sk_buff *skb = *skbp; 
 	struct net_device *dev = skb->dev;
@@ -710,7 +709,7 @@ static int ipv6_frag_rcv(struct sk_buff 
 		skb->h.raw += sizeof(struct frag_hdr);
 		IP6_INC_STATS_BH(IPSTATS_MIB_REASMOKS);
 
-		*nhoffp = (u8*)fhdr - skb->nh.raw;
+		IP6CB(skb)->nhoff = (u8*)fhdr - skb->nh.raw;
 		return 1;
 	}
 
@@ -722,11 +721,11 @@ static int ipv6_frag_rcv(struct sk_buff 
 
 		spin_lock(&fq->lock);
 
-		ip6_frag_queue(fq, skb, fhdr, *nhoffp);
+		ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		if (fq->last_in == (FIRST_IN|LAST_IN) &&
 		    fq->meat == fq->len)
-			ret = ip6_frag_reasm(fq, skbp, nhoffp, dev);
+			ret = ip6_frag_reasm(fq, skbp, dev);
 
 		spin_unlock(&fq->lock);
 		fq_put(fq, NULL);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2947bc5..a25f4e8 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1153,7 +1153,7 @@ ipv6_pktoptions:
 	return 0;
 }
 
-static int tcp_v6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int tcp_v6_rcv(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct tcphdr *th;	
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d8538dc..c476488 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -435,7 +435,7 @@ out:
 	read_unlock(&udp_hash_lock);
 }
 
-static int udpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int udpv6_rcv(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct sock *sk;
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 28c29d7..1079e47 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -26,7 +26,7 @@ static inline void ipip6_ecn_decapsulate
 		IP6_ECN_set_ce(inner_iph);
 }
 
-int xfrm6_rcv_spi(struct sk_buff **pskb, unsigned int *nhoffp, u32 spi)
+int xfrm6_rcv_spi(struct sk_buff **pskb, u32 spi)
 {
 	struct sk_buff *skb = *pskb;
 	int err;
@@ -38,7 +38,7 @@ int xfrm6_rcv_spi(struct sk_buff **pskb,
 	int nexthdr;
 	unsigned int nhoff;
 
-	nhoff = *nhoffp;
+	nhoff = IP6CB(skb)->nhoff;
 	nexthdr = skb->nh.raw[nhoff];
 
 	seq = 0;
@@ -144,7 +144,7 @@ drop:
 
 EXPORT_SYMBOL(xfrm6_rcv_spi);
 
-int xfrm6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+int xfrm6_rcv(struct sk_buff **pskb)
 {
-	return xfrm6_rcv_spi(pskb, nhoffp, 0);
+	return xfrm6_rcv_spi(pskb, 0);
 }
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index fbef782..da09ff2 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -397,7 +397,7 @@ int xfrm6_tunnel_deregister(struct xfrm6
 
 EXPORT_SYMBOL(xfrm6_tunnel_deregister);
 
-static int xfrm6_tunnel_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int xfrm6_tunnel_rcv(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct xfrm6_tunnel *handler = xfrm6_tunnel_handler;
@@ -405,11 +405,11 @@ static int xfrm6_tunnel_rcv(struct sk_bu
 	u32 spi;
 
 	/* device-like_ip6ip6_handler() */
-	if (handler && handler->handler(pskb, nhoffp) == 0)
+	if (handler && handler->handler(pskb) == 0)
 		return 0;
 
 	spi = xfrm6_tunnel_spi_lookup((xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(pskb, nhoffp, spi);
+	return xfrm6_rcv_spi(pskb, spi);
 }
 
 static void xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index fa3be2b..bb8f8cf 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -905,7 +905,7 @@ static struct inet_protosw sctpv6_stream
 	.flags         = SCTP_PROTOSW_FLAG,
 };
 
-static int sctp6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
+static int sctp6_rcv(struct sk_buff **pskb)
 {
 	return sctp_rcv(*pskb) ? -1 : 0;
 }


* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-18 14:27       ` Patrick McHardy
@ 2005-12-18 15:15         ` YOSHIFUJI Hideaki / 吉藤英明
  2005-12-18 22:59           ` Patrick McHardy
  0 siblings, 1 reply; 59+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-12-18 15:15 UTC (permalink / raw)
  To: kaber, davem; +Cc: kozakai, netdev, netfilter-devel, kazunori

In article <43A571B5.205@trash.net> (at Sun, 18 Dec 2005 15:27:01 +0100), Patrick McHardy <kaber@trash.net> says:

> YOSHIFUJI Hideaki wrote:
> > In article <438270F2.3000603@trash.net> (at Tue, 22 Nov 2005 02:14:26 +0100), Patrick McHardy <kaber@trash.net> says:
> > 
> > 
> >>The easiest way would be to store nhoff somewhere in the skb and
> >>use it to continue at the next header. But I still hope there is
> >>a way without keeping data in the skb.
> > 
> > 
> > We've coded up this.
> 
> How about this patch instead? It eliminates the nhoffp argument
> to IPv6 protocol handlers by storing it in the IP6CB, which allows
> to call ip6_input_finish a second time and have it skip already
> parsed headers and also gets rid of the manual hopopts skipping.

The idea of storing it in the IP6CB seems sane to me.

BTW, we're now using all of skb->cb
(and we even exceed it with the mobile-ipv6 extensions)...

--yoshfuji
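
A minimal user-space sketch of the idea discussed in this message: the
next-header offset is kept in a per-packet control block instead of being
passed to every handler through a pointer argument, so the dispatch loop
can be entered a second time and resumes after the headers it has already
parsed. Everything here is a hypothetical stand-in and not the kernel's
code: struct pkt, struct pkt_cb, cb_get_nhoff(), cb_set_nhoff(), deliver(),
the CB_SIZE value and the two-byte toy header layout are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CB_SIZE      48   /* illustrative scratch-area size (assumption) */
#define NEXTHDR_NONE 59   /* "no next header": stops the walk */

/* Hypothetical packet: a scratch area plus the raw bytes. */
struct pkt {
	unsigned char  cb[CB_SIZE];
	const uint8_t *data;
	size_t         len;
};

/* Hypothetical control block stored in pkt->cb. */
struct pkt_cb {
	uint32_t nhoff;   /* offset of the byte naming the next header type */
};

_Static_assert(sizeof(struct pkt_cb) <= CB_SIZE, "control block too large");

static uint32_t cb_get_nhoff(const struct pkt *p)
{
	struct pkt_cb cb;

	memcpy(&cb, p->cb, sizeof(cb));
	return cb.nhoff;
}

static void cb_set_nhoff(struct pkt *p, uint32_t nhoff)
{
	struct pkt_cb cb = { .nhoff = nhoff };

	memcpy(p->cb, &cb, sizeof(cb));
}

/*
 * Toy header layout: byte 0 = type of the following header,
 * byte 1 = length of this header in bytes (at least 2).
 * Walks at most max_hdrs headers, resuming from the stored offset.
 * Returns the number of headers consumed, or -1 if malformed.
 */
static int deliver(struct pkt *p, int max_hdrs)
{
	int consumed = 0;

	while (consumed < max_hdrs) {
		uint32_t nhoff = cb_get_nhoff(p);
		uint32_t next;

		if ((size_t)nhoff + 2 > p->len)
			return -1;
		if (p->data[nhoff] == NEXTHDR_NONE)
			break;
		if (p->data[nhoff + 1] < 2)
			return -1;
		next = nhoff + p->data[nhoff + 1];
		if ((size_t)next + 2 > p->len)
			return -1;
		cb_set_nhoff(p, next);   /* the "handler" advances the offset */
		consumed++;
	}
	return consumed;
}
```

Calling deliver() with max_hdrs of 1 and then calling it again on the same
packet continues from the stored offset rather than from the start, which
is the property the second pass through the input path relies on.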


* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-18 15:15         ` YOSHIFUJI Hideaki / 吉藤英明
@ 2005-12-18 22:59           ` Patrick McHardy
  2005-12-19  3:46             ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 1 reply; 59+ messages in thread
From: Patrick McHardy @ 2005-12-18 22:59 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: netdev, netfilter-devel, kazunori, kozakai, davem

YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article <43A571B5.205@trash.net> (at Sun, 18 Dec 2005 15:27:01 +0100), Patrick McHardy <kaber@trash.net> says:
> 
>>How about this patch instead? It eliminates the nhoffp argument
>>to IPv6 protocol handlers by storing it in the IP6CB, which allows
>>to call ip6_input_finish a second time and have it skip already
>>parsed headers and also gets rid of the manual hopopts skipping.
> 
> 
> The idea of storing it in the IP6CB seems sane to me.
> 
> BTW, we're now using all of skb->cb
> (and we even exceed it with the mobile-ipv6 extensions)...

Not in mainline so far, so maybe we can fit your extensions
and my patches without the mobile extensions, that apparently
exceed the CB anyway, in there for now. Can I look at those
patches somewhere? BTW, other fields in the IP6CB seem to
store offsets in u16 fields, is this OK for nhoff too? I
thought with jumbo options I need to use a u32 field.
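
Whether an extra field still fits in the control block can be checked at
compile time. The sketch below is an assumption-laden illustration, not the
real inet6_skb_parm: the struct layout, the field names and the CB_SIZE
value are hypothetical and only show the shape of such a check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_SIZE 40   /* illustrative scratch-area size (assumption) */

/* Hypothetical control block: several u16 offsets plus a u32 nhoff. */
struct demo_cb {
	int32_t  iif;
	uint16_t ra;
	uint16_t hop;
	uint16_t dst0;
	uint16_t srcrt;
	uint16_t dst1;
	uint32_t nhoff;   /* the newly added field */
};

/* Fails the build, not the running system, if the block outgrows the area. */
_Static_assert(sizeof(struct demo_cb) <= CB_SIZE,
	       "demo_cb does not fit in the scratch area");

/* Bytes still free in the scratch area after the control block. */
static size_t cb_spare_bytes(void)
{
	return CB_SIZE - sizeof(struct demo_cb);
}
```

With a check like this, adding further per-packet fields produces a build
error as soon as the control block no longer fits, rather than silent
corruption of whatever follows the scratch area.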


* Re: [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks
  2005-12-18 22:59           ` Patrick McHardy
@ 2005-12-19  3:46             ` YOSHIFUJI Hideaki / 吉藤英明
  0 siblings, 0 replies; 59+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2005-12-19  3:46 UTC (permalink / raw)
  To: kaber; +Cc: netdev, netfilter-devel, kazunori, kozakai, davem

In article <43A5E9D7.8050705@trash.net> (at Sun, 18 Dec 2005 23:59:35 +0100), Patrick McHardy <kaber@trash.net> says:

> YOSHIFUJI Hideaki wrote:
:
> > BTW, we're now using all of skb->cb
> > (and we even exceed it with the mobile-ipv6 extensions)...
> 
> Not in mainline so far, so maybe we can fit your extensions
> and my patches without the mobile extensions, that apparently
> exceed the CB anyway, in there for now. Can I look at those
> patches somewhere? BTW, other fields in the IP6CB seem to
> store offsets in u16 fields, is this OK for nhoff too? I
> thought with jumbo options I need to use a u32 field.

Well, don't worry about it too much.
I just wanted to note that we're about to exceed skb->cb.
I will probably enlarge skb->cb if needed.

And, yes, good point.
I think nhoff should be u32 because it is critical.
In theory, u32 is definitely needed for others as well...

--yoshfuji
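
The u16 versus u32 concern can be illustrated with a small stand-alone
sketch. The IPv6 jumbo payload option allows packets longer than 65535
octets, so an offset into such a packet can in principle exceed what a
16-bit field holds. The helper names below (store_off16, store_off32,
fits_in_u16) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Storing an offset in a 16-bit field keeps only the low 16 bits. */
static uint16_t store_off16(uint32_t off)
{
	return (uint16_t)off;
}

/* A 32-bit field stores the same offset unchanged. */
static uint32_t store_off32(uint32_t off)
{
	return off;
}

/* Returns 1 if the offset survives a round trip through a u16 field. */
static int fits_in_u16(uint32_t off)
{
	return store_off16(off) == off;
}
```

An offset of 70000 stored through store_off16() comes back as 4464
(70000 modulo 65536), so parsing would silently resume at the wrong place,
while store_off32() returns it intact.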


end of thread, other threads:[~2005-12-19  3:46 UTC | newest]

Thread overview: 59+ messages
2005-11-20 16:31 [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
2005-11-20 16:31 ` [PATCH 01/13]: [NETFILTER]: Remove okfn usage in ip_vs_core.c Patrick McHardy
2005-11-20 16:31 ` [PATCH 02/13]: [NETFILTER]: Call POST_ROUTING hook before fragmentation Patrick McHardy
2005-11-20 16:31 ` [PATCH 03/13]: [IPV4]: Replace dst_output by ip_dst_output Patrick McHardy
2005-11-20 16:31 ` [PATCH 04/13]: [IPV6]: Replace dst_output by ip6_dst_output Patrick McHardy
2005-11-20 16:31 ` [PATCH 05/13]: [IPV4/6]: Netfilter IPsec output hooks Patrick McHardy
2005-11-22  4:40   ` Herbert Xu
2005-11-22  4:53     ` Patrick McHardy
2005-11-22  5:13       ` Patrick McHardy
2005-11-22 10:30       ` Herbert Xu
2005-11-22 10:31         ` Herbert Xu
2005-11-22 12:13           ` Herbert Xu
2005-11-28  1:07             ` Patrick McHardy
2005-11-28  4:56               ` Herbert Xu
2005-11-28 12:25                 ` Patrick McHardy
2005-12-04 22:09                 ` Patrick McHardy
2005-12-04 22:15                   ` Herbert Xu
2005-11-20 16:31 ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks Patrick McHardy
2005-11-21  4:42   ` Yasuyuki KOZAKAI
     [not found]   ` <200511210442.jAL4gPoO001846@toshiba.co.jp>
2005-11-21  6:52     ` Patrick McHardy
2005-11-21  7:00       ` David S. Miller
2005-11-21  7:47         ` Herbert Xu
2005-11-21 16:52         ` Patrick McHardy
2005-11-21 10:53       ` Yasuyuki KOZAKAI
     [not found]       ` <200511211053.jALAro04019574@toshiba.co.jp>
2005-11-21 16:34         ` Patrick McHardy
     [not found]   ` <438185ED.3050005@miyazawa.org>
2005-11-21  8:50     ` YOSHIFUJI Hideaki / 吉藤英明
2005-11-21 16:29       ` Patrick McHardy
2005-12-01  1:27   ` Herbert Xu
2005-12-04 22:06     ` Patrick McHardy
2005-12-04 22:10       ` Herbert Xu
2005-12-04 22:49         ` Patrick McHardy
2005-11-20 16:31 ` [PATCH 07/13]: [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder Patrick McHardy
2005-11-28 21:06   ` Herbert Xu
2005-11-29  7:02     ` Patrick McHardy
2005-11-29  7:34       ` Herbert Xu
2005-11-29  7:49         ` David S. Miller
2005-11-29 11:31           ` Herbert Xu
2005-11-20 16:31 ` [PATCH 08/13]: [NETFILTER]: Use conntrack information to determine if packet was NATed Patrick McHardy
2005-11-20 16:31 ` [PATCH 09/13]: [NETFILTER]: Redo policy lookups after NAT when neccessary Patrick McHardy
2005-11-20 16:43   ` Patrick McHardy
2005-11-20 16:31 ` [PATCH 10/13]: [NETFILTER]: Keep the conntrack reference until after policy checks Patrick McHardy
2005-11-20 16:31 ` [PATCH 11/13]: [NETFILTER]: Handle NAT in IPsec " Patrick McHardy
2005-11-20 16:31 ` [PATCH 12/13]: [NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack Patrick McHardy
2005-11-20 16:31 ` [PATCH 13/13]: [NETFILTER]: Add ipt_policy/ip6t_policy matches Patrick McHardy
     [not found] ` <200511201902.10179.lists@naasa.net>
2005-11-20 18:07   ` [PATCH 00/13]: Netfilter IPsec support Patrick McHardy
2005-11-22 22:34 ` David S. Miller
2005-11-22 22:38   ` YOSHIFUJI Hideaki / 吉藤英明
2005-11-23  1:20     ` Patrick McHardy
2005-11-23  1:17   ` Patrick McHardy
2005-11-23  1:35     ` Herbert Xu
2005-11-23  3:36       ` David S. Miller
2005-11-23  4:47         ` Herbert Xu
2005-11-23  4:52         ` Yasuyuki KOZAKAI
2005-11-23  3:35     ` David S. Miller
     [not found] <4381F4C7.9070903@trash.net>
     [not found] ` <43826F77.7040502@miyazawa.org>
     [not found]   ` <438270F2.3000603@trash.net>
2005-11-23 10:38     ` [PATCH 06/13]: [IPV4/6]: Netfilter IPsec input hooks YOSHIFUJI Hideaki / 吉藤英明
2005-12-18 14:27       ` Patrick McHardy
2005-12-18 15:15         ` YOSHIFUJI Hideaki / 吉藤英明
2005-12-18 22:59           ` Patrick McHardy
2005-12-19  3:46             ` YOSHIFUJI Hideaki / 吉藤英明
