NAT before IPsec with 2.6

All of lore.kernel.org
 help / color / mirror / Atom feed

* NAT before IPsec with 2.6
@ 2004-01-21 12:29 Michal Ludvig
  2004-01-23  6:57 ` Willy Tarreau
  2004-01-23 12:31 ` Henrik Nordstrom
  0 siblings, 2 replies; 63+ messages in thread
From: Michal Ludvig @ 2004-01-21 12:29 UTC (permalink / raw)
  To: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

Hi there,

with the native IPsec stack in 2.6.1 kernel it's impossible to do SNAT 
on outgoing packets that are leaving through an IPsec link. The problem 
is that the forwarded packets are encapsulated (with IPSec ESP/AH) yet 
before the POSTROUTING chain is reached.

So I hacked up this simple solution - I understand that it won't be 
accepted as-is, because now the POSTROUTING chain is reached twice (once 
from ip_forward and once from ip_output). But anyways - is this a 
correct approach and should I go this way with some cleanups?

Or is there another approach? I was thinking about extending the FORWARD 
chain and let it do SNAT/MASQUERADE. That would be cleaner, but more 
intrusive than my patch and would probably also require some changes in 
the userspace.

Comments, opinions, ...?

(Please Cc me on replies, thanks!)

Michal Ludvig
-- 
SUSE Labs                    mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373        http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.

[-- Attachment #2: kernel-nat-before-ipsec.diff --]
[-- Type: text/plain, Size: 1188 bytes --]

diff -rup linux-2.6.1.orig/net/ipv4/ip_forward.c linux-2.6.1.naga/net/ipv4/ip_forward.c
--- linux-2.6.1.orig/net/ipv4/ip_forward.c	2004-01-16 14:28:39.000000000 +0100
+++ linux-2.6.1.naga/net/ipv4/ip_forward.c	2004-01-20 16:11:09.904466001 +0100
@@ -46,12 +46,30 @@ static inline int ip_forward_finish(stru
 {
 	struct ip_options * opt	= &(IPCB(skb)->opt);
 
+	if (!xfrm4_route_forward(skb))
+		goto drop;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug &= ~(1 << NF_IP_POST_ROUTING);
+#endif
+
 	IP_INC_STATS_BH(IpForwDatagrams);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
 	return dst_output(skb);
+
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+
+static inline int ip_forward_postroute(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable*)skb->dst;
+
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, rt->u.dst.dev, ip_forward_finish);
 }
 
 int ip_forward(struct sk_buff *skb)
@@ -109,7 +131,7 @@ int ip_forward(struct sk_buff *skb)
 	skb->priority = rt_tos2priority(iph->tos);
 
 	return NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, rt->u.dst.dev,
-		       ip_forward_finish);
+		       ip_forward_postroute);
 
 sr_failed:
         /*

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
@ 2004-01-23  6:57 ` Willy Tarreau
  2004-01-23 12:31 ` Henrik Nordstrom
  1 sibling, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-23  6:57 UTC (permalink / raw)
  To: Michal Ludvig; +Cc: netfilter-devel

Hi,

On Wed, Jan 21, 2004 at 01:29:09PM +0100, Michal Ludvig wrote:
> So I hacked up this simple solution - I understand that it won't be 
> accepted as-is, because now the POSTROUTING chain is reached twice (once 
> from ip_forward and once from ip_output). But anyways - is this a 
> correct approach and should I go this way with some cleanups?

I don't know if it's a correct approach, but I really think that all tables
and chains should apply to a consistent level of encapsulation, whatever it
is. Before, it was easy to process the packet at different levels because it
was sent to one (or even several) interemediate virtual interfaces. To keep
control of what goes in and out, now we should :

  1) re-evaluate all previously evaluated chains when a packet has been
     decapsulated ;
  2) re-inject encapsulated packets into the output & postrouting chains
     (it's a locally generated packet after all, it has no reason to bypass
     output filtering).
  3) keep assigned fwmark between passes so that we can track what each
     packet does.

I believe that 1) & 3) are already done, but I don't know if there's a general
consensus towards or against 2).

Cheers,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
  2004-01-23  6:57 ` Willy Tarreau
@ 2004-01-23 12:31 ` Henrik Nordstrom
  2004-01-23 13:31   ` Michal Ludvig
  1 sibling, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 12:31 UTC (permalink / raw)
  To: Michal Ludvig; +Cc: netfilter-devel

On Wed, 21 Jan 2004, Michal Ludvig wrote:

> So I hacked up this simple solution - I understand that it won't be 
> accepted as-is, because now the POSTROUTING chain is reached twice (once 
> from ip_forward and once from ip_output). But anyways - is this a 
> correct approach and should I go this way with some cleanups?

The ip_forward call should be moved to just before where the packets gets
encapsulated when using IPSec. When using IPSec this is the conceptual 
POSTROUTING point in the stack for the plain packets.

Other than that I see no direct problems.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 12:31 ` Henrik Nordstrom
@ 2004-01-23 13:31   ` Michal Ludvig
  2004-01-23 14:24     ` Henrik Nordstrom
  0 siblings, 1 reply; 63+ messages in thread
From: Michal Ludvig @ 2004-01-23 13:31 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: netfilter-devel

Henrik Nordstrom told me that:
> On Wed, 21 Jan 2004, Michal Ludvig wrote:
> 
>>So I hacked up this simple solution - I understand that it won't be 
>>accepted as-is, because now the POSTROUTING chain is reached twice (once 
>>from ip_forward and once from ip_output). But anyways - is this a 
>>correct approach and should I go this way with some cleanups?
> 
> The ip_forward call should be moved to just before where the packets gets
> encapsulated when using IPSec. When using IPSec this is the conceptual 
> POSTROUTING point in the stack for the plain packets.

Without my patch the path is:

	ip_forward()
	     |
	check xfrm policy (here it's decided whether or not to encrypt)
	     |
	NF_IP_FORWARD (unfortunately no NAT available here...)
	     |
	ip_forward_finish()
	     |
	dst_output() => esp_output()

With my patch the path is:

	ip_forward()
	     |
	check xfrm policy (could probably be omitted, we'll do it after
	                   NAT once again)
	     |
	NF_IP_FORWARD (the packet may be e.g. dropped here)
	     |
	ip_forward_postroute()
	     |
	NF_IP_POST_ROUTING (do NAT here)
	     |
	ip_forward_finish()
	     |
	check xfrm policy (here it's decided whether or not to encrypt)
	     |
	dst_output() => esp_output()

I.e. the postrouting on the unencrypted packet is really called right 
before it gets encrypted. And the encrypted packet then hits the 
POSTROUTING again, but it's already a different packet, actually.

Was this your point or did I misunderstand you?

Michal Ludvig
-- 
SUSE Labs                    mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373        http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 13:31   ` Michal Ludvig
@ 2004-01-23 14:24     ` Henrik Nordstrom
  2004-01-23 14:40       ` Michal Ludvig
  2004-01-23 15:51       ` Tom Eastep
  0 siblings, 2 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 14:24 UTC (permalink / raw)
  To: Michal Ludvig; +Cc: netfilter-devel

On Fri, 23 Jan 2004, Michal Ludvig wrote:

> I.e. the postrouting on the unencrypted packet is really called right 
> before it gets encrypted. And the encrypted packet then hits the 
> POSTROUTING again, but it's already a different packet, actually.

My issue is with packets not destinated for an IPSec tunnel.. from what I 
read your patch these will hit POSTROUTING twice. But maybe I misread your 
patch?

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 14:24     ` Henrik Nordstrom
@ 2004-01-23 14:40       ` Michal Ludvig
  2004-01-23 15:56         ` Henrik Nordstrom
  2004-01-23 15:51       ` Tom Eastep
  1 sibling, 1 reply; 63+ messages in thread
From: Michal Ludvig @ 2004-01-23 14:40 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: netfilter-devel

Henrik Nordstrom told me that:
> On Fri, 23 Jan 2004, Michal Ludvig wrote:
> 
>>I.e. the postrouting on the unencrypted packet is really called right 
>>before it gets encrypted. And the encrypted packet then hits the 
>>POSTROUTING again, but it's already a different packet, actually.
> 
> My issue is with packets not destinated for an IPSec tunnel.. from what I 
> read your patch these will hit POSTROUTING twice. But maybe I misread your 
> patch?

Yes, that's true. But how to solve it? The best would be to skip the 
first POSTROUTING if the packet won't be encapsulated. But we can't know 
whether or not it will go to IPsec until we perform NAT on it. So it 
looks like we must skip the second POSTROUTING in the case where the 
packet was encapsulated. But how can we recognise these two situations?
Wild idea - comparing some hashes (src/dst ip/port) before and after the 
first POSTROUTING call?

Michal Ludvig
-- 
SUSE Labs                    mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373        http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 14:40       ` Michal Ludvig
@ 2004-01-23 15:56         ` Henrik Nordstrom
  0 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 15:56 UTC (permalink / raw)
  To: Michal Ludvig; +Cc: netfilter-devel

On Fri, 23 Jan 2004, Michal Ludvig wrote:

> Yes, that's true. But how to solve it? The best would be to skip the 
> first POSTROUTING if the packet won't be encapsulated. But we can't know 
> whether or not it will go to IPsec until we perform NAT on it.

As I said the hook needs to be just before the packet is about to enter 
IPSec.

Any decisions on what to IPSec or not needs to be based on other 
information than NAT, i.e. via nfmark routing or similar.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 14:24     ` Henrik Nordstrom
  2004-01-23 14:40       ` Michal Ludvig
@ 2004-01-23 15:51       ` Tom Eastep
  2004-01-24  8:22         ` Willy Tarreau
  1 sibling, 1 reply; 63+ messages in thread
From: Tom Eastep @ 2004-01-23 15:51 UTC (permalink / raw)
  To: Henrik Nordstrom, Michal Ludvig; +Cc: netfilter-devel

On Friday 23 January 2004 06:24 am, Henrik Nordstrom wrote:
> On Fri, 23 Jan 2004, Michal Ludvig wrote:
> > I.e. the postrouting on the unencrypted packet is really called right
> > before it gets encrypted. And the encrypted packet then hits the
> > POSTROUTING again, but it's already a different packet, actually.
>
> My issue is with packets not destinated for an IPSec tunnel.. from what I
> read your patch these will hit POSTROUTING twice. But maybe I misread your
> patch?

I'm concerned that passing the unencrypted packet through POSTROUTING 
apparently makes it impossible for firewall rules to enforce encryption to a 
remote host or network.

It's my opinion that when the ipsec<n> devices where eliminated, the baby got 
thrown out with the bath water.

My $.02...

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-23 15:51       ` Tom Eastep
@ 2004-01-24  8:22         ` Willy Tarreau
  2004-01-24  9:21           ` Henrik Nordstrom
  0 siblings, 1 reply; 63+ messages in thread
From: Willy Tarreau @ 2004-01-24  8:22 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Henrik Nordstrom, Michal Ludvig, netfilter-devel

Hi,

On Fri, Jan 23, 2004 at 07:51:40AM -0800, Tom Eastep wrote:
 
> I'm concerned that passing the unencrypted packet through POSTROUTING 
> apparently makes it impossible for firewall rules to enforce encryption to a 
> remote host or network.
> 
> It's my opinion that when the ipsec<n> devices where eliminated, the baby got 
> thrown out with the bath water.

That was my point too. Even if the packet is only encapsulated, from an IP
point of view, this is a completely new locally generated packet, and we
need to be able to apply filtering before and after encapsulation.

Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-24  8:22         ` Willy Tarreau
@ 2004-01-24  9:21           ` Henrik Nordstrom
  2004-01-24  9:27             ` Willy Tarreau
  0 siblings, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-24  9:21 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Tom Eastep, Michal Ludvig, netfilter-devel

On Sat, 24 Jan 2004, Willy Tarreau wrote:

> That was my point too. Even if the packet is only encapsulated, from an IP
> point of view, this is a completely new locally generated packet, and we
> need to be able to apply filtering before and after encapsulation.

Yes, and to do that the required netfilter hooks needs to be added to the 
new IPSec code when packets leave the IP stack and enters IPSec or the 
reverse, similar to how there was hooks for the ipsecX: interfaces.

These hooks have not yet been done, mostly because nobody has contributed
the code adding these hooks to the new IPSec implementation in a 
reasonable manner I guess.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-24  9:21           ` Henrik Nordstrom
@ 2004-01-24  9:27             ` Willy Tarreau
  2004-01-27 10:39               ` Harald Welte
  0 siblings, 1 reply; 63+ messages in thread
From: Willy Tarreau @ 2004-01-24  9:27 UTC (permalink / raw)
  To: Henrik Nordstrom; +Cc: Tom Eastep, Michal Ludvig, netfilter-devel

On Sat, Jan 24, 2004 at 10:21:12AM +0100, Henrik Nordstrom wrote:
 
> These hooks have not yet been done, mostly because nobody has contributed
> the code adding these hooks to the new IPSec implementation in a 
> reasonable manner I guess.

OK, that's good news. At first, I thought that people were against this
mode of operation. This is now just a matter of time for anyone of us to
understand how to add the hooks and send a patch ! I wish I add time...

Regards,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-24  9:27             ` Willy Tarreau
@ 2004-01-27 10:39               ` Harald Welte
  2004-01-27 11:57                 ` Henrik Nordstrom
                                   ` (3 more replies)
  0 siblings, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 10:39 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 3896 bytes --]

On Sat, Jan 24, 2004 at 10:27:21AM +0100, Willy Tarreau wrote:
> On Sat, Jan 24, 2004 at 10:21:12AM +0100, Henrik Nordstrom wrote:
>  
> > These hooks have not yet been done, mostly because nobody has contributed
> > the code adding these hooks to the new IPSec implementation in a 
> > reasonable manner I guess.
> 
> OK, that's good news. At first, I thought that people were against this
> mode of operation. This is now just a matter of time for anyone of us to
> understand how to add the hooks and send a patch ! I wish I add time...

I have to agree with Henrik.  Once there is an implementation that the
netfilter core team is happy with, we're certainly going to push very
hard to include this in the mainstream kernel.

I have to admit that I haven't yet found a 'perfect' solution of that
problem - but I'm following the ongoing discussion.

JFYI: I really don't like the 'postrouting called twice' solution.  This
violates one of the core basics of netfilter/iptables so far.

I also disagree that an ESP packet should be trated as locally
generated (and thus iterate over OUTPUT).   From my perspective, locally
generated means more something like: sent via a local userspace process
using PF_INET sockets.  If we consider a packet after any kind of change 
as locally-generated, we could argue doing this with NAT'ed packets,
too.  Please try to convince me, if I'm missing some point.

One of the fundamental principles behind netfilter/iptables, especially
NAT, is it's symmetric behaviour (between incoming and outgoing packets,
between the hooks, ...).   So with any possible solution, this should be
kept in mind.

Also, I could expect way more headaches with regard to NAT of IPsec
encapsulated packets.  NAT is based on connection tracking and thus
connection tracking has to be fully compatible with the 2.6.x IPsec
implementation.

Right now, conntrack doesn't even notice that there are ESP/AH packets.
PREROUTING creates a new conntrack for the unencapsulated packet.  Later
the packet is encapsulated, and confirmed at POSTROUTING time [where it
is still the same skb, but one should argue whether that ESP packet
really belongs to the original connection].

My proposal:
From conntracks view (and thus from NAT view), there (shuld be?) two
seperate connections for any given to-be-encapsulated packet:

- the payload connection (tcp/udp/whatever)
	- conntrack entry created at PREROUTING
	- conntrack entry confirmed (put in hash) at XXX
- the AH/ESP connection 
	- conntrack entry created at YYY
	- conntrack entry confirmed at POSTROUTING

To do this, somewhen between esp_output() is called and the beginning of
the modification of the packet payload, we need to call
nf_hook(POST_ROUTING).  This way, conntrack would be able to put the
connection in the hash, and people can do SNAT-like operations in
nat->POSTROUTING.  We could even pass a dummy output device structure
with an interface name "esp" so people can SNAT everything heading for
esp encapsulation.

As for the AH/ESP connection, this has to be created just after the
packet was encapsulated.  We need ot clear skb->nfct (and drop refcount
to old conntrack) after encapsulation, and call nf_hook(PRE_ROUTING).

The then-encapsulated packet will then finally go via the traditional
ipv4 stack and come to the POST_ROUTING hook, where the ESP connection
can be confirmed.

This should solve all our NAT problems for free, shouldn't it?

> Regards,
> Willy

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 10:39               ` Harald Welte
@ 2004-01-27 11:57                 ` Henrik Nordstrom
  2004-01-27 13:07                   ` Harald Welte
  2004-01-27 19:54                   ` Michael Richardson
  2004-01-27 13:27                 ` Valentijn Sessink
                                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 11:57 UTC (permalink / raw)
  To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> I also disagree that an ESP packet should be trated as locally
> generated (and thus iterate over OUTPUT).

I partly, but at the same time it becomes somewhat complex when making 
firewalling policies when not.

For me it is easier to grasp what is happending with the packet if it is 
viewed like the original packet is forwarded to the ipsec engine and the 
ipsec engine then generates a new packet.

This is very different from NAT as ESP/AH is a complete protocol 
transformation where the resulting packet does not resemble the original 
packet in any manner. This is also very much visible to conntrack where 
the original packets needs to be tracked as one connection while the ipsec 
gateway<->gateway traffic needs to be a separate connection.  IPSec is 
essentially a tunneling protocol, tunneling IP packets over an encrypted 
connection. The tunnel endpoints are the IPSec gateways.

Even in the Host<->Host IPSec case the tunnel view is most appropriate. As
soon as the packets enter the IPSec encapsulation they loose all the
relevant propoerties of the original connection and becomes part of the
single IPSec Host <-> IPSec Host connection all packets between these two
IPSec host is transmitted over.

What may be interesting in netfilter is to preserve some kind of
inheritance, allowing "OUTPUT" rules to know important details about the
original packet when seeing an ESP packet.

My view of where hooks should be for IPSec:

(Client) -> Gateway -> (Gateway)

  1. PREROUTING, original client packet
  2. FORWARD, original client packet. Destination the IPSec SPI
  3. POSTROUTING, original client packet. Destination the IPSec SPI
  4. OUTPUT, ESP packet. Preferably with some preserved information about 
the original packet attached.
  5. POSTROUTING, ESP packet.

If the packet is rerouted in POSTROUTING then this should include IPSec
routing. If the new route is to an IPSec SPI then it is. If not then it is
not.

(Gateway) -> Gateway -> (client)

  1. PREROUTING, ESP packet
  2. FORWARD, ESP packet. Destination the IPSec SPI
  3. POSTROUTING, ESP packet. Destination the IPSec SPI.
  4. PREROUTING, decapsulated packet. With reference to the IPSec SPI.
  5. FORWARD, decapsulated packet. With reference to the IPSec SPI.
  6. POSTROUTING, decapsulated packet. With reference to the IPSec SPI.

Host -> (Gateway)

  1. OUTPUT, original packet
  2. POSTROUTING, original packet. Destination the IPSec SPI
  3. OUTPUT, ESP packet. Preferably with some preserved information about
the original packet attached.
  4. POSTROUTING, ESP packet. Preferably with some preserved information 
about the original packet attached.

(Gateway) -> Host

  1. PREROUTING, ESP packet.
  2. INPUT, ESP packet.
  3. PREROUTING, Decapsulated packet, with reference to the IPSec SPI
  4. INPUT, Decapsulated packet, with reference to the IPSec SPI

Non-IPSec traffic:

  As is today. There must be no change.

These hooks fits naturally into the concepts of the netfilter hook
principles if IPSec is viewed as a tunneling method, which it in my
opinion is best viewed as. It also fits the requirements for conntrack to
be able to track the different types of connections involved, including
NAT operations on both original and ESP traffic (assuming local NAT
enabled if DNAT is required on ESP).

> generated means more something like: sent via a local userspace process
> using PF_INET sockets.  If we consider a packet after any kind of change 
> as locally-generated, we could argue doing this with NAT'ed packets,
> too.  Please try to convince me, if I'm missing some point.

A NAT:ed connection is still the same connection at the packet level. ESP
moves the packet into a new connection carrying all kinds of packets
between the two IPSec endpoints where this host is one of the endpoints.

> One of the fundamental principles behind netfilter/iptables, especially
> NAT, is it's symmetric behaviour (between incoming and outgoing packets,
> between the hooks, ...).   So with any possible solution, this should be
> kept in mind.

Granted.

> My proposal:
> From conntracks view (and thus from NAT view), there (shuld be?) two
> seperate connections for any given to-be-encapsulated packet:

Ah, we agree! ;-)

The above hook orders should match your expectations I think. Maybe we
don't agree on the use of INPUT/OUTPUT in the case of forwarded client
traffic, but I see no other clean way. These packets are deliberately
addressed to/from this host, not the client, and part of the IPSec
connection they are forwarded over between the two IPSec endpoints,
not the original connection.

I save myself from diging into exacly where the hooks need to be in the
kernel as I have not looked in detail at the 2.6 IPSec implementation, but
I take for granted that several need to be at the corresponding locations
in the IPSec protocol kernel.  I also see it likely there needs to be some 
skb extensions to add the requires SPI tracking information to 
allow rulesets a better view of what is going on (optional).

Without knowing in detail how the 2.6 IPSec policies are maintained I see
a slight risk that the 2.6 IPSec policy implementation maybe can not
easily match the routing expectations, but it should be possible to find a
way to at least give reasonable hook order I think. But there is a risk it
is not possible to reroute packets to/from an IPSec route and the DNAT
capabilities may be somewhat limited in such case until the IPSec kernel
is extended with what may be required to do this cleanly. (i.e.  slightly
more than just inserting the appropriate hooks may be required)

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 11:57                 ` Henrik Nordstrom
@ 2004-01-27 13:07                   ` Harald Welte
  2004-01-27 13:22                     ` Henrik Nordstrom
                                       ` (3 more replies)
  2004-01-27 19:54                   ` Michael Richardson
  1 sibling, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 13:07 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]

On Tue, Jan 27, 2004 at 12:57:10PM +0100, Henrik Nordstrom wrote:
> For me it is easier to grasp what is happending with the packet if it is 
> viewed like the original packet is forwarded to the ipsec engine and the 
> ipsec engine then generates a new packet.

agreed.

> What may be interesting in netfilter is to preserve some kind of
> inheritance, allowing "OUTPUT" rules to know important details about the
> original packet when seeing an ESP packet.

yes, but that's again another chapter (like how to preserve that kind of
information, ..

> My view of where hooks should be for IPSec:
> 
> (Client) -> Gateway -> (Gateway)
> 
>   1. PREROUTING, original client packet
>   2. FORWARD, original client packet. Destination the IPSec SPI
>   3. POSTROUTING, original client packet. Destination the IPSec SPI
>   4. OUTPUT, ESP packet. Preferably with some preserved information about 
> the original packet attached.
>   5. POSTROUTING, ESP packet.

I agree to all but 4.    My proposal was PREROUTING, yours OUTPUT.  In
the end, none of them really fits.  But I would consider this a minor
issue.  We could also create a new hook 'POSTENCAP' or whatever...
Maybe we could agree at that point, that there is a new hook
NF_IP_POST_XFRM.  It's a matter of iptable_nat/... which chain they want
to attach to that hook.

> If the packet is rerouted in POSTROUTING then this should include IPSec
> routing. If the new route is to an IPSec SPI then it is. If not then it is
> not.

You are talking about the first POSTROUTING, Number 3?  Then I would
definitely agree.

> (Gateway) -> Gateway -> (client)
> 
>   1. PREROUTING, ESP packet
>   2. FORWARD, ESP packet. Destination the IPSec SPI
>   3. POSTROUTING, ESP packet. Destination the IPSec SPI.
>   4. PREROUTING, decapsulated packet. With reference to the IPSec SPI.
>   5. FORWARD, decapsulated packet. With reference to the IPSec SPI.
>   6. POSTROUTING, decapsulated packet. With reference to the IPSec SPI.

This is not symmetric, despite being the exact opposite of the previous
case.  I don't like that.  If we make it similar to the previous case,
it would be:

1. PREROUTING, ESP packet
2. INPUT, ESP packet
3. PREROUTING, decapsulated packet
4. FORWARD, decapsulated packet
5. POSTROUTING, decapsulated packet.

So here we have the analgy of a locally received (INPUT) packet, as the
opposite of the locally received (OUTPUT) packet in your previous case.

> Host -> (Gateway)
> 
>   1. OUTPUT, original packet
>   2. POSTROUTING, original packet. Destination the IPSec SPI
>   3. OUTPUT, ESP packet. Preferably with some preserved information about
> the original packet attached.
>   4. POSTROUTING, ESP packet. Preferably with some preserved information 
> about the original packet attached.

this would make it similar to the case #1.

> (Gateway) -> Host
>   
>   1. PREROUTING, ESP packet.
>   2. INPUT, ESP packet.
>   3. PREROUTING, Decapsulated packet, with reference to the IPSec SPI
>   4. INPUT, Decapsulated packet, with reference to the IPSec SPI

agreed.

 
> Ah, we agree! ;-)

yes, at least in the largest part.

> I save myself from diging into exacly where the hooks need to be in the
> kernel as I have not looked in detail at the 2.6 IPSec implementation, but
> I take for granted that several need to be at the corresponding locations
> in the IPSec protocol kernel.  

I guess I cannot save myself from doing so.  Unfortunately I have very
limited time at this moment, so don't expect it to be done today.

> I also see it likely there needs to be some skb extensions to add the
> requires SPI tracking information to allow rulesets a better view of
> what is going on (optional).

yes, but let's that make step 2.  First we have to agree on the first
step, do a decent implementation and then submit it to netdev.

> Regards
> Henrik

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 13:07                   ` Harald Welte
@ 2004-01-27 13:22                     ` Henrik Nordstrom
  2004-01-27 14:12                     ` Henrik Nordstrom
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 13:22 UTC (permalink / raw)
  To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> > If the packet is rerouted in POSTROUTING then this should include IPSec
> > routing. If the new route is to an IPSec SPI then it is. If not then it is
> > not.
> 
> You are talking about the first POSTROUTING, Number 3?  Then I would
> definitely agree.

I am.

> > (Gateway) -> Gateway -> (client)
> 
> This is not symmetric, despite being the exact opposite of the previous
> case.  I don't like that.  If we make it similar to the previous case,
> it would be:
> 
> 1. PREROUTING, ESP packet
> 2. INPUT, ESP packet
> 3. PREROUTING, decapsulated packet
> 4. FORWARD, decapsulated packet
> 5. POSTROUTING, decapsulated packet.

Right. My error. Thinks one thing, writes another. Brain obviously not 
fully functional today.

> > I also see it likely there needs to be some skb extensions to add the
> > requires SPI tracking information to allow rulesets a better view of
> > what is going on (optional).
> 
> yes, but let's that make step 2.  First we have to agree on the first
> step, do a decent implementation and then submit it to netdev.

Agreed.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 13:07                   ` Harald Welte
  2004-01-27 13:22                     ` Henrik Nordstrom
@ 2004-01-27 14:12                     ` Henrik Nordstrom
  2004-01-27 20:51                       ` Harald Welte
  2004-01-27 23:55                     ` Harald Welte
  2004-01-28  0:09                     ` [PATCH]Re: " Harald Welte
  3 siblings, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 14:12 UTC (permalink / raw)
  To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> > My view of where hooks should be for IPSec:
> > 
> > (Client) -> Gateway -> (Gateway)
> > 
> >   1. PREROUTING, original client packet
> >   2. FORWARD, original client packet. Destination the IPSec SPI
> >   3. POSTROUTING, original client packet. Destination the IPSec SPI
> >   4. OUTPUT, ESP packet. Preferably with some preserved information about 
> > the original packet attached.
> >   5. POSTROUTING, ESP packet.
> 
> I agree to all but 4.    My proposal was PREROUTING, yours OUTPUT.

It is very hard to distinguis the different IPSec cases here. Save of
adding yet another hook for encapsulated and to be decapsulated packets
OUTPUT(/INPUT) is the closest match in my opinion. The packet it sees is
the packet from this IPSec endpoint to the other IPSec endpoint on the
IPSec SPI connection between the two.

The packet inside may be a forwarded packet or locally generated, it's all 
the same. The ESP is an exchange between the two endpoints  OUTPUT -> 
POSTROUTING <-net-> PREROUTING -> INPUT

Use of PREROUTING introduces a completely new semantic unless you want to 
also call FORWARD on these packets which I do not think is appropriate.

If introducing new hooks then we would have

   XXXX -> POSTROUTING <-net-> PREROUTING -> YYYY

if XXXX and YYYY can be the same hook name in this case or not I'll leave
to the core team to ponder upon?

What exacly is the semantical problem of using OUTPUT/INPUT?  In my view
the ESP/AH packet is locally generated, and belongs to a connection with a
local endpoint.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 14:12                     ` Henrik Nordstrom
@ 2004-01-27 20:51                       ` Harald Welte
  2004-01-27 22:35                         ` Henrik Nordstrom
  2004-01-27 22:41                         ` Willy Tarreau
  0 siblings, 2 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 20:51 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2053 bytes --]

On Tue, Jan 27, 2004 at 03:12:49PM +0100, Henrik Nordstrom wrote:
> On Tue, 27 Jan 2004, Harald Welte wrote:
> 
> > > My view of where hooks should be for IPSec:
> > > 
> > > (Client) -> Gateway -> (Gateway)
> > > 
> > >   1. PREROUTING, original client packet
> > >   2. FORWARD, original client packet. Destination the IPSec SPI
> > >   3. POSTROUTING, original client packet. Destination the IPSec SPI
> > >   4. OUTPUT, ESP packet. Preferably with some preserved information about 
> > > the original packet attached.
> > >   5. POSTROUTING, ESP packet.
> > 
> > I agree to all but 4.    My proposal was PREROUTING, yours OUTPUT.
> 
> It is very hard to distinguis the different IPSec cases here. Save of
> adding yet another hook for encapsulated and to be decapsulated packets
> OUTPUT(/INPUT) is the closest match in my opinion. 

Please don't confuse hooks with chains.  I suggested to add new
netfilter hooks (NF_IP_{PRE,POST}_XFRM), but attach the old INPUT/OUTPUT
(or even PREROUTING/POSTROUTING) chains to those hooks.

Hm.  You have a point, but I'm still not really sure.  I'll do it the
way you proposed, but I might change my mind again in the future ;)

> If introducing new hooks then we would have
> 
>    XXXX -> POSTROUTING <-net-> PREROUTING -> YYYY
> 
> if XXXX and YYYY can be the same hook name in this case or not I'll leave
> to the core team to ponder upon?
> 
> What exacly is the semantical problem of using OUTPUT/INPUT?  In my view
> the ESP/AH packet is locally generated, and belongs to a connection with a
> local endpoint.

well, you convinced me (for now). new hooks, old chains. 

> Regards
> Henrik

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 20:51                       ` Harald Welte
@ 2004-01-27 22:35                         ` Henrik Nordstrom
  2004-01-28 13:48                           ` Harald Welte
  2004-01-27 22:41                         ` Willy Tarreau
  1 sibling, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 22:35 UTC (permalink / raw)
  To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> Please don't confuse hooks with chains.  I suggested to add new
> netfilter hooks (NF_IP_{PRE,POST}_XFRM), but attach the old INPUT/OUTPUT
> (or even PREROUTING/POSTROUTING) chains to those hooks.

Sorry for the naming mismatch. With the current close relation between 
hooks and chains it is easy to confuse the two when one is not installing 
hooks every day. All thoughts was in netfilter hooks not iptables chains.

Using the existing existing hooks with NF_IP_LOCAL_IN/OUT for the IPSec
wrapped packets is a exact match on what is required for conntrack, but to
answer my own question on what is wrong with INPUT/OUTPUT there is nothing
wrong but NAT and mangle may benefit from using a couple of new hooks for
the "original" packets just at the border to IPSec due to the different
rerouting requirements. NF_IP_LOCAL_IN/OUT is however an exact match for
all existing netfilter module requirements on the encapsulated packets
from what I can tell.

If there is a desire to use new hooks for the encapsulated packets then
this should be done for the whole path of those packets, which obviously
becomes somewhat tricky as it is not known yet at the NF_IP_PRE_ROUTING
time of incoming packets that the encapsulated packet will go to IPSec and
a bastard on final output of transmitted packets.

My slightly revised proposal is now (this time using hook names to avoid
confusion)

(Client) -> Gateway -> (Gateway)

1. NF_IP_PRE_ROUTING
2. NF_IP_FORWARD
3. NF_IP_XXXXXXXX, transparently mapped to the POSTROUTING chain in 
iptables.

[IPSec encapsulation]

4. NF_IP_LOCAL_OUT
5. NF_IP_POST_ROUTING

(Gateway) -> Gateway -> (client)

1. NF_IP_PRE_ROUTING
2. NF_IP_LOCAL_IN

[IPSec decapsulation]

3. NF_IP_YYYYYYYY, transparently mapped to the PREROUTING chain in 
iptables
4. NF_IP_FORWARD
5. NF_IP_POST_ROUTING

And similarily for the Host -> Gateway/Host case (locally generated 
packets).

But I'd say the use of new hooks should be avoided unless seen needed when 
implementing rerouting. Similar "rerouting" needs to be done in the normal 
hook to allow netfilter packet modifications to change the routing/policy 
decision to cause a previously clear text packet to be sent over IPSec as 
is needed on the same changes to packets decided to go to an IPSec SPI.

My problem with adding new hooks is that I don't see a value in adding new
types of hooks if all uses of this hook will be mapped to the exact same
code paths.

What is pretty clear is that the "rerouting" requirements is far from
obvious and needs to be analyzed carefully. If one thought rerouting of
packets in the plain IP with ip_route_me_harder was a mess the situation
is now some orders of magnitude more complex it seems, at least from a
quick evaluation without digging into the actual 2.6 IPSec code.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 22:35                         ` Henrik Nordstrom
@ 2004-01-28 13:48                           ` Harald Welte
  0 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:48 UTC (permalink / raw)
  To: Henrik Nordstrom
  Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

On Tue, Jan 27, 2004 at 11:35:48PM +0100, Henrik Nordstrom wrote:
> Sorry for the naming mismatch. With the current close relation between 
> hooks and chains it is easy to confuse the two when one is not installing 
> hooks every day. All thoughts was in netfilter hooks not iptables chains.
> 
> Using the existing existing hooks with NF_IP_LOCAL_IN/OUT for the IPSec
> wrapped packets is a exact match on what is required for conntrack, but to
> answer my own question on what is wrong with INPUT/OUTPUT there is nothing
> wrong but NAT and mangle may benefit from using a couple of new hooks for
> the "original" packets just at the border to IPSec due to the different
> rerouting requirements. NF_IP_LOCAL_IN/OUT is however an exact match for
> all existing netfilter module requirements on the encapsulated packets
> from what I can tell.

Well, I now had to find out that libiptc depends on a 'one hook per
chain' assumption, too bad.  My idea was to attach the same chain to
multiple hooks.  That is not possible, as of now.

As you can see in your patch, I gave up on all of that and just use
the existing hooks.

> What is pretty clear is that the "rerouting" requirements is far from
> obvious and needs to be analyzed carefully. If one thought rerouting of
> packets in the plain IP with ip_route_me_harder was a mess the situation
> is now some orders of magnitude more complex it seems, at least from a
> quick evaluation without digging into the actual 2.6 IPSec code.

yes, indeed.  Unfortunately, analyzing this whole issue has alrady cost
me too much of my time the last couple of days :(  I'll have to defer it
for now.

> Regards
> Henrik

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 20:51                       ` Harald Welte
  2004-01-27 22:35                         ` Henrik Nordstrom
@ 2004-01-27 22:41                         ` Willy Tarreau
  1 sibling, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-27 22:41 UTC (permalink / raw)
  To: Harald Welte, Henrik Nordstrom, Willy Tarreau, Tom Eastep,
	Michal Ludvig, netfilter-devel

Hi guys,

On Tue, Jan 27, 2004 at 09:51:23PM +0100, Harald Welte wrote:
<...> 
> Hm.  You have a point, but I'm still not really sure.  I'll do it the
> way you proposed, but I might change my mind again in the future ;)

Cool, I'm very glad that we're at least several ones to think it the same
way. I didn't respond because my proposal would have been identical to
Henrik's. So please count my vote :-)

Thanks,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 13:07                   ` Harald Welte
  2004-01-27 13:22                     ` Henrik Nordstrom
  2004-01-27 14:12                     ` Henrik Nordstrom
@ 2004-01-27 23:55                     ` Harald Welte
  2004-01-28  0:14                       ` Willy Tarreau
  2004-01-28  0:09                     ` [PATCH]Re: " Harald Welte
  3 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-27 23:55 UTC (permalink / raw)
  To: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 6649 bytes --]

Hi!

I've now hacked a preliminary patch.  Be warned, I didn't even test if
it compiles.  Maybe someone who actually has a running 2.6.x ipsec setup
can give it a try.

What it should be doing
- traverse the additional chains as described in the last email
- have connection tracking recognize two seperate connections, one
  for the decapsulated traffic, one for the encapsulated
- thus, even SNAT/DNAT should be working.
- locally-encapsulated traffic shows up with input device "ah4" or
  "esp4" in POSTROUTING.
- locally-encapsulated traffic shows up with output device "ah4" or
  "esp4" in OUTPUT.

What is missing (TODO):
- no dummy device names in INPUT/PREROUTING for locally-decapsulated
  packets.  This is somewhat harder
- no real output device shown in OUTPUT for locally-encapsulated
  packets.  I'm not sure if it is legal to typecast the just-popped
  dst_entry to 'struct rtable' and derive the output interface from
  there.

Please give feedback.

P.S.: Yes, the #ifdef's are ugly. I know.


diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c	2003-12-18 11:39:26.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c	2004-01-28 00:40:38.923531384 +0100
@@ -9,6 +9,12 @@
 #include <net/icmp.h>
 #include <asm/scatterlist.h>
 
+static struct net_device *ah_dummydev;
+
+static int ah_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
 
 /* Clear mutable options and find final destination to substitute
  * into IP header for icv calculation. Options are already checked
@@ -54,7 +60,7 @@
 	return 0;
 }
 
-static int ah_output(struct sk_buff *skb)
+static int ah_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -145,7 +151,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->conntrack);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, ah_dummydev, NULL,
+		       ah_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -154,6 +170,12 @@
 	return err;
 }
 
+static ah_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, IP_NF_POST_ROUTING, skb, skb->dev, ah_dummydev,
+		       ah_output_finish);
+}
+
 int ah_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb)
 {
 	int ah_hlen;
@@ -354,11 +376,18 @@
 		xfrm_unregister_type(&ah_type, AF_INET);
 		return -EAGAIN;
 	}
+	ah_dummydev = alloc_netdev(0, "ah4", NULL);
+	if (!ah_dummydev) {
+		printk(KERN_INFO "ip ah init: can't allocate dummydev\n");
+		inet_del_protocol(&ah4_protocol, IPPROTO_AH);
+		xfrm_unregister_type(&ah_type, AF_INET);
+	}
 	return 0;
 }
 
 static void __exit ah4_fini(void)
 {
+	free_netdev(ah_dummydev);
 	if (inet_del_protocol(&ah4_protocol, IPPROTO_AH) < 0)
 		printk(KERN_INFO "ip ah close: can't remove protocol\n");
 	if (xfrm_unregister_type(&ah_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c	2003-12-18 11:38:22.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c	2004-01-28 00:33:58.002480648 +0100
@@ -20,7 +20,16 @@
 	__u8		proto;
 };
 
-int esp_output(struct sk_buff *skb)
+static struct net_device *esp_dummydev;
+
+/* dummy function for netfilter.  esp_output_finish() has to return
+ * that particular value upon success */
+static int esp_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
+static int esp_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -199,7 +208,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->conntrack);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+	/* FIXME: return real destination device */
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, esp_dummydev, NULL,
+		       esp_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -208,6 +227,12 @@
 	return err;
 }
 
+int esp_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, esp_dummydev,
+		       esp_output_finish);
+}
+
 /*
  * Note: detecting truncated vs. non-truncated authentication data is very
  * expensive, so we only support truncated data, which is the recommended
@@ -596,11 +621,19 @@
 		xfrm_unregister_type(&esp_type, AF_INET);
 		return -EAGAIN;
 	}
+	esp_dummydev = alloc_netdev(0, "esp4", NULL);
+	if (!esp_dummydev) {
+		printk(KERN_INFO "ip esp init: can't allocate dummydev\n");
+		inet_del_protocol(&esp4_protocol, IPPROTO_ESP);
+		xfrm_unregister_type(&esp_type, AF_INET);
+		return -EAGAIN;
+	}
 	return 0;
 }
 
 static void __exit esp4_fini(void)
 {
+	free_netdev(esp_dummydev);
 	if (inet_del_protocol(&esp4_protocol, IPPROTO_ESP) < 0)
 		printk(KERN_INFO "ip esp close: can't remove protocol\n");
 	if (xfrm_unregister_type(&esp_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c	2004-01-28 00:50:36.204730840 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c	2004-01-28 00:51:39.557099816 +0100
@@ -6,6 +6,8 @@
  *		Split up af-specific portion
  *	Derek Atkins <derek@ihtfp.com>
  *		Add Encapsulation support
+ *	Harald Welte <laforge@netfilter.org>
+ *		Drop conntrack reference before calling netif_rx()
  * 	
  */
 
@@ -130,6 +132,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->conntrack);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {
-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 23:55                     ` Harald Welte
@ 2004-01-28  0:14                       ` Willy Tarreau
  0 siblings, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-28  0:14 UTC (permalink / raw)
  To: Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig,
	netfilter-devel

On Wed, Jan 28, 2004 at 12:55:25AM +0100, Harald Welte wrote:
> Hi!
> 
> I've now hacked a preliminary patch.  Be warned, I didn't even test if
> it compiles.  Maybe someone who actually has a running 2.6.x ipsec setup
> can give it a try.

I have somewhere a working 2.6 ipsec backport to 2.4 thanks to Davem and
Herbert Xu. I'll see how it applies on it and if it works too, because
this is nearly the only thing refraining me from definitely switching over
this new stack. But don't expect any feedback before this week-end.

Thanks,
willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH]Re: NAT before IPsec with 2.6
  2004-01-27 13:07                   ` Harald Welte
                                       ` (2 preceding siblings ...)
  2004-01-27 23:55                     ` Harald Welte
@ 2004-01-28  0:09                     ` Harald Welte
  2004-01-28  8:49                       ` Patrick McHardy
  2004-01-28 10:30                       ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
  3 siblings, 2 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28  0:09 UTC (permalink / raw)
  To: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 7006 bytes --]

Hi!

[please ignore the last message, I mistakenly sent an outdated, incomplete 
 patch]

I've now hacked a preliminary patch.  Be warned, I didn't even test if
it compiles.  Maybe someone who actually has a running 2.6.x ipsec setup
can give it a try.

What it should be doing
- traverse the additional chains as described in the last email
- have connection tracking recognize two seperate connections, one
  for the decapsulated traffic, one for the encapsulated
- thus, even SNAT/DNAT should be working.
- locally-encapsulated traffic shows up with input device "ah4" or
  "esp4" in POSTROUTING.
- locally-encapsulated traffic shows up with output device "ah4" or
  "esp4" in OUTPUT.

What is missing (TODO):
- no dummy device names in INPUT/PREROUTING for locally-decapsulated
  packets.  This is somewhat harder
- no real output device shown in OUTPUT for locally-encapsulated
  packets.  I'm not sure if it is legal to typecast the just-popped
  dst_entry to 'struct rtable' and derive the output interface from
  there.

Please give feedback.

P.S.: Yes, the #ifdef's are ugly. I know.


diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c	2003-12-18 11:39:26.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c	2004-01-28 01:08:59.430015104 +0100
@@ -7,8 +7,15 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <net/icmp.h>
+#include <linux/netfilter_ipv4.h>
 #include <asm/scatterlist.h>
 
+static struct net_device *ah_dummydev;
+
+static int ah_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
 
 /* Clear mutable options and find final destination to substitute
  * into IP header for icv calculation. Options are already checked
@@ -54,7 +61,7 @@
 	return 0;
 }
 
-static int ah_output(struct sk_buff *skb)
+static int ah_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -145,7 +152,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, ah_dummydev, NULL,
+		       ah_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -154,6 +171,12 @@
 	return err;
 }
 
+static int ah_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
+		       ah_output_finish);
+}
+
 int ah_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb)
 {
 	int ah_hlen;
@@ -354,11 +377,18 @@
 		xfrm_unregister_type(&ah_type, AF_INET);
 		return -EAGAIN;
 	}
+	ah_dummydev = alloc_netdev(0, "ah4", NULL);
+	if (!ah_dummydev) {
+		printk(KERN_INFO "ip ah init: can't allocate dummydev\n");
+		inet_del_protocol(&ah4_protocol, IPPROTO_AH);
+		xfrm_unregister_type(&ah_type, AF_INET);
+	}
 	return 0;
 }
 
 static void __exit ah4_fini(void)
 {
+	free_netdev(ah_dummydev);
 	if (inet_del_protocol(&ah4_protocol, IPPROTO_AH) < 0)
 		printk(KERN_INFO "ip ah close: can't remove protocol\n");
 	if (xfrm_unregister_type(&ah_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c	2003-12-18 11:38:22.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c	2004-01-28 01:09:19.610947136 +0100
@@ -8,6 +8,7 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <net/udp.h>
 
@@ -20,7 +21,16 @@
 	__u8		proto;
 };
 
-int esp_output(struct sk_buff *skb)
+static struct net_device *esp_dummydev;
+
+/* dummy function for netfilter.  esp_output_finish() has to return
+ * that particular value upon success */
+static int esp_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
+static int esp_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -199,7 +209,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+	/* FIXME: return real destination device */
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, esp_dummydev, NULL,
+		       esp_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -208,6 +228,12 @@
 	return err;
 }
 
+int esp_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, esp_dummydev,
+		       esp_output_finish);
+}
+
 /*
  * Note: detecting truncated vs. non-truncated authentication data is very
  * expensive, so we only support truncated data, which is the recommended
@@ -596,11 +622,19 @@
 		xfrm_unregister_type(&esp_type, AF_INET);
 		return -EAGAIN;
 	}
+	esp_dummydev = alloc_netdev(0, "esp4", NULL);
+	if (!esp_dummydev) {
+		printk(KERN_INFO "ip esp init: can't allocate dummydev\n");
+		inet_del_protocol(&esp4_protocol, IPPROTO_ESP);
+		xfrm_unregister_type(&esp_type, AF_INET);
+		return -EAGAIN;
+	}
 	return 0;
 }
 
 static void __exit esp4_fini(void)
 {
+	free_netdev(esp_dummydev);
 	if (inet_del_protocol(&esp4_protocol, IPPROTO_ESP) < 0)
 		printk(KERN_INFO "ip esp close: can't remove protocol\n");
 	if (xfrm_unregister_type(&esp_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c	2004-01-28 00:50:36.204730840 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c	2004-01-28 00:51:39.557099816 +0100
@@ -6,6 +6,8 @@
  *		Split up af-specific portion
  *	Derek Atkins <derek@ihtfp.com>
  *		Add Encapsulation support
+ *	Harald Welte <laforge@netfilter.org>
+ *		Drop conntrack reference before calling netif_rx()
  * 	
  */
 
@@ -130,6 +132,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {
--
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28  0:09                     ` [PATCH]Re: " Harald Welte
@ 2004-01-28  8:49                       ` Patrick McHardy
  2004-01-28  9:37                         ` Patrick McHardy
  2004-01-28 10:30                         ` Harald Welte
  2004-01-28 10:30                       ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
  1 sibling, 2 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28  8:49 UTC (permalink / raw)
  To: Harald Welte
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

Harald Welte wrote:

>Hi!
>
>[please ignore the last message, I mistakenly sent an outdated, incomplete 
> patch]
>
>I've now hacked a preliminary patch.  Be warned, I didn't even test if
>it compiles.  Maybe someone who actually has a running 2.6.x ipsec setup
>can give it a try.
>
>What it should be doing
>- traverse the additional chains as described in the last email
>- have connection tracking recognize two seperate connections, one
>  for the decapsulated traffic, one for the encapsulated
>- thus, even SNAT/DNAT should be working.
>- locally-encapsulated traffic shows up with input device "ah4" or
>  "esp4" in POSTROUTING.
>- locally-encapsulated traffic shows up with output device "ah4" or
>  "esp4" in OUTPUT.
>
>What is missing (TODO):
>- no dummy device names in INPUT/PREROUTING for locally-decapsulated
>  packets.  This is somewhat harder
>- no real output device shown in OUTPUT for locally-encapsulated
>  packets.  I'm not sure if it is legal to typecast the just-popped
>  dst_entry to 'struct rtable' and derive the output interface from
>  there.
>
>Please give feedback.
>  
>
I see two problems with this approach. The dummy devices don't have
any ip config, so f.e. REDIRECT will fail. The bigger problem is
hooking in output routines that return NET_XMIT_BYPASS. dst_output
loops until the return code of skb->dst->output != NET_XMIT_BYPASS.
These output routines replace skb->dst when finished by calling dst_pop.
If we pass the packet through netfilter in between, the dst_entry
might get replaced in ip_route_me_harder or elsewhere and not all
transformations will be applied. If NAT is used,
ip_route_{input,output} might even return a different policy bundle.

Anyways, I'm testing it (slightly hacked) as soon as the compile finishes ;)

Regards,
Patrick

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28  8:49                       ` Patrick McHardy
@ 2004-01-28  9:37                         ` Patrick McHardy
  2004-01-28 10:30                         ` Harald Welte
  1 sibling, 0 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28  9:37 UTC (permalink / raw)
  To: Harald Welte
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

Patrick McHardy wrote:

>I see two problems with this approach. The dummy devices don't have
>  
>
>any ip config, so f.e. REDIRECT will fail. The bigger problem is
>  
>
>hooking in output routines that return NET_XMIT_BYPASS. dst_output
>  
>
>loops until the return code of skb->dst->output != NET_XMIT_BYPASS.
>  
>
>These output routines replace skb->dst when finished by calling dst_pop.
>  
>
>If we pass the packet through netfilter in between, the dst_entry
>  
>
>might get replaced in ip_route_me_harder or elsewhere and not all
>  
>
>transformations will be applied. If NAT is used,
>  
>
>ip_route_{input,output} might even return a different policy bundle.
>

Bad news, I've tested the hook part (using skb->dev instead of ah4/esp4
devices), this is how the traces look:

IN= OUT=eth0 SRC=172.20.0.3 DST=192.168.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=28443 SEQ=1
IN= OUT= SRC=172.20.0.3 DST=192.168.0.1 LEN=152 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ESP SPI=0xb9f7900
IN= OUT= SRC=172.20.0.3 DST=192.168.0.1 LEN=176 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=AH SPI=0xc46e27

After applying a MARK rule to the first packet (causing rerouting):

IN= OUT=eth0 SRC=172.20.0.3 DST=192.168.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=51484 SEQ=1

Sniffing in between shows the unencrypted packet.

>Regards,
>  
>
>Patrick
>  
>
>  
>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28  8:49                       ` Patrick McHardy
  2004-01-28  9:37                         ` Patrick McHardy
@ 2004-01-28 10:30                         ` Harald Welte
  2004-01-28 11:24                           ` Willy Tarreau
                                             ` (3 more replies)
  1 sibling, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 10:30 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2578 bytes --]

On Wed, Jan 28, 2004 at 09:49:56AM +0100, Patrick McHardy wrote:
> I see two problems with this approach. The dummy devices don't have
> any ip config, so f.e. REDIRECT will fail. 

Ok, let's ignore that for now.  dunno if the dummydev idea was so good
at all.

> The bigger problem is hooking in output routines that return
> NET_XMIT_BYPASS. dst_output loops until the return code of
> skb->dst->output != NET_XMIT_BYPASS.  These output routines replace
> skb->dst when finished by calling dst_pop.
>
> If we pass the packet through netfilter in between, the dst_entry
> might get replaced in ip_route_me_harder or elsewhere and not all
> transformations will be applied. 

I see.   So we would have to modify all code that changes skb->dst to
check if there is a dst stack.  If yes, it would have to iterate over
all of them and just change the last one.  Shouldn't be too hard to
intergrate that change.

However, it is hard to tell what is the correct behaviour in that case.

In POST_ROUTING of the original, unencapsulated packet, I would say it
is correct to not apply those transformations, in case rerouting of the
packet occurs.  If somebody reroutes here, he wants to affect the
original packet, not the encapsulated one.

However, once we did do the first encapsulation, it doesn't make sense
to encapsulate further layers until we've reached the real dst.  At this
point, LOCAL_OUT would be called with the heading-for-the-wire packet
and rerouting will and should affect the final device.

However, exposing all those intermediate steps to the
netfilter-hook-attached code is not so bad.  It enables people to do
whatever they want... they just need to be careful with their rules.
If there's too much interference with normal OUTPUT/POSTROUTING rules,
we could still go for new hooks+new chains.

> If NAT is used, ip_route_{input,output} might even return a different
> policy bundle.

The question is, again: What ist the desired behaviour?  Should the
policy be determined on the un-NAT'ed packet or on the NAT'ed one?

> Anyways, I'm testing it (slightly hacked) as soon as the compile finishes ;)

good luck :)

> Regards,
> Patrick

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 10:30                         ` Harald Welte
@ 2004-01-28 11:24                           ` Willy Tarreau
  2004-01-28 13:39                             ` Harald Welte
  2004-01-28 15:58                             ` Tom Eastep
  2004-01-28 13:22                           ` Patrick McHardy
                                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-28 11:24 UTC (permalink / raw)
  To: Harald Welte, Patrick McHardy, Henrik Nordstrom, Tom Eastep,
	Michal Ludvig, netfilter-devel

On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:
 
> > If NAT is used, ip_route_{input,output} might even return a different
> > policy bundle.
> 
> The question is, again: What ist the desired behaviour?  Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

Harald, I believe it's important to remember that NAT is not the only usage
of this extension. For many people seeking security (=those who install VPN
gateways for customers), it is very important to :
  - be able to filter what enters and leaves a tunnel
  - be able to filter where the encapsulated packets go to/come from

NAT is complementary IMHO.

Regards,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 11:24                           ` Willy Tarreau
@ 2004-01-28 13:39                             ` Harald Welte
  2004-01-28 15:58                             ` Tom Eastep
  1 sibling, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:39 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Patrick McHardy, Henrik Nordstrom, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

On Wed, Jan 28, 2004 at 12:24:19PM +0100, Willy Tarreau wrote:
> On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:
>  
> > > If NAT is used, ip_route_{input,output} might even return a different
> > > policy bundle.
> > 
> > The question is, again: What ist the desired behaviour?  Should the
> > policy be determined on the un-NAT'ed packet or on the NAT'ed one?
> 
> Harald, I believe it's important to remember that NAT is not the only usage
> of this extension. For many people seeking security (=those who install VPN
> gateways for customers), it is very important to :
>   - be able to filter what enters and leaves a tunnel
>   - be able to filter where the encapsulated packets go to/come from
> 
> NAT is complementary IMHO.

Yes, I totally agree.  My major point was conntrack being broken, and
thus stateful packet filtering not being possible.  The respective
iptables chains are missing.

Both of them are needed for the additional making-NAT-work.

> Regards,
> Willy

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 11:24                           ` Willy Tarreau
  2004-01-28 13:39                             ` Harald Welte
@ 2004-01-28 15:58                             ` Tom Eastep
  1 sibling, 0 replies; 63+ messages in thread
From: Tom Eastep @ 2004-01-28 15:58 UTC (permalink / raw)
  To: Willy Tarreau, Harald Welte, Patrick McHardy, Henrik Nordstrom,
	Michal Ludvig, netfilter-devel

On Wednesday 28 January 2004 03:24 am, Willy Tarreau wrote:
> On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:
> > > If NAT is used, ip_route_{input,output} might even return a different
> > > policy bundle.
> >
> > The question is, again: What ist the desired behaviour?  Should the
> > policy be determined on the un-NAT'ed packet or on the NAT'ed one?
>
> Harald, I believe it's important to remember that NAT is not the only usage
> of this extension. For many people seeking security (=those who install VPN
> gateways for customers), it is very important to :
>   - be able to filter what enters and leaves a tunnel
>   - be able to filter where the encapsulated packets go to/come from
>

I agree. And the 2.4 implementation of IPSEC allowed this control to be 
exercised in a natural and uniform way (the same model applied for all tunnel 
types including IPSEC).

- Unencrypted traffic is exchanged between the client and a special device.
- encrypted traffic is exchanged between the gateways.

These are separately tracked connections that obey the normal netfilter rules. 

For configured tunnels, I still believe that this is the proper model. 

Opportunistic keying OTOH probably shouldn't follow that model.

> NAT is complementary IMHO.

But there are instances where it is necessary.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 10:30                         ` Harald Welte
  2004-01-28 11:24                           ` Willy Tarreau
@ 2004-01-28 13:22                           ` Patrick McHardy
  2004-01-28 14:23                           ` Henrik Nordstrom
  2004-02-01 14:52                           ` Patrick McHardy
  3 siblings, 0 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28 13:22 UTC (permalink / raw)
  To: Harald Welte
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

Harald Welte wrote:
> On Wed, Jan 28, 2004 at 09:49:56AM +0100, Patrick McHardy wrote:
> 
>>I see two problems with this approach. The dummy devices don't have
>>any ip config, so f.e. REDIRECT will fail. 
> 
> 
> Ok, let's ignore that for now.  dunno if the dummydev idea was so good
> at all.

I believe we pass the real device and provide a new match.
The real device might be interesting for filtering, too.
But as you say, let's ignore it for now.

>>The bigger problem is hooking in output routines that return
>>NET_XMIT_BYPASS. dst_output loops until the return code of
>>skb->dst->output != NET_XMIT_BYPASS.  These output routines replace
>>skb->dst when finished by calling dst_pop.
>>
>>If we pass the packet through netfilter in between, the dst_entry
>>might get replaced in ip_route_me_harder or elsewhere and not all
>>transformations will be applied. 
> 
> 
> I see.   So we would have to modify all code that changes skb->dst to
> check if there is a dst stack.  If yes, it would have to iterate over
> all of them and just change the last one.  Shouldn't be too hard to
> intergrate that change.

> However, it is hard to tell what is the correct behaviour in that case.
> 
> In POST_ROUTING of the original, unencapsulated packet, I would say it
> is correct to not apply those transformations, in case rerouting of the
> packet occurs.  If somebody reroutes here, he wants to affect the
> original packet, not the encapsulated one.

Agreed. The problem is outfn is already set to {ah,esp}_output when
POST_ROUTING of the original packet is called, we need to handle this
somehow.

We also should handle the opposite case, after a (normal) packet is
SNATed ip_route_me_harder will replace the dst_entry, the new one might
include ipsec transformations if a policy for the now-different packet
exists, but the packet is always passed to ip_finish_output2. If we
would call dst_output instead strange hook orders can occur:
  PRE_ROUTING FORWARD POST_ROUTING <SNAT, encapsulate>
   POST_ROUTING LOCAL_OUT <SNAT again ;)> ..
I can't imagine a good solution right now.

> However, once we did do the first encapsulation, it doesn't make sense
> to encapsulate further layers until we've reached the real dst.  At this
> point, LOCAL_OUT would be called with the heading-for-the-wire packet
> and rerouting will and should affect the final device.

I'm not sure if you understand you correctly. Do you mean to say that it
doesn't make sense to pass packets to netfilter hooks until all 
transformations have been applied ? If so, I agree, I don't think there
is much use in doing stuff to half-done packets. We can easily detect
the final dst_entry, it should have dst->xfrm = NULL (at least I would
think so). But I don't know how to skip intermediate ones, I guess
they look similar to the first one.

> 
> However, exposing all those intermediate steps to the
> netfilter-hook-attached code is not so bad.  It enables people to do
> whatever they want... they just need to be careful with their rules.
> If there's too much interference with normal OUTPUT/POSTROUTING rules,
> we could still go for new hooks+new chains.

This makes me think I misunderstood your statement above. I recall there
was some confusion between hook functions and hooks before, it seems
I got me, too ;) Do you propose to make these steps visible to iptables
chains or just to the registered hook functions, which would hide them
from the user by immediately returning in the iptables case ?

> 
> 
>>If NAT is used, ip_route_{input,output} might even return a different
>>policy bundle.
> 
> 
> The question is, again: What ist the desired behaviour?  Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

The NATed one. When the policy says encrypt 10.0.0.1<->10.0.0.2, NAT
should bypass that. IIRC Herbert Xu mentioned some things about when
policy checks need to be done in the "2.6 IPSEC + SNAT" thread. I'm
going to read it again later.

>>Regards,
>>Patrick
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 10:30                         ` Harald Welte
  2004-01-28 11:24                           ` Willy Tarreau
  2004-01-28 13:22                           ` Patrick McHardy
@ 2004-01-28 14:23                           ` Henrik Nordstrom
  2004-02-01 14:52                           ` Patrick McHardy
  3 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-28 14:23 UTC (permalink / raw)
  To: Harald Welte
  Cc: Patrick McHardy, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

On Wed, 28 Jan 2004, Harald Welte wrote:

> The question is, again: What ist the desired behaviour?  Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

IN my opinion it must be on the NAT'ed one, or else there is a very large
risk the packet does not match the selected policy and the other endpoint
will reject the resulting IPSec packet.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 10:30                         ` Harald Welte
                                             ` (2 preceding siblings ...)
  2004-01-28 14:23                           ` Henrik Nordstrom
@ 2004-02-01 14:52                           ` Patrick McHardy
  2004-02-16  1:19                             ` Patrick McHardy
  3 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-01 14:52 UTC (permalink / raw)
  To: Harald Welte
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

I've hacked together a working version, it's a ugly hack but
it shows what parts are required for getting NAT working.
Comments are welcome.

The patch is split into three parts: 01-hooks.diff, 02-nat-original.diff 
  and 03-nat-reply.diff. I'm going to discuss each part seperately.

01-hooks.diff:
This patch adds the required hooks. Locally generated and forwarded
packets pass through POST_ROUTING before beeing encypted and pass
through LOCAL_OUT afterwards. This is currently not consistent with
input packets which pass through PRE_ROUTING and LOCAL_IN encrypted
and then again after beeing decapsulated from a tunnel-mode
encapsulation. This means packets from nested tunnels will pass
multiple times, packets from transport-mode connections won't pass
the hook after decryption at all.

02-nat-original.diff:
This patch adds policy lookups to ip_route_me_harder and changes
ip_nat_standalone to reroute in both LOCAL_OUT and POST_ROUTING
if the routing key or the key used for policy lookups changed.
This allows to NAT packets and have the correct policy applied
to them afterwards.

Since the POST_ROUTING hook is called with ip_finish_output2 as outfn
the packets which have a transformer-dst_entry after ip_route_me_harder
are manually confirmed (conntrack) and passed to dst_output. These
packets will be passed through LOCAL_OUT after encryption has been done.
There is a danger of loops, the encrypted packets could be natted in
LOCAL_OUT and POST_ROUTING again and again match a policy.

The packets matching a policy after rerouting in LOCAL_OUT don't need
this kind of special handling, LOCAL_OUT is called with dst_output
as outfn. They are currently not passwd through POST_ROUTING before
encryption, which I just noticed while writing this mail.

03-nat-reply.diff:
This patch changes xfrm_policy_check to lookup the correct policy
for input packets after NAT has been applied. Unfortunately some
policy checks are performed after skb->nfct has been released so
I had to introduce a leak.

To summarize the problems:
- Not completely consistens picture wrt. hooks passed and ordering.
Should be easily fixable.
- loop danger. suggestions ?
- nat information required at socket receive time for policy checks,
currently achieved by introducing a leak. possible solution is to
store the required data in new skb fields, it's only 6 bytes ..

What is currently working is the following:
- Filter traffic before/after encryption (more or less)
- SNAT/DNAT packets and have correct policies applied, including none
- Have policy checks find correct policy for SNAT/DNAT/SNAT+DNATed
packets


Regards,
Patrick

[-- Attachment #2: 01-hooks.diff --]
[-- Type: text/plain, Size: 4085 bytes --]

===== net/ipv4/ah4.c 1.29 vs edited =====
--- 1.29/net/ipv4/ah4.c	Sat Jan 24 19:08:48 2004
+++ edited/net/ipv4/ah4.c	Sat Jan 31 16:26:44 2004
@@ -6,6 +6,8 @@
 #include <net/ah.h>
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <asm/scatterlist.h>
 
@@ -54,6 +56,11 @@
 	return 0;
 }
 
+static inline int ah_finish_output(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
 static int ah_output(struct sk_buff *skb)
 {
 	int err;
@@ -144,6 +151,18 @@
 	if ((skb->dst = dst_pop(dst)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto error_nolock;
+	}
+	/* final packet goes through LOCAL_OUT hook */
+	if (skb->dst->xfrm == NULL) {
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
+		return NF_HOOK(AF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+		               skb->dst->dev, ah_finish_output);
 	}
 	return NET_XMIT_BYPASS;
 
===== net/ipv4/esp4.c 1.35 vs edited =====
--- 1.35/net/ipv4/esp4.c	Mon Aug 18 13:14:38 2003
+++ edited/net/ipv4/esp4.c	Sat Jan 31 16:26:33 2004
@@ -8,6 +8,8 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <net/udp.h>
 
@@ -20,6 +22,11 @@
 	__u8		proto;
 };
 
+static inline int esp_finish_output(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
 int esp_output(struct sk_buff *skb)
 {
 	int err;
@@ -198,6 +205,18 @@
 	if ((skb->dst = dst_pop(dst)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto error_nolock;
+	}
+	/* final packet goes through LOCAL_OUT hook */
+	if (skb->dst->xfrm == NULL) {
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
+		return NF_HOOK(AF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+		               skb->dst->dev, esp_finish_output);
 	}
 	return NET_XMIT_BYPASS;
 
===== net/ipv4/ip_forward.c 1.9 vs edited =====
--- 1.9/net/ipv4/ip_forward.c	Sun Mar 23 11:21:28 2003
+++ edited/net/ipv4/ip_forward.c	Sat Jan 31 16:27:11 2004
@@ -51,6 +51,10 @@
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
+		               dst_output);
+
 	return dst_output(skb);
 }
 
===== net/ipv4/ip_output.c 1.48 vs edited =====
--- 1.48/net/ipv4/ip_output.c	Wed Dec 17 21:06:18 2003
+++ edited/net/ipv4/ip_output.c	Sat Jan 31 16:27:22 2004
@@ -122,6 +122,14 @@
 	return ttl;
 }
 
+static inline int ip_dst_output(struct sk_buff *skb)
+{
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+		               skb->dst->dev, dst_output);
+	return dst_output(skb);
+}
+
 /* 
  *		Add an ip header to a skbuff and send it out.
  *
@@ -164,7 +172,7 @@
 
 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 static inline int ip_finish_output2(struct sk_buff *skb)
@@ -386,7 +394,7 @@
 	skb->priority = sk->sk_priority;
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 
 no_route:
 	IP_INC_STATS(IpOutNoRoutes);
@@ -1164,7 +1172,7 @@
 
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, 
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
===== net/ipv4/xfrm4_input.c 1.9 vs edited =====
--- 1.9/net/ipv4/xfrm4_input.c	Fri Aug  8 06:17:15 2003
+++ edited/net/ipv4/xfrm4_input.c	Sat Jan 31 14:23:52 2004
@@ -130,6 +130,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {

[-- Attachment #3: 02-nat-original.diff --]
[-- Type: text/plain, Size: 4629 bytes --]

===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb  1 14:39:46 2004
@@ -25,6 +25,7 @@
 #include <linux/icmp.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/xfrm.h>
 #include <linux/ip.h>
 
 #define __KERNEL_SYSCALLS__
@@ -627,6 +628,9 @@
 	struct flowi fl = {};
 	struct dst_entry *odst;
 	unsigned int hh_len;
+#ifdef CONFIG_XFRM
+	struct xfrm_policy_afinfo *afinfo;
+#endif
 
 	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
 	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
@@ -665,6 +669,16 @@
 	if ((*pskb)->dst->error)
 		return -1;
 
+#ifdef CONFIG_XFRM
+	afinfo = xfrm_policy_get_afinfo(AF_INET);
+	if (afinfo != NULL) {
+		afinfo->decode_session(*pskb, &fl);
+		xfrm_policy_put_afinfo(afinfo);
+		if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 1) != 0)
+			return -1;
+	}
+#endif
+
 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
 	if (skb_headroom(*pskb) < hh_len) {
===== net/ipv4/netfilter/ip_conntrack_standalone.c 1.23 vs edited =====
--- 1.23/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Dec  5 21:30:11 2003
+++ edited/net/ipv4/netfilter/ip_conntrack_standalone.c	Sat Jan 31 14:37:49 2004
@@ -481,6 +481,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
===== net/ipv4/netfilter/ip_nat_standalone.c 1.28 vs edited =====
--- 1.28/net/ipv4/netfilter/ip_nat_standalone.c	Fri Oct  3 08:26:55 2003
+++ edited/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb  1 11:21:29 2004
@@ -160,6 +160,46 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }
 
+struct flow_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+flow_key_get(struct sk_buff *skb, struct flow_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+flow_key_compare(struct sk_buff *skb, struct flow_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+	
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+	return 0;
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -167,6 +207,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct flow_key k;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -189,7 +232,25 @@
 			return NF_STOLEN;
 	}
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	flow_key_get(*pskb, &k, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && flow_key_compare(*pskb, &k, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+		else if ((*pskb)->dst->xfrm != NULL) {
+			/* packet matches policy after ip_route_me_harder -
+			 * need to manually direct packet to transformers */
+			/* WARNING: loop danger */
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+	}
+	return ret;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -200,7 +261,7 @@
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct flow_key k;
 	unsigned int ret;
 
 	/* root is playing with raw sockets. */
@@ -208,14 +269,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	flow_key_get(*pskb, &k, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && flow_key_compare(*pskb, &k, 1)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+	}
 	return ret;
 }
 #endif

[-- Attachment #4: 03-nat-reply.diff --]
[-- Type: text/plain, Size: 3489 bytes --]

===== include/linux/netfilter.h 1.6 vs edited =====
--- 1.6/include/linux/netfilter.h	Wed Jun 25 00:36:10 2003
+++ edited/include/linux/netfilter.h	Sun Feb  1 14:39:16 2004
@@ -166,5 +166,21 @@
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
 #endif /*CONFIG_NETFILTER*/
 
+#ifdef CONFIG_XFRM
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+struct flowi;
+extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+	if (family == AF_INET)
+		nf_nat_decode_session4(skb, fl);
+}
+#else /* CONFIG_IP_NF_NAT_NEEDED */
+#define nf_nat_decode_session(skb, fl, family)
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
+#endif /* CONFIG_XFRM */
+
 #endif /*__KERNEL__*/
 #endif /*__LINUX_NETFILTER_H*/
===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb  1 14:39:46 2004
@@ -681,6 +695,49 @@
 
 	return 0;
 }
+
+#if defined(CONFIG_IP_NF_NAT_NEEDED) && defined(CONFIG_XFRM)
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+#include <linux/netfilter_ipv4/ip_nat.h>
+
+void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_nat_info_manip *m;
+	struct ip_conntrack_tuple *tuple;
+	unsigned int i;
+
+	if (skb->nfct == NULL)
+		return;
+	ct = (struct ip_conntrack *)skb->nfct->master;
+
+	for (i = 0; i < ct->nat.info.num_manips; i++) {
+		m = &ct->nat.info.manips[i];
+		if (m->direction != IP_CT_DIR_REPLY)
+			continue;
+		tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+		if (m->hooknum == NF_IP_PRE_ROUTING &&
+		    m->maniptype == IP_NAT_MANIP_DST) {
+			/* SNAT rule reply mangling */
+			fl->fl4_dst = tuple->dst.ip;
+			if (tuple->dst.protonum == IPPROTO_TCP ||
+			    tuple->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_dport = tuple->dst.u.tcp.port;
+		}
+#ifdef CONFIG_IP_NF_NAT_LOCAL
+		else if (m->hooknum == NF_IP_LOCAL_IN &&
+		         m->maniptype == IP_NAT_MANIP_SRC) {
+			/* DNAT rule reply mangling */
+			fl->fl4_src = tuple->src.ip;
+			if (tuple->dst.protonum == IPPROTO_TCP ||
+			    tuple->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_sport = tuple->src.u.tcp.port;
+		}
+#endif
+	}
+}
+#endif /* CONFIG_IP_NF_NAT_NEEDED && CONFIG_XFRM */
 
 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
 {
===== net/ipv4/ip_input.c 1.20 vs edited =====
--- 1.20/net/ipv4/ip_input.c	Mon Sep 29 04:05:42 2003
+++ edited/net/ipv4/ip_input.c	Sat Jan 31 21:30:52 2004
@@ -207,12 +207,14 @@
 
 	__skb_pull(skb, ihl);
 
+#if 0
 #ifdef CONFIG_NETFILTER
 	/* Free reference early: we don't need it any more, and it may
            hold ip_conntrack module loaded indefinitely. */
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = NULL;
 #endif /*CONFIG_NETFILTER*/
+#endif
 
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
===== net/xfrm/xfrm_policy.c 1.47 vs edited =====
--- 1.47/net/xfrm/xfrm_policy.c	Wed Jan 14 08:30:19 2004
+++ edited/net/xfrm/xfrm_policy.c	Sun Feb  1 13:57:09 2004
@@ -21,6 +21,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
 
@@ -908,6 +909,7 @@
 
 	if (_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);
 
 	/* First, check used SA against their selectors. */
 	if (skb->sp) {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-02-01 14:52                           ` Patrick McHardy
@ 2004-02-16  1:19                             ` Patrick McHardy
  2004-02-18 14:57                               ` Patrick McHardy
  0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-16  1:19 UTC (permalink / raw)
  To: Harald Welte
  Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

I've continued to work on the IPSEC+NAT issue, this is the
current status. The hooks for pre/post-ipsec are now entirely
contained in ip_output.c. The transformers mark transformed
packets with a new flag in the control buffer, this is used
for directing ipsec packets to LOCAL_OUT in ip_output(). It
also solves the loop danger when redirecting packets in
ip_nat_out after rerouting, ip_route_me_harder only looks up
a policy for untransformed packets. Redirected packets are
manually confirmed, should this be changed to just mark
them for redirection and do the actual redirecting in the
conntrack hook ? The policy check part is fine now, the
conntrack reference is kept as long as required for the
ip protocol handlers that do policy checks, but it should be
dropped on all paths before packets are queued to the socket.
The nf_nat_decode_packet function now handles all cases of NAT
which affect policy lookups, the last patches didn't handle
DNAT rules in PREROUTING. The last big problem is the input
side, I don't know how to make hook ordering symetric to
output. Suggestions are very welcome.


There are 4 patches:

01-nf_reset.diff
     Move common nf_conntrack_put/nfct=NULL/nf_debug=0 code to
     new inline function nf_reset.

02-hooks.diff
     Make packets to be encrypted visible on POST_ROUTING hook
     and encrypted packets on LOCAL_OUT. Reset nfct etc. before
     reposting the packet into the stack on reception.

03-nat-policy-lookup.diff
     Add policy lookups to ip_route_me_harder and change NAT to
     reroute for any change that affects routing or policy lookups

04-nat-policy-checks.diff
     Make xfrm_policy_check find correct policy for NATed packets


Best regards,
Patrick

[-- Attachment #2: 01-nf_reset.diff --]
[-- Type: text/x-patch, Size: 5407 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:43:03+01:00 kaber@trash.net 
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv6/sit.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv6/ip6_tunnel.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +1 -7
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ipip.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ip_input.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +1 -5
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ip_gre.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# include/linux/skbuff.h
#   2004/02/16 00:41:03+01:00 kaber@trash.net +12 -3
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h	Mon Feb 16 02:12:55 2004
+++ b/include/linux/skbuff.h	Mon Feb 16 02:12:55 2004
@@ -1201,6 +1201,14 @@
 	if (nfct)
 		atomic_inc(&nfct->master->use);
 }
+static inline void nf_reset(struct sk_buff *skb)
+{
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+}
 
 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
@@ -1213,9 +1221,10 @@
 	if (nf_bridge)
 		atomic_inc(&nf_bridge->use);
 }
-#endif
-
-#endif
+#endif /* CONFIG_BRIDGE_NETFILTER */
+#else /* CONFIG_NETFILTER */
+static inline void nf_reset(struct sk_buff *skb) {}
+#endif /* CONFIG_NETFILTER */
 
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff -Nru a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
--- a/net/ipv4/ip_gre.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ip_gre.c	Mon Feb 16 02:12:55 2004
@@ -643,13 +643,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipgre_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipgre_lock);
@@ -877,13 +871,7 @@
 		}
 	}
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ip_input.c	Mon Feb 16 02:12:55 2004
@@ -202,17 +202,13 @@
 
 #ifdef CONFIG_NETFILTER_DEBUG
 	nf_debug_ip_local_deliver(skb);
-	skb->nf_debug = 0;
 #endif /*CONFIG_NETFILTER_DEBUG*/
 
 	__skb_pull(skb, ihl);
 
-#ifdef CONFIG_NETFILTER
 	/* Free reference early: we don't need it any more, and it may
            hold ip_conntrack module loaded indefinitely. */
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#endif /*CONFIG_NETFILTER*/
+	nf_reset(skb);
 
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
diff -Nru a/net/ipv4/ipip.c b/net/ipv4/ipip.c
--- a/net/ipv4/ipip.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ipip.c	Mon Feb 16 02:12:55 2004
@@ -496,13 +496,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip_lock);
@@ -647,13 +641,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl	=	old_iph->ttl;
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
--- a/net/ipv6/ip6_tunnel.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv6/ip6_tunnel.c	Mon Feb 16 02:12:55 2004
@@ -715,13 +715,7 @@
 	ipv6h->nexthdr = proto;
 	ipv6_addr_copy(&ipv6h->saddr, &fl.fl6_src);
 	ipv6_addr_copy(&ipv6h->daddr, &fl.fl6_dst);
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 	pkt_len = skb->len;
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, 
 		      skb->dst->dev, dst_output);
diff -Nru a/net/ipv6/sit.c b/net/ipv6/sit.c
--- a/net/ipv6/sit.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv6/sit.c	Mon Feb 16 02:12:55 2004
@@ -388,13 +388,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip6_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip6_lock);
@@ -580,13 +574,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl	=	iph6->hop_limit;
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;

[-- Attachment #3: 02-hooks.diff --]
[-- Type: text/x-patch, Size: 6379 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:50:24+01:00 kaber@trash.net 
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/xfrm4_tunnel.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/xfrm4_input.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/ipcomp.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/ip_output.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +22 -4
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/ip_forward.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +4 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/esp4.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# net/ipv4/ah4.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
# include/net/ip.h
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# 
diff -Nru a/include/net/ip.h b/include/net/ip.h
--- a/include/net/ip.h	Mon Feb 16 02:13:07 2004
+++ b/include/net/ip.h	Mon Feb 16 02:13:07 2004
@@ -48,6 +48,7 @@
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
 #define IPSKB_XFRM_TUNNEL_SIZE	8
+#define IPSKB_XFRM_TRANSFORMED	16
 };
 
 struct ipcm_cookie
diff -Nru a/net/ipv4/ah4.c b/net/ipv4/ah4.c
--- a/net/ipv4/ah4.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/ah4.c	Mon Feb 16 02:13:07 2004
@@ -137,6 +137,7 @@
 	ip_send_check(top_iph);
 
 	skb->nh.raw = skb->data;
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 
 	x->curlft.bytes += skb->len;
 	x->curlft.packets++;
diff -Nru a/net/ipv4/esp4.c b/net/ipv4/esp4.c
--- a/net/ipv4/esp4.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/esp4.c	Mon Feb 16 02:13:07 2004
@@ -191,6 +191,7 @@
 	ip_send_check(top_iph);
 
 	skb->nh.raw = skb->data;
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 
 	x->curlft.bytes += skb->len;
 	x->curlft.packets++;
diff -Nru a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
--- a/net/ipv4/ip_forward.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/ip_forward.c	Mon Feb 16 02:13:07 2004
@@ -51,6 +51,10 @@
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+		               skb->dst->dev, dst_output);
+
 	return dst_output(skb);
 }
 
diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/ip_output.c	Mon Feb 16 02:13:07 2004
@@ -122,6 +122,14 @@
 	return ttl;
 }
 
+static inline int ip_dst_output(struct sk_buff *skb)
+{
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+		               skb->dst->dev, dst_output);
+	return dst_output(skb);
+}
+
 /* 
  *		Add an ip header to a skbuff and send it out.
  *
@@ -164,7 +172,7 @@
 
 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 static inline int ip_finish_output2(struct sk_buff *skb)
@@ -282,7 +290,7 @@
 		return ip_finish_output(skb);
 }
 
-int ip_output(struct sk_buff *skb)
+static inline int ip_output2(struct sk_buff *skb)
 {
 	IP_INC_STATS(IpOutRequests);
 
@@ -293,6 +301,16 @@
 		return ip_finish_output(skb);
 }
 
+int ip_output(struct sk_buff *skb)
+{
+	if (IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) {
+		nf_reset(skb);
+		return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+		               skb->dst->dev, ip_output2);
+	}
+	return ip_output2(skb);
+}
+
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
 	struct sock *sk = skb->sk;
@@ -386,7 +404,7 @@
 	skb->priority = sk->sk_priority;
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 
 no_route:
 	IP_INC_STATS(IpOutNoRoutes);
@@ -1165,7 +1183,7 @@
 
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, 
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
diff -Nru a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
--- a/net/ipv4/ipcomp.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/ipcomp.c	Mon Feb 16 02:13:07 2004
@@ -223,6 +223,7 @@
 	skb->nh.raw = skb->data;
 
 out_ok:
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	x->curlft.bytes += skb->len;
 	x->curlft.packets++;
 	spin_unlock_bh(&x->lock);
diff -Nru a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
--- a/net/ipv4/xfrm4_input.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/xfrm4_input.c	Mon Feb 16 02:13:07 2004
@@ -130,6 +130,7 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+		nf_reset(skb);
 		netif_rx(skb);
 		return 0;
 	} else {
diff -Nru a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
--- a/net/ipv4/xfrm4_tunnel.c	Mon Feb 16 02:13:07 2004
+++ b/net/ipv4/xfrm4_tunnel.c	Mon Feb 16 02:13:07 2004
@@ -66,6 +66,7 @@
 	ip_send_check(top_iph);
 
 	skb->nh.raw = skb->data;
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	x->curlft.bytes += skb->len;
 	x->curlft.packets++;
 

[-- Attachment #4: 03-nat-policy-lookup.diff --]
[-- Type: text/x-patch, Size: 5377 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:51:09+01:00 kaber@trash.net 
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any
#   change in route/policy key
# 
# net/ipv4/netfilter/ip_nat_standalone.c
#   2004/02/16 00:51:03+01:00 kaber@trash.net +72 -8
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any
#   change in route/policy key
# 
# net/ipv4/netfilter/ip_conntrack_standalone.c
#   2004/02/16 00:51:03+01:00 kaber@trash.net +1 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any
#   change in route/policy key
# 
# net/core/netfilter.c
#   2004/02/16 00:51:03+01:00 kaber@trash.net +16 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any
#   change in route/policy key
# 
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Mon Feb 16 02:13:24 2004
+++ b/net/core/netfilter.c	Mon Feb 16 02:13:24 2004
@@ -25,6 +25,8 @@
 #include <linux/icmp.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/xfrm.h>
+#include <net/ip.h>
 #include <linux/ip.h>
 
 #define __KERNEL_SYSCALLS__
@@ -664,6 +666,20 @@
 	
 	if ((*pskb)->dst->error)
 		return -1;
+
+#ifdef CONFIG_XFRM
+	if (!(IPCB(*pskb)->flags & IPSKB_XFRM_TRANSFORMED)) {
+		struct xfrm_policy_afinfo *afinfo;
+
+		afinfo = xfrm_policy_get_afinfo(AF_INET);
+		if (afinfo != NULL) {
+			afinfo->decode_session(*pskb, &fl);
+			xfrm_policy_put_afinfo(afinfo);
+			if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 0) != 0)
+				return -1;
+		}
+	}
+#endif
 
 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
diff -Nru a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c	Mon Feb 16 02:13:24 2004
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c	Mon Feb 16 02:13:24 2004
@@ -489,6 +489,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
diff -Nru a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c	Mon Feb 16 02:13:24 2004
+++ b/net/ipv4/netfilter/ip_nat_standalone.c	Mon Feb 16 02:13:24 2004
@@ -166,6 +166,45 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }
 
+struct nat_route_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+nat_route_key_get(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+nat_route_key_compare(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -173,6 +212,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct nat_route_key key;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -195,7 +237,29 @@
 			return NF_STOLEN;
 	}
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	nat_route_key_get(*pskb, &key, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && nat_route_key_compare(*pskb, &key, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+#ifdef CONFIG_XFRM
+		/*
+		 * POST_ROUTING hook is called with fixed outfn, we need
+		 * to manually confirm the packet and direct it to the
+		 * transformers if a policy matches.
+		 */
+		else if ((*pskb)->dst->xfrm != NULL) {
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+#endif
+	}
+	return ret;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -206,7 +270,7 @@
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct nat_route_key key;
 	unsigned int ret;
 
 	/* root is playing with raw sockets. */
@@ -214,14 +278,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	nat_route_key_get(*pskb, &key, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && nat_route_key_compare(*pskb, &key, 1)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+	}
 	return ret;
 }
 #endif

[-- Attachment #5: 04-nat-policy-check.diff --]
[-- Type: text/x-patch, Size: 5682 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:54:30+01:00 kaber@trash.net 
#   Make policy checks work with NAT
# 
# net/xfrm/xfrm_policy.c
#   2004/02/16 00:54:24+01:00 kaber@trash.net +2 -0
#   Make policy checks work with NAT
# 
# net/ipv4/udp.c
#   2004/02/16 00:54:24+01:00 kaber@trash.net +2 -0
#   Make policy checks work with NAT
# 
# net/ipv4/tcp_ipv4.c
#   2004/02/16 00:54:24+01:00 kaber@trash.net +1 -0
#   Make policy checks work with NAT
# 
# net/ipv4/raw.c
#   2004/02/16 00:54:24+01:00 kaber@trash.net +1 -0
#   Make policy checks work with NAT
# 
# net/ipv4/ip_input.c
#   2004/02/16 00:54:23+01:00 kaber@trash.net +6 -8
#   Make policy checks work with NAT
# 
# net/core/netfilter.c
#   2004/02/16 00:54:23+01:00 kaber@trash.net +43 -0
#   Make policy checks work with NAT
# 
# include/linux/netfilter.h
#   2004/02/16 00:54:23+01:00 kaber@trash.net +16 -0
#   Make policy checks work with NAT
# 
diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h	Mon Feb 16 02:13:39 2004
+++ b/include/linux/netfilter.h	Mon Feb 16 02:13:39 2004
@@ -166,5 +166,21 @@
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
 #endif /*CONFIG_NETFILTER*/
 
+#ifdef CONFIG_XFRM
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+struct flowi;
+extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+	if (family == AF_INET)
+		nf_nat_decode_session4(skb, fl);
+}
+#else /* CONFIG_IP_NF_NAT_NEEDED */
+#define nf_nat_decode_session(skb,fl,family)
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
+#endif /* CONFIG_XFRM */
+
 #endif /*__KERNEL__*/
 #endif /*__LINUX_NETFILTER_H*/
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Mon Feb 16 02:13:39 2004
+++ b/net/core/netfilter.c	Mon Feb 16 02:13:39 2004
@@ -698,6 +698,49 @@
 	return 0;
 }
 
+#if defined(CONFIG_IP_NF_NAT_NEEDED) && defined(CONFIG_XFRM)
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+#include <linux/netfilter_ipv4/ip_nat.h>
+
+void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_conntrack_tuple *t;
+	struct ip_nat_info_manip *m;
+	unsigned int i;
+
+	if (skb->nfct == NULL)
+		return;
+	ct = (struct ip_conntrack *)skb->nfct->master;
+
+	for (i = 0; i < ct->nat.info.num_manips; i++) {
+		m = &ct->nat.info.manips[i];
+		t = &ct->tuplehash[m->direction].tuple;
+
+		switch (m->hooknum) {
+		case NF_IP_PRE_ROUTING:
+			if (m->maniptype != IP_NAT_MANIP_DST)
+				break;
+			fl->fl4_dst = t->dst.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_dport = t->dst.u.tcp.port;
+			break;
+#ifdef CONFIG_IP_NF_NAT_LOCAL
+		case NF_IP_LOCAL_IN:
+			if (m->maniptype != IP_NAT_MANIP_SRC)
+				break;
+			fl->fl4_src = t->src.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_sport = t->src.u.tcp.port;
+			break;
+#endif
+		}
+	}
+}
+#endif /* CONFIG_IP_NF_NAT_NEEDED && CONFIG_XFRM */
+
 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
 {
 	struct sk_buff *nskb;
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Mon Feb 16 02:13:39 2004
+++ b/net/ipv4/ip_input.c	Mon Feb 16 02:13:39 2004
@@ -206,10 +206,6 @@
 
 	__skb_pull(skb, ihl);
 
-	/* Free reference early: we don't need it any more, and it may
-           hold ip_conntrack module loaded indefinitely. */
-	nf_reset(skb);
-
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
 
@@ -235,10 +231,12 @@
 			int ret;
 
 			smp_read_barrier_depends();
-			if (!ipprot->no_policy &&
-			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-				kfree_skb(skb);
-				goto out;
+			if (!ipprot->no_policy) {
+				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+					kfree_skb(skb);
+					goto out;
+				}
+				nf_reset(skb);
 			}
 			ret = ipprot->handler(skb);
 			if (ret < 0) {
diff -Nru a/net/ipv4/raw.c b/net/ipv4/raw.c
--- a/net/ipv4/raw.c	Mon Feb 16 02:13:39 2004
+++ b/net/ipv4/raw.c	Mon Feb 16 02:13:39 2004
@@ -249,6 +249,7 @@
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
+	nf_reset(skb);
 
 	skb_push(skb, skb->data - skb->nh.raw);
 
diff -Nru a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c	Mon Feb 16 02:13:39 2004
+++ b/net/ipv4/tcp_ipv4.c	Mon Feb 16 02:13:39 2004
@@ -1785,6 +1785,7 @@
 
 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
+	nf_reset(skb);
 
 	if (sk_filter(sk, skb, 0))
 		goto discard_and_relse;
diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	Mon Feb 16 02:13:39 2004
+++ b/net/ipv4/udp.c	Mon Feb 16 02:13:39 2004
@@ -1027,6 +1027,7 @@
 		kfree_skb(skb);
 		return -1;
 	}
+	nf_reset(skb);
 
 	if (up->encap_type) {
 		/*
@@ -1192,6 +1193,7 @@
 
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
+	nf_reset(skb);
 
 	/* No socket. Drop packet silently, if checksum is wrong */
 	if (udp_checksum_complete(skb))
diff -Nru a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c	Mon Feb 16 02:13:39 2004
+++ b/net/xfrm/xfrm_policy.c	Mon Feb 16 02:13:39 2004
@@ -21,6 +21,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
 
@@ -908,6 +909,7 @@
 
 	if (_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);
 
 	/* First, check used SA against their selectors. */
 	if (skb->sp) {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-02-16  1:19                             ` Patrick McHardy
@ 2004-02-18 14:57                               ` Patrick McHardy
       [not found]                                 ` <20040218220337.GA3193@alpha.home.local>
  0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-18 14:57 UTC (permalink / raw)
  Cc: Harald Welte, Henrik Nordstrom, Willy Tarreau, Tom Eastep,
	Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

Here is another patch on top of the last ones, it makes
incoming packets from transport mode SAs visible to
netfilter hooks and emulates normal behaviour. With this
patch you can share a transport-mode SA between an entire
network using MASQUERADE, but also do weird stuff like
DNAT packets coming from a transport mode SA to remote
hosts ;)

The patch is not exactly nice, but with it, everything should
be working fine except for nested tunnels. I would really
appreciate if I can get some comments for this part and the
other ones so I can prepare some final patches to discuss
with davem.

Regards
Patrick

Patrick McHardy wrote:

> There are 4 patches:
> 
> 01-nf_reset.diff
>     Move common nf_conntrack_put/nfct=NULL/nf_debug=0 code to
>     new inline function nf_reset.
> 
> 02-hooks.diff
>     Make packets to be encrypted visible on POST_ROUTING hook
>     and encrypted packets on LOCAL_OUT. Reset nfct etc. before
>     reposting the packet into the stack on reception.
> 
> 03-nat-policy-lookup.diff
>     Add policy lookups to ip_route_me_harder and change NAT to
>     reroute for any change that affects routing or policy lookups
> 
> 04-nat-policy-checks.diff
>     Make xfrm_policy_check find correct policy for NATed packets

[-- Attachment #2: 05-transport-mode-hooks.diff --]
[-- Type: text/plain, Size: 2092 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/18 15:42:49+01:00 kaber@trash.net 
#   Emulate netfilter hooks for transport mode packets
# 
# net/ipv4/xfrm4_input.c
#   2004/02/18 15:42:42+01:00 kaber@trash.net +46 -1
#   Emulate netfilter hooks for transport mode packets
# 
diff -Nru a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
--- a/net/ipv4/xfrm4_input.c	Wed Feb 18 15:45:17 2004
+++ b/net/ipv4/xfrm4_input.c	Wed Feb 18 15:45:17 2004
@@ -9,10 +9,53 @@
  * 	
  */
 
+#include <linux/config.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
 
+#ifdef CONFIG_NETFILTER
+static inline int emulate_nf_done(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int emulate_nf_hooks2(struct sk_buff *skb)
+{
+	if (inet_addr_type(skb->nh.iph->daddr) != RTN_LOCAL) {
+		if (ip_route_me_harder(&skb) == 0)
+			dst_input(skb);
+		else
+			kfree_skb(skb);
+		return -1;
+	}
+		
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+	               emulate_nf_done);
+}
+
+static inline int emulate_nf_hooks(struct sk_buff *skb)
+{
+	int off = skb->data - skb->nh.raw;
+		
+	skb_push(skb, off);
+	skb->nh.iph->tot_len = htons(skb->len);
+	ip_send_check(skb->nh.iph);
+
+	if (NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+	            emulate_nf_hooks2) != 0)
+		return -1;
+
+	skb_pull(skb, off);
+	return 0;
+}
+#else /* CONFIG_NETFILTER */
+static inline int emulate_nf_hooks(struct sk_buff *skb) {}
+#endif /* CONFIG_NETFILTER */
+
 int xfrm4_rcv(struct sk_buff *skb)
 {
 	return xfrm4_rcv_encap(skb, 0);
@@ -125,15 +168,17 @@
 	memcpy(skb->sp->x+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(struct sec_decap_state));
 	skb->sp->len += xfrm_nr;
 
+	nf_reset(skb);
 	if (decaps) {
 		if (!(skb->dev->flags&IFF_LOOPBACK)) {
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
-		nf_reset(skb);
 		netif_rx(skb);
 		return 0;
 	} else {
+		if (emulate_nf_hooks(skb) != 0)
+			return 0;
 		return -skb->nh.iph->protocol;
 	}
 

^ permalink raw reply	[flat|nested] 63+ messages in thread

[parent not found: <20040218220337.GA3193@alpha.home.local>]

* Re: [PATCH]Re: NAT before IPsec with 2.6
       [not found]                                 ` <20040218220337.GA3193@alpha.home.local>
@ 2004-02-20  1:43                                   ` Patrick McHardy
  2004-03-04 22:30                                     ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
  0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-20  1:43 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig,
	Netfilter Development Mailinglist

Hi Willy,

Willy Tarreau wrote:
> 
> Judging by the fact that I saw no reply to your previous mail,
> I suspect that we all are a bit busy. Regarding your previous
> question about a possible asymetry, I was starting to draw a
> flow diagram to check what I understood correctly and that
> we could use as a base to comment on, but I finally didn't have
> time to work on it anymore. In fact, I see how a packet passes
> through the tables and chains without ipsec, I have a few doubts
> about what changes with ipsec and your patches (eg: I don't
> remember if the decapsulated packet goes through mgl:PRE or not),
> and I'm yet less certain about what is known about the sessions
> at different stages. If I find time to come up with a diagram
> (even if it's plain wrong), I'll post it here.

Thanks a lot! Decapsulated packets go the usual way, in fact the
patch doesn't change anything for tunnel mode except that
it drops the conntrack reference before packets are posted into
the stack again. For transport mode packets it's a bit different,
and I too am not entirely sure if the packet is always in valid
state with the emulate_nf_hooks stuff, especially when
NAT-Traversal is used. I'm investigating this after I fix a
bug Michal and a second tester reported. Regarding hooks passed,
packets SNATed in POST_ROUTING which have a matching policy
afterwards won't pass the SELINUX and CONNTRACK hooks.
The mangle table may also cause problems when something causes
rerouting, I haven't thought about the possible effects yet.
Other than that I can currently not think of more problems ..

Best regards,
Patrick

> 
> Cheers,
> Willy
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH]: latest netfilter+ipsec patches
  2004-02-20  1:43                                   ` Patrick McHardy
@ 2004-03-04 22:30                                     ` Patrick McHardy
  2004-03-04 23:11                                       ` Willy Tarreau
                                                         ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-03-04 22:30 UTC (permalink / raw)
  To: Netfilter Development Mailinglist
  Cc: Willy Tarreau, Harald Welte, Henrik Nordstrom, Tom Eastep,
	Michal Ludvig, alex, guillaume

[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]

(long CC list, if anyone doesn't want to be on it anymore please say so)

This is the latest version of the netfilter+ipsec patches, I will check
them in in a couple of days after some minor changes.

(Noteworthy) changes since last version:

- hook-traversal on input is symetrical to output now. This means
packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
PRE_REROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.

The encrypted packets go through the stack, including the hooks,
the usual way. Packets reposted into the stack (tunnel-mode SAs)
skip the hooks until we know decryption is finished. If after
input-routing one of these packets has a non-local destination,
we know decyption is finished. The function nf_postxfrm_nonlocal()
which passes them to PRE_ROUTING is called for these packets.
Packets with local-destination are handled in
ip_local_deliver_finish(), if the ip-protocol handler is not marked
as xfrm_prot (all ipsec-transformers), decryption is finished and
packets go to nf_postxfrm_input() where they will traverse the
PRE_ROUTING hook, potentially be rerouted or traverse the LOCAL_IN
hook, before beeing delivered to the ip-protocol handler. Packets
from transport-mode SAs are handled exactly like local-packets
from tunnel-mode SAs.

The only problem with this approach I'm currently aware of is that
one of the ip-protocol handlers marked with xfrm_prot
(xfrm4_tunnel.c) is also responsible for dispatching packets to
ipip.c; the encapsulated packets heading for a ipip-tunnel won't
traverse the netfilter hooks on input.

- new match "policy" for matching the policy that was used during
decapsulation (well, the used SAs, policy checks come later), and the
policy that will be used for encapsulation.

I had some examples but I accidentally killed X before sending, for
now I have just attached the test-script I used, if there are
questions just ask.


Regards
Patrick

[-- Attachment #2: 01-nf_reset.diff --]
[-- Type: text/x-patch, Size: 5407 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:27:46+01:00 kaber@trash.net 
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv6/sit.c
#   2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv6/ip6_tunnel.c
#   2004/02/22 14:27:39+01:00 kaber@trash.net +1 -7
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ipip.c
#   2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ip_input.c
#   2004/02/22 14:27:39+01:00 kaber@trash.net +1 -5
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# net/ipv4/ip_gre.c
#   2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
# include/linux/skbuff.h
#   2004/02/22 14:27:39+01:00 kaber@trash.net +12 -3
#   Add new function 'nf_reset' to reset netfilter related skb-fields
# 
diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h	Sun Feb 22 14:35:04 2004
+++ b/include/linux/skbuff.h	Sun Feb 22 14:35:04 2004
@@ -1204,6 +1204,14 @@
 	if (nfct)
 		atomic_inc(&nfct->master->use);
 }
+static inline void nf_reset(struct sk_buff *skb)
+{
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+}
 
 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
@@ -1216,9 +1224,10 @@
 	if (nf_bridge)
 		atomic_inc(&nf_bridge->use);
 }
-#endif
-
-#endif
+#endif /* CONFIG_BRIDGE_NETFILTER */
+#else /* CONFIG_NETFILTER */
+static inline void nf_reset(struct sk_buff *skb) {}
+#endif /* CONFIG_NETFILTER */
 
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */
diff -Nru a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
--- a/net/ipv4/ip_gre.c	Sun Feb 22 14:35:04 2004
+++ b/net/ipv4/ip_gre.c	Sun Feb 22 14:35:04 2004
@@ -643,13 +643,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipgre_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipgre_lock);
@@ -877,13 +871,7 @@
 		}
 	}
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Sun Feb 22 14:35:04 2004
+++ b/net/ipv4/ip_input.c	Sun Feb 22 14:35:04 2004
@@ -202,17 +202,13 @@
 
 #ifdef CONFIG_NETFILTER_DEBUG
 	nf_debug_ip_local_deliver(skb);
-	skb->nf_debug = 0;
 #endif /*CONFIG_NETFILTER_DEBUG*/
 
 	__skb_pull(skb, ihl);
 
-#ifdef CONFIG_NETFILTER
 	/* Free reference early: we don't need it any more, and it may
            hold ip_conntrack module loaded indefinitely. */
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#endif /*CONFIG_NETFILTER*/
+	nf_reset(skb);
 
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
diff -Nru a/net/ipv4/ipip.c b/net/ipv4/ipip.c
--- a/net/ipv4/ipip.c	Sun Feb 22 14:35:04 2004
+++ b/net/ipv4/ipip.c	Sun Feb 22 14:35:04 2004
@@ -496,13 +496,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip_lock);
@@ -647,13 +641,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl	=	old_iph->ttl;
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
--- a/net/ipv6/ip6_tunnel.c	Sun Feb 22 14:35:04 2004
+++ b/net/ipv6/ip6_tunnel.c	Sun Feb 22 14:35:04 2004
@@ -715,13 +715,7 @@
 	ipv6h->nexthdr = proto;
 	ipv6_addr_copy(&ipv6h->saddr, &fl.fl6_src);
 	ipv6_addr_copy(&ipv6h->daddr, &fl.fl6_dst);
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 	pkt_len = skb->len;
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, 
 		      skb->dst->dev, dst_output);
diff -Nru a/net/ipv6/sit.c b/net/ipv6/sit.c
--- a/net/ipv6/sit.c	Sun Feb 22 14:35:04 2004
+++ b/net/ipv6/sit.c	Sun Feb 22 14:35:04 2004
@@ -388,13 +388,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip6_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip6_lock);
@@ -580,13 +574,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl	=	iph6->hop_limit;
 
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 
 	IPTUNNEL_XMIT();
 	tunnel->recursion--;

[-- Attachment #3: 02-output_hooks.diff --]
[-- Type: text/x-patch, Size: 6927 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:29:05+01:00 kaber@trash.net 
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/xfrm4_tunnel.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/ipcomp.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/ip_output.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +20 -4
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/ip_forward.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +2 -1
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/esp4.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# net/ipv4/ah4.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# include/net/ip.h
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
# include/linux/netfilter.h
#   2004/02/22 14:28:58+01:00 kaber@trash.net +9 -4
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
# 
diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h	Sun Feb 22 14:35:13 2004
+++ b/include/linux/netfilter.h	Sun Feb 22 14:35:13 2004
@@ -119,12 +119,14 @@
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
+#define NF_HOOK_COND(pf, hook, skb, indev, outdev, okfn, cond)		\
+(!(cond)									\
+ ? (okfn)(skb) 								\
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
 #define NF_HOOK_THRESH nf_hook_slow
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
+#define NF_HOOK_COND(pf, hook, skb, indev, outdev, okfn, cond)		\
+(!(cond) || list_empty(&nf_hooks[(pf)][(hook)])				\
  ? (okfn)(skb)								\
  : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
 #define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
@@ -132,6 +134,8 @@
  ? (okfn)(skb)								\
  : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
 #endif
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
+ NF_HOOK_COND((pf), (hook), (skb), (indev), (outdev), (okfn), 1)
 
 int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
 		 struct net_device *indev, struct net_device *outdev,
@@ -164,6 +168,7 @@
 
 #else /* !CONFIG_NETFILTER */
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
+#define NF_HOOK_COND NF_HOOK
 #endif /*CONFIG_NETFILTER*/
 
 #endif /*__KERNEL__*/
diff -Nru a/include/net/ip.h b/include/net/ip.h
--- a/include/net/ip.h	Sun Feb 22 14:35:13 2004
+++ b/include/net/ip.h	Sun Feb 22 14:35:13 2004
@@ -48,6 +48,7 @@
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
 #define IPSKB_XFRM_TUNNEL_SIZE	8
+#define IPSKB_XFRM_TRANSFORMED	16
 };
 
 struct ipcm_cookie
diff -Nru a/net/ipv4/ah4.c b/net/ipv4/ah4.c
--- a/net/ipv4/ah4.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ah4.c	Sun Feb 22 14:35:13 2004
@@ -145,6 +145,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;
 
 error:
diff -Nru a/net/ipv4/esp4.c b/net/ipv4/esp4.c
--- a/net/ipv4/esp4.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/esp4.c	Sun Feb 22 14:35:13 2004
@@ -199,6 +199,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;
 
 error:
diff -Nru a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
--- a/net/ipv4/ip_forward.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ip_forward.c	Sun Feb 22 14:35:13 2004
@@ -51,7 +51,8 @@
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
-	return dst_output(skb);
+	return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+	                    skb->dst->dev, dst_output, skb->dst->xfrm != NULL);
 }
 
 int ip_forward(struct sk_buff *skb)
diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ip_output.c	Sun Feb 22 14:35:13 2004
@@ -122,6 +122,12 @@
 	return ttl;
 }
 
+static inline int ip_dst_output(struct sk_buff *skb)
+{
+	return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+	                    skb->dst->dev, dst_output, skb->dst->xfrm != NULL);
+}
+
 /* 
  *		Add an ip header to a skbuff and send it out.
  *
@@ -164,7 +170,7 @@
 
 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }
 
 static inline int ip_finish_output2(struct sk_buff *skb)
@@ -282,7 +288,7 @@
 		return ip_finish_output(skb);
 }
 
-int ip_output(struct sk_buff *skb)
+static inline int ip_output2(struct sk_buff *skb)
 {
 	IP_INC_STATS(IpOutRequests);
 
@@ -293,6 +299,16 @@
 		return ip_finish_output(skb);
 }
 
+int ip_output(struct sk_buff *skb)
+{
+	int transformed = IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED;
+
+	if (transformed)
+		nf_reset(skb);
+	return NF_HOOK_COND(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+	                    skb->dst->dev, ip_output2, transformed);
+}
+
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
 	struct sock *sk = skb->sk;
@@ -386,7 +402,7 @@
 	skb->priority = sk->sk_priority;
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 
 no_route:
 	IP_INC_STATS(IpOutNoRoutes);
@@ -1165,7 +1181,7 @@
 
 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, 
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
diff -Nru a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
--- a/net/ipv4/ipcomp.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ipcomp.c	Sun Feb 22 14:35:13 2004
@@ -231,6 +231,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	err = NET_XMIT_BYPASS;
 
 out_exit:
diff -Nru a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
--- a/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:35:13 2004
@@ -76,6 +76,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;
 
 error_nolock:

[-- Attachment #4: 03-input-hooks.diff --]
[-- Type: text/x-patch, Size: 6024 bytes --]

===== include/linux/netfilter_ipv4.h 1.6 vs edited =====
--- 1.6/include/linux/netfilter_ipv4.h	Wed Jan  7 06:38:33 2004
+++ edited/include/linux/netfilter_ipv4.h	Sun Feb 22 19:25:19 2004
@@ -83,6 +83,21 @@
    Returns true or false. */
 extern int skb_ip_make_writable(struct sk_buff **pskb,
 				unsigned int writable_len);
+
+#ifdef CONFIG_XFRM
+extern int nf_postxfrm_input(struct sk_buff *skb);
+extern int nf_postxfrm_nonlocal(struct sk_buff *skb);
+#else /* CONFIG_XFRM */
+static inline int nf_postxfrm_input(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int nf_postxfrm_nonlocal(struct sk_buff *skb)
+{
+	return 0;
+}
+#endif /* CONFIG_XFRM */
 #endif /*__KERNEL__*/
 
 #endif /*__LINUX_IP_NETFILTER_H*/
===== include/net/protocol.h 1.10 vs edited =====
--- 1.10/include/net/protocol.h	Sat May 10 14:25:34 2003
+++ edited/include/net/protocol.h	Sun Feb 22 19:25:08 2004
@@ -39,6 +39,7 @@
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			no_policy;
+	int			xfrm_prot;
 };
 
 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb 22 19:25:19 2004
@@ -11,6 +11,7 @@
  */
 #include <linux/config.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/protocol.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
@@ -25,6 +26,7 @@
 #include <linux/icmp.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/xfrm.h>
 #include <linux/ip.h>
 
 #define __KERNEL_SYSCALLS__
@@ -628,9 +630,6 @@
 	struct dst_entry *odst;
 	unsigned int hh_len;
 
-	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
-	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
-	 */
 	if (inet_addr_type(iph->saddr) == RTN_LOCAL) {
 		fl.nl_u.ip4_u.daddr = iph->daddr;
 		fl.nl_u.ip4_u.saddr = iph->saddr;
@@ -681,6 +680,54 @@
 
 	return 0;
 }
+
+#ifdef CONFIG_XFRM
+static inline int nf_postxfrm_done(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int nf_postxfrm_input2(struct sk_buff *skb)
+{
+	if (inet_addr_type(skb->nh.iph->daddr) != RTN_LOCAL) {
+		if (ip_route_me_harder(&skb) == 0)
+			dst_input(skb);
+		else
+			kfree_skb(skb);
+		return -1;
+	}
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+	               nf_postxfrm_done);
+}
+
+int nf_postxfrm_input(struct sk_buff *skb)
+{
+	int off = skb->data - skb->nh.raw;
+
+	__skb_push(skb, off);
+	/* Fix header len and checksum if last xfrm was transport mode */
+	if (!skb->sp->x[skb->sp->len - 1].xvec->props.mode) {
+		skb->nh.iph->tot_len = htons(skb->len);
+		ip_send_check(skb->nh.iph);
+	}
+
+	nf_reset(skb);
+	if (NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+	            nf_postxfrm_input2) != 0)
+		return -1;
+
+	__skb_pull(skb, off);
+	return 0;
+}
+
+int nf_postxfrm_nonlocal(struct sk_buff *skb)
+{
+	nf_reset(skb);
+	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+	               nf_postxfrm_done);
+}
+#endif /* CONFIG_XFRM */
 
 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
 {
===== net/ipv4/ah4.c 1.30 vs edited =====
--- 1.30/net/ipv4/ah4.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/ah4.c	Sun Feb 22 19:25:08 2004
@@ -344,6 +344,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	ah4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };
 
 static int __init ah4_init(void)
===== net/ipv4/esp4.c 1.36 vs edited =====
--- 1.36/net/ipv4/esp4.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/esp4.c	Sun Feb 22 19:25:09 2004
@@ -575,6 +575,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	esp4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };
 
 static int __init esp4_init(void)
===== net/ipv4/ip_input.c 1.21 vs edited =====
--- 1.21/net/ipv4/ip_input.c	Sun Feb 22 14:27:39 2004
+++ edited/net/ipv4/ip_input.c	Sun Feb 22 21:20:47 2004
@@ -224,6 +224,12 @@
 	resubmit:
 		hash = protocol & (MAX_INET_PROTOS - 1);
 		raw_sk = sk_head(&raw_v4_htable[hash]);
+		ipprot = inet_protos[hash];
+		smp_read_barrier_depends();
+
+		if (skb->sp && !ipprot->xfrm_prot)
+			if (nf_postxfrm_input(skb))
+				goto out;
 
 		/* If there maybe a raw socket we must check - if not we
 		 * don't care less
@@ -231,10 +237,9 @@
 		if (raw_sk)
 			raw_v4_input(skb, skb->nh.iph, hash);
 
-		if ((ipprot = inet_protos[hash]) != NULL) {
+		if (ipprot != NULL) {
 			int ret;
 
-			smp_read_barrier_depends();
 			if (!ipprot->no_policy &&
 			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 				kfree_skb(skb);
@@ -279,8 +284,8 @@
 			return 0;
 	}
 
-	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
-		       ip_local_deliver_finish);
+	return NF_HOOK_COND(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+	                    ip_local_deliver_finish, !skb->sp);
 }
 
 static inline int ip_rcv_finish(struct sk_buff *skb)
@@ -346,6 +351,10 @@
 		}
 	}
 
+	if (skb->sp && !(((struct rtable *)skb->dst)->rt_flags&RTCF_LOCAL))
+		if (nf_postxfrm_nonlocal(skb))
+			goto drop;
+
 	return dst_input(skb);
 
 inhdr_error:
@@ -418,8 +427,8 @@
 		}
 	}
 
-	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
-		       ip_rcv_finish);
+	return NF_HOOK_COND(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
+	                    ip_rcv_finish, !skb->sp);
 
 inhdr_error:
 	IP_INC_STATS_BH(IpInHdrErrors);
===== net/ipv4/ipcomp.c 1.19 vs edited =====
--- 1.19/net/ipv4/ipcomp.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/ipcomp.c	Sun Feb 22 19:25:09 2004
@@ -408,6 +408,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	ipcomp4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };
 
 static int __init ipcomp4_init(void)
===== net/ipv4/xfrm4_tunnel.c 1.10 vs edited =====
--- 1.10/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 19:25:10 2004
@@ -171,6 +171,7 @@
 	.handler	=	ipip_rcv,
 	.err_handler	=	ipip_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };
 
 static int __init ipip_init(void)

[-- Attachment #5: 04-nat-policy_lookup.diff --]
[-- Type: text/x-patch, Size: 5352 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:31:33+01:00 kaber@trash.net 
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
# 
# net/ipv4/netfilter/ip_nat_standalone.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +72 -8
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
# 
# net/ipv4/netfilter/ip_conntrack_standalone.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +1 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
# 
# net/core/netfilter.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +15 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
# 
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Sun Feb 22 14:35:30 2004
+++ b/net/core/netfilter.c	Sun Feb 22 14:35:30 2004
@@ -27,6 +27,7 @@
 #include <net/sock.h>
 #include <net/route.h>
 #include <net/xfrm.h>
+#include <net/ip.h>
 #include <linux/ip.h>
 
 #define __KERNEL_SYSCALLS__
@@ -663,6 +664,20 @@
 	
 	if ((*pskb)->dst->error)
 		return -1;
+
+#ifdef CONFIG_XFRM
+	if (!(IPCB(*pskb)->flags & IPSKB_XFRM_TRANSFORMED)) {
+		struct xfrm_policy_afinfo *afinfo;
+
+		afinfo = xfrm_policy_get_afinfo(AF_INET);
+		if (afinfo != NULL) {
+			afinfo->decode_session(*pskb, &fl);
+			xfrm_policy_put_afinfo(afinfo);
+			if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 0) != 0)
+				return -1;
+		}
+	}
+#endif
 
 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
diff -Nru a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c	Sun Feb 22 14:35:30 2004
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c	Sun Feb 22 14:35:30 2004
@@ -489,6 +489,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
diff -Nru a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb 22 14:35:30 2004
+++ b/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb 22 14:35:30 2004
@@ -166,6 +166,45 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }
 
+struct nat_route_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+nat_route_key_get(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+nat_route_key_compare(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -173,6 +212,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct nat_route_key key;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -195,7 +237,29 @@
 			return NF_STOLEN;
 	}
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	nat_route_key_get(*pskb, &key, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && nat_route_key_compare(*pskb, &key, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+#ifdef CONFIG_XFRM
+		/*
+		 * POST_ROUTING hook is called with fixed outfn, we need
+		 * to manually confirm the packet and direct it to the
+		 * transformers if a policy matches.
+		 */
+		else if ((*pskb)->dst->xfrm != NULL) {
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+#endif
+	}
+	return ret;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -206,7 +270,7 @@
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct nat_route_key key;
 	unsigned int ret;
 
 	/* root is playing with raw sockets. */
@@ -214,14 +278,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	nat_route_key_get(*pskb, &key, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && nat_route_key_compare(*pskb, &key, 1)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+	}
 	return ret;
 }
 #endif

[-- Attachment #6: 05-nat-policy_checks.diff --]
[-- Type: text/x-patch, Size: 5812 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:33:14+01:00 kaber@trash.net 
#   Make policy checks find correct policy after NAT
# 
# net/xfrm/xfrm_policy.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +2 -0
#   Make policy checks find correct policy after NAT
# 
# net/ipv4/udp.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +2 -0
#   Make policy checks find correct policy after NAT
# 
# net/ipv4/tcp_ipv4.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +1 -0
#   Make policy checks find correct policy after NAT
# 
# net/ipv4/raw.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +1 -0
#   Make policy checks find correct policy after NAT
# 
# net/ipv4/ip_input.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +6 -8
#   Make policy checks find correct policy after NAT
# 
# net/core/netfilter.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +43 -0
#   Make policy checks find correct policy after NAT
# 
# include/linux/netfilter.h
#   2004/02/22 14:33:07+01:00 kaber@trash.net +16 -0
#   Make policy checks find correct policy after NAT
# 
diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h	Sun Feb 22 14:35:36 2004
+++ b/include/linux/netfilter.h	Sun Feb 22 14:35:36 2004
@@ -171,5 +171,21 @@
 #define NF_HOOK_COND NF_HOOK
 #endif /*CONFIG_NETFILTER*/
 
+#ifdef CONFIG_XFRM
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+struct flowi;
+extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+	if (family == AF_INET)
+		nf_nat_decode_session4(skb, fl);
+}
+#else /* CONFIG_IP_NF_NAT_NEEDED */
+#define nf_nat_decode_session(skb,fl,family)
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
+#endif /* CONFIG_XFRM */
+
 #endif /*__KERNEL__*/
 #endif /*__LINUX_NETFILTER_H*/
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Sun Feb 22 14:35:36 2004
+++ b/net/core/netfilter.c	Sun Feb 22 14:35:36 2004
@@ -747,6 +747,49 @@
 	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
 	               nf_postxfrm_done);
 }
+
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+#include <linux/netfilter_ipv4/ip_nat.h>
+
+void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_conntrack_tuple *t;
+	struct ip_nat_info_manip *m;
+	unsigned int i;
+
+	if (skb->nfct == NULL)
+		return;
+	ct = (struct ip_conntrack *)skb->nfct->master;
+
+	for (i = 0; i < ct->nat.info.num_manips; i++) {
+		m = &ct->nat.info.manips[i];
+		t = &ct->tuplehash[m->direction].tuple;
+
+		switch (m->hooknum) {
+		case NF_IP_PRE_ROUTING:
+			if (m->maniptype != IP_NAT_MANIP_DST)
+				break;
+			fl->fl4_dst = t->dst.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_dport = t->dst.u.tcp.port;
+			break;
+#ifdef CONFIG_IP_NF_NAT_LOCAL
+		case NF_IP_LOCAL_IN:
+			if (m->maniptype != IP_NAT_MANIP_SRC)
+				break;
+			fl->fl4_src = t->src.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_sport = t->src.u.tcp.port;
+			break;
+#endif
+		}
+	}
+}
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
 #endif /* CONFIG_XFRM */
 
 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/ip_input.c	Sun Feb 22 14:35:36 2004
@@ -206,10 +206,6 @@
 
 	__skb_pull(skb, ihl);
 
-	/* Free reference early: we don't need it any more, and it may
-           hold ip_conntrack module loaded indefinitely. */
-	nf_reset(skb);
-
         /* Point into the IP datagram, just past the header. */
         skb->h.raw = skb->data;
 
@@ -240,10 +236,12 @@
 		if (ipprot != NULL) {
 			int ret;
 
-			if (!ipprot->no_policy &&
-			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-				kfree_skb(skb);
-				goto out;
+			if (!ipprot->no_policy) {
+				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+					kfree_skb(skb);
+					goto out;
+				}
+				nf_reset(skb);
 			}
 			ret = ipprot->handler(skb);
 			if (ret < 0) {
diff -Nru a/net/ipv4/raw.c b/net/ipv4/raw.c
--- a/net/ipv4/raw.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/raw.c	Sun Feb 22 14:35:36 2004
@@ -249,6 +249,7 @@
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
+	nf_reset(skb);
 
 	skb_push(skb, skb->data - skb->nh.raw);
 
diff -Nru a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/tcp_ipv4.c	Sun Feb 22 14:35:36 2004
@@ -1785,6 +1785,7 @@
 
 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
+	nf_reset(skb);
 
 	if (sk_filter(sk, skb, 0))
 		goto discard_and_relse;
diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/udp.c	Sun Feb 22 14:35:36 2004
@@ -1027,6 +1027,7 @@
 		kfree_skb(skb);
 		return -1;
 	}
+	nf_reset(skb);
 
 	if (up->encap_type) {
 		/*
@@ -1192,6 +1193,7 @@
 
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
+	nf_reset(skb);
 
 	/* No socket. Drop packet silently, if checksum is wrong */
 	if (udp_checksum_complete(skb))
diff -Nru a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c	Sun Feb 22 14:35:36 2004
+++ b/net/xfrm/xfrm_policy.c	Sun Feb 22 14:35:36 2004
@@ -21,6 +21,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <net/xfrm.h>
 #include <net/ip.h>
 
@@ -908,6 +909,7 @@
 
 	if (_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);
 
 	/* First, check used SA against their selectors. */
 	if (skb->sp) {

[-- Attachment #7: 06-policy-match.diff --]
[-- Type: text/x-patch, Size: 7715 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/03/02 06:08:55+01:00 kaber@trash.net 
#   Add policy match
# 
# net/ipv4/netfilter/ipt_policy.c
#   2004/03/02 06:08:46+01:00 kaber@trash.net +176 -0
# 
# include/linux/netfilter_ipv4/ipt_policy.h
#   2004/03/02 06:08:46+01:00 kaber@trash.net +52 -0
# 
# net/ipv4/netfilter/ipt_policy.c
#   2004/03/02 06:08:46+01:00 kaber@trash.net +0 -0
#   BitKeeper file /home/kaber/src/nf/ipsec/linux-2.6/net/ipv4/netfilter/ipt_policy.c
# 
# net/ipv4/netfilter/Makefile
#   2004/03/02 06:08:46+01:00 kaber@trash.net +1 -0
#   Add policy match
# 
# net/ipv4/netfilter/Kconfig
#   2004/03/02 06:08:46+01:00 kaber@trash.net +10 -0
#   Add policy match
# 
# include/linux/netfilter_ipv4/ipt_policy.h
#   2004/03/02 06:08:46+01:00 kaber@trash.net +0 -0
#   BitKeeper file /home/kaber/src/nf/ipsec/linux-2.6/include/linux/netfilter_ipv4/ipt_policy.h
# 
diff -Nru a/include/linux/netfilter_ipv4/ipt_policy.h b/include/linux/netfilter_ipv4/ipt_policy.h
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/include/linux/netfilter_ipv4/ipt_policy.h	Tue Mar  2 06:53:14 2004
@@ -0,0 +1,52 @@
+#ifndef _IPT_POLICY_H
+#define _IPT_POLICY_H
+
+#define POLICY_MAX_ELEM	4
+
+enum ipt_policy_flags
+{
+	POLICY_MATCH_IN		= 0x1,
+	POLICY_MATCH_OUT	= 0x2,
+	POLICY_MATCH_NONE	= 0x4,
+	POLICY_MATCH_STRICT	= 0x8,
+};
+
+enum ipt_policy_modes
+{
+	POLICY_MODE_TRANSPORT,
+	POLICY_MODE_TUNNEL
+};
+
+struct ipt_policy_spec
+{
+	u_int8_t	saddr:1,
+			daddr:1,
+			proto:1,
+			mode:1,
+			spi:1,
+			reqid:1;
+};
+
+struct ipt_policy_elem
+{
+	u_int32_t	saddr;
+	u_int32_t	smask;
+	u_int32_t	daddr;
+	u_int32_t	dmask;
+	u_int32_t	spi;
+	u_int32_t	reqid;
+	u_int8_t	proto;
+	u_int8_t	mode;
+
+	struct ipt_policy_spec	match;
+	struct ipt_policy_spec	invert;
+};
+
+struct ipt_policy_info
+{
+	struct ipt_policy_elem pol[POLICY_MAX_ELEM];
+	u_int16_t flags;
+	u_int16_t len;
+};
+
+#endif /* _IPT_POLICY_H */
diff -Nru a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
--- a/net/ipv4/netfilter/Kconfig	Tue Mar  2 06:53:14 2004
+++ b/net/ipv4/netfilter/Kconfig	Tue Mar  2 06:53:14 2004
@@ -127,6 +127,16 @@
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_MATCH_POLICY
+	tristate "IPsec policy match support"
+	depends on IP_NF_IPTABLES && XFRM
+	help
+	  Policy matching allows you to match packets based on the
+	  IPsec policy that was used during decapsulation/will
+	  be used during encapsulation.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP_NF_MATCH_MARK
 	tristate "netfilter MARK match support"
 	depends on IP_NF_IPTABLES
diff -Nru a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
--- a/net/ipv4/netfilter/Makefile	Tue Mar  2 06:53:14 2004
+++ b/net/ipv4/netfilter/Makefile	Tue Mar  2 06:53:14 2004
@@ -65,6 +65,7 @@
 obj-$(CONFIG_IP_NF_MATCH_TCPMSS) += ipt_tcpmss.o
 
 obj-$(CONFIG_IP_NF_MATCH_PHYSDEV) += ipt_physdev.o
+obj-$(CONFIG_IP_NF_MATCH_POLICY) += ipt_policy.o
 
 # targets
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
diff -Nru a/net/ipv4/netfilter/ipt_policy.c b/net/ipv4/netfilter/ipt_policy.c
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/net/ipv4/netfilter/ipt_policy.c	Tue Mar  2 06:53:14 2004
@@ -0,0 +1,176 @@
+/* IP tables module for matching IPsec policy
+ *
+ * Copyright (c) 2004 Patrick McHardy, <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/xfrm.h>
+
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv4/ipt_policy.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("IPtables IPsec policy matching module");
+MODULE_LICENSE("GPL");
+
+
+static inline int
+match_xfrm_state(struct xfrm_state *x, const struct ipt_policy_elem *e)
+{
+#define MISMATCH(x,y)	(e->match.x && ((e->x != (y)) ^ e->invert.x))
+
+	if (MISMATCH(saddr, x->props.saddr.a4 & e->smask) ||
+	    MISMATCH(daddr, x->id.daddr.a4 & e->dmask) ||
+	    MISMATCH(proto, x->id.proto) ||
+	    MISMATCH(mode, x->props.mode) ||
+	    MISMATCH(spi, x->id.spi) ||
+	    MISMATCH(reqid, x->props.reqid))
+		return 0;
+	return 1;
+}
+
+static int
+match_policy_in(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct sec_path *sp = skb->sp;
+	int strict = info->flags & POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (sp == NULL)
+		return -1;
+	if (strict && info->len != sp->len)
+		return 0;
+
+	for (i = sp->len - 1; i >= 0; i--) {
+		pos = strict ? i - sp->len + 1 : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(sp->x[i].xvec, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int
+match_policy_out(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct dst_entry *dst = skb->dst;
+	int strict = info->flags & POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (dst->xfrm == NULL)
+		return -1;
+
+	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
+		pos = strict ? i : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(dst->xfrm, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int match(const struct sk_buff *skb,
+                 const struct net_device *in,
+                 const struct net_device *out,
+                 const void *matchinfo, int offset, int *hotdrop)
+{
+	const struct ipt_policy_info *info = matchinfo;
+	int ret;
+
+	if (info->flags & POLICY_MATCH_IN)
+		ret = match_policy_in(skb, info);
+	else
+		ret = match_policy_out(skb, info);
+
+	if (ret < 0) {
+		if (info->flags & POLICY_MATCH_NONE)
+			ret = 1;
+		else
+			ret = 0;
+	} else if (info->flags & POLICY_MATCH_NONE)
+		ret = 0;
+
+	return ret;
+}
+
+static int checkentry(const char *tablename, const struct ipt_ip *ip,
+                      void *matchinfo, unsigned int matchsize,
+                      unsigned int hook_mask)
+{
+	struct ipt_policy_info *info = matchinfo;
+
+	if (matchsize != IPT_ALIGN(sizeof(*info))) {
+		printk(KERN_ERR "ipt_policy: matchsize %u != %u\n",
+		       matchsize, IPT_ALIGN(sizeof(*info)));
+		return 0;
+	}
+	if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))) {
+		printk(KERN_ERR "ipt_policy: neither incoming nor "
+		                "outgoing policy selected\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN)
+	    && info->flags & POLICY_MATCH_OUT) {
+		printk(KERN_ERR "ipt_policy: output policy not valid in "
+		                "PRE_ROUTING and INPUT\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT)
+	    && info->flags & POLICY_MATCH_IN) {
+		printk(KERN_ERR "ipt_policy: input policy not valid in "
+		                "POST_ROUTING and OUTPUT\n");
+		return 0;
+	}
+	if (info->len > POLICY_MAX_ELEM) {
+		printk(KERN_ERR "ipt_policy: too many policy elements\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static struct ipt_match policy_match =
+{
+	.name		= "policy",
+	.match		= match,
+	.checkentry 	= checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	return ipt_register_match(&policy_match);
+}
+
+static void __exit fini(void)
+{
+	ipt_unregister_match(&policy_match);
+}
+
+module_init(init);
+module_exit(fini);

[-- Attachment #8: libipt_policy.diff --]
[-- Type: text/x-patch, Size: 14936 bytes --]

diff -urN a/extensions/.policy-test b/extensions/.policy-test
--- a/extensions/.policy-test	2004-02-23 20:39:37.000000000 +0100
+++ b/extensions/.policy-test	2004-03-04 21:41:39.000000000 +0100
@@ -1,2 +1,3 @@
 #!/bin/sh
-[ -f $KERNEL_DIR/net/ipv4/netfilter/ipt_policy.c -a -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy
+#
+[ -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy
diff -urN a/extensions/libipt_policy.c b/extensions/libipt_policy.c
--- a/extensions/libipt_policy.c	2004-02-23 20:38:37.000000000 +0100
+++ b/extensions/libipt_policy.c	2004-03-04 21:41:32.000000000 +0100
@@ -1,3 +1,4 @@
+/* Shared library add-on to iptables to add policy support. */
 #include <stdio.h>
 #include <netdb.h>
 #include <string.h>
@@ -14,72 +15,82 @@
 #include <linux/netfilter_ipv4/ip_tables.h>
 #include <linux/netfilter_ipv4/ipt_policy.h>
 
+/* 
+ * HACK: global pointer to current matchinfo for making
+ * final checks and adjustments in final_check.
+ */
+static struct ipt_policy_info *policy_info;
+
 static void help(void)
 {
 	printf(
 "policy v%s options:\n"
-"[!] --reqid reqid		Match reqid\n"
-"[!] --spi spi			Match SPI\n"
-"[!] --proto proto		Match protocol (ah/esp/ipcomp)\n"
-"[!] --mode mode 		Match mode (transport/tunnel)\n"
-"[!] --local addr/mask		Match local tunnel endpoint\n"
-"[!] --remote addr/mask		Match remote tunnel endpoint\n"
-"  --path 			Match path instead of single element at\n"
-"				any position\n"
-"  --next 			Begin next element in path\n",
+"  --dir in|out			match policy applied during decapsulation/\n"
+"				policy to be applied during encapsulation\n"
+"  --pol none|ipsec		match policy\n"
+"  --strict 			match entire policy instead of single element\n"
+"				at any position\n"
+"[!] --reqid reqid		match reqid\n"
+"[!] --spi spi			match SPI\n"
+"[!] --proto proto		match protocol (ah/esp/ipcomp)\n"
+"[!] --mode mode 		match mode (transport/tunnel)\n"
+"[!] --tunnel-src addr/mask	match tunnel source\n"
+"[!] --tunnel-dst addr/mask	match tunnel destination\n"
+"  --next 			begin next element in policy\n",
 	IPTABLES_VERSION);
 }
 
-static struct option opts[] = {
+static struct option opts[] =
+{
 	{
-		.name		= "reqid",
+		.name		= "dir",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '1',
 	},
 	{
-		.name		= "spi",
+		.name		= "pol",
 		.has_arg	= 1,
-		.flag		= 0,
-		.val		= '2'
+		.val		= '2',
 	},
 	{
-		.name		= "local",
-		.has_arg	= 1,
-		.flag		= 0,
+		.name		= "strict",
 		.val		= '3'
 	},
 	{
-		.name		= "remote",
+		.name		= "reqid",
 		.has_arg	= 1,
-		.flag		= 0,
-		.val		= '4'
+		.val		= '4',
 	},
 	{
-		.name		= "proto",
+		.name		= "spi",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '5'
 	},
 	{
-		.name		= "mode",
+		.name		= "tunnel-src",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '6'
 	},
 	{
-		.name		= "path",
-		.has_arg	= 0,
-		.flag		= 0,
+		.name		= "tunnel-dst",
+		.has_arg	= 1,
 		.val		= '7'
 	},
 	{
-		.name		= "next",
-		.has_arg	= 0,
-		.flag		= 0,
+		.name		= "proto",
+		.has_arg	= 1,
 		.val		= '8'
 	},
-	{ 0 }
+	{
+		.name		= "mode",
+		.has_arg	= 1,
+		.val		= '9'
+	},
+	{
+		.name		= "next",
+		.val		= 'a'
+	},
+	{ }
 };
 
 static void init(struct ipt_entry_match *m, unsigned int *nfcache)
@@ -87,154 +98,201 @@
 	*nfcache |= NFC_UNKNOWN;
 }
 
+static int parse_direction(char *s)
+{
+	if (strcmp(s, "in") == 0)
+		return POLICY_MATCH_IN;
+	if (strcmp(s, "out") == 0)
+		return POLICY_MATCH_OUT;
+	exit_error(PARAMETER_PROBLEM, "policy_match: invalid dir `%s'", s);
+}
+
+static int parse_policy(char *s)
+{
+	if (strcmp(s, "none") == 0)
+		return POLICY_MATCH_NONE;
+	if (strcmp(s, "ipsec") == 0)
+		return 0;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid policy `%s'", s);
+}
+
 static int parse_mode(char *s)
 {
 	if (strcmp(s, "transport") == 0)
-		return SECPATH_MODE_TRANSPORT;
+		return POLICY_MODE_TRANSPORT;
 	if (strcmp(s, "tunnel") == 0)
-		return SECPATH_MODE_TUNNEL;
-	return -EINVAL;
+		return POLICY_MODE_TUNNEL;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid mode `%s'", s);
 }
 
-static int parse(int c, char **argv, int invert, unsigned int *i,
+static int parse(int c, char **argv, int invert, unsigned int *flags,
                  const struct ipt_entry *entry,
                  unsigned int *nfcache,
                  struct ipt_entry_match **match)
 {
 	struct ipt_policy_info *info = (void *)(*match)->data;
-	struct ipt_policy_elem *e = &info->path[*i];
+	struct ipt_policy_elem *e = &info->pol[info->len];
 	struct in_addr *addr = NULL, mask;
-	struct protoent *p;
 	unsigned int naddr = 0;
-	int mode, proto;
+	int mode;
 
 	check_inverse(optarg, &invert, &optind, 0);
 
 	switch (c) {
 	case '1':
+		if (info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --dir option");
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --dir option");
+
+		info->flags |= parse_direction(argv[optind-1]);
+		break;
+	case '2':
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --policy option");
+
+		info->flags |= parse_policy(argv[optind-1]);
+		break;
+	case '3':
+		if (info->flags & POLICY_MATCH_STRICT)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --strict option");
+
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --strict option");
+
+		info->flags |= POLICY_MATCH_STRICT;
+		break;
+	case '4':
 		if (e->match.reqid)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --reqid twice");
+			           "policy match: double --reqid option");
 
 		e->match.reqid = 1;
 		e->invert.reqid = invert;
 		e->reqid = strtol(argv[optind-1], NULL, 10);
 		break;
-	case '2':
+	case '5':
 		if (e->match.spi)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --spi twice");
+			           "policy match: double --spi option");
 		
 		e->match.spi = 1;
 		e->invert.spi = invert;
 		e->spi = strtol(argv[optind-1], NULL, 0x10);
 		break;
-	case '3':
-		if (e->match.daddr)
-			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --local twice");
-		
-		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
-		if (naddr > 1)
-			exit_error(PARAMETER_PROBLEM,
-			           "Multiple IP addresses are not allowed");
-
-		e->match.daddr = 1;
-		e->invert.daddr = invert;
-		e->daddr = addr[0].s_addr;
-		e->dmask = mask.s_addr;
-		break;
-	case '4':
+	case '6':
 		if (e->match.saddr)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --remote twice");
+			           "policy match: double --tunnel-src option");
 
 		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
 		if (naddr > 1)
 			exit_error(PARAMETER_PROBLEM,
-			           "Multiple IP addresses are not allowed");
+			           "policy match: name resolves to multiple IPs");
 
 		e->match.saddr = 1;
 		e->invert.saddr = invert;
 		e->saddr = addr[0].s_addr;
 		e->smask = mask.s_addr;
                 break;
-	case '5':
-		if (e->match.proto)
+	case '7':
+		if (e->match.daddr)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --tunnel-dst option");
+
+		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
+		if (naddr > 1)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --proto twice");
+			           "policy match: name resolves to multiple IPs");
 
-		p = getprotobyname(argv[optind-1]);
-		if (p != NULL)
-			proto = p->p_proto;
-		else if (!(proto = atoi(argv[optind-1])))
+		e->match.daddr = 1;
+		e->invert.daddr = invert;
+		e->daddr = addr[0].s_addr;
+		e->dmask = mask.s_addr;
+		break;
+	case '8':
+		if (e->match.proto)
 			exit_error(PARAMETER_PROBLEM,
-			           "Unknown protocol `%s'\n",
-			           argv[optind-1]);
+			           "policy match: double --proto option");
 
+		e->proto = parse_protocol(argv[optind-1]);
+		if (e->proto != IPPROTO_AH && e->proto != IPPROTO_ESP &&
+		    e->proto != IPPROTO_COMP)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: protocol must ah/esp/ipcomp");
 		e->match.proto = 1;
 		e->invert.proto = invert;
-		e->proto = proto;
 		break;
-	case '6':
+	case '9':
 		if (e->match.mode)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --mode twice");
+			           "policy match: double --mode option");
 		
 		mode = parse_mode(argv[optind-1]);
-		if (mode < 0)
-			exit_error(PARAMETER_PROBLEM,
-			           "Unknown mode `%s'\n",
-			           argv[optind-1]);
-
 		e->match.mode = 1;
 		e->invert.mode = invert;
 		e->mode = mode;
 		break;
-	case '7':
-		if (info->flags & MATCH_PATH)
-			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --path twice");
-
+	case 'a':
 		if (invert)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't invert `--path' option");
+			           "policy match: can't invert --next option");
 
-		info->flags |= MATCH_PATH;
-		break;
-	case '8':
-		if (invert)
+		if (++info->len == POLICY_MAX_ELEM)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't invert `--next' option");
-
-		if (++(*i) == SECPATH_MAX_DEPTH)
-			exit_error(PARAMETER_PROBLEM,
-			           "Maximum path depth reached");
-
+			           "policy match: maximum policy depth reached");
 		break;
 	default:
 		return 0;
 	}
 
-	info->len = *i + 1;
+	policy_info = info;
 	return 1;
 }
 
 static void final_check(unsigned int flags)
 {
-	return;
-}
+	struct ipt_policy_info *info = policy_info;
+	struct ipt_policy_elem *e;
+	int i;
 
-static void print_proto(char *prefix, u_int8_t proto, int numeric)
-{
-	struct protoent *p = NULL;
+	if (info == NULL)
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: no parameters given");
 
-	if (!numeric)
-		p = getprotobynumber(proto);
-	if (p == NULL)
-		printf("%sproto %u ", prefix, proto);
-	else
-		printf("%sproto %s ", prefix, p->p_name);
+	if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT)))
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: neither --in nor --out specified");
+
+	if (info->flags & POLICY_MATCH_NONE) {
+		if (info->flags & POLICY_MATCH_STRICT)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: policy none but --strict given");
+
+		if (info->len != 0)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: policy none but policy given");
+	} else
+		info->len++;	/* increase len by 1, no --next after last element */
+
+	if (!(info->flags & POLICY_MATCH_STRICT) && info->len > 1)
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: multiple elements but no --strict");
+
+	for (i = 0; i < info->len; i++) {
+		e = &info->pol[i];
+		if ((e->match.saddr || e->match.daddr)
+		    && ((e->mode == POLICY_MODE_TUNNEL && e->invert.mode) ||
+		        (e->mode == POLICY_MODE_TRANSPORT && !e->invert.mode)))
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: --tunnel-src/--tunnel-dst "
+			           "is only valid in tunnel mode");
+	}
 }
 
 static void print_mode(char *prefix, u_int8_t mode, int numeric)
@@ -242,10 +300,10 @@
 	printf("%smode ", prefix);
 
 	switch (mode) {
-	case SECPATH_MODE_TRANSPORT:
+	case POLICY_MODE_TRANSPORT:
 		printf("transport ");
 		break;
-	case SECPATH_MODE_TUNNEL:
+	case POLICY_MODE_TUNNEL:
 		printf("tunnel ");
 		break;
 	default:
@@ -273,7 +331,7 @@
 	}
 	if (e->match.proto) {
 		PRINT_INVERT(e->invert.proto);
-		print_proto(prefix, e->proto, numeric);
+		printf("%sproto %s ", prefix, proto_to_name(e->proto, numeric));
 	}
 	if (e->match.mode) {
 		PRINT_INVERT(e->invert.mode);
@@ -281,18 +339,34 @@
 	}
 	if (e->match.daddr) {
 		PRINT_INVERT(e->invert.daddr);
-		printf("%slocal %s%s ", prefix,
+		printf("%stunnel-dst %s%s ", prefix,
 		       addr_to_dotted((struct in_addr *)&e->daddr),
 		       mask_to_dotted((struct in_addr *)&e->dmask));
 	}
 	if (e->match.saddr) {
 		PRINT_INVERT(e->invert.saddr);
-		printf("%sremote %s%s ", prefix,
+		printf("%stunnel-src %s%s ", prefix,
 		       addr_to_dotted((struct in_addr *)&e->saddr),
 		       mask_to_dotted((struct in_addr *)&e->smask));
 	}
 }
 
+static void print_flags(char *prefix, const struct ipt_policy_info *info)
+{
+	if (info->flags & POLICY_MATCH_IN)
+		printf("%sdir in ", prefix);
+	else
+		printf("%sdir out ", prefix);
+
+	if (info->flags & POLICY_MATCH_NONE)
+		printf("%spol none ", prefix);
+	else
+		printf("%spol ipsec ", prefix);
+
+	if (info->flags & POLICY_MATCH_STRICT)
+		printf("%sstrict ", prefix);
+}
+
 static void print(const struct ipt_ip *ip,
                   const struct ipt_entry_match *match,
 		  int numeric)
@@ -300,14 +374,12 @@
 	const struct ipt_policy_info *info = (void *)match->data;
 	unsigned int i;
 
-	printf ("policy match ");
-	if (info->flags & MATCH_PATH)
-		printf("path ");
-
+	printf("policy match ");
+	print_flags("", info);
 	for (i = 0; i < info->len; i++) {
 		if (info->len > 1)
 			printf("[%u] ", i);
-		print_entry("", &info->path[i], numeric);
+		print_entry("", &info->pol[i], numeric);
 	}
 
 	printf("\n");
@@ -318,11 +390,9 @@
 	const struct ipt_policy_info *info = (void *)match->data;
 	unsigned int i;
 
-	if (info->flags & MATCH_PATH)
-		printf("--path ");
-
+	print_flags("--", info);
 	for (i = 0; i < info->len; i++) {
-		print_entry("--", &info->path[i], 0);
+		print_entry("--", &info->pol[i], 0);
 		if (i + 1 < info->len)
 			printf("--next ");
 	}
@@ -332,18 +402,17 @@
 
 struct iptables_match policy =
 {
-	NULL,
-	"policy",
-	IPTABLES_VERSION,
-	IPT_ALIGN(sizeof(struct ipt_policy_info)),
-	IPT_ALIGN(sizeof(struct ipt_policy_info)),
-	&help,
-	&init,
-	&parse,
-	&final_check,
-	&print,
-	&save,
-	opts
+	.name		= "policy",
+	.version	= IPTABLES_VERSION,
+	.size		= IPT_ALIGN(sizeof(struct ipt_policy_info)),
+	.userspacesize	= IPT_ALIGN(sizeof(struct ipt_policy_info)),
+	.help		= help,
+	.init		= init,
+	.parse		= parse,
+	.final_check	= final_check,
+	.print		= print,
+	.save		= save,
+	.extra_opts	= opts
 };
 
 void _init(void)
diff -urN a/include/iptables.h b/include/iptables.h
--- a/include/iptables.h	2004-03-04 21:37:40.000000000 +0100
+++ b/include/iptables.h	2004-03-04 21:38:36.000000000 +0100
@@ -122,6 +122,7 @@
 extern void register_match(struct iptables_match *me);
 extern void register_target(struct iptables_target *me);
 
+extern char *proto_to_name(u_int8_t proto, int nolookup);
 extern struct in_addr *dotted_to_addr(const char *dotted);
 extern char *addr_to_dotted(const struct in_addr *addrp);
 extern char *addr_to_anyname(const struct in_addr *addr);
diff -urN a/iptables.c b/iptables.c
--- a/iptables.c	2004-02-24 18:28:50.000000000 +0100
+++ b/iptables.c	2004-03-04 21:39:18.000000000 +0100
@@ -235,11 +235,12 @@
 	{ "icmp", IPPROTO_ICMP },
 	{ "esp", IPPROTO_ESP },
 	{ "ah", IPPROTO_AH },
+	{ "ipcomp", IPPROTO_COMP },
 	{ "sctp", IPPROTO_SCTP },
 	{ "all", 0 },
 };
 
-static char *
+char *
 proto_to_name(u_int8_t proto, int nolookup)
 {
 	unsigned int i;

[-- Attachment #9: policy-test --]
[-- Type: text/plain, Size: 2245 bytes --]

#! /bin/bash

PATH="/usr/local/sbin"

iptables -F


iptables -A INPUT -m policy --dir in --pol none
iptables -A INPUT -m policy --dir in --pol ipsec
iptables -A INPUT -m policy --dir in --pol ipsec --mode tunnel
iptables -A INPUT -m policy --dir in --pol ipsec --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --proto esp
iptables -A INPUT -m policy --dir in --pol ipsec --strict --mode tunnel
iptables -A INPUT -m policy --dir in --pol ipsec --strict --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --next --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --mode tunnel --tunnel-src 192.168.0.1 --next --proto ah --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --mode tunnel --tunnel-src 192.168.0.1 --tunnel-dst 192.168.0.23 --next --proto ah --mode transport


iptables -A OUTPUT -m policy --dir out --pol none
iptables -A OUTPUT -m policy --dir out --pol ipsec
iptables -A OUTPUT -m policy --dir out --pol ipsec --mode tunnel
iptables -A OUTPUT -m policy --dir out --pol ipsec --mode transport
iptables -A OUTPUT -m policy --dir out --pol ipsec --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --proto esp
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --mode tunnel
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --mode transport
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --mode tunnel --tunnel-dst 192.168.0.0/24 --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --mode tunnel --tunnel-dst ! 192.168.0.0/24 --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --mode tunnel --tunnel-dst 192.168.0.1 --next --proto ah --mode transport


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 22:30                                     ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
@ 2004-03-04 23:11                                       ` Willy Tarreau
  2004-03-04 23:42                                         ` Alexander Samad
  2004-03-05  1:47                                         ` Patrick McHardy
  2004-03-04 23:44                                       ` Patrick McHardy
  2004-03-05 11:39                                       ` Harald Welte
  2 siblings, 2 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-03-04 23:11 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Harald Welte, Henrik Nordstrom,
	Tom Eastep, Michal Ludvig, alex, guillaume

Hi Patrick,

On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:
> (long CC list, if anyone doesn't want to be on it anymore please say so)

no pb, please keep me in :-)

> - hook-traversal on input is symetrical to output now. This means
> packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
> PRE_REROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.

This is excellent. So if I understand it right, a packet now passes this way
(please excuse the ascii-art, it seems I'll never take the time to start xfig)

            [incoming packet]
                    |<-----------------. 
                    v                  |
               PREROUTING              |
    dest!=local?  /  \  dest==local?   |
                /      \               |
         FORWARD        LOCAL_IN       |
             |            |            |
             |      (ipsec?decrypt:nop)|
             |            |   \        / dest!=local ? nf_postxfrm_nonlocal()
             |            |    `------'
             v            v  dest==local ? ip_local_deliver_finish()

So if it behaves like this (which is what seems the best solution to me), is
there a risk that a carefully crafted packet loops many times on the right
side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
wrt resource leak, multiple locks, etc... ?

> The only problem with this approach I'm currently aware of is that
> one of the ip-protocol handlers marked with xfrm_prot
> (xfrm4_tunnel.c) is also responsible for dispatching packets to
> ipip.c; the encapsulated packets heading for a ipip-tunnel won't
> traverse the netfilter hooks on input.

I don't know well the internals of IPSEC (bought the book but didn't read it
yet). Are IPIP tunnels often used ? or are they simply used according to a
negociation, in which case the ipsec implementation on the netfilter side
could refuse them ?

> - new match "policy" for matching the policy that was used during
> decapsulation (well, the used SAs, policy checks come later), and the
> policy that will be used for encapsulation.

Even more interesting... Now we can for sure check that packets coming from
a tunnel have not been spoofed by another tunnel.

Thanks a lot Patrick.
My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
I will soon check if your patches apply on top of their implementation.

Cheers,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 23:11                                       ` Willy Tarreau
@ 2004-03-04 23:42                                         ` Alexander Samad
  2004-03-05  2:00                                           ` Patrick McHardy
  2004-03-05  1:47                                         ` Patrick McHardy
  1 sibling, 1 reply; 63+ messages in thread
From: Alexander Samad @ 2004-03-04 23:42 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Patrick McHardy, Netfilter Development Mailinglist, Harald Welte,
	Henrik Nordstrom, Tom Eastep, Michal Ludvig, guillaume

[-- Attachment #1: Type: text/plain, Size: 2951 bytes --]

On Fri, Mar 05, 2004 at 12:11:41AM +0100, Willy Tarreau wrote:
> Hi Patrick,
> 
> On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:
> > (long CC list, if anyone doesn't want to be on it anymore please say so)
> 
> no pb, please keep me in :-)
> 
> > - hook-traversal on input is symetrical to output now. This means
> > packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
> > PRE_REROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.
> 
> This is excellent. So if I understand it right, a packet now passes this way
> (please excuse the ascii-art, it seems I'll never take the time to start xfig)
> 
>             [incoming packet]
>                     |<-----------------. 
>                     v                  |
>                PREROUTING              |
>     dest!=local?  /  \  dest==local?   |
>                 /      \               |
>          FORWARD        LOCAL_IN       |
>              |            |            |
>              |      (ipsec?decrypt:nop)|
>              |            |   \        / dest!=local ? nf_postxfrm_nonlocal()
>              |            |    `------'
>              v            v  dest==local ? ip_local_deliver_finish()
> 

Willy thanks for the diagram, too lazy to do one myself.

Q do I understand right that encrypted packets can or can't be acted
upon by the hooks in LOCAL_IN.

Or another way of putting it does a packet travel the tables twice once
as an encrypted packet and once as a de crypted packet ?

Alex

> 
> So if it behaves like this (which is what seems the best solution to me), is
> there a risk that a carefully crafted packet loops many times on the right
> side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
> wrt resource leak, multiple locks, etc... ?
> 
> > The only problem with this approach I'm currently aware of is that
> > one of the ip-protocol handlers marked with xfrm_prot
> > (xfrm4_tunnel.c) is also responsible for dispatching packets to
> > ipip.c; the encapsulated packets heading for a ipip-tunnel won't
> > traverse the netfilter hooks on input.
> 
> I don't know well the internals of IPSEC (bought the book but didn't read it
> yet). Are IPIP tunnels often used ? or are they simply used according to a
> negociation, in which case the ipsec implementation on the netfilter side
> could refuse them ?
> 
> > - new match "policy" for matching the policy that was used during
> > decapsulation (well, the used SAs, policy checks come later), and the
> > policy that will be used for encapsulation.
> 
> Even more interesting... Now we can for sure check that packets coming from
> a tunnel have not been spoofed by another tunnel.
> 
> Thanks a lot Patrick.
> My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
> I will soon check if your patches apply on top of their implementation.
> 
> Cheers,
> Willy
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 23:42                                         ` Alexander Samad
@ 2004-03-05  2:00                                           ` Patrick McHardy
  2004-03-05  2:13                                             ` Alexander Samad
  2004-03-10  2:45                                             ` Alexander Samad
  0 siblings, 2 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-03-05  2:00 UTC (permalink / raw)
  To: Alexander Samad
  Cc: Willy Tarreau, Netfilter Development Mailinglist, Harald Welte,
	Tom Eastep, Michal Ludvig, guillaume

Alexander Samad wrote:
> Q do I understand right that encrypted packets can or can't be acted
> upon by the hooks in LOCAL_IN.
> 
> Or another way of putting it does a packet travel the tables twice once
> as an encrypted packet and once as a de crypted packet ?

Exactly, input looks like this:

(encrypted) PRE_ROUTING -> LOCAL_IN ->
(plain) PRE_ROUTING -> LOCAL_IN/FORWARD

output looks like this:

(plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
(encrypted) LOCAL_OUT -> POST_ROUTING

This is the same as with freeswan, only without the ipsec
devices, the policy match can be used as a easy replacement
for them (-m policy --pol ipsec).

Regards,
Patrick

> 
> Alex
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-05  2:00                                           ` Patrick McHardy
@ 2004-03-05  2:13                                             ` Alexander Samad
  2004-03-10  2:45                                             ` Alexander Samad
  1 sibling, 0 replies; 63+ messages in thread
From: Alexander Samad @ 2004-03-05  2:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

On Fri, Mar 05, 2004 at 03:00:07AM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Q do I understand right that encrypted packets can or can't be acted
> >upon by the hooks in LOCAL_IN.
> >
> >Or another way of putting it does a packet travel the tables twice once
> >as an encrypted packet and once as a de crypted packet ?
> 
> Exactly, input looks like this:
> 
> (encrypted) PRE_ROUTING -> LOCAL_IN ->
> (plain) PRE_ROUTING -> LOCAL_IN/FORWARD
> 
> output looks like this:
> 
> (plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
> (encrypted) LOCAL_OUT -> POST_ROUTING
> 
> This is the same as with freeswan, only without the ipsec
> devices, the policy match can be used as a easy replacement
> for them (-m policy --pol ipsec).
> 
> Regards,
> Patrick

Great, i also presume this means that NAT + IPSEC is now working, will
give it a try tonight.

Thanks

> 
> >
> >Alex
> >
> >

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-05  2:00                                           ` Patrick McHardy
  2004-03-05  2:13                                             ` Alexander Samad
@ 2004-03-10  2:45                                             ` Alexander Samad
  2004-03-11 22:10                                               ` Patrick McHardy
  1 sibling, 1 reply; 63+ messages in thread
From: Alexander Samad @ 2004-03-10  2:45 UTC (permalink / raw)
  To: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]

Patrick 

I seem to have found a bug in your patches, but only when used in
conjuction with Herbert's mangle patch.

It seems like there is a loop caused when the packet traverses the
tablesi, in particular ip_route_me_harder.

I tested this on my laptop with debian 2.6.3-2 source with these patches
that you provided on this thread, as well as the Herbert mangle patch.

It seem like the packet on the way out gets encapsulated and then the
encrypted packets try to get re encrypted.

example ipsec.conf

conn wireless
	left=10.0.4.129
	leftsubnet=0/0
	authby=secret
	pfs=no
	auto=add
	right=%defaultroute

By the dump it looks like a loop, I added a printk("%d\n", iph->protocol); 
in ip_route_me_harder just before Herberts fix to test that.

When I changed the config to look like this


conn wireless
	left=10.0.4.129
	leftsubnet=10.6.0/24
	authby=secret
	pfs=no
	auto=add
	right=%defaultroute

It worked fine


Any other question ask, I have deb's of the image and headers too if you
want.

Alex


On Fri, Mar 05, 2004 at 03:00:07AM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Q do I understand right that encrypted packets can or can't be acted
> >upon by the hooks in LOCAL_IN.
> >
> >Or another way of putting it does a packet travel the tables twice once
> >as an encrypted packet and once as a de crypted packet ?
> 
> Exactly, input looks like this:
> 
> (encrypted) PRE_ROUTING -> LOCAL_IN ->
> (plain) PRE_ROUTING -> LOCAL_IN/FORWARD
> 
> output looks like this:
> 
> (plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
> (encrypted) LOCAL_OUT -> POST_ROUTING
> 
> This is the same as with freeswan, only without the ipsec
> devices, the policy match can be used as a easy replacement
> for them (-m policy --pol ipsec).
> 
> Regards,
> Patrick
> 
> >
> >Alex
> >
> >

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-10  2:45                                             ` Alexander Samad
@ 2004-03-11 22:10                                               ` Patrick McHardy
  2004-03-12  0:15                                                 ` Alexander Samad
  0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-03-11 22:10 UTC (permalink / raw)
  To: Alexander Samad; +Cc: Netfilter Development Mailinglist

Alexander Samad wrote:
> Patrick 
> 
> I seem to have found a bug in your patches, but only when used in
> conjuction with Herbert's mangle patch.
> 
> It seems like there is a loop caused when the packet traverses the
> tablesi, in particular ip_route_me_harder.
> 
> I tested this on my laptop with debian 2.6.3-2 source with these patches
> that you provided on this thread, as well as the Herbert mangle patch.
> 
> It seem like the packet on the way out gets encapsulated and then the
> encrypted packets try to get re encrypted.

Thanks for the report, for now the easiest solution is to back out
Herbert's patch.

Regards
Patrick

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-11 22:10                                               ` Patrick McHardy
@ 2004-03-12  0:15                                                 ` Alexander Samad
  0 siblings, 0 replies; 63+ messages in thread
From: Alexander Samad @ 2004-03-12  0:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

On Thu, Mar 11, 2004 at 11:10:38PM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Patrick 
> >
> >I seem to have found a bug in your patches, but only when used in
> >conjuction with Herbert's mangle patch.
> >
> >It seems like there is a loop caused when the packet traverses the
> >tablesi, in particular ip_route_me_harder.
> >
> >I tested this on my laptop with debian 2.6.3-2 source with these patches
> >that you provided on this thread, as well as the Herbert mangle patch.
> >
> >It seem like the packet on the way out gets encapsulated and then the
> >encrypted packets try to get re encrypted.
> 
> Thanks for the report, for now the easiest solution is to back out
> Herbert's patch.

I have done that in my local build, but I beleive it is in 2.6.4 (the
changelog)



> 
> Regards
> Patrick
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 23:11                                       ` Willy Tarreau
  2004-03-04 23:42                                         ` Alexander Samad
@ 2004-03-05  1:47                                         ` Patrick McHardy
  2004-03-05 11:10                                           ` Willy Tarreau
  1 sibling, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-03-05  1:47 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Netfilter Development Mailinglist, Harald Welte, Tom Eastep,
	Michal Ludvig, alex, guillaume

[-- Attachment #1: Type: text/plain, Size: 4497 bytes --]

Hi Willy,

Willy Tarreau wrote:
> This is excellent. So if I understand it right, a packet now passes this way
> (please excuse the ascii-art, it seems I'll never take the time to start xfig)
> 
>             [incoming packet]
>                     |<-----------------. 
>                     v                  |
>                PREROUTING              |
>     dest!=local?  /  \  dest==local?   |
>                 /      \               |
>          FORWARD        LOCAL_IN       |
>              |            |            |
>              |      (ipsec?decrypt:nop)|
>              |            |   \        / dest!=local ? nf_postxfrm_nonlocal()
>              |            |    `------'
>              v            v  dest==local ? ip_local_deliver_finish()
> 
> 
> So if it behaves like this (which is what seems the best solution to me), is
> there a risk that a carefully crafted packet loops many times on the right
> side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
> wrt resource leak, multiple locks, etc... ?

Not exactly. the ipsec transformers are ip protocol handlers,
their input handler (xfrm4_input) is called from
ip_local_deliver_finish. After decrypting the payload the packets
are either reposted into the network stack (in case of a tunnel-mode
SA) or again delivered to the ip-protocol contained in the next
header. packets reposted into the stack may be in some intermediate
state in case of a transport mode encapsulation following a tunnel
mode encapsulation. The difficulty is to know when the packet
is "done". So nf_postxfrm_nonlocal is called directly after input
routing if the destination is non-local, but only for packets which
were reposted into the stack. The fact that they have a non-local
destination means ipsec must be done with them. For packet with
local destination, we look at what the ipprotocol handler does
in ip_local_deliver_finish (xfrm_prot or not) and if the packet
was handled by ipsec before and the protocol handler is no xfrm_prot,
ipsec is done, so the packets are passed through nf_postxfrm_input.

I attached a xfig file with a little picture of how it works, I hope
it's understandable. The blue lines represent the flow as it happens
without my patches, the red lines represent whats new. While drawing
it, I noticed a bug, packets with their destination changed
in nf_postxfrm_nonlocal may need to be delivered locally afterwards.

Regarding loop-danger, I'm not totally sure yet, but I don't think
that it would be triggerable by crafted packets but only by
adventurous configurations. Multiple encapsulations are no error
and they are limited to 4.

>>The only problem with this approach I'm currently aware of is that
>>one of the ip-protocol handlers marked with xfrm_prot
>>(xfrm4_tunnel.c) is also responsible for dispatching packets to
>>ipip.c; the encapsulated packets heading for a ipip-tunnel won't
>>traverse the netfilter hooks on input.
> 
> 
> I don't know well the internals of IPSEC (bought the book but didn't read it
> yet). Are IPIP tunnels often used ? or are they simply used according to a
> negociation, in which case the ipsec implementation on the netfilter side
> could refuse them ?

It's not IPIP with IPSEC that misbehaves but plaintext-IPIP tunnels.
xfrm4_tunnel.c is the registered handler for IPIP, it used to be ipip.c.
ipip.c now registers itself with xfrm4_tunnel, but xfrm4_tunnel is
marked as a xfrm_prot which is not true for ipip.c. Right now I'm not
even sure that it is a problem and under what exact circumstances, I
need to think about this some more (tomorrow).

>>- new match "policy" for matching the policy that was used during
>>decapsulation (well, the used SAs, policy checks come later), and the
>>policy that will be used for encapsulation.
> 
> 
> Even more interesting... Now we can for sure check that packets coming from
> a tunnel have not been spoofed by another tunnel.

That should also be handled by the policy checks, although at a
later time (at socket-delivery/forwarding). I'm unsure if the
policy checks should happen before packets are looked at by
netfilter, after all it manipulates it's state from information
in these untrusted packets.

> 
> Thanks a lot Patrick.
> My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
> I will soon check if your patches apply on top of their implementation.

Thanks for your comments, I'm looking forward to your testing.

Best regards
Patrick

> 
> Cheers,
> Willy
> 

[-- Attachment #2: nf+ipsec.xfig.fig --]
[-- Type: image/x-xfig, Size: 3113 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-05  1:47                                         ` Patrick McHardy
@ 2004-03-05 11:10                                           ` Willy Tarreau
  0 siblings, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-03-05 11:10 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Harald Welte, Tom Eastep,
	Michal Ludvig, alex, guillaume

Hi Patrick,

On Fri, Mar 05, 2004 at 02:47:58AM +0100, Patrick McHardy wrote:
 
> I attached a xfig file with a little picture of how it works, I hope
> it's understandable. The blue lines represent the flow as it happens
> without my patches, the red lines represent whats new. While drawing
> it, I noticed a bug, packets with their destination changed
> in nf_postxfrm_nonlocal may need to be delivered locally afterwards.

Thanks for the picture, indeed it's far better now ;-)

> Regarding loop-danger, I'm not totally sure yet, but I don't think
> that it would be triggerable by crafted packets but only by
> adventurous configurations. Multiple encapsulations are no error
> and they are limited to 4.

OK. Thanks for your clarifications, I understand better now.

Cheers,
Willy

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 22:30                                     ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
  2004-03-04 23:11                                       ` Willy Tarreau
@ 2004-03-04 23:44                                       ` Patrick McHardy
  2004-03-05 11:39                                       ` Harald Welte
  2 siblings, 0 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-03-04 23:44 UTC (permalink / raw)
  To: Netfilter Development Mailinglist
  Cc: Willy Tarreau, Harald Welte, Tom Eastep, Michal Ludvig, alex,
	guillaume

[-- Attachment #1: Type: text/plain, Size: 320 bytes --]

Patrick McHardy wrote:

> - new match "policy" for matching the policy that was used during
> decapsulation (well, the used SAs, policy checks come later), and the
> policy that will be used for encapsulation.
> 

This patch (libipt_policy.diff) was broken in the last mail, fixed
version attached.

> Regards
> Patrick

[-- Attachment #2: libipt_policy.diff --]
[-- Type: text/x-patch, Size: 11996 bytes --]

diff -urN a/extensions/.policy-test b/extensions/.policy-test
--- a/extensions/.policy-test	1970-01-01 01:00:00.000000000 +0100
+++ b/extensions/.policy-test	2004-03-05 00:44:32.000000000 +0100
@@ -0,0 +1,3 @@
+#!/bin/sh
+#
+[ -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy
diff -urN a/extensions/libipt_policy.c b/extensions/libipt_policy.c
--- a/extensions/libipt_policy.c	1970-01-01 01:00:00.000000000 +0100
+++ b/extensions/libipt_policy.c	2004-03-05 00:44:40.000000000 +0100
@@ -0,0 +1,421 @@
+/* Shared library add-on to iptables to add policy support. */
+#include <stdio.h>
+#include <netdb.h>
+#include <string.h>
+#include <stdlib.h>
+#include <syslog.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <errno.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <iptables.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter_ipv4/ipt_policy.h>
+
+/* 
+ * HACK: global pointer to current matchinfo for making
+ * final checks and adjustments in final_check.
+ */
+static struct ipt_policy_info *policy_info;
+
+static void help(void)
+{
+	printf(
+"policy v%s options:\n"
+"  --dir in|out			match policy applied during decapsulation/\n"
+"				policy to be applied during encapsulation\n"
+"  --pol none|ipsec		match policy\n"
+"  --strict 			match entire policy instead of single element\n"
+"				at any position\n"
+"[!] --reqid reqid		match reqid\n"
+"[!] --spi spi			match SPI\n"
+"[!] --proto proto		match protocol (ah/esp/ipcomp)\n"
+"[!] --mode mode 		match mode (transport/tunnel)\n"
+"[!] --tunnel-src addr/mask	match tunnel source\n"
+"[!] --tunnel-dst addr/mask	match tunnel destination\n"
+"  --next 			begin next element in policy\n",
+	IPTABLES_VERSION);
+}
+
+static struct option opts[] =
+{
+	{
+		.name		= "dir",
+		.has_arg	= 1,
+		.val		= '1',
+	},
+	{
+		.name		= "pol",
+		.has_arg	= 1,
+		.val		= '2',
+	},
+	{
+		.name		= "strict",
+		.val		= '3'
+	},
+	{
+		.name		= "reqid",
+		.has_arg	= 1,
+		.val		= '4',
+	},
+	{
+		.name		= "spi",
+		.has_arg	= 1,
+		.val		= '5'
+	},
+	{
+		.name		= "tunnel-src",
+		.has_arg	= 1,
+		.val		= '6'
+	},
+	{
+		.name		= "tunnel-dst",
+		.has_arg	= 1,
+		.val		= '7'
+	},
+	{
+		.name		= "proto",
+		.has_arg	= 1,
+		.val		= '8'
+	},
+	{
+		.name		= "mode",
+		.has_arg	= 1,
+		.val		= '9'
+	},
+	{
+		.name		= "next",
+		.val		= 'a'
+	},
+	{ }
+};
+
+static void init(struct ipt_entry_match *m, unsigned int *nfcache)
+{
+	*nfcache |= NFC_UNKNOWN;
+}
+
+static int parse_direction(char *s)
+{
+	if (strcmp(s, "in") == 0)
+		return POLICY_MATCH_IN;
+	if (strcmp(s, "out") == 0)
+		return POLICY_MATCH_OUT;
+	exit_error(PARAMETER_PROBLEM, "policy_match: invalid dir `%s'", s);
+}
+
+static int parse_policy(char *s)
+{
+	if (strcmp(s, "none") == 0)
+		return POLICY_MATCH_NONE;
+	if (strcmp(s, "ipsec") == 0)
+		return 0;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid policy `%s'", s);
+}
+
+static int parse_mode(char *s)
+{
+	if (strcmp(s, "transport") == 0)
+		return POLICY_MODE_TRANSPORT;
+	if (strcmp(s, "tunnel") == 0)
+		return POLICY_MODE_TUNNEL;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid mode `%s'", s);
+}
+
+static int parse(int c, char **argv, int invert, unsigned int *flags,
+                 const struct ipt_entry *entry,
+                 unsigned int *nfcache,
+                 struct ipt_entry_match **match)
+{
+	struct ipt_policy_info *info = (void *)(*match)->data;
+	struct ipt_policy_elem *e = &info->pol[info->len];
+	struct in_addr *addr = NULL, mask;
+	unsigned int naddr = 0;
+	int mode;
+
+	check_inverse(optarg, &invert, &optind, 0);
+
+	switch (c) {
+	case '1':
+		if (info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --dir option");
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --dir option");
+
+		info->flags |= parse_direction(argv[optind-1]);
+		break;
+	case '2':
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --policy option");
+
+		info->flags |= parse_policy(argv[optind-1]);
+		break;
+	case '3':
+		if (info->flags & POLICY_MATCH_STRICT)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --strict option");
+
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --strict option");
+
+		info->flags |= POLICY_MATCH_STRICT;
+		break;
+	case '4':
+		if (e->match.reqid)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --reqid option");
+
+		e->match.reqid = 1;
+		e->invert.reqid = invert;
+		e->reqid = strtol(argv[optind-1], NULL, 10);
+		break;
+	case '5':
+		if (e->match.spi)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --spi option");
+		
+		e->match.spi = 1;
+		e->invert.spi = invert;
+		e->spi = strtol(argv[optind-1], NULL, 0x10);
+		break;
+	case '6':
+		if (e->match.saddr)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --tunnel-src option");
+
+		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
+		if (naddr > 1)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: name resolves to multiple IPs");
+
+		e->match.saddr = 1;
+		e->invert.saddr = invert;
+		e->saddr = addr[0].s_addr;
+		e->smask = mask.s_addr;
+                break;
+	case '7':
+		if (e->match.daddr)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --tunnel-dst option");
+
+		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
+		if (naddr > 1)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: name resolves to multiple IPs");
+
+		e->match.daddr = 1;
+		e->invert.daddr = invert;
+		e->daddr = addr[0].s_addr;
+		e->dmask = mask.s_addr;
+		break;
+	case '8':
+		if (e->match.proto)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --proto option");
+
+		e->proto = parse_protocol(argv[optind-1]);
+		if (e->proto != IPPROTO_AH && e->proto != IPPROTO_ESP &&
+		    e->proto != IPPROTO_COMP)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: protocol must ah/esp/ipcomp");
+		e->match.proto = 1;
+		e->invert.proto = invert;
+		break;
+	case '9':
+		if (e->match.mode)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --mode option");
+		
+		mode = parse_mode(argv[optind-1]);
+		e->match.mode = 1;
+		e->invert.mode = invert;
+		e->mode = mode;
+		break;
+	case 'a':
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --next option");
+
+		if (++info->len == POLICY_MAX_ELEM)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: maximum policy depth reached");
+		break;
+	default:
+		return 0;
+	}
+
+	policy_info = info;
+	return 1;
+}
+
+static void final_check(unsigned int flags)
+{
+	struct ipt_policy_info *info = policy_info;
+	struct ipt_policy_elem *e;
+	int i;
+
+	if (info == NULL)
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: no parameters given");
+
+	if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT)))
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: neither --in nor --out specified");
+
+	if (info->flags & POLICY_MATCH_NONE) {
+		if (info->flags & POLICY_MATCH_STRICT)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: policy none but --strict given");
+
+		if (info->len != 0)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: policy none but policy given");
+	} else
+		info->len++;	/* increase len by 1, no --next after last element */
+
+	if (!(info->flags & POLICY_MATCH_STRICT) && info->len > 1)
+		exit_error(PARAMETER_PROBLEM,
+		           "policy match: multiple elements but no --strict");
+
+	for (i = 0; i < info->len; i++) {
+		e = &info->pol[i];
+		if ((e->match.saddr || e->match.daddr)
+		    && ((e->mode == POLICY_MODE_TUNNEL && e->invert.mode) ||
+		        (e->mode == POLICY_MODE_TRANSPORT && !e->invert.mode)))
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: --tunnel-src/--tunnel-dst "
+			           "is only valid in tunnel mode");
+	}
+}
+
+static void print_mode(char *prefix, u_int8_t mode, int numeric)
+{
+	printf("%smode ", prefix);
+
+	switch (mode) {
+	case POLICY_MODE_TRANSPORT:
+		printf("transport ");
+		break;
+	case POLICY_MODE_TUNNEL:
+		printf("tunnel ");
+		break;
+	default:
+		printf("??? ");
+		break;
+	}
+}
+
+#define PRINT_INVERT(x)		\
+do {				\
+	if (x)			\
+		printf("! ");	\
+} while(0)
+
+static void print_entry(char *prefix, const struct ipt_policy_elem *e,
+                        int numeric)
+{
+	if (e->match.reqid) {
+		PRINT_INVERT(e->invert.reqid);
+		printf("%sreqid %u ", prefix, e->reqid);
+	}
+	if (e->match.spi) {
+		PRINT_INVERT(e->invert.spi);
+		printf("%sspi 0x%x ", prefix, e->spi);
+	}
+	if (e->match.proto) {
+		PRINT_INVERT(e->invert.proto);
+		printf("%sproto %s ", prefix, proto_to_name(e->proto, numeric));
+	}
+	if (e->match.mode) {
+		PRINT_INVERT(e->invert.mode);
+		print_mode(prefix, e->mode, numeric);
+	}
+	if (e->match.daddr) {
+		PRINT_INVERT(e->invert.daddr);
+		printf("%stunnel-dst %s%s ", prefix,
+		       addr_to_dotted((struct in_addr *)&e->daddr),
+		       mask_to_dotted((struct in_addr *)&e->dmask));
+	}
+	if (e->match.saddr) {
+		PRINT_INVERT(e->invert.saddr);
+		printf("%stunnel-src %s%s ", prefix,
+		       addr_to_dotted((struct in_addr *)&e->saddr),
+		       mask_to_dotted((struct in_addr *)&e->smask));
+	}
+}
+
+static void print_flags(char *prefix, const struct ipt_policy_info *info)
+{
+	if (info->flags & POLICY_MATCH_IN)
+		printf("%sdir in ", prefix);
+	else
+		printf("%sdir out ", prefix);
+
+	if (info->flags & POLICY_MATCH_NONE)
+		printf("%spol none ", prefix);
+	else
+		printf("%spol ipsec ", prefix);
+
+	if (info->flags & POLICY_MATCH_STRICT)
+		printf("%sstrict ", prefix);
+}
+
+static void print(const struct ipt_ip *ip,
+                  const struct ipt_entry_match *match,
+		  int numeric)
+{
+	const struct ipt_policy_info *info = (void *)match->data;
+	unsigned int i;
+
+	printf("policy match ");
+	print_flags("", info);
+	for (i = 0; i < info->len; i++) {
+		if (info->len > 1)
+			printf("[%u] ", i);
+		print_entry("", &info->pol[i], numeric);
+	}
+
+	printf("\n");
+}
+
+static void save(const struct ipt_ip *ip, const struct ipt_entry_match *match)
+{
+	const struct ipt_policy_info *info = (void *)match->data;
+	unsigned int i;
+
+	print_flags("--", info);
+	for (i = 0; i < info->len; i++) {
+		print_entry("--", &info->pol[i], 0);
+		if (i + 1 < info->len)
+			printf("--next ");
+	}
+
+	printf("\n");
+}
+
+struct iptables_match policy =
+{
+	.name		= "policy",
+	.version	= IPTABLES_VERSION,
+	.size		= IPT_ALIGN(sizeof(struct ipt_policy_info)),
+	.userspacesize	= IPT_ALIGN(sizeof(struct ipt_policy_info)),
+	.help		= help,
+	.init		= init,
+	.parse		= parse,
+	.final_check	= final_check,
+	.print		= print,
+	.save		= save,
+	.extra_opts	= opts
+};
+
+void _init(void)
+{
+	register_match(&policy);
+}
diff -urN a/include/iptables.h b/include/iptables.h
--- a/include/iptables.h	2004-03-05 00:43:36.000000000 +0100
+++ b/include/iptables.h	2004-03-05 00:45:29.000000000 +0100
@@ -122,6 +122,7 @@
 extern void register_match(struct iptables_match *me);
 extern void register_target(struct iptables_target *me);
 
+extern char *proto_to_name(u_int8_t proto, int nolookup);
 extern struct in_addr *dotted_to_addr(const char *dotted);
 extern char *addr_to_dotted(const struct in_addr *addrp);
 extern char *addr_to_anyname(const struct in_addr *addr);
diff -urN a/iptables.c b/iptables.c
--- a/iptables.c	2004-03-05 00:44:10.000000000 +0100
+++ b/iptables.c	2004-03-05 00:45:59.000000000 +0100
@@ -235,11 +235,12 @@
 	{ "icmp", IPPROTO_ICMP },
 	{ "esp", IPPROTO_ESP },
 	{ "ah", IPPROTO_AH },
+	{ "ipcomp", IPPROTO_COMP },
 	{ "sctp", IPPROTO_SCTP },
 	{ "all", 0 },
 };
 
-static char *
+char *
 proto_to_name(u_int8_t proto, int nolookup)
 {
 	unsigned int i;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]: latest netfilter+ipsec patches
  2004-03-04 22:30                                     ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
  2004-03-04 23:11                                       ` Willy Tarreau
  2004-03-04 23:44                                       ` Patrick McHardy
@ 2004-03-05 11:39                                       ` Harald Welte
  2 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-03-05 11:39 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:

> This is the latest version of the netfilter+ipsec patches, I will check
> them in in a couple of days after some minor changes.

good work, patrick.  I'm looking forward to seeing those changes in the
mainline kernel.

one minor note:
please #ifdef CONFIG_NETFILTER the nf_reset function (or #define it to
nothing otherwise)


-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28  0:09                     ` [PATCH]Re: " Harald Welte
  2004-01-28  8:49                       ` Patrick McHardy
@ 2004-01-28 10:30                       ` Andreas Jellinghaus
  2004-01-29 19:05                         ` Harald Welte
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 10:30 UTC (permalink / raw)
  To: netfilter-devel

ah_output to ah_output_finish and a new ah_output
with:
> +	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
> +		       ah_output_finish);

please add the same to ipip, ipgre and friends, to keep them
in sync? what about ipv6?

Andreas
p.s. I hope the idea with the new hook POSTENCAP died?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH]Re: NAT before IPsec with 2.6
  2004-01-28 10:30                       ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
@ 2004-01-29 19:05                         ` Harald Welte
  0 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-29 19:05 UTC (permalink / raw)
  To: Andreas Jellinghaus; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

On Wed, Jan 28, 2004 at 11:30:04AM +0100, Andreas Jellinghaus wrote:
> ah_output to ah_output_finish and a new ah_output
> with:
> > +	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
> > +		       ah_output_finish);
> 
> please add the same to ipip, ipgre and friends, to keep them
> in sync? what about ipv6?

yes, please wait until we have implemented in one case and proven that
it does what it is supposed to do.. after that we will sync up.

> Andreas
> p.s. I hope the idea with the new hook POSTENCAP died?

yes.

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 11:57                 ` Henrik Nordstrom
  2004-01-27 13:07                   ` Harald Welte
@ 2004-01-27 19:54                   ` Michael Richardson
  1 sibling, 0 replies; 63+ messages in thread
From: Michael Richardson @ 2004-01-27 19:54 UTC (permalink / raw)
  To: netfilter-devel

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "Henrik" == Henrik Nordstrom <hno@marasystems.com> writes:
    Henrik> For me it is easier to grasp what is happending with the packet
    Henrik> if it is viewed like the original packet is forwarded to the
    Henrik> ipsec engine and the ipsec engine then generates a new packet.

  Yes, I agree strongly.

    Henrik> Even in the Host<->Host IPSec case the tunnel view is most
    Henrik> appropriate. As soon as the packets enter the IPSec encapsulation
    Henrik> they loose all the relevant propoerties of the original
    Henrik> connection and becomes part of the single IPSec Host <-> IPSec
    Henrik> Host connection all packets between these two IPSec host is
    Henrik> transmitted over.

  Actually, there are a number of properities that many will wish to retain.
This includes:
     1) VLAN
     2) QoS - DSCP, IPv6 flow label, other application set parameters
     3) possibly some pieces of nfmark (it will be scenario specific)
     4) ECN gets copied

  It is my understanding that the stackable dst permits most of this to
be retained - but the IPsec actually needs to decide what to do.

  Finally, there are some packets which will enter the IPsec system, but
emerge unchanged - they may excepted. Those packets will not be new.

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson,    Xelerance Corporation, Ottawa, ON    |net architect[
] mcr@xelerance.com      http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Finger me for keys

iQCVAwUBQBbCDYqHRg3pndX9AQGjCgP9EWNAvrsEwopZXv1aJhBAyrFbawM5bPI4
CoENcH4FFXURohmE7GDsXl3Ei8Hf7Kuru8EG1WKZCNJXYpgpsMFRXH6YVOzl497o
aVdE2y279vv1JOyY7Jo/Lwj6nm5NMqkv1EMMOMK8IZm3+AgQ6Fzw/IosMpQUEAkR
CrWi68BCtE0=
=usFv
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 10:39               ` Harald Welte
  2004-01-27 11:57                 ` Henrik Nordstrom
@ 2004-01-27 13:27                 ` Valentijn Sessink
  2004-01-27 13:57                   ` Henrik Nordstrom
  2004-01-27 21:13                   ` Andreas Jellinghaus
  2004-01-27 16:11                 ` Tom Eastep
  2004-01-27 19:51                 ` Michael Richardson
  3 siblings, 2 replies; 63+ messages in thread
From: Valentijn Sessink @ 2004-01-27 13:27 UTC (permalink / raw)
  To: Harald Welte, Willy Tarreau, Henrik Nordstrom, Tom Eastep,
	Michal Ludvig, netfilter-devel

Hello,

At Tue, Jan 27, 2004 at 11:39:18AM +0100, Harald Welte wrote:
> I also disagree that an ESP packet should be trated as locally
> generated (and thus iterate over OUTPUT).

This is what happens with incoming packets: they hit INPUT twice.

>   From my perspective, locally
> generated means more something like: sent via a local userspace process
> using PF_INET sockets.  If we consider a packet after any kind of change 
> as locally-generated, we could argue doing this with NAT'ed packets,
> too.  Please try to convince me, if I'm missing some point.

You are missing the point that the netfilter code changes the packets here,
thus if you would want netfilter to do something else, you could code that
in netfilter rules.

Now let's review an IPsec tunnel. On the input side, everything works as
expected: an ESP packet hits PREROUTING, then INPUT, gets unencrypted, then
the new packet (that has different from and to, and may be a different
protocol altogether) hits PREROUTING again, then INPUT (or FORWARD).

On output, however, things are not so transparant. A packet hits OUTPUT,
then gets encrypted. A totally different packet hits POSTROUTING. This makes
no sense.

> One of the fundamental principles behind netfilter/iptables, especially
> NAT, is it's symmetric behaviour (between incoming and outgoing packets,
> between the hooks, ...).   So with any possible solution, this should be
> kept in mind.

There's no symmetric behaviour now. And from an "input" perspective, I think
a decapsulated packet that has a locally ending payload should hit INPUT
again. Same with output: a locally generated packet hits output, then should
hit OUTPUT again if it was really destined for the outside world.

V.
-- 
http://www.openoffice.nl/   Open Office - Linux Office Solutions
Valentijn Sessink  valentyn+sessink@nospam.openoffice.nl

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 13:27                 ` Valentijn Sessink
@ 2004-01-27 13:57                   ` Henrik Nordstrom
  2004-01-27 21:13                   ` Andreas Jellinghaus
  1 sibling, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 13:57 UTC (permalink / raw)
  To: Valentijn Sessink
  Cc: Harald Welte, Willy Tarreau, Tom Eastep, Michal Ludvig,
	netfilter-devel

On Tue, 27 Jan 2004, Valentijn Sessink wrote:

> On output, however, things are not so transparant. A packet hits OUTPUT,
> then gets encrypted. A totally different packet hits POSTROUTING. This makes
> no sense.

And there the required hooks needs to be inserted in the IPSec 
implementation to call

a) POSTROUTING before the final decision that this packet is to go via 
IPSec. Note that this may need to allow reroute of the packet including 
IPSec policy. Because of this some changes may be needed in the IPSec 
implementation for this to be complete. 

b) After encrypting OUTPUT needs to be called on the encrypted packet 
before the packet leaves IPSec and continues the normal path leading to 
POSTROUTING.

> There's no symmetric behaviour now.

We all know that, and is why this discussion exists in order to find out 
how it should be.

> And from an "input" perspective, I think a decapsulated packet that has
> a locally ending payload should hit INPUT again. Same with output: a
> locally generated packet hits output, then should hit OUTPUT again if it
> was really destined for the outside world.

To this we all agree from what I can tell.

Regards
Henrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 13:27                 ` Valentijn Sessink
  2004-01-27 13:57                   ` Henrik Nordstrom
@ 2004-01-27 21:13                   ` Andreas Jellinghaus
  2004-01-28  8:58                     ` Harald Welte
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-27 21:13 UTC (permalink / raw)
  To: netfilter-devel

On Tue, 27 Jan 2004 14:27:25 +0100, Valentijn Sessink wrote:

> Hello,
> 
> At Tue, Jan 27, 2004 at 11:39:18AM +0100, Harald Welte wrote:
>> I also disagree that an ESP packet should be trated as locally
>> generated (and thus iterate over OUTPUT).
> 
> This is what happens with incoming packets: they hit INPUT twice.

yes, but once before and once after decryption. use either fwmark
or an ipip tunnel interface to destinguish between already decrypted
and was-never-encrypted packets. 

> On output, however, things are not so transparant. A packet hits OUTPUT,
> then gets encrypted. A totally different packet hits POSTROUTING. This makes
> no sense.

use an interface, and you will see the packet twice, plus the interface
does the route lookup on the encrypted packet. so no ugly hacks in the
routing table are necessary.

(ugly hacks scenario: consider a road warrior.
he is 192.168.3.2 and his vpn gateway is 192.168.3.1. he wants to put
the company net on that GW address. But without a GW he doesn't know
on which interface to put it. it could be either wlan (if the tunnel
is established with his wireless lan card), or the eth0 interface
(if he is using ethernet cable to connect to the net).
sure, the ipsec software could manipulate the routing table after
altering local and remote tunnel endpoints, but that is an ugly hack
and a bad design.)

[symetry]

that does not work:
netfilter wants NAT to be the first thing for incoming packets
and the last thing for outgoing packets. thats symetry.

but you want ipsec encryption and decryption as soon as possible:
both before looking at the routing table.

mixing those two will never result in symetry. if you try it
(routing before encapsulation), the result has problems.

Andreas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 21:13                   ` Andreas Jellinghaus
@ 2004-01-28  8:58                     ` Harald Welte
  2004-01-28 10:21                       ` Andreas Jellinghaus
                                         ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28  8:58 UTC (permalink / raw)
  To: Andreas Jellinghaus; +Cc: netfilter-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 4125 bytes --]

Cc'ing netdev, since this is now a very clear proposal and not some
internal netfilter rambling.

On Tue, Jan 27, 2004 at 10:13:33PM +0100, Andreas Jellinghaus wrote:
> > This is what happens with incoming packets: they hit INPUT twice.
> 
> yes, but once before and once after decryption. use either fwmark
> or an ipip tunnel interface to destinguish between already decrypted
> and was-never-encrypted packets. 

yes, and this is desired.   However, the packets also hit PREROUTING
twice, which is exactly what we want (and get it for free, since
xfrm4_input.c calls netif_rx() again).  

However, it doesn't reset the skb->nfct pointer, so connection tracking
sees both encapsulated and decapsulated packet as belonging to the same
connection.  This is wrong, and dangerous (application helpers might be
called for the encapsulated traffic, which now has a completely
different layer 4 protocol.

So with stock 2.6.x you have 

- no working connection tracking in any kind of ipsec scenario
- conntrack and NAT helpers trying to find readable application data
  within already-encrypted packets
- no working DNAT/SNAT, mainly because of not-working connection
  tracking

> > On output, however, things are not so transparant. A packet hits OUTPUT,
> > then gets encrypted. A totally different packet hits POSTROUTING. This makes
> > no sense.
> 
> use an interface, and you will see the packet twice, plus the interface
> does the route lookup on the encrypted packet. so no ugly hacks in the
> routing table are necessary.

Did anybody propose ugly hacks in the routing table?

> [symetry]
> 
> that does not work:
> netfilter wants NAT to be the first thing for incoming packets
> and the last thing for outgoing packets. thats symetry.

yes, it is symmetric.  As is the symmetry between prerouting/postrouting
and input/output hooks. As well as the symmetry between SNAT and DNAt,
...

> but you want ipsec encryption and decryption as soon as possible:
> both before looking at the routing table.

Yes, that's why what IPsec is doing for incoming ESP/AH packets the
right thing: go through PRE_ROUTING, route to LOCAL_IN, decapsulate,
and start over again.

However, for outgoing ESP/AH packets (locally-encapsulated), it doesn't
show the symmetric behaviour of what it does on the input side: go
through FORWARD, encapsulate, route, POST_ROUTING.  No starting over.

So we absolutely _need_ a symmetric-to-incoming behaviour like:

a) for locally-originated packets
LOCAL_OUT, POST_ROUTING, encapsulate, LOCAL_OUT, POST_ROUTING.

b) for forwarded, locally-encapsulated packets
PRE_ROUTING, FORWARD, POST_ROUTING, encapsulate, LOCAL_OUT, POST_ROUTING.

No, we don't achieve this by manipulating the routing code, but by
placing the respective hooks in ah{4,6}.c and esp{4,6}.c
{ah,esp}_output() function respectively. We also need to (again) reset
the skb->nfct and drop the conntrack reference again.

This way we 
	1) make connection tracking work again
	2) do not change any routing of the current implementation
	3) enable users to
		a) filter unencapsulated, encapsulated and decapsulated
		   packeets at their 'expected' place
		b) do DNAT on the incoming decapsulated packets
		   (which doesn't work right now, but should)
		c) do SNAT on the outgoing/forwarded to-be-encapsualated
		   packets (which doesn't work right now, but should)

> mixing those two will never result in symetry. if you try it
> (routing before encapsulation), the result has problems.

I'm not proposing to route before encapsulating.  I just propose of
calling the same netfilter functions that we usually call before/after
routing at different places :)

> Andreas

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-28  8:58                     ` Harald Welte
@ 2004-01-28 10:21                       ` Andreas Jellinghaus
  2004-01-28 13:00                         ` Harald Welte
  2004-01-28 14:24                       ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
  2004-01-28 19:38                       ` NAT before IPsec with 2.6 David S. Miller
  2 siblings, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 10:21 UTC (permalink / raw)
  To: Harald Welte; +Cc: netfilter-devel, netdev

On Wed, 2004-01-28 at 09:58, Harald Welte wrote:
> However, it doesn't reset the skb->nfct pointer,

the tunnels do. 

In summary it looks to me, like you want to have everything that a
real tunnel with an interface offers, but for some reason you don't
want the tunnel with the interface. right?

so, why?

> - no working connection tracking in any kind of ipsec scenario
that is not true. ipip_rcv:
                nf_conntrack_put(skb->nfct);
                skb->nfct = NULL;
ipip_xmit:
        nf_conntrack_put(skb->nfct);
        skb->nfct = NULL;

the only problem is stability and scheduling in atomic with xfrm+ipip
(i guess a bug, I hope someone who understands the code could take a
look at it).

> > use an interface, and you will see the packet twice, plus the interface
> > does the route lookup on the encrypted packet. so no ugly hacks in the
> > routing table are necessary.
> 
> Did anybody propose ugly hacks in the routing table?

if you don't use an interface, you need them in real world setups.
with an interface both gw and client put the route to each others
static vpn ip on ipsec0 and then configure the tunnel ips, and
those ip will be used to lookup the interface where to send the
encrypted packet.

with the current code without using interfaces, the ip of the
vpn tunnel needs to be moved to the interface that will be used
to reach the vpn gateway. even worse, you need to play with
source address selection to make sure an outgoing packet has
your static vpn ip, and not the dynamic ip of whatever net
you are currently using. 

example:
      up ip addr add 192.168.3.1 dev wlan0
      up ip route del default dev wlan0
      up ip route add default via 192.168.1.1 src 192.168.3.1 dev wlan0
      down ip addr del 192.168.3.1 dev wlan0

and swithcing from wlan0 to eth0 does not only require
the tunnel to be reconfigured (local and remote ip), but
also changes in the routing table. that is horrible.

and yes, of course you want static ip addresses for "road warrior"
clients! for example smtp servers need to these to allow relaying
or not.

> So we absolutely _need_ a symmetric-to-incoming behaviour like:

use an tunnel interface and you have that.
(adding pre_/post_ routing is very simple, simply cut&paste to the
_rcv and _xmit functions.

sure, you can archive the same thing without using tunnel interfaces.
But i wonder if that creates a simple solution, easy to understand.

my basic question is: how will these changes affect setups with tunnel
interfaces? 

[other issue]
if we are to look at the calls to xfrm anyway: would it be possible to
add a debugging facility for dropped packets? I know, that is not a
netfilter issue, but I guess many people will find it problematic,
if they cannot see what packets are dropped by the xfrm logic.
I'm only rasing the issue, because that code is most likely put
to the same places we are discussiong right now (ah, esp, ipip_rcv,
ipip_xmit, ...).

> I'm not proposing to route before encapsulating.

that is currently done. I'm glad if that "feature" is removed.

>   I just propose of
> calling the same netfilter functions that we usually call before/after
> routing at different places :)

fine. I don't understand what the benefit of doing everything like with
an interface, but avoiding the interface. but please keep
tunnel with interface and tunnel without interface in sync.

Regards, Andreas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-28 10:21                       ` Andreas Jellinghaus
@ 2004-01-28 13:00                         ` Harald Welte
  2004-01-28 13:43                           ` Andreas Jellinghaus
  0 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:00 UTC (permalink / raw)
  To: Andreas Jellinghaus; +Cc: netfilter-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]

On Wed, Jan 28, 2004 at 11:21:34AM +0100, Andreas Jellinghaus wrote:
> On Wed, 2004-01-28 at 09:58, Harald Welte wrote:
> > However, it doesn't reset the skb->nfct pointer,
> 
> the tunnels do. 

yes, I'm well aware of that.

> In summary it looks to me, like you want to have everything that a
> real tunnel with an interface offers, but for some reason you don't
> want the tunnel with the interface. right?
> so, why?

Because it's needed for interoperability with other IPsec
implementations, in other words if you are not running Linux on both
ends.  Yes, I know... in an ideal world everybody would run linux on
both ends... but if you do: Why use IPsec at all, if not for
interoperability.

> > - no working connection tracking in any kind of ipsec scenario

s/ipsec scen/plain non-tunnel ipsec scen/

> > Did anybody propose ugly hacks in the routing table?
>
> if you don't use an interface, you need them in real world setups.

oh well, what you are describing are entries in the routing table,
that's fine.  I thought you were talking about hacking the routing code.

> and swithcing from wlan0 to eth0 does not only require
> the tunnel to be reconfigured (local and remote ip), but
> also changes in the routing table. that is horrible.

well, this is just like the 2.6.x ipsec implementation is like.  It's
not the netfilter project's will or task to change that implementation.
We just want to get the same netfilter/iptables functionality,
independent of somebody using ipsec or not.

> > So we absolutely _need_ a symmetric-to-incoming behaviour like:
> 
> use an tunnel interface and you have that.

yes, but as stated above, other implementations might not be able to
work with that implementation.

> sure, you can archive the same thing without using tunnel interfaces.
> But i wonder if that creates a simple solution, easy to understand.

Unfortunately there is no simple or easy solution.  We just want to get
it working at all.

> my basic question is: how will these changes affect setups with tunnel
> interfaces? 

I think you would see the packet even more often:

1) LOCAL_OUT of original packet 
2) POST_ROUTING of original packet
3) LOCAL_OUT of tunnelled packet
4) POST_ROUTING of tunneled packet
5) LOCAL_OUT of ESP/AH packet
6) POST_ROUTING of ESP/AH packet

> [other issue]
> if we are to look at the calls to xfrm anyway: would it be possible to
> add a debugging facility for dropped packets? I know, that is not a
> netfilter issue, but I guess many people will find it problematic,
> if they cannot see what packets are dropped by the xfrm logic.
> I'm only rasing the issue, because that code is most likely put
> to the same places we are discussiong right now (ah, esp, ipip_rcv,
> ipip_xmit, ...).

yes, but I'd rather don't put both of them into one patch.

> > I'm not proposing to route before encapsulating.
> 
> that is currently done. I'm glad if that "feature" is removed.

Again, I misunderstood you.  I do not intend to change the IPsec
implementation in any way, I just want netfilter to acommodate it. 

> an interface, but avoiding the interface. but please keep
> tunnel with interface and tunnel without interface in sync.

What do you mean by that?  From my point of view, it should be like
(1-6) above.   This way anybody can filter on any kind of header bit at
any layer of the encapsulation. 

> Regards, Andreas

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-28 13:00                         ` Harald Welte
@ 2004-01-28 13:43                           ` Andreas Jellinghaus
  0 siblings, 0 replies; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 13:43 UTC (permalink / raw)
  To: Harald Welte; +Cc: netfilter-devel, netdev

Hi Harald,

The userland could be improved to be more flexible,
then it could configure the same issue with and
without interface, depending on the needs. From
protocol perspective that wouldn't be visible, 
so I realy don't see any interoperability issues here.

ip, esp, inner ip, tcp/udp, payload.
the packet can look the same, whether you use some
interface to assemble it or not. ike and setting
spd plus tunnel is the issue, but thats userspace.

ok, but thats not a netfilter topic.

> I think you would see the packet even more often:
> 
> 1) LOCAL_OUT of original packet 
> 2) POST_ROUTING of original packet
> 3) LOCAL_OUT of tunnelled packet
> 4) POST_ROUTING of tunneled packet
> 5) LOCAL_OUT of ESP/AH packet
> 6) POST_ROUTING of ESP/AH packet

one change you want to add is POST_ROUTING before
encapsilating. add that hook at the start of ipip_xmit,
too (renameing ipip_xmit to ipip_xmit_finish
and a new ipip_xmit with only the hook), and things
should be in sync. same for the other tunnels.

the other change you want to make is set the interface
to some other value after encapsulation. what value?
will it be possible to specify the value?
I'm still thinking about that and in what scenario you
would need to differ say ipsec0 and ipsec1.
but such scenarios could exist. 

maybe one vpn/gateway/router shared by two small companies,
both want to keep their network private ?

[setkey debugging]
> yes, but I'd rather don't put both of them into one patch.

sure. are there netfilter routines that can be reused,
or are they somehow tied to the NF core? netfilter formats
the packets nicely for debugging.

> > an interface, but avoiding the interface. but please keep
> > tunnel with interface and tunnel without interface in sync.
> 
> What do you mean by that? 

if you add a hook in ah/esp, add the same hooks to ipip, ipgre,
ip6_tunnel etc. Currently netfilter/tunneling is more or
less consistent - without interface there are some issues,
but the tunnelling fits fine into the big netfilter picture
(the one with 5 hooks explained). 

if ah/esp used different hooks than the real tunnels,
that would make the situation less easy.

also, what is the status of the new hooks?
dropped or do you still plan to add a different
hook, even if it calls the same table?
POST_XFRM is a bad name, better something
like POST_ENCAP or POST_DECAP, because IMO
tunnels should call that hook, too, so it
should be a generic name, not tied to xfrm.

>  From my point of view, it should be like
> (1-6) above.   This way anybody can filter on any kind of header bit at
> any layer of the encapsulation. 

ah, wait. it will be only 4 steps,
3+5 and 4+6 are merged, because the tunnel calls xfrm itself,
so it does both at once. that could be changed of course,
if you want that flexibility.

but i wonder:
if you want to change the untunneld packet, you can do that
in 1/2. if you want to change the encrypted packet, you can
do that i 5/6. 3/4 has the same (as payload) you see in 1/2,
and the same header that has 5/6. so what is the benefit
of two additional steps?

Regards, Andreas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* 2.6.2-rc2 and nf-log
  2004-01-28  8:58                     ` Harald Welte
  2004-01-28 10:21                       ` Andreas Jellinghaus
@ 2004-01-28 14:24                       ` Wojciech 'Sas' Cieciwa
  2004-01-28 19:38                       ` NAT before IPsec with 2.6 David S. Miller
  2 siblings, 0 replies; 63+ messages in thread
From: Wojciech 'Sas' Cieciwa @ 2004-01-28 14:24 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter developmnet mailing list

Hi I try compile nf-log module from CVS snap 20040128 on 2.6.2-rc2 kernel 
and I found undefinef BR_NETPROTO_LOCK.

Can anyone check and solve this problem ?

Thanx.
					Sas.
-- 
{Wojciech 'Sas' Cieciwa}  {Member of PLD Team                               }
{e-mail: cieciwa@alpha.zarz.agh.edu.pl, http://www2.zarz.agh.edu.pl/~cieciwa}

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-28  8:58                     ` Harald Welte
  2004-01-28 10:21                       ` Andreas Jellinghaus
  2004-01-28 14:24                       ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
@ 2004-01-28 19:38                       ` David S. Miller
  2 siblings, 0 replies; 63+ messages in thread
From: David S. Miller @ 2004-01-28 19:38 UTC (permalink / raw)
  To: Harald Welte; +Cc: aj, netfilter-devel, netdev

On Wed, 28 Jan 2004 09:58:31 +0100
Harald Welte <laforge@netfilter.org> wrote:

> No, we don't achieve this by manipulating the routing code, but by
> placing the respective hooks in ah{4,6}.c and esp{4,6}.c
> {ah,esp}_output() function respectively. We also need to (again) reset
> the skb->nfct and drop the conntrack reference again.

Why not just do this right when we pop into the dst_output() call
in ip_output.c  This way we don't have to add all of this stuff
for every new encapsulator we ever implement.

Maybe not like this precisely, but something like it.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 10:39               ` Harald Welte
  2004-01-27 11:57                 ` Henrik Nordstrom
  2004-01-27 13:27                 ` Valentijn Sessink
@ 2004-01-27 16:11                 ` Tom Eastep
  2004-01-27 20:45                   ` Harald Welte
  2004-01-27 19:51                 ` Michael Richardson
  3 siblings, 1 reply; 63+ messages in thread
From: Tom Eastep @ 2004-01-27 16:11 UTC (permalink / raw)
  To: Harald Welte, Willy Tarreau
  Cc: Henrik Nordstrom, Michal Ludvig, netfilter-devel

On Tuesday 27 January 2004 02:39 am, Harald Welte wrote:

>
> My proposal:
> From conntracks view (and thus from NAT view), there (shuld be?) two
> seperate connections for any given to-be-encapsulated packet:
>
> - the payload connection (tcp/udp/whatever)
> 	- conntrack entry created at PREROUTING
> 	- conntrack entry confirmed (put in hash) at XXX
> - the AH/ESP connection
> 	- conntrack entry created at YYY
> 	- conntrack entry confirmed at POSTROUTING
>
> To do this, somewhen between esp_output() is called and the beginning of
> the modification of the packet payload, we need to call
> nf_hook(POST_ROUTING).  This way, conntrack would be able to put the
> connection in the hash, and people can do SNAT-like operations in
> nat->POSTROUTING.  We could even pass a dummy output device structure
> with an interface name "esp" so people can SNAT everything heading for
> esp encapsulation.

I like this proposal. I assume that on input, payload packets would also have 
this dummy device as their input device? And on output that payload packets 
would have "esp" as their output device in the FORWARD and OUTPUT hooks as 
well?

I would also like to register my vote for having the AH/ESP packets go through 
INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in the same way 
as it does all other tunnel types:

a) the payload connection uses a dedicated device ("esp" in this case)
b) the encapsulated traffic originates at the gateway on output and is 
directed to the gateway on input.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 16:11                 ` Tom Eastep
@ 2004-01-27 20:45                   ` Harald Welte
  2004-01-28 15:36                     ` Tom Eastep
  0 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-27 20:45 UTC (permalink / raw)
  To: Tom Eastep
  Cc: Willy Tarreau, Henrik Nordstrom, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

> > To do this, somewhen between esp_output() is called and the beginning of
> > the modification of the packet payload, we need to call
> > nf_hook(POST_ROUTING).  This way, conntrack would be able to put the
> > connection in the hash, and people can do SNAT-like operations in
> > nat->POSTROUTING.  We could even pass a dummy output device structure
> > with an interface name "esp" so people can SNAT everything heading for
> > esp encapsulation.
> 
> I like this proposal. 

Great :)

> I assume that on input, payload packets would also have this dummy
> device as their input device? 

sure, makes sense.

> And on output that payload packets would have "esp" as their output
> device in the FORWARD and OUTPUT hooks as well?

no, that's way more difficult.  I'm not sure whether it can be done at
all (without adding rediculous kludges to the code).

> I would also like to register my vote for having the AH/ESP packets go
> through INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in
> the same way as it does all other tunnel types:

Mh.  Let's say we stick with the INPUT/OUTPUT chains for now.

However, I would like to introduce new netfilter hooks.  At least for
the beginning (due to lack of better ideas), I'm going to register the
INPUT/OUTPUT chains with those new hooks.

the idea of the new hooks is, that in reality they are at different
locations in the stack.  And at some point we might have some module
that is only interested in xfrm'ed packets.

> -Tom

-- 
- Harald Welte <laforge@netfilter.org>             http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 20:45                   ` Harald Welte
@ 2004-01-28 15:36                     ` Tom Eastep
  0 siblings, 0 replies; 63+ messages in thread
From: Tom Eastep @ 2004-01-28 15:36 UTC (permalink / raw)
  To: Harald Welte
  Cc: Willy Tarreau, Henrik Nordstrom, Michal Ludvig, netfilter-devel

On Tuesday 27 January 2004 12:45 pm, Harald Welte wrote:
> > > To do this, somewhen between esp_output() is called and the beginning
> > > of the modification of the packet payload, we need to call
> > > nf_hook(POST_ROUTING).  This way, conntrack would be able to put the
> > > connection in the hash, and people can do SNAT-like operations in
> > > nat->POSTROUTING.  We could even pass a dummy output device structure
> > > with an interface name "esp" so people can SNAT everything heading for
> > > esp encapsulation.
> >
> > I like this proposal.
>
> Great :)
>
> > I assume that on input, payload packets would also have this dummy
> > device as their input device?
>
> sure, makes sense.
>
> > And on output that payload packets would have "esp" as their output
> > device in the FORWARD and OUTPUT hooks as well?
>
> no, that's way more difficult.  I'm not sure whether it can be done at
> all (without adding rediculous kludges to the code).

Bummer -- I'm trying to avoid equally ridiculous kludges in my own code :-)

>
> > I would also like to register my vote for having the AH/ESP packets go
> > through INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in
> > the same way as it does all other tunnel types:
>
> Mh.  Let's say we stick with the INPUT/OUTPUT chains for now.
>
> However, I would like to introduce new netfilter hooks.  At least for
> the beginning (due to lack of better ideas), I'm going to register the
> INPUT/OUTPUT chains with those new hooks.
>
> the idea of the new hooks is, that in reality they are at different
> locations in the stack.  And at some point we might have some module
> that is only interested in xfrm'ed packets.
>

I understand.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: NAT before IPsec with 2.6
  2004-01-27 10:39               ` Harald Welte
                                   ` (2 preceding siblings ...)
  2004-01-27 16:11                 ` Tom Eastep
@ 2004-01-27 19:51                 ` Michael Richardson
  3 siblings, 0 replies; 63+ messages in thread
From: Michael Richardson @ 2004-01-27 19:51 UTC (permalink / raw)
  To: netfilter-devel

-----BEGIN PGP SIGNED MESSAGE-----


>>>>> "Harald" == Harald Welte <laforge@netfilter.org> writes:
    Harald> To do this, somewhen between esp_output() is called and the
    Harald> beginning of the modification of the packet payload, we need to
    Harald> call nf_hook(POST_ROUTING).  This way, conntrack would be able to
    Harald> put the connection in the hash, and people can do SNAT-like
    Harald> operations in nat-> POSTROUTING.  We could even pass a dummy
    Harald> output device structure with an interface name "esp" so people
    Harald> can SNAT everything heading for esp encapsulation.

  I would suggest that many administrators will want to have possibly a
pseudo-interface per SA. These aren't necessarily real interfaces like
ipsecX, but they do need to be named in some way.

  (There are a number of scenarios where having a proper virtual interface
is really the best way to interact properly with routing daemons)

  The original IPsec implementation that was done at NRL created virtual
tunnels (using the "route" command) with virtual tunnel devices - one per
tunnel. They were criticised that this didn't scale - that's because at the
time (1993/1994) people weren't building kernels that assumed that there
could be thousands of devices. 
  Linux people *are* doing this, and doing it correct. The examples include:
	MPLS, PPP, PPPoE, VLAN

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson,    Xelerance Corporation, Ottawa, ON    |net architect[
] mcr@xelerance.com      http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Finger me for keys

iQCVAwUBQBbBJ4qHRg3pndX9AQEaHAQA3+VNcmq/9RzhqIW/d5ecLN1bnh+/BQYf
hUELO1Drczud1ne1FxUHSDlvDtwRE5EzZ3tJwSA2NH+5s2yEYnk9Ydb+JNYPiWAL
F/KohhfcQxsasloCSpQ6KErazBU240pJfRYjlSwqNPsZLVC64CqyenEwAm4vxXJJ
X7uzK/idyc4=
=eeVc
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2004-03-12  0:15 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
2004-01-23  6:57 ` Willy Tarreau
2004-01-23 12:31 ` Henrik Nordstrom
2004-01-23 13:31   ` Michal Ludvig
2004-01-23 14:24     ` Henrik Nordstrom
2004-01-23 14:40       ` Michal Ludvig
2004-01-23 15:56         ` Henrik Nordstrom
2004-01-23 15:51       ` Tom Eastep
2004-01-24  8:22         ` Willy Tarreau
2004-01-24  9:21           ` Henrik Nordstrom
2004-01-24  9:27             ` Willy Tarreau
2004-01-27 10:39               ` Harald Welte
2004-01-27 11:57                 ` Henrik Nordstrom
2004-01-27 13:07                   ` Harald Welte
2004-01-27 13:22                     ` Henrik Nordstrom
2004-01-27 14:12                     ` Henrik Nordstrom
2004-01-27 20:51                       ` Harald Welte
2004-01-27 22:35                         ` Henrik Nordstrom
2004-01-28 13:48                           ` Harald Welte
2004-01-27 22:41                         ` Willy Tarreau
2004-01-27 23:55                     ` Harald Welte
2004-01-28  0:14                       ` Willy Tarreau
2004-01-28  0:09                     ` [PATCH]Re: " Harald Welte
2004-01-28  8:49                       ` Patrick McHardy
2004-01-28  9:37                         ` Patrick McHardy
2004-01-28 10:30                         ` Harald Welte
2004-01-28 11:24                           ` Willy Tarreau
2004-01-28 13:39                             ` Harald Welte
2004-01-28 15:58                             ` Tom Eastep
2004-01-28 13:22                           ` Patrick McHardy
2004-01-28 14:23                           ` Henrik Nordstrom
2004-02-01 14:52                           ` Patrick McHardy
2004-02-16  1:19                             ` Patrick McHardy
2004-02-18 14:57                               ` Patrick McHardy
     [not found]                                 ` <20040218220337.GA3193@alpha.home.local>
2004-02-20  1:43                                   ` Patrick McHardy
2004-03-04 22:30                                     ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
2004-03-04 23:11                                       ` Willy Tarreau
2004-03-04 23:42                                         ` Alexander Samad
2004-03-05  2:00                                           ` Patrick McHardy
2004-03-05  2:13                                             ` Alexander Samad
2004-03-10  2:45                                             ` Alexander Samad
2004-03-11 22:10                                               ` Patrick McHardy
2004-03-12  0:15                                                 ` Alexander Samad
2004-03-05  1:47                                         ` Patrick McHardy
2004-03-05 11:10                                           ` Willy Tarreau
2004-03-04 23:44                                       ` Patrick McHardy
2004-03-05 11:39                                       ` Harald Welte
2004-01-28 10:30                       ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
2004-01-29 19:05                         ` Harald Welte
2004-01-27 19:54                   ` Michael Richardson
2004-01-27 13:27                 ` Valentijn Sessink
2004-01-27 13:57                   ` Henrik Nordstrom
2004-01-27 21:13                   ` Andreas Jellinghaus
2004-01-28  8:58                     ` Harald Welte
2004-01-28 10:21                       ` Andreas Jellinghaus
2004-01-28 13:00                         ` Harald Welte
2004-01-28 13:43                           ` Andreas Jellinghaus
2004-01-28 14:24                       ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
2004-01-28 19:38                       ` NAT before IPsec with 2.6 David S. Miller
2004-01-27 16:11                 ` Tom Eastep
2004-01-27 20:45                   ` Harald Welte
2004-01-28 15:36                     ` Tom Eastep
2004-01-27 19:51                 ` Michael Richardson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.