* NAT before IPsec with 2.6
@ 2004-01-21 12:29 Michal Ludvig
2004-01-23 6:57 ` Willy Tarreau
2004-01-23 12:31 ` Henrik Nordstrom
0 siblings, 2 replies; 63+ messages in thread
From: Michal Ludvig @ 2004-01-21 12:29 UTC (permalink / raw)
To: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]
Hi there,
with the native IPsec stack in the 2.6.1 kernel it's impossible to do SNAT
on outgoing packets that are leaving through an IPsec link. The problem
is that the forwarded packets are encapsulated (with IPsec ESP/AH)
before the POSTROUTING chain is reached.
So I hacked up this simple solution - I understand that it won't be
accepted as-is, because now the POSTROUTING chain is reached twice (once
from ip_forward and once from ip_output). But anyways - is this a
correct approach and should I go this way with some cleanups?
Or is there another approach? I was thinking about extending the FORWARD
chain and letting it do SNAT/MASQUERADE. That would be cleaner, but more
intrusive than my patch and would probably also require some changes in
the userspace.
Comments, opinions, ...?
(Please Cc me on replies, thanks!)
Michal Ludvig
--
SUSE Labs mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373 http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.
[-- Attachment #2: kernel-nat-before-ipsec.diff --]
[-- Type: text/plain, Size: 1188 bytes --]
diff -rup linux-2.6.1.orig/net/ipv4/ip_forward.c linux-2.6.1.naga/net/ipv4/ip_forward.c
--- linux-2.6.1.orig/net/ipv4/ip_forward.c 2004-01-16 14:28:39.000000000 +0100
+++ linux-2.6.1.naga/net/ipv4/ip_forward.c 2004-01-20 16:11:09.904466001 +0100
@@ -46,12 +46,30 @@ static inline int ip_forward_finish(stru
 {
 	struct ip_options * opt = &(IPCB(skb)->opt);
 
+	if (!xfrm4_route_forward(skb))
+		goto drop;
+
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug &= ~(1 << NF_IP_POST_ROUTING);
+#endif
+
 	IP_INC_STATS_BH(IpForwDatagrams);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
 
 	return dst_output(skb);
+
+drop:
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+
+static inline int ip_forward_postroute(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable*)skb->dst;
+
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, rt->u.dst.dev, ip_forward_finish);
 }
 
 int ip_forward(struct sk_buff *skb)
@@ -109,7 +131,7 @@ int ip_forward(struct sk_buff *skb)
 	skb->priority = rt_tos2priority(iph->tos);
 
 	return NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, rt->u.dst.dev,
-		       ip_forward_finish);
+		       ip_forward_postroute);
 
 sr_failed:
 	/*
* Re: NAT before IPsec with 2.6
2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
@ 2004-01-23 6:57 ` Willy Tarreau
2004-01-23 12:31 ` Henrik Nordstrom
1 sibling, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-23 6:57 UTC (permalink / raw)
To: Michal Ludvig; +Cc: netfilter-devel

Hi,

On Wed, Jan 21, 2004 at 01:29:09PM +0100, Michal Ludvig wrote:
> So I hacked up this simple solution - I understand that it won't be
> accepted as-is, because now the POSTROUTING chain is reached twice (once
> from ip_forward and once from ip_output). But anyways - is this a
> correct approach and should I go this way with some cleanups?

I don't know if it's a correct approach, but I really think that all tables
and chains should apply to a consistent level of encapsulation, whatever it
is. Before, it was easy to process the packet at different levels because it
was sent to one (or even several) intermediate virtual interfaces. To keep
control of what goes in and out, now we should:

1) re-evaluate all previously evaluated chains when a packet has been
   decapsulated;
2) re-inject encapsulated packets into the output & postrouting chains (it's
   a locally generated packet after all, it has no reason to bypass output
   filtering);
3) keep assigned fwmark between passes so that we can track what each packet
   does.

I believe that 1) & 3) are already done, but I don't know if there's a general
consensus towards or against 2).

Cheers,
Willy
* Re: NAT before IPsec with 2.6
2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
2004-01-23 6:57 ` Willy Tarreau
@ 2004-01-23 12:31 ` Henrik Nordstrom
2004-01-23 13:31 ` Michal Ludvig
1 sibling, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 12:31 UTC (permalink / raw)
To: Michal Ludvig; +Cc: netfilter-devel

On Wed, 21 Jan 2004, Michal Ludvig wrote:

> So I hacked up this simple solution - I understand that it won't be
> accepted as-is, because now the POSTROUTING chain is reached twice (once
> from ip_forward and once from ip_output). But anyways - is this a
> correct approach and should I go this way with some cleanups?

The ip_forward call should be moved to just before where the packets get
encapsulated when using IPSec. When using IPSec this is the conceptual
POSTROUTING point in the stack for the plain packets.

Other than that I see no direct problems.

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-23 12:31 ` Henrik Nordstrom
@ 2004-01-23 13:31 ` Michal Ludvig
2004-01-23 14:24 ` Henrik Nordstrom
0 siblings, 1 reply; 63+ messages in thread
From: Michal Ludvig @ 2004-01-23 13:31 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: netfilter-devel

Henrik Nordstrom told me that:
> On Wed, 21 Jan 2004, Michal Ludvig wrote:
>
>>So I hacked up this simple solution - I understand that it won't be
>>accepted as-is, because now the POSTROUTING chain is reached twice (once
>>from ip_forward and once from ip_output). But anyways - is this a
>>correct approach and should I go this way with some cleanups?
>
> The ip_forward call should be moved to just before where the packets get
> encapsulated when using IPSec. When using IPSec this is the conceptual
> POSTROUTING point in the stack for the plain packets.

Without my patch the path is:

ip_forward()
   |
check xfrm policy (here it's decided whether or not to encrypt)
   |
NF_IP_FORWARD (unfortunately no NAT available here...)
   |
ip_forward_finish()
   |
dst_output() => esp_output()

With my patch the path is:

ip_forward()
   |
check xfrm policy (could probably be omitted, we'll do it after NAT once again)
   |
NF_IP_FORWARD (the packet may be e.g. dropped here)
   |
ip_forward_postroute()
   |
NF_IP_POST_ROUTING (do NAT here)
   |
ip_forward_finish()
   |
check xfrm policy (here it's decided whether or not to encrypt)
   |
dst_output() => esp_output()

I.e. the postrouting on the unencrypted packet is really called right
before it gets encrypted. And the encrypted packet then hits the
POSTROUTING again, but it's already a different packet, actually.

Was this your point or did I misunderstand you?

Michal Ludvig
--
SUSE Labs mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373 http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.
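The two paths in the message above can be compared with a small userspace model. This is only a sketch, not kernel code: `struct path_stats` and `model_forward()` are hypothetical names, and the model assumes (as the original posting states) that every packet leaving through ip_output() traverses POSTROUTING once more. It counts how often POSTROUTING sees the plain packet and how often it sees the ESP packet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a forwarded packet's POSTROUTING traversals. */
struct path_stats {
	int plain_postrouting;	/* times POSTROUTING saw the unencrypted packet */
	int esp_postrouting;	/* times POSTROUTING saw the ESP packet */
};

/*
 * patched: true if ip_forward_postroute() from the patch is in the path.
 * ipsec:   true if the xfrm policy check routes the packet to esp_output().
 */
struct path_stats model_forward(bool patched, bool ipsec)
{
	struct path_stats st = { 0, 0 };

	/* NF_IP_FORWARD runs in both variants; it has no NAT, so not counted. */

	if (patched)
		st.plain_postrouting++;	/* NF_IP_POST_ROUTING before encryption */

	/* dst_output(): optional esp_output(), then ip_output() -> POSTROUTING */
	if (ipsec)
		st.esp_postrouting++;	/* the hook only sees the ESP packet */
	else
		st.plain_postrouting++;	/* the hook sees the plain packet again */

	return st;
}
```

In this model, `model_forward(false, true)` never shows the plain packet to POSTROUTING, which is the SNAT problem from the first message, while `model_forward(true, false)` shows the plain packet to POSTROUTING twice.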
* Re: NAT before IPsec with 2.6
2004-01-23 13:31 ` Michal Ludvig
@ 2004-01-23 14:24 ` Henrik Nordstrom
2004-01-23 14:40 ` Michal Ludvig
2004-01-23 15:51 ` Tom Eastep
0 siblings, 2 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 14:24 UTC (permalink / raw)
To: Michal Ludvig; +Cc: netfilter-devel

On Fri, 23 Jan 2004, Michal Ludvig wrote:

> I.e. the postrouting on the unencrypted packet is really called right
> before it gets encrypted. And the encrypted packet then hits the
> POSTROUTING again, but it's already a different packet, actually.

My issue is with packets not destined for an IPSec tunnel.. from how I
read your patch these will hit POSTROUTING twice. But maybe I misread your
patch?

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-23 14:24 ` Henrik Nordstrom
@ 2004-01-23 14:40 ` Michal Ludvig
2004-01-23 15:56 ` Henrik Nordstrom
2004-01-23 15:51 ` Tom Eastep
1 sibling, 1 reply; 63+ messages in thread
From: Michal Ludvig @ 2004-01-23 14:40 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: netfilter-devel

Henrik Nordstrom told me that:
> On Fri, 23 Jan 2004, Michal Ludvig wrote:
>
>>I.e. the postrouting on the unencrypted packet is really called right
>>before it gets encrypted. And the encrypted packet then hits the
>>POSTROUTING again, but it's already a different packet, actually.
>
> My issue is with packets not destined for an IPSec tunnel.. from how I
> read your patch these will hit POSTROUTING twice. But maybe I misread your
> patch?

Yes, that's true. But how to solve it? The best would be to skip the
first POSTROUTING if the packet won't be encapsulated. But we can't know
whether or not it will go to IPsec until we perform NAT on it.

So it looks like we must skip the second POSTROUTING in the case where
the packet was encapsulated. But how can we recognise these two
situations? Wild idea - comparing some hashes (src/dst ip/port) before
and after the first POSTROUTING call?

Michal Ludvig
--
SUSE Labs mludvig@suse.cz | Cray is the only computer
(+420) 296.545.373 http://www.suse.cz | that runs an endless loop
Personal homepage http://www.logix.cz/michal | in just four hours.
* Re: NAT before IPsec with 2.6
2004-01-23 14:40 ` Michal Ludvig
@ 2004-01-23 15:56 ` Henrik Nordstrom
0 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-23 15:56 UTC (permalink / raw)
To: Michal Ludvig; +Cc: netfilter-devel

On Fri, 23 Jan 2004, Michal Ludvig wrote:

> Yes, that's true. But how to solve it? The best would be to skip the
> first POSTROUTING if the packet won't be encapsulated. But we can't know
> whether or not it will go to IPsec until we perform NAT on it.

As I said the hook needs to be just before the packet is about to enter
IPSec.

Any decision on whether to apply IPSec or not needs to be based on
information other than NAT, i.e. via nfmark routing or similar.

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-23 14:24 ` Henrik Nordstrom
2004-01-23 14:40 ` Michal Ludvig
@ 2004-01-23 15:51 ` Tom Eastep
2004-01-24 8:22 ` Willy Tarreau
1 sibling, 1 reply; 63+ messages in thread
From: Tom Eastep @ 2004-01-23 15:51 UTC (permalink / raw)
To: Henrik Nordstrom, Michal Ludvig; +Cc: netfilter-devel

On Friday 23 January 2004 06:24 am, Henrik Nordstrom wrote:
> On Fri, 23 Jan 2004, Michal Ludvig wrote:
> > I.e. the postrouting on the unencrypted packet is really called right
> > before it gets encrypted. And the encrypted packet then hits the
> > POSTROUTING again, but it's already a different packet, actually.
>
> My issue is with packets not destined for an IPSec tunnel.. from how I
> read your patch these will hit POSTROUTING twice. But maybe I misread your
> patch?

I'm concerned that passing the unencrypted packet through POSTROUTING
apparently makes it impossible for firewall rules to enforce encryption to a
remote host or network.

It's my opinion that when the ipsec<n> devices were eliminated, the baby got
thrown out with the bath water.

My $.02...
-Tom
--
Tom Eastep \ Nothing is foolproof to a sufficiently talented fool
Shoreline, \ http://shorewall.net
Washington USA \ teastep@shorewall.net
* Re: NAT before IPsec with 2.6
2004-01-23 15:51 ` Tom Eastep
@ 2004-01-24 8:22 ` Willy Tarreau
2004-01-24 9:21 ` Henrik Nordstrom
0 siblings, 1 reply; 63+ messages in thread
From: Willy Tarreau @ 2004-01-24 8:22 UTC (permalink / raw)
To: Tom Eastep; +Cc: Henrik Nordstrom, Michal Ludvig, netfilter-devel

Hi,

On Fri, Jan 23, 2004 at 07:51:40AM -0800, Tom Eastep wrote:
> I'm concerned that passing the unencrypted packet through POSTROUTING
> apparently makes it impossible for firewall rules to enforce encryption to a
> remote host or network.
>
> It's my opinion that when the ipsec<n> devices were eliminated, the baby got
> thrown out with the bath water.

That was my point too. Even if the packet is only encapsulated, from an IP
point of view, this is a completely new locally generated packet, and we
need to be able to apply filtering before and after encapsulation.

Willy
* Re: NAT before IPsec with 2.6
2004-01-24 8:22 ` Willy Tarreau
@ 2004-01-24 9:21 ` Henrik Nordstrom
2004-01-24 9:27 ` Willy Tarreau
0 siblings, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-24 9:21 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Tom Eastep, Michal Ludvig, netfilter-devel

On Sat, 24 Jan 2004, Willy Tarreau wrote:

> That was my point too. Even if the packet is only encapsulated, from an IP
> point of view, this is a completely new locally generated packet, and we
> need to be able to apply filtering before and after encapsulation.

Yes, and to do that the required netfilter hooks need to be added to the
new IPSec code where packets leave the IP stack and enter IPSec or the
reverse, similar to how there were hooks for the ipsecX: interfaces.

These hooks have not yet been done, mostly because nobody has contributed
the code adding these hooks to the new IPSec implementation in a
reasonable manner I guess.

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-24 9:21 ` Henrik Nordstrom
@ 2004-01-24 9:27 ` Willy Tarreau
2004-01-27 10:39 ` Harald Welte
0 siblings, 1 reply; 63+ messages in thread
From: Willy Tarreau @ 2004-01-24 9:27 UTC (permalink / raw)
To: Henrik Nordstrom; +Cc: Tom Eastep, Michal Ludvig, netfilter-devel

On Sat, Jan 24, 2004 at 10:21:12AM +0100, Henrik Nordstrom wrote:

> These hooks have not yet been done, mostly because nobody has contributed
> the code adding these hooks to the new IPSec implementation in a
> reasonable manner I guess.

OK, that's good news. At first, I thought that people were against this
mode of operation. This is now just a matter of time for anyone of us to
understand how to add the hooks and send a patch! I wish I had time...

Regards,
Willy
* Re: NAT before IPsec with 2.6
2004-01-24 9:27 ` Willy Tarreau
@ 2004-01-27 10:39 ` Harald Welte
2004-01-27 11:57 ` Henrik Nordstrom
` (3 more replies)
0 siblings, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 10:39 UTC (permalink / raw)
To: Willy Tarreau
Cc: Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 3896 bytes --]

On Sat, Jan 24, 2004 at 10:27:21AM +0100, Willy Tarreau wrote:
> On Sat, Jan 24, 2004 at 10:21:12AM +0100, Henrik Nordstrom wrote:
>
> > These hooks have not yet been done, mostly because nobody has contributed
> > the code adding these hooks to the new IPSec implementation in a
> > reasonable manner I guess.
>
> OK, that's good news. At first, I thought that people were against this
> mode of operation. This is now just a matter of time for anyone of us to
> understand how to add the hooks and send a patch! I wish I had time...

I have to agree with Henrik. Once there is an implementation that the
netfilter core team is happy with, we're certainly going to push very
hard to include this in the mainstream kernel.

I have to admit that I haven't yet found a 'perfect' solution to that
problem - but I'm following the ongoing discussion.

JFYI: I really don't like the 'postrouting called twice' solution. This
violates one of the core basics of netfilter/iptables so far.

I also disagree that an ESP packet should be treated as locally
generated (and thus iterate over OUTPUT). From my perspective, locally
generated means more something like: sent via a local userspace process
using PF_INET sockets. If we consider a packet after any kind of change
as locally-generated, we could argue doing this with NAT'ed packets,
too. Please try to convince me, if I'm missing some point.

One of the fundamental principles behind netfilter/iptables, especially
NAT, is its symmetric behaviour (between incoming and outgoing packets,
between the hooks, ...). So with any possible solution, this should be
kept in mind.

Also, I would expect way more headaches with regard to NAT of IPsec
encapsulated packets. NAT is based on connection tracking and thus
connection tracking has to be fully compatible with the 2.6.x IPsec
implementation.

Right now, conntrack doesn't even notice that there are ESP/AH packets.
PREROUTING creates a new conntrack for the unencapsulated packet. Later
the packet is encapsulated, and confirmed at POSTROUTING time [where it
is still the same skb, but one could argue whether that ESP packet really
belongs to the original connection].

My proposal:

From conntrack's view (and thus from NAT's view), there (should be?) two
separate connections for any given to-be-encapsulated packet:

- the payload connection (tcp/udp/whatever)
	- conntrack entry created at PREROUTING
	- conntrack entry confirmed (put in hash) at XXX
- the AH/ESP connection
	- conntrack entry created at YYY
	- conntrack entry confirmed at POSTROUTING

To do this, at some point between the call to esp_output() and the
beginning of the modification of the packet payload, we need to call
nf_hook(POST_ROUTING). This way, conntrack would be able to put the
connection in the hash, and people can do SNAT-like operations in
nat->POSTROUTING. We could even pass a dummy output device structure
with an interface name "esp" so people can SNAT everything heading for
esp encapsulation.

As for the AH/ESP connection, this has to be created just after the
packet was encapsulated. We need to clear skb->nfct (and drop refcount
to old conntrack) after encapsulation, and call nf_hook(PRE_ROUTING).

The then-encapsulated packet will then finally go via the traditional
ipv4 stack and come to the POST_ROUTING hook, where the ESP connection
can be confirmed.

This should solve all our NAT problems for free, shouldn't it?

> Regards,
> Willy

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
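The two-connection proposal above can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the conntrack implementation: `struct model_conn`, `struct model_skb` and all function names are hypothetical, and the reference count only tracks references held by the skb (a real conntrack entry in the hash would hold further references).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a conntrack entry and an skb. */
struct model_conn {
	int refcount;	/* references held by skbs only (simplification) */
	bool confirmed;	/* true once the entry has been "put in the hash" */
};

struct model_skb {
	struct model_conn *nfct;	/* mirrors the role of skb->nfct */
};

static void conn_attach(struct model_skb *skb, struct model_conn *ct)
{
	ct->refcount++;
	skb->nfct = ct;
}

static void conn_detach(struct model_skb *skb)
{
	if (skb->nfct) {
		skb->nfct->refcount--;
		skb->nfct = NULL;
	}
}

static void conn_confirm(struct model_skb *skb)
{
	assert(skb->nfct != NULL);
	skb->nfct->confirmed = true;
}

/* Walk one to-be-encapsulated packet through the proposed sequence. */
void model_encapsulate(struct model_skb *skb,
		       struct model_conn *payload, struct model_conn *esp)
{
	conn_attach(skb, payload);	/* PREROUTING: payload entry created */
	conn_confirm(skb);		/* "XXX": POST_ROUTING hook before encapsulation */

	/* ... ESP encapsulation of the payload would happen here ... */

	conn_detach(skb);		/* clear nfct, drop ref to the old conntrack */
	conn_attach(skb, esp);		/* "YYY": PRE_ROUTING hook after encapsulation */
	conn_confirm(skb);		/* POSTROUTING: ESP entry confirmed */
}
```

After `model_encapsulate()` both entries are confirmed, the skb points at the ESP entry only, and the payload entry no longer has an skb reference, which is the separation the proposal asks for.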
* Re: NAT before IPsec with 2.6
2004-01-27 10:39 ` Harald Welte
@ 2004-01-27 11:57 ` Henrik Nordstrom
2004-01-27 13:07 ` Harald Welte
2004-01-27 19:54 ` Michael Richardson
2004-01-27 13:27 ` Valentijn Sessink
` (2 subsequent siblings)
3 siblings, 2 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 11:57 UTC (permalink / raw)
To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> I also disagree that an ESP packet should be treated as locally
> generated (and thus iterate over OUTPUT).

I partly agree, but at the same time it becomes somewhat complex when
making firewalling policies if it is not.

For me it is easier to grasp what is happening with the packet if it is
viewed like the original packet is forwarded to the ipsec engine and the
ipsec engine then generates a new packet.

This is very different from NAT as ESP/AH is a complete protocol
transformation where the resulting packet does not resemble the original
packet in any manner. This is also very much visible to conntrack where
the original packets need to be tracked as one connection while the ipsec
gateway<->gateway traffic needs to be a separate connection.

IPSec is essentially a tunneling protocol, tunneling IP packets over an
encrypted connection. The tunnel endpoints are the IPSec gateways.

Even in the Host<->Host IPSec case the tunnel view is most appropriate. As
soon as the packets enter the IPSec encapsulation they lose all the
relevant properties of the original connection and become part of the
single IPSec Host <-> IPSec Host connection all packets between these two
IPSec hosts are transmitted over.

What may be interesting in netfilter is to preserve some kind of
inheritance, allowing "OUTPUT" rules to know important details about the
original packet when seeing an ESP packet.

My view of where hooks should be for IPSec:

(Client) -> Gateway -> (Gateway)

1. PREROUTING, original client packet
2. FORWARD, original client packet. Destination the IPSec SPI
3. POSTROUTING, original client packet. Destination the IPSec SPI
4. OUTPUT, ESP packet. Preferably with some preserved information about
   the original packet attached.
5. POSTROUTING, ESP packet.

If the packet is rerouted in POSTROUTING then this should include IPSec
routing. If the new route is to an IPSec SPI then it is. If not then it is
not.

(Gateway) -> Gateway -> (client)

1. PREROUTING, ESP packet
2. FORWARD, ESP packet. Destination the IPSec SPI
3. POSTROUTING, ESP packet. Destination the IPSec SPI.
4. PREROUTING, decapsulated packet. With reference to the IPSec SPI.
5. FORWARD, decapsulated packet. With reference to the IPSec SPI.
6. POSTROUTING, decapsulated packet. With reference to the IPSec SPI.

Host -> (Gateway)

1. OUTPUT, original packet
2. POSTROUTING, original packet. Destination the IPSec SPI
3. OUTPUT, ESP packet. Preferably with some preserved information about
   the original packet attached.
4. POSTROUTING, ESP packet. Preferably with some preserved information
   about the original packet attached.

(Gateway) -> Host

1. PREROUTING, ESP packet.
2. INPUT, ESP packet.
3. PREROUTING, Decapsulated packet, with reference to the IPSec SPI
4. INPUT, Decapsulated packet, with reference to the IPSec SPI

Non-IPSec traffic:

As is today. There must be no change.

These hooks fit naturally into the concepts of the netfilter hook
principles if IPSec is viewed as a tunneling method, which it in my
opinion is best viewed as. It also fits the requirements for conntrack to
be able to track the different types of connections involved, including
NAT operations on both original and ESP traffic (assuming local NAT
enabled if DNAT is required on ESP).

> generated means more something like: sent via a local userspace process
> using PF_INET sockets. If we consider a packet after any kind of change
> as locally-generated, we could argue doing this with NAT'ed packets,
> too. Please try to convince me, if I'm missing some point.

A NAT:ed connection is still the same connection at the packet level. ESP
moves the packet into a new connection carrying all kinds of packets
between the two IPSec endpoints where this host is one of the endpoints.

> One of the fundamental principles behind netfilter/iptables, especially
> NAT, is its symmetric behaviour (between incoming and outgoing packets,
> between the hooks, ...). So with any possible solution, this should be
> kept in mind.

Granted.

> My proposal:
> From conntrack's view (and thus from NAT's view), there (should be?) two
> separate connections for any given to-be-encapsulated packet:

Ah, we agree! ;-)

The above hook orders should match your expectations I think. Maybe we
don't agree on the use of INPUT/OUTPUT in the case of forwarded client
traffic, but I see no other clean way. These packets are deliberately
addressed to/from this host, not the client, and part of the IPSec
connection they are forwarded over between the two IPSec endpoints, not
the original connection.

I save myself from digging into exactly where the hooks need to be in the
kernel as I have not looked in detail at the 2.6 IPSec implementation, but
I take for granted that several need to be at the corresponding locations
in the IPSec protocol kernel.

I also see it likely there need to be some skb extensions to add the
required SPI tracking information to allow rulesets a better view of
what is going on (optional).

Without knowing in detail how the 2.6 IPSec policies are maintained I see
a slight risk that the 2.6 IPSec policy implementation maybe can not
easily match the routing expectations, but it should be possible to find a
way to at least give reasonable hook order I think. But there is a risk it
is not possible to reroute packets to/from an IPSec route and the DNAT
capabilities may be somewhat limited in such case until the IPSec kernel
is extended with what may be required to do this cleanly. (i.e. slightly
more than just inserting the appropriate hooks may be required)

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-27 11:57 ` Henrik Nordstrom
@ 2004-01-27 13:07 ` Harald Welte
2004-01-27 13:22 ` Henrik Nordstrom
` (3 more replies)
2004-01-27 19:54 ` Michael Richardson
1 sibling, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 13:07 UTC (permalink / raw)
To: Henrik Nordstrom
Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 4302 bytes --]

On Tue, Jan 27, 2004 at 12:57:10PM +0100, Henrik Nordstrom wrote:

> For me it is easier to grasp what is happening with the packet if it is
> viewed like the original packet is forwarded to the ipsec engine and the
> ipsec engine then generates a new packet.

agreed.

> What may be interesting in netfilter is to preserve some kind of
> inheritance, allowing "OUTPUT" rules to know important details about the
> original packet when seeing an ESP packet.

yes, but that's again another chapter (like how to preserve that kind of
information, ...)

> My view of where hooks should be for IPSec:
>
> (Client) -> Gateway -> (Gateway)
>
> 1. PREROUTING, original client packet
> 2. FORWARD, original client packet. Destination the IPSec SPI
> 3. POSTROUTING, original client packet. Destination the IPSec SPI
> 4. OUTPUT, ESP packet. Preferably with some preserved information about
>    the original packet attached.
> 5. POSTROUTING, ESP packet.

I agree to all but 4. My proposal was PREROUTING, yours OUTPUT. In the
end, none of them really fits. But I would consider this a minor issue.
We could also create a new hook 'POSTENCAP' or whatever...

Maybe we could agree at that point, that there is a new hook
NF_IP_POST_XFRM. It's a matter of iptable_nat/... which chain they want
to attach to that hook.

> If the packet is rerouted in POSTROUTING then this should include IPSec
> routing. If the new route is to an IPSec SPI then it is. If not then it is
> not.

You are talking about the first POSTROUTING, Number 3? Then I would
definitely agree.

> (Gateway) -> Gateway -> (client)
>
> 1. PREROUTING, ESP packet
> 2. FORWARD, ESP packet. Destination the IPSec SPI
> 3. POSTROUTING, ESP packet. Destination the IPSec SPI.
> 4. PREROUTING, decapsulated packet. With reference to the IPSec SPI.
> 5. FORWARD, decapsulated packet. With reference to the IPSec SPI.
> 6. POSTROUTING, decapsulated packet. With reference to the IPSec SPI.

This is not symmetric, despite being the exact opposite of the previous
case. I don't like that. If we make it similar to the previous case,
it would be:

1. PREROUTING, ESP packet
2. INPUT, ESP packet
3. PREROUTING, decapsulated packet
4. FORWARD, decapsulated packet
5. POSTROUTING, decapsulated packet.

So here we have the analogy of a locally received (INPUT) packet, as the
opposite of the locally generated (OUTPUT) packet in your previous case.

> Host -> (Gateway)
>
> 1. OUTPUT, original packet
> 2. POSTROUTING, original packet. Destination the IPSec SPI
> 3. OUTPUT, ESP packet. Preferably with some preserved information about
>    the original packet attached.
> 4. POSTROUTING, ESP packet. Preferably with some preserved information
>    about the original packet attached.

this would make it similar to the case #1.

> (Gateway) -> Host
>
> 1. PREROUTING, ESP packet.
> 2. INPUT, ESP packet.
> 3. PREROUTING, Decapsulated packet, with reference to the IPSec SPI
> 4. INPUT, Decapsulated packet, with reference to the IPSec SPI

agreed.

> Ah, we agree! ;-)

yes, at least in the largest part.

> I save myself from digging into exactly where the hooks need to be in the
> kernel as I have not looked in detail at the 2.6 IPSec implementation, but
> I take for granted that several need to be at the corresponding locations
> in the IPSec protocol kernel.

I guess I cannot save myself from doing so. Unfortunately I have very
limited time at this moment, so don't expect it to be done today.

> I also see it likely there need to be some skb extensions to add the
> required SPI tracking information to allow rulesets a better view of
> what is going on (optional).

yes, but let's make that step 2. First we have to agree on the first
step, do a decent implementation and then submit it to netdev.

> Regards
> Henrik

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: NAT before IPsec with 2.6
2004-01-27 13:07 ` Harald Welte
@ 2004-01-27 13:22 ` Henrik Nordstrom
2004-01-27 14:12 ` Henrik Nordstrom
` (2 subsequent siblings)
3 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 13:22 UTC (permalink / raw)
To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> > If the packet is rerouted in POSTROUTING then this should include IPSec
> > routing. If the new route is to an IPSec SPI then it is. If not then it is
> > not.
>
> You are talking about the first POSTROUTING, Number 3? Then I would
> definitely agree.

I am.

> > (Gateway) -> Gateway -> (client)
>
> This is not symmetric, despite being the exact opposite of the previous
> case. I don't like that. If we make it similar to the previous case,
> it would be:
>
> 1. PREROUTING, ESP packet
> 2. INPUT, ESP packet
> 3. PREROUTING, decapsulated packet
> 4. FORWARD, decapsulated packet
> 5. POSTROUTING, decapsulated packet.

Right. My error. Thinks one thing, writes another. Brain obviously not
fully functional today.

> > I also see it likely there need to be some skb extensions to add the
> > required SPI tracking information to allow rulesets a better view of
> > what is going on (optional).
>
> yes, but let's make that step 2. First we have to agree on the first
> step, do a decent implementation and then submit it to netdev.

Agreed.

Regards
Henrik
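The symmetry argument in the messages above can be checked mechanically. The following is a minimal sketch with hypothetical names (`enum chain`, `struct step`, `is_mirror()`): an inbound sequence counts as the mirror of an outbound one if reading the outbound sequence backwards and swapping PREROUTING with POSTROUTING and OUTPUT with INPUT yields the inbound sequence. The tables use OUTPUT for step 4 of the outbound gateway case, as in Henrik's list; which chain belongs at that step was still under discussion at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum chain { CH_PREROUTING, CH_INPUT, CH_FORWARD, CH_OUTPUT, CH_POSTROUTING };
enum form { PLAIN, ESP };	/* which packet the chain sees */

struct step {
	enum chain chain;
	enum form form;
};

/* Swap each chain with its counterpart in the opposite direction. */
static enum chain mirror_chain(enum chain c)
{
	switch (c) {
	case CH_PREROUTING:	return CH_POSTROUTING;
	case CH_POSTROUTING:	return CH_PREROUTING;
	case CH_INPUT:		return CH_OUTPUT;
	case CH_OUTPUT:		return CH_INPUT;
	default:		return CH_FORWARD;
	}
}

/* True if in[] is out[] read backwards with every chain mirrored. */
bool is_mirror(const struct step *out, const struct step *in, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct step *o = &out[n - 1 - i];

		if (mirror_chain(o->chain) != in[i].chain || o->form != in[i].form)
			return false;
	}
	return true;
}

/* (Client) -> Gateway -> (Gateway), as listed in the thread */
const struct step gw_out[] = {
	{ CH_PREROUTING, PLAIN }, { CH_FORWARD, PLAIN }, { CH_POSTROUTING, PLAIN },
	{ CH_OUTPUT, ESP }, { CH_POSTROUTING, ESP },
};

/* (Gateway) -> Gateway -> (client), in the corrected five-step form */
const struct step gw_in[] = {
	{ CH_PREROUTING, ESP }, { CH_INPUT, ESP },
	{ CH_PREROUTING, PLAIN }, { CH_FORWARD, PLAIN }, { CH_POSTROUTING, PLAIN },
};

/* Host -> (Gateway) and (Gateway) -> Host */
const struct step host_out[] = {
	{ CH_OUTPUT, PLAIN }, { CH_POSTROUTING, PLAIN },
	{ CH_OUTPUT, ESP }, { CH_POSTROUTING, ESP },
};
const struct step host_in[] = {
	{ CH_PREROUTING, ESP }, { CH_INPUT, ESP },
	{ CH_PREROUTING, PLAIN }, { CH_INPUT, PLAIN },
};
```

Under this definition both the gateway pair and the host pair are mirrors of each other, whereas an inbound sequence that sends the ESP packet through FORWARD instead of INPUT is not.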
* Re: NAT before IPsec with 2.6
2004-01-27 13:07 ` Harald Welte
2004-01-27 13:22 ` Henrik Nordstrom
@ 2004-01-27 14:12 ` Henrik Nordstrom
2004-01-27 20:51 ` Harald Welte
2004-01-27 23:55 ` Harald Welte
2004-01-28 0:09 ` [PATCH]Re: " Harald Welte
3 siblings, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 14:12 UTC (permalink / raw)
To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> > My view of where hooks should be for IPSec:
> >
> > (Client) -> Gateway -> (Gateway)
> >
> > 1. PREROUTING, original client packet
> > 2. FORWARD, original client packet. Destination the IPSec SPI
> > 3. POSTROUTING, original client packet. Destination the IPSec SPI
> > 4. OUTPUT, ESP packet. Preferably with some preserved information about
> >    the original packet attached.
> > 5. POSTROUTING, ESP packet.
>
> I agree to all but 4. My proposal was PREROUTING, yours OUTPUT.

It is very hard to distinguish the different IPSec cases here. Short of
adding yet another hook for encapsulated and to be decapsulated packets
OUTPUT(/INPUT) is the closest match in my opinion.

The packet it sees is the packet from this IPSec endpoint to the other
IPSec endpoint on the IPSec SPI connection between the two. The packet
inside may be a forwarded packet or locally generated, it's all the same.
The ESP is an exchange between the two endpoints

OUTPUT -> POSTROUTING <-net-> PREROUTING -> INPUT

Use of PREROUTING introduces a completely new semantic unless you want to
also call FORWARD on these packets which I do not think is appropriate.

If introducing new hooks then we would have

XXXX -> POSTROUTING <-net-> PREROUTING -> YYYY

Whether XXXX and YYYY can be the same hook name in this case or not I'll
leave to the core team to ponder.

What exactly is the semantic problem of using OUTPUT/INPUT? In my view
the ESP/AH packet is locally generated, and belongs to a connection with a
local endpoint.

Regards
Henrik
* Re: NAT before IPsec with 2.6
2004-01-27 14:12 ` Henrik Nordstrom
@ 2004-01-27 20:51 ` Harald Welte
2004-01-27 22:35 ` Henrik Nordstrom
2004-01-27 22:41 ` Willy Tarreau
0 siblings, 2 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-27 20:51 UTC (permalink / raw)
To: Henrik Nordstrom
Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 2053 bytes --]

On Tue, Jan 27, 2004 at 03:12:49PM +0100, Henrik Nordstrom wrote:
> On Tue, 27 Jan 2004, Harald Welte wrote:
>
> > > My view of where hooks should be for IPSec:
> > >
> > > (Client) -> Gateway -> (Gateway)
> > >
> > > 1. PREROUTING, original client packet
> > > 2. FORWARD, original client packet. Destination the IPSec SPI
> > > 3. POSTROUTING, original client packet. Destination the IPSec SPI
> > > 4. OUTPUT, ESP packet. Preferably with some preserved information about
> > >    the original packet attached.
> > > 5. POSTROUTING, ESP packet.
> >
> > I agree to all but 4. My proposal was PREROUTING, yours OUTPUT.
>
> It is very hard to distinguish the different IPSec cases here. Short of
> adding yet another hook for encapsulated and to-be-decapsulated packets,
> OUTPUT(/INPUT) is the closest match in my opinion.

Please don't confuse hooks with chains. I suggested to add new
netfilter hooks (NF_IP_{PRE,POST}_XFRM), but attach the old INPUT/OUTPUT
(or even PREROUTING/POSTROUTING) chains to those hooks.

Hm. You have a point, but I'm still not really sure. I'll do it the
way you proposed, but I might change my mind again in the future ;)

> If introducing new hooks then we would have
>
>    XXXX -> POSTROUTING <-net-> PREROUTING -> YYYY
>
> Whether XXXX and YYYY can be the same hook name in this case or not I'll
> leave to the core team to ponder upon.
>
> What exactly is the semantic problem of using OUTPUT/INPUT? In my view
> the ESP/AH packet is locally generated, and belongs to a connection with a
> local endpoint.

well, you convinced me (for now). new hooks, old chains.

> Regards
> Henrik

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 20:51 ` Harald Welte
@ 2004-01-27 22:35 ` Henrik Nordstrom
2004-01-28 13:48 ` Harald Welte
2004-01-27 22:41 ` Willy Tarreau
1 sibling, 1 reply; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 22:35 UTC (permalink / raw)
To: Harald Welte; +Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Harald Welte wrote:

> Please don't confuse hooks with chains. I suggested to add new
> netfilter hooks (NF_IP_{PRE,POST}_XFRM), but attach the old INPUT/OUTPUT
> (or even PREROUTING/POSTROUTING) chains to those hooks.

Sorry for the naming mismatch. With the current close relation between
hooks and chains it is easy to confuse the two when one is not installing
hooks every day. All my thoughts were about netfilter hooks, not iptables
chains.

Using the existing NF_IP_LOCAL_IN/OUT hooks for the IPSec wrapped packets
is an exact match for what is required for conntrack. To answer my own
question on what is wrong with INPUT/OUTPUT: there is nothing wrong, but
NAT and mangle may benefit from using a couple of new hooks for the
"original" packets just at the border to IPSec due to the different
rerouting requirements. NF_IP_LOCAL_IN/OUT is however an exact match for
all existing netfilter module requirements on the encapsulated packets
from what I can tell.

If there is a desire to use new hooks for the encapsulated packets then
this should be done for the whole path of those packets, which obviously
becomes somewhat tricky as it is not known yet at the NF_IP_PRE_ROUTING
time of incoming packets that the encapsulated packet will go to IPSec and
a bastard on final output of transmitted packets.

My slightly revised proposal is now (this time using hook names to avoid
confusion)

(Client) -> Gateway -> (Gateway)

1. NF_IP_PRE_ROUTING
2. NF_IP_FORWARD
3. NF_IP_XXXXXXXX, transparently mapped to the POSTROUTING chain in
   iptables.
[IPSec encapsulation]
4. NF_IP_LOCAL_OUT
5. NF_IP_POST_ROUTING

(Gateway) -> Gateway -> (client)

1. NF_IP_PRE_ROUTING
2. NF_IP_LOCAL_IN
[IPSec decapsulation]
3. NF_IP_YYYYYYYY, transparently mapped to the PREROUTING chain in
   iptables
4. NF_IP_FORWARD
5. NF_IP_POST_ROUTING

And similarly for the Host -> Gateway/Host case (locally generated
packets).

But I'd say the use of new hooks should be avoided unless seen needed when
implementing rerouting. Similar "rerouting" needs to be done in the normal
hook to allow netfilter packet modifications to change the routing/policy
decision to cause a previously clear text packet to be sent over IPSec as
is needed on the same changes to packets decided to go to an IPSec SPI.

My problem with adding new hooks is that I don't see a value in adding new
types of hooks if all uses of this hook will be mapped to the exact same
code paths.

What is pretty clear is that the "rerouting" requirements are far from
obvious and need to be analyzed carefully. If one thought rerouting of
packets in the plain IP with ip_route_me_harder was a mess the situation
is now some orders of magnitude more complex it seems, at least from a
quick evaluation without digging into the actual 2.6 IPSec code.

Regards
Henrik

^ permalink raw reply [flat|nested] 63+ messages in thread
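The two traversal orders in the revised proposal can be written down as
data and checked for the symmetry that was asked for earlier in the
thread. What follows is a minimal user-space sketch in C, not kernel
code. HOOK_PRE_XFRM and HOOK_POST_XFRM are assumed names standing in for
the NF_IP_XXXXXXXX and NF_IP_YYYYYYYY placeholders (loosely following the
NF_IP_{PRE,POST}_XFRM suggestion); the enum, the tables and the helper
functions are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the hooks discussed in the thread. */
enum hook {
	HOOK_PRE_ROUTING,
	HOOK_LOCAL_IN,
	HOOK_FORWARD,
	HOOK_LOCAL_OUT,
	HOOK_POST_ROUTING,
	HOOK_PRE_XFRM,   /* assumed name for NF_IP_XXXXXXXX (before encapsulation) */
	HOOK_POST_XFRM   /* assumed name for NF_IP_YYYYYYYY (after decapsulation) */
};

/* iptables chain whose rules would be evaluated at each hook. */
static const char *chain_for_hook(enum hook h)
{
	switch (h) {
	case HOOK_PRE_ROUTING:  return "PREROUTING";
	case HOOK_LOCAL_IN:     return "INPUT";
	case HOOK_FORWARD:      return "FORWARD";
	case HOOK_LOCAL_OUT:    return "OUTPUT";
	case HOOK_POST_ROUTING: return "POSTROUTING";
	case HOOK_PRE_XFRM:     return "POSTROUTING"; /* transparently mapped */
	case HOOK_POST_XFRM:    return "PREROUTING";  /* transparently mapped */
	}
	return NULL;
}

#define PATH_LEN 5

/* (Client) -> Gateway -> (Gateway): encapsulation after the third hook. */
static const enum hook encap_path[PATH_LEN] = {
	HOOK_PRE_ROUTING, HOOK_FORWARD, HOOK_PRE_XFRM,
	HOOK_LOCAL_OUT, HOOK_POST_ROUTING
};

/* (Gateway) -> Gateway -> (client): decapsulation after the second hook. */
static const enum hook decap_path[PATH_LEN] = {
	HOOK_PRE_ROUTING, HOOK_LOCAL_IN, HOOK_POST_XFRM,
	HOOK_FORWARD, HOOK_POST_ROUTING
};

/* Counterpart of a hook when the direction of travel is reversed. */
static enum hook mirror(enum hook h)
{
	switch (h) {
	case HOOK_PRE_ROUTING:  return HOOK_POST_ROUTING;
	case HOOK_POST_ROUTING: return HOOK_PRE_ROUTING;
	case HOOK_LOCAL_IN:     return HOOK_LOCAL_OUT;
	case HOOK_LOCAL_OUT:    return HOOK_LOCAL_IN;
	case HOOK_PRE_XFRM:     return HOOK_POST_XFRM;
	case HOOK_POST_XFRM:    return HOOK_PRE_XFRM;
	case HOOK_FORWARD:      return HOOK_FORWARD;
	}
	return h;
}

/* Returns 1 if the decapsulation path is the mirrored, reversed
 * encapsulation path, 0 otherwise. */
static int paths_are_symmetric(void)
{
	size_t i;

	for (i = 0; i < PATH_LEN; i++)
		if (decap_path[i] != mirror(encap_path[PATH_LEN - 1 - i]))
			return 0;
	return 1;
}
```

In this model the third step of each path resolves to an existing chain
name through chain_for_hook(), which is the "new hooks, old chains" idea:
the rule set sees POSTROUTING or PREROUTING, while the kernel side would
see a distinct hook number.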
* Re: NAT before IPsec with 2.6
2004-01-27 22:35 ` Henrik Nordstrom
@ 2004-01-28 13:48 ` Harald Welte
0 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:48 UTC (permalink / raw)
To: Henrik Nordstrom
Cc: Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 2066 bytes --]

On Tue, Jan 27, 2004 at 11:35:48PM +0100, Henrik Nordstrom wrote:

> Sorry for the naming mismatch. With the current close relation between
> hooks and chains it is easy to confuse the two when one is not installing
> hooks every day. All my thoughts were about netfilter hooks, not iptables
> chains.
>
> Using the existing NF_IP_LOCAL_IN/OUT hooks for the IPSec wrapped packets
> is an exact match for what is required for conntrack. To answer my own
> question on what is wrong with INPUT/OUTPUT: there is nothing wrong, but
> NAT and mangle may benefit from using a couple of new hooks for the
> "original" packets just at the border to IPSec due to the different
> rerouting requirements. NF_IP_LOCAL_IN/OUT is however an exact match for
> all existing netfilter module requirements on the encapsulated packets
> from what I can tell.

Well, I now had to find out that libiptc depends on a 'one hook per
chain' assumption, too bad. My idea was to attach the same chain to
multiple hooks. That is not possible, as of now.

As you can see in my patch, I gave up on all of that and just use the
existing hooks.

> What is pretty clear is that the "rerouting" requirements are far from
> obvious and need to be analyzed carefully. If one thought rerouting of
> packets in the plain IP with ip_route_me_harder was a mess the situation
> is now some orders of magnitude more complex it seems, at least from a
> quick evaluation without digging into the actual 2.6 IPSec code.

yes, indeed. Unfortunately, analyzing this whole issue has already cost
me too much of my time the last couple of days :( I'll have to defer it
for now.

> Regards
> Henrik

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 20:51 ` Harald Welte
2004-01-27 22:35 ` Henrik Nordstrom
@ 2004-01-27 22:41 ` Willy Tarreau
1 sibling, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-27 22:41 UTC (permalink / raw)
To: Harald Welte, Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

Hi guys,

On Tue, Jan 27, 2004 at 09:51:23PM +0100, Harald Welte wrote:
<...>
> Hm. You have a point, but I'm still not really sure. I'll do it the
> way you proposed, but I might change my mind again in the future ;)

Cool, I'm very glad that at least several of us think about it the same
way. I didn't respond because my proposal would have been identical to
Henrik's. So please count my vote :-)

Thanks,
Willy

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 13:07 ` Harald Welte
2004-01-27 13:22 ` Henrik Nordstrom
2004-01-27 14:12 ` Henrik Nordstrom
@ 2004-01-27 23:55 ` Harald Welte
2004-01-28 0:14 ` Willy Tarreau
2004-01-28 0:09 ` [PATCH]Re: " Harald Welte
3 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-27 23:55 UTC (permalink / raw)
To: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 6649 bytes --]

Hi!

I've now hacked a preliminary patch. Be warned, I didn't even test if
it compiles. Maybe someone who actually has a running 2.6.x ipsec setup
can give it a try.

What it should be doing
- traverse the additional chains as described in the last email
- have connection tracking recognize two separate connections, one
  for the decapsulated traffic, one for the encapsulated
- thus, even SNAT/DNAT should be working.
- locally-encapsulated traffic shows up with input device "ah4" or
  "esp4" in POSTROUTING.
- locally-encapsulated traffic shows up with output device "ah4" or
  "esp4" in OUTPUT.

What is missing (TODO):
- no dummy device names in INPUT/PREROUTING for locally-decapsulated
  packets. This is somewhat harder
- no real output device shown in OUTPUT for locally-encapsulated
  packets. I'm not sure if it is legal to typecast the just-popped
  dst_entry to 'struct rtable' and derive the output interface from
  there.

Please give feedback.

P.S.: Yes, the #ifdef's are ugly. I know.

diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c	2003-12-18 11:39:26.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c	2004-01-28 00:40:38.923531384 +0100
@@ -9,6 +9,12 @@
 #include <net/icmp.h>
 #include <asm/scatterlist.h>
 
+static struct net_device *ah_dummydev;
+
+static int ah_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
 
 /* Clear mutable options and find final destination to substitute
  * into IP header for icv calculation. Options are already checked
@@ -54,7 +60,7 @@
 	return 0;
 }
 
-static int ah_output(struct sk_buff *skb)
+static int ah_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -145,7 +151,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->conntrack);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, ah_dummydev, NULL,
+		       ah_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -154,6 +170,12 @@
 	return err;
 }
 
+static ah_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, IP_NF_POST_ROUTING, skb, skb->dev, ah_dummydev,
+		       ah_output_finish);
+}
+
 int ah_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb)
 {
 	int ah_hlen;
@@ -354,11 +376,18 @@
 		xfrm_unregister_type(&ah_type, AF_INET);
 		return -EAGAIN;
 	}
+	ah_dummydev = alloc_netdev(0, "ah4", NULL);
+	if (!ah_dummydev) {
+		printk(KERN_INFO "ip ah init: can't allocate dummydev\n");
+		inet_del_protocol(&ah4_protocol, IPPROTO_AH);
+		xfrm_unregister_type(&ah_type, AF_INET);
+	}
 	return 0;
 }
 
 static void __exit ah4_fini(void)
 {
+	free_netdev(ah_dummydev);
 	if (inet_del_protocol(&ah4_protocol, IPPROTO_AH) < 0)
 		printk(KERN_INFO "ip ah close: can't remove protocol\n");
 	if (xfrm_unregister_type(&ah_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c	2003-12-18 11:38:22.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c	2004-01-28 00:33:58.002480648 +0100
@@ -20,7 +20,16 @@
 	__u8 proto;
 };
 
-int esp_output(struct sk_buff *skb)
+static struct net_device *esp_dummydev;
+
+/* dummy function for netfilter. esp_output_finish() has to return
+ * that particular value upon success */
+static int esp_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
+static int esp_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -199,7 +208,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->conntrack);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+	/* FIXME: return real destination device */
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, esp_dummydev, NULL,
+		       esp_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -208,6 +227,12 @@
 	return err;
 }
 
+int esp_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, esp_dummydev,
+		       esp_output_finish);
+}
+
 /*
  * Note: detecting truncated vs. non-truncated authentication data is very
  * expensive, so we only support truncated data, which is the recommended
@@ -596,11 +621,19 @@
 		xfrm_unregister_type(&esp_type, AF_INET);
 		return -EAGAIN;
 	}
+	esp_dummydev = alloc_netdev(0, "esp4", NULL);
+	if (!esp_dummydev) {
+		printk(KERN_INFO "ip esp init: can't allocate dummydev\n");
+		inet_del_protocol(&esp4_protocol, IPPROTO_ESP);
+		xfrm_unregister_type(&esp_type, AF_INET);
+		return -EAGAIN;
+	}
 	return 0;
 }
 
 static void __exit esp4_fini(void)
 {
+	free_netdev(esp_dummydev);
 	if (inet_del_protocol(&esp4_protocol, IPPROTO_ESP) < 0)
 		printk(KERN_INFO "ip esp close: can't remove protocol\n");
 	if (xfrm_unregister_type(&esp_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c	2004-01-28 00:50:36.204730840 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c	2004-01-28 00:51:39.557099816 +0100
@@ -6,6 +6,8 @@
  *		Split up af-specific portion
  *	Derek Atkins <derek@ihtfp.com>
  *		Add Encapsulation support
+ *	Harald Welte <laforge@netfilter.org>
+ *		Drop conntrack reference before calling netif_rx()
  *
  */
 
@@ -130,6 +132,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->conntrack);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 23:55 ` Harald Welte
@ 2004-01-28 0:14 ` Willy Tarreau
0 siblings, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-28 0:14 UTC (permalink / raw)
To: Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel

On Wed, Jan 28, 2004 at 12:55:25AM +0100, Harald Welte wrote:
> Hi!
>
> I've now hacked a preliminary patch. Be warned, I didn't even test if
> it compiles. Maybe someone who actually has a running 2.6.x ipsec setup
> can give it a try.

I have somewhere a working 2.6 ipsec backport to 2.4 thanks to Davem and
Herbert Xu. I'll see how it applies on it and if it works too, because this
is nearly the only thing keeping me from definitively switching over to
this new stack. But don't expect any feedback before this week-end.

Thanks,
willy

^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH]Re: NAT before IPsec with 2.6
2004-01-27 13:07 ` Harald Welte
` (2 preceding siblings ...)
2004-01-27 23:55 ` Harald Welte
@ 2004-01-28 0:09 ` Harald Welte
2004-01-28 8:49 ` Patrick McHardy
2004-01-28 10:30 ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
3 siblings, 2 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 0:09 UTC (permalink / raw)
To: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 7006 bytes --]

Hi!

[please ignore the last message, I mistakenly sent an outdated, incomplete
 patch]

I've now hacked a preliminary patch. Be warned, I didn't even test if
it compiles. Maybe someone who actually has a running 2.6.x ipsec setup
can give it a try.

What it should be doing
- traverse the additional chains as described in the last email
- have connection tracking recognize two separate connections, one
  for the decapsulated traffic, one for the encapsulated
- thus, even SNAT/DNAT should be working.
- locally-encapsulated traffic shows up with input device "ah4" or
  "esp4" in POSTROUTING.
- locally-encapsulated traffic shows up with output device "ah4" or
  "esp4" in OUTPUT.

What is missing (TODO):
- no dummy device names in INPUT/PREROUTING for locally-decapsulated
  packets. This is somewhat harder
- no real output device shown in OUTPUT for locally-encapsulated
  packets. I'm not sure if it is legal to typecast the just-popped
  dst_entry to 'struct rtable' and derive the output interface from
  there.

Please give feedback.

P.S.: Yes, the #ifdef's are ugly. I know.

diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/ah4.c	2003-12-18 11:39:26.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/ah4.c	2004-01-28 01:08:59.430015104 +0100
@@ -7,8 +7,15 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <net/icmp.h>
+#include <linux/netfilter_ipv4.h>
 #include <asm/scatterlist.h>
 
+static struct net_device *ah_dummydev;
+
+static int ah_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
 
 /* Clear mutable options and find final destination to substitute
  * into IP header for icv calculation. Options are already checked
@@ -54,7 +61,7 @@
 	return 0;
 }
 
-static int ah_output(struct sk_buff *skb)
+static int ah_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -145,7 +152,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, ah_dummydev, NULL,
+		       ah_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -154,6 +171,12 @@
 	return err;
 }
 
+static int ah_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
+		       ah_output_finish);
+}
+
 int ah_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb)
 {
 	int ah_hlen;
@@ -354,11 +377,19 @@
 		xfrm_unregister_type(&ah_type, AF_INET);
 		return -EAGAIN;
 	}
+	ah_dummydev = alloc_netdev(0, "ah4", NULL);
+	if (!ah_dummydev) {
+		printk(KERN_INFO "ip ah init: can't allocate dummydev\n");
+		inet_del_protocol(&ah4_protocol, IPPROTO_AH);
+		xfrm_unregister_type(&ah_type, AF_INET);
+		return -EAGAIN;
+	}
 	return 0;
 }
 
 static void __exit ah4_fini(void)
 {
+	free_netdev(ah_dummydev);
 	if (inet_del_protocol(&ah4_protocol, IPPROTO_AH) < 0)
 		printk(KERN_INFO "ip ah close: can't remove protocol\n");
 	if (xfrm_unregister_type(&ah_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/esp4.c	2003-12-18 11:38:22.000000000 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/esp4.c	2004-01-28 01:09:19.610947136 +0100
@@ -8,6 +8,7 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <net/udp.h>
 
@@ -20,7 +21,16 @@
 	__u8 proto;
 };
 
-int esp_output(struct sk_buff *skb)
+static struct net_device *esp_dummydev;
+
+/* dummy function for netfilter. esp_output_finish() has to return
+ * that particular value upon success */
+static int esp_xmit_bypass(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
+static int esp_output_finish(struct sk_buff *skb)
 {
 	int err;
 	struct dst_entry *dst = skb->dst;
@@ -199,7 +209,17 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
-	return NET_XMIT_BYPASS;
+
+#ifdef CONFIG_NETFILTER
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+#endif
+	/* FIXME: return real destination device */
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, esp_dummydev, NULL,
+		       esp_xmit_bypass);
 
 error:
 	spin_unlock_bh(&x->lock);
@@ -208,6 +228,12 @@
 	return err;
 }
 
+int esp_output(struct sk_buff *skb)
+{
+	return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, esp_dummydev,
+		       esp_output_finish);
+}
+
 /*
  * Note: detecting truncated vs. non-truncated authentication data is very
  * expensive, so we only support truncated data, which is the recommended
@@ -596,11 +622,19 @@
 		xfrm_unregister_type(&esp_type, AF_INET);
 		return -EAGAIN;
 	}
+	esp_dummydev = alloc_netdev(0, "esp4", NULL);
+	if (!esp_dummydev) {
+		printk(KERN_INFO "ip esp init: can't allocate dummydev\n");
+		inet_del_protocol(&esp4_protocol, IPPROTO_ESP);
+		xfrm_unregister_type(&esp_type, AF_INET);
+		return -EAGAIN;
+	}
 	return 0;
 }
 
 static void __exit esp4_fini(void)
 {
+	free_netdev(esp_dummydev);
 	if (inet_del_protocol(&esp4_protocol, IPPROTO_ESP) < 0)
 		printk(KERN_INFO "ip esp close: can't remove protocol\n");
 	if (xfrm_unregister_type(&esp_type, AF_INET) < 0)
diff -Nru linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c
--- linuxppc25bh-031218-orinoco_monitor-gpl/net/ipv4/xfrm4_input.c	2004-01-28 00:50:36.204730840 +0100
+++ linuxppc25bh-031218-orinoco_monitor-gpl-netfipsec/net/ipv4/xfrm4_input.c	2004-01-28 00:51:39.557099816 +0100
@@ -6,6 +6,8 @@
  *		Split up af-specific portion
  *	Derek Atkins <derek@ihtfp.com>
  *		Add Encapsulation support
+ *	Harald Welte <laforge@netfilter.org>
+ *		Drop conntrack reference before calling netif_rx()
  *
  */
 
@@ -130,6 +132,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 0:09 ` [PATCH]Re: " Harald Welte
@ 2004-01-28 8:49 ` Patrick McHardy
2004-01-28 9:37 ` Patrick McHardy
2004-01-28 10:30 ` Harald Welte
2004-01-28 10:30 ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
1 sibling, 2 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28 8:49 UTC (permalink / raw)
To: Harald Welte
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

Harald Welte wrote:

>Hi!
>
>[please ignore the last message, I mistakenly sent an outdated, incomplete
> patch]
>
>I've now hacked a preliminary patch. Be warned, I didn't even test if
>it compiles. Maybe someone who actually has a running 2.6.x ipsec setup
>can give it a try.
>
>What it should be doing
>- traverse the additional chains as described in the last email
>- have connection tracking recognize two separate connections, one
>  for the decapsulated traffic, one for the encapsulated
>- thus, even SNAT/DNAT should be working.
>- locally-encapsulated traffic shows up with input device "ah4" or
>  "esp4" in POSTROUTING.
>- locally-encapsulated traffic shows up with output device "ah4" or
>  "esp4" in OUTPUT.
>
>What is missing (TODO):
>- no dummy device names in INPUT/PREROUTING for locally-decapsulated
>  packets. This is somewhat harder
>- no real output device shown in OUTPUT for locally-encapsulated
>  packets. I'm not sure if it is legal to typecast the just-popped
>  dst_entry to 'struct rtable' and derive the output interface from
>  there.
>
>Please give feedback.
>

I see two problems with this approach. The dummy devices don't have
any ip config, so e.g. REDIRECT will fail. The bigger problem is
hooking in output routines that return NET_XMIT_BYPASS. dst_output
loops until the return code of skb->dst->output != NET_XMIT_BYPASS.
These output routines replace skb->dst when finished by calling dst_pop.
If we pass the packet through netfilter in between, the dst_entry
might get replaced in ip_route_me_harder or elsewhere and not all
transformations will be applied. If NAT is used,
ip_route_{input,output} might even return a different policy bundle.

Anyway, I'll test it (slightly hacked) as soon as the compile
finishes ;)

Regards,
Patrick

^ permalink raw reply [flat|nested] 63+ messages in thread
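The interaction described above can be reproduced in a small user-space
model. This is a minimal sketch under stated assumptions, not the 2.6
kernel code: struct dst, struct pkt, hook(), xfrm_output() and
model_dst_output() are hypothetical stand-ins for dst_entry, sk_buff, a
netfilter hook that reroutes, the ESP/AH output routines and dst_output().
The value chosen for XMIT_BYPASS is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define XMIT_OK      0
#define XMIT_BYPASS  4  /* stand-in for NET_XMIT_BYPASS; value is arbitrary */

struct pkt;

/* Simplified dst_entry: a stack of output steps linked through 'child'. */
struct dst {
	struct dst *child;
	int (*output)(struct pkt *p);
};

/* Simplified sk_buff. */
struct pkt {
	struct dst *dst;
	int transforms;          /* number of xfrm steps actually applied */
	struct dst *reroute_to;  /* non-NULL: the hook will replace p->dst */
};

/* Hypothetical netfilter hook: may reroute, as ip_route_me_harder would. */
static void hook(struct pkt *p)
{
	if (p->reroute_to) {
		p->dst = p->reroute_to;
		p->reroute_to = NULL;
	}
}

/* Models esp_output()/ah_output() with a hook call added: transform,
 * pop the dst stack, let netfilter see the packet, ask the caller to
 * continue with the next dst. */
static int xfrm_output(struct pkt *p)
{
	p->transforms++;
	p->dst = p->dst->child;  /* dst_pop() */
	hook(p);
	return XMIT_BYPASS;
}

/* Final step: hand the packet to the device. */
static int dev_output(struct pkt *p)
{
	(void)p;
	return XMIT_OK;
}

/* Models the dst_output() loop: keep calling output while it says BYPASS. */
static int model_dst_output(struct pkt *p)
{
	int err;

	for (;;) {
		err = p->dst->output(p);
		if (err != XMIT_BYPASS)
			return err;
	}
}

/* Sends one packet through an ESP -> AH -> device bundle and returns the
 * number of transformations applied. If 'reroute' is non-zero, the hook
 * replaces the dst after the first transformation. */
static int send_packet(int reroute)
{
	struct dst dev   = { NULL, dev_output };
	struct dst ah    = { &dev, xfrm_output };
	struct dst esp   = { &ah,  xfrm_output };
	struct dst plain = { NULL, dev_output };  /* result of the re-lookup */
	struct pkt p     = { &esp, 0, reroute ? &plain : NULL };

	if (model_dst_output(&p) != XMIT_OK)
		return -1;
	return p.transforms;
}
```

In this model send_packet(0) applies both transformations, while
send_packet(1) applies only the first one: the AH step is skipped because
the rerouting hook replaced the remainder of the stack with a plain route.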
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 8:49 ` Patrick McHardy
@ 2004-01-28 9:37 ` Patrick McHardy
2004-01-28 10:30 ` Harald Welte
1 sibling, 0 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28 9:37 UTC (permalink / raw)
To: Harald Welte
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

Patrick McHardy wrote:

>I see two problems with this approach. The dummy devices don't have
>any ip config, so f.e. REDIRECT will fail. The bigger problem is
>hooking in output routines that return NET_XMIT_BYPASS. dst_output
>loops until the return code of skb->dst->output != NET_XMIT_BYPASS.
>These output routines replace skb->dst when finished by calling dst_pop.
>If we pass the packet through netfilter in between, the dst_entry
>might get replaced in ip_route_me_harder or elsewhere and not all
>transformations will be applied. If NAT is used,
>ip_route_{input,output} might even return a different policy bundle.
>

Bad news, I've tested the hook part (using skb->dev instead of ah4/esp4
devices), this is how the traces look:

IN= OUT=eth0 SRC=172.20.0.3 DST=192.168.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=28443 SEQ=1
IN= OUT= SRC=172.20.0.3 DST=192.168.0.1 LEN=152 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ESP SPI=0xb9f7900
IN= OUT= SRC=172.20.0.3 DST=192.168.0.1 LEN=176 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=AH SPI=0xc46e27

After applying a MARK rule to the first packet (causing rerouting):

IN= OUT=eth0 SRC=172.20.0.3 DST=192.168.0.1 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=ICMP TYPE=8 CODE=0 ID=51484 SEQ=1

Sniffing in between shows the unencrypted packet.

>Regards,
>Patrick

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 8:49 ` Patrick McHardy
2004-01-28 9:37 ` Patrick McHardy
@ 2004-01-28 10:30 ` Harald Welte
2004-01-28 11:24 ` Willy Tarreau
` (3 more replies)
1 sibling, 4 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 10:30 UTC (permalink / raw)
To: Patrick McHardy
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 2578 bytes --]

On Wed, Jan 28, 2004 at 09:49:56AM +0100, Patrick McHardy wrote:

> I see two problems with this approach. The dummy devices don't have
> any ip config, so e.g. REDIRECT will fail.

Ok, let's ignore that for now. dunno if the dummydev idea was so good
at all.

> The bigger problem is hooking in output routines that return
> NET_XMIT_BYPASS. dst_output loops until the return code of
> skb->dst->output != NET_XMIT_BYPASS. These output routines replace
> skb->dst when finished by calling dst_pop.
>
> If we pass the packet through netfilter in between, the dst_entry
> might get replaced in ip_route_me_harder or elsewhere and not all
> transformations will be applied.

I see. So we would have to modify all code that changes skb->dst to
check if there is a dst stack. If yes, it would have to iterate over
all of them and just change the last one. Shouldn't be too hard to
integrate that change.

However, it is hard to tell what is the correct behaviour in that case.
In POST_ROUTING of the original, unencapsulated packet, I would say it
is correct to not apply those transformations, in case rerouting of the
packet occurs. If somebody reroutes here, he wants to affect the
original packet, not the encapsulated one.

However, once we did do the first encapsulation, it doesn't make sense
to encapsulate further layers until we've reached the real dst. At this
point, LOCAL_OUT would be called with the heading-for-the-wire packet
and rerouting will and should affect the final device.

However, exposing all those intermediate steps to the
netfilter-hook-attached code is not so bad. It enables people to do
whatever they want... they just need to be careful with their rules. If
there's too much interference with normal OUTPUT/POSTROUTING rules, we
could still go for new hooks+new chains.

> If NAT is used, ip_route_{input,output} might even return a different
> policy bundle.

The question is, again: What is the desired behaviour? Should the
policy be determined on the un-NAT'ed packet or on the NAT'ed one?

> Anyway, I'll test it (slightly hacked) as soon as the compile
> finishes ;)

good luck :)

> Regards,
> Patrick

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 63+ messages in thread
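The "iterate over the dst stack and just change the last one" idea can be
sketched in user space as follows. This is only an illustration under
simplifying assumptions: struct dst, reroute_keep_bundle() and the entry
names are hypothetical, and the sketch ignores reference counting and the
fact that real bundles may be cached and shared between packets, so it
says nothing about whether in-place modification would be safe in the
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified dst_entry: transformations are stacked through 'child',
 * the last entry is the real route to the output device. */
struct dst {
	struct dst *child;
	const char *name;
};

/* Replace only the final entry of the stack that *headp points to.
 * With a plain route (no stack) this replaces *headp itself, which is
 * what rerouting does today. */
static void reroute_keep_bundle(struct dst **headp, struct dst *new_route)
{
	struct dst **pp = headp;

	while ((*pp)->child)
		pp = &(*pp)->child;
	*pp = new_route;
}

/* Number of entries in the stack, including the final route. */
static int stack_depth(const struct dst *d)
{
	int n = 0;

	for (; d; d = d->child)
		n++;
	return n;
}

/* Name of the final entry in the stack. */
static const char *stack_tail(const struct dst *d)
{
	while (d->child)
		d = d->child;
	return d->name;
}

/* Builds esp -> ah -> eth0, reroutes to eth1 and reports whether both
 * transformations survived and the tail changed. Returns 1 on success. */
static int demo_bundle(void)
{
	struct dst eth0 = { NULL, "eth0" };
	struct dst eth1 = { NULL, "eth1" };
	struct dst ah   = { &eth0, "ah" };
	struct dst esp  = { &ah, "esp" };
	struct dst *head = &esp;

	reroute_keep_bundle(&head, &eth1);
	return head == &esp &&
	       stack_depth(head) == 3 &&
	       strcmp(stack_tail(head), "eth1") == 0;
}

/* Same operation on a packet without any transformations. */
static int demo_plain(void)
{
	struct dst eth0 = { NULL, "eth0" };
	struct dst eth1 = { NULL, "eth1" };
	struct dst *head = &eth0;

	reroute_keep_bundle(&head, &eth1);
	return head == &eth1 && stack_depth(head) == 1;
}
```

Whether the policy should instead be looked up again after rerouting, as
raised in the message above, is a separate question that this sketch does
not address.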
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 10:30 ` Harald Welte
@ 2004-01-28 11:24 ` Willy Tarreau
2004-01-28 13:39 ` Harald Welte
2004-01-28 15:58 ` Tom Eastep
2004-01-28 13:22 ` Patrick McHardy
` (2 subsequent siblings)
3 siblings, 2 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-01-28 11:24 UTC (permalink / raw)
To: Harald Welte, Patrick McHardy, Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel

On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:

> > If NAT is used, ip_route_{input,output} might even return a different
> > policy bundle.
>
> The question is, again: What is the desired behaviour? Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

Harald, I believe it's important to remember that NAT is not the only usage
of this extension. For many people seeking security (=those who install VPN
gateways for customers), it is very important to:
- be able to filter what enters and leaves a tunnel
- be able to filter where the encapsulated packets go to/come from

NAT is complementary IMHO.

Regards,
Willy

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 11:24 ` Willy Tarreau
@ 2004-01-28 13:39 ` Harald Welte
2004-01-28 15:58 ` Tom Eastep
1 sibling, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:39 UTC
To: Willy Tarreau
Cc: Patrick McHardy, Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]
On Wed, Jan 28, 2004 at 12:24:19PM +0100, Willy Tarreau wrote:
> On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:
> > > If NAT is used, ip_route_{input,output} might even return a different
> > > policy bundle.
> >
> > The question is, again: What is the desired behaviour? Should the
> > policy be determined on the un-NAT'ed packet or on the NAT'ed one?
>
> Harald, I believe it's important to remember that NAT is not the only usage
> of this extension. For many people seeking security (=those who install VPN
> gateways for customers), it is very important to :
> - be able to filter what enters and leaves a tunnel
> - be able to filter where the encapsulated packets go to/come from
>
> NAT is complementary IMHO.

Yes, I totally agree. My major point was that conntrack is broken, and
thus stateful packet filtering is not possible. The respective iptables
chains are missing. Both of them are also needed for the additional step
of making NAT work.

> Regards,
> Willy

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 11:24 ` Willy Tarreau
2004-01-28 13:39 ` Harald Welte
@ 2004-01-28 15:58 ` Tom Eastep
1 sibling, 0 replies; 63+ messages in thread
From: Tom Eastep @ 2004-01-28 15:58 UTC
To: Willy Tarreau, Harald Welte, Patrick McHardy, Henrik Nordstrom, Michal Ludvig, netfilter-devel
On Wednesday 28 January 2004 03:24 am, Willy Tarreau wrote:
> On Wed, Jan 28, 2004 at 11:30:00AM +0100, Harald Welte wrote:
> > > If NAT is used, ip_route_{input,output} might even return a different
> > > policy bundle.
> >
> > The question is, again: What is the desired behaviour? Should the
> > policy be determined on the un-NAT'ed packet or on the NAT'ed one?
>
> Harald, I believe it's important to remember that NAT is not the only usage
> of this extension. For many people seeking security (=those who install VPN
> gateways for customers), it is very important to :
> - be able to filter what enters and leaves a tunnel
> - be able to filter where the encapsulated packets go to/come from
>

I agree. And the 2.4 implementation of IPSEC allowed this control to be
exercised in a natural and uniform way (the same model applied for all
tunnel types including IPSEC).

- Unencrypted traffic is exchanged between the client and a special device.
- Encrypted traffic is exchanged between the gateways.

These are separately tracked connections that obey the normal netfilter
rules. For configured tunnels, I still believe that this is the proper
model. Opportunistic keying OTOH probably shouldn't follow that model.

> NAT is complementary IMHO.

But there are instances where it is necessary.

-Tom
--
Tom Eastep \ Nothing is foolproof to a sufficiently talented fool
Shoreline, \ http://shorewall.net
Washington USA \ teastep@shorewall.net
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 10:30 ` Harald Welte
2004-01-28 11:24 ` Willy Tarreau
@ 2004-01-28 13:22 ` Patrick McHardy
2004-01-28 14:23 ` Henrik Nordstrom
2004-02-01 14:52 ` Patrick McHardy
3 siblings, 0 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-01-28 13:22 UTC
To: Harald Welte
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
Harald Welte wrote:
> On Wed, Jan 28, 2004 at 09:49:56AM +0100, Patrick McHardy wrote:
>
>>I see two problems with this approach. The dummy devices don't have
>>any ip config, so e.g. REDIRECT will fail.
>
> Ok, let's ignore that for now. dunno if the dummydev idea was so good
> at all. I believe we pass the real device and provide a new match.

The real device might be interesting for filtering, too. But as you
say, let's ignore it for now.

>>The bigger problem is hooking in output routines that return
>>NET_XMIT_BYPASS. dst_output loops until the return code of
>>skb->dst->output != NET_XMIT_BYPASS. These output routines replace
>>skb->dst when finished by calling dst_pop.
>>
>>If we pass the packet through netfilter in between, the dst_entry
>>might get replaced in ip_route_me_harder or elsewhere and not all
>>transformations will be applied.
>
> I see. So we would have to modify all code that changes skb->dst to
> check if there is a dst stack. If yes, it would have to iterate over
> all of them and just change the last one. Shouldn't be too hard to
> integrate that change.
> However, it is hard to tell what is the correct behaviour in that case.
>
> In POST_ROUTING of the original, unencapsulated packet, I would say it
> is correct to not apply those transformations, in case rerouting of the
> packet occurs. If somebody reroutes here, he wants to affect the
> original packet, not the encapsulated one.

Agreed. The problem is that outfn is already set to {ah,esp}_output when
POST_ROUTING of the original packet is called; we need to handle this
somehow. We should also handle the opposite case: after a (normal) packet
is SNATed, ip_route_me_harder will replace the dst_entry. The new one
might include ipsec transformations if a policy for the now-different
packet exists, but the packet is always passed to ip_finish_output2. If
we called dst_output instead, strange hook orders could occur:

PRE_ROUTING
FORWARD
POST_ROUTING <SNAT, encapsulate>
POST_ROUTING
LOCAL_OUT <SNAT again ;)>
..

I can't imagine a good solution right now.

> However, once we did do the first encapsulation, it doesn't make sense
> to encapsulate further layers until we've reached the real dst. At this
> point, LOCAL_OUT would be called with the heading-for-the-wire packet
> and rerouting will and should affect the final device.

I'm not sure if I understand you correctly. Do you mean to say that it
doesn't make sense to pass packets to netfilter hooks until all
transformations have been applied? If so, I agree, I don't think there
is much use in doing stuff to half-done packets. We can easily detect
the final dst_entry, it should have dst->xfrm = NULL (at least I would
think so). But I don't know how to skip intermediate ones, I guess they
look similar to the first one.

>
> However, exposing all those intermediate steps to the
> netfilter-hook-attached code is not so bad. It enables people to do
> whatever they want... they just need to be careful with their rules.
> If there's too much interference with normal OUTPUT/POSTROUTING rules,
> we could still go for new hooks+new chains.

This makes me think I misunderstood your statement above. I recall there
was some confusion between hook functions and hooks before, it seems it
got me, too ;) Do you propose to make these steps visible to iptables
chains or just to the registered hook functions, which would hide them
from the user by immediately returning in the iptables case?

>
>
>>If NAT is used, ip_route_{input,output} might even return a different
>>policy bundle.
>
>
> The question is, again: What is the desired behaviour? Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

The NATed one. When the policy says encrypt 10.0.0.1<->10.0.0.2, NAT
should bypass that. IIRC Herbert Xu mentioned some things about when
policy checks need to be done in the "2.6 IPSEC + SNAT" thread. I'm
going to read it again later.

>>Regards,
>>Patrick
>
>
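The "final dst_entry has dst->xfrm == NULL" test discussed in the message above can be illustrated with a small user-space sketch. Everything in it is an assumption made for illustration: toy_dst, toy_skb and toy_dst_output are hypothetical stand-ins, not kernel structures or functions, and the hook is reduced to a counter. It only shows that walking a stack of transforms and firing a hook when the popped entry has no xfrm runs the hook once, on the fully transformed packet.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dst_entry in a transform bundle. */
struct toy_dst {
    const char *xfrm;       /* non-NULL: a transform (e.g. "esp") still to apply */
    struct toy_dst *child;  /* next entry in the bundle */
};

/* Hypothetical stand-in for an skb travelling through the bundle. */
struct toy_skb {
    struct toy_dst *dst;
    int transforms_applied; /* how many {ah,esp}_output-like steps ran */
    int hook_calls;         /* how often the LOCAL_OUT-like hook ran */
};

/* Apply every pending transform. The hook counter is bumped only when the
 * entry reached after popping has no transform left, i.e. for the final
 * heading-for-the-wire packet; intermediate layers are skipped. */
static inline int toy_dst_output(struct toy_skb *skb)
{
    while (skb->dst != NULL && skb->dst->xfrm != NULL) {
        skb->transforms_applied++;      /* stand-in for the transform itself */
        skb->dst = skb->dst->child;     /* stand-in for dst_pop() */
        if (skb->dst == NULL)
            return -1;                  /* no route left after popping */
        if (skb->dst->xfrm == NULL)
            skb->hook_calls++;          /* stand-in for the netfilter hook */
    }
    return 0;
}
```

With a bundle of two transforms followed by a plain route, both transforms are applied and the hook counter ends up at one.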
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 10:30 ` Harald Welte
2004-01-28 11:24 ` Willy Tarreau
2004-01-28 13:22 ` Patrick McHardy
@ 2004-01-28 14:23 ` Henrik Nordstrom
2004-02-01 14:52 ` Patrick McHardy
3 siblings, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-28 14:23 UTC
To: Harald Welte
Cc: Patrick McHardy, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
On Wed, 28 Jan 2004, Harald Welte wrote:

> The question is, again: What is the desired behaviour? Should the
> policy be determined on the un-NAT'ed packet or on the NAT'ed one?

In my opinion it must be on the NAT'ed one, or else there is a very large
risk that the packet does not match the selected policy and the other
endpoint will reject the resulting IPSec packet.

Regards
Henrik
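The risk described above can be made concrete with a minimal user-space sketch. The selector and packet types here are hypothetical simplifications (prefix match on source and destination only, host byte order), and IP4, selector_match and snat are names invented for this example; real xfrm selectors carry more fields. The point is only that a packet which matches a selector before SNAT may no longer match it afterwards, so a policy chosen from the un-NAT'ed addresses can be the wrong one.

```c
#include <stdint.h>
#include <stdbool.h>

/* Build an IPv4 address in host byte order (illustration only). */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified policy selector: source and destination prefix. */
struct selector {
    uint32_t saddr, smask;
    uint32_t daddr, dmask;
};

/* Hypothetical packet: just the two addresses a selector looks at. */
struct packet {
    uint32_t saddr, daddr;
};

/* True if the packet's addresses fall inside the selector's prefixes. */
static inline bool selector_match(const struct selector *sel,
                                  const struct packet *p)
{
    return (p->saddr & sel->smask) == (sel->saddr & sel->smask) &&
           (p->daddr & sel->dmask) == (sel->daddr & sel->dmask);
}

/* Source NAT reduced to its effect on the header: rewrite the source. */
static inline void snat(struct packet *p, uint32_t new_saddr)
{
    p->saddr = new_saddr;
}
```

In this sketch a packet from 192.168.1.5 to 10.0.2.9 matches a 192.168.1.0/24 to 10.0.2.0/24 selector, but stops matching once its source is rewritten to 172.16.0.1.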
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 10:30 ` Harald Welte
` (2 preceding siblings ...)
2004-01-28 14:23 ` Henrik Nordstrom
@ 2004-02-01 14:52 ` Patrick McHardy
2004-02-16 1:19 ` Patrick McHardy
3 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-01 14:52 UTC
To: Harald Welte
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]
I've hacked together a working version. It's an ugly hack, but it shows
which parts are required for getting NAT working. Comments are welcome.

The patch is split into three parts: 01-hooks.diff, 02-nat-original.diff
and 03-nat-reply.diff. I'm going to discuss each part separately.

01-hooks.diff:
This patch adds the required hooks. Locally generated and forwarded
packets pass through POST_ROUTING before being encrypted and pass
through LOCAL_OUT afterwards. This is currently not consistent with
input packets, which pass through PRE_ROUTING and LOCAL_IN encrypted and
then again after being decapsulated from a tunnel-mode encapsulation.
This means packets from nested tunnels will pass multiple times, while
packets from transport-mode connections won't pass the hook after
decryption at all.

02-nat-original.diff:
This patch adds policy lookups to ip_route_me_harder and changes
ip_nat_standalone to reroute in both LOCAL_OUT and POST_ROUTING if the
routing key or the key used for policy lookups changed. This makes it
possible to NAT packets and have the correct policy applied to them
afterwards. Since the POST_ROUTING hook is called with ip_finish_output2
as outfn, the packets which have a transformer dst_entry after
ip_route_me_harder are manually confirmed (conntrack) and passed to
dst_output. These packets will be passed through LOCAL_OUT after
encryption has been done. There is a danger of loops: the encrypted
packets could be natted in LOCAL_OUT and POST_ROUTING again and again
match a policy.
The packets matching a policy after rerouting in LOCAL_OUT don't need
this kind of special handling, since LOCAL_OUT is called with dst_output
as outfn. They are currently not passed through POST_ROUTING before
encryption, which I just noticed while writing this mail.

03-nat-reply.diff:
This patch changes xfrm_policy_check to look up the correct policy for
input packets after NAT has been applied. Unfortunately some policy
checks are performed after skb->nfct has been released, so I had to
introduce a leak.

To summarize the problems:
- Not a completely consistent picture wrt. hooks passed and ordering.
  Should be easily fixable.
- Loop danger. Suggestions?
- NAT information is required at socket receive time for policy checks,
  currently achieved by introducing a leak. A possible solution is to
  store the required data in new skb fields, it's only 6 bytes ..

What is currently working is the following:
- Filter traffic before/after encryption (more or less)
- SNAT/DNAT packets and have correct policies applied, including none
- Have policy checks find correct policy for SNAT/DNAT/SNAT+DNATed packets

Regards,
Patrick
[-- Attachment #2: 01-hooks.diff --]
[-- Type: text/plain, Size: 4085 bytes --]
===== net/ipv4/ah4.c 1.29 vs edited =====
--- 1.29/net/ipv4/ah4.c	Sat Jan 24 19:08:48 2004
+++ edited/net/ipv4/ah4.c	Sat Jan 31 16:26:44 2004
@@ -6,6 +6,8 @@
 #include <net/ah.h>
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <asm/scatterlist.h>

@@ -54,6 +56,11 @@
 	return 0;
 }

+static inline int ah_finish_output(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
 static int ah_output(struct sk_buff *skb)
 {
 	int err;
@@ -144,6 +151,18 @@
 	if ((skb->dst = dst_pop(dst)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto error_nolock;
+	}
+	/* final packet goes through LOCAL_OUT hook */
+	if (skb->dst->xfrm == NULL) {
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
+		return NF_HOOK(AF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+			       skb->dst->dev, ah_finish_output);
 	}

 	return NET_XMIT_BYPASS;
===== net/ipv4/esp4.c 1.35 vs edited =====
--- 1.35/net/ipv4/esp4.c	Mon Aug 18 13:14:38 2003
+++ edited/net/ipv4/esp4.c	Sat Jan 31 16:26:33 2004
@@ -8,6 +8,8 @@
 #include <linux/crypto.h>
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/icmp.h>
 #include <net/udp.h>

@@ -20,6 +22,11 @@
 	__u8 proto;
 };

+static inline int esp_finish_output(struct sk_buff *skb)
+{
+	return NET_XMIT_BYPASS;
+}
+
 int esp_output(struct sk_buff *skb)
 {
 	int err;
@@ -198,6 +205,18 @@
 	if ((skb->dst = dst_pop(dst)) == NULL) {
 		err = -EHOSTUNREACH;
 		goto error_nolock;
+	}
+	/* final packet goes through LOCAL_OUT hook */
+	if (skb->dst->xfrm == NULL) {
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
+		return NF_HOOK(AF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+			       skb->dst->dev, esp_finish_output);
 	}

 	return NET_XMIT_BYPASS;
===== net/ipv4/ip_forward.c 1.9 vs edited =====
--- 1.9/net/ipv4/ip_forward.c	Sun Mar 23 11:21:28 2003
+++ edited/net/ipv4/ip_forward.c	Sat Jan 31 16:27:11 2004
@@ -51,6 +51,10 @@
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);

+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev,
+			       dst_output);
+
 	return dst_output(skb);
 }

===== net/ipv4/ip_output.c 1.48 vs edited =====
--- 1.48/net/ipv4/ip_output.c	Wed Dec 17 21:06:18 2003
+++ edited/net/ipv4/ip_output.c	Sat Jan 31 16:27:22 2004
@@ -122,6 +122,14 @@
 	return ttl;
 }

+static inline int ip_dst_output(struct sk_buff *skb)
+{
+	if (skb->dst->xfrm != NULL)
+		return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+			       skb->dst->dev, dst_output);
+	return dst_output(skb);
+}
+
 /*
  * Add an ip header to a skbuff and send it out.
  *
@@ -164,7 +172,7 @@

 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }

 static inline int ip_finish_output2(struct sk_buff *skb)
@@ -386,7 +394,7 @@
 	skb->priority = sk->sk_priority;

 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);

 no_route:
 	IP_INC_STATS(IpOutNoRoutes);
@@ -1164,7 +1172,7 @@

 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
===== net/ipv4/xfrm4_input.c 1.9 vs edited =====
--- 1.9/net/ipv4/xfrm4_input.c	Fri Aug 8 06:17:15 2003
+++ edited/net/ipv4/xfrm4_input.c	Sat Jan 31 14:23:52 2004
@@ -130,6 +130,13 @@
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
+#ifdef CONFIG_NETFILTER
+		nf_conntrack_put(skb->nfct);
+		skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+		skb->nf_debug = 0;
+#endif
+#endif
 		netif_rx(skb);
 		return 0;
 	} else {
[-- Attachment #3: 02-nat-original.diff --]
[-- Type: text/plain, Size: 4629 bytes --]
===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb 1 14:39:46 2004
@@ -25,6 +25,7 @@
 #include <linux/icmp.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/xfrm.h>
 #include <linux/ip.h>

 #define __KERNEL_SYSCALLS__
@@ -627,6 +628,9 @@
 	struct flowi fl = {};
 	struct dst_entry *odst;
 	unsigned int hh_len;
+#ifdef CONFIG_XFRM
+	struct xfrm_policy_afinfo *afinfo;
+#endif

 	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
 	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
@@ -665,6 +669,16 @@
 	if ((*pskb)->dst->error)
 		return -1;

+#ifdef CONFIG_XFRM
+	afinfo = xfrm_policy_get_afinfo(AF_INET);
+	if (afinfo != NULL) {
+		afinfo->decode_session(*pskb, &fl);
+		xfrm_policy_put_afinfo(afinfo);
+		if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 1) != 0)
+			return -1;
+	}
+#endif
+
 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
 	if (skb_headroom(*pskb) < hh_len) {
===== net/ipv4/netfilter/ip_conntrack_standalone.c 1.23 vs edited =====
--- 1.23/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Dec 5 21:30:11 2003
+++ edited/net/ipv4/netfilter/ip_conntrack_standalone.c	Sat Jan 31 14:37:49 2004
@@ -481,6 +481,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
===== net/ipv4/netfilter/ip_nat_standalone.c 1.28 vs edited =====
--- 1.28/net/ipv4/netfilter/ip_nat_standalone.c	Fri Oct 3 08:26:55 2003
+++ edited/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb 1 11:21:29 2004
@@ -160,6 +160,46 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }

+struct flow_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+flow_key_get(struct sk_buff *skb, struct flow_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+flow_key_compare(struct sk_buff *skb, struct flow_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+	return 0;
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -167,6 +207,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct flow_key k;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -189,7 +232,25 @@
 		return NF_STOLEN;
 	}

-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	flow_key_get(*pskb, &k, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && flow_key_compare(*pskb, &k, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+		else if ((*pskb)->dst->xfrm != NULL) {
+			/* packet matches policy after ip_route_me_harder -
+			 * need to manually direct packet to transformers */
+			/* WARNING: loop danger */
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+	}
+	return ret;
 }

 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -200,7 +261,7 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct flow_key k;
 	unsigned int ret;

 	/* root is playing with raw sockets. */
@@ -208,14 +269,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;

-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	flow_key_get(*pskb, &k, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && flow_key_compare(*pskb, &k, 1)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+	}
 	return ret;
 }
 #endif
[-- Attachment #4: 03-nat-reply.diff --]
[-- Type: text/plain, Size: 3489 bytes --]
===== include/linux/netfilter.h 1.6 vs edited =====
--- 1.6/include/linux/netfilter.h	Wed Jun 25 00:36:10 2003
+++ edited/include/linux/netfilter.h	Sun Feb 1 14:39:16 2004
@@ -166,5 +166,21 @@
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
 #endif /*CONFIG_NETFILTER*/

+#ifdef CONFIG_XFRM
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+struct flowi;
+extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+	if (family == AF_INET)
+		nf_nat_decode_session4(skb, fl);
+}
+#else /* CONFIG_IP_NF_NAT_NEEDED */
+#define nf_nat_decode_session(skb, fl, family)
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
+#endif /* CONFIG_XFRM */
+
 #endif /*__KERNEL__*/
 #endif /*__LINUX_NETFILTER_H*/
===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb 1 14:39:46 2004
@@ -681,6 +695,49 @@

 	return 0;
 }
+
+#if defined(CONFIG_IP_NF_NAT_NEEDED) && defined(CONFIG_XFRM)
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+#include <linux/netfilter_ipv4/ip_nat.h>
+
+void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_nat_info_manip *m;
+	struct ip_conntrack_tuple *tuple;
+	unsigned int i;
+
+	if (skb->nfct == NULL)
+		return;
+	ct = (struct ip_conntrack *)skb->nfct->master;
+
+	for (i = 0; i < ct->nat.info.num_manips; i++) {
+		m = &ct->nat.info.manips[i];
+		if (m->direction != IP_CT_DIR_REPLY)
+			continue;
+		tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+		if (m->hooknum == NF_IP_PRE_ROUTING &&
+		    m->maniptype == IP_NAT_MANIP_DST) {
+			/* SNAT rule reply mangling */
+			fl->fl4_dst = tuple->dst.ip;
+			if (tuple->dst.protonum == IPPROTO_TCP ||
+			    tuple->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_dport = tuple->dst.u.tcp.port;
+		}
+#ifdef CONFIG_IP_NF_NAT_LOCAL
+		else if (m->hooknum == NF_IP_LOCAL_IN &&
+			 m->maniptype == IP_NAT_MANIP_SRC) {
+			/* DNAT rule reply mangling */
+			fl->fl4_src = tuple->src.ip;
+			if (tuple->dst.protonum == IPPROTO_TCP ||
+			    tuple->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_sport = tuple->src.u.tcp.port;
+		}
+#endif
+	}
+}
+#endif /* CONFIG_IP_NF_NAT_NEEDED && CONFIG_XFRM */

 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
 {
===== net/ipv4/ip_input.c 1.20 vs edited =====
--- 1.20/net/ipv4/ip_input.c	Mon Sep 29 04:05:42 2003
+++ edited/net/ipv4/ip_input.c	Sat Jan 31 21:30:52 2004
@@ -207,12 +207,14 @@

 	__skb_pull(skb, ihl);

+#if 0
 #ifdef CONFIG_NETFILTER
 	/* Free reference early: we don't need it any more, and it may
 	   hold ip_conntrack module loaded indefinitely. */
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = NULL;
 #endif /*CONFIG_NETFILTER*/
+#endif

 	/* Point into the IP datagram, just past the header. */
 	skb->h.raw = skb->data;
===== net/xfrm/xfrm_policy.c 1.47 vs edited =====
--- 1.47/net/xfrm/xfrm_policy.c	Wed Jan 14 08:30:19 2004
+++ edited/net/xfrm/xfrm_policy.c	Sun Feb 1 13:57:09 2004
@@ -21,6 +21,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <net/xfrm.h>
 #include <net/ip.h>

@@ -908,6 +909,7 @@

 	if (_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);

 	/* First, check used SA against their selectors. */
 	if (skb->sp) {
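The "reroute only if the routing or policy key changed" idea behind flow_key_get/flow_key_compare in 02-nat-original.diff can be tried outside the kernel with a simplified sketch. The toy_pkt type, the has_ports helper and the name flow_key_changed are assumptions made for this illustration; unlike the patch, the sketch reads ports from a plain struct instead of the raw header, and `which` must be 0 (source side) or 1 (destination side).

```c
#include <stdint.h>

/* Hypothetical flat packet header used only for this sketch. */
struct toy_pkt {
    uint32_t saddr, daddr;
    uint8_t  protocol;   /* 6 = TCP, 17 = UDP, anything else: no ports */
    uint16_t ports[2];   /* [0] = source port, [1] = destination port */
};

/* Snapshot of the one address/port pair a NAT hook may rewrite. */
struct flow_key {
    uint32_t addr;
    uint16_t port;
};

static inline int has_ports(const struct toy_pkt *p)
{
    return p->protocol == 6 || p->protocol == 17;
}

/* Record the source (which == 0) or destination (which == 1) side. */
static inline void flow_key_get(const struct toy_pkt *p,
                                struct flow_key *key, int which)
{
    key->addr = which ? p->daddr : p->saddr;
    key->port = has_ports(p) ? p->ports[which] : 0;
}

/* Return 1 if the recorded side differs from the packet now, else 0.
 * Ports are only considered for protocols that carry them. */
static inline int flow_key_changed(const struct toy_pkt *p,
                                   const struct flow_key *key, int which)
{
    if (key->addr != (which ? p->daddr : p->saddr))
        return 1;
    if (has_ports(p) && key->port != p->ports[which])
        return 1;
    return 0;
}
```

A caller would take the snapshot before the NAT step and redo the route and policy lookup only when flow_key_changed reports a difference, which mirrors how the patch wraps ip_nat_fn.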
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-02-01 14:52 ` Patrick McHardy
@ 2004-02-16 1:19 ` Patrick McHardy
2004-02-18 14:57 ` Patrick McHardy
0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-16 1:19 UTC
To: Harald Welte
Cc: Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]
I've continued to work on the IPSEC+NAT issue; this is the current
status.

The hooks for pre/post-ipsec are now entirely contained in ip_output.c.
The transformers mark transformed packets with a new flag in the control
buffer, which is used for directing ipsec packets to LOCAL_OUT in
ip_output(). It also solves the loop danger when redirecting packets in
ip_nat_out after rerouting: ip_route_me_harder only looks up a policy
for untransformed packets. Redirected packets are manually confirmed.
Should this be changed to just mark them for redirection and do the
actual redirecting in the conntrack hook?

The policy check part is fine now. The conntrack reference is kept as
long as required for the ip protocol handlers that do policy checks, but
it should be dropped on all paths before packets are queued to the
socket. The nf_nat_decode_packet function now handles all cases of NAT
which affect policy lookups; the last patches didn't handle DNAT rules
in PREROUTING.

The last big problem is the input side: I don't know how to make hook
ordering symmetric to output. Suggestions are very welcome.

There are 4 patches:

01-nf_reset.diff
  Move common nf_conntrack_put/nfct=NULL/nf_debug=0 code to new
  inline function nf_reset.
02-hooks.diff
  Make packets to be encrypted visible on POST_ROUTING hook and
  encrypted packets on LOCAL_OUT. Reset nfct etc. before reposting
  the packet into the stack on reception.
03-nat-policy-lookup.diff
  Add policy lookups to ip_route_me_harder and change NAT to reroute
  for any change that affects routing or policy lookups
04-nat-policy-checks.diff
  Make xfrm_policy_check find correct policy for NATed packets

Best regards,
Patrick
[-- Attachment #2: 01-nf_reset.diff --]
[-- Type: text/x-patch, Size: 5407 bytes --]
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:43:03+01:00 kaber@trash.net
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# net/ipv6/sit.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# net/ipv6/ip6_tunnel.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +1 -7
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# net/ipv4/ipip.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# net/ipv4/ip_input.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +1 -5
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# net/ipv4/ip_gre.c
#   2004/02/16 00:41:03+01:00 kaber@trash.net +2 -14
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
# include/linux/skbuff.h
#   2004/02/16 00:41:03+01:00 kaber@trash.net +12 -3
#   Add new function 'nf_reset' to reset netfilter related skb-fields
#
diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h	Mon Feb 16 02:12:55 2004
+++ b/include/linux/skbuff.h	Mon Feb 16 02:12:55 2004
@@ -1201,6 +1201,14 @@
 	if (nfct)
 		atomic_inc(&nfct->master->use);
 }
+static inline void nf_reset(struct sk_buff *skb)
+{
+	nf_conntrack_put(skb->nfct);
+	skb->nfct = NULL;
+#ifdef CONFIG_NETFILTER_DEBUG
+	skb->nf_debug = 0;
+#endif
+}

 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
@@ -1213,9 +1221,10 @@
 	if (nf_bridge)
 		atomic_inc(&nf_bridge->use);
 }
-#endif
-
-#endif
+#endif /* CONFIG_BRIDGE_NETFILTER */
+#else /* CONFIG_NETFILTER */
+static inline void nf_reset(struct sk_buff *skb) {}
+#endif /* CONFIG_NETFILTER */

 #endif /* __KERNEL__ */
 #endif /* _LINUX_SKBUFF_H */
diff -Nru a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
--- a/net/ipv4/ip_gre.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ip_gre.c	Mon Feb 16 02:12:55 2004
@@ -643,13 +643,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipgre_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipgre_lock);
@@ -877,13 +871,7 @@
 		}
 	}

-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);

 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ip_input.c	Mon Feb 16 02:12:55 2004
@@ -202,17 +202,13 @@

 #ifdef CONFIG_NETFILTER_DEBUG
 	nf_debug_ip_local_deliver(skb);
-	skb->nf_debug = 0;
 #endif /*CONFIG_NETFILTER_DEBUG*/

 	__skb_pull(skb, ihl);

-#ifdef CONFIG_NETFILTER
 	/* Free reference early: we don't need it any more, and it may
 	   hold ip_conntrack module loaded indefinitely. */
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#endif /*CONFIG_NETFILTER*/
+	nf_reset(skb);

 	/* Point into the IP datagram, just past the header. */
 	skb->h.raw = skb->data;
diff -Nru a/net/ipv4/ipip.c b/net/ipv4/ipip.c
--- a/net/ipv4/ipip.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv4/ipip.c	Mon Feb 16 02:12:55 2004
@@ -496,13 +496,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip_lock);
@@ -647,13 +641,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl = old_iph->ttl;

-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);

 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
diff -Nru a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
--- a/net/ipv6/ip6_tunnel.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv6/ip6_tunnel.c	Mon Feb 16 02:12:55 2004
@@ -715,13 +715,7 @@
 	ipv6h->nexthdr = proto;
 	ipv6_addr_copy(&ipv6h->saddr, &fl.fl6_src);
 	ipv6_addr_copy(&ipv6h->daddr, &fl.fl6_dst);
-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);
 	pkt_len = skb->len;
 	err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL,
 		      skb->dst->dev, dst_output);
diff -Nru a/net/ipv6/sit.c b/net/ipv6/sit.c
--- a/net/ipv6/sit.c	Mon Feb 16 02:12:55 2004
+++ b/net/ipv6/sit.c	Mon Feb 16 02:12:55 2004
@@ -388,13 +388,7 @@
 		skb->dev = tunnel->dev;
 		dst_release(skb->dst);
 		skb->dst = NULL;
-#ifdef CONFIG_NETFILTER
-		nf_conntrack_put(skb->nfct);
-		skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-		skb->nf_debug = 0;
-#endif
-#endif
+		nf_reset(skb);
 		ipip6_ecn_decapsulate(iph, skb);
 		netif_rx(skb);
 		read_unlock(&ipip6_lock);
@@ -580,13 +574,7 @@
 	if ((iph->ttl = tiph->ttl) == 0)
 		iph->ttl = iph6->hop_limit;

-#ifdef CONFIG_NETFILTER
-	nf_conntrack_put(skb->nfct);
-	skb->nfct = NULL;
-#ifdef CONFIG_NETFILTER_DEBUG
-	skb->nf_debug = 0;
-#endif
-#endif
+	nf_reset(skb);

 	IPTUNNEL_XMIT();
 	tunnel->recursion--;
[-- Attachment #3: 02-hooks.diff --]
[-- Type: text/x-patch, Size: 6379 bytes --]
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/16 00:50:24+01:00 kaber@trash.net
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/xfrm4_tunnel.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/xfrm4_input.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/ipcomp.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/ip_output.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +22 -4
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/ip_forward.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +4 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
#
# net/ipv4/esp4.c
#   2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0
#   Make packets visible to POST_ROUTING before encryption and LOCAL_OUT
#   afterwards, reset netfilter fields before re-posting packet into the
#   stack on reception.
# # net/ipv4/ah4.c # 2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0 # Make packets visible to POST_ROUTING before encryption and LOCAL_OUT # afterwards, reset netfilter fields before re-posting packet into the # stack on reception. # # include/net/ip.h # 2004/02/16 00:50:18+01:00 kaber@trash.net +1 -0 # Make packets visible to POST_ROUTING before encryption and LOCAL_OUT # afterwards, reset netfilter fields before re-posting packet into the # stack on reception. # diff -Nru a/include/net/ip.h b/include/net/ip.h --- a/include/net/ip.h Mon Feb 16 02:13:07 2004 +++ b/include/net/ip.h Mon Feb 16 02:13:07 2004 @@ -48,6 +48,7 @@ #define IPSKB_TRANSLATED 2 #define IPSKB_FORWARDED 4 #define IPSKB_XFRM_TUNNEL_SIZE 8 +#define IPSKB_XFRM_TRANSFORMED 16 }; struct ipcm_cookie diff -Nru a/net/ipv4/ah4.c b/net/ipv4/ah4.c --- a/net/ipv4/ah4.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/ah4.c Mon Feb 16 02:13:07 2004 @@ -137,6 +137,7 @@ ip_send_check(top_iph); skb->nh.raw = skb->data; + IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED; x->curlft.bytes += skb->len; x->curlft.packets++; diff -Nru a/net/ipv4/esp4.c b/net/ipv4/esp4.c --- a/net/ipv4/esp4.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/esp4.c Mon Feb 16 02:13:07 2004 @@ -191,6 +191,7 @@ ip_send_check(top_iph); skb->nh.raw = skb->data; + IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED; x->curlft.bytes += skb->len; x->curlft.packets++; diff -Nru a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c --- a/net/ipv4/ip_forward.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/ip_forward.c Mon Feb 16 02:13:07 2004 @@ -51,6 +51,10 @@ if (unlikely(opt->optlen)) ip_forward_options(skb); + if (skb->dst->xfrm != NULL) + return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, + skb->dst->dev, dst_output); + return dst_output(skb); } diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c --- a/net/ipv4/ip_output.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/ip_output.c Mon Feb 16 02:13:07 2004 @@ -122,6 +122,14 @@ return ttl; } +static inline int ip_dst_output(struct 
sk_buff *skb) +{ + if (skb->dst->xfrm != NULL) + return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, + skb->dst->dev, dst_output); + return dst_output(skb); +} + /* * Add an ip header to a skbuff and send it out. * @@ -164,7 +172,7 @@ /* Send it out. */ return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, - dst_output); + ip_dst_output); } static inline int ip_finish_output2(struct sk_buff *skb) @@ -282,7 +290,7 @@ return ip_finish_output(skb); } -int ip_output(struct sk_buff *skb) +static inline int ip_output2(struct sk_buff *skb) { IP_INC_STATS(IpOutRequests); @@ -293,6 +301,16 @@ return ip_finish_output(skb); } +int ip_output(struct sk_buff *skb) +{ + if (IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) { + nf_reset(skb); + return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, + skb->dst->dev, ip_output2); + } + return ip_output2(skb); +} + int ip_queue_xmit(struct sk_buff *skb, int ipfragok) { struct sock *sk = skb->sk; @@ -386,7 +404,7 @@ skb->priority = sk->sk_priority; return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, - dst_output); + ip_dst_output); no_route: IP_INC_STATS(IpOutNoRoutes); @@ -1165,7 +1183,7 @@ /* Netfilter gets whole the not fragmented skb. */ err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, - skb->dst->dev, dst_output); + skb->dst->dev, ip_dst_output); if (err) { if (err > 0) err = inet->recverr ? 
net_xmit_errno(err) : 0; diff -Nru a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c --- a/net/ipv4/ipcomp.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/ipcomp.c Mon Feb 16 02:13:07 2004 @@ -223,6 +223,7 @@ skb->nh.raw = skb->data; out_ok: + IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED; x->curlft.bytes += skb->len; x->curlft.packets++; spin_unlock_bh(&x->lock); diff -Nru a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c --- a/net/ipv4/xfrm4_input.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/xfrm4_input.c Mon Feb 16 02:13:07 2004 @@ -130,6 +130,7 @@ dst_release(skb->dst); skb->dst = NULL; } + nf_reset(skb); netif_rx(skb); return 0; } else { diff -Nru a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c --- a/net/ipv4/xfrm4_tunnel.c Mon Feb 16 02:13:07 2004 +++ b/net/ipv4/xfrm4_tunnel.c Mon Feb 16 02:13:07 2004 @@ -66,6 +66,7 @@ ip_send_check(top_iph); skb->nh.raw = skb->data; + IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED; x->curlft.bytes += skb->len; x->curlft.packets++; [-- Attachment #4: 03-nat-policy-lookup.diff --] [-- Type: text/x-patch, Size: 5377 bytes --] # This is a BitKeeper generated diff -Nru style patch. 
# # ChangeSet # 2004/02/16 00:51:09+01:00 kaber@trash.net # Add policy lookups to ip_route_me_harder, make NAT reroute for any # change in route/policy key # # net/ipv4/netfilter/ip_nat_standalone.c # 2004/02/16 00:51:03+01:00 kaber@trash.net +72 -8 # Add policy lookups to ip_route_me_harder, make NAT reroute for any # change in route/policy key # # net/ipv4/netfilter/ip_conntrack_standalone.c # 2004/02/16 00:51:03+01:00 kaber@trash.net +1 -0 # Add policy lookups to ip_route_me_harder, make NAT reroute for any # change in route/policy key # # net/core/netfilter.c # 2004/02/16 00:51:03+01:00 kaber@trash.net +16 -0 # Add policy lookups to ip_route_me_harder, make NAT reroute for any # change in route/policy key # diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c --- a/net/core/netfilter.c Mon Feb 16 02:13:24 2004 +++ b/net/core/netfilter.c Mon Feb 16 02:13:24 2004 @@ -25,6 +25,8 @@ #include <linux/icmp.h> #include <net/sock.h> #include <net/route.h> +#include <net/xfrm.h> +#include <net/ip.h> #include <linux/ip.h> #define __KERNEL_SYSCALLS__ @@ -664,6 +666,20 @@ if ((*pskb)->dst->error) return -1; + +#ifdef CONFIG_XFRM + if (!(IPCB(*pskb)->flags & IPSKB_XFRM_TRANSFORMED)) { + struct xfrm_policy_afinfo *afinfo; + + afinfo = xfrm_policy_get_afinfo(AF_INET); + if (afinfo != NULL) { + afinfo->decode_session(*pskb, &fl); + xfrm_policy_put_afinfo(afinfo); + if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 0) != 0) + return -1; + } + } +#endif /* Change in oif may mean change in hh_len. 
*/ hh_len = (*pskb)->dst->dev->hard_header_len;
diff -Nru a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c	Mon Feb 16 02:13:24 2004
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c	Mon Feb 16 02:13:24 2004
@@ -489,6 +489,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
diff -Nru a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c	Mon Feb 16 02:13:24 2004
+++ b/net/ipv4/netfilter/ip_nat_standalone.c	Mon Feb 16 02:13:24 2004
@@ -166,6 +166,46 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }
 
+struct nat_route_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+nat_route_key_get(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+nat_route_key_compare(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+	return 0;
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -173,6 +213,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct nat_route_key key;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -195,7 +238,29 @@
 			return NF_STOLEN;
 	}
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	nat_route_key_get(*pskb, &key, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && nat_route_key_compare(*pskb, &key, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+#ifdef CONFIG_XFRM
+		/*
+		 * POST_ROUTING hook is called with fixed outfn, we need
+		 * to manually confirm the packet and direct it to the
+		 * transformers if a policy matches.
+		 */
+		else if ((*pskb)->dst->xfrm != NULL) {
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+#endif
+	}
+	return ret;
 }
 
 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -206,7 +271,7 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct nat_route_key key;
 	unsigned int ret;
 
 	/* root is playing with raw sockets. */
@@ -214,14 +279,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	nat_route_key_get(*pskb, &key, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ?
ret : NF_DROP; + && nat_route_key_compare(*pskb, &key, 1)) { + if (ip_route_me_harder(pskb) != 0) + ret = NF_DROP; + } return ret; } #endif [-- Attachment #5: 04-nat-policy-check.diff --] [-- Type: text/x-patch, Size: 5682 bytes --] # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2004/02/16 00:54:30+01:00 kaber@trash.net # Make policy checks work with NAT # # net/xfrm/xfrm_policy.c # 2004/02/16 00:54:24+01:00 kaber@trash.net +2 -0 # Make policy checks work with NAT # # net/ipv4/udp.c # 2004/02/16 00:54:24+01:00 kaber@trash.net +2 -0 # Make policy checks work with NAT # # net/ipv4/tcp_ipv4.c # 2004/02/16 00:54:24+01:00 kaber@trash.net +1 -0 # Make policy checks work with NAT # # net/ipv4/raw.c # 2004/02/16 00:54:24+01:00 kaber@trash.net +1 -0 # Make policy checks work with NAT # # net/ipv4/ip_input.c # 2004/02/16 00:54:23+01:00 kaber@trash.net +6 -8 # Make policy checks work with NAT # # net/core/netfilter.c # 2004/02/16 00:54:23+01:00 kaber@trash.net +43 -0 # Make policy checks work with NAT # # include/linux/netfilter.h # 2004/02/16 00:54:23+01:00 kaber@trash.net +16 -0 # Make policy checks work with NAT # diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h --- a/include/linux/netfilter.h Mon Feb 16 02:13:39 2004 +++ b/include/linux/netfilter.h Mon Feb 16 02:13:39 2004 @@ -166,5 +166,21 @@ #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb) #endif /*CONFIG_NETFILTER*/ +#ifdef CONFIG_XFRM +#ifdef CONFIG_IP_NF_NAT_NEEDED +struct flowi; +extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl); + +static inline void +nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family) +{ + if (family == AF_INET) + nf_nat_decode_session4(skb, fl); +} +#else /* CONFIG_IP_NF_NAT_NEEDED */ +#define nf_nat_decode_session(skb,fl,family) +#endif /* CONFIG_IP_NF_NAT_NEEDED */ +#endif /* CONFIG_XFRM */ + #endif /*__KERNEL__*/ #endif /*__LINUX_NETFILTER_H*/ diff -Nru a/net/core/netfilter.c 
b/net/core/netfilter.c --- a/net/core/netfilter.c Mon Feb 16 02:13:39 2004 +++ b/net/core/netfilter.c Mon Feb 16 02:13:39 2004 @@ -698,6 +698,49 @@ return 0; } +#if defined(CONFIG_IP_NF_NAT_NEEDED) && defined(CONFIG_XFRM) +#include <linux/netfilter_ipv4/ip_conntrack.h> +#include <linux/netfilter_ipv4/ip_nat.h> + +void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl) +{ + struct ip_conntrack *ct; + struct ip_conntrack_tuple *t; + struct ip_nat_info_manip *m; + unsigned int i; + + if (skb->nfct == NULL) + return; + ct = (struct ip_conntrack *)skb->nfct->master; + + for (i = 0; i < ct->nat.info.num_manips; i++) { + m = &ct->nat.info.manips[i]; + t = &ct->tuplehash[m->direction].tuple; + + switch (m->hooknum) { + case NF_IP_PRE_ROUTING: + if (m->maniptype != IP_NAT_MANIP_DST) + break; + fl->fl4_dst = t->dst.ip; + if (t->dst.protonum == IPPROTO_TCP || + t->dst.protonum == IPPROTO_UDP) + fl->fl_ip_dport = t->dst.u.tcp.port; + break; +#ifdef CONFIG_IP_NF_NAT_LOCAL + case NF_IP_LOCAL_IN: + if (m->maniptype != IP_NAT_MANIP_SRC) + break; + fl->fl4_src = t->src.ip; + if (t->dst.protonum == IPPROTO_TCP || + t->dst.protonum == IPPROTO_UDP) + fl->fl_ip_sport = t->src.u.tcp.port; + break; +#endif + } + } +} +#endif /* CONFIG_IP_NF_NAT_NEEDED && CONFIG_XFRM */ + int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len) { struct sk_buff *nskb; diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c --- a/net/ipv4/ip_input.c Mon Feb 16 02:13:39 2004 +++ b/net/ipv4/ip_input.c Mon Feb 16 02:13:39 2004 @@ -206,10 +206,6 @@ __skb_pull(skb, ihl); - /* Free reference early: we don't need it any more, and it may - hold ip_conntrack module loaded indefinitely. */ - nf_reset(skb); - /* Point into the IP datagram, just past the header. 
*/ skb->h.raw = skb->data; @@ -235,10 +231,12 @@ int ret; smp_read_barrier_depends(); - if (!ipprot->no_policy && - !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { - kfree_skb(skb); - goto out; + if (!ipprot->no_policy) { + if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { + kfree_skb(skb); + goto out; + } + nf_reset(skb); } ret = ipprot->handler(skb); if (ret < 0) { diff -Nru a/net/ipv4/raw.c b/net/ipv4/raw.c --- a/net/ipv4/raw.c Mon Feb 16 02:13:39 2004 +++ b/net/ipv4/raw.c Mon Feb 16 02:13:39 2004 @@ -249,6 +249,7 @@ kfree_skb(skb); return NET_RX_DROP; } + nf_reset(skb); skb_push(skb, skb->data - skb->nh.raw); diff -Nru a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c --- a/net/ipv4/tcp_ipv4.c Mon Feb 16 02:13:39 2004 +++ b/net/ipv4/tcp_ipv4.c Mon Feb 16 02:13:39 2004 @@ -1785,6 +1785,7 @@ if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; + nf_reset(skb); if (sk_filter(sk, skb, 0)) goto discard_and_relse; diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c --- a/net/ipv4/udp.c Mon Feb 16 02:13:39 2004 +++ b/net/ipv4/udp.c Mon Feb 16 02:13:39 2004 @@ -1027,6 +1027,7 @@ kfree_skb(skb); return -1; } + nf_reset(skb); if (up->encap_type) { /* @@ -1192,6 +1193,7 @@ if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) goto drop; + nf_reset(skb); /* No socket. Drop packet silently, if checksum is wrong */ if (udp_checksum_complete(skb)) diff -Nru a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c --- a/net/xfrm/xfrm_policy.c Mon Feb 16 02:13:39 2004 +++ b/net/xfrm/xfrm_policy.c Mon Feb 16 02:13:39 2004 @@ -21,6 +21,7 @@ #include <linux/workqueue.h> #include <linux/notifier.h> #include <linux/netdevice.h> +#include <linux/netfilter.h> #include <net/xfrm.h> #include <net/ip.h> @@ -908,6 +909,7 @@ if (_decode_session(skb, &fl, family) < 0) return 0; + nf_nat_decode_session(skb, &fl, family); /* First, check used SA against their selectors. */ if (skb->sp) { ^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-02-16 1:19 ` Patrick McHardy
@ 2004-02-18 14:57 ` Patrick McHardy
[not found] ` <20040218220337.GA3193@alpha.home.local>
0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-18 14:57 UTC (permalink / raw)
Cc: Harald Welte, Henrik Nordstrom, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]
Here is another patch on top of the last ones. It makes incoming
packets from transport mode SAs visible to netfilter hooks and
emulates normal behaviour. With this patch you can share a
transport-mode SA between an entire network using MASQUERADE, but
also do weird stuff like DNAT packets coming from a transport mode
SA to remote hosts ;)
The patch is not exactly nice, but with it, everything should be
working fine except for nested tunnels. I would really appreciate
it if I could get some comments on this part and the other ones so
I can prepare some final patches to discuss with davem.
Regards
Patrick
Patrick McHardy wrote:
> There are 4 patches:
>
> 01-nf_reset.diff
> Move common nf_conntrack_put/nfct=NULL/nf_debug=0 code to
> new inline function nf_reset.
>
> 02-hooks.diff
> Make packets to be encrypted visible on POST_ROUTING hook
> and encrypted packets on LOCAL_OUT. Reset nfct etc. before
> reposting the packet into the stack on reception.
>
> 03-nat-policy-lookup.diff
> Add policy lookups to ip_route_me_harder and change NAT to
> reroute for any change that affects routing or policy lookups
>
> 04-nat-policy-checks.diff
> Make xfrm_policy_check find correct policy for NATed packets
[-- Attachment #2: 05-transport-mode-hooks.diff --]
[-- Type: text/plain, Size: 2092 bytes --]
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/18 15:42:49+01:00 kaber@trash.net
#   Emulate netfilter hooks for transport mode packets
#
# net/ipv4/xfrm4_input.c
#   2004/02/18 15:42:42+01:00 kaber@trash.net +46 -1
#   Emulate netfilter hooks for transport mode packets
#
diff -Nru a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
--- a/net/ipv4/xfrm4_input.c	Wed Feb 18 15:45:17 2004
+++ b/net/ipv4/xfrm4_input.c	Wed Feb 18 15:45:17 2004
@@ -9,10 +9,53 @@
  *
  */
 
+#include <linux/config.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
 #include <net/xfrm.h>
 
+#ifdef CONFIG_NETFILTER
+static inline int emulate_nf_done(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int emulate_nf_hooks2(struct sk_buff *skb)
+{
+	if (inet_addr_type(skb->nh.iph->daddr) != RTN_LOCAL) {
+		if (ip_route_me_harder(&skb) == 0)
+			dst_input(skb);
+		else
+			kfree_skb(skb);
+		return -1;
+	}
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+		       emulate_nf_done);
+}
+
+static inline int emulate_nf_hooks(struct sk_buff *skb)
+{
+	int off = skb->data - skb->nh.raw;
+
+	skb_push(skb, off);
+	skb->nh.iph->tot_len = htons(skb->len);
+	ip_send_check(skb->nh.iph);
+
+	if (NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+		    emulate_nf_hooks2) != 0)
+		return -1;
+
+	skb_pull(skb, off);
+	return 0;
+}
+#else /* CONFIG_NETFILTER */
+static inline int emulate_nf_hooks(struct sk_buff *skb) { return 0; }
+#endif /* CONFIG_NETFILTER */
+
 int xfrm4_rcv(struct sk_buff *skb)
 {
 	return xfrm4_rcv_encap(skb, 0);
@@ -125,15 +168,17 @@
 	memcpy(skb->sp->x+skb->sp->len, xfrm_vec, xfrm_nr*sizeof(struct sec_decap_state));
 	skb->sp->len += xfrm_nr;
 
+	nf_reset(skb);
 	if (decaps) {
 		if (!(skb->dev->flags&IFF_LOOPBACK)) {
 			dst_release(skb->dst);
 			skb->dst = NULL;
 		}
-		nf_reset(skb);
 		netif_rx(skb);
 		return 0;
 	} else {
+		if (emulate_nf_hooks(skb) != 0)
+			return 0;
 		return -skb->nh.iph->protocol;
 	}
^ permalink raw reply [flat|nested] 63+ messages in thread
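Before calling the PRE_ROUTING hook, emulate_nf_hooks() in the patch above pushes the IP header back, rewrites tot_len and calls ip_send_check() so the hook sees a consistent header. The following is only a userspace sketch of that "fix length, then recompute header checksum" step, not the kernel code: the names ip_header_checksum(), fixup_ip_header() and ip_header_valid() are hypothetical, and the header is treated as a plain byte array rather than a struct iphdr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the checksum part of ip_send_check():
 * one's-complement sum over the header, taken as 16-bit big-endian words. */
static uint16_t ip_header_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: store a new total length, then recompute the
 * header checksum, mirroring the tot_len/ip_send_check pair above. */
static void fixup_ip_header(uint8_t *hdr, uint16_t total_len)
{
	size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;	/* header length in bytes */
	uint16_t csum;

	hdr[2] = (uint8_t)(total_len >> 8);
	hdr[3] = (uint8_t)(total_len & 0xff);
	hdr[10] = 0;					/* checksum field must be zero */
	hdr[11] = 0;					/* while the sum is computed   */
	csum = ip_header_checksum(hdr, ihl);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}

/* A header with a correct checksum sums to zero when checked again. */
static int ip_header_valid(const uint8_t *hdr)
{
	size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;

	return ip_header_checksum(hdr, ihl) == 0;
}
```

If the length were changed without recomputing the checksum, ip_header_valid() would fail on the result, which is presumably why the patch calls ip_send_check() before the packet is handed to the hooks.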
[parent not found: <20040218220337.GA3193@alpha.home.local>]
* Re: [PATCH]Re: NAT before IPsec with 2.6
[not found] ` <20040218220337.GA3193@alpha.home.local>
@ 2004-02-20 1:43 ` Patrick McHardy
2004-03-04 22:30 ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-02-20 1:43 UTC (permalink / raw)
To: Willy Tarreau
Cc: Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig, Netfilter Development Mailinglist
Hi Willy,
Willy Tarreau wrote:
>
> Judging by the fact that I saw no reply to your previous mail,
> I suspect that we all are a bit busy. Regarding your previous
> question about a possible asymmetry, I was starting to draw a
> flow diagram to check what I understood correctly and that
> we could use as a base to comment on, but I finally didn't have
> time to work on it anymore. In fact, I see how a packet passes
> through the tables and chains without ipsec, I have a few doubts
> about what changes with ipsec and your patches (eg: I don't
> remember if the decapsulated packet goes through mgl:PRE or not),
> and I'm even less certain about what is known about the sessions
> at different stages. If I find time to come up with a diagram
> (even if it's plain wrong), I'll post it here.
Thanks a lot! Decapsulated packets go the usual way; in fact the
patch doesn't change anything for tunnel mode except that it drops
the conntrack reference before packets are posted into the stack
again. For transport mode packets it's a bit different, and I too
am not entirely sure if the packet is always in a valid state with
the emulate_nf_hooks stuff, especially when NAT-Traversal is used.
I'm investigating this after I fix a bug Michal and a second tester
reported. Regarding hooks passed, packets SNATed in POST_ROUTING
which have a matching policy afterwards won't pass the SELINUX and
CONNTRACK hooks. The mangle table may also cause problems when
something causes rerouting; I haven't thought about the possible
effects yet.
Other than that, I currently can't think of any more problems.
Best regards,
Patrick
>
> Cheers,
> Willy
>
^ permalink raw reply [flat|nested] 63+ messages in thread
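The reply above notes that, for tunnel mode, the patch only drops the conntrack reference before a packet is posted into the stack again; in the patches this is what nf_reset() does. The effect can be illustrated outside the kernel with a small sketch. Everything here is a mock under stated assumptions: struct mock_conntrack, struct mock_skb, mock_conntrack_put() and mock_nf_reset() are hypothetical stand-ins, and a plain int replaces the kernel's atomic reference count.

```c
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry with a use count. */
struct mock_conntrack {
	int use;
};

/* Hypothetical stand-in for the netfilter-related fields of an skb. */
struct mock_skb {
	struct mock_conntrack *nfct;
	unsigned int nf_debug;
};

/* Drop one reference, tolerating a NULL pointer. */
static void mock_conntrack_put(struct mock_conntrack *ct)
{
	if (ct != NULL)
		ct->use--;
}

/* Mirrors the shape of nf_reset(): release the reference, clear the
 * pointer and the debug bits so the packet re-enters the stack "clean". */
static void mock_nf_reset(struct mock_skb *skb)
{
	mock_conntrack_put(skb->nfct);
	skb->nfct = NULL;
	skb->nf_debug = 0;
}
```

Because the pointer is cleared after the put, calling the reset twice on the same packet releases the reference only once, so the decapsulated packet can be tracked as a fresh connection when it traverses the hooks again.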
* [PATCH]: latest netfilter+ipsec patches
2004-02-20 1:43 ` Patrick McHardy
@ 2004-03-04 22:30 ` Patrick McHardy
2004-03-04 23:11 ` Willy Tarreau
` (2 more replies)
0 siblings, 3 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-03-04 22:30 UTC (permalink / raw)
To: Netfilter Development Mailinglist
Cc: Willy Tarreau, Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig, alex, guillaume
[-- Attachment #1: Type: text/plain, Size: 1919 bytes --]
(long CC list, if anyone doesn't want to be on it anymore please
say so)
This is the latest version of the netfilter+ipsec patches. I will
check them in in a couple of days after some minor changes.
(Noteworthy) changes since last version:
- hook-traversal on input is symmetrical to output now. This means
packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
PRE_ROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.
The encrypted packets go through the stack, including the hooks,
the usual way. Packets reposted into the stack (tunnel-mode SAs)
skip the hooks until we know decryption is finished. If after
input-routing one of these packets has a non-local destination,
we know decryption is finished. The function nf_postxfrm_nonlocal(),
which passes them to PRE_ROUTING, is called for these packets.
Packets with local-destination are handled in
ip_local_deliver_finish(): if the ip-protocol handler is not marked
as xfrm_prot (all ipsec-transformers), decryption is finished and
packets go to nf_postxfrm_input(), where they will traverse the
PRE_ROUTING hook, potentially be rerouted or traverse the LOCAL_IN
hook, before being delivered to the ip-protocol handler.
Packets from transport-mode SAs are handled exactly like
local-packets from tunnel-mode SAs.
The only problem with this approach I'm currently aware of is that one of the ip-protocol handlers marked with xfrm_prot (xfrm4_tunnel.c) is also responsible for dispatching packets to ipip.c; the encapsulated packets heading for a ipip-tunnel won't traverse the netfilter hooks on input. - new match "policy" for matching the policy that was used during decapsulation (well, the used SAs, policy checks come later), and the policy that will be used for encapsulation. I had some examples but I accidentally killed X before sending, for now I have just attached the test-script I used, if there are questions just ask. Regards Patrick [-- Attachment #2: 01-nf_reset.diff --] [-- Type: text/x-patch, Size: 5407 bytes --] # This is a BitKeeper generated diff -Nru style patch. # # ChangeSet # 2004/02/22 14:27:46+01:00 kaber@trash.net # Add new function 'nf_reset' to reset netfilter related skb-fields # # net/ipv6/sit.c # 2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14 # Add new function 'nf_reset' to reset netfilter related skb-fields # # net/ipv6/ip6_tunnel.c # 2004/02/22 14:27:39+01:00 kaber@trash.net +1 -7 # Add new function 'nf_reset' to reset netfilter related skb-fields # # net/ipv4/ipip.c # 2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14 # Add new function 'nf_reset' to reset netfilter related skb-fields # # net/ipv4/ip_input.c # 2004/02/22 14:27:39+01:00 kaber@trash.net +1 -5 # Add new function 'nf_reset' to reset netfilter related skb-fields # # net/ipv4/ip_gre.c # 2004/02/22 14:27:39+01:00 kaber@trash.net +2 -14 # Add new function 'nf_reset' to reset netfilter related skb-fields # # include/linux/skbuff.h # 2004/02/22 14:27:39+01:00 kaber@trash.net +12 -3 # Add new function 'nf_reset' to reset netfilter related skb-fields # diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h --- a/include/linux/skbuff.h Sun Feb 22 14:35:04 2004 +++ b/include/linux/skbuff.h Sun Feb 22 14:35:04 2004 @@ -1204,6 +1204,14 @@ if (nfct) atomic_inc(&nfct->master->use); } +static 
inline void nf_reset(struct sk_buff *skb) +{ + nf_conntrack_put(skb->nfct); + skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif +} #ifdef CONFIG_BRIDGE_NETFILTER static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge) @@ -1216,9 +1224,10 @@ if (nf_bridge) atomic_inc(&nf_bridge->use); } -#endif - -#endif +#endif /* CONFIG_BRIDGE_NETFILTER */ +#else /* CONFIG_NETFILTER */ +static inline void nf_reset(struct sk_buff *skb) {} +#endif /* CONFIG_NETFILTER */ #endif /* __KERNEL__ */ #endif /* _LINUX_SKBUFF_H */ diff -Nru a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c --- a/net/ipv4/ip_gre.c Sun Feb 22 14:35:04 2004 +++ b/net/ipv4/ip_gre.c Sun Feb 22 14:35:04 2004 @@ -643,13 +643,7 @@ skb->dev = tunnel->dev; dst_release(skb->dst); skb->dst = NULL; -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); ipgre_ecn_decapsulate(iph, skb); netif_rx(skb); read_unlock(&ipgre_lock); @@ -877,13 +871,7 @@ } } -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); IPTUNNEL_XMIT(); tunnel->recursion--; diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c --- a/net/ipv4/ip_input.c Sun Feb 22 14:35:04 2004 +++ b/net/ipv4/ip_input.c Sun Feb 22 14:35:04 2004 @@ -202,17 +202,13 @@ #ifdef CONFIG_NETFILTER_DEBUG nf_debug_ip_local_deliver(skb); - skb->nf_debug = 0; #endif /*CONFIG_NETFILTER_DEBUG*/ __skb_pull(skb, ihl); -#ifdef CONFIG_NETFILTER /* Free reference early: we don't need it any more, and it may hold ip_conntrack module loaded indefinitely. */ - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#endif /*CONFIG_NETFILTER*/ + nf_reset(skb); /* Point into the IP datagram, just past the header. 
*/ skb->h.raw = skb->data; diff -Nru a/net/ipv4/ipip.c b/net/ipv4/ipip.c --- a/net/ipv4/ipip.c Sun Feb 22 14:35:04 2004 +++ b/net/ipv4/ipip.c Sun Feb 22 14:35:04 2004 @@ -496,13 +496,7 @@ skb->dev = tunnel->dev; dst_release(skb->dst); skb->dst = NULL; -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); ipip_ecn_decapsulate(iph, skb); netif_rx(skb); read_unlock(&ipip_lock); @@ -647,13 +641,7 @@ if ((iph->ttl = tiph->ttl) == 0) iph->ttl = old_iph->ttl; -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); IPTUNNEL_XMIT(); tunnel->recursion--; diff -Nru a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c --- a/net/ipv6/ip6_tunnel.c Sun Feb 22 14:35:04 2004 +++ b/net/ipv6/ip6_tunnel.c Sun Feb 22 14:35:04 2004 @@ -715,13 +715,7 @@ ipv6h->nexthdr = proto; ipv6_addr_copy(&ipv6h->saddr, &fl.fl6_src); ipv6_addr_copy(&ipv6h->daddr, &fl.fl6_dst); -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); pkt_len = skb->len; err = NF_HOOK(PF_INET6, NF_IP6_LOCAL_OUT, skb, NULL, skb->dst->dev, dst_output); diff -Nru a/net/ipv6/sit.c b/net/ipv6/sit.c --- a/net/ipv6/sit.c Sun Feb 22 14:35:04 2004 +++ b/net/ipv6/sit.c Sun Feb 22 14:35:04 2004 @@ -388,13 +388,7 @@ skb->dev = tunnel->dev; dst_release(skb->dst); skb->dst = NULL; -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif -#endif + nf_reset(skb); ipip6_ecn_decapsulate(iph, skb); netif_rx(skb); read_unlock(&ipip6_lock); @@ -580,13 +574,7 @@ if ((iph->ttl = tiph->ttl) == 0) iph->ttl = iph6->hop_limit; -#ifdef CONFIG_NETFILTER - nf_conntrack_put(skb->nfct); - skb->nfct = NULL; -#ifdef CONFIG_NETFILTER_DEBUG - skb->nf_debug = 0; -#endif 
-#endif
+	nf_reset(skb);

 	IPTUNNEL_XMIT();
 	tunnel->recursion--;

[-- Attachment #3: 02-output_hooks.diff --]
[-- Type: text/x-patch, Size: 6927 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:29:05+01:00 kaber@trash.net
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/xfrm4_tunnel.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/ipcomp.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/ip_output.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +20 -4
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/ip_forward.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +2 -1
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/esp4.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# net/ipv4/ah4.c
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# include/net/ip.h
#   2004/02/22 14:28:58+01:00 kaber@trash.net +1 -0
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
# include/linux/netfilter.h
#   2004/02/22 14:28:58+01:00 kaber@trash.net +9 -4
#   Pass packets to POST_ROUTING hook before encryption and LOCAL_OUT afterwards
#
diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h	Sun Feb 22 14:35:13 2004
+++ b/include/linux/netfilter.h	Sun Feb 22 14:35:13 2004
@@ -119,12 +119,14 @@
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)	\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
+#define NF_HOOK_COND(pf, hook, skb, indev, outdev, okfn, cond)	\
+(!(cond)	\
+ ? (okfn)(skb)	\
+ : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
 #define NF_HOOK_THRESH nf_hook_slow
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)	\
-(list_empty(&nf_hooks[(pf)][(hook)])	\
+#define NF_HOOK_COND(pf, hook, skb, indev, outdev, okfn, cond)	\
+(!(cond) || list_empty(&nf_hooks[(pf)][(hook)])	\
  ? (okfn)(skb)	\
  : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
 #define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
@@ -132,6 +134,8 @@
  ? (okfn)(skb)	\
  : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
 #endif
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)	\
+	NF_HOOK_COND((pf), (hook), (skb), (indev), (outdev), (okfn), 1)

 int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
 		 struct net_device *indev, struct net_device *outdev,
@@ -164,6 +168,7 @@

 #else /* !CONFIG_NETFILTER */
 #define NF_HOOK(pf, hook, skb, indev, outdev, okfn) (okfn)(skb)
+#define NF_HOOK_COND NF_HOOK
 #endif /*CONFIG_NETFILTER*/

 #endif /*__KERNEL__*/
diff -Nru a/include/net/ip.h b/include/net/ip.h
--- a/include/net/ip.h	Sun Feb 22 14:35:13 2004
+++ b/include/net/ip.h	Sun Feb 22 14:35:13 2004
@@ -48,6 +48,7 @@
 #define IPSKB_TRANSLATED	2
 #define IPSKB_FORWARDED		4
 #define IPSKB_XFRM_TUNNEL_SIZE	8
+#define IPSKB_XFRM_TRANSFORMED	16
 };

 struct ipcm_cookie
diff -Nru a/net/ipv4/ah4.c b/net/ipv4/ah4.c
--- a/net/ipv4/ah4.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ah4.c	Sun Feb 22 14:35:13 2004
@@ -145,6 +145,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;

 error:
diff -Nru a/net/ipv4/esp4.c b/net/ipv4/esp4.c
--- a/net/ipv4/esp4.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/esp4.c	Sun Feb 22 14:35:13 2004
@@ -199,6 +199,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;

 error:
diff -Nru a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
--- a/net/ipv4/ip_forward.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ip_forward.c	Sun Feb 22 14:35:13 2004
@@ -51,7 +51,8 @@
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);

-	return dst_output(skb);
+	return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+	                    skb->dst->dev, dst_output, skb->dst->xfrm != NULL);
 }

 int ip_forward(struct sk_buff *skb)
diff -Nru a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
--- a/net/ipv4/ip_output.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ip_output.c	Sun Feb 22 14:35:13 2004
@@ -122,6 +122,12 @@
 	return ttl;
 }

+static inline int ip_dst_output(struct sk_buff *skb)
+{
+	return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL,
+	                    skb->dst->dev, dst_output, skb->dst->xfrm != NULL);
+}
+
 /*
  *		Add an ip header to a skbuff and send it out.
  *
@@ -164,7 +170,7 @@

 	/* Send it out. */
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);
 }

 static inline int ip_finish_output2(struct sk_buff *skb)
@@ -282,7 +288,7 @@
 		return ip_finish_output(skb);
 }

-int ip_output(struct sk_buff *skb)
+static inline int ip_output2(struct sk_buff *skb)
 {
 	IP_INC_STATS(IpOutRequests);

@@ -293,6 +299,16 @@
 		return ip_finish_output(skb);
 }

+int ip_output(struct sk_buff *skb)
+{
+	int transformed = IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED;
+
+	if (transformed)
+		nf_reset(skb);
+	return NF_HOOK_COND(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
+	                    skb->dst->dev, ip_output2, transformed);
+}
+
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
 	struct sock *sk = skb->sk;
@@ -386,7 +402,7 @@
 	skb->priority = sk->sk_priority;

 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       dst_output);
+		       ip_dst_output);

 no_route:
 	IP_INC_STATS(IpOutNoRoutes);
@@ -1165,7 +1181,7 @@

 	/* Netfilter gets whole the not fragmented skb. */
 	err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
-		      skb->dst->dev, dst_output);
+		      skb->dst->dev, ip_dst_output);
 	if (err) {
 		if (err > 0)
 			err = inet->recverr ? net_xmit_errno(err) : 0;
diff -Nru a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
--- a/net/ipv4/ipcomp.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/ipcomp.c	Sun Feb 22 14:35:13 2004
@@ -231,6 +231,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	err = NET_XMIT_BYPASS;

 out_exit:
diff -Nru a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
--- a/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:35:13 2004
+++ b/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:35:13 2004
@@ -76,6 +76,7 @@
 		err = -EHOSTUNREACH;
 		goto error_nolock;
 	}
+	IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
 	return NET_XMIT_BYPASS;

 error_nolock:

[-- Attachment #4: 03-input-hooks.diff --]
[-- Type: text/x-patch, Size: 6024 bytes --]

===== include/linux/netfilter_ipv4.h 1.6 vs edited =====
--- 1.6/include/linux/netfilter_ipv4.h	Wed Jan 7 06:38:33 2004
+++ edited/include/linux/netfilter_ipv4.h	Sun Feb 22 19:25:19 2004
@@ -83,6 +83,21 @@
    Returns true or false. */
 extern int skb_ip_make_writable(struct sk_buff **pskb,
 				unsigned int writable_len);
+
+#ifdef CONFIG_XFRM
+extern int nf_postxfrm_input(struct sk_buff *skb);
+extern int nf_postxfrm_nonlocal(struct sk_buff *skb);
+#else /* CONFIG_XFRM */
+static inline int nf_postxfrm_input(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int nf_postxfrm_nonlocal(struct sk_buff *skb)
+{
+	return 0;
+}
+#endif /* CONFIG_XFRM */
 #endif /*__KERNEL__*/

 #endif /*__LINUX_IP_NETFILTER_H*/
===== include/net/protocol.h 1.10 vs edited =====
--- 1.10/include/net/protocol.h	Sat May 10 14:25:34 2003
+++ edited/include/net/protocol.h	Sun Feb 22 19:25:08 2004
@@ -39,6 +39,7 @@
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			no_policy;
+	int			xfrm_prot;
 };

 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
===== net/core/netfilter.c 1.26 vs edited =====
--- 1.26/net/core/netfilter.c	Sun Sep 28 18:34:18 2003
+++ edited/net/core/netfilter.c	Sun Feb 22 19:25:19 2004
@@ -11,6 +11,7 @@
  */
 #include <linux/config.h>
 #include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
 #include <net/protocol.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
@@ -25,6 +26,7 @@
 #include <linux/icmp.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/xfrm.h>
 #include <linux/ip.h>

 #define __KERNEL_SYSCALLS__
@@ -628,9 +630,6 @@
 	struct dst_entry *odst;
 	unsigned int hh_len;

-	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
-	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
-	 */
 	if (inet_addr_type(iph->saddr) == RTN_LOCAL) {
 		fl.nl_u.ip4_u.daddr = iph->daddr;
 		fl.nl_u.ip4_u.saddr = iph->saddr;
@@ -681,6 +680,54 @@

 	return 0;
 }
+
+#ifdef CONFIG_XFRM
+static inline int nf_postxfrm_done(struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline int nf_postxfrm_input2(struct sk_buff *skb)
+{
+	if (inet_addr_type(skb->nh.iph->daddr) != RTN_LOCAL) {
+		if (ip_route_me_harder(&skb) == 0)
+			dst_input(skb);
+		else
+			kfree_skb(skb);
+		return -1;
+	}
+
+	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+	               nf_postxfrm_done);
+}
+
+int nf_postxfrm_input(struct sk_buff *skb)
+{
+	int off = skb->data - skb->nh.raw;
+
+	__skb_push(skb, off);
+	/* Fix header len and checksum if last xfrm was transport mode */
+	if (!skb->sp->x[skb->sp->len - 1].xvec->props.mode) {
+		skb->nh.iph->tot_len = htons(skb->len);
+		ip_send_check(skb->nh.iph);
+	}
+
+	nf_reset(skb);
+	if (NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+	            nf_postxfrm_input2) != 0)
+		return -1;
+
+	__skb_pull(skb, off);
+	return 0;
+}
+
+int nf_postxfrm_nonlocal(struct sk_buff *skb)
+{
+	nf_reset(skb);
+	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
+	               nf_postxfrm_done);
+}
+#endif /* CONFIG_XFRM */

 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
 {
===== net/ipv4/ah4.c 1.30 vs edited =====
--- 1.30/net/ipv4/ah4.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/ah4.c	Sun Feb 22 19:25:08 2004
@@ -344,6 +344,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	ah4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };

 static int __init ah4_init(void)
===== net/ipv4/esp4.c 1.36 vs edited =====
--- 1.36/net/ipv4/esp4.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/esp4.c	Sun Feb 22 19:25:09 2004
@@ -575,6 +575,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	esp4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };

 static int __init esp4_init(void)
===== net/ipv4/ip_input.c 1.21 vs edited =====
--- 1.21/net/ipv4/ip_input.c	Sun Feb 22 14:27:39 2004
+++ edited/net/ipv4/ip_input.c	Sun Feb 22 21:20:47 2004
@@ -224,6 +224,12 @@
 	resubmit:
 		hash = protocol & (MAX_INET_PROTOS - 1);
 		raw_sk = sk_head(&raw_v4_htable[hash]);
+		ipprot = inet_protos[hash];
+		smp_read_barrier_depends();
+
+		if (skb->sp && !ipprot->xfrm_prot)
+			if (nf_postxfrm_input(skb))
+				goto out;

 		/* If there maybe a raw socket we must check - if not we
 		 * don't care less
@@ -231,10 +237,9 @@
 		if (raw_sk)
 			raw_v4_input(skb, skb->nh.iph, hash);

-		if ((ipprot = inet_protos[hash]) != NULL) {
+		if (ipprot != NULL) {
 			int ret;

-			smp_read_barrier_depends();
 			if (!ipprot->no_policy &&
 			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 				kfree_skb(skb);
@@ -279,8 +284,8 @@
 		return 0;
 	}

-	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
-		       ip_local_deliver_finish);
+	return NF_HOOK_COND(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
+	                    ip_local_deliver_finish, !skb->sp);
 }

 static inline int ip_rcv_finish(struct sk_buff *skb)
@@ -346,6 +351,10 @@
 		}
 	}

+	if (skb->sp && !(((struct rtable *)skb->dst)->rt_flags&RTCF_LOCAL))
+		if (nf_postxfrm_nonlocal(skb))
+			goto drop;
+
 	return dst_input(skb);

 inhdr_error:
@@ -418,8 +427,8 @@
 		}
 	}

-	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
-		       ip_rcv_finish);
+	return NF_HOOK_COND(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
+	                    ip_rcv_finish, !skb->sp);

 inhdr_error:
 	IP_INC_STATS_BH(IpInHdrErrors);
===== net/ipv4/ipcomp.c 1.19 vs edited =====
--- 1.19/net/ipv4/ipcomp.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/ipcomp.c	Sun Feb 22 19:25:09 2004
@@ -408,6 +408,7 @@
 	.handler	=	xfrm4_rcv,
 	.err_handler	=	ipcomp4_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };

 static int __init ipcomp4_init(void)
===== net/ipv4/xfrm4_tunnel.c 1.10 vs edited =====
--- 1.10/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 14:28:58 2004
+++ edited/net/ipv4/xfrm4_tunnel.c	Sun Feb 22 19:25:10 2004
@@ -171,6 +171,7 @@
 	.handler	=	ipip_rcv,
 	.err_handler	=	ipip_err,
 	.no_policy	=	1,
+	.xfrm_prot	=	1,
 };

 static int __init ipip_init(void)

[-- Attachment #5: 04-nat-policy_lookup.diff --]
[-- Type: text/x-patch, Size: 5352 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:31:33+01:00 kaber@trash.net
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
#
# net/ipv4/netfilter/ip_nat_standalone.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +72 -8
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
#
# net/ipv4/netfilter/ip_conntrack_standalone.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +1 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
#
# net/core/netfilter.c
#   2004/02/22 14:31:26+01:00 kaber@trash.net +15 -0
#   Add policy lookups to ip_route_me_harder, make NAT reroute for any change
#   in route/policy key
#
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Sun Feb 22 14:35:30 2004
+++ b/net/core/netfilter.c	Sun Feb 22 14:35:30 2004
@@ -27,6 +27,7 @@
 #include <net/sock.h>
 #include <net/route.h>
 #include <net/xfrm.h>
+#include <net/ip.h>
 #include <linux/ip.h>

 #define __KERNEL_SYSCALLS__
@@ -663,6 +664,20 @@

 	if ((*pskb)->dst->error)
 		return -1;
+
+#ifdef CONFIG_XFRM
+	if (!(IPCB(*pskb)->flags & IPSKB_XFRM_TRANSFORMED)) {
+		struct xfrm_policy_afinfo *afinfo;
+
+		afinfo = xfrm_policy_get_afinfo(AF_INET);
+		if (afinfo != NULL) {
+			afinfo->decode_session(*pskb, &fl);
+			xfrm_policy_put_afinfo(afinfo);
+			if (xfrm_lookup(&(*pskb)->dst, &fl, (*pskb)->sk, 0) != 0)
+				return -1;
+		}
+	}
+#endif

 	/* Change in oif may mean change in hh_len. */
 	hh_len = (*pskb)->dst->dev->hard_header_len;
diff -Nru a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c	Sun Feb 22 14:35:30 2004
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c	Sun Feb 22 14:35:30 2004
@@ -489,6 +489,7 @@
 EXPORT_SYMBOL(ip_conntrack_alter_reply);
 EXPORT_SYMBOL(ip_conntrack_destroyed);
 EXPORT_SYMBOL(ip_conntrack_get);
+EXPORT_SYMBOL(__ip_conntrack_confirm);
 EXPORT_SYMBOL(need_ip_conntrack);
 EXPORT_SYMBOL(ip_conntrack_helper_register);
 EXPORT_SYMBOL(ip_conntrack_helper_unregister);
diff -Nru a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb 22 14:35:30 2004
+++ b/net/ipv4/netfilter/ip_nat_standalone.c	Sun Feb 22 14:35:30 2004
@@ -166,6 +166,45 @@
 	return do_bindings(ct, ctinfo, info, hooknum, pskb);
 }

+struct nat_route_key
+{
+	u_int32_t addr;
+#ifdef CONFIG_XFRM
+	u_int16_t port;
+#endif
+};
+
+static inline void
+nat_route_key_get(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	key->addr = which ? iph->daddr : iph->saddr;
+#ifdef CONFIG_XFRM
+	key->port = 0;
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		key->port = ports[which];
+	}
+#endif
+}
+
+static inline int
+nat_route_key_compare(struct sk_buff *skb, struct nat_route_key *key, int which)
+{
+	struct iphdr *iph = skb->nh.iph;
+
+	if (key->addr != (which ? iph->daddr : iph->saddr))
+		return 1;
+#ifdef CONFIG_XFRM
+	if (iph->protocol == IPPROTO_TCP || iph->protocol == IPPROTO_UDP) {
+		u_int16_t *ports = (u_int16_t *)(skb->nh.raw + iph->ihl*4);
+		if (key->port != ports[which])
+			return 1;
+	}
+#endif
+}
+
 static unsigned int
 ip_nat_out(unsigned int hooknum,
 	   struct sk_buff **pskb,
@@ -173,6 +212,9 @@
 	   const struct net_device *out,
 	   int (*okfn)(struct sk_buff *))
 {
+	struct nat_route_key key;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
@@ -195,7 +237,29 @@
 			return NF_STOLEN;
 	}

-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	nat_route_key_get(*pskb, &key, 0);
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && nat_route_key_compare(*pskb, &key, 0)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+#ifdef CONFIG_XFRM
+		/*
+		 * POST_ROUTING hook is called with fixed outfn, we need
+		 * to manually confirm the packet and direct it to the
+		 * transformers if a policy matches.
+		 */
+		else if ((*pskb)->dst->xfrm != NULL) {
+			ret = ip_conntrack_confirm(*pskb);
+			if (ret != NF_DROP) {
+				dst_output(*pskb);
+				ret = NF_STOLEN;
+			}
+		}
+#endif
+	}
+	return ret;
 }

 #ifdef CONFIG_IP_NF_NAT_LOCAL
@@ -206,7 +270,7 @@
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
-	u_int32_t saddr, daddr;
+	struct nat_route_key key;
 	unsigned int ret;

 	/* root is playing with raw sockets. */
@@ -214,14 +278,14 @@
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;

-	saddr = (*pskb)->nh.iph->saddr;
-	daddr = (*pskb)->nh.iph->daddr;
-
+	nat_route_key_get(*pskb, &key, 1);
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+
 	if (ret != NF_DROP && ret != NF_STOLEN
-	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+	    && nat_route_key_compare(*pskb, &key, 1)) {
+		if (ip_route_me_harder(pskb) != 0)
+			ret = NF_DROP;
+	}
 	return ret;
 }
 #endif

[-- Attachment #6: 05-nat-policy_checks.diff --]
[-- Type: text/x-patch, Size: 5812 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/02/22 14:33:14+01:00 kaber@trash.net
#   Make policy checks find correct policy after NAT
#
# net/xfrm/xfrm_policy.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +2 -0
#   Make policy checks find correct policy after NAT
#
# net/ipv4/udp.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +2 -0
#   Make policy checks find correct policy after NAT
#
# net/ipv4/tcp_ipv4.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +1 -0
#   Make policy checks find correct policy after NAT
#
# net/ipv4/raw.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +1 -0
#   Make policy checks find correct policy after NAT
#
# net/ipv4/ip_input.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +6 -8
#   Make policy checks find correct policy after NAT
#
# net/core/netfilter.c
#   2004/02/22 14:33:07+01:00 kaber@trash.net +43 -0
#   Make policy checks find correct policy after NAT
#
# include/linux/netfilter.h
#   2004/02/22 14:33:07+01:00 kaber@trash.net +16 -0
#   Make policy checks find correct policy after NAT
#
diff -Nru a/include/linux/netfilter.h b/include/linux/netfilter.h
--- a/include/linux/netfilter.h	Sun Feb 22 14:35:36 2004
+++ b/include/linux/netfilter.h	Sun Feb 22 14:35:36 2004
@@ -171,5 +171,21 @@
 #define NF_HOOK_COND NF_HOOK
 #endif /*CONFIG_NETFILTER*/

+#ifdef CONFIG_XFRM
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+struct flowi;
+extern void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl);
+
+static inline void
+nf_nat_decode_session(struct sk_buff *skb, struct flowi *fl, int family)
+{
+	if (family == AF_INET)
+		nf_nat_decode_session4(skb, fl);
+}
+#else /* CONFIG_IP_NF_NAT_NEEDED */
+#define nf_nat_decode_session(skb,fl,family)
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
+#endif /* CONFIG_XFRM */
+
 #endif /*__KERNEL__*/
 #endif /*__LINUX_NETFILTER_H*/
diff -Nru a/net/core/netfilter.c b/net/core/netfilter.c
--- a/net/core/netfilter.c	Sun Feb 22 14:35:36 2004
+++ b/net/core/netfilter.c	Sun Feb 22 14:35:36 2004
@@ -747,6 +747,49 @@
 	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, skb->dev, NULL,
 	               nf_postxfrm_done);
 }
+
+#ifdef CONFIG_IP_NF_NAT_NEEDED
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+#include <linux/netfilter_ipv4/ip_nat.h>
+
+void nf_nat_decode_session4(struct sk_buff *skb, struct flowi *fl)
+{
+	struct ip_conntrack *ct;
+	struct ip_conntrack_tuple *t;
+	struct ip_nat_info_manip *m;
+	unsigned int i;
+
+	if (skb->nfct == NULL)
+		return;
+	ct = (struct ip_conntrack *)skb->nfct->master;
+
+	for (i = 0; i < ct->nat.info.num_manips; i++) {
+		m = &ct->nat.info.manips[i];
+		t = &ct->tuplehash[m->direction].tuple;
+
+		switch (m->hooknum) {
+		case NF_IP_PRE_ROUTING:
+			if (m->maniptype != IP_NAT_MANIP_DST)
+				break;
+			fl->fl4_dst = t->dst.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_dport = t->dst.u.tcp.port;
+			break;
+#ifdef CONFIG_IP_NF_NAT_LOCAL
+		case NF_IP_LOCAL_IN:
+			if (m->maniptype != IP_NAT_MANIP_SRC)
+				break;
+			fl->fl4_src = t->src.ip;
+			if (t->dst.protonum == IPPROTO_TCP ||
+			    t->dst.protonum == IPPROTO_UDP)
+				fl->fl_ip_sport = t->src.u.tcp.port;
+			break;
+#endif
+		}
+	}
+}
+#endif /* CONFIG_IP_NF_NAT_NEEDED */
 #endif /* CONFIG_XFRM */

 int skb_ip_make_writable(struct sk_buff **pskb, unsigned int writable_len)
diff -Nru a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
--- a/net/ipv4/ip_input.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/ip_input.c	Sun Feb 22 14:35:36 2004
@@ -206,10 +206,6 @@

 	__skb_pull(skb, ihl);

-	/* Free reference early: we don't need it any more, and it may
-	   hold ip_conntrack module loaded indefinitely. */
-	nf_reset(skb);
-
 	/* Point into the IP datagram, just past the header. */
 	skb->h.raw = skb->data;

@@ -240,10 +236,12 @@
 		if (ipprot != NULL) {
 			int ret;

-			if (!ipprot->no_policy &&
-			    !xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-				kfree_skb(skb);
-				goto out;
+			if (!ipprot->no_policy) {
+				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+					kfree_skb(skb);
+					goto out;
+				}
+				nf_reset(skb);
 			}
 			ret = ipprot->handler(skb);
 			if (ret < 0) {
diff -Nru a/net/ipv4/raw.c b/net/ipv4/raw.c
--- a/net/ipv4/raw.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/raw.c	Sun Feb 22 14:35:36 2004
@@ -249,6 +249,7 @@
 		kfree_skb(skb);
 		return NET_RX_DROP;
 	}
+	nf_reset(skb);

 	skb_push(skb, skb->data - skb->nh.raw);

diff -Nru a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/tcp_ipv4.c	Sun Feb 22 14:35:36 2004
@@ -1785,6 +1785,7 @@

 	if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
+	nf_reset(skb);

 	if (sk_filter(sk, skb, 0))
 		goto discard_and_relse;
diff -Nru a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	Sun Feb 22 14:35:36 2004
+++ b/net/ipv4/udp.c	Sun Feb 22 14:35:36 2004
@@ -1027,6 +1027,7 @@
 		kfree_skb(skb);
 		return -1;
 	}
+	nf_reset(skb);

 	if (up->encap_type) {
 		/*
@@ -1192,6 +1193,7 @@

 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto drop;
+	nf_reset(skb);

 	/* No socket. Drop packet silently, if checksum is wrong */
 	if (udp_checksum_complete(skb))
diff -Nru a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
--- a/net/xfrm/xfrm_policy.c	Sun Feb 22 14:35:36 2004
+++ b/net/xfrm/xfrm_policy.c	Sun Feb 22 14:35:36 2004
@@ -21,6 +21,7 @@
 #include <linux/workqueue.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/netfilter.h>
 #include <net/xfrm.h>
 #include <net/ip.h>

@@ -908,6 +909,7 @@

 	if (_decode_session(skb, &fl, family) < 0)
 		return 0;
+	nf_nat_decode_session(skb, &fl, family);

 	/* First, check used SA against their selectors. */
 	if (skb->sp) {

[-- Attachment #7: 06-policy-match.diff --]
[-- Type: text/x-patch, Size: 7715 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/03/02 06:08:55+01:00 kaber@trash.net
#   Add policy match
#
# net/ipv4/netfilter/ipt_policy.c
#   2004/03/02 06:08:46+01:00 kaber@trash.net +176 -0
#
# include/linux/netfilter_ipv4/ipt_policy.h
#   2004/03/02 06:08:46+01:00 kaber@trash.net +52 -0
#
# net/ipv4/netfilter/ipt_policy.c
#   2004/03/02 06:08:46+01:00 kaber@trash.net +0 -0
#   BitKeeper file /home/kaber/src/nf/ipsec/linux-2.6/net/ipv4/netfilter/ipt_policy.c
#
# net/ipv4/netfilter/Makefile
#   2004/03/02 06:08:46+01:00 kaber@trash.net +1 -0
#   Add policy match
#
# net/ipv4/netfilter/Kconfig
#   2004/03/02 06:08:46+01:00 kaber@trash.net +10 -0
#   Add policy match
#
# include/linux/netfilter_ipv4/ipt_policy.h
#   2004/03/02 06:08:46+01:00 kaber@trash.net +0 -0
#   BitKeeper file /home/kaber/src/nf/ipsec/linux-2.6/include/linux/netfilter_ipv4/ipt_policy.h
#
diff -Nru a/include/linux/netfilter_ipv4/ipt_policy.h b/include/linux/netfilter_ipv4/ipt_policy.h
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/include/linux/netfilter_ipv4/ipt_policy.h	Tue Mar 2 06:53:14 2004
@@ -0,0 +1,52 @@
+#ifndef _IPT_POLICY_H
+#define _IPT_POLICY_H
+
+#define POLICY_MAX_ELEM	4
+
+enum ipt_policy_flags
+{
+	POLICY_MATCH_IN		= 0x1,
+	POLICY_MATCH_OUT	= 0x2,
+	POLICY_MATCH_NONE	= 0x4,
+	POLICY_MATCH_STRICT	= 0x8,
+};
+
+enum ipt_policy_modes
+{
+	POLICY_MODE_TRANSPORT,
+	POLICY_MODE_TUNNEL
+};
+
+struct ipt_policy_spec
+{
+	u_int8_t	saddr:1,
+			daddr:1,
+			proto:1,
+			mode:1,
+			spi:1,
+			reqid:1;
+};
+
+struct ipt_policy_elem
+{
+	u_int32_t	saddr;
+	u_int32_t	smask;
+	u_int32_t	daddr;
+	u_int32_t	dmask;
+	u_int32_t	spi;
+	u_int32_t	reqid;
+	u_int8_t	proto;
+	u_int8_t	mode;
+
+	struct ipt_policy_spec	match;
+	struct ipt_policy_spec	invert;
+};
+
+struct ipt_policy_info
+{
+	struct ipt_policy_elem pol[POLICY_MAX_ELEM];
+	u_int16_t flags;
+	u_int16_t len;
+};
+
+#endif /* _IPT_POLICY_H */
diff -Nru a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
--- a/net/ipv4/netfilter/Kconfig	Tue Mar 2 06:53:14 2004
+++ b/net/ipv4/netfilter/Kconfig	Tue Mar 2 06:53:14 2004
@@ -127,6 +127,16 @@

 	  To compile it as a module, choose M here. If unsure, say N.

+config IP_NF_MATCH_POLICY
+	tristate "IPsec policy match support"
+	depends on IP_NF_IPTABLES && XFRM
+	help
+	  Policy matching allows you to match packets based on the
+	  IPsec policy that was used during decapsulation/will
+	  be used during encapsulation.
+
+	  To compile it as a module, choose M here. If unsure, say N.
+
 config IP_NF_MATCH_MARK
 	tristate "netfilter MARK match support"
 	depends on IP_NF_IPTABLES
diff -Nru a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
--- a/net/ipv4/netfilter/Makefile	Tue Mar 2 06:53:14 2004
+++ b/net/ipv4/netfilter/Makefile	Tue Mar 2 06:53:14 2004
@@ -65,6 +65,7 @@
 obj-$(CONFIG_IP_NF_MATCH_TCPMSS) += ipt_tcpmss.o

 obj-$(CONFIG_IP_NF_MATCH_PHYSDEV) += ipt_physdev.o
+obj-$(CONFIG_IP_NF_MATCH_POLICY) += ipt_policy.o

 # targets
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
diff -Nru a/net/ipv4/netfilter/ipt_policy.c b/net/ipv4/netfilter/ipt_policy.c
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/net/ipv4/netfilter/ipt_policy.c	Tue Mar 2 06:53:14 2004
@@ -0,0 +1,176 @@
+/* IP tables module for matching IPsec policy
+ *
+ * Copyright (c) 2004 Patrick McHardy, <kaber@trash.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/xfrm.h>
+
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv4/ipt_policy.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>");
+MODULE_DESCRIPTION("IPtables IPsec policy matching module");
+MODULE_LICENSE("GPL");
+
+
+static inline int
+match_xfrm_state(struct xfrm_state *x, const struct ipt_policy_elem *e)
+{
+#define MISMATCH(x,y)	(e->match.x && ((e->x != (y)) ^ e->invert.x))
+
+	if (MISMATCH(saddr, x->props.saddr.a4 & e->smask) ||
+	    MISMATCH(daddr, x->id.daddr.a4 & e->dmask) ||
+	    MISMATCH(proto, x->id.proto) ||
+	    MISMATCH(mode, x->props.mode) ||
+	    MISMATCH(spi, x->id.spi) ||
+	    MISMATCH(reqid, x->props.reqid))
+		return 0;
+	return 1;
+}
+
+static int
+match_policy_in(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct sec_path *sp = skb->sp;
+	int strict = info->flags & POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (sp == NULL)
+		return -1;
+	if (strict && info->len != sp->len)
+		return 0;
+
+	for (i = sp->len - 1; i >= 0; i--) {
+		pos = strict ? i - sp->len + 1 : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(sp->x[i].xvec, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int
+match_policy_out(const struct sk_buff *skb, const struct ipt_policy_info *info)
+{
+	const struct ipt_policy_elem *e;
+	struct dst_entry *dst = skb->dst;
+	int strict = info->flags & POLICY_MATCH_STRICT;
+	int i, pos;
+
+	if (dst->xfrm == NULL)
+		return -1;
+
+	for (i = 0; dst && dst->xfrm; dst = dst->child, i++) {
+		pos = strict ? i : 0;
+		if (pos >= info->len)
+			return 0;
+		e = &info->pol[pos];
+
+		if (match_xfrm_state(dst->xfrm, e)) {
+			if (!strict)
+				return 1;
+		} else if (strict)
+			return 0;
+	}
+
+	return strict ? 1 : 0;
+}
+
+static int match(const struct sk_buff *skb,
+                 const struct net_device *in,
+                 const struct net_device *out,
+                 const void *matchinfo, int offset, int *hotdrop)
+{
+	const struct ipt_policy_info *info = matchinfo;
+	int ret;
+
+	if (info->flags & POLICY_MATCH_IN)
+		ret = match_policy_in(skb, info);
+	else
+		ret = match_policy_out(skb, info);
+
+	if (ret < 0) {
+		if (info->flags & POLICY_MATCH_NONE)
+			ret = 1;
+		else
+			ret = 0;
+	} else if (info->flags & POLICY_MATCH_NONE)
+		ret = 0;
+
+	return ret;
+}
+
+static int checkentry(const char *tablename, const struct ipt_ip *ip,
+                      void *matchinfo, unsigned int matchsize,
+                      unsigned int hook_mask)
+{
+	struct ipt_policy_info *info = matchinfo;
+
+	if (matchsize != IPT_ALIGN(sizeof(*info))) {
+		printk(KERN_ERR "ipt_policy: matchsize %u != %u\n",
+		       matchsize, IPT_ALIGN(sizeof(*info)));
+		return 0;
+	}
+	if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))) {
+		printk(KERN_ERR "ipt_policy: neither incoming nor "
+		                "outgoing policy selected\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_PRE_ROUTING | 1 << NF_IP_LOCAL_IN)
+	    && info->flags & POLICY_MATCH_OUT) {
+		printk(KERN_ERR "ipt_policy: output policy not valid in "
+		                "PRE_ROUTING and INPUT\n");
+		return 0;
+	}
+	if (hook_mask & (1 << NF_IP_POST_ROUTING | 1 << NF_IP_LOCAL_OUT)
+	    && info->flags & POLICY_MATCH_IN) {
+		printk(KERN_ERR "ipt_policy: input policy not valid in "
+		                "POST_ROUTING and OUTPUT\n");
+		return 0;
+	}
+	if (info->len > POLICY_MAX_ELEM) {
+		printk(KERN_ERR "ipt_policy: too many policy elements\n");
+		return 0;
+	}
+
+	return 1;
+}
+
+static struct ipt_match policy_match =
+{
+	.name		= "policy",
+	.match		= match,
+	.checkentry	= checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	return ipt_register_match(&policy_match);
+}
+
+static void __exit fini(void)
+{
+	ipt_unregister_match(&policy_match);
+}
+
+module_init(init);
+module_exit(fini);

[-- Attachment #8: libipt_policy.diff --]
[-- Type: text/x-patch, Size: 14936 bytes --]

diff -urN a/extensions/.policy-test b/extensions/.policy-test
--- a/extensions/.policy-test	2004-02-23 20:39:37.000000000 +0100
+++ b/extensions/.policy-test	2004-03-04 21:41:39.000000000 +0100
@@ -1,2 +1,3 @@
 #!/bin/sh
-[ -f $KERNEL_DIR/net/ipv4/netfilter/ipt_policy.c -a -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy
+#
+[ -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy
diff -urN a/extensions/libipt_policy.c b/extensions/libipt_policy.c
--- a/extensions/libipt_policy.c	2004-02-23 20:38:37.000000000 +0100
+++ b/extensions/libipt_policy.c	2004-03-04 21:41:32.000000000 +0100
@@ -1,3 +1,4 @@
+/* Shared library add-on to iptables to add policy support. */
 #include <stdio.h>
 #include <netdb.h>
 #include <string.h>
@@ -14,72 +15,82 @@
 #include <linux/netfilter_ipv4/ip_tables.h>
 #include <linux/netfilter_ipv4/ipt_policy.h>

+/*
+ * HACK: global pointer to current matchinfo for making
+ * final checks and adjustments in final_check.
+ */
+static struct ipt_policy_info *policy_info;
+
 static void help(void)
 {
 	printf(
 "policy v%s options:\n"
-"[!] --reqid reqid          Match reqid\n"
-"[!] --spi spi              Match SPI\n"
-"[!] --proto proto          Match protocol (ah/esp/ipcomp)\n"
-"[!] --mode mode            Match mode (transport/tunnel)\n"
-"[!] --local addr/mask      Match local tunnel endpoint\n"
-"[!] --remote addr/mask     Match remote tunnel endpoint\n"
-"    --path                 Match path instead of single element at\n"
-"                           any position\n"
-"    --next                 Begin next element in path\n",
+"  --dir in|out			match policy applied during decapsulation/\n"
+"				policy to be applied during encapsulation\n"
+"  --pol none|ipsec		match policy\n"
+"  --strict 			match entire policy instead of single element\n"
+"				at any position\n"
+"[!] --reqid reqid		match reqid\n"
+"[!] --spi spi			match SPI\n"
+"[!] --proto proto		match protocol (ah/esp/ipcomp)\n"
+"[!] --mode mode 		match mode (transport/tunnel)\n"
+"[!] --tunnel-src addr/mask	match tunnel source\n"
+"[!] --tunnel-dst addr/mask	match tunnel destination\n"
+"  --next 			begin next element in policy\n",
 	IPTABLES_VERSION);
 }

-static struct option opts[] = {
+static struct option opts[] =
+{
 	{
-		.name		= "reqid",
+		.name		= "dir",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '1',
 	},
 	{
-		.name		= "spi",
+		.name		= "pol",
 		.has_arg	= 1,
-		.flag		= 0,
-		.val		= '2'
+		.val		= '2',
 	},
 	{
-		.name		= "local",
-		.has_arg	= 1,
-		.flag		= 0,
+		.name		= "strict",
 		.val		= '3'
 	},
 	{
-		.name		= "remote",
+		.name		= "reqid",
 		.has_arg	= 1,
-		.flag		= 0,
-		.val		= '4'
+		.val		= '4',
 	},
 	{
-		.name		= "proto",
+		.name		= "spi",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '5'
 	},
 	{
-		.name		= "mode",
+		.name		= "tunnel-src",
 		.has_arg	= 1,
-		.flag		= 0,
 		.val		= '6'
 	},
 	{
-		.name		= "path",
-		.has_arg	= 0,
-		.flag		= 0,
+		.name		= "tunnel-dst",
+		.has_arg	= 1,
 		.val		= '7'
 	},
 	{
-		.name		= "next",
-		.has_arg	= 0,
-		.flag		= 0,
+		.name		= "proto",
+		.has_arg	= 1,
 		.val		= '8'
 	},
-	{ 0 }
+	{
+		.name		= "mode",
+		.has_arg	= 1,
+		.val		= '9'
+	},
+	{
+		.name		= "next",
+		.val		= 'a'
+	},
+	{ }
 };

 static void init(struct ipt_entry_match *m, unsigned int *nfcache)
@@ -87,154 +98,201 @@
 	*nfcache |= NFC_UNKNOWN;
 }

+static int parse_direction(char *s)
+{
+	if (strcmp(s, "in") == 0)
+		return POLICY_MATCH_IN;
+	if (strcmp(s, "out") == 0)
+		return POLICY_MATCH_OUT;
+	exit_error(PARAMETER_PROBLEM, "policy_match: invalid dir `%s'", s);
+}
+
+static int parse_policy(char *s)
+{
+	if (strcmp(s, "none") == 0)
+		return POLICY_MATCH_NONE;
+	if (strcmp(s, "ipsec") == 0)
+		return 0;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid policy `%s'", s);
+}
+
 static int parse_mode(char *s)
 {
 	if (strcmp(s, "transport") == 0)
-		return SECPATH_MODE_TRANSPORT;
+		return POLICY_MODE_TRANSPORT;
 	if (strcmp(s, "tunnel") == 0)
-		return SECPATH_MODE_TUNNEL;
-	return -EINVAL;
+		return POLICY_MODE_TUNNEL;
+	exit_error(PARAMETER_PROBLEM, "policy match: invalid mode `%s'", s);
 }

-static int parse(int c, char **argv, int invert, unsigned int *i,
+static int parse(int c, char **argv, int invert, unsigned int *flags,
                  const struct ipt_entry *entry,
                  unsigned int *nfcache,
                  struct ipt_entry_match **match)
 {
 	struct ipt_policy_info *info = (void *)(*match)->data;
-	struct ipt_policy_elem *e = &info->path[*i];
+	struct ipt_policy_elem *e = &info->pol[info->len];
 	struct in_addr *addr = NULL, mask;
-	struct protoent *p;
 	unsigned int naddr = 0;
-	int mode, proto;
+	int mode;

 	check_inverse(optarg, &invert, &optind, 0);

 	switch (c) {
 	case '1':
+		if (info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --dir option");
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --dir option");
+
+		info->flags |= parse_direction(argv[optind-1]);
+		break;
+	case '2':
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --policy option");
+
+		info->flags |= parse_policy(argv[optind-1]);
+		break;
+	case '3':
+		if (info->flags & POLICY_MATCH_STRICT)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --strict option");
+
+		if (invert)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: can't invert --strict option");
+
+		info->flags |= POLICY_MATCH_STRICT;
+		break;
+	case '4':
 		if (e->match.reqid)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --reqid twice");
+			           "policy match: double --reqid option");

 		e->match.reqid = 1;
 		e->invert.reqid = invert;
 		e->reqid = strtol(argv[optind-1], NULL, 10);
 		break;
-	case '2':
+	case '5':
 		if (e->match.spi)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --spi twice");
+			           "policy match: double --spi option");

 		e->match.spi = 1;
 		e->invert.spi = invert;
 		e->spi = strtol(argv[optind-1], NULL, 0x10);
 		break;
-	case '3':
-		if (e->match.daddr)
-			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --local twice");
-
-		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
-		if (naddr > 1)
-			exit_error(PARAMETER_PROBLEM,
-			           "Multiple IP addresses are not allowed");
-
-		e->match.daddr = 1;
-		e->invert.daddr = invert;
-		e->daddr = addr[0].s_addr;
-		e->dmask = mask.s_addr;
-		break;
-	case '4':
+	case '6':
 		if (e->match.saddr)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --remote twice");
+			           "policy match: double --tunnel-src option");

 		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
 		if (naddr > 1)
 			exit_error(PARAMETER_PROBLEM,
-			           "Multiple IP addresses are not allowed");
+			           "policy match: name resolves to multiple IPs");

 		e->match.saddr = 1;
 		e->invert.saddr = invert;
 		e->saddr = addr[0].s_addr;
 		e->smask = mask.s_addr;
 		break;
-	case '5':
-		if (e->match.proto)
+	case '7':
+		if (e->match.daddr)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: double --tunnel-dst option");
+
+		parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr);
+		if (naddr > 1)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --proto twice");
+			           "policy match: name resolves to multiple IPs");

-		p = getprotobyname(argv[optind-1]);
-		if (p != NULL)
-			proto = p->p_proto;
-		else if (!(proto = atoi(argv[optind-1])))
+		e->match.daddr = 1;
+		e->invert.daddr = invert;
+		e->daddr = addr[0].s_addr;
+		e->dmask = mask.s_addr;
+		break;
+	case '8':
+		if (e->match.proto)
 			exit_error(PARAMETER_PROBLEM,
-			           "Unknown protocol `%s'\n",
-			           argv[optind-1]);
+			           "policy match: double --proto option");

+		e->proto = parse_protocol(argv[optind-1]);
+		if (e->proto != IPPROTO_AH && e->proto != IPPROTO_ESP &&
+		    e->proto != IPPROTO_COMP)
+			exit_error(PARAMETER_PROBLEM,
+			           "policy match: protocol must ah/esp/ipcomp");
 		e->match.proto = 1;
 		e->invert.proto = invert;
-		e->proto = proto;
 		break;
-	case '6':
+	case '9':
 		if (e->match.mode)
 			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --mode twice");
+			           "policy match: double --mode option");

 		mode = parse_mode(argv[optind-1]);
-		if (mode < 0)
-			exit_error(PARAMETER_PROBLEM,
-			           "Unknown mode `%s'\n",
-			           argv[optind-1]);
-
 		e->match.mode = 1;
 		e->invert.mode = invert;
 		e->mode = mode;
 		break;
-	case '7':
-		if (info->flags & MATCH_PATH)
-			exit_error(PARAMETER_PROBLEM,
-			           "Can't specify --path twice");
-
+	case 'a':
 		if (invert)
exit_error(PARAMETER_PROBLEM, - "Can't invert `--path' option"); + "policy match: can't invert --next option"); - info->flags |= MATCH_PATH; - break; - case '8': - if (invert) + if (++info->len == POLICY_MAX_ELEM) exit_error(PARAMETER_PROBLEM, - "Can't invert `--next' option"); - - if (++(*i) == SECPATH_MAX_DEPTH) - exit_error(PARAMETER_PROBLEM, - "Maximum path depth reached"); - + "policy match: maximum policy depth reached"); break; default: return 0; } - info->len = *i + 1; + policy_info = info; return 1; } static void final_check(unsigned int flags) { - return; -} + struct ipt_policy_info *info = policy_info; + struct ipt_policy_elem *e; + int i; -static void print_proto(char *prefix, u_int8_t proto, int numeric) -{ - struct protoent *p = NULL; + if (info == NULL) + exit_error(PARAMETER_PROBLEM, + "policy match: no parameters given"); - if (!numeric) - p = getprotobynumber(proto); - if (p == NULL) - printf("%sproto %u ", prefix, proto); - else - printf("%sproto %s ", prefix, p->p_name); + if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))) + exit_error(PARAMETER_PROBLEM, + "policy match: neither --in nor --out specified"); + + if (info->flags & POLICY_MATCH_NONE) { + if (info->flags & POLICY_MATCH_STRICT) + exit_error(PARAMETER_PROBLEM, + "policy match: policy none but --strict given"); + + if (info->len != 0) + exit_error(PARAMETER_PROBLEM, + "policy match: policy none but policy given"); + } else + info->len++; /* increase len by 1, no --next after last element */ + + if (!(info->flags & POLICY_MATCH_STRICT) && info->len > 1) + exit_error(PARAMETER_PROBLEM, + "policy match: multiple elements but no --strict"); + + for (i = 0; i < info->len; i++) { + e = &info->pol[i]; + if ((e->match.saddr || e->match.daddr) + && ((e->mode == POLICY_MODE_TUNNEL && e->invert.mode) || + (e->mode == POLICY_MODE_TRANSPORT && !e->invert.mode))) + exit_error(PARAMETER_PROBLEM, + "policy match: --tunnel-src/--tunnel-dst " + "is only valid in tunnel mode"); + } } static void 
print_mode(char *prefix, u_int8_t mode, int numeric) @@ -242,10 +300,10 @@ printf("%smode ", prefix); switch (mode) { - case SECPATH_MODE_TRANSPORT: + case POLICY_MODE_TRANSPORT: printf("transport "); break; - case SECPATH_MODE_TUNNEL: + case POLICY_MODE_TUNNEL: printf("tunnel "); break; default: @@ -273,7 +331,7 @@ } if (e->match.proto) { PRINT_INVERT(e->invert.proto); - print_proto(prefix, e->proto, numeric); + printf("%sproto %s ", prefix, proto_to_name(e->proto, numeric)); } if (e->match.mode) { PRINT_INVERT(e->invert.mode); @@ -281,18 +339,34 @@ } if (e->match.daddr) { PRINT_INVERT(e->invert.daddr); - printf("%slocal %s%s ", prefix, + printf("%stunnel-dst %s%s ", prefix, addr_to_dotted((struct in_addr *)&e->daddr), mask_to_dotted((struct in_addr *)&e->dmask)); } if (e->match.saddr) { PRINT_INVERT(e->invert.saddr); - printf("%sremote %s%s ", prefix, + printf("%stunnel-src %s%s ", prefix, addr_to_dotted((struct in_addr *)&e->saddr), mask_to_dotted((struct in_addr *)&e->smask)); } } +static void print_flags(char *prefix, const struct ipt_policy_info *info) +{ + if (info->flags & POLICY_MATCH_IN) + printf("%sdir in ", prefix); + else + printf("%sdir out ", prefix); + + if (info->flags & POLICY_MATCH_NONE) + printf("%spol none ", prefix); + else + printf("%spol ipsec ", prefix); + + if (info->flags & POLICY_MATCH_STRICT) + printf("%sstrict ", prefix); +} + static void print(const struct ipt_ip *ip, const struct ipt_entry_match *match, int numeric) @@ -300,14 +374,12 @@ const struct ipt_policy_info *info = (void *)match->data; unsigned int i; - printf ("policy match "); - if (info->flags & MATCH_PATH) - printf("path "); - + printf("policy match "); + print_flags("", info); for (i = 0; i < info->len; i++) { if (info->len > 1) printf("[%u] ", i); - print_entry("", &info->path[i], numeric); + print_entry("", &info->pol[i], numeric); } printf("\n"); @@ -318,11 +390,9 @@ const struct ipt_policy_info *info = (void *)match->data; unsigned int i; - if (info->flags & 
MATCH_PATH) - printf("--path "); - + print_flags("--", info); for (i = 0; i < info->len; i++) { - print_entry("--", &info->path[i], 0); + print_entry("--", &info->pol[i], 0); if (i + 1 < info->len) printf("--next "); } @@ -332,18 +402,17 @@ struct iptables_match policy = { - NULL, - "policy", - IPTABLES_VERSION, - IPT_ALIGN(sizeof(struct ipt_policy_info)), - IPT_ALIGN(sizeof(struct ipt_policy_info)), - &help, - &init, - &parse, - &final_check, - &print, - &save, - opts + .name = "policy", + .version = IPTABLES_VERSION, + .size = IPT_ALIGN(sizeof(struct ipt_policy_info)), + .userspacesize = IPT_ALIGN(sizeof(struct ipt_policy_info)), + .help = help, + .init = init, + .parse = parse, + .final_check = final_check, + .print = print, + .save = save, + .extra_opts = opts }; void _init(void) diff -urN a/include/iptables.h b/include/iptables.h --- a/include/iptables.h 2004-03-04 21:37:40.000000000 +0100 +++ b/include/iptables.h 2004-03-04 21:38:36.000000000 +0100 @@ -122,6 +122,7 @@ extern void register_match(struct iptables_match *me); extern void register_target(struct iptables_target *me); +extern char *proto_to_name(u_int8_t proto, int nolookup); extern struct in_addr *dotted_to_addr(const char *dotted); extern char *addr_to_dotted(const struct in_addr *addrp); extern char *addr_to_anyname(const struct in_addr *addr); diff -urN a/iptables.c b/iptables.c --- a/iptables.c 2004-02-24 18:28:50.000000000 +0100 +++ b/iptables.c 2004-03-04 21:39:18.000000000 +0100 @@ -235,11 +235,12 @@ { "icmp", IPPROTO_ICMP }, { "esp", IPPROTO_ESP }, { "ah", IPPROTO_AH }, + { "ipcomp", IPPROTO_COMP }, { "sctp", IPPROTO_SCTP }, { "all", 0 }, }; -static char * +char * proto_to_name(u_int8_t proto, int nolookup) { unsigned int i; [-- Attachment #9: policy-test --] [-- Type: text/plain, Size: 2245 bytes --] #! 
/bin/bash
PATH="/usr/local/sbin"
iptables -F
iptables -A INPUT -m policy --dir in --pol none
iptables -A INPUT -m policy --dir in --pol ipsec
iptables -A INPUT -m policy --dir in --pol ipsec --mode tunnel
iptables -A INPUT -m policy --dir in --pol ipsec --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --proto esp
iptables -A INPUT -m policy --dir in --pol ipsec --strict --mode tunnel
iptables -A INPUT -m policy --dir in --pol ipsec --strict --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --next --proto ah
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --mode tunnel --tunnel-src 192.168.0.1 --next --proto ah --mode transport
iptables -A INPUT -m policy --dir in --pol ipsec --strict --proto esp --mode tunnel --tunnel-src 192.168.0.1 --tunnel-dst 192.168.0.23 --next --proto ah --mode transport
iptables -A OUTPUT -m policy --dir out --pol none
iptables -A OUTPUT -m policy --dir out --pol ipsec
iptables -A OUTPUT -m policy --dir out --pol ipsec --mode tunnel
iptables -A OUTPUT -m policy --dir out --pol ipsec --mode transport
iptables -A OUTPUT -m policy --dir out --pol ipsec --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --proto esp
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --mode tunnel
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --mode transport
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --mode tunnel --tunnel-dst 192.168.0.0/24 --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol
ipsec --strict --proto esp --mode tunnel --tunnel-dst ! 192.168.0.0/24 --next --proto ah
iptables -A OUTPUT -m policy --dir out --pol ipsec --strict --proto esp --mode tunnel --tunnel-dst 192.168.0.1 --next --proto ah --mode transport
^ permalink raw reply [flat|nested] 63+ messages in thread
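Editorial note on the patch above: the option combinations that final_check() accepts can be hard to read off the diff. The following is a minimal sketch of those rules as a pure function, assuming stand-in flag values (the real POLICY_MATCH_* constants live in ipt_policy.h, which is not shown in this part of the thread) and a hypothetical function name. It leaves out the per-element check that --tunnel-src/--tunnel-dst are only valid in tunnel mode.

```c
#include <stddef.h>

/* Stand-in flag bits. These values are assumptions made for this sketch;
 * the real POLICY_MATCH_* definitions are in ipt_policy.h. */
enum {
	F_IN     = 1 << 0,	/* --dir in   */
	F_OUT    = 1 << 1,	/* --dir out  */
	F_NONE   = 1 << 2,	/* --pol none */
	F_STRICT = 1 << 3	/* --strict   */
};

/*
 * Hypothetical helper mirroring the order of the checks in final_check().
 * n_next is the number of --next options seen while parsing.
 * Returns NULL if the combination is acceptable, otherwise a short reason.
 */
static const char *policy_args_error(unsigned int flags, unsigned int n_next)
{
	unsigned int len = n_next;

	/* A direction is mandatory. */
	if (!(flags & (F_IN | F_OUT)))
		return "no direction given";

	if (flags & F_NONE) {
		/* "--pol none" cannot be combined with --strict or --next. */
		if (flags & F_STRICT)
			return "policy none but --strict given";
		if (len != 0)
			return "policy none but policy given";
	} else {
		/* The last element is not followed by --next. */
		len++;
	}

	/* More than one element only makes sense with --strict. */
	if (!(flags & F_STRICT) && len > 1)
		return "multiple elements but no --strict";

	return NULL;
}
```

Under these assumptions, `policy_args_error(F_OUT, 1)` reports the missing --strict, which corresponds to a rule such as `--dir out --pol ipsec --proto esp --next --proto ah` being rejected, while every rule in the policy-test script passes.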
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-04 22:30 ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
@ 2004-03-04 23:11 ` Willy Tarreau
2004-03-04 23:42 ` Alexander Samad
2004-03-05 1:47 ` Patrick McHardy
2004-03-04 23:44 ` Patrick McHardy
2004-03-05 11:39 ` Harald Welte
2 siblings, 2 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-03-04 23:11 UTC (permalink / raw)
To: Patrick McHardy
Cc: Netfilter Development Mailinglist, Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig, alex, guillaume

Hi Patrick,

On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:
> (long CC list, if anyone doesn't want to be on it anymore please say so)

no pb, please keep me in :-)

> - hook-traversal on input is symmetrical to output now. This means
> packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
> PRE_REROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.

This is excellent. So if I understand it right, a packet now passes this way
(please excuse the ascii-art, it seems I'll never take the time to start xfig)

        [incoming packet]
                |<-----------------.
                v                  |
            PREROUTING             |
  dest!=local? /   \ dest==local?  |
             /       \             |
        FORWARD    LOCAL_IN        |
           |           |           |
           |    (ipsec?decrypt:nop)|
           |           | \        /  dest!=local ? nf_postxfrm_nonlocal()
           |           |  `------'
           v           v  dest==local ? ip_local_deliver_finish()

So if it behaves like this (which is what seems the best solution to me), is
there a risk that a carefully crafted packet loops many times on the right
side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
wrt resource leak, multiple locks, etc... ?

> The only problem with this approach I'm currently aware of is that
> one of the ip-protocol handlers marked with xfrm_prot
> (xfrm4_tunnel.c) is also responsible for dispatching packets to
> ipip.c; the encapsulated packets heading for a ipip-tunnel won't
> traverse the netfilter hooks on input.

I don't know well the internals of IPSEC (bought the book but didn't read it
yet). Are IPIP tunnels often used ?
or are they simply used according to a
negotiation, in which case the ipsec implementation on the netfilter side
could refuse them ?

> - new match "policy" for matching the policy that was used during
> decapsulation (well, the used SAs, policy checks come later), and the
> policy that will be used for encapsulation.

Even more interesting... Now we can for sure check that packets coming from
a tunnel have not been spoofed by another tunnel.

Thanks a lot Patrick.
My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
I will soon check if your patches apply on top of their implementation.

Cheers,
Willy

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-04 23:11 ` Willy Tarreau
@ 2004-03-04 23:42 ` Alexander Samad
2004-03-05 2:00 ` Patrick McHardy
2004-03-05 1:47 ` Patrick McHardy
1 sibling, 1 reply; 63+ messages in thread
From: Alexander Samad @ 2004-03-04 23:42 UTC (permalink / raw)
To: Willy Tarreau
Cc: Patrick McHardy, Netfilter Development Mailinglist, Harald Welte, Henrik Nordstrom, Tom Eastep, Michal Ludvig, guillaume
[-- Attachment #1: Type: text/plain, Size: 2951 bytes --]
On Fri, Mar 05, 2004 at 12:11:41AM +0100, Willy Tarreau wrote:
> Hi Patrick,
>
> On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:
> > (long CC list, if anyone doesn't want to be on it anymore please say so)
>
> no pb, please keep me in :-)
>
> > - hook-traversal on input is symmetrical to output now. This means
> > packets will traverse PRE_ROUTING/LOCAL_IN before decryption and
> > PRE_REROUTING/LOCAL_IN or PRE_ROUTING/FORWARD afterwards.
>
> This is excellent. So if I understand it right, a packet now passes this way
> (please excuse the ascii-art, it seems I'll never take the time to start xfig)
>
>         [incoming packet]
>                 |<-----------------.
>                 v                  |
>             PREROUTING             |
>   dest!=local? /   \ dest==local?  |
>              /       \             |
>         FORWARD    LOCAL_IN        |
>            |           |           |
>            |    (ipsec?decrypt:nop)|
>            |           | \        /  dest!=local ? nf_postxfrm_nonlocal()
>            |           |  `------'
>            v           v  dest==local ? ip_local_deliver_finish()
>

Willy thanks for the diagram, too lazy to do one myself.

Q do I understand right that encrypted packets can or can't be acted
upon by the hooks in LOCAL_IN.

Or another way of putting it does a packet travel the tables twice once
as an encrypted packet and once as a decrypted packet ?

Alex

>
> So if it behaves like this (which is what seems the best solution to me), is
> there a risk that a carefully crafted packet loops many times on the right
> side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
> wrt resource leak, multiple locks, etc... ?
>
> > The only problem with this approach I'm currently aware of is that
> > one of the ip-protocol handlers marked with xfrm_prot
> > (xfrm4_tunnel.c) is also responsible for dispatching packets to
> > ipip.c; the encapsulated packets heading for a ipip-tunnel won't
> > traverse the netfilter hooks on input.
>
> I don't know well the internals of IPSEC (bought the book but didn't read it
> yet). Are IPIP tunnels often used ? or are they simply used according to a
> negotiation, in which case the ipsec implementation on the netfilter side
> could refuse them ?
>
> > - new match "policy" for matching the policy that was used during
> > decapsulation (well, the used SAs, policy checks come later), and the
> > policy that will be used for encapsulation.
>
> Even more interesting... Now we can for sure check that packets coming from
> a tunnel have not been spoofed by another tunnel.
>
> Thanks a lot Patrick.
> My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
> I will soon check if your patches apply on top of their implementation.
>
> Cheers,
> Willy
>
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-04 23:42 ` Alexander Samad
@ 2004-03-05 2:00 ` Patrick McHardy
2004-03-05 2:13 ` Alexander Samad
2004-03-10 2:45 ` Alexander Samad
0 siblings, 2 replies; 63+ messages in thread
From: Patrick McHardy @ 2004-03-05 2:00 UTC (permalink / raw)
To: Alexander Samad
Cc: Willy Tarreau, Netfilter Development Mailinglist, Harald Welte, Tom Eastep, Michal Ludvig, guillaume

Alexander Samad wrote:
> Q do I understand right that encrypted packets can or can't be acted
> upon by the hooks in LOCAL_IN.
>
> Or another way of putting it does a packet travel the tables twice once
> as an encrypted packet and once as a decrypted packet ?

Exactly, input looks like this:

(encrypted) PRE_ROUTING -> LOCAL_IN ->
(plain) PRE_ROUTING -> LOCAL_IN/FORWARD

output looks like this:

(plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
(encrypted) LOCAL_OUT -> POST_ROUTING

This is the same as with freeswan, only without the ipsec
devices; the policy match can be used as an easy replacement
for them (-m policy --pol ipsec).

Regards,
Patrick

>
> Alex
>
>

^ permalink raw reply [flat|nested] 63+ messages in thread
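Editorial note on the traversal order described above: each IPsec-protected packet is seen by netfilter twice, once encrypted and once in plain text. The sketch below only writes that order down as data so it can be checked at a glance. The hook names are the ones used in the mail; the enum and both helper functions are hypothetical and do not exist in the patches.

```c
#include <stddef.h>

/* Netfilter hook names as used in the mail above. */
enum hook { PRE_ROUTING, LOCAL_IN, FORWARD, LOCAL_OUT, POST_ROUTING };

/*
 * Hypothetical helper: fills out[] with the hooks an IPsec-protected packet
 * traverses on input and returns how many there are. forwarded != 0 means
 * the decrypted packet is routed on instead of being delivered locally.
 */
static size_t ipsec_input_hooks(int forwarded, enum hook out[4])
{
	size_t n = 0;

	/* First pass: the encrypted packet, addressed to this host. */
	out[n++] = PRE_ROUTING;
	out[n++] = LOCAL_IN;
	/* Second pass: the plain packet after decryption. */
	out[n++] = PRE_ROUTING;
	out[n++] = forwarded ? FORWARD : LOCAL_IN;
	return n;
}

/*
 * Hypothetical helper: same idea for output. forwarded != 0 means the plain
 * packet came in on another interface rather than from a local socket.
 */
static size_t ipsec_output_hooks(int forwarded, enum hook out[4])
{
	size_t n = 0;

	/* First pass: the plain packet before encryption. */
	out[n++] = forwarded ? FORWARD : LOCAL_OUT;
	out[n++] = POST_ROUTING;
	/* Second pass: the encrypted packet, originating from this host. */
	out[n++] = LOCAL_OUT;
	out[n++] = POST_ROUTING;
	return n;
}
```

Because the same chains are hit in both passes, rules usually need a way to tell the passes apart; that is the role of the policy match here. In the policy-test script from the patch, --pol is always given together with --dir.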
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-05 2:00 ` Patrick McHardy
@ 2004-03-05 2:13 ` Alexander Samad
2004-03-10 2:45 ` Alexander Samad
1 sibling, 0 replies; 63+ messages in thread
From: Alexander Samad @ 2004-03-05 2:13 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 924 bytes --]
On Fri, Mar 05, 2004 at 03:00:07AM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Q do I understand right that encrypted packets can or can't be acted
> >upon by the hooks in LOCAL_IN.
> >
> >Or another way of putting it does a packet travel the tables twice once
> >as an encrypted packet and once as a decrypted packet ?
>
> Exactly, input looks like this:
>
> (encrypted) PRE_ROUTING -> LOCAL_IN ->
> (plain) PRE_ROUTING -> LOCAL_IN/FORWARD
>
> output looks like this:
>
> (plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
> (encrypted) LOCAL_OUT -> POST_ROUTING
>
> This is the same as with freeswan, only without the ipsec
> devices, the policy match can be used as an easy replacement
> for them (-m policy --pol ipsec).
>
> Regards,
> Patrick

Great, I also presume this means that NAT + IPSEC is now working, will
give it a try tonight.

Thanks

>
> >
> >Alex
> >
> >
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-05 2:00 ` Patrick McHardy
2004-03-05 2:13 ` Alexander Samad
@ 2004-03-10 2:45 ` Alexander Samad
2004-03-11 22:10 ` Patrick McHardy
1 sibling, 1 reply; 63+ messages in thread
From: Alexander Samad @ 2004-03-10 2:45 UTC (permalink / raw)
To: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 1866 bytes --]
Patrick

I seem to have found a bug in your patches, but only when used in
conjunction with Herbert's mangle patch.

It seems like there is a loop caused when the packet traverses the
tables, in particular ip_route_me_harder.

I tested this on my laptop with debian 2.6.3-2 source with these patches
that you provided on this thread, as well as the Herbert mangle patch.

It seems like the packet on the way out gets encapsulated and then the
encrypted packets try to get re-encrypted.

example ipsec.conf

conn wireless
    left=10.0.4.129
    leftsubnet=0/0
    authby=secret
    pfs=no
    auto=add
    right=%defaultroute

By the dump it looks like a loop, I added a
printk("%d\n", iph->protocol);
in ip_route_me_harder just before Herbert's fix to test that.

When I changed the config to look like this

conn wireless
    left=10.0.4.129
    leftsubnet=10.6.0/24
    authby=secret
    pfs=no
    auto=add
    right=%defaultroute

It worked fine

If you have any other questions, ask; I have debs of the image and headers
too if you want.

Alex

On Fri, Mar 05, 2004 at 03:00:07AM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Q do I understand right that encrypted packets can or can't be acted
> >upon by the hooks in LOCAL_IN.
> >
> >Or another way of putting it does a packet travel the tables twice once
> >as an encrypted packet and once as a decrypted packet ?
>
> Exactly, input looks like this:
>
> (encrypted) PRE_ROUTING -> LOCAL_IN ->
> (plain) PRE_ROUTING -> LOCAL_IN/FORWARD
>
> output looks like this:
>
> (plain) FORWARD/LOCAL_OUT -> POST_ROUTING ->
> (encrypted) LOCAL_OUT -> POST_ROUTING
>
> This is the same as with freeswan, only without the ipsec
> devices, the policy match can be used as an easy replacement
> for them (-m policy --pol ipsec).
>
> Regards,
> Patrick
>
> >
> >Alex
> >
> >
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-10 2:45 ` Alexander Samad
@ 2004-03-11 22:10 ` Patrick McHardy
2004-03-12 0:15 ` Alexander Samad
0 siblings, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-03-11 22:10 UTC (permalink / raw)
To: Alexander Samad; +Cc: Netfilter Development Mailinglist

Alexander Samad wrote:
> Patrick
>
> I seem to have found a bug in your patches, but only when used in
> conjunction with Herbert's mangle patch.
>
> It seems like there is a loop caused when the packet traverses the
> tables, in particular ip_route_me_harder.
>
> I tested this on my laptop with debian 2.6.3-2 source with these patches
> that you provided on this thread, as well as the Herbert mangle patch.
>
> It seems like the packet on the way out gets encapsulated and then the
> encrypted packets try to get re-encrypted.

Thanks for the report, for now the easiest solution is to back out
Herbert's patch.

Regards
Patrick

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-11 22:10 ` Patrick McHardy
@ 2004-03-12 0:15 ` Alexander Samad
0 siblings, 0 replies; 63+ messages in thread
From: Alexander Samad @ 2004-03-12 0:15 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 845 bytes --]
On Thu, Mar 11, 2004 at 11:10:38PM +0100, Patrick McHardy wrote:
> Alexander Samad wrote:
> >Patrick
> >
> >I seem to have found a bug in your patches, but only when used in
> >conjunction with Herbert's mangle patch.
> >
> >It seems like there is a loop caused when the packet traverses the
> >tables, in particular ip_route_me_harder.
> >
> >I tested this on my laptop with debian 2.6.3-2 source with these patches
> >that you provided on this thread, as well as the Herbert mangle patch.
> >
> >It seems like the packet on the way out gets encapsulated and then the
> >encrypted packets try to get re-encrypted.
>
> Thanks for the report, for now the easiest solution is to back out
> Herbert's patch.

I have done that in my local build, but I believe it is in 2.6.4 (the
changelog)

>
> Regards
> Patrick
>
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-04 23:11 ` Willy Tarreau
2004-03-04 23:42 ` Alexander Samad
@ 2004-03-05 1:47 ` Patrick McHardy
2004-03-05 11:10 ` Willy Tarreau
1 sibling, 1 reply; 63+ messages in thread
From: Patrick McHardy @ 2004-03-05 1:47 UTC (permalink / raw)
To: Willy Tarreau
Cc: Netfilter Development Mailinglist, Harald Welte, Tom Eastep, Michal Ludvig, alex, guillaume
[-- Attachment #1: Type: text/plain, Size: 4497 bytes --]
Hi Willy,

Willy Tarreau wrote:
> This is excellent. So if I understand it right, a packet now passes this way
> (please excuse the ascii-art, it seems I'll never take the time to start xfig)
>
>         [incoming packet]
>                 |<-----------------.
>                 v                  |
>             PREROUTING             |
>   dest!=local? /   \ dest==local?  |
>              /       \             |
>         FORWARD    LOCAL_IN        |
>            |           |           |
>            |    (ipsec?decrypt:nop)|
>            |           | \        /  dest!=local ? nf_postxfrm_nonlocal()
>            |           |  `------'
>            v           v  dest==local ? ip_local_deliver_finish()
>
>
> So if it behaves like this (which is what seems the best solution to me), is
> there a risk that a carefully crafted packet loops many times on the right
> side (multiple encapsulation with local DIP) ? And if so, is it risky or not,
> wrt resource leak, multiple locks, etc... ?

Not exactly. The ipsec transformers are ip protocol handlers, their
input handler (xfrm4_input) is called from ip_local_deliver_finish.
After decrypting the payload the packets are either reposted into the
network stack (in case of a tunnel-mode SA) or again delivered to
the ip-protocol contained in the next header. Packets reposted into
the stack may be in some intermediate state in case of a transport
mode encapsulation following a tunnel mode encapsulation. The difficulty
is to know when the packet is "done". So nf_postxfrm_nonlocal is called
directly after input routing if the destination is non-local, but only
for packets which were reposted into the stack. The fact that they
have a non-local destination means ipsec must be done with them.
For packets with a local destination, we look at what the ip-protocol
handler does in ip_local_deliver_finish (xfrm_prot or not) and if
the packet was handled by ipsec before and the protocol handler is
no xfrm_prot, ipsec is done, so the packets are passed through
nf_postxfrm_input.

I attached a xfig file with a little picture of how it works, I hope
it's understandable. The blue lines represent the flow as it happens
without my patches, the red lines represent what's new. While drawing
it, I noticed a bug, packets with their destination changed
in nf_postxfrm_nonlocal may need to be delivered locally afterwards.

Regarding loop-danger, I'm not totally sure yet, but I don't think
that it would be triggerable by crafted packets but only by
adventurous configurations. Multiple encapsulations are no error
and they are limited to 4.

>>The only problem with this approach I'm currently aware of is that
>>one of the ip-protocol handlers marked with xfrm_prot
>>(xfrm4_tunnel.c) is also responsible for dispatching packets to
>>ipip.c; the encapsulated packets heading for a ipip-tunnel won't
>>traverse the netfilter hooks on input.
>
>
> I don't know well the internals of IPSEC (bought the book but didn't read it
> yet). Are IPIP tunnels often used ? or are they simply used according to a
> negotiation, in which case the ipsec implementation on the netfilter side
> could refuse them ?

It's not IPIP with IPSEC that misbehaves but plaintext-IPIP tunnels.
xfrm4_tunnel.c is the registered handler for IPIP, it used to be
ipip.c. ipip.c now registers itself with xfrm4_tunnel, but xfrm4_tunnel
is marked as a xfrm_prot which is not true for ipip.c. Right now I'm
not even sure that it is a problem and under what exact circumstances,
I need to think about this some more (tomorrow).

>>- new match "policy" for matching the policy that was used during
>>decapsulation (well, the used SAs, policy checks come later), and the
>>policy that will be used for encapsulation.
>
>
> Even more interesting... Now we can for sure check that packets coming from
> a tunnel have not been spoofed by another tunnel.

That should also be handled by the policy checks, although at a
later time (at socket-delivery/forwarding). I'm unsure if the
policy checks should happen before packets are looked at by
netfilter, after all it manipulates its state from information
in these untrusted packets.

>
> Thanks a lot Patrick.
> My main goal is to use this on 2.4 with davem/herbert xu's ipsec backport.
> I will soon check if your patches apply on top of their implementation.

Thanks for your comments, I'm looking forward to your testing.

Best regards
Patrick

>
> Cheers,
> Willy
>
[-- Attachment #2: nf+ipsec.xfig.fig --]
[-- Type: image/x-xfig, Size: 3113 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
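Editorial note on the "when is the packet done" rule from the mail above: the two cases Patrick describes can be read as one small decision function. This is only a sketch of the prose, not code from the patches. nf_postxfrm_nonlocal and nf_postxfrm_input are the names used in the mail; the struct, its fields, the enum, and the function name are assumptions introduced for illustration.

```c
/* Which post-IPsec netfilter pass the packet gets next, if any. */
enum postxfrm {
	POSTXFRM_NONE,		/* ipsec not (yet) finished, or never involved */
	POSTXFRM_NONLOCAL,	/* stands for nf_postxfrm_nonlocal in the mail */
	POSTXFRM_INPUT		/* stands for nf_postxfrm_input in the mail */
};

/* Hypothetical per-packet facts; field names are made up for this sketch. */
struct pkt_state {
	int reposted;		/* re-injected into the stack after decapsulation */
	int seen_ipsec;		/* already handled by an ipsec transformer */
	int dst_local;		/* input routing says: deliver locally */
	int next_is_xfrm_prot;	/* next ip-protocol handler is marked xfrm_prot */
};

static enum postxfrm postxfrm_hook(const struct pkt_state *p)
{
	if (!p->dst_local) {
		/* Non-local destination: ipsec must be done, but only
		 * packets that were reposted into the stack qualify. */
		return p->reposted ? POSTXFRM_NONLOCAL : POSTXFRM_NONE;
	}

	/* Local destination: ipsec is done once the packet has been
	 * through ipsec and the next handler is not another transformer. */
	if (p->seen_ipsec && !p->next_is_xfrm_prot)
		return POSTXFRM_INPUT;

	return POSTXFRM_NONE;
}
```

A packet carrying a transport-mode encapsulation inside a tunnel-mode one would, under this reading, come back as POSTXFRM_NONE on the first local delivery (the next handler is still xfrm_prot) and as POSTXFRM_INPUT on the second. The sketch does not cover the bug Patrick mentions, where the destination is changed in nf_postxfrm_nonlocal and the packet then needs local delivery.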
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-05 1:47 ` Patrick McHardy
@ 2004-03-05 11:10 ` Willy Tarreau
0 siblings, 0 replies; 63+ messages in thread
From: Willy Tarreau @ 2004-03-05 11:10 UTC (permalink / raw)
To: Patrick McHardy
Cc: Netfilter Development Mailinglist, Harald Welte, Tom Eastep, Michal Ludvig, alex, guillaume

Hi Patrick,

On Fri, Mar 05, 2004 at 02:47:58AM +0100, Patrick McHardy wrote:
> I attached a xfig file with a little picture of how it works, I hope
> it's understandable. The blue lines represent the flow as it happens
> without my patches, the red lines represent what's new. While drawing
> it, I noticed a bug, packets with their destination changed
> in nf_postxfrm_nonlocal may need to be delivered locally afterwards.

Thanks for the picture, indeed it's far better now ;-)

> Regarding loop-danger, I'm not totally sure yet, but I don't think
> that it would be triggerable by crafted packets but only by
> adventurous configurations. Multiple encapsulations are no error
> and they are limited to 4.

OK. Thanks for your clarifications, I understand better now.

Cheers,
Willy

^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH]: latest netfilter+ipsec patches 2004-03-04 22:30 ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy 2004-03-04 23:11 ` Willy Tarreau @ 2004-03-04 23:44 ` Patrick McHardy 2004-03-05 11:39 ` Harald Welte 2 siblings, 0 replies; 63+ messages in thread From: Patrick McHardy @ 2004-03-04 23:44 UTC (permalink / raw) To: Netfilter Development Mailinglist Cc: Willy Tarreau, Harald Welte, Tom Eastep, Michal Ludvig, alex, guillaume [-- Attachment #1: Type: text/plain, Size: 320 bytes --] Patrick McHardy wrote: > - new match "policy" for matching the policy that was used during > decapsulation (well, the used SAs, policy checks come later), and the > policy that will be used for encapsulation. > This patch (libipt_policy.diff) was broken in the last mail, fixed version attached. > Regards > Patrick [-- Attachment #2: libipt_policy.diff --] [-- Type: text/x-patch, Size: 11996 bytes --] diff -urN a/extensions/.policy-test b/extensions/.policy-test --- a/extensions/.policy-test 1970-01-01 01:00:00.000000000 +0100 +++ b/extensions/.policy-test 2004-03-05 00:44:32.000000000 +0100 @@ -0,0 +1,3 @@ +#!/bin/sh +# +[ -f $KERNEL_DIR/include/linux/netfilter_ipv4/ipt_policy.h ] && echo policy diff -urN a/extensions/libipt_policy.c b/extensions/libipt_policy.c --- a/extensions/libipt_policy.c 1970-01-01 01:00:00.000000000 +0100 +++ b/extensions/libipt_policy.c 2004-03-05 00:44:40.000000000 +0100 @@ -0,0 +1,421 @@ +/* Shared library add-on to iptables to add policy support. */ +#include <stdio.h> +#include <netdb.h> +#include <string.h> +#include <stdlib.h> +#include <syslog.h> +#include <getopt.h> +#include <netdb.h> +#include <errno.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <iptables.h> + +#include <linux/netfilter_ipv4/ip_tables.h> +#include <linux/netfilter_ipv4/ipt_policy.h> + +/* + * HACK: global pointer to current matchinfo for making + * final checks and adjustments in final_check. 
+ */ +static struct ipt_policy_info *policy_info; + +static void help(void) +{ + printf( +"policy v%s options:\n" +" --dir in|out match policy applied during decapsulation/\n" +" policy to be applied during encapsulation\n" +" --pol none|ipsec match policy\n" +" --strict match entire policy instead of single element\n" +" at any position\n" +"[!] --reqid reqid match reqid\n" +"[!] --spi spi match SPI\n" +"[!] --proto proto match protocol (ah/esp/ipcomp)\n" +"[!] --mode mode match mode (transport/tunnel)\n" +"[!] --tunnel-src addr/mask match tunnel source\n" +"[!] --tunnel-dst addr/mask match tunnel destination\n" +" --next begin next element in policy\n", + IPTABLES_VERSION); +} + +static struct option opts[] = +{ + { + .name = "dir", + .has_arg = 1, + .val = '1', + }, + { + .name = "pol", + .has_arg = 1, + .val = '2', + }, + { + .name = "strict", + .val = '3' + }, + { + .name = "reqid", + .has_arg = 1, + .val = '4', + }, + { + .name = "spi", + .has_arg = 1, + .val = '5' + }, + { + .name = "tunnel-src", + .has_arg = 1, + .val = '6' + }, + { + .name = "tunnel-dst", + .has_arg = 1, + .val = '7' + }, + { + .name = "proto", + .has_arg = 1, + .val = '8' + }, + { + .name = "mode", + .has_arg = 1, + .val = '9' + }, + { + .name = "next", + .val = 'a' + }, + { } +}; + +static void init(struct ipt_entry_match *m, unsigned int *nfcache) +{ + *nfcache |= NFC_UNKNOWN; +} + +static int parse_direction(char *s) +{ + if (strcmp(s, "in") == 0) + return POLICY_MATCH_IN; + if (strcmp(s, "out") == 0) + return POLICY_MATCH_OUT; + exit_error(PARAMETER_PROBLEM, "policy_match: invalid dir `%s'", s); +} + +static int parse_policy(char *s) +{ + if (strcmp(s, "none") == 0) + return POLICY_MATCH_NONE; + if (strcmp(s, "ipsec") == 0) + return 0; + exit_error(PARAMETER_PROBLEM, "policy match: invalid policy `%s'", s); +} + +static int parse_mode(char *s) +{ + if (strcmp(s, "transport") == 0) + return POLICY_MODE_TRANSPORT; + if (strcmp(s, "tunnel") == 0) + return POLICY_MODE_TUNNEL; + 
exit_error(PARAMETER_PROBLEM, "policy match: invalid mode `%s'", s); +} + +static int parse(int c, char **argv, int invert, unsigned int *flags, + const struct ipt_entry *entry, + unsigned int *nfcache, + struct ipt_entry_match **match) +{ + struct ipt_policy_info *info = (void *)(*match)->data; + struct ipt_policy_elem *e = &info->pol[info->len]; + struct in_addr *addr = NULL, mask; + unsigned int naddr = 0; + int mode; + + check_inverse(optarg, &invert, &optind, 0); + + switch (c) { + case '1': + if (info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT)) + exit_error(PARAMETER_PROBLEM, + "policy match: double --dir option"); + if (invert) + exit_error(PARAMETER_PROBLEM, + "policy match: can't invert --dir option"); + + info->flags |= parse_direction(argv[optind-1]); + break; + case '2': + if (invert) + exit_error(PARAMETER_PROBLEM, + "policy match: can't invert --policy option"); + + info->flags |= parse_policy(argv[optind-1]); + break; + case '3': + if (info->flags & POLICY_MATCH_STRICT) + exit_error(PARAMETER_PROBLEM, + "policy match: double --strict option"); + + if (invert) + exit_error(PARAMETER_PROBLEM, + "policy match: can't invert --strict option"); + + info->flags |= POLICY_MATCH_STRICT; + break; + case '4': + if (e->match.reqid) + exit_error(PARAMETER_PROBLEM, + "policy match: double --reqid option"); + + e->match.reqid = 1; + e->invert.reqid = invert; + e->reqid = strtol(argv[optind-1], NULL, 10); + break; + case '5': + if (e->match.spi) + exit_error(PARAMETER_PROBLEM, + "policy match: double --spi option"); + + e->match.spi = 1; + e->invert.spi = invert; + e->spi = strtol(argv[optind-1], NULL, 0x10); + break; + case '6': + if (e->match.saddr) + exit_error(PARAMETER_PROBLEM, + "policy match: double --tunnel-src option"); + + parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr); + if (naddr > 1) + exit_error(PARAMETER_PROBLEM, + "policy match: name resolves to multiple IPs"); + + e->match.saddr = 1; + e->invert.saddr = invert; + e->saddr = 
addr[0].s_addr; + e->smask = mask.s_addr; + break; + case '7': + if (e->match.daddr) + exit_error(PARAMETER_PROBLEM, + "policy match: double --tunnel-dst option"); + + parse_hostnetworkmask(argv[optind-1], &addr, &mask, &naddr); + if (naddr > 1) + exit_error(PARAMETER_PROBLEM, + "policy match: name resolves to multiple IPs"); + + e->match.daddr = 1; + e->invert.daddr = invert; + e->daddr = addr[0].s_addr; + e->dmask = mask.s_addr; + break; + case '8': + if (e->match.proto) + exit_error(PARAMETER_PROBLEM, + "policy match: double --proto option"); + + e->proto = parse_protocol(argv[optind-1]); + if (e->proto != IPPROTO_AH && e->proto != IPPROTO_ESP && + e->proto != IPPROTO_COMP) + exit_error(PARAMETER_PROBLEM, + "policy match: protocol must ah/esp/ipcomp"); + e->match.proto = 1; + e->invert.proto = invert; + break; + case '9': + if (e->match.mode) + exit_error(PARAMETER_PROBLEM, + "policy match: double --mode option"); + + mode = parse_mode(argv[optind-1]); + e->match.mode = 1; + e->invert.mode = invert; + e->mode = mode; + break; + case 'a': + if (invert) + exit_error(PARAMETER_PROBLEM, + "policy match: can't invert --next option"); + + if (++info->len == POLICY_MAX_ELEM) + exit_error(PARAMETER_PROBLEM, + "policy match: maximum policy depth reached"); + break; + default: + return 0; + } + + policy_info = info; + return 1; +} + +static void final_check(unsigned int flags) +{ + struct ipt_policy_info *info = policy_info; + struct ipt_policy_elem *e; + int i; + + if (info == NULL) + exit_error(PARAMETER_PROBLEM, + "policy match: no parameters given"); + + if (!(info->flags & (POLICY_MATCH_IN|POLICY_MATCH_OUT))) + exit_error(PARAMETER_PROBLEM, + "policy match: neither --in nor --out specified"); + + if (info->flags & POLICY_MATCH_NONE) { + if (info->flags & POLICY_MATCH_STRICT) + exit_error(PARAMETER_PROBLEM, + "policy match: policy none but --strict given"); + + if (info->len != 0) + exit_error(PARAMETER_PROBLEM, + "policy match: policy none but policy given"); + } 
else + info->len++; /* increase len by 1, no --next after last element */ + + if (!(info->flags & POLICY_MATCH_STRICT) && info->len > 1) + exit_error(PARAMETER_PROBLEM, + "policy match: multiple elements but no --strict"); + + for (i = 0; i < info->len; i++) { + e = &info->pol[i]; + if ((e->match.saddr || e->match.daddr) + && ((e->mode == POLICY_MODE_TUNNEL && e->invert.mode) || + (e->mode == POLICY_MODE_TRANSPORT && !e->invert.mode))) + exit_error(PARAMETER_PROBLEM, + "policy match: --tunnel-src/--tunnel-dst " + "is only valid in tunnel mode"); + } +} + +static void print_mode(char *prefix, u_int8_t mode, int numeric) +{ + printf("%smode ", prefix); + + switch (mode) { + case POLICY_MODE_TRANSPORT: + printf("transport "); + break; + case POLICY_MODE_TUNNEL: + printf("tunnel "); + break; + default: + printf("??? "); + break; + } +} + +#define PRINT_INVERT(x) \ +do { \ + if (x) \ + printf("! "); \ +} while(0) + +static void print_entry(char *prefix, const struct ipt_policy_elem *e, + int numeric) +{ + if (e->match.reqid) { + PRINT_INVERT(e->invert.reqid); + printf("%sreqid %u ", prefix, e->reqid); + } + if (e->match.spi) { + PRINT_INVERT(e->invert.spi); + printf("%sspi 0x%x ", prefix, e->spi); + } + if (e->match.proto) { + PRINT_INVERT(e->invert.proto); + printf("%sproto %s ", prefix, proto_to_name(e->proto, numeric)); + } + if (e->match.mode) { + PRINT_INVERT(e->invert.mode); + print_mode(prefix, e->mode, numeric); + } + if (e->match.daddr) { + PRINT_INVERT(e->invert.daddr); + printf("%stunnel-dst %s%s ", prefix, + addr_to_dotted((struct in_addr *)&e->daddr), + mask_to_dotted((struct in_addr *)&e->dmask)); + } + if (e->match.saddr) { + PRINT_INVERT(e->invert.saddr); + printf("%stunnel-src %s%s ", prefix, + addr_to_dotted((struct in_addr *)&e->saddr), + mask_to_dotted((struct in_addr *)&e->smask)); + } +} + +static void print_flags(char *prefix, const struct ipt_policy_info *info) +{ + if (info->flags & POLICY_MATCH_IN) + printf("%sdir in ", prefix); + else + 
printf("%sdir out ", prefix); + + if (info->flags & POLICY_MATCH_NONE) + printf("%spol none ", prefix); + else + printf("%spol ipsec ", prefix); + + if (info->flags & POLICY_MATCH_STRICT) + printf("%sstrict ", prefix); +} + +static void print(const struct ipt_ip *ip, + const struct ipt_entry_match *match, + int numeric) +{ + const struct ipt_policy_info *info = (void *)match->data; + unsigned int i; + + printf("policy match "); + print_flags("", info); + for (i = 0; i < info->len; i++) { + if (info->len > 1) + printf("[%u] ", i); + print_entry("", &info->pol[i], numeric); + } + + printf("\n"); +} + +static void save(const struct ipt_ip *ip, const struct ipt_entry_match *match) +{ + const struct ipt_policy_info *info = (void *)match->data; + unsigned int i; + + print_flags("--", info); + for (i = 0; i < info->len; i++) { + print_entry("--", &info->pol[i], 0); + if (i + 1 < info->len) + printf("--next "); + } + + printf("\n"); +} + +struct iptables_match policy = +{ + .name = "policy", + .version = IPTABLES_VERSION, + .size = IPT_ALIGN(sizeof(struct ipt_policy_info)), + .userspacesize = IPT_ALIGN(sizeof(struct ipt_policy_info)), + .help = help, + .init = init, + .parse = parse, + .final_check = final_check, + .print = print, + .save = save, + .extra_opts = opts +}; + +void _init(void) +{ + register_match(&policy); +} diff -urN a/include/iptables.h b/include/iptables.h --- a/include/iptables.h 2004-03-05 00:43:36.000000000 +0100 +++ b/include/iptables.h 2004-03-05 00:45:29.000000000 +0100 @@ -122,6 +122,7 @@ extern void register_match(struct iptables_match *me); extern void register_target(struct iptables_target *me); +extern char *proto_to_name(u_int8_t proto, int nolookup); extern struct in_addr *dotted_to_addr(const char *dotted); extern char *addr_to_dotted(const struct in_addr *addrp); extern char *addr_to_anyname(const struct in_addr *addr); diff -urN a/iptables.c b/iptables.c --- a/iptables.c 2004-03-05 00:44:10.000000000 +0100 +++ b/iptables.c 2004-03-05 
00:45:59.000000000 +0100 @@ -235,11 +235,12 @@ { "icmp", IPPROTO_ICMP }, { "esp", IPPROTO_ESP }, { "ah", IPPROTO_AH }, + { "ipcomp", IPPROTO_COMP }, { "sctp", IPPROTO_SCTP }, { "all", 0 }, }; -static char * +char * proto_to_name(u_int8_t proto, int nolookup) { unsigned int i; ^ permalink raw reply [flat|nested] 63+ messages in thread
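On the --spi parsing in the patch above: it calls strtol() with base 16 and no error checking, so trailing garbage is silently accepted, and on platforms where long is 32 bits strtol() clamps at LONG_MAX, which means SPIs above 0x7fffffff cannot be entered. A minimal sketch of a stricter parser follows. parse_spi() is a hypothetical helper, not part of the posted patch, and it assumes the SPI is meant to be a plain 32-bit hexadecimal value.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in the posted patch): parse a 32-bit SPI
 * written in hexadecimal, with or without a leading "0x".
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int parse_spi(const char *s, uint32_t *spi)
{
	char *end;
	unsigned long long v;

	assert(s != NULL && spi != NULL);

	/* Reject empty strings, signs and leading whitespace up front. */
	if (!isxdigit((unsigned char)*s))
		return -1;

	errno = 0;
	v = strtoull(s, &end, 16);
	/* Fail on overflow, trailing garbage, or values wider than 32 bits. */
	if (errno != 0 || end == s || *end != '\0' || v > UINT32_MAX)
		return -1;

	*spi = (uint32_t)v;
	return 0;
}
```

In parse(), case '5' could then report a PARAMETER_PROBLEM when the helper returns -1 instead of storing whatever strtol() produced.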
* Re: [PATCH]: latest netfilter+ipsec patches
2004-03-04 22:30 ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
2004-03-04 23:11 ` Willy Tarreau
2004-03-04 23:44 ` Patrick McHardy
@ 2004-03-05 11:39 ` Harald Welte
2 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-03-05 11:39 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 779 bytes --]
On Thu, Mar 04, 2004 at 11:30:38PM +0100, Patrick McHardy wrote:
> This is the latest version of the netfilter+ipsec patches, I will check
> them in in a couple of days after some minor changes.

good work, patrick. I'm looking forward to seeing those changes in the
mainline kernel.

one minor note: please #ifdef CONFIG_NETFILTER the nf_reset function (or
#define it to nothing otherwise)

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
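The nf_reset request above follows a common kernel idiom: provide the real function when the feature is configured, and an empty macro otherwise, so callers need no #ifdef of their own. A minimal userspace sketch of the idiom follows. The toy_* types are stand-ins invented for illustration (the real struct sk_buff and conntrack types live in the kernel), and the body of nf_reset here is an assumption based on the nfct handling discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Comment this out to build the "no netfilter" variant. */
#define CONFIG_NETFILTER 1

/* Toy stand-ins for the kernel types, for illustration only. */
struct toy_conntrack { int refcnt; };
struct toy_skb { struct toy_conntrack *nfct; };

#ifdef CONFIG_NETFILTER
/* Drop the conntrack reference so the packet is tracked afresh. */
static inline void nf_reset(struct toy_skb *skb)
{
	if (skb->nfct != NULL) {
		skb->nfct->refcnt--;
		skb->nfct = NULL;
	}
}
#else
/* Nothing to reset without netfilter: compile to a no-op. */
#define nf_reset(skb) do { (void)(skb); } while (0)
#endif
```

Callers such as the encapsulation paths can then call nf_reset(skb) unconditionally in either configuration.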
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 0:09 ` [PATCH]Re: " Harald Welte
2004-01-28 8:49 ` Patrick McHardy
@ 2004-01-28 10:30 ` Andreas Jellinghaus
2004-01-29 19:05 ` Harald Welte
1 sibling, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 10:30 UTC (permalink / raw)
To: netfilter-devel

ah_output to ah_output_finish and a new ah_output with:
> + return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
> + ah_output_finish);

please add the same to ipip, ipgre and friends, to keep them
in sync? what about ipv6?

Andreas

p.s. I hope the idea with the new hook POSTENCAP died?
^ permalink raw reply [flat|nested] 63+ messages in thread
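The renaming described above (the old ah_output becomes ah_output_finish, and a new ah_output only traverses the hook) can be sketched in plain userspace C. This is a minimal sketch and not kernel code: toy_nf_hook() is a simplified stand-in for NF_HOOK(), the toy_* names are invented for illustration, and the 24-byte header size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb { int len; int dropped; };    /* stand-in for struct sk_buff */

typedef int (*okfn_t)(struct toy_skb *);     /* continuation after the hook */
typedef int (*hookfn_t)(struct toy_skb *);   /* nonzero verdict means accept */

/* Stand-in for NF_HOOK(): run the hook, continue with okfn on accept. */
int toy_nf_hook(hookfn_t hook, struct toy_skb *skb, okfn_t okfn)
{
	if (hook != NULL && !hook(skb)) {
		skb->dropped = 1;
		return -1;
	}
	return okfn(skb);
}

/* The former ah_output(): the real work, here just adding header room. */
int toy_ah_output_finish(struct toy_skb *skb)
{
	skb->len += 24;   /* arbitrary illustrative header size */
	return 0;
}

hookfn_t toy_post_routing;   /* whatever is registered at POST_ROUTING */

/* The new ah_output(): traverse POST_ROUTING, then call the _finish part. */
int toy_ah_output(struct toy_skb *skb)
{
	return toy_nf_hook(toy_post_routing, skb, toy_ah_output_finish);
}

/* Example hook that rejects every packet. */
int toy_drop_all(struct toy_skb *skb)
{
	(void)skb;
	return 0;
}
```

The same split (xmit into hook wrapper plus xmit_finish) is what keeping ipip, ipgre and friends in sync would amount to.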
* Re: [PATCH]Re: NAT before IPsec with 2.6
2004-01-28 10:30 ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
@ 2004-01-29 19:05 ` Harald Welte
0 siblings, 0 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-29 19:05 UTC (permalink / raw)
To: Andreas Jellinghaus; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 933 bytes --]
On Wed, Jan 28, 2004 at 11:30:04AM +0100, Andreas Jellinghaus wrote:
> ah_output to ah_output_finish and a new ah_output
> with:
> > + return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, skb->dev, ah_dummydev,
> > + ah_output_finish);
>
> please add the same to ipip, ipgre and friends, to keep them
> in sync? what about ipv6?

yes, please wait until we have implemented it in one case and proven that
it does what it is supposed to do. after that we will sync up.

> Andreas
> p.s. I hope the idea with the new hook POSTENCAP died?

yes.

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 11:57 ` Henrik Nordstrom
2004-01-27 13:07 ` Harald Welte
@ 2004-01-27 19:54 ` Michael Richardson
1 sibling, 0 replies; 63+ messages in thread
From: Michael Richardson @ 2004-01-27 19:54 UTC (permalink / raw)
To: netfilter-devel

-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "Henrik" == Henrik Nordstrom <hno@marasystems.com> writes:
Henrik> For me it is easier to grasp what is happening with the packet
Henrik> if it is viewed like the original packet is forwarded to the
Henrik> ipsec engine and the ipsec engine then generates a new packet.

Yes, I agree strongly.

Henrik> Even in the Host<->Host IPSec case the tunnel view is most
Henrik> appropriate. As soon as the packets enter the IPSec encapsulation
Henrik> they lose all the relevant properties of the original
Henrik> connection and become part of the single IPSec Host <-> IPSec
Henrik> Host connection that all packets between these two IPSec hosts
Henrik> are transmitted over.

Actually, there are a number of properties that many will wish to
retain. These include:
1) VLAN
2) QoS - DSCP, IPv6 flow label, other application set parameters
3) possibly some pieces of nfmark (it will be scenario specific)
4) ECN gets copied

It is my understanding that the stackable dst permits most of this
to be retained - but the IPsec code actually needs to decide what to do.

Finally, there are some packets which will enter the IPsec system, but
emerge unchanged - they may be excepted. Those packets will not be new.

]       ON HUMILITY: to err is human. To moo, bovine.           |  firewalls  [
]   Michael Richardson,    Xelerance Corporation, Ottawa, ON    |net architect[
] mcr@xelerance.com      http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Finger me for keys

iQCVAwUBQBbCDYqHRg3pndX9AQGjCgP9EWNAvrsEwopZXv1aJhBAyrFbawM5bPI4
CoENcH4FFXURohmE7GDsXl3Ei8Hf7Kuru8EG1WKZCNJXYpgpsMFRXH6YVOzl497o
aVdE2y279vv1JOyY7Jo/Lwj6nm5NMqkv1EMMOMK8IZm3+AgQ6Fzw/IosMpQUEAkR
CrWi68BCtE0=
=usFv
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 63+ messages in thread
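To make the QoS/ECN point in the list above concrete: in IPv4 both DSCP and ECN live in the former TOS byte (upper six bits and lower two bits respectively), so "retaining" them across encapsulation comes down to how the outer header's byte is derived from the inner one. The sketch below is only an illustration under a simple "copy" assumption. outer_tos_from_inner() is a hypothetical name, and real stacks apply additional rules for ECN on tunnels that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define TOS_DSCP_MASK 0xfcu   /* upper six bits: DSCP */
#define TOS_ECN_MASK  0x03u   /* lower two bits: ECN */

/*
 * Hypothetical helper: build the outer header's TOS byte.
 * ECN is always copied from the inner header; DSCP is either copied
 * (copy_dscp != 0) or taken from a configured default.
 */
uint8_t outer_tos_from_inner(uint8_t inner_tos, uint8_t outer_default,
			     int copy_dscp)
{
	uint8_t dscp = (uint8_t)((copy_dscp ? inner_tos : outer_default)
				 & TOS_DSCP_MASK);
	uint8_t ecn = (uint8_t)(inner_tos & TOS_ECN_MASK);

	return (uint8_t)(dscp | ecn);
}
```

Whether DSCP should be copied at all is the kind of policy decision the mail above says the IPsec code has to make; the flag only makes that choice explicit.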
* Re: NAT before IPsec with 2.6
2004-01-27 10:39 ` Harald Welte
2004-01-27 11:57 ` Henrik Nordstrom
@ 2004-01-27 13:27 ` Valentijn Sessink
2004-01-27 13:57 ` Henrik Nordstrom
2004-01-27 21:13 ` Andreas Jellinghaus
2004-01-27 16:11 ` Tom Eastep
2004-01-27 19:51 ` Michael Richardson
3 siblings, 2 replies; 63+ messages in thread
From: Valentijn Sessink @ 2004-01-27 13:27 UTC (permalink / raw)
To: Harald Welte, Willy Tarreau, Henrik Nordstrom, Tom Eastep, Michal Ludvig, netfilter-devel

Hello,

At Tue, Jan 27, 2004 at 11:39:18AM +0100, Harald Welte wrote:
> I also disagree that an ESP packet should be treated as locally
> generated (and thus iterate over OUTPUT).

This is what happens with incoming packets: they hit INPUT twice.

> From my perspective, locally
> generated means more something like: sent via a local userspace process
> using PF_INET sockets. If we consider a packet after any kind of change
> as locally-generated, we could argue doing this with NAT'ed packets,
> too. Please try to convince me, if I'm missing some point.

You are missing the point that the netfilter code changes the packets here,
thus if you would want netfilter to do something else, you could code that
in netfilter rules.

Now let's review an IPsec tunnel. On the input side, everything works as
expected: an ESP packet hits PREROUTING, then INPUT, gets decrypted, then
the new packet (that has different from and to, and may be a different
protocol altogether) hits PREROUTING again, then INPUT (or FORWARD).

On output, however, things are not so transparent. A packet hits OUTPUT,
then gets encrypted. A totally different packet hits POSTROUTING. This makes
no sense.

> One of the fundamental principles behind netfilter/iptables, especially
> NAT, is its symmetric behaviour (between incoming and outgoing packets,
> between the hooks, ...). So with any possible solution, this should be
> kept in mind.

There's no symmetric behaviour now. And from an "input" perspective, I think
a decapsulated packet that has a locally ending payload should hit INPUT
again. Same with output: a locally generated packet hits OUTPUT, then should
hit OUTPUT again if it was really destined for the outside world.

V.
--
http://www.openoffice.nl/ Open Office - Linux Office Solutions
Valentijn Sessink valentyn+sessink@nospam.openoffice.nl
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 13:27 ` Valentijn Sessink
@ 2004-01-27 13:57 ` Henrik Nordstrom
2004-01-27 21:13 ` Andreas Jellinghaus
1 sibling, 0 replies; 63+ messages in thread
From: Henrik Nordstrom @ 2004-01-27 13:57 UTC (permalink / raw)
To: Valentijn Sessink
Cc: Harald Welte, Willy Tarreau, Tom Eastep, Michal Ludvig, netfilter-devel

On Tue, 27 Jan 2004, Valentijn Sessink wrote:
> On output, however, things are not so transparent. A packet hits OUTPUT,
> then gets encrypted. A totally different packet hits POSTROUTING. This makes
> no sense.

And that is where the required hooks need to be inserted in the IPSec
implementation:

a) POSTROUTING needs to be called before the final decision that this
packet is to go via IPSec. Note that this may need to allow a reroute of
the packet, including the IPSec policy. Because of this, some changes may
be needed in the IPSec implementation for this to be complete.

b) After encrypting, OUTPUT needs to be called on the encrypted packet
before the packet leaves IPSec and continues on the normal path leading to
POSTROUTING.

> There's no symmetric behaviour now.

We all know that, which is why this discussion exists: to find out
how it should be.

> And from an "input" perspective, I think a decapsulated packet that has
> a locally ending payload should hit INPUT again. Same with output: a
> locally generated packet hits output, then should hit OUTPUT again if it
> was really destined for the outside world.

On this we all agree, from what I can tell.

Regards
Henrik
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 13:27 ` Valentijn Sessink
2004-01-27 13:57 ` Henrik Nordstrom
@ 2004-01-27 21:13 ` Andreas Jellinghaus
2004-01-28 8:58 ` Harald Welte
1 sibling, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-27 21:13 UTC (permalink / raw)
To: netfilter-devel

On Tue, 27 Jan 2004 14:27:25 +0100, Valentijn Sessink wrote:
> Hello,
>
> At Tue, Jan 27, 2004 at 11:39:18AM +0100, Harald Welte wrote:
>> I also disagree that an ESP packet should be treated as locally
>> generated (and thus iterate over OUTPUT).
>
> This is what happens with incoming packets: they hit INPUT twice.

yes, but once before and once after decryption. use either fwmark
or an ipip tunnel interface to distinguish between already decrypted
and was-never-encrypted packets.

> On output, however, things are not so transparent. A packet hits OUTPUT,
> then gets encrypted. A totally different packet hits POSTROUTING. This makes
> no sense.

use an interface, and you will see the packet twice, plus the interface
does the route lookup on the encrypted packet. so no ugly hacks in the
routing table are necessary.

(ugly hacks scenario: consider a road warrior. he is 192.168.3.2
and his vpn gateway is 192.168.3.1. he wants to put the company
net on that GW address. But without a GW he doesn't know on which
interface to put it. it could be either wlan (if the tunnel is
established with his wireless lan card), or the eth0 interface (if
he is using an ethernet cable to connect to the net). sure, the ipsec
software could manipulate the routing table after altering local
and remote tunnel endpoints, but that is an ugly hack and a bad
design.)

[symmetry]

that does not work:
netfilter wants NAT to be the first thing for incoming packets
and the last thing for outgoing packets. that's symmetry.

but you want ipsec encryption and decryption as soon as possible:
both before looking at the routing table.

mixing those two will never result in symmetry. if you try it
(routing before encapsulation), the result has problems.

Andreas
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-27 21:13 ` Andreas Jellinghaus
@ 2004-01-28 8:58 ` Harald Welte
2004-01-28 10:21 ` Andreas Jellinghaus
` (2 more replies)
0 siblings, 3 replies; 63+ messages in thread
From: Harald Welte @ 2004-01-28 8:58 UTC (permalink / raw)
To: Andreas Jellinghaus; +Cc: netfilter-devel, netdev
[-- Attachment #1: Type: text/plain, Size: 4125 bytes --]
Cc'ing netdev, since this is now a very clear proposal and not some
internal netfilter rambling.

On Tue, Jan 27, 2004 at 10:13:33PM +0100, Andreas Jellinghaus wrote:
> > This is what happens with incoming packets: they hit INPUT twice.
>
> yes, but once before and once after decryption. use either fwmark
> or an ipip tunnel interface to distinguish between already decrypted
> and was-never-encrypted packets.

yes, and this is desired. However, the packets also hit PREROUTING
twice, which is exactly what we want (and we get it for free, since
xfrm4_input.c calls netif_rx() again).

However, it doesn't reset the skb->nfct pointer, so connection tracking
sees both the encapsulated and the decapsulated packet as belonging to the
same connection. This is wrong, and dangerous (application helpers might
be called for the encapsulated traffic, which now has a completely
different layer 4 protocol).

So with stock 2.6.x you have
- no working connection tracking in any kind of ipsec scenario
- conntrack and NAT helpers trying to find readable application data
  within already-encrypted packets
- no working DNAT/SNAT, mainly because of not-working connection
  tracking

> > On output, however, things are not so transparent. A packet hits OUTPUT,
> > then gets encrypted. A totally different packet hits POSTROUTING. This makes
> > no sense.
>
> use an interface, and you will see the packet twice, plus the interface
> does the route lookup on the encrypted packet. so no ugly hacks in the
> routing table are necessary.

Did anybody propose ugly hacks in the routing table?

> [symmetry]
>
> that does not work:
> netfilter wants NAT to be the first thing for incoming packets
> and the last thing for outgoing packets. that's symmetry.

yes, it is symmetric. As is the symmetry between prerouting/postrouting
and input/output hooks. As well as the symmetry between SNAT and DNAT, ...

> but you want ipsec encryption and decryption as soon as possible:
> both before looking at the routing table.

Yes, that's why what IPsec is doing for incoming ESP/AH packets is the
right thing: go through PRE_ROUTING, route to LOCAL_IN, decapsulate, and
start over again.

However, for outgoing ESP/AH packets (locally-encapsulated), it doesn't
show the symmetric behaviour of what it does on the input side: go
through FORWARD, encapsulate, route, POST_ROUTING. No starting over.

So we absolutely _need_ a symmetric-to-incoming behaviour like:

a) for locally-originated packets
   LOCAL_OUT, POST_ROUTING, encapsulate, LOCAL_OUT, POST_ROUTING.

b) for forwarded, locally-encapsulated packets
   PRE_ROUTING, FORWARD, POST_ROUTING, encapsulate, LOCAL_OUT,
   POST_ROUTING.

No, we don't achieve this by manipulating the routing code, but by
placing the respective hooks in the {ah,esp}_output() functions of
ah{4,6}.c and esp{4,6}.c respectively.

We also need to (again) reset skb->nfct and drop the conntrack
reference.

This way we
1) make connection tracking work again
2) do not change any routing of the current implementation
3) enable users to
   a) filter unencapsulated, encapsulated and decapsulated packets at
      their 'expected' place
   b) do DNAT on the incoming decapsulated packets (which doesn't work
      right now, but should)
   c) do SNAT on the outgoing/forwarded to-be-encapsulated packets
      (which doesn't work right now, but should)

> mixing those two will never result in symmetry. if you try it
> (routing before encapsulation), the result has problems.

I'm not proposing to route before encapsulating. I just propose
calling the same netfilter functions that we usually call before/after
routing at different places :)

> Andreas

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
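The two proposed traversals, (a) and (b), and the effect of resetting skb->nfct at encapsulation time can be modelled with a small userspace sketch. This is a toy model under stated assumptions, not kernel code: the step names mirror the hooks named in the proposal, the "conn" counter stands in for conntrack's view of the packet, and walk() and the path_* arrays are names invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum step { PRE_ROUTING, FORWARD, LOCAL_OUT, POST_ROUTING, ENCAPSULATE };

static const char *const step_name[] = {
	"PRE_ROUTING", "FORWARD", "LOCAL_OUT", "POST_ROUTING", "encapsulate"
};

/* Proposed path (a): locally-originated packets. */
const enum step path_local[] = {
	LOCAL_OUT, POST_ROUTING, ENCAPSULATE, LOCAL_OUT, POST_ROUTING
};
/* Proposed path (b): forwarded, locally-encapsulated packets. */
const enum step path_forward[] = {
	PRE_ROUTING, FORWARD, POST_ROUTING, ENCAPSULATE, LOCAL_OUT, POST_ROUTING
};

/*
 * Walk a path and append "HOOK/connN " to out for every hook traversal.
 * conn models skb->nfct: 0 means untracked, and the first hook that sees
 * an untracked packet assigns the next connection number.
 * Returns the number of distinct connections that were tracked.
 */
int walk(const enum step *path, size_t n, char *out, size_t outlen)
{
	int conn = 0, next_conn = 1;
	size_t used = 0, i;

	if (outlen > 0)
		out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (path[i] == ENCAPSULATE) {
			conn = 0;   /* nfct reset: the ESP/AH packet is a new flow */
			continue;
		}
		if (conn == 0)
			conn = next_conn++;
		if (used < outlen) {
			int w = snprintf(out + used, outlen - used, "%s/conn%d ",
					 step_name[path[i]], conn);
			if (w > 0)
				used += (size_t)w;
		}
	}
	return next_conn - 1;
}
```

With the reset in place, the hooks before encapsulation see one connection and the hooks after it see another, which is what lets SNAT apply to the inner packet while the ESP/AH packet is tracked on its own.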
* Re: NAT before IPsec with 2.6
2004-01-28 8:58 ` Harald Welte
@ 2004-01-28 10:21 ` Andreas Jellinghaus
2004-01-28 13:00 ` Harald Welte
2004-01-28 14:24 ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
2004-01-28 19:38 ` NAT before IPsec with 2.6 David S. Miller
2 siblings, 1 reply; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 10:21 UTC (permalink / raw)
To: Harald Welte; +Cc: netfilter-devel, netdev

On Wed, 2004-01-28 at 09:58, Harald Welte wrote:
> However, it doesn't reset the skb->nfct pointer,

the tunnels do.

In summary it looks to me like you want to have everything that a
real tunnel with an interface offers, but for some reason you don't
want the tunnel with the interface. right?
so, why?

> - no working connection tracking in any kind of ipsec scenario

that is not true.

ipip_rcv:
	nf_conntrack_put(skb->nfct);
	skb->nfct = NULL;
ipip_xmit:
	nf_conntrack_put(skb->nfct);
	skb->nfct = NULL;

the only problem is stability and scheduling in atomic with xfrm+ipip
(i guess a bug, I hope someone who understands the code could take
a look at it).

> > use an interface, and you will see the packet twice, plus the interface
> > does the route lookup on the encrypted packet. so no ugly hacks in the
> > routing table are necessary.
>
> Did anybody propose ugly hacks in the routing table?

if you don't use an interface, you need them in real world setups.

with an interface both gw and client put the route to each other's
static vpn ip on ipsec0 and then configure the tunnel ips, and those
ips will be used to look up the interface where to send the encrypted
packet.

with the current code without using interfaces, the ip of the vpn
tunnel needs to be moved to the interface that will be used to
reach the vpn gateway. even worse, you need to play with source
address selection to make sure an outgoing packet has your static
vpn ip, and not the dynamic ip of whatever net you are currently
using.

example:
	up ip addr add 192.168.3.1 dev wlan0
	up ip route del default dev wlan0
	up ip route add default via 192.168.1.1 src 192.168.3.1 dev wlan0
	down ip addr del 192.168.3.1 dev wlan0

and switching from wlan0 to eth0 does not only require
the tunnel to be reconfigured (local and remote ip), but
also changes in the routing table. that is horrible.

and yes, of course you want static ip addresses for "road warrior"
clients! for example smtp servers need these to decide whether to
allow relaying or not.

> So we absolutely _need_ a symmetric-to-incoming behaviour like:

use a tunnel interface and you have that. (adding pre_/post_ routing
is very simple, simply cut&paste to the _rcv and _xmit functions.)

sure, you can achieve the same thing without using tunnel interfaces.
But i wonder if that creates a simple solution that is easy to understand.

my basic question is: how will these changes affect setups with tunnel
interfaces?

[other issue]
if we are to look at the calls to xfrm anyway: would it be possible to
add a debugging facility for dropped packets? I know, that is not a
netfilter issue, but I guess many people will find it problematic
if they cannot see what packets are dropped by the xfrm logic.
I'm only raising the issue because that code would most likely be put
in the same places we are discussing right now (ah, esp, ipip_rcv,
ipip_xmit, ...).

> I'm not proposing to route before encapsulating.

that is currently done. I'm glad if that "feature" is removed.

> I just propose of
> calling the same netfilter functions that we usually call before/after
> routing at different places :)

fine. I don't understand the benefit of doing everything as with
an interface while avoiding the interface. but please keep
tunnel with interface and tunnel without interface in sync.

Regards, Andreas
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-28 10:21 ` Andreas Jellinghaus
@ 2004-01-28 13:00 ` Harald Welte
2004-01-28 13:43 ` Andreas Jellinghaus
0 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-28 13:00 UTC (permalink / raw)
To: Andreas Jellinghaus; +Cc: netfilter-devel, netdev
[-- Attachment #1: Type: text/plain, Size: 3838 bytes --]
On Wed, Jan 28, 2004 at 11:21:34AM +0100, Andreas Jellinghaus wrote:
> On Wed, 2004-01-28 at 09:58, Harald Welte wrote:
> > However, it doesn't reset the skb->nfct pointer,
>
> the tunnels do.

yes, I'm well aware of that.

> In summary it looks to me like you want to have everything that a
> real tunnel with an interface offers, but for some reason you don't
> want the tunnel with the interface. right?
> so, why?

Because it's needed for interoperability with other IPsec
implementations, in other words if you are not running Linux on both
ends. Yes, I know... in an ideal world everybody would run linux on
both ends... but if you do: Why use IPsec at all, if not for
interoperability.

> > - no working connection tracking in any kind of ipsec scenario

s/ipsec scen/plain non-tunnel ipsec scen/

> > Did anybody propose ugly hacks in the routing table?
>
> if you don't use an interface, you need them in real world setups.

oh well, what you are describing are entries in the routing table,
that's fine. I thought you were talking about hacking the routing code.

> and switching from wlan0 to eth0 does not only require
> the tunnel to be reconfigured (local and remote ip), but
> also changes in the routing table. that is horrible.

well, this is just what the 2.6.x ipsec implementation is like. It's
not the netfilter project's will or task to change that implementation.
We just want to get the same netfilter/iptables functionality,
independent of somebody using ipsec or not.

> > So we absolutely _need_ a symmetric-to-incoming behaviour like:
>
> use a tunnel interface and you have that.

yes, but as stated above, other implementations might not be able to
work with that implementation.

> sure, you can achieve the same thing without using tunnel interfaces.
> But i wonder if that creates a simple solution, easy to understand.

Unfortunately there is no simple or easy solution. We just want to get
it working at all.

> my basic question is: how will these changes affect setups with tunnel
> interfaces?

I think you would see the packet even more often:

1) LOCAL_OUT of original packet
2) POST_ROUTING of original packet
3) LOCAL_OUT of tunnelled packet
4) POST_ROUTING of tunnelled packet
5) LOCAL_OUT of ESP/AH packet
6) POST_ROUTING of ESP/AH packet

> [other issue]
> if we are to look at the calls to xfrm anyway: would it be possible to
> add a debugging facility for dropped packets? I know, that is not a
> netfilter issue, but I guess many people will find it problematic,
> if they cannot see what packets are dropped by the xfrm logic.
> I'm only raising the issue, because that code is most likely put
> to the same places we are discussing right now (ah, esp, ipip_rcv,
> ipip_xmit, ...).

yes, but I'd rather not put both of them into one patch.

> > I'm not proposing to route before encapsulating.
>
> that is currently done. I'm glad if that "feature" is removed.

Again, I misunderstood you. I do not intend to change the IPsec
implementation in any way, I just want netfilter to accommodate it.

> an interface, but avoiding the interface. but please keep
> tunnel with interface and tunnel without interface in sync.

What do you mean by that? From my point of view, it should be like
(1-6) above. This way anybody can filter on any kind of header bit at
any layer of the encapsulation.

> Regards, Andreas

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: NAT before IPsec with 2.6
2004-01-28 13:00 ` Harald Welte
@ 2004-01-28 13:43 ` Andreas Jellinghaus
0 siblings, 0 replies; 63+ messages in thread
From: Andreas Jellinghaus @ 2004-01-28 13:43 UTC (permalink / raw)
To: Harald Welte; +Cc: netfilter-devel, netdev

Hi Harald,

The userland could be improved to be more flexible; then it could
configure the same thing with and without an interface, depending on
the needs.

From a protocol perspective that wouldn't be visible, so I really don't
see any interoperability issues here. ip, esp, inner ip, tcp/udp,
payload: the packet can look the same, whether you use some interface
to assemble it or not. ike and setting spd plus tunnel is the issue,
but that's userspace.

ok, but that's not a netfilter topic.

> I think you would see the packet even more often:
>
> 1) LOCAL_OUT of original packet
> 2) POST_ROUTING of original packet
> 3) LOCAL_OUT of tunnelled packet
> 4) POST_ROUTING of tunnelled packet
> 5) LOCAL_OUT of ESP/AH packet
> 6) POST_ROUTING of ESP/AH packet

one change you want to add is POST_ROUTING before encapsulating.
add that hook at the start of ipip_xmit, too (renaming ipip_xmit to
ipip_xmit_finish and a new ipip_xmit with only the hook), and things
should be in sync. same for the other tunnels.

the other change you want to make is to set the interface to some other
value after encapsulation. what value? will it be possible to specify
the value? I'm still thinking about that and in what scenario you would
need to distinguish, say, ipsec0 and ipsec1. but such scenarios could
exist. maybe one vpn/gateway/router shared by two small companies, both
of which want to keep their networks private?

[setkey debugging]
> yes, but I'd rather not put both of them into one patch.

sure. are there netfilter routines that can be reused, or are they
somehow tied to the NF core? netfilter formats the packets nicely for
debugging.

> > an interface, but avoiding the interface. but please keep
> > tunnel with interface and tunnel without interface in sync.
>
> What do you mean by that?

if you add a hook in ah/esp, add the same hooks to ipip, ipgre,
ip6_tunnel etc. Currently netfilter/tunneling is more or less
consistent - without interface there are some issues, but the
tunnelling fits fine into the big netfilter picture (the one with
5 hooks explained). if ah/esp used different hooks than the real
tunnels, that would make the situation more complicated.

also, what is the status of the new hooks? dropped, or do you still
plan to add a different hook, even if it calls the same table?
POST_XFRM is a bad name, better something like POST_ENCAP or
POST_DECAP, because IMO tunnels should call that hook, too, so it
should be a generic name, not tied to xfrm.

> From my point of view, it should be like
> (1-6) above. This way anybody can filter on any kind of header bit at
> any layer of the encapsulation.

ah, wait. it will be only 4 steps: 3+5 and 4+6 are merged, because
the tunnel calls xfrm itself, so it does both at once. that could be
changed of course, if you want that flexibility.

but i wonder: if you want to change the untunnelled packet, you can do
that in 1/2. if you want to change the encrypted packet, you can do
that in 5/6. 3/4 has the same (as payload) you see in 1/2, and the same
header that 5/6 has. so what is the benefit of two additional steps?

Regards, Andreas
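The two traversal orders debated in this exchange can be compared with a small userspace model. This is only a sketch under stated assumptions: the types and the function `hook_trace()` are hypothetical names invented for illustration and do not exist in the kernel. It encodes Harald's six-step list (one LOCAL_OUT and one POST_ROUTING pass per encapsulation layer) and Andreas's observation that, when the tunnel calls xfrm itself, the tunnelled and ESP/AH layers collapse into a single pass.

```c
#include <stddef.h>

/* Hypothetical userspace model; these types are not kernel structures. */
enum hook  { LOCAL_OUT, POST_ROUTING };
enum layer { ORIGINAL, TUNNELLED, ESP_AH };

struct step {
    enum hook  hook;
    enum layer layer;
};

/*
 * Write the hook traversal for a locally generated packet into out[] and
 * return the total number of steps (at most cap of them are stored).
 *
 * merged == 0: every encapsulation layer gets its own LOCAL_OUT and
 *              POST_ROUTING pass (steps 1-6 in the list above).
 * merged != 0: the tunnel calls xfrm itself, so the tunnelled and ESP/AH
 *              layers are seen only once (the four-step variant).
 */
static size_t hook_trace(int merged, struct step *out, size_t cap)
{
    static const enum layer all_layers[]    = { ORIGINAL, TUNNELLED, ESP_AH };
    static const enum layer merged_layers[] = { ORIGINAL, ESP_AH };
    const enum layer *layers = merged ? merged_layers : all_layers;
    size_t nlayers = merged ? 2 : 3;
    size_t n = 0;

    for (size_t i = 0; i < nlayers; i++) {
        if (n < cap)
            out[n] = (struct step){ LOCAL_OUT, layers[i] };
        n++;
        if (n < cap)
            out[n] = (struct step){ POST_ROUTING, layers[i] };
        n++;
    }
    return n;
}
```

Calling `hook_trace(0, buf, 8)` fills in six steps and `hook_trace(1, buf, 8)` fills in four, which makes the open question concrete: the two extra steps only matter if a rule needs to see the tunnelled packet before ESP/AH is applied.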
* 2.6.2-rc2 and nf-log
2004-01-28 8:58 ` Harald Welte
2004-01-28 10:21 ` Andreas Jellinghaus
@ 2004-01-28 14:24 ` Wojciech 'Sas' Cieciwa
2004-01-28 19:38 ` NAT before IPsec with 2.6 David S. Miller
2 siblings, 0 replies; 63+ messages in thread
From: Wojciech 'Sas' Cieciwa @ 2004-01-28 14:24 UTC (permalink / raw)
To: Harald Welte; +Cc: Netfilter developmnet mailing list

Hi,

I tried to compile the nf-log module from CVS snapshot 20040128 on a
2.6.2-rc2 kernel and found that BR_NETPROTO_LOCK is undefined.
Can anyone check and solve this problem?

Thanks.
Sas.
--
{Wojciech 'Sas' Cieciwa} {Member of PLD Team }
{e-mail: cieciwa@alpha.zarz.agh.edu.pl, http://www2.zarz.agh.edu.pl/~cieciwa}
* Re: NAT before IPsec with 2.6
2004-01-28 8:58 ` Harald Welte
2004-01-28 10:21 ` Andreas Jellinghaus
2004-01-28 14:24 ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
@ 2004-01-28 19:38 ` David S. Miller
2 siblings, 0 replies; 63+ messages in thread
From: David S. Miller @ 2004-01-28 19:38 UTC (permalink / raw)
To: Harald Welte; +Cc: aj, netfilter-devel, netdev

On Wed, 28 Jan 2004 09:58:31 +0100
Harald Welte <laforge@netfilter.org> wrote:

> No, we don't achieve this by manipulating the routing code, but by
> placing the respective hooks in ah{4,6}.c and esp{4,6}.c
> {ah,esp}_output() function respectively. We also need to (again) reset
> the skb->nfct and drop the conntrack reference again.

Why not just do this right when we pop into the dst_output() call
in ip_output.c?

This way we don't have to add all of this stuff for every new
encapsulator we ever implement.

Maybe not like this precisely, but something like it.
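The design trade-off raised in this message (one hook call at a common output entry point versus a hook inside every encapsulator) can be illustrated with a toy userspace model. This is not code from the thread or from the kernel: `struct pkt`, `pre_encap_hook()`, `output_with_hook()` and the `*_output_model()` functions are hypothetical stand-ins, assumed only for the sake of the sketch.

```c
/* Hypothetical stand-ins; not the kernel's sk_buff or dst_entry. */
struct pkt {
    int hook_calls;   /* how many times the pre-encapsulation hook ran */
    int encap_calls;  /* how many encapsulators processed the packet */
};

typedef int (*output_fn)(struct pkt *);

/* Toy encapsulators: note that neither contains any hook logic. */
static int esp_output_model(struct pkt *p)  { p->encap_calls++; return 0; }
static int ipip_output_model(struct pkt *p) { p->encap_calls++; return 0; }

/* The single place where the hook lives; nonzero would mean "drop". */
static int pre_encap_hook(struct pkt *p)
{
    p->hook_calls++;
    return 0; /* accept */
}

/*
 * Common entry point: run the hook once, then hand the packet to
 * whichever encapsulator was selected for it.
 */
static int output_with_hook(struct pkt *p, output_fn encap)
{
    if (pre_encap_hook(p) != 0)
        return -1; /* dropped by the hook */
    return encap(p);
}
```

In this model a new encapsulator only needs to provide an `output_fn`; it picks up the hook for free, which is the property the message argues for.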
* Re: NAT before IPsec with 2.6
2004-01-27 10:39 ` Harald Welte
2004-01-27 11:57 ` Henrik Nordstrom
2004-01-27 13:27 ` Valentijn Sessink
@ 2004-01-27 16:11 ` Tom Eastep
2004-01-27 20:45 ` Harald Welte
2004-01-27 19:51 ` Michael Richardson
3 siblings, 1 reply; 63+ messages in thread
From: Tom Eastep @ 2004-01-27 16:11 UTC (permalink / raw)
To: Harald Welte, Willy Tarreau
Cc: Henrik Nordstrom, Michal Ludvig, netfilter-devel

On Tuesday 27 January 2004 02:39 am, Harald Welte wrote:
>
> My proposal:
> From conntrack's view (and thus from NAT's view), there (should be?) two
> separate connections for any given to-be-encapsulated packet:
>
> - the payload connection (tcp/udp/whatever)
>     - conntrack entry created at PREROUTING
>     - conntrack entry confirmed (put in hash) at XXX
> - the AH/ESP connection
>     - conntrack entry created at YYY
>     - conntrack entry confirmed at POSTROUTING
>
> To do this, at some point between esp_output() being called and the
> beginning of the modification of the packet payload, we need to call
> nf_hook(POST_ROUTING). This way, conntrack would be able to put the
> connection in the hash, and people can do SNAT-like operations in
> nat->POSTROUTING. We could even pass a dummy output device structure
> with an interface name "esp" so people can SNAT everything heading for
> esp encapsulation.

I like this proposal. I assume that on input, payload packets would also
have this dummy device as their input device? And on output, that payload
packets would have "esp" as their output device in the FORWARD and OUTPUT
hooks as well?

I would also like to register my vote for having the AH/ESP packets go
through INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in
the same way as it does all other tunnel types:

a) the payload connection uses a dedicated device ("esp" in this case)
b) the encapsulated traffic originates at the gateway on output and is
   directed to the gateway on input.
-Tom
--
Tom Eastep \ Nothing is foolproof to a sufficiently talented fool
Shoreline, \ http://shorewall.net
Washington USA \ teastep@shorewall.net
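To make the dummy-device idea concrete, here is a minimal userspace sketch of how a rule could select payload packets by output device name. It assumes the proposed dummy device is literally named "esp", as suggested in the quoted proposal; `ifname_match()` and `should_snat()` are hypothetical helper names, not kernel or iptables functions. The trailing '+' wildcard mirrors the documented behaviour of the iptables `-i`/`-o` options, which would also cover per-tunnel names such as ipsec0 and ipsec1 if those were ever introduced.

```c
#include <string.h>

/*
 * Match an interface name against a pattern in the style of the
 * iptables -i/-o options: a trailing '+' matches any name that starts
 * with the text before it; otherwise the names must be identical.
 */
static int ifname_match(const char *pattern, const char *ifname)
{
    size_t len = strlen(pattern);

    if (len > 0 && pattern[len - 1] == '+')
        return strncmp(pattern, ifname, len - 1) == 0;
    return strcmp(pattern, ifname) == 0;
}

/*
 * Hypothetical rule check: apply SNAT only to payload packets whose
 * (dummy) output device says they are heading for ESP encapsulation.
 */
static int should_snat(const char *out_dev)
{
    return ifname_match("esp", out_dev);
}
```

With this model, `should_snat("esp")` is true and `should_snat("eth0")` is false, while a pattern such as `"ipsec+"` would match both `ipsec0` and `ipsec1`.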
* Re: NAT before IPsec with 2.6
2004-01-27 16:11 ` Tom Eastep
@ 2004-01-27 20:45 ` Harald Welte
2004-01-28 15:36 ` Tom Eastep
0 siblings, 1 reply; 63+ messages in thread
From: Harald Welte @ 2004-01-27 20:45 UTC (permalink / raw)
To: Tom Eastep
Cc: Willy Tarreau, Henrik Nordstrom, Michal Ludvig, netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

> > To do this, at some point between esp_output() being called and the
> > beginning of the modification of the packet payload, we need to call
> > nf_hook(POST_ROUTING). This way, conntrack would be able to put the
> > connection in the hash, and people can do SNAT-like operations in
> > nat->POSTROUTING. We could even pass a dummy output device structure
> > with an interface name "esp" so people can SNAT everything heading for
> > esp encapsulation.
>
> I like this proposal.

Great :)

> I assume that on input, payload packets would also have this dummy
> device as their input device?

sure, makes sense.

> And on output that payload packets would have "esp" as their output
> device in the FORWARD and OUTPUT hooks as well?

no, that's way more difficult. I'm not sure whether it can be done at
all (without adding ridiculous kludges to the code).

> I would also like to register my vote for having the AH/ESP packets go
> through INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in
> the same way as it does all other tunnel types:

Mh. Let's say we stick with the INPUT/OUTPUT chains for now.

However, I would like to introduce new netfilter hooks. At least for
the beginning (due to lack of better ideas), I'm going to register the
INPUT/OUTPUT chains with those new hooks.

The idea of the new hooks is that in reality they are at different
locations in the stack. And at some point we might have some module
that is only interested in xfrm'ed packets.

> -Tom

--
- Harald Welte <laforge@netfilter.org> http://www.netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: NAT before IPsec with 2.6
2004-01-27 20:45 ` Harald Welte
@ 2004-01-28 15:36 ` Tom Eastep
0 siblings, 0 replies; 63+ messages in thread
From: Tom Eastep @ 2004-01-28 15:36 UTC (permalink / raw)
To: Harald Welte
Cc: Willy Tarreau, Henrik Nordstrom, Michal Ludvig, netfilter-devel

On Tuesday 27 January 2004 12:45 pm, Harald Welte wrote:
> > > To do this, at some point between esp_output() being called and the
> > > beginning of the modification of the packet payload, we need to call
> > > nf_hook(POST_ROUTING). This way, conntrack would be able to put the
> > > connection in the hash, and people can do SNAT-like operations in
> > > nat->POSTROUTING. We could even pass a dummy output device structure
> > > with an interface name "esp" so people can SNAT everything heading for
> > > esp encapsulation.
> >
> > I like this proposal.
>
> Great :)
>
> > I assume that on input, payload packets would also have this dummy
> > device as their input device?
>
> sure, makes sense.
>
> > And on output that payload packets would have "esp" as their output
> > device in the FORWARD and OUTPUT hooks as well?
>
> no, that's way more difficult. I'm not sure whether it can be done at
> all (without adding ridiculous kludges to the code).

Bummer -- I'm trying to avoid equally ridiculous kludges in my own code :-)

>
> > I would also like to register my vote for having the AH/ESP packets go
> > through INPUT and OUTPUT. This would allow Shorewall to treat IPSEC in
> > the same way as it does all other tunnel types:
>
> Mh. Let's say we stick with the INPUT/OUTPUT chains for now.
>
> However, I would like to introduce new netfilter hooks. At least for
> the beginning (due to lack of better ideas), I'm going to register the
> INPUT/OUTPUT chains with those new hooks.
>
> The idea of the new hooks is that in reality they are at different
> locations in the stack. And at some point we might have some module
> that is only interested in xfrm'ed packets.
>

I understand.

-Tom
--
Tom Eastep \ Nothing is foolproof to a sufficiently talented fool
Shoreline, \ http://shorewall.net
Washington USA \ teastep@shorewall.net
* Re: NAT before IPsec with 2.6
2004-01-27 10:39 ` Harald Welte
` (2 preceding siblings ...)
2004-01-27 16:11 ` Tom Eastep
@ 2004-01-27 19:51 ` Michael Richardson
3 siblings, 0 replies; 63+ messages in thread
From: Michael Richardson @ 2004-01-27 19:51 UTC (permalink / raw)
To: netfilter-devel

>>>>> "Harald" == Harald Welte <laforge@netfilter.org> writes:
    Harald> To do this, at some point between esp_output() being called
    Harald> and the beginning of the modification of the packet payload,
    Harald> we need to call nf_hook(POST_ROUTING). This way, conntrack
    Harald> would be able to put the connection in the hash, and people
    Harald> can do SNAT-like operations in nat->POSTROUTING. We could
    Harald> even pass a dummy output device structure with an interface
    Harald> name "esp" so people can SNAT everything heading for esp
    Harald> encapsulation.

I would suggest that many administrators will want, possibly, a
pseudo-interface per SA. These aren't necessarily real interfaces like
ipsecX, but they do need to be named in some way.

(There are a number of scenarios where having a proper virtual interface
is really the best way to interact properly with routing daemons.)

The original IPsec implementation that was done at NRL created virtual
tunnels (using the "route" command) with virtual tunnel devices - one per
tunnel. They were criticised because this didn't scale; that was because
at the time (1993/1994) people weren't building kernels that assumed that
there could be thousands of devices.

Linux people *are* doing this, and doing it correctly. The examples
include: MPLS, PPP, PPPoE, VLAN

] ON HUMILITY: to err is human. To moo, bovine. | firewalls [
] Michael Richardson, Xelerance Corporation, Ottawa, ON |net architect[
] mcr@xelerance.com http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
end of thread, other threads:[~2004-03-12 0:15 UTC | newest]
Thread overview: 63+ messages
2004-01-21 12:29 NAT before IPsec with 2.6 Michal Ludvig
2004-01-23 6:57 ` Willy Tarreau
2004-01-23 12:31 ` Henrik Nordstrom
2004-01-23 13:31 ` Michal Ludvig
2004-01-23 14:24 ` Henrik Nordstrom
2004-01-23 14:40 ` Michal Ludvig
2004-01-23 15:56 ` Henrik Nordstrom
2004-01-23 15:51 ` Tom Eastep
2004-01-24 8:22 ` Willy Tarreau
2004-01-24 9:21 ` Henrik Nordstrom
2004-01-24 9:27 ` Willy Tarreau
2004-01-27 10:39 ` Harald Welte
2004-01-27 11:57 ` Henrik Nordstrom
2004-01-27 13:07 ` Harald Welte
2004-01-27 13:22 ` Henrik Nordstrom
2004-01-27 14:12 ` Henrik Nordstrom
2004-01-27 20:51 ` Harald Welte
2004-01-27 22:35 ` Henrik Nordstrom
2004-01-28 13:48 ` Harald Welte
2004-01-27 22:41 ` Willy Tarreau
2004-01-27 23:55 ` Harald Welte
2004-01-28 0:14 ` Willy Tarreau
2004-01-28 0:09 ` [PATCH]Re: " Harald Welte
2004-01-28 8:49 ` Patrick McHardy
2004-01-28 9:37 ` Patrick McHardy
2004-01-28 10:30 ` Harald Welte
2004-01-28 11:24 ` Willy Tarreau
2004-01-28 13:39 ` Harald Welte
2004-01-28 15:58 ` Tom Eastep
2004-01-28 13:22 ` Patrick McHardy
2004-01-28 14:23 ` Henrik Nordstrom
2004-02-01 14:52 ` Patrick McHardy
2004-02-16 1:19 ` Patrick McHardy
2004-02-18 14:57 ` Patrick McHardy
[not found] ` <20040218220337.GA3193@alpha.home.local>
2004-02-20 1:43 ` Patrick McHardy
2004-03-04 22:30 ` [PATCH]: latest netfilter+ipsec patches Patrick McHardy
2004-03-04 23:11 ` Willy Tarreau
2004-03-04 23:42 ` Alexander Samad
2004-03-05 2:00 ` Patrick McHardy
2004-03-05 2:13 ` Alexander Samad
2004-03-10 2:45 ` Alexander Samad
2004-03-11 22:10 ` Patrick McHardy
2004-03-12 0:15 ` Alexander Samad
2004-03-05 1:47 ` Patrick McHardy
2004-03-05 11:10 ` Willy Tarreau
2004-03-04 23:44 ` Patrick McHardy
2004-03-05 11:39 ` Harald Welte
2004-01-28 10:30 ` [PATCH]Re: NAT before IPsec with 2.6 Andreas Jellinghaus
2004-01-29 19:05 ` Harald Welte
2004-01-27 19:54 ` Michael Richardson
2004-01-27 13:27 ` Valentijn Sessink
2004-01-27 13:57 ` Henrik Nordstrom
2004-01-27 21:13 ` Andreas Jellinghaus
2004-01-28 8:58 ` Harald Welte
2004-01-28 10:21 ` Andreas Jellinghaus
2004-01-28 13:00 ` Harald Welte
2004-01-28 13:43 ` Andreas Jellinghaus
2004-01-28 14:24 ` 2.6.2-rc2 and nf-log Wojciech 'Sas' Cieciwa
2004-01-28 19:38 ` NAT before IPsec with 2.6 David S. Miller
2004-01-27 16:11 ` Tom Eastep
2004-01-27 20:45 ` Harald Welte
2004-01-28 15:36 ` Tom Eastep
2004-01-27 19:51 ` Michael Richardson