* Stateless NAT with iptables
@ 2015-01-09 18:38 Glen Miner
  2015-01-09 19:14 ` Kristian Evensen
  0 siblings, 1 reply; 9+ messages in thread
From: Glen Miner @ 2015-01-09 18:38 UTC (permalink / raw)
  To: netfilter@vger.kernel.org


After a day of hacking around and trawling the internet, I'm more or less convinced this isn't possible with iptables, but I figured I should ask in case I missed something.

What we're trying to do is set up a UDP NAT proxy for a pair of clients on demand.

The mappings are preceded by some application layer traffic used to figure out where the mapping should go -- we're essentially trying to use the netfilter stack to do a blazing fast STUN/TURN server with no packet overhead. We need to bounce a few packets around on the ports we're going to proxy because this is how we figure out the remote peer address & ports we'll need to bind together.

In my prototype this works, but there's a problem: nf_conntrack gets in the way of the hand-off between the application layer socket and the NAT rule. I've tried telling iptables to go stateless:

iptables -t raw -F

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

But of course this means that none of my -t nat rules are checked; I've read the documentation several times now and I can't seem to find a way to have my cake and eat it too.

In more detail here's how it currently works (and what goes wrong):

Alice sends a message to the relay server saying "can you help me talk to Bob?" (the mechanics of this communication channel are unimportant)

The relay server binds a pair of UDP sockets $anPort and $bnPort to be used for the proxy on $nAddr and tells

Alice: send me a packet to $nAddr:$bnPort so I can figure out how your router NATs to that endpoint
- this gives us $aAddr and $aPort

Bob: send me a packet to $nAddr:$anPort
- this gives us $bAddr and $bPort

With that, the server then has enough information to create the following rules

# A->B
iptables -t nat -I PREROUTING -p udp -s $aAddr --sport $aPort -d $nAddr --dport $bnPort -j DNAT --to $bAddr:$bPort
iptables -t nat -A POSTROUTING -p udp -d $bAddr --dport $bPort -j SNAT --to $nAddr:$anPort
iptables -t nat -I OUTPUT -p udp -s $aAddr --sport $aPort -d $nAddr --dport $bnPort -j DNAT --to $bAddr:$bPort

# B->A (NOTE: these are not needed per se if Alice sends a packet to Bob first)
iptables -t nat -I PREROUTING -p udp -s $bAddr --sport $bPort -d $nAddr --dport $anPort -j DNAT --to $aAddr:$aPort
iptables -t nat -A POSTROUTING -p udp -d $aAddr --dport $aPort -j SNAT --to $nAddr:$bnPort
iptables -t nat -I OUTPUT -p udp -s $bAddr --sport $bPort -d $nAddr --dport $anPort -j DNAT --to $aAddr:$aPort

And now as far as Alice is concerned Bob is at $nAddr:$bnPort and as far as Bob is concerned Alice is at $nAddr:$anPort
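
The six rules above follow one pattern per direction, so they can be generated from the discovered endpoints. What follows is only a minimal sketch: `relay_rules` is a hypothetical helper name, the addresses are made-up example values, and the function prints the commands (a dry run) instead of executing them.

```shell
#!/bin/sh
# relay_rules: hypothetical helper, not part of iptables.
# Prints (does not execute) the NAT rules for one direction of the relay.
# Arguments: sender addr, sender port, relay addr, relay port the sender
# targets, receiver addr, receiver port, relay port the receiver sees.
relay_rules() {
    src=$1 sport=$2 relay=$3 inport=$4 dst=$5 dport=$6 outport=$7
    match="-p udp -s $src --sport $sport -d $relay --dport $inport"
    echo "iptables -t nat -I PREROUTING $match -j DNAT --to $dst:$dport"
    echo "iptables -t nat -A POSTROUTING -p udp -d $dst --dport $dport -j SNAT --to $relay:$outport"
    echo "iptables -t nat -I OUTPUT $match -j DNAT --to $dst:$dport"
}

# Example endpoints (assumptions for illustration only).
aAddr=192.0.2.10 aPort=40000
bAddr=198.51.100.20 bPort=50000
nAddr=203.0.113.1 anPort=5002 bnPort=5001

relay_rules "$aAddr" "$aPort" "$nAddr" "$bnPort" "$bAddr" "$bPort" "$anPort"   # A->B
relay_rules "$bAddr" "$bPort" "$nAddr" "$anPort" "$aAddr" "$aPort" "$bnPort"   # B->A
```

Piping the output to `sh` as root would apply the rules; printing first makes it easy to review them before they touch the nat table.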

We can technically close the sockets we used to discover the bind points, since they aren't needed any more, but here's the rub: nf_conntrack_udp_timeout seconds have to elapse with NO traffic from Alice or Bob to the bind ports. Evidently there is a conntrack entry kicking around that prevents the NAT rules I added from taking over, and the system default timeout is 30 seconds.

I can see the connection with this command:

conntrack -L --proto udp --dport=$anPort

Until this times out I get no joy and any stray packet from the respective peer will reset the countdown.

There are three workarounds I can think of:

1) reduce nf_conntrack_udp_timeout
2) try a conntrack -D command to clear the connection
3) before binding the local sockets, create rules to NOTRACK them while they're in userland
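
Each workaround in the list above maps to roughly one command. The sketch below is untested against a live relay: the helper names are hypothetical, the timeout value and addresses are examples, and every function only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helpers that print (rather than run) one command per workaround.

# 1) Shorten the UDP conntrack timeout (system-wide; the value is an example).
shorten_udp_timeout() {
    echo "sysctl -w net.netfilter.nf_conntrack_udp_timeout=$1"
}

# 2) Delete the conntrack entry left behind by the discovery packet.
#    Arguments: client addr, client port, relay addr, relay port.
drop_discovery_entry() {
    echo "conntrack -D -p udp --orig-src $1 --orig-port-src $2 --orig-dst $3 --orig-port-dst $4"
}

# 3) Exempt a discovery port from tracking while userland owns the socket.
#    Arguments: relay addr, relay port.
notrack_port() {
    echo "iptables -t raw -I PREROUTING -p udp -d $1 --dport $2 -j NOTRACK"
    echo "iptables -t raw -I OUTPUT -p udp -s $1 --sport $2 -j NOTRACK"
}

shorten_udp_timeout 5
drop_discovery_entry 10.0.1.7 5000 10.0.1.8 5001
notrack_port 10.0.1.8 5001
```

With the third workaround, the NOTRACK rules would presumably have to be removed again (the same rule specification with `-D` instead of `-I`) before the nat rules can take over, since untracked packets never reach the nat table.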

The fourth option that I dream of, however, is a "please don't bother with nf_conntrack, I really don't need it, honest" -- this would be the ideal case but try as I might I can't find any way to make this work. NOTRACK means no -t nat and you can't rewrite source or dest headers without -t nat.

Here's why I'd really like to ditch conntrack: it's a waste of resources for us. In fact, when we get up to around 40,000 users on a server (not doing this NAT stuff, just doing other things), the conntrack table can overflow and cause all kinds of problems. I could be wrong but I'm pretty confident we don't need it for anything -- our firewall rules can be really simple.
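
How close the conntrack table is to overflowing can be read from the kernel's counters. This is a small sketch, assuming the nf_conntrack module is loaded so that `nf_conntrack_count` and `nf_conntrack_max` exist under /proc/sys/net/netfilter; `conntrack_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# conntrack_usage: hypothetical helper. Prints "count max percent" from the
# nf_conntrack counters. The directory is a parameter so the function can be
# pointed at a test fixture; it defaults to the usual procfs location.
conntrack_usage() {
    dir=${1:-/proc/sys/net/netfilter}
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    # Integer percentage of the table currently in use.
    echo "$count $max $((count * 100 / max))"
}

# Only meaningful when the nf_conntrack module is loaded.
if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
    conntrack_usage
fi
```

A percentage that approaches 100 under load would be consistent with the overflow described above.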

I read some of the new nftables documentation but as far as I can tell it's still doing the same hooks and doesn't get me any closer to success. As far as I can tell I can't escape conntrack for what I'm doing.

If you've read this far, thanks for reading; if you have any light to shed, thanks in advance!

-g


 		 	   		  

* Re: Stateless NAT with iptables
  2015-01-09 18:38 Glen Miner
@ 2015-01-09 19:14 ` Kristian Evensen
  2015-01-09 20:32   ` Glen Miner
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2015-01-09 19:14 UTC (permalink / raw)
  To: Glen Miner; +Cc: netfilter@vger.kernel.org

Hi,

I was recently working on something similar. My solution was to
combine NOTRACK with RAWDNAT/RAWSNAT in the raw/rawpost tables. Of
course, this requires that you manage timeouts yourself. The raw table
is part of iptables (iptables-extension?), but rawpost, RAWDNAT and
RAWSNAT require xtables-addons and that you revert commit
9414a5df343bf30ba13e76dbd7181c55683b11cb.

I hope this helps!

-Kristian

* RE: Stateless NAT with iptables
  2015-01-09 19:14 ` Kristian Evensen
@ 2015-01-09 20:32   ` Glen Miner
  2015-01-09 20:53     ` Kristian Evensen
  0 siblings, 1 reply; 9+ messages in thread
From: Glen Miner @ 2015-01-09 20:32 UTC (permalink / raw)
  To: netfilter@vger.kernel.org

> I was recently working on something similar. My solution was to
> combine NOTRACK with RAWDNAT/RAWSNAT in the raw/rawpost-tables.

I've fiddled around with RAWDNAT, but from what I can tell it doesn't support --to addr:port (just addr) like regular DNAT. Thank you for pointing out the xtables-addons package to me, though.

-g 		 	   		  

* Re: Stateless NAT with iptables
  2015-01-09 20:32   ` Glen Miner
@ 2015-01-09 20:53     ` Kristian Evensen
  2015-01-09 22:35       ` Glen Miner
  0 siblings, 1 reply; 9+ messages in thread
From: Kristian Evensen @ 2015-01-09 20:53 UTC (permalink / raw)
  To: Glen Miner; +Cc: netfilter@vger.kernel.org

Hi,

On Fri, Jan 9, 2015 at 9:32 PM, Glen Miner <shaggie76@hotmail.com> wrote:
> I've fiddled around with RAWDNAT, but from what I can tell it doesn't support --to addr:port (just addr) like regular DNAT. Thank you for pointing out the xtables-addons package to me, though.

Ah, yes, that is true, I forgot about that. However, patching RAWDNAT to
support this should not be too much work. You can probably copy most of
the code from DNAT.

-Kristian

* RE: Stateless NAT with iptables
  2015-01-09 20:53     ` Kristian Evensen
@ 2015-01-09 22:35       ` Glen Miner
  0 siblings, 0 replies; 9+ messages in thread
From: Glen Miner @ 2015-01-09 22:35 UTC (permalink / raw)
  To: netfilter@vger.kernel.org

> Ah, yes, that is true, I forgot about that. However, patching RAWDNAT to
> support this should not be too much work. You can probably copy most of
> the code from DNAT.

I fired up a clone of the xtables-addons repo and was surprised to find that the RAWNAT module was deleted over a year ago:

<https://sourceforge.net/p/xtables-addons/xtables-addons/ci/c024c6b16ee0eeff200b8a020b20ebf9d2a74738/>

I also had trouble getting autoconf to work; even if I were to resurrect the dead I may not be familiar enough with the tools involved to build and test any changes I might make. I sent the package maintainer a message, though; maybe with a little push I might try to help.

-g
  		 	   		  

* Re: Stateless NAT with iptables
@ 2015-01-09 23:54 Jan Engelhardt
  2015-01-12 18:49 ` Glen Miner
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Engelhardt @ 2015-01-09 23:54 UTC (permalink / raw)
  To: Glen Miner; +Cc: Netfilter user mailing list, Kristian Evensen


> What we're trying to do is set up UDP NAT proxy for a pair of
> clients on demand. The mappings are preceded by some application
> layer traffic used to figure out where the mapping should go --
> we're essentially trying to use the netfilter stack to do a blazing
> fast STUN/TURN server with no packet overhead.
>
> In my prototype this works but there's a problem: nf_conntrack gets
> in the way of the hand-off between the application layer socket and
> the NAT rule. I've tried telling iptables to go stateless: 
> iptables -t raw -F
> iptables -t raw -I PREROUTING -j NOTRACK
> iptables -t raw -I OUTPUT -j NOTRACK

Rules are rather static, but your described scenario requires
rather dynamic changes. There is a much more targeted solution:

Once your application decides it is time to switch, it needs to
change the conntrack entry for the UDP session. conntrack(8)
can be used, and there is also a C interface with
libnetfilter_conntrack.

(With the conntrack command line utility, updating certain parameters is 
not possible because of the program's way of option parsing. This 
limitation may be gone in the C interface. With the command-line 
utility, one could however recreate the entry, if the lack of atomicity 
between the operations is bearable.)

conntrack -D -p udp \
	--src client_addr --dst stun_addr \
	--sport client_port --dport stun_port
conntrack -I -p udp \
	--src client_addr --dst stun_addr \
	--sport client_port --dport stun_port \
	--reply-src real_server_addr --reply-dst client_addr \
	--reply-port-src real_server_port \
	--reply-port-dst client_port

At least, that is the general idea.. whether it works, or if it needs
additional commands, I have not tried.

* RE: Stateless NAT with iptables
  2015-01-09 23:54 Stateless NAT with iptables Jan Engelhardt
@ 2015-01-12 18:49 ` Glen Miner
  2015-01-12 19:25   ` Glen Miner
  2015-01-12 22:06   ` Marcelo Ricardo Leitner
  0 siblings, 2 replies; 9+ messages in thread
From: Glen Miner @ 2015-01-12 18:49 UTC (permalink / raw)
  To: netfilter@vger.kernel.org

> Rules are rather static, but your described scenario requires
> rather dynamic changes. There is a much more targeted solution:
>
> Once your application decides it is time to switch, it needs to
> change the conntrack entry for the UDP session. conntrack(8)
> can be used, and there is also a C interface with
> libnetfilter_conntrack.
>
> (With the conntrack command line utility, updating certain parameters is
> not possible because of the program's way of option parsing. This
> limitation may be gone in the C interface. With the command-line
> utility, one could however recreate the entry, if the lack of atomicity
> between the operations is bearable.)
>
> conntrack -D -p udp \
> --src client_addr --dst stun_addr \
> --sport client_port --dport stun_port
> conntrack -I -p udp \
> --src client_addr --dst stun_addr \
> --sport client_port --dport stun_port \
> --reply-src real_server_addr --reply-dst client_addr \
> --reply-port-src real_server_port \
> --reply-port-dst client_port
>
> At least, that is the general idea.. whether it works, or if it needs
> additional commands, I have not tried.

This is indeed an exciting alternative! It might make it possible to avoid connection tracking for all other regular traffic through this box too so I am quite excited!

However, I've tried to get it to work and must be missing something subtle.

In my test I'm hosting my clients on 

10.0.1.7:5000 (Alice)
->
10.0.1.8 (Relay Server) port 5001 relayed to Bob via 5002
->
10.0.1.7:5003 (Bob)

Using iptables (the old method) I create the NAT rules like this

aAddr=10.0.1.7
aPort=5000
bAddr=10.0.1.7
bPort=5003
nAddr=10.0.1.8
anPort=5002
bnPort=5001

iptables -t nat -I PREROUTING -p udp -s $aAddr --sport $aPort -d $nAddr --dport $bnPort -j DNAT --to $bAddr:$bPort
iptables -t nat -A POSTROUTING -p udp -d $bAddr --dport $bPort -j SNAT --to $nAddr:$anPort
iptables -t nat -I OUTPUT -p udp -s $aAddr --sport $aPort -d $nAddr --dport $bnPort -j DNAT --to $bAddr:$bPort

I can see the connection with conntrack -L -p udp 

udp      17 24  src=10.0.1.7 dst=10.0.1.8 sport=5000 dport=5001 [UNREPLIED] src=10.0.1.7 dst=10.0.1.8 sport=5003 dport=5002           mark=0 use=1

And Bob gets the packets and everything works.
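
The second tuple on a conntrack line describes the reply direction, which is where the translated addresses show up. As a rough sketch for comparing entries, the hypothetical helper `reply_tuple` below pulls that tuple out of `conntrack -L` style output; it assumes each line carries exactly two src=/dst=/sport=/dport= groups, as in the entry above.

```shell
#!/bin/sh
# reply_tuple: hypothetical helper. Reads `conntrack -L` style lines on stdin
# and prints the reply-direction tuple as "src:sport -> dst:dport".
# Each line carries two src=/dst=/sport=/dport= groups; the second is the reply.
reply_tuple() {
    awk '{
        n = 0
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "src")   { n++; src[n] = kv[2] }
            if (kv[1] == "dst")   dst[n] = kv[2]
            if (kv[1] == "sport") sport[n] = kv[2]
            if (kv[1] == "dport") dport[n] = kv[2]
        }
        if (n >= 2) print src[2] ":" sport[2] " -> " dst[2] ":" dport[2]
    }'
}

line='udp      17 24  src=10.0.1.7 dst=10.0.1.8 sport=5000 dport=5001 [UNREPLIED] src=10.0.1.7 dst=10.0.1.8 sport=5003 dport=5002 mark=0 use=1'
echo "$line" | reply_tuple
```

For the entry shown, this prints `10.0.1.7:5003 -> 10.0.1.8:5002`, i.e. Bob's endpoint replying to the relay on $anPort.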

I tried repeating this experiment without the complexity of the intermediate handshake, just by creating the connection manually.

I cleared the nat rules from previous test with:

iptables -t nat -F

I flushed all udp tracked connections:

conntrack -D -p udp

Then I created the relay connection:

conntrack -I -p udp \
--src $aAddr --sport $aPort \
--dst $nAddr --dport $bnPort \
--reply-src $bAddr --reply-port-src $bPort \
--reply-dst $nAddr --reply-port-dst $anPort \
--timeout 600

But my packets all bounce with ICMP Port Unreachable. 

I compared the conntrack -L -p udp output and it looked essentially the same.

I also read the man page, which suggested using this instead of --src/--dst:

--orig-src $aAddr --orig-port-src $aPort \
--orig-dst $nAddr --orig-port-dst $bnPort \

But the conntrack entry looks the same.

It feels like some aspect of the firewall is not being informed of this state. 

I checked iptables -L, iptables -L -t nat, and -t raw and everything is empty. 

Can you think of something I've missed?

-g

 		 	   		  

* RE: Stateless NAT with iptables
  2015-01-12 18:49 ` Glen Miner
@ 2015-01-12 19:25   ` Glen Miner
  2015-01-12 22:06   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 9+ messages in thread
From: Glen Miner @ 2015-01-12 19:25 UTC (permalink / raw)
  To: netfilter@vger.kernel.org

> However, I've tried to get it to work and must be missing something subtle.

More diagnostics on this approach not working out:

If I watch the conntrack event log with conntrack -E -p udp

The iptables -t nat... method logs this:

    [NEW] udp      17 30 src=10.0.1.7 dst=10.0.1.8 sport=5000 dport=5001 [UNREPLIED] src=10.0.1.7 dst=10.0.1.8 sport=5003 dport=5002

But the conntrack -I ... method logs this:

 [UPDATE] udp      17 120 src=10.0.1.7 dst=10.0.1.8 sport=5000 dport=5001 [UNREPLIED] src=10.0.1.7 dst=10.0.1.8 sport=5003 dport=5002 mark=0

(note that I double checked that there were no previously existing udp entries in the table so I can't explain the update vs new). 

-g

 		 	   		  

* Re: Stateless NAT with iptables
  2015-01-12 18:49 ` Glen Miner
  2015-01-12 19:25   ` Glen Miner
@ 2015-01-12 22:06   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 9+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-01-12 22:06 UTC (permalink / raw)
  To: Glen Miner, netfilter@vger.kernel.org

(be warned, I didn't read the full thread, sorry)

Having the conntrack entry is not enough to get your packets NATed and, 
AFAICT, you flushed the nat tables, so your packets are probably getting 
routed to the original destination with the original src info, despite the 
conntrack entry being there.

It's like: ok, packets like this are expected, but nothing is updating them to 
be like that..

   Marcelo

