* multiprimary conntrackd setup
From: Sebastian Vieira @ 2008-06-17 11:02 UTC
To: netfilter

Hi,

I must be looking in the wrong places for documentation, but so far I'm
unable to find it. I'm trying to set up a multiprimary (active-active)
conntrackd on two firewalls. I have conntrackd running on both nodes, and
'conntrackd -s' shows that multicast is working. However, I still have to
run a manual 'conntrackd -c; conntrackd -R' to sync both tables (as would
be proper in a failover / active-backup situation). Other than enabling
CacheWriteThrough, I couldn't find anything on multiprimary setup.

If someone could point me to the correct documentation, I would be very
happy indeed :)

thanks,
Sebastian
* Re: multiprimary conntrackd setup
From: Pablo Neira Ayuso @ 2008-06-18 13:05 UTC
To: Sebastian Vieira; Cc: netfilter

Sebastian Vieira wrote:
> Hi,
>
> I must be looking in the wrong places for documentation, but so far I'm
> unable to find it. I'm trying to set up a multiprimary (active-active)
> conntrackd on two firewalls. I have conntrackd running on both nodes,
> and 'conntrackd -s' shows that multicast is working. However, I still
> have to run a manual 'conntrackd -c; conntrackd -R' to sync both tables
> (as would be proper in a failover / active-backup situation). Other
> than enabling CacheWriteThrough, I couldn't find anything on
> multiprimary setup.

What kind of active-active? There are two kinds:

a) symmetric or flow-based: the packets of a given flow are always
handled by the same firewall replica. In this case, you only have to call
'conntrackd -c' during the failover (which is usually done by your HA
manager, such as keepalived).

b) asymmetric or packet-based: the typical case of OSPF setups. There is
no guarantee that a packet is handled by the same firewall replica, as
OSPF may change the routes at any time. In that case, you have to enable
CacheWriteThrough. However, from the design point of view, conntrackd is
better suited to scenario a).

> If someone could point me to the correct documentation, I would be very
> happy indeed :)

There's no documentation on active-active setups yet, but there will be
some at some point for sure. Anyway, I'd appreciate it if you could write
it. Feel free to ask whatever you need.

--
"Los honestos son inadaptados sociales" ("The honest are social misfits")
-- Les Luthiers
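[Editor's note] The failover hook mentioned above, which an HA manager
such as keepalived would invoke on a state transition, can be sketched
as a small wrapper. This is a hedged, illustrative sketch, not the
script shipped with conntrackd; the command sequence (commit the
external cache, then resync against the kernel table) follows the flags
discussed in this thread:

```python
import subprocess

# Commands an HA manager hook might run when this node becomes primary,
# per the thread: commit the external cache into the kernel conntrack
# table, then resync the internal cache against the kernel table.
# The exact sequence in conntrackd's shipped scripts may differ.
ON_PRIMARY = [
    ["conntrackd", "-c"],  # commit external cache into the kernel table
    ["conntrackd", "-R"],  # resync internal cache with the kernel table
]

def run_failover(commands=ON_PRIMARY, run=subprocess.run):
    """Execute the failover sequence; `run` is injectable for testing."""
    for cmd in commands:
        run(cmd, check=True)
```

In practice the HA manager would call `run_failover()` from its notify
hook; injecting `run` keeps the sequence testable without a live
conntrackd.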
* Re: multiprimary conntrackd setup
From: Sebastian Vieira @ 2008-06-23 6:46 UTC
To: Pablo Neira Ayuso; Cc: netfilter

On Wed, Jun 18, 2008 at 3:05 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> What kind of active-active? There are two kinds:

-snip-

> b) asymmetric or packet-based: the typical case of OSPF setups. There
> is no guarantee that a packet is handled by the same firewall replica,
> as OSPF may change the routes at any time. In that case, you have to
> enable CacheWriteThrough. However, from the design point of view,
> conntrackd is better suited to scenario a).

I'm using the asymmetric setup: two firewalls connected with BGP to the
service provider and, as you mentioned, no way of knowing which firewall
handles which packet.

But the funny thing is that it's working now :) Yes, I enabled the
CacheWriteThrough option, but I was testing with ICMP. I later learnt
that ICMP is unreliable for this kind of test, because when I tried a
simple TCP connection it worked fine.

I'm still fiddling around a bit with the ip_conntrack_max sysctl setting
because I tend to get dropped packets. Also, 'conntrackd -s' indicates
that both nodes failed to destroy connections in the internal cache.
These numbers roughly match the other node's successfully destroyed
connections:

node1:
connections destroyed: 31473050 failed: 7334

node2:
connections destroyed: 7441 failed: 31475657

Is this something I need to worry about?

regards,
Sebastian
* Re: multiprimary conntrackd setup
From: Pablo Neira Ayuso @ 2008-06-23 9:09 UTC
To: Sebastian Vieira; Cc: netfilter

Sebastian Vieira wrote:
> I'm using the asymmetric setup: two firewalls connected with BGP to the
> service provider and, as you mentioned, no way of knowing which
> firewall handles which packet.
>
> But the funny thing is that it's working now :) Yes, I enabled the
> CacheWriteThrough option, but I was testing with ICMP. I later learnt
> that ICMP is unreliable for this kind of test, because when I tried a
> simple TCP connection it worked fine.

As you said, replicating ICMP does not make too much sense to me either.

Some considerations on this setup: there's a shortcoming in the
asymmetric approach; conntrackd performs much better in a flow-based
multiprimary setup.

The multipath setup that you're using works fine if and only if the RTT
between the firewall cluster and the server peer is greater than the time
to send and inject the state change from FW1 to FW2. Otherwise, you'll
probably notice a slowdown in the connection setup. This condition holds
if the server peer is on the Internet (a DSL RTT is ~30 ms, and the
synchronization messages barely take 0.01 ms here). This limitation is
due to the asynchronous nature of the solution. The design of conntrackd
supports this scenario, but flow-based performs much better. In short:
BGP works at the packet level, while stateful firewalling operates at the
flow level.

> I'm still fiddling around a bit with the ip_conntrack_max sysctl
> setting because I tend to get dropped packets. Also, 'conntrackd -s'
> indicates that both nodes failed to destroy connections in the internal
> cache. These numbers roughly match the other node's successfully
> destroyed connections:
>
> node1:
> connections destroyed: 31473050 failed: 7334
>
> node2:
> connections destroyed: 7441 failed: 31475657
>
> Is this something I need to worry about?

Well, I need to know which replication approach you're using. Anyhow,
I'll try to make several assumptions from the information that you've
posted.

Basically, that output means that node1 has tried to destroy 7334
connections that were not available in its cache. Since you have trimmed
the output, I don't know if it's the internal or the external cache.
Assuming that the information you've posted refers to node1's internal
cache and node2's external cache:

1) node1 did not resynchronize against the kernel conntrack table at
startup (you forgot to include 'conntrackd -R' in your scripts to force
resynchronization between the internal cache and the kernel conntrack
table). Use 'conntrackd -i' to check that its output is similar to that
of 'conntrack -L'.

2) node2 has tried to destroy several connections in its external cache
that were not available. This means that node2 did not issue a
'conntrackd -n' to resynchronize its external cache with node1's internal
cache (assuming that you're using the FTFW or NOTRACK approach).
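[Editor's note] The RTT condition above can be written down explicitly.
In this hedged sketch, a new connection only notices the replication
delay when the state change has not reached the other firewall by the
time the reply packet arrives; the function name is illustrative, and
the figures are the ones quoted in the thread:

```python
def replication_is_hidden(rtt_ms: float, sync_ms: float) -> bool:
    """True if the peer RTT masks the time to send and inject the state
    change on the other firewall replica -- the condition stated in the
    thread for the asymmetric setup to work without slowing down
    connection setup."""
    return rtt_ms > sync_ms

# Figures quoted in the thread: ~30 ms DSL RTT vs ~0.01 ms sync time.
print(replication_is_hidden(30.0, 0.01))    # Internet peer: holds
print(replication_is_hidden(0.005, 0.01))   # same-switch LAN peer: fails
```

This is why the setup behaves well toward Internet peers but can slow
down connection setup between hosts on the same LAN.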
* Re: multiprimary conntrackd setup
From: Sebastian Vieira @ 2008-06-23 12:42 UTC
To: Pablo Neira Ayuso; Cc: netfilter

On Mon, Jun 23, 2008 at 11:09 AM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:

First of all, thanks a lot for the (quick) response, I appreciate it!

> As you said, replicating ICMP does not make too much sense to me
> either.
>
> Some considerations on this setup: there's a shortcoming in the
> asymmetric approach; conntrackd performs much better in a flow-based
> multiprimary setup.
>
> The multipath setup that you're using works fine if and only if the RTT
> between the firewall cluster and the server peer is greater than the
> time to send and inject the state change from FW1 to FW2. [...]

Right. Is there any way to measure these synchronization times? The setup
is located on the LAN, where even the synchronization messages pass
through a switch. Maybe we can overcome this by hooking up a crossover
cable, but that depends; I don't know if we have a free NIC :)

> Well, I need to know which replication approach you're using. Anyhow,
> I'll try to make several assumptions from the information that you've
> posted.
>
> Basically, that output means that node1 has tried to destroy 7334
> connections that were not available in its cache. Since you have
> trimmed the output, I don't know if it's the internal or the external
> cache.

Both figures are from the internal cache. I'll paste the output in full
below. Note that conntrackd was restarted just a couple of minutes ago:

node1:
cache internal:
current active connections: 15200
connections created: 28326 failed: 0
connections updated: 68477 failed: 0
connections destroyed: 13126 failed: 1

cache external:
current active connections: 167
connections created: 167 failed: 0
connections updated: 434 failed: 0
connections destroyed: 0 failed: 0

traffic processed:
0 Bytes 0 Pckts

multicast traffic:
6735580 Bytes sent 53708 Bytes recv
75025 Pckts sent 596 Pckts recv
0 Error send 0 Error recv

multicast sequence tracking:
0 Pckts mfrm 0 Pckts lost

node2:
cache internal:
current active connections: 1699
connections created: 1826 failed: 0
connections updated: 636 failed: 0
connections destroyed: 127 failed: 804

cache external:
current active connections: 11893
connections created: 11989 failed: 0
connections updated: 68991 failed: 0
connections destroyed: 96 failed: 0

traffic processed:
0 Bytes 0 Pckts

multicast traffic:
58372 Bytes sent 6810200 Bytes recv
646 Pckts sent 75940 Pckts recv
0 Error send 0 Error recv

multicast sequence tracking:
0 Pckts mfrm 0 Pckts lost

And for completeness' sake, the conntrackd.conf for both nodes (where
only IPv4_interface differs):

fw02:~# cat /etc/conntrackd/conntrackd.conf
Sync {
    Mode NOTRACK {
        CommitTimeout 180
    }
    Multicast {
        IPv4_address 225.0.0.50
        IPv4_interface 172.29.254.3
        Interface eth1
        Group 3780
        McastSndSocketBuffer 1249280
        McastRcvSocketBuffer 1249280
    }
    Checksum on
    CacheWriteThrough On
}
General {
    HashSize 8192
    HashLimit 65535
    LockFile /var/lock/conntrack.lock
    LogFile /var/log/conntrackd.log
    UNIX {
        Path /tmp/sync.sock
        Backlog 20
    }
    SocketBufferSize 262142
    SocketBufferSizeMaxGrown 655355
}
IgnoreTrafficFor {
    IPv4_address 172.29.254.3  # loc (fw02)
    IPv4_address 172.29.254.2  # loc (fw01)
    IPv4_address 172.29.253.1  # loc
    IPv4_address 127.0.0.1     # loopback
}
IgnoreProtocol {
    UDP
    VRRP
}

regards,
Sebastian
* Re: multiprimary conntrackd setup
From: Pablo Neira Ayuso @ 2008-06-24 16:06 UTC
To: Sebastian Vieira; Cc: netfilter

Sebastian Vieira wrote:
> Right. Is there any way to measure these synchronization times? The
> setup is located on the LAN, where even the synchronization messages
> pass through a switch. Maybe we can overcome this by hooking up a
> crossover cable, but that depends; I don't know if we have a free NIC :)

Actually, the nodes must use a dedicated link, otherwise you risk leaking
state information. And please, elaborate on your setup a bit more.

> Both figures are from the internal cache. I'll paste the output in full
> below. Note that conntrackd was restarted just a couple of minutes ago:
>
> node1:
> cache internal:
> current active connections: 15200
> [...]
>
> node2:
> cache internal:
> current active connections: 1699
> [...]
>
> And for completeness' sake, the conntrackd.conf for both nodes (where
> only IPv4_interface differs):
>
> fw02:~# cat /etc/conntrackd/conntrackd.conf
> Sync {
>     Mode NOTRACK {
>         CommitTimeout 180
>     }
> [...]

If you're using NOTRACK, the nodes do not seem to be in sync, as the
number of internal cache entries in node1 must be equal to node2's in the
external cache. I guess that you've been testing the failover several
times before posting these results. BTW, which HA manager are you using?
The HA manager is required to assist conntrackd, as it invokes several
important commands (see the scripts).
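[Editor's note] The sync check described above -- under NOTRACK, node1's
internal cache size should match node2's external cache size -- can be
applied mechanically to 'conntrackd -s' output. A hedged sketch follows;
the parsing assumes exactly the stat labels shown earlier in the thread,
and the helper names are illustrative:

```python
import re

def active_connections(stats_text: str) -> dict:
    """Extract 'current active connections' per cache section from
    conntrackd -s output (labels as shown in this thread)."""
    result = {}
    section = None
    for line in stats_text.splitlines():
        m = re.match(r"cache (\w+):", line.strip())
        if m:
            section = m.group(1)
            continue
        m = re.match(r"current active connections:\s*(\d+)", line.strip())
        if m and section:
            result[section] = int(m.group(1))
    return result

def in_sync(node1_stats: str, node2_stats: str, tolerance: int = 0) -> bool:
    """NOTRACK invariant from the thread: each node's internal cache
    should mirror the other node's external cache."""
    n1 = active_connections(node1_stats)
    n2 = active_connections(node2_stats)
    return (abs(n1["internal"] - n2["external"]) <= tolerance
            and abs(n2["internal"] - n1["external"]) <= tolerance)
```

Fed the numbers from the thread (node1 internal: 15200 vs node2
external: 11893), `in_sync` returns False, matching the diagnosis that
the nodes had drifted apart.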
* Re: multiprimary conntrackd setup
From: Sebastian Vieira @ 2008-06-25 21:02 UTC
To: Pablo Neira Ayuso; Cc: netfilter

On Tue, Jun 24, 2008 at 6:06 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Actually, the nodes must use a dedicated link, otherwise you risk
> leaking state information. And please, elaborate on your setup a bit
> more.

Hm, the hardware lacks a dedicated link at the moment, so I send the
multicast traffic over the least used link. I'll see if I can add an
extra NIC to get the dedicated link up.

I'll try to describe the setup better:

There are two firewalls, both configured as BGP routers using quagga.
Right now we have configured it so that traffic leaves through one node
and comes back through the other. There is no HA software configured for
this, only for the floating IP (the main gateway IP). It was my
understanding that connection updates would be inserted immediately,
without depending on an HA manager script.

Also, since this is a production environment and we can't really
'experiment' too much with it, the way we're simulating the active-active
setup now is by adding a static route to some external host via the
inactive firewall, so the packet route is as follows:

[our host in dmz] -----> [fw02] -----> [external host] -----> [fw01] -----> [our host in dmz]

But I guess that the ACK to our external host, and the SYN/ACK response
from it, is faster than the conntrackd synchronization between fw01 and
fw02. Am I right in that assumption?

> If you're using NOTRACK, the nodes do not seem to be in sync, as the
> number of internal cache entries in node1 must be equal to node2's in
> the external cache. I guess that you've been testing the failover
> several times before posting these results. BTW, which HA manager are
> you using? The HA manager is required to assist conntrackd, as it
> invokes several important commands (see the scripts).

See above. Did I understand the working of conntrackd incorrectly?

We have somewhat come to the conclusion that a multiprimary firewall
setup may be impossible to accomplish due to the latency, and that it
might be better (read: easier) to split the routers from the firewalls
(they are now one and the same physical machine), have only one firewall
active at a time, and add the HA manager to sync the conntrack tables
upon failover.

regards,
Sebastian
* Re: multiprimary conntrackd setup
From: Pablo Neira Ayuso @ 2008-06-26 15:25 UTC
To: Sebastian Vieira; Cc: netfilter

Sebastian Vieira wrote:
> Hm, the hardware lacks a dedicated link at the moment, so I send the
> multicast traffic over the least used link. I'll see if I can add an
> extra NIC to get the dedicated link up.
>
> I'll try to describe the setup better:
>
> There are two firewalls, both configured as BGP routers using quagga.
> Right now we have configured it so that traffic leaves through one node
> and comes back through the other. There is no HA software configured
> for this, only for the floating IP (the main gateway IP). It was my
> understanding that connection updates would be inserted immediately,
> without depending on an HA manager script.

Yes, but the HA manager assists conntrackd to flush/request a resync/etc.
whenever a node comes up or goes down. Otherwise, you'll probably get
flow entries stuck in conntrackd forever.

> Also, since this is a production environment and we can't really
> 'experiment' too much with it, the way we're simulating the
> active-active setup now is by adding a static route to some external
> host via the inactive firewall, so the packet route is as follows:
>
> [our host in dmz] -----> [fw02] -----> [external host] -----> [fw01] -----> [our host in dmz]
>
> But I guess that the ACK to our external host, and the SYN/ACK response
> from it, is faster than the conntrackd synchronization between fw01 and
> fw02. Am I right in that assumption?

It depends on where your external host is, i.e. the RTT between the
external host and your dmz host (see my previous email).

Moreover, the CPU consumption of the asymmetric approach is higher,
since:

a) you have to inject every single state change into the kernel.
b) you have to replicate every single state change.

The asymmetric multiprimary approach is less performant than the
symmetric one.

> See above. Did I understand the working of conntrackd incorrectly?
>
> We have somewhat come to the conclusion that a multiprimary firewall
> setup may be impossible to accomplish due to the latency, and that it
> might be better (read: easier) to split the routers from the firewalls
> (they are now one and the same physical machine), have only one
> firewall active at a time, and add the HA manager to sync the conntrack
> tables upon failover.

The per-packet (asymmetric) multiprimary setup is possible but, as said,
I'd suggest a per-flow multiprimary setup, i.e. the same firewall always
handles the same subset of flows. I have set up one symmetric
multiprimary testbed with ClusterIP; however, that target is focused on
backend clustering, thus not on gateway clustering. Anytime soon, I'd
like to come up with a new target similar to ClusterIP but for gateways
and, of course, some documentation to avoid this sort of thread.
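[Editor's note] The per-flow partitioning recommended above -- each
firewall deterministically owning a subset of flows -- boils down to
hashing the connection tuple to pick a node. A hedged, illustrative
sketch of the idea (this is not how ClusterIP or any conntrackd
component is actually implemented):

```python
import hashlib

def owner_node(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
               proto: str, n_firewalls: int = 2) -> int:
    """Map a connection tuple to the firewall replica that should handle
    every packet of that flow.  A stable hash keeps the mapping
    deterministic across packets and restarts (unlike Python's built-in
    hash(), which is randomized per process)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_firewalls
```

Because every packet of a flow maps to the same node, no state needs to
be injected into the peer's kernel while the owning node is alive --
which is exactly why the flow-based setup avoids the RTT constraint of
the packet-based one.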