* conntrackd: synchronization failures
From: Jiri Kosina @ 2017-01-11 20:06 UTC (permalink / raw)
  To: netfilter; +Cc: info

Hi,

I've tried to use conntrackd to provide full connection tracking for 
firewalling and NAT in an asymmetric routing situation, but unfortunately 
the synchronization is failing quite often.

The setup is rather simple:

		  /  Internet  \
		 /		\
		RS1 <conntrackd> RS2
		   \	       /
		     \	LAN /
			
RS1 and RS2 have the same netfilter rules applied. RS1 is mostly used for 
sending outgoing traffic from LAN to Internet, while RS2 is mostly used 
for packets coming from Internet to LAN (LAN is mostly "consumer", so the 
Internet->LAN traffic is of a much higher volume (~1Gbps spikes) compared 
to ~200Mbps of outgoing traffic).

RS1 and RS2 are connected via a direct link (~0.06ms) which is reserved 
for conntrackd.

conntrackd.conf can be found below.

The issues:

- minor: every few minutes, an error message of one of the two forms 
  below appears in the log

	(pid=11083) [ERROR] inject-add2: File exists
	        tcp      6 120 FIN_WAIT src=10.33.12.15 dst=116.31.116.30 sport=22 dport=44232 [ASSURED]
	(pid=11083) [ERROR] inject-add2: Device or resource busy
		tcp      6 300 CLOSE src=10.33.40.102 dst=216.58.201.110 sport=53660 dport=443 [UNREPLIED]

  etc. (states, IPs, ports differ).

- major: roughly once every few hours, one of the routers (usually RS2) 
  starts emitting the 'File exists' messages much more frequently, and 
  the traffic on the dedicated link dramatically decreases for some 
  reason (IOW the conntrackds stop syncing as they were before, while the 
  actual 'data' traffic between LAN and Internet stays the same). This 
  is what the traffic on the dedicated link looks like when the issue 
  appears:

	http://www.jikos.cz/jikos/junk/conntrackd.jpg

  the down-spike is where connections between LAN and Internet start to 
  fail (no ACKs coming back from SYNs, etc, as RS2 is dropping everything 
  due to conntrack being out of sync), and the up-spike, bringing things 
  back to normal, is where either conntrackd is restarted, or the incoming
  traffic is cut (by shutting BGP sessions down).
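
One non-disruptive way to start quantifying the two error forms quoted 
above might be to tally them offline from a copy of the log. The sketch 
below is only an illustration: it assumes the log keeps the two-line 
layout shown above (an "[ERROR]" line followed by the conntrack entry), 
and the function name tally_inject_errors and the SAMPLE text are 
hypothetical, not part of conntrackd.

```python
import re
from collections import Counter

# Error line as quoted above, e.g.
#   (pid=11083) [ERROR] inject-add2: File exists
ERROR_RE = re.compile(r"\(pid=(\d+)\) \[ERROR\] ([\w-]+): (.+)$")

# Conntrack entry printed on the following line, e.g.
#   tcp      6 120 FIN_WAIT src=10.33.12.15 dst=116.31.116.30 ...
# Entries without a state field (assumed possible for non-TCP) are skipped.
ENTRY_RE = re.compile(r"^\s*(\w+)\s+\d+\s+\d+\s+(\w+)\s+src=(\S+)\s+dst=(\S+)")


def tally_inject_errors(lines):
    """Count errors by (operation, message) and by conntrack state."""
    by_error = Counter()
    by_state = Counter()
    pending = False  # True when the previous line was an [ERROR] line
    for line in lines:
        err = ERROR_RE.search(line.rstrip())
        if err:
            by_error[(err.group(2), err.group(3))] += 1
            pending = True
            continue
        if pending:
            entry = ENTRY_RE.match(line)
            if entry:
                by_state[entry.group(2)] += 1
            pending = False
    return by_error, by_state


# Sample input copied from the two log excerpts above.
SAMPLE = """\
(pid=11083) [ERROR] inject-add2: File exists
        tcp      6 120 FIN_WAIT src=10.33.12.15 dst=116.31.116.30 sport=22 dport=44232 [ASSURED]
(pid=11083) [ERROR] inject-add2: Device or resource busy
        tcp      6 300 CLOSE src=10.33.40.102 dst=216.58.201.110 sport=53660 dport=443 [UNREPLIED]
"""

if __name__ == "__main__":
    errors, states = tally_inject_errors(SAMPLE.splitlines())
    for key, count in errors.items():
        print(key, count)
    for state, count in states.items():
        print(state, count)
```

Running the same tally over log slices taken before and during a 
down-spike could show whether the jump in 'File exists' errors is tied 
to particular connection states.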

I don't really have a good trial-test environment, as this is happening on 
a production network that is hard to emulate.

Any ideas what might be causing this, or any hints on how to efficiently 
(and non-disruptively) debug the issue?

Thanks,

-- 
Jiri Kosina <jikos@kernel.org>
SUSE Labs
