From: Pablo Neira Ayuso
Subject: Re: conntrackd high cpu usage
Date: Mon, 16 Jan 2012 23:58:10 +0100
Message-ID: <20120116225810.GC17879@1984>
References: <20120116112807.GA11934@1984>
To: Stefan Majer
Cc: netfilter@vger.kernel.org

On Mon, Jan 16, 2012 at 08:53:23PM +0100, Stefan Majer wrote:
> Hi Pablo,
>
> On Mon, Jan 16, 2012 at 12:28 PM, Pablo Neira Ayuso wrote:
> > Hi Stefan,
> >
> > On Mon, Jan 09, 2012 at 07:49:55PM +0100, Stefan Majer wrote:
> >> Hi,
> >>
> >> we have 2 8core Xeon Boxes with 2 Intel X520 10GBit Adapter running
> >> rhel 6.1 as redundant firewall.
> >
> > Interesting setup. So far, the reports of conntrackd usage that
> > I've received are deployments with 1GBit NICs and smaller machines
> > (up to 2-4 cores).
> >
> >> On every node we have conntrackd installed with a FTFW mode, we
> >> synchronize all states.
> >> Synchronization is made over multicast on a dedicated vlan interface.
> >> The Firewall itself actually have around 300 vlans active.
> >>
> >> Actually we see permanent ~400 new connections/sec with peaks at 800
> >> conn/sec.
> >
> > I've been able to reach up to 20000 sessions/sec with 6 years old
> > hardware (dual core, 2.4GHz, 1Gbit links). I know people that
> > got better results in more modern hardware.
>
> This would be sufficient for our use case but...
>
> >
> > You may want to enable the reliable synchronization option in
> > conntrackd. With it, conntrackd starts dropping packets if the
> > synchronization does not happen timely.
>
> This is probably not what we want as this prevent a working state on
> the secondary machine at any time right ?

The reliable synchronization means that we drop network packets in the
primary if we cannot back off (the rate of state-changes/s is so high
that conntrackd starts dropping events of state-changes coming from
the kernel). See the NetlinkEventsReliable option.

> >> With this load the conntrackd consumes about 15 - 25 % CPU from one
> >> CPU on the active side and about 5% CPU usage on the passive side.
> >> Is this expected ?
> >
> > What tool are you using to obtain those measurements?
>
> This was actually measured with top.
>
> > top is fine for estimated load, but it's inaccurate. sysstat is a
> > simple tool and it's a bit better.
> > Still, full state synchronization is a resource consuming task
>
> Is it possible to reduce the synchronization of specific state events
> to ESTABLISHED, and NEW for example
> without losing a working state on the secondary side ?

Yes, please have a look at the conntrack-tools user-manual
documentation. See the CT target in iptables.

> >> This is our Testing environment, and we expect much higher (~10 - 20
> >> times) connection rates.
> >>
> >> This would not be possible with the current setup, as this would be
> >> cpu bound on the conntrackd, as this daemon is single threaded.
> >> Is there any way to make this process faster, eg. make the
> >> synchronization multi threaded ?
> >
> > There are several things that we can do to improve conntrackd performance
> > (from the development side):
> >
> > 1) port conntrackd to libmnl to use recvmmsg system call.
> > 2) implement netlink multi-queue, we discussed this during the
> > NFWS2010. The idea is to implement something similar to the existing
> > nfqueue multiqueue load balancing (see --queue-balance in iptables's
> > NFQUEUE). It's similar to multi-threading that you're proposing.
> > 3) implement batching for the commit operation.
> >
> > So far, nobody has come to show interest on these tasks.
> > Recent enhancements for conntrackd have focused on adding new features.
>
> This sounds all great but i have no idea how much this would increase
> performance.
> We will first try to measure our current environment how many conn/sec
> we are able to synchronize.

I don't have numbers because it's not implemented yet ;-), but I'm
sure this will boost performance considerably.

The recvmmsg will reduce the huge amount of recv system calls that
happen under heavy load to allow conntrackd receiving state-change
events from kernel-space.

The multiqueue approach will let it scale for a high number of
processors / cores.

The batching will allow us to reduce the time to inject the states
into the kernel.

> >> I already did some perf analysis, but they didn't give us much light.
> >
> > What tools are you using?
>
> we were using perf record, see man 1 perf.
>
> > I suggest you to have a look at Willy Tarreau's tool (httpterm). You
> > may want to use my http client instead of inject32.
> >
> > http://1984.lsi.us.es/git/http-client-benchmark/
>
> I will check both, but yours won't compile with:
>
> make
> gcc -g -c alarm.c -o alarm.o
> gcc -g -c client.c -o client.o
> client.c: In function ‘print_alarm_cb’:
> client.c:335:3: warning: format ‘%llu’ expects argument of type ‘long
> long unsigned int’, but argument 5 has type ‘uint64_t’ [-Wformat]
> client.c:335:3: warning: format ‘%u’ expects argument of type
> ‘unsigned int’, but argument 10 has type ‘__time_t’ [-Wformat]
> client.c:335:3: warning: format ‘%u’ expects argument of type
> ‘unsigned int’, but argument 11 has type ‘__suseconds_t’ [-Wformat]
> client.c: In function ‘main’:
> client.c:404:5: error: variable-sized object may not be initialized
> make: *** [all] Error 1

Interesting, I don't hit that problem here. I have applied one fix to
git. Let me know if it compiles now.

This tool is quite rudimentary, not documented and I think I'm the
only one using it for my benchmark evaluations. But it's very useful.
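
For reference, the gcc diagnostics quoted above usually come from two
patterns: printf formats that assume a particular width for uint64_t,
time_t and suseconds_t, and an initializer attached to a
variable-length array. The sketch below shows one common way to avoid
both. It is not taken from client.c and is not the fix that went into
git; format_stats() and zeroed_vla_sum() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from client.c): print a 64-bit counter and a
 * struct timeval portably. PRIu64 expands to the right conversion for
 * uint64_t on both 32-bit and 64-bit builds; tv_sec and tv_usec are
 * cast to long because their underlying types vary between platforms.
 */
static int format_stats(char *out, size_t len, uint64_t reqs,
			const struct timeval *tv)
{
	return snprintf(out, len, "%" PRIu64 " reqs in %ld.%06ld s",
			reqs, (long)tv->tv_sec, (long)tv->tv_usec);
}

/*
 * Hypothetical helper: a variable-length array cannot carry an
 * initializer in C99/C11 (for example "unsigned char slots[n] = {0};"
 * produces "variable-sized object may not be initialized"), so declare
 * it first and clear it with memset().
 */
static size_t zeroed_vla_sum(size_t n)
{
	assert(n > 0);			/* a zero-length VLA is undefined */

	unsigned char slots[n];
	size_t i, sum = 0;

	memset(slots, 0, sizeof(slots));
	for (i = 0; i < n; i++)
		sum += slots[i];
	return sum;			/* 0 when every slot was cleared */
}
```

Using a fixed-size array or calloc() instead of the VLA would work as
well; the memset() variant is shown only because it is the smallest
change to code that already uses a VLA.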
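
To illustrate the recvmmsg point made earlier in this mail (fewer recv
system calls under heavy load), here is a minimal sketch of draining
several queued datagrams with one call. It is not conntrackd or libmnl
code: it uses an AF_UNIX datagram socketpair as a stand-in for the
netlink socket so that it runs without privileges, and BATCH, BUFSZ,
recv_batch() and demo_batch() are names assumed for the example.

```c
#define _GNU_SOURCE		/* recvmmsg() is a Linux-specific extension */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH	8	/* illustrative: max datagrams fetched per call */
#define BUFSZ	256	/* illustrative: buffer size per datagram */

/*
 * Fetch up to BATCH queued datagrams from fd with a single system call.
 * Returns the number of datagrams received, or -1 on error (including
 * an empty queue), and stores each datagram's length in lens[].
 */
static int recv_batch(int fd, char bufs[BATCH][BUFSZ],
		      unsigned int lens[BATCH])
{
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	int i, n;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < BATCH; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = BUFSZ;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* MSG_DONTWAIT: take what is queued right now, never block. */
	n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
	for (i = 0; i < n; i++)
		lens[i] = msgs[i].msg_len;
	return n;
}

/*
 * Stand-in for a burst of kernel events: queue `count` small datagrams
 * on a socketpair, then read them all back with one recv_batch() call.
 */
static int demo_batch(int count)
{
	char bufs[BATCH][BUFSZ];
	unsigned int lens[BATCH];
	int sv[2], i, n;

	assert(count > 0 && count <= BATCH);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	for (i = 0; i < count; i++) {
		if (send(sv[0], "event", 5, 0) != 5) {
			close(sv[0]);
			close(sv[1]);
			return -1;
		}
	}

	n = recv_batch(sv[1], bufs, lens);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

With plain recv() the same burst costs one system call per datagram;
here the whole burst is collected in one. How much that helps
conntrackd in practice is not measured in this thread.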