From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>,
netfilter-devel@vger.kernel.org, netdev <netdev@vger.kernel.org>,
Tom Herbert <therbert@google.com>,
Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH v2 nf-next] netfilter: conntrack: remove the central spinlock
Date: Fri, 24 May 2013 15:16:47 +0200
Message-ID: <20130524151647.18388e27@redhat.com>
In-Reply-To: <1369244868.3301.343.camel@edumazet-glaptop>
On Wed, 22 May 2013 10:47:48 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> nf_conntrack_lock is a monolithic lock and suffers from huge
> contention on current generation servers (8 or more core/threads).
>
[...]
> Results on a 32 threads machine, 200 concurrent instances of "netperf
> -t TCP_CRR" :
>
> ~390000 tps instead of ~300000 tps.
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
I gave the patch a quick run in my testlab, and the results are
amazing, you are amazing Eric! :-)
Basic testlab setup:
I'm generating a 2700 Kpps SYN-flood against port 80 (with trafgen)
Baseline result from a 3.9.0-rc5 kernel:
- With nf_conntrack my performance is 749 Kpps.
After removing all iptables and nf_conntrack modules:
- the performance hits 1095 Kpps.
But it looks like we are hitting a new spin_lock in ip_send_reply().
If I start a LISTEN process on the port, then we hit the "old" SYN
scalability issues again, and performance drops to 227 Kpps.
On a patched net-next (close to 3.10.0-rc1) kernel, with Eric's new
locking scheme patch:
- I measured an amazing 2431 Kpps.
13.45% [kernel] [k] fib_table_lookup
9.07% [nf_conntrack] [k] __nf_conntrack_alloc
6.50% [nf_conntrack] [k] nf_conntrack_free
5.24% [ip_tables] [k] ipt_do_table
3.66% [nf_conntrack] [k] nf_conntrack_in
3.54% [kernel] [k] inet_getpeer
3.52% [nf_conntrack] [k] tcp_packet
2.44% [ixgbe] [k] ixgbe_poll
2.30% [kernel] [k] __ip_route_output_key
2.04% [nf_conntrack] [k] nf_conntrack_tuple_taken
1.98% [kernel] [k] icmp_send
Then I realized that I didn't have any iptables rules that accepted
port 80 on my testlab system, so this was basically a packet-drop
test with an nf_conntrack lookup.
If I add a rule that accepts new connections to that port, e.g.:
iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
New ruleset:
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
Then, performance drops again:
- to approx 883 Kpps.
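To illustrate why adding the port-80 rule changes what is being measured, here is a minimal sketch of first-match rule evaluation over the ruleset above. It is a toy model, not how iptables is implemented: the `Packet` fields, the interface name `eth0`, and the assumption that the earlier ruleset was identical apart from the port-80 rule are all hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    proto: str   # "tcp", "icmp", ...
    dport: int   # destination port
    state: str   # conntrack state, e.g. "NEW" or "ESTABLISHED"
    iif: str     # incoming interface name


def build_ruleset(accept_port_80):
    """Return the INPUT rules from the mail as (match, target) pairs."""
    rules = [
        (lambda p: p.state in ("RELATED", "ESTABLISHED"), "ACCEPT"),
        (lambda p: p.proto == "icmp", "ACCEPT"),
        (lambda p: p.iif == "lo", "ACCEPT"),
        (lambda p: p.proto == "tcp" and p.state == "NEW" and p.dport == 22,
         "ACCEPT"),
    ]
    if accept_port_80:
        rules.append(
            (lambda p: p.proto == "tcp" and p.state == "NEW" and p.dport == 80,
             "ACCEPT"))
    # Final catch-all: -j REJECT --reject-with icmp-host-prohibited
    rules.append((lambda p: True, "REJECT"))
    return rules


def verdict(rules, pkt):
    """Walk the rules in order and return the first matching target."""
    for match, target in rules:
        if match(pkt):
            return target
    return "POLICY"  # not reached here because of the catch-all rule


# A SYN from the flood: new TCP connection to port 80 (eth0 is made up).
syn = Packet(proto="tcp", dport=80, state="NEW", iif="eth0")
print("without port-80 rule:", verdict(build_ruleset(False), syn))
print("with port-80 rule:   ", verdict(build_ruleset(True), syn))
```

In this model the SYN is rejected without the port-80 rule and accepted with it, so with the rule in place the packets continue further up the stack instead of ending at the REJECT target.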
I discovered that the NAT code is to blame:
- 17.71% swapper [kernel.kallsyms] [k] _raw_spin_lock_bh
- _raw_spin_lock_bh
+ 47.17% nf_nat_cleanup_conntrack
+ 45.81% nf_nat_setup_info
+ 6.43% nf_nat_get_offset
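The nested percentages can be turned into shares of all samples. This is a small sketch that assumes the caller percentages are relative to the parent entry (they sum to roughly 99% here, which is consistent with that reading); the helper name `absolute_share` is made up for illustration.

```python
def absolute_share(parent_pct, child_pct):
    """Share of all samples, given a child percentage relative to its parent."""
    return parent_pct * child_pct / 100.0


# Numbers copied from the perf output above.
parent = 17.71  # _raw_spin_lock_bh
callers = {
    "nf_nat_cleanup_conntrack": 47.17,
    "nf_nat_setup_info": 45.81,
    "nf_nat_get_offset": 6.43,
}

for name, pct in callers.items():
    print(f"{name:26s} {absolute_share(parent, pct):5.2f}% of all samples")
```

Under that assumption, the two largest callers each account for roughly 8% of all samples, spent spinning on a lock taken from the NAT code.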
Removing the NAT modules improves the performance:
- to 1182 Kpps (with nothing listening on port 80)
sudo iptables -t nat -F
sudo rmmod iptable_nat nf_nat_ipv4
And the perf output looks more like what I would expect:
- 14.85% swapper [kernel.kallsyms] [k] _raw_spin_lock
- _raw_spin_lock
+ 82.86% mod_timer
+ 11.14% nf_conntrack_double_lock
+ 2.50% nf_ct_del_from_dying_or_unconfirmed_list
+ 1.48% nf_conntrack_in
+ 1.30% nf_ct_delete_from_lists
- 12.78% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
- _raw_spin_lock_irqsave
- 99.44% lock_timer_base
+ 99.07% del_timer
+ 0.93% mod_timer
+ 2.69% swapper [ip_tables] [k] ipt_do_table
+ 2.28% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_irqsave
+ 2.18% swapper [nf_conntrack] [k] tcp_packet
+ 2.16% swapper [kernel.kallsyms] [k] fib_table_lookup
Again, if I start a LISTEN process on the port, performance drops to
169 Kpps, due to the LISTEN and SYN-cookie scalability issues.
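To put the numbers so far side by side, the following sketch expresses each measurement as a fraction of the 2700 Kpps offered load. The labels are shorthand for the test cases described above, and the helper names are made up. Note that the baseline and patched runs used different kernels, so the ratio at the end is only a rough indication.

```python
OFFERED_KPPS = 2700  # SYN-flood rate generated with trafgen

# (label, measured Kpps), all values copied from the text above.
measurements = [
    ("3.9.0-rc5, nf_conntrack loaded", 749),
    ("3.9.0-rc5, iptables/nf_conntrack removed", 1095),
    ("3.9.0-rc5, LISTEN on port 80", 227),
    ("patched, no port-80 rule", 2431),
    ("patched, port-80 ACCEPT rule", 883),
    ("patched, NAT modules removed", 1182),
    ("patched, LISTEN on port 80", 169),
]


def fraction_of_offered(kpps, offered=OFFERED_KPPS):
    """Measured rate as a percentage of the offered load."""
    return 100.0 * kpps / offered


def speedup(new_kpps, old_kpps):
    """Ratio between two measured rates."""
    return new_kpps / old_kpps


for label, kpps in measurements:
    print(f"{label:42s} {kpps:5d} Kpps  {fraction_of_offered(kpps):5.1f}%")

print(f"patched vs. baseline with conntrack: {speedup(2431, 749):.2f}x")
```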
I'm amazed. This patch will actually make it a viable choice to load
the conntrack modules on a DDoS filtering box, and use conntrack to
protect against ACK and SYN+ACK attacks, simply by not allowing an
ACK or SYN+ACK to create a conntrack entry, via the command:
sysctl -w net/netfilter/nf_conntrack_tcp_loose=0
A quick test shows that I can now handle a SYN+ACK attack of approx
2580 Kpps (and the same for ACK attacks) while running a LISTEN
process on the port.
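For checking the setting on a test box, here is a small sketch that maps the sysctl key used above to its file under /proc/sys and reads it. The helper names are made up, and the key-to-path mapping is a simplification of what sysctl(8) accepts. The file is only expected to exist on Linux with nf_conntrack loaded, so the read returns None otherwise.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a sysctl key to its procfs file.

    Simplified: keys written with "/" are used as-is, keys written
    with "." have the dots replaced by slashes.
    """
    rel = key if "/" in key else key.replace(".", "/")
    return Path(root) / rel


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None  # module not loaded, no permission, or not Linux


KEY = "net/netfilter/nf_conntrack_tcp_loose"
print(KEY, "=", read_sysctl(KEY))
```

A value of 0 means conntrack does not pick up connections from packets in the middle of a flow, which is what the command above sets.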
Thanks for the great work Eric!
ps. I also tested resizing the tables, both the max number of entries via:
/proc/sys/net/netfilter/nf_conntrack_max
and the number of hash buckets via:
/sys/module/nf_conntrack/parameters/hashsize
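When changing these two values it can help to look at how they relate. The sketch below reads both files and prints the average number of entries per bucket if the table were full. The helper names are made up, and it assumes both files hold a single integer; on a machine without nf_conntrack loaded it just reports that the values are unavailable.

```python
from pathlib import Path

MAX_PATH = "/proc/sys/net/netfilter/nf_conntrack_max"
HASHSIZE_PATH = "/sys/module/nf_conntrack/parameters/hashsize"


def read_int(path):
    """Read a single integer from a file, or return None on any failure."""
    try:
        return int(Path(path).read_text())
    except (OSError, ValueError):
        return None


def entries_per_bucket(max_entries, buckets):
    """Average hash chain length if the conntrack table were full."""
    return max_entries / buckets


max_entries = read_int(MAX_PATH)
buckets = read_int(HASHSIZE_PATH)

if not max_entries or not buckets:
    print("nf_conntrack values not available on this machine")
else:
    ratio = entries_per_bucket(max_entries, buckets)
    print(f"max={max_entries} buckets={buckets} entries/bucket={ratio:.1f}")
```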
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer