From: Kerin Millar <kerframil@gmail.com>
To: netfilter-devel@vger.kernel.org
Subject: scheduling while atomic followed by oops upon conntrackd -c execution
Date: Fri, 02 Mar 2012 15:11:07 +0000
Message-ID: <4F50E30B.6000704@gmail.com>
Hello,
I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
X5650, 8GB RAM) for active-backup firewall duty. I've installed
conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using the
FTFW mode for synchronization across a dedicated gigabit interface. The
active firewall has to contend with fairly heavy traffic, much of which
is in the form of long-lived TCP connections to an internal (LVS) load
balancer, behind which a bunch of application servers sit.
The number of active, concurrent connections to this service peaks at
around 480,000. At last count, the number of conntrack states was
785,785 which is typical. I have net.nf_conntrack_max set to 1048576 and
the nf_conntrack module is loaded with hashsize=262144. The firewall is
fully stateful in that new connections must match on --ctstate NEW. I'm
also using "-t raw -A PREROUTING -j CT --ctevents assured" as mentioned
in the docs.
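For reference, here is a back-of-the-envelope Python sketch of the hash table load implied by the figures above. The variable names are my own, and it assumes states are spread perfectly evenly across buckets, which is an idealisation rather than a measurement.

```python
# Figures quoted in this message
nf_conntrack_max = 1_048_576   # net.nf_conntrack_max
hashsize = 262_144             # hashsize= parameter of the nf_conntrack module
current_states = 785_785       # conntrack states at last count

# Average states per bucket, assuming a perfectly even spread (an idealisation)
load_now = current_states / hashsize
load_at_max = nf_conntrack_max / hashsize

# States that can still be created before nf_conntrack_max is reached
headroom = nf_conntrack_max - current_states

print(f"states per bucket now:     {load_now:.2f}")
print(f"states per bucket at max:  {load_at_max:.2f}")
print(f"headroom before the limit: {headroom}")
```

On these numbers the table sits at roughly three quarters of its configured maximum during typical load.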
This is my current test case for the backup:-
1) Boot the system and start conntrackd
2) Run conntrackd -n to sync with the active firewall
3) Run conntrackd -c to commit the states from the external cache
Originally, while conntrackd -c was performing its work, I would
experience protracted soft lockups. After some investigation, I noticed
that conntrackd was trying to commit more states than net.nf_conntrack_max
allows, which in turn led me to this patch:-
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca
Although Jozsef's patch was helpful, I'm still experiencing a nasty
kernel oops after conntrackd -c has finished executing. This always
occurs within 15 seconds or so - sometimes immediately. Here's a recent
netconsole trace from 3.3-rc5 + patch:-
http://paste.pocoo.org/raw/559736/
Though I ultimately intend to use the 3.0 kernel, I tried various other
versions going as far back as 2.6.32. In each case, an oops is
reproducible - though the details do vary. Using 3.3-rc5, I even noticed
a null ptr deref on one occasion. Alas, I was unable to capture it at
the time.
Here's some other configuration information which may be useful ...
conntrackd.conf: http://paste.pocoo.org/raw/559727/
sysctl.conf: http://paste.pocoo.org/raw/559726/
kernel .config: http://paste.pocoo.org/raw/559725/
It's perhaps worth noting that I followed the advice to set HashLimit in
conntrackd.conf to at least double that of net.nf_conntrack_max
(commented out in my config because I was experimenting with the issue that
Jozsef's patch rectifies). One thing that puzzles me is why conntrackd
always tries to commit more state entries than can be accommodated. On
the master, the internal cache grows to the maximum size and, afaict,
nothing is ever expired. This is from the master which has been up for a
while ...
# conntrackd -s | head -n 5
cache internal:
current active connections: 2097152
connections created: 31649757 failed: 234788761
connections updated: 105516073 failed: 0
connections destroyed: 29552605 failed: 0
# conntrack -S | head -n1
entries 792495
It seems that the cache usage grows to the maximum, at which point the
creation failed counter starts going skyward. On the backup, it seems
that conntrackd -n && conntrackd -c tries to commit all of this, but I
don't really understand why.
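To make the mismatch concrete, here is a small Python sketch of the arithmetic behind the statistics above. The numbers are copied from the output; the variable names are my own, and the interpretation in the comments is my reading rather than anything confirmed about conntrackd's behaviour.

```python
# Figures copied from the "conntrackd -s" and "conntrack -S" output above
cache_active = 2_097_152     # internal cache: current active connections
created = 31_649_757         # internal cache: connections created
destroyed = 29_552_605       # internal cache: connections destroyed
kernel_entries = 792_495     # conntrack -S: entries
nf_conntrack_max = 1_048_576

# The cache counters are self-consistent: created minus destroyed
# equals the reported number of active cache entries.
assert created - destroyed == cache_active

# The cache is sitting at exactly twice nf_conntrack_max.
cache_to_max = cache_active / nf_conntrack_max

# Entries that cannot fit if the whole cache is committed to a kernel
# table limited to nf_conntrack_max.
overflow = cache_active - nf_conntrack_max

# Lower bound on cache entries with no matching kernel entry right now.
stale_lower_bound = cache_active - kernel_entries

print(f"cache / nf_conntrack_max:        {cache_to_max:.1f}")
print(f"entries over the kernel limit:   {overflow}")
print(f"cache entries without a state:   >= {stale_lower_bound}")
```

In other words, even in the best case well over a million cached entries have no live counterpart in the kernel table on the master.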
Any advice would be most welcome. I can't tinker too much with the
active firewall at this point but, if it helps, I can conduct any number
of tests with the backup.
Cheers,
--Kerin