* scheduling while atomic followed by oops upon conntrackd -c execution
@ 2012-03-02 15:11 Kerin Millar
  2012-03-03 13:30 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-02 15:11 UTC (permalink / raw)
  To: netfilter-devel

Hello,

I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
X5650, 8GB RAM) for active-backup firewall duty. I've installed
conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
the FTFW mode for synchronization across a dedicated gigabit
interface. The active firewall has to contend with fairly heavy
traffic, much of which is in the form of long-lived TCP connections
to an internal (LVS) load balancer, behind which a bunch of
application servers sit.

The number of active, concurrent connections to this service peaks
at around 480,000. At last count, the number of conntrack states was
785,785, which is typical. I have net.nf_conntrack_max set to 1048576
and the nf_conntrack module is loaded with hashsize=262144. The
firewall is fully stateful in that new connections must match on
--ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
assured" as mentioned in the docs.

This is my current test case for the backup:-

1) Boot the system and start conntrackd
2) Run conntrackd -n to sync with the active firewall
3) Run conntrackd -c to commit the states from the external cache

Originally, while conntrackd -c was performing its work, I would
experience protracted soft lockups. After some investigation, I
noticed that conntrackd was trying to commit more states than
net.nf_conntrack_max allows which, in turn, led me to this patch:-

https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca

Although Jozsef's patch was helpful, I'm still experiencing a nasty
kernel oops after conntrackd -c has finished executing. This always
occurs within 15 seconds or so - sometimes immediately. Here's a
recent netconsole trace from 3.3-rc5 + patch:-

http://paste.pocoo.org/raw/559736/

Though I ultimately intend to use the 3.0 kernel, I tried various
other versions going as far back as 2.6.32. In each case, an oops is
reproducible - though the details do vary. Using 3.3-rc5, I even
noticed a null ptr deref on one occasion. Alas, I was unable to
capture it at the time.

Here's some other configuration information which may be useful ...

conntrackd.conf: http://paste.pocoo.org/raw/559727/
sysctl.conf: http://paste.pocoo.org/raw/559726/
kernel .config: http://paste.pocoo.org/raw/559725/

It's perhaps worth noting that I followed the advice to set HashLimit
in conntrackd.conf to at least double that of net.nf_conntrack_max
(commented in my config because I was experimenting with the issue
that Jozsef's patch rectifies). One thing that puzzles me is why
conntrackd always tries to commit more state entries than can be
accommodated. On the master, the internal cache grows to the maximum
size and, afaict, nothing is ever expired. This is from the master,
which has been up for a while ...

# conntrackd -s | head -n 5
cache internal:
current active connections: 2097152
connections created: 31649757 failed: 234788761
connections updated: 105516073 failed: 0
connections destroyed: 29552605 failed: 0

# conntrack -S | head -n1
entries 792495

It seems that the cache usage grows to the maximum, at which point
the creation failed counter starts going skyward. On the backup, it
seems that conntrackd -n && conntrackd -c tries to commit all of
this, but I don't really understand why.

Any advice would be most welcome.
I can't tinker too much with the active firewall at this point but,
if it helps, I can conduct any number of tests with the backup.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
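For reference, the sizing and bootstrap sequence described in the report
boil down to a handful of settings and commands. The sketch below simply
collects them, using the values quoted above; the modprobe.d path is a
common distribution default rather than something stated in the thread,
and starting the daemon with -d is likewise an assumption:

  # echo 'options nf_conntrack hashsize=262144' > /etc/modprobe.d/nf_conntrack.conf
  # sysctl -w net.nf_conntrack_max=1048576
  # iptables -t raw -A PREROUTING -j CT --ctevents assured
  # conntrackd -d     (start the daemon on the backup)
  # conntrackd -n     (resync from the active firewall)
  # conntrackd -c     (commit the external cache to the kernel)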
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-02 15:11 scheduling while atomic followed by oops upon conntrackd -c execution Kerin Millar
@ 2012-03-03 13:30 ` Pablo Neira Ayuso
  2012-03-03 17:49   ` Kerin Millar
  2012-03-03 18:47   ` Kerin Millar
  0 siblings, 2 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-03 13:30 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

Hi,

On Fri, Mar 02, 2012 at 03:11:07PM +0000, Kerin Millar wrote:
> Hello,
>
> I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
> X5650, 8GB RAM) for active-backup firewall duty. I've installed
> conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
> the FTFW mode for synchronization across a dedicated gigabit
> interface. The active firewall has to contend with fairly heavy
> traffic, much of which is in the form of long-lived TCP connections
> to an internal (LVS) load balancer, behind which a bunch of
> application servers sit.
>
> The number of active, concurrent connections to this service peaks
> at around 480,000. At last count, the number of conntrack states was
> 785,785, which is typical. I have net.nf_conntrack_max set to 1048576
> and the nf_conntrack module is loaded with hashsize=262144. The
> firewall is fully stateful in that new connections must match on
> --ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
> assured" as mentioned in the docs.

The docs explicitly say that you require Linux kernel >= 2.6.38 to use
this filtering. You seem to be using 2.6.32.

> This is my current test case for the backup:-
>
> 1) Boot the system and start conntrackd
> 2) Run conntrackd -n to sync with the active firewall
> 3) Run conntrackd -c to commit the states from the external cache
>
> Originally, while conntrackd -c was performing its work, I would
> experience protracted soft lockups. After some investigation, I
> noticed that conntrackd was trying to commit more states than
> net.nf_conntrack_max allows which, in turn, led me to this patch:-
>
> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca

I just posted another patch to the ML that is a follow-up fix to
Jozsef's patch. You have to apply that as well.

> Although Jozsef's patch was helpful, I'm still experiencing a nasty
> kernel oops after conntrackd -c has finished executing. This always
> occurs within 15 seconds or so - sometimes immediately. Here's a
> recent netconsole trace from 3.3-rc5 + patch:-
>
> http://paste.pocoo.org/raw/559736/

It seems ctnetlink is trying to load nf_nat over and over again, but
it doesn't seem to find it. One of the firewalls seems to be performing
NAT but the other doesn't have access to the NAT module. This is
strange; I guess you have the same rule-set loaded in both firewalls
correctly.

> Though I ultimately intend to use the 3.0 kernel, I tried various
> other versions going as far back as 2.6.32. In each case, an oops is
> reproducible - though the details do vary. Using 3.3-rc5, I even
> noticed a null ptr deref on one occasion. Alas, I was unable to
> capture it at the time.

For reporting problems, you have to stick to the latest Linux kernel
version. 2.6.32 is a rather old kernel.

> Here's some other configuration information which may be useful ...
>
> conntrackd.conf: http://paste.pocoo.org/raw/559727/

Options {
    TCPWindowTracking On
}

You cannot use this with 2.6.32 either. It's also documented in the
user manual and the example config file (it requires 2.6.36). Please
take the time to read the docs.
> sysctl.conf: http://paste.pocoo.org/raw/559726/
> kernel .config: http://paste.pocoo.org/raw/559725/
>
> It's perhaps worth noting that I followed the advice to set HashLimit
> in conntrackd.conf to at least double that of net.nf_conntrack_max
> (commented in my config because I was experimenting with the issue
> that Jozsef's patch rectifies). One thing that puzzles me is why
> conntrackd always tries to commit more state entries than can be
> accommodated. On the master, the internal cache grows to the maximum
> size and, afaict, nothing is ever expired. This is from the master,
> which has been up for a while ...
>
> # conntrackd -s | head -n 5
> cache internal:
> current active connections: 2097152
> connections created: 31649757 failed: 234788761
> connections updated: 105516073 failed: 0
> connections destroyed: 29552605 failed: 0
>
> # conntrack -S | head -n1
> entries 792495
>
> It seems that the cache usage grows to the maximum, at which point
> the creation failed counter starts going skyward. On the backup, it
> seems that conntrackd -n && conntrackd -c tries to commit all of
> this, but I don't really understand why.
>
> Any advice would be most welcome. I can't tinker too much with the
> active firewall at this point but, if it helps, I can conduct any
> number of tests with the backup.

I need you to stick to a reasonable configuration in order to help
you. Then we can fix issues, if any show up.

^ permalink raw reply	[flat|nested] 14+ messages in thread
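The two kernel-version requirements Pablo cites are easy to lose track
of, so here they are side by side. This is an illustrative fragment
only, with the Options block laid out as in the stock example
conntrackd.conf:

  # conntrackd.conf fragment -- requires Linux >= 2.6.36
  Options {
      TCPWindowTracking On
  }

  # event filtering via the CT target -- requires Linux >= 2.6.38
  # iptables -t raw -A PREROUTING -j CT --ctevents assured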
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 13:30 ` Pablo Neira Ayuso
@ 2012-03-03 17:49   ` Kerin Millar
  2012-03-03 18:47   ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread
From: Kerin Millar @ 2012-03-03 17:49 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> Hi,
>
> On Fri, Mar 02, 2012 at 03:11:07PM +0000, Kerin Millar wrote:
>> Hello,
>>
>> I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
>> X5650, 8GB RAM) for active-backup firewall duty. I've installed
>> conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
>> the FTFW mode for synchronization across a dedicated gigabit
>> interface. The active firewall has to contend with fairly heavy
>> traffic, much of which is in the form of long-lived TCP connections
>> to an internal (LVS) load balancer, behind which a bunch of
>> application servers sit.
>>
>> The number of active, concurrent connections to this service peaks
>> at around 480,000. At last count, the number of conntrack states was
>> 785,785, which is typical. I have net.nf_conntrack_max set to 1048576
>> and the nf_conntrack module is loaded with hashsize=262144. The
>> firewall is fully stateful in that new connections must match on
>> --ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
>> assured" as mentioned in the docs.
>
> The docs explicitly say that you require Linux kernel >= 2.6.38 to use
> this filtering. You seem to be using 2.6.32.

I'm aware of this requirement. In point of fact, I am using 3.3-rc5, as
indicated by the head of the submitted .config and the words "Here's a
recent netconsole trace from 3.3-rc5 ..." The only reference to 2.6.32
was to mention in passing that "I tried various other versions going as
far back as 2.6.32". That's because I wanted to establish whether it
could be considered a regression. Neither the configuration
modifications required for - nor the outcome of - using 2.6.32 was the
subject of the post. My point was that *all* tested versions crash
under the test case, be they old or bleeding edge. It was actually a
typo; I meant to say that I went "as far back as 2.6.33" but that seems
neither here nor there. See below for the exact versions tested.

>> This is my current test case for the backup:-
>>
>> 1) Boot the system and start conntrackd
>> 2) Run conntrackd -n to sync with the active firewall
>> 3) Run conntrackd -c to commit the states from the external cache
>>
>> Originally, while conntrackd -c was performing its work, I would
>> experience protracted soft lockups. After some investigation, I
>> noticed that conntrackd was trying to commit more states than
>> net.nf_conntrack_max allows which, in turn, led me to this patch:-
>>
>> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca
>
> I just posted another patch to the ML that is a follow-up fix to
> Jozsef's patch. You have to apply that as well.

Presumably, that would be the one removing the spinlocks? I'll try
that now.

>
>> Although Jozsef's patch was helpful, I'm still experiencing a nasty
>> kernel oops after conntrackd -c has finished executing. This always
>> occurs within 15 seconds or so - sometimes immediately. Here's a
>> recent netconsole trace from 3.3-rc5 + patch:-
>>
>> http://paste.pocoo.org/raw/559736/
>
> It seems ctnetlink is trying to load nf_nat over and over again, but
> it doesn't seem to find it.
> One of the firewalls seems to be performing NAT but the other doesn't
> have access to the NAT module. This is strange; I guess you have the
> same rule-set loaded in both firewalls correctly.

Yes, the ruleset is identical. Furthermore, the hardware is identical
and the software is identical, except that the master continues to run
3.1.10 as originally deployed, because I can't bring it down until a
reliable failover process is in place. Here are the modules which are
shown as loaded after the ruleset has been initialised and prior to
conntrackd being initialised:-

# lsmod | awk 'NR!=1{print $1}' | tr '\n' ' '
iptable_nat nf_nat xt_CT xt_multiport xt_NOTRACK iptable_raw
xt_conntrack xt_u32 xt_limit xt_recent xt_addrtype ipt_REJECT
xt_comment ipt_LOG iptable_filter ip_tables nf_conntrack_ftp
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iTCO_wdt

After initialising conntrackd, the nfnetlink and nf_conntrack_netlink
modules are dynamically loaded in addition to the above. The pattern
you describe is bothersome but it's what happens after conntrackd -c
has finished that really worries me! As alluded to before, the crash
dump varies depending on the kernel version. What remains entirely
consistent is that the kernel crashes spectacularly within
approximately 15 seconds of conntrackd -c returning to prompt.

>
>> Though I ultimately intend to use the 3.0 kernel, I tried various
>> other versions going as far back as 2.6.32. In each case, an oops is
>> reproducible - though the details do vary. Using 3.3-rc5, I even
>> noticed a null ptr deref on one occasion. Alas, I was unable to
>> capture it at the time.
>
> For reporting problems, you have to stick to the latest Linux kernel
> version. 2.6.32 is a rather old kernel.

I submitted my report based on the results in 3.3-rc5 specifically so
as to be amenable to upstream. Precis:-

1) Both systems are deployed with 3.1.10 - issue is then discovered
2) Tested 3.2.9 on backup
3) Tested 2.6.33.20 on backup (with appropriate config modifications)
4) Tested 3.3.0-rc5 + Jozsef's patch on backup
5) Tested 3.3.0-62d222b+ on backup (pulled from linux-2.6 git tree)

The results submitted in my original post were for case (4).

>
>> Here's some other configuration information which may be useful ...
>>
>> conntrackd.conf: http://paste.pocoo.org/raw/559727/
>
> Options {
>     TCPWindowTracking On
> }
>
> You cannot use this with 2.6.32 either. It's also documented in the
> user manual and the example config file (it requires 2.6.36). Please
> take the time to read the docs.

I'm aware of this.

>
>> sysctl.conf: http://paste.pocoo.org/raw/559726/
>> kernel .config: http://paste.pocoo.org/raw/559725/
>>
>> It's perhaps worth noting that I followed the advice to set HashLimit
>> in conntrackd.conf to at least double that of net.nf_conntrack_max
>> (commented in my config because I was experimenting with the issue
>> that Jozsef's patch rectifies). One thing that puzzles me is why
>> conntrackd always tries to commit more state entries than can be
>> accommodated. On the master, the internal cache grows to the maximum
>> size and, afaict, nothing is ever expired. This is from the master,
>> which has been up for a while ...
>>
>> # conntrackd -s | head -n 5
>> cache internal:
>> current active connections: 2097152
>> connections created: 31649757 failed: 234788761
>> connections updated: 105516073 failed: 0
>> connections destroyed: 29552605 failed: 0
>>
>> # conntrack -S | head -n1
>> entries 792495
>>
>> It seems that the cache usage grows to the maximum, at which point
>> the creation failed counter starts going skyward. On the backup, it
>> seems that conntrackd -n && conntrackd -c tries to commit all of
>> this, but I don't really understand why.
>>
>> Any advice would be most welcome. I can't tinker too much with the
>> active firewall at this point but, if it helps, I can conduct any
>> number of tests with the backup.
>
> I need you to stick to a reasonable configuration in order to help
> you. Then we can fix issues, if any show up.

What is unreasonable about my configuration? Even if it were
unreasonable, the inference that a crash is not an issue because the
user may or may not have followed best practice is one I find somewhat
perplexing. I don't see how it could possibly be deemed that there is
not an issue with the kernel's behaviour here.

Given that the misunderstanding over 2.6.32 has been put to bed, if
you still have any concerns as to the nature of my configuration, by
all means please air them and I'll take remedial action. Further, the
servers are equipped with remote access cards, so I'm ready and
willing to do whatever is necessary to conduct and/or assist any
further diagnosis.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
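Given the suspicion around nf_nat autoloading in this exchange, one
defensive step while debugging (a sketch, not a confirmed fix) is to
pin the module before conntrackd starts, so that state injection never
depends on ctnetlink's autoload path:

  # modprobe nf_nat
  # lsmod | grep '^nf_nat'

On most distributions the same effect can be had by listing nf_nat in
the modules-to-load-at-boot configuration; the exact file varies by
distribution.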
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 13:30 ` Pablo Neira Ayuso
  2012-03-03 17:49   ` Kerin Millar
@ 2012-03-03 18:47   ` Kerin Millar
  2012-03-04 11:01     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-03 18:47 UTC (permalink / raw)
  To: netfilter-devel

Hi,

On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> I just posted another patch to the ML that is a follow-up fix to
> Jozsef's patch. You have to apply that as well.

I've now tested 3.3-rc5 with the addition of the above-mentioned
follow-on patch. The behaviour during conntrackd -c execution is
clearly much improved - insofar as it doesn't generate much noise -
but the crash that follows remains. Here's a netconsole capture:-

http://paste.pocoo.org/raw/560439/

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 18:47   ` Kerin Millar
@ 2012-03-04 11:01     ` Pablo Neira Ayuso
  2012-03-05 17:19       ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-04 11:01 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]

Hi Kerin,

On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
> Hi,
>
> On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> >I just posted another patch to the ML that is a follow-up fix to
> >Jozsef's patch. You have to apply that as well.
>
> I've now tested 3.3-rc5 with the addition of the above-mentioned
> follow-on patch. The behaviour during conntrackd -c execution is
> clearly much improved - insofar as it doesn't generate much noise -
> but the crash that follows remains. Here's a netconsole capture:-
>
> http://paste.pocoo.org/raw/560439/

Great to know :-).

Regarding your previous email: I'm sorry, on reading it I thought you
were using 2.6.32, which was not the case. Your configuration is
perfectly reasonable.

It seems we still have problems regarding early_drop, but this time
with reliable event delivery enabled (15 seconds is the time that is
required to retry sending the destroy event).

If you can test the following patch, I'd appreciate it. Thank you.

[-- Attachment #2: 0001-netfilter-nf_conntrack-fix-early_drop-with-reliable-.patch --]
[-- Type: text/x-diff, Size: 1214 bytes --]

>From 1320c099d618a278fa17715127d6fecca2786a36 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 4 Mar 2012 11:34:06 +0100
Subject: [PATCH] netfilter: nf_conntrack: fix early_drop with reliable event
 delivery

With reliable event delivery enabled, if we fail to deliver the destroy
event in early_drop, we put out one entry that is still in the dying
list.

Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ed86a3b..7d2d641 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -635,6 +635,11 @@ static noinline int early_drop(struct net *net, unsigned int hash)
 
 	if (del_timer(&ct->timeout)) {
 		death_by_timeout((unsigned long)ct);
+		/* Check if we indeed killed this entry. Reliable event
+		   delivery may insert this into the dying list. */
+		if (!test_bit(IPS_DYING_BIT, &ct->status))
+			return dropped;
+
 		dropped = 1;
 		NF_CT_STAT_INC_ATOMIC(net, early_drop);
 	}
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread
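For context, the "reliable event delivery" that this patch interacts
with is a mode that an event listener requests through two netlink
socket options. A minimal sketch of the userspace side follows; it
mirrors the calls made by Pablo's ct_events tool attached later in the
thread (and by conntrackd), with error handling trimmed for brevity:

  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <libnetfilter_conntrack/libnetfilter_conntrack.h>

  int main(void)
  {
          /* Subscribe to all conntrack event groups. */
          struct nfct_handle *h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
          int on = 1;

          if (!h) {
                  perror("nfct_open");
                  return 1;
          }
          /* Report broadcast delivery failures back to the emitter, so
             the kernel can retry undelivered destroy events instead of
             silently losing them. */
          setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_BROADCAST_SEND_ERROR,
                     &on, sizeof(on));
          /* Do not drop events when the receive buffer overruns. */
          setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_NO_ENOBUFS,
                     &on, sizeof(on));

          nfct_close(h);
          return 0;
  }

With both options set, a destroy event that cannot be delivered keeps
the conntrack entry on the dying list until delivery is retried, which
is why the patch above checks IPS_DYING_BIT before treating the entry
as dropped.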
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-04 11:01     ` Pablo Neira Ayuso
@ 2012-03-05 17:19       ` Kerin Millar
  2012-03-06 11:14         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-05 17:19 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 04/03/2012 11:01, Pablo Neira Ayuso wrote:
> Hi Kerin,
>
> On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
>> Hi,
>>
>> On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
>>> I just posted another patch to the ML that is a follow-up fix to
>>> Jozsef's patch. You have to apply that as well.
>>
>> I've now tested 3.3-rc5 with the addition of the above-mentioned
>> follow-on patch. The behaviour during conntrackd -c execution is
>> clearly much improved - insofar as it doesn't generate much noise -
>> but the crash that follows remains. Here's a netconsole capture:-
>>
>> http://paste.pocoo.org/raw/560439/
>
> Great to know :-).

I apologize but I think I may have led you astray on the nf_nat issue.
At the time of submitting my original report, I now believe that the
nf_nat module wasn't loaded prior to starting conntrackd, although it
was definitely available. For all tests that followed, however, I am
entirely certain that the nf_nat module was loaded in advance. The
upshot is that my claim that things had improved may have been
premature; I need to specifically test under both circumstances to be
sure that things are improving. That is, both with and without the
module loaded in advance.

Following my own advice then, I first tried going through my test case
*without* loading nf_nat in advance. Alas, conntrackd -c triggered
hard lockups and didn't return to prompt. Here are the results:-

http://paste.pocoo.org/raw/561350/

In case it matters, the existing ssh session continued to respond to
input but I was no longer able to initiate any new sessions.

>
> Regarding your previous email: I'm sorry, on reading it I thought you
> were using 2.6.32, which was not the case. Your configuration is
> perfectly reasonable.
>
> It seems we still have problems regarding early_drop, but this time
> with reliable event delivery enabled (15 seconds is the time that is
> required to retry sending the destroy event).
>
> If you can test the following patch, I'd appreciate it.

Gladly. I applied the patch to my 3.3-rc5 tree, which is still
carrying the two patches discussed earlier in the thread. I then went
through my test case under normal circumstances, i.e. all firewall
rules in place, nf_nat confirmed present before conntrackd, etc.
Again, conntrackd -c did not return to prompt. Here are the results:-

http://paste.pocoo.org/raw/561354/

Well, at least there was no oops this time. I should also add that the
patch was present for both of the tests mentioned in this email.

---

Incidentally, I found out why the internal cache on the master was
filling up to capacity. It was apparently due to the use of "iptables
-I PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing
something, but doesn't this stop events such as new and destroy from
being propagated? An inspection with conntrack -E suggests so. Once I
removed the above rule, I could see destroy events being propagated
and the number of active connections in the cache no longer exceeded
my chosen limit of 2097152 ...
# conntrack -S | head -n1; conntrackd -s | head -n2
entries 725826
cache internal:
current active connections: 1409472

Whatever the case, I'm quite happy to go without this rule as these
systems are coping fine with the load incurred by conntrackd.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
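The event inspection mentioned above can be done directly with the
conntrack tool; assuming a build with event-mask support as documented
in conntrack(8), something along these lines, run on the master, shows
immediately which event types survive a given --ctevents mask:

  # conntrack -E                   (stream all conntrack events)
  # conntrack -E -e NEW,DESTROY    (restrict to new/destroy events)

If no new or destroy events appear while the "assured"-only rule is in
place, that confirms the cache-growth explanation given here.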
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-05 17:19       ` Kerin Millar
@ 2012-03-06 11:14         ` Pablo Neira Ayuso
  2012-03-06 16:42           ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-06 11:14 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 5468 bytes --]

Hi Kerin,

On Mon, Mar 05, 2012 at 05:19:49PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> On 04/03/2012 11:01, Pablo Neira Ayuso wrote:
> >Hi Kerin,
> >
> >On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
> >>Hi,
> >>
> >>On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> >>>I just posted another patch to the ML that is a follow-up fix to
> >>>Jozsef's patch. You have to apply that as well.
> >>
> >>I've now tested 3.3-rc5 with the addition of the above-mentioned
> >>follow-on patch. The behaviour during conntrackd -c execution is
> >>clearly much improved - insofar as it doesn't generate much noise -
> >>but the crash that follows remains. Here's a netconsole capture:-
> >>
> >>http://paste.pocoo.org/raw/560439/
> >
> >Great to know :-).
>
> I apologize but I think I may have led you astray on the nf_nat
> issue. At the time of submitting my original report, I now believe
> that the nf_nat module wasn't loaded prior to starting conntrackd,
> although it was definitely available. For all tests that followed,
> however, I am entirely certain that the nf_nat module was loaded in
> advance. The upshot is that my claim that things had improved may
> have been premature; I need to specifically test under both
> circumstances to be sure that things are improving. That is, both
> with and without the module loaded in advance.
>
> Following my own advice then, I first tried going through my test
> case *without* loading nf_nat in advance. Alas, conntrackd -c
> triggered hard lockups and didn't return to prompt. Here are the
> results:-
>
> http://paste.pocoo.org/raw/561350/
>
> In case it matters, the existing ssh session continued to respond to
> input but I was no longer able to initiate any new sessions.
>
> >
> >Regarding your previous email: I'm sorry, on reading it I thought you
> >were using 2.6.32, which was not the case. Your configuration is
> >perfectly reasonable.
> >
> >It seems we still have problems regarding early_drop, but this time
> >with reliable event delivery enabled (15 seconds is the time that is
> >required to retry sending the destroy event).
> >
> >If you can test the following patch, I'd appreciate it.
>
> Gladly. I applied the patch to my 3.3-rc5 tree, which is still
> carrying the two patches discussed earlier in the thread. I then
> went through my test case under normal circumstances, i.e. all
> firewall rules in place, nf_nat confirmed present before conntrackd,
> etc. Again, conntrackd -c did not return to prompt. Here are the
> results:-
>
> http://paste.pocoo.org/raw/561354/
>
> Well, at least there was no oops this time. I should also add that
> the patch was present for both of the tests mentioned in this email.

The previous patch that I sent you was not OK, sorry.
I have committed the following to my git tree:

http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

I've been using the following tools, which you can find enclosed with
this email; they are much simpler than conntrackd but they do the same
in essence:

* conntrack_stress.c
* conntrack_events.c

gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
gcc -lnetfilter_conntrack conntrack_events.c -o ct_events

Then, to listen to events with reliable event delivery enabled:

# ./ct_events &

And to create loads of flow entries in ASSURED state:

# ./ct_stress 65535 # that's my ct table size in my laptop

You'll hit ENOMEM errors at some point, that's fine, but no oops or
lockups happen here.

I have pushed these tools to the qa/ directory under
libnetfilter_conntrack:

commit 94e75add9867fb6f0e05e73b23f723f139da829e
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue Mar 6 12:10:55 2012 +0100

    qa: add some stress tools to test conntrack via ctnetlink

(BTW, ct_stress may disrupt your network connection since the table
gets filled. You can use conntrack -F to empty the ct table again.)

> ---
> Incidentally, I found out why the internal cache on the master was
> filling up to capacity. It was apparently due to the use of "iptables
> -I PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing
> something, but doesn't this stop events such as new and destroy from
> being propagated? An inspection with conntrack -E suggests so. Once I
> removed the above rule, I could see destroy events being propagated
> and the number of active connections in the cache no longer exceeded
> my chosen limit of 2097152 ...

Yes, that line was wrong; I have fixed it in the documentation. The
correct one must be:

iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy

Thus, destroy events are delivered to user-space.

> # conntrack -S | head -n1; conntrackd -s | head -n2
> entries 725826
> cache internal:
> current active connections: 1409472
>
> Whatever the case, I'm quite happy to go without this rule as these
> systems are coping fine with the load incurred by conntrackd.

I want to get things fixed, so please don't give up on using that rule
yet :-).

Regarding the hard lockups: I'd be happy if you can re-do the tests,
both with conntrackd and the tools that I sent you.

Make sure you have these three patches; note that the last one has
changed.

http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

Thanks!
[-- Attachment #2: conntrack_events.c --]
[-- Type: text/x-csrc, Size: 1153 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

static int event_cb(enum nf_conntrack_msg_type type,
		    struct nf_conntrack *ct,
		    void *data)
{
	static int i = 0;
	static int new, destroy;

	if (type == NFCT_T_NEW)
		new++;
	else if (type == NFCT_T_DESTROY)
		destroy++;

	if ((++i % 10000) == 0)
		printf("%d events received (%d new, %d destroy)\n",
		       i, new, destroy);

	return NFCT_CB_CONTINUE;
}

int main(void)
{
	int ret;
	struct nfct_handle *h;
	int on = 1;

	h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
	if (!h) {
		perror("nfct_open");
		return 0;
	}

	setsockopt(nfct_fd(h), SOL_NETLINK,
		   NETLINK_BROADCAST_SEND_ERROR, &on, sizeof(int));
	setsockopt(nfct_fd(h), SOL_NETLINK,
		   NETLINK_NO_ENOBUFS, &on, sizeof(int));

	nfct_callback_register(h, NFCT_T_ALL, event_cb, NULL);

	printf("TEST: waiting for events...\n");
	ret = nfct_catch(h);
	printf("TEST: conntrack events ");
	if (ret == -1)
		printf("(%d)(%s)\n", ret, strerror(errno));
	else
		printf("(OK)\n");

	nfct_close(h);

	ret == -1 ? exit(EXIT_FAILURE) : exit(EXIT_SUCCESS);
}

[-- Attachment #3: conntrack_stress.c --]
[-- Type: text/x-csrc, Size: 1487 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <arpa/inet.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack_tcp.h>

int main(int argc, char *argv[])
{
	time_t t;
	int ret, i, r;
	struct nfct_handle *h;
	struct nf_conntrack *ct;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s [ct_table_size]\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	time(&t);
	srandom(t);
	r = random();

	ct = nfct_new();
	if (!ct) {
		perror("nfct_new");
		return 0;
	}

	h = nfct_open(CONNTRACK, 0);
	if (!h) {
		perror("nfct_open");
		nfct_destroy(ct);
		return -1;
	}

	for (i = r; i < (r + atoi(argv[1]) * 2); i++) {
		nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
		nfct_set_attr_u32(ct, ATTR_IPV4_SRC, inet_addr("1.1.1.1") + i);
		nfct_set_attr_u32(ct, ATTR_IPV4_DST, inet_addr("2.2.2.2") + i);
		nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
		nfct_set_attr_u16(ct, ATTR_PORT_SRC, htons(10));
		nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(20));

		nfct_setobjopt(ct, NFCT_SOPT_SETUP_REPLY);

		nfct_set_attr_u8(ct, ATTR_TCP_STATE, TCP_CONNTRACK_ESTABLISHED);
		nfct_set_attr_u32(ct, ATTR_TIMEOUT, 1000);
		nfct_set_attr_u32(ct, ATTR_STATUS, IPS_ASSURED);

		if (i % 10000 == 0)
			printf("added %d flow entries\n", i);

		ret = nfct_query(h, NFCT_Q_CREATE, ct);
		if (ret == -1)
			perror("nfct_query: ");
	}

	nfct_close(h);
	nfct_destroy(ct);

	ret == -1 ? exit(EXIT_FAILURE) : exit(EXIT_SUCCESS);
}

^ permalink raw reply	[flat|nested] 14+ messages in thread
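One portability note on the compile commands quoted above: toolchains
that link with --as-needed (the default on some distributions) may fail
to resolve symbols when the library is named before the source file, so
the safer spelling puts the -l flag last. This is a general linking
remark, not an issue reported in the thread:

  gcc conntrack_stress.c -o ct_stress -lnetfilter_conntrack
  gcc conntrack_events.c -o ct_events -lnetfilter_conntrack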
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 11:14         ` Pablo Neira Ayuso
@ 2012-03-06 16:42           ` Kerin Millar
  2012-03-06 17:23             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-06 16:42 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 06/03/2012 11:14, Pablo Neira Ayuso wrote:

<snip>

>> Gladly. I applied the patch to my 3.3-rc5 tree, which is still
>> carrying the two patches discussed earlier in the thread. I then
>> went through my test case under normal circumstances, i.e. all
>> firewall rules in place, nf_nat confirmed present before conntrackd,
>> etc. Again, conntrackd -c did not return to prompt. Here are the
>> results:-
>>
>> http://paste.pocoo.org/raw/561354/
>>
>> Well, at least there was no oops this time. I should also add that
>> the patch was present for both of the tests mentioned in this email.
>
> The previous patch that I sent you was not OK, sorry. I have committed
> the following to my git tree:
>
> http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

Noted.

>
> I've been using the following tools, which you can find enclosed with
> this email; they are much simpler than conntrackd but they do the same
> in essence:
>
> * conntrack_stress.c
> * conntrack_events.c
>
> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>
> Then, to listen to events with reliable event delivery enabled:
>
> # ./ct_events &
>
> And to create loads of flow entries in ASSURED state:
>
> # ./ct_stress 65535 # that's my ct table size in my laptop
>
> You'll hit ENOMEM errors at some point, that's fine, but no oops or
> lockups happen here.
>
> I have pushed these tools to the qa/ directory under
> libnetfilter_conntrack:
>
> commit 94e75add9867fb6f0e05e73b23f723f139da829e
> Author: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Tue Mar 6 12:10:55 2012 +0100
>
>     qa: add some stress tools to test conntrack via ctnetlink
>
> (BTW, ct_stress may disrupt your network connection since the table
> gets filled. You can use conntrack -F to empty the ct table again.)

Sorry if this is a silly question but should conntrackd be running
while I conduct this stress test? If so, is there any danger of the
master becoming unstable? I must ask because, if the stability of the
master is compromised, I will be in big trouble ;)

<snip>

> Yes, that line was wrong; I have fixed it in the documentation. The
> correct one must be:
>
> iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy
>
> Thus, destroy events are delivered to user-space.
>
>> # conntrack -S | head -n1; conntrackd -s | head -n2
>> entries 725826
>> cache internal:
>> current active connections: 1409472
>>
>> Whatever the case, I'm quite happy to go without this rule as these
>> systems are coping fine with the load incurred by conntrackd.
>
> I want to get things fixed, so please don't give up on using that rule
> yet :-).

Sure. I've re-instated the rule as requested. With the addition of
destroy events, cache usage remains under control.

>
> Regarding the hard lockups: I'd be happy if you can re-do the tests,
> both with conntrackd and the tools that I sent you.
>
> Make sure you have these three patches; note that the last one has
> changed.
>
> http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
> http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
> http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
>

Duly applied to a fresh 3.3-rc5 tree.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
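For anyone reproducing the setup, the three commits can be applied to a
3.3-rc5 tree along these lines. The raw-patch URL form is how cgit
installations typically expose commits and is an assumption here, not
something quoted in the thread:

  $ wget -O 1.patch 'http://1984.lsi.us.es/git/net/patch/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2'
  $ wget -O 2.patch 'http://1984.lsi.us.es/git/net/patch/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60'
  $ wget -O 3.patch 'http://1984.lsi.us.es/git/net/patch/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897'
  $ git am 1.patch 2.patch 3.patch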
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 16:42           ` Kerin Millar
@ 2012-03-06 17:23             ` Pablo Neira Ayuso
  2012-03-06 22:37               ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-06 17:23 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

On Tue, Mar 06, 2012 at 04:42:02PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> On 06/03/2012 11:14, Pablo Neira Ayuso wrote:
>
> <snip>
>
> >>Gladly. I applied the patch to my 3.3-rc5 tree, which is still
> >>carrying the two patches discussed earlier in the thread. I then
> >>went through my test case under normal circumstances, i.e. all
> >>firewall rules in place, nf_nat confirmed present before conntrackd,
> >>etc. Again, conntrackd -c did not return to prompt. Here are the
> >>results:-
> >>
> >>http://paste.pocoo.org/raw/561354/
> >>
> >>Well, at least there was no oops this time. I should also add that
> >>the patch was present for both of the tests mentioned in this email.
> >
> >The previous patch that I sent you was not OK, sorry. I have committed
> >the following to my git tree:
> >
> >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
>
> Noted.
>
> >
> >I've been using the following tools, which you can find enclosed with
> >this email; they are much simpler than conntrackd but they do the same
> >in essence:
> >
> >* conntrack_stress.c
> >* conntrack_events.c
> >
> >gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> >gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
> >
> >Then, to listen to events with reliable event delivery enabled:
> >
> ># ./ct_events &
> >
> >And to create loads of flow entries in ASSURED state:
> >
> ># ./ct_stress 65535 # that's my ct table size in my laptop
> >
> >You'll hit ENOMEM errors at some point, that's fine, but no oops or
> >lockups happen here.
> >
> >I have pushed these tools to the qa/ directory under
> >libnetfilter_conntrack:
> >
> >commit 94e75add9867fb6f0e05e73b23f723f139da829e
> >Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >Date: Tue Mar 6 12:10:55 2012 +0100
> >
> >    qa: add some stress tools to test conntrack via ctnetlink
> >
> >(BTW, ct_stress may disrupt your network connection since the table
> >gets filled. You can use conntrack -F to empty the ct table again.)
>
> Sorry if this is a silly question but should conntrackd be running
> while I conduct this stress test? If so, is there any danger of the
> master becoming unstable? I must ask because, if the stability of the
> master is compromised, I will be in big trouble ;)

If you run this in the backup, conntrackd will spam the master with
lots of new flows in the external cache. That shouldn't be a problem
(just a bit of extra load invested in the replication).

But if you run this in the master, my test will fill the ct table with
lots of assured flows. Thus, packets that belong to new flows will
likely be dropped on that node.

> >Yes, that line was wrong; I have fixed it in the documentation. The
> >correct one must be:
> >
> >iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy
> >
> >Thus, destroy events are delivered to user-space.
> >
> >># conntrack -S | head -n1; conntrackd -s | head -n2
> >>entries 725826
> >>cache internal:
> >>current active connections: 1409472
> >>
> >>Whatever the case, I'm quite happy to go without this rule as these
> >>systems are coping fine with the load incurred by conntrackd.
> >
> >I want to get things fixed, so please don't give up on using that rule
> >yet :-).
>
> Sure. I've re-instated the rule as requested. With the addition of
> destroy events, cache usage remains under control.
>
> >
> >Regarding the hard lockups: I'd be happy if you can re-do the tests,
> >both with conntrackd and the tools that I sent you.
> >
> >Make sure you have these three patches; note that the last one has
> >changed.
> >
> >http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
> >http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
> >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
> >
>
> Duly applied to a fresh 3.3-rc5 tree.
>
> Cheers,
>
> --Kerin
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 17:23             ` Pablo Neira Ayuso
@ 2012-03-06 22:37               ` Kerin Millar
  2012-03-07 14:41                 ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-06 22:37 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 06/03/2012 17:23, Pablo Neira Ayuso wrote:

<snip>

>>> I've been using the following tools, which you can find enclosed with
>>> this email; they are much simpler than conntrackd but they do the same
>>> in essence:
>>>
>>> * conntrack_stress.c
>>> * conntrack_events.c
>>>
>>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
>>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>>>
>>> Then, to listen to events with reliable event delivery enabled:
>>>
>>> # ./ct_events &
>>>
>>> And to create loads of flow entries in ASSURED state:
>>>
>>> # ./ct_stress 65535 # that's my ct table size in my laptop
>>>
>>> You'll hit ENOMEM errors at some point, that's fine, but no oops or
>>> lockups happen here.
>>>
>>> I have pushed these tools to the qa/ directory under
>>> libnetfilter_conntrack:
>>>
>>> commit 94e75add9867fb6f0e05e73b23f723f139da829e
>>> Author: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Date: Tue Mar 6 12:10:55 2012 +0100
>>>
>>>     qa: add some stress tools to test conntrack via ctnetlink
>>>
>>> (BTW, ct_stress may disrupt your network connection since the table
>>> gets filled. You can use conntrack -F to empty the ct table again.)
>>>
>>
>> Sorry if this is a silly question but should conntrackd be running
>> while I conduct this stress test? If so, is there any danger of the
>> master becoming unstable? I must ask because, if the stability of
>> the master is compromised, I will be in big trouble ;)
>
> If you run this in the backup, conntrackd will spam the master with
> lots of new flows in the external cache. That shouldn't be a problem
> (just a bit of extra load invested in the replication).
>
> But if you run this in the master, my test will fill the ct table
> with lots of assured flows. Thus, packets that belong to new flows
> will likely be dropped on that node.

That makes sense. So, I rebooted the backup with the latest kernel
build, ran my iptables script then started conntrackd. I was not able
to destabilize the system through the use of your stress tool. The
sequence of commands used to invoke the ct_stress tool was as
follows:-

1) ct_stress 2097152
2) ct_stress 2097152
3) ct_stress 1048576

There were indeed a lot of ENOMEM errors, and messages warning that
the conntrack table was full with packets being dropped. Nothing
surprising.

I then tried my test case again. The exact sequence of commands was as
follows:-

4) conntrackd -n
5) conntrackd -c
6) conntrackd -f internal
7) conntrackd -F
8) conntrackd -n
9) conntrackd -c

It didn't crash after the 5th step (to my amazement) but it did after
the 9th. Here's a netconsole log covering all of the above:

http://paste.pocoo.org/raw/562136/

The invalid opcode error was also present in the log that I provided
with my first post in this thread.

For some reason, I couldn't capture stdout from your ct_events tool
but here's as much as I was able to copy and paste before it stopped
responding completely.

2100000 events received (2 new, 1048702 destroy)
2110000 events received (2 new, 1048706 destroy)
2120000 events received (2 new, 1048713 destroy)
2130000 events received (2 new, 1048722 destroy)
2140000 events received (2 new, 1048735 destroy)
2150000 events received (2 new, 1048748 destroy)
2160000 events received (2 new, 1048776 destroy)
2170000 events received (2 new, 1048797 destroy)
2180000 events received (2 new, 1048830 destroy)
2190000 events received (2 new, 1048872 destroy)
2200000 events received (2 new, 1048909 destroy)
2210000 events received (2 new, 1048945 destroy)
2220000 events received (2 new, 1048985 destroy)
2230000 events received (2 new, 1049039 destroy)
2240000 events received (2 new, 1049102 destroy)
2250000 events received (2 new, 1049170 destroy)
2260000 events received (2 new, 1049238 destroy)
2270000 events received (2 new, 1049292 destroy)
2280000 events received (2 new, 1049347 destroy)
2290000 events received (2 new, 1049423 destroy)
2300000 events received (2 new, 1049490 destroy)
2310000 events received (2 new, 1049563 destroy)
2320000 events received (2 new, 1049646 destroy)
2330000 events received (2 new, 1049739 destroy)
2340000 events received (2 new, 1049819 destroy)
2350000 events received (2 new, 1049932 destroy)
2360000 events received (2 new, 1050040 destroy)
2370000 events received (2 new, 1050153 destroy)
2380000 events received (2 new, 1050293 destroy)
2390000 events received (2 new, 1050405 destroy)
2400000 events received (2 new, 1050535 destroy)
2410000 events received (2 new, 1050661 destroy)
2420000 events received (2 new, 1050786 destroy)
2430000 events received (2 new, 1050937 destroy)
2440000 events received (2 new, 1051085 destroy)
2450000 events received (2 new, 1051226 destroy)
2460000 events received (2 new, 1051378 destroy)
2470000 events received (2 new, 1051542 destroy)
2480000 events received (2 new, 1051693 destroy)
2490000 events received (2 new, 1051852 destroy)
2500000 events received (2 new, 1052008 destroy)
2510000 events received (2 new, 1052185 destroy)
2520000 events received (2 new, 1052373 destroy)
2530000 events received (2 new, 1052569 destroy)
2540000 events received (2 new, 1052770 destroy)
2550000 events received (2 new, 1052978 destroy)

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
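The nine commands above make a compact reproduction recipe; the
following sketch simply collects them verbatim into one script (the
ct_stress binary is assumed to be in the working directory):

  #!/bin/sh
  ./ct_stress 2097152
  ./ct_stress 2097152
  ./ct_stress 1048576
  conntrackd -n           # resync from the master
  conntrackd -c           # commit external cache (survived this time)
  conntrackd -f internal  # flush the internal cache
  conntrackd -F           # flush the kernel conntrack table
  conntrackd -n
  conntrackd -c           # the step after which the crash occurred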
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 22:37               ` Kerin Millar
@ 2012-03-07 14:41                 ` Kerin Millar
  2012-03-08  1:33                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-07 14:41 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

To follow up briefly (at the end of this message) ...

On 06/03/2012 22:37, Kerin Millar wrote:
> Hi Pablo,
>
> On 06/03/2012 17:23, Pablo Neira Ayuso wrote:
>
> <snip>
>
>>>> I've been using the following tools, which you can find enclosed with
>>>> this email; they are much simpler than conntrackd but they do the same
>>>> in essence:
>>>>
>>>> * conntrack_stress.c
>>>> * conntrack_events.c
>>>>
>>>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
>>>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>>>>
>>>> Then, to listen to events with reliable event delivery enabled:
>>>>
>>>> # ./ct_events &
>>>>
>>>> And to create loads of flow entries in ASSURED state:
>>>>
>>>> # ./ct_stress 65535 # that's my ct table size in my laptop
>>>>
>>>> You'll hit ENOMEM errors at some point, that's fine, but no oops or
>>>> lockups happen here.
>>>>
>>>> I have pushed these tools to the qa/ directory under
>>>> libnetfilter_conntrack:
>>>>
>>>> commit 94e75add9867fb6f0e05e73b23f723f139da829e
>>>> Author: Pablo Neira Ayuso <pablo@netfilter.org>
>>>> Date: Tue Mar 6 12:10:55 2012 +0100
>>>>
>>>>     qa: add some stress tools to test conntrack via ctnetlink
>>>>
>>>> (BTW, ct_stress may disrupt your network connection since the table
>>>> gets filled. You can use conntrack -F to empty the ct table again.)
>>>>
>>>
>>> Sorry if this is a silly question but should conntrackd be running
>>> while I conduct this stress test? If so, is there any danger of the
>>> master becoming unstable? I must ask because, if the stability of
>>> the master is compromised, I will be in big trouble ;)
>>
>> If you run this in the backup, conntrackd will spam the master with
>> lots of new flows in the external cache. That shouldn't be a problem
>> (just a bit of extra load invested in the replication).
>>
>> But if you run this in the master, my test will fill the ct table
>> with lots of assured flows. Thus, packets that belong to new flows
>> will likely be dropped on that node.
>
> That makes sense. So, I rebooted the backup with the latest kernel
> build, ran my iptables script then started conntrackd. I was not able
> to destabilize the system through the use of your stress tool. The
> sequence of commands used to invoke the ct_stress tool was as
> follows:-
>
> 1) ct_stress 2097152
> 2) ct_stress 2097152
> 3) ct_stress 1048576
>
> There were indeed a lot of ENOMEM errors, and messages warning that
> the conntrack table was full with packets being dropped. Nothing
> surprising.
>
> I then tried my test case again. The exact sequence of commands was as
> follows:-
>
> 4) conntrackd -n
> 5) conntrackd -c
> 6) conntrackd -f internal
> 7) conntrackd -F
> 8) conntrackd -n
> 9) conntrackd -c
>
> It didn't crash after the 5th step (to my amazement) but it did after
> the 9th. Here's a netconsole log covering all of the above:
>
> http://paste.pocoo.org/raw/562136/
>
> The invalid opcode error was also present in the log that I provided
> with my first post in this thread.
>
> For some reason, I couldn't capture stdout from your ct_events tool
> but here's as much as I was able to copy and paste before it stopped
> responding completely.
>
> 2100000 events received (2 new, 1048702 destroy)
> 2110000 events received (2 new, 1048706 destroy)
> 2120000 events received (2 new, 1048713 destroy)
> 2130000 events received (2 new, 1048722 destroy)
> 2140000 events received (2 new, 1048735 destroy)
> 2150000 events received (2 new, 1048748 destroy)
> 2160000 events received (2 new, 1048776 destroy)
> 2170000 events received (2 new, 1048797 destroy)
> 2180000 events received (2 new, 1048830 destroy)
> 2190000 events received (2 new, 1048872 destroy)
> 2200000 events received (2 new, 1048909 destroy)
> 2210000 events received (2 new, 1048945 destroy)
> 2220000 events received (2 new, 1048985 destroy)
> 2230000 events received (2 new, 1049039 destroy)
> 2240000 events received (2 new, 1049102 destroy)
> 2250000 events received (2 new, 1049170 destroy)
> 2260000 events received (2 new, 1049238 destroy)
> 2270000 events received (2 new, 1049292 destroy)
> 2280000 events received (2 new, 1049347 destroy)
> 2290000 events received (2 new, 1049423 destroy)
> 2300000 events received (2 new, 1049490 destroy)
> 2310000 events received (2 new, 1049563 destroy)
> 2320000 events received (2 new, 1049646 destroy)
> 2330000 events received (2 new, 1049739 destroy)
> 2340000 events received (2 new, 1049819 destroy)
> 2350000 events received (2 new, 1049932 destroy)
> 2360000 events received (2 new, 1050040 destroy)
> 2370000 events received (2 new, 1050153 destroy)
> 2380000 events received (2 new, 1050293 destroy)
> 2390000 events received (2 new, 1050405 destroy)
> 2400000 events received (2 new, 1050535 destroy)
> 2410000 events received (2 new, 1050661 destroy)
> 2420000 events received (2 new, 1050786 destroy)
> 2430000 events received (2 new, 1050937 destroy)
> 2440000 events received (2 new, 1051085 destroy)
> 2450000 events received (2 new, 1051226 destroy)
> 2460000 events received (2 new, 1051378 destroy)
> 2470000 events received (2 new, 1051542 destroy)
> 2480000 events received (2 new, 1051693 destroy)
> 2490000 events received (2 new, 1051852 destroy)
> 2500000 events received (2 new, 1052008 destroy)
> 2510000 events received (2 new, 1052185 destroy)
> 2520000 events received (2 new, 1052373 destroy)
> 2530000 events received (2 new, 1052569 destroy)
> 2540000 events received (2 new, 1052770 destroy)
> 2550000 events received (2 new, 1052978 destroy)

Just to add that I ran a more extensive stress test on the backup,
like so ...

for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done

It remained stable throughout. I notice that there's an option to dump
the cache in XML format. I wonder if it would be useful if I were to
provide such a dump, having synced with the master? Assuming that
there's a way to inject the contents, perhaps you could reproduce the
issue also.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
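On the XML point: conntrackd can format its cache dumps as XML when the
-x modifier is combined with -i or -e (per conntrackd(8); check that
the installed version supports it), and the kernel table can be
captured the same way with conntrack, so a pair of dumps for offline
comparison could look like:

  # conntrackd -i -x > internal-cache.xml
  # conntrack -L -o xml > kernel-table.xml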
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-07 14:41                 ` Kerin Millar
@ 2012-03-08  1:33                   ` Pablo Neira Ayuso
  2012-03-08 11:00                     ` Kerin Millar
  2012-03-08 11:29                     ` Kerin Millar
  0 siblings, 2 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-08 1:33 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> To follow up briefly (at the end of this message) ...
>
> On 06/03/2012 22:37, Kerin Millar wrote:
> >Hi Pablo,
> >
> >On 06/03/2012 17:23, Pablo Neira Ayuso wrote:
> >
> ><snip>
> >
> >>>>I've been using the following tools, which you can find enclosed with
> >>>>this email; they are much simpler than conntrackd but they do the same
> >>>>in essence:
> >>>>
> >>>>* conntrack_stress.c
> >>>>* conntrack_events.c
> >>>>
> >>>>gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> >>>>gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
> >>>>
> >>>>Then, to listen to events with reliable event delivery enabled:
> >>>>
> >>>># ./ct_events &
> >>>>
> >>>>And to create loads of flow entries in ASSURED state:
> >>>>
> >>>># ./ct_stress 65535 # that's my ct table size in my laptop
> >>>>
> >>>>You'll hit ENOMEM errors at some point, that's fine, but no oops or
> >>>>lockups happen here.
> >>>>
> >>>>I have pushed these tools to the qa/ directory under
> >>>>libnetfilter_conntrack:
> >>>>
> >>>>commit 94e75add9867fb6f0e05e73b23f723f139da829e
> >>>>Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >>>>Date: Tue Mar 6 12:10:55 2012 +0100
> >>>>
> >>>>qa: add some stress tools to test conntrack via ctnetlink
> >>>>
> >>>>(BTW, ct_stress may disrupt your network connection since the table
> >>>>gets filled. You can use conntrack -F to empty the ct table again.)
> >>>>
> >>>
> >>>Sorry if this is a silly question but should conntrackd be running
> >>>while I conduct this stress test? If so, is there any danger of the
> >>>master becoming unstable? I must ask because, if the stability of
> >>>the master is compromised, I will be in big trouble ;)
> >>
> >>If you run this in the backup, conntrackd will spam the master with
> >>lots of new flows in the external cache. That shouldn't be a problem
> >>(just a bit of extra load invested in the replication).
> >>
> >>But if you run this in the master, my test will fill the ct table
> >>with lots of assured flows. Thus, packets that belong to new flows
> >>will likely be dropped on that node.
> >
> >That makes sense. So, I rebooted the backup with the latest kernel
> >build, ran my iptables script then started conntrackd. I was not able
> >to destabilize the system through the use of your stress tool. The
> >sequence of commands used to invoke the ct_stress tool was as
> >follows:-
> >
> >1) ct_stress 2097152
> >2) ct_stress 2097152
> >3) ct_stress 1048576
> >
> >There were indeed a lot of ENOMEM errors, and messages warning that
> >the conntrack table was full with packets being dropped. Nothing
> >surprising.
> >
> >I then tried my test case again. The exact sequence of commands was as
> >follows:-
> >
> >4) conntrackd -n
> >5) conntrackd -c
> >6) conntrackd -f internal
> >7) conntrackd -F
> >8) conntrackd -n
> >9) conntrackd -c
> >
> >It didn't crash after the 5th step (to my amazement) but it did after
> >the 9th. Here's a netconsole log covering all of the above:
> >
> >http://paste.pocoo.org/raw/562136/
> >
> >The invalid opcode error was also present in the log that I provided
> >with my first post in this thread.
>
> >For some reason, I couldn't capture stdout from your ct_events tool
> >but here's as much as I was able to copy and paste before it stopped
> >responding completely.
> >
> >2100000 events received (2 new, 1048702 destroy)
> >2110000 events received (2 new, 1048706 destroy)
> >2120000 events received (2 new, 1048713 destroy)
> >2130000 events received (2 new, 1048722 destroy)
> >2140000 events received (2 new, 1048735 destroy)
> >2150000 events received (2 new, 1048748 destroy)
> >2160000 events received (2 new, 1048776 destroy)
> >2170000 events received (2 new, 1048797 destroy)
> >2180000 events received (2 new, 1048830 destroy)
> >2190000 events received (2 new, 1048872 destroy)
> >2200000 events received (2 new, 1048909 destroy)
> >2210000 events received (2 new, 1048945 destroy)
> >2220000 events received (2 new, 1048985 destroy)
> >2230000 events received (2 new, 1049039 destroy)
> >2240000 events received (2 new, 1049102 destroy)
> >2250000 events received (2 new, 1049170 destroy)
> >2260000 events received (2 new, 1049238 destroy)
> >2270000 events received (2 new, 1049292 destroy)
> >2280000 events received (2 new, 1049347 destroy)
> >2290000 events received (2 new, 1049423 destroy)
> >2300000 events received (2 new, 1049490 destroy)
> >2310000 events received (2 new, 1049563 destroy)
> >2320000 events received (2 new, 1049646 destroy)
> >2330000 events received (2 new, 1049739 destroy)
> >2340000 events received (2 new, 1049819 destroy)
> >2350000 events received (2 new, 1049932 destroy)
> >2360000 events received (2 new, 1050040 destroy)
> >2370000 events received (2 new, 1050153 destroy)
> >2380000 events received (2 new, 1050293 destroy)
> >2390000 events received (2 new, 1050405 destroy)
> >2400000 events received (2 new, 1050535 destroy)
> >2410000 events received (2 new, 1050661 destroy)
> >2420000 events received (2 new, 1050786 destroy)
> >2430000 events received (2 new, 1050937 destroy)
> >2440000 events received (2 new, 1051085 destroy)
> >2450000 events received (2 new, 1051226 destroy)
> >2460000 events received (2 new, 1051378 destroy)
> >2470000 events received (2 new, 1051542 destroy)
> >2480000 events received (2 new, 1051693 destroy)
> >2490000 events received (2 new, 1051852 destroy)
> >2500000 events received (2 new, 1052008 destroy)
> >2510000 events received (2 new, 1052185 destroy)
> >2520000 events received (2 new, 1052373 destroy)
> >2530000 events received (2 new, 1052569 destroy)
> >2540000 events received (2 new, 1052770 destroy)
> >2550000 events received (2 new, 1052978 destroy)
>
> Just to add that I ran a more extensive stress test on the backup,
> like so ...
>
> for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done

I guess you're running ct_events_reliable as well. Launching several
ct_stress instances at the same time is also interesting.

> It remained stable throughout. I notice that there's an option to dump
> the cache in XML format. I wonder if it would be useful if I were to
> provide such a dump, having synced with the master? Assuming that
> there's a way to inject the contents, perhaps you could reproduce the
> issue also.

I've been launching the user-space stress tests but I have not been
able to reproduce the problem that you reported so far.

I'd need to know if the problem that you reported is easy to reproduce
in your setup, or whether it looks more like a race condition.
Moreover, I need to know if there was some traffic circulating through
the backup or no traffic at all.

Let me know.
^ permalink raw reply [flat|nested] 14+ messages in thread
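[For readers following along without the qa/ sources to hand, below is a
minimal sketch of what a ct_stress-style generator looks like. It assumes
the stock libnetfilter_conntrack attribute API; the file name, addresses,
ports, timeout and loop structure are illustrative and are not taken from
Pablo's actual conntrack_stress.c.]

/*
 * ct_stress_sketch.c - illustrative sketch only, not the real qa tool.
 * Floods the conntrack table with synthetic ESTABLISHED + ASSURED TCP
 * flows through ctnetlink.
 *
 * Build, as with the real tools:
 *   gcc -lnetfilter_conntrack ct_stress_sketch.c -o ct_stress_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include <linux/netfilter/nf_conntrack_tcp.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

int main(int argc, char *argv[])
{
	struct nfct_handle *h;
	struct nf_conntrack *ct;
	int i, n = (argc > 1) ? atoi(argv[1]) : 65535;

	h = nfct_open(CONNTRACK, 0);
	if (h == NULL) {
		perror("nfct_open");
		exit(EXIT_FAILURE);
	}

	ct = nfct_new();
	if (ct == NULL) {
		perror("nfct_new");
		exit(EXIT_FAILURE);
	}

	for (i = 0; i < n; i++) {
		/* Vary source address and port so every entry is a
		 * distinct tuple. */
		nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
		nfct_set_attr_u32(ct, ATTR_IPV4_SRC,
				  htonl(0x0a000001 + (i / 60000)));
		nfct_set_attr_u32(ct, ATTR_IPV4_DST, htonl(0x0afffffe));
		nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
		nfct_set_attr_u16(ct, ATTR_PORT_SRC,
				  htons(1025 + (i % 60000)));
		nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(80));
		nfct_set_attr_u8(ct, ATTR_TCP_STATE,
				 TCP_CONNTRACK_ESTABLISHED);
		nfct_set_attr_u32(ct, ATTR_TIMEOUT, 1000);
		/* SEEN_REPLY + ASSURED, like long-lived LVS flows. */
		nfct_set_attr_u32(ct, ATTR_STATUS,
				  IPS_SEEN_REPLY | IPS_ASSURED);

		if (nfct_query(h, NFCT_Q_CREATE, ct) == -1)
			fprintf(stderr, "entry %d: %s\n",
				i, strerror(errno));
	}

	nfct_destroy(ct);
	nfct_close(h);
	return EXIT_SUCCESS;
}

[Each iteration varies the tuple, so the table fills with distinct assured
entries and eventually hits ENOMEM, as described above. Backgrounding a few
instances from the shell would approximate the concurrent launches that
Pablo suggests.]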
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-08  1:33 ` Pablo Neira Ayuso
@ 2012-03-08 11:00 ` Kerin Millar
  2012-03-08 11:29 ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread

From: Kerin Millar @ 2012-03-08 11:00 UTC (permalink / raw)
To: netfilter-devel

Hi Pablo,

On 08/03/2012 01:33, Pablo Neira Ayuso wrote:
> On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote:

<snip>

>>> That makes sense. So, I rebooted the backup with the latest kernel
>>> build, ran my iptables script, then started conntrackd. I was not able
>>> to destabilize the system through the use of your stress tool. The
>>> sequence of commands used to invoke the ct_stress tool was as follows:
>>>
>>> 1) ct_stress 2097152
>>> 2) ct_stress 2097152
>>> 3) ct_stress 1048576
>>>
>>> There were indeed a lot of ENOMEM errors, and messages warning that
>>> the conntrack table was full with packets being dropped. Nothing
>>> surprising.
>>>
>>> I then tried my test case again. The exact sequence of commands was as
>>> follows:
>>>
>>> 4) conntrackd -n
>>> 5) conntrackd -c
>>> 6) conntrackd -f internal
>>> 7) conntrackd -F
>>> 8) conntrackd -n
>>> 9) conntrackd -c
>>>
>>> It didn't crash after the 5th step (to my amazement) but it did after
>>> the 9th. Here's a netconsole log covering all of the above:
>>>
>>> http://paste.pocoo.org/raw/562136/
>>>
>>> The invalid opcode error was also present in the log that I provided
>>> with my first post in this thread.
>>>
>>> For some reason, I couldn't capture stdout from your ct_events tool
>>> but here's as much as I was able to copy and paste before it stopped
>>> responding completely.

<snip>

>> Just to add that I ran a more extensive stress test on the backup,
>> like so ...
>>
>> for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done
>
> I guess you're running ct_events_reliable as well. Launching several
> ct_stress instances at the same time is also interesting.

The ct_events (conntrack_events.c) program was running throughout and
"NetlinkEventsReliable On" remains defined in conntrackd.conf. I will
try running ct_stress concurrently.

>> It remained stable throughout. I notice that there's an option to
>> dump the cache in XML format. I wonder if it would be useful if I were
>> to provide such a dump, having synced with the master? Assuming that
>> there's a way to inject the contents, perhaps you could reproduce
>> the issue also.
>
> I've been launching the user-space stress tests but I have not been
> able to reproduce the problem that you reported so far.
>
> I'd need to know whether the problem that you reported is easy to
> reproduce in your setup or whether it looks more like a race
> condition. Moreover, I need to know whether there was some traffic
> circulating through the backup or no traffic at all.

Indeed, I can reproduce it so easily and consistently that I've now
lost track of the number of times I've had to hard-reboot this machine.

To recap: I boot the slave with 3.3-rc5 (now featuring the three
patches you asked me to apply). My iptables ruleset is loaded and
conntrackd is started. At that point, the stats look like this:

# conntrackd -s
cache internal:
current active connections:              109
connections created:                     175    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                    66    failed:            0

cache external:
current active connections:             3676
connections created:                    3688    failed:            0
connections updated:                       4    failed:            0
connections destroyed:                    12    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

UDP traffic (active device=eth2):
                4360 Bytes sent               570188 Bytes recv
                  91 Pckts sent                 6681 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

NOTE: if left alone, external cache usage grows steadily - as expected,
given that the master handles new connections to our busy
load-balancer/appserver farm.

Next, rather than wait for the backup to catch up, I explicitly
synchronize with the master.

# conntrackd -n
# conntrackd -s
cache internal:
current active connections:                5
connections created:                     179    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                   174    failed:            0

cache external:
current active connections:          1295640
connections created:                 1351838    failed:            0
connections updated:                      26    failed:            0
connections destroyed:                 56198    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

UDP traffic (active device=eth2):
              139384 Bytes sent            112306776 Bytes recv
                1054 Pckts sent               135073 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs               175170 Lost msgs

The number of state entries in the external cache is now in line with
the master. Finally, I commit.

# conntrackd -c

What happens next is one of two things:

a) a seemingly never-ending series of hard lockups occurs
b) it panics with "not syncing: Fatal exception in interrupt"

Scenario (a) seems to be the more frequent but, either way, it happens
virtually every single time. Across dozens upon dozens of tests, I
think there have been no more than *two* occasions on which
conntrackd -c returned to the prompt and, even then, the system didn't
survive a second invocation of conntrackd -c. In all cases, I have to
hard-reboot the machine. Netconsole logs for both of these outcomes
have been provided in previous posts.

Ergo, it's entirely reproducible here. There are absolutely no signs of
instability with the system except for when state is being committed.
Indeed, I first became aware of this situation upon simulating a
genuine failover scenario. That is, I shut down the master, ucarp
migrated the VIPs, and conntrackd was instructed to commit its state.
Then it died. Since that first time, I have deactivated ucarp entirely
and have easily reproduced the issue by running conntrackd -c manually.

Cheers,

--Kerin
^ permalink raw reply	[flat|nested] 14+ messages in thread
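[For context, here is a bare-bones sketch of a ct_events-style listener,
under the assumption that it uses libnetfilter_conntrack's callback API;
the periodic counter line mirrors the output format quoted above. The two
setsockopt() calls show roughly what reliable event delivery - what
conntrackd's "NetlinkEventsReliable On" requests - amounts to at the
socket level. The file name and counters are illustrative, not Pablo's
actual conntrack_events.c.]

/*
 * ct_events_sketch.c - illustrative sketch only, not the real qa tool.
 * Counts conntrack events and opts in to reliable event delivery.
 *
 * Build:
 *   gcc -lnetfilter_conntrack ct_events_sketch.c -o ct_events_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

static unsigned int events, new, destroy;

static int cb(enum nf_conntrack_msg_type type,
	      struct nf_conntrack *ct, void *data)
{
	if (type == NFCT_T_NEW)
		new++;
	else if (type == NFCT_T_DESTROY)
		destroy++;

	/* An nfct_snprintf() call with NFCT_O_XML here would emit each
	 * event in the XML format mentioned earlier in the thread. */
	if (++events % 10000 == 0)
		printf("%u events received (%u new, %u destroy)\n",
		       events, new, destroy);

	return NFCT_CB_CONTINUE;
}

int main(void)
{
	struct nfct_handle *h;
	int on = 1;

	h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
	if (h == NULL) {
		perror("nfct_open");
		exit(EXIT_FAILURE);
	}

	/* Reliable event delivery: have the kernel report overruns as
	 * socket errors instead of silently dropping events. */
	setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_BROADCAST_ERROR,
		   &on, sizeof(on));
	setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_NO_ENOBUFS,
		   &on, sizeof(on));

	nfct_callback_register(h, NFCT_T_ALL, cb, NULL);
	nfct_catch(h);	/* loops, invoking cb for every event */

	nfct_close(h);
	return EXIT_SUCCESS;
}

[Run against a table being flooded by ct_stress, a listener like this will
mostly count destroy events once entries start expiring, which is
consistent with the destroy-dominated counters pasted earlier in the
thread.]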
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-08  1:33 ` Pablo Neira Ayuso
  2012-03-08 11:00 ` Kerin Millar
@ 2012-03-08 11:29 ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread

From: Kerin Millar @ 2012-03-08 11:29 UTC (permalink / raw)
To: netfilter-devel

On 08/03/2012 01:33, Pablo Neira Ayuso wrote:
> Moreover, I need to know whether there was some traffic circulating
> through the backup or no traffic at all.

Sorry, I didn't address this point in my previous email. The backup
does indeed handle some traffic. Both systems run BIND and, as such,
the backup is also our secondary public-facing nameserver. The load
generated by this is not significant, though. At any given moment, the
number of state entries hovers between 100 and 150, and almost all of
them are UDP entries resulting from DNS queries. The rest can be
accounted for by ntpd (about 4 active entries for upstream NTP
servers), ssh (1 connection only), ICMP echo-request handling and a few
other sundries.

In my test case, I am running conntrackd -c under circumstances where
conntrackd on the master is still pushing events across. But I have
also simulated a realistic failover scenario on at least two occasions
by shutting down the master (at which point conntrackd terminates and
is obviously no longer pushing events to the backup). Regardless, the
backup still crashes upon conntrackd -c.

In summary:

* Both nodes are handling DNS traffic (but it's packet forwarding which
  really generates a heavy load)
* conntrackd -c has been run both while the conntrackd daemon is and
  isn't continuing to receive traffic from the other node. It crashes
  either way.

Cheers,

--Kerin
^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads: [~2012-03-08 11:30 UTC | newest]

Thread overview: 14+ messages:
2012-03-02 15:11 scheduling while atomic followed by oops upon conntrackd -c execution Kerin Millar
2012-03-03 13:30 ` Pablo Neira Ayuso
2012-03-03 17:49 ` Kerin Millar
2012-03-03 18:47 ` Kerin Millar
2012-03-04 11:01 ` Pablo Neira Ayuso
2012-03-05 17:19 ` Kerin Millar
2012-03-06 11:14 ` Pablo Neira Ayuso
2012-03-06 16:42 ` Kerin Millar
2012-03-06 17:23 ` Pablo Neira Ayuso
2012-03-06 22:37 ` Kerin Millar
2012-03-07 14:41 ` Kerin Millar
2012-03-08  1:33 ` Pablo Neira Ayuso
2012-03-08 11:00 ` Kerin Millar
2012-03-08 11:29 ` Kerin Millar