* scheduling while atomic followed by oops upon conntrackd -c execution
@ 2012-03-02 15:11 Kerin Millar
  2012-03-03 13:30 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-02 15:11 UTC (permalink / raw)
  To: netfilter-devel

Hello,

I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
X5650, 8GB RAM) for active-backup firewall duty. I've installed
conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
the FTFW mode for synchronization across a dedicated gigabit
interface. The active firewall has to contend with fairly heavy
traffic, much of which is in the form of long-lived TCP connections
to an internal (LVS) load balancer, behind which a bunch of
application servers sit.

The number of active, concurrent connections to this service peaks
at around 480,000. At last count, the number of conntrack states was
785,785, which is typical. I have net.nf_conntrack_max set to 1048576
and the nf_conntrack module is loaded with hashsize=262144. The
firewall is fully stateful in that new connections must match on
--ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
assured" as mentioned in the docs.

This is my current test case for the backup:-

1) Boot the system and start conntrackd
2) Run conntrackd -n to sync with the active firewall
3) Run conntrackd -c to commit the states from the external cache

Originally, while conntrackd -c was performing its work, I would
experience protracted soft lockups. After some investigation, I
noticed that conntrackd was trying to commit more states than
net.nf_conntrack_max allows which, in turn, led me to this patch:-

https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca

Although Jozsef's patch was helpful, I'm still experiencing a nasty
kernel oops after conntrackd -c has finished executing. This always
occurs within 15 seconds or so - sometimes immediately. Here's a
recent netconsole trace from 3.3-rc5 + patch:-

http://paste.pocoo.org/raw/559736/

Though I ultimately intend to use the 3.0 kernel, I tried various
other versions going as far back as 2.6.32. In each case, an oops is
reproducible - though the details do vary. Using 3.3-rc5, I even
noticed a null ptr deref on one occasion. Alas, I was unable to
capture it at the time.

Here's some other configuration information which may be useful ...

conntrackd.conf: http://paste.pocoo.org/raw/559727/
sysctl.conf: http://paste.pocoo.org/raw/559726/
kernel .config: http://paste.pocoo.org/raw/559725/

It's perhaps worth noting that I followed the advice to set HashLimit
in conntrackd.conf to at least double that of net.nf_conntrack_max
(commented in my config because I was experimenting with the issue
that Jozsef's patch rectifies). One thing that puzzles me is why
conntrackd always tries to commit more state entries than can be
accommodated. On the master, the internal cache grows to the maximum
size and, afaict, nothing is ever expired. This is from the master,
which has been up for a while ...

# conntrackd -s | head -n 5
cache internal:
current active connections: 2097152
connections created: 31649757 failed: 234788761
connections updated: 105516073 failed: 0
connections destroyed: 29552605 failed: 0

# conntrack -S | head -n1
entries 792495

It seems that the cache usage grows to the maximum, at which point
the creation failed counter starts going skyward. On the backup, it
seems that conntrackd -n && conntrackd -c tries to commit all of
this, but I don't really understand why.

Any advice would be most welcome.
I can't tinker too much with the active firewall at this point but,
if it helps, I can conduct any number of tests with the backup.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
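For reference, the sizing and bootstrap sequence described in the report
boil down to a handful of settings and commands. The sketch below simply
collects them, using the values quoted above; the modprobe.d path is a
common distribution default rather than something stated in the thread,
and starting the daemon with -d is likewise an assumption:

  # echo 'options nf_conntrack hashsize=262144' > /etc/modprobe.d/nf_conntrack.conf
  # sysctl -w net.nf_conntrack_max=1048576
  # iptables -t raw -A PREROUTING -j CT --ctevents assured
  # conntrackd -d     (start the daemon on the backup)
  # conntrackd -n     (resync from the active firewall)
  # conntrackd -c     (commit the external cache to the kernel)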
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-02 15:11 scheduling while atomic followed by oops upon conntrackd -c execution Kerin Millar
@ 2012-03-03 13:30 ` Pablo Neira Ayuso
  2012-03-03 17:49   ` Kerin Millar
  2012-03-03 18:47   ` Kerin Millar
  0 siblings, 2 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-03 13:30 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

Hi,

On Fri, Mar 02, 2012 at 03:11:07PM +0000, Kerin Millar wrote:
> Hello,
>
> I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
> X5650, 8GB RAM) for active-backup firewall duty. I've installed
> conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
> the FTFW mode for synchronization across a dedicated gigabit
> interface. The active firewall has to contend with fairly heavy
> traffic, much of which is in the form of long-lived TCP connections
> to an internal (LVS) load balancer, behind which a bunch of
> application servers sit.
>
> The number of active, concurrent connections to this service peaks
> at around 480,000. At last count, the number of conntrack states was
> 785,785, which is typical. I have net.nf_conntrack_max set to 1048576
> and the nf_conntrack module is loaded with hashsize=262144. The
> firewall is fully stateful in that new connections must match on
> --ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
> assured" as mentioned in the docs.

The docs explicitly say that you require Linux kernel >= 2.6.38 to use
this filtering. You seem to be using 2.6.32.

> This is my current test case for the backup:-
>
> 1) Boot the system and start conntrackd
> 2) Run conntrackd -n to sync with the active firewall
> 3) Run conntrackd -c to commit the states from the external cache
>
> Originally, while conntrackd -c was performing its work, I would
> experience protracted soft lockups. After some investigation, I
> noticed that conntrackd was trying to commit more states than
> net.nf_conntrack_max allows which, in turn, led me to this patch:-
>
> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca

I just posted another patch to the ML that is a follow-up fix to
Jozsef's patch. You have to apply that as well.

> Although Jozsef's patch was helpful, I'm still experiencing a nasty
> kernel oops after conntrackd -c has finished executing. This always
> occurs within 15 seconds or so - sometimes immediately. Here's a
> recent netconsole trace from 3.3-rc5 + patch:-
>
> http://paste.pocoo.org/raw/559736/

It seems ctnetlink is trying to load nf_nat over and over again, but
it doesn't seem to find it. One of the firewalls seems to be performing
NAT but the other doesn't have access to the NAT module. This is
strange; I guess you have the same rule-set loaded in both firewalls
correctly.

> Though I ultimately intend to use the 3.0 kernel, I tried various
> other versions going as far back as 2.6.32. In each case, an oops is
> reproducible - though the details do vary. Using 3.3-rc5, I even
> noticed a null ptr deref on one occasion. Alas, I was unable to
> capture it at the time.

For reporting problems, you have to stick to the latest Linux kernel
version. 2.6.32 is a rather old kernel.

> Here's some other configuration information which may be useful ...
>
> conntrackd.conf: http://paste.pocoo.org/raw/559727/

Options {
    TCPWindowTracking On
}

You cannot use this with 2.6.32 either. It's also documented in the
user manual and the example config file (it requires 2.6.36). Please
take the time to read the docs.
> sysctl.conf: http://paste.pocoo.org/raw/559726/
> kernel .config: http://paste.pocoo.org/raw/559725/
>
> It's perhaps worth noting that I followed the advice to set HashLimit
> in conntrackd.conf to at least double that of net.nf_conntrack_max
> (commented in my config because I was experimenting with the issue
> that Jozsef's patch rectifies). One thing that puzzles me is why
> conntrackd always tries to commit more state entries than can be
> accommodated. On the master, the internal cache grows to the maximum
> size and, afaict, nothing is ever expired. This is from the master,
> which has been up for a while ...
>
> # conntrackd -s | head -n 5
> cache internal:
> current active connections: 2097152
> connections created: 31649757 failed: 234788761
> connections updated: 105516073 failed: 0
> connections destroyed: 29552605 failed: 0
>
> # conntrack -S | head -n1
> entries 792495
>
> It seems that the cache usage grows to the maximum, at which point
> the creation failed counter starts going skyward. On the backup, it
> seems that conntrackd -n && conntrackd -c tries to commit all of
> this, but I don't really understand why.
>
> Any advice would be most welcome. I can't tinker too much with the
> active firewall at this point but, if it helps, I can conduct any
> number of tests with the backup.

I need you to stick to a reasonable configuration in order to help
you. Then we can fix issues, if any show up.

^ permalink raw reply	[flat|nested] 14+ messages in thread
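The two kernel-version requirements Pablo cites are easy to lose track
of, so here they are side by side. This is an illustrative fragment
only, with the Options block laid out as in the stock example
conntrackd.conf:

  # conntrackd.conf fragment -- requires Linux >= 2.6.36
  Options {
      TCPWindowTracking On
  }

  # event filtering via the CT target -- requires Linux >= 2.6.38
  # iptables -t raw -A PREROUTING -j CT --ctevents assured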
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 13:30 ` Pablo Neira Ayuso
@ 2012-03-03 17:49   ` Kerin Millar
  2012-03-03 18:47   ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread
From: Kerin Millar @ 2012-03-03 17:49 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> Hi,
>
> On Fri, Mar 02, 2012 at 03:11:07PM +0000, Kerin Millar wrote:
>> Hello,
>>
>> I have recently set up a pair of Dell PowerEdge R610 servers (Xeon
>> X5650, 8GB RAM) for active-backup firewall duty. I've installed
>> conntrack-tools-1.0.1 and libnetfilter_conntrack-1.0.0 and am using
>> the FTFW mode for synchronization across a dedicated gigabit
>> interface. The active firewall has to contend with fairly heavy
>> traffic, much of which is in the form of long-lived TCP connections
>> to an internal (LVS) load balancer, behind which a bunch of
>> application servers sit.
>>
>> The number of active, concurrent connections to this service peaks
>> at around 480,000. At last count, the number of conntrack states was
>> 785,785, which is typical. I have net.nf_conntrack_max set to 1048576
>> and the nf_conntrack module is loaded with hashsize=262144. The
>> firewall is fully stateful in that new connections must match on
>> --ctstate NEW. I'm also using "-t raw -A PREROUTING -j CT --ctevents
>> assured" as mentioned in the docs.
>
> The docs explicitly say that you require Linux kernel >= 2.6.38 to use
> this filtering. You seem to be using 2.6.32.

I'm aware of this requirement. In point of fact, I am using 3.3-rc5, as
indicated by the head of the submitted .config and the words "Here's a
recent netconsole trace from 3.3-rc5 ..." The only reference to 2.6.32
was to mention in passing that "I tried various other versions going as
far back as 2.6.32". That's because I wanted to establish whether it
could be considered a regression. Neither the configuration
modifications required for - nor the outcome of - using 2.6.32 was the
subject of the post. My point was that *all* tested versions crash
under the test case, be they old or bleeding edge. It was actually a
typo; I meant to say that I went "as far back as 2.6.33" but that seems
neither here nor there. See below for the exact versions tested.

>> This is my current test case for the backup:-
>>
>> 1) Boot the system and start conntrackd
>> 2) Run conntrackd -n to sync with the active firewall
>> 3) Run conntrackd -c to commit the states from the external cache
>>
>> Originally, while conntrackd -c was performing its work, I would
>> experience protracted soft lockups. After some investigation, I
>> noticed that conntrackd was trying to commit more states than
>> net.nf_conntrack_max allows which, in turn, led me to this patch:-
>>
>> https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=af14cca
>
> I just posted another patch to the ML that is a follow-up fix to
> Jozsef's patch. You have to apply that as well.

Presumably, that would be the one removing the spinlocks? I'll try
that now.

>
>> Although Jozsef's patch was helpful, I'm still experiencing a nasty
>> kernel oops after conntrackd -c has finished executing. This always
>> occurs within 15 seconds or so - sometimes immediately. Here's a
>> recent netconsole trace from 3.3-rc5 + patch:-
>>
>> http://paste.pocoo.org/raw/559736/
>
> It seems ctnetlink is trying to load nf_nat over and over again, but
> it doesn't seem to find it.
> One of the firewalls seems to be performing NAT but the other doesn't
> have access to the NAT module. This is strange; I guess you have the
> same rule-set loaded in both firewalls correctly.

Yes, the ruleset is identical. Furthermore, the hardware is identical
and the software is identical, except that the master continues to run
3.1.10 as originally deployed, because I can't bring it down until a
reliable failover process is in place. Here are the modules which are
shown as loaded after the ruleset has been initialised and prior to
conntrackd being initialised:-

# lsmod | awk 'NR!=1{print $1}' | tr '\n' ' '
iptable_nat nf_nat xt_CT xt_multiport xt_NOTRACK iptable_raw
xt_conntrack xt_u32 xt_limit xt_recent xt_addrtype ipt_REJECT
xt_comment ipt_LOG iptable_filter ip_tables nf_conntrack_ftp
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iTCO_wdt

After initialising conntrackd, the nfnetlink and nf_conntrack_netlink
modules are dynamically loaded in addition to the above. The pattern
you describe is bothersome but it's what happens after conntrackd -c
has finished that really worries me! As alluded to before, the crash
dump varies depending on the kernel version. What remains entirely
consistent is that the kernel crashes spectacularly within
approximately 15 seconds of conntrackd -c returning to prompt.

>
>> Though I ultimately intend to use the 3.0 kernel, I tried various
>> other versions going as far back as 2.6.32. In each case, an oops is
>> reproducible - though the details do vary. Using 3.3-rc5, I even
>> noticed a null ptr deref on one occasion. Alas, I was unable to
>> capture it at the time.
>
> For reporting problems, you have to stick to the latest Linux kernel
> version. 2.6.32 is a rather old kernel.

I submitted my report based on the results in 3.3-rc5 specifically so
as to be amenable to upstream. Precis:-

1) Both systems are deployed with 3.1.10 - issue is then discovered
2) Tested 3.2.9 on backup
3) Tested 2.6.33.20 on backup (with appropriate config modifications)
4) Tested 3.3.0-rc5 + Jozsef's patch on backup
5) Tested 3.3.0-62d222b+ on backup (pulled from linux-2.6 git tree)

The results submitted in my original post were for case (4).

>
>> Here's some other configuration information which may be useful ...
>>
>> conntrackd.conf: http://paste.pocoo.org/raw/559727/
>
> Options {
>     TCPWindowTracking On
> }
>
> You cannot use this with 2.6.32 either. It's also documented in the
> user manual and the example config file (it requires 2.6.36). Please
> take the time to read the docs.

I'm aware of this.

>
>> sysctl.conf: http://paste.pocoo.org/raw/559726/
>> kernel .config: http://paste.pocoo.org/raw/559725/
>>
>> It's perhaps worth noting that I followed the advice to set HashLimit
>> in conntrackd.conf to at least double that of net.nf_conntrack_max
>> (commented in my config because I was experimenting with the issue
>> that Jozsef's patch rectifies). One thing that puzzles me is why
>> conntrackd always tries to commit more state entries than can be
>> accommodated. On the master, the internal cache grows to the maximum
>> size and, afaict, nothing is ever expired. This is from the master,
>> which has been up for a while ...
>>
>> # conntrackd -s | head -n 5
>> cache internal:
>> current active connections: 2097152
>> connections created: 31649757 failed: 234788761
>> connections updated: 105516073 failed: 0
>> connections destroyed: 29552605 failed: 0
>>
>> # conntrack -S | head -n1
>> entries 792495
>>
>> It seems that the cache usage grows to the maximum, at which point
>> the creation failed counter starts going skyward. On the backup, it
>> seems that conntrackd -n && conntrackd -c tries to commit all of
>> this, but I don't really understand why.
>>
>> Any advice would be most welcome. I can't tinker too much with the
>> active firewall at this point but, if it helps, I can conduct any
>> number of tests with the backup.
>
> I need you to stick to a reasonable configuration in order to help
> you. Then we can fix issues, if any show up.

What is unreasonable about my configuration? Even if it were
unreasonable, the inference that a crash is not an issue because the
user may or may not have followed best practice is one I find somewhat
perplexing. I don't see how it could possibly be deemed that there is
not an issue with the kernel's behaviour here.

Given that the misunderstanding over 2.6.32 has been put to bed, if
you still have any concerns as to the nature of my configuration, by
all means please air them and I'll take remedial action. Further, the
servers are equipped with remote access cards, so I'm ready and
willing to do whatever is necessary to conduct and/or assist any
further diagnosis.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
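Given the suspicion around nf_nat autoloading in this exchange, one
defensive step while debugging (a sketch, not a confirmed fix) is to
pin the module before conntrackd starts, so that state injection never
depends on ctnetlink's autoload path:

  # modprobe nf_nat
  # lsmod | grep '^nf_nat'

On most distributions the same effect can be had by listing nf_nat in
the modules-to-load-at-boot configuration; the exact file varies by
distribution.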
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 13:30 ` Pablo Neira Ayuso
  2012-03-03 17:49   ` Kerin Millar
@ 2012-03-03 18:47   ` Kerin Millar
  2012-03-04 11:01     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-03 18:47 UTC (permalink / raw)
  To: netfilter-devel

Hi,

On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> I just posted another patch to the ML that is a follow-up fix to
> Jozsef's patch. You have to apply that as well.

I've now tested 3.3-rc5 with the addition of the above-mentioned
follow-on patch. The behaviour during conntrackd -c execution is
clearly much improved - insofar as it doesn't generate much noise -
but the crash that follows remains. Here's a netconsole capture:-

http://paste.pocoo.org/raw/560439/

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-03 18:47   ` Kerin Millar
@ 2012-03-04 11:01     ` Pablo Neira Ayuso
  2012-03-05 17:19       ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-04 11:01 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1001 bytes --]

Hi Kerin,

On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
> Hi,
>
> On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> >I just posted another patch to the ML that is a follow-up fix to
> >Jozsef's patch. You have to apply that as well.
>
> I've now tested 3.3-rc5 with the addition of the above-mentioned
> follow-on patch. The behaviour during conntrackd -c execution is
> clearly much improved - insofar as it doesn't generate much noise -
> but the crash that follows remains. Here's a netconsole capture:-
>
> http://paste.pocoo.org/raw/560439/

Great to know :-).

Regarding your previous email: I'm sorry, on reading it I thought you
were using 2.6.32, which was not the case. Your configuration is
perfectly reasonable.

It seems we still have problems regarding early_drop, but this time
with reliable event delivery enabled (15 seconds is the time that is
required to retry sending the destroy event).

If you can test the following patch, I'd appreciate it. Thank you.

[-- Attachment #2: 0001-netfilter-nf_conntrack-fix-early_drop-with-reliable-.patch --]
[-- Type: text/x-diff, Size: 1214 bytes --]

>From 1320c099d618a278fa17715127d6fecca2786a36 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 4 Mar 2012 11:34:06 +0100
Subject: [PATCH] netfilter: nf_conntrack: fix early_drop with reliable event
 delivery

With reliable event delivery enabled, if we fail to deliver the destroy
event in early_drop, we put out one entry that is still in the dying
list.

Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ed86a3b..7d2d641 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -635,6 +635,11 @@ static noinline int early_drop(struct net *net, unsigned int hash)
 
 	if (del_timer(&ct->timeout)) {
 		death_by_timeout((unsigned long)ct);
+		/* Check if we indeed killed this entry. Reliable event
+		   delivery may insert this into the dying list. */
+		if (!test_bit(IPS_DYING_BIT, &ct->status))
+			return dropped;
+
 		dropped = 1;
 		NF_CT_STAT_INC_ATOMIC(net, early_drop);
 	}
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 14+ messages in thread
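For context, the "reliable event delivery" that this patch interacts
with is a mode that an event listener requests through two netlink
socket options. A minimal sketch of the userspace side follows; it
mirrors the calls made by Pablo's ct_events tool attached later in the
thread (and by conntrackd), with error handling trimmed for brevity:

  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>
  #include <libnetfilter_conntrack/libnetfilter_conntrack.h>

  int main(void)
  {
          /* Subscribe to all conntrack event groups. */
          struct nfct_handle *h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
          int on = 1;

          if (!h) {
                  perror("nfct_open");
                  return 1;
          }
          /* Report broadcast delivery failures back to the emitter, so
             the kernel can retry undelivered destroy events instead of
             silently losing them. */
          setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_BROADCAST_SEND_ERROR,
                     &on, sizeof(on));
          /* Do not drop events when the receive buffer overruns. */
          setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_NO_ENOBUFS,
                     &on, sizeof(on));

          nfct_close(h);
          return 0;
  }

With both options set, a destroy event that cannot be delivered keeps
the conntrack entry on the dying list until delivery is retried, which
is why the patch above checks IPS_DYING_BIT before treating the entry
as dropped.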
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-04 11:01     ` Pablo Neira Ayuso
@ 2012-03-05 17:19       ` Kerin Millar
  2012-03-06 11:14         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-05 17:19 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 04/03/2012 11:01, Pablo Neira Ayuso wrote:
> Hi Kerin,
>
> On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
>> Hi,
>>
>> On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
>>> I just posted another patch to the ML that is a follow-up fix to
>>> Jozsef's patch. You have to apply that as well.
>>
>> I've now tested 3.3-rc5 with the addition of the above-mentioned
>> follow-on patch. The behaviour during conntrackd -c execution is
>> clearly much improved - insofar as it doesn't generate much noise -
>> but the crash that follows remains. Here's a netconsole capture:-
>>
>> http://paste.pocoo.org/raw/560439/
>
> Great to know :-).

I apologize but I think I may have led you astray on the nf_nat issue.
At the time of submitting my original report, I now believe that the
nf_nat module wasn't loaded prior to starting conntrackd, although it
was definitely available. For all tests that followed, however, I am
entirely certain that the nf_nat module was loaded in advance. The
upshot is that my claim that things had improved may have been
premature; I need to specifically test under both circumstances to be
sure that things are improving. That is, both with and without the
module loaded in advance.

Following my own advice then, I first tried going through my test case
*without* loading nf_nat in advance. Alas, conntrackd -c triggered
hard lockups and didn't return to prompt. Here are the results:-

http://paste.pocoo.org/raw/561350/

In case it matters, the existing ssh session continued to respond to
input but I was no longer able to initiate any new sessions.

>
> Regarding your previous email: I'm sorry, on reading it I thought you
> were using 2.6.32, which was not the case. Your configuration is
> perfectly reasonable.
>
> It seems we still have problems regarding early_drop, but this time
> with reliable event delivery enabled (15 seconds is the time that is
> required to retry sending the destroy event).
>
> If you can test the following patch, I'd appreciate it.

Gladly. I applied the patch to my 3.3-rc5 tree, which is still
carrying the two patches discussed earlier in the thread. I then went
through my test case under normal circumstances, i.e. all firewall
rules in place, nf_nat confirmed present before conntrackd, etc.
Again, conntrackd -c did not return to prompt. Here are the results:-

http://paste.pocoo.org/raw/561354/

Well, at least there was no oops this time. I should also add that the
patch was present for both of the tests mentioned in this email.

---

Incidentally, I found out why the internal cache on the master was
filling up to capacity. It was apparently due to the use of "iptables
-I PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing
something, but doesn't this stop events such as new and destroy from
being propagated? An inspection with conntrack -E suggests so. Once I
removed the above rule, I could see destroy events being propagated
and the number of active connections in the cache no longer exceeded
my chosen limit of 2097152 ...
# conntrack -S | head -n1; conntrackd -s | head -n2
entries 725826
cache internal:
current active connections: 1409472

Whatever the case, I'm quite happy to go without this rule as these
systems are coping fine with the load incurred by conntrackd.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
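The event inspection mentioned above can be done directly with the
conntrack tool; assuming a build with event-mask support as documented
in conntrack(8), something along these lines, run on the master, shows
immediately which event types survive a given --ctevents mask:

  # conntrack -E                   (stream all conntrack events)
  # conntrack -E -e NEW,DESTROY    (restrict to new/destroy events)

If no new or destroy events appear while the "assured"-only rule is in
place, that confirms the cache-growth explanation given here.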
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-05 17:19       ` Kerin Millar
@ 2012-03-06 11:14         ` Pablo Neira Ayuso
  2012-03-06 16:42           ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-06 11:14 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 5468 bytes --]

Hi Kerin,

On Mon, Mar 05, 2012 at 05:19:49PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> On 04/03/2012 11:01, Pablo Neira Ayuso wrote:
> >Hi Kerin,
> >
> >On Sat, Mar 03, 2012 at 06:47:27PM +0000, Kerin Millar wrote:
> >>Hi,
> >>
> >>On 03/03/2012 13:30, Pablo Neira Ayuso wrote:
> >>>I just posted another patch to the ML that is a follow-up fix to
> >>>Jozsef's patch. You have to apply that as well.
> >>
> >>I've now tested 3.3-rc5 with the addition of the above-mentioned
> >>follow-on patch. The behaviour during conntrackd -c execution is
> >>clearly much improved - insofar as it doesn't generate much noise -
> >>but the crash that follows remains. Here's a netconsole capture:-
> >>
> >>http://paste.pocoo.org/raw/560439/
> >
> >Great to know :-).
>
> I apologize but I think I may have led you astray on the nf_nat
> issue. At the time of submitting my original report, I now believe
> that the nf_nat module wasn't loaded prior to starting conntrackd,
> although it was definitely available. For all tests that followed,
> however, I am entirely certain that the nf_nat module was loaded in
> advance. The upshot is that my claim that things had improved may
> have been premature; I need to specifically test under both
> circumstances to be sure that things are improving. That is, both
> with and without the module loaded in advance.
>
> Following my own advice then, I first tried going through my test
> case *without* loading nf_nat in advance. Alas, conntrackd -c
> triggered hard lockups and didn't return to prompt. Here are the
> results:-
>
> http://paste.pocoo.org/raw/561350/
>
> In case it matters, the existing ssh session continued to respond to
> input but I was no longer able to initiate any new sessions.
>
> >
> >Regarding your previous email: I'm sorry, on reading it I thought you
> >were using 2.6.32, which was not the case. Your configuration is
> >perfectly reasonable.
> >
> >It seems we still have problems regarding early_drop, but this time
> >with reliable event delivery enabled (15 seconds is the time that is
> >required to retry sending the destroy event).
> >
> >If you can test the following patch, I'd appreciate it.
>
> Gladly. I applied the patch to my 3.3-rc5 tree, which is still
> carrying the two patches discussed earlier in the thread. I then
> went through my test case under normal circumstances, i.e. all
> firewall rules in place, nf_nat confirmed present before conntrackd,
> etc. Again, conntrackd -c did not return to prompt. Here are the
> results:-
>
> http://paste.pocoo.org/raw/561354/
>
> Well, at least there was no oops this time. I should also add that
> the patch was present for both of the tests mentioned in this email.

The previous patch that I sent you was not OK, sorry.
I have committed the following to my git tree:

http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

I've been using the following tools, which you can find enclosed with
this email; they are much simpler than conntrackd but they do the same
in essence:

* conntrack_stress.c
* conntrack_events.c

gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
gcc -lnetfilter_conntrack conntrack_events.c -o ct_events

Then, to listen to events with reliable event delivery enabled:

# ./ct_events &

And to create loads of flow entries in ASSURED state:

# ./ct_stress 65535 # that's my ct table size in my laptop

You'll hit ENOMEM errors at some point, that's fine, but no oops or
lockups happen here.

I have pushed these tools to the qa/ directory under
libnetfilter_conntrack:

commit 94e75add9867fb6f0e05e73b23f723f139da829e
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue Mar 6 12:10:55 2012 +0100

    qa: add some stress tools to test conntrack via ctnetlink

(BTW, ct_stress may disrupt your network connection since the table
gets filled. You can use conntrack -F to empty the ct table again.)

> ---
> Incidentally, I found out why the internal cache on the master was
> filling up to capacity. It was apparently due to the use of "iptables
> -I PREROUTING -t raw -j CT --ctevents assured". Perhaps I'm missing
> something, but doesn't this stop events such as new and destroy from
> being propagated? An inspection with conntrack -E suggests so. Once I
> removed the above rule, I could see destroy events being propagated
> and the number of active connections in the cache no longer exceeded
> my chosen limit of 2097152 ...

Yes, that line was wrong; I have fixed it in the documentation. The
correct one must be:

iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy

Thus, destroy events are delivered to user-space.

> # conntrack -S | head -n1; conntrackd -s | head -n2
> entries 725826
> cache internal:
> current active connections: 1409472
>
> Whatever the case, I'm quite happy to go without this rule as these
> systems are coping fine with the load incurred by conntrackd.

I want to get things fixed, so please don't give up on using that rule
yet :-).

Regarding the hard lockups: I'd be happy if you can re-do the tests,
both with conntrackd and the tools that I sent you.

Make sure you have these three patches; note that the last one has
changed.

http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

Thanks!
[-- Attachment #2: conntrack_events.c --]
[-- Type: text/x-csrc, Size: 1153 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

static int event_cb(enum nf_conntrack_msg_type type,
		    struct nf_conntrack *ct,
		    void *data)
{
	static int i = 0;
	static int new, destroy;

	if (type == NFCT_T_NEW)
		new++;
	else if (type == NFCT_T_DESTROY)
		destroy++;

	if ((++i % 10000) == 0)
		printf("%d events received (%d new, %d destroy)\n",
		       i, new, destroy);

	return NFCT_CB_CONTINUE;
}

int main(void)
{
	int ret;
	struct nfct_handle *h;
	int on = 1;

	h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
	if (!h) {
		perror("nfct_open");
		return 0;
	}

	setsockopt(nfct_fd(h), SOL_NETLINK,
		   NETLINK_BROADCAST_SEND_ERROR, &on, sizeof(int));
	setsockopt(nfct_fd(h), SOL_NETLINK,
		   NETLINK_NO_ENOBUFS, &on, sizeof(int));

	nfct_callback_register(h, NFCT_T_ALL, event_cb, NULL);

	printf("TEST: waiting for events...\n");
	ret = nfct_catch(h);
	printf("TEST: conntrack events ");
	if (ret == -1)
		printf("(%d)(%s)\n", ret, strerror(errno));
	else
		printf("(OK)\n");

	nfct_close(h);

	ret == -1 ? exit(EXIT_FAILURE) : exit(EXIT_SUCCESS);
}

[-- Attachment #3: conntrack_stress.c --]
[-- Type: text/x-csrc, Size: 1487 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <arpa/inet.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack_tcp.h>

int main(int argc, char *argv[])
{
	time_t t;
	int ret, i, r;
	struct nfct_handle *h;
	struct nf_conntrack *ct;

	if (argc < 2) {
		fprintf(stderr, "Usage: %s [ct_table_size]\n", argv[0]);
		exit(EXIT_FAILURE);
	}

	time(&t);
	srandom(t);
	r = random();

	ct = nfct_new();
	if (!ct) {
		perror("nfct_new");
		return 0;
	}

	h = nfct_open(CONNTRACK, 0);
	if (!h) {
		perror("nfct_open");
		nfct_destroy(ct);
		return -1;
	}

	for (i = r; i < (r + atoi(argv[1]) * 2); i++) {
		nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
		nfct_set_attr_u32(ct, ATTR_IPV4_SRC, inet_addr("1.1.1.1") + i);
		nfct_set_attr_u32(ct, ATTR_IPV4_DST, inet_addr("2.2.2.2") + i);
		nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
		nfct_set_attr_u16(ct, ATTR_PORT_SRC, htons(10));
		nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(20));

		nfct_setobjopt(ct, NFCT_SOPT_SETUP_REPLY);

		nfct_set_attr_u8(ct, ATTR_TCP_STATE, TCP_CONNTRACK_ESTABLISHED);
		nfct_set_attr_u32(ct, ATTR_TIMEOUT, 1000);
		nfct_set_attr_u32(ct, ATTR_STATUS, IPS_ASSURED);

		if (i % 10000 == 0)
			printf("added %d flow entries\n", i);

		ret = nfct_query(h, NFCT_Q_CREATE, ct);
		if (ret == -1)
			perror("nfct_query: ");
	}

	nfct_close(h);
	nfct_destroy(ct);

	ret == -1 ? exit(EXIT_FAILURE) : exit(EXIT_SUCCESS);
}

^ permalink raw reply	[flat|nested] 14+ messages in thread
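One portability note on the compile commands quoted above: toolchains
that link with --as-needed (the default on some distributions) may fail
to resolve symbols when the library is named before the source file, so
the safer spelling puts the -l flag last. This is a general linking
remark, not an issue reported in the thread:

  gcc conntrack_stress.c -o ct_stress -lnetfilter_conntrack
  gcc conntrack_events.c -o ct_events -lnetfilter_conntrack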
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 11:14         ` Pablo Neira Ayuso
@ 2012-03-06 16:42           ` Kerin Millar
  2012-03-06 17:23             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-06 16:42 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 06/03/2012 11:14, Pablo Neira Ayuso wrote:

<snip>

>> Gladly. I applied the patch to my 3.3-rc5 tree, which is still
>> carrying the two patches discussed earlier in the thread. I then
>> went through my test case under normal circumstances, i.e. all
>> firewall rules in place, nf_nat confirmed present before conntrackd,
>> etc. Again, conntrackd -c did not return to prompt. Here are the
>> results:-
>>
>> http://paste.pocoo.org/raw/561354/
>>
>> Well, at least there was no oops this time. I should also add that
>> the patch was present for both of the tests mentioned in this email.
>
> The previous patch that I sent you was not OK, sorry. I have committed
> the following to my git tree:
>
> http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897

Noted.

>
> I've been using the following tools, which you can find enclosed with
> this email; they are much simpler than conntrackd but they do the same
> in essence:
>
> * conntrack_stress.c
> * conntrack_events.c
>
> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>
> Then, to listen to events with reliable event delivery enabled:
>
> # ./ct_events &
>
> And to create loads of flow entries in ASSURED state:
>
> # ./ct_stress 65535 # that's my ct table size in my laptop
>
> You'll hit ENOMEM errors at some point, that's fine, but no oops or
> lockups happen here.
>
> I have pushed these tools to the qa/ directory under
> libnetfilter_conntrack:
>
> commit 94e75add9867fb6f0e05e73b23f723f139da829e
> Author: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Tue Mar 6 12:10:55 2012 +0100
>
>     qa: add some stress tools to test conntrack via ctnetlink
>
> (BTW, ct_stress may disrupt your network connection since the table
> gets filled. You can use conntrack -F to empty the ct table again.)

Sorry if this is a silly question but should conntrackd be running
while I conduct this stress test? If so, is there any danger of the
master becoming unstable? I must ask because, if the stability of the
master is compromised, I will be in big trouble ;)

<snip>

> Yes, that line was wrong; I have fixed it in the documentation. The
> correct one must be:
>
> iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy
>
> Thus, destroy events are delivered to user-space.
>
>> # conntrack -S | head -n1; conntrackd -s | head -n2
>> entries 725826
>> cache internal:
>> current active connections: 1409472
>>
>> Whatever the case, I'm quite happy to go without this rule as these
>> systems are coping fine with the load incurred by conntrackd.
>
> I want to get things fixed, so please don't give up on using that rule
> yet :-).

Sure. I've re-instated the rule as requested. With the addition of
destroy events, cache usage remains under control.

>
> Regarding the hard lockups: I'd be happy if you can re-do the tests,
> both with conntrackd and the tools that I sent you.
>
> Make sure you have these three patches; note that the last one has
> changed.
>
> http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
> http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
> http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
>

Duly applied to a fresh 3.3-rc5 tree.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
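For anyone reproducing the setup, the three commits can be applied to a
3.3-rc5 tree along these lines. The raw-patch URL form is how cgit
installations typically expose commits and is an assumption here, not
something quoted in the thread:

  $ wget -O 1.patch 'http://1984.lsi.us.es/git/net/patch/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2'
  $ wget -O 2.patch 'http://1984.lsi.us.es/git/net/patch/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60'
  $ wget -O 3.patch 'http://1984.lsi.us.es/git/net/patch/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897'
  $ git am 1.patch 2.patch 3.patch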
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 16:42           ` Kerin Millar
@ 2012-03-06 17:23             ` Pablo Neira Ayuso
  2012-03-06 22:37               ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-06 17:23 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

On Tue, Mar 06, 2012 at 04:42:02PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> On 06/03/2012 11:14, Pablo Neira Ayuso wrote:
>
> <snip>
>
> >>Gladly. I applied the patch to my 3.3-rc5 tree, which is still
> >>carrying the two patches discussed earlier in the thread. I then
> >>went through my test case under normal circumstances, i.e. all
> >>firewall rules in place, nf_nat confirmed present before conntrackd,
> >>etc. Again, conntrackd -c did not return to prompt. Here are the
> >>results:-
> >>
> >>http://paste.pocoo.org/raw/561354/
> >>
> >>Well, at least there was no oops this time. I should also add that
> >>the patch was present for both of the tests mentioned in this email.
> >
> >The previous patch that I sent you was not OK, sorry. I have committed
> >the following to my git tree:
> >
> >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
>
> Noted.
>
> >
> >I've been using the following tools, which you can find enclosed with
> >this email; they are much simpler than conntrackd but they do the same
> >in essence:
> >
> >* conntrack_stress.c
> >* conntrack_events.c
> >
> >gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> >gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
> >
> >Then, to listen to events with reliable event delivery enabled:
> >
> ># ./ct_events &
> >
> >And to create loads of flow entries in ASSURED state:
> >
> ># ./ct_stress 65535 # that's my ct table size in my laptop
> >
> >You'll hit ENOMEM errors at some point, that's fine, but no oops or
> >lockups happen here.
> >
> >I have pushed these tools to the qa/ directory under
> >libnetfilter_conntrack:
> >
> >commit 94e75add9867fb6f0e05e73b23f723f139da829e
> >Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >Date: Tue Mar 6 12:10:55 2012 +0100
> >
> >    qa: add some stress tools to test conntrack via ctnetlink
> >
> >(BTW, ct_stress may disrupt your network connection since the table
> >gets filled. You can use conntrack -F to empty the ct table again.)
>
> Sorry if this is a silly question but should conntrackd be running
> while I conduct this stress test? If so, is there any danger of the
> master becoming unstable? I must ask because, if the stability of the
> master is compromised, I will be in big trouble ;)

If you run this in the backup, conntrackd will spam the master with
lots of new flows in the external cache. That shouldn't be a problem
(just a bit of extra load invested in the replication).

But if you run this in the master, my test will fill the ct table with
lots of assured flows. Thus, packets that belong to new flows will
likely be dropped on that node.

> >Yes, that line was wrong; I have fixed it in the documentation. The
> >correct one must be:
> >
> >iptables -I PREROUTING -t raw -j CT --ctevents assured,destroy
> >
> >Thus, destroy events are delivered to user-space.
> >
> >># conntrack -S | head -n1; conntrackd -s | head -n2
> >>entries 725826
> >>cache internal:
> >>current active connections: 1409472
> >>
> >>Whatever the case, I'm quite happy to go without this rule as these
> >>systems are coping fine with the load incurred by conntrackd.
> >
> >I want to get things fixed, so please don't give up on using that rule
> >yet :-).
>
> Sure. I've re-instated the rule as requested. With the addition of
> destroy events, cache usage remains under control.
>
> >
> >Regarding the hard lockups: I'd be happy if you can re-do the tests,
> >both with conntrackd and the tools that I sent you.
> >
> >Make sure you have these three patches; note that the last one has
> >changed.
> >
> >http://1984.lsi.us.es/git/net/commit/?id=7d367e06688dc7a2cc98c2ace04e1296e1d987e2
> >http://1984.lsi.us.es/git/net/commit/?id=a8f341e98a46f579061fabfe6ea50be3d0eb2c60
> >http://1984.lsi.us.es/git/net/commit/?id=691d47b2dc8fdb8fea5a2b59c46e70363fa66897
> >
>
> Duly applied to a fresh 3.3-rc5 tree.
>
> Cheers,
>
> --Kerin
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 17:23             ` Pablo Neira Ayuso
@ 2012-03-06 22:37               ` Kerin Millar
  2012-03-07 14:41                 ` Kerin Millar
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-06 22:37 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

On 06/03/2012 17:23, Pablo Neira Ayuso wrote:

<snip>

>>> I've been using the following tools, which you can find enclosed with
>>> this email; they are much simpler than conntrackd but they do the same
>>> in essence:
>>>
>>> * conntrack_stress.c
>>> * conntrack_events.c
>>>
>>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
>>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>>>
>>> Then, to listen to events with reliable event delivery enabled:
>>>
>>> # ./ct_events &
>>>
>>> And to create loads of flow entries in ASSURED state:
>>>
>>> # ./ct_stress 65535 # that's my ct table size in my laptop
>>>
>>> You'll hit ENOMEM errors at some point, that's fine, but no oops or
>>> lockups happen here.
>>>
>>> I have pushed these tools to the qa/ directory under
>>> libnetfilter_conntrack:
>>>
>>> commit 94e75add9867fb6f0e05e73b23f723f139da829e
>>> Author: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Date: Tue Mar 6 12:10:55 2012 +0100
>>>
>>>     qa: add some stress tools to test conntrack via ctnetlink
>>>
>>> (BTW, ct_stress may disrupt your network connection since the table
>>> gets filled. You can use conntrack -F to empty the ct table again.)
>>>
>>
>> Sorry if this is a silly question but should conntrackd be running
>> while I conduct this stress test? If so, is there any danger of the
>> master becoming unstable? I must ask because, if the stability of
>> the master is compromised, I will be in big trouble ;)
>
> If you run this in the backup, conntrackd will spam the master with
> lots of new flows in the external cache. That shouldn't be a problem
> (just a bit of extra load invested in the replication).
>
> But if you run this in the master, my test will fill the ct table
> with lots of assured flows. Thus, packets that belong to new flows
> will likely be dropped on that node.

That makes sense. So, I rebooted the backup with the latest kernel
build, ran my iptables script then started conntrackd. I was not able
to destabilize the system through the use of your stress tool. The
sequence of commands used to invoke the ct_stress tool was as
follows:-

1) ct_stress 2097152
2) ct_stress 2097152
3) ct_stress 1048576

There were indeed a lot of ENOMEM errors, and messages warning that
the conntrack table was full with packets being dropped. Nothing
surprising.

I then tried my test case again. The exact sequence of commands was as
follows:-

4) conntrackd -n
5) conntrackd -c
6) conntrackd -f internal
7) conntrackd -F
8) conntrackd -n
9) conntrackd -c

It didn't crash after the 5th step (to my amazement) but it did after
the 9th. Here's a netconsole log covering all of the above:

http://paste.pocoo.org/raw/562136/

The invalid opcode error was also present in the log that I provided
with my first post in this thread.

For some reason, I couldn't capture stdout from your ct_events tool
but here's as much as I was able to copy and paste before it stopped
responding completely.

2100000 events received (2 new, 1048702 destroy)
2110000 events received (2 new, 1048706 destroy)
2120000 events received (2 new, 1048713 destroy)
2130000 events received (2 new, 1048722 destroy)
2140000 events received (2 new, 1048735 destroy)
2150000 events received (2 new, 1048748 destroy)
2160000 events received (2 new, 1048776 destroy)
2170000 events received (2 new, 1048797 destroy)
2180000 events received (2 new, 1048830 destroy)
2190000 events received (2 new, 1048872 destroy)
2200000 events received (2 new, 1048909 destroy)
2210000 events received (2 new, 1048945 destroy)
2220000 events received (2 new, 1048985 destroy)
2230000 events received (2 new, 1049039 destroy)
2240000 events received (2 new, 1049102 destroy)
2250000 events received (2 new, 1049170 destroy)
2260000 events received (2 new, 1049238 destroy)
2270000 events received (2 new, 1049292 destroy)
2280000 events received (2 new, 1049347 destroy)
2290000 events received (2 new, 1049423 destroy)
2300000 events received (2 new, 1049490 destroy)
2310000 events received (2 new, 1049563 destroy)
2320000 events received (2 new, 1049646 destroy)
2330000 events received (2 new, 1049739 destroy)
2340000 events received (2 new, 1049819 destroy)
2350000 events received (2 new, 1049932 destroy)
2360000 events received (2 new, 1050040 destroy)
2370000 events received (2 new, 1050153 destroy)
2380000 events received (2 new, 1050293 destroy)
2390000 events received (2 new, 1050405 destroy)
2400000 events received (2 new, 1050535 destroy)
2410000 events received (2 new, 1050661 destroy)
2420000 events received (2 new, 1050786 destroy)
2430000 events received (2 new, 1050937 destroy)
2440000 events received (2 new, 1051085 destroy)
2450000 events received (2 new, 1051226 destroy)
2460000 events received (2 new, 1051378 destroy)
2470000 events received (2 new, 1051542 destroy)
2480000 events received (2 new, 1051693 destroy)
2490000 events received (2 new, 1051852 destroy)
2500000 events received (2 new, 1052008 destroy)
2510000 events received (2 new, 1052185 destroy)
2520000 events received (2 new, 1052373 destroy)
2530000 events received (2 new, 1052569 destroy)
2540000 events received (2 new, 1052770 destroy)
2550000 events received (2 new, 1052978 destroy)

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
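The nine commands above make a compact reproduction recipe; the
following sketch simply collects them verbatim into one script (the
ct_stress binary is assumed to be in the working directory):

  #!/bin/sh
  ./ct_stress 2097152
  ./ct_stress 2097152
  ./ct_stress 1048576
  conntrackd -n           # resync from the master
  conntrackd -c           # commit external cache (survived this time)
  conntrackd -f internal  # flush the internal cache
  conntrackd -F           # flush the kernel conntrack table
  conntrackd -n
  conntrackd -c           # the step after which the crash occurred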
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-06 22:37               ` Kerin Millar
@ 2012-03-07 14:41                 ` Kerin Millar
  2012-03-08  1:33                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 14+ messages in thread
From: Kerin Millar @ 2012-03-07 14:41 UTC (permalink / raw)
  To: netfilter-devel

Hi Pablo,

To follow up briefly (at the end of this message) ...

On 06/03/2012 22:37, Kerin Millar wrote:
> Hi Pablo,
>
> On 06/03/2012 17:23, Pablo Neira Ayuso wrote:
>
> <snip>
>
>>>> I've been using the following tools, which you can find enclosed with
>>>> this email; they are much simpler than conntrackd but they do the same
>>>> in essence:
>>>>
>>>> * conntrack_stress.c
>>>> * conntrack_events.c
>>>>
>>>> gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
>>>> gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
>>>>
>>>> Then, to listen to events with reliable event delivery enabled:
>>>>
>>>> # ./ct_events &
>>>>
>>>> And to create loads of flow entries in ASSURED state:
>>>>
>>>> # ./ct_stress 65535 # that's my ct table size in my laptop
>>>>
>>>> You'll hit ENOMEM errors at some point, that's fine, but no oops or
>>>> lockups happen here.
>>>>
>>>> I have pushed these tools to the qa/ directory under
>>>> libnetfilter_conntrack:
>>>>
>>>> commit 94e75add9867fb6f0e05e73b23f723f139da829e
>>>> Author: Pablo Neira Ayuso <pablo@netfilter.org>
>>>> Date: Tue Mar 6 12:10:55 2012 +0100
>>>>
>>>>     qa: add some stress tools to test conntrack via ctnetlink
>>>>
>>>> (BTW, ct_stress may disrupt your network connection since the table
>>>> gets filled. You can use conntrack -F to empty the ct table again.)
>>>>
>>>
>>> Sorry if this is a silly question but should conntrackd be running
>>> while I conduct this stress test? If so, is there any danger of the
>>> master becoming unstable? I must ask because, if the stability of
>>> the master is compromised, I will be in big trouble ;)
>>
>> If you run this in the backup, conntrackd will spam the master with
>> lots of new flows in the external cache. That shouldn't be a problem
>> (just a bit of extra load invested in the replication).
>>
>> But if you run this in the master, my test will fill the ct table
>> with lots of assured flows. Thus, packets that belong to new flows
>> will likely be dropped on that node.
>
> That makes sense. So, I rebooted the backup with the latest kernel
> build, ran my iptables script then started conntrackd. I was not able
> to destabilize the system through the use of your stress tool. The
> sequence of commands used to invoke the ct_stress tool was as
> follows:-
>
> 1) ct_stress 2097152
> 2) ct_stress 2097152
> 3) ct_stress 1048576
>
> There were indeed a lot of ENOMEM errors, and messages warning that
> the conntrack table was full with packets being dropped. Nothing
> surprising.
>
> I then tried my test case again. The exact sequence of commands was as
> follows:-
>
> 4) conntrackd -n
> 5) conntrackd -c
> 6) conntrackd -f internal
> 7) conntrackd -F
> 8) conntrackd -n
> 9) conntrackd -c
>
> It didn't crash after the 5th step (to my amazement) but it did after
> the 9th. Here's a netconsole log covering all of the above:
>
> http://paste.pocoo.org/raw/562136/
>
> The invalid opcode error was also present in the log that I provided
> with my first post in this thread.
>
> For some reason, I couldn't capture stdout from your ct_events tool
> but here's as much as I was able to copy and paste before it stopped
> responding completely.
>
> 2100000 events received (2 new, 1048702 destroy)
> 2110000 events received (2 new, 1048706 destroy)
> 2120000 events received (2 new, 1048713 destroy)
> 2130000 events received (2 new, 1048722 destroy)
> 2140000 events received (2 new, 1048735 destroy)
> 2150000 events received (2 new, 1048748 destroy)
> 2160000 events received (2 new, 1048776 destroy)
> 2170000 events received (2 new, 1048797 destroy)
> 2180000 events received (2 new, 1048830 destroy)
> 2190000 events received (2 new, 1048872 destroy)
> 2200000 events received (2 new, 1048909 destroy)
> 2210000 events received (2 new, 1048945 destroy)
> 2220000 events received (2 new, 1048985 destroy)
> 2230000 events received (2 new, 1049039 destroy)
> 2240000 events received (2 new, 1049102 destroy)
> 2250000 events received (2 new, 1049170 destroy)
> 2260000 events received (2 new, 1049238 destroy)
> 2270000 events received (2 new, 1049292 destroy)
> 2280000 events received (2 new, 1049347 destroy)
> 2290000 events received (2 new, 1049423 destroy)
> 2300000 events received (2 new, 1049490 destroy)
> 2310000 events received (2 new, 1049563 destroy)
> 2320000 events received (2 new, 1049646 destroy)
> 2330000 events received (2 new, 1049739 destroy)
> 2340000 events received (2 new, 1049819 destroy)
> 2350000 events received (2 new, 1049932 destroy)
> 2360000 events received (2 new, 1050040 destroy)
> 2370000 events received (2 new, 1050153 destroy)
> 2380000 events received (2 new, 1050293 destroy)
> 2390000 events received (2 new, 1050405 destroy)
> 2400000 events received (2 new, 1050535 destroy)
> 2410000 events received (2 new, 1050661 destroy)
> 2420000 events received (2 new, 1050786 destroy)
> 2430000 events received (2 new, 1050937 destroy)
> 2440000 events received (2 new, 1051085 destroy)
> 2450000 events received (2 new, 1051226 destroy)
> 2460000 events received (2 new, 1051378 destroy)
> 2470000 events received (2 new, 1051542 destroy)
> 2480000 events received (2 new, 1051693 destroy)
> 2490000 events received (2 new, 1051852 destroy)
> 2500000 events received (2 new, 1052008 destroy)
> 2510000 events received (2 new, 1052185 destroy)
> 2520000 events received (2 new, 1052373 destroy)
> 2530000 events received (2 new, 1052569 destroy)
> 2540000 events received (2 new, 1052770 destroy)
> 2550000 events received (2 new, 1052978 destroy)

Just to add that I ran a more extensive stress test on the backup,
like so ...

for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done

It remained stable throughout. I notice that there's an option to dump
the cache in XML format. I wonder if it would be useful if I were to
provide such a dump, having synced with the master? Assuming that
there's a way to inject the contents, perhaps you could reproduce the
issue also.

Cheers,

--Kerin

^ permalink raw reply	[flat|nested] 14+ messages in thread
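On the XML point: conntrackd can format its cache dumps as XML when the
-x modifier is combined with -i or -e (per conntrackd(8); check that
the installed version supports it), and the kernel table can be
captured the same way with conntrack, so a pair of dumps for offline
comparison could look like:

  # conntrackd -i -x > internal-cache.xml
  # conntrack -L -o xml > kernel-table.xml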
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-07 14:41                 ` Kerin Millar
@ 2012-03-08  1:33                   ` Pablo Neira Ayuso
  2012-03-08 11:00                     ` Kerin Millar
  2012-03-08 11:29                     ` Kerin Millar
  0 siblings, 2 replies; 14+ messages in thread
From: Pablo Neira Ayuso @ 2012-03-08 1:33 UTC (permalink / raw)
  To: Kerin Millar; +Cc: netfilter-devel

On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote:
> Hi Pablo,
>
> To follow up briefly (at the end of this message) ...
>
> On 06/03/2012 22:37, Kerin Millar wrote:
> >Hi Pablo,
> >
> >On 06/03/2012 17:23, Pablo Neira Ayuso wrote:
> >
> ><snip>
> >
> >>>>I've been using the following tools, which you can find enclosed with
> >>>>this email; they are much simpler than conntrackd but they do the same
> >>>>in essence:
> >>>>
> >>>>* conntrack_stress.c
> >>>>* conntrack_events.c
> >>>>
> >>>>gcc -lnetfilter_conntrack conntrack_stress.c -o ct_stress
> >>>>gcc -lnetfilter_conntrack conntrack_events.c -o ct_events
> >>>>
> >>>>Then, to listen to events with reliable event delivery enabled:
> >>>>
> >>>># ./ct_events &
> >>>>
> >>>>And to create loads of flow entries in ASSURED state:
> >>>>
> >>>># ./ct_stress 65535 # that's my ct table size in my laptop
> >>>>
> >>>>You'll hit ENOMEM errors at some point, that's fine, but no oops or
> >>>>lockups happen here.
> >>>>
> >>>>I have pushed these tools to the qa/ directory under
> >>>>libnetfilter_conntrack:
> >>>>
> >>>>commit 94e75add9867fb6f0e05e73b23f723f139da829e
> >>>>Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >>>>Date: Tue Mar 6 12:10:55 2012 +0100
> >>>>
> >>>>qa: add some stress tools to test conntrack via ctnetlink
> >>>>
> >>>>(BTW, ct_stress may disrupt your network connection since the table
> >>>>gets filled. You can use conntrack -F to empty the ct table again.)
> >>>>
> >>>
> >>>Sorry if this is a silly question but should conntrackd be running
> >>>while I conduct this stress test? If so, is there any danger of the
> >>>master becoming unstable? I must ask because, if the stability of
> >>>the master is compromised, I will be in big trouble ;)
> >>
> >>If you run this in the backup, conntrackd will spam the master with
> >>lots of new flows in the external cache. That shouldn't be a problem
> >>(just a bit of extra load invested in the replication).
> >>
> >>But if you run this in the master, my test will fill the ct table
> >>with lots of assured flows. Thus, packets that belong to new flows
> >>will likely be dropped on that node.
> >
> >That makes sense. So, I rebooted the backup with the latest kernel
> >build, ran my iptables script then started conntrackd. I was not able
> >to destabilize the system through the use of your stress tool. The
> >sequence of commands used to invoke the ct_stress tool was as
> >follows:-
> >
> >1) ct_stress 2097152
> >2) ct_stress 2097152
> >3) ct_stress 1048576
> >
> >There were indeed a lot of ENOMEM errors, and messages warning that
> >the conntrack table was full with packets being dropped. Nothing
> >surprising.
> >
> >I then tried my test case again. The exact sequence of commands was as
> >follows:-
> >
> >4) conntrackd -n
> >5) conntrackd -c
> >6) conntrackd -f internal
> >7) conntrackd -F
> >8) conntrackd -n
> >9) conntrackd -c
> >
> >It didn't crash after the 5th step (to my amazement) but it did after
> >the 9th. Here's a netconsole log covering all of the above:
> >
> >http://paste.pocoo.org/raw/562136/
> >
> >The invalid opcode error was also present in the log that I provided
> >with my first post in this thread.
>
> >For some reason, I couldn't capture stdout from your ct_events tool
> >but here's as much as I was able to copy and paste before it stopped
> >responding completely.
> >
> >2100000 events received (2 new, 1048702 destroy)
> >2110000 events received (2 new, 1048706 destroy)
> >2120000 events received (2 new, 1048713 destroy)
> >2130000 events received (2 new, 1048722 destroy)
> >2140000 events received (2 new, 1048735 destroy)
> >2150000 events received (2 new, 1048748 destroy)
> >2160000 events received (2 new, 1048776 destroy)
> >2170000 events received (2 new, 1048797 destroy)
> >2180000 events received (2 new, 1048830 destroy)
> >2190000 events received (2 new, 1048872 destroy)
> >2200000 events received (2 new, 1048909 destroy)
> >2210000 events received (2 new, 1048945 destroy)
> >2220000 events received (2 new, 1048985 destroy)
> >2230000 events received (2 new, 1049039 destroy)
> >2240000 events received (2 new, 1049102 destroy)
> >2250000 events received (2 new, 1049170 destroy)
> >2260000 events received (2 new, 1049238 destroy)
> >2270000 events received (2 new, 1049292 destroy)
> >2280000 events received (2 new, 1049347 destroy)
> >2290000 events received (2 new, 1049423 destroy)
> >2300000 events received (2 new, 1049490 destroy)
> >2310000 events received (2 new, 1049563 destroy)
> >2320000 events received (2 new, 1049646 destroy)
> >2330000 events received (2 new, 1049739 destroy)
> >2340000 events received (2 new, 1049819 destroy)
> >2350000 events received (2 new, 1049932 destroy)
> >2360000 events received (2 new, 1050040 destroy)
> >2370000 events received (2 new, 1050153 destroy)
> >2380000 events received (2 new, 1050293 destroy)
> >2390000 events received (2 new, 1050405 destroy)
> >2400000 events received (2 new, 1050535 destroy)
> >2410000 events received (2 new, 1050661 destroy)
> >2420000 events received (2 new, 1050786 destroy)
> >2430000 events received (2 new, 1050937 destroy)
> >2440000 events received (2 new, 1051085 destroy)
> >2450000 events received (2 new, 1051226 destroy)
> >2460000 events received (2 new, 1051378 destroy)
> >2470000 events received (2 new, 1051542 destroy)
> >2480000 events received (2 new, 1051693 destroy)
> >2490000 events received (2 new, 1051852 destroy)
> >2500000 events received (2 new, 1052008 destroy)
> >2510000 events received (2 new, 1052185 destroy)
> >2520000 events received (2 new, 1052373 destroy)
> >2530000 events received (2 new, 1052569 destroy)
> >2540000 events received (2 new, 1052770 destroy)
> >2550000 events received (2 new, 1052978 destroy)
>
> Just to add that I ran a more extensive stress test on the backup,
> like so ...
>
> for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done

I guess you're running ct_events_reliable as well. Launching several
ct_stress instances at the same time is also interesting.

> It remained stable throughout. I notice that there's an option to dump
> the cache in XML format. I wonder if it would be useful if I were to
> provide such a dump, having synced with the master? Assuming that
> there's a way to inject the contents, perhaps you could reproduce the
> issue also.

I've been launching the user-space stress tests but I have not been
able to reproduce the problem that you reported so far.

I'd need to know if the problem that you reported is easy to reproduce
in your setup, or whether it looks more like a race condition.
Moreover, I need to know if there was some traffic circulating through
the backup or no traffic at all.

Let me know.
^ permalink raw reply [flat|nested] 14+ messages in thread
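[For readers following along without the qa/ sources to hand, below is a
minimal sketch of what a ct_stress-style generator looks like. It assumes
the stock libnetfilter_conntrack attribute API; the file name, addresses,
ports, timeout and loop structure are illustrative and are not taken from
Pablo's actual conntrack_stress.c.]

/*
 * ct_stress_sketch.c - illustrative sketch only, not the real qa tool.
 * Floods the conntrack table with synthetic ESTABLISHED + ASSURED TCP
 * flows through ctnetlink.
 *
 * Build, as with the real tools:
 *   gcc -lnetfilter_conntrack ct_stress_sketch.c -o ct_stress_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netfilter/nf_conntrack_common.h>
#include <linux/netfilter/nf_conntrack_tcp.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

int main(int argc, char *argv[])
{
	struct nfct_handle *h;
	struct nf_conntrack *ct;
	int i, n = (argc > 1) ? atoi(argv[1]) : 65535;

	h = nfct_open(CONNTRACK, 0);
	if (h == NULL) {
		perror("nfct_open");
		exit(EXIT_FAILURE);
	}

	ct = nfct_new();
	if (ct == NULL) {
		perror("nfct_new");
		exit(EXIT_FAILURE);
	}

	for (i = 0; i < n; i++) {
		/* Vary source address and port so every entry is a
		 * distinct tuple. */
		nfct_set_attr_u8(ct, ATTR_L3PROTO, AF_INET);
		nfct_set_attr_u32(ct, ATTR_IPV4_SRC,
				  htonl(0x0a000001 + (i / 60000)));
		nfct_set_attr_u32(ct, ATTR_IPV4_DST, htonl(0x0afffffe));
		nfct_set_attr_u8(ct, ATTR_L4PROTO, IPPROTO_TCP);
		nfct_set_attr_u16(ct, ATTR_PORT_SRC,
				  htons(1025 + (i % 60000)));
		nfct_set_attr_u16(ct, ATTR_PORT_DST, htons(80));
		nfct_set_attr_u8(ct, ATTR_TCP_STATE,
				 TCP_CONNTRACK_ESTABLISHED);
		nfct_set_attr_u32(ct, ATTR_TIMEOUT, 1000);
		/* SEEN_REPLY + ASSURED, like long-lived LVS flows. */
		nfct_set_attr_u32(ct, ATTR_STATUS,
				  IPS_SEEN_REPLY | IPS_ASSURED);

		if (nfct_query(h, NFCT_Q_CREATE, ct) == -1)
			fprintf(stderr, "entry %d: %s\n",
				i, strerror(errno));
	}

	nfct_destroy(ct);
	nfct_close(h);
	return EXIT_SUCCESS;
}

[Each iteration varies the tuple, so the table fills with distinct assured
entries and eventually hits ENOMEM, as described above. Backgrounding a few
instances from the shell would approximate the concurrent launches that
Pablo suggests.]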
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-08  1:33 ` Pablo Neira Ayuso
@ 2012-03-08 11:00 ` Kerin Millar
  2012-03-08 11:29 ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread

From: Kerin Millar @ 2012-03-08 11:00 UTC (permalink / raw)
To: netfilter-devel

Hi Pablo,

On 08/03/2012 01:33, Pablo Neira Ayuso wrote:
> On Wed, Mar 07, 2012 at 02:41:02PM +0000, Kerin Millar wrote:

<snip>

>>> That makes sense. So, I rebooted the backup with the latest kernel
>>> build, ran my iptables script, then started conntrackd. I was not able
>>> to destabilize the system through the use of your stress tool. The
>>> sequence of commands used to invoke the ct_stress tool was as follows:
>>>
>>> 1) ct_stress 2097152
>>> 2) ct_stress 2097152
>>> 3) ct_stress 1048576
>>>
>>> There were indeed a lot of ENOMEM errors, and messages warning that
>>> the conntrack table was full with packets being dropped. Nothing
>>> surprising.
>>>
>>> I then tried my test case again. The exact sequence of commands was as
>>> follows:
>>>
>>> 4) conntrackd -n
>>> 5) conntrackd -c
>>> 6) conntrackd -f internal
>>> 7) conntrackd -F
>>> 8) conntrackd -n
>>> 9) conntrackd -c
>>>
>>> It didn't crash after the 5th step (to my amazement) but it did after
>>> the 9th. Here's a netconsole log covering all of the above:
>>>
>>> http://paste.pocoo.org/raw/562136/
>>>
>>> The invalid opcode error was also present in the log that I provided
>>> with my first post in this thread.
>>>
>>> For some reason, I couldn't capture stdout from your ct_events tool
>>> but here's as much as I was able to copy and paste before it stopped
>>> responding completely.

<snip>

>> Just to add that I ran a more extensive stress test on the backup,
>> like so ...
>>
>> for x in $(seq 1 100); do ct_stress 1048576; sleep $(( $RANDOM % 60 )); done
>
> I guess you're running ct_events_reliable as well. Launching several
> ct_stress instances at the same time is also interesting.

The ct_events (conntrack_events.c) program was running throughout and
"NetlinkEventsReliable On" remains defined in conntrackd.conf. I will
try running ct_stress concurrently.

>> It remained stable throughout. I notice that there's an option to
>> dump the cache in XML format. I wonder if it would be useful if I were
>> to provide such a dump, having synced with the master? Assuming that
>> there's a way to inject the contents, perhaps you could reproduce
>> the issue also.
>
> I've been launching the user-space stress tests but I have not been
> able to reproduce the problem that you reported so far.
>
> I'd need to know whether the problem that you reported is easy to
> reproduce in your setup or whether it looks more like a race
> condition. Moreover, I need to know whether there was some traffic
> circulating through the backup or no traffic at all.

Indeed, I can reproduce it so easily and consistently that I've now
lost track of the number of times I've had to hard-reboot this machine.

To recap: I boot the slave with 3.3-rc5 (now featuring the three
patches you asked me to apply). My iptables ruleset is loaded and
conntrackd is started. At that point, the stats look like this:

# conntrackd -s
cache internal:
current active connections:              109
connections created:                     175    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                    66    failed:            0

cache external:
current active connections:             3676
connections created:                    3688    failed:            0
connections updated:                       4    failed:            0
connections destroyed:                    12    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

UDP traffic (active device=eth2):
                4360 Bytes sent               570188 Bytes recv
                  91 Pckts sent                 6681 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

NOTE: if left alone, external cache usage grows steadily - as expected,
given that the master handles new connections to our busy
load-balancer/appserver farm.

Next, rather than wait for the backup to catch up, I explicitly
synchronize with the master.

# conntrackd -n
# conntrackd -s
cache internal:
current active connections:                5
connections created:                     179    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                   174    failed:            0

cache external:
current active connections:          1295640
connections created:                 1351838    failed:            0
connections updated:                      26    failed:            0
connections destroyed:                 56198    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

UDP traffic (active device=eth2):
              139384 Bytes sent            112306776 Bytes recv
                1054 Pckts sent               135073 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs               175170 Lost msgs

The number of state entries in the external cache is now in line with
the master. Finally, I commit.

# conntrackd -c

What happens next is one of two things:

a) a seemingly never-ending series of hard lockups occurs
b) it panics with "not syncing: Fatal exception in interrupt"

Scenario (a) seems to be the more frequent but, either way, it happens
virtually every single time. Across dozens upon dozens of tests, I
think there have been no more than *two* occasions on which
conntrackd -c returned to the prompt and, even then, the system didn't
survive a second invocation of conntrackd -c. In all cases, I have to
hard-reboot the machine. Netconsole logs for both of these outcomes
have been provided in previous posts.

Ergo, it's entirely reproducible here. There are absolutely no signs of
instability with the system except for when state is being committed.
Indeed, I first became aware of this situation upon simulating a
genuine failover scenario. That is, I shut down the master, ucarp
migrated the VIPs, and conntrackd was instructed to commit its state.
Then it died. Since that first time, I have deactivated ucarp entirely
and have easily reproduced the issue by running conntrackd -c manually.

Cheers,

--Kerin
^ permalink raw reply	[flat|nested] 14+ messages in thread
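[For context, here is a bare-bones sketch of a ct_events-style listener,
under the assumption that it uses libnetfilter_conntrack's callback API;
the periodic counter line mirrors the output format quoted above. The two
setsockopt() calls show roughly what reliable event delivery - what
conntrackd's "NetlinkEventsReliable On" requests - amounts to at the
socket level. The file name and counters are illustrative, not Pablo's
actual conntrack_events.c.]

/*
 * ct_events_sketch.c - illustrative sketch only, not the real qa tool.
 * Counts conntrack events and opts in to reliable event delivery.
 *
 * Build:
 *   gcc -lnetfilter_conntrack ct_events_sketch.c -o ct_events_sketch
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <libnetfilter_conntrack/libnetfilter_conntrack.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

static unsigned int events, new, destroy;

static int cb(enum nf_conntrack_msg_type type,
	      struct nf_conntrack *ct, void *data)
{
	if (type == NFCT_T_NEW)
		new++;
	else if (type == NFCT_T_DESTROY)
		destroy++;

	/* An nfct_snprintf() call with NFCT_O_XML here would emit each
	 * event in the XML format mentioned earlier in the thread. */
	if (++events % 10000 == 0)
		printf("%u events received (%u new, %u destroy)\n",
		       events, new, destroy);

	return NFCT_CB_CONTINUE;
}

int main(void)
{
	struct nfct_handle *h;
	int on = 1;

	h = nfct_open(CONNTRACK, NFCT_ALL_CT_GROUPS);
	if (h == NULL) {
		perror("nfct_open");
		exit(EXIT_FAILURE);
	}

	/* Reliable event delivery: have the kernel report overruns as
	 * socket errors instead of silently dropping events. */
	setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_BROADCAST_ERROR,
		   &on, sizeof(on));
	setsockopt(nfct_fd(h), SOL_NETLINK, NETLINK_NO_ENOBUFS,
		   &on, sizeof(on));

	nfct_callback_register(h, NFCT_T_ALL, cb, NULL);
	nfct_catch(h);	/* loops, invoking cb for every event */

	nfct_close(h);
	return EXIT_SUCCESS;
}

[Run against a table being flooded by ct_stress, a listener like this will
mostly count destroy events once entries start expiring, which is
consistent with the destroy-dominated counters pasted earlier in the
thread.]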
* Re: scheduling while atomic followed by oops upon conntrackd -c execution
  2012-03-08  1:33 ` Pablo Neira Ayuso
  2012-03-08 11:00 ` Kerin Millar
@ 2012-03-08 11:29 ` Kerin Millar
  1 sibling, 0 replies; 14+ messages in thread

From: Kerin Millar @ 2012-03-08 11:29 UTC (permalink / raw)
To: netfilter-devel

On 08/03/2012 01:33, Pablo Neira Ayuso wrote:
> Moreover, I need to know whether there was some traffic circulating
> through the backup or no traffic at all.

Sorry, I didn't address this point in my previous email. The backup
does indeed handle some traffic. Both systems run BIND and, as such,
the backup is also our secondary public-facing nameserver. The load
generated by this is not significant, though. At any given moment, the
number of state entries hovers between 100 and 150, and almost all of
them are UDP entries resulting from DNS queries. The rest can be
accounted for by ntpd (about 4 active entries for upstream NTP
servers), ssh (1 connection only), ICMP echo-request handling and a few
other sundries.

In my test case, I am running conntrackd -c under circumstances where
conntrackd on the master is still pushing events across. But I have
also simulated a realistic failover scenario on at least two occasions
by shutting down the master (at which point conntrackd terminates and
is obviously no longer pushing events to the backup). Regardless, the
backup still crashes upon conntrackd -c.

In summary:

* Both nodes are handling DNS traffic (but it's packet forwarding which
  really generates a heavy load)
* conntrackd -c has been run both while the conntrackd daemon is and
  isn't continuing to receive traffic from the other node. It crashes
  either way.

Cheers,

--Kerin
^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads: [~2012-03-08 11:30 UTC | newest]

Thread overview: 14+ messages:
2012-03-02 15:11 scheduling while atomic followed by oops upon conntrackd -c execution Kerin Millar
2012-03-03 13:30 ` Pablo Neira Ayuso
2012-03-03 17:49 ` Kerin Millar
2012-03-03 18:47 ` Kerin Millar
2012-03-04 11:01 ` Pablo Neira Ayuso
2012-03-05 17:19 ` Kerin Millar
2012-03-06 11:14 ` Pablo Neira Ayuso
2012-03-06 16:42 ` Kerin Millar
2012-03-06 17:23 ` Pablo Neira Ayuso
2012-03-06 22:37 ` Kerin Millar
2012-03-07 14:41 ` Kerin Millar
2012-03-08  1:33 ` Pablo Neira Ayuso
2012-03-08 11:00 ` Kerin Millar
2012-03-08 11:29 ` Kerin Millar