ip_conntrack table full problem

* ip_conntrack table full problem
@ 2005-03-14 15:47 Thomas Jarosch
  2005-03-14 17:18 ` Phil Oester
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-14 15:47 UTC
  To: netfilter-devel

Hi,

I'm facing a problem with conntrack on a 2.4.21 kernel.
One machine, which firewalls a web radio, reproducibly
becomes unresponsive every week with
"ip_conntrack: table full, dropping packet."

Raising the /proc/sys/net/ipv4/ip_conntrack_max limit only delayed the
problem. I also installed a cron script that saves the contents
of /proc/net/ip_conntrack to a folder every minute.
When the system died there were around 150 connections in conntrack,
far below the maximum limit.
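
A per-minute snapshot logger like the one described above could be sketched roughly as follows. This is only an illustration, not the script actually used on the machine: the function name snapshot_conntrack and the default destination directory are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: copy one conntrack listing into a timestamped file.
# Arguments: $1 = source file (defaults to /proc/net/ip_conntrack)
#            $2 = destination directory (assumed path, adjust as needed)
snapshot_conntrack() {
    src=${1:-/proc/net/ip_conntrack}
    dest_dir=${2:-/var/log/conntrack-snapshots}
    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/ip_conntrack.$(date +%Y%m%d-%H%M%S)"
    cat "$src" > "$out" || return 1
    # Print the snapshot path and its entry count so the caller can log both.
    echo "$out $(wc -l < "$out" | tr -d ' ')"
}
```

A cron job that runs once a minute would call snapshot_conntrack with no arguments. Snapshots taken within the same second overwrite each other, which does not matter at a one-minute interval.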

Also interesting is that the system never recovers from the "full table 
error", even though the conntrack table in /proc is almost empty. It feels
like the table is filled with "ghost entries" and there's no room
for new connections.

I googled around and found this:
http://cert.uni-stuttgart.de/archive/suse/security/2005/02/msg00174.html

The problem is at least confirmed by Ludwig Nussel from SuSE:
http://cert.uni-stuttgart.de/archive/suse/security/2005/02/msg00197.html

I want to help track the problem down. I can't upgrade to a newer kernel
version because of various other patches, but as the "stock" SuSE 9.2 kernel
has the same problem, I assume it's a more generic problem.

Would it be wise to dump the complete internal conntrack table to syslog when 
the error occurs? Any patches I could try? Any other ideas?

Thanks in advance,
Thomas Jarosch

* Re: ip_conntrack table full problem
  2005-03-14 15:47 ip_conntrack table full problem Thomas Jarosch
@ 2005-03-14 17:18 ` Phil Oester
  2005-03-15 10:13   ` Thomas Jarosch
  2005-03-21 14:13   ` Thomas Jarosch
  0 siblings, 2 replies; 15+ messages in thread
From: Phil Oester @ 2005-03-14 17:18 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

On Mon, Mar 14, 2005 at 04:47:42PM +0100, Thomas Jarosch wrote:
> Hi,
> 
> I'm facing a problem with conntrack on a 2.4.21 kernel.
> One machine, which firewalls a web radio, reproducibly
> becomes unresponsive every week with
> "ip_conntrack: table full, dropping packet."

When this happens, what does output from this look like:

wc -l /proc/net/ip_conntrack ; grep ip_conntrack /proc/slabinfo

Phil

* Re: ip_conntrack table full problem
  2005-03-14 17:18 ` Phil Oester
@ 2005-03-15 10:13   ` Thomas Jarosch
  2005-03-21 14:13   ` Thomas Jarosch
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-15 10:13 UTC
  To: netfilter-devel

> > I'm facing a problem with conntrack on a 2.4.21 kernel.
> > One machine, which firewalls a web radio, reproducibly
> > becomes unresponsive every week with
> > "ip_conntrack: table full, dropping packet."
>
> When this happens, what does output from this look like:
>
> wc -l /proc/net/ip_conntrack ; grep ip_conntrack /proc/slabinfo

Thanks Phil, I've extended my logger. The machine froze yesterday,
so it will take a week until it freezes again. I'll keep you posted.

Thomas

* Re: ip_conntrack table full problem
  2005-03-14 17:18 ` Phil Oester
  2005-03-15 10:13   ` Thomas Jarosch
@ 2005-03-21 14:13   ` Thomas Jarosch
  2005-03-21 16:21     ` Phil Oester
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-21 14:13 UTC
  To: netfilter-devel

> > I'm facing a problem with conntrack on a 2.4.21 kernel.
> > One machine, which firewalls a web radio, reproducibly
> > becomes unresponsive every week with
> > "ip_conntrack: table full, dropping packet."
>
> When this happens, what does output from this look like:
>
> wc -l /proc/net/ip_conntrack ; grep ip_conntrack /proc/slabinfo

It happened again on Sunday night:

wc -l:
35 /proc/net/ip_conntrack

/proc/slabinfo:
ip_conntrack       16263  16272    320 1356 1356    1

top:
  9:51pm  up 7 days,  2:04,  0 users,  load average: 0.11, 0.04, 0.01
120 processes: 119 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  2.0% user,  0.5% system,  0.3% nice,  5.8% idle
Mem:   253116K av,  246808K used,    6308K free,       0K shrd,   74200K buff
Swap:  260992K av,   57580K used,  203412K free                   66160K cached

I'm not familiar with the slab stuff, but it looks "full" to me ;-)
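
For readers equally unfamiliar with the slab output, the line can be broken into named fields. The sketch below assumes the 2.4-era /proc/slabinfo column order (name, active objects, total objects, object size, active slabs, total slabs, pages per slab); the function name conntrack_slab_summary is made up for illustration.

```shell
#!/bin/sh
# Print the fields of the ip_conntrack cache from a 2.4-style slabinfo file.
# Assumed column order (not confirmed by the thread itself):
#   name active_objs total_objs obj_size active_slabs total_slabs pages_per_slab
conntrack_slab_summary() {
    slabinfo=${1:-/proc/slabinfo}
    awk '$1 == "ip_conntrack" {
        printf "active_objs=%d total_objs=%d obj_size=%d slabs=%d pages=%d\n",
               $2, $3, $4, $6, $6 * $7
    }' "$slabinfo"
}
```

Under that reading, the line above means 16263 of 16272 allocated conntrack objects are in use, spread over 1356 single-page slabs. The figures are self-consistent: 16272 / 1356 is 12 objects per slab, and 12 objects of 320 bytes fit in one 4096-byte page (assuming 4 KiB pages).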

Cheers,
Thomas

* Re: ip_conntrack table full problem
  2005-03-21 14:13   ` Thomas Jarosch
@ 2005-03-21 16:21     ` Phil Oester
  2005-03-21 17:03       ` Thomas Jarosch
  0 siblings, 1 reply; 15+ messages in thread
From: Phil Oester @ 2005-03-21 16:21 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

On Mon, Mar 21, 2005 at 03:13:59PM +0100, Thomas Jarosch wrote:
> > > I'm facing a problem with conntrack on a 2.4.21 kernel.
> > > One machine, which firewalls a web radio, reproducibly
> > > becomes unresponsive every week with
> > > "ip_conntrack: table full, dropping packet."
> >
> > When this happens, what does output from this look like:
> >
> > wc -l /proc/net/ip_conntrack ; grep ip_conntrack /proc/slabinfo
> 
> It happened again on Sunday night:
> 
> wc -l:
> 35 /proc/net/ip_conntrack
> 
> /proc/slabinfo:
> ip_conntrack       16263  16272    320 1356 1356    1

Yes, you're leaking conntracks somewhere.  Any possibility of testing
a somewhat newer kernel than 2.4.21?  This may have already been
fixed.

Phil

* Re: ip_conntrack table full problem
  2005-03-21 16:21     ` Phil Oester
@ 2005-03-21 17:03       ` Thomas Jarosch
  2005-03-21 18:08         ` Phil Oester
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-21 17:03 UTC
  To: netfilter-devel

Phil,

> > /proc/slabinfo:
> > ip_conntrack       16263  16272    320 1356 1356    1
>
> Yes, you're leaking conntracks somewhere.  Any possibility of testing
> a somewhat newer kernel than 2.4.21?  This may have already been
> fixed.

Thank you for your response.
Unfortunately I cannot update to a newer kernel soon.

Would it be possible to dump the internal conntrack tables
once the error occurs? Then we would at least know what
is filling the table up. Is there some kind of debug macro
I could add before the printk("conntrack table full") code?

Or a more aggressive solution:
Flush the complete conntrack table once the error occurs.
This would kill all running connections, but the machine
would still be reachable afterwards.

Any other ideas?

I'll try to reproduce the problem in a test environment,
but it will be hard to narrow the cause down.

Thomas

* Re: ip_conntrack table full problem
  2005-03-21 17:03       ` Thomas Jarosch
@ 2005-03-21 18:08         ` Phil Oester
  2005-03-21 18:23           ` Thomas Jarosch
  2005-03-21 18:41         ` Patrick Schaaf
  2005-03-23  2:38         ` Patrick McHardy
  2 siblings, 1 reply; 15+ messages in thread
From: Phil Oester @ 2005-03-21 18:08 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

On Mon, Mar 21, 2005 at 06:03:18PM +0100, Thomas Jarosch wrote:
> > Yes, you're leaking conntracks somewhere.  Any possibility of testing
> > a somewhat newer kernel than 2.4.21?  This may have already been
> > fixed.
> 
> Thank you for your response.
> Unfortunately I cannot update to a newer kernel soon.
> 
> Would it be possible to dump the internal conntrack tables
> once the error occurs? Then we would at least know what
> is filling the table up. Is there some kind of debug macro
> I could add before the printk("conntrack table full") code?

No easy way.  Last week I posted a patch which would have made
this possible by creating a 'cleaned' list, but since you cannot
upgrade kernels, you could not use this anyway.

> Or a more aggressive solution:
> Flush the complete conntrack table once the error occurs.
> This would kill all running connections, but the machine
> would still be reachable afterwards.

Even if conntrack were modular, you would be unable to unload
it (see the thread referenced above).

> Any other ideas?

I'm still studying the root cause and have narrowed it down
somewhat, but no patch yet.

> I'll try to reproduce the problem in a test environment,
> but it will be hard to narrow the cause down.

What's the traffic pattern on this box?  In my testing I've
never seen such high rates of leakage.

Phil

* Re: ip_conntrack table full problem
  2005-03-21 18:08         ` Phil Oester
@ 2005-03-21 18:23           ` Thomas Jarosch
  2005-03-21 21:14             ` Phil Oester
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-21 18:23 UTC
  To: netfilter-devel

> > Would it be possible to dump the internal conntrack tables
> > once the error occurs? Then we would at least know what
> > is filling the table up. Is there some kind of debug macro
> > I could add before the printk("conntrack table full") code?
>
> No easy way.  Last week I posted a patch which would have made
> this possible by creating a 'cleaned' list, but since you cannot
> upgrade kernels, you could not use this anyway.

But I can still patch kernels ;-)

> > Any other ideas?
>
> I'm still studying the root cause and have narrowed it down
> somewhat, but no patch yet.
>
> > I'll try to reproduce the problem in a test environment,
> > but it will be hard to narrow the cause down.
>
> What's the traffic pattern on this box?  In my testing I've
> never seen such high rates of leakage.

IIRC the box makes heavy use of SNAT/DNAT for port forwarding.
I'll try to get a copy of the firewall rules tomorrow and
test it locally here.

Is there an easy way to see if it leaked conntracks?
Should the information in /proc/slabinfo be somewhat proportional
to the number of connections/lines in /proc/net/ip_conntrack?

Thomas

* Re: ip_conntrack table full problem
  2005-03-21 17:03       ` Thomas Jarosch
  2005-03-21 18:08         ` Phil Oester
@ 2005-03-21 18:41         ` Patrick Schaaf
  2005-03-21 21:15           ` Phil Oester
  2005-03-23  2:38         ` Patrick McHardy
  2 siblings, 1 reply; 15+ messages in thread
From: Patrick Schaaf @ 2005-03-21 18:41 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

> Would it be possible to dump the internal conntrack tables
> once the error occurs?

The interface for such dumping is /proc/net/ip_conntrack, where you
only find 35 conntracks in your table-full situation. So I would
strongly doubt that extra "show them all" code would show much
more than that...

Just a random comment - unfortunately I have no idea how to
help more constructively.

best regards
  Patrick

* Re: ip_conntrack table full problem
  2005-03-21 18:23           ` Thomas Jarosch
@ 2005-03-21 21:14             ` Phil Oester
  2005-03-21 22:58               ` Thomas Jarosch
  0 siblings, 1 reply; 15+ messages in thread
From: Phil Oester @ 2005-03-21 21:14 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

On Mon, Mar 21, 2005 at 07:23:48PM +0100, Thomas Jarosch wrote:
> > No easy way.  Last week I posted a patch which would have made
> > this possible by creating a 'cleaned' list, but since you cannot
> > upgrade kernels, you could not use this anyway.
> 
> But I can still patch kernels ;-)

OK, I'll send along a 2.4.21 -> 2.6.11 patch shortly ;-)

> IIRC the box makes heavy use of SNAT/DNAT for port forwarding.
> I'll try to get a copy of the firewall rules tomorrow and
> test it locally here.
> 
> Is there an easy way to see if it leaked conntracks?
> Should the information in /proc/slabinfo be somewhat proportional
> to the number of connections/lines in /proc/net/ip_conntrack?

Yes, the numbers should be in the same ballpark.  Conntracks are being
cleaned from the lists (i.e. /proc/net/ip_conntrack), but never being
destroyed.  In my testing this is caused by a process not freeing
the skb.  What kinds of processes are running on this box?
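
The "same ballpark" comparison can be automated along these lines. This is a rough sketch only: the function name conntrack_leak_check and the default ratio of 4 are arbitrary assumptions, and the second column of the slabinfo line is assumed to be the active object count.

```shell
#!/bin/sh
# Hypothetical check: compare listed conntrack entries with active slab objects.
# $1 = conntrack listing, $2 = slabinfo file, $3 = allowed ratio (assumed default: 4)
# Prints "ok" or "leak-suspected" followed by both counts.
conntrack_leak_check() {
    listing=${1:-/proc/net/ip_conntrack}
    slabinfo=${2:-/proc/slabinfo}
    max_ratio=${3:-4}
    listed=$(wc -l < "$listing" | tr -d ' ')
    active=$(awk '$1 == "ip_conntrack" { print $2 }' "$slabinfo")
    # Return 2 if the slab cache line is missing from the file.
    [ -n "$active" ] || return 2
    # Add 1 to the listed count so an empty table still gives a usable threshold.
    if [ "$active" -gt $(( (listed + 1) * max_ratio )) ]; then
        echo "leak-suspected listed=$listed active=$active"
    else
        echo "ok listed=$listed active=$active"
    fi
}
```

With the numbers reported earlier in this thread (35 listed entries against 16263 active slab objects), this check would report a suspected leak.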

Phil

* Re: ip_conntrack table full problem
  2005-03-21 18:41         ` Patrick Schaaf
@ 2005-03-21 21:15           ` Phil Oester
  0 siblings, 0 replies; 15+ messages in thread
From: Phil Oester @ 2005-03-21 21:15 UTC
  To: Patrick Schaaf; +Cc: Thomas Jarosch, netfilter-devel

On Mon, Mar 21, 2005 at 07:41:06PM +0100, Patrick Schaaf wrote:
> > Would it be possible to dump the internal conntrack tables
> > once the error occurs?
> 
> The interface for such dumping is /proc/net/ip_conntrack, where you
> only find 35 conntracks in your table-full situation. So I would
> strongly doubt that extra "show them all" code would show much
> more than that...

It would show the conntracks which have been removed from ip_conntrack
but not yet destroyed.

Phil

* Re: ip_conntrack table full problem
  2005-03-21 21:14             ` Phil Oester
@ 2005-03-21 22:58               ` Thomas Jarosch
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-21 22:58 UTC
  To: netfilter-devel

> > IIRC the box makes heavy use of SNAT/DNAT for port forwarding.
> > I'll try to get a copy of the firewall rules tomorrow and
> > test it locally here.
> >
> > Is there an easy way to see if it leaked conntracks?
> > Should the information in /proc/slabinfo be somewhat proportional
> > to the number of connections/lines in /proc/net/ip_conntrack?
>
> Yes, the numbers should be in the same ballpark.  Conntracks are being
> cleaned from the lists (i.e. /proc/net/ip_conntrack), but never being
> destroyed.  In my testing this is caused by a process not freeing
> the skb.  What kinds of processes are running on this box?

There are the usual suspects like apache, mailserver etc.,
but I can't think of anything that manipulates skbs directly.
FreeS/WAN is also installed on the box, but I use the same
configuration at home with no trouble ever.

The best thing would be to find a way of reproducing the problem,
which I will start on tomorrow. Just checked another box with the same installation
and 32 days uptime, /proc/slabinfo looks good. IMHO the problem
is related somehow to the heavy use of SNAT/DNAT as it's the main
difference between the freezing box and mine.

Thomas

* Re: ip_conntrack table full problem
  2005-03-21 17:03       ` Thomas Jarosch
  2005-03-21 18:08         ` Phil Oester
  2005-03-21 18:41         ` Patrick Schaaf
@ 2005-03-23  2:38         ` Patrick McHardy
  2005-03-23  9:11           ` Thomas Jarosch
  2005-03-29 20:26           ` Thomas Jarosch
  2 siblings, 2 replies; 15+ messages in thread
From: Patrick McHardy @ 2005-03-23  2:38 UTC
  To: Thomas Jarosch; +Cc: netfilter-devel

Thomas Jarosch wrote:
> Phil,
> 
> 
>>>/proc/slabinfo:
>>>ip_conntrack       16263  16272    320 1356 1356    1
>>
>>Yes, you're leaking conntracks somewhere.  Any possibility of testing
>>a somewhat newer kernel than 2.4.21?  This may have already been
>>fixed.
> 
> 
> Thank you for your response.
> Unfortunately I cannot update to a newer kernel soon.

I suggest trying this patch:

http://linux.bkbits.net:8080/linux-2.4/cset@3f219dbcj1MnJqxiJa99m_AcShdk5A?nav=index.html|src/net/|src/net|src/net/ipv4|src/net/ipv4/netfilter|related/net/ipv4/netfilter/ip_conntrack_core.c

Regards
Patrick

* Re: ip_conntrack table full problem
  2005-03-23  2:38         ` Patrick McHardy
@ 2005-03-23  9:11           ` Thomas Jarosch
  2005-03-29 20:26           ` Thomas Jarosch
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-23  9:11 UTC
  To: netfilter-devel

> >>>/proc/slabinfo:
> >>>ip_conntrack       16263  16272    320 1356 1356    1
> >>
> >>Yes, you're leaking conntracks somewhere.  Any possibility of testing
> >>a somewhat newer kernel than 2.4.21?  This may have already been
> >>fixed.
> >
> > Thank you for your response.
> > Unfortunately I cannot update to a newer kernel soon.
>
> I suggest trying this patch:
>
> http://linux.bkbits.net:8080/linux-2.4/cset@3f219dbcj1MnJqxiJa99m_AcShdk5A?
>nav=index.html|src/net/|src/net|src/net/ipv4|src/net/ipv4/netfilter|related/
>net/ipv4/netfilter/ip_conntrack_core.c

Thanks! I'll install a patched kernel on the box today.
Hopefully the problem will vanish...

Thomas

* Re: ip_conntrack table full problem
  2005-03-23  2:38         ` Patrick McHardy
  2005-03-23  9:11           ` Thomas Jarosch
@ 2005-03-29 20:26           ` Thomas Jarosch
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Jarosch @ 2005-03-29 20:26 UTC
  To: netfilter-devel

Hi Patrick,

> >>Yes, you're leaking conntracks somewhere.  Any possibility of testing
> >>a somewhat newer kernel than 2.4.21?  This may have already been
> >>fixed.
> >
> > Thank you for your response.
> > Unfortunately I cannot update to a newer kernel soon.
>
> I suggest trying this patch:
>
> http://linux.bkbits.net:8080/linux-2.4/cset@3f219dbcj1MnJqxiJa99m_AcShdk5A?
>nav=index.html|src/net/|src/net|src/net/ipv4|src/net/ipv4/netfilter|related/
>net/ipv4/netfilter/ip_conntrack_core.c

That patch solved the problem. Thanks!

I'm curious: how is this bug triggered in real-life usage?

Cheers,
Thomas
