netdev.vger.kernel.org archive mirror
* Conntrack leak (2.6.2rc2)
@ 2004-02-02  8:56 Steve Hill
  2004-02-02  9:22 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Hill @ 2004-02-02  8:56 UTC (permalink / raw)
  To: netdev


I've already posted this to the netfilter-devel list and had no response 
so I'm hoping that some of you might have some insight into the problem:

I'm using the 2.6.2rc2 kernel and have a strange connection tracking 
problem - when using unfragmented packets everything is fine - a new 
connection is made and init_conntrack() is called, and as the session is 
timed out by conntrack, destroy_conntrack() is called.  Absolutely fine.

However, if I start a connection with a fragmented packet (i.e. my MTU 
is 1500 bytes, so "ping -c 1 -s 2500 172.16.0.1" sends a packet consisting 
of 2 fragments), init_conntrack() is called as usual, but when the session 
is timed out destroy_conntrack() never gets called.  This means that the 
memory for the connection is never freed and ip_conntrack_count is never 
decremented.  However, the connection is still removed from the hash 
table.  This means that it leaks memory, and eventually reaches 
ip_conntrack_max and starts dropping new connections.

-- 

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02  8:56 Conntrack leak (2.6.2rc2) Steve Hill
@ 2004-02-02  9:22 ` Jozsef Kadlecsik
  2004-02-02  9:46   ` Steve Hill
  0 siblings, 1 reply; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-02  9:22 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

Hi Steve,

On Mon, 2 Feb 2004, Steve Hill wrote:

> However, if I start a connection with a fragmented packet (i.e. my MTU
> is 1500 bytes, so "ping -c 1 -s 2500 172.16.0.1" sends a packet consisting
> of 2 fragments), init_conntrack() is called as usual, but when the session
> is timed out destroy_conntrack() never gets called.  This means that the
> memory for the connection is never freed and ip_conntrack_count is never
> decremented.  However, the connection is still removed from the hash
> table.  This means that it leaks memory, and eventually reaches
> ip_conntrack_max and starts dropping new connections.

init_conntrack is called only when we have full, non-fragmented
packets: ip_conntrack_in explicitly calls the proper function to gather
the fragments before calling init_conntrack. There is no memory leak
there.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02  9:22 ` Jozsef Kadlecsik
@ 2004-02-02  9:46   ` Steve Hill
  2004-02-02 10:34     ` Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Hill @ 2004-02-02  9:46 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:

> init_conntrack is called only when we have full, non-fragmented
> packets: ip_conntrack_in explicitly calls the proper function to gather
> the fragments before calling init_conntrack. There is no memory leak
> there.

From my observations, init_conntrack() is being called for each packet 
(not fragment, packet), which seems right.  destroy_conntrack() is, 
however, _not_ being called for any packets that are fragmented (i.e. it 
is not being called for the complete packet).  This is leading to the 
memory never being freed, and the conntrack count never being decremented 
even though the connection has been removed from the hash table.

There _is_ a memory leak here - it is observable and completely 
reproducible.  If I make a number of > MTU sized pings from a machine 
connected to one NIC to a machine connected to another NIC (i.e. the packets 
will be fragmented), ip_conntrack_count grows until it reaches 
ip_conntrack_max, at which point it starts dropping new connections.  The 
ip_conntrack memory listed in /proc/slabinfo also grows.  Neither the 
memory nor the connection count ever shrinks again.

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02  9:46   ` Steve Hill
@ 2004-02-02 10:34     ` Jozsef Kadlecsik
  2004-02-02 10:48       ` Steve Hill
  2004-02-04  9:22       ` Harald Welte
  0 siblings, 2 replies; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-02 10:34 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Mon, 2 Feb 2004, Steve Hill wrote:

> > init_conntrack is called only when we have full, non-fragmented
> > packets: ip_conntrack_in explicitly calls the proper function to gather
> > the fragments before calling init_conntrack. There is no memory leak
> > there.
>
> > From my observations, init_conntrack() is being called for each packet
> (not fragment, packet), which seems right.

No, that's not true (and would be bad). Please check the code.

> destroy_conntrack() is, however, _not_ being called for any packets
> that are fragmented

Yes, because fragmented packets do not lead to conntrack entries -
there is nothing to be freed.

> There _is_ a memory leak here - it is observable and completely
> reproducible.  If I make a number of > MTU sized pings from a machine
> connected to one NIC to a machine connected to another NIC (i.e. the packets
> will be fragmented), ip_conntrack_count grows until it reaches
> ip_conntrack_max, at which point it starts dropping new connections.  The
> ip_conntrack memory listed in /proc/slabinfo also grows.  Neither the
> memory nor the connection count ever shrinks again.

I could not reproduce it: test machine with 2.6.1 + patch-2.6.2-rc2,
ip_conntrack_max lowered to 10. From another machine, in a loop, 400
times:

ping -c 1 -s 2500 test-machine

No "ip_conntrack: table full, dropping packet" message on test-machine.
No problem showed up in /proc/slabinfo either.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 10:34     ` Jozsef Kadlecsik
@ 2004-02-02 10:48       ` Steve Hill
  2004-02-02 11:45         ` Jozsef Kadlecsik
                           ` (2 more replies)
  2004-02-04  9:22       ` Harald Welte
  1 sibling, 3 replies; 21+ messages in thread
From: Steve Hill @ 2004-02-02 10:48 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:

> > > From my observations, init_conntrack() is being called for each packet
> > (not fragment, packet), which seems right.
> 
> No, that's not true (and would be bad). Please check the code.

I have added to the top of init_conntrack():
    printk("Init conntrack\n");

Doing:
    ping -n 172.16.0.1 -c 1 -s 2500
through the machine now causes the kernel to output "Init conntrack", 
proving the function is being called.

> Yes, because fragmented packets do not lead to conntrack entries -
> there is nothing to be freed.

If fragmented packets do not lead to conntrack entries, how are their 
connections tracked?  I was under the impression that fragmented packets 
were received by one NIC, defragged, pushed through all the netfilter code 
and then transmitted by another NIC (after being fragmented again if they 
are > MTU size)?

> I could not reproduce it: test machine with 2.6.1 + patch-2.6.2-rc2,
> ip_conntrack_max lowered to 10. From another machine, in a loop, 400
> times:
> 
> ping -c 1 -s 2500 test-machine
> 
> No "ip_conntrack: table full, dropping packet" message on test-machine.
> No problem showed up in /proc/slabinfo either.

Just to confirm, you have your network set up like:

    [ Machine 1 ]----[ Machine 2 ]----[Machine 3]

Machines 1 and 3 are running the 2.4 kernel for me, but that shouldn't be 
important.
Machine 2 is running 2.6.2rc2.
I am making > MTU sized pings from machine 1 to machine 3 and machine 2 is 
showing the leak.

Pinging machine 2 from machine 1 does not show any such problems, I have 
not tried pinging from machine 2 itself.

I'm not sure if it makes any difference, the NICs are eepro100's but I 
have also reproduced the problem on eepro1000's.

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 10:48       ` Steve Hill
@ 2004-02-02 11:45         ` Jozsef Kadlecsik
  2004-02-02 11:58           ` Steve Hill
  2004-02-03  8:48         ` Jozsef Kadlecsik
  2004-02-04  9:20         ` Conntrack leak (2.6.2rc2) Harald Welte
  2 siblings, 1 reply; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-02 11:45 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Mon, 2 Feb 2004, Steve Hill wrote:

> > > > From my observations, init_conntrack() is being called for each packet
> > > (not fragment, packet), which seems right.
> >
> > No, that's not true (and would be bad). Please check the code.
>
> I have added to the top of init_conntrack():
>     printk("Init conntrack\n");
>
> Doing:
>     ping -n 172.16.0.1 -c 1 -s 2500
> through the machine now causes the kernel to output "Init conntrack",
> proving the function is being called.

Yes, once, on the whole packet. Or do you see the message two times, when
issuing the ping command above once?

> > Yes, because fragmented packets do not lead to conntrack entries -
> > there is nothing to be freed.
>
> If fragmented packets do not lead to conntrack entries, how are their
> connections tracked?  I was under the impression that fragmented packets
> were received by one NIC, defragged, pushed through all the netfilter code
> and then transmitted by another NIC (after being fragmented again if they
> are > MTU size)?

You described exactly what happens: fragmented packets are received and
defragged by the stack, and once we have the complete packet, it is handled
by conntrack.

> > I could not reproduce it: test machine with 2.6.1 + patch-2.6.2-rc2,
> > ip_conntrack_max lowered to 10. From another machine, in a loop, 400
> > times:
> >
> > ping -c 1 -s 2500 test-machine
> >
> > No "ip_conntrack: table full, dropping packet" message on test-machine.
> > No problem showed up in /proc/slabinfo either.
>
> Just to confirm, you have your network set up like:
>
>     [ Machine 1 ]----[ Machine 2 ]----[Machine 3]
>
>
> Machines 1 and 3 are running the 2.4 kernel for me, but that shouldn't be
> important.
> Machine 2 is running 2.6.2rc2.

I have only Machine 1 and Machine 2, but that should make no difference.

> I am making > MTU sized pings from machine 1 to machine 3 and machine 2 is
> showing the leak.
>
> Pinging machine 2 from machine 1 does not show any such problems.

That's really, really strange! Both for local and forwarded packets
ip_conntrack_in is called, which first checks the fragments and calls
defragging, when required. If there were a leak when pinging
machine 3, then there should be a leak when pinging machine 2 as well.

I'll set up a UML net to test the forwarded case, but I expect negative
results.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 11:45         ` Jozsef Kadlecsik
@ 2004-02-02 11:58           ` Steve Hill
  2004-02-02 12:47             ` Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Hill @ 2004-02-02 11:58 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:

> Yes, once, on the whole packet. Or do you see the message two times, when
> issuing the ping command above once?

No, only once for the whole packet (sorry, I think I didn't do a good job 
of describing the problem).
init_conntrack() always gets called once for the whole packet (this seems 
right to me).  However, destroy never gets called for the whole packet if 
the packet was fragmented, which seems to be the source of the leak - 
init_conntrack was called and allocated for the whole packet but that 
memory is never freed again if the packet was fragmented.

I've added some debugging code into nf_conntrack_put() and it seems that 
if it's called on a packet that was fragmented, the usage count is > 1 so 
it never gets freed.  I'm not sure whether anything is actually using the 
packet at that point or whether something has just forgotten to 
decrement the usage count - in any case, it never gets called with 
a usage count <= 1.

> >     [ Machine 1 ]----[ Machine 2 ]----[Machine 3]
> >
> >
> > Machines 1 and 3 are running the 2.4 kernel for me, but that shouldn't be
> > important.
> > Machine 2 is running 2.6.2rc2.
> 
> I have only Machine 1 and Machine 2, but that should make no difference.

Pinging from machine 1 to machine 2 didn't cause any problem for me.  

I have just tried pinging from machine 2 to machine 1 (or 3) and this also 
isn't causing a problem.  The leak is only showing itself if the machine 
is routing packets from one network segment to another, not if the machine 
itself is the source or destination.

> I'll set up a UML net to test the forwarded case, but I expect negative
> results.

Thanks - I'm doing my best to debug the problem, but I'm not at all 
familiar with the networking code so I'm having to start at the ground and 
work my way up (which is good since I don't have any preconceptions about 
the way it _should_ work, but bad in that I'm having to learn it all 
from scratch, which takes time).


- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 11:58           ` Steve Hill
@ 2004-02-02 12:47             ` Jozsef Kadlecsik
  2004-02-02 13:36               ` Steve Hill
  0 siblings, 1 reply; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-02 12:47 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Mon, 2 Feb 2004, Steve Hill wrote:

> On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:
>
> > Yes, once, on the whole packet. Or do you see the message two times, when
> > issuing the ping command above once?
>
> No, only once for the whole packet (sorry, I think I didn't do a good job
> of describing the problem).
> init_conntrack() always gets called once for the whole packet (this seems
> right to me).  However, destroy never gets called for the whole packet if
> the packet was fragmented, which seems to be the source of the leak -
> init_conntrack was called and allocated for the whole packet but that
> memory is never freed again if the packet was fragmented.

To be precise, the destroy function is not called whenever a packet leaves
the system: it gets called when conntrack thinks the connection is
completed. That can happen when we explicitly know from the packet that it
finishes the connection (an ICMP reply to an ICMP non-error message, and a
special case for TCP RST), or when the timer of the conntrack entry goes
off.

So the destroy function is called when the system sees the ICMP reply
packet from machine 3 (and there have been as many reply packets as requests
so far) - otherwise it'll simply time out the connection.

Machine 3 answers the ping requests, doesn't it? You ping the same IP
address all the time?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 12:47             ` Jozsef Kadlecsik
@ 2004-02-02 13:36               ` Steve Hill
  2004-02-02 13:46                 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Hill @ 2004-02-02 13:36 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:

> To be precise, the destroy function is not called whenever a packet leaves
> the system: it gets called when conntrack thinks the connection is
> completed. That can happen when we explicitly know from the packet that it
> finishes the connection (an ICMP reply to an ICMP non-error message, and a
> special case for TCP RST), or when the timer of the conntrack entry goes
> off.
> 
> So the destroy function is called when the system sees the ICMP reply
> packet from machine 3 (and there have been as many reply packets as requests
> so far) - otherwise it'll simply time out the connection.

Yes, this makes sense.  The fact that the connection is removed from the 
hash table indicates that conntrack thinks the connection has gone, but 
the destroy function was never called.  (The connection no longer appears 
in /proc/net/ip_conntrack).

I turned on the debugging code and got:

----
ip_conntrack_in: new packet for ce8fae40
Altering reply tuple of ce8fae40 to tuple c0357de4: 1 172.16.0.1:5438 -> 172.17.0.1:0
Altering reply tuple of ce8fae40 to tuple c0357cdc: 1 172.16.0.1:5438 -> 172.17.0.1:0
Confirming conntrack ce8fae40
conntrack_put ce8faebc 4
conntrack_put ce8faebc 3
clean_from_lists(ce8fae40)
remove_expectations(ce8fae40)
conntrack_put ce8faeb4 3
conntrack_put ce8faec0 4
conntrack_put ce8faec0 3
----

(the conntrack_put debugging was added by me to the nf_conntrack_put() 
function - it shows the pointer to nfct and the usage count).

If I send a small packet through, which won't be fragmented, I get:

----
ip_conntrack_in: new packet for ce8fa080
Altering reply tuple of ce8fa080 to tuple c0357de4: 1 172.16.0.1:39486 -> 172.17.0.1:0
Altering reply tuple of ce8fa080 to tuple c0357d20: 1 172.16.0.1:39486 -> 172.17.0.1:0
Confirming conntrack ce8fa080
conntrack_put ce8fa0fc 2
clean_from_lists(ce8fa080)
remove_expectations(ce8fa080)
conntrack_put ce8fa0f4 2
conntrack_put ce8fa100 1
destroy_conntrack(ce8fa080)
destroy_conntrack: returning ct=ce8fa080 to slab
----

As you can see, in both cases everything happens in a similar way, except 
that when dealing with fragmented packets the usage count is > 1 when 
conntrack_put is called.  No matter how long it is left idle, conntrack_put 
is never called again for the packet, so the memory never gets freed.  
However, in both cases the connection is removed from the hash table.

> Machine 3 answers the ping requests, doesn't it? You ping the same IP
> address all the time?

Yes, the machine always responds to the pings and I'm always using the 
same addresses.  My setup is as follows:

[ Machine 1 ]
     | 172.17.0.1/24
     |
     | 172.17.0.254/24
[ Machine 2 ]
     | 172.16.0.254/24
     |
     | 172.16.0.1/24
[ Machine 3 ]

I am consistently testing by making pings from machine 1 to machine 3 - 
machine 3 always responds and there is no other routing in place, so both 
the echo request and the echo reply are being routed through machine 2.

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 13:36               ` Steve Hill
@ 2004-02-02 13:46                 ` Jozsef Kadlecsik
  2004-02-02 14:03                   ` Steve Hill
  2004-02-03  8:14                   ` Andi Kleen
  0 siblings, 2 replies; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-02 13:46 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Mon, 2 Feb 2004, Steve Hill wrote:

> I turned on the debugging code and got:
>
> ----
> ip_conntrack_in: new packet for ce8fae40
> Altering reply tuple of ce8fae40 to tuple c0357de4: 1 172.16.0.1:5438 -> 172.17.0.1:0
> Altering reply tuple of ce8fae40 to tuple c0357cdc: 1 172.16.0.1:5438 -> 172.17.0.1:0
> Confirming conntrack ce8fae40
> conntrack_put ce8faebc 4
> conntrack_put ce8faebc 3
> clean_from_lists(ce8fae40)
> remove_expectations(ce8fae40)
> conntrack_put ce8faeb4 3
> conntrack_put ce8faec0 4
> conntrack_put ce8faec0 3
> ----
>
> (the conntrack_put debugging was added by me to the nf_conntrack_put()
> function - it shows the pointer to nfct and the usage count).
>
> If I send a small packet through, which won't be fragmented, I get:
>
> ----
> ip_conntrack_in: new packet for ce8fa080
> Altering reply tuple of ce8fa080 to tuple c0357de4: 1 172.16.0.1:39486 -> 172.17.0.1:0
> Altering reply tuple of ce8fa080 to tuple c0357d20: 1 172.16.0.1:39486 -> 172.17.0.1:0
> Confirming conntrack ce8fa080
> conntrack_put ce8fa0fc 2
> clean_from_lists(ce8fa080)
> remove_expectations(ce8fa080)
> conntrack_put ce8fa0f4 2
> conntrack_put ce8fa100 1
> destroy_conntrack(ce8fa080)
> destroy_conntrack: returning ct=ce8fa080 to slab
> ----

You convinced me: something is really fishy. I'll fire up debugging and
check.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 13:46                 ` Jozsef Kadlecsik
@ 2004-02-02 14:03                   ` Steve Hill
  2004-02-03  8:14                   ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Steve Hill @ 2004-02-02 14:03 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Mon, 2 Feb 2004, Jozsef Kadlecsik wrote:

> You convinced me: something is really fishy. I'll fire up debugging and
> check.

:)
I'm just investigating the usage count ATM to see if I can work out what 
is (claiming to be) using the data.

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 13:46                 ` Jozsef Kadlecsik
  2004-02-02 14:03                   ` Steve Hill
@ 2004-02-03  8:14                   ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2004-02-03  8:14 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: steve, netdev

On Mon, 2 Feb 2004 14:46:24 +0100 (CET)
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:

> 
> You convinced me: something is really fishy. I'll fire up debugging and
> check.

I also have some problems with conntrack since updating to 2.6.2rc3
from 2.6.1 on x86-64. Some TCP connections for the masqueraded machines 
seem to go extremely slowly. I haven't investigated more closely yet.

/proc/slabinfo doesn't show leaks for the conntrack slab though.

-Andi


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 10:48       ` Steve Hill
  2004-02-02 11:45         ` Jozsef Kadlecsik
@ 2004-02-03  8:48         ` Jozsef Kadlecsik
  2004-02-03 14:35           ` Steve Hill
  2004-02-04  9:20         ` Conntrack leak (2.6.2rc2) Harald Welte
  2 siblings, 1 reply; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-03  8:48 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Mon, 2 Feb 2004, Steve Hill wrote:

> Just to confirm, you have your network set up like:
>
>     [ Machine 1 ]----[ Machine 2 ]----[Machine 3]
>
> Machines 1 and 3 are running the 2.4 kernel for me, but that shouldn't be
> important.
> Machine 2 is running 2.6.2rc2.

I created exactly the same setup (machine 1 and 3 are UMLs) and could not
reproduce the problem. tcpdump shows that machine 1 sends fragmented ICMP
echo requests and machine 3 sends ICMP echo reply back. On machine 2,
ip_conntrack_max is lowered to 10, still there is no problem after
hundreds of pings.

Do you have any extra patch applied on the top of 2.6.2rc2?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: Conntrack leak (2.6.2rc2)
  2004-02-03  8:48         ` Jozsef Kadlecsik
@ 2004-02-03 14:35           ` Steve Hill
  2004-02-03 15:32             ` Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Steve Hill @ 2004-02-03 14:35 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: netdev

On Tue, 3 Feb 2004, Jozsef Kadlecsik wrote:

> I created exactly the same setup (machine 1 and 3 are UMLs) and could not
> reproduce the problem. tcpdump shows that machine 1 sends fragmented ICMP
> echo requests and machine 3 sends ICMP echo reply back. On machine 2,
> ip_conntrack_max is lowered to 10, still there is no problem after
> hundreds of pings.
> 
> Do you have any extra patch applied on the top of 2.6.2rc2?

No extra patches, it's the vanilla 2.6.2rc2 kernel.  I'm running a 
nonmodular kernel and have spent this morning recompiling it with 
different options - the problem is only showing up when CONFIG_IP_NF_NAT 
is turned on, so I'm guessing that you are using a modular kernel and 
since you haven't set up any rules in the nat table, the module isn't 
loaded - try modprobing it and seeing if that helps.


- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...


* Re: Conntrack leak (2.6.2rc2)
  2004-02-03 14:35           ` Steve Hill
@ 2004-02-03 15:32             ` Jozsef Kadlecsik
  2004-02-03 17:43               ` [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)] Jozsef Kadlecsik
  0 siblings, 1 reply; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-03 15:32 UTC (permalink / raw)
  To: Steve Hill; +Cc: netdev

On Tue, 3 Feb 2004, Steve Hill wrote:

> No extra patches, it's the vanilla 2.6.2rc2 kernel.  I'm running a
> nonmodular kernel and have spent this morning recompiling it with
> different options - the problem is only showing up when CONFIG_IP_NF_NAT
> is turned on, so I'm guessing that you are using a modular kernel and
> since you haven't set up any rules in the nat table, the module isn't
> loaded - try modprobing it and seeing if that helps.

I can confirm that with the NAT module loaded, the leak you described
appears. It looks as if the references taken when refragmenting the packet
are not released properly...

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)]
  2004-02-03 15:32             ` Jozsef Kadlecsik
@ 2004-02-03 17:43               ` Jozsef Kadlecsik
  2004-02-03 17:48                 ` David S. Miller
                                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jozsef Kadlecsik @ 2004-02-03 17:43 UTC (permalink / raw)
  To: David Miller, Steve Hill; +Cc: netdev, netfilter-devel

Hi Dave,

Steve Hill reported a conntrack leakage in 2.6.2-rc2 when nat is enabled
and the system forwards fragmented packets. It turned out that an
nf_conntrack_put was missing from ip_copy_metadata:

--- a/net/ipv4/ip_output.c	2004-01-09 08:00:12.000000000 +0100
+++ t/net/ipv4/ip_output.c	2004-02-03 18:15:07.000000000 +0100
@@ -414,6 +414,7 @@
 	to->nfmark = from->nfmark;
 	to->nfcache = from->nfcache;
 	/* Connection association is same as pre-frag packet */
+	nf_conntrack_put(to->nfct);
 	to->nfct = from->nfct;
 	nf_conntrack_get(to->nfct);
 #ifdef CONFIG_BRIDGE_NETFILTER

Please apply the patch.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary


* Re: [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)]
  2004-02-03 17:43               ` [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)] Jozsef Kadlecsik
@ 2004-02-03 17:48                 ` David S. Miller
  2004-02-03 18:27                 ` David S. Miller
  2004-02-04 10:19                 ` Steve Hill
  2 siblings, 0 replies; 21+ messages in thread
From: David S. Miller @ 2004-02-03 17:48 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: steve, netdev, netfilter-devel

On Tue, 3 Feb 2004 18:43:38 +0100 (CET)
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:

> Steve Hill reported a conntrack leakage in 2.6.2-rc2 when nat is enabled
> and the system forwards fragmented packets. It turned out that an
> nf_conntrack_put was missing from ip_copy_metadata:

Yeah, but... look at what you patched.

>  	/* Connection association is same as pre-frag packet */
> +	nf_conntrack_put(to->nfct);
>  	to->nfct = from->nfct;
>  	nf_conntrack_get(to->nfct);

What about that comment?


* Re: [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)]
  2004-02-03 17:43               ` [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)] Jozsef Kadlecsik
  2004-02-03 17:48                 ` David S. Miller
@ 2004-02-03 18:27                 ` David S. Miller
  2004-02-04 10:19                 ` Steve Hill
  2 siblings, 0 replies; 21+ messages in thread
From: David S. Miller @ 2004-02-03 18:27 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: steve, netdev, netfilter-devel

On Tue, 3 Feb 2004 18:43:38 +0100 (CET)
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:

> Steve Hill reported a conntrack leakage in 2.6.2-rc2 when nat is enabled
> and the system forwards fragmented packets. It turned out that an
> nf_conntrack_put was missing from ip_copy_metadata:

Never mind my previous email, it was a total thinko... your patch
is obviously correct and we had this same damn exact problem with
the bridging skbuff nf objects as well. (see changeset 1.1474.41.3)

I'll apply your patch and push to Linus now.  Thanks.


* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 10:48       ` Steve Hill
  2004-02-02 11:45         ` Jozsef Kadlecsik
  2004-02-03  8:48         ` Jozsef Kadlecsik
@ 2004-02-04  9:20         ` Harald Welte
  2 siblings, 0 replies; 21+ messages in thread
From: Harald Welte @ 2004-02-04  9:20 UTC (permalink / raw)
  To: Steve Hill; +Cc: Jozsef Kadlecsik, netdev

On Mon, Feb 02, 2004 at 10:48:08AM +0000, Steve Hill wrote:

> If fragmented packets do not lead to conntrack entries, how are their 
> connections tracked?  I was under the impression that fragmented packets 
> were received by one NIC, defragged, pushed through all the netfilter code 
> and then transmitted by another NIC (after being fragmented again if they 
> are > MTU size)?

Yes, this is indeed the case.  Which does not contradict what
Jozsef said.  They are defragmented before getting passed to conntrack,
and thus look exactly the same as unfragmented packets throughout the
network stack (until NF_IP_POST_ROUTING).
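
For a sense of scale, the number of fragments a "> MTU" ping produces can be
worked out as below. This is a rough sketch: the function names are
hypothetical, and it assumes a 20-byte IPv4 header without options and an
8-byte ICMP echo header.

```c
#include <assert.h>

/* Number of IPv4 fragments needed to carry 'payload' bytes of IP payload
 * over a link with the given MTU. Assumes a 20-byte IP header (no options). */
static int ipv4_fragment_count(int payload, int mtu)
{
    const int ip_hdr = 20;
    int per_frag;

    assert(payload > 0 && mtu >= ip_hdr + 8);

    /* Every fragment except the last must carry a multiple of 8 bytes. */
    per_frag = ((mtu - ip_hdr) / 8) * 8;

    /* Round up: a partial final fragment still counts as one. */
    return (payload + per_frag - 1) / per_frag;
}

/* ICMP echo request: 'data' bytes (the "ping -s" value) plus the 8-byte
 * ICMP header make up the IP payload. */
static int ping_fragment_count(int data, int mtu)
{
    return ipv4_fragment_count(data + 8, mtu);
}
```

With the values from the original report, ping_fragment_count(2500, 1500)
gives 2, in line with the two fragments Steve described. The forwarding
machine reassembles these before conntrack sees the packet and fragments
the packet again on output.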

> Machines 1 and 3 are running the 2.4 kernel for me, but that shouldn't be 
> important.
> Machine 2 is running 2.6.2rc2.
> I am making > MTU sized pings from machine 1 to machine 3 and machine 2 is 
> showing the leak.

Are you running any netfilter / networking related patches?  Anything
else special about the setup?

> - Steve Hill
> Senior Software Developer                        Email: steve@navaho.co.uk

-- 
- Harald Welte <laforge@gnumonks.org>               http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime

* Re: Conntrack leak (2.6.2rc2)
  2004-02-02 10:34     ` Jozsef Kadlecsik
  2004-02-02 10:48       ` Steve Hill
@ 2004-02-04  9:22       ` Harald Welte
  1 sibling, 0 replies; 21+ messages in thread
From: Harald Welte @ 2004-02-04  9:22 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Steve Hill, netdev

On Mon, Feb 02, 2004 at 11:34:22AM +0100, Jozsef Kadlecsik wrote:
> On Mon, 2 Feb 2004, Steve Hill wrote:
> 
> > > init_conntrack is called only when we have full, non-fragmented
> > > packets: ip_conntrack_in explicitly calls the proper function to gather
> > > the fragments before calling init_conntrack. There is no memory leak
> > > there.
> >
> > From my observations, init_conntrack() is being called for each packet
> > (not fragment, packet), which seems right.
> 
> No, that's not true (and would be bad). Please check the code.

To be more precise:

It is called for every NEW packet, after defragmentation happens (i.e.
if ip_conntrack_find_get() returns NULL, meaning there is no entry in
the hash table).
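
The lookup-then-create pattern described here can be sketched with a toy
table. None of this is the kernel's code: toy_find_get, toy_init, toy_track
and toy_demo_init_calls are hypothetical names, and a linear array stands in
for the real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_SIZE 16

struct toy_entry {
    int in_use;
    unsigned key;       /* stands in for the connection tuple */
};

struct toy_table {
    struct toy_entry slots[TOY_TABLE_SIZE];
    int init_calls;     /* how many times an entry had to be created */
};

/* Look up an existing entry; NULL means the connection is not yet known. */
static struct toy_entry *toy_find_get(struct toy_table *t, unsigned key)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
        if (t->slots[i].in_use && t->slots[i].key == key)
            return &t->slots[i];
    return NULL;
}

/* Create a new entry; returns NULL if the table is full. */
static struct toy_entry *toy_init(struct toy_table *t, unsigned key)
{
    for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = 1;
            t->slots[i].key = key;
            t->init_calls++;
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Per-packet entry point: create an entry only when the lookup misses. */
static struct toy_entry *toy_track(struct toy_table *t, unsigned key)
{
    struct toy_entry *e = toy_find_get(t, key);

    if (!e)
        e = toy_init(t, key);
    return e;
}

/* Feed five packets belonging to two connections through the tracker and
 * report how many entries were created. */
static int toy_demo_init_calls(void)
{
    struct toy_table t;
    const unsigned packets[] = { 7, 7, 9, 7, 9 };

    memset(&t, 0, sizeof t);
    for (size_t i = 0; i < sizeof packets / sizeof packets[0]; i++)
        toy_track(&t, packets[i]);
    return t.init_calls;
}
```

Here five packets over two connections create only two entries, which is
the distinction being drawn: creation happens once per new connection, not
once per packet or per fragment.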

-- 
- Harald Welte <laforge@gnumonks.org>               http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime

* Re: [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)]
  2004-02-03 17:43               ` [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)] Jozsef Kadlecsik
  2004-02-03 17:48                 ` David S. Miller
  2004-02-03 18:27                 ` David S. Miller
@ 2004-02-04 10:19                 ` Steve Hill
  2 siblings, 0 replies; 21+ messages in thread
From: Steve Hill @ 2004-02-04 10:19 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: David Miller, netdev, netfilter-devel

On Tue, 3 Feb 2004, Jozsef Kadlecsik wrote:

> Steve Hill reported a conntrack leakage in 2.6.2-rc2 when nat is enabled
> and the system forwards fragmented packets. It turned out that an
> nf_conntrack_put was missing from ip_copy_metadata:

I noticed this fix made it into the 2.6.2 release last night, so I have 
tested with a vanilla 2.6.2 kernel this morning and can confirm it's fixed 
the problem.  Thank you.

- Steve Hill
Senior Software Developer                        Email: steve@navaho.co.uk
Navaho Technologies Ltd.                           Tel: +44-870-7034015

        ... Alcohol and calculus don't mix - Don't drink and derive! ...

end of thread, other threads:[~2004-02-04 10:19 UTC | newest]

Thread overview: 21+ messages
2004-02-02  8:56 Conntrack leak (2.6.2rc2) Steve Hill
2004-02-02  9:22 ` Jozsef Kadlecsik
2004-02-02  9:46   ` Steve Hill
2004-02-02 10:34     ` Jozsef Kadlecsik
2004-02-02 10:48       ` Steve Hill
2004-02-02 11:45         ` Jozsef Kadlecsik
2004-02-02 11:58           ` Steve Hill
2004-02-02 12:47             ` Jozsef Kadlecsik
2004-02-02 13:36               ` Steve Hill
2004-02-02 13:46                 ` Jozsef Kadlecsik
2004-02-02 14:03                   ` Steve Hill
2004-02-03  8:14                   ` Andi Kleen
2004-02-03  8:48         ` Jozsef Kadlecsik
2004-02-03 14:35           ` Steve Hill
2004-02-03 15:32             ` Jozsef Kadlecsik
2004-02-03 17:43               ` [PATCH] fix netfilter refcounting [was Re: Conntrack leak (2.6.2rc2)] Jozsef Kadlecsik
2004-02-03 17:48                 ` David S. Miller
2004-02-03 18:27                 ` David S. Miller
2004-02-04 10:19                 ` Steve Hill
2004-02-04  9:20         ` Conntrack leak (2.6.2rc2) Harald Welte
2004-02-04  9:22       ` Harald Welte
