Slow OOM in netif_RX function

netdev.vger.kernel.org archive mirror

* Slow OOM  in netif_RX function
@ 2008-01-24 17:28 Ivan Dichev
  2008-01-24 18:29 ` Stephen Hemminger
  2008-01-24 19:12 ` Eric Dumazet
  0 siblings, 2 replies; 14+ messages in thread
From: Ivan Dichev @ 2008-01-24 17:28 UTC (permalink / raw)
  To: netdev

Hello,
I have a problem with my Linux router. It has had a slow, persistent OOM
problem for the last few months.
Every working day (I mean days when more traffic is generated) the
router leaks 15-20 MB of memory, and
after 2 weeks a restart is a MUST.
From /proc/slabinfo I saw that size-2048 and size-512 grow
rapidly every day when there is traffic.

--------- /proc/slabinfo --------------------
size-2048          20322  20349   2072    3    2 : tunables   24   12    0 : slabdata   6780   6783      0
size-512           50984  51016    536    7    1 : tunables   32   16    0 : slabdata   7288   7288      0
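
The memory held by the two caches above can be estimated from the slabdata columns. This is a minimal sketch, assuming the slabinfo 2.x column order (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then tunables and slabdata) and 4 KiB pages. The function names are illustrative only, and each wrapped entry is joined onto one line.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

def parse_slabinfo_line(line):
    """Parse one data line of /proc/slabinfo (format version 2.x)."""
    head, _tunables, slabdata = line.split(":")
    name, active_objs, num_objs, objsize, objperslab, pagesperslab = head.split()
    # slabdata section: "slabdata <active_slabs> <num_slabs> <sharedavail>"
    _label, _active_slabs, num_slabs, _sharedavail = slabdata.split()
    return {
        "name": name,
        "active_objs": int(active_objs),
        "num_objs": int(num_objs),
        "objsize": int(objsize),
        "pagesperslab": int(pagesperslab),
        "num_slabs": int(num_slabs),
    }

def cache_bytes(entry):
    """Memory held by the cache's slabs, in bytes."""
    return entry["num_slabs"] * entry["pagesperslab"] * PAGE_SIZE

SAMPLE = [
    "size-2048          20322  20349   2072    3    2 : tunables   24   12    0 : slabdata   6780   6783      0",
    "size-512           50984  51016    536    7    1 : tunables   32   16    0 : slabdata   7288   7288      0",
]

for line in SAMPLE:
    entry = parse_slabinfo_line(line)
    print(f"{entry['name']}: {cache_bytes(entry) / 2**20:.1f} MiB")
```

Under those assumptions the two caches hold about 53 MiB and 28 MiB, roughly 81 MiB together.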


I was wondering who is allocating these memory pools, so I switched
to kernel 2.6.23-rc12 built with the options
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y


Unfortunately, changing the kernel didn't fix the memory leak.
Now /proc/slab_allocators shows that the 3c59x driver is allocating
the 2048- and 512-byte objects
from its RX function.
--------- from /proc/slab_allocators ------------------------------
7612 size-2048: boomerang_rx+0x33b/0x437 [3c59x]
16018 size-512: boomerang_rx+0x165/0x437 [3c59x]

I was thinking that the 3com driver was buggy ... but no!
After a few days I replaced the cards with rtl8139 ones, and now ....
--------- from /proc/slab_allocators ------------------------------
size-2048: 20159 rtl8139_rx+0x155/0x2dc [8139too]
size-1024: 2693 rtl8139_rx+0x155/0x2dc [8139too]
size-512: 50515 rtl8139_rx+0x155/0x2dc [8139too]

the memory leak appears again in the same function (RX).
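
Listings like the 8139too one above can be totalled per calling function. This is a minimal sketch, assuming the "cache: count caller+offset/size [module]" layout shown for rtl8139_rx and using the nominal size in the cache name (size-N) as the per-object size, which ignores debug padding. The helper names are illustrative only.

```python
import re
from collections import defaultdict

# "<cache>: <count> <symbol>+<offset>/<size> [<module>]"
LINE_RE = re.compile(
    r"^(?P<cache>\S+):\s+(?P<count>\d+)\s+(?P<symbol>[^+\s]+)\+\S+"
)

def nominal_size(cache):
    """'size-2048' -> 2048; any other cache name -> 0 (size unknown)."""
    m = re.fullmatch(r"size-(\d+)", cache)
    return int(m.group(1)) if m else 0

def bytes_by_caller(text):
    """Sum count * nominal object size for every allocating function."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            size = nominal_size(m.group("cache"))
            totals[m.group("symbol")] += int(m.group("count")) * size
    return dict(totals)

SAMPLE = """\
size-2048: 20159 rtl8139_rx+0x155/0x2dc [8139too]
size-1024: 2693 rtl8139_rx+0x155/0x2dc [8139too]
size-512: 50515 rtl8139_rx+0x155/0x2dc [8139too]
"""

for caller, total in bytes_by_caller(SAMPLE).items():
    print(f"{caller}: {total / 2**20:.1f} MiB")
```

On a live system the input would be the contents of /proc/slab_allocators, which is only available with CONFIG_DEBUG_SLAB_LEAK enabled.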

I searched the mailing list and found only this similar report:
http://www.spinics.net/lists/kernel/old/2003-q4/msg03071.html


So it surely does not depend on the kernel version or the network
driver (unless both drivers are buggy :)
Any ideas?

Ivan



* Re: Slow OOM  in netif_RX function
  2008-01-24 17:28 Slow OOM in netif_RX function Ivan Dichev
@ 2008-01-24 18:29 ` Stephen Hemminger
  2008-01-24 19:12 ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Stephen Hemminger @ 2008-01-24 18:29 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: netdev

Receive packets are allocated by the driver, and then consumed by
the protocols or sockets.  The problem is in the consumer side, so you need
to go looking to see if lots of data is getting queued to some application
that is never reading.  Alternatively, it could be some form of control packet
that is not properly processed by a protocol.
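
One way to check for data queued to an application that never reads it is to look at the rx_queue column of /proc/net/tcp or /proc/net/udp. This is a minimal sketch, assuming the usual layout in which the fifth whitespace-separated field is "tx_queue:rx_queue" in hex and the tenth is the socket inode. The sample rows are made up for illustration, and the check only covers socket queues, not packets held elsewhere in the stack.

```python
def rx_backlog(proc_net_text, min_bytes=1):
    """Return (local, remote, rx_queue_bytes, inode) for sockets with queued RX data."""
    rows = []
    for line in proc_net_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        _tx_hex, rx_hex = fields[4].split(":")
        rx_bytes = int(rx_hex, 16)
        if rx_bytes >= min_bytes:
            rows.append((fields[1], fields[2], rx_bytes, int(fields[9])))
    # Largest backlog first
    return sorted(rows, key=lambda r: r[2], reverse=True)

# Illustrative sample only, not taken from the router in this thread.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0202 00000000:0000 07 00000000:0001F400 00:00000000 00000000     0        0 4711 2 c1234560
   1: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4712 2 c1234780
"""

for local, remote, queued, inode in rx_backlog(SAMPLE):
    print(f"{local} <- {remote}: {queued} bytes queued (inode {inode})")
```

On the router itself the text would come from reading /proc/net/udp or /proc/net/tcp. The inode can then be matched against the /proc/<pid>/fd links to find the owning process.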

Also look at your firewall and classification rules; it could be a buggy iptables rule.

-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>


* Re: Slow OOM  in netif_RX function
  2008-01-24 17:28 Slow OOM in netif_RX function Ivan Dichev
  2008-01-24 18:29 ` Stephen Hemminger
@ 2008-01-24 19:12 ` Eric Dumazet
  2008-01-24 21:18   ` Ivan H. Dichev
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2008-01-24 19:12 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: netdev

Could you post your iptables rules ("iptables -t nat -nvL ; iptables -nvL") and the
full output of "cat /proc/slabinfo"?




* Slow OOM in netif_RX function
  2008-01-24 19:12 ` Eric Dumazet
@ 2008-01-24 21:18   ` Ivan H. Dichev
  2008-01-24 21:51     ` Francois Romieu
  2008-01-25 13:21     ` Andi Kleen
  0 siblings, 2 replies; 14+ messages in thread
From: Ivan H. Dichev @ 2008-01-24 21:18 UTC (permalink / raw)
  To: netdev

Eric Dumazet writes: 

> Could you post your iptable rules "iptables -t nat -nvL ; iptables -nvL", 
> and full "cat /proc/slabinfo" ? 
> 
> 
 

Sorry, but my firewall is 7000+ lines and I can't paste the rules.
That is also why it's very hard to debug the chains :(

I have a better idea!

What would happen if I put a different LAN card in every slot?
E.g.   to-private -> 3com
       to-inet    -> VIA
       to-dmz     -> rtl8139
Then I could see which RX function is consuming the memory
(boomerang_rx, rtl8139_rx, etc.).

That would make it easier to work out which iptables rules (those bound to the
interface found) have to be watched.
(I am not sure that it will work, though.)

Any other ideas appreciated. 

Ivan Dichev


* Re: Slow OOM in netif_RX function
  2008-01-24 21:18   ` Ivan H. Dichev
@ 2008-01-24 21:51     ` Francois Romieu
  2008-01-25 13:21     ` Andi Kleen
  1 sibling, 0 replies; 14+ messages in thread
From: Francois Romieu @ 2008-01-24 21:51 UTC (permalink / raw)
  To: Ivan H. Dichev; +Cc: netdev

Ivan H. Dichev <idichev@obs.bg> :
[...]
> Any other ideas appreciated. 

Plot the slab values and the counters of the iptables rules against time ?
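
The rule counters for such a plot can be collected from "iptables-save -c", which prefixes every rule with [packets:bytes]. This is a minimal sketch, assuming two snapshots taken some time apart. The function names and sample rules are illustrative only, and identical rules within one table would share a key here.

```python
import re

# A rule line from `iptables-save -c`: "[packets:bytes] -A CHAIN ..."
COUNTER_RE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A\s+.*)$")

def parse_counters(save_output):
    """Map 'table: rule' -> (packets, bytes)."""
    counters = {}
    table = None
    for line in save_output.splitlines():
        line = line.strip()
        if line.startswith("*"):  # table header such as "*filter"
            table = line[1:]
            continue
        m = COUNTER_RE.match(line)
        if m:
            key = f"{table}: {m.group(3)}"
            counters[key] = (int(m.group(1)), int(m.group(2)))
    return counters

def packet_deltas(before, after):
    """Rules sorted by how many packets they matched between two snapshots."""
    deltas = {rule: after[rule][0] - before.get(rule, (0, 0))[0] for rule in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative snapshots only, not the firewall from this thread.
SNAP1 = """\
*filter
:INPUT ACCEPT [0:0]
[100:8000] -A INPUT -i eth0 -j ACCEPT
[5:300] -A FORWARD -m state --state INVALID -j DROP
COMMIT
"""
SNAP2 = SNAP1.replace("[5:300]", "[905:54300]").replace("[100:8000]", "[150:12000]")

for rule, delta in packet_deltas(parse_counters(SNAP1), parse_counters(SNAP2)):
    print(f"{delta:8d}  {rule}")
```

Recording these deltas next to the size-2048 and size-512 object counts from /proc/slabinfo at the same timestamps gives the two series to plot against each other.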

-- 
Ueimor


* Re: Slow OOM in netif_RX function
  2008-01-24 21:18   ` Ivan H. Dichev
  2008-01-24 21:51     ` Francois Romieu
@ 2008-01-25 13:21     ` Andi Kleen
  2008-01-25 14:12       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-01-25 13:21 UTC (permalink / raw)
  To: Ivan H. Dichev; +Cc: netdev

"Ivan H. Dichev" <idichev@obs.bg> writes:
>
> What would happen if I put a different LAN card in every slot?
> E.g.   to-private -> 3com
>        to-inet    -> VIA
>        to-dmz     -> rtl8139
> Then I could see which RX function is consuming the memory
> (boomerang_rx, rtl8139_rx, etc.).

The problem is unlikely to be in the driver (these are both
well tested ones) but more likely your complicated iptables setup somehow
triggers a skb leak.

There are unfortunately no shrink wrapped debug mechanisms in the kernel
for leaks like this (ok you could enable CONFIG_NETFILTER_DEBUG 
and see if it prints something interesting, but that's a long shot).

If you wanted to write a custom debugging patch I would do something like this:

- Add two new integer fields to struct sk_buff: a time stamp and an integer field
- Fill the time stamp with jiffies in alloc_skb and clear the integer field
- In __kfree_skb clear the time stamp
- For all the ipt target modules in net/ipv4/netfilter/*.c that you use, change their
->target functions to put a unique value into the integer field you added.
- Do the same for the pkt_to_tuple functions for all conntrack modules

Then when you observe the leak, take a crash dump using kdump on the router
and then use crash to dump all the slab objects for the skbuff_head_cache.
Then look for any that have an old time stamp and check what value they
have in the integer field. The netfilter function that set that unique value
likely triggered the leak somehow.
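
The last step, picking out old buffers and counting their tags, can be sketched in a few lines once the two fields have been extracted from the dump. This is a minimal sketch, assuming the records are already available as (alloc_jiffies, tag) pairs. The tag names, the record format and the function name are hypothetical, and jiffies wraparound is ignored.

```python
from collections import Counter

def suspect_tags(records, now_jiffies, min_age_jiffies):
    """Count the tags of skbs that are still allocated and older than the threshold.

    records: iterable of (alloc_jiffies, tag) pairs taken from the dump.
    A time stamp of 0 means the skb was freed (cleared in __kfree_skb).
    """
    counts = Counter()
    for alloc_jiffies, tag in records:
        if alloc_jiffies == 0:
            continue  # freed object, not a leak candidate
        if now_jiffies - alloc_jiffies >= min_age_jiffies:
            counts[tag] += 1
    return counts.most_common()

# Hypothetical records: tags stand for the netfilter function that last saw the skb.
RECORDS = [
    (1000, "ipt_REJECT"),
    (1500, "ipt_REJECT"),
    (0, "ipt_ULOG"),          # already freed
    (999000, "xt_TCPMSS"),    # recent, probably still in flight
    (2000, "conntrack_gre"),
]

for tag, count in suspect_tags(RECORDS, now_jiffies=1000000, min_age_jiffies=100000):
    print(f"{count:6d}  {tag}")
```

A tag that dominates the old entries points at the function to inspect first. A roughly even spread would suggest the leak happens after netfilter has finished with the packet.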

-Andi


* Re: Slow OOM in netif_RX function
  2008-01-25 13:21     ` Andi Kleen
@ 2008-01-25 14:12       ` Arnaldo Carvalho de Melo
  2008-02-01 12:51         ` Ivan Dichev
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2008-01-25 14:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ivan H. Dichev, netdev

I wrote some systemtap scripts that do parts of what you suggest, and at
least for the timestamp there was no need to add a new field to struct
sk_buff: I just reuse skb->tstamp, as it is only used when a
packet sniffer is running. The script is below for reference, but it needs some tapsets I
wrote, so I'll publish this git repo on git.kernel.org; perhaps it can
be useful in this case as a starting point. Find another unused field
(hint: I know that there is a hole of at least 4 bytes on 64-bit) and
you're done, no need to rebuild the kernel :)

http://git.kernel.org/?p=linux/kernel/git/acme/nettaps.git

- Arnaldo

#!/usr/bin/stap

global stats_latency
global stats_bufsize

probe new_packet = kernel.function("__alloc_skb").return
{
	skb = $return
}

probe tcp_in = kernel.function("tcp_v4_rcv")
{
	skb = $skb
	sport = skb_tcphdr_sport(skb)
	dport = skb_tcphdr_dport(skb)
	saddr = skb_iphdr_saddr(skb)
	daddr = skb_iphdr_daddr(skb)
	len = $skb->len
	timestamp = skb_tstamp(skb)
}

probe tcp_out = kernel.function("tcp_transmit_skb")
{
	sk = $sk
	len = $skb->len
	timestamp = skb_tstamp($skb)
	sport = inet_sk_sport(sk)
	dport = inet_sk_dport(sk)
	saddr = inet_sk_saddr(sk)
	daddr = inet_sk_daddr(sk)
}

probe ip_in = kernel.function("ip_rcv")
{
	skb = $skb
	saddr = skb_iphdr_saddr(skb)
	daddr = skb_iphdr_daddr(skb)
	protocol = skb_iphdr_protocol(skb)
	len = $skb->len
	timestamp = skb_tstamp(skb)
}

probe ip_out = kernel.function("ip_queue_xmit")
{
	sk = $skb->sk
	len = $skb->len
	protocol = sk_protocol(sk)
	timestamp = skb_tstamp($skb)
	sport = inet_sk_sport(sk)
	dport = inet_sk_dport(sk)
	saddr = inet_sk_saddr(sk)
	daddr = inet_sk_daddr(sk)
}

probe dev_out = kernel.function("dev_hard_start_xmit")
{
	skb = $skb
	sk = $skb->sk
	len = $skb->len
	timestamp = skb_tstamp(skb)
	if (sk) {
		protocol = sk_protocol(sk)
		sport = inet_sk_sport(sk)
		dport = inet_sk_dport(sk)
		saddr = inet_sk_saddr(sk)
		daddr = inet_sk_daddr(sk)
	}
}

probe dev_in = kernel.function("netif_rx"), kernel.function("netif_receive_skb")
{
	skb = $skb
}

probe user_in = kernel.function("skb_copy_datagram_iovec"),
		kernel.function("skb_copy_and_csum_datagram")
{
	skb = $skb
	sk = $skb->sk
	len = $len
	timestamp = skb_tstamp(skb)
	protocol = 0
	if (sk) {
		protocol = sk_protocol(sk)
		dport = inet_sk_dport(sk)
		sport = inet_sk_sport(sk)
		saddr = inet_sk_saddr(sk)
		daddr = inet_sk_daddr(sk)
	}
}

probe new_packet
{
	if (skb)
		skb_take_tstamp(skb)
}

probe dev_in
{
	if (skb)
		skb_take_tstamp(skb)
}

function add_sample(table_id, saddr, sport, daddr, dport, timestamp, len)
{
	/* We're only interested in loopback 
	if (daddr != 0x100007f)
		return 0 */
	delay = gettimeofday_ns() - timestamp
	if (delay < 0) {
		printf("delay < 0! timestamp=%d\n", timestamp)
		return 0
	}

	stats_latency[table_id, saddr, sport, daddr, dport] <<< delay
	stats_bufsize[table_id, saddr, sport, daddr, dport] <<< len
}

probe dev_out
{
	if (protocol == IPPROTO_TCP)
		add_sample("dev_out", saddr, sport, daddr, dport, timestamp, len)
}

probe tcp_out
{
	add_sample("tcp_out", saddr, sport, daddr, dport, timestamp, len)
}

probe ip_in
{
	if (protocol == IPPROTO_TCP) {
		sport = skb_iphdr_tcp_sport(skb)
		dport = skb_iphdr_tcp_dport(skb)

		add_sample("ip_in", daddr, dport, saddr, sport, timestamp, len)
	}
}

probe ip_out
{
	if (protocol == IPPROTO_TCP)
		add_sample("ip_out", daddr, dport, saddr, sport, timestamp, len)
}

probe tcp_in
{
	add_sample("tcp_in", daddr, dport, saddr, sport, timestamp, len)
}

probe user_in
{
	if (protocol == IPPROTO_TCP)
		add_sample("user_in", saddr, sport, daddr, dport, timestamp, len)
}

probe end
{
	printf("%8s %15.15s %5s %15s %5s %23s %18s\n",
	       "", "", "", "", "", "latency(ns)", "buffer size")
	printf("%8.8s %15.15s %5s %15.15s %5s %8s %7s %9s %5s %5s %5s\n",
	       "entry", "local address", "port", "remote address", "port",
	       "avg", "min", "max", "avg", "min", "max")

	foreach ([table_id-, saddr, sport, daddr, dport] in stats_latency) {
		printf("%-8.8s %15.15s %5d %15.15s %5d %8d %7d %9d %5d %5d %5d\n",
		       table_id, inet_sk_ntop(saddr), sport, inet_sk_ntop(daddr), dport,
		       @avg(stats_latency[table_id, saddr, sport, daddr, dport]),
		       @min(stats_latency[table_id, saddr, sport, daddr, dport]),
		       @max(stats_latency[table_id, saddr, sport, daddr, dport]),
		       @avg(stats_bufsize[table_id, saddr, sport, daddr, dport]),
		       @min(stats_bufsize[table_id, saddr, sport, daddr, dport]),
		       @max(stats_bufsize[table_id, saddr, sport, daddr, dport]))
	}
}


* Re: Slow OOM in netif_RX function
  2008-01-25 14:12       ` Arnaldo Carvalho de Melo
@ 2008-02-01 12:51         ` Ivan Dichev
  2008-02-01 13:16           ` Eric Dumazet
  2008-02-01 14:29           ` Andi Kleen
  0 siblings, 2 replies; 14+ messages in thread
From: Ivan Dichev @ 2008-02-01 12:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Andi Kleen, netdev

Thanks to everyone for the ideas.
I am not a kernel guru, so writing a patch is difficult. This is a production
server, and it is quite difficult to debug (only at night).
I removed some of the more exotic iptables modules (recent, ULOG, string), but it had no effect.
Since we eventually reach OOM, most of the memory will by then be filled by the
leak, so we are thinking of taking a dump and analyzing it.
We have looked at the "crash" tool, and we will see what we can do with
it. Meanwhile, do you have any hints or ideas?
Thanks a lot.

Ivan Dichev



* Re: Slow OOM in netif_RX function
  2008-02-01 12:51         ` Ivan Dichev
@ 2008-02-01 13:16           ` Eric Dumazet
  2008-02-01 15:38             ` Ivan Dichev
  2008-02-01 14:29           ` Andi Kleen
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2008-02-01 13:16 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, netdev

I understand you don't want to show us the exact firewall rules you have.

Maybe you could at least post the following information:

# cat /proc/slabinfo
# lsmod






* Re: Slow OOM in netif_RX function
  2008-02-01 12:51         ` Ivan Dichev
  2008-02-01 13:16           ` Eric Dumazet
@ 2008-02-01 14:29           ` Andi Kleen
  1 sibling, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2008-02-01 14:29 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, netdev

On Fri, Feb 01, 2008 at 02:51:40PM +0200, Ivan Dichev wrote:
> Thanks to everyone for the ideas.
> I am not a kernel guru, so writing a patch is difficult. This is a production
> server, and it is quite difficult to debug (only at night).
> I removed some of the more exotic iptables modules (recent, ULOG, string), but it had no effect.
> Since we eventually reach OOM, most of the memory will by then be filled by the
> leak, so we are thinking of taking a dump and analyzing it.

You could perhaps use crash to look for leaked packets and then 
see if you can see a pattern, as in what types of packets they are.

Still I expect without modifying the kernel to add some more
netfilter tracing it will be difficult to diagnose this.

I suppose it would be possible to write a suitable systemtap script
to also trace this without modifying the kernel, although it will be 
probably not easy and more complicated than just changing the C code.

-Andi


* Re: Slow OOM in netif_RX function
  2008-02-01 13:16           ` Eric Dumazet
@ 2008-02-01 15:38             ` Ivan Dichev
  2008-02-04 14:54               ` Ivan Dichev
  0 siblings, 1 reply; 14+ messages in thread
From: Ivan Dichev @ 2008-02-01 15:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Arnaldo Carvalho de Melo, Andi Kleen, netdev


> I understand you dont want to tell us exact firewall rules you have.
>
> Maybe you could post at least following infos :
>
> # cat /proc/slabinfo
> # lsmod
>
I have switched from SLAB to SLUB.

firewall #   cat /sys/slab/kmalloc-2048/alloc_calls
      1 add_sect_attrs+0x57/0x120 age=20565254 pid=1115
      1 __vmalloc_area_node+0x5d/0xf3 age=586962 pid=31548
      6 journal_init_revoke+0xe0/0x241 age=20562046/20563010/20566655 pid=1-1510
      6 journal_init_revoke+0x1c7/0x241 age=20562046/20563010/20566655 pid=1-1510
      6 journal_init_inode+0x7d/0x123 age=20562046/20563010/20566655 pid=1-1510
      1 tty_write+0xe8/0x1bc age=685813 pid=21217
      1 input_allocate_device+0x10/0x6c age=20566814 pid=38
      2 reqsk_queue_alloc+0x58/0xa8 age=5679409/8932742/12186076 pid=1135-2500
      5 alloc_netdev_mq+0x3c/0x9a age=20555675/20561345/20565141 pid=1233-3041
      1 neigh_hash_alloc+0x14/0x2c age=20527064 pid=0
     11 neigh_sysctl_register+0x24/0x1fd age=20555673/20560511/20567588 pid=1-3041
      6 qdisc_alloc+0x1b/0x70 age=585308/585373/585539 pid=31629-31818
     26 qdisc_get_rtab+0x5e/0xa2 age=585498/585519/585535 pid=31630-31664
     11 devinet_sysctl_register+0x21/0xd7 age=20555673/20560511/20567588 pid=1-3041
      1 netlink_proto_init+0x2a/0x123 age=20567604 pid=1
   3106 boomerang_rx+0x30d/0x40d [3c59x] age=1/9966140/20553543 pid=0-22895
      3 bm_init+0x28/0xa3 [ts_bm] age=586918/586918/586918 pid=31548
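
A listing like this one can be ranked by caller with a short script. This is a minimal sketch, assuming each entry starts with "count caller [module] age=min/avg/max" (or a single age value) and that the ages are in jiffies. The function name is illustrative only, and continuation lines that carry just a wrapped pid= field are skipped.

```python
import re

CALL_RE = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<caller>\S+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s+age=(?P<age>[\d/]+)"
)

def parse_alloc_calls(text):
    """Return entries sorted by object count, largest first."""
    entries = []
    for line in text.splitlines():
        m = CALL_RE.match(line)
        if not m:
            continue  # e.g. a wrapped "pid=..." continuation line
        ages = [int(a) for a in m.group("age").split("/")]
        entries.append({
            "count": int(m.group("count")),
            "caller": m.group("caller"),
            "module": m.group("module"),
            "age_min": ages[0],
            "age_max": ages[-1],
        })
    return sorted(entries, key=lambda e: e["count"], reverse=True)

SAMPLE = """\
      1 netlink_proto_init+0x2a/0x123 age=20567604 pid=1
   3106 boomerang_rx+0x30d/0x40d [3c59x] age=1/9966140/20553543 pid=0-22895
      3 bm_init+0x28/0xa3 [ts_bm] age=586918/586918/586918 pid=31548
"""

for e in parse_alloc_calls(SAMPLE):
    print(f"{e['count']:6d}  {e['caller']}  oldest={e['age_max']}")
```

If the ages are indeed jiffies, the oldest boomerang_rx object (20553543) is nearly as old as the boot-time allocations (about 20567604), which would be consistent with some received buffers never being freed.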

firewall #  cat /sys/slab/kmalloc-2048/free_calls
    109 <not-available> age=20515711 pid=0
      1 rcu_do_batch+0x1a/0x71 age=608627 pid=31627
     19 kobject_uevent_env+0x3c5/0x3da age=20578755/20585359/20590686 pid=1-3041
   3055 kfree_skbmem+0x8/0x68 age=3/9973654/20579063 pid=0-31818
      1 pskb_expand_head+0xe3/0x13d age=15946314 pid=695
      1 huft_build+0x498/0x4a2 age=20590653 pid=1
      3 htb_destroy_class+0x5e/0x12e [sch_htb] age=608630/608630/608630 pid=31625
      6 htb_destroy_class+0x69/0x12e [sch_htb] age=608630/608630/608630 pid=31625
fire-sp # lsmod
Module                  Size  Used by
xt_string               2272  3
ipt_ULOG                8004  5
ipt_recent              9360  40
softdog                 5792  0
act_mirred              5060  4
cls_u32                 7972  4
sch_sfq                 5760  56
cls_fw                  5408  54
sch_htb                16192  6
ifb                     5156  0
aes                    28512  0
des                    15456  0
md5                     3936  0
sha256                  9248  0
ipsec                 312176  2
nf_nat_tftp             1792  0
nf_conntrack_tftp       5144  1 nf_nat_tftp
nf_nat_pptp             3712  0
nf_conntrack_pptp       6688  1 nf_nat_pptp
nf_conntrack_proto_gre     4992  1 nf_conntrack_pptp
nf_nat_proto_gre        2724  1 nf_nat_pptp
nf_nat_ftp              3236  0
nf_conntrack_ftp        8680  1 nf_nat_ftp
ipt_tos                 1536  492
xt_mark                 1760  12
xt_DSCP                 2336  13
ipt_NETMAP              1888  6
xt_TCPMSS               4064  4
xt_length               1856  3
ts_bm                   2304  3
xt_mac                  1792  28
ipt_REJECT              4416  74
xt_limit                2496  153
xt_state                2368  2948
iptable_nat             7172  1
nf_nat                 18412  6 nf_nat_tftp,nf_nat_pptp,nf_nat_proto_gre,nf_nat_ftp,ipt_NETMAP,iptable_nat
nf_conntrack_ipv4      16744  2950 iptable_nat
nf_conntrack           57412  11 nf_nat_tftp,nf_conntrack_tftp,nf_nat_pptp,nf_conntrack_pptp,nf_conntrack_proto_gre,nf_nat_ftp,nf_conntrack_ftp,xt_state,iptable_nat,nf_nat,nf_conntrack_ipv4
nfnetlink               5784  3 nf_nat,nf_conntrack_ipv4,nf_conntrack
xt_MARK                 2176  28
iptable_mangle          2720  1
xt_multiport            3232  2325
iptable_filter          2852  1
ip_tables              12824  3 iptable_nat,iptable_mangle,iptable_filter
binfmt_misc            10792  1
dm_mirror              20608  0
dm_mod                 53280  1 dm_mirror
i2c_viapro              8252  0
i2c_core               23376  1 i2c_viapro
3c59x                  41132  0
mii                     5280  1 3c59x
floppy                 53892  0
pata_via               11684  0
libata                110188  1 pata_via
scsi_mod              137996  1 libata
raid1                  20448  7
firewall#
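
For anyone repeating this, the alloc_calls listing above can be ranked by
call site with a few lines of Python. This is only a sketch: the regular
expression and the top_call_sites() helper are illustrative names, not part
of any kernel tooling, and the sample is a handful of lines copied from the
output above.

```python
import re

# One record per call site:
#   <count> <symbol>+<offset>/<size> [<module>] age=... pid=...
# The "[module]" part is absent for core-kernel symbols.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def top_call_sites(text, n=3):
    """Return the n call sites holding the most objects, largest first."""
    sites = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            count, symbol, module = m.groups()
            sites.append((int(count), symbol, module))
    # Sort on the count only, so that a missing module (None) is never compared.
    return sorted(sites, key=lambda s: s[0], reverse=True)[:n]

SAMPLE = """\
      1 tty_write+0xe8/0x1bc age=685813 pid=21217
      6 qdisc_alloc+0x1b/0x70 age=585308/585373/585539 pid=31629-31818
     26 qdisc_get_rtab+0x5e/0xa2 age=585498/585519/585535 pid=31630-31664
   3106 boomerang_rx+0x30d/0x40d [3c59x] age=1/9966140/20553543 pid=0-22895
      3 bm_init+0x28/0xa3 [ts_bm] age=586918/586918/586918 pid=31548
"""

for count, symbol, module in top_call_sites(SAMPLE):
    print(count, symbol, module or "(core kernel)")
```

On the real file, the text would come from reading
/sys/slab/kmalloc-2048/alloc_calls instead of SAMPLE.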


* Re: Slow OOM in netif_RX function
  2008-02-01 15:38             ` Ivan Dichev
@ 2008-02-04 14:54               ` Ivan Dichev
  2008-02-04 15:55                 ` Andi Kleen
  0 siblings, 1 reply; 14+ messages in thread
From: Ivan Dichev @ 2008-02-04 14:54 UTC (permalink / raw)
  Cc: Eric Dumazet, Arnaldo Carvalho de Melo, Andi Kleen, netdev

Hi,

Thanks again for your help...

Here's more debug info (long email!):

We installed crash, compiled a kernel with debug symbols, dumped all the
allocated size-2048 slabs, waited some time, and dumped them again. Then
we compared both dumps: we assumed that slab dumps which had not changed
between the two runs could be considered leaks (see the end of this mail
for the commands we used).

From the 3c59x driver source, boomerang_rx() takes only a "struct
net_device" as its argument, so the idea was to take a dumped slab that
looked like a leak, skip any header offset, and "apply" a struct
net_device to the dumped slab data. That would give us a clue about which
interface the problem happens on, and let us dig deeper to find - say -
the packet's IP header.

Result: none of the "leaked" slabs seems to match struct net_device.
"Valid" slabs are found in the dumps, but not among the leaked ones.

Example:

a valid slab hexdump:

c0 88 56 63 c5 56 41 d8  00 00 00 00 00 00 00 00  |..Vc.VA.........|
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
65 74 68 32 00 00 00 00  00 00 00 00 00 00 00 00  |eth2............|
00 00 00 00 28 6f 37 c0  00 00 00 00 00 00 00 00  |....(o7.........|
00 20 82 d0 0c 00 00 00  08 00 00 00 06 00 00 00  |. ..............|
[...]

There seems to be a 32-byte slab header, then struct net_device, which
begins with a 16-byte interface name (here eth2). If we "apply" a
struct net_device, we can also find the irq, in this case 12, which is
the correct value on our machine.
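
A rough Python sketch of that "apply a struct" step on the bytes shown
above. The offsets are assumptions read off this particular dump (32 bytes
skipped, 16-byte name, a little-endian 32-bit irq found 36 bytes after the
start of the name), not values taken from the kernel headers:

```python
import struct

# First 80 bytes of the "valid" dump quoted above.
DUMP = bytes.fromhex(
    "c0885663c55641d8 0000000000000000"
    "0000000000000000 0000000000000000"
    "6574683200000000 0000000000000000"
    "00000000286f37c0 0000000000000000"
    "002082d00c000000 0800000006000000"
)

HEADER_LEN = 32   # assumption: bytes preceding struct net_device in this dump
NAME_LEN = 16     # interface name field, NUL padded
IRQ_OFFSET = 36   # assumption: where the value 12 sits in this dump

def parse_net_device(dump):
    """Return (interface name, irq) from a dumped object."""
    body = dump[HEADER_LEN:]
    name = body[:NAME_LEN].split(b"\0", 1)[0].decode("ascii")
    (irq,) = struct.unpack_from("<I", body, IRQ_OFFSET)
    return name, irq

print(parse_net_device(DUMP))
```

A dump that does not start with a printable interface name at that offset
is probably not a struct net_device at all.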


Now, with a "leaked" slab:

c0 88 56 63 c5 56 41 d8  5a 5a 5a 5a 5a 5a 5a 5a  |..Vc.VA.ZZZZZZZZ|
5a 5a 5a 5a 5a 5a 5a 5a  5a 5a 00 0a 5e 5d cf 88  |ZZZZZZZZZZ..^]..|
00 11 20 da 91 01 08 00  45 20 05 d8 5e de 00 00  |.. .....E ..^...|
38 32 00 00 d5 5b 97 c2  55 5f 42 32 61 14 cd 3b  |82...[..U_B2a..;|
[...]

Nothing that looks like a struct net_device. All the dumped leaked slabs
look the same until "45 20 05 d8" (the ASCII 'E' on the 3rd line).
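
One way to look at such a dump, sketched in Python under two assumptions:
the first 8 bytes are allocator debug data, and the run of 0x5a bytes is
slab poison covering unused space in front of the payload. Read that way,
the bytes after the poison have the shape of an Ethernet header (two 6-byte
addresses, then a 2-byte type). The helper names are made up for this
example:

```python
# First 64 bytes of the "leaked" dump quoted above.
LEAKED = bytes.fromhex(
    "c0885663c55641d8 5a5a5a5a5a5a5a5a"
    "5a5a5a5a5a5a5a5a 5a5a000a5e5dcf88"
    "001120da91010800 452005d85ede0000"
    "38320000d55b97c2 555f42326114cd3b"
)

POISON = 0x5A    # fill byte seen in the dump
PREFIX_LEN = 8   # assumption: leading allocator debug bytes

def payload_offset(dump):
    """Offset of the first byte after the prefix and the poison run."""
    off = PREFIX_LEN
    while off < len(dump) and dump[off] == POISON:
        off += 1
    return off

def mac(raw):
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet(dump):
    """Return (offset, dst MAC, src MAC, ethertype) of the payload."""
    off = payload_offset(dump)
    dst = dump[off:off + 6]
    src = dump[off + 6:off + 12]
    ethertype = int.from_bytes(dump[off + 12:off + 14], "big")
    return off, mac(dst), mac(src), ethertype

off, dst, src, ethertype = parse_ethernet(LEAKED)
print(off, dst, src, hex(ethertype))
```

An ethertype of 0x0800 means IPv4, which would put the IP header right at
the "45 20 05 d8" bytes.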


It took quite a bit of time to dig that far (for non-kernel experts like
us!), and we're now out of ideas. Is it possible for boomerang_rx() to
deal with something other than a struct net_device? Any ideas? Writing a
patch with the ideas mentioned earlier in this thread is above my level...


Things are also quite weird since we don't seem to have this problem on
two other similar machines (one 100% identical with less traffic, and
another one with the same distro/soft but different hardware).
Also note that all the machines use the out-of-tree openswan ipsec.ko
module, but it doesn't seem to be the problem since the other 2 machines
don't leak, and we didn't find any correlation between plotted IKE
packets / VPN traffic against slab leaks.

Another weird fact is that the leak growth is somewhat correlated with
network traffic - it grows slowly - but there are huge steps (i.e. 1000+
more slabs in a few minutes) that are not tied to any traffic peak; if
needed, I can upload the graphs somewhere.

Some other things that might be useful: when we switched from 2.6.16.x
to 2.6.23.14, we began to have "eth1: Too much work in interrupt, status
8401" messages. Playing with 3c59x driver option "max_interrupt_work"
didn't help.

When doing tests with a kernel using SLUB instead of SLAB plus misc
changes - I think we tried tickless, but I'm not sure - we also got the
following page allocation failures (once):

swapper: page allocation failure. order:1, mode:0x4020
 [<c0136e1a>] __alloc_pages+0x295/0x2a4
 [<c0149a77>] allocate_slab+0x59/0x96
 [<c0149b05>] new_slab+0x32/0x126
 [<c014982a>] alloc_debug_processing+0xcf/0x10c
 [<c0149eee>] __slab_alloc+0x80/0xdb
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c014ada5>] __kmalloc_track_caller+0x44/0x91
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c021ee94>] __alloc_skb+0x46/0xef
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d0886b0d>] boomerang_interrupt+0x11e/0x324 [3c59x]
 [<c011295b>] profile_tick+0x38/0x52
 [<c0131c31>] handle_IRQ_event+0x1a/0x3f
 [<c0132782>] handle_level_irq+0x0/0x85
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c010356e>] do_IRQ+0x7d/0xa3
 [<c010cc7e>] update_stats_wait_end+0xa5/0xc2
 [<c0102547>] common_interrupt+0x23/0x28
 [<c010083c>] default_idle+0x0/0x39
 [<c0100863>] default_idle+0x27/0x39
 [<c01008bc>] cpu_idle+0x44/0x60
 [<c031c7b5>] start_kernel+0x1cd/0x1d1
 [<c031c33f>] unknown_bootoption+0x0/0x139


swapper: page allocation failure. order:1, mode:0x4020
 [<c0136e1a>] __alloc_pages+0x295/0x2a4
 [<c0149a77>] allocate_slab+0x59/0x96
 [<c0149b05>] new_slab+0x32/0x126
 [<c014982a>] alloc_debug_processing+0xcf/0x10c
 [<c0149eee>] __slab_alloc+0x80/0xdb
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c014ada5>] __kmalloc_track_caller+0x44/0x91
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c021ee94>] __alloc_skb+0x46/0xef
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d0886b0d>] boomerang_interrupt+0x11e/0x324 [3c59x]
 [<c0131c31>] handle_IRQ_event+0x1a/0x3f
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c0103579>] do_IRQ+0x88/0xa3
 [<c0102547>] common_interrupt+0x23/0x28
 [<c0131c2d>] handle_IRQ_event+0x16/0x3f
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c0103579>] do_IRQ+0x88/0xa3
 [<c0102547>] common_interrupt+0x23/0x28
 [<c0131c2d>] handle_IRQ_event+0x16/0x3f
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c0103579>] do_IRQ+0x88/0xa3
 [<c0149a77>] allocate_slab+0x59/0x96
 [<c0102547>] common_interrupt+0x23/0x28
 [<c014adb7>] __kmalloc_track_caller+0x56/0x91
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c021ee94>] __alloc_skb+0x46/0xef
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d0886b0d>] boomerang_interrupt+0x11e/0x324 [3c59x]
 [<c011295b>] profile_tick+0x38/0x52
 [<c0131c31>] handle_IRQ_event+0x1a/0x3f
 [<c0132782>] handle_level_irq+0x0/0x85
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c010356e>] do_IRQ+0x7d/0xa3
 [<c010cc7e>] update_stats_wait_end+0xa5/0xc2
 [<c0102547>] common_interrupt+0x23/0x28
 [<c010083c>] default_idle+0x0/0x39
 [<c0100863>] default_idle+0x27/0x39
 [<c01008bc>] cpu_idle+0x44/0x60
 [<c031c7b5>] start_kernel+0x1cd/0x1d1
 [<c031c33f>] unknown_bootoption+0x0/0x139

swapper: page allocation failure. order:1, mode:0x4020
 [<c0136e1a>] __alloc_pages+0x295/0x2a4
 [<c0149a77>] allocate_slab+0x59/0x96
 [<c0149b05>] new_slab+0x32/0x126
 [<c014982a>] alloc_debug_processing+0xcf/0x10c
 [<c0149eee>] __slab_alloc+0x80/0xdb
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c014ada5>] __kmalloc_track_caller+0x44/0x91
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<c021ee94>] __alloc_skb+0x46/0xef
 [<d088731f>] boomerang_rx+0x30d/0x40d [3c59x]
 [<d0886b0d>] boomerang_interrupt+0x11e/0x324 [3c59x]
 [<c011295b>] profile_tick+0x38/0x52
 [<c0131c31>] handle_IRQ_event+0x1a/0x3f
 [<c0132782>] handle_level_irq+0x0/0x85
 [<c01327d2>] handle_level_irq+0x50/0x85
 [<c010356e>] do_IRQ+0x7d/0xa3
 [<c010cc7e>] update_stats_wait_end+0xa5/0xc2
 [<c0102547>] common_interrupt+0x23/0x28
 [<c010083c>] default_idle+0x0/0x39
 [<c0100863>] default_idle+0x27/0x39
 [<c01008bc>] cpu_idle+0x44/0x60
 [<c031c7b5>] start_kernel+0x1cd/0x1d1
 [<c031c33f>] unknown_bootoption+0x0/0x139


(I'm wondering what the unknown_bootoption entry is; our boot options are
"ro root=/dev/md1 nousb panic=10").
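
For reading the "order:1, mode:0x4020" part of those messages, here is a
small Python sketch. The page size and the flag table are assumptions: 4 KiB
pages as on 32-bit x86, and a subset of the gfp bit values as I remember
them from the 2.6.2x include/linux/gfp.h, so check them against your own
tree before relying on the names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

# Assumption: subset of gfp bits from 2.6.2x include/linux/gfp.h.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",
}

def decode_failure(order, mode):
    """Return (bytes requested, known flag names, bits not in the table)."""
    size = PAGE_SIZE << order  # 2**order physically contiguous pages
    names = []
    leftover = mode
    for bit in sorted(GFP_BITS):
        if mode & bit:
            names.append(GFP_BITS[bit])
            leftover &= ~bit
    return size, names, leftover

size, names, leftover = decode_failure(order=1, mode=0x4020)
print(size, names, hex(leftover))
```

If the table is right, __GFP_WAIT is clear in 0x4020, so the allocator was
not allowed to sleep or reclaim, which would fit a request made from
boomerang_interrupt().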


Slab dump commands:

# in crash:
 kmem -S size-2048 > kmem_S

# in another shell:
 awk -f extract_slabs.awk kmem_S > dump_cmds

# in crash:
 source dump_cmds

then redo a dump later and find the same slabs; these should be leaks:

for f in memdump/*; do
        i=${f##*/}
        [ -f "memdump1/$i" ] || continue
        cmp -s "memdump/$i" "memdump1/$i" || continue
        echo "$i"
done > same_slabs
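
The same comparison could be sketched in Python as below. The function name
is made up for this example. Keep in mind that "unchanged between two dumps"
is only a heuristic: a legitimate long-lived allocation will also show up.

```python
import filecmp
from pathlib import Path

def unchanged_dumps(first_dir, second_dir):
    """Names of files present in both directories with identical contents."""
    first, second = Path(first_dir), Path(second_dir)
    same = []
    for path in sorted(first.iterdir()):
        other = second / path.name
        if not (path.is_file() and other.is_file()):
            continue  # object was freed, or is new, between the two dumps
        # shallow=False forces a byte-by-byte comparison, like "cmp -s".
        if filecmp.cmp(path, other, shallow=False):
            same.append(path.name)
    return same
```

Calling unchanged_dumps("memdump", "memdump1") should list the same names
that the shell loop writes to same_slabs.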



extract_slabs.awk:
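/ *\[[a-f0-9]+\] */ {
        # Needs gawk: strtonum() and gensub() are GNU extensions.
        beg_hex = strtonum(gensub(/ *\[([a-f0-9]+)\] */, "0x\\1", "g", $1));
        # 2072 = size of one size-2048 object with slab debugging enabled
        printf("dump memory /home/slab_analysis/memdump/memdump-%x 0x%x 0x%x\n", beg_hex, beg_hex, beg_hex + 2072);
}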


Ivan Dichev


* Re: Slow OOM in netif_RX function
  2008-02-04 14:54               ` Ivan Dichev
@ 2008-02-04 15:55                 ` Andi Kleen
  2008-02-05  9:04                   ` Ivan Mitev
  0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-02-04 15:55 UTC (permalink / raw)
  To: Ivan Dichev; +Cc: Eric Dumazet, Arnaldo Carvalho de Melo, Andi Kleen, netdev

> Nothing that looks like a struct net_device. All the dumped leaked slabs
> look the same until "45 20 05 d8" (the ASCII 'E' on the 3rd line).

45 ... is often the start of an IP header (IPv4, 5*4=20 bytes length)

You could dump them to a file (e.g. using a sial script) and then
look at them with tcpdump or similar to get an idea what kinds 
of packets they are.
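
As a quick first look before reaching for tcpdump, the 20 bytes starting at
"45" in the quoted dump can be unpacked as a fixed IPv4 header. This is a
Python sketch; the field layout is the standard IPv4 one, while the helper
name and the small protocol table are just for this example:

```python
import ipaddress
import struct

# The 20 bytes starting at "45 20 05 d8" in the leaked dump.
IP_HEADER = bytes.fromhex("452005d85ede0000" "38320000d55b97c2" "555f4232")

# A few well-known IP protocol numbers.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 47: "GRE", 50: "ESP", 51: "AH"}

def parse_ipv4(header):
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_len, ident, frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": PROTOCOLS.get(proto, str(proto)),
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }

for key, value in parse_ipv4(IP_HEADER).items():
    print(f"{key:10} {value}")
```

The protocol field is the interesting one here, since it says what kind of
traffic is sitting in the buffers that are never freed.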

-Andi



* Re: Slow OOM in netif_RX function
  2008-02-04 15:55                 ` Andi Kleen
@ 2008-02-05  9:04                   ` Ivan Mitev
  0 siblings, 0 replies; 14+ messages in thread
From: Ivan Mitev @ 2008-02-05  9:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ivan Dichev, Eric Dumazet, Arnaldo Carvalho de Melo, netdev

[(the other) Ivan took a few days holidays, so I'm replacing him for 
this issue.]

Andi, you spotted it: it really was the start of an IP header, and it
turns out that these are ESP packets for a quite complicated VPN tunnel
we have (re-routing packets from one office to another, with some NAT on
top of that). So openswan/ipsec.ko seems to be the problem here, and I will
file a bug report there. Meanwhile we'll try to set up manual keying and
decrypt the encrypted payload to gather more details on the packets.

My apologies, the issue seems to be with an out-of-tree module, but we
really didn't think the problem was there (there's no correlation
between the leak increase and VPN/IKE traffic). But it was interesting
to understand slabs, learn how to set up and use crash, and analyze memory
bits :)

Thanks again to all the people who helped!

Ivan Mitev

Andi Kleen wrote:
>> Nothing that looks like a struct net_device. All the dumped leaked slabs
>> look the same until "45 20 05 d8" (the ASCII 'E' on the 3rd line).
> 
> 45 ... is often the start of an IP header (IPv4, 5*4=20 bytes length)
> 
> You could dump them to a file (e.g. using a sial script) and then
> look at them with tcpdump or similar to get an idea what kinds 
> of packets they are.
> 
> -Andi


Thread overview: 14+ messages
2008-01-24 17:28 Slow OOM in netif_RX function Ivan Dichev
2008-01-24 18:29 ` Stephen Hemminger
2008-01-24 19:12 ` Eric Dumazet
2008-01-24 21:18   ` Ivan H. Dichev
2008-01-24 21:51     ` Francois Romieu
2008-01-25 13:21     ` Andi Kleen
2008-01-25 14:12       ` Arnaldo Carvalho de Melo
2008-02-01 12:51         ` Ivan Dichev
2008-02-01 13:16           ` Eric Dumazet
2008-02-01 15:38             ` Ivan Dichev
2008-02-04 14:54               ` Ivan Dichev
2008-02-04 15:55                 ` Andi Kleen
2008-02-05  9:04                   ` Ivan Mitev
2008-02-01 14:29           ` Andi Kleen
