From: Eric Dumazet <dada1@cosmosbay.com>
To: Jesper Dangaard Brouer <hawk@diku.dk>
Cc: Robert Olsson <Robert.Olsson@data.slu.se>,
	jens.laas@data.slu.se, hans.liss@its.uu.se,
	linux-net@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Kernel panic: Route cache, RCU, possibly FIB trie.
Date: Tue, 21 Mar 2006 16:37:19 +0100
Message-ID: <44201DAF.7090707@cosmosbay.com>
In-Reply-To: <Pine.LNX.4.61.0603211552590.28173@ask.diku.dk>

Jesper Dangaard Brouer wrote:
> 
> On Tue, 21 Mar 2006, Robert Olsson wrote:
> 
>> Jesper Dangaard Brouer writes:
>>
>> > I have tried to track down the problem, and I think I have narrowed it
>> > down a bit.  My theory is that it is related to the route cache
>> > (ip_dst_cache) or FIB, which cannot deallocate route cache slab
>> > elements (maybe RCU related).  (I have seen my route cache increase to
>> > around 520k entries using rtstat, before dying).
>> >
>> > I'm using the FIB trie system/algorithm (CONFIG_IP_FIB_TRIE). I think
>> > that the error might be caused by the "fib_trie" code.  See the syslog
>> > output below.
>>
>> > Syslog#1 (indicating a problem with the fib trie)
>> > --------
>> > Mar 20 18:00:04 hostname kernel: Debug: sleeping function called from invalid context at mm/slab.c:2472
>> > Mar 20 18:00:04 hostname kernel: in_atomic():1, irqs_disabled():0
>> > Mar 20 18:00:04 hostname kernel:  [<c0103d9f>] dump_stack+0x1e/0x22
>> > Mar 20 18:00:04 hostname kernel:  [<c011cbe1>] __might_sleep+0xa6/0xae
>> > Mar 20 18:00:04 hostname kernel:  [<c014f3e9>] __kmalloc+0xd9/0xf3
>> > Mar 20 18:00:04 hostname kernel:  [<c014f5a4>] kzalloc+0x23/0x50
>> > Mar 20 18:00:04 hostname kernel:  [<c030ecd1>] tnode_alloc+0x3c/0x82
>> > Mar 20 18:00:04 hostname kernel:  [<c030edf6>] tnode_new+0x26/0x91
>> > Mar 20 18:00:04 hostname kernel:  [<c030f757>] halve+0x43/0x31d
>> > Mar 20 18:00:04 hostname kernel:  [<c030f090>] resize+0x118/0x27e
>>
>> Hello!
>>
>> Out of memory?
> One of the crashes was caused by out of memory, but all the memory was 
> allocated through slab.  More specifically, to ip_dst_cache.
> 
>> Running BGP with full routing?
> No, running OSPF with around 760 subnets.
> 
>> And large number of flows.
> Yes, very large number of flows.
> 
>> What's your normal number of entries in the route cache?
> On this machine, right now, between 14000 and 60000 entries in the route 
> cache.  On other machines, right now, I have a max of 151560 entries.
> 
>> And how much memory do you have?
> On this machine 1 GB of memory (and 4 others); most of the machines have 2 GB.
> 
> 
>> From your report, the problems seem to be related to flushing, either
>> rt_cache_flush or fib_flush (before there was dev_close()?), so all
>> associated entries should be freed. All the entries are freed via RCU,
>> which due to the deferred delete can give a very high transient memory
>> pressure. If we believe it's a memory problem we can try something out...
> 
> There is definitely high memory pressure on this machine!
> Slab memory usage ranges from 39 MB to 205 MB (at the moment, on the 
> production servers).
> 

Did you try 2.6.16?

It contains changes in kernel/rcupdate.c that keep too many RCU elements from
being queued (see force_quiescent_state()). So when an rt_cache_flush is done,
you have the guarantee that not all entries are pushed into RCU at once.
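
To illustrate the idea, here is a rough user-space sketch. It is not the code
from kernel/rcupdate.c: every toy_* name and the TOY_QHIMARK value are made up
for the example, and the "grace period" here simply drains the queue at once,
whereas the real force_quiescent_state() only nudges the grace period along.
It just shows why capping the backlog bounds the transient memory held by a
flush.

```c
/* Toy user-space model of deferred frees. NOT the kernel's rcupdate.c;
 * all toy_* names and TOY_QHIMARK are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_QHIMARK 4   /* hypothetical high-water mark for queued callbacks */

struct toy_cb {
    struct toy_cb *next;
    void (*func)(struct toy_cb *cb);
};

struct toy_rcu {
    struct toy_cb *head;
    struct toy_cb **tail;
    size_t qlen;        /* callbacks currently waiting */
    size_t peak_qlen;   /* largest backlog seen so far */
    int use_limit;      /* nonzero: drain when the high-water mark is hit */
};

static void toy_rcu_init(struct toy_rcu *r, int use_limit)
{
    r->head = NULL;
    r->tail = &r->head;
    r->qlen = 0;
    r->peak_qlen = 0;
    r->use_limit = use_limit;
}

/* Stand-in for the end of a grace period: run every queued callback. */
static void toy_grace_period(struct toy_rcu *r)
{
    struct toy_cb *cb = r->head;

    r->head = NULL;
    r->tail = &r->head;
    r->qlen = 0;
    while (cb) {
        struct toy_cb *next = cb->next;  /* read before func() frees cb */
        cb->func(cb);
        cb = next;
    }
}

/* Stand-in for call_rcu(): queue the free instead of doing it right away. */
static void toy_call_rcu(struct toy_rcu *r, struct toy_cb *cb,
                         void (*func)(struct toy_cb *cb))
{
    cb->func = func;
    cb->next = NULL;
    *r->tail = cb;
    r->tail = &cb->next;
    r->qlen++;
    if (r->qlen > r->peak_qlen)
        r->peak_qlen = r->qlen;
    /* With the limit on, do not let the backlog grow past the mark. */
    if (r->use_limit && r->qlen >= TOY_QHIMARK)
        toy_grace_period(r);
}

struct toy_route {
    struct toy_cb rcu;  /* first member, so &rt->rcu == (void *)rt */
    int id;
};

static void toy_route_free(struct toy_cb *cb)
{
    free(cb);
}

/* "Flush" n route entries and return the peak number of pending frees. */
static size_t toy_flush(int n, int use_limit)
{
    struct toy_rcu r;

    toy_rcu_init(&r, use_limit);
    for (int i = 0; i < n; i++) {
        struct toy_route *rt = malloc(sizeof(*rt));
        if (!rt)
            break;
        rt->id = i;
        toy_call_rcu(&r, &rt->rcu, toy_route_free);
    }
    toy_grace_period(&r);   /* final drain so nothing leaks */
    assert(r.head == NULL);
    return r.peak_qlen;
}
```

Calling toy_flush(100, 0) lets all 100 entries pile up before anything is
freed, while toy_flush(100, 1) never holds more than TOY_QHIMARK of them.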

Eric
