From: Eric Dumazet <dada1@cosmosbay.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: davem@davemloft.net, netdev@oss.sgi.com, Robert.Olsson@data.slu.se
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
Date: Sat, 02 Apr 2005 11:21:30 +0200 [thread overview]
Message-ID: <424E641A.1020609@cosmosbay.com> (raw)
In-Reply-To: <E1DHdsP-0003Lr-00@gondolin.me.apana.org.au>
Herbert Xu a écrit :
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>>OK this patch includes everything...
>>
>> - Locking abstraction
>> - rt_check_expire() fixes
>> - New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second
>> - New gc_debug sysctl to let sysadmin tune gc
>> - Less memory used by hash table (spinlocks moved to a smaller table)
>> - sizing of spinlocks table depends on NR_CPUS
>> - hash table allocated using alloc_large_system_hash() function
>> - header fix for /proc/net/stat/rt_cache
>
>
> This patch is doing too many things. How about splitting it up?
>
> For instance the spin lock stuff is pretty straightforward and
> should be in its own patch.
>
> The benefits of the GC changes are not obvious to me. rt_check_expire
> is simply meant to kill off old entries. It's not really meant to be
> used to free up entries when the table gets full.
Well, I began my work because of the overflow bug in rt_check_expire()...
Then I realize this function could not work as expected. On a loaded machine, one timer tick is 1 ms.
During this time, number of chains that are scanned is ridiculous.
With the standard timer of 60 second, fact is rt_check_expire() is useless.
>
> rt_garbage_collect on the other hand is designed to free entries
> when it is needed. Eric raised the point that rt_garbage_collect
> is pretty expensive. So what about amortising its cost a bit more?
Yes. rt_garbage_collect() has serious problems. But this function is sooo complex I dont want to touch it and let experts do it if they
want. But then one may think why we have two similar functions that are doing basically the same thing : garbage collection.
One of a production machine rtstat -i 1 output is :
rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti|
out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis|
| | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow|
_search|t_search|
2618087| 28581| 7673| 0| 0| 0| 0| 0| 1800| 1450| 0| 0| 0| 0| 0|
37630| 4783|
2618689| 25444| 4918| 0| 0| 0| 0| 0| 2051| 1699| 0| 0| 0| 0| 0|
27741| 5461|
2619369| 25000| 4567| 0| 0| 0| 0| 0| 1860| 1304| 0| 0| 0| 0| 0|
26606| 4563|
2618396| 24830| 4633| 0| 0| 0| 0| 0| 1959| 1492| 0| 0| 0| 0| 0|
26643| 4930|
Without serious tuning, this machine could not handle this load, or even half of it.
Crashes usually occurs when secret_interval interval is elapsed : rt_cache_flush(0); is called, and the whole machine begins to die.
>
> For instance, we can set a new threshold that's lower than gc_thresh
> and perform GC on the chain being inserted in rt_intern_hash if we're
> above that threshold.
We could also try to perform GC on L1_CACHE_SIZE/sizeof(struct rt_hash_bucket) chains, not only the 'current chain', to fully use the cache
miss.
>
> Cheers,
Thank you
next prev parent reply other threads:[~2005-04-02 9:21 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-15 16:13 [PATCH] no more rwlock_t inside tcp_ehash_bucket Eric Dumazet
2005-03-15 18:32 ` David S. Miller
2005-03-16 10:47 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() Eric Dumazet
2005-03-16 18:05 ` [PATCH] reduce sizeof(struct inet_peer) from 128 to 64 bytes on 64bits architectures Eric Dumazet
2005-03-16 22:10 ` David S. Miller
2005-03-16 22:09 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() David S. Miller
2005-03-17 19:52 ` Eric Dumazet
2005-04-01 6:13 ` David S. Miller
2005-04-01 14:39 ` Eric Dumazet
2005-04-01 15:53 ` Robert Olsson
2005-04-01 16:34 ` Eric Dumazet
2005-04-01 17:26 ` Robert Olsson
2005-04-01 20:28 ` David S. Miller
2005-04-01 21:05 ` Eric Dumazet
2005-04-01 21:08 ` David S. Miller
2005-04-01 21:43 ` Eric Dumazet
2005-04-01 22:34 ` David S. Miller
2005-04-01 23:21 ` Eric Dumazet
2005-04-01 23:54 ` David S. Miller
2005-04-02 8:21 ` Herbert Xu
2005-04-02 9:21 ` Eric Dumazet [this message]
2005-04-02 11:23 ` Get rid of rt_check_expire and rt_garbage_collect Herbert Xu
2005-04-02 13:58 ` Eric Dumazet
2005-04-02 14:03 ` Robert Olsson
2005-04-02 21:05 ` jamal
2005-04-03 7:38 ` Herbert Xu
2005-04-03 7:41 ` Herbert Xu
2005-04-02 13:48 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() Robert Olsson
2005-04-02 14:10 ` Eric Dumazet
2005-04-02 14:46 ` Robert Olsson
2005-04-02 20:47 ` jamal
2005-04-02 19:32 ` Herbert Xu
2005-04-02 19:55 ` David S. Miller
2005-04-03 7:43 ` Herbert Xu
2005-04-03 19:57 ` Robert Olsson
2005-04-03 21:45 ` Herbert Xu
2005-04-04 10:27 ` Robert Olsson
2005-04-04 10:38 ` Herbert Xu
2005-04-04 12:29 ` Robert Olsson
2005-04-03 19:36 ` Robert Olsson
2005-04-03 21:43 ` Herbert Xu
2005-04-04 10:38 ` Robert Olsson
2005-04-04 10:48 ` Herbert Xu
2005-04-04 13:17 ` Robert Olsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=424E641A.1020609@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=Robert.Olsson@data.slu.se \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=netdev@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).