From: Eric Dumazet <dada1@cosmosbay.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: davem@davemloft.net, netdev@oss.sgi.com,
Robert.Olsson@data.slu.se, hadi@cyberus.ca
Subject: Re: Get rid of rt_check_expire and rt_garbage_collect
Date: Sat, 02 Apr 2005 15:58:55 +0200 [thread overview]
Message-ID: <424EA51F.6000300@cosmosbay.com> (raw)
In-Reply-To: <20050402112304.GA11321@gondor.apana.org.au>
Herbert Xu a écrit :
> On Sat, Apr 02, 2005 at 11:21:30AM +0200, Eric Dumazet wrote:
>
>>Well, I began my work because of the overflow bug in rt_check_expire()...
>>Then I realize this function could not work as expected. On a loaded
>>machine, one timer tick is 1 ms.
>>During this time, number of chains that are scanned is ridiculous.
>>With the standard timer of 60 second, fact is rt_check_expire() is useless.
>
>
> I see. What we've got here is a scalability problem with respect
> to the number of hash buckets. As the number of buckets increases,
> the amount of work the timer GC has to perform inreases proportionally.
>
> Since the timer GC parameters are fixed, this will eventually break.
>
> Rather than changing the timer GC so that it runs more often to keep
> up with the large routing cache, we should get out of this by reducing
> the amount of work we have to do.
>
> Imagine an ideal balanced hash table with 2.6 million entries. That
> is, all incoming/outgoing packets belong to flows that are already in
> the hash table. Imagine also that there is no PMTU/link failure taking
> place so all entries are valid forever.
>
> In this state there is absolutely no need to execute the timer GC.
>
> Let's remove one of those assumptions and allow there to be entries
> which need to expire after a set period.
>
> Instead of having the timer GC clean them up, we can move the expire
> check to the place where the entries are used. That is, we make
> ip_route_input/ip_route_output/ipv4_dst_check check whether the
> entry has expired.
>
> On the face of it we're doing more work since every routing cache
> hit will need to check the validity of the dst. However, because
> it's a single subtraction it is actually pretty cheap. There is
> also no additional cache miss compared to doing it in the timer
> GC since we have to read the dst anyway.
>
> Let's go one step further and make the routing cache come to life.
> Now there are new entries coming in and we need to remove old ones
> in order to make room for them.
>
> That task is currently carried out by the timer GC in rt_check_expire
> and on demand by rt_garbage_collect. Either way we have to walk the
> entire routing cache looking for entries to get rid of.
>
> This is quite expensive when the routing cache is large. However,
> there is a better way.
>
> The reason we keep a cap on the routing cache (for a given hash size)
> is so that individual chains do not degenerate into long linked lists.
>
> In other words, we don't really care about how many entries there are
> in the routing cache. But we do care about how long each hash chain
> is.
>
> So instead of walking the entire routing cache to keep the number of
> entries down, what we should do is keep each hash chain as short as
> possible.
>
> Assuming that the hash function is good, this should achieve the
> same end result.
>
> Here is how it can be done: Every time a routing entry is inserted into
> a hash chain, we perform GC on that chain unconditionally.
>
> It might seem that we're doing more work again. However, as before
> because we're traversing the chain anyway, it is very cheap to perform
> the GC operations which mainly involve the checks in rt_may_expire.
>
> OK that's enough thinking and it's time to write some code to see
> whether this is all bullshit :)
>
> Cheers,
Well, it may work if you dont care about memory used.
# grep dst /proc/slabinfo
ip_dst_cache 2825575 2849590 384 10 1 : tunables 54 27 8 : slabdata 284959 284959 0
On this machine, route cache takes 1.1 GB of ram... impressive.
Then if the network load decrease (or completely stop), only a timer driven gc could purge the cache.
So rt_check_expire() is *needed*
You are right saying that gc parameters are fixed, thus gc breaks at high load.
Eric
next prev parent reply other threads:[~2005-04-02 13:58 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-15 16:13 [PATCH] no more rwlock_t inside tcp_ehash_bucket Eric Dumazet
2005-03-15 18:32 ` David S. Miller
2005-03-16 10:47 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() Eric Dumazet
2005-03-16 18:05 ` [PATCH] reduce sizeof(struct inet_peer) from 128 to 64 bytes on 64bits architectures Eric Dumazet
2005-03-16 22:10 ` David S. Miller
2005-03-16 22:09 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() David S. Miller
2005-03-17 19:52 ` Eric Dumazet
2005-04-01 6:13 ` David S. Miller
2005-04-01 14:39 ` Eric Dumazet
2005-04-01 15:53 ` Robert Olsson
2005-04-01 16:34 ` Eric Dumazet
2005-04-01 17:26 ` Robert Olsson
2005-04-01 20:28 ` David S. Miller
2005-04-01 21:05 ` Eric Dumazet
2005-04-01 21:08 ` David S. Miller
2005-04-01 21:43 ` Eric Dumazet
2005-04-01 22:34 ` David S. Miller
2005-04-01 23:21 ` Eric Dumazet
2005-04-01 23:54 ` David S. Miller
2005-04-02 8:21 ` Herbert Xu
2005-04-02 9:21 ` Eric Dumazet
2005-04-02 11:23 ` Get rid of rt_check_expire and rt_garbage_collect Herbert Xu
2005-04-02 13:58 ` Eric Dumazet [this message]
2005-04-02 14:03 ` Robert Olsson
2005-04-02 21:05 ` jamal
2005-04-03 7:38 ` Herbert Xu
2005-04-03 7:41 ` Herbert Xu
2005-04-02 13:48 ` [BUG] overflow in net/ipv4/route.c rt_check_expire() Robert Olsson
2005-04-02 14:10 ` Eric Dumazet
2005-04-02 14:46 ` Robert Olsson
2005-04-02 20:47 ` jamal
2005-04-02 19:32 ` Herbert Xu
2005-04-02 19:55 ` David S. Miller
2005-04-03 7:43 ` Herbert Xu
2005-04-03 19:57 ` Robert Olsson
2005-04-03 21:45 ` Herbert Xu
2005-04-04 10:27 ` Robert Olsson
2005-04-04 10:38 ` Herbert Xu
2005-04-04 12:29 ` Robert Olsson
2005-04-03 19:36 ` Robert Olsson
2005-04-03 21:43 ` Herbert Xu
2005-04-04 10:38 ` Robert Olsson
2005-04-04 10:48 ` Herbert Xu
2005-04-04 13:17 ` Robert Olsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=424EA51F.6000300@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=Robert.Olsson@data.slu.se \
--cc=davem@davemloft.net \
--cc=hadi@cyberus.ca \
--cc=herbert@gondor.apana.org.au \
--cc=netdev@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).