The current IP route cache implementation is not suited to large caches.

We can consume a lot of CPU when the cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is sometimes
asynchronous. min_delay and max_delay can somewhat control this asynchronous
behavior, but the whole thing is a kludge that regularly triggers the
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of RAM, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry, so that
cache invalidation can be performed by changing the table identifier,
without having to scan all entries. No more delayed flushing, no more
stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controlled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found in a
chain when an insert is done. Thus we keep a normal equilibrium.

This patch:
- Renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places:
--- Readers have to ignore invalidated entries.
--- Writers can delete invalidated entries.
- Removes the rt_flush_timer timer
- Removes unused /proc/sys/net/ipv4/{min_delay,max_delay}

We even reduce the size of route.o:

# size net/ipv4/route.o
   text    data     bss     dec     hex filename
  20038    1331     160   21529    5419 net/ipv4/route.o.before
  19991    1203     104   21298    5332 net/ipv4/route.o

The next step will be to audit all rt_cache_flush(0) (aka flushes) users
and see whether they can be converted to "invalidate the cache" users.

Signed-off-by: Eric Dumazet

 Documentation/filesystems/proc.txt |    4
 include/linux/sysctl.h             |    4
 include/net/route.h                |    1
 net/ipv4/route.c                   |  209 +++++++++++----------------
 4 files changed, 92 insertions(+), 126 deletions(-)
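
For illustration only, the generation-identifier scheme described in this changelog can be sketched in userspace C. This is not the code in net/ipv4/route.c: the names struct cache, cache_invalidate(), cache_lookup(), cache_insert() and cache_count() are hypothetical, the table is a plain chained hash, and there is no locking or RCU. It only shows the idea that invalidation bumps one counter, readers skip entries from an older generation, and a writer frees stale entries it finds in the chain it inserts into.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_BUCKETS 16        /* hypothetical table size */

struct cache_entry {
	struct cache_entry *next;
	unsigned int key;
	int value;
	int genid;              /* generation this entry was created in */
};

struct cache {
	atomic_int genid;       /* current table generation */
	struct cache_entry *bucket[CACHE_BUCKETS];
};

static void cache_init(struct cache *c)
{
	atomic_init(&c->genid, 0);
	for (size_t i = 0; i < CACHE_BUCKETS; i++)
		c->bucket[i] = NULL;
}

/* Invalidate every entry in O(1): no scan, just a new generation. */
static void cache_invalidate(struct cache *c)
{
	atomic_fetch_add(&c->genid, 1);
}

/* Reader: entries from an older generation are ignored. */
static bool cache_lookup(struct cache *c, unsigned int key, int *value)
{
	int gen = atomic_load(&c->genid);
	struct cache_entry *e;

	for (e = c->bucket[key % CACHE_BUCKETS]; e; e = e->next) {
		if (e->key == key && e->genid == gen) {
			*value = e->value;
			return true;
		}
	}
	return false;
}

/* Writer: drop stale (or duplicate) entries in the chain, then insert. */
static bool cache_insert(struct cache *c, unsigned int key, int value)
{
	int gen = atomic_load(&c->genid);
	struct cache_entry **pp = &c->bucket[key % CACHE_BUCKETS];
	struct cache_entry *e;

	while ((e = *pp) != NULL) {
		if (e->genid != gen || e->key == key) {
			*pp = e->next;  /* unlink and free in passing */
			free(e);
		} else {
			pp = &e->next;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return false;
	e->key = key;
	e->value = value;
	e->genid = gen;
	e->next = c->bucket[key % CACHE_BUCKETS];
	c->bucket[key % CACHE_BUCKETS] = e;
	return true;
}

/* Count all allocated entries, including invalidated ones. */
static size_t cache_count(const struct cache *c)
{
	size_t n = 0;

	for (size_t i = 0; i < CACHE_BUCKETS; i++)
		for (const struct cache_entry *e = c->bucket[i]; e; e = e->next)
			n++;
	return n;
}
```

In this sketch, a lookup after cache_invalidate() misses immediately even though the old entries are still allocated; they are reclaimed the next time an insert walks their chain. A periodic garbage-collection pass, which the changelog also relies on, is left out.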