From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: shemminger@vyatta.com, andi@firstfloor.org, davej@redhat.com,
netdev@vger.kernel.org, j.w.r.degoede@hhs.nl
Subject: Re: cat /proc/net/tcp takes 0.5 seconds on x86_64
Date: Thu, 28 Aug 2008 08:20:51 +0200
Message-ID: <48B643C3.9040502@cosmosbay.com>
In-Reply-To: <20080827.150955.118944272.davem@davemloft.net>
David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 27 Aug 2008 14:48:00 -0700
>
>> I do wonder if having large hash table actually helps? When TCP hash
>> table gets too big, it means every lookup is a cache miss. Assuming
>> a busy server with 2000 connections and perfect hash. On a 4G mem x86-64
>> we are doing 512K hash entries which is ridiculous. Something like 64K
>> entries is more than enough.
>
> That's true, but it's nearly guaranteed to only be a single cache miss
> at worst (if the hash function is working) compared to potentially
> multiple ones if we sized it too small.
>
> I really see the only way to move forward is to dynamically size the
> thing. And nobody has been strong enough to implement that yet :)
>
You are right. For the TCP hash table, that is probably hard to implement.

But for the route cache it is probably doable, since we added the rt_genid
mechanism in commit 29e75252da20f3ab9e132c68c9aed156b87beae6
([IPV4] route cache: Introduce rt_genid for smooth cache invalidation).

If we add a hash table for each "struct net" (net->ipv4.rt_hash_table),
we could then do something sensible when an admin writes to
/proc/sys/net/ipv4/route/hash_size, or at rt_check_expire() time if the
hash table is found to be full:

1) Instead of using alloc_large_system_hash() at boot time to allocate
   rt_hash_table, use a plain vmalloc(). The initial hash size could be
   small (one page) unless the "rhash_entries=xxx" boot parameter says
   otherwise.

2) If an admin writes a new value to /proc/sys/net/ipv4/route/hash_size:
   - Allocate a new table with vmalloc().
   - Change net->ipv4.rt_genid and net->ipv4.rt_hash_table.
   - The old table now contains only obsolete entries; rt_free() them all.
   - vfree() the old hash table, now empty.

3) In rt_check_expire(), add some metrics to trigger an expansion of the
   hash table in case we find too many entries in it.
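
The three steps above can be sketched in user space as follows. This is
only an illustration of the idea, not kernel code: struct entry, struct
table and the table_*() helpers are hypothetical names, calloc()/free()
stand in for vmalloc()/vfree() and rt_free(), and the "more than two
entries per bucket" threshold is an arbitrary assumption. A kernel
version would additionally have to cope with concurrent lookups, which
this sketch ignores entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a route cache entry. */
struct entry {
    struct entry *next;
    unsigned int key;
    unsigned int genid;     /* generation the entry was created in */
};

/* Hypothetical stand-in for a per-"struct net" hash table. */
struct table {
    struct entry **buckets;
    unsigned int size;      /* number of buckets, a power of two */
    unsigned int genid;     /* current generation, in the spirit of rt_genid */
    unsigned int count;     /* entries inserted in the current generation */
};

static int is_pow2(unsigned int n) { return n && !(n & (n - 1)); }

/* Step 1: start with a small table (the kernel would use vmalloc()). */
int table_init(struct table *t, unsigned int size)
{
    assert(is_pow2(size));
    t->buckets = calloc(size, sizeof(*t->buckets));
    if (!t->buckets)
        return -1;
    t->size = size;
    t->genid = 0;
    t->count = 0;
    return 0;
}

int table_insert(struct table *t, unsigned int key)
{
    unsigned int slot = key & (t->size - 1);
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->key = key;
    e->genid = t->genid;
    e->next = t->buckets[slot];
    t->buckets[slot] = e;
    t->count++;
    return 0;
}

/* Entries from an older generation are treated as invisible. */
struct entry *table_lookup(const struct table *t, unsigned int key)
{
    struct entry *e;

    for (e = t->buckets[key & (t->size - 1)]; e; e = e->next)
        if (e->key == key && e->genid == t->genid)
            return e;
    return NULL;
}

/* Step 2: allocate a new table, switch to it and bump the generation,
 * then free every (now obsolete) entry and the old bucket array. */
int table_resize(struct table *t, unsigned int new_size)
{
    struct entry **old = t->buckets;
    struct entry **fresh;
    unsigned int old_size = t->size, i;

    assert(is_pow2(new_size));
    fresh = calloc(new_size, sizeof(*fresh));
    if (!fresh)
        return -1;              /* keep using the old table */
    t->buckets = fresh;
    t->size = new_size;
    t->genid++;
    t->count = 0;
    for (i = 0; i < old_size; i++) {
        struct entry *e = old[i];
        while (e) {
            struct entry *next = e->next;
            free(e);            /* the kernel would rt_free() here */
            e = next;
        }
    }
    free(old);                  /* the kernel would vfree() here */
    return 0;
}

/* Step 3: periodic check. Returns 1 if the table was expanded, 0 if it
 * was left alone, -1 if the allocation failed. The threshold of two
 * entries per bucket is an assumption made for this sketch. */
int table_check_expand(struct table *t)
{
    if (t->count <= 2 * t->size)
        return 0;
    return table_resize(t, t->size * 2) ? -1 : 1;
}
```

Note that the resize never rehashes: as in step 2, the old entries are
simply dropped and the cache is left to repopulate on demand, which is
acceptable for a cache but would not be for the TCP hash table.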