From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: shemminger@vyatta.com, andi@firstfloor.org, davej@redhat.com,
netdev@vger.kernel.org, j.w.r.degoede@hhs.nl
Subject: Re: cat /proc/net/tcp takes 0.5 seconds on x86_64
Date: Thu, 28 Aug 2008 08:20:51 +0200
Message-ID: <48B643C3.9040502@cosmosbay.com>
In-Reply-To: <20080827.150955.118944272.davem@davemloft.net>
David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 27 Aug 2008 14:48:00 -0700
>
>> I do wonder if having large hash table actually helps? When TCP hash
>> table gets too big, it means every lookup is a cache miss. Assuming
>> a busy server with 2000 connections and perfect hash. On a 4G mem x86-64
>> we are doing 512K hash entries which is ridiculous. Something like 64K
>> entries is more than enough.
>
> That's true, but it's nearly guaranteed to only be a single cache miss
> at worst (if the hash function is working) compared to potentially
> multiple ones if we sized it too small.
>
> I really see the only way to move forward is to dynamically size the
> thing. And nobody has been strong enough to implement that yet :)
>
You are right. For the TCP hash table, that's probably hard to implement.
But for the route cache, it is probably doable, since we added rt_genid
in commit 29e75252da20f3ab9e132c68c9aed156b87beae6
([IPV4] route cache: Introduce rt_genid for smooth cache invalidation).
If we add a hash table for each "struct net" (net->ipv4.rt_hash_table),
we could then do something sensible when an admin writes to
/proc/sys/net/ipv4/route/hash_size, or at rt_check_expire() time if the
hash table is found to be full...
1) Instead of using alloc_large_system_hash() at boot time to allocate
rt_hash_table, use a plain vmalloc().
The initial hash size could be small (one page) unless the
"rhash_entries=xxx" boot parameter says otherwise.
2) If an admin writes a new value to /proc/sys/net/ipv4/route/hash_size:
- Allocate a new table with vmalloc().
- Change net->ipv4.rt_genid and net->ipv4.rt_hash_table.
- The old table now contains only obsolete entries; rt_free() them all.
- vfree() the old hash table, now empty.
3) In rt_check_expire(), add some metrics to trigger an expansion of the
hash table if we find too many entries in it.