From: Eric Dumazet <dada1@cosmosbay.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: akepner@sgi.com, linux@horizon.com, davem@davemloft.net,
netdev@vger.kernel.org, bcrl@linux.intel.com
Subject: Re: Extensible hashing and RCU
Date: Sun, 18 Feb 2007 21:21:30 +0100 [thread overview]
Message-ID: <45D8B54A.70903@cosmosbay.com> (raw)
In-Reply-To: <20070218191009.GA28216@2ka.mipt.ru>
Evgeniy Polyakov a e'crit :
> On Sun, Feb 18, 2007 at 07:46:22PM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
>>> Why anyone do not want to use trie - for socket-like loads it has
>>> exactly constant search/insert/delete time and scales as hell.
>>>
>> Because we want to be *very* fast. You cannot beat hash table.
>>
>> Say you have 1.000.000 tcp connections, with 50.000 incoming packets per
>> second to *random* streams...
>
> What is really good in trie, that you may have upto 2^32 connections
> without _any_ difference in lookup performance of random streams.
So are you speaking of one memory cache miss per lookup ?
If not, you loose.
>
>> With a 2^20 hashtable, a lookup uses one cache line (the hash head pointer)
>> plus one cache line to get the socket (you need it to access its refcounter)
>>
>> Several attempts were done in the past to add RCU to ehash table (last done
>> by Benjamin LaHaise last March). I believe this was delayed a bit, because
>> David would like to be able to resize the hash table...
>
> This is a theory.
Not theory, but actual practice, on a real machine.
# cat /proc/net/sockstat
sockets: used 918944
TCP: inuse 925413 orphan 7401 tw 4906 alloc 926292 mem 304759
UDP: inuse 9
RAW: inuse 0
FRAG: inuse 9 memory 18360
> Practice includes cost for hashing, locking, and list traversal
> (each pointer is in own cache line btw, which must be fetched) and plus
> the same for time wait sockets (if we are unlucky).
>
> No need to talk about price of cache miss when there might be more
> serious problems - for example length of the linked list to traverse each
> time new packet is received.
>
> For example lookup time in trie with 1.6 millions random 3-dimensional
> 32bit (saddr/daddr/ports) entries is about 1 microsecond on amd athlon64
> 3500 cpu (test was ran in userspace emulator though).
1 microsecond ? Are you kidding ? We want no more than 50 ns.
You can check on this dual cpu machine, tcp_v4_rcv() uses 2.29 % of cpu.
CPU: AMD64 processors, speed 1992.67 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples % symbol name
2009510 4.6863 memcpy_c
1668842 3.8918 tg3_start_xmit_dma_bug
1485844 3.4651 tg3_poll
1293558 3.0167 kmem_cache_free
1232862 2.8751 kfree
1131012 2.6376 free_block
1000671 2.3336 ip_route_input
982655 2.2916 tcp_v4_rcv
955554 2.2284 __alloc_skb
863753 2.0143 tcp_ack
863222 2.0131 tcp_recvmsg
834680 1.9465 fget_light
801445 1.8690 lock_sock_nested
793699 1.8510 tcp_sendmsg
764689 1.7833 copy_user_generic_string
743515 1.7339 ip_queue_xmit
712314 1.6612 sock_wfree
650486 1.5170 tcp_rcv_established
next prev parent reply other threads:[~2007-02-18 20:27 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-02-04 7:41 Extensible hashing and RCU linux
2007-02-05 18:02 ` akepner
2007-02-17 13:13 ` Evgeniy Polyakov
2007-02-18 18:46 ` Eric Dumazet
2007-02-18 19:10 ` Evgeniy Polyakov
2007-02-18 20:21 ` Eric Dumazet [this message]
2007-02-18 21:23 ` Michael K. Edwards
2007-02-18 22:04 ` Michael K. Edwards
2007-02-19 12:04 ` Andi Kleen
2007-02-19 19:18 ` Michael K. Edwards
2007-02-19 11:41 ` Evgeniy Polyakov
2007-02-19 13:38 ` Eric Dumazet
2007-02-19 13:56 ` Evgeniy Polyakov
2007-02-19 14:14 ` Eric Dumazet
2007-02-19 14:25 ` Evgeniy Polyakov
2007-02-19 15:14 ` Eric Dumazet
2007-02-19 18:13 ` Eric Dumazet
2007-02-19 18:26 ` Benjamin LaHaise
2007-02-19 18:38 ` Benjamin LaHaise
2007-02-20 9:25 ` Evgeniy Polyakov
2007-02-20 9:57 ` David Miller
2007-02-20 10:22 ` Evgeniy Polyakov
2007-02-20 10:04 ` Eric Dumazet
2007-02-20 10:12 ` David Miller
2007-02-20 10:30 ` Evgeniy Polyakov
2007-02-20 11:10 ` Eric Dumazet
2007-02-20 11:23 ` Evgeniy Polyakov
2007-02-20 11:30 ` Eric Dumazet
2007-02-20 11:41 ` Evgeniy Polyakov
2007-02-20 10:49 ` Eric Dumazet
2007-02-20 15:07 ` Michael K. Edwards
2007-02-20 15:11 ` Evgeniy Polyakov
2007-02-20 15:49 ` Michael K. Edwards
2007-02-20 15:59 ` Evgeniy Polyakov
2007-02-20 16:08 ` Eric Dumazet
2007-02-20 16:20 ` Evgeniy Polyakov
2007-02-20 16:38 ` Eric Dumazet
2007-02-20 16:59 ` Evgeniy Polyakov
2007-02-20 17:05 ` Evgeniy Polyakov
2007-02-20 17:53 ` Eric Dumazet
2007-02-20 18:00 ` Evgeniy Polyakov
2007-02-20 18:55 ` Eric Dumazet
2007-02-20 19:06 ` Evgeniy Polyakov
2007-02-20 19:17 ` Eric Dumazet
2007-02-20 19:36 ` Evgeniy Polyakov
2007-02-20 19:44 ` Michael K. Edwards
2007-02-20 17:20 ` Eric Dumazet
2007-02-20 17:55 ` Evgeniy Polyakov
2007-02-20 18:12 ` Evgeniy Polyakov
2007-02-20 19:13 ` Michael K. Edwards
2007-02-20 19:44 ` Evgeniy Polyakov
2007-02-20 20:03 ` Michael K. Edwards
2007-02-20 20:09 ` Michael K. Edwards
2007-02-21 8:56 ` Evgeniy Polyakov
2007-02-21 9:34 ` David Miller
2007-02-21 9:51 ` Evgeniy Polyakov
2007-02-21 10:03 ` David Miller
2007-02-21 8:54 ` Evgeniy Polyakov
2007-02-21 9:15 ` Eric Dumazet
2007-02-21 9:27 ` Evgeniy Polyakov
2007-02-21 9:38 ` Eric Dumazet
2007-02-21 9:57 ` Evgeniy Polyakov
2007-02-21 21:15 ` Michael K. Edwards
2007-02-22 9:06 ` David Miller
2007-02-22 11:00 ` Michael K. Edwards
2007-02-22 11:07 ` David Miller
2007-02-22 19:24 ` Stephen Hemminger
2007-02-20 16:04 ` Eric Dumazet
2007-02-22 23:49 ` linux
2007-02-23 2:31 ` Michael K. Edwards
2007-02-20 10:44 ` Evgeniy Polyakov
2007-02-20 11:09 ` Eric Dumazet
2007-02-20 11:29 ` Evgeniy Polyakov
2007-02-20 11:34 ` Eric Dumazet
2007-02-20 11:45 ` Evgeniy Polyakov
2007-02-21 12:41 ` Andi Kleen
2007-02-21 13:19 ` Eric Dumazet
2007-02-21 13:37 ` David Miller
2007-02-21 23:13 ` Robert Olsson
2007-02-22 6:06 ` Eric Dumazet
2007-02-22 11:41 ` Andi Kleen
2007-02-22 11:44 ` David Miller
2007-02-20 12:11 ` Evgeniy Polyakov
2007-02-19 22:10 ` Andi Kleen
2007-02-19 12:02 ` Andi Kleen
2007-02-19 12:35 ` Robert Olsson
2007-02-19 14:04 ` Evgeniy Polyakov
2007-03-02 8:52 ` Evgeniy Polyakov
2007-03-02 9:56 ` Eric Dumazet
2007-03-02 10:28 ` Evgeniy Polyakov
2007-03-02 20:45 ` Michael K. Edwards
2007-03-03 10:46 ` Evgeniy Polyakov
2007-03-04 10:02 ` Michael K. Edwards
2007-03-04 20:36 ` David Miller
2007-03-05 7:12 ` Michael K. Edwards
2007-03-05 10:02 ` Robert Olsson
2007-03-05 10:00 ` Evgeniy Polyakov
2007-03-13 9:32 ` Evgeniy Polyakov
2007-03-13 10:08 ` Eric Dumazet
2007-03-13 10:24 ` Evgeniy Polyakov
2007-02-05 18:41 ` [RFC/TOY]Extensible " akepner
2007-02-06 19:09 ` linux
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45D8B54A.70903@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=akepner@sgi.com \
--cc=bcrl@linux.intel.com \
--cc=davem@davemloft.net \
--cc=johnpol@2ka.mipt.ru \
--cc=linux@horizon.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).