netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: David Miller <davem@davemloft.net>
Cc: dada1@cosmosbay.com, akepner@sgi.com, linux@horizon.com,
	netdev@vger.kernel.org, bcrl@kvack.org
Subject: Re: Extensible hashing and RCU
Date: Tue, 20 Feb 2007 13:22:06 +0300	[thread overview]
Message-ID: <20070220102206.GA7237@2ka.mipt.ru> (raw)
In-Reply-To: <20070220.015745.48806097.davem@davemloft.net>

On Tue, Feb 20, 2007 at 01:57:45AM -0800, David Miller (davem@davemloft.net) wrote:
> > What you miss is the fact, that upper layers of the tree are always in the
> > cache. So its access cost nothing compared to every time new entry in
> > the hash table.
> 
> It will not be on a real computer spending reasonable if not
> predominant time in userspace.
> 
> I agree with others here in this thread, in that your test program is
> not accurate at all in several aspects as it makes many incorrect
> assumptions about the environment in which these lookups run:
> 
> 1) cache must be assumed cold, ALWAYS
> 2) large PTE mappings must be used to simulate kernel costs
>    and timings accurately
> 
> Even on computer not spending much time in userspace, cache will
> be cold too in most of these paths.  Do you remember the cpu cycle
> counter experiment I ran in the tg3 driver a few years back?
> 
> That test showed that in NAPI loop, second packet was always processed
> some 40 times faster than first one, and it's all because of cache
> warming done by first packet.
> 
> It is a very essential and sometimes non-intuitive aspect of packet
> input processing.  You must code for the cold cache case.  Therefore
> you cannot run silly loops in userspace to measure the performance of
> a flow demux algorithm, it is a completely pointless exercise in fact
> :-)

It is not correct.
Getting size of the table for 2^20 entries is about 4Mb, we get
half/third of it in the cache in my userspace test - and it still performs 
bad (well, it is noticebly better than all other approaches (yet), but it 
is not that good as expected it to be).

With my trie test, _smaller_ part of the trie is in the cache due to
size of the node, which includes _a lot_ of additional information used
for wildcard processing (needed for netfilter and/or route
implementation in netchannels).

Above words are _only_ correct for hash tables - yes, in that case cache
is always cold, since each new flow goes into completely different
entry. With tree/trie (and _especially_ on SMP_) access to low-level
entry _requires_ all above entries to be in the cache, so until load
fully flushes it away, there will be a win for tree implementation.
If load is that, that it flushes the whole cache each time, then we are
going to die just because populating of the struct sock and/or tcp
socket is much worse than several tree/trie entris in the lookup.

> > And there is _no_ O(1) - O(1) is just a hash entry selection, then you
> > need to add the whole chain path, which can be long enough.
> > For example for the length 9 you have 1000 entries - it is exactly size
> > of the tree - sum of all entries upto and including 2^9.
> 
> Hash table should grow to exactly number of hash chains as number
> of sockets that are active.  That is how we program every dynamically
> grown hash table algorithm in the kernel, and we would do the same
> for TCP once dynamic growth is possible.

Eric's test shows that only one third of the sockets is in the chains
with 1 length, so it does not work that way, which is not a bad or good
- table scaling is not trivial operation indeed.

> It is assumption of every hash algorithm.  We choose poor sizes at
> boot time on Eric's computer, or there is some limit preventing from
> using a proper size.  It does not make a flaw in the hash approach.
> 
> People don't use trees or tries for socket demux in any modern
> operating system, there is a reason and it is not because this
> approach has not been tried. :-)

Trees are used _always_ in huge route tables - not just because it is
too ugly to implement wildcards in hash tables :).

I would agree that routers do not perform any real work besides packet
forwarding, so they do have theirs caches filled with all needed
information (especially if all above is implemented in hardware and thus
is completely different), but it does not mean that existing systems do
not allow to scale with trees with existing cache issues.

-- 
	Evgeniy Polyakov

  reply	other threads:[~2007-02-20 10:23 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-04  7:41 Extensible hashing and RCU linux
2007-02-05 18:02 ` akepner
2007-02-17 13:13   ` Evgeniy Polyakov
2007-02-18 18:46     ` Eric Dumazet
2007-02-18 19:10       ` Evgeniy Polyakov
2007-02-18 20:21         ` Eric Dumazet
2007-02-18 21:23           ` Michael K. Edwards
2007-02-18 22:04             ` Michael K. Edwards
2007-02-19 12:04             ` Andi Kleen
2007-02-19 19:18               ` Michael K. Edwards
2007-02-19 11:41           ` Evgeniy Polyakov
2007-02-19 13:38             ` Eric Dumazet
2007-02-19 13:56               ` Evgeniy Polyakov
2007-02-19 14:14                 ` Eric Dumazet
2007-02-19 14:25                   ` Evgeniy Polyakov
2007-02-19 15:14                     ` Eric Dumazet
2007-02-19 18:13                       ` Eric Dumazet
2007-02-19 18:26                         ` Benjamin LaHaise
2007-02-19 18:38                           ` Benjamin LaHaise
2007-02-20  9:25                         ` Evgeniy Polyakov
2007-02-20  9:57                           ` David Miller
2007-02-20 10:22                             ` Evgeniy Polyakov [this message]
2007-02-20 10:04                           ` Eric Dumazet
2007-02-20 10:12                             ` David Miller
2007-02-20 10:30                               ` Evgeniy Polyakov
2007-02-20 11:10                                 ` Eric Dumazet
2007-02-20 11:23                                   ` Evgeniy Polyakov
2007-02-20 11:30                                   ` Eric Dumazet
2007-02-20 11:41                                     ` Evgeniy Polyakov
2007-02-20 10:49                               ` Eric Dumazet
2007-02-20 15:07                               ` Michael K. Edwards
2007-02-20 15:11                                 ` Evgeniy Polyakov
2007-02-20 15:49                                   ` Michael K. Edwards
2007-02-20 15:59                                     ` Evgeniy Polyakov
2007-02-20 16:08                                       ` Eric Dumazet
2007-02-20 16:20                                         ` Evgeniy Polyakov
2007-02-20 16:38                                           ` Eric Dumazet
2007-02-20 16:59                                             ` Evgeniy Polyakov
2007-02-20 17:05                                               ` Evgeniy Polyakov
2007-02-20 17:53                                                 ` Eric Dumazet
2007-02-20 18:00                                                   ` Evgeniy Polyakov
2007-02-20 18:55                                                     ` Eric Dumazet
2007-02-20 19:06                                                       ` Evgeniy Polyakov
2007-02-20 19:17                                                         ` Eric Dumazet
2007-02-20 19:36                                                           ` Evgeniy Polyakov
2007-02-20 19:44                                                           ` Michael K. Edwards
2007-02-20 17:20                                               ` Eric Dumazet
2007-02-20 17:55                                                 ` Evgeniy Polyakov
2007-02-20 18:12                                                   ` Evgeniy Polyakov
2007-02-20 19:13                                                     ` Michael K. Edwards
2007-02-20 19:44                                                       ` Evgeniy Polyakov
2007-02-20 20:03                                                         ` Michael K. Edwards
2007-02-20 20:09                                                           ` Michael K. Edwards
2007-02-21  8:56                                                             ` Evgeniy Polyakov
2007-02-21  9:34                                                               ` David Miller
2007-02-21  9:51                                                                 ` Evgeniy Polyakov
2007-02-21 10:03                                                                   ` David Miller
2007-02-21  8:54                                                           ` Evgeniy Polyakov
2007-02-21  9:15                                                             ` Eric Dumazet
2007-02-21  9:27                                                               ` Evgeniy Polyakov
2007-02-21  9:38                                                                 ` Eric Dumazet
2007-02-21  9:57                                                                   ` Evgeniy Polyakov
2007-02-21 21:15                                                                     ` Michael K. Edwards
2007-02-22  9:06                                                                       ` David Miller
2007-02-22 11:00                                                                         ` Michael K. Edwards
2007-02-22 11:07                                                                           ` David Miller
2007-02-22 19:24                                                                             ` Stephen Hemminger
2007-02-20 16:04                                     ` Eric Dumazet
2007-02-22 23:49                                     ` linux
2007-02-23  2:31                                       ` Michael K. Edwards
2007-02-20 10:44                             ` Evgeniy Polyakov
2007-02-20 11:09                               ` Eric Dumazet
2007-02-20 11:29                                 ` Evgeniy Polyakov
2007-02-20 11:34                                   ` Eric Dumazet
2007-02-20 11:45                                     ` Evgeniy Polyakov
2007-02-21 12:41                                 ` Andi Kleen
2007-02-21 13:19                                   ` Eric Dumazet
2007-02-21 13:37                                     ` David Miller
2007-02-21 23:13                                       ` Robert Olsson
2007-02-22  6:06                                         ` Eric Dumazet
2007-02-22 11:41                                         ` Andi Kleen
2007-02-22 11:44                                           ` David Miller
2007-02-20 12:11                           ` Evgeniy Polyakov
2007-02-19 22:10                 ` Andi Kleen
2007-02-19 12:02           ` Andi Kleen
2007-02-19 12:35             ` Robert Olsson
2007-02-19 14:04       ` Evgeniy Polyakov
2007-03-02  8:52     ` Evgeniy Polyakov
2007-03-02  9:56       ` Eric Dumazet
2007-03-02 10:28         ` Evgeniy Polyakov
2007-03-02 20:45         ` Michael K. Edwards
2007-03-03 10:46           ` Evgeniy Polyakov
2007-03-04 10:02             ` Michael K. Edwards
2007-03-04 20:36               ` David Miller
2007-03-05  7:12                 ` Michael K. Edwards
2007-03-05 10:02                   ` Robert Olsson
2007-03-05 10:00               ` Evgeniy Polyakov
2007-03-13  9:32       ` Evgeniy Polyakov
2007-03-13 10:08         ` Eric Dumazet
2007-03-13 10:24           ` Evgeniy Polyakov
2007-02-05 18:41 ` [RFC/TOY]Extensible " akepner
2007-02-06 19:09   ` linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070220102206.GA7237@2ka.mipt.ru \
    --to=johnpol@2ka.mipt.ru \
    --cc=akepner@sgi.com \
    --cc=bcrl@kvack.org \
    --cc=dada1@cosmosbay.com \
    --cc=davem@davemloft.net \
    --cc=linux@horizon.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).