From: Andi Kleen <andi@firstfloor.org>
To: Chris Snook <csnook@redhat.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>, linux-kernel@vger.kernel.org
Subject: Re: [Slightly off topic] A question about R/B trees.
Date: Sat, 18 Oct 2008 09:53:10 +0200 [thread overview]
Message-ID: <87myh2jq55.fsf@basil.nowhere.org> (raw)
In-Reply-To: <48F90E96.3060800@redhat.com> (Chris Snook's message of "Fri, 17 Oct 2008 18:15:50 -0400")
Chris Snook <csnook@redhat.com> writes:
> Maxim Levitsky wrote:
>> I am working on my small project, and I need a fast container to
>> hold a large sparse array.
>> Balanced trees seem to fit perfectly.
>
> Balanced trees take O(log n) to perform a great many operations, and
> traversing a binary tree is a particularly bad case for branch
> prediction. Hash tables will perform much better, unless you get them
> horribly wrong.
That seems like a unfair generalization.
The problem with hash tables is that if they're big enough
or if the rest of the workload is memory intensive
each hit will be a cache miss. And you can do a lot of branch
mispredicts in the time of a single cache miss.
In general trees can be much better for cache usage, although
it's generally better to use some tree that has nodes near
the cache line size. Binary trees like RB are too small for that.
The other advantage of trees is that they scale naturally.
For hash tables you either have it being sized for the worst
case (usually wasting a lot of memory[1] and making the cache miss
problem worse) or you need to do dynamic rehashing which
is complex and difficult to get right especially in multi threaded
situations.
That is why the trend in Linux at least is to move away from
hash tables.
[1] Just take a look at how much memory that various hash
tables in Linux use: dmesg | grep hash
> The kernel is written in a dialect of C that makes several assumptions
> about the compiler, among them that the compiler won't screw this up
> unless you tell it to. Any compiler that has alignment problems with
> the rbtree code is going to have similar problems in lots of other
> places too. We don't support those compilers.
At least upto 4 bytes or so it's generally a safe assumption
that objects will be naturally aligned. Also except for tree
root the objects are typically allocated with malloc() (or
equivalent kmalloc) anyways and malloc()s guarantee
worst case alignment.
-Andi
--
ak@linux.intel.com
next prev parent reply other threads:[~2008-10-18 7:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-17 21:34 [Slightly off topic] A question about R/B trees Maxim Levitsky
2008-10-17 22:15 ` Chris Snook
2008-10-17 22:47 ` Maxim Levitsky
2008-10-20 14:54 ` Chris Friesen
2008-10-18 7:53 ` Andi Kleen [this message]
2008-10-18 8:45 ` Pekka Enberg
2008-10-20 15:57 ` Maxim Levitsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87myh2jq55.fsf@basil.nowhere.org \
--to=andi@firstfloor.org \
--cc=csnook@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maximlevitsky@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox