public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: Chris Snook <csnook@redhat.com>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>, linux-kernel@vger.kernel.org
Subject: Re: [Slightly off topic] A question about R/B trees.
Date: Sat, 18 Oct 2008 09:53:10 +0200	[thread overview]
Message-ID: <87myh2jq55.fsf@basil.nowhere.org> (raw)
In-Reply-To: <48F90E96.3060800@redhat.com> (Chris Snook's message of "Fri, 17 Oct 2008 18:15:50 -0400")

Chris Snook <csnook@redhat.com> writes:

> Maxim Levitsky wrote:
>> I am working on my small project, and I need a fast container to
>> hold a large sparse array.
>> Balanced trees seem to fit perfectly.
>
> Balanced trees take O(log n) to perform a great many operations, and
> traversing a binary tree is a particularly bad case for branch
> prediction.  Hash tables will perform much better, unless you get them
> horribly wrong.

That seems like a unfair generalization.

The problem with hash tables is that if they're big enough
or if the rest of the workload is memory intensive
each hit will be a cache miss. And you can do a lot of branch
mispredicts in the time of a single cache miss.

In general trees can be much better for cache usage, although
it's generally better to use some tree that has nodes near 
the cache line size. Binary trees like RB are too small for that.

The other advantage of trees is that they scale naturally.
For hash tables you either have it being sized for the worst
case (usually wasting a lot of memory[1] and making the cache miss
problem worse) or you need to do dynamic rehashing which 
is complex and difficult to get right especially in multi threaded
situations.

That is why the trend in Linux at least is to move away from
hash tables.

[1] Just take a look at how much memory that various hash 
tables in Linux use: dmesg | grep hash

> The kernel is written in a dialect of C that makes several assumptions
> about the compiler, among them that the compiler won't screw this up
> unless you tell it to.  Any compiler that has alignment problems with
> the rbtree code is going to have similar problems in lots of other
> places too.  We don't support those compilers.

At least upto 4 bytes or so it's generally a safe assumption
that objects will be naturally aligned. Also except for tree 
root the objects are typically allocated with malloc() (or
equivalent kmalloc) anyways and malloc()s guarantee
worst case alignment.


-Andi
-- 
ak@linux.intel.com

  parent reply	other threads:[~2008-10-18  7:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-17 21:34 [Slightly off topic] A question about R/B trees Maxim Levitsky
2008-10-17 22:15 ` Chris Snook
2008-10-17 22:47   ` Maxim Levitsky
2008-10-20 14:54     ` Chris Friesen
2008-10-18  7:53   ` Andi Kleen [this message]
2008-10-18  8:45     ` Pekka Enberg
2008-10-20 15:57       ` Maxim Levitsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87myh2jq55.fsf@basil.nowhere.org \
    --to=andi@firstfloor.org \
    --cc=csnook@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maximlevitsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox