Re: [PATCH] add b+tree library


From: "Jörn Engel" <joern@logfs.org>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Tso <tytso@mit.edu>, Andi Kleen <andi@firstfloor.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Linux Kernel list <linux-kernel@vger.kernel.org>,
	"Luis R. Rodriguez" <mcgrof@gmail.com>
Subject: Re: [PATCH] add b+tree library
Date: Sat, 7 Feb 2009 13:26:18 +0100
Message-ID: <20090207122618.GA17222@logfs.org>
In-Reply-To: <1233793066.7390.34.camel@johannes.local>

On Thu, 5 February 2009 01:17:46 +0100, Johannes Berg wrote:
> 
> Joern may need arbitrary key lengths, I don't. But I've just looked around
> a little:
> 
>  * radix trees are completely unsuitable for use as a sort of hash table
>    because of their behaviour when keys are not at least mostly
>    contiguous
>  * rbtrees require lots of boilerplate code, and have much worse cache
>    behaviour

I did some testing as well.  And I didn't like the results very much.
In my test harness, rbtrees performed roughly twice as well as btrees.

Something clearly is wrong with my theory.  To spell it out, the theory
assumes that 1) CPUs get continuously faster at computations while
memory latency stays roughly constant and as a result 2) current CPUs
are sufficiently fast that memory latency is more important than a large
amount of computation.  And maybe 3) L1 cache latency can be ignored,
while DRAM latency most definitely cannot.

At least one of the above must be wrong.  Another interesting data point
is that after hacking up binary search within nodes, btrees performed
roughly 10% better than before.  Binary search means we mispredict every
other branch, yet this still improved performance.  So on my test CPU
(Pentium M), branch mispredictions must be relatively cheap compared to
either calculations or L1 cache latency.
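
For illustration, the two in-node lookup strategies compared above might
look roughly like this.  It is a minimal sketch, not the code from the
patch: the node layout, the ascending key order, KEYS_PER_NODE and both
function names are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define KEYS_PER_NODE 8	/* hypothetical fan-out, not taken from the patch */

/* Hypothetical node: n valid keys, sorted in ascending order. */
struct node {
	size_t n;
	unsigned long keys[KEYS_PER_NODE];
};

/*
 * Linear scan: index of the first key >= key, or nd->n if there is none.
 * The loop branch is taken until the final iteration, so it predicts well.
 */
size_t node_find_linear(const struct node *nd, unsigned long key)
{
	size_t i;

	assert(nd->n <= KEYS_PER_NODE);
	for (i = 0; i < nd->n; i++)
		if (nd->keys[i] >= key)
			break;
	return i;
}

/*
 * Binary search: same result as the linear scan, but with O(log n)
 * comparisons.  The direction taken at each step depends on the data,
 * which is what makes the branch hard to predict.
 */
size_t node_find_binary(const struct node *nd, unsigned long key)
{
	size_t lo = 0, hi = nd->n;

	assert(nd->n <= KEYS_PER_NODE);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (nd->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

Both functions return the same slot for any key, so one can be swapped
for the other in a harness without touching the rest of the lookup.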

I also tried to rig the tests to favor btrees over rbtrees.  Since the
rb_node is embedded in an object, I grew those objects beyond cacheline
size, so no two rb_nodes would ever share a cacheline while all the
pointers in btrees still do.  And still btrees lost.  Well - if the
dataset is large enough that all the object padding is consuming half a
gigabyte of memory, swapping will make the rbtree load go slow, but
given enough free memory and no swapping (i.e. second run) it beats
btrees.
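
The padding trick described above can be sketched as follows.  This is
not the actual test harness: the 64-byte CACHELINE, the stand-in
fake_rb_node (three pointer-sized words, like struct rb_node), the
padded_obj layout and the helper names are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed cache line size */

/* Stand-in for struct rb_node: three pointer-sized words. */
struct fake_rb_node {
	uintptr_t parent_color;
	struct fake_rb_node *right;
	struct fake_rb_node *left;
};

/* Hypothetical test object, padded well beyond one cache line. */
struct padded_obj {
	unsigned long key;
	struct fake_rb_node node;
	char pad[2 * CACHELINE];
};

static_assert(sizeof(struct padded_obj) >= 2 * CACHELINE,
	      "objects must outgrow a cache line");

/* Index of the first cache line touched by the embedded node. */
uintptr_t first_line(const struct padded_obj *o)
{
	return (uintptr_t)&o->node / CACHELINE;
}

/* Index of the last cache line touched by the embedded node. */
uintptr_t last_line(const struct padded_obj *o)
{
	return ((uintptr_t)&o->node + sizeof(o->node) - 1) / CACHELINE;
}

/* Nonzero if the embedded nodes of a and b touch a common cache line. */
int nodes_share_line(const struct padded_obj *a, const struct padded_obj *b)
{
	return first_line(a) <= last_line(b) && first_line(b) <= last_line(a);
}
```

With this layout, the embedded nodes of any two distinct objects start
at least sizeof(struct padded_obj) bytes apart, so each tree step lands
on a different cache line, whereas the pointers inside one btree node
sit next to each other.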

So there are two conclusions I can draw from all this.  One is that
rbtrees are still a good choice on semi-current machines and the kernel
doesn't need much rework yet.  Whether my assumption 2) above will match
reality better in the future and the scales will tip to the other side I
don't know.  The other is that my assumptions are wrong somewhere and I
don't yet understand where.  If anyone has an idea, I'd be glad to hear
about it.

Jörn

-- 
Science is like sex: sometimes something useful comes out,
but that is not the reason we are doing it.
-- Richard Feynman
