From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754019AbZBGM1S (ORCPT ); Sat, 7 Feb 2009 07:27:18 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752610AbZBGM1D (ORCPT ); Sat, 7 Feb 2009 07:27:03 -0500 Received: from lazybastard.de ([212.112.238.170]:44342 "EHLO longford.logfs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752420AbZBGM1B (ORCPT ); Sat, 7 Feb 2009 07:27:01 -0500 Date: Sat, 7 Feb 2009 13:26:18 +0100 From: =?utf-8?B?SsO2cm4=?= Engel To: Johannes Berg Cc: Peter Zijlstra , Andrew Morton , Theodore Tso , Andi Kleen , KOSAKI Motohiro , Linux Kernel list , "Luis R. Rodriguez" Subject: Re: [PATCH] add b+tree library Message-ID: <20090207122618.GA17222@logfs.org> References: <2f11576a0901100429h415d3a87o40ba4849120832c8@mail.gmail.com> <20090110183921.GD20611@logfs.org> <1231613042.3706.16.camel@johannes> <87fxjrgd9s.fsf@basil.nowhere.org> <20090110202315.GE20611@logfs.org> <20090110212740.GE31579@mit.edu> <20090110220135.GF20611@logfs.org> <20090110142330.295a8847.akpm@linux-foundation.org> <1231631840.13420.24.camel@twins> <1233793066.7390.34.camel@johannes.local> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1233793066.7390.34.camel@johannes.local> User-Agent: Mutt/1.5.13 (2006-08-11) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 5 February 2009 01:17:46 +0100, Johannes Berg wrote: > > Joern may need arbitrary key lengths, don't. But I've just looked around > a little: > > * radix trees are completely unsuitable for use as a sort of hash table > because of their behaviour when keys are not at last mostly > contiguous > * rbtrees require lots of boilerplate code, and have much worse cache > behaviour I did some testing as well. And I didn't like the results very much. In my test harness, rbtrees performed roughly twice as good as btrees. Something clearly is wrong with my theory. To spell it out, the theory assumes that 1) CPUs get continuously faster at computations while memory latency stays roughly constant and as a result 2) current CPUs are sufficiently fast that memory latency is more important than a large amount of computation. And maybe 3) L1 cache latency can be ignored, while DRAM latency most definitely can not. At least one of the above must be wrong. Another interesting data point is that after hacking up binary search within nodes, btrees performed roughly 10% better than before. Binary search means we mispredict every other branch, yet this still improved performance. So on my test CPU (Pentium M), branch mispredictions must be relatively cheap compared to either calculations or L1 cache latency. I also tried to rig the tests to favor btrees over rbtrees. Since the rb_node is embedded in an object, I grew those objects beyond cacheline size, so no two rb_nodes would ever share a cacheline while all the pointers in btrees still do. And still btrees lost. Well - if the dataset is large enough that all the object padding is comsuming half a gigabyte of memory, swapping will make the rbtree load go slow, but given enough free memory and no swapping (i.e. second run) it beats btrees. So there are two results I can see from all this. Rbtrees are still a good choice an semi-current machines and the kernel doesn't need much rework yet. Whether my assumption 2) above will match reality better in the future and the scales will tip to the other side I don't know. The other is that my assumptions are wrong somewhere and I don't yet understand where. If anyone has an idea, I'd be glad to hear about it. Jörn -- Science is like sex: sometimes something useful comes out, but that is not the reason we are doing it. -- Richard Feynman