Re: Generic B-tree implementation

From: Andrea Arcangeli <andrea@suse.de>
To: Vishal Patil <vishpat@gmail.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>,
	Gary Funck <gary@intrepid.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Generic B-tree implementation
Date: Wed, 19 Jul 2006 18:14:27 +0200
Message-ID: <20060719161427.GN5726@opteron.random>
In-Reply-To: <4745278c0607190634l3ab43bb7t3d2a7b80c22d44c4@mail.gmail.com>

On Wed, Jul 19, 2006 at 09:34:43AM -0400, Vishal Patil wrote:
> I can get rid of recursions using loops, will need to work a little more on 
> it.

Before doing that you may also want to learn about all the possible
malloc return values, and make sure the interface has all the oom
failure paths it needs, which you're obviously missing.
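
To make the point about malloc return values concrete, here is a
minimal userspace sketch; it is not code from the posted B-tree. The
names bt_leaf, bt_alloc_fn, bt_alloc_fail and bt_split_leaf are all
hypothetical. It shows one way to give a split operation an oom failure
path: do the only allocation before touching the node, and return
-ENOMEM to the caller if it fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>	/* malloc() has the same signature as bt_alloc_fn */

#define BT_MAX_KEYS 4	/* hypothetical leaf capacity */

struct bt_leaf {
	int nkeys;
	long keys[BT_MAX_KEYS];
};

/* Allocator hook, so that a failing malloc() can be simulated. */
typedef void *(*bt_alloc_fn)(size_t size);

/* Behaves like malloc() under oom: always fails. */
static void *bt_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Split a full leaf in two. The only allocation happens before the
 * leaf is modified, so on -ENOMEM the leaf is left untouched and the
 * caller can simply propagate the error.
 */
static int bt_split_leaf(struct bt_leaf *full, bt_alloc_fn alloc,
			 struct bt_leaf **right_out)
{
	struct bt_leaf *right;
	int half, i;

	assert(full && right_out && full->nkeys == BT_MAX_KEYS);

	right = alloc(sizeof(*right));
	if (!right)
		return -ENOMEM;

	/* Move the upper half of the keys into the new sibling. */
	half = full->nkeys / 2;
	right->nkeys = full->nkeys - half;
	for (i = 0; i < right->nkeys; i++)
		right->keys[i] = full->keys[half + i];
	full->nkeys = half;

	*right_out = right;
	return 0;
}
```

Passing bt_alloc_fail instead of malloc exercises the failure path
without needing a real out-of-memory condition. Every caller of such an
insert interface then needs a return value to check, which is the part
the interface has to provide.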

One of the advantages of rbtrees vs b-trees (and vs radix-trees too) is
that they require zero dynamic allocations of ram for metadata. They
use the same trick as list.h (the node is embedded in the object being
indexed) to avoid it, while still being mostly generic and sharable
library code. Think of rbtrees as scalable lists. The kernel usage is
quite optimized too: the mmap path, for example, does a single lookup
and stores the last "lookup" point before going on with the insertion,
holding the mmap_sem (or mutex renaming of the day) the whole time, so
that the insertion doesn't have to start over with a second (wasteful)
lookup. Again this is very similar to what you could do with a list,
and the rebalancing is a very cheap operation too, involving only a
limited number of pointers.
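
The embedded-node trick and the "single lookup, then insert" pattern
can be sketched in userspace as follows. This is an illustration under
stated assumptions, not the kernel code: tree_node, tree_entry, region,
region_lookup and region_link are hypothetical names, and the tree is a
plain binary search tree with the rebalancing step left out (the kernel
rbtree API does the equivalent with rb_link_node() followed by
rb_insert_color()).

```c
#include <assert.h>
#include <stddef.h>

/* Link node embedded in the user's struct, as list.h does with list_head. */
struct tree_node {
	struct tree_node *parent, *left, *right;
};

/* Get from the embedded node back to the containing object. */
#define tree_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a vma: the node lives inside the object. */
struct region {
	unsigned long start;
	struct tree_node node;
};

/*
 * One descent of the tree. Returns the region with the given start if
 * it is present; otherwise returns NULL and records in *parent and
 * *link the slot where such a region would have to be inserted.
 */
static struct region *region_lookup(struct tree_node **root,
				    unsigned long start,
				    struct tree_node **parent,
				    struct tree_node ***link)
{
	struct tree_node **p = root;

	*parent = NULL;
	while (*p) {
		struct region *r = tree_entry(*p, struct region, node);

		if (start == r->start)
			return r;
		*parent = *p;
		p = start < r->start ? &(*p)->left : &(*p)->right;
	}
	*link = p;
	return NULL;
}

/* Insert at the remembered slot: no second lookup and no allocation. */
static void region_link(struct region *r, struct tree_node *parent,
			struct tree_node **link)
{
	assert(r && link && *link == NULL);

	r->node.parent = parent;
	r->node.left = r->node.right = NULL;
	*link = &r->node;
	/* A real rbtree would recolor and rotate here. */
}
```

The remembered slot stays valid only as long as nothing else modifies
the tree between the lookup and the insertion, which is why the lock
has to be held across both steps.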

> Also I will be working on developing a patch for VM management using
> B-trees instead of RB-trees.

Once you start changing those bits, you'll notice the further
requirement the btrees bring: new oom failures to handle in code paths
that are already reasonably complex because of the vma oom failures.

As the speed of caches rises faster than the speed of ram, memory seeks
tend to cost more than they did in the past, but I doubt it's worth it,
most importantly in the common case of very few vmas. I like the
common case of only a few dozen vmas to be this fast and low
overhead. The corner cases like uml and oracle already use nonlinear
mappings, also to avoid the ram overhead of the vmas; with a btree the
lowmem overhead would be even higher (the mere 4/8 bytes (32bit/64bit)
of per-vma overhead of the rbtrees would even be fixable with David's
patch, but nobody has considered it very important so far to eliminate
them, though we can do that in the future). So even if a btree were
faster for those extreme corner cases, it would still not be a
replacement for nonlinear (I wish there was a decent replacement for
nonlinear, whose only reason to exist seems to be uml on 64bit archs).

If I were you, as an experiment slightly more likely to succeed, I
would look into replacing the pagecache radix-tree with a btree, as
long as you can leave intact the tagging properties we have in the
radix-tree, needed for finding only the dirty elements in the tree
etc... (we use that to avoid separate dirty lists for the pages). You
should also size the order to automatically match the cache size of
the arch (dunno if it's better at compile or run time). I'm not a
radix-tree guru, but the btree may save some ram if you have all
pagecache pages scattered all over the place with random access. It
also won't require all levels to be allocated. However it will require
rebalancing, something the radix-tree doesn't require; it seems a bit
of a tradeoff, and I suspect the radix-tree will still win in all
common cases. But at least all the oom failure paths should already
exist for you, so that should avoid you having to touch much code
outside of your own btree files.
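
As a rough illustration of sizing the order, the helper below computes
how many keys fit in a node of a given size. It assumes that "cache
size" here means the cache line size, and that a node is laid out as a
header, n keys and n + 1 child pointers; the name btree_keys_per_node
and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Largest n such that hdr + n * key + (n + 1) * ptr <= node_bytes,
 * i.e. the number of keys that fit in one node of node_bytes bytes.
 * Returns 0 if not even the header and one child pointer fit.
 */
static size_t btree_keys_per_node(size_t node_bytes, size_t hdr_bytes,
				  size_t key_bytes, size_t ptr_bytes)
{
	assert(key_bytes + ptr_bytes > 0);

	if (node_bytes < hdr_bytes + ptr_bytes)
		return 0;
	return (node_bytes - hdr_bytes - ptr_bytes) / (key_bytes + ptr_bytes);
}
```

With a 64 byte node, an 8 byte header, 8 byte keys and 8 byte pointers
this gives 3 keys and 4 children (8 + 24 + 32 = 64 bytes); doubling the
node to 128 bytes gives 7 keys. The same arithmetic works at compile
time with a constant line size or at run time with a probed one.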

I wish you fun with the btrees; I remember I had fun back when I was
playing with the rbtrees ;).
