Fractal Tree Indexing over B-Trees?

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Fractal Tree Indexing over B-Trees?
@ 2012-03-28 14:25 Danny Piccirillo
  2012-03-28 15:10 ` C Anthony Risinger
  2012-03-28 18:42 ` Josef Bacik
  0 siblings, 2 replies; 12+ messages in thread
From: Danny Piccirillo @ 2012-03-28 14:25 UTC (permalink / raw)
  To: linux-btrfs

The case has been made on Phoronix for F-Trees: They makes use hard
drive speeds, not (relatively slow) access times; beat SSD's; and scale 
perfectly across multiple cores with hundreds of millions of entries.

http://en.wikipedia.org/wiki/TokuDB#Fractal_tree_indexes

How TokuDB Fractal Tree Databases Work

Via: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NjM

Time for someone to get started on ftrfs? Or can it be implemented 
in Btrfs? 
https://bugzilla.kernel.org/show_bug.cgi?id=43004

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 14:25 Fractal Tree Indexing over B-Trees? Danny Piccirillo
@ 2012-03-28 15:10 ` C Anthony Risinger
  2012-03-28 18:42 ` Josef Bacik
  1 sibling, 0 replies; 12+ messages in thread
From: C Anthony Risinger @ 2012-03-28 15:10 UTC (permalink / raw)
  To: Danny Piccirillo; +Cc: linux-btrfs

On Wed, Mar 28, 2012 at 9:25 AM, Danny Piccirillo
<danny.piccirillo@member.fsf.org> wrote:
> The case has been made on Phoronix for F-Trees: They makes use hard
> drive speeds, not (relatively slow) access times; beat SSD's; and scale
> perfectly across multiple cores with hundreds of millions of entries.
>
> http://en.wikipedia.org/wiki/TokuDB#Fractal_tree_indexes
>
> How TokuDB Fractal Tree Databases Work
>
> Via: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NjM
>
> Time for someone to get started on ftrfs? Or can it be implemented
> in Btrfs?
> https://bugzilla.kernel.org/show_bug.cgi?id=43004

whoa, very cool stuff.  fractals are awesome, cool to see them in use.

... 2010/11, surprised i never heard of it before now.  thanks for the
reference/links at the very least!

aside: i once described fractals to my grandmother (100% devout
catholic) as related to my own understanding of the universe --
specifically, i pointed out how their often simple mathematical
identity fragments into an infinitely self-similar pattern of
seemingly unbounded complexity -- she told me i was describing god ...
and, well, we agreed :-)

-- 

C Anthony

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 14:25 Fractal Tree Indexing over B-Trees? Danny Piccirillo
  2012-03-28 15:10 ` C Anthony Risinger
@ 2012-03-28 18:42 ` Josef Bacik
  2012-03-28 18:45   ` Jeff Mahoney
  2012-03-28 18:57   ` Josef Bacik
  1 sibling, 2 replies; 12+ messages in thread
From: Josef Bacik @ 2012-03-28 18:42 UTC (permalink / raw)
  To: Danny Piccirillo; +Cc: linux-btrfs

On Wed, Mar 28, 2012 at 02:25:39PM +0000, Danny Piccirillo wrote:
> The case has been made on Phoronix for F-Trees: They makes use hard
> drive speeds, not (relatively slow) access times; beat SSD's; and scale 
> perfectly across multiple cores with hundreds of millions of entries.
> 
> http://en.wikipedia.org/wiki/TokuDB#Fractal_tree_indexes
> 
> How TokuDB Fractal Tree Databases Work
> 
> Via: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NjM
> 
> Time for someone to get started on ftrfs? Or can it be implemented 
> in Btrfs? 
> https://bugzilla.kernel.org/show_bug.cgi?id=43004
> 

This is dumber shit than usual out of phoronix.  I did some just basic
calculations for worst case scenarios, let's say we max out btrfs's 8 level
limit, so we have 7 levels of nodes and then our leaves.  Lets just for
simplicity sake say we can fit 1 billion objects into this kind of tree, and for
each node/leaf we can only hold 10 objects.  (Keep in mind these are just random
numbers I'm pulling out of my ass and may not work out right with such a large
tree, just work with me here).

So worst case scenario doing a binary search for an object with 10 objects is 4
comparisons, so with a full 8 level btree we're doing 4 comparisons per level,
so 32 comparisons worst possible case to find our object.

Now let's consider a fractal tree with 1 billion objects.  So that would have a
29 row fractal tree (if I did my math right).  Now I have to make some
assumptions about how you would implement a fractal tree here, but let's assume
that every level tells you it's min and max value to make it easier to tell
which level you need to search in.  So worst case you're object is in the 29th
level, so you have to read 29 blocks in and check the min/max levels to figure
out which row to search.  Let's be fair and say these are in memory, we're still
doing 29 comparisons right out of the gate to find the right row.  Now we have
the right row, but this is a file system and the rows are split up into blocks,
so we not only have to binary search each block, but the blocks themselves to
find the right block with our object.  And we can't do this directed either
because we have no way of indexing the individual blocks, so worst case scenario
we're having to read in 268435456 blocks to then do a normal binary search on
_each_ block!  Now lets again say just to give fractal trees an added benefit
that each block has it's own min/max setting, we only have to binary search the
one that will have our actual data, but we're still talking about reading in
33554432 times the number of blocks as compared to a btree!!!!!!

And this doesn't even begin to touch the ability to handle multi-threaded
workloads.  Imagine having to move all of these blocks around in memory because
your insert created a new row.  You'd have to lock each row as you move stuff
around (at the very least), so if you hit a particularly hot row your
multithreaded performance is going to look like you are running on an Atari
2600.

So no I don't think it's time for someone to get started on ftrfs.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 18:42 ` Josef Bacik
@ 2012-03-28 18:45   ` Jeff Mahoney
  2012-03-28 18:57   ` Josef Bacik
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff Mahoney @ 2012-03-28 18:45 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Danny Piccirillo, linux-btrfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/28/2012 02:42 PM, Josef Bacik wrote:
> On Wed, Mar 28, 2012 at 02:25:39PM +0000, Danny Piccirillo wrote:
>> The case has been made on Phoronix for F-Trees: They makes use
>> hard drive speeds, not (relatively slow) access times; beat
>> SSD's; and scale perfectly across multiple cores with hundreds of
>> millions of entries.

> So no I don't think it's time for someone to get started on ftrfs.
> Thanks,

Thanks for doing the research to back up the answer I was going to
respond with based on intuition.

+1

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJPc1xgAAoJEB57S2MheeWydJIP/R7jHiloaPLfwffuFS3gobmy
h0DqX8YezGMs70kLBYTXeu3EVuFtX8IFGJDGoVp3pkM9nS6F6iK9LbRWDC7IDAy7
t1taQoUqy0HMmmrkXfZ8KWmXWv424/Zq5aHfnegL/oq4OrUwnHtjzJBbsx4fvezW
mnPrNlOxaLWgXSyUs+1hDjvcgfmnRjpgURHfsfGNgX3gTUE4xNKCFXD4zs0bs5YO
NcpLa/66R5UwINPLOazHt9QpC2jHxPb0j93YZBipwtbtXPj1W+od/0rOgPvc9gaK
hzi7kRiegt50V2LVOJnofdaBC0AlCfRlcXRUNOil1vgFe6s8sft1KOdl8MShQ/+t
JUTjb1j7Z3efds42nnahvNj8wV4T6z6qLlhSQiOulYbVarBsfrWoM3EJY2ObzIlM
uOjCIPwHr/fzMJ9wVyLgK2ksssZmYAVgQ3YyS9G1CHIPe9xgW9Ld570mhhXRoLRj
HG5QZkfkmFzlde+S/gGgrdrvTWDT1zDsUhe+IGYu3jtPjXVs1udB+BN3c7rTWjf+
gQUOdcS/6XHoTXWpgZ7p5DUb8qF7Nv+ABBcEWeiaIEPH7uUZ22/XZEdf2euSCTp2
+TRZfoj9PZ17cszkllG4n+kc5H4gdKMYRyvNQo9mQ2TvogsmVOgIY+0Fys2ZdOkF
CaZ6ti9ZLkofawzImOV5
=UaEh
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 18:42 ` Josef Bacik
  2012-03-28 18:45   ` Jeff Mahoney
@ 2012-03-28 18:57   ` Josef Bacik
  2012-03-28 19:50     ` Zach Brown
  1 sibling, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2012-03-28 18:57 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Danny Piccirillo, linux-btrfs

On Wed, Mar 28, 2012 at 02:42:04PM -0400, Josef Bacik wrote:
> On Wed, Mar 28, 2012 at 02:25:39PM +0000, Danny Piccirillo wrote:
> > The case has been made on Phoronix for F-Trees: They makes use hard
> > drive speeds, not (relatively slow) access times; beat SSD's; and scale 
> > perfectly across multiple cores with hundreds of millions of entries.
> > 
> > http://en.wikipedia.org/wiki/TokuDB#Fractal_tree_indexes
> > 
> > How TokuDB Fractal Tree Databases Work
> > 
> > Via: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NjM
> > 
> > Time for someone to get started on ftrfs? Or can it be implemented 
> > in Btrfs? 
> > https://bugzilla.kernel.org/show_bug.cgi?id=43004
> > 
> 
> This is dumber shit than usual out of phoronix.  I did some just basic
> calculations for worst case scenarios, let's say we max out btrfs's 8 level
> limit, so we have 7 levels of nodes and then our leaves.  Lets just for
> simplicity sake say we can fit 1 billion objects into this kind of tree, and for
> each node/leaf we can only hold 10 objects.  (Keep in mind these are just random
> numbers I'm pulling out of my ass and may not work out right with such a large
> tree, just work with me here).
> 
> So worst case scenario doing a binary search for an object with 10 objects is 4
> comparisons, so with a full 8 level btree we're doing 4 comparisons per level,
> so 32 comparisons worst possible case to find our object.
> 
> Now let's consider a fractal tree with 1 billion objects.  So that would have a
> 29 row fractal tree (if I did my math right).  Now I have to make some
> assumptions about how you would implement a fractal tree here, but let's assume
> that every level tells you it's min and max value to make it easier to tell
> which level you need to search in.  So worst case you're object is in the 29th
> level, so you have to read 29 blocks in and check the min/max levels to figure
> out which row to search.  Let's be fair and say these are in memory, we're still
> doing 29 comparisons right out of the gate to find the right row.  Now we have
> the right row, but this is a file system and the rows are split up into blocks,
> so we not only have to binary search each block, but the blocks themselves to
> find the right block with our object.  And we can't do this directed either
> because we have no way of indexing the individual blocks, so worst case scenario
> we're having to read in 268435456 blocks to then do a normal binary search on
> _each_ block!  Now lets again say just to give fractal trees an added benefit
> that each block has it's own min/max setting, we only have to binary search the
> one that will have our actual data, but we're still talking about reading in
> 33554432 times the number of blocks as compared to a btree!!!!!!
> 

Ok looks like I wasn't being completely fair here, part of the presentation they
show talks about using forward pointers to point to where the element would be
in the next row.  So if we assume we're using forward pointers and every row has
a min/max indicator you can walk down each row and do a binary search to get as
close as possible to what you want and then follow the forward pointer down to
the next level.  The problem with this is that in the worst case you end up
having to binary search on each row, which makes the SMP case even crappier
because you have to make sure nothing changes while you are walking down the
rows and following the forward pointers.  The math here is a little trickier
since I can't easily picture what the worst case scenario would be per level,
but lets say O(log N/2) where N is the number of elements in the row.  So in the
situation I describe you are looking at having to do minimum of 29 reads, one
for each row, and then add into that all the reads you need to binary search
from your forward pointer on to the end of the row, which is going to increase
as you go down the tree.  So probably not millions times more work than b-trees,
but at least 10s if not 100s of times more work in the worst case.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 18:57   ` Josef Bacik
@ 2012-03-28 19:50     ` Zach Brown
  2012-03-28 20:13       ` Josef Bacik
  0 siblings, 1 reply; 12+ messages in thread
From: Zach Brown @ 2012-03-28 19:50 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Danny Piccirillo, linux-btrfs

> but lets say O(log N/2) where N is the number of elements in the row.
> So in the situation I describe you are looking at having to do minimum
> of 29 reads, one for each row,

Hmm.

Levels are powers of two and are either full or empty.  So the total
item count tells you which levels are full or empty.

(gdb) print/t 1000000000
$3 = 111011100110101100101000000000

So no, definitely not reading 29 levels.

After realizing this, I have to be honest: I'm not finding your
hand-wavey dismissal of the technology convincing at all. :)

There's a *ton* of detail that they don't describe about how to actually
translate the logical notion of power-of-two node sizes into coherent
block IO that scales.  I don't doubt that it's possible.

I very much doubt that the required complexity and cpu cost in the
kernel would be worth it for generic file system users, though.  Hooo
boy, do I doubt it.

- z

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 19:50     ` Zach Brown
@ 2012-03-28 20:13       ` Josef Bacik
  2012-03-28 20:29         ` Zach Brown
  2012-03-28 20:44         ` Niels de Carpentier
  0 siblings, 2 replies; 12+ messages in thread
From: Josef Bacik @ 2012-03-28 20:13 UTC (permalink / raw)
  To: Zach Brown; +Cc: Josef Bacik, Danny Piccirillo, linux-btrfs

On Wed, Mar 28, 2012 at 03:50:07PM -0400, Zach Brown wrote:
> 
> >but lets say O(log N/2) where N is the number of elements in the row.
> >So in the situation I describe you are looking at having to do minimum
> >of 29 reads, one for each row,
> 
> Hmm.
> 
> Levels are powers of two and are either full or empty.  So the total
> item count tells you which levels are full or empty.
> 
> (gdb) print/t 1000000000
> $3 = 111011100110101100101000000000
> 
> So no, definitely not reading 29 levels.
>

You are still going to have to have at least 29 levels to accomodate 1 billion
objects, though they won't all be full (sorry I missed the must be full or empty
bit).  So it looks like we'll have to actually search what 13 rows right?  So
still more rows than a b-tree, and again you are talking about binary searching
each row's blocks and then following its forward pointer down to the next one,
so maybe not 100's of times slower than a b-tree, but we're not talking about a
5-10% difference either, probably still measured in multiples.

> After realizing this, I have to be honest: I'm not finding your
> hand-wavey dismissal of the technology convincing at all. :)
> 

Hah my math was a little off I'll grant you but the basic idea still stands,
once we start getting into larger and larger rows we're going to have to binary
search just to find the right _block_ to use, which in my mind is much more of a
problem than having to binary search inside of multiple blocks.  At the very
least as the number of objects grows the time it takes to find something in the
tree increases exponentially.

> There's a *ton* of detail that they don't describe about how to actually
> translate the logical notion of power-of-two node sizes into coherent
> block IO that scales.  I don't doubt that it's possible.

I imagine there is, but based on what little information they've shown I don't
see how it's a hands down win against b-trees.  If anything we're talking about
having to solve really complex problems in order to get any sort of good
performance out of this thing.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 20:13       ` Josef Bacik
@ 2012-03-28 20:29         ` Zach Brown
  2012-03-28 20:44         ` Niels de Carpentier
  1 sibling, 0 replies; 12+ messages in thread
From: Zach Brown @ 2012-03-28 20:29 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Danny Piccirillo, linux-btrfs


> I imagine there is, but based on what little information they've shown
> I don't see how it's a hands down win against b-trees.  If anything
> we're talking about having to solve really complex problems in order
> to get any sort of good performance out of this thing.

Oh, absolutely.  Tack on COW and online repair and the complexity goes
through the roof.

- z

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 20:13       ` Josef Bacik
  2012-03-28 20:29         ` Zach Brown
@ 2012-03-28 20:44         ` Niels de Carpentier
  2012-03-28 20:53           ` Josef Bacik
  1 sibling, 1 reply; 12+ messages in thread
From: Niels de Carpentier @ 2012-03-28 20:44 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Zach Brown, Josef Bacik, Danny Piccirillo, linux-btrfs


>
> You are still going to have to have at least 29 levels to accomodate 1
> billion
> objects, though they won't all be full (sorry I missed the must be full or
> empty
> bit).  So it looks like we'll have to actually search what 13 rows right?
> So
> still more rows than a b-tree, and again you are talking about binary
> searching
> each row's blocks and then following its forward pointer down to the next
> one,
> so maybe not 100's of times slower than a b-tree, but we're not talking
> about a
> 5-10% difference either, probably still measured in multiples.

They say that they can do the block pointers in a way that limits the
number for searches per row to a constant, so it's not that bad. They do
less random seeks, but bigger ones at the cost of more cpu usage.
>
> Hah my math was a little off I'll grant you but the basic idea still
> stands,
> once we start getting into larger and larger rows we're going to have to
> binary
> search just to find the right _block_ to use, which in my mind is much
> more of a
> problem than having to binary search inside of multiple blocks.  At the
> very
> least as the number of objects grows the time it takes to find something
> in the
> tree increases exponentially.

That's not what they say. A lot of info is missing though.
>
>> There's a *ton* of detail that they don't describe about how to actually
>> translate the logical notion of power-of-two node sizes into coherent
>> block IO that scales.  I don't doubt that it's possible.
>
> I imagine there is, but based on what little information they've shown I
> don't
> see how it's a hands down win against b-trees.  If anything we're talking
> about
> having to solve really complex problems in order to get any sort of good
> performance out of this thing.  Thanks,

They don't claim to win hands down for the search case, they say they are
similar for the search case, but much better at inserts.

I don't think it's good for the linux fs general use case, but it's not as
bad as you think. But it will be very hard to implement anyway unless they
release more details.

Niels


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 20:44         ` Niels de Carpentier
@ 2012-03-28 20:53           ` Josef Bacik
  2012-03-28 21:14             ` Niels de Carpentier
  0 siblings, 1 reply; 12+ messages in thread
From: Josef Bacik @ 2012-03-28 20:53 UTC (permalink / raw)
  To: Niels de Carpentier
  Cc: Josef Bacik, Zach Brown, Danny Piccirillo, linux-btrfs

On Wed, Mar 28, 2012 at 10:44:03PM +0200, Niels de Carpentier wrote:
> 
> >
> > You are still going to have to have at least 29 levels to accomodate 1
> > billion
> > objects, though they won't all be full (sorry I missed the must be full or
> > empty
> > bit).  So it looks like we'll have to actually search what 13 rows right?
> > So
> > still more rows than a b-tree, and again you are talking about binary
> > searching
> > each row's blocks and then following its forward pointer down to the next
> > one,
> > so maybe not 100's of times slower than a b-tree, but we're not talking
> > about a
> > 5-10% difference either, probably still measured in multiples.
> 
> They say that they can do the block pointers in a way that limits the
> number for searches per row to a constant, so it's not that bad. They do
> less random seeks, but bigger ones at the cost of more cpu usage.

I'd like to see how they do that.  The fact is you are still going to get random
seeks since you have to binary search the blocks in an entire row since there is
no way you can read a several thousand block row into memory to search it, so
once your rows get pretty big you are doing just as much if not more random io
as a btree.

> >
> > Hah my math was a little off I'll grant you but the basic idea still
> > stands,
> > once we start getting into larger and larger rows we're going to have to
> > binary
> > search just to find the right _block_ to use, which in my mind is much
> > more of a
> > problem than having to binary search inside of multiple blocks.  At the
> > very
> > least as the number of objects grows the time it takes to find something
> > in the
> > tree increases exponentially.
> 
> That's not what they say. A lot of info is missing though.
> >
> >> There's a *ton* of detail that they don't describe about how to actually
> >> translate the logical notion of power-of-two node sizes into coherent
> >> block IO that scales.  I don't doubt that it's possible.
> >
> > I imagine there is, but based on what little information they've shown I
> > don't
> > see how it's a hands down win against b-trees.  If anything we're talking
> > about
> > having to solve really complex problems in order to get any sort of good
> > performance out of this thing.  Thanks,
> 
> They don't claim to win hands down for the search case, they say they are
> similar for the search case, but much better at inserts.
> 
> I don't think it's good for the linux fs general use case, but it's not as
> bad as you think. But it will be very hard to implement anyway unless they
> release more details.
>

I don't buy that its better in the insert case either for similar reasons, you
are talking about having to rewrite entire rows to maintain the sequential
nature of everything, so if you end up adding a giant row you have to go and
write the whole thing out again, so you are talking about a lot more IO than
with b-trees, albeit sequential.  So maybe it's a win since it's sequential but
I wonder at what tree size that benefit no longer exists.

Of course all this analysis is based on a power point presentation and there may
be some detail I'm missing, but that's mostly my point, claiming that b-trees
are now obsolete because we can maybe do inserts faster is just idiotic.
Thanks,

Josef 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 20:53           ` Josef Bacik
@ 2012-03-28 21:14             ` Niels de Carpentier
  2013-07-04 17:48               ` Kẏra
  0 siblings, 1 reply; 12+ messages in thread
From: Niels de Carpentier @ 2012-03-28 21:14 UTC (permalink / raw)
  To: linux-btrfs

> I'd like to see how they do that.  The fact is you are still going to get
> random
> seeks since you have to binary search the blocks in an entire row since
> there is
> no way you can read a several thousand block row into memory to search it,
> so
> once your rows get pretty big you are doing just as much if not more
> random io
> as a btree.

Why? The rows are sequential on disk, so you're never doing more random
seeks than the number of rows as long as you can search faster than the
disk can provide data. If they indeed can limit the number of searches per
row to a constant, it shouldn't be an issue at all.

> I don't buy that its better in the insert case either for similar reasons,
> you
> are talking about having to rewrite entire rows to maintain the sequential
> nature of everything, so if you end up adding a giant row you have to go
> and
> write the whole thing out again, so you are talking about a lot more IO
> than
> with b-trees, albeit sequential.  So maybe it's a win since it's
> sequential but
> I wonder at what tree size that benefit no longer exists.

The worst case might be bad, but on average it should be good. I don't
doubt it is always better on rotating disks, probably not on ssd's.
>
> Of course all this analysis is based on a power point presentation and
> there may
> be some detail I'm missing, but that's mostly my point, claiming that
> b-trees
> are now obsolete because we can maybe do inserts faster is just idiotic.

It's a presentation from a company, so lots of marketing. Of course it
doesn't make btrees obsolete, it's optimized for specific cases. Like they
say they reduce random seeks at the cost of disk bandwidth and cpu usage.
It depends on the use case if that is useful or not. Probably not for
btrfs, since the future will be ssd's, and meta data will generally be
cached.

Niels


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Fractal Tree Indexing over B-Trees?
  2012-03-28 21:14             ` Niels de Carpentier
@ 2013-07-04 17:48               ` Kẏra
  0 siblings, 0 replies; 12+ messages in thread
From: Kẏra @ 2013-07-04 17:48 UTC (permalink / raw)
  To: linux-btrfs

Niels de Carpentier <niels <at> decarpentier.com> writes:
> > I'd like to see how they do that.  The fact is you are still going to get
> > random
> > seeks since you have to binary search the blocks in an entire row since
> > there is
> > no way you can read a several thousand block row into memory to search it,
> > so
> > once your rows get pretty big you are doing just as much if not more
> > random io
> > as a btree.
> 
> Why? The rows are sequential on disk, so you're never doing more random
> seeks than the number of rows as long as you can search faster than the
> disk can provide data. If they indeed can limit the number of searches per
> row to a constant, it shouldn't be an issue at all.
> 
> > I don't buy that its better in the insert case either for similar reasons,
> > you
> > are talking about having to rewrite entire rows to maintain the sequential
> > nature of everything, so if you end up adding a giant row you have to go
> > and
> > write the whole thing out again, so you are talking about a lot more IO
> > than
> > with b-trees, albeit sequential.  So maybe it's a win since it's
> > sequential but
> > I wonder at what tree size that benefit no longer exists.
> 
> The worst case might be bad, but on average it should be good. I don't
> doubt it is always better on rotating disks, probably not on ssd's.
> >
> > Of course all this analysis is based on a power point presentation and
> > there may
> > be some detail I'm missing, but that's mostly my point, claiming that
> > b-trees
> > are now obsolete because we can maybe do inserts faster is just idiotic.
> 
> It's a presentation from a company, so lots of marketing. Of course it
> doesn't make btrees obsolete, it's optimized for specific cases. Like they
> say they reduce random seeks at the cost of disk bandwidth and cpu usage.
> It depends on the use case if that is useful or not. Probably not for
> btrfs, since the future will be ssd's, and meta data will generally be
> cached.
> 
> Niels
> 

I'm reviving a very old thread from here:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/16484

I found a bunch of more recent info from the company doing most of the work
behind fractal tree FS that I'd love to hear your thoughts on.

https://www.youtube.com/watch?v=c-n2LGPpQEw
http://www.tokutek.com/2013/02/mongodb-fractal-tree-indexes-high-compression
https://www.usenix.org/conference/hotstorage12/tokufs-streaming-file-system

Does this address earlier concerns at all? 

I just hope their work will be
released as unpatented GPL-compatible free software. 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-07-04 17:50 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-28 14:25 Fractal Tree Indexing over B-Trees? Danny Piccirillo
2012-03-28 15:10 ` C Anthony Risinger
2012-03-28 18:42 ` Josef Bacik
2012-03-28 18:45   ` Jeff Mahoney
2012-03-28 18:57   ` Josef Bacik
2012-03-28 19:50     ` Zach Brown
2012-03-28 20:13       ` Josef Bacik
2012-03-28 20:29         ` Zach Brown
2012-03-28 20:44         ` Niels de Carpentier
2012-03-28 20:53           ` Josef Bacik
2012-03-28 21:14             ` Niels de Carpentier
2013-07-04 17:48               ` Kẏra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).