Split Trees (my design, similar to binary treaps)

All of lore.kernel.org
 help / color / mirror / Atom feed

* Split Trees (my design, similar to binary treaps)
@ 2003-05-08  4:22 Kevin Bealer
  2003-05-22 15:30 ` Hans Reiser
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Bealer @ 2003-05-08  4:22 UTC (permalink / raw)
  To: Hans Reiser, reiserfs-list; +Cc: jmacd, kevinbealer

Here is how 'Split Trees' work.

Each node has four components; Left, Right,
Value, and Level.  Every operation is the same
as a 'simple' binary tree except insert and
delete.

Need to preserve two invariants:
  1. Basic binary tree ordering invariant.
  2. No node's level is higher than its parent's.

Insert:

Basically, you select a 'level' for each node,
the same way you would in a skip list, i.e. half
the nodes are level 0, half the remaining are
level 1; 2^32 nodes implies 32 different levels.
No 'maximum level' has to be chosen initially.

If the node level is zero, insertion is exactly
the same as a 'simple' binary tree - traverse to
the bottom of the tree and attach to one side of
a bottom level leaf node.

Insertions of other nodes are more complex,
because that is where balancing is done.

Given a new node with value V and level N,
traverse down the tree until you get to a node
whose level is LESS than N.

To preserve the invariant, insert the new node
ABOVE the node which is of a lower level.

(If you look at only the subset of nodes of the
same or higher level as the new node, they form
a normal binary tree, rooted at the "top" node.
We are inserting a new leaf a the bottom of this
'subset' tree.)

Now we have the problem: what to do with the
node which we are inserting ABOVE.  We split it
into two halves - nodes which are less than (or
equal to) the new node in value, and nodes which
are higher in value.  This preserves the order
of the tree of course.

Suprisingly, splitting a binary tree is very
easy and can be done in O(ln n) time.

The split algorithm is conceptually simple.
There are two sides of the subtree which is
being split, the left half, and the right half,
with a jagged line down the middle.  This jagged
line will follow the pointers (edges) which we
would follow if we were to insert the new node
at the actual bottom-level leaf.

Now I introduce several terms: the left-receiver
and the right-receiver.  The left receiver is
where we attach nodes with values less than or
equal to V and the right is where we attach
nodes with values greater than V.  OLDTREE is
the tree we are splitting (it was attached where
newnode is now attached).  The 'disputed area'
is the subtree which contains nodes that may
belong in the left or right reciever.

At first, the left receiver is the Left side of
the NEWNODE, the right receiver is the RIGHT
side.  We attach the OLDTREE to whichever side
it belongs to (based on its value).

If OLDTREE->value is less than V, attach OLDTREE
to NEWNODE->left.  Then we need to split the new
'disputed area', which is OLDTREE->right, over
the new left and right receivers.  The right
receiver is still the same, but now the left
receiver is OLDTREE->right.

If the OLDTREE->value is greater than V, we do
the symmetric operation (switch R & L in the
paragraph above).

When you reach a NULL value for OLDTREE, you are
done - the division is complete.

There is a tiny wrinkle -- At first the left
receiver is the "left" pointer of NEWNODE, but
once a left-attachment is made, it is always a
"right" pointer, and vice versa.  Therefore,
there are four almost-identical "Split"
functions.
SplitLR() - calls down to SplitRR if it makes a
    Left attachment, otherwise calls SplitLL.

SplitRR() - recurses to SplitRR if it makes a
    Left attachment, otherwise calls SplitRL.

SplitLL() - calls SplitRL if it makes a Right
    attachment, otherwise recurses to SplitRL.

SplitRL() - always recurses to itself, because
    an attachment had been made to both sides
    of NEWNODE.

Actually, in the code the recursion has mostly
been removed because it was so easy to do.

Delete:

I haven't written the code for this yet, but
it should be trivial.

Deletion is basically the same as with a
'simple' binary tree.  Basically, replace the
node with the highest-valued node not higher
than V; or with the lowest-valued node not lower
than V.

The only trick is that you need to promote the
moved node to whatever level the deleted node
was, by assigning the deleted node's level to
its 'level' variable.

After a lot of deletion, the tree could become
biased with a lot of high level nodes.  This
is easy to prevent - when traversing the tree
during deletion, if you see a leaf node, lower
its level to zero (which is always a legal thing
to do).

For non-leaf nodes, demote them by one level if
they are 2 or more levels above both children.
This will 'comb out' the inequality, in an
amortized way.

I'll follow up with the code and an example
in two more emails.

Kevin

__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-08  4:22 Split Trees (my design, similar to binary treaps) Kevin Bealer
@ 2003-05-22 15:30 ` Hans Reiser
  2003-05-22 16:09   ` Alexander G. M. Smith
  2003-05-27  1:38   ` Enrique Perez-Terron
  0 siblings, 2 replies; 24+ messages in thread
From: Hans Reiser @ 2003-05-22 15:30 UTC (permalink / raw)
  To: Kevin Bealer; +Cc: reiserfs-list, jmacd

Forgive me for asking some bonehead questions.....

Are skip lists height balanced?  (All paths to the leaves are equal?)

Kevin Bealer wrote:

>Here is how 'Split Trees' work.
>
>Each node has four components; Left, Right,
>Value, and Level.  Every operation is the same
>as a 'simple' binary tree except insert and
>delete.
>
>Need to preserve two invariants:
>  1. Basic binary tree ordering invariant.
>  2. No node's level is higher than its parent's.
>
>Insert:
>
>Basically, you select a 'level' for each node,
>the same way you would in a skip list, i.e. half
>the nodes are level 0, half the remaining are
>level 1; 2^32 nodes implies 32 different levels.
>
How do you determine which level a node is?

>No 'maximum level' has to be chosen initially.
>
>If the node level is zero, insertion is exactly
>the same as a 'simple' binary tree - traverse to
>the bottom of the tree and attach to one side of
>a bottom level leaf node.
>
>Insertions of other nodes are more complex,
>because that is where balancing is done.
>
>Given a new node with value V and level N,
>traverse down the tree until you get to a node
>whose level is LESS than N.
>
>To preserve the invariant, insert the new node
>ABOVE the node which is of a lower level.
>
Interesting.  Also, this suggests that skip lists are not height 
balanced.....?

What is the average fan-out?  Is all data in leaves and are all pointers 
in internal nodes?

>
>(If you look at only the subset of nodes of the
>same or higher level as the new node, they form
>a normal binary tree, rooted at the "top" node.
>We are inserting a new leaf a the bottom of this
>'subset' tree.)
>
>
>Now we have the problem: what to do with the
>node which we are inserting ABOVE.  We split it
>into two halves - nodes which are less than (or
>equal to) the new node in value, and nodes which
>are higher in value.  This preserves the order
>of the tree of course.
>
>Suprisingly, splitting a binary tree is very
>easy and can be done in O(ln n) time.
>
Why does splitting of a tree not a node occur?

>
>The split algorithm is conceptually simple.
>There are two sides of the subtree which is
>being split, the left half, and the right half,
>with a jagged line down the middle.  This jagged
>line will follow the pointers (edges) which we
>would follow if we were to insert the new node
>at the actual bottom-level leaf.
>
>Now I introduce several terms: the left-receiver
>and the right-receiver.  The left receiver is
>where we attach nodes with values less than or
>equal to V and the right is where we attach
>nodes with values greater than V.  OLDTREE is
>the tree we are splitting (it was attached where
>newnode is now attached).  The 'disputed area'
>is the subtree which contains nodes that may
>belong in the left or right reciever.
>
>
>At first, the left receiver is the Left side of
>the NEWNODE, the right receiver is the RIGHT
>side.  We attach the OLDTREE to whichever side
>it belongs to (based on its value).
>
>If OLDTREE->value is less than V, attach OLDTREE
>to NEWNODE->left.  Then we need to split the new
>'disputed area', which is OLDTREE->right, over
>the new left and right receivers.  The right
>receiver is still the same, but now the left
>receiver is OLDTREE->right.
>
>If the OLDTREE->value is greater than V, we do
>the symmetric operation (switch R & L in the
>paragraph above).
>
>When you reach a NULL value for OLDTREE, you are
>done - the division is complete.
>
>There is a tiny wrinkle -- At first the left
>receiver is the "left" pointer of NEWNODE, but
>once a left-attachment is made, it is always a
>"right" pointer, and vice versa.  Therefore,
>there are four almost-identical "Split"
>functions.
>SplitLR() - calls down to SplitRR if it makes a
>    Left attachment, otherwise calls SplitLL.
>
>SplitRR() - recurses to SplitRR if it makes a
>    Left attachment, otherwise calls SplitRL.
>
>SplitLL() - calls SplitRL if it makes a Right
>    attachment, otherwise recurses to SplitRL.
>
>SplitRL() - always recurses to itself, because
>    an attachment had been made to both sides
>    of NEWNODE.
>
>Actually, in the code the recursion has mostly
>been removed because it was so easy to do.
>
>
>Delete:
>
>I haven't written the code for this yet, but
>it should be trivial.
>
>Deletion is basically the same as with a
>'simple' binary tree.  Basically, replace the
>node with the highest-valued node not higher
>than V; or with the lowest-valued node not lower
>than V.
>
>The only trick is that you need to promote the
>moved node to whatever level the deleted node
>was, by assigning the deleted node's level to
>its 'level' variable.
>
>After a lot of deletion, the tree could become
>biased with a lot of high level nodes.  This
>is easy to prevent - when traversing the tree
>during deletion, if you see a leaf node, lower
>its level to zero (which is always a legal thing
>to do).
>
>For non-leaf nodes, demote them by one level if
>they are 2 or more levels above both children.
>This will 'comb out' the inequality, in an
>amortized way.
>
>I'll follow up with the code and an example
>in two more emails.
>
>
>Kevin
>
>
>
>__________________________________
>Do you Yahoo!?
>The New Yahoo! Search - Faster. Easier. Bingo.
>http://search.yahoo.com
>
>
>  
>


-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 15:30 ` Hans Reiser
@ 2003-05-22 16:09   ` Alexander G. M. Smith
  2003-05-22 17:16     ` Hans Reiser
  2003-05-27  1:38   ` Enrique Perez-Terron
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander G. M. Smith @ 2003-05-22 16:09 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

Hans Reiser wrote on Thu, 22 May 2003 19:30:16 +0400:
> Forgive me for asking some bonehead questions.....
> 
> Are skip lists height balanced?  (All paths to the leaves are equal?)

Only on the average, at least in the original skip list algorithm.  The
height is determined randomly when the node is created; there is no
rebalancing.  The random distribution is defined as you would expect,
50% of the nodes have 1 level, 25% have 2 levels, ... 0.5**n of the
nodes have n levels.  Surprisingly, this works fairly well.

- Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 16:09   ` Alexander G. M. Smith
@ 2003-05-22 17:16     ` Hans Reiser
  2003-05-22 20:21       ` Alexander G. M. Smith
  2003-05-23 19:24       ` Kevin Bealer
  0 siblings, 2 replies; 24+ messages in thread
From: Hans Reiser @ 2003-05-22 17:16 UTC (permalink / raw)
  To: Alexander G. M. Smith; +Cc: reiserfs-list

Alexander G. M. Smith wrote:

>Hans Reiser wrote on Thu, 22 May 2003 19:30:16 +0400:
>  
>
>>Forgive me for asking some bonehead questions.....
>>
>>Are skip lists height balanced?  (All paths to the leaves are equal?)
>>    
>>
>
>Only on the average, at least in the original skip list algorithm.  The
>height is determined randomly when the node is created; there is no
>rebalancing.  The random distribution is defined as you would expect,
>50% of the nodes have 1 level, 25% have 2 levels, ... 0.5**n of the
>nodes have n levels.  Surprisingly, this works fairly well.
>
>- Alex
>
>
>  
>
To have a level means to be on that level or lower?

-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 17:16     ` Hans Reiser
@ 2003-05-22 20:21       ` Alexander G. M. Smith
  2003-05-22 22:48         ` Hans Reiser
  2003-05-23 19:24       ` Kevin Bealer
  1 sibling, 1 reply; 24+ messages in thread
From: Alexander G. M. Smith @ 2003-05-22 20:21 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

Hans Reiser wrote on Thu, 22 May 2003 21:16:15 +0400:
> Alexander G. M. Smith wrote:
> >Only on the average, at least in the original skip list algorithm.  The
> >height is determined randomly when the node is created; there is no
> >rebalancing.  The random distribution is defined as you would expect,
> >50% of the nodes have 1 level, 25% have 2 levels, ... 0.5**n of the
> >nodes have n levels.  Surprisingly, this works fairly well.
>
> To have a level means to be on that level or lower?

Yes, at least for skip lists.  A level n node has n next pointers
(or maybe n+1 if you are counting starting at zero) in the skip list.
All nodes have level 0 next node pointers (same as an ordinary linked
list).  Half of them have level 0 and level 1 pointers (level 1 pointers
point to the next node that is level 1 or higher, skipping past level 0
nodes - that's why it's a "skip list").  A quarter have level 2, 1, and 0
pointers.  In general, when you create a node, randomly determine its
level (rarely is infinite :-), allocate an array of pointers that big,
and thread it into the 0 to n different parallel level lists.  Of course,
the list head/tail endpoints have to have all n levels.

- Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 20:21       ` Alexander G. M. Smith
@ 2003-05-22 22:48         ` Hans Reiser
  2003-05-22 23:27           ` Alexander G. M. Smith
  0 siblings, 1 reply; 24+ messages in thread
From: Hans Reiser @ 2003-05-22 22:48 UTC (permalink / raw)
  To: Alexander G. M. Smith; +Cc: reiserfs-list

Alexander G. M. Smith wrote:

>Hans Reiser wrote on Thu, 22 May 2003 21:16:15 +0400:
>  
>
>>Alexander G. M. Smith wrote:
>>    
>>
>>>Only on the average, at least in the original skip list algorithm.  The
>>>height is determined randomly when the node is created; there is no
>>>rebalancing.  The random distribution is defined as you would expect,
>>>50% of the nodes have 1 level, 25% have 2 levels, ... 0.5**n of the
>>>nodes have n levels.  Surprisingly, this works fairly well.
>>>      
>>>
>>To have a level means to be on that level or lower?
>>    
>>
>
>Yes, at least for skip lists.  A level n node has n next pointers
>(or maybe n+1 if you are counting starting at zero) in the skip list.
>All nodes have level 0 next node pointers (same as an ordinary linked
>list).  Half of them have level 0 and level 1 pointers (level 1 pointers
>point to the next node that is level 1 or higher, skipping past level 0
>nodes - that's why it's a "skip list").  A quarter have level 2, 1, and 0
>pointers.  In general, when you create a node, randomly determine its
>level (rarely is infinite :-), allocate an array of pointers that big,
>and thread it into the 0 to n different parallel level lists.  Of course,
>the list head/tail endpoints have to have all n levels.
>
>- Alex
>
>
>  
>
Do nodes contain both pointers and data?

-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 22:48         ` Hans Reiser
@ 2003-05-22 23:27           ` Alexander G. M. Smith
  2003-05-23 18:56             ` Kevin Bealer
  0 siblings, 1 reply; 24+ messages in thread
From: Alexander G. M. Smith @ 2003-05-22 23:27 UTC (permalink / raw)
  To: Hans Reiser; +Cc: reiserfs-list

Hans Reiser wrote on Fri, 23 May 2003 02:48:05 +0400:
> Do nodes contain both pointers and data?

Yes, that's a common way of doing skip lists.  Or the node could contain
the level pointers and a data pointer.  The nodes are also awkwardly
variable sized due to the random number of levels.  That leads to
allocation fragmentation unless extra steps are taken.

I don't know how the split trees approach does it.

- Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 23:27           ` Alexander G. M. Smith
@ 2003-05-23 18:56             ` Kevin Bealer
  2003-06-02 17:33               ` Hans Reiser
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Bealer @ 2003-05-23 18:56 UTC (permalink / raw)
  To: Alexander G. M. Smith, Hans Reiser; +Cc: reiserfs-list


As Alexander said, data can be in the node or
elsewhere in
either a skip list or skip tree.  Skip List nodes are
variable
sized; skip tree nodes are uniform, three pointers +
data.
(Unless the data is variable sized).

Skip Lists look like:

XX-----X-----X
X--X--X--X--X--X--X--X


--- "Alexander G. M. Smith" <agmsmith@rogers.com>
wrote:
> Hans Reiser wrote on Fri, 23 May 2003 02:48:05
> +0400:
> > Do nodes contain both pointers and data?
> 
> Yes, that's a common way of doing skip lists.  Or
> the node could contain
> the level pointers and a data pointer.  The nodes
> are also awkwardly
> variable sized due to the random number of levels. 
> That leads to
> allocation fragmentation unless extra steps are
> taken.
> 
> I don't know how the split trees approach does it.
> 
> - Alex


__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 17:16     ` Hans Reiser
  2003-05-22 20:21       ` Alexander G. M. Smith
@ 2003-05-23 19:24       ` Kevin Bealer
  2003-06-03 17:49         ` Hans Reiser
  1 sibling, 1 reply; 24+ messages in thread
From: Kevin Bealer @ 2003-05-23 19:24 UTC (permalink / raw)
  To: Hans Reiser, Alexander G. M. Smith; +Cc: reiserfs-list

Sorry, last message was accidentally sent prematurely.

To continue:

Skip lists (turned 90 degrees) look like:

topnode--x------x
C0
A1-----A1-----A1
C2
C3
B4-----B4
C5
A6-----A6-----A6
C7
C8
B9-----B9
C11
A12----A12----A12

With pointers vertically between all aligned letters
(conceptually at least).

Now here is the reason why SkipTrees do NOT need
variable sized nodes:

The Pointer from A1 to A6 is used, so is the pointer
from A1 to B4 and from B4 to C5.

However, the pointers from C2 and C3 to B4, B4 to A6,
and B9 to A12 are NEVER USED for example.  You are
sacrificing
pointers to get to a binary tree, but the pointers you
lose
are the pointers you don't need anyway.

In SkipList, you need those pointers only to copy
their value to
other pointers which will not be used.  The nodes
can't shrink,
because an insertion changes which pointers are used
and which
are not.

If * is a wildcard, the rule is:

No pointer from C* to B* or B* to A* is ever used.  If
you were
going to get to A6, you would have gotten there while
traversing
the "A" list, and if you are at the C list level, you
have already left 
the A and B lists behind.

To convert a Skip List to a Skip Tree, note that every
decision
you need to make is based on a NODE->VALUE comparison.
After each decision, processing can go in two
directions, so there
are two "edges" from that decision.

Decisions become binary tree nodes.  EDGES become left
and
right pointers.

The first decision in following the skip list is "Is
the search
value greater than "1" (i.e. node A1's value is 1). 
Thus the
top node of the Skip Tree is A1.  If the value is less
than 1,
we look at C0.  Thus C0 is on the left hand side of
A1.

Just traverse the skip list, and wherever you make a
decision
based on a value, there is a node.  Note that this
skip list is
not really balanced.  Due to the extra pointers, the
policy is
normally to have FEWER high-level nodes in a Skip List
than
would be predicted by the (1/2, 1/4, 1/8) progression
from the
bottom up.  Usually 3/4 on the bottom level, then 3/4
of the
remaining on the next level, etc.

With a skip tree, we actually want the balanced
version, i.e.
ratios of 1/2 on each level, although other ratios can
be used.

Kevin

--- Hans Reiser <reiser@namesys.com> wrote:
> Alexander G. M. Smith wrote:
> 
> >Hans Reiser wrote on Thu, 22 May 2003 19:30:16
> +0400:
> >  
> >
> >>Forgive me for asking some bonehead questions.....
> >>
> >>Are skip lists height balanced?  (All paths to the
> leaves are equal?)
> >>    
> >>
> >
> >Only on the average, at least in the original skip
> list algorithm.  The
> >height is determined randomly when the node is
> created; there is no
> >rebalancing.  The random distribution is defined as
> you would expect,
> >50% of the nodes have 1 level, 25% have 2 levels,
> ... 0.5**n of the
> >nodes have n levels.  Surprisingly, this works
> fairly well.
> >
> >- Alex
> >
> >
> >  
> >
> To have a level means to be on that level or lower?
> 
> -- 
> Hans
> 
> 

__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-22 15:30 ` Hans Reiser
  2003-05-22 16:09   ` Alexander G. M. Smith
@ 2003-05-27  1:38   ` Enrique Perez-Terron
  2003-05-27 20:47     ` Hans Reiser
  2003-05-28 16:44     ` Kevin Bealer
  1 sibling, 2 replies; 24+ messages in thread
From: Enrique Perez-Terron @ 2003-05-27  1:38 UTC (permalink / raw)
  To: reiserfs-list; +Cc: kevinbealer

Hello,

Prompted by this discussion I wrote some code to make sense of the
structures described, and they work wonderfully.

However, I found that in the split trees it is not necessary to
manipulate the levels of the remaining nodes after a deletion. It is
possible to reverse the effects of the split that occurs during
insertion; it is quite easy.

The reversal procedure merges two non-overlapping trees, the left and
right subtrees of the node that was removed. By non-overlapping I mean
that the largest key in the tree with small keys is smaller than the
smallest key in the other tree.

The procedure retains the desirable property of working downwards and
not looking back, so two processes could lock and update different parts
of the tree and propagate effects downwards, automatically avoiding
deadlocks. The procedure also avoids having to traverse entire subtrees
to update levels. It only touches nodes along the left edge of the right
subtree, and along the right edge of the left subtree. This should be
O(ln n).

The code can be found at http://home.online.no/~enrio/b3.c

Regards, Enrique

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-27  1:38   ` Enrique Perez-Terron
@ 2003-05-27 20:47     ` Hans Reiser
  2003-05-28 16:44     ` Kevin Bealer
  1 sibling, 0 replies; 24+ messages in thread
From: Hans Reiser @ 2003-05-27 20:47 UTC (permalink / raw)
  To: Enrique Perez-Terron; +Cc: reiserfs-list, kevinbealer

I am traveling, and so I will not participate in this thread for 7-10 
days....

-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-27  1:38   ` Enrique Perez-Terron
  2003-05-27 20:47     ` Hans Reiser
@ 2003-05-28 16:44     ` Kevin Bealer
  1 sibling, 0 replies; 24+ messages in thread
From: Kevin Bealer @ 2003-05-28 16:44 UTC (permalink / raw)
  To: Enrique Perez-Terron; +Cc: reiserfs-list

Sounds great.  I thought that merging as you describe
might be possible but never figured out the details. 
(I haven't looked at
your code, I'll assume for now all works as
described).

Have you tested performance of merge-delete versus
simple
deletion?

Note that in my (simple delete with adjustments)
version you don't need to modify levels of the whole
tree, only the nodes on the path to the one you are
moving.  Asymptotically, this should keep the tree on
the correct average level.

If you don't adjust levels at all, I think the
performance cost would be small in practice.  But I
never actually implemented the delete in my version.

Kevin Bealer

--- Enrique Perez-Terron
<enrique.perez-terron@norway.online.no> wrote:
> 
> Hello,
> 
> Prompted by this discussion I wrote some code to
> make sense of the
> structures described, and they work wonderfully.
> 
> However, I found that in the split trees it is not
> necessary to
> manipulate the levels of the remaining nodes after a
> deletion. It is
> possible to reverse the effects of the split that
> occurs during
> insertion; it is quite easy.
> 
> The reversal procedure merges two non-overlapping
> trees, the left and
> right subtrees of the node that was removed. By
> non-overlapping I mean
> that the largest key in the tree with small keys is
> smaller than the
> smallest key in the other tree.
> 
> The procedure retains the desirable property of
> working downwards and
> not looking back, so two processes could lock and
> update different parts
> of the tree and propagate effects downwards,
> automatically avoiding
> deadlocks. The procedure also avoids having to
> traverse entire subtrees
> to update levels. It only touches nodes along the
> left edge of the right
> subtree, and along the right edge of the left
> subtree. This should be
> O(ln n).
> 
> The code can be found at
> http://home.online.no/~enrio/b3.c
> 
> Regards, Enrique
> 

__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-23 18:56             ` Kevin Bealer
@ 2003-06-02 17:33               ` Hans Reiser
  2003-06-04  4:54                 ` Kevin Bealer
  0 siblings, 1 reply; 24+ messages in thread
From: Hans Reiser @ 2003-06-02 17:33 UTC (permalink / raw)
  To: Kevin Bealer, ReiserFS

Kevin Bealer wrote:

>As Alexander said, data can be in the node or
>elsewhere in
>either a skip list or skip tree.  Skip List nodes are
>variable
>sized; skip tree nodes are uniform, three pointers +
>data.
>(Unless the data is variable sized).
>
>Skip Lists look like:
>
>XX-----X-----X
>X--X--X--X--X--X--X--X
>
>
>--- "Alexander G. M. Smith" <agmsmith@rogers.com>
>wrote:
>  
>
>>Hans Reiser wrote on Fri, 23 May 2003 02:48:05
>>+0400:
>>    
>>
>>>Do nodes contain both pointers and data?
>>>      
>>>
>>Yes, that's a common way of doing skip lists.  Or
>>the node could contain
>>the level pointers and a data pointer.  The nodes
>>are also awkwardly
>>variable sized due to the random number of levels. 
>>That leads to
>>allocation fragmentation unless extra steps are
>>taken.
>>
>>I don't know how the split trees approach does it.
>>
>>- Alex
>>    
>>
>
>
>__________________________________
>Do you Yahoo!?
>The New Yahoo! Search - Faster. Easier. Bingo.
>http://search.yahoo.com
>
>
>  
>
So the fanout of a skip tree is 3?  (Just got back from Italy.....)

-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-05-23 19:24       ` Kevin Bealer
@ 2003-06-03 17:49         ` Hans Reiser
  2003-06-04  5:17           ` Enrique Perez-Terron
  2003-06-04  5:22           ` Kevin Bealer
  0 siblings, 2 replies; 24+ messages in thread
From: Hans Reiser @ 2003-06-03 17:49 UTC (permalink / raw)
  To: Kevin Bealer; +Cc: Alexander G. M. Smith, reiserfs-list

Kevin Bealer wrote:

>Sorry, last message was accidentally sent prematurely.
>
>To continue:
>
>Skip lists (turned 90 degrees) look like:
>
>topnode--x------x
>C0
>A1-----A1-----A1
>C2
>C3
>B4-----B4
>C5
>A6-----A6-----A6
>C7
>C8
>B9-----B9
>C11
>A12----A12----A12
>
>With pointers vertically between all aligned letters
>(conceptually at least).
>
I don't understand the diagram above, which means I don't understand the 
rest.  I think you are explaining something you understand too well to 
explain....;-)

>
>Now here is the reason why SkipTrees do NOT need
>variable sized nodes:
>
>The Pointer from A1 to A6 is used, so is the pointer
>from A1 to B4 and from B4 to C5.
>
>However, the pointers from C2 and C3 to B4, B4 to A6,
>and B9 to A12 are NEVER USED for example.  You are
>sacrificing
>pointers to get to a binary tree, but the pointers you
>lose
>are the pointers you don't need anyway.
>
>In SkipList, you need those pointers only to copy
>their value to
>other pointers which will not be used.  The nodes
>can't shrink,
>because an insertion changes which pointers are used
>and which
>are not.
>
>If * is a wildcard, the rule is:
>
>No pointer from C* to B* or B* to A* is ever used.  If
>you were
>going to get to A6, you would have gotten there while
>traversing
>the "A" list, and if you are at the C list level, you
>have already left 
>the A and B lists behind.
>
>To convert a Skip List to a Skip Tree, note that every
>decision
>you need to make is based on a NODE->VALUE comparison.
>After each decision, processing can go in two
>directions, so there
>are two "edges" from that decision.
>
>Decisions become binary tree nodes.  EDGES become left
>and
>right pointers.
>
>The first decision in following the skip list is "Is
>the search
>value greater than "1" (i.e. node A1's value is 1). 
>Thus the
>top node of the Skip Tree is A1.  If the value is less
>than 1,
>we look at C0.  Thus C0 is on the left hand side of
>A1.
>
>Just traverse the skip list, and wherever you make a
>decision
>based on a value, there is a node.  Note that this
>skip list is
>not really balanced.  Due to the extra pointers, the
>policy is
>normally to have FEWER high-level nodes in a Skip List
>than
>would be predicted by the (1/2, 1/4, 1/8) progression
>from the
>bottom up.  Usually 3/4 on the bottom level, then 3/4
>of the
>remaining on the next level, etc.
>
>With a skip tree, we actually want the balanced
>version, i.e.
>ratios of 1/2 on each level, although other ratios can
>be used.
>
>Kevin
>
>--- Hans Reiser <reiser@namesys.com> wrote:
>  
>
>>Alexander G. M. Smith wrote:
>>
>>    
>>
>>>Hans Reiser wrote on Thu, 22 May 2003 19:30:16
>>>      
>>>
>>+0400:
>>    
>>
>>> 
>>>
>>>      
>>>
>>>>Forgive me for asking some bonehead questions.....
>>>>
>>>>Are skip lists height balanced?  (All paths to the
>>>>        
>>>>
>>leaves are equal?)
>>    
>>
>>>>   
>>>>
>>>>        
>>>>
>>>Only on the average, at least in the original skip
>>>      
>>>
>>list algorithm.  The
>>    
>>
>>>height is determined randomly when the node is
>>>      
>>>
>>created; there is no
>>    
>>
>>>rebalancing.  The random distribution is defined as
>>>      
>>>
>>you would expect,
>>    
>>
>>>50% of the nodes have 1 level, 25% have 2 levels,
>>>      
>>>
>>... 0.5**n of the
>>    
>>
>>>nodes have n levels.  Surprisingly, this works
>>>      
>>>
>>fairly well.
>>    
>>
>>>- Alex
>>>
>>>
>>> 
>>>
>>>      
>>>
>>To have a level means to be on that level or lower?
>>
>>-- 
>>Hans
>>
>>
>>    
>>
>
>
>__________________________________
>Do you Yahoo!?
>The New Yahoo! Search - Faster. Easier. Bingo.
>http://search.yahoo.com
>
>
>  
>


-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-02 17:33               ` Hans Reiser
@ 2003-06-04  4:54                 ` Kevin Bealer
  0 siblings, 0 replies; 24+ messages in thread
From: Kevin Bealer @ 2003-06-04  4:54 UTC (permalink / raw)
  To: Hans Reiser, ReiserFS


Sorry, I misspoke.  A SkipTree contains two pointers
down, a number from 1 to L, plus the data.  Fanout is
two, the 'third pointer' is really L, which probably
won't have to be larger than one byte (but would be
aligned to pointer size in practice.)

Normally, N would be no more than the log based 2 of
the max number of elements.  One byte should therefore
be enough for 2^256 elements.

Kevin

--- Hans Reiser <reiser@namesys.com> wrote:
> Kevin Bealer wrote:
> 
> >As Alexander said, data can be in the node or
> >elsewhere in
> >either a skip list or skip tree.  Skip List nodes
> are
> >variable
> >sized; skip tree nodes are uniform, three pointers
> +
> >data.
> >(Unless the data is variable sized).
> >
> >Skip Lists look like:
> >
> >XX-----X-----X
> >X--X--X--X--X--X--X--X
> >
> >
> >--- "Alexander G. M. Smith" <agmsmith@rogers.com>
> >wrote:
> >  
> >
> >>Hans Reiser wrote on Fri, 23 May 2003 02:48:05
> >>+0400:
> >>    
> >>
> >>>Do nodes contain both pointers and data?
> >>>      
> >>>
> >>Yes, that's a common way of doing skip lists.  Or
> >>the node could contain
> >>the level pointers and a data pointer.  The nodes
> >>are also awkwardly
> >>variable sized due to the random number of levels.
> 
> >>That leads to
> >>allocation fragmentation unless extra steps are
> >>taken.
> >>
> >>I don't know how the split trees approach does it.
> >>
> >>- Alex
> >>    
> >>
> >
> >
> >__________________________________
> >Do you Yahoo!?
> >The New Yahoo! Search - Faster. Easier. Bingo.
> >http://search.yahoo.com
> >
> >
> >  
> >
> So the fanout of a skip tree is 3?  (Just got back
> from Italy.....)
> 
> -- 
> Hans
> 
> 


__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-03 17:49         ` Hans Reiser
@ 2003-06-04  5:17           ` Enrique Perez-Terron
  2003-06-04  9:40             ` Hans Reiser
  2003-06-04  5:22           ` Kevin Bealer
  1 sibling, 1 reply; 24+ messages in thread
From: Enrique Perez-Terron @ 2003-06-04  5:17 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Kevin Bealer, Alexander G. M. Smith, reiserfs-list

On Tue, 2003-06-03 at 19:49, Hans Reiser wrote:
> Kevin Bealer wrote:

> >Skip lists (turned 90 degrees) look like:
> >
> >topnode--x------x
> >C0
> >A1-----A1-----A1
> >C2
> >C3
> >B4-----B4
> >C5
> >A6-----A6-----A6
> >C7
> >C8
> >B9-----B9
> >C11
> >A12----A12----A12
> >
> >With pointers vertically between all aligned letters
> >(conceptually at least).

> I don't understand the diagram above, which means I don't understand the 
> rest.  I think you are explaining something you understand too well to 
> explain....;-)

This is a skip list. 

Each line in the diagram represents a single node.

Each node is given a randomly chosen "level", a small number. A node
never changes level in its lifetime. Diagram lines with three items,
like A6---A6---A6, are level three nodes. All level three nodes are here
shown with the letter A, level two nodes with B, etc.

The vertical order of the node is that of the keys, i.e., the nodes are
shown sorted by key. The number 6 in "A6" corresponds to key number 6.

There is potentially no limit to the number of levels. Practically, it
is limited by the implementors expectation of maximal number of nodes.
(If the list grows larger than expected the skip list will still work,
but perform gradually worse than it would with a higher number of
levels.)

The columns of the diagram represent linked lists. The rightmost linked
list links only the A nodes. The middle linked list links the A and the
B nodes. Nodes of level n are members of n distinct lists. The order of
the nodes in the lists is that of the keys.

By the way the levels are chosen, there will normally be just 1-3 nodes
in the rightmost list. This list could be compared to the root node of a
B-tree. The next lower level list has on average twice as many nodes.
(Actually, this is a design parameter.) This means that walking down the
second list, starting from an A node, there are on average 1 B-node
before the next A node. Since the assignment of levels is random there
is no upper limit, but more than three happens only once in 16 cases,
etc.

To locate a particular node, say C5, by key, you could scan the leftmost
list linearly, but that would be O(N). Instead you can start scanning
the rightmost list. When you reach A6 you notice that you are too far,
so you back up to A1. Then you continue down the middle list, etc. This
traversal is actually quite similar to B-tree traversal, where scanning
the B nodes between A1 and A6 corresponds to scanning a B-tree node
below the root, and scanning the Cs between two Bs corresponds to
scanning a B-tree node two levels below the root.

> >
> >Now here is the reason why SkipTrees do NOT need
> >variable sized nodes:
> >
> >The Pointer from A1 to A6 is used, so is the pointer
> >from A1 to B4 and from B4 to C5.
> >
> >However, the pointers from C2 and C3 to B4, B4 to A6,
> >and B9 to A12 are NEVER USED for example.  You are

Well, from C2 to C3 _is_ used, since there is no B node between them.

> >sacrificing
> >pointers to get to a binary tree, but the pointers you
> >lose
> >are the pointers you don't need anyway.
> >
> >In SkipList, you need those pointers only to copy
> >their value to
> >other pointers which will not be used.  The nodes
> >can't shrink,
> >because an insertion changes which pointers are used
> >and which
> >are not.
> >
> >If * is a wildcard, the rule is:
> >
> >No pointer from C* to B* or B* to A* is ever used.  If
> >you were
> >going to get to A6, you would have gotten there while
> >traversing
> >the "A" list, and if you are at the C list level, you
> >have already left 
> >the A and B lists behind.
> >
> >To convert a Skip List to a Skip Tree, note that every
> >decision
> >you need to make is based on a NODE->VALUE comparison.
> >After each decision, processing can go in two
> >directions, so there
> >are two "edges" from that decision.
> >
> >Decisions become binary tree nodes.  EDGES become left
> >and
> >right pointers.
> >
> >The first decision in following the skip list is "Is
> >the search
> >value greater than "1" (i.e. node A1's value is 1). 
> >Thus the
> >top node of the Skip Tree is A1.  If the value is less
> >than 1,
> >we look at C0.  Thus C0 is on the left hand side of
> >A1.
> >
> >Just traverse the skip list, and wherever you make a
> >decision
> >based on a value, there is a node.  Note that this
> >skip list is
> >not really balanced.  Due to the extra pointers, the
> >policy is
> >normally to have FEWER high-level nodes in a Skip List
> >than
> >would be predicted by the (1/2, 1/4, 1/8) progression
> >from the
> >bottom up.  Usually 3/4 on the bottom level, then 3/4
> >of the
> >remaining on the next level, etc.
> >
> >With a skip tree, we actually want the balanced
> >version, i.e.
> >ratios of 1/2 on each level, although other ratios can
> >be used.
> >
> >Kevin

Regards, Enrique

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-03 17:49         ` Hans Reiser
  2003-06-04  5:17           ` Enrique Perez-Terron
@ 2003-06-04  5:22           ` Kevin Bealer
  2003-06-04  9:44             ` Hans Reiser
  1 sibling, 1 reply; 24+ messages in thread
From: Kevin Bealer @ 2003-06-04  5:22 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Alexander G. M. Smith, reiserfs-list

>Hans Reiser <reiser@namesys.com> wrote:
> Kevin Bealer wrote:
> 
> >Sorry, last message was accidentally sent
> prematurely.
> >
> >To continue:
> >
> >Skip lists (turned 90 degrees) look like:
> >
> >topnode--x------x
> >C0
> >A1-----A1-----A1
> >C2
> >C3
> >B4-----B4
> >C5
> >A6-----A6-----A6
> >C7
> >C8
> >B9-----B9
> >C11
> >A12----A12----A12
> >
> >With pointers vertically between all aligned
> letters
> >(conceptually at least).
> >
> I don't understand the diagram above, which means I
> don't understand the 
> rest.  I think you are explaining something you
> understand too well to 
> explain....;-)

Yes, probably.  Sorry about that;  I was trying to
make a diagram that works with variable sized type.

Each line of the diagram is a node of the skip list.

The 'topnode' is the header of the skip list, which
typically looks like a skip node of the maximum
node size with no data.

Nodes with one pointer are 'C#' here, nodes with
two pointers are 'B#', nodes with three are 'A#'.
The '#' is the data value of the node (or the key,
if it is a multiset).

In the skip list, node A1 is a node with three
pointers, to C1, B2, and A6.  Each of these must
have a value greater or equal to A1, to preserve
the invariant, which is that the list at each
level is an ordered list.  In this case the list
at level 3 is the list containing A's and chained
in the third pointer.  The list using the second 
pointer contains A's, and B's.  The list at the
first level, represented here as character column
1 in the diagram, is the first level list, it 
contains all the nodes.

The argument I am making is not essential, it is
only that some of these pointers are unnecessary,
which leads to the Split Tree.

A different organization of pointers, and we have
the same search ORDER, but with slightly different
internal connections; plus all nodes are the same
size, and all normal 'binary tree' based algorithms
work as expected.

If you understand the Skip List, you can see every
location where a decision is made by comparing the
search value to the value of a node.  Running through 
these by hand, I was able to use these decisions as
binary tree traversal decisions, and create a binary
tree that has the same traversal behavior as a skip
list.

In doing so, I realized that the pointers which were
from a low level node to a high level node are not
consulted.  It looks good to have them because then
we have a set of ordered lists at each level, but
they don't actually get traversed in searches.

Anyway, its sort of a 'theory' argument to explain
why Split Tree is possible even though it seems like
you can't safely do this to a Skip List.

Does that help or am I still babbling? ;)

Kevin

> 
> >
> >Now here is the reason why SkipTrees do NOT need
> >variable sized nodes:
> >
> >The Pointer from A1 to A6 is used, so is the
> pointer
> >from A1 to B4 and from B4 to C5.
> >
> >However, the pointers from C2 and C3 to B4, B4 to
> A6,
> >and B9 to A12 are NEVER USED for example.  You are
> >sacrificing
> >pointers to get to a binary tree, but the pointers
> you
> >lose
> >are the pointers you don't need anyway.
> >
> >In SkipList, you need those pointers only to copy
> >their value to
> >other pointers which will not be used.  The nodes
> >can't shrink,
> >because an insertion changes which pointers are
> used
> >and which
> >are not.
> >
> >If * is a wildcard, the rule is:
> >
> >No pointer from C* to B* or B* to A* is ever used. 
> If
> >you were
> >going to get to A6, you would have gotten there
> while
> >traversing
> >the "A" list, and if you are at the C list level,
> you
> >have already left 
> >the A and B lists behind.
> >
> >To convert a Skip List to a Skip Tree, note that
> every
> >decision
> >you need to make is based on a NODE->VALUE
> comparison.
> >After each decision, processing can go in two
> >directions, so there
> >are two "edges" from that decision.
> >
> >Decisions become binary tree nodes.  EDGES become
> left
> >and
> >right pointers.
> >
> >The first decision in following the skip list is
> "Is
> >the search
> >value greater than "1" (i.e. node A1's value is 1).
> 
> >Thus the
> >top node of the Skip Tree is A1.  If the value is
> less
> >than 1,
> >we look at C0.  Thus C0 is on the left hand side of
> >A1.
> >
> >Just traverse the skip list, and wherever you make
> a
> >decision
> >based on a value, there is a node.  Note that this
> >skip list is
> >not really balanced.  Due to the extra pointers,
> the
> >policy is
> >normally to have FEWER high-level nodes in a Skip
> List
> >than
> >would be predicted by the (1/2, 1/4, 1/8)
> progression
> >from the
> >bottom up.  Usually 3/4 on the bottom level, then
> 3/4
> >of the
> >remaining on the next level, etc.
> >
> >With a skip tree, we actually want the balanced
> >version, i.e.
> >ratios of 1/2 on each level, although other ratios
> can
> >be used.
> >
> >Kevin
> >
> >--- Hans Reiser <reiser@namesys.com> wrote:
> >  
> >
> >>Alexander G. M. Smith wrote:
> >>
> >>    
> >>
> >>>Hans Reiser wrote on Thu, 22 May 2003 19:30:16
> >>>      
> >>>
> >>+0400:
> >>    
> >>
> >>> 
> >>>
> >>>      
> >>>
> >>>>Forgive me for asking some bonehead
> questions.....
> >>>>
> >>>>Are skip lists height balanced?  (All paths to
> the
> >>>>        
> >>>>
> >>leaves are equal?)
> >>    
> >>
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>>Only on the average, at least in the original
> skip
> >>>      
> >>>
> >>list algorithm.  The
> >>    
> >>
> >>>height is determined randomly when the node is
> >>>      
> >>>
> >>created; there is no
> >>    
> >>
> >>>rebalancing.  The random distribution is defined
> as
> >>>      
> >>>
> >>you would expect,
> >>    
> >>
> >>>50% of the nodes have 1 level, 25% have 2 levels,
> >>>      
> >>>
> >>... 0.5**n of the
> >>    
> >>
> >>>nodes have n levels.  Surprisingly, this works
> >>>      
> >>>
> >>fairly well.
> >>    
> >>
> >>>- Alex
> >>>
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>To have a level means to be on that level or
> lower?
> >>
> >>-- 
> >>Hans
> >>
> 
=== message truncated ===

__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-04  5:17           ` Enrique Perez-Terron
@ 2003-06-04  9:40             ` Hans Reiser
  2003-06-05 22:34               ` Enrique Perez-Terron
  2003-06-08  5:20               ` Kevin Bealer
  0 siblings, 2 replies; 24+ messages in thread
From: Hans Reiser @ 2003-06-04  9:40 UTC (permalink / raw)
  To: Enrique Perez-Terron; +Cc: Kevin Bealer, Alexander G. M. Smith, reiserfs-list

So if I understand right, the level is determined by what the key is.  
Suppose that for key K the determined level is 3, but there are no other 
records in the tree?  Is K then stored at a node at level 1 until such 
time as a level 1 key is created for it to be attached to, or does one 
create empty level 1 and level 2 nodes?

-- 
Hans

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-04  5:22           ` Kevin Bealer
@ 2003-06-04  9:44             ` Hans Reiser
  2003-06-08  5:17               ` Kevin Bealer
  0 siblings, 1 reply; 24+ messages in thread
From: Hans Reiser @ 2003-06-04  9:44 UTC (permalink / raw)
  To: Kevin Bealer; +Cc: Alexander G. M. Smith, reiserfs-list

Kevin Bealer wrote:

>>Hans Reiser <reiser@namesys.com> wrote:
>>Kevin Bealer wrote:
>>
>>    
>>
>>>Sorry, last message was accidentally sent
>>>      
>>>
>>prematurely.
>>    
>>
>>>To continue:
>>>
>>>Skip lists (turned 90 degrees) look like:
>>>
>>>topnode--x------x
>>>C0
>>>A1-----A1-----A1
>>>C2
>>>C3
>>>B4-----B4
>>>C5
>>>A6-----A6-----A6
>>>C7
>>>C8
>>>B9-----B9
>>>C11
>>>A12----A12----A12
>>>
>>>With pointers vertically between all aligned
>>>      
>>>
>>letters
>>    
>>
>>>(conceptually at least).
>>>
>>>      
>>>
>>I don't understand the diagram above, which means I
>>don't understand the 
>>rest.  I think you are explaining something you
>>understand too well to 
>>explain....;-)
>>    
>>
>
>
>Yes, probably.  Sorry about that;  I was trying to
>make a diagram that works with variable sized type.
>
>Each line of the diagram is a node of the skip list.
>
>The 'topnode' is the header of the skip list, which
>typically looks like a skip node of the maximum
>node size with no data.
>
>Nodes with one pointer are 'C#' here, nodes with
>two pointers are 'B#', nodes with three are 'A#'.
>The '#' is the data value of the node (or the key,
>if it is a multiset).
>
>In the skip list, node A1 is a node with three
>pointers, to C1, B2, and A6.
>

C1 and B2 are not in your diagram....

>  Each of these must
>have a value greater or equal to A1, to preserve
>the invariant, which is that the list at each
>level is an ordered list.  In this case the list
>at level 3 is the list containing A's and chained
>in the third pointer.  The list using the second 
>pointer contains A's, and B's.  The list at the
>first level, represented here as character column
>1 in the diagram, is the first level list, it 
>contains all the nodes.
>
>The argument I am making is not essential, it is
>only that some of these pointers are unnecessary,
>which leads to the Split Tree.
>
>A different organization of pointers, and we have
>the same search ORDER, but with slightly different
>internal connections; plus all nodes are the same
>size, and all normal 'binary tree' based algorithms
>work as expected.
>
>If you understand the Skip List, you can see every
>location where a decision is made by comparing the
>search value to the value of a node.  Running through 
>these by hand, I was able to use these decisions as
>binary tree traversal decisions, and create a binary
>tree that has the same traversal behavior as a skip
>list.
>
>In doing so, I realized that the pointers which were
>from a low level node to a high level node are not
>consulted.  It looks good to have them because then
>we have a set of ordered lists at each level, but
>they don't actually get traversed in searches.
>
>Anyway, its sort of a 'theory' argument to explain
>why Split Tree is possible even though it seems like
>you can't safely do this to a Skip List.
>
>Does that help or am I still babbling? ;)
>
>Kevin
>
>  
>
>>>Now here is the reason why SkipTrees do NOT need
>>>variable sized nodes:
>>>
>>>The Pointer from A1 to A6 is used, so is the
>>>      
>>>
>>pointer
>>>from A1 to B4 and from B4 to C5.
>>    
>>
>>>However, the pointers from C2 and C3 to B4, B4 to
>>>      
>>>
>>A6,
>>    
>>
>>>and B9 to A12 are NEVER USED for example.  You are
>>>sacrificing
>>>pointers to get to a binary tree, but the pointers
>>>      
>>>
>>you
>>    
>>
>>>lose
>>>are the pointers you don't need anyway.
>>>
>>>In SkipList, you need those pointers only to copy
>>>their value to
>>>other pointers which will not be used.  The nodes
>>>can't shrink,
>>>because an insertion changes which pointers are
>>>      
>>>
>>used
>>    
>>
>>>and which
>>>are not.
>>>
>>>If * is a wildcard, the rule is:
>>>
>>>No pointer from C* to B* or B* to A* is ever used. 
>>>      
>>>
>>If
>>    
>>
>>>you were
>>>going to get to A6, you would have gotten there
>>>      
>>>
>>while
>>    
>>
>>>traversing
>>>the "A" list, and if you are at the C list level,
>>>      
>>>
>>you
>>    
>>
>>>have already left 
>>>the A and B lists behind.
>>>
>>>To convert a Skip List to a Skip Tree, note that
>>>      
>>>
>>every
>>    
>>
>>>decision
>>>you need to make is based on a NODE->VALUE
>>>      
>>>
>>comparison.
>>    
>>
>>>After each decision, processing can go in two
>>>directions, so there
>>>are two "edges" from that decision.
>>>
>>>Decisions become binary tree nodes.  EDGES become
>>>      
>>>
>>left
>>    
>>
>>>and
>>>right pointers.
>>>
>>>The first decision in following the skip list is
>>>      
>>>
>>"Is
>>    
>>
>>>the search
>>>value greater than "1" (i.e. node A1's value is 1).
>>>      
>>>
>>>Thus the
>>>top node of the Skip Tree is A1.  If the value is
>>>      
>>>
>>less
>>    
>>
>>>than 1,
>>>we look at C0.  Thus C0 is on the left hand side of
>>>A1.
>>>
>>>Just traverse the skip list, and wherever you make
>>>      
>>>
>>a
>>    
>>
>>>decision
>>>based on a value, there is a node.  Note that this
>>>skip list is
>>>not really balanced.  Due to the extra pointers,
>>>      
>>>
>>the
>>    
>>
>>>policy is
>>>normally to have FEWER high-level nodes in a Skip
>>>      
>>>
>>List
>>    
>>
>>>than
>>>would be predicted by the (1/2, 1/4, 1/8)
>>>      
>>>
>>progression
>>>from the
>>    
>>
>>>bottom up.  Usually 3/4 on the bottom level, then
>>>      
>>>
>>3/4
>>    
>>
>>>of the
>>>remaining on the next level, etc.
>>>
>>>With a skip tree, we actually want the balanced
>>>version, i.e.
>>>ratios of 1/2 on each level, although other ratios
>>>      
>>>
>>can
>>    
>>
>>>be used.
>>>
>>>Kevin
>>>
>>>--- Hans Reiser <reiser@namesys.com> wrote:
>>> 
>>>
>>>      
>>>
>>>>Alexander G. M. Smith wrote:
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>>>Hans Reiser wrote on Thu, 22 May 2003 19:30:16
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>+0400:
>>>>   
>>>>
>>>>        
>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>Forgive me for asking some bonehead
>>>>>>            
>>>>>>
>>questions.....
>>    
>>
>>>>>>Are skip lists height balanced?  (All paths to
>>>>>>            
>>>>>>
>>the
>>    
>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>leaves are equal?)
>>>>   
>>>>
>>>>        
>>>>
>>>>>>  
>>>>>>
>>>>>>       
>>>>>>
>>>>>>            
>>>>>>
>>>>>Only on the average, at least in the original
>>>>>          
>>>>>
>>skip
>>    
>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>list algorithm.  The
>>>>   
>>>>
>>>>        
>>>>
>>>>>height is determined randomly when the node is
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>created; there is no
>>>>   
>>>>
>>>>        
>>>>
>>>>>rebalancing.  The random distribution is defined
>>>>>          
>>>>>
>>as
>>    
>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>you would expect,
>>>>   
>>>>
>>>>        
>>>>
>>>>>50% of the nodes have 1 level, 25% have 2 levels,
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>... 0.5**n of the
>>>>   
>>>>
>>>>        
>>>>
>>>>>nodes have n levels.  Surprisingly, this works
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>fairly well.
>>>>   
>>>>
>>>>        
>>>>
>>>>>- Alex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>To have a level means to be on that level or
>>>>        
>>>>
>>lower?
>>    
>>
>>>>-- 
>>>>Hans
>>>>
>>>>        
>>>>
>=== message truncated ===
>
>
>__________________________________
>Do you Yahoo!?
>Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
>http://calendar.yahoo.com
>
>
>  
>


-- 
Hans



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-04  9:40             ` Hans Reiser
@ 2003-06-05 22:34               ` Enrique Perez-Terron
  2003-06-08  5:26                 ` Kevin Bealer
  2003-06-08  5:20               ` Kevin Bealer
  1 sibling, 1 reply; 24+ messages in thread
From: Enrique Perez-Terron @ 2003-06-05 22:34 UTC (permalink / raw)
  To: reiserfs-list

On Wed, 2003-06-04 at 11:40, Hans Reiser wrote:
> So if I understand right, the level is determined by what the key is.  

Hans, I am afraid this discussion is getting confusing, because we are
actually discussing three different data structures, and with you
traveling around attention gets spotty and concepts get mixed up. :)

I will try to sum up:

First there were the "skip list", invented by Puig in 1991, I think.

Skip lists are interesting because they need no re-balancing. Their
structure is completely independent of the history of insertions and
deletions. Operations are fast and very elegant. It is claimed they beat
AVL-trees and many other structures by constant factors.

However, skip lists are not easy to use in a massively parallel
algorithms. One has to lock a large portion of the skip list before
updating it, and keep this lock until finished. Also, the nodes have a
varying number of pointers in them (constant for a given node, though).
This leads to some allocation issues.

Then came the skip trees. Somebody (forgot his name) discovered that
every skip list could be transformed into an equivalent skip tree, and
vice versa.

Skip trees are rather similar to B-trees. However there is no upper
bound to the number of keys in a node, so the node might have to occupy
multiple disk blocks. Probability laws will keep the vast majority of
the nodes small. (Just how small is tunable.)

The Skip tree algorithms have the nice feature that you can lock a node
to operate on it, then lock some of its children, release the parent,
etc, proceeding toward the leafs. This is nice because two processes
could work on the tree at the same time without risking deadlocks. This
makes skip trees interesting for parallel algorithms.

Now Kevin has observed that rather than converting a skip list into a
skip tree, one can convert it into what he calls a "split tree". Split
trees are binary trees where every node has an attribute "level", a
small integer. Thus, nodes are of fixed size (provided keys are). 

Besides the usual key-ordering invariants, split trees obey the law that
the children of a node have no higher level than the node itself. The
level is assigned the same way as in skip lists: at random when the node
is created, with low levels chosen more often. It is not necessarily a
function of the key, but your question made me think that was another
interesting idea.

Split trees have the same downward lock propagation as skip trees,
making them interesting for parallel processing applications.

However, the last few mails have been describing the skip list to you.
Originally I think the skip lists were only mentioned as a supposedly
known reference when describing the split tree. 

A skip list is at first a singly lined list containing all nodes and
ordered by ascending keys. There are additionally two dummy nodes Head
and Tail (or Nil in the literature), with suitable dummy keys. Scanning
this list for a given key is O(N).

But then there is a second list, which includes on average some fraction
of all nodes, typically half of them. These nodes are then members of
both lists at once. They must have multiple "next" pointers one for each
list. Scanning this list for a given key is faster than the first list,
but you might not find the key. However, if you don't find the node with
key K, you can locate consecutive nodes with keys J and L in this list
such that J < K < L.  Since node(J) is member of list one too, you can
switch to scanning list one starting from node(J) . That gives you the
speed of list two, yet the completeness of list one.

But just as list two is a fast lane compared to list one, there is a
list three whose members is a subset of those of list two, and further
lists, each a faster lane to get quickly at any point in the previous
list.

Notice that Head and Nil are members of all lists. The following search
procedure is O(lnN): Start at Head and scan the fastest list and drop
subsequently down to lower-level (slower/more complete) lists whenever
you would otherwise step into nodes with larger keys than the one you
are looking for.

When a node is created, a random number "level" is assigned to it, and
then then node is inserted into the first "level" lists in its correct
place according to the key ordering.  A node never changes its level. It
remains a member of the same lists until it is eventually removed from
all of them.

> Suppose that for key K the determined level is 3, but there are no 
> other 
> records in the tree?  Is K then stored at a node at level 1 until such 
> time as a level 1 key is created for it to be attached to, or does one 
> create empty level 1 and level 2 nodes?

If we are still describing the skip list, no. K is stored at level 3,
and no empty nodes of level 1 or 2 are created. However, node(K)
is simultaneously member of three lists. One list holds nodes with level
>= 1, i.e. all nodes. Another is for nodes with level >= 2, and that is
usually about half the nodes. The third list contains all nodes with
levels 3 and higher.

Using a variation of Kevin's diagram notation, we could depict the list
with a single node with key K and level three: Horizontally repeated
letters are the same node, but the node is member of multiple lists.
Vertical lines are pointers, 'v' are arrowheads.

A----A----A----A...  (This is a single node called Head and with a dummy
|    |    |    |      key smaller than any legal key. It is member of
|    |    |    |      all lists. )
v    v    v    |
K----K----K    |     (K is simultaneously member of three lists. It is 
|    |    |    |      not member of the fourth list (or any higher
|    |    |    |      list.)
v    v    v    v
Z----Z----Z----Z...  (This is the final node, a dummy node of infinite
                      level, and thus (final) member of all lists. Its
                      key is a dummy key larger than any legal key.)

Regards, Enrique

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-04  9:44             ` Hans Reiser
@ 2003-06-08  5:17               ` Kevin Bealer
  0 siblings, 0 replies; 24+ messages in thread
From: Kevin Bealer @ 2003-06-08  5:17 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Alexander G. M. Smith, reiserfs-list


--- Hans Reiser <reiser@namesys.com> wrote:
> Kevin Bealer wrote:
> 
> >>Hans Reiser <reiser@namesys.com> wrote:
> >>Kevin Bealer wrote:
> >>
> >>    
> >>
> >>>Sorry, last message was accidentally sent
> >>>      
> >>>
> >>prematurely.
> >>    
> >>
> >>>To continue:
> >>>
> >>>Skip lists (turned 90 degrees) look like:
> >>>
> >>>topnode--x------x
> >>>C0
> >>>A1-----A1-----A1
> >>>C2
> >>>C3
> >>>B4-----B4
> >>>C5
> >>>A6-----A6-----A6
> >>>C7
> >>>C8
> >>>B9-----B9
> >>>C11
> >>>A12----A12----A12
> >>>
> >>>With pointers vertically between all aligned
> >>>      
> >>>
> >>letters
> >>    
> >>
> >>>(conceptually at least).
> >>>
> >>>      
> >>>
> >>I don't understand the diagram above, which means
> I
> >>don't understand the 
> >>rest.  I think you are explaining something you
> >>understand too well to 
> >>explain....;-)
> >>    
> >>
> >
> >
> >Yes, probably.  Sorry about that;  I was trying to
> >make a diagram that works with variable sized type.
> >
> >Each line of the diagram is a node of the skip
> list.
> >
> >The 'topnode' is the header of the skip list, which
> >typically looks like a skip node of the maximum
> >node size with no data.
> >
> >Nodes with one pointer are 'C#' here, nodes with
> >two pointers are 'B#', nodes with three are 'A#'.
> >The '#' is the data value of the node (or the key,
> >if it is a multiset).
> >
> >In the skip list, node A1 is a node with three
> >pointers, to C1, B2, and A6.
> >
> 
> C1 and B2 are not in your diagram....
> 

Sorry, it should read C2, B4, and A6.

> >  Each of these must
> >have a value greater or equal to A1, to preserve
> >the invariant, which is that the list at each
> >level is an ordered list.  In this case the list
> >at level 3 is the list containing A's and chained
> >in the third pointer.  The list using the second 
> >pointer contains A's, and B's.  The list at the
> >first level, represented here as character column
> >1 in the diagram, is the first level list, it 
> >contains all the nodes.
> >
> >The argument I am making is not essential, it is
> >only that some of these pointers are unnecessary,
> >which leads to the Split Tree.
> >
> >A different organization of pointers, and we have
> >the same search ORDER, but with slightly different
> >internal connections; plus all nodes are the same
> >size, and all normal 'binary tree' based algorithms
> >work as expected.
> >
> >If you understand the Skip List, you can see every
> >location where a decision is made by comparing the
> >search value to the value of a node.  Running
> through 
> >these by hand, I was able to use these decisions as
> >binary tree traversal decisions, and create a
> binary
> >tree that has the same traversal behavior as a skip
> >list.
> >
> >In doing so, I realized that the pointers which
> were
> >from a low level node to a high level node are not
> >consulted.  It looks good to have them because then
> >we have a set of ordered lists at each level, but
> >they don't actually get traversed in searches.
> >
> >Anyway, its sort of a 'theory' argument to explain
> >why Split Tree is possible even though it seems
> like
> >you can't safely do this to a Skip List.
> >
> >Does that help or am I still babbling? ;)
> >
> >Kevin
> >
> >  
> >
> >>>Now here is the reason why SkipTrees do NOT need
> >>>variable sized nodes:
> >>>
> >>>The Pointer from A1 to A6 is used, so is the
> >>>      
> >>>
> >>pointer
> >>>from A1 to B4 and from B4 to C5.
> >>    
> >>
> >>>However, the pointers from C2 and C3 to B4, B4 to
> >>>      
> >>>
> >>A6,
> >>    
> >>
> >>>and B9 to A12 are NEVER USED for example.  You
> are
> >>>sacrificing
> >>>pointers to get to a binary tree, but the
> pointers
> >>>      
> >>>
> >>you
> >>    
> >>
> >>>lose
> >>>are the pointers you don't need anyway.
> >>>
> >>>In SkipList, you need those pointers only to copy
> >>>their value to
> >>>other pointers which will not be used.  The nodes
> >>>can't shrink,
> >>>because an insertion changes which pointers are
> >>>      
> >>>
> >>used
> >>    
> >>
> >>>and which
> >>>are not.
> >>>
> >>>If * is a wildcard, the rule is:
> >>>
> >>>No pointer from C* to B* or B* to A* is ever
> used. 
> >>>      
> >>>
> >>If
> >>    
> >>
> >>>you were
> >>>going to get to A6, you would have gotten there
> >>>      
> >>>
> >>while
> >>    
> >>
> >>>traversing
> >>>the "A" list, and if you are at the C list level,
> >>>      
> >>>
> >>you
> >>    
> >>
> >>>have already left 
> >>>the A and B lists behind.
> >>>
> >>>To convert a Skip List to a Skip Tree, note that
> >>>      
> >>>
> >>every
> >>    
> >>
> >>>decision
> >>>you need to make is based on a NODE->VALUE
> >>>      
> >>>
> >>comparison.
> >>    
> 
=== message truncated ===


__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-04  9:40             ` Hans Reiser
  2003-06-05 22:34               ` Enrique Perez-Terron
@ 2003-06-08  5:20               ` Kevin Bealer
  1 sibling, 0 replies; 24+ messages in thread
From: Kevin Bealer @ 2003-06-08  5:20 UTC (permalink / raw)
  To: Hans Reiser, Enrique Perez-Terron
  Cc: Kevin Bealer, Alexander G. M. Smith, reiserfs-list

In this case you would have a tree with only a level 3
node.  The levels are only used during insertion.  If
a level 2 node was later added, it would be added
under the level 3 node.

The level is not determined by the key, but chosen
randomly.  Half of the nodes are given level 0, half
of the remaining are given level 1, etc.

(I am talking about my algorithm here.)

Kevin

--- Hans Reiser <reiser@namesys.com> wrote:
> So if I understand right, the level is determined by
> what the key is.  
> Suppose that for key K the determined level is 3,
> but there are no other 
> records in the tree?  Is K then stored at a node at
> level 1 until such 
> time as a level 1 key is created for it to be
> attached to, or does one 
> create empty level 1 and level 2 nodes?
> 
> -- 
> Hans
> 
> 

__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-05 22:34               ` Enrique Perez-Terron
@ 2003-06-08  5:26                 ` Kevin Bealer
  2003-06-09 15:33                   ` Enrique Perez-Terron
  0 siblings, 1 reply; 24+ messages in thread
From: Kevin Bealer @ 2003-06-08  5:26 UTC (permalink / raw)
  To: Enrique Perez-Terron, reiserfs-list


I should mention that I have never heard of "skip
tree" however.  The names I have been using are:

1. skip list
2. binary treap
3. split tree (mine)

The binary treap is very close to my split tree (it
has the same invariants but slightly different 
algorithms).  I did not know about binary treaps
until I had already designed split tree, and was
looking to see if it was a new invention.

I may have accidentally called my version a skip
tree.  If skip tree is another version, I'm not
sure how it is different from mine or what the
details are.

Kevin

--- Enrique Perez-Terron
<enrique.perez-terron@norway.online.no> wrote:
> On Wed, 2003-06-04 at 11:40, Hans Reiser wrote:
> > So if I understand right, the level is determined
> by what the key is.  
> 
> Hans, I am afraid this discussion is getting
> confusing, because we are
> actually discussing three different data structures,
> and with you
> traveling around attention gets spotty and concepts
> get mixed up. :)
> 
> I will try to sum up:
> 
> First there were the "skip list", invented by Puig
> in 1991, I think.
> 
> Skip lists are interesting because they need no
> re-balancing. Their
> structure is completely independent of the history
> of insertions and
> deletions. Operations are fast and very elegant. It
> is claimed they beat
> AVL-trees and many other structures by constant
> factors.
> 
> However, skip lists are not easy to use in a
> massively parallel
> algorithms. One has to lock a large portion of the
> skip list before
> updating it, and keep this lock until finished.
> Also, the nodes have a
> varying number of pointers in them (constant for a
> given node, though).
> This leads to some allocation issues.
> 
> Then came the skip trees. Somebody (forgot his name)
> discovered that
> every skip list could be transformed into an
> equivalent skip tree, and
> vice versa.
> 
> Skip trees are rather similar to B-trees. However
> there is no upper
> bound to the number of keys in a node, so the node
> might have to occupy
> multiple disk blocks. Probability laws will keep the
> vast majority of
> the nodes small. (Just how small is tunable.)
> 
> The Skip tree algorithms have the nice feature that
> you can lock a node
> to operate on it, then lock some of its children,
> release the parent,
> etc, proceeding toward the leafs. This is nice
> because two processes
> could work on the tree at the same time without
> risking deadlocks. This
> makes skip trees interesting for parallel
> algorithms.
> 
> Now Kevin has observed that rather than converting a
> skip list into a
> skip tree, one can convert it into what he calls a
> "split tree". Split
> trees are binary trees where every node has an
> attribute "level", a
> small integer. Thus, nodes are of fixed size
> (provided keys are). 
> 
> Besides the usual key-ordering invariants, split
> trees obey the law that
> the children of a node have no higher level than the
> node itself. The
> level is assigned the same way as in skip lists: at
> random when the node
> is created, with low levels chosen more often. It is
> not necessarily a
> function of the key, but your question made me think
> that was another
> interesting idea.
> 
> Split trees have the same downward lock propagation
> as skip trees,
> making them interesting for parallel processing
> applications.
> 
> However, the last few mails have been describing the
> skip list to you.
> Originally I think the skip lists were only
> mentioned as a supposedly
> known reference when describing the split tree. 
> 
> A skip list is at first a singly lined list
> containing all nodes and
> ordered by ascending keys. There are additionally
> two dummy nodes Head
> and Tail (or Nil in the literature), with suitable
> dummy keys. Scanning
> this list for a given key is O(N).
> 
> But then there is a second list, which includes on
> average some fraction
> of all nodes, typically half of them. These nodes
> are then members of
> both lists at once. They must have multiple "next"
> pointers one for each
> list. Scanning this list for a given key is faster
> than the first list,
> but you might not find the key. However, if you
> don't find the node with
> key K, you can locate consecutive nodes with keys J
> and L in this list
> such that J < K < L.  Since node(J) is member of
> list one too, you can
> switch to scanning list one starting from node(J) .
> That gives you the
> speed of list two, yet the completeness of list one.
> 
> But just as list two is a fast lane compared to list
> one, there is a
> list three whose members is a subset of those of
> list two, and further
> lists, each a faster lane to get quickly at any
> point in the previous
> list.
> 
> Notice that Head and Nil are members of all lists.
> The following search
> procedure is O(lnN): Start at Head and scan the
> fastest list and drop
> subsequently down to lower-level (slower/more
> complete) lists whenever
> you would otherwise step into nodes with larger keys
> than the one you
> are looking for.
> 
> When a node is created, a random number "level" is
> assigned to it, and
> then then node is inserted into the first "level"
> lists in its correct
> place according to the key ordering.  A node never
> changes its level. It
> remains a member of the same lists until it is
> eventually removed from
> all of them.
> 
> > Suppose that for key K the determined level is 3,
> but there are no 
> > other 
> > records in the tree?  Is K then stored at a node
> at level 1 until such 
> > time as a level 1 key is created for it to be
> attached to, or does one 
> > create empty level 1 and level 2 nodes?
> 
> If we are still describing the skip list, no. K is
> stored at level 3,
> and no empty nodes of level 1 or 2 are created.
> However, node(K)
> is simultaneously member of three lists. One list
> holds nodes with level
> >= 1, i.e. all nodes. Another is for nodes with
> level >= 2, and that is
> usually about half the nodes. The third list
> contains all nodes with
> levels 3 and higher.
> 
> Using a variation of Kevin's diagram notation, we
> could depict the list
> with a single node with key K and level three:
> Horizontally repeated
> letters are the same node, but the node is member of
> multiple lists.
> Vertical lines are pointers, 'v' are arrowheads.
> 
> A----A----A----A...  (This is a single node called
> Head and with a dummy
> |    |    |    |      key smaller than any legal
> key. It is member of
> |    |    |    |      all lists. )
> v    v    v    |
> K----K----K    |     (K is simultaneously member of
> three lists. It is 
> |    |    |    |      not member of the fourth list
> (or any higher
> |    |    |    |      list.)
> v    v    v    v
> Z----Z----Z----Z...  (This is the final node, a
> dummy node of infinite
>                       level, and thus (final) member
> of all lists. Its
>                       key is a dummy key larger than
> any legal key.)
> 
> Regards, Enrique
> 


__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Split Trees (my design, similar to binary treaps)
  2003-06-08  5:26                 ` Kevin Bealer
@ 2003-06-09 15:33                   ` Enrique Perez-Terron
  0 siblings, 0 replies; 24+ messages in thread
From: Enrique Perez-Terron @ 2003-06-09 15:33 UTC (permalink / raw)
  To: Kevin Bealer; +Cc: reiserfs-list

On Sun, 2003-06-08 at 07:26, Kevin Bealer wrote:

> I may have accidentally called my version a skip
> tree.  If skip tree is another version, I'm not

I won't discard completely the possibility that it was me that
short-circuited to skip trees when seeing "split tree" some lines below
"skip list"...

In any case, this is my description of a skip tree:

A skip tree is just like a B-tree except that the number of keys in a
node may be zero, and has no upper bound. Keys are not repeated at lower
levels, and "payload data" associated with a key is stored with the key,
usually in the form of a pointer. 

These data pointers are not further mentioned. In what follows all
pointers are pointers to skip tree nodes.

Thus a skip tree of 

   height 0     is empty,
   height 1     is a node with zero or more keys in ascending order, 
   height h > 1 is a node with zero or more keys, k_1,...,k_n, in
                ascending order, and n+1 pointers p_0,...p_n to height
                h-1 skip trees. 

The keys accessible directly or indirectly through pointer p_i in a
node, have values between the keys k_{i-1} and k_i in the node.

On insertion, a level h is randomly assigned to the new key. 

If the skip tree already has height h (sub)trees, the key is inserted
into the root node of the unique height h (sub)tree where the key is
permitted by the ordering constraints. If the new key becomes k_i in the
node, the node pointed to by p_{i-1} is split in two, one node becomes
the new p_{i-1}, the other becomes the new p_i. Former keys k_i,...,k_n
are renumbered k_{i+1},...,k_{n+1}, and the former pointers p_i,..,p_n
are renumbered p_{i+1},...,p{n+1}. Nodes two levels below h are not
changed.

If no level-l node exists, one is created with the new key, and as many
empty nodes are created as needed to make the tree have two subtrees of
height l-1. The original tree's root node is split in two if required
and the parts used instead of empty nodes at the appropriate place in
the subtrees below the new root node.

Converting a skip list into a skip tree is straightforward.

The head and nil nodes of the skip list are not included in the skip
tree.

A skip tree whose highest level with nonzero population is L, is
converted to a skip tree of height L.

The elements in the top-level list are placed in the root node of the
skip tree.

In a skip list, if a and b are level l nodes that are consecutive in the
level l list, then the nodes between a and b in the lower level lists
form a skip list with a and b as head and nil respectively.

This (sub)skip list is converted in the same way as the original skip
list, so the keys of the level l-1 nodes between a and b form the root
node of a skip (sub)tree. Notice that there may be zero keys in this
(subtree) root node, since there may be no level l-1 nodes between a and
b.

The skip tree is a structure where the power-of-two  distribution of the
levels is often not optimal. Using a block size that can accommodate N
keys per one-block node, I guess something near 3N/4 keys of level l-1
for each level l key is optimal. The trade-off is between the cost of
having a linear list of blocks to constitute a single logical node when
there are too many keys (and most of the time the second block will have
very few keys in it), against the cost of having more levels if each
node has few keys. 3N/4 corresponds to a p value of 4/(3N). Then level 1
must be chosen (1-p) of the time, level 2 (1-p)*p of the time, etc. If
you expect your data to have less than MAX keys over the lifetime of the
tree, you would clamp the level to be no more than log_{3N/4}(MAX) + 1.

Regards, Enrique

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2003-06-09 15:33 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-05-08  4:22 Split Trees (my design, similar to binary treaps) Kevin Bealer
2003-05-22 15:30 ` Hans Reiser
2003-05-22 16:09   ` Alexander G. M. Smith
2003-05-22 17:16     ` Hans Reiser
2003-05-22 20:21       ` Alexander G. M. Smith
2003-05-22 22:48         ` Hans Reiser
2003-05-22 23:27           ` Alexander G. M. Smith
2003-05-23 18:56             ` Kevin Bealer
2003-06-02 17:33               ` Hans Reiser
2003-06-04  4:54                 ` Kevin Bealer
2003-05-23 19:24       ` Kevin Bealer
2003-06-03 17:49         ` Hans Reiser
2003-06-04  5:17           ` Enrique Perez-Terron
2003-06-04  9:40             ` Hans Reiser
2003-06-05 22:34               ` Enrique Perez-Terron
2003-06-08  5:26                 ` Kevin Bealer
2003-06-09 15:33                   ` Enrique Perez-Terron
2003-06-08  5:20               ` Kevin Bealer
2003-06-04  5:22           ` Kevin Bealer
2003-06-04  9:44             ` Hans Reiser
2003-06-08  5:17               ` Kevin Bealer
2003-05-27  1:38   ` Enrique Perez-Terron
2003-05-27 20:47     ` Hans Reiser
2003-05-28 16:44     ` Kevin Bealer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.