All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chuck Lever <cel@citi.umich.edu>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/2] A new merge algorithm, take 3
Date: Thu, 08 Sep 2005 11:06:21 -0400	[thread overview]
Message-ID: <4320536D.2010706@citi.umich.edu> (raw)
In-Reply-To: <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

Junio C Hamano wrote:
> Chuck Lever <cel@citi.umich.edu> writes:
> 
> 
>>for the past two weeks i've been attempting to replace the active_cache 
>>array with an abstract data type (linked list just as a prototype) in 
>>order to eliminate the need to use memmove() during insertions and 
>>deletions.
>>
>>one big win is that the merge algorithm no longer needs to overwrite the 
>>active_cache.  you can keep two lists and simply move elements from one 
>>to the other as you do the merge, and then hook the new, merged list 
>>back to the cache head when you're done.
>>
>>anyway, i'm at a point where i could use some help and advice if this is 
>>something you think could be useful in the long run.
> 
> 
> This is interesting.
> 
> For something like a kernel tree that has around 18k entries, a
> single memmove will at the worst case move that many pointers,
> so naturally a list based implementation would perform well for
> many insertions and deletions.   On the other hand, the list
> based implementation would make it very expensive to find an
> entry read-only, I suspect, unlike the array based
> implementation which lets us do binary search.

using a list was an easy prototype to see what issues there are when 
converting from an array into an abstract data type.  i've created a few 
helpers that allow me to limit the list implementation changes to 
cache.h, read-cache.c, and read-tree.c; but having these helpers means 
it should be fairly simple to switch to any reasonable data structure. 
(the helper parts are already working and i can post them to the list if 
folks are interested).

once the list implementation is working well, i plan to go back and 
replace it with something more sophisticated which can perform 
insertion, deletion, and searching efficiently.

in the long run, i see using a balanced tree.  that would make 
insertions and searches quick, and prevent the tree from degenerating 
into a linked list due to a series of in-order insertions.  a splay tree 
might be even more entertaining, as it always remains balanced, and a 
search operation places the found node or insertion point right at the 
root of the tree.

each node in the tree would represent a unique path, and have references 
to the entries of all four stages, contained in an array.  that would 
simplify several routines in read-cache.c and probably make the merge 
logic even easier to understand and more efficient.

> I haven't timed it like you did, but my gut feeling is that the
> most wastage during the merge is coming from having to move
> entries because we "insert into" or "remove from" the same
> active-cache array.

the merge actually replaces entries in place, so it is already fairly 
efficient.  the wastage in the merge case arises from the many list 
insertions done by read_cache, all of which involve moving some of the 
active cache array.

> If we allocate a new array and populate it
> from scratch as we iterate through the paths being merged and
> replace the active_cache pointer at the very end, we would do
> much better without going to a linked list implementation, which
> would penalize the normal read-only case.

i can already tell you that using a separate source and destination data 
structure during the merge simplifies the logic and makes it easier to 
understand.  i imagine it will also be easier to maintain and enhance in 
the long run.

> I think what Daniel
> did to read-tree recently, still in the proposed updates branch,
> would make this kind of change far easier to implement.

as i'm new to git development, i wasn't aware of the proposed branch.  i 
will see if i can take a look (or send me a pointer to the list archives 
if he has posted them here).

> I wanted to cc this message to Daniel but your message to me was
> somehow private so I did not.  Please feel free to bring this
> issue up either with him directly or on the list.

i wanted to assess whether this was a reasonable idea or if there was 
already work in progress in this area.  thanks for your response!

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


  parent reply	other threads:[~2005-09-08 15:06 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-09-07 16:47 [PATCH 0/2] A new merge algorithm, take 3 Fredrik Kuivinen
2005-09-07 16:50 ` [PATCH 1/2] Add '-i' flag to read-tree to make it ignore whats in the working directory Fredrik Kuivinen
2005-09-11  2:54   ` Unified merge driver pushed out to "master" branch Junio C Hamano
2005-09-11 21:05     ` Fredrik Kuivinen
2005-09-12  1:23       ` Junio C Hamano
2005-09-14  5:56     ` Another merge test case from the kernel tree Junio C Hamano
2005-09-14 16:11       ` Daniel Barkalow
2005-09-14 16:30         ` Junio C Hamano
2005-09-14 17:42       ` Fredrik Kuivinen
2005-09-14 17:51         ` Junio C Hamano
2005-09-15  0:47       ` Yet another set of merge test cases " Junio C Hamano
2005-09-19 16:13         ` Fredrik Kuivinen
2005-09-20  1:53           ` Junio C Hamano
2005-09-20  5:50             ` Fredrik Kuivinen
2005-09-07 16:51 ` [PATCH 2/2] A new merge algorithm Fredrik Kuivinen
2005-09-07 18:33 ` [PATCH 0/2] A new merge algorithm, take 3 Daniel Barkalow
2005-09-08  6:06   ` Fredrik Kuivinen
2005-09-08 15:27     ` Daniel Barkalow
2005-09-08 20:05       ` Fredrik Kuivinen
2005-09-08 21:27         ` Daniel Barkalow
2005-09-07 18:36 ` Junio C Hamano
     [not found]   ` <431F34FF.5050301@citi.umich.edu>
     [not found]     ` <7vvf1cz64l.fsf@assigned-by-dhcp.cox.net>
2005-09-08 15:06       ` Chuck Lever [this message]
2005-09-08 16:51         ` Junio C Hamano
2005-09-08 17:19           ` Linus Torvalds
2005-09-08 17:51             ` Junio C Hamano
2005-09-08 18:16             ` Chuck Lever
2005-09-08 18:35               ` Linus Torvalds
2005-09-08 18:58                 ` Chuck Lever
2005-09-08 20:59                   ` Linus Torvalds
2005-09-09  7:44                   ` [RFH] Merge driver Junio C Hamano
2005-09-09 16:05                     ` Daniel Barkalow
2005-09-09 16:43                       ` Junio C Hamano
2005-09-09 17:25                         ` Daniel Barkalow
2005-09-11  4:58                     ` Junio C Hamano
2005-09-12 21:08                     ` Fredrik Kuivinen
2005-09-12 21:16                       ` Junio C Hamano
2005-09-13 20:33                         ` Fredrik Kuivinen
2005-09-13 20:46                           ` Junio C Hamano
2005-09-08 20:54   ` [PATCH 0/2] A new merge algorithm, take 3 Junio C Hamano
2005-09-08 21:23     ` A Large Angry SCM

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4320536D.2010706@citi.umich.edu \
    --to=cel@citi.umich.edu \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.