git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "C. Scott Ananian" <cscott@cscott.net>
To: Martin Uecker <muecker@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 11:28:20 -0400 (EDT)	[thread overview]
Message-ID: <Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu> (raw)
In-Reply-To: <20050420151902.GA13175@macavity>

On Wed, 20 Apr 2005, Martin Uecker wrote:

>> You can (and my code demonstrates/will demonstrate) still use a whole-file
>> hash to use chunking.  With content prefixes, this takes O(N ln M) time
>> (where N is the file size and M is the number of chunks) to compute all
>> hashes; if subtrees can share the same prefix, then you can do this in
>> O(N) time (ie, as fast as possible, modulo a constant factor, which is
>> '2').  You don't *need* internal hashing functions.
>
> I don't understand this paragraph. What is an internal
> hash function? Your code seems to do exactly what I want.
> The hashes are computed recusively as in a hash tree
> with O(N ln N). The only difference between your design
> and a design based on a conventional (binary) hash tree
> seems to be that data is stored in the intermediate nodes
> too.

A merkle-tree (which I think you initially pointed me at) makes the hash 
of the internal nodes be a hash of the chunk's hashes; ie not a straight 
content hash.  This is roughly what my current implementation does, but
I would like to identify each subtree with the hash of the 
*(expanded) contents of that subtree* (ie no explicit reference to 
subtree hashes).  This makes it interoperable with non-chunked or 
differently-chunked representations, in that the top-level hash is *just 
the hash of the complete content*, not some hash-of-subtree-hashes.  Does 
that make more sense?

The code I posted doesn't demonstrate this very well, but now that Linus 
has abandoned the 'hash of compressed content' stuff, my next code posting 
should show this more clearly.

> If I don't miss anything essential, you can validate
> each treap piece at the moment you get it from the
> network with its SHA1 hash and then proceed with
> downloading the prefix and suffix tree (in parallel
> if you have more than one peer a la bittorrent).

Yes, I guess this is the detail I was going to abandon. =)

I viewed the fact that the top-level hash was dependent on the exact chunk 
makeup a 'misfeature', because it doesn't allow easy interoperability with 
existing non-chunked repos.
  --scott

WTO atomic operation Mossad Castro overthrow FSF fissionable HTAUTOMAT 
LCPANES MKDELTA Bush non-violent protest OVER THE HORIZON RADAR KUPALM
                          ( http://cscott.net/ )

  reply	other threads:[~2005-04-20 15:24 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-19 16:50 [PATCH] write-tree performance problems Chris Mason
2005-04-19 17:36 ` Linus Torvalds
2005-04-19 18:11   ` Chris Mason
2005-04-19 19:03     ` Linus Torvalds
2005-04-19 21:08       ` Chris Mason
2005-04-19 21:23         ` Linus Torvalds
2005-04-20  0:49           ` Chris Mason
2005-04-20  1:09             ` Linus Torvalds
2005-04-20  6:43             ` Linus Torvalds
2005-04-20  7:38               ` H. Peter Anvin
2005-04-20  9:08                 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) Linus Torvalds
2005-04-20 10:04                   ` Ingo Molnar
2005-04-20 12:11                   ` Jon Seymour
2005-04-20 13:24                     ` Martin Uecker
2005-04-20 13:35                       ` Morten Welinder
2005-04-20 13:41                       ` Jon Seymour
2005-04-20 14:30                       ` C. Scott Ananian
2005-04-20 15:19                         ` Martin Uecker
2005-04-20 15:28                           ` C. Scott Ananian [this message]
2005-04-20 15:57                             ` Martin Uecker
2005-04-20 16:33                               ` Martin Uecker
2005-04-20 13:30                   ` Blob chunking code. [First look.] C. Scott Ananian
2005-04-20 17:31                     ` Blob chunking code. [Second look] C. Scott Ananian
2005-04-20 14:13                   ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) David Woodhouse
2005-04-20 14:59                     ` Linus Torvalds
2005-04-20 22:29                       ` David Woodhouse
     [not found]                   ` <2cfc4032050420050655265d3a@mail.gmail.com>
2005-04-20 14:29                     ` Linus Torvalds
2005-04-20 14:35                       ` C. Scott Ananian
2005-04-20 15:22               ` [PATCH] write-tree performance problems Chris Mason
2005-04-20 15:30                 ` C. Scott Ananian
2005-04-20 15:46                   ` Linus Torvalds
2005-04-20 15:52                     ` C. Scott Ananian
2005-04-20 16:21                       ` Linus Torvalds
2005-04-20 15:40                 ` Linus Torvalds
2005-04-20 16:10                   ` David Willmore
2005-04-20 16:33                   ` Linus Torvalds
2005-04-20 16:41                     ` Linus Torvalds
2005-04-20 16:37                   ` Chris Mason
2005-04-20 17:06                     ` Linus Torvalds
2005-04-20 17:23                       ` Chris Mason
2005-04-20 17:52                         ` Linus Torvalds
2005-04-20 19:04                           ` Chris Mason
2005-04-20 19:19                             ` Linus Torvalds
2005-04-20 19:47                               ` Linus Torvalds
2005-04-20 18:07                       ` David S. Miller
2005-04-19 22:09       ` David Lang
2005-04-19 22:21         ` Linus Torvalds
2005-04-19 23:00           ` David Lang
2005-04-19 23:09             ` Linus Torvalds
2005-04-19 23:42               ` David Lang
2005-04-19 23:59                 ` Linus Torvalds
2005-04-19 21:52                   ` Christopher Li
2005-04-19 18:51   ` Olivier Galibert
2005-04-19 22:47   ` C. Scott Ananian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu \
    --to=cscott@cscott.net \
    --cc=git@vger.kernel.org \
    --cc=muecker@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).