From: "C. Scott Ananian" <cscott@cscott.net>
To: Martin Uecker <muecker@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 11:28:20 -0400 (EDT) [thread overview]
Message-ID: <Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu> (raw)
In-Reply-To: <20050420151902.GA13175@macavity>
On Wed, 20 Apr 2005, Martin Uecker wrote:
>> You can (and my code demonstrates/will demonstrate) still use a whole-file
>> hash to use chunking. With content prefixes, this takes O(N ln M) time
>> (where N is the file size and M is the number of chunks) to compute all
>> hashes; if subtrees can share the same prefix, then you can do this in
>> O(N) time (ie, as fast as possible, modulo a constant factor, which is
>> '2'). You don't *need* internal hashing functions.
>
> I don't understand this paragraph. What is an internal
> hash function? Your code seems to do exactly what I want.
> The hashes are computed recusively as in a hash tree
> with O(N ln N). The only difference between your design
> and a design based on a conventional (binary) hash tree
> seems to be that data is stored in the intermediate nodes
> too.
A merkle-tree (which I think you initially pointed me at) makes the hash
of the internal nodes be a hash of the chunk's hashes; ie not a straight
content hash. This is roughly what my current implementation does, but
I would like to identify each subtree with the hash of the
*(expanded) contents of that subtree* (ie no explicit reference to
subtree hashes). This makes it interoperable with non-chunked or
differently-chunked representations, in that the top-level hash is *just
the hash of the complete content*, not some hash-of-subtree-hashes. Does
that make more sense?
The code I posted doesn't demonstrate this very well, but now that Linus
has abandoned the 'hash of compressed content' stuff, my next code posting
should show this more clearly.
> If I don't miss anything essential, you can validate
> each treap piece at the moment you get it from the
> network with its SHA1 hash and then proceed with
> downloading the prefix and suffix tree (in parallel
> if you have more than one peer a la bittorrent).
Yes, I guess this is the detail I was going to abandon. =)
I viewed the fact that the top-level hash was dependent on the exact chunk
makeup a 'misfeature', because it doesn't allow easy interoperability with
existing non-chunked repos.
--scott
WTO atomic operation Mossad Castro overthrow FSF fissionable HTAUTOMAT
LCPANES MKDELTA Bush non-violent protest OVER THE HORIZON RADAR KUPALM
( http://cscott.net/ )
next prev parent reply other threads:[~2005-04-20 15:24 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-04-19 16:50 [PATCH] write-tree performance problems Chris Mason
2005-04-19 17:36 ` Linus Torvalds
2005-04-19 18:11 ` Chris Mason
2005-04-19 19:03 ` Linus Torvalds
2005-04-19 21:08 ` Chris Mason
2005-04-19 21:23 ` Linus Torvalds
2005-04-20 0:49 ` Chris Mason
2005-04-20 1:09 ` Linus Torvalds
2005-04-20 6:43 ` Linus Torvalds
2005-04-20 7:38 ` H. Peter Anvin
2005-04-20 9:08 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) Linus Torvalds
2005-04-20 10:04 ` Ingo Molnar
2005-04-20 12:11 ` Jon Seymour
2005-04-20 13:24 ` Martin Uecker
2005-04-20 13:35 ` Morten Welinder
2005-04-20 13:41 ` Jon Seymour
2005-04-20 14:30 ` C. Scott Ananian
2005-04-20 15:19 ` Martin Uecker
2005-04-20 15:28 ` C. Scott Ananian [this message]
2005-04-20 15:57 ` Martin Uecker
2005-04-20 16:33 ` Martin Uecker
2005-04-20 13:30 ` Blob chunking code. [First look.] C. Scott Ananian
2005-04-20 17:31 ` Blob chunking code. [Second look] C. Scott Ananian
2005-04-20 14:13 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) David Woodhouse
2005-04-20 14:59 ` Linus Torvalds
2005-04-20 22:29 ` David Woodhouse
[not found] ` <2cfc4032050420050655265d3a@mail.gmail.com>
2005-04-20 14:29 ` Linus Torvalds
2005-04-20 14:35 ` C. Scott Ananian
2005-04-20 15:22 ` [PATCH] write-tree performance problems Chris Mason
2005-04-20 15:30 ` C. Scott Ananian
2005-04-20 15:46 ` Linus Torvalds
2005-04-20 15:52 ` C. Scott Ananian
2005-04-20 16:21 ` Linus Torvalds
2005-04-20 15:40 ` Linus Torvalds
2005-04-20 16:10 ` David Willmore
2005-04-20 16:33 ` Linus Torvalds
2005-04-20 16:41 ` Linus Torvalds
2005-04-20 16:37 ` Chris Mason
2005-04-20 17:06 ` Linus Torvalds
2005-04-20 17:23 ` Chris Mason
2005-04-20 17:52 ` Linus Torvalds
2005-04-20 19:04 ` Chris Mason
2005-04-20 19:19 ` Linus Torvalds
2005-04-20 19:47 ` Linus Torvalds
2005-04-20 18:07 ` David S. Miller
2005-04-19 22:09 ` David Lang
2005-04-19 22:21 ` Linus Torvalds
2005-04-19 23:00 ` David Lang
2005-04-19 23:09 ` Linus Torvalds
2005-04-19 23:42 ` David Lang
2005-04-19 23:59 ` Linus Torvalds
2005-04-19 21:52 ` Christopher Li
2005-04-19 18:51 ` Olivier Galibert
2005-04-19 22:47 ` C. Scott Ananian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu \
--to=cscott@cscott.net \
--cc=git@vger.kernel.org \
--cc=muecker@gmx.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).