From: Martin Uecker <muecker@gmx.de>
To: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 17:57:34 +0200
Message-ID: <20050420155734.GA13575@macavity>
In-Reply-To: <Pine.LNX.4.61.0504201121490.2630@cag.csail.mit.edu>
On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote:
Hi,
> A merkle-tree (which I think you initially pointed me at) makes the hash
> of the internal nodes be a hash of the chunk's hashes; ie not a straight
> content hash. This is roughly what my current implementation does, but
> I would like to identify each subtree with the hash of the
> *(expanded) contents of that subtree* (ie no explicit reference to
> subtree hashes). This makes it interoperable with non-chunked or
> differently-chunked representations, in that the top-level hash is *just
> the hash of the complete content*, not some hash-of-subtree-hashes. Does
> that make more sense?
Yes, thank you. But I would like to argue against this:
You can make the representations interoperable
if you calculate the hash for the non-chunked
representation exactly as if the file were stored
chunked, but simply do not store it that way.
Of course, this is not backward compatible with the
monolithic hash and not compatible with a differently
chunked representation (but you could store subtrees
unchunked if you think your chunks are too small).
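A minimal sketch of this idea, under loud assumptions: it uses
fixed-size chunks and a flat hash over the chunk digests instead of
the treap-based chunking discussed in this thread, and all names
(CHUNK_SIZE, split_chunks, chunked_hash, hash_unchunked_file) are
hypothetical. It only illustrates that a file stored in one piece can
carry the same identifier as its chunked form.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks (an assumed chunking rule)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunked_hash(chunks):
    """Hash of the chunk hashes: SHA-1 over the concatenated chunk digests."""
    outer = hashlib.sha1()
    for chunk in chunks:
        outer.update(hashlib.sha1(chunk).digest())
    return outer.hexdigest()


def hash_unchunked_file(data, size=CHUNK_SIZE):
    """Hash a file stored in one piece exactly as if it were stored chunked."""
    return chunked_hash(split_chunks(data, size))


content = b"hello, chunked world"
stored_chunks = split_chunks(content)  # what a chunked store would keep

# Same identifier whether or not the chunks are actually stored.
same = hash_unchunked_file(content) == chunked_hash(stored_chunks)

# The two incompatibilities named above.
differs_from_monolithic = (
    hash_unchunked_file(content) != hashlib.sha1(content).hexdigest()
)
differs_other_chunking = (
    hash_unchunked_file(content, 4) != hash_unchunked_file(content, 8)
)
```

The last two values show the trade-off: the identifier matches
neither the monolithic SHA-1 nor the one produced by a different
chunk size.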
> The code I posted doesn't demonstrate this very well, but now that Linus
> has abandoned the 'hash of compressed content' stuff, my next code posting
> should show this more clearly.
I think the hash of a treap piece should be calculated
from the hashes of the prefix and suffix trees and the already
calculated hash of the uncompressed data. This makes hashing
nearly as cheap as in Linus's version, which is important
because checking whether a given file has identical
content to a stored version should be fast.
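As a rough sketch of that calculation, not the actual treap code: the
Piece class, make_piece, piece_hash, the all-zero placeholder for a
missing subtree, and the order prefix, data, suffix are all
assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

EMPTY = b"\x00" * 20  # assumed placeholder digest for a missing subtree


@dataclass
class Piece:
    data_digest: bytes  # SHA-1 of this piece's uncompressed data, computed once
    prefix: Optional["Piece"] = None
    suffix: Optional["Piece"] = None


def make_piece(data, prefix=None, suffix=None):
    """Hash the uncompressed data once and remember only the digest."""
    return Piece(hashlib.sha1(data).digest(), prefix, suffix)


def piece_hash(piece):
    """Combine prefix hash, data digest, and suffix hash into the piece hash."""
    if piece is None:
        return EMPTY
    h = hashlib.sha1()
    h.update(piece_hash(piece.prefix))
    h.update(piece.data_digest)
    h.update(piece_hash(piece.suffix))
    return h.digest()


left = make_piece(b"hello, ")
right = make_piece(b"!")
root = make_piece(b"world", left, right)
root_id = piece_hash(root).hex()

# Replacing one subtree changes the root hash.
changed_id = piece_hash(make_piece(b"world", left, make_piece(b"?"))).hex()
```

In this sketch every data byte is hashed exactly once, and the
combining step hashes only three 20-byte digests per piece, which is
where the "nearly as cheap" estimate comes from.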
> >If I don't miss anything essential, you can validate
> >each treap piece at the moment you get it from the
> >network with its SHA1 hash and then proceed with
> >downloading the prefix and suffix tree (in parallel
> >if you have more than one peer a la bittorrent).
>
> Yes, I guess this is the detail I was going to abandon. =)
>
> I viewed the fact that the top-level hash was dependent on the exact chunk
> makeup a 'misfeature', because it doesn't allow easy interoperability with
> existing non-chunked repos.
I thought of this as a misfeature too, before I realized how
many advantages it has.
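One such advantage is the piece-by-piece validation quoted above. A
hedged sketch, assuming a piece travels as (prefix id, data, suffix
id) and is identified by a SHA-1 over those ids and the data digest;
the dict standing in for a peer and the names publish, fetch, and
NO_CHILD are hypothetical.

```python
import hashlib

NO_CHILD = b"\x00" * 20  # assumed marker for "no subtree"


def sha1(data):
    return hashlib.sha1(data).digest()


def piece_id(prefix_id, data, suffix_id):
    """Identifier of a piece: depends on its data and on both subtree ids."""
    return sha1(prefix_id + sha1(data) + suffix_id)


def publish(store, data, prefix_id=NO_CHILD, suffix_id=NO_CHILD):
    """Add a piece to a peer's store and return its id."""
    pid = piece_id(prefix_id, data, suffix_id)
    store[pid] = (prefix_id, data, suffix_id)
    return pid


def fetch(store, wanted_id):
    """Fetch a piece, validate it on arrival, then fetch its subtrees."""
    if wanted_id == NO_CHILD:
        return b""
    prefix_id, data, suffix_id = store[wanted_id]  # untrusted until checked
    if piece_id(prefix_id, data, suffix_id) != wanted_id:
        raise ValueError("piece does not match its SHA1 id")
    # Both child ids are now trusted; a real client could request the
    # two subtrees from different peers in parallel.
    return fetch(store, prefix_id) + data + fetch(store, suffix_id)


peer = {}
left = publish(peer, b"hello, ")
right = publish(peer, b"!")
root = publish(peer, b"world", left, right)
content = fetch(peer, root)
```

Because each piece is checked against the id its parent vouched for,
a corrupted piece is rejected as soon as it arrives, before any of
its subtrees are requested.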
Martin
--
One night, when little Giana from Milano was fast asleep,
she had a strange dream.