From: Martin Uecker <muecker@gmx.de>
To: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 17:19:02 +0200 [thread overview]
Message-ID: <20050420151902.GA13175@macavity> (raw)
In-Reply-To: <Pine.LNX.4.61.0504201025030.2630@cag.csail.mit.edu>
[-- Attachment #1: Type: text/plain, Size: 2752 bytes --]
On Wed, Apr 20, 2005 at 10:30:15AM -0400, C. Scott Ananian wrote:
Hi,
your code looks pretty cool. thank you!
> On Wed, 20 Apr 2005, Martin Uecker wrote:
>
> >The other thing I don't like is the use of a sha1
> >for a complete file. Switching to some kind of hash
> >tree would allow to introduce chunks later. This has
> >two advantages:
>
> You can (and my code demonstrates/will demonstrate) still use a whole-file
> hash to use chunking. With content prefixes, this takes O(N ln M) time
> (where N is the file size and M is the number of chunks) to compute all
> hashes; if subtrees can share the same prefix, then you can do this in
> O(N) time (ie, as fast as possible, modulo a constant factor, which is
> '2'). You don't *need* internal hashing functions.
I don't understand this paragraph. What is an internal
hash function? Your code seems to do exactly what I want.
The hashes are computed recusively as in a hash tree
with O(N ln N). The only difference between your design
and a design based on a conventional (binary) hash tree
seems to be that data is stored in the intermediate nodes
too.
> >It would allow git to scale to repositories of large
> >binary files. And it would allow to build a very cool
> >content transport algorithm for those repositories.
> >This algorithm could combine all the advantages of
> >bittorrent and rsync (without the cpu load).
>
> Yes, the big benefit of internal hashing is that it lets you check
> validity of a chunk w/o having the entire file available. I'm not sure
> that's terribly useful in this case. [And, if it is, then it can
> obviously be done w/ other means.]
If I don't miss anything essential, you can validate
each treap piece at the moment you get it from the
network with its SHA1 hash and then proceed with
downloading the prefix and suffix tree (in parallel
if you have more than one peer a la bittorrent).
> >And it would allow trivial merging of patches which
> >apply to different chunks of a file in exact the same
> >way as merging changesets which apply to different
> >files in a tree.
>
> I'm not sure anyone should be looking at chunks. To me, at least, they
> are an object-store-implementation detail only. For merging, etc, we
> should be looking at whole files, or (better) the whole repository.
> The chunking algorithm is guaranteed not to respect semantic boundaries
> (for *some* semantics of *some* file).
You might be right. I just wanted to point out this
possibility because it would allow to avoid calling
external merging code for a lot of trivial merges.
bye,
Martin
--
One night, when little Giana from Milano was fast asleep,
she had a strange dream.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
next prev parent reply other threads:[~2005-04-20 15:16 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-04-19 16:50 [PATCH] write-tree performance problems Chris Mason
2005-04-19 17:36 ` Linus Torvalds
2005-04-19 18:11 ` Chris Mason
2005-04-19 19:03 ` Linus Torvalds
2005-04-19 21:08 ` Chris Mason
2005-04-19 21:23 ` Linus Torvalds
2005-04-20 0:49 ` Chris Mason
2005-04-20 1:09 ` Linus Torvalds
2005-04-20 6:43 ` Linus Torvalds
2005-04-20 7:38 ` H. Peter Anvin
2005-04-20 9:08 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) Linus Torvalds
2005-04-20 10:04 ` Ingo Molnar
2005-04-20 12:11 ` Jon Seymour
2005-04-20 13:24 ` Martin Uecker
2005-04-20 13:35 ` Morten Welinder
2005-04-20 13:41 ` Jon Seymour
2005-04-20 14:30 ` C. Scott Ananian
2005-04-20 15:19 ` Martin Uecker [this message]
2005-04-20 15:28 ` C. Scott Ananian
2005-04-20 15:57 ` Martin Uecker
2005-04-20 16:33 ` Martin Uecker
2005-04-20 13:30 ` Blob chunking code. [First look.] C. Scott Ananian
2005-04-20 17:31 ` Blob chunking code. [Second look] C. Scott Ananian
2005-04-20 14:13 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) David Woodhouse
2005-04-20 14:59 ` Linus Torvalds
2005-04-20 22:29 ` David Woodhouse
[not found] ` <2cfc4032050420050655265d3a@mail.gmail.com>
2005-04-20 14:29 ` Linus Torvalds
2005-04-20 14:35 ` C. Scott Ananian
2005-04-20 15:22 ` [PATCH] write-tree performance problems Chris Mason
2005-04-20 15:30 ` C. Scott Ananian
2005-04-20 15:46 ` Linus Torvalds
2005-04-20 15:52 ` C. Scott Ananian
2005-04-20 16:21 ` Linus Torvalds
2005-04-20 15:40 ` Linus Torvalds
2005-04-20 16:10 ` David Willmore
2005-04-20 16:33 ` Linus Torvalds
2005-04-20 16:41 ` Linus Torvalds
2005-04-20 16:37 ` Chris Mason
2005-04-20 17:06 ` Linus Torvalds
2005-04-20 17:23 ` Chris Mason
2005-04-20 17:52 ` Linus Torvalds
2005-04-20 19:04 ` Chris Mason
2005-04-20 19:19 ` Linus Torvalds
2005-04-20 19:47 ` Linus Torvalds
2005-04-20 18:07 ` David S. Miller
2005-04-19 22:09 ` David Lang
2005-04-19 22:21 ` Linus Torvalds
2005-04-19 23:00 ` David Lang
2005-04-19 23:09 ` Linus Torvalds
2005-04-19 23:42 ` David Lang
2005-04-19 23:59 ` Linus Torvalds
2005-04-19 21:52 ` Christopher Li
2005-04-19 18:51 ` Olivier Galibert
2005-04-19 22:47 ` C. Scott Ananian
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050420151902.GA13175@macavity \
--to=muecker@gmx.de \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).