From: "C. Scott Ananian" <cscott@cscott.net>
To: Martin Uecker <muecker@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 10:30:15 -0400 (EDT)
Message-ID: <Pine.LNX.4.61.0504201025030.2630@cag.csail.mit.edu>
In-Reply-To: <20050420132446.GA10126@macavity>
On Wed, 20 Apr 2005, Martin Uecker wrote:
> The other thing I don't like is the use of a sha1
> for a complete file. Switching to some kind of hash
> tree would allow to introduce chunks later. This has
> two advantages:
You can still use a whole-file hash with chunking (my code demonstrates,
or will demonstrate, this). With content prefixes, computing all the
hashes takes O(N ln M) time (where N is the file size and M is the number
of chunks); if subtrees can share the same prefix, it can be done in
O(N) time (i.e., as fast as possible, modulo a constant factor of 2).
You don't *need* internal hashing functions.
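To illustrate the general idea (this is not the chunking code referred to above, just a minimal sketch under stated assumptions): a file can be split into chunks while its identity is still the SHA-1 of the whole file. The names `chunk_and_hash`, `mask`, and `min_size` are hypothetical, and the toy rolling checksum used to pick boundaries is an assumption chosen for brevity.

```python
import hashlib


def chunk_and_hash(data: bytes, mask: int = 0x3F, min_size: int = 16):
    """Split data at content-defined boundaries in one pass.

    Returns (whole_file_sha1_hex, [(chunk_sha1_hex, chunk_bytes), ...]).
    The boundary rule is a toy rolling checksum (an assumption of this
    sketch), not the rule used by any particular tool.
    """
    whole = hashlib.sha1()  # identity of the file as a whole
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Shift-and-add checksum kept to 32 bits; old bytes shift out.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        long_enough = (i + 1 - start) >= min_size
        if long_enough and (rolling & mask) == mask:
            chunk = data[start:i + 1]
            whole.update(chunk)
            chunks.append((hashlib.sha1(chunk).hexdigest(), chunk))
            start = i + 1
            rolling = 0
    if start < len(data):
        # Whatever is left after the last boundary is the final chunk.
        chunk = data[start:]
        whole.update(chunk)
        chunks.append((hashlib.sha1(chunk).hexdigest(), chunk))
    return whole.hexdigest(), chunks
```

In this sketch each byte is fed to SHA-1 twice, once for the whole file and once for its chunk, so the total work stays linear in the file size. Boundaries depend only on byte content, so nothing ties them to lines, functions, or any other semantic unit.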
> It would allow git to scale to repositories of large
> binary files. And it would allow to build a very cool
> content transport algorithm for those repositories.
> This algorithm could combine all the advantages of
> bittorrent and rsync (without the cpu load).
Yes, the big benefit of internal hashing is that it lets you check
validity of a chunk w/o having the entire file available. I'm not sure
that's terribly useful in this case. [And, if it is, then it can
obviously be done w/ other means.]
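For comparison, a minimal sketch of what internal hashing buys you, assuming the simplest possible form (a one-level hash list rather than a full tree). The helpers `chunk_hashes`, `top_hash`, and `verify_chunk` are hypothetical names for illustration: given only the list of chunk hashes and a trusted top hash, a single chunk can be checked without the rest of the file.

```python
import hashlib


def chunk_hashes(chunks):
    """SHA-1 digest of each chunk, in order."""
    return [hashlib.sha1(c).digest() for c in chunks]


def top_hash(hashes):
    """Identity of the file: the SHA-1 of the concatenated chunk digests."""
    return hashlib.sha1(b"".join(hashes)).hexdigest()


def verify_chunk(index, chunk, hashes, expected_top):
    """Check one chunk using only the hash list and the trusted top hash."""
    if top_hash(hashes) != expected_top:
        return False  # the hash list itself does not match the file id
    return hashlib.sha1(chunk).digest() == hashes[index]


# Usage: the receiver holds `hashes` and `top`, but only one chunk.
chunks = [b"first chunk", b"second chunk", b"third chunk"]
hashes = chunk_hashes(chunks)
top = top_hash(hashes)
ok = verify_chunk(1, b"second chunk", hashes, top)
```

The trade-off is that the file's identity is then the top hash, not the SHA-1 of the file's bytes, which is the point in dispute in this thread.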
> And it would allow trivial merging of patches which
> apply to different chunks of a file in exact the same
> way as merging changesets which apply to different
> files in a tree.
I'm not sure anyone should be looking at chunks. To me, at least, they
are an object-store-implementation detail only. For merging, etc, we
should be looking at whole files, or (better) the whole repository.
The chunking algorithm is guaranteed not to respect semantic boundaries
(for *some* semantics of *some* file).
--scott
explosion JMTRAX DC KUBARK biowarfare LCFLUTTER ESMERALDITE for Dummies
Hager Nader Israel General ZRMETAL Castro cryptographic Indonesia
( http://cscott.net/ )