git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "C. Scott Ananian" <cscott@cscott.net>
To: Martin Uecker <muecker@gmx.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
Date: Wed, 20 Apr 2005 10:30:15 -0400 (EDT)	[thread overview]
Message-ID: <Pine.LNX.4.61.0504201025030.2630@cag.csail.mit.edu> (raw)
In-Reply-To: <20050420132446.GA10126@macavity>

On Wed, 20 Apr 2005, Martin Uecker wrote:

> The other thing I don't like is the use of a sha1
> for a complete file. Switching to some kind of hash
> tree would allow to introduce chunks later. This has
> two advantages:

You can (and my code demonstrates/will demonstrate) still use a whole-file 
hash to use chunking.  With content prefixes, this takes O(N ln M) time 
(where N is the file size and M is the number of chunks) to compute all 
hashes; if subtrees can share the same prefix, then you can do this in 
O(N) time (ie, as fast as possible, modulo a constant factor, which is 
'2').  You don't *need* internal hashing functions.

> It would allow git to scale to repositories of large
> binary files. And it would allow to build a very cool
> content transport algorithm for those repositories.
> This algorithm could combine all the advantages of
> bittorrent and rsync (without the cpu load).

Yes, the big benefit of internal hashing is that it lets you check 
validity of a chunk w/o having the entire file available.  I'm not sure 
that's terribly useful in this case.  [And, if it is, then it can 
obviously be done w/ other means.]

> And it would allow trivial merging of patches which
> apply to different chunks of a file in exact the same
> way as merging changesets which apply to different
> files in a tree.

I'm not sure anyone should be looking at chunks.  To me, at least, they 
are an object-store-implementation detail only.  For merging, etc, we 
should be looking at whole files, or (better) the whole repository.
The chunking algorithm is guaranteed not to respect semantic boundaries 
(for *some* semantics of *some* file).
  --scott

explosion JMTRAX DC KUBARK biowarfare LCFLUTTER ESMERALDITE for Dummies 
Hager Nader Israel General ZRMETAL Castro cryptographic Indonesia
                          ( http://cscott.net/ )

  parent reply	other threads:[~2005-04-20 14:26 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-19 16:50 [PATCH] write-tree performance problems Chris Mason
2005-04-19 17:36 ` Linus Torvalds
2005-04-19 18:11   ` Chris Mason
2005-04-19 19:03     ` Linus Torvalds
2005-04-19 21:08       ` Chris Mason
2005-04-19 21:23         ` Linus Torvalds
2005-04-20  0:49           ` Chris Mason
2005-04-20  1:09             ` Linus Torvalds
2005-04-20  6:43             ` Linus Torvalds
2005-04-20  7:38               ` H. Peter Anvin
2005-04-20  9:08                 ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) Linus Torvalds
2005-04-20 10:04                   ` Ingo Molnar
2005-04-20 12:11                   ` Jon Seymour
2005-04-20 13:24                     ` Martin Uecker
2005-04-20 13:35                       ` Morten Welinder
2005-04-20 13:41                       ` Jon Seymour
2005-04-20 14:30                       ` C. Scott Ananian [this message]
2005-04-20 15:19                         ` Martin Uecker
2005-04-20 15:28                           ` C. Scott Ananian
2005-04-20 15:57                             ` Martin Uecker
2005-04-20 16:33                               ` Martin Uecker
2005-04-20 13:30                   ` Blob chunking code. [First look.] C. Scott Ananian
2005-04-20 17:31                     ` Blob chunking code. [Second look] C. Scott Ananian
2005-04-20 14:13                   ` WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems) David Woodhouse
2005-04-20 14:59                     ` Linus Torvalds
2005-04-20 22:29                       ` David Woodhouse
     [not found]                   ` <2cfc4032050420050655265d3a@mail.gmail.com>
2005-04-20 14:29                     ` Linus Torvalds
2005-04-20 14:35                       ` C. Scott Ananian
2005-04-20 15:22               ` [PATCH] write-tree performance problems Chris Mason
2005-04-20 15:30                 ` C. Scott Ananian
2005-04-20 15:46                   ` Linus Torvalds
2005-04-20 15:52                     ` C. Scott Ananian
2005-04-20 16:21                       ` Linus Torvalds
2005-04-20 15:40                 ` Linus Torvalds
2005-04-20 16:10                   ` David Willmore
2005-04-20 16:33                   ` Linus Torvalds
2005-04-20 16:41                     ` Linus Torvalds
2005-04-20 16:37                   ` Chris Mason
2005-04-20 17:06                     ` Linus Torvalds
2005-04-20 17:23                       ` Chris Mason
2005-04-20 17:52                         ` Linus Torvalds
2005-04-20 19:04                           ` Chris Mason
2005-04-20 19:19                             ` Linus Torvalds
2005-04-20 19:47                               ` Linus Torvalds
2005-04-20 18:07                       ` David S. Miller
2005-04-19 22:09       ` David Lang
2005-04-19 22:21         ` Linus Torvalds
2005-04-19 23:00           ` David Lang
2005-04-19 23:09             ` Linus Torvalds
2005-04-19 23:42               ` David Lang
2005-04-19 23:59                 ` Linus Torvalds
2005-04-19 21:52                   ` Christopher Li
2005-04-19 18:51   ` Olivier Galibert
2005-04-19 22:47   ` C. Scott Ananian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.61.0504201025030.2630@cag.csail.mit.edu \
    --to=cscott@cscott.net \
    --cc=git@vger.kernel.org \
    --cc=muecker@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).