git.vger.kernel.org archive mirror
From: Sergio Callegari <sergio.callegari@gmail.com>
To: git@vger.kernel.org
Subject: Multiblobs
Date: Wed, 28 Apr 2010 15:12:07 +0000 (UTC)
Message-ID: <loom.20100428T164432-954@post.gmane.org>

Hi,

I happened to read an older post by Jeff King about "multiblobs"
(http://kerneltrap.org/mailarchive/git/2008/4/6/1360014) and I was wondering
whether the idea has been abandoned for some reason or just put on hold.

Apparently, this would help greatly with:
- storing large binary blobs (the split could happen with a rolling checksum
approach)
- storing "structured files", such as the many zip-based file formats
(OpenDocument, Docx, Jar files, zip files themselves), tars (including
compressed tars), pdfs, etc., whose number keeps growing
- storing binary files with textual tags, where the tags could go in a separate
blob, greatly simplifying reading them without any need to cache them in a
notes tree.
- etc...
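
The rolling-checksum split mentioned in the first item can be sketched roughly
as follows. This is only an illustration: the additive checksum, the names
split_chunks, WINDOW and MASK, and the chosen values are all assumptions made
here, not something taken from Jeff's post, from git, or from bup.

```python
from typing import List

WINDOW = 16          # bytes covered by the rolling sum (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits of the sum are all ones (assumed)


def split_chunks(data: bytes, window: int = WINDOW, mask: int = MASK) -> List[bytes]:
    """Split data at content-defined boundaries.

    A boundary is placed after byte i when the sum of the last `window`
    bytes has all the `mask` bits set, so boundaries depend on nearby
    content rather than on absolute offsets.
    """
    chunks = []
    start = 0    # offset where the current chunk begins
    rolling = 0  # sum of the last `window` bytes of the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            # Drop the byte that just slid out of the window.
            rolling -= data[i - window]
        # Only cut once the chunk holds at least one full window.
        if i - start + 1 >= window and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk, may be shorter than window
    return chunks
```

Concatenating the returned chunks always gives back the original data. Since
the cut points follow the content, an edit near the start of a large file would
be expected to leave most later chunks unchanged, so they could be stored once.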

Furthermore, this could also
- help the management of upstream trees. This could be simplified, since the
"pristine tree" distributed as a tar.gz file and the exploded repo could share
their blobs, making commands such as pristine-tar unnecessary.
- help projects such as bup that currently need to provide split mechanisms of
their own.
- be used to add "different representations" to objects... for instance, when
storing a pdf one could use a fake split to store in a separate blob the
corresponding text, making the git-diff of pdfs almost instantaneous.

From Jeff's post, I guess that the major issue could be that the same file could
get a different sha1 as a multiblob versus a regular blob, but maybe it would be
possible to make the multiblob take the same sha1 as the "equivalent plain blob"
rather than its real hash.
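
To make the "same sha1 as the equivalent plain blob" idea concrete, here is a
minimal sketch. It relies on the fact that git names a blob by the SHA-1 of the
header "blob <size>\0" followed by the content. The function multiblob_id is
hypothetical and does not exist in git; it only shows that the plain-blob id
can be computed from the pieces without first joining them in memory.

```python
import hashlib
from typing import List


def blob_id(content: bytes) -> str:
    """Object id git assigns to a regular blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def multiblob_id(chunks: List[bytes]) -> str:
    """Hypothetical: id of the plain blob equivalent to the joined chunks."""
    total = sum(len(c) for c in chunks)
    h = hashlib.sha1(b"blob %d\x00" % total)
    for c in chunks:
        h.update(c)  # stream each piece in order
    return h.hexdigest()


parts = [b"first part, ", b"second part, ", b"third part\n"]
assert multiblob_id(parts) == blob_id(b"".join(parts))
```

One consequence of this choice would be that the id is no longer the hash of
the object as stored, so checking it would mean reading every piece back.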

For the moment, I am just very curious about the idea and the possible pros and
cons... can someone (maybe Jeff himself) tell me a little more? Also I wonder
about the two possibilities (implement it in git vs implement it "on top of"
git).

Sergio


Thread overview: 21+ messages
2010-04-28 15:12 Sergio Callegari [this message]
2010-04-28 18:07 ` Multiblobs Avery Pennarun
2010-04-28 19:13   ` Multiblobs Sergio Callegari
2010-04-28 21:27     ` Multiblobs Avery Pennarun
2010-04-28 23:10       ` Multiblobs Michael Witten
2010-04-28 23:26       ` Multiblobs Sergio
2010-04-29  0:44         ` Multiblobs Avery Pennarun
2010-04-29 11:34       ` Multiblobs Peter Krefting
2010-04-29 15:28         ` Multiblobs Avery Pennarun
2010-04-30  8:20           ` Multiblobs Peter Krefting
2010-04-30 17:26             ` Multiblobs Avery Pennarun
2010-04-30  9:14     ` Multiblobs Hervé Cauwelier
2010-04-30 17:32       ` Multiblobs Avery Pennarun
2010-04-30 18:16       ` Multiblobs Michael Witten
2010-04-30 19:06         ` Multiblobs Hervé Cauwelier
2010-04-28 18:34 ` Multiblobs Geert Bosch
2010-04-29  6:55 ` Multiblobs Mike Hommey
2010-05-06  6:26 ` Multiblobs Jeff King
2010-05-06 22:56   ` Multiblobs Sergio Callegari
2010-05-10  6:36     ` Multiblobs Jeff King
2010-05-10 13:58       ` Multiblobs Sergio Callegari
