git.vger.kernel.org archive mirror
* Multiblobs
@ 2010-04-28 15:12 Sergio Callegari
  2010-04-28 18:07 ` Multiblobs Avery Pennarun
  0 siblings, 4 replies; 21+ messages in thread
From: Sergio Callegari @ 2010-04-28 15:12 UTC
  To: git

Hi,

I happened to read an older post by Jeff King about "multiblobs"
(http://kerneltrap.org/mailarchive/git/2008/4/6/1360014) and I was wondering
whether the idea has been abandoned for some reason or just put on hold.

As far as I can see, this would greatly help with
- storing large binary blobs (the split could happen with a rolling checksum
approach)
- storing "structured files", such as the many zip-based file formats
(OpenDocument, DOCX, JAR files, zip files themselves), tar archives (including
compressed ones), PDFs, etc., whose number is rising day after day...
- storing binary files with textual tags, where the tags could go in a separate
blob, greatly simplifying reading them out without any need to cache them in a
notes tree.
- etc...
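
A rough sketch of the rolling-checksum split from the first point, for
illustration only. It assumes a Rabin-Karp-style polynomial rolling hash over a
64-byte window and cuts a chunk wherever the low 13 bits of the hash are all
ones (about 8 KiB per chunk on average), with minimum and maximum chunk sizes.
The function name `split_chunks` and all of these parameters are hypothetical
choices, not what git or bup actually use. Because cut points depend only on the
bytes near them, inserting data near the start of a large file changes only the
chunks around the edit, so the remaining chunks could be stored once and shared.

```python
# Hypothetical parameters, chosen for illustration; not taken from git or bup.
WINDOW = 64              # bytes covered by the rolling hash
MASK = (1 << 13) - 1     # cut when the low 13 bits are all ones (~8 KiB average)
BASE = 257
MOD = (1 << 61) - 1      # large prime modulus (2**61 - 1)


def split_chunks(data: bytes, window: int = WINDOW, mask: int = MASK,
                 min_size: int = 256, max_size: int = 65536) -> list:
    """Content-defined chunking with a rolling polynomial hash (a sketch)."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the oldest byte in the window
    for i, byte in enumerate(data):
        if i - start >= window:
            # The window is full: drop the byte that slides out of it.
            h = (h - data[i - window] * top) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        # Cut at a content-defined boundary, or force a cut at max_size.
        if (size >= min_size and (h & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Joining the chunks always gives back the original bytes; each chunk would then
become an ordinary blob referenced by the multiblob.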

Furthermore, this could also
- simplify the management of upstream trees: the "pristine" tarball distributed
as a tar.gz file and the exploded repository could share their blobs, making
tools such as pristine-tar unnecessary.
- help projects such as bup that currently need to provide split mechanisms of
their own.
- be used to add "different representations" to objects... for instance, when
storing a PDF one could use a fake split to store the corresponding text in a
separate blob, making git-diff of PDFs almost instantaneous.

From Jeff's post, I guess that the major issue could be that the same file could
get a different sha1 as a multiblob than as a regular blob, but maybe it would be
possible to make the multiblob take the same sha1 as the "equivalent plain blob"
rather than its real hash.
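
To illustrate why this seems feasible: git names a blob by the SHA-1 of the
header "blob <size>" plus a NUL byte, followed by the content, so a multiblob
that keeps its list of chunks could compute the "equivalent plain blob" id by
streaming the chunks through the hash, without reassembling the file. The
function name `plain_blob_id` is hypothetical and not an existing git API; this
minimal sketch only computes the id and says nothing about how the object
database would store, look up, or verify such an object.

```python
import hashlib
from typing import Iterable


def plain_blob_id(chunks: Iterable[bytes]) -> str:
    """Blob id git would assign to the concatenation of `chunks` (a sketch)."""
    chunks = list(chunks)                 # total size is needed for the header
    size = sum(len(c) for c in chunks)
    h = hashlib.sha1()
    h.update(b"blob %d\0" % size)         # git's blob header: "blob <size>\0"
    for c in chunks:
        h.update(c)                       # hash chunk by chunk, no concatenation
    return h.hexdigest()
```

For example, `plain_blob_id([b"hel", b"lo\n"])` returns the same id that
`git hash-object` reports for a file containing `hello\n`.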

For the moment, I am just very curious about the idea and its possible pros and
cons... can someone (maybe Jeff himself) tell me a little more? I also wonder
about the two possibilities (implementing it in git vs implementing it "on top
of" git).

Sergio



Thread overview: 21+ messages
2010-04-28 15:12 Multiblobs Sergio Callegari
2010-04-28 18:07 ` Multiblobs Avery Pennarun
2010-04-28 19:13   ` Multiblobs Sergio Callegari
2010-04-28 21:27     ` Multiblobs Avery Pennarun
2010-04-28 23:10       ` Multiblobs Michael Witten
2010-04-28 23:26       ` Multiblobs Sergio
2010-04-29  0:44         ` Multiblobs Avery Pennarun
2010-04-29 11:34       ` Multiblobs Peter Krefting
2010-04-29 15:28         ` Multiblobs Avery Pennarun
2010-04-30  8:20           ` Multiblobs Peter Krefting
2010-04-30 17:26             ` Multiblobs Avery Pennarun
2010-04-30  9:14     ` Multiblobs Hervé Cauwelier
2010-04-30 17:32       ` Multiblobs Avery Pennarun
2010-04-30 18:16       ` Multiblobs Michael Witten
2010-04-30 19:06         ` Multiblobs Hervé Cauwelier
2010-04-28 18:34 ` Multiblobs Geert Bosch
2010-04-29  6:55 ` Multiblobs Mike Hommey
2010-05-06  6:26 ` Multiblobs Jeff King
2010-05-06 22:56   ` Multiblobs Sergio Callegari
2010-05-10  6:36     ` Multiblobs Jeff King
2010-05-10 13:58       ` Multiblobs Sergio Callegari
