From: Shawn Pearce <spearce@spearce.org>
To: "Randal L. Schwartz" <merlyn@stonehenge.com>
Cc: Nicolas Pitre <nico@cam.org>, Jakub Narebski <jnareb@gmail.com>,
git@vger.kernel.org
Subject: Re: How should I handle binary file with GIT
Date: Wed, 5 Apr 2006 11:55:28 -0400
Message-ID: <20060405155528.GI14625@spearce.org>
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

"Randal L. Schwartz" <merlyn@stonehenge.com> wrote:
> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
>
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
>
> Nicolas> We already have our own delta code for pack storage.
>
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved. I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted? When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

The clearly safe approach is to include the full SHA1 ID of the
old object the patch was created from and use the xdelta in the
patch only as a means of transporting a compressed form of the new
version of the object. If git-diff starts to export, say, a base64
encoding of the xdelta, then it should also include the full SHA1
ID for binary files, even if --full-index wasn't given.

git-apply should only apply an xdelta patch to the exact same
old object. If the tree currently has a different object at that
path then reject the patch entirely.
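
A minimal sketch of that check in Python, not git-apply's actual code.
The names PatchRejected, apply_binary_delta and the apply_delta callable
are hypothetical; the only Git-specific fact relied on is that a blob ID
is the SHA-1 of "blob <size>\0" followed by the content.

```python
import hashlib


class PatchRejected(Exception):
    """Raised when the target blob is not the one the delta was made from."""


def git_blob_id(data: bytes) -> str:
    # Git names a blob by the SHA-1 of "blob <size>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def apply_binary_delta(current: bytes, old_id: str, apply_delta) -> bytes:
    # old_id is the full 40-hex-digit ID recorded in the patch.
    # apply_delta is a hypothetical callable standing in for the delta decoder.
    if git_blob_id(current) != old_id:
        raise PatchRejected("blob at this path is not the patch's base object")
    return apply_delta(current)
```

Comparing the full ID rather than an abbreviated one is what makes the
rejection unambiguous.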

If a path has a different object than the patch was based on then
we can do one of two things to be ``nice'' to the human:
- If the old blob exists in the repository (it just isn't the
current version at that path) then generate a temporary merge
file holding the old blob with the delta applied. The user can
then finish the merge with whatever tool understands that binary
file format, or do the merge by hand.
- Supply a ``do it anyway'' flag to git-apply. If this flag is
given on the command line then the binary file is patched even
though the object versions differ. For some binary file formats
this may actually be a valid thing to do. But it probably isn't
for a very large percentage of known file formats.
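
The choices above could be sketched as a small decision function. This
is only an illustration: the Action names and the force parameter are
hypothetical, and letting the ``do it anyway'' flag take precedence over
the temporary merge file is an assumption, not something decided here.

```python
from enum import Enum


class Action(Enum):
    APPLY = "apply the delta"
    WRITE_MERGE_FILE = "write a temporary merge file"
    REJECT = "reject the patch"


def choose_action(current_id: str, base_id: str,
                  base_in_repo: bool, force: bool = False) -> Action:
    # Exact match: the delta was made against the blob we have.
    if current_id == base_id:
        return Action.APPLY
    # Assumption: an explicit "do it anyway" flag wins over the merge file.
    if force:
        return Action.APPLY
    # The old blob exists, so old blob + delta can be written out for a manual merge.
    if base_in_repo:
        return Action.WRITE_MERGE_FILE
    return Action.REJECT
```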

I could see some cases where it might be nice to be able to perform
specialized merge handling of binary files via hooks or filters.
For example *.tar.gz, *.zip, *.jar - these files are all just
compressed trees. They should be somewhat mergeable with the same
semantics as other trees in GIT. Of course one could just unpack
these into a directory and let GIT track the directory instead,
but this is rather inconvenient in a Java project. :-)
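
To make the "compressed tree" idea concrete, here is a rough Python
sketch that treats a ZIP or JAR as a mapping from member name to content
hash and runs a simple three-way merge over those mappings. The function
names are hypothetical, this is not how GIT merges trees internally, and
writing the merged archive back out is left aside.

```python
import hashlib
import io
import zipfile


def archive_tree(data: bytes) -> dict:
    """Map each file member of a ZIP/JAR to the SHA-1 of its uncompressed content."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: hashlib.sha1(zf.read(name)).hexdigest()
                for name in zf.namelist() if not name.endswith("/")}


def merge_trees(base: dict, ours: dict, theirs: dict):
    """Three-way merge at member level; returns (merged, conflicting names)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            pick = o          # both sides agree (or both deleted it)
        elif o == b:
            pick = t          # only their side changed it
        elif t == b:
            pick = o          # only our side changed it
        else:
            conflicts.append(name)  # both changed it differently
            continue
        if pick is not None:
            merged[name] = pick
    return merged, conflicts
```

Members that only one side touched merge cleanly; a member changed
differently on both sides is reported as a conflict, which would still
need a format-aware tool or a human.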

If I recall correctly OpenOffice document files are XML compressed
into ZIP archives. The XML *might* diff/patch cleanly as plain text.
The other resources in that archive are typically binary graphic
files and the like, which of course wouldn't diff/patch nicely.
But being able to diff/patch the main content might be semi-useful.
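
As a rough illustration of diffing just the main content, the Python
sketch below pulls one XML member out of two versions of a document and
produces a unified diff. The member name content.xml and the trick of
breaking lines between adjacent tags are assumptions made for the
example; the function names are hypothetical.

```python
import difflib
import io
import zipfile


def content_lines(doc: bytes, member: str = "content.xml") -> list:
    # Read the XML member out of the ZIP container.
    with zipfile.ZipFile(io.BytesIO(doc)) as zf:
        xml = zf.read(member).decode("utf-8")
    # Break between adjacent tags so a one-line XML stream diffs line by line.
    return xml.replace("><", ">\n<").splitlines()


def diff_documents(old: bytes, new: bytes) -> list:
    # Plain-text unified diff of the main XML content only; other members
    # (images and the like) are ignored here.
    return list(difflib.unified_diff(content_lines(old), content_lines(new),
                                     "a/content.xml", "b/content.xml",
                                     lineterm=""))
```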
--
Shawn.