Git development
From: Shawn Pearce <spearce@spearce.org>
To: "Randal L. Schwartz" <merlyn@stonehenge.com>
Cc: Nicolas Pitre <nico@cam.org>, Jakub Narebski <jnareb@gmail.com>,
	git@vger.kernel.org
Subject: Re: How should I handle binary file with GIT
Date: Wed, 5 Apr 2006 11:55:28 -0400
Message-ID: <20060405155528.GI14625@spearce.org>
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

"Randal L. Schwartz" <merlyn@stonehenge.com> wrote:
> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

The clearly safe approach is to include the full SHA1 ID of the
old object the patch was created from and use the xdelta in the
patch only as a means of transporting a compressed form of the new
version of the object.  If git-diff starts to export, say, a base64
encoding of the xdelta, then it should also include the full SHA1
ID for binary files, even if --full-index wasn't given.
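Here is a minimal sketch of such a patch record, written in Python for brevity. The blob ID is computed the way Git names blobs: a SHA-1 over "blob <size>", a NUL byte, and then the content. The delta uses a hypothetical toy format (the common prefix and suffix lengths plus the replaced middle bytes) and only stands in for real xdelta output. BinaryPatch and make_binary_patch are illustrative names and do not exist in git.

```python
import base64
import hashlib
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    # Git's blob name: SHA-1 over "blob <size>", a NUL byte, then the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def make_delta(old: bytes, new: bytes) -> bytes:
    # Toy delta (hypothetical, NOT xdelta): 8-byte prefix length, 8-byte
    # suffix length, then the bytes of `new` that replace the middle of `old`.
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix would overlap the prefix.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    middle = new[prefix:len(new) - suffix]
    return prefix.to_bytes(8, "big") + suffix.to_bytes(8, "big") + middle


@dataclass
class BinaryPatch:
    path: str
    old_id: str   # full 40-hex SHA-1 of the preimage, never abbreviated
    new_id: str   # full SHA-1 of the postimage, lets the receiver verify
    payload: str  # base64 text of the delta, safe to carry in email


def make_binary_patch(path: str, old: bytes, new: bytes) -> BinaryPatch:
    delta = make_delta(old, new)
    return BinaryPatch(path, blob_id(old), blob_id(new),
                       base64.b64encode(delta).decode("ascii"))
```

Both IDs are emitted at full length, so the receiver never has to resolve an abbreviated ID against its own object database.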

git-apply should only apply an xdelta patch to the exact same
old object.  If the tree currently has a different object at that
path then reject the patch entirely.
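The receiving side of this rule can be sketched as follows. The patch is applied only when the content at the path hashes to exactly the recorded preimage ID. The recorded postimage ID is then checked as an end-to-end integrity test. The delta decoder handles a hypothetical toy prefix/suffix format, described in its comment, and is not xdelta. PatchRejected and apply_binary_patch are illustrative names.

```python
import hashlib


class PatchRejected(Exception):
    """Raised when a binary patch does not match the object in the tree."""


def blob_id(data: bytes) -> str:
    # Git's blob name: SHA-1 over "blob <size>", a NUL byte, then the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def apply_delta(base: bytes, delta: bytes) -> bytes:
    # Toy delta (not xdelta): 8-byte prefix length, 8-byte suffix length,
    # then the bytes that replace everything between prefix and suffix.
    prefix = int.from_bytes(delta[:8], "big")
    suffix = int.from_bytes(delta[8:16], "big")
    return base[:prefix] + delta[16:] + base[len(base) - suffix:]


def apply_binary_patch(current: bytes, old_id: str, new_id: str,
                       delta: bytes) -> bytes:
    # Only the exact preimage is acceptable; any other object at this path
    # means the whole patch is rejected.
    have = blob_id(current)
    if have != old_id:
        raise PatchRejected(f"preimage mismatch: have {have}, patch wants {old_id}")
    result = apply_delta(current, delta)
    # The full postimage ID catches a corrupt or mis-transported delta.
    if blob_id(result) != new_id:
        raise PatchRejected("result does not match the postimage ID in the patch")
    return result
```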

If a path has a different object than the one the patch was based on,
then we can do one of two things to be ``nice'' to the human:

  - If the old blob exists in the repository (it just isn't the
  current version at that path) then generate a temporary merge
  file holding the old blob with the delta applied.  The user can
  then finish the merge with whatever tool understands that binary
  file format, or do the merge by hand.

  - Supply a ``do it anyway'' flag to git-apply.  If this flag is
  given on the command line then the binary file is patched even
  though the object versions differ.  For some binary file formats
  this may actually be a valid thing to do.  But it probably isn't
  for a very large percentage of known file formats.
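The two fallbacks could be layered on top of the strict check roughly as follows. This is only a sketch. `objects` is a plain dict standing in for the repository's object database. `force` stands in for the proposed ``do it anyway'' flag and does not correspond to any existing git-apply option. The delta is a toy prefix/suffix encoding, not xdelta.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Optional


class PatchRejected(Exception):
    """Raised when neither the strict check nor a fallback applies."""


@dataclass
class Outcome:
    status: str                       # "applied", "forced" or "merge"
    content: Optional[bytes] = None   # new content for the path, if patched
    merge_file: Optional[str] = None  # temp file holding old blob + delta


def blob_id(data: bytes) -> str:
    # Git's blob name: SHA-1 over "blob <size>", a NUL byte, then the content.
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def apply_delta(base: bytes, delta: bytes) -> bytes:
    # Toy delta (not xdelta): 8-byte prefix length, 8-byte suffix length,
    # then the bytes that replace everything between prefix and suffix.
    prefix = int.from_bytes(delta[:8], "big")
    suffix = int.from_bytes(delta[8:16], "big")
    return base[:prefix] + delta[16:] + base[len(base) - suffix:]


def apply_with_fallbacks(current: bytes, old_id: str, delta: bytes,
                         objects: Dict[str, bytes],
                         force: bool = False) -> Outcome:
    if blob_id(current) == old_id:
        # Normal case: the path holds exactly the preimage.
        return Outcome("applied", content=apply_delta(current, delta))
    if force:
        # "Do it anyway": patch whatever is at the path. This is only
        # meaningful for formats whose byte layout survives unrelated edits.
        return Outcome("forced", content=apply_delta(current, delta))
    base = objects.get(old_id)
    if base is not None:
        # Leave the working file alone and write the intended postimage to a
        # temporary file so the user can merge with a format-aware tool.
        fd, path = tempfile.mkstemp(suffix=".merge")
        with os.fdopen(fd, "wb") as f:
            f.write(apply_delta(base, delta))
        return Outcome("merge", merge_file=path)
    raise PatchRejected(f"preimage {old_id} is not in the repository")
```

Checking `force` before the object lookup means an explicit request from the user always wins. Without the flag, the safer merge-file path is preferred.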

I could see some cases where it might be nice to be able to perform
specialized merge handling of binary files via hooks or filters.

For example *.tar.gz, *.zip, *.jar - these files are all just
compressed trees.  They should be somewhat mergeable with the same
semantics as other trees in GIT.  Of course one could just unpack
these into a directory and let GIT track the directory instead,
but this is rather inconvenient in a Java project.  :-)
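To make the "compressed tree" view concrete, here is a sketch of a three-way merge of ZIP-format archives (which includes .jar files). It treats each archive member like a tree entry and uses only Python's zipfile module. It is an illustration of the idea, not something git does. Members changed on both sides are reported as conflicts and "ours" is kept. Entry order, timestamps and other per-member metadata are not preserved. A .tar.gz would need the same treatment through tarfile.

```python
import io
import zipfile
from typing import Dict, List, Tuple


def read_members(blob: bytes) -> Dict[str, bytes]:
    # Map each member name in the archive to its uncompressed bytes.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def merge_zip(base: bytes, ours: bytes,
              theirs: bytes) -> Tuple[bytes, List[str]]:
    """Three-way merge of ZIP archives, member by member."""
    b, o, t = read_members(base), read_members(ours), read_members(theirs)
    merged: Dict[str, bytes] = {}
    conflicts: List[str] = []
    for name in sorted(set(b) | set(o) | set(t)):
        bv, ov, tv = b.get(name), o.get(name), t.get(name)  # None = absent
        if ov == tv:        # identical on both sides (or deleted on both)
            result = ov
        elif ov == bv:      # only theirs changed or deleted it
            result = tv
        elif tv == bv:      # only ours changed or deleted it
            result = ov
        else:               # both changed it differently: keep ours, report
            conflicts.append(name)
            result = ov
        if result is not None:
            merged[name] = result
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in merged.items():
            zf.writestr(name, data)
    return out.getvalue(), conflicts
```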

If I recall correctly OpenOffice document files are XML compressed
into ZIP archives.  The XML *might* diff/patch cleanly as plain text.
The other resources in that archive are typically binary graphic
files and the like, which of course wouldn't diff/patch nicely.
But being able to diff/patch the main content might be semi-useful.
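Here is a sketch of the "diff the main content" idea for OpenDocument files. It assumes the document text lives in the content.xml member of the archive, as it does in the OpenDocument format. Because content.xml may be written with few or no line breaks, the sketch first puts each tag on its own line and then runs a plain line diff. That heuristic and the diff_odf name are illustrative only. Patching would additionally require rebuilding the archive.

```python
import difflib
import io
import zipfile
from typing import List


def odf_content_lines(doc: bytes, member: str = "content.xml") -> List[str]:
    with zipfile.ZipFile(io.BytesIO(doc)) as zf:
        xml = zf.read(member).decode("utf-8")
    # Split between adjacent tags so a line-based diff has something to use.
    return xml.replace("><", ">\n<").splitlines()


def diff_odf(old: bytes, new: bytes, name: str = "document.odt") -> str:
    # Unified diff of the XML content only; images and other binary members
    # in the archive are ignored.
    return "\n".join(difflib.unified_diff(
        odf_content_lines(old), odf_content_lines(new),
        fromfile=f"a/{name}:content.xml", tofile=f"b/{name}:content.xml",
        lineterm=""))
```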

-- 
Shawn.
