Git development
* [PATCH] Tweaks to make asciidoc play nice.
From: Francis Daly @ 2006-04-05 22:25 UTC
  To: git; +Cc: Marco Roeland

Once the content has been generated, the formatting elves can reorder
it to be pretty...

Signed-off-by: Francis Daly <francis@daoine.org>

---

The manpage formatting needed to stop things being ugly is
nontrivial. In this case, an option has multi-paragraph content,
some of which should be displayed preformatted.

"+" means "this next paragraph is a continuation of the same section"
"--" means "the next paragraphs are all continuations -- don't worry
     about all the other "+"s"
"------" means "the next content is preformatted"
"------", the blank line, and the extra indentation within ------ are 
     there because docbook-xsl stylesheets are odd in different ways.
     This combination makes the preformatted text be visually distinct 
     with both versions 1.68 and 1.69. Without that oddness, we could
     use either just ------ or just indentation. Ho hum.

 Documentation/git-commit.txt |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

e46caf57bc2d09c032b992f9024226acd2d68fbc
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index ec8b562..0a7365b 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -83,15 +83,16 @@ OPTIONS
 	the current tip as parents -- so the current top commit is
 	discarded.
 +
+--
 It is a rough equivalent for:
-+
-$ git reset --soft HEAD^
-+
-$ ... do something else to come up with the right tree ...
-+
-$ git commit -c ORIG_HEAD
-+
+------
+	$ git reset --soft HEAD^
+	$ ... do something else to come up with the right tree ...
+	$ git commit -c ORIG_HEAD
+
+------
 but can be used to amend a merge commit.
+--
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.2.4.g3488

-- 
Francis Daly        francis@daoine.org


* Re: Cygwin can't handle huge packfiles?
From: Christopher Faylor @ 2006-04-05 21:08 UTC
  To: Johannes Schindelin, Kees-Jan Dijkzeul, git
In-Reply-To: <Pine.LNX.4.63.0604051612200.25304@wbgn013.biozentrum.uni-wuerzburg.de>

On Wed, Apr 05, 2006 at 04:14:20PM +0200, Johannes Schindelin wrote:
>> Inspired by a patch of Alex Riesen (thanks, Alex), I tried to use the
>> regular mmap for mapping pack files, only to discover that I compile
>> without defining "NO_MMAP", so I've been using the stock mmap all
>> along. So now I'm thinking that the cygwin mmap also does a
>> malloc-and-read, just like git does with NO_MMAP. So I'll continue to
>> investigate in that direction.
>
>I think cygwin's mmap() is based on the Win32 API equivalent, which could 
>mean that it *is* memory mapped, but in a special area (which is smaller 
>than 1.5 gigabyte). In this case, it would make sense to limit the pack 
>size, thereby having several packs, and mmap() them as they are needed.

Yes, cygwin's mmap uses CreateFileMapping and MapViewOfFile.  IIRC,
Windows might have a 2G limitation lurking under the hood somewhere but
I think that might be tweakable with some registry setting.
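
The "several smaller packs, mapped as they are needed" idea quoted above can be sketched with plain POSIX calls. This is only an illustration under assumptions: `struct pack_window`, `pack_map` and `pack_unmap` are hypothetical names, not git functions, and nothing here reflects how cygwin implements mmap internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for one pack that is mapped only while in use. */
struct pack_window {
	void *base;
	size_t len;
};

/* Map a whole file read-only; returns 0 on success, -1 on failure. */
static int pack_map(struct pack_window *w, const char *path)
{
	struct stat st;
	int fd;

	assert(w != NULL && path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	w->len = (size_t)st.st_size;
	w->base = mmap(NULL, w->len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	if (w->base == MAP_FAILED) {
		w->base = NULL;
		w->len = 0;
		return -1;
	}
	return 0;
}

/* Release the address space so that another pack can be mapped. */
static void pack_unmap(struct pack_window *w)
{
	assert(w != NULL);
	if (w->base)
		munmap(w->base, w->len);
	w->base = NULL;
	w->len = 0;
}
```

With packs kept below whatever the platform limit turns out to be, a caller would pack_map() the one it needs and pack_unmap() it before moving to the next, so only one pack occupies address space at a time.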

cgf


* [PATCH] git-commit: document --amend
From: Marco Roeland @ 2006-04-05 20:28 UTC
  To: Junio C Hamano; +Cc: Marco Roeland, git
In-Reply-To: <7vacaz23wr.fsf@assigned-by-dhcp.cox.net>

The "--amend" option is used to amend the tip of the current branch. This
documentation text was copied straight from the commit that implemented it.

Some minor format tweaks for asciidoc were taken from work by Francis Daly
in commit b0d08a5. It now also looks good in the html page.

Signed-off-by: Marco Roeland <marco.roeland@xs4all.nl>

---

 Documentation/git-commit.txt |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

293dccf6f8c47294a42376ad96d8c1130b06c9b9
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index d04b342..ec8b562 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git-commit' [-a] [-s] [-v] [(-c | -C) <commit> | -F <file> | -m <msg>]
-	   [-e] [--author <author>] [--] [[-i | -o ]<file>...]
+	   [--no-verify] [--amend] [-e] [--author <author>]
+	   [--] [[-i | -o ]<file>...]
 
 DESCRIPTION
 -----------
@@ -70,6 +71,27 @@ OPTIONS
 	`-m`, and from file with `-C` are usually used as the
 	commit log message unmodified.  This option lets you
 	further edit the message taken from these sources.
+
+--amend::
+
+	Used to amend the tip of the current branch. Prepare the tree
+	object you would want to replace the latest commit as usual
+	(this includes the usual -i/-o and explicit paths), and the
+	commit log editor is seeded with the commit message from the
+	tip of the current branch. The commit you create replaces the
+	current tip -- if it was a merge, it will have the parents of
+	the current tip as parents -- so the current top commit is
+	discarded.
++
+It is a rough equivalent for:
++
+$ git reset --soft HEAD^
++
+$ ... do something else to come up with the right tree ...
++
+$ git commit -c ORIG_HEAD
++
+but can be used to amend a merge commit.
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.3.0.rc2.gca38


* Re: How should I handle binary file with GIT
From: Junio C Hamano @ 2006-04-05 20:20 UTC
  To: Nicolas Pitre; +Cc: Randal L. Schwartz, git
In-Reply-To: <Pine.LNX.4.64.0604051521480.2550@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 5 Apr 2006, Junio C Hamano wrote:
>
>> We've been trying to keep our diff output reversible (e.g. we
>> show what the filemode of the preimage is), so if we take the
>> above route, it probably should record deltas for both going
>> from preimage to postimage _and_ going the other way (unless
>> xdelta can be applied in-reverse, which I do not think is the
>> case).
>
> You cannot reverse a delta.  However if you were able to apply a delta 
> from preimage to postimage that means you must already have had preimage 
> in your object store.  Therefore reverting such a patch would simply 
> involve restoring preimage.

The case I had in mind was where you shipped a tarball of the
tip to somebody (or "a shallow clone") and, after seeing him
have problems with that release, sent him a patch telling
him "reverting this might help, could you please give it a try?"

Of course you could be nicer to him and generate the reverse
diff on your end in such a case instead.


* Re: [PATCH] git-commit: document --append (amend really!)
From: Junio C Hamano @ 2006-04-05 19:55 UTC
  To: Marco Roeland; +Cc: git
In-Reply-To: <20060405194607.GB20854@fiberbit.xs4all.nl>

Marco Roeland <marco.roeland@xs4all.nl> writes:

> Here with asciidoc 7.1.2 (Debian 'sid') it looks good in the generated
> man page. But I'll investigate if nobody beats me to it.

Please see below for an example.

> Oops. Well, I suppose I could use "git commit --amend" and then run "git
> format-patch" again. ;-)

Yup ;-).

diff-tree b0d08a504bee17dfc46f761e166ff2c20c59a91a (from 3103cf9e1e09b0045a60542f24a2a1e4ed7b1237)
Author: Francis Daly <francis@daoine.org>
Date:   Wed Mar 22 09:53:57 2006 +0000

    Format tweaks for asciidoc.
    
    Some documentation "options" were followed by independent preformatted
    paragraphs. Now they are associated plain text paragraphs. The
    difference is clear in the generated html.
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index fbd2394..d55456a 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -24,13 +24,13 @@ OPTIONS
 
 <option>...::
 	Either an option to pass to `grep` or `git-ls-files`.
-
-	The following are the specific `git-ls-files` options
-	that may be given: `-o`, `--cached`, `--deleted`, `--others`,
-	`--killed`, `--ignored`, `--modified`, `--exclude=*`,
-	`--exclude-from=*`, and `--exclude-per-directory=*`.
-
-	All other options will be passed to `grep`.
++
+The following are the specific `git-ls-files` options
+that may be given: `-o`, `--cached`, `--deleted`, `--others`,
+`--killed`, `--ignored`, `--modified`, `--exclude=\*`,
+`--exclude-from=\*`, and `--exclude-per-directory=\*`.
++
+All other options will be passed to `grep`.
 
 <pattern>::
 	The pattern to look for.  The first non option is taken


* Re: [PATCH] git-commit: document --append (amend really!)
From: Marco Roeland @ 2006-04-05 19:46 UTC
  To: Junio C Hamano; +Cc: Marco Roeland, git
In-Reply-To: <7vfykr24wi.fsf@assigned-by-dhcp.cox.net>

On Wednesday April 5th 2006 Junio C Hamano wrote:

> Thanks for resurrecting this.
> 
> I suspect that some formatting tweak is needed; I recall
> asciidoc needs some special formatting when a multi-paragraph
> description is involved in the list.

Here with asciidoc 7.1.2 (Debian 'sid') it looks good in the generated
man page. But I'll investigate if nobody beats me to it. Perhaps we
should develop a "sparse" like module for asciidoc.

> Of course, munging the patch title with s/append/amend/ would
> not hurt ;-).

Oops. Well, I suppose I could use "git commit --amend" and then run "git
format-patch" again. ;-)
-- 
Marco Roeland


* Re: [PATCH] git-commit: document --append
From: Junio C Hamano @ 2006-04-05 19:34 UTC
  To: Marco Roeland; +Cc: git
In-Reply-To: <20060405191608.GA20572@fiberbit.xs4all.nl>

Thanks for resurrecting this.

I suspect that some formatting tweak is needed; I recall
asciidoc needs some special formatting when a multi-paragraph
description is involved in the list.

Of course, munging the patch title with s/append/amend/ would
not hurt ;-).


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 19:31 UTC
  To: Junio C Hamano; +Cc: Randal L. Schwartz, git
In-Reply-To: <7vslor27n4.fsf@assigned-by-dhcp.cox.net>

On Wed, 5 Apr 2006, Junio C Hamano wrote:

> If we wanted to use the patch+diff (i.e. "format-patch,
> send-email, and then am" workflow) to transfer new versions of
> binary files to a recipient, which I think is useful in some
> projects, the sanest way to handle this is probably to add
> Nico's delta, going from preimage to postimage, encoded for
> safer transport, to our diff output.  For safety and sanity, we
> will not "apply" the patch unless the patched file exactly
> matches the preimage that is recorded in the diff, and as long
> as the recipient has the preimage, such a patch would be able to
> reproduce the postimage and hopefully be smaller than
> transferring the whole thing.

Exactly the point.

> We've been trying to keep our diff output reversible (e.g. we
> show what the filemode of the preimage is), so if we take the
> above route, it probably should record deltas for both going
> from preimage to postimage _and_ going the other way (unless
> xdelta can be applied in-reverse, which I do not think is the
> case).

You cannot reverse a delta.  However if you were able to apply a delta 
from preimage to postimage that means you must already have had preimage 
in your object store.  Therefore reverting such a patch would simply 
involve restoring preimage.
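
Why a delta cannot be run backwards is easier to see with a toy format. The sketch below is an assumption-laden illustration, not git's or xdelta's actual encoding: a delta is a list of "copy from preimage" and "insert literal bytes" operations, and `struct delta_op` and `apply_delta` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy delta opcodes; not git's or xdelta's real encoding. */
enum op_kind { OP_COPY, OP_INSERT };

struct delta_op {
	enum op_kind kind;
	size_t off;		/* OP_COPY: offset into the preimage */
	size_t len;		/* number of bytes to copy or insert */
	const char *data;	/* OP_INSERT: literal bytes carried by the delta */
};

/*
 * Rebuild the postimage from the preimage and a list of ops.
 * Returns the number of bytes written, or (size_t)-1 on error.
 */
static size_t apply_delta(const char *pre, size_t pre_len,
			  const struct delta_op *ops, size_t nops,
			  char *out, size_t out_cap)
{
	size_t n = 0, i;

	assert(pre != NULL && ops != NULL && out != NULL);
	for (i = 0; i < nops; i++) {
		const struct delta_op *op = &ops[i];
		const char *src;

		if (op->kind == OP_COPY) {
			/* the copied range must lie inside the preimage */
			if (op->off > pre_len || op->len > pre_len - op->off)
				return (size_t)-1;
			src = pre + op->off;
		} else {
			src = op->data;
		}
		if (op->len > out_cap - n)
			return (size_t)-1;
		memcpy(out + n, src, op->len);
		n += op->len;
	}
	return n;
}
```

Turning "hello cruel world" into "hello kind world" takes a copy, an insert of "kind", and another copy. The bytes "cruel" appear in none of the operations, so the preimage cannot be rebuilt from the postimage plus this delta; it has to come from somewhere else, such as the object store.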

> Of course, to be _completely_ generic, you could include both
> compressed then uuencoded preimage and postimage, and let the
> recipient sort it out.

I think this is just too much and beside the point of a diff.  If the
workflow is so convoluted that the simple binary patch as a delta
doesn't apply then it would probably be a better idea to simply transfer
those binaries as email attachments.  In other words, if a binary patch 
transfer mechanism is added, it should cover the common case and leave 
the rest for a better process like git-fetch/pull.


Nicolas


* Re: How should I handle binary file with GIT
From: Marco Roeland @ 2006-04-05 19:23 UTC
  To: moreau francis; +Cc: Junio C Hamano, git
In-Reply-To: <20060405131834.60888.qmail@web25804.mail.ukl.yahoo.com>

On Wednesday April 5th 2006 moreau francis wrote:

> BTW, what does "--amend" option do ? It doesn't seem to be documented anywhere.

This is the original commit text that introduced it:

diff-tree b4019f045646b1770a80394da876b8a7c6b8ca7b (from d320a5437f8304cf9ea3ee1898e49d643e005738)
Author: Junio C Hamano <junkio@cox.net>
Date:   Thu Mar 2 21:04:05 2006 -0800

    git-commit --amend
    
    The new flag is used to amend the tip of the current branch.  Prepare
    the tree object you would want to replace the latest commit as usual
    (this includes the usual -i/-o and explicit paths), and the commit log
    editor is seeded with the commit message from the tip of the current
    branch.  The commit you create replaces the current tip -- if it was a
    merge, it will have the parents of the current tip as parents -- so the
    current top commit is discarded.
    
    It is a rough equivalent for:
    
    	$ git reset --soft HEAD^
    	$ ... do something else to come up with the right tree ...
    	$ git commit -c ORIG_HEAD
    
    but can be used to amend a merge commit.
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

So in the original context you can add separate binaries to a commit
of only text files that you just rescued from CVS or something and then
change the commit to include these binaries as well.

I've sent a separate patch for the documentation for git-commit using
Junio's clear explanation.
-- 
Marco Roeland


* [PATCH] git-commit: document --append
From: Marco Roeland @ 2006-04-05 19:16 UTC
  To: git

The "--amend" option is used to amend the tip of the current branch. This
documentation text was copied straight from the commit that implemented it.

Signed-off-by: Marco Roeland <marco.roeland@xs4all.nl>

---

 Documentation/git-commit.txt |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

ca7d3b4fdd0cb24b7353da312fb9306531468f54
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index d04b342..3701cb3 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git-commit' [-a] [-s] [-v] [(-c | -C) <commit> | -F <file> | -m <msg>]
-	   [-e] [--author <author>] [--] [[-i | -o ]<file>...]
+	   [--no-verify] [--amend] [-e] [--author <author>]
+	   [--] [[-i | -o ]<file>...]
 
 DESCRIPTION
 -----------
@@ -70,6 +71,25 @@ OPTIONS
 	`-m`, and from file with `-C` are usually used as the
 	commit log message unmodified.  This option lets you
 	further edit the message taken from these sources.
+
+--amend::
+
+	Used to amend the tip of the current branch. Prepare the tree
+	object you would want to replace the latest commit as usual
+	(this includes the usual -i/-o and explicit paths), and the
+	commit log editor is seeded with the commit message from the
+	tip of the current branch. The commit you create replaces the
+	current tip -- if it was a merge, it will have the parents of
+	the current tip as parents -- so the current top commit is
+	discarded.
+
+	It is a rough equivalent for:
+
+		$ git reset --soft HEAD^
+		$ ... do something else to come up with the right tree ...
+		$ git commit -c ORIG_HEAD
+
+	but can be used to amend a merge commit.
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.3.0.rc2.gca38


* Re: How should I handle binary file with GIT
From: Randal L. Schwartz @ 2006-04-05 18:51 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslor27n4.fsf@assigned-by-dhcp.cox.net>

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> If we wanted to use the patch+diff (i.e. "format-patch,
Junio> send-email, and then am" workflow) to transfer new versions of
Junio> binary files to a recipient, which I think is useful in some
Junio> projects, the sanest way to handle this is probably to add
Junio> Nico's delta, going from preimage to postimage, encoded for
Junio> safer transport, to our diff output.

This is what I was looking for, and thanks for confirming that at least within
a local repository, everything already works.  Yeay.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


* Re: How should I handle binary file with GIT
From: Junio C Hamano @ 2006-04-05 18:34 UTC
  To: Randal L. Schwartz; +Cc: git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

merlyn@stonehenge.com (Randal L. Schwartz) writes:

> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, binary files are handled by cherry-pick and merge
without needing to involve "diff"+"patch" (which is not so
useful for binary files anyway).  They use 3-way read-tree merge
which compares the object names and leaves the index unmerged if
there are conflicting changes, so you should be able to sort it
out by running up to three "git-cat-file blob $sha1".

What involves "diff"+"patch" are rebases and processing mailed-in
patches as in the example by the original poster.

In our diff output, we record the blob object name of preimage
and postimage, along with filemode, on the "index" line.
git-apply does not do anything with it by default, but if:

 - --binary flag is given,

 - the postimage blob is already available locally, and,

 - the file the patch is being applied to is the same as the
   recorded preimage,

then the file is _replaced_ with the postimage.
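
The three conditions above amount to a small predicate. The following is a sketch only: `struct binary_change` and `may_replace_with_postimage` are hypothetical names rather than git-apply internals, and the blob names are assumed to be full, unabbreviated object names so that a plain string comparison is meaningful.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of one binary change taken from an "index" line. */
struct binary_change {
	const char *preimage_sha1;	/* blob name recorded as preimage */
	const char *postimage_sha1;	/* blob name recorded as postimage */
};

/*
 * Return 1 when the file may be replaced with the postimage, 0 otherwise.
 * current_sha1 is the blob name of the file being patched;
 * have_postimage says whether the postimage blob is available locally.
 */
static int may_replace_with_postimage(const struct binary_change *c,
				      int binary_flag,
				      int have_postimage,
				      const char *current_sha1)
{
	assert(c != NULL && current_sha1 != NULL);
	if (!binary_flag)
		return 0;	/* default: do nothing with the index line */
	if (!have_postimage)
		return 0;	/* nothing to replace the file with */
	return strcmp(current_sha1, c->preimage_sha1) == 0;
}
```

Note that no delta is applied anywhere in this check: the result is either a wholesale replacement or no change at all.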

This is good enough for git-rebase (which uses format-patch
piped to am) and is safe (we do not "apply delta" -- only
replace when the file "being patched" matches the recorded
preimage).  It does not do any good for transferring a postimage
that the person who applies the patch does not yet have.

I think "applying delta" to a binary file is not a very useful
thing to do.  Depending on the nature of the file being patched,
it may produce a perfectly good result, but having the end user
verify that the result makes sense, and hand-fix it if it does
not, which can be done for text files, is near impossible for
binary files.  The "replace with postimage only when you are
applying to the same preimage" rule would be the only practical,
sane thing.

If we wanted to use the patch+diff (i.e. "format-patch,
send-email, and then am" workflow) to transfer new versions of
binary files to a recipient, which I think is useful in some
projects, the sanest way to handle this is probably to add
Nico's delta, going from preimage to postimage, encoded for
safer transport, to our diff output.  For safety and sanity, we
will not "apply" the patch unless the patched file exactly
matches the preimage that is recorded in the diff, and as long
as the recipient has the preimage, such a patch would be able to
reproduce the postimage and hopefully be smaller than
transferring the whole thing.

We've been trying to keep our diff output reversible (e.g. we
show what the filemode of the preimage is), so if we take the
above route, it probably should record deltas for both going
from preimage to postimage _and_ going the other way (unless
xdelta can be applied in-reverse, which I do not think is the
case).

Of course, to be _completely_ generic, you could include both
compressed then uuencoded preimage and postimage, and let the
recipient sort it out.  An advantage of that approach is that
the applicability of such a "patch" improves as the tools to
apply it improve, after the patch was originally generated.  I
however think that is only a theoretical advantage, not a very
practical one.


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 16:25 UTC
  To: Shawn Pearce; +Cc: Randal L. Schwartz, Jakub Narebski, git
In-Reply-To: <20060405155528.GI14625@spearce.org>

On Wed, 5 Apr 2006, Shawn Pearce wrote:

> The clearly safe approach is to include the full SHA1 ID of the
> old object the patch was created from and use the xdelta in the
> patch only as a means of transporting a compressed form of the new
> version of the object.  If git-diff starts to export say a base 64
> encoding of the xdelta then it should also include the full SHA1
> ID for binary files, even if --full-index wasn't given.
> 
> git-apply should only apply an xdelta patch to the exact same
> old object.  If the tree currently has a different object at that
> path then reject the patch entirely.

Amen.  Exactly what I just said.


Nicolas


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 16:21 UTC
  To: Randal L. Schwartz; +Cc: Jakub Narebski, git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

On Wed, 5 Apr 2006, Randal L. Schwartz wrote:

> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, is cherry-picking binary patches a sensible thing to
do?

Do you expect, say, a Word document, a JPEG image, or an MP3 file to
still be valid and error free if two binary patches modifying
different parts of the same file (same revision) are successively
applied?  I seriously doubt it.

And what do you do with conflicts?  Using diff3 might be sensible for 
text data, but for binaries you really need a tool that understands the 
type of data your binary contains, which means one tool for each 
possible type of binary data which is outside the scope of GIT.

For example, if you patch a .wav file adding some data, then you end up 
with the additional samples and a new length in the file header.  If 
another patch to that .wav is applied, then it is easy to find the 
"surrounding context" where the second patch is adding/removing some 
other samples, but then you really need knowledge about the .wav format
to handle the conflict that will occur on the .wav header modification.

And so on for all possible binary types.

So IMHO a binary patch format is only useful for easy _transport_ along 
with other text patches.  And the binary patch must either apply 
perfectly against the same source file or it must not apply at all.  
That's the only sensible accommodation we can make with a generic binary 
patch format.

When the patch doesn't apply to your tree, then nothing prevents you 
from hooking a dedicated tool that will pick up the original file, the 
reconstructed remote version according to the binary patch you received 
and your own modified version so that tool can process them and do the 
necessary changes with proper knowledge of the data format.


Nicolas


* unchecked uses of strdup
From: Jim Meyering @ 2006-04-05 16:02 UTC
  To: git
In-Reply-To: <1144165927.30675.32.camel@dv>

There are quite a few uses of strdup in git's sources.
Here's one that can cause trouble if it ever returns NULL:

    [from fsck-objects.c]
    static int fsck_head_link(void)
    {
            unsigned char sha1[20];
            const char *git_HEAD = strdup(git_path("HEAD"));
            const char *git_refs_heads_master = resolve_ref(git_HEAD, sha1, 1);

The problem is that resolve_ref does an unconditional `stat'
on the parameter corresponding to the maybe-NULL git_HEAD.

One solution is to change such uses of strdup to uses of xstrdup.
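
For readers unfamiliar with the x-prefixed wrappers, the idea is a duplicate-or-die helper, so callers never have to handle NULL. The version below is a hypothetical stand-in named `xstrdup_sketch`, written with malloc and memcpy so that it needs only standard C; it is not a copy of git's own xstrdup, and the error message is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an xstrdup()-style wrapper: duplicate a
 * string and exit instead of returning NULL on allocation failure.
 */
static char *xstrdup_sketch(const char *str)
{
	size_t len;
	char *ret;

	assert(str != NULL);
	len = strlen(str) + 1;	/* include the terminating NUL */
	ret = malloc(len);
	if (!ret) {
		fprintf(stderr, "fatal: out of memory, strdup failed\n");
		exit(1);
	}
	memcpy(ret, str, len);
	return ret;
}
```

A caller such as fsck_head_link() could then pass the result straight on to resolve_ref without a NULL check of its own.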


* Re: How should I handle binary file with GIT
From: Shawn Pearce @ 2006-04-05 15:55 UTC
  To: Randal L. Schwartz; +Cc: Nicolas Pitre, Jakub Narebski, git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

"Randal L. Schwartz" <merlyn@stonehenge.com> wrote:
> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

The clearly safe approach is to include the full SHA1 ID of the
old object the patch was created from and use the xdelta in the
patch only as a means of transporting a compressed form of the new
version of the object.  If git-diff starts to export say a base 64
encoding of the xdelta then it should also include the full SHA1
ID for binary files, even if --full-index wasn't given.

git-apply should only apply an xdelta patch to the exact same
old object.  If the tree currently has a different object at that
path then reject the patch entirely.

If a path has a different object than the patch was based on then
we can do one of two things to be ``nice'' to the human:

  - If the old blob exists in the repository (it just isn't the
  current version at that path) then generate a temporary merge
  file holding the old blob with the delta applied.  The user can
  then finish the merge with whatever tool understands that binary
  file format, or do the merge by hand.

  - Supply a ``do it anyway'' flag to git-apply.  If this flag is
  given on the command line then the binary file is patched even
  though the object versions differ.  For some binary file formats
  this may actually be a valid thing to do.  But it probably isn't
  for a very large percentage of known file formats.
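
Put together, the rule and the two "nice" fallbacks form a small decision table. This sketch uses hypothetical names (`choose_action` and the enum values), and the relative priority of the ``do it anyway'' flag over the temporary merge file is an assumption; the text above does not say which should win when both are possible.

```c
#include <assert.h>
#include <string.h>

enum binary_patch_action {
	PATCH_APPLY,		/* tree has exactly the old object: apply the delta */
	PATCH_FORCE,		/* objects differ, but "do it anyway" was given */
	PATCH_WRITE_MERGE,	/* old blob is in the repository: write a merge file */
	PATCH_REJECT		/* nothing safe to do: reject the patch entirely */
};

/*
 * current_sha1: object currently at the path in the tree
 * old_sha1:     full SHA1 of the object the patch was created from
 */
static enum binary_patch_action choose_action(const char *current_sha1,
					      const char *old_sha1,
					      int old_blob_in_repo,
					      int force_flag)
{
	assert(current_sha1 != NULL && old_sha1 != NULL);
	if (strcmp(current_sha1, old_sha1) == 0)
		return PATCH_APPLY;
	if (force_flag)
		return PATCH_FORCE;	/* assumed to take priority */
	if (old_blob_in_repo)
		return PATCH_WRITE_MERGE;
	return PATCH_REJECT;
}
```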

I could see some cases where it might be nice to be able to perform
specialized merge handling of binary files via hooks or filters.

For example *.tar.gz, *.zip, *.jar - these files are all just
compressed trees.  They should be somewhat mergeable with the same
semantics as other trees in GIT.  Of course one could just unpack
these into a directory and let GIT track the directory instead,
but this is rather inconvenient in a Java project.  :-)

If I recall correctly OpenOffice document files are XML compressed
into ZIP archives.  The XML *might* diff/patch cleanly as plain text.
The other resources in that archive are typically binary graphic
files and the like, which of course wouldn't diff/patch nicely.
But being able to diff/patch the main content might be semi-useful.

-- 
Shawn.


* Re: How should I handle binary file with GIT
From: Randal L. Schwartz @ 2006-04-05 15:37 UTC
  To: Nicolas Pitre; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0604051131010.2550@localhost.localdomain>

>>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:

>> IIRC bsdiff is used by Firefox to distribute binary software updates.
>> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
>> supposedly offers worse compression (bigger diffs).

Nicolas> We already have our own delta code for pack storage.

I think the issue is related to being able to cherry-pick and merge
when binaries are involved.  I've been worried about that myself.
How well are binaries supported these days for all the operations
we're taking for granted?  When is a "diff" expected to be a real
"diff" and not just "binary files differ"?

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 15:32 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e10mn9$cjs$1@sea.gmane.org>

On Wed, 5 Apr 2006, Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
> > It _might_ make sense to adopt a well-defined binary patch
> > format (or if there is no prior art, introduce our own) and
> > support that format with both git-diff-* brothers and git-apply,
> > but that would be a bit longer term project.
> 
> bsdiff? http://www.daemonology.net/bsdiff/
> EDelta? http://www.diku.dk/~jacobg/edelta/
> Xdelta? http://xdelta.blogspot.com/
> 
> IIRC bsdiff is used by Firefox to distribute binary software updates.
> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> supposedly offers worse compression (bigger diffs).

We already have our own delta code for pack storage.


Nicolas


* Re: How should I handle binary file with GIT
From: Jakub Narebski @ 2006-04-05 15:11 UTC
  To: git
In-Reply-To: <7v3bgs4exz.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> It _might_ make sense to adopt a well-defined binary patch
> format (or if there is no prior art, introduce our own) and
> support that format with both git-diff-* brothers and git-apply,
> but that would be a bit longer term project.

bsdiff? http://www.daemonology.net/bsdiff/
EDelta? http://www.diku.dk/~jacobg/edelta/
Xdelta? http://xdelta.blogspot.com/

IIRC bsdiff is used by Firefox to distribute binary software updates.
Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
supposedly offers worse compression (bigger diffs).

-- 
Jakub Narebski
Warsaw, Poland


* [PATCH] Avoid a crash if realloc returns a different pointer.
From: Mike McCormack @ 2006-04-05 14:22 UTC
  To: git

[-- Attachment #1: Type: text/plain, Size: 80 bytes --]

---

  imap-send.c |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)


[-- Attachment #2: 235cf581a853777fdb6886806ddbfcd9f782eb98.diff --]
[-- Type: text/x-patch, Size: 369 bytes --]

235cf581a853777fdb6886806ddbfcd9f782eb98
diff --git a/imap-send.c b/imap-send.c
index f3cb79b..d04259a 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1202,6 +1202,7 @@ read_message( FILE *f, msg_data_t *msg )
 			p = xrealloc(msg->data, len+1);
 			if (!p)
 				break;
+			msg->data = p;
 		}
 		r = fread( &msg->data[msg->len], 1, len - msg->len, f );
 		if (r <= 0)
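
The one-line fix follows the usual realloc idiom: the call may move the block, so the returned pointer has to be stored back before the buffer is used again. A standalone sketch, with `struct buf` and `buf_grow` as hypothetical names loosely modelled on msg->data:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer, loosely modelled on msg->data. */
struct buf {
	char *data;
	size_t cap;	/* bytes currently allocated */
};

/*
 * Grow b to new_cap bytes. Returns 0 on success, -1 on failure.
 * On failure b->data is untouched and still valid.
 */
static int buf_grow(struct buf *b, size_t new_cap)
{
	char *p;

	assert(b != NULL && new_cap > 0);
	p = realloc(b->data, new_cap);
	if (!p)
		return -1;
	b->data = p;	/* the assignment the patch adds */
	b->cap = new_cap;
	return 0;
}
```

Without the assignment, any later access through the old pointer touches freed memory whenever realloc had to move the block.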



* [PATCH] Avoid a divide by zero if there's no messages to send.
From: Mike McCormack @ 2006-04-05 14:22 UTC
  To: git

[-- Attachment #1: Type: text/plain, Size: 86 bytes --]

---

  imap-send.c |    7 ++++++-
  1 files changed, 6 insertions(+), 1 deletions(-)


[-- Attachment #2: 8f910a131c905720e9640ddecfd8f85927ddc660.diff --]
[-- Type: text/x-patch, Size: 643 bytes --]

8f910a131c905720e9640ddecfd8f85927ddc660
diff --git a/imap-send.c b/imap-send.c
index d04259a..52e2400 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1333,6 +1333,12 @@ main(int argc, char **argv)
 		return 1;
 	}
 
+	total = count_messages( &all_msgs );
+	if (!total) {
+		fprintf(stderr,"no messages to send\n");
+		return 1;
+	}
+
 	/* write it to the imap server */
 	ctx = imap_open_store( &server );
 	if (!ctx) {
@@ -1340,7 +1346,6 @@ main(int argc, char **argv)
 		return 1;
 	}
 
-	total = count_messages( &all_msgs );
 	fprintf( stderr, "sending %d message%s\n", total, (total!=1)?"s":"" );
 	ctx->name = imap_folder;
 	while (1) {


^ permalink raw reply related

* [PATCH] Fix compile with expat, but an old curl version
From: Johannes Schindelin @ 2006-04-05 14:22 UTC (permalink / raw)
  To: git, junkio


With an old curl version, git-http-push is not compiled. But git-http-fetch
still needs to be linked with expat if NO_EXPAT is not defined.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 Makefile |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

57918780ecdff0c767a22b7589ff1025de6cb40f
diff --git a/Makefile b/Makefile
index 3596445..557d322 100644
--- a/Makefile
+++ b/Makefile
@@ -333,9 +333,11 @@ ifndef NO_CURL
 	curl_check := $(shell (echo 070908; curl-config --vernum) | sort -r | sed -ne 2p)
 	ifeq "$(curl_check)" "070908"
 		ifndef NO_EXPAT
-			EXPAT_LIBEXPAT = -lexpat
 			PROGRAMS += git-http-push$X
 		endif
+	endif
+	ifndef NO_EXPAT
+		EXPAT_LIBEXPAT = -lexpat
 	endif
 endif
 
-- 
1.3.0.rc2.g4a16-dirty

^ permalink raw reply related

* Re: Cygwin can't handle huge packfiles?
From: Johannes Schindelin @ 2006-04-05 14:14 UTC (permalink / raw)
  To: Kees-Jan Dijkzeul; +Cc: git
In-Reply-To: <fa0b6e200604050624h13ebd8deg241ae98cef1f5a74@mail.gmail.com>

Hi,

On Wed, 5 Apr 2006, Kees-Jan Dijkzeul wrote:

> On 4/3/06, Linus Torvalds <torvalds@osdl.org> wrote:
> [...]
> > That's not hugely fundamental, but I didn't expect people to hit it this
> > quickly. What kind of project has a 1.5GB pack-file _already_? I hope it's
> > fifteen years of history (so that we'll have another fifteen years before
> > we'll have to worry about 4GB pack-files ;)
> 
> I'm trying to get Git to manage my company's source tree. We're
> writing software for digital TV sets. Anyway, the archive is about 5Gb
> in size and contains binaries, zip files, Excel sheets, meeting minutes
> and whatnot. So it doesn't compress very well. The 1.5Gb pack file
> hardly contains any history at all (five commits or so). On the flip
> side, for now I'll be the only one adding to the archive, so at least
> it will not grow that fast ;-)
> 
> Anyway, to reconstitute the tree, I need very nearly the entire pack,
> so limiting the pack size won't do much good, as git will still try to
> allocate a total of 1.5Gb memory (which, unfortunately, isn't there
> :-)
> 
> Inspired by a patch of Alex Riesen (thanks, Alex), I tried to use the
> regular mmap for mapping pack files, only to discover that I compile
> without defining "NO_MMAP", so I've been using the stock mmap all
> along. So now I'm thinking that the cygwin mmap also does a
> malloc-and-read, just like git does with NO_MMAP. So I'll continue to
> investigate in that direction.

I think cygwin's mmap() is based on the Win32 API equivalent, which could 
mean that it *is* memory mapped, but in a special area (which is smaller 
than 1.5 gigabytes). In this case, it would make sense to limit the pack 
size, thereby having several packs, and mmap() them as they are needed.

Hth,
Dscho

^ permalink raw reply
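
The "several packs, mapped as needed" idea above could look roughly like
the following. This is only a sketch under assumptions: struct pack_map,
pack_use and pack_release are hypothetical names, not git's own pack
handling, and it relies on plain POSIX open()/fstat()/mmap()/munmap().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for one pack file; not git's data structure. */
struct pack_map {
	const char *path;
	void *base;	/* NULL until the pack is first used */
	size_t size;
};

/* Map the pack read-only on first use; 0 on success, -1 on failure. */
static int pack_use(struct pack_map *p)
{
	struct stat st;
	void *m;
	int fd;

	assert(p != NULL);
	if (p->base)
		return 0;	/* already mapped */
	fd = open(p->path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	m = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (m == MAP_FAILED)
		return -1;
	p->base = m;
	p->size = (size_t)st.st_size;
	return 0;
}

/* Drop the mapping so its address space can be reused for another pack. */
static void pack_release(struct pack_map *p)
{
	if (p->base) {
		munmap(p->base, p->size);
		p->base = NULL;
		p->size = 0;
	}
}
```

With smaller packs, a caller would pack_use() the one it needs and
pack_release() others when address space runs short, so no single mapping
has to be as large as the whole archive.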

* Re: How should I handle binary file with GIT
From: moreau francis @ 2006-04-05 13:35 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0604050906590.2550@localhost.localdomain>


--- Nicolas Pitre <nico@cam.org> wrote:

> On Wed, 5 Apr 2006, moreau francis wrote:
> > 
> > well maybe it's just stupid, but why not simply transforming binary files into
> > ascii files (maybe by using uuencode) before  using git-diff-* brothers and
> > git-apply ?
> 
> Imagine if the only difference between two versions of the same file is 
> a single byte inserted at the very beginning.  The uuencode would then 
> be totally different between the two files.
> 

ok uuencode was just a bad example for encoding...

Francis



	

	
		

^ permalink raw reply

* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 13:25 UTC (permalink / raw)
  To: moreau francis; +Cc: Junio C Hamano, git
In-Reply-To: <20060405122113.60376.qmail@web25801.mail.ukl.yahoo.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 723 bytes --]

On Wed, 5 Apr 2006, moreau francis wrote:

> 
> --- Junio C Hamano <junkio@cox.net> wrote:
> 
> > It _might_ make sense to adopt a well-defined binary patch
> > format (or if there is no prior art, introduce our own) and
> > support that format with both git-diff-* brothers and git-apply,
> > but that would be a bit longer term project.
> > 
> 
> well maybe it's just stupid, but why not simply transforming binary files into
> ascii files (maybe by using uuencode) before  using git-diff-* brothers and
> git-apply ?

Imagine if the only difference between two versions of the same file is 
a single byte inserted at the very beginning.  The uuencode would then 
be totally different between the two files.


Nicolas

^ permalink raw reply
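
To make the objection above concrete: an encoding that packs 3 input
bytes into 4 output characters realigns every group when one byte is
inserted at the front. The sketch below is a simplified uuencode-style
encoder written for illustration only (it omits the per-line length
character and the begin/end framing of real uuencode); uu_style_encode
and count_same are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode n bytes as uuencode-style text: each 3-byte group becomes
 * 4 printable characters (' ' + 6-bit value). `out` must have room
 * for 4 * ((n + 2) / 3) + 1 bytes. Returns the number of characters.
 */
static size_t uu_style_encode(const unsigned char *in, size_t n, char *out)
{
	size_t i, o = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i += 3) {
		unsigned long v = (unsigned long)in[i] << 16;

		if (i + 1 < n)
			v |= (unsigned long)in[i + 1] << 8;
		if (i + 2 < n)
			v |= in[i + 2];
		out[o++] = (char)(' ' + ((v >> 18) & 0x3f));
		out[o++] = (char)(' ' + ((v >> 12) & 0x3f));
		out[o++] = (char)(' ' + ((v >> 6) & 0x3f));
		out[o++] = (char)(' ' + (v & 0x3f));
	}
	out[o] = '\0';
	return o;
}

/* Count the positions in the first n characters where a and b agree. */
static size_t count_same(const char *a, const char *b, size_t n)
{
	size_t i, same = 0;

	for (i = 0; i < n; i++)
		if (a[i] == b[i])
			same++;
	return same;
}
```

Encoding "abcdefghijkl" and "abcdefghijklX" gives the same first 16
characters, because the appended byte only adds a group at the end.
Encoding "Xabcdefghijkl" instead shifts every group boundary, so the
output differs from the first character on and a text diff of the two
encodings has little in common to work with.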

