Git development
* Re: How should I handle binary file with GIT
From: Randal L. Schwartz @ 2006-04-05 15:37 UTC
  To: Nicolas Pitre; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0604051131010.2550@localhost.localdomain>

>>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:

>> IIRC bsdiff is used by Firefox to distribute binary software updates.
>> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
>> supposedly offers worse compression (bigger diffs).

Nicolas> We already have our own delta code for pack storage.

I think the issue is related to being able to cherry-pick and merge
when binaries are involved.  I've been worried about that myself.
How well are binaries supported these days for all the operations
we're taking for granted?  When is a "diff" expected to be a real
"diff" and not just "binary files differ"?

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


* Re: How should I handle binary file with GIT
From: Shawn Pearce @ 2006-04-05 15:55 UTC
  To: Randal L. Schwartz; +Cc: Nicolas Pitre, Jakub Narebski, git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

"Randal L. Schwartz" <merlyn@stonehenge.com> wrote:
> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

The clearly safe approach is to include the full SHA1 ID of the
old object the patch was created from and use the xdelta in the
patch only as a means of transporting a compressed form of the new
version of the object.  If git-diff starts to export say a base 64
encoding of the xdelta then it should also include the full SHA1
ID for binary files, even if --full-index wasn't given.

git-apply should only apply an xdelta patch to the exact same
old object.  If the tree currently has a different object at that
path then reject the patch entirely.

If a path holds a different object than the one the patch was based on,
then we can do one of two things to be ``nice'' to the human:

  - If the old blob exists in the repository (it just isn't the
  current version at that path) then generate a temporary merge
  file holding the old blob with the delta applied.  The user can
  then finish the merge with whatever tool understands that binary
  file format, or do the merge by hand.

  - Supply a ``do it anyway'' flag to git-apply.  If this flag is
  given on the command line then the binary file is patched even
  though the object versions differ.  For some binary file formats
  this may actually be a valid thing to do.  But it probably isn't
  for a very large percentage of known file formats.

I could see some cases where it might be nice to be able to perform
specialized merge handling of binary files via hooks or filters.

For example *.tar.gz, *.zip, *.jar - these files are all just
compressed trees.  They should be somewhat mergeable with the same
semantics as other trees in GIT.  Of course one could just unpack
these into a directory and let GIT track the directory instead,
but this is rather inconvenient in a Java project.  :-)

If I recall correctly OpenOffice document files are XML compressed
into ZIP archives.  The XML *might* diff/patch cleanly as plain text.
The other resources in that archive are typically binary graphic
files and the like, which of course wouldn't diff/patch nicely.
But being able to diff/patch the main content might be semi-useful.

-- 
Shawn.


* unchecked uses of strdup
From: Jim Meyering @ 2006-04-05 16:02 UTC
  To: git
In-Reply-To: <1144165927.30675.32.camel@dv>

There are quite a few uses of strdup in git's sources.
Here's one that can cause trouble if it ever returns NULL:

    [from fsck-objects.c]
    static int fsck_head_link(void)
    {
            unsigned char sha1[20];
            const char *git_HEAD = strdup(git_path("HEAD"));
            const char *git_refs_heads_master = resolve_ref(git_HEAD, sha1, 1);

The problem is that resolve_ref does an unconditional `stat'
on the parameter corresponding to the maybe-NULL git_HEAD.

One solution is to change such uses of strdup to uses of xstrdup.
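
As a minimal sketch of what such a checked wrapper could look like
(checked_strdup is a hypothetical name used only here, and the body is
an assumption about the general pattern, not a copy of git's xstrdup):
it copies the string and exits with an error instead of ever handing
NULL back to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * checked_strdup() is a hypothetical name for this sketch.  It behaves
 * like strdup() except that it never returns NULL: on allocation
 * failure it reports the problem and exits.
 */
static char *checked_strdup(const char *str)
{
	size_t len = strlen(str) + 1;	/* include the trailing NUL */
	char *copy = malloc(len);

	if (!copy) {
		fprintf(stderr, "fatal: out of memory, strdup failed\n");
		exit(EXIT_FAILURE);
	}
	memcpy(copy, str, len);
	return copy;
}
```

With a wrapper of this shape, a caller such as fsck_head_link can pass
the result straight on without a NULL check of its own.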


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 16:21 UTC
  To: Randal L. Schwartz; +Cc: Jakub Narebski, git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

On Wed, 5 Apr 2006, Randal L. Schwartz wrote:

> >>>>> "Nicolas" == Nicolas Pitre <nico@cam.org> writes:
> 
> >> IIRC bsdiff is used by Firefox to distribute binary software updates.
> >> Xdelta is generic (not optimized for binaries like bsdiff and edelta), but
> >> supposedly offers worse compression (bigger diffs).
> 
> Nicolas> We already have our own delta code for pack storage.
> 
> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, is cherry-picking binary patches a sensible thing to
do?

Do you expect, say, a Word document, a JPEG image, or an MP3 file to 
still be valid and error free if two binary patches modifying a 
different part of the same file (same revision) are successively 
applied?  I seriously doubt it.

And what do you do with conflicts?  Using diff3 might be sensible for 
text data, but for binaries you really need a tool that understands the 
type of data your binary contains, which means one tool for each 
possible type of binary data which is outside the scope of GIT.

For example, if you patch a .wav file adding some data, then you end up 
with the additional samples and a new length in the file header.  If 
another patch to that .wav is applied, then it is easy to find the 
"surrounding context" where the second patch is adding/removing some 
other samples, but then you really need knowledge about the .wav format
to handle the conflict that will occur on the .wav header modification.
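
To make the header conflict concrete, here is a small sketch that
assumes the canonical 44-byte PCM layout, in which the RIFF chunk size
is 36 plus the number of sample bytes.  The struct and function names
are made up for the illustration.

```c
#include <stdint.h>

/* The two length fields of a canonical 44-byte PCM .wav header. */
struct wav_lengths {
	uint32_t riff_size;	/* file size minus the 8-byte RIFF preamble */
	uint32_t data_size;	/* number of bytes of sample data */
};

/* Header lengths for a file holding data_bytes of samples. */
static struct wav_lengths wav_lengths_for(uint32_t data_bytes)
{
	struct wav_lengths h;

	h.data_size = data_bytes;
	h.riff_size = 36 + data_bytes;
	return h;
}

/* Header lengths after appending extra bytes of samples. */
static struct wav_lengths append_samples(struct wav_lengths h, uint32_t extra)
{
	return wav_lengths_for(h.data_size + extra);
}
```

If one patch appends 100 bytes and another appends 200 bytes to the
same 1000-byte base, the first writes a data size of 1100 and the
second 1200, while a correct merge needs 1300.  Neither patch contains
that value, so only a tool that knows the format can produce it.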

And so on for all possible binary types.

So IMHO a binary patch format is only useful for easy _transport_ along 
with other text patches.  And the binary patch must either apply 
perfectly against the same source file or it must not apply at all.  
That's the only sensible accommodation we can make with a generic binary
patch format.

When the patch doesn't apply to your tree, then nothing prevents you 
from hooking a dedicated tool that will pick up the original file, the 
reconstructed remote version according to the binary patch you received 
and your own modified version so that tool can process them and do the 
necessary changes with proper knowledge of the data format.


Nicolas


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 16:25 UTC
  To: Shawn Pearce; +Cc: Randal L. Schwartz, Jakub Narebski, git
In-Reply-To: <20060405155528.GI14625@spearce.org>

On Wed, 5 Apr 2006, Shawn Pearce wrote:

> The clearly safe approach is to include the full SHA1 ID of the
> old object the patch was created from and use the xdelta in the
> patch only as a means of transporting a compressed form of the new
> version of the object.  If git-diff starts to export say a base 64
> encoding of the xdelta then it should also include the full SHA1
> ID for binary files, even if --full-index wasn't given.
> 
> git-apply should only apply an xdelta patch to the exact same
> old object.  If the tree currently has a different object at that
> path then reject the patch entirely.

Amen.  Exactly what I just said.


Nicolas


* Re: How should I handle binary file with GIT
From: Junio C Hamano @ 2006-04-05 18:34 UTC
  To: Randal L. Schwartz; +Cc: git
In-Reply-To: <86wte4rq3d.fsf@blue.stonehenge.com>

merlyn@stonehenge.com (Randal L. Schwartz) writes:

> I think the issue is related to being able to cherry-pick and merge
> when binaries are involved.  I've been worried about that myself.
> How well are binaries supported these days for all the operations
> we're taking for granted?  When is a "diff" expected to be a real
> "diff" and not just "binary files differ"?

First of all, binary files are handled by cherry-pick and merge
without needing to involve "diff"+"patch" (which is not so
useful for binary files anyway).  They use 3-way read-tree merge
which compares the object names and leaves the index unmerged if
there are conflicting changes, so you should be able to sort it
out by running up to three "git-cat-file blob $sha1".
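
The per-path decision described above can be sketched roughly as
follows.  This is a simplified illustration, not read-tree's actual
code: merge_blob and the enum are hypothetical names, and object names
are treated as plain strings.

```c
#include <string.h>

enum merge_result { TAKE_OURS, TAKE_THEIRS, CONFLICT };

/* Two blobs are the same object exactly when their names match. */
static int same_object(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/*
 * Decide the outcome for one path from the object names of the common
 * ancestor (base) and the two sides.  File contents are never looked
 * at, which is why binary files need no special treatment here.
 */
static enum merge_result merge_blob(const char *base, const char *ours,
				    const char *theirs)
{
	if (same_object(ours, theirs))
		return TAKE_OURS;	/* both sides ended up identical */
	if (same_object(base, ours))
		return TAKE_THEIRS;	/* only their side changed */
	if (same_object(base, theirs))
		return TAKE_OURS;	/* only our side changed */
	return CONFLICT;		/* both changed: leave unmerged */
}
```

Only the CONFLICT case needs a human, who can then inspect the (up to)
three blobs by hand.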

What involves "diff"+"patch" are rebases and processing mailed-in
patches as in the example by the original poster.

In our diff output, we record the blob object name of preimage
and postimage, along with filemode, on the "index" line.
git-apply does not do anything with it by default, but if:

 - --binary flag is given,

 - the postimage blob is already available locally, and,

 - the file the patch is being applied to is the same as the
   recorded preimage,

then the file is _replaced_ with the postimage.

This is good enough for git-rebase (which uses format-patch
piped to am) and is safe (we do not "apply delta" -- only
replace when the file "being patched" matches the recorded
preimage).  It does not do any good for transferring a postimage
that the person who applies the patch does not yet have.
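
The three conditions can be written down as one predicate.  The sketch
below only restates the rule; the struct, the function name and the way
the inputs are passed are assumptions made for the illustration and do
not reflect git-apply's internals.

```c
#include <stdbool.h>
#include <string.h>

/* Blob names recorded on the "index" line of a diff. */
struct index_line {
	const char *preimage_id;
	const char *postimage_id;
};

/*
 * Return true only when it is safe to replace the file with the
 * postimage: --binary was given, the postimage blob is available
 * locally, and the file being patched is exactly the recorded
 * preimage.  No delta is applied in any case.
 */
static bool may_replace_with_postimage(const struct index_line *idx,
				       const char *current_id,
				       bool binary_flag,
				       bool postimage_available)
{
	if (!binary_flag)
		return false;
	if (!postimage_available)
		return false;
	return strcmp(current_id, idx->preimage_id) == 0;
}
```

Any other combination leaves the file untouched, which is what makes
the rule safe for rebase.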

I think "applying delta" to a binary file is not a very useful
thing to do.  Depending on the nature of the file being patched,
it may produce a perfectly good result, but having the end user
verify that the result makes sense, and hand-fix it if it does
not, which can be done for text files, is near impossible for
binary files.  The "replace with postimage only when you are
applying to the same preimage" rule would be the only practical,
sane thing.

If we wanted to use the patch+diff (i.e. "format-patch,
send-email, and then am" workflow) to transfer new versions of
binary files to a recipient, which I think is useful in some
projects, the sanest way to handle this is probably to add
Nico's delta, going from preimage to postimage, encoded for
safer transport, to our diff output.  For safety and sanity, we
will not "apply" the patch unless the patched file exactly
matches the preimage that is recorded in the diff, and as long
as the recipient has the preimage, such a patch would be able to
reproduce the postimage and hopefully be smaller than
transferring the whole thing.

We've been trying to keep our diff output reversible (e.g. we
show what the filemode of the preimage is), so if we take the
above route, it probably should record deltas for both going
from preimage to postimage _and_ going the other way (unless
xdelta can be applied in-reverse, which I do not think is the
case).

Of course, to be _completely_ generic, you could include both
compressed then uuencoded preimage and postimage, and let the
recipient sort it out.  An advantage of that approach is that
the applicability of such a "patch" improves as the tools to
apply it improve, after the patch was originally generated.  I
however think that is only a theoretical advantage, not a very
practical one.


* Re: How should I handle binary file with GIT
From: Randal L. Schwartz @ 2006-04-05 18:51 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslor27n4.fsf@assigned-by-dhcp.cox.net>

>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:

Junio> If we wanted to use the patch+diff (i.e. "format-patch,
Junio> send-email, and then am" workflow) to transfer new version of
Junio> binary files to a recipient, which I think is useful in some
Junio> projects, the sanest way to handle this is probably to add
Junio> Nico's delta, going from preimage to postimage, encoded for
Junio> safer transport, to our diff output.

This is what I was looking for, and thanks for confirming that at least within
a local repository, everything already works.  Yeay.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


* [PATCH] git-commit: document --append
From: Marco Roeland @ 2006-04-05 19:16 UTC
  To: git

The "--amend" option is used to amend the tip of the current branch. This
documentation text was copied straight from the commit that implemented it.

Signed-off-by: Marco Roeland <marco.roeland@xs4all.nl>

---

 Documentation/git-commit.txt |   22 +++++++++++++++++++++-
 1 files changed, 21 insertions(+), 1 deletions(-)

ca7d3b4fdd0cb24b7353da312fb9306531468f54
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index d04b342..3701cb3 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git-commit' [-a] [-s] [-v] [(-c | -C) <commit> | -F <file> | -m <msg>]
-	   [-e] [--author <author>] [--] [[-i | -o ]<file>...]
+	   [--no-verify] [--amend] [-e] [--author <author>]
+	   [--] [[-i | -o ]<file>...]
 
 DESCRIPTION
 -----------
@@ -70,6 +71,25 @@ OPTIONS
 	`-m`, and from file with `-C` are usually used as the
 	commit log message unmodified.  This option lets you
 	further edit the message taken from these sources.
+
+--amend::
+
+	Used to amend the tip of the current branch. Prepare the tree
+	object you would want to replace the latest commit as usual
+	(this includes the usual -i/-o and explicit paths), and the
+	commit log editor is seeded with the commit message from the
+	tip of the current branch. The commit you create replaces the
+	current tip -- if it was a merge, it will have the parents of
+	the current tip as parents -- so the current top commit is
+	discarded.
+
+	It is a rough equivalent for:
+
+		$ git reset --soft HEAD^
+		$ ... do something else to come up with the right tree ...
+		$ git commit -c ORIG_HEAD
+
+	but can be used to amend a merge commit.
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.3.0.rc2.gca38


* Re: How should I handle binary file with GIT
From: Marco Roeland @ 2006-04-05 19:23 UTC
  To: moreau francis; +Cc: Junio C Hamano, git
In-Reply-To: <20060405131834.60888.qmail@web25804.mail.ukl.yahoo.com>

On Wednesday April 5th 2006 moreau francis wrote:

> BTW, what does "--amend" option do ? It doesn't seem to be documented anywhere.

This is the original commit text that introduced it:

diff-tree b4019f045646b1770a80394da876b8a7c6b8ca7b (from d320a5437f8304cf9ea3ee1898e49d643e005738)
Author: Junio C Hamano <junkio@cox.net>
Date:   Thu Mar 2 21:04:05 2006 -0800

    git-commit --amend
    
    The new flag is used to amend the tip of the current branch.  Prepare
    the tree object you would want to replace the latest commit as usual
    (this includes the usual -i/-o and explicit paths), and the commit log
    editor is seeded with the commit message from the tip of the current
    branch.  The commit you create replaces the current tip -- if it was a
    merge, it will have the parents of the current tip as parents -- so the
    current top commit is discarded.
    
    It is a rough equivalent for:
    
    	$ git reset --soft HEAD^
    	$ ... do something else to come up with the right tree ...
    	$ git commit -c ORIG_HEAD
    
    but can be used to amend a merge commit.
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

So in the original context you can add separate binaries to a commit
of only text files that you just rescued from CVS or something and then
change the commit to include these binaries as well.

I've sent a separate patch for the documentation for git-commit using
Junio's clear explanation.
-- 
Marco Roeland


* Re: How should I handle binary file with GIT
From: Nicolas Pitre @ 2006-04-05 19:31 UTC
  To: Junio C Hamano; +Cc: Randal L. Schwartz, git
In-Reply-To: <7vslor27n4.fsf@assigned-by-dhcp.cox.net>

On Wed, 5 Apr 2006, Junio C Hamano wrote:

> If we wanted to use the patch+diff (i.e. "format-patch,
> send-email, and then am" workflow) to transfer new version of
> binary files to a recipient, which I think is useful in some
> projects, the sanest way to handle this is probably to add
> Nico's delta, going from preimage to postimage, encoded for
> safer transport, to our diff output.  For safety and sanity, we
> will not "apply" the patch unless the patched file exactly
> matches the preimage that is recorded in the diff, and as long
> as the recipient has the preimage, such a patch would be able to
> reproduce the postimage and hopefully be smaller than
> transferring the whole thing.

Exactly the point.

> We've been trying to keep our diff output reversible (e.g. we
> show what the filemode of the preimage is), so if we take the
> above route, it probably should record deltas for both going
> from preimage to postimage _and_ going the other way (unless
> xdelta can be applied in-reverse, which I do not think is the
> case).

You cannot reverse a delta.  However if you were able to apply a delta 
from preimage to postimage that means you must already have had preimage 
in your object store.  Therefore reverting such a patch would simply 
involve restoring preimage.

> Of course, to be _completely_ generic, you could include both
> compressed then uuencoded preimage and postimage, and let the
> recipient sort it out.

I think this is just too much and besides the point of a diff.  If the 
work flow is so convoluted such that the simple binary patch as a delta 
doesn't apply then it would probably be a better idea to simply transfer 
those binaries as email attachments.  In other words, if a binary patch 
transfer mechanism is added, it should cover the common case and leave 
the rest for a better process like git-fetch/pull.


Nicolas


* Re: [PATCH] git-commit: document --append
From: Junio C Hamano @ 2006-04-05 19:34 UTC
  To: Marco Roeland; +Cc: git
In-Reply-To: <20060405191608.GA20572@fiberbit.xs4all.nl>

Thanks for resurrecting this.

I suspect that some formatting tweak is needed; I recall
asciidoc needs some special formatting when a multi-paragraph
description is involved in the list.

Of course, munging the patch title with s/append/amend/ would
not hurt ;-).


* Re: [PATCH] git-commit: document --append (amend really!)
From: Marco Roeland @ 2006-04-05 19:46 UTC
  To: Junio C Hamano; +Cc: Marco Roeland, git
In-Reply-To: <7vfykr24wi.fsf@assigned-by-dhcp.cox.net>

On Wednesday April 5th 2006 Junio C Hamano wrote:

> Thanks for resurrecting this.
> 
> I suspect that some formatting tweak is needed; I recall
> asciidoc needs some special formatting when multi- paragraph
> description is involved in the list.

Here with asciidoc 7.1.2 (Debian 'sid') it looks good in the generated
man page. But I'll investigate if nobody beats me to it. Perhaps we
should develop a "sparse" like module for asciidoc.

> Of course, munging the patch title with s/append/amend/ would
> not hurt ;-).

Oops. Well, I suppose I could use "git commit --amend" and then run "git
format-patch" again. ;-)
-- 
Marco Roeland


* Re: [PATCH] git-commit: document --append (amend really!)
From: Junio C Hamano @ 2006-04-05 19:55 UTC
  To: Marco Roeland; +Cc: git
In-Reply-To: <20060405194607.GB20854@fiberbit.xs4all.nl>

Marco Roeland <marco.roeland@xs4all.nl> writes:

> Here with asciidoc 7.1.2 (Debian 'sid') it looks good in the generated
> man page. But I'll investigate if nobody beats me to it.

Please see below for an example.

> Oops. Well I suppose I could use "git commit --amend" and then run "git
> format-patch" again I suppose. ;-)

Yup ;-).

diff-tree b0d08a504bee17dfc46f761e166ff2c20c59a91a (from 3103cf9e1e09b0045a60542f24a2a1e4ed7b1237)
Author: Francis Daly <francis@daoine.org>
Date:   Wed Mar 22 09:53:57 2006 +0000

    Format tweaks for asciidoc.
    
    Some documentation "options" were followed by independent preformatted
    paragraphs. Now they are associated plain text paragraphs. The
    difference is clear in the generated html.
    
    Signed-off-by: Junio C Hamano <junkio@cox.net>

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index fbd2394..d55456a 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -24,13 +24,13 @@ OPTIONS
 
 <option>...::
 	Either an option to pass to `grep` or `git-ls-files`.
-
-	The following are the specific `git-ls-files` options
-	that may be given: `-o`, `--cached`, `--deleted`, `--others`,
-	`--killed`, `--ignored`, `--modified`, `--exclude=*`,
-	`--exclude-from=*`, and `--exclude-per-directory=*`.
-
-	All other options will be passed to `grep`.
++
+The following are the specific `git-ls-files` options
+that may be given: `-o`, `--cached`, `--deleted`, `--others`,
+`--killed`, `--ignored`, `--modified`, `--exclude=\*`,
+`--exclude-from=\*`, and `--exclude-per-directory=\*`.
++
+All other options will be passed to `grep`.
 
 <pattern>::
 	The pattern to look for.  The first non option is taken


* Re: How should I handle binary file with GIT
From: Junio C Hamano @ 2006-04-05 20:20 UTC
  To: Nicolas Pitre; +Cc: Randal L. Schwartz, git
In-Reply-To: <Pine.LNX.4.64.0604051521480.2550@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Wed, 5 Apr 2006, Junio C Hamano wrote:
>
>> We've been trying to keep our diff output reversible (e.g. we
>> show what the filemode of the preimage is), so if we take the
>> above route, it probably should record deltas for both going
>> from preimage to postimage _and_ going the other way (unless
>> xdelta can be applied in-reverse, which I do not think is the
>> case).
>
> You cannot reverse a delta.  However if you were able to apply a delta 
> from preimage to postimage that means you must already have had preimage 
> in your object store.  Therefore reverting such a patch would simply 
> involve restoring preimage.

The case I had in mind was where you shipped a tarball of the
tip to somebody (or "a shallow clone"), and after seeing him
having problems with that release, sending him a patch telling
him "reverting this might help, could you please give it a try?"

Of course you could be nicer to him and generate the reverse
diff on your end in such a case instead.


* [PATCH] git-commit: document --amend
From: Marco Roeland @ 2006-04-05 20:28 UTC
  To: Junio C Hamano; +Cc: Marco Roeland, git
In-Reply-To: <7vacaz23wr.fsf@assigned-by-dhcp.cox.net>

The "--amend" option is used to amend the tip of the current branch. This
documentation text was copied straight from the commit that implemented it.

Some minor format tweaks for asciidoc were taken from work by Francis Daly
in commit b0d08a5.  It now also looks good in the html page.

Signed-off-by: Marco Roeland <marco.roeland@xs4all.nl>

---

 Documentation/git-commit.txt |   24 +++++++++++++++++++++++-
 1 files changed, 23 insertions(+), 1 deletions(-)

293dccf6f8c47294a42376ad96d8c1130b06c9b9
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index d04b342..ec8b562 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -9,7 +9,8 @@ SYNOPSIS
 --------
 [verse]
 'git-commit' [-a] [-s] [-v] [(-c | -C) <commit> | -F <file> | -m <msg>]
-	   [-e] [--author <author>] [--] [[-i | -o ]<file>...]
+	   [--no-verify] [--amend] [-e] [--author <author>]
+	   [--] [[-i | -o ]<file>...]
 
 DESCRIPTION
 -----------
@@ -70,6 +71,27 @@ OPTIONS
 	`-m`, and from file with `-C` are usually used as the
 	commit log message unmodified.  This option lets you
 	further edit the message taken from these sources.
+
+--amend::
+
+	Used to amend the tip of the current branch. Prepare the tree
+	object you would want to replace the latest commit as usual
+	(this includes the usual -i/-o and explicit paths), and the
+	commit log editor is seeded with the commit message from the
+	tip of the current branch. The commit you create replaces the
+	current tip -- if it was a merge, it will have the parents of
+	the current tip as parents -- so the current top commit is
+	discarded.
++
+It is a rough equivalent for:
++
+$ git reset --soft HEAD^
++
+$ ... do something else to come up with the right tree ...
++
+$ git commit -c ORIG_HEAD
++
+but can be used to amend a merge commit.
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.3.0.rc2.gca38


* Re: Cygwin can't handle huge packfiles?
From: Christopher Faylor @ 2006-04-05 21:08 UTC
  To: Johannes Schindelin, Kees-Jan Dijkzeul, git
In-Reply-To: <Pine.LNX.4.63.0604051612200.25304@wbgn013.biozentrum.uni-wuerzburg.de>

On Wed, Apr 05, 2006 at 04:14:20PM +0200, Johannes Schindelin wrote:
>> Inspired by a patch of Alex Riesen (thanks, Alex), I tried to use the
>> regular mmap for mapping pack files, only to discover that I compile
>> without defining "NO_MMAP", so I've been using the stock mmap all
>> along. So now I'm thinking that the cygwin mmap also does a
>> malloc-and-read, just like git does with NO_MMAP. So I'll continue to
>> investigate in that direction.
>
>I think cygwin's mmap() is based on the Win32 API equivalent, which could 
>mean that it *is* memory mapped, but in a special area (which is smaller 
>than 1.5 gigabyte). In this case, it would make sense to limit the pack 
>size, thereby having several packs, and mmap() them as they are needed.

Yes, cygwin's mmap uses CreateFileMapping and MapViewOfFile.  IIRC,
Windows might have a 2G limitation lurking under the hood somewhere but
I think that might be tweakable with some registry setting.

cgf


* [PATCH] Tweaks to make asciidoc play nice.
From: Francis Daly @ 2006-04-05 22:25 UTC
  To: git; +Cc: Marco Roeland

Once the content has been generated, the formatting elves can reorder
it to be pretty...

Signed-off-by: Francis Daly <francis@daoine.org>

---

The manpage formatting needed to stop things being ugly is
nontrivial. In this case, an option has multi-paragraph content,
some of which should be displayed preformatted.

"+" means "this next paragraph is a continuation of the same section"
"--" means "the next paragraphs are all continuations -- don't worry
     about all the other "+"s"
"------" means "the next content is preformatted"
"------", the blank line, and the extra indentation within ------ are 
     there because docbook-xsl stylesheets are odd in different ways.
     This combination makes the preformatted text be visually distinct 
     with both versions 1.68 and 1.69. Without that oddness, we could
     use either just ------ or just indentation. Ho hum.

 Documentation/git-commit.txt |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

e46caf57bc2d09c032b992f9024226acd2d68fbc
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index ec8b562..0a7365b 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -83,15 +83,16 @@ OPTIONS
 	the current tip as parents -- so the current top commit is
 	discarded.
 +
+--
 It is a rough equivalent for:
-+
-$ git reset --soft HEAD^
-+
-$ ... do something else to come up with the right tree ...
-+
-$ git commit -c ORIG_HEAD
-+
+------
+	$ git reset --soft HEAD^
+	$ ... do something else to come up with the right tree ...
+	$ git commit -c ORIG_HEAD
+
+------
 but can be used to amend a merge commit.
+--
 
 -i|--include::
 	Instead of committing only the files specified on the
-- 
1.2.4.g3488

-- 
Francis Daly        francis@daoine.org


* [RFC/PATCH] date parsing: be friendlier to our European friends.
From: Junio C Hamano @ 2006-04-05 22:39 UTC
  To: git
In-Reply-To: <7virpo4jxf.fsf@assigned-by-dhcp.cox.net>

This does three things, and only applies to cases where the user
manually tries to override the author/commit time by environment
variables, with a non-ISO, non-2822 format date-string:

 - Refuses any interpretation that would put the date in the
   future; recent kernel history has a commit made with
   10/03/2006 which is recorded as October 3rd.

 - Adds '.' as the possible year-month-date separator.  We
   learned from our European friends on the #git channel that
   dd.mm.yyyy is the norm there.

 - When the separator is '.', we prefer dd.mm.yyyy over
   mm.dd.yyyy; otherwise mm/dd/yy[yy] takes precedence over
   dd/mm/yy[yy].
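
As a standalone illustration of the three rules above (not the date.c
code itself): the sketch below tries the candidate readings in the
stated order and rejects any that lies more than ten days after a
caller-supplied "today".  The names ymd, accept_date and
interpret_three_numbers are hypothetical, and passing today in
explicitly instead of asking the clock is a simplification.

```c
#include <stdbool.h>

struct ymd { int year, month, day; };

/* Days since 1970-01-01 in the proleptic Gregorian calendar. */
static long days_from_civil(int y, int m, int d)
{
	long era, yoe, doy, doe;

	y -= m <= 2;
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;
	doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + doe - 719468;
}

/*
 * Accept year/month/day only if it looks like a date and lies no more
 * than ten days after today; on success store the result in *out.
 */
static bool accept_date(int year, int month, int day,
			struct ymd today, struct ymd *out)
{
	if (month < 1 || month > 12 || day < 1 || day > 31)
		return false;
	if (year > 70 && year < 100)
		year += 1900;		/* 71..99 means 1971..1999 */
	else if (year >= 0 && year < 38)
		year += 2000;		/* 00..37 means 2000..2037 */
	else if (year < 1970 || year >= 2100)
		return false;
	if (days_from_civil(year, month, day) >
	    days_from_civil(today.year, today.month, today.day) + 10)
		return false;		/* refuse dates in the future */
	out->year = year;
	out->month = month;
	out->day = day;
	return true;
}

/* Interpret "n1<sep>n2<sep>n3" using the precedence described above. */
static bool interpret_three_numbers(int n1, int n2, int n3, char sep,
				    struct ymd today, struct ymd *out)
{
	if (n1 > 70) {
		/* yyyy-mm-dd, then yyyy-dd-mm */
		if (accept_date(n1, n2, n3, today, out))
			return true;
		if (accept_date(n1, n3, n2, today, out))
			return true;
	}
	/* mm/dd/yy[yy] goes first unless the separator is '.' */
	if (sep != '.' && accept_date(n3, n1, n2, today, out))
		return true;
	/* dd.mm.yy[yy], or dd/mm/yy[yy] as a fallback */
	if (accept_date(n3, n2, n1, today, out))
		return true;
	/* mm.dd.yy[yy] as the last resort for '.' */
	if (sep == '.' && accept_date(n3, n1, n2, today, out))
		return true;
	return false;
}
```

With today set to 2006-04-05, "10/03/2006" is no longer read as
October 3rd, because that would be in the future; it falls through to
the dd/mm reading and becomes March 10th.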

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * This is more of a RFC than ready-to-be-merged patch.
   Alternative patches and improvements are welcome.

 date.c |   77 +++++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 56 insertions(+), 21 deletions(-)

b9065540826426ac0e4959e869ba7e08d1ae65d8
diff --git a/date.c b/date.c
index 376d25d..034d722 100644
--- a/date.c
+++ b/date.c
@@ -197,26 +197,43 @@ static int match_alpha(const char *date,
 	return skip_alpha(date);
 }
 
-static int is_date(int year, int month, int day, struct tm *tm)
+static int is_date(int year, int month, int day, struct tm *now_tm, time_t now, struct tm *tm)
 {
 	if (month > 0 && month < 13 && day > 0 && day < 32) {
+		struct tm check = *tm;
+		struct tm *r = (now_tm ? &check : tm);
+		time_t specified;
+
+		r->tm_mon = month - 1;
+		r->tm_mday = day;
 		if (year == -1) {
-			tm->tm_mon = month-1;
-			tm->tm_mday = day;
-			return 1;
+			if (!now_tm)
+				return 1;
+			r->tm_year = now_tm->tm_year;
 		}
-		if (year >= 1970 && year < 2100) {
-			year -= 1900;
-		} else if (year > 70 && year < 100) {
-			/* ok */
-		} else if (year < 38) {
-			year += 100;
-		} else
+		else if (year >= 1970 && year < 2100)
+			r->tm_year = year - 1900;
+		else if (year > 70 && year < 100)
+			r->tm_year = year;
+		else if (year < 38)
+			r->tm_year = year + 100;
+		else
 			return 0;
+		if (!now_tm)
+			return 1;
+
+		specified = my_mktime(r);
 
-		tm->tm_mon = month-1;
-		tm->tm_mday = day;
-		tm->tm_year = year;
+		/* Be it commit time or author time, it does not make
+		 * sense to specify timestamp way into the future.  Make
+		 * sure it is not later than ten days from now...
+		 */
+		if (now + 10*24*3600 < specified)
+			return 0;
+		tm->tm_mon = r->tm_mon;
+		tm->tm_mday = r->tm_mday;
+		if (year != -1)
+			tm->tm_year = r->tm_year;
 		return 1;
 	}
 	return 0;
@@ -224,6 +241,9 @@ static int is_date(int year, int month, 
 
 static int match_multi_number(unsigned long num, char c, const char *date, char *end, struct tm *tm)
 {
+	time_t now;
+	struct tm now_tm;
+	struct tm *refuse_future;
 	long num2, num3;
 
 	num2 = strtol(end+1, &end, 10);
@@ -246,19 +266,33 @@ static int match_multi_number(unsigned l
 
 	case '-':
 	case '/':
+	case '.':
+		now = time(NULL);
+		refuse_future = NULL;
+		if (gmtime_r(&now, &now_tm))
+			refuse_future = &now_tm;
+
 		if (num > 70) {
 			/* yyyy-mm-dd? */
-			if (is_date(num, num2, num3, tm))
+			if (is_date(num, num2, num3, refuse_future, now, tm))
 				break;
 			/* yyyy-dd-mm? */
-			if (is_date(num, num3, num2, tm))
+			if (is_date(num, num3, num2, refuse_future, now, tm))
 				break;
 		}
-		/* mm/dd/yy ? */
-		if (is_date(num3, num, num2, tm))
+		/* Our eastern European friends say dd.mm.yy[yy]
+		 * is the norm there, so giving precedence to
+		 * mm/dd/yy[yy] form only when separator is not '.'
+		 */
+		if (c != '.' &&
+		    is_date(num3, num, num2, refuse_future, now, tm))
+			break;
+		/* European dd.mm.yy[yy] or funny US dd/mm/yy[yy] */
+		if (is_date(num3, num2, num, refuse_future, now, tm))
 			break;
-		/* dd/mm/yy ? */
-		if (is_date(num3, num2, num, tm))
+		/* Funny European mm.dd.yy */
+		if (c == '.' &&
+		    is_date(num3, num, num2, refuse_future, now, tm))
 			break;
 		return 0;
 	}
@@ -288,10 +322,11 @@ static int match_digit(const char *date,
 	}
 
 	/*
-	 * Check for special formats: num[:-/]num[same]num
+	 * Check for special formats: num[-.:/]num[same]num
 	 */
 	switch (*end) {
 	case ':':
+	case '.':
 	case '/':
 	case '-':
 		if (isdigit(end[1])) {
-- 
1.3.0.rc2.g1b83

^ permalink raw reply related

* Re: [RFC/PATCH] date parsing: be friendlier to our European friends.
From: Sam Ravnborg @ 2006-04-05 22:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vlkujzly0.fsf_-_@assigned-by-dhcp.cox.net>

On Wed, Apr 05, 2006 at 03:39:35PM -0700, Junio C Hamano wrote:
> This does three things, only applies to cases where the user
> manually tries to override the author/commit time by environment
> variables, with non-ISO, non-2822 format date-string:
> 
>  - Refuses to use the interpretation to put the date in the
>    future; recent kernel history has a commit made with
>    10/03/2006 which is recorded as October 3rd.
> 
>  - Adds '.' as the possible year-month-date separator.  We
>    learned from our European friends on the #git channel that
>    dd.mm.yyyy is the norm there.

In my company we have always used yyyy-mm-dd - this is an ISO standard
IIRC. The company is European based.

mm/dd/yy has always made my head spin ;-)

	Sam

^ permalink raw reply

* Re: [RFC/PATCH] date parsing: be friendlier to our European friends.
From: Junio C Hamano @ 2006-04-05 22:54 UTC (permalink / raw)
  To: git
In-Reply-To: <7vlkujzly0.fsf_-_@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> This does three things, only applies to cases where the user
> manually tries to override the author/commit time by environment
> variables, with non-ISO, non-2822 format date-string:
>
>  - Refuses to use the interpretation to put the date in the
>    future; recent kernel history has a commit made with
>    10/03/2006 which is recorded as October 3rd.
>
>  - Adds '.' as the possible year-month-date separator.  We
>    learned from our European friends on the #git channel that
>    dd.mm.yyyy is the norm there.
>
>  - When the separator is '.', we prefer dd.mm.yyyy over
>    mm.dd.yyyy; otherwise mm/dd/yy[yy] takes precedence over
>    dd/mm/yy[yy].

Before the list gets useless comments: the code prefers to accept
more sensible and/or unambiguous forms, such as ISO or RFC2822.
The issue this addresses is what to do when we get other forms.
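
To make the precedence rules easier to follow, here is a minimal
standalone sketch of the same decision order.  It is not the code from
the patch: struct ymd, utc_seconds(), plausible() and pick_date() are
hypothetical names used only for illustration, utc_seconds() stands in
for my_mktime(), and the "year omitted" case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the real code fills in a struct tm. */
struct ymd { int year, month, day; };

/* Seconds since the epoch for midnight UTC of a calendar date.
 * Stand-in for my_mktime(); uses the usual days-from-civil arithmetic. */
static long long utc_seconds(int year, int month, int day)
{
	int y = year - (month <= 2);
	long long era = (y >= 0 ? y : y - 399) / 400;
	int yoe = (int)(y - era * 400);
	int doy = (153 * (month > 2 ? month - 3 : month + 9) + 2) / 5 + day - 1;
	int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

	return (era * 146097 + doe - 719468) * 86400LL;
}

/* Accept a candidate only if it looks like a calendar date and is not
 * more than ten days later than `now` (seconds since the epoch). */
static int plausible(int year, int month, int day, long long now,
		     struct ymd *out)
{
	assert(out != NULL);
	if (month < 1 || month > 12 || day < 1 || day > 31)
		return 0;
	if (year >= 1970 && year < 2100)
		; /* four-digit year, use as is */
	else if (year > 70 && year < 100)
		year += 1900;
	else if (year >= 0 && year < 38)
		year += 2000;
	else
		return 0;
	if (now + 10LL * 24 * 3600 < utc_seconds(year, month, day))
		return 0;
	out->year = year;
	out->month = month;
	out->day = day;
	return 1;
}

/* Try the interpretations of "a<sep>b<sep>c" in the order described
 * above.  Returns 1 and fills `out` on success, 0 otherwise. */
static int pick_date(int a, int b, int c, char sep, long long now,
		     struct ymd *out)
{
	if (a > 70) {
		if (plausible(a, b, c, now, out))	/* yyyy-mm-dd */
			return 1;
		if (plausible(a, c, b, now, out))	/* yyyy-dd-mm */
			return 1;
	}
	/* mm/dd/yy[yy] goes first only when the separator is not '.' */
	if (sep != '.' && plausible(c, a, b, now, out))
		return 1;
	/* dd.mm.yy[yy], or dd/mm/yy[yy] as the fallback */
	if (plausible(c, b, a, now, out))
		return 1;
	/* mm.dd.yy[yy] as the last resort for '.' */
	if (sep == '.' && plausible(c, a, b, now, out))
		return 1;
	return 0;
}
```

With "now" set to 2006-04-05, pick_date(10, 3, 2006, '/', ...) rejects
October 3rd as being in the future and falls back to March 10th, which
is the kernel-history case mentioned in the description.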

^ permalink raw reply

* [PATCH] Fix searching for filenames in gitk
From: Pavel Roskin @ 2006-04-05 23:02 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: git

findcont should not accept any arguments.

Signed-off-by: Pavel Roskin <proski@gnu.org>
---

 gitk |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitk b/gitk
index 26fa79a..e1848cd 100755
--- a/gitk
+++ b/gitk
@@ -2230,7 +2230,7 @@ proc donefilediff {} {
     }
 }
 
-proc findcont {id} {
+proc findcont {} {
     global findid treediffs parentlist
     global ffileline findstartline finddidsel
     global displayorder numcommits matchinglines findinprogress

^ permalink raw reply related

* Re: Cygwin can't handle huge packfiles?
From: Rutger Nijlunsing @ 2006-04-05 23:27 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: Johannes Schindelin, Kees-Jan Dijkzeul, git
In-Reply-To: <20060405210844.GN26780@trixie.casa.cgf.cx>

On Wed, Apr 05, 2006 at 05:08:44PM -0400, Christopher Faylor wrote:
> On Wed, Apr 05, 2006 at 04:14:20PM +0200, Johannes Schindelin wrote:
> >> Inspired by a patch of Alex Riesen (thanks, Alex), I tried to use the
> >> regular mmap for mapping pack files, only to discover that I compile
> >> without defining "NO_MMAP", so I've been using the stock mmap all
> >> along. So now I'm thinking that the cygwin mmap also does a
> >> malloc-and-read, just like git does with NO_MMAP. So I'll continue to
> >> investigate in that direction.
> >
> >I think cygwin's mmap() is based on the Win32 API equivalent, which could 
> >mean that it *is* memory mapped, but in a special area (which is smaller 
> >than 1.5 gigabyte). In this case, it would make sense to limit the pack 
> >size, thereby having several packs, and mmap() them as they are needed.
> 
> Yes, cygwin's mmap uses CreateFileMapping and MapViewOfFile.  IIRC,
> Windows might have a 2G limitation lurking under the hood somewhere but
> I think that might be tweakable with some registry setting.

Windows places its DLLs criss-cross through the memory space because
every DLL on the system has its own preferred place to be loaded (the
base address). This severely limits the size of the largest contiguous
memory block available, which is needed for one mmap() I think.

Several solutions exist:
  - enlarge the address space with the /3GB boot flag in boot.ini
  - rebase all DLLs with REBASE.EXE (part of the Platform SDK).
    Just make them the same and fix them to a low address.
    Problem is rebasing system dlls since those are locked by the system.
  - at start of program before other DLLs are loaded,
    reserve as large a part of the memory as possible with
    VirtualAlloc()
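
Separately from the options above, the idea quoted earlier (map smaller
pieces as they are needed instead of one huge contiguous mapping) could
look roughly like the sketch below.  This is only an illustration using
plain POSIX mmap(); struct map_window, window_open() and window_close()
are made-up names, and nothing here is taken from git's pack code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window descriptor; not a git data structure. */
struct map_window {
	void *base;			/* page-aligned start of the mapping */
	size_t maplen;			/* length that was passed to mmap() */
	const unsigned char *data;	/* first byte at the requested offset */
	size_t len;			/* usable bytes starting at `data` */
};

/* Map `len` bytes of `fd` starting at `offset`.  mmap() wants a
 * page-aligned file offset, so round down and remember the slack.
 * Returns 0 on success, -1 on failure. */
static int window_open(int fd, off_t offset, size_t len,
		       struct map_window *w)
{
	long page = sysconf(_SC_PAGESIZE);
	off_t aligned;
	size_t slack;
	void *p;

	assert(w != NULL);
	if (page <= 0 || offset < 0 || len == 0)
		return -1;
	aligned = offset - (offset % page);
	slack = (size_t)(offset - aligned);
	p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
	if (p == MAP_FAILED)
		return -1;
	w->base = p;
	w->maplen = len + slack;
	w->data = (const unsigned char *)p + slack;
	w->len = len;
	return 0;
}

/* Drop the mapping so the address range can be reused for another window. */
static void window_close(struct map_window *w)
{
	if (w->base) {
		munmap(w->base, w->maplen);
		w->base = NULL;
	}
}
```

A caller would open a window around the object it wants, read from
w.data, and close the window before opening the next one, so that only
a window-sized contiguous range is needed at any one time.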

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

^ permalink raw reply

* blame not working well?
From: Junio C Hamano @ 2006-04-06  0:11 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git

I was having fun updating blame.c to use the built-in xdiff
instead of spawning and reading from external GNU diff (it is
currently in "next" branch).  It seems to pass the trivial
testsuite case but I noticed for example annotating Makefile,
sha1_name.c, or blame.c in git.git repository seems to show
quite bogus annotation.  One extreme case is the Makefile: all
but one line is blamed on the very initial commit made by
Linus X-<.  The good news for me is that the version before this
change has the same breakage.  The bad news is that this seems to
have been broken for some time.

Bisecting indicates 2a0925be3512451834ec9a3e023f4cff23c1cfb7 is
the first bad commit, but I do not see how the change can break
it.  I'll keep digging into it, but if you have a chance, could
you take a look, too?

^ permalink raw reply

* Re: Cygwin can't handle huge packfiles?
From: Christopher Faylor @ 2006-04-06  0:34 UTC (permalink / raw)
  To: git
In-Reply-To: <20060405232739.GA18121@nospam.com>

On Thu, Apr 06, 2006 at 01:27:39AM +0200, Rutger Nijlunsing wrote:
>On Wed, Apr 05, 2006 at 05:08:44PM -0400, Christopher Faylor wrote:
>> On Wed, Apr 05, 2006 at 04:14:20PM +0200, Johannes Schindelin wrote:
>> >> Inspired by a patch of Alex Riesen (thanks, Alex), I tried to use the
>> >> regular mmap for mapping pack files, only to discover that I compile
>> >> without defining "NO_MMAP", so I've been using the stock mmap all
>> >> along. So now I'm thinking that the cygwin mmap also does a
>> >> malloc-and-read, just like git does with NO_MMAP. So I'll continue to
>> >> investigate in that direction.
>> >
>> >I think cygwin's mmap() is based on the Win32 API equivalent, which could 
>> >mean that it *is* memory mapped, but in a special area (which is smaller 
>> >than 1.5 gigabyte). In this case, it would make sense to limit the pack 
>> >size, thereby having several packs, and mmap() them as they are needed.
>> 
>> Yes, cygwin's mmap uses CreateFileMapping and MapViewOfFile.  IIRC,
>> Windows might have a 2G limitation lurking under the hood somewhere but
>> I think that might be tweakable with some registry setting.
>
>Windows places its DLLs criss-cross through the memory space because
>every DLL on the system has its own preferred place to be loaded (the
>base address). This severely limits the amount of largest contiguous
>memory block available, which is needed for one mmap() I think.
>
>Several solutions exist:
>  - enlarge the address space with the /3GB boot flag in boot.ini

Thanks.  The 3GB boot flag is what I was trying to remember.

>  - rebase all DLLs with REBASE.EXE (part of platform sdk) .
>    Just make them the same and fix them to a low address.
>    Problem is rebasing system dlls since those are locked by the system.

Cygwin has its own version of rebase and a method for rebasing all of the
dlls in the distribution.  Using that may help squeeze out a little bit
of memory.

>  - at start of program before other DLLs are loaded,
>    reserve an as large part of the memory as possible with
>    VirtualAlloc()

Cygwin actually uses this trick to try to push DLLs into their right
locations after a fork.  It sort of works but sometimes, in a child
process, Windows puts "stuff" in locations previously occupied by a
DLL.  I could swear that it does that just to be annoying...

There is a chicken/egg problem here in that Cygwin uses Doug Lea's malloc
and that version of malloc will use mmap when sbrk() fails -- as it is
apt to do when allocating gigabytes of memory.  So, using malloc is
not a way to avoid mmap.

cgf

^ permalink raw reply

* Re: blame not working well?
From: Junio C Hamano @ 2006-04-06  1:26 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git
In-Reply-To: <7vacazy33w.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> I was having fun updating blame.c to use the built-in xdiff
> instead of spawning and reading from external GNU diff (it is
> currently in "next" branch).  It seems to pass the trivial
> testsuite case but I noticed for example annotating Makefile,
> sha1_name.c, or blame.c in git.git repository seems to show
> quite bogus annotation.  One extreme case is the Makefile; for
> all but one line is blamed for the very initial commit made by
> Linus X-<.  One good news for me is that the version before this
> change has the same breakage.  One bad news is this seems to
> have been broken for some time.
>
> Bisecting indicates 2a0925be3512451834ec9a3e023f4cff23c1cfb7 is
> the first bad commit, but I do not see how the change can break
> it.  I'll continue digging it, but if you have a chance, could
> you take a look, too?

It turns out that the only change needed to revert the breakage
was this one-liner.  get_revision() used to always rewrite
parents when prune and dense are specified, but the updated code
simply skips during the output filtering phase the parents that
would have been culled by calling rewrite_parents() unless the
caller tells it that it is interested in the parent field by
setting rev.parents.

-- >8 --
[PATCH] blame.c: fix completely broken ancestry traversal.

Recent revision.c updates completely broke the assignment of
blames by not rewriting commit->parents field unless explicitly
asked to by the caller.  The caller needs to set revs.parents.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 blame.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

ba3c93743a8151e3663e1fda6b3cb165d8373ddf
diff --git a/blame.c b/blame.c
index 98f9992..9bb34e6 100644
--- a/blame.c
+++ b/blame.c
@@ -813,6 +813,7 @@ int main(int argc, const char **argv)
 	rev.prune_fn = simplify_commit;
 	rev.topo_setter = topo_setter;
 	rev.topo_getter = topo_getter;
+	rev.parents = 1;
 	rev.limited = 1;
 
 	commit_list_insert(start_commit, &rev.commits);
-- 
1.3.0.rc2.g9cda

^ permalink raw reply related

