Git development
* Re: [PATCH] Remove gitenv macro hack
From: Junio C Hamano @ 2005-05-19 23:41 UTC (permalink / raw)
  To: Dan Weber; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.62.0505191800280.16809@mirrorlynx.com>

Please do not do this.

I believe we have that in quite a few places.  IIRC, it is one
of the Linus-approved GCC extensions and also used extensively
in the kernel source.

./cache.h:39:#define gitenv(e) (getenv(e) ? : gitenv_bc(e))
./commit-tree.c:149:	commitgecos = gitenv("GIT_COMMITTER_NAME") ? : realgecos;
./commit-tree.c:150:	commitemail = gitenv("GIT_COMMITTER_EMAIL") ? : realemail;
./commit-tree.c:151:	gecos = gitenv("GIT_AUTHOR_NAME") ? : realgecos;
./commit-tree.c:152:	email = gitenv("GIT_AUTHOR_EMAIL") ? : realemail;
./fsck-cache.c:359:	char *git_dir = gitenv(GIT_DIR_ENVIRONMENT) ? : DEFAULT_GIT_DIR_ENVIRONMENT;
./diff.c:35:	diff_opts = gitenv("GIT_DIFF_OPTS") ? : diff_opts;
./diff.c:358:		prepare_temp_file(other ? : name, &temp[1], two);
./diff.c:398:			builtin_diff(name, other ? : name, temp);
./diff.c:717:	diff_rename_minimum_score = minimum_score_ ? : MINIMUM_SCORE;
./sha1_file.c:203:	const char *alt = gitenv(ALTERNATE_DB_ENVIRONMENT) ? : "";
./sha1_file.c:215:			cp = strchr(last, ':') ? : last + strlen(last);
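
For readers who have not met the extension, here is a minimal sketch of what the "? :" form with the middle operand omitted does.  The helper name first_nonnull() is made up for illustration and is not part of git; the #else branch shows the ISO C spelling for compilers without the extension.

```c
#include <stddef.h>

/*
 * first_nonnull() is a hypothetical helper, not part of git.
 * With GCC or Clang, "a ?: b" evaluates a once and yields it when
 * it is non-NULL, otherwise it yields b.  Other compilers get the
 * standard spelling, which names a twice.
 */
static const char *first_nonnull(const char *a, const char *b)
{
#ifdef __GNUC__
	return a ?: b;		/* GNU extension: middle operand omitted */
#else
	return a ? a : b;	/* ISO C equivalent */
#endif
}
```

The practical difference shows up when the first operand has side effects or is expensive, as with the getenv() call inside gitenv(): the GNU form evaluates it only once.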


^ permalink raw reply

* Re: gitk-1.0 released
From: Paul Mackerras @ 2005-05-19 22:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: git
In-Reply-To: <20050519132411.GA29111@elte.hu>

Ingo Molnar writes:

> very nice! Works well and it's pretty fast on a 2GHz P4.

I'm glad you like it. :)

> The "Octopus merge ..." text is incorrectly overlayed with a graph line.

The patch below fixes that.

> - i guess this one is on your todo list: the history graph of a single
>   object (file).

Yes.  I was hoping that git-rev-tree would grow an option to do the
necessary selection of commits and produce a simplified graph.  I
could do it in Tcl but it's probably better done in C.

> - first window appearance on an uncached repository can be pretty slow 
>   due to disk seeks - so it might make sense to display something (an 
>   hourglass?) sooner - when i first started it i thought it hung. On 
>   already cached repositories the window comes up immediately, and the 
>   list of commits is updated dynamically.

The problem is that git-rev-tree HEAD doesn't output anything until it
has read all the relevant commits, which can involve a lot of disk
seeks.  I put the "Reading commits..." message in to indicate that
something was happening, but your hourglass cursor suggestion is a
good one.  It looks like git-rev-list might be better suited to what I
want, actually.

> (and the biggest missing feature of GIT right now is author + 
> last-commit annotated file viewing which could be integrated into gitk 
> a'ka BK's revtool: selecting a given line of the file would bring one to 
> that commit, etc.)

Yes, indeed.  I'll have to think about how to do it in a responsive
fashion, since getting the necessary information involves reading all
the commits and all the tree objects back to the beginning of time,
AFAICS.  Gitk currently only reads the tree objects when you select a
commit, and it does that asynchronously; when you select a commit, it
immediately displays the commit message and starts a git-diff-tree
process.  When the output from git-diff-tree arrives, it updates the
listbox and then (if you haven't selected another commit in the
meantime) starts a git-diff-tree -p to get the diff.  As the output
from git-diff-tree arrives, it is colorized and placed in the details
window.  That's why you can let the up or down key autorepeat and gitk
doesn't get hopelessly behind.

Another thing I want to do is find a way to display the deleted lines
in the annotated file listing.  One thing I found quite frustrating
with bk revtool was trying to find which changeset deleted some
particular lines of code.  I was basically reduced to binary searching
through the changesets - and with a large source file, just finding
the place to check in the annotated listing for each changeset was
time-consuming and error-prone in itself.

Paul.

diff -urN gitk-1.0/gitk gitk
--- gitk-1.0/gitk	2005-05-20 08:17:18.000000000 +1000
+++ gitk	2005-05-20 08:09:38.000000000 +1000
@@ -563,6 +563,9 @@
 		   -fill $ofill -outline black -width 1]
 	$canv raise $t
 	set xt [expr $canvx0 + $nlines * $linespc]
+	if {$nparents($id) > 2} {
+	    set xt [expr {$xt + ($nparents($id) - 2) * $linespc}]
+	}
 	set headline [lindex $commitinfo($id) 0]
 	set name [lindex $commitinfo($id) 1]
 	set date [lindex $commitinfo($id) 2]

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Linus Torvalds @ 2005-05-19 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vll6ayjok.fsf@assigned-by-dhcp.cox.net>



On Thu, 19 May 2005, Junio C Hamano wrote:
> 
> We are talking about the Plumbing.  Thank you for reminding me,
> but sometimes people end up using the bare Plumbing.

There's a pretty simple and nice way to make it both useful and easy to do 
to arbitrary precision.

Think of the number following the -M as the mantissa.

So -M9 means 0.9 aka "90% match" (or "difference", depending on which way
you want to go), and in general -Mx would have the "10% increments" thing.

But since it's a fraction, you just give more precision by adding more 
numbers, and -M99 would be "99% match", while "-M02" would be "2% match".

Then it would be logical for a plain -M to be 100% match / 0% difference
(ie only show renames that are exact), since a "0% match" / 100%
difference is nonsensical.
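
A rough sketch of how such a mantissa could be parsed, assuming the result is scaled to the 0..10000 score range mentioned elsewhere in this thread.  The function name parse_match_fraction() and the choice to cap the precision at nine digits are assumptions made for the example, not anything from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical parser for the digits following -M, read as a
 * decimal fraction: "9" is 0.9, "99" is 0.99, "02" is 0.02.
 * The result is scaled to 0..10000.  An empty string stands for a
 * plain -M and is treated as an exact match.  Returns -1 if a
 * non-digit is seen.
 */
static int parse_match_fraction(const char *digits)
{
	long long num = 0, den = 1;
	int used = 0;

	if (*digits == '\0')
		return 10000;
	for (; *digits; digits++) {
		if (*digits < '0' || *digits > '9')
			return -1;
		if (used < 9) {	/* ignore digits past the ninth to avoid overflow */
			num = num * 10 + (*digits - '0');
			den *= 10;
			used++;
		}
	}
	return (int)(num * 10000 / den);
}
```

With this reading -M9 gives 9000, -M99 gives 9900 and -M02 gives 200, so extra digits only ever add precision.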

Alternatively, we'd have -M (without any number) just default to 
something, and you'd give a separate number of how closely you want to 
match things, ie

	# These all mean the same thing: (default) 20% difference
	git-diff-tree -M
	git-diff-tree -M --match=80
	git-diff-tree -M --differ=20

	# show only renames that are perfect matches.
	git-diff-tree -M --match=100

	# show _everything_ as a rename, except the
	# matching matrix means that we prefer better
	# matches over worse
	git-diff-tree -M --match=0

Hmm?

		Linus

^ permalink raw reply

* [PATCH] Remove gitenv macro hack
From: Dan Weber @ 2005-05-19 22:01 UTC (permalink / raw)
  To: Git Mailing List


Removed the hacky macro for gitenv.  It often produced compiler warnings
for the use of ?: with nothing after the ?

Signed-off-by: Dan Weber <dan@mirrorlynx.com>

---
commit 1b48b369a152a6315a9b4e6eebf50f56176cdd82
tree 53c238f3aa788df47325c456ab16b0eb25004074
parent 5cd4c7b7686d334e341b21d92449349feda3ef65
author Dan Weber <dan@mirrorlynx.com> Thu, 19 May 2005 17:57:44 -0400
committer Dan Weber <dan@mirrorlynx.com> Thu, 19 May 2005 17:57:44 -0400

  cache.h |    8 +++++++-
  1 files changed, 7 insertions(+), 1 deletion(-)

Index: cache.h
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cache.h  (mode:100644)
+++ 53c238f3aa788df47325c456ab16b0eb25004074/cache.h  (mode:100644)
@@ -37,7 +37,13 @@
   * We accept older names for now but warn.
   */
  extern char *gitenv_bc(const char *);
-#define gitenv(e) (getenv(e) ? : gitenv_bc(e))
+static inline char* gitenv(const char* name) {
+       char* result = getenv(name);
+       if (result)
+               return result;
+       else
+               return gitenv_bc(name);
+}

  /*
   * Basic data structures for the directory cache


^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 21:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191643030.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> Are we talking about git plumbing or porcelain here?

We are talking about the Plumbing.  Thank you for reminding me,
but sometimes people end up using the bare Plumbing.

As I stated before, I do not do Porcelain [*1*]; my main
interest lies in helping Linus and Linux Kernel development
process, by helping him in the Plumbing area and making the use
of bare Plumbing layer a comfortable enough experience.

My ultimate goal is to make the Plumbing useful enough to make
what Porcelain layers do more or less irrelevant ;-).

[Footnote]

*1* Yes I do have my own Porcelain layer, and personally I feel
some of the things it does and some of the approaches it takes
are quite good, but I do not advocate it more than necessary on
this list.


^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Nicolas Pitre @ 2005-05-19 20:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vy8abx8ay.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

> >>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:
> 
> NP> Yes, but 0-9 is putting a bound on the accuracy.  What if someone wants 
> NP> not more than 2% difference?
> 
> That statement is correct, but I think you are looking at it
> from a developer perspective.
> 
> I suspect people would not want to pay the price of having
> always to type many digits for the benefit of being able to
> specify differences of 2% and 5%.  Would you also complain gzip
> only lets you say -1 .. -9 and not -1.63 ;-)?

Are we talking about git plumbing or porcelain here?


Nicolas

^ permalink raw reply

* Re: git-diff-tree for the first commit
From: Linus Torvalds @ 2005-05-19 20:46 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: GIT
In-Reply-To: <Pine.LNX.4.58.0505191323060.2322@ppc970.osdl.org>



On Thu, 19 May 2005, Linus Torvalds wrote:
> 
> That said, a new flag that says "diff the root against the NUL tree" 
> wouldn't be wrong either, for when that is what you want.

Done. Use "git-diff-tree --root" if you want to see the root commit as a 
big diff against nothing.

		Linus

^ permalink raw reply

* Re: git-diff-tree for the first commit
From: Thomas Glanzmann @ 2005-05-19 20:38 UTC (permalink / raw)
  To: GIT
In-Reply-To: <Pine.LNX.4.58.0505191323060.2322@ppc970.osdl.org>

Hello,
I see. Thanks for the elaboration. I got the idea now. ;-)

> That said, a new flag that says "diff the root against the NUL tree" 
> wouldn't be wrong either, for when that is what you want.

I want it for the following scenario:

My git frontend sets the time stamps of the checked-out files to the
time of the last modification. That way I can do a 'ls -lart' in a sub
directory and have the most recently touched files at the bottom.  I also
want to use this for keyword expansion.

I currently do it by calling more or less 'git-rev-tool HEAD' and
'git-diff-tree -r REVISION' and cache the output. However for the initial
import there are no 'timestamps' available. Now I can do two things. Implement
a git-diff-tree flag or assume that any files which don't have a delta
are imported by the initial tree.


Ideas?

	Thomas

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 20:36 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191456040.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> Yes, but 0-9 is putting a bound on the accuracy.  What if someone wants 
NP> not more than 2% difference?

That statement is correct, but I think you are looking at it
from a developer perspective.

I suspect people would not want to pay the price of having
always to type many digits for the benefit of being able to
specify differences of 2% and 5%.  Would you also complain gzip
only lets you say -1 .. -9 and not -1.63 ;-)?


^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-19 20:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505181731450.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> ... I do believe that git-run-with-user-path _could_ be a
LT> good way to abstract out the "where the heck in the tree am
LT> I?" issues.

Yes, I am still in search of a good way to abstract that issue
out and I myself am not yet convinced that the command in its
current form _is_ a good enough way yet.

What I am most unhappy about with it lies elsewhere, though.
There needs to be a better way to tell it how the underlying
command handles non-paths arguments, so that I can just say

    git-run-with-user-path <some option spec for the command> \
        command arg1 arg2 arg3 ...

and if arg1 through argO are non-path options then have it
canonicalize and filter only starting from argO+1.  That would
alleviate one issue I have with the current implementation.


^ permalink raw reply

* Re: git-diff-tree for the first commit
From: Linus Torvalds @ 2005-05-19 20:31 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: GIT
In-Reply-To: <20050519195110.GG8105@cip.informatik.uni-erlangen.de>



On Thu, 19 May 2005, Thomas Glanzmann wrote:
>
> I would like to see output for the first commit (initial import) in:
> 
> 	git-rev-list HEAD | git-diff-tree -r --stdin
> 
> is it supposed to just be empty or is that a bug?

Hmm.. That's a bug and/or a feature, entirely depending on how you feel.

The first commit doesn't have a parent, so in the world where this is a 
feature, this is 100% consistent with the notion that since there is 
nothing to diff against, diff-tree has nothing to do.

In an alternate world, you can decide that not having a parent is 
equivalent to being parented with an empty tree.

In yet a third world, you'd decide that all git projects should start off 
from the empty tree root parent, and that the kernel project (and the git 
archive itself) is invalid.

I don't think there is a right answer, except that I think the third
version is likely wrong, since by definition the kernel archive is
perfect.

There's actually some reason to consider the current behaviour correct, in
that the initial tree really _is_ special: it was imported from somewhere
else, and as such anybody who wants to know "what changed" really doesn't
want to see the explosion that happened at the beginning of time: that
wasn't a "change" at all, that was something else.

So the current behaviour actually is (in my opinion) the right one, at
least when considering something like git-whatchanged. Similarly, if you
use a variation of git-whatchanged to implement the equivalent of "cvs
annotate", leaving the lines that don't have a diff _non-annotated_ is
actually the right thing to do, since it would be wrong to say "they came
from the person who did the initial import".

That said, a new flag that says "diff the root against the NUL tree" 
wouldn't be wrong either, for when that is what you want.

			Linus

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: H. Peter Anvin @ 2005-05-19 20:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.58.0505191148470.2322@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> IOW, you're screwed. "execvp()" really should take an argument of type
> "const char * const *", but it doesn't for historical reasons.
> 

The real problem, IMNSHO, is that C doesn't allow a pointer to a pointer 
to a non-const object to be implicitly treated as a pointer to a pointer 
to a const object.  C should have required those two pointer classes to 
have the same representation (which they would in any sane, and pretty 
much any insane, system) and therefore a lot of functions could have the 
additional consts added to their prototypes.
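
A small illustration of the rule being complained about.  Both function names are made up for the example.  The callee only reads, yet a caller holding a plain "char **" still has to cast.

```c
#include <stddef.h>

/* Hypothetical read-only consumer: touches neither the strings nor the array. */
static size_t count_strings(const char *const *v)
{
	size_t n = 0;

	while (v[n])
		n++;
	return n;
}

/* Caller that holds a plain "char **", as main() does. */
static size_t count_argv(char **argv)
{
	/*
	 * C has no implicit conversion from "char **" to
	 * "const char *const *", so the cast is needed even though
	 * nothing is written.  C++ accepts this call without a cast.
	 */
	return count_strings((const char *const *)argv);
}
```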

At least one can do casts on sane architectures... :-/

	-hpa

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 19:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505191148470.2322@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> IOW, you're screwed. "execvp()" really should take an argument of type
LT> "const char * const *", but it doesn't for historical reasons.

That's what I suspected.


^ permalink raw reply

* git-diff-tree for the first commit
From: Thomas Glanzmann @ 2005-05-19 19:51 UTC (permalink / raw)
  To: GIT

Hello,
I would like to see output for the first commit (initial import) in:

	git-rev-list HEAD | git-diff-tree -r --stdin

is it supposed to just be empty or is that a bug?

	Thomas

^ permalink raw reply

* Re: [PATCH] packed delta git
From: Nicolas Pitre @ 2005-05-19 19:30 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: git
In-Reply-To: <20050519183810.GF8105@cip.informatik.uni-erlangen.de>

On Thu, 19 May 2005, Thomas Glanzmann wrote:

> Hello Chris,
> 
> > size (du -sh .git)              2.5G                  227M
> 
> wow that beats bitkeeper in size. What is missing to actual use such a
> approach in a distributed environment?

Me completing fsck-cache support for delta objects.


Nicolas

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Nicolas Pitre @ 2005-05-19 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vsm0jyryf.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

> >>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:
> 
> NP> On Thu, 19 May 2005, Junio C Hamano wrote:
> >> - the command line interface "-M" to read "-M" or "-M[0-9]"
> >> (one digit); -M defaults to -M5 and give the cut-off point at
> >> similarity score 5000, -M9 at 9000, etc.
> 
> NP> Why not a fractional value instead?  -M1 is 100% the same while -M.95 
> NP> allows for some 5% changes.
> 
> We are essentially saying the same thing.  Internally diff core
> uses score between 0 and 10000 but single digit proposed above
> or fractional both hides that from the user by normalizing the
> scale to something less arbitrary (in my case 0-9 in your case
> 0-1.0).

Yes, but 0-9 is putting a bound on the accuracy.  What if someone wants 
not more than 2% difference?


Nicolas

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Linus Torvalds @ 2005-05-19 18:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vy8abys5a.fsf@assigned-by-dhcp.cox.net>



On Thu, 19 May 2005, Junio C Hamano wrote:
> 
> Here is another "doubt" point.  I am almost embarrassed to ask
> this, but what's the right way to express the following?  I
> could not figure out how to silence const warnings from gcc
> without using the cast there, which defeats the whole point of
> const warnings:

You're not doing this right.

Like it or not, "execvp()" does not take a pointer to "const char *".

It takes a pointer to a constant array of _non-const_ "char *".

Which is not what you have. You have a non-const array of "const char *", 
and as a result, you need the cast.

In other words, you really need

	char *exec_arg[9];

or, if you can indeed set the array to be const, you can make it be

	char *const exec_arg[9] = {
		pgm, name,
		temp[0].name, temp[0].hex, temp[0].mode,
		temp[1].name, temp[1].hex, temp[1].mode,
	};

which would work, except to avoid warnings it obviously requires that none
of the strings themselves are "const" (which isn't true).
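
One way to live with the prototype is to confine the cast to a single helper, so the rest of the code can keep its "const char *" strings.  This is only a sketch: exec_const_argv() and run_and_wait() are hypothetical names, and it relies on the POSIX guarantee that the exec functions do not modify the argument strings.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical wrapper that keeps the cast in one place.  execvp()
 * does not modify the strings, so dropping the inner const here is
 * safe in practice.
 */
static int exec_const_argv(const char *pgm, const char *const *argv)
{
	return execvp(pgm, (char *const *)argv);
}

/* Fork, run argv[0] with the given arguments, return its exit status or -1. */
static int run_and_wait(const char *const *argv)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		exec_const_argv(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The cast does not go away, but it is written once and the const warnings stay useful everywhere else.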

IOW, you're screwed. "execvp()" really should take an argument of type
"const char * const *", but it doesn't for historical reasons.

		Linus

^ permalink raw reply

* Re: [PATCH] packed delta git
From: Chris Mason @ 2005-05-19 18:53 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: git
In-Reply-To: <20050519183810.GF8105@cip.informatik.uni-erlangen.de>

On Thursday 19 May 2005 14:38, Thomas Glanzmann wrote:
> Hello Chris,
>
> > size (du -sh .git)              2.5G                  227M
>
> wow that beats bitkeeper in size. What is missing to actual use such a
> approach in a distributed environment?

It's not quite fair to compare with bitkeeper, since my changeset comments are 
only the name of the bk->cvs patch, and I've only got 28k changesets vs bk's 
60k or so.

In terms of actually making use of this, we need to deal with the hard linked 
files during push/pull.  This means using -H on rsync and teaching the 
push/pull code about packed files.

git-pack needs to be able to unpack/undelta files so that people can clean a 
tree.

git-fsck-cache needs to understand packed files and deltas.

-chris

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 18:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191426000.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

>> - I have been assuming that diff_delta uses its two input
>> read-only but have not verified that myself yet.

NP> It does.

Thanks (also thanks to Linus for pointing out the PROT_READ in
the test program).


^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 18:46 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.62.0505191426000.20274@localhost.localdomain>

>>>>> "NP" == Nicolas Pitre <nico@cam.org> writes:

NP> On Thu, 19 May 2005, Junio C Hamano wrote:
>> - the command line interface "-M" to read "-M" or "-M[0-9]"
>> (one digit); -M defaults to -M5 and give the cut-off point at
>> similarity score 5000, -M9 at 9000, etc.

NP> Why not a fractional value instead?  -M1 is 100% the same while -M.95 
NP> allows for some 5% changes.

We are essentially saying the same thing.  Internally diff core
uses score between 0 and 10000 but single digit proposed above
or fractional both hides that from the user by normalizing the
scale to something less arbitrary (in my case 0-9 in your case
0-1.0).
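
To make the equivalence concrete, both spellings can be mapped onto the same 0..10000 internal score.  The two function names below are invented for the example, and arg is assumed to be whatever follows "-M" on the command line.

```c
#include <stdlib.h>

/* Hypothetical: "-M" alone or "-M<digit>"; arg is the text after "-M". */
static int score_from_digit(const char *arg)
{
	if (arg[0] == '\0')
		return 5000;			/* plain -M defaults to -M5 */
	if (arg[0] < '0' || arg[0] > '9' || arg[1] != '\0')
		return -1;
	return (arg[0] - '0') * 1000;
}

/* Hypothetical: fractional form such as "1" or ".95". */
static int score_from_fraction(const char *arg)
{
	char *end;
	double f = strtod(arg, &end);

	if (end == arg || *end != '\0' || !(f >= 0.0 && f <= 1.0))
		return -1;
	return (int)(f * 10000.0 + 0.5);	/* round to nearest */
}
```

The single digit reaches only ten of the internal values, while the fraction can reach any of them.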




^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v4qcz16n6.fsf@assigned-by-dhcp.cox.net>

Replying to myself.

JCH> Oops,... thanks.  I still had some doubts about it and that's
JCH> why I said it was beta, but that is fine.  My doubts are minor:

Here is another "doubt" point.  I am almost embarrassed to ask
this, but what's the right way to express the following?  I
could not figure out how to silence const warnings from gcc
without using the cast there, which defeats the whole point of
const warnings:

	if (!pid) {
		const char *pgm = external_diff();
		if (pgm) {
			if (one && two) {
				const char *exec_arg[9];
				const char **arg = &exec_arg[0];
				*arg++ = pgm;
				*arg++ = name;
				*arg++ = temp[0].name;
				*arg++ = temp[0].hex;
				*arg++ = temp[0].mode;
				*arg++ = temp[1].name;
				*arg++ = temp[1].hex;
				*arg++ = temp[1].mode;
				if (other)
					*arg++ = other;
				*arg = 0;
				execvp(pgm, (char *const*) exec_arg);
			}

Here, pgm and name are const,  execvp expects char *const argv[]
as its second argument.


^ permalink raw reply

* Re: [PATCH] packed delta git
From: Thomas Glanzmann @ 2005-05-19 18:38 UTC (permalink / raw)
  To: git
In-Reply-To: <200505191428.52238.mason@suse.com>

Hello Chris,

> size (du -sh .git)              2.5G                  227M

wow that beats bitkeeper in size. What is missing to actually use such an
approach in a distributed environment?

	Thomas

^ permalink raw reply

* Re: [PATCH] Detect renames in diff family.
From: Nicolas Pitre @ 2005-05-19 18:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7v4qcz16n6.fsf@assigned-by-dhcp.cox.net>

On Thu, 19 May 2005, Junio C Hamano wrote:

>  - the command line interface "-M" to read "-M" or "-M[0-9]"
>    (one digit); -M defaults to -M5 and give the cut-off point at
>    similarity score 5000, -M9 at 9000, etc.

Why not a fractional value instead?  -M1 is 100% the same while -M.95 
allows for some 5% changes.

This is clear and not based on some arbitrary level values.

>  - I have been assuming that diff_delta uses its two input
>    read-only but have not verified that myself yet.

It does.


Nicolas

^ permalink raw reply

* Re: [PATCH] packed delta git
From: Chris Mason @ 2005-05-19 18:28 UTC (permalink / raw)
  To: git; +Cc: Nicolas Pitre
In-Reply-To: <200505171857.46370.mason@suse.com>

On Tuesday 17 May 2005 18:57, Chris Mason wrote:
> Hello everyone,
>
> Here's a new version of my packed git patch, diffed on top of Nicolas'
> delta code (link below to that).  It doesn't change the core git commands
> to create packed/delta files, that is done via a new git-pack command.  The
> git-pack usage is very simple:
>
> git-pack [<reference_sha1>:]<target_sha1> [ <next_sha1> ... ]

My original git-pack-changes script didn't properly limit the length of the 
delta chains, so you could use it to create a repo that you can't later read.

The new one below fixes that, and also changes the direction of the delta.
Deltas are now done in reverse, leaving the most recent sha1 as a whole file
and diffing old revisions against it.
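
The chain limit can be sketched in C roughly as follows.  This mirrors the idea of add_delta() in the script further down, but it uses small integer ids in place of sha1 strings, and the names try_add_delta() and NO_BASE are made up for the example.

```c
#define NO_BASE (-1)

/*
 * base_of[id] holds the object that id is stored as a delta
 * against, or NO_BASE when id is stored whole.  Returns 1 when
 * target is recorded as a delta against ref.  Returns 0 when it is
 * not: target already has a base, the chain below ref is longer
 * than max_chain, or the chain leads back to target, which would
 * create a loop.
 */
static int try_add_delta(int *base_of, int ref, int target, int max_chain)
{
	int depth = 0;
	int cur;

	if (ref == target || base_of[target] != NO_BASE)
		return 0;
	for (cur = ref; base_of[cur] != NO_BASE; cur = base_of[cur]) {
		if (base_of[cur] == target)
			return 0;
		if (++depth > max_chain)
			return 0;
	}
	base_of[target] = ref;
	return 1;
}
```

With reverse deltas, ref is the newer object and target the older one, so the newest revision sits at the end of every chain and is read without applying any delta.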

The result is the same size (62M for current linux-2.6 git tree) and faster
checkout times for head (9s vs 15s).  I also tested against the bk-cvs patch
set:
                          vanilla           packed/delta
checkout-cache (hot)      (only 1.5G ram)   15s
checkout-cache (cold)     4m30s             1m19s
size (du -sh .git)        2.5G              227M

The steps to pack/delta the 2.6 git tree have changed:

# step one, pack all of HEAD together
git-ls-tree -r HEAD | awk '{print $3}' | xargs git-pack

# step two pack deltas for all revs back to the first commit
git-pack-changes-script | xargs git-pack

# step three, pack a delta from 2.6.12-rc2 to 2.6.11
git-pack-changes-script -t 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 c39ae07f393806ccf406ef966e9a15afc43cc36a | xargs git-pack

New git-pack-changes-script:

--
#!/usr/bin/perl
#
# script to search through the rev-list output and generate delta history
# you can specify either a start and stop commit or two trees to search.
# with no command line args it searches the entire revision history.
# output is suitable for piping to xargs git-pack

use strict;

my $ret;
my $i;
my @wanted = ();
my $argc = scalar(@ARGV);
my $commit;
my $stop;
my %delta = ();
my %packed = ();

sub add_packed($) {
    my ($sha1) = @_;
    if (defined($packed{$sha1})) {
        return 1;
    }
    if (defined($delta{$sha1})) {
        return 1;
    }
    $packed{$sha1} = 1;
    print "$sha1\n";
    return 0;
}

sub add_delta($$) {
    my ($ref, $target) = @_;
    my $chain = 0;
    my $recur = $ref;
    if (defined($delta{$target})) {
        return 1;
    }
    if (defined($packed{$target})) {
        return 1;
    }
    while(1) {
	last if (!defined($delta{$recur}));
	if ($target eq $delta{$recur}) {
	    add_packed($target);
	    return 1;
	}
	$chain++;
	if ($chain > 32) {
	    add_packed($target);
	    return 1;
	}
	$recur = $delta{$recur};
    }
    $delta{$target} = $ref;
    print "$ref:$target\n";
    return 0;
}

sub print_usage() {
    print STDERR "usage: pack-changes [-c commit] [-s stop commit] [-t tree1 tree2]\n";
    exit(1);
}

sub find_tree($) {
    my ($commit) = @_;
    open(CM, "git-cat-file commit $commit|") || die "git-cat-file failed";
    while(<CM>) {
        chomp;
	my @words = split;
	if ($words[0] eq "tree") {
	    return $words[1];
	} elsif ($words[0] ne "parent") {
	    last;
	}
    }
    close(CM);
    if ($? && ($ret = $? >> 8)) {
        die "cat-file $commit failed with $ret";
    }
    return undef;
}

sub test_diff($$) {
    my ($a, $b) = @_;
    open(DT, "git-diff-tree -r -t $a $b|") || die "diff-tree failed";
    while(<DT>) {
        chomp;
	my @words = split;
	my $sha1 = $words[2];
	my $change = $words[0];
	if ($change =~ m/^\*/) {
	    @words = split("->", $sha1);
	    add_delta($words[0], $words[1]);
	} elsif ($change =~ m/^\-/) {
	    next;
	} else {
	    add_packed($sha1);
	}
    }
    close(DT);
    if ($? && ($ret = $? >> 8)) {
	die "git-diff-tree failed with $ret";
    }
    return 0;
}

for ($i = 0 ; $i < $argc ; $i++)  {
    if ($ARGV[$i] eq "-c") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$commit = $ARGV[++$i];
    } elsif ($ARGV[$i] eq "-s") {
    	if ($i == $argc - 1) {
	    print_usage();
	}
	$stop = $ARGV[++$i];
    } elsif ($ARGV[$i] eq "-t") {
        if ($argc != 3 || $i != 0) {
	    print_usage();
	}
	if (test_diff($ARGV[1], $ARGV[2])) {
	    die "test_diff failed\n";
	}
	add_delta($ARGV[1], $ARGV[2]);
	exit(0);
    }
}

if (!defined($commit)) {
    $commit = `commit-id`;
    if ($?) {
    	print STDERR "commit-id failed, try using -c to specify a commit\n";
	exit(1);
    }
    chomp $commit;
}

open(RL, "git-rev-list $commit|") || die "rev-list failed";
while(<RL>) {
    chomp;
    my $cur = $_;
    my $cur_tree;
    my $parent_tree;
    my $parent_commit = undef;
    open(PARENT, "git-cat-file commit $cur|") || die "cat-file failed";
    while(<PARENT>) {
        chomp;
	my @words = split;
	if ($words[0] eq "tree") {
	    $cur_tree = $words[1];
	    next;
	} elsif ($words[0] ne "parent") {
	    last;
	}
	$parent_commit = $words[1];
	my $next = <PARENT>;
	# ignore merge sets for now
	if ($next =~ m/^parent/) {
	    last;
	}
	# note that we run test_diff to generate a reverse
	# diff
	if (test_diff($cur, $words[1])) {
	    die "test_diff failed\n";
	}
	$parent_tree = find_tree($words[1]);
	if (!defined($parent_tree)) {
	    die "failed to find tree for $words[1]\n";
	}
	add_delta($cur_tree, $parent_tree);
	add_packed($cur);
	last;
    }
    close(PARENT);
    if (!defined($parent_commit)) {
        print STDERR "parentless commit $cur\n";
    }
    if ($? && ($ret = $? >> 8)) {
        die "cat-file failed with $ret";
    }
    if ($cur eq $stop) {
        last;
    }
}
close(RL);

if ($? && ($ret = $? >> 8)) {
    die "rev-list failed with $ret";
}


^ permalink raw reply

* Re: manpage name conflict
From: Sebastian Kuzminsky @ 2005-05-19 18:18 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.58.0505190956330.2322@ppc970.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> On Thu, 19 May 2005, Sebastian Kuzminsky wrote:
> > Anyway, here's the documentation patch:
> 
> It's whitespace-corrupted, with tabs turned into spaces..


<blush>


Index: Documentation/Makefile
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/Makefile  (mode:100644)
+++ uncommitted/Documentation/Makefile  (mode:100644)
@@ -1,6 +1,6 @@
 DOC_SRC=$(wildcard git*.txt)
 DOC_HTML=$(patsubst %.txt,%.html,$(DOC_SRC))
-DOC_MAN=$(patsubst %.txt,%.1,$(DOC_SRC))
+DOC_MAN=$(patsubst %.txt,%.1,$(wildcard git-*.txt)) git.7
 
 all: $(DOC_HTML) $(DOC_MAN)
 
@@ -13,13 +13,15 @@
 	touch $@
 
 clean:
-	rm -f *.xml *.html *.1
+	rm -f *.xml *.html *.1 *.7
 
 %.html : %.txt
 	asciidoc -b css-embedded -d manpage $<
 
-%.1 : %.xml
+%.1 %.7 : %.xml
 	xmlto man $<
+	# FIXME: this next line works around an output filename bug in asciidoc 6.0.3
+	[ "$@" = "git.7" ] || mv git.1 $@
 
 %.xml : %.txt
 	asciidoc -b docbook -d manpage $<
Index: Documentation/git-diff-helper.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git-diff-helper.txt  (mode:100644)
+++ uncommitted/Documentation/git-diff-helper.txt  (mode:100644)
@@ -1,5 +1,5 @@
 git-diff-helper(1)
-=======================
+==================
 v0.1, May 2005
 
 NAME
Index: Documentation/git.txt
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/Documentation/git.txt  (mode:100644)
+++ uncommitted/Documentation/git.txt  (mode:100644)
@@ -1,4 +1,4 @@
-git(1)
+git(7)
 ======
 v0.1, May 2005
 


-- 
Sebastian Kuzminsky
"Marie will know I'm headed south, so's to meet me by and by"
-Townes Van Zandt

^ permalink raw reply

