Git development


* Re: Following renames
From: Marco Costalba @ 2006-03-27 11:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603270005330.15714@g5.osdl.org>

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Mon, 27 Mar 2006, Marco Costalba wrote:
> >
> > Historic Linux test (63428 revisions)
> >
> > File: drivers/net/tg3.c
> > Revisions that modify tg3.c : 292
> >
> > With qgit
> > 15s to retrieve file history (git-rev-list)
> > 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)
>
> .. and it does absolutely _nothing_ while it's doing that, does it?
>

yes, it's true.

> > $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> > 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (797major+43033minor)pagefaults 0swaps
>
> In contrast, git-whatchanged will start outputting the recent changes
> immediately.
>
> And that's the point. Almost always, we're interested in the _recent_
> stuff. The fact that it takes longer to get the old history  is not very
> important. You generally don't ask "what changed in this file" for a file
> that hasn't changed in five years.
>

We could run git-rev-list with a time-range specifier by default (say,
only the last year of changes) to get fast results, and retrieve the
full history _only_ on request.

This might solve the problem of fast output for recent revs, if
that is the problem.
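
A minimal sketch of that idea, assuming the front end builds the
git-rev-list command line itself. The helper name and the default of
"1 year ago" are made up for illustration; --since is the existing
rev-list date limiter.

```python
def rev_list_command(path, rev="HEAD", since="1 year ago"):
    """Build a git rev-list command line for one file.

    since=None asks for the full history; any other value limits the
    walk to commits newer than that date (hypothetical default).
    """
    cmd = ["git", "rev-list"]
    if since is not None:
        cmd.append(f"--since={since}")
    # "--" separates the revision from the path limiter.
    cmd += [rev, "--", path]
    return cmd


# Fast default view, then the full history only on request.
recent = rev_list_command("drivers/net/tg3.c")
full = rev_list_command("drivers/net/tg3.c", since=None)
```

The resulting list could be handed to subprocess.run() unchanged; the
"on request" case simply drops the date limit.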

I still think the problem with annotation is that you don't see
patches that _remove_ lines of code; you need the whole diff for that.

Marco

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-27  8:07 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Jakub Narebski, git
In-Reply-To: <e5bfff550603262147t3aec8da6p6bf2a333e2d35f1d@mail.gmail.com>

On Mon, 27 Mar 2006, Marco Costalba wrote:
> 
> Historic Linux test (63428 revisions)
> 
> File: drivers/net/tg3.c
> Revisions that modify tg3.c : 292
> 
> With qgit
> 15s to retrieve file history (git-rev-list)
> 19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

.. and it does absolutely _nothing_ while it's doing that, does it?

> $ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
> 98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (797major+43033minor)pagefaults 0swaps

In contrast, git-whatchanged will start outputting the recent changes 
immediately.

And that's the point. Almost always, we're interested in the _recent_ 
stuff. The fact that it takes longer to get the old history  is not very 
important. You generally don't ask "what changed in this file" for a file 
that hasn't changed in five years.

		Linus

^ permalink raw reply

* Re: Following renames
From: Jakub Narebski @ 2006-03-27  7:53 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.62.0603262337580.26865@qynat.qvtvafvgr.pbz>

David Lang wrote:

> On Mon, 27 Mar 2006, Jakub Narebski wrote:
> 
>> 2.) Caching the results of similarity algorithm/rename detection tool
>> (also Paul Jakma post), including remembering false positives and
>> undetected renames, for efficiency. Automatically calculated parts might
>> be throw-away.
> 
> this sounds like it could easily devolve into an O(n!) situation where you
> are caching how everything is related (or not related) to everything
> else. Paul was making the point that the purpose was to cache the data to
> eliminate the time needed to calculate it, but if you don't store all the
> results then you don't know if the result is not relevant, or unknown, so
> you need to calculate it again.

First of all, you only remember non-trivial relations (file.c is always
related to file.c, so that is never stored). If the cache were only for
commits, it would be O(c*p*n), where c is the number of commits, p is the
percentage of contents moving ("renames") times the percentage of files
changed in the commit, and n is the number of files; in practice probably
O(c). Even if we remember all (tree1,tree2) pairs it would be O(c^2).
Additionally, the cache can be limited in size (pruning the oldest entries).

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Following renames
From: David Lang @ 2006-03-27  7:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e0827k$7tk$1@sea.gmane.org>

On Mon, 27 Mar 2006, Jakub Narebski wrote:

> 2.) Caching the results of similarity algorithm/rename detection tool (also
> Paul Jakma post), including remembering false positives and undetected
> renames, for efficiency. Calculated automatically parts might be
> throw-away.

this sounds like it could easily devolve into an O(n!) situation where you 
are caching how everything is related (or not related) to everything 
else. Paul was making the point that the purpose was to cache the data to 
eliminate the time needed to calculate it, but if you don't store all the 
results then you don't know if the result is not relevant, or unknown, so 
you need to calculate it again.

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply

* Re: Following renames
From: Junio C Hamano @ 2006-03-27  7:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.64.0603261509320.15714@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> No. "--sparse" still removes the uninteresting parents of merges. It just 
> doesn't then make the linear history any denser.

Hmph, you are right.  add_parents_to_list() calls prune_fn
unconditionally while running limit_list().

Disabling that with yet another flag might be a possibility but
I suspect then it would not be much different from running
rev-list without path limiter and having the caller process the
result.

^ permalink raw reply

* Re: Following renames
From: Jakub Narebski @ 2006-03-27  6:55 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0603260947100.15714@g5.osdl.org>

Linus Torvalds wrote:

> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>> 
>> If (2) is common enough then discussed improvements to rename detection,
>> namely comparing basenames as a base for candidate selection is a good
>> idea.
> 
> BK had this "renametool" which got started automatically when you applied
> a patch that removed one or more files and added one or more files, so
> that you could then pair up the files manually.
[...]
> The thing is, the fast rename detection that is in the "next" branch
> really does a lot better, and it's fast enough.

I was thinking about the fast rename detection algorithm in the "next" branch.

The question is whether recording additional (helper) information about
contents copying and moving, as the mentioned "renametool" did, is worth
the effort, both in coding it and from the user's point of view, or whether
better detection of contents copying and moving ("rename detection") for
whatchanged and similar tools would suffice.

I am of the opinion that voluntary information about contents moving and
copying in the commits would help.

Purposes:
1.) Record contents moving and similarity information which cannot be
calculated, or cannot be calculated easily; see Paul Jakma's response in
this thread
  MessageID: <Pine.LNX.4.64.0603270642090.5276@sheen.jakma.org>
for example copying a fragment of code (a small fragment of the whole
file), creating a documentation or header file from code, creating a code
skeleton from a template, or rewriting code in a different language (e.g.
shell script to Perl, or script to compiled code, such as Perl or Python
to C).
2.) Caching the results of the similarity algorithm/rename detection tool
(also in Paul Jakma's post), including remembering false positives and
undetected renames, for efficiency. Automatically calculated parts might
be throw-away.

Sources of information:
1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp
like commands (which nobody likes) and a systematized notation
(pseudolanguage?) for copying and moving contents in the log messages.
2.) Semi-manual tools like the mentioned "renametool" of BK.
3.) Support from the editor (remembering where a copied and pasted, or cut
and pasted, fragment came from, and providing a prefilled command to record
contents moving ("renames") or a prefilled commit log containing this
information). Hard to get, probably most useful.
4.) Information from resolved merges and results of diagnostic (pickaxe-like)
tools, especially recording "renames" which were not detected, and removing
"renames" which were detected falsely.

Is that the place where I should provide code (patch) for testing the
idea :) ?

>> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
>> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c
>
> For example, one common case was a directory structure like
> 
> ..
> type-file1.c
> type-file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> being split up into a subdirectory
> 
> ..
> type/file1.c
> type/file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> (eg drivers/scsi/aic7xx-* being given a subdirectory of its own, as
> drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it
> contained some piece of data that became redundant with the move.

Perhaps the fast rename detection algorithm needs a smarter similarity
estimate for names, one which would put more weight on the parts closer to
the basename, and would detect */type-file1.c and */type/file1.c as similar.
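
As a rough illustration only (this is not how the rename detection in
"next" works), such a name estimate could split paths on '/', '-' and
'_' and compare the pieces from the basename backwards with shrinking
weights. The function names and the decay factor of 0.5 are assumptions
made up for this sketch.

```python
import re


def name_tokens(path):
    """Split on '/', '-' and '_' so that 'type-file1.c' and
    'type/file1.c' produce the same token list."""
    return [t for t in re.split(r"[/_-]", path) if t]


def name_similarity(a, b, decay=0.5):
    """Return a score in [0, 1]. Tokens are compared right to left,
    and each step away from the basename weighs `decay` times less."""
    ta, tb = name_tokens(a)[::-1], name_tokens(b)[::-1]
    n = max(len(ta), len(tb))
    if n == 0:
        return 1.0
    total = matched = 0.0
    weight = 1.0
    for i in range(n):
        total += weight
        if i < len(ta) and i < len(tb) and ta[i] == tb[i]:
            matched += weight
        weight *= decay
    return matched / total


# The directory split from the quoted example scores as identical.
same = name_similarity("old-dir/type-file1.c", "old-dir/type/file1.c")
```

A score like this would only pick candidates; the content similarity
check would still have to confirm the pairing.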

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Following renames
From: Junio C Hamano @ 2006-03-27  6:46 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550603262147t3aec8da6p6bf2a333e2d35f1d@mail.gmail.com>

"Marco Costalba" <mcostalba@gmail.com> writes:

> NOTE: It seems that git-whatchanged needs the file to be checked out in
> order to work. It didn't work with no repository checked out.

Perhaps,

	$ git-whatchanged HEAD -- drivers/net/tg3.c

as Linus explained in a separate message today...

^ permalink raw reply

* Re: Following renames
From: Paul Jakma @ 2006-03-27  6:00 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e05354$cc9$1@sea.gmane.org>

On Sun, 26 Mar 2006, Jakub Narebski wrote:

> I think one of the better ideas/suggestions about *recording* filenames was
> in the "impure renames / history tracking" thread
> http://marc.theaimsgroup.com/?l=git&m=114122175216489&w=2
> <Pine.LNX.4.64.0603011343170.13612@sheen.jakma.org>

For the record, the responses I received were educational ;). 
Sufficiently so that I no longer think renames should be recorded. At 
least, definitely not as renames.

I now grok the reasoning for doing it by 'similarity' - it is indeed 
a *much* more useful concept. (E.g. the 'pickaxe' idea people keep 
alluding to sounds amazingly useful.)

So the question really is what, if any, weaknesses does the current 
similarity estimation have, and how to solve them. I can think of two 
weaknesses:

1. the similarity algorithms can be expensive potentially, and they
    essentially get run a lot with the same inputs, to produce the
    same results - over and over as one works with a git repo. (there
    was a thread a while ago on this I think).

2. Some 'similarities' are just not deducible by current software
    state of the art. E.g. where some code is rewritten in another
    language:

 	foo.X -> foo.Y

    The high-level algorithms may remain the exact same, but the code
    may be unrecognisable as similar except to a human. However,
    tracking history back across this rewrite probably would still be
    valuable to the human.

So I think what /might/ be interesting is to have a 'similarity 
cache', which would help with 1, and to allow for manual injection of such 
hints (into a separate and stronger cache most likely) - which would 
help with 2.

Something to record the following information:

(tree1,tree2)[1]:
 	Id1 <-> Id1'
 	.
 	.
 	.
 	Idn <-> Idn'

That would allow:

1. Performance repercussions of similarity estimation to be one-time,
    cached there-after. (throw-away information, if a better
    similarity estimation heuristic comes along, you can rebuild this
    cache)

2. The user to inject their own 'hints' into similarity estimation,
    particularly for cases that just aren't obvious and probably never
    will be to software estimators (e.g. the rewrite cases), but where
    the user sees value in being able to follow back the history.

Avoids:

- encoding anything permanently into the repository (which was
   something I was thinking of, and others before me apparently, but
   which I now accept would be an awful idea ;) ).

[1] I'm not sure if it should be indexed by (commit ID) or
    (tree1,tree2) tuple. ??
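
A minimal in-memory sketch of such a two-level cache, assuming it is
indexed by the (tree1,tree2) tuple. The class name, method names and
the estimator callback are all hypothetical; nothing like this exists
in git.

```python
class SimilarityCache:
    """Throw-away computed pairs plus a stronger store of user hints."""

    def __init__(self, estimate):
        # estimate: callable(tree1, tree2) -> {old_id: new_id}
        self._estimate = estimate
        self._computed = {}   # rebuildable at any time
        self._hints = {}      # manually injected, survives rebuilds

    def add_hint(self, tree1, tree2, old_id, new_id):
        """Record a pairing the estimator cannot find by itself."""
        self._hints.setdefault((tree1, tree2), {})[old_id] = new_id

    def pairs(self, tree1, tree2):
        """Return all known pairs, running the estimator at most once
        per key. An empty stored result means "computed, nothing
        related", which is distinct from "not computed yet"."""
        key = (tree1, tree2)
        if key not in self._computed:
            self._computed[key] = dict(self._estimate(tree1, tree2))
        merged = dict(self._computed[key])
        merged.update(self._hints.get(key, {}))   # hints win
        return merged

    def invalidate(self):
        """Drop computed results, e.g. after the heuristic improves."""
        self._computed.clear()
```

Keeping the hints in a separate dictionary is what lets the computed
part be thrown away and rebuilt without losing the manual entries.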

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Men take only their needs into consideration -- never their abilities.
 		-- Napoleon Bonaparte

^ permalink raw reply

* Re: Following renames
From: Marco Costalba @ 2006-03-27  5:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603261422280.15714@g5.osdl.org>

On 3/27/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Sun, 26 Mar 2006, Marco Costalba wrote:
> >
> > FIRST WAY
> >
> > After annotating a file history (double click on a file name in
> > bottom-right window or directly from tree view), you see the whole
> > file annotated. If you have the diff window open you see also the
> > corresponding patch (scrolled to selected file name).
>
> The problem is that this step is already _way_ too expensive.
>
> I don't want to work with any tool that makes "Step 1" take a minute or
> two for a project that has a few years of history. Try it on the linux
> historic project with some file that gets lots of modifications.
>

Historic Linux test (63428 revisions)

File: drivers/net/tg3.c
Revisions that modify tg3.c : 292

With qgit
15s to retrieve file history (git-rev-list)
19.5s to annotate (git-diff-tree -p, current GNU algorithm, not new faster one)

and...

$ time git-whatchanged HEAD drivers/net/tg3.c > /dev/null
98.01user 2.44system 1:46.19elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (797major+43033minor)pagefaults 0swaps

NOTE: It seems that git-whatchanged needs the file to be checked out in
order to work. It didn't work with no repository checked out.


Marco

^ permalink raw reply

* Re: [PATCH] Add git-explode-packs
From: Junio C Hamano @ 2006-03-27  3:53 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: git
In-Reply-To: <20060326125450.GT31387@lug-owl.de>

Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:

> On Sat, 2006-03-25 22:12:46 -0800, Junio C Hamano <junkio@cox.net> wrote:
>> The script seems to do what it claims to, but now why would one
>> need to use this?  In other words what's the situation one would
>> find this useful?
>
> It's possibly useful if you often access old objects with
> git-cat-file or git-ls-tree.

Benchmarks?

I created two cloned repositories from git.git.  The victim03
repository is fully packed with the default pack parameters of
depth and window both set to 10. The victim04 repository has the
same set of objects and refs but the pack is expanded (16232
loose objects).

Now in the victim03 repository, 657 blobs have depth 10 (i.e. you
need to inflate and apply delta 10 times to get to the object).
So I made a list of these "expensive to access" objects and
ran this:

	$ cd victim03
	$ /usr/bin/time sh -c '
            while read sha1; do git cat-file blob $sha1;
            done >/dev/null <list
	'

3.43user 3.36system 0:07.17elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+364561minor)pagefaults 0swaps
3.51user 3.33system 0:07.10elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+364499minor)pagefaults 0swaps
3.76user 2.99system 0:07.28elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+365155minor)pagefaults 0swaps

With the same file list, in victim04 repository that has 16232
loose objects:

	$ cd victim04
	$ /usr/bin/time sh -c '
            while read sha1; do git cat-file blob $sha1;
            done >/dev/null <../victim03/list
	'

3.29user 2.98system 0:06.33elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+348786minor)pagefaults 0swaps
3.26user 2.88system 0:06.63elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+347512minor)pagefaults 0swaps
3.16user 2.98system 0:06.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+347489minor)pagefaults 0swaps

So you are getting a slight performance gain out of this by
exploding the pack, but on the other hand you are taxing the
buffer cache quite heavily by reading the loose objects (in both
of the experiments above, I discarded numbers from the very
first run).  The sizes of the object databases in these cases are:

        $ du -sh victim0[34]/.git/objects
        6.2M    victim03/.git/objects
        84M     victim04/.git/objects

So I am still not convinced it would be useful in general.  It
used to be that exploding everything and repacking was the only
way to clean out garbage from packs, but after "repack -a -d"
was invented by Frank Sorenson that became the more convenient way.
Especially with the recent "delta reusing" pack-objects, doing
"repack -a -d" has become quite cheap, so...

^ permalink raw reply

* Fix error handling for nonexistent names
From: Linus Torvalds @ 2006-03-27  0:28 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


[ This is an expanded version of a patch I sent out earlier: the 
  "rev-parse.c" part of it is identical to the earlier version, the 
  revision.c thing is new ]

When passing in a pathname pattern without the "--" separator on the 
command line, we verify that the pathnames in question exist. However, 
there were two bugs in that verification: 

 - git-rev-parse would only check the first pathname, and silently allow 
   any invalid subsequent pathname, whether it existed or not (which 
   defeats the purpose of the check, and is also inconsistent with what 
   git-rev-list actually does)

 - git-rev-list (and "git log" etc) would check each filename, but if the 
   check failed, it would print the error using the first one, ie:

	[torvalds@g5 git]$ git log Makefile bad-file
	fatal: 'Makefile': No such file or directory

   instead of saying that it's 'bad-file' that doesn't exist.

This fixes both bugs.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
diff --git a/rev-parse.c b/rev-parse.c
index 19a5ef7..8ca1c69 100644
--- a/rev-parse.c
+++ b/rev-parse.c
@@ -174,7 +174,9 @@ int main(int argc, char **argv)
 		char *dotdot;
 	
 		if (as_is) {
-			show_file(arg);
+			if (show_file(arg) && as_is < 2)
+				if (lstat(arg, &st) < 0)
+					die("'%s': %s", arg, strerror(errno));
 			continue;
 		}
 		if (!strcmp(arg,"-n")) {
@@ -194,7 +196,7 @@ int main(int argc, char **argv)
 
 		if (*arg == '-') {
 			if (!strcmp(arg, "--")) {
-				as_is = 1;
+				as_is = 2;
 				/* Pass on the "--" if we show anything but files.. */
 				if (filter & (DO_FLAGS | DO_REVS))
 					show_file(arg);
diff --git a/revision.c b/revision.c
index 12cd052..d67718c 100644
--- a/revision.c
+++ b/revision.c
@@ -649,7 +649,7 @@ int setup_revisions(int argc, const char
 			/* If we didn't have a "--", all filenames must exist */
 			for (j = i; j < argc; j++) {
 				if (lstat(argv[j], &st) < 0)
-					die("'%s': %s", arg, strerror(errno));
+					die("'%s': %s", argv[j], strerror(errno));
 			}
 			revs->prune_data = get_pathspec(revs->prefix, argv + i);
 			break;

^ permalink raw reply related

* Re: Following renames
From: Petr Baudis @ 2006-03-26 23:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ryan Anderson, git
In-Reply-To: <20060326191445.GQ18185@pasky.or.cz>

Dear diary, on Sun, Mar 26, 2006 at 09:14:45PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Dear diary, on Sun, Mar 26, 2006 at 06:33:13PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > If you do
> > 
> > 	git-rev-list --parents --remove-empty $REV -- $filename
> > 
> > then you'll get the whole history for that filename. When it ends, you 
> > know the file went away, and then you do basically _one_ "where the hell 
> > did it go" thing.
> > 
> > And yes, it's not git-ls-tree (unless you only want to follow pure 
> > renames), it's actually one "git-diff-tree -M $lastrev". Then you just 
> > continue with the new filename (and do another "git-rev-list" until you 
> > hit the next rename).
> 
> I wrote a long rant but then it all suddenly fit together and I have now
> an idea how to implement it reasonably elegantly.

So, this is what I have. Testing (I've given it very little of that) and
thoughts welcome. It is probably pretty efficient: in terms of fork()s,
it does only 2*N of them, where N is the number of commits containing
interesting renames.  Actually, it should even be possible to reduce this
to N+1 if you do a single git-diff-tree call and multiplex different
git-rev-lists to it, but I'm too tired to do the trickery now.
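
The core step, switching to the older name when a rename is seen, can be
sketched separately from the pipe handling. This is an illustration
only, assuming the raw git-diff-tree line layout (a colon-prefixed info
field, then tab-separated source and destination paths); the function
names, file names and object ids below are made up.

```python
def parse_raw_line(line):
    """Return (status, src, dst) for one raw diff line.
    dst is None unless the line describes a rename or copy."""
    info, *paths = line.rstrip("\n").split("\t")
    status = info.split()[-1][0]          # "R090" -> "R"
    src = paths[0]
    dst = paths[1] if len(paths) > 1 else None
    return status, src, dst


def names_in_parent(watched, raw_lines):
    """Map the names watched in a commit to the names to watch in
    its parent, following renames and copies backwards."""
    result = list(watched)
    for line in raw_lines:
        if not line.startswith(":"):
            continue                      # header or message line
        status, src, dst = parse_raw_line(line)
        if status in "RC" and dst in result:
            result[result.index(dst)] = src
    return result


# Placeholder ids and names, not taken from a real repository.
raw = [":100644 100644 1111111 2222222 R090\told-name.sh\tnew-name.sh"]
older = names_in_parent(["new-name.sh"], raw)
```

A new git-rev-list would then be started from the parent commit with
the returned names as its path limiter.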

It has 'cg' in the name but depends on no Cogito stuff; it should be in
fact possible to trivially put it to git-whatchanged in place of the
final pipeline (not that I'd be suggesting this to be done universally,
but perhaps git-whatchanged -f ...?). There are three downsides in this
regard:

(i) No -c support. I need the separate deltas coming out from
git-diff-tree but I think I can join them together pretty easily on my
own, except that I have problems with -c (see
<20060326102100.GF18185@pasky.or.cz>) so I'm not sure how exactly it is
supposed to behave.

(ii) Only --pretty=raw output. It shouldn't be hard to add the
reformatting code, but I'm personally not going to use it and kind of
lazy, so I'll let someone else do that, I guess. :-)

(iii) Raw deltas required. -p parsing support would be certainly useful
and possible, but see (ii).


To quickly see what it does, you can try it e.g. on the git-log.sh file
in the Git repository.

Thoughts? Opinions? Bugs? Patches?


Signed-off-by: Petr Baudis <pasky@suse.cz>


diff --git a/cg-Xfollowrenames b/cg-Xfollowrenames
new file mode 100755
index 0000000..fa5c552
--- /dev/null
+++ b/cg-Xfollowrenames
@@ -0,0 +1,246 @@
+#!/usr/bin/env perl
+#
+# git-rev-list | git-diff-tree --stdin following renames
+# Copyright (c) Petr Baudis, 2006
+# Uses bits of git-annotate.perl by Ryan Anderson.
+#
+# This script will efficiently show output as of the
+#
+#	git-rev-list --remove-empty ARGS -- FILE... |
+#	git-diff-tree -M -r -m --stdin --pretty=raw ARGS
+#
+# pipeline, except that it follows renames of individual files listed
+# in the FILE... set.
+#
+# Usage:
+#
+#	cg-Xfollowrenames revlistargs -- difftreeargs -- revs -- files
+
+# TODO: Does not work on multiple files properly yet - most probably
+# (I didn't test it!). We want git-rev-list to stop traversing the history
+# when _any_ file disappears while now it probably stops traversing when
+# _all_ files disappear.
+
+use warnings;
+use strict;
+
+$| = 1;
+
+our (@revlist_args, @difftree_args, @revs, @files);
+
+{ # Load arguments
+	my @argp = (\@revlist_args, \@difftree_args, \@revs, \@files);
+	my $argi = 0;
+	for my $arg (@ARGV) {
+		if ($arg eq '--' and $argi < $#argp) {
+			$argi++;
+			next;
+		}
+		push(@{$argp[$argi]}, $arg);
+	}
+}
+
+
+# The heads we watch (sorted by commit time)
+our @heads;
+# Each head is: {
+#	# Persistent for the whole line of development:
+#	pipe => $pipe,
+#	files => \@files, # to watch for
+#
+#	id => $sha1, # useful actually only for debugging
+#	time => $timestamp,
+#	str => $prettyoutput,
+#	parents => \@sha1s,
+#
+#	# When the commit is processed, spawn these extra heads:
+#	recurse => {$sha1id => \@files, ...},
+# }
+
+# To avoid printing duplicate commits
+# FIXME: Currently, we will not handle merge commits properly since
+# we hit them multiple times.
+our %commits;
+
+
+sub open_pipe($@) {
+	my ($stdin, @execlist) = @_;
+
+	my $pid = open my $kid, "-|";
+	defined $pid or die "Cannot fork: $!";
+
+	unless ($pid) {
+		if (defined $stdin) {
+			open STDIN, "<&", $stdin or die "Cannot dup(): $!";
+		}
+		exec @execlist;
+		die "Cannot exec @execlist: $!";
+	}
+
+	return $kid;
+}
+
+sub revlist($@) {
+	my ($rev, @files) = @_;
+	open_pipe(undef, "git-rev-list", "--remove-empty",
+	                 @revlist_args, $rev, "--", @files)
+		or die "Failed to exec git-rev-list: $!";
+}
+
+sub difftree($) {
+	my ($revlist) = @_;
+	open_pipe($revlist, "git-diff-tree", "-r", "-m", "--stdin", "-M",
+	                    "--pretty=raw", @difftree_args)
+		or die "Failed to exec git-diff-tree: $!";
+}
+
+sub revdiffpipe($@) {
+	my ($rev, @files) = @_;
+	my $pipe = difftree(revlist($rev, @files));
+}
+
+
+sub read_commit($$) {
+	my ($head, $tolerant) = @_;
+	my $pipe = $head->{'pipe'};
+	my $against;
+	my @oldset = @{$head->{'files'}};
+	my @newset;
+	my $rename;
+
+	# Load header
+	while (my $line = <$pipe>) {
+		$head->{'str'} .= $line;
+		chomp $line;
+		$line eq '' and goto header_loaded;
+
+		if ($line =~ /^diff-tree (\S+) \(from (root|\S+)\)/) {
+			$head->{'id'} = $1;
+			if (not $tolerant and $commits{$1}++) {
+				close $pipe;
+				return undef;
+			}
+			# The 'root' case is harmless since there'll be no renames.
+			$against = $2;
+		} elsif ($line =~ /^parent (\S+)/) {
+			push (@{$head->{'parents'}}, $1);
+		} elsif ($line =~ /^committer .*?> (\d+)/) {
+			$head->{'time'} = $1;
+		}
+	}
+	return undef;
+header_loaded:
+
+	# Load message
+	while (my $line = <$pipe>) {
+		$head->{'str'} .= $line;
+		chomp $line;
+		$line eq '' and goto message_loaded;
+	}
+	return undef;
+message_loaded:
+
+	# Load delta
+	while (my $line = <$pipe>) {
+		$head->{'str'} .= $line;
+		chomp $line;
+		$line eq '' and goto delta_loaded;
+
+		$line =~ /^:/ or return undef;
+		my ($info, $newfile, $oldfile) = split("\t", $line);
+		if ($info =~ /[RC]\d*$/) {
+			# Behold, a rename!
+			# (Or a copy, it's all the same for us.)
+			my $i;
+			for ($i = 0; $i <= $#oldset; $i++) {
+				$oldfile eq $oldset[$i] or next;
+				$rename = 1;
+				splice(@oldset, $i, 1);
+				push(@newset, $newfile);
+				last;
+			}
+			# In case of multiple candidates, follow
+			# all of them:
+			# (TODO: This might be a policy decision
+			# best left on the user.)
+			if ($i > $#oldset and grep { $oldfile eq $_ } @newset) {
+				$rename = 1;
+				push(@newset, $newfile);
+			}
+		} elsif ($info =~ /D$/) {
+			# Not weeding out deleted files might cause bizarre
+			# results when following multiple files since
+			# git-rev-list weeds them out too (probably?).
+			@oldset = grep { $newfile ne $_ } @oldset;
+			@{$head->{'files'}} = grep { $newfile ne $_ } @{$head->{'files'}};
+		}
+	}
+	$head->{'str'} .= "\n";
+delta_loaded:
+
+	if ($rename) {
+		$head->{'recurse'}->{$against} = [@newset, @oldset];
+	}
+	return 1;
+}
+
+sub load_commit($) {
+	my ($head) = @_;
+	$head->{'time'} = undef;
+	$head->{'str'} = '';
+	$head->{'parents'} = ();
+
+	read_commit($head, 0) or return undef;
+
+	# In case there was a merge, the commit will be multiple times
+	# here, each time with a different delta section. Read them all.
+	for (1 .. $#{$head->{'parents'}}) { # stupid vim syntax highlighting
+		read_commit($head, 1) or return undef;
+	}
+
+	return 1;
+}
+
+
+# Add head at the proper position
+sub add_head($) {
+	my ($head) = @_;
+	my $i;
+	for ($i = 0; $i <= $#heads; $i++) {
+		last if ($head->{'time'} > $heads[$i]->{'time'})
+	}
+	splice(@heads, $i, 0, $head);
+}
+
+# Create new head
+sub init_head($@) {
+	my ($rev, @files) = @_;
+	my $head = { files => \@files, 'pipe' => revdiffpipe($rev, @files) };
+	load_commit($head) or return;
+	add_head($head);
+}
+
+
+
+{ # Seed the heads list
+	for my $rev (@revs) {
+		init_head($rev, @files);
+	}
+}
+
+# Process the heads
+{
+	while (@heads) {
+		my $head = splice(@heads, 0, 1);
+
+		print $head->{'str'};
+
+		foreach my $parent (keys %{$head->{'recurse'}}) {
+			init_head($parent, @{$head->{'recurse'}->{$parent}});
+		}
+		$head->{'recurse'} = undef;
+
+		load_commit($head) or next;
+		add_head($head);
+	}
+}


-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply related

* [PATCH] Remove dependency on a file named "-lz"
From: Johannes Schindelin @ 2006-03-26 23:14 UTC (permalink / raw)
  To: git, junkio


By changing the dependency "$(LIB_H)" to "$(LIBS)", at least one version
of make thought that a file named "-lz" would be needed.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 Makefile |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

5a8333baa1845924348b958208bee59831d4e04e
diff --git a/Makefile b/Makefile
index a8cb0af..055d155 100644
--- a/Makefile
+++ b/Makefile
@@ -214,8 +214,8 @@
 	fetch-clone.o revision.o pager.o \
 	$(DIFF_OBJS)
 
-LIBS = $(LIB_FILE) $(XDIFF_LIB)
-LIBS += -lz
+GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
+LIBS = $(GITLIBS) -lz
 
 #
 # Platform specific tweaks
@@ -554,7 +554,7 @@
 		-DDEFAULT_GIT_TEMPLATE_DIR='"$(template_dir_SQ)"' $*.c
 
 $(LIB_OBJS): $(LIB_H)
-$(patsubst git-%$X,%.o,$(PROGRAMS)): $(LIBS)
+$(patsubst git-%$X,%.o,$(PROGRAMS)): $(GITLIBS)
 $(DIFF_OBJS): diffcore.h
 
 $(LIB_FILE): $(LIB_OBJS)
-- 
1.2.0.gd95e-dirty

^ permalink raw reply related

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 23:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vodzsq12g.fsf@assigned-by-dhcp.cox.net>



On Sun, 26 Mar 2006, Junio C Hamano wrote:
> Petr Baudis <pasky@suse.cz> writes:
> 
> >> No, it's the expected output just because you expected merges to always 
> >> show up. Merges get ignored if any of the parents have the same content 
> >> already.
> >
> > Eek. Can I avoid that? What was the reason for choosing this behavior?
> 
> Perhaps rev-list --sparse?

No. "--sparse" still removes the uninteresting parents of merges. It just 
doesn't then make the linear history any denser.

		Linus

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 23:09 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Ryan Anderson, git
In-Reply-To: <20060326223154.GU18185@pasky.or.cz>

On Mon, 27 Mar 2006, Petr Baudis wrote:

> Dear diary, on Mon, Mar 27, 2006 at 12:22:04AM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > So commit "6" is uninteresting, and commit "5" will never even be
> > looked at, since we decided that the history of "d" comes from the
> > first parent with the same contents.
> 
> And this is the thing I have a problem with - this does not make much
> sense to me, why can't we just follow all parents instead of arbitrarily
> choosing one of them?

Sure, you can. It's _usually_ a huge waste of time, though. Why would you 
want to do more work than you need, since clearly the other parent was 
_not_ interesting from the standpoint of the question "where did this 
content come from"?

> > No, it's the expected output just because you expected merges to always 
> > show up. Merges get ignored if any of the parents have the same content 
> > already.
> 
> Eek. Can I avoid that? What was the reason for choosing this behavior?

Huge efficiency gains.

Lookie here. Do

	gitk -- rev-list.c

on the git archive with the current git-rev-list, and with your hacked-up 
version.

And tell me my version isn't a hell of a lot better. Because, I guarantee 
you, it is. We're just not _interested_ in all those merges that didn't 
actually make any difference.

Read up on what modern neuro-science thinks about the human brain, and 
what a lot of it is about. It's about ignoring irrelevant information.

The ability to throw stuff out that isn't interesting is the _real_ basis 
of true intelligence. I'd rather have git do the _intelligent_ history, 
than show history that isn't relevant and working harder doing so.

		Linus

^ permalink raw reply

* Re: Following renames
From: Junio C Hamano @ 2006-03-26 22:43 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Linus Torvalds
In-Reply-To: <20060326223154.GU18185@pasky.or.cz>

Petr Baudis <pasky@suse.cz> writes:

>> No, it's the expected output just because you expected merges to always 
>> show up. Merges get ignored if any of the parents have the same content 
>> already.
>
> Eek. Can I avoid that? What was the reason for choosing this behavior?

Perhaps rev-list --sparse?

^ permalink raw reply

* Re: Following renames
From: Petr Baudis @ 2006-03-26 22:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ryan Anderson, git
In-Reply-To: <Pine.LNX.4.64.0603261415390.15714@g5.osdl.org>

Dear diary, on Mon, Mar 27, 2006 at 12:22:04AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> So commit "6" is uninteresting, and commit "5" will never even be
> looked at, since we decided that the history of "d" comes from the
> first parent with the same contents.

And this is the thing I have a problem with - this does not make much
sense to me, why can't we just follow all parents instead of arbitrarily
choosing one of them?

> which is correct (now, there are other histories _too_ that get us to the 
> same point, but the one you found this way was _a_ history).

Ok, in that case I want the _full_ history. :-)

> No, it's the expected output just because you expected merges to always 
> show up. Merges get ignored if any of the parents have the same content 
> already.

Eek. Can I avoid that? What was the reason for choosing this behavior?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 22:23 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Jakub Narebski, git
In-Reply-To: <e5bfff550603261122m5e680c62ye1290f3e601e947e@mail.gmail.com>

On Sun, 26 Mar 2006, Marco Costalba wrote:
> 
> FIRST WAY
> 
> After annotating a file history (double click on a file name in
> bottom-right window or directly from tree view), you see the whole
> file annotated. If you have the diff window open you see also the
> corresponding patch (scrolled to selected file name).

The problem is that this step is already _way_ too expensive.

I don't want to work with any tool that makes "Step 1" take a minute or 
two for a project that has a few years of history. Try it on the linux 
historic project with some file that gets lots of modifications.

In other words, starting off with "annotate" is MUCH too expensive. You 
should start off basically with "git-whatchanged".

		Linus

^ permalink raw reply

* Re: Following renames
From: Linus Torvalds @ 2006-03-26 22:22 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Ryan Anderson, git
In-Reply-To: <20060326191445.GQ18185@pasky.or.cz>

On Sun, 26 Mar 2006, Petr Baudis wrote:
> 
> My current target is to support this tree (letters are filenames,
> numbers are commit ids; I'll translate any git output to those digits):
> 
>     2    4
>     b -- d
> 1 /        \ 6
> a            d
>   \ 3    5 /
>     c -- d

Yeah, the problem with this is that you need to track separate names 
across separate points. However:

> Curiously, git-rev-list does something totally strange when trying to
> list per-file history at this point:
> 
> 	$ git-rev-list HEAD -- d
> 	4
> 
> Huh? (It should list 6, 5, 4 instead.)

What it does is list the points where file "d" _changed_.

"d" did not change in 6 - it had a parent commit (4) where "d" had the 
same contents (in fact, it likely had _two_ parents where it had the same 
contents, but git will pick the first one). So commit "6" is 
uninteresting, and commit "5" will never even be looked at, since we 
decided that the history of "d" comes from the first parent with the same 
contents.

So then it lists "4", because file "d" really did change in that commit 
(it went away).

Now you need to look at "4" and find the rename (which gives you 2) and 
then from there you do rename detection and get (1), and as a result your 
change history should end up being

 (1)a -> (2)b -> (4)d (-> (6)d which was your start point)

which is correct (now, there are other histories _too_ that get us to the 
same point, but the one you found this way was _a_ history).

> I worked it around by recording a change in d in the merge 6:
> 
> 	http://pasky.or.cz/~xpasky/renametree2.git/
> 
> 	$ git-rev-list --parents --remove-empty HEAD -- d
> 	6 4 5
> 	5
> 	4
> 
> Which is the expected output.

No, it's the expected output just because you expected merges to always 
show up. Merges get ignored if any of the parents have the same content 
already.

		Linus

^ permalink raw reply

* Re: cg-status and empty directories
From: Petr Baudis @ 2006-03-26 21:37 UTC (permalink / raw)
  To: Jim MacBaine; +Cc: git
In-Reply-To: <3afbacad0602270643k9fdd255w8f3769ad77c54e65@mail.gmail.com>

  Hi,

Dear diary, on Mon, Feb 27, 2006 at 03:43:32PM CET, I got a letter
where Jim MacBaine <jmacbaine@gmail.com> said that...
> Many packages put empty directories under /etc, and although only a
> few of those directories are actually needed, the automatic removal of
> those packages will fail if I remove the empty directories manually.  
> Equally, the removal will fail, if I put a .placeholder file into
> those directories and cg-add it.  Is there a simple way out?

  BTW, with Cogito-0.17.1 the simple way out should be cg-status -S
which restores the original behaviour.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: [PATCH] Optionally do not list empty directories in git-ls-files --others
From: Junio C Hamano @ 2006-03-26 21:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: junkio, Jim MacBaine, git
In-Reply-To: <20060326145952.GM18185@pasky.or.cz>

Petr Baudis <pasky@suse.cz> writes:

>   it turned out that cg-clean depends on the original behaviour...

Supporting both sounds sensible.

^ permalink raw reply

* Re: Following renames
From: Petr Baudis @ 2006-03-26 21:09 UTC (permalink / raw)
  To: Ryan Anderson; +Cc: Linus Torvalds, git
In-Reply-To: <44264426.8010608@michonline.com>

Dear diary, on Sun, Mar 26, 2006 at 09:35:02AM CEST, I got a letter
where Ryan Anderson <ryan@michonline.com> said that...
> Linus Torvalds wrote:
> > On Sun, 26 Mar 2006, Petr Baudis wrote:
> >   
> >>   In [1], Linus suggests a non-core solution. Unfortunately, it doesn't
> >> fly - it requires at least two git-ls-tree calls per revision which
> >> would bog things down awfully (to roughly half of the original speed).
> >>     
> >
> > No it doesn't. It requires one git-ls-tree WHEN SOMETHING IS RENAMED.
> >
> > In other words, basically never.
> >   
> 
> A simple example is the first loop in git-annotate.perl.  (Which was
> basically written by Linus, I just translated it from a
> shell/pseudo-code example into Perl)

One case it does not handle:

         2
      -- b --
  1 /         \ 6
  a             d
    \ 3     5 /
      c --- d

git-rev-list at 6 will (understandably) show

        6 5
        5

and you will never detect the d -> b rename leading to 2.

This is one reason why I'm actually not using --parents; instead I pipe
stuff directly to git-diff-tree --stdin -M and then read its output. This
also lets me merge parallel lines of development based on date, and I
don't have to fork for each file deletion.

With any luck I'll have the first draft of my (also perlish) script done
this evening yet. (BTW, it has the same output format as

	git-rev-list | git-diff-tree --pretty=raw -M

so with some tweaking it could also serve as a git-whatchanged backend.
Actually, it would be nice to have it in core Git in the long term so
that it gets all the portability tweaks and such.)
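
A minimal sketch of the "read the diff-tree output" step, assuming the raw
records have already been paired with their commit ids and arrive newest
first. The helper names parse_raw() and follow(), the sample records and the
dummy object ids are made up for illustration; this is not the script
mentioned above.

```python
def parse_raw(line):
    """Split one raw diff record into (status letter, source path, dest path).

    A raw record looks like
        :<old mode> <new mode> <old id> <new id> <status>[score]\t<src>[\t<dst>]
    """
    meta, _, paths = line.partition("\t")
    fields = meta.lstrip(":").split()
    status = fields[4][0]          # "R100" -> "R", "M" -> "M"
    names = paths.split("\t")
    return status, names[0], names[-1]


def follow(records, path):
    """Walk (commit, raw_line) pairs newest first, tracking `path` across renames."""
    trail = []
    current = path
    for commit, line in records:
        status, src, dst = parse_raw(line)
        if dst != current:
            continue               # record is about some other file
        trail.append((commit, status, src, dst))
        if status == "R":
            current = src          # older commits know the file by its old name
    return trail


# Hand-written sample records for a linear a -> b -> d history.
OLD, NEW, NONE = "1" * 40, "2" * 40, "0" * 40
records = [
    ("4", ":100644 100644 %s %s R100\tb\td" % (OLD, OLD)),
    ("4", ":100644 100644 %s %s M\tMakefile" % (OLD, NEW)),
    ("2", ":100644 100644 %s %s R100\ta\tb" % (OLD, OLD)),
    ("1", ":000000 100644 %s %s A\ta" % (NONE, OLD)),
]

for entry in follow(records, "d"):
    print(entry)
```

Because the name is updated only after a rename record has been seen, a
single pass over the stream is enough for a linear history; parallel lines of
development would need one tracked name per branch.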

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: Following renames
From: Petr Baudis @ 2006-03-26 20:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ryan Anderson, git
In-Reply-To: <20060326191445.GQ18185@pasky.or.cz>

Dear diary, on Sun, Mar 26, 2006 at 09:14:45PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Curiously, git-rev-list does something totally strange when trying to
> list per-file history at this point:
> 
> 	$ git-rev-list HEAD -- d
> 	4
> 
> Huh? (It should list 6, 5, 4 instead.)

Obviously not 6 since the file was not changed in that revision, but I'd
still expect it to list 5.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: Effective difference between git-rebase and git-resolve
From: J. Bruce Fields @ 2006-03-26 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marc Singer, git
In-Reply-To: <7vacbfxadu.fsf@assigned-by-dhcp.cox.net>

On Fri, Mar 24, 2006 at 11:15:57PM -0800, Junio C Hamano wrote:
>      - Patch C does not apply.  git-am stops here, with conflicts to
>        be resolved in the working tree.  Yet-to-be-applied D and E
>        are still kept in .dotest/ directory at this point.  What the
>        user does is exactly the same as fixing up unapplicable patch
>        when running git-am:
>     
>        - Resolve conflict just like any merge conflicts.
>        - "git am --resolved --3way" to continue applying the patches.

So, does this sum it up accurately for the man page?

--b.

Document git-rebase behavior on conflicts.

---

 Documentation/git-rebase.txt |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

3ef0c8cc7a505f9023a87e7e1ca22251a91bf188
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index b36276c..4a7e67a 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -48,6 +48,18 @@ would be:
              /
     D---E---F---G master
 
+In case of conflict, git-rebase will stop at the first problematic commit
+and leave conflict markers in the tree.  After resolving the conflict manually
+and updating the index with the desired resolution, you can continue the
+rebasing process with
+
+    git am --resolved --3way
+
+Alternatively, you can undo the git-rebase with
+
+    git reset --hard ORIG_HEAD
+    rm -r .dotest
+
 OPTIONS
 -------
 <newbase>::
-- 
1.2.4.g0382

^ permalink raw reply related

* Re: Following renames
From: Marco Costalba @ 2006-03-26 19:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.64.0603260947100.15714@g5.osdl.org>

On 3/26/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> So wouldn't it be _much_ nicer to have a "graphical git-whatchanged",
> where you just delve deeper (and you don't even look at the whole file
> like git-whatchanged does, but you ask for a very particular region).
>
> Ie, what I imagine would be something gitk/qgit like, where you see the
> file content, select a line or two (or a whole function), and it goes back
> in history and shows you the last diff that changed that
> line/two/function. We can do that EFFICIENTLY. Much more efficiently than
> git-annotate, in fact. And then when you see the diff, you might say "I'm
> not interested in this one, that was just a re-indent" and then continue
> back.
>
> THAT is the kind of graphical tool I'd want. And dammit, it should even be
> _easy_. I'm just a total clutz myself when it comes to doing things like
> QT or nice tcl/tk text-panes, and this really does have to be visual,
> since the whole point is that "select text" and interactive part.
>
> So if somebody wants to be a hero, and feels comfortable with those kinds
> of things, this really should be a fairly straightforward thing to do (it
> would be useful even without rename detection or data movement detection,
> but it's also something where you really _could_ do efficient data
> movement detection by just looking at the "whole diff" when something
> changed in that small area).
>

I am a thousand miles away from being a hero (and glad of it), but....

I really need a bit of feedback or comment about this because IMHO
qgit annotate is already very close to what you are asking for, so I
need to understand the difference well:

FIRST WAY

After annotating a file history (double click on a file name in
bottom-right window or directly from tree view), you see the whole
file annotated. If you have the diff window open you see also the
corresponding patch (scrolled to selected file name).

Now, double clicking on the chosen code line in the file content
currently does two things:

- The diff window is updated to show the corresponding revision patch,
i.e. the last patch that modified that line of code.

- The file content, as well as the file annotation, changes to show the
content of the file just after the patch was applied. From there it is
normally possible to go back in the history of that code region in the
same way, i.e. by double clicking on interesting lines.

The biggest limitation of 'annotation browsing' is that patches that
only remove code are not annotated, so you need to check them directly
in the diff window.

SECOND WAY

Without opening the file viewer it is possible to select a file (or
more than one, or a directory) from the tree view and press the magic
wand button. This causes the main view to be updated with the output of
git-rev-list -- <selected paths>, i.e. a filtered view.

With diff viewer window open you can browse across file patch history
related to chosen file.

The biggest limitation is that all the revisions that touch the file
are shown, not only the ones limited to a selected region.

IF I HAVE UNDERSTOOD...

If I have understood, what you would like to see is something like the following:

- From diff/file viewer window select a code region.

- Press Magic wand button and feed git-rev-list with <selected path>
_and_  <selected content>

- Show git-rev-list output on main window as usual, but now the
revisions are filtered not only by path but also by the region of
code touched.
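
The "region of code touched" test in the last step could be sketched like
this. It is only an illustration of the idea for a single patch: the name
touches_region() and the sample patch are made up, and a real tool would
also have to shift the selected line range as it walks back through older
patches.

```python
import re

# Matches unified diff hunk headers such as "@@ -10,6 +10,8 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def touches_region(patch, first, last):
    """Return True if any hunk of `patch` overlaps lines first..last
    of the file as it looks after the patch is applied."""
    for m in HUNK_RE.finditer(patch):
        new_start = int(m.group(3))
        # A missing count means the hunk covers exactly one line.
        new_len = int(m.group(4)) if m.group(4) is not None else 1
        # A count of 0 is a pure removal; treat it as touching one position
        # so that patches that only remove code are not missed.
        new_end = new_start + max(new_len, 1) - 1
        if new_start <= last and first <= new_end:
            return True
    return False


# Made-up patch text; only the hunk headers matter to the check.
SAMPLE_PATCH = """\
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -10,6 +10,8 @@
 context
+added line
+added line
@@ -200,3 +202,0 @@
-removed line
"""

print(touches_region(SAMPLE_PATCH, 12, 13))    # inside the first hunk
print(touches_region(SAMPLE_PATCH, 30, 40))    # between the hunks
print(touches_region(SAMPLE_PATCH, 202, 202))  # at the pure removal
```

Since hunks include context lines, the overlap test is conservative: it can
report a patch that only changed lines next to the selection.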

Am I guessing correctly?

Marco

^ permalink raw reply
