* Unexpected behavior in git-rev-list
@ 2005-09-18 14:49 Peter Eriksen
2005-09-18 17:18 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Peter Eriksen @ 2005-09-18 14:49 UTC (permalink / raw)
To: git
Hello people,
There's something I don't quite understand about git-rev-list.
After adding two files only one shows up with the --objects option.
I'm looking at commit e621a691e9bdbbe263ce34dd20458d9fbbf1a126 at
http://www.student.dtu.dk/~s022018/git/gitweb.cgi?p=recipes.git;a=summary
I can find the difference between the latest commit and its parent:
> git diff HEAD^ HEAD
diff --git a/HS-Plugins/20050403/Recipe b/HS-Plugins/20050403/Recipe
new file mode 100644
--- /dev/null
+++ b/HS-Plugins/20050403/Recipe
@@ -0,0 +1,16 @@
[snip]
diff --git a/HS-Plugins/20050403/Resources/Dependencies b/HS-Plugins/20050403/Resources/Dependencies
new file mode 100644
--- /dev/null
+++ b/HS-Plugins/20050403/Resources/Dependencies
@@ -0,0 +1,5 @@
[snip]
Notice that it creates exactly two files. Now I expect the following
objects:
tree HS-Plugins
tree 20050403
blob Recipe
tree Resources
blob Dependencies
Now what I understand so far is that we can list all objects reachable
from the HEAD commit but not reachable from its parent commit by:
> git-rev-list --objects ^HEAD^ HEAD
e621a691e9bdbbe263ce34dd20458d9fbbf1a126
609c26436053564e8df145b175d75df339b2318b
fe47bcfb8f47b55e3f6fabd2b2d188030fb57e1f HS-Plugins
6c8582e49c9f792f4f550fcf510432c84d24d868 20050403
808a68c33f87693c873f8f9c5f66c050a5ddc81e Recipe
My question is now: Why doesn't "git-rev-list --objects ^HEAD^ HEAD"
list the Dependencies blob? I'm a bit confused.
Regards,
Peter
* Re: Unexpected behavior in git-rev-list
2005-09-18 14:49 Unexpected behavior in git-rev-list Peter Eriksen
@ 2005-09-18 17:18 ` Linus Torvalds
2005-09-18 17:58 ` Peter Eriksen
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2005-09-18 17:18 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
On Sun, 18 Sep 2005, Peter Eriksen wrote:
>
> There's something I don't quite understand about git-rev-list.
> After adding two files only one shows up with the --objects option.
>
> I can find the difference between the latest commit and its parent:
>
> > git diff HEAD^ HEAD
> diff --git a/HS-Plugins/20050403/Recipe b/HS-Plugins/20050403/Recipe
> new file mode 100644
> --- /dev/null
> +++ b/HS-Plugins/20050403/Recipe
> @@ -0,0 +1,16 @@
> [snip]
> diff --git a/HS-Plugins/20050403/Resources/Dependencies b/HS-Plugins/20050403/Resources/Dependencies
> new file mode 100644
> --- /dev/null
> +++ b/HS-Plugins/20050403/Resources/Dependencies
> @@ -0,0 +1,5 @@
> [snip]
>
> Notice that it creates exactly two files. Now I expect the following
> objects:
>
> tree HS-Plugins
> tree 20050403
> blob Recipe
> tree Resources
> blob Dependencies
Well, it looks like some other file has _exactly_ the same contents as the
new "Dependencies", which means that git notices that the blob isn't
actually new.
Which doesn't surprise me at all - you've got a lot of projects there that
seem to have a Dependencies thing. Why wouldn't some other project have
the exact same ones?
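A minimal sketch of why identical contents collapse into a single object, assuming a SHA-1 repository and a `sha1sum` binary on the PATH (the helper name blob_id is made up for illustration): a blob's id is the SHA-1 of a "blob <size>" header, a NUL byte, and the file contents, so the file's name and directory play no part in it.

```sh
#!/bin/sh
# blob_id FILE: print the id git would give FILE's contents as a blob.
# Hypothetical helper. The id is SHA-1("blob <size>\0" + contents), so it
# depends only on the bytes in the file, never on its path.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running blob_id on two byte-identical Dependencies files in different directories prints the same id twice, which is why only one blob exists for both. For files that git stores unmodified, the output should agree with git-hash-object.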
Linus
* Re: Unexpected behavior in git-rev-list
2005-09-18 17:18 ` Linus Torvalds
@ 2005-09-18 17:58 ` Peter Eriksen
2005-09-21 16:49 ` Peter Eriksen
0 siblings, 1 reply; 9+ messages in thread
From: Peter Eriksen @ 2005-09-18 17:58 UTC (permalink / raw)
To: git
On Sun, Sep 18, 2005 at 10:18:10AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 18 Sep 2005, Peter Eriksen wrote:
> >
> > There's something I don't quite understand about git-rev-list.
> > After adding two files only one shows up with the --objects option.
...
> Well, it looks like some other file has _exactly_ the same contents as the
> new "Dependencies", which means that git notices that the blob isn't
> actually new.
>
> Which doesn't surprise me at all - you've got a lot of projects there that
> seem to have a Dependencies thing. Why wouldn't some other project have
> the exact same ones?
Ah! You are right, there is a Dependencies file in each of the 1000+
directories, and they are generated from almost the same setup, so there
must be another one identical to the one I just committed. That
explains it. Now I will try to see if I can actually get the effect I
expected somehow. :-)
So my new challenge to myself: Given two commit objects A and B list all
the tree and blob objects which are not in both A and B.
After that I think writing a command which does the same as
'cvs annotate' would be a good exercise.
Thanks for the explanation.
Regards,
Peter
P.S.
I'm on the list, so it's not necessary to cc me.
* Re: Unexpected behavior in git-rev-list
2005-09-18 17:58 ` Peter Eriksen
@ 2005-09-21 16:49 ` Peter Eriksen
2005-09-21 17:40 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Peter Eriksen @ 2005-09-21 16:49 UTC (permalink / raw)
To: git
On Sun, Sep 18, 2005 at 07:58:50PM +0200, Peter Eriksen wrote:
...
> So my new challenge to myself: Given two commit objects A and B list all
> the tree and blob objects which are not in both A and B.
$ git-diff-tree -t A B
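A rough sketch of turning that output into an object list, assuming the raw diff-tree line format ":<old mode> <new mode> <old id> <new id> <status>TAB<path>" and a recursive run (git-diff-tree -r -t A B). changed_objects is a hypothetical helper, not a git command. Note that the result is path-based: an id printed for one side can still occur under a different path in the other commit, which is what happened with the Dependencies blob.

```sh
#!/bin/sh
# changed_objects: read raw `git-diff-tree -r -t A B` output on stdin and
# print "<side> <id> <path>" for each tree or blob on either side of a
# changed path. An all-zero id means "absent on that side" and is skipped.
changed_objects() {
    awk -F'\t' '
    /^:/ {
        split($1, f, " ")    # f[3] = id in A, f[4] = id in B
        if (f[3] !~ /^0+$/) print "A", f[3], $2
        if (f[4] !~ /^0+$/) print "B", f[4], $2
    }'
}

# Intended use (inside a repository):
#   git-diff-tree -r -t A B | changed_objects
```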
>
> After that I think writing a command which does the same as
> 'cvs annotate' would be a good exercise.
Ok, I have a prototype. The algorithm has four steps:
1) traverse the commit DAG in breadth-first order
2) for each commit in 1) find the diff against HEAD
3) for each diff from 2) accumulate the lines and
commit ids that most recently affected the current HEAD
4) print the commit ids found in 3), one for each
line in HEAD.
The command is used like this:
$ git-annotate-script.sh Documentation/git.txt >blame
$ cat blame
51017101c7a308745ba3c04944457f1dc6a55780
51017101c7a308745ba3c04944457f1dc6a55780
3db6b224cf36748b969acdd96b9fb2de641cd641
51017101c7a308745ba3c04944457f1dc6a55780
51017101c7a308745ba3c04944457f1dc6a55780
...
51017101c7a308745ba3c04944457f1dc6a55780
51017101c7a308745ba3c04944457f1dc6a55780
51017101c7a308745ba3c04944457f1dc6a55780
51017101c7a308745ba3c04944457f1dc6a55780
$ paste -d ' ' blame - <Documentation/git.txt
The script runs in about 6 seconds on my machine.
Any comments?
Regards,
Peter
diff --git a/git-annotate-bfs.pl b/git-annotate-bfs.pl
new file mode 100755
--- /dev/null
+++ b/git-annotate-bfs.pl
@@ -0,0 +1,35 @@
+#!/usr/bin/env perl
+
+# 1) Breadth-first search
+# 2) For each commit A in 1) find diff(parent(A), HEAD)
+# 3) Build the blame table and print it.
+
+use strict;
+use warnings;
+
+my $v0 = $ARGV[0];
+
+my @Q; # BFS helper queue of commit ids.
+my %C; # BFS helper colours table. C[commit id] = colour. 1=grey, 2=black.
+my $v;
+
+$C{$v0} = 1;
+push @Q, $v0;
+while (@Q) {
+    $v = shift @Q;
+    #print "$v\n"; ### DEBUG
+    open(PARENTS, "git-rev-list --parents --max-count=1 $v |");
+    chomp(my $commits = <PARENTS>);
+    close PARENTS;
+    my @parents = split(' ', $commits);
+    shift @parents;
+    my $v1;
+    foreach $v1 (@parents) {
+        #print "."; ## DEBUG
+        if (not defined($C{$v1})) {
+            $C{$v1} = 1;
+            push @Q, $v1;
+        }
+    }
+    print "$commits\n"
+}
diff --git a/git-annotate-diff.sh b/git-annotate-diff.sh
new file mode 100755
--- /dev/null
+++ b/git-annotate-diff.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# 1) Breadth-first search
+# 2) For each commit A in 1) find diff(parent(A), HEAD)
+# 3) Build the blame table and print it.
+
+FILEPATH=$1
+
+while read COMMITS; do
+    COMMIT=${COMMITS:0:40}
+    echo blame $COMMIT
+    for PARENT in ${COMMITS#* }; do
+        # If HEAD has changed relative to $PARENT, the change
+        # is not due to $PARENT but to a later commit,
+        # namely ${COMMITS:0:40}
+        DIFF=`git-diff-tree -r -m $PARENT $COMMIT $FILEPATH`
+        if [ -n "$DIFF" ]; then
+            git diff $PARENT HEAD $FILEPATH
+        fi
+    done
+done
diff --git a/git-annotate-script.sh b/git-annotate-script.sh
new file mode 100755
--- /dev/null
+++ b/git-annotate-script.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+
+FILEPATH=$1
+
+BLOB=`git-ls-files --stage | grep $FILEPATH | cut -c8-47`
+LENGTH=`git-cat-file blob $BLOB | wc -l`
+
+git-annotate-bfs.pl HEAD | \
+#cut -c1-40 | git-diff-tree -r -m --stdin rev-list.c | grep "^[^:]" \
+git-annotate-diff.sh $FILEPATH | \
+git-annotate-table.pl $LENGTH
diff --git a/git-annotate-table.pl b/git-annotate-table.pl
new file mode 100755
--- /dev/null
+++ b/git-annotate-table.pl
@@ -0,0 +1,41 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+
+# 1) Breadth-first search
+# 2) For each commit A in 1) find diff(parent(A), HEAD)
+# 3) Build the blame table and print it.
+
+my $len = $ARGV[0];
+
+my @T; # Blame table. T[line number] = commit ids
+my $n = 0;
+my $blame;
+my $cln;
+my $state = "header";
+
+while (defined (my $line = <STDIN>) and $n < $len) {
+    if ($line =~ /^blame ([0-9a-fA-F]{40})/) { $blame = $1; $state = "header"; }
+    elsif ($line =~ /^diff --git/) { $state = "header"; }
+    elsif ($line =~ /^@@ -\d+(?:,\d+)? \+(\d+)/) { $cln = $1-1; $state = "chunks"; }
+    elsif ($state eq "chunks") {
+        if ($line =~ /^\+/) {
+            if (not defined $T[$cln]) { $T[$cln] = $blame; $n++; } #print "$cln $blame\n"; }
+            $cln++;
+        }
+        elsif ($line =~ /^-/) { }
+        elsif ($line =~ /^ /) { $cln++; }
+        else {
+            print "line = $line\n";
+            print "state = $state\n";
+            print "cln = $cln\n";
+            die "I'm not supposed to read this line.";
+        }
+    }
+}
+
+foreach (@T) {
+ print "$_\n";
+}
* Re: Unexpected behavior in git-rev-list
2005-09-21 16:49 ` Peter Eriksen
@ 2005-09-21 17:40 ` Linus Torvalds
2005-09-21 18:34 ` Junio C Hamano
2005-09-22 21:04 ` Daniel Barkalow
0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2005-09-21 17:40 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
On Wed, 21 Sep 2005, Peter Eriksen wrote:
>
> Ok, I have a prototype. The algorithm has four steps:
>
> 1) traverse the commit DAG in breadth first order
The thing is, this is _expensive_.
It's very possible to cut down a lot of that cost by adding logic that avoids
looking at the whole commit DAG.
In particular, almost all merges will have the same object in _one_ of the
parents as in the result. And if you just make the rule be that you only
follow the first parent that matches the result in the merge, you'll
almost always end up with a nice linear thing, with no need to look at
multiple parents at all.
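The rule above can be sketched as a small filter, under the assumption that the caller has already looked up, for each parent of a merge, the id of the object at the path in question (for example with `git rev-parse <commit>:<path>`). The function name follow_parent and its input format are invented for this illustration.

```sh
#!/bin/sh
# follow_parent RESULT_ID: read "PARENT_ID OBJECT_ID" lines on stdin, one per
# parent of a merge in parent order, and print the first parent whose object
# at the path equals the one in the merge result. Prints nothing and returns
# non-zero when no parent matches, i.e. the merge really combined changes.
follow_parent() {
    while read -r parent object; do
        if [ "$object" = "$1" ]; then
            printf '%s\n' "$parent"
            return 0
        fi
    done
    return 1
}

# Possible way to feed it (sketch only, $merge and $path set by the caller):
#   for p in $(git rev-list --parents -n 1 "$merge" | cut -s -d' ' -f2-); do
#       echo "$p $(git rev-parse -q --verify "$p:$path")"
#   done | follow_parent "$(git rev-parse "$merge:$path")"
```

When the function succeeds, only the printed parent needs to be walked further; when it fails, every parent still has to be followed.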
Of course, sometimes the merge actually _does_ merge changes from both
(or more) sides of a commit, and then you need to follow them down and it
gets nasty and complicated.
Anyway, I've seriously considered adding a mode to "git-rev-list" that
automatically avoids following the parents that aren't relevant for a
certain set of files.
Ie if you did
git-rev-list rev1 rev2 ^rev3 ^rev4 .. pathname
it would only show the revisions that actually _change_ the pathname.
It's not entirely trivial. The biggest bummer is that we'd have to fake
out the parent info (ie the "parent" would have to be the previous entry
that changes it, not the real one).
I'm convinced that it's all quite possible, though, by just rewriting the
"commit->parents" list (remove parents that don't change the set of files,
and in merges where one parent has zero diffs for that set, just select
_that_ parent, and then continue to prune).
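A simplified sketch of the parent rewriting, operating on text rather than on the in-memory "commit->parents" list. It assumes the input lists "commit parent..." lines with parents before children (for instance from `git rev-list --parents --topo-order --reverse`) and that a separate file names the commits that change the path; how that file is produced is left open. rewrite_parents is a hypothetical name. The sketch covers only the first half of the idea (dropping commits and reconnecting the rest); it does not select a single parent at merges, so a dropped merge passes on the rewritten ancestors of all its parents.

```sh
#!/bin/sh
# rewrite_parents TOUCHED_FILE: stdin has "commit parent..." lines, parents
# before children. TOUCHED_FILE has one commit id per line: the commits to
# keep. Prints each kept commit followed by its nearest kept ancestors.
rewrite_parents() {
    awk -v touched="$1" '
    BEGIN { while ((getline id < touched) > 0) keep[id] = 1 }
    {
        out = ""
        split("", seen)    # reset the per-commit duplicate filter
        for (i = 2; i <= NF; i++) {
            # A kept parent stands for itself; a dropped one is replaced
            # by whatever it was rewritten to (possibly nothing).
            n = split(($i in keep) ? $i : via[$i], anc, " ")
            for (j = 1; j <= n; j++)
                if (!(anc[j] in seen)) { seen[anc[j]] = 1; out = out " " anc[j] }
        }
        if ($1 in keep) print $1 out
        else via[$1] = substr(out, 2)
    }'
}
```

For a linear history A, B, C, D, E in which only A, C and E touch the path, the output is "A", "C A" and "E C": each kept commit points at the previous commit that changed the path rather than at its real parent.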
It might be best not being done by git-rev-list, but by a specialized
program. However, the advantage of doing it in git-rev-list is that then
things like "git log" and "gitk" would automatically take advantage of it,
ie you could say
gitk v2.6.12.. drivers/char/
and it would show a "cut-down" revision tree that only contained the stuff
that changed anything in drivers/char/.
This would be (a) very useful (b) very powerful and (c) should even be
pretty efficient. Sure, systems that natively do things in a file-specific
way are still a lot more efficient on a single-file basis, but the git
architecture actually lends itself very well to the above kind of "track a
whole subdirectory" (or "track two subdirectories and one filename", or
anything like that).
And a much more efficient "annotate" would fall out automatically out of
it (although I really think that the "gitk v2.6.12.. drivers/char/" is
what would be a lot more useful than annotate has ever been).
We already have this in "git-whatchanged", which I personally find very
very powerful. But we could do it even better.
Linus
* Re: Unexpected behavior in git-rev-list
2005-09-21 17:40 ` Linus Torvalds
@ 2005-09-21 18:34 ` Junio C Hamano
2005-09-21 18:45 ` Linus Torvalds
2005-09-22 21:04 ` Daniel Barkalow
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2005-09-21 18:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> It might be best not being done by git-rev-list, but by a specialized
> program. However, the advantage of doing it in git-rev-list is that then
> things like "git log" and "gitk" would automatically take advantage of it,
> ie you could say
>
> gitk v2.6.12.. drivers/char/
>
> and it would show a "cut-down" revision tree that only contained the stuff
> that changed anything in drivers/char/.
Wouldn't gitk get confused by the sparse set of commits your
rev-list feeds it, when it tries to draw ancestry lines and find
many commits missing in between them?
Ah, you told him to use 'rev-list --parents' and you can rewrite
the list of parents there -- clever.
* Re: Unexpected behavior in git-rev-list
2005-09-21 18:34 ` Junio C Hamano
@ 2005-09-21 18:45 ` Linus Torvalds
0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2005-09-21 18:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Wed, 21 Sep 2005, Junio C Hamano wrote:
>
> Ah, you told him to use 'rev-list --parents' and you can rewrite
> the list of parents there -- clever.
Exactly.
Yes, anything that would parse the "raw" commit information would be very
lost indeed, and not able to figure out parenthood from the sparse list.
But gitk already uses the "fakey" parents.
Linus
* Re: Unexpected behavior in git-rev-list
2005-09-21 17:40 ` Linus Torvalds
2005-09-21 18:34 ` Junio C Hamano
@ 2005-09-22 21:04 ` Daniel Barkalow
2005-09-22 21:26 ` Linus Torvalds
1 sibling, 1 reply; 9+ messages in thread
From: Daniel Barkalow @ 2005-09-22 21:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Peter Eriksen, git
On Wed, 21 Sep 2005, Linus Torvalds wrote:
> Anyway, I've seriously considered adding a mode to "git-rev-list" that
> automatically avoids following the parents that aren't relevant for a
> certain set of files.
>
> Ie if you did
>
> git-rev-list rev1 rev2 ^rev3 ^rev4 .. pathname
>
> it would only show the revisions that actually _change_ the pathname.
>
> It's not entirely trivial. The biggest bummer is that we'd have to fake
> out the parent info (ie the "parent" would have to be the previous entry
> that changes it, not the real one).
How about a program that made the fake thing real? That is, actually wrote
to the database the entire history with only those paths included, and
only commits that change those paths.
This would be exactly the right thing for the people who want kbuild to be
a separate project from the kernel, because "the kbuild project", with
full history, could be automatically generated.
For that matter, Sam could actually use that repository for maintaining
kbuild, because if mainline merges that instead of merging the present
kbuild-in-kernel repository, it'll be exactly the same. He could pick up
stuff from the mainline by subsetting mainline.
In fact, this operation would allow Junio to push gitk changes upstream,
as well; "git subset -w heads/gitk gitk" would generate the gitk
repository, with the addition of any changes to gitk made and committed in
the git repository.
I think the only problem with this scheme would be that, if someone does a
commit that changes both gitk and something else, the commit message might
be a bit confusing in the gitk tree.
(I'm not sure, but this might also generate just the right thing for
driver maintainers who want to distribute the latest version of their
drivers as an out-of-tree module)
-Daniel
*This .sig left intentionally blank*
* Re: Unexpected behavior in git-rev-list
2005-09-22 21:04 ` Daniel Barkalow
@ 2005-09-22 21:26 ` Linus Torvalds
0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2005-09-22 21:26 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Peter Eriksen, git
On Thu, 22 Sep 2005, Daniel Barkalow wrote:
> >
> > It's not entirely trivial. The biggest bummer is that we'd have to fake
> > out the parent info (ie the "parent" would have to be the previous entry
> > that changes it, not the real one).
>
> How about a program that made the fake thing real? That is, actually wrote
> to the database the entire history with only those paths included, and
> only commits that change those paths.
I think it would be a fine thing to do. If/once git-rev-list can do the
"limit by filename" part, generating a new git history that uses that
shouldn't be that hard.
And yes, it would be a way to generate a "subproject" automatically.
Linus
Thread overview: 9+ messages
2005-09-18 14:49 Unexpected behavior in git-rev-list Peter Eriksen
2005-09-18 17:18 ` Linus Torvalds
2005-09-18 17:58 ` Peter Eriksen
2005-09-21 16:49 ` Peter Eriksen
2005-09-21 17:40 ` Linus Torvalds
2005-09-21 18:34 ` Junio C Hamano
2005-09-21 18:45 ` Linus Torvalds
2005-09-22 21:04 ` Daniel Barkalow
2005-09-22 21:26 ` Linus Torvalds