Add "--show-all" revision walker flag for debugging

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Add "--show-all" revision walker flag for debugging
@ 2008-02-09 22:02 Linus Torvalds
  2008-02-09 23:52 ` Linus Torvalds
  2008-02-10  1:12 ` Johannes Schindelin
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-09 22:02 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


So the kernel merge window for 2.6.25 is closing today (or possibly 
tomorrow if I feel like it), and I've already mentally closed it, so I'm 
starting to free up some resources to look at the interesting problem with 
screwed-up commit dates confusing our commit walker into thinking that 
some uninteresting commit isn't actually uninteresting due to not 
traversing the uninteresting chain deep enough.

But one of the things I noticed is that it's really not very easy to 
visualize the commit walker, because - on purpose - it obvously doesn't 
show the uninteresting commits!

So here's a patch that adds a "--show-all" flag to the revision walker, 
which will make it show uninteresting commits too, and they'll have a '*' 
in front of them (it also fixes a logic error for !verbose_header for 
boundary commits - we should show the '-' even if left_right isn't shown).

It also updates 'gitk' to show those negative commits in gray. Whether 
that's the right color choice or not, I'll leave to bikeshedders, but it 
looks fairly neutral, and shows the differences pretty well.

It actually is interesting even for the cases that git doesn't have any 
problems with, ie for the kernel you can do

	gitk -d --show-all v2.6.24..

and you see just how far down it has to parse things to see it all. The 
use of "-d" is a good idea, since the date-ordered toposort is much better 
at showing why it goes deep down (ie the date of some of those commits 
after 2.6.24 is much older, because they were merged from trees that 
weren't rebased).

So I think this is a useful feature even for non-debugging - just to 
visualize what git does internally more.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Heh. Amusingly, the changes to gitk were much bigger than the trivial 
changes to core git routines themselves. The log-tree.c diffstat is a bit 
bigger just because of the logic fix (making it more than just another 
added "else if ()" statement), while much of the revision.c patch is just 
the argument parsing ;)

 builtin-rev-list.c |    2 ++
 gitk-git/gitk      |   15 ++++++++-------
 log-tree.c         |   12 ++++++++----
 revision.c         |    9 ++++++++-
 revision.h         |    1 +
 5 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index de80158..5966199 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -60,6 +60,8 @@ static void show_commit(struct commit *commit)
 		fputs(header_prefix, stdout);
 	if (commit->object.flags & BOUNDARY)
 		putchar('-');
+	else if (commit->object.flags & UNINTERESTING)
+		putchar('^');
 	else if (revs.left_right) {
 		if (commit->object.flags & SYMMETRIC_LEFT)
 			putchar('<');
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 5560e4d..90d9b61 100644
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -240,11 +240,12 @@ proc getcommitlines {fd view}  {
 	set listed 1
 	if {$j >= 0 && [string match "commit *" $cmit]} {
 	    set ids [string range $cmit 7 [expr {$j - 1}]]
-	    if {[string match {[-<>]*} $ids]} {
+	    if {[string match {[-^<>]*} $ids]} {
 		switch -- [string index $ids 0] {
 		    "-" {set listed 0}
-		    "<" {set listed 2}
-		    ">" {set listed 3}
+		    "^" {set listed 2}
+		    "<" {set listed 3}
+		    ">" {set listed 4}
 		}
 		set ids [string range $ids 1 end]
 	    }
@@ -3627,23 +3628,23 @@ proc drawcmittext {id row col} {
     global linehtag linentag linedtag selectedline
     global canvxmax boldrows boldnamerows fgcolor nullid nullid2
 
-    # listed is 0 for boundary, 1 for normal, 2 for left, 3 for right
+    # listed is 0 for boundary, 1 for normal, 2 for negative, 3 for left, 4 for right
     set listed [lindex $commitlisted $row]
     if {$id eq $nullid} {
 	set ofill red
     } elseif {$id eq $nullid2} {
 	set ofill green
     } else {
-	set ofill [expr {$listed != 0? "blue": "white"}]
+	set ofill [expr {$listed != 0 ? $listed == 2 ? "gray" : "blue" : "white"}]
     }
     set x [xc $row $col]
     set y [yc $row]
     set orad [expr {$linespc / 3}]
-    if {$listed <= 1} {
+    if {$listed <= 2} {
 	set t [$canv create oval [expr {$x - $orad}] [expr {$y - $orad}] \
 		   [expr {$x + $orad - 1}] [expr {$y + $orad - 1}] \
 		   -fill $ofill -outline $fgcolor -width 1 -tags circle]
-    } elseif {$listed == 2} {
+    } elseif {$listed == 3} {
 	# triangle pointing left for left-side commits
 	set t [$canv create polygon \
 		   [expr {$x - $orad}] $y \
diff --git a/log-tree.c b/log-tree.c
index 1f3fcf1..dd110ca 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -149,10 +149,12 @@ void show_log(struct rev_info *opt, const char *sep)
 
 	opt->loginfo = NULL;
 	if (!opt->verbose_header) {
-		if (opt->left_right) {
-			if (commit->object.flags & BOUNDARY)
-				putchar('-');
-			else if (commit->object.flags & SYMMETRIC_LEFT)
+		if (commit->object.flags & BOUNDARY)
+			putchar('-');
+		else if (commit->object.flags & UNINTERESTING)
+			putchar('^');
+		else if (opt->left_right) {
+			if (commit->object.flags & SYMMETRIC_LEFT)
 				putchar('<');
 			else
 				putchar('>');
@@ -250,6 +252,8 @@ void show_log(struct rev_info *opt, const char *sep)
 			fputs("commit ", stdout);
 		if (commit->object.flags & BOUNDARY)
 			putchar('-');
+		else if (commit->object.flags & UNINTERESTING)
+			putchar('^');
 		else if (opt->left_right) {
 			if (commit->object.flags & SYMMETRIC_LEFT)
 				putchar('<');
diff --git a/revision.c b/revision.c
index 6e85aaa..158727c 100644
--- a/revision.c
+++ b/revision.c
@@ -581,7 +581,8 @@ static int limit_list(struct rev_info *revs)
 			mark_parents_uninteresting(commit);
 			if (everybody_uninteresting(list))
 				break;
-			continue;
+			if (!revs->show_all)
+				continue;
 		}
 		if (revs->min_age != -1 && (commit->date > revs->min_age))
 			continue;
@@ -1055,6 +1056,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				revs->dense = 0;
 				continue;
 			}
+			if (!strcmp(arg, "--show-all")) {
+				revs->show_all = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--remove-empty")) {
 				revs->remove_empty_trees = 1;
 				continue;
@@ -1438,6 +1443,8 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
 		return commit_ignore;
 	if (revs->unpacked && has_sha1_pack(commit->object.sha1, revs->ignore_packed))
 		return commit_ignore;
+	if (revs->show_all)
+		return commit_show;
 	if (commit->object.flags & UNINTERESTING)
 		return commit_ignore;
 	if (revs->min_age != -1 && (commit->date > revs->min_age))
diff --git a/revision.h b/revision.h
index 8572315..b5f01f8 100644
--- a/revision.h
+++ b/revision.h
@@ -33,6 +33,7 @@ struct rev_info {
 			prune:1,
 			no_merges:1,
 			no_walk:1,
+			show_all:1,
 			remove_empty_trees:1,
 			simplify_history:1,
 			lifo:1,

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
@ 2008-02-09 23:52 ` Linus Torvalds
  2008-02-10  4:44   ` Junio C Hamano
  2008-02-10  1:12 ` Johannes Schindelin
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-02-09 23:52 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List



On Sat, 9 Feb 2008, Linus Torvalds wrote:
> 
> So here's a patch that adds a "--show-all" flag to the revision walker, 
> which will make it show uninteresting commits too, and they'll have a '*' 
> in front of them (it also fixes a logic error for !verbose_header for 
> boundary commits - we should show the '-' even if left_right isn't shown).

And here's a slight expansion patch on top of the above

It does:

 - when it actually breaks out due to the "everybody_uninteresting()" 
   case, it adds the uninteresting commits (both the one it's looking at 
   now, and the list of pending ones) to the list

   This way, we really list *all* the commits we've looked at

 - Because we now end up listing commits we may not even have been parsed 
   at all "show_log" and "show_commit" need to protect against commits 
   that don't have a commit buffer entry.

That second part is debatable just how it should work. Maybe we shouldn't 
show such entries at all (with this patch those entries do get shown, they 
just don't get any message shown with them). But I think this is a useful 
case.

		Linus

---
 builtin-rev-list.c |    2 +-
 log-tree.c         |    3 +++
 revision.c         |   11 ++++++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 5966199..7163116 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -86,7 +86,7 @@ static void show_commit(struct commit *commit)
 	else
 		putchar('\n');
 
-	if (revs.verbose_header) {
+	if (revs.verbose_header && commit->buffer) {
 		struct strbuf buf;
 		strbuf_init(&buf, 0);
 		pretty_print_commit(revs.commit_format, commit,
diff --git a/log-tree.c b/log-tree.c
index dd110ca..e9ba6df 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -282,6 +282,9 @@ void show_log(struct rev_info *opt, const char *sep)
 		}
 	}
 
+	if (!commit->buffer)
+		return;
+
 	/*
 	 * And then the pretty-printed message itself
 	 */
diff --git a/revision.c b/revision.c
index 158727c..76faf4b 100644
--- a/revision.c
+++ b/revision.c
@@ -558,6 +558,12 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
 	free_patch_ids(&ids);
 }
 
+static void add_to_list(struct commit_list **p, struct commit *commit, struct commit_list *n)
+{
+	p = &commit_list_insert(commit, p)->next;
+	*p = n;
+}
+
 static int limit_list(struct rev_info *revs)
 {
 	struct commit_list *list = revs->commits;
@@ -579,8 +585,11 @@ static int limit_list(struct rev_info *revs)
 			return -1;
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
-			if (everybody_uninteresting(list))
+			if (everybody_uninteresting(list)) {
+				if (revs->show_all)
+					add_to_list(p, commit, list);
 				break;
+			}
 			if (!revs->show_all)
 				continue;
 		}

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-09 23:52 ` Linus Torvalds
@ 2008-02-10  4:44   ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10  4:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> And here's a slight expansion patch on top of the above
>
> It does:
>
>  - when it actually breaks out due to the "everybody_uninteresting()" 
>    case, it adds the uninteresting commits (both the one it's looking at 
>    now, and the list of pending ones) to the list
>
>    This way, we really list *all* the commits we've looked at
>
>  - Because we now end up listing commits we may not even have been parsed 
>    at all "show_log" and "show_commit" need to protect against commits 
>    that don't have a commit buffer entry.
>
> That second part is debatable just how it should work. Maybe we shouldn't 
> show such entries at all (with this patch those entries do get shown, they 
> just don't get any message shown with them). But I think this is a useful 
> case.

Showing them would probably be useful (definitely for
debugging), but I do not think it is useful not to show the
buffer contents (i.e. adding without parsing).

It _might_ mean that the excess uninteresting ones we take out
from "list" can be distinguished from the ones that we did look
at (and once thought they were worth keeping), but I do not
think it is reliable information.  The command line parser may
have caused them to be parsed independent from what the revision
traversal did.

So I think it would make more sense to parse them while we add
them to the "newlist", and if we really want to distinguish them
from others we would want to mark them separately with another
flag bit.

Or unparse them when we add them to the "newlist".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
  2008-02-09 23:52 ` Linus Torvalds
@ 2008-02-10  1:12 ` Johannes Schindelin
  2008-02-10  1:22   ` Linus Torvalds
  2008-02-10  1:28   ` Nicolas Pitre
  1 sibling, 2 replies; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10  1:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

Hi,

On Sat, 9 Feb 2008, Linus Torvalds wrote:

> I'm starting to free up some resources to look at the interesting 
> problem with screwed-up commit dates confusing our commit walker into 
> thinking that some uninteresting commit isn't actually uninteresting due 
> to not traversing the uninteresting chain deep enough.

I was thinking the other night why I did not like the generation header.  
And I found out why: it is redundant information.

So why not have some local "cache" which maintains the generation numbers 
for the commits?  (Much like the "notes" cache I showed last year?)

Hmm?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:12 ` Johannes Schindelin
@ 2008-02-10  1:22   ` Linus Torvalds
  2008-02-10  1:29     ` Johannes Schindelin
  2008-02-10  4:09     ` Junio C Hamano
  2008-02-10  1:28   ` Nicolas Pitre
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10  1:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List

On Sun, 10 Feb 2008, Johannes Schindelin wrote:
> 
> I was thinking the other night why I did not like the generation header.  
> And I found out why: it is redundant information.

Actually, that's not the real issue.

The real issue is that it doesn't work. I thought about it, and with 
multiple roots (which _can_ get merged together) it just isn't something 
that actually helps.

If you couldn't merge across roots, you could have a "uuid+generation 
header", but the moment you have multiple roots it actually gets quite 
complex.

So scratch the generation header.  It's not the answer.

And I do think that we can do it without it. I'm still thinking about how 
to do it efficiently, but I think I can get there. I just haven't had the 
time to sit down and really think it through and try out my ideas yet.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:22   ` Linus Torvalds
@ 2008-02-10  1:29     ` Johannes Schindelin
  2008-02-10  4:09     ` Junio C Hamano
  1 sibling, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10  1:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

Hi,

On Sat, 9 Feb 2008, Linus Torvalds wrote:

> On Sun, 10 Feb 2008, Johannes Schindelin wrote:
> > 
> > I was thinking the other night why I did not like the generation 
> > header.  And I found out why: it is redundant information.
> 
> Actually, that's not the real issue.
> 
> The real issue is that it doesn't work. I thought about it, and with 
> multiple roots (which _can_ get merged together) it just isn't something 
> that actually helps.

I don't see it.  If G(c) >= 1 + max(G(c^) | c^ is a parent of c), then it 
_should_ work.  And then, I think it is _much_ better to just cache it 
locally (if you care enough about it, i.e. if you set a certain config 
option that triggers possibly expensive one-shot caching) than have it in 
the commit object (where the information could be just plain wrong).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:22   ` Linus Torvalds
  2008-02-10  1:29     ` Johannes Schindelin
@ 2008-02-10  4:09     ` Junio C Hamano
  2008-02-10  4:21       ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10  4:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> The real issue is that it doesn't work. I thought about it, and with 
> multiple roots (which _can_ get merged together) it just isn't something 
> that actually helps.
>
> If you couldn't merge across roots, you could have a "uuid+generation 
> header", but the moment you have multiple roots it actually gets quite 
> complex.
>
> So scratch the generation header.  It's not the answer.

I do not think multiple roots can be helped without going all
the way down to the roots, and I think it can be proven.

> And I do think that we can do it without it. I'm still thinking about how 
> to do it efficiently, but I think I can get there. I just haven't had the 
> time to sit down and really think it through and try out my ideas yet.

Generation number would on the other hand eliminate the issue of
clock skew if heads are connected, and I think it also can be
proven.

Both proofs need the axiom "there is no loop in the commit
ancestry graph", but that is something we already rely on in the
merge-base computation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  4:09     ` Junio C Hamano
@ 2008-02-10  4:21       ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10  4:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> The real issue is that it doesn't work. I thought about it, and with 
>> multiple roots (which _can_ get merged together) it just isn't something 
>> that actually helps.
>>
>> If you couldn't merge across roots, you could have a "uuid+generation 
>> header", but the moment you have multiple roots it actually gets quite 
>> complex.
>>
>> So scratch the generation header.  It's not the answer.
>
> I do not think multiple roots can be helped without going all
> the way down to the roots, and I think it can be proven.

Ah, please scratch that comment.  I was puzzled but did not
understand what you meant by uuid+generation.

You are right.  If we cannot tell if we are dealing with
disconnected history, we would always need to play safe and do
the clean-up like what I suggested earlier using merge-base
traversal, which could be costly, and we do not want to pay that
penalty in a connected history, which should be the normal case.

Generation header does not help us detect that case, so it is
not useful.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:12 ` Johannes Schindelin
  2008-02-10  1:22   ` Linus Torvalds
@ 2008-02-10  1:28   ` Nicolas Pitre
  2008-02-10  1:30     ` Johannes Schindelin
  1 sibling, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-10  1:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sun, 10 Feb 2008, Johannes Schindelin wrote:

> Hi,
> 
> On Sat, 9 Feb 2008, Linus Torvalds wrote:
> 
> > I'm starting to free up some resources to look at the interesting 
> > problem with screwed-up commit dates confusing our commit walker into 
> > thinking that some uninteresting commit isn't actually uninteresting due 
> > to not traversing the uninteresting chain deep enough.
> 
> I was thinking the other night why I did not like the generation header.  
> And I found out why: it is redundant information.
> 
> So why not have some local "cache" which maintains the generation numbers 
> for the commits?  (Much like the "notes" cache I showed last year?)

Absolutely.

I, too, have some reservations about adding any kind of generation 
header to commit objects.  First because it can be generated and 
maintained locally, just like the pack index.  But also because its 
usefulness has not been proven in all possible graph topologies, and 
adding it to the commit header pretty much deny any further 
modifications/improvements on it, if for example some other kind of 
generation notation becomes advantageous to use.

So please don't put it into the commit object format.  The object 
database should ultimately contain only data that cannot be regenerated.

Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:28   ` Nicolas Pitre
@ 2008-02-10  1:30     ` Johannes Schindelin
  2008-02-10 20:17       ` Jakub Narebski
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10  1:30 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

Hi,

On Sat, 9 Feb 2008, Nicolas Pitre wrote:

> I, too, have some reservations about adding any kind of generation 
> header to commit objects.  First because it can be generated and 
> maintained locally, just like the pack index.  But also because its 
> usefulness has not been proven in all possible graph topologies, and 
> adding it to the commit header pretty much deny any further 
> modifications/improvements on it, if for example some other kind of 
> generation notation becomes advantageous to use.

I fully agree.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10  1:30     ` Johannes Schindelin
@ 2008-02-10 20:17       ` Jakub Narebski
  2008-02-10 20:50         ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-02-10 20:17 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Nicolas Pitre, Linus Torvalds, Junio C Hamano, Git Mailing List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sat, 9 Feb 2008, Nicolas Pitre wrote:
> 
> > I, too, have some reservations about adding any kind of generation 
> > header to commit objects.  First because it can be generated and 
> > maintained locally, just like the pack index.  But also because its 
> > usefulness has not been proven in all possible graph topologies, and 
> > adding it to the commit header pretty much deny any further 
> > modifications/improvements on it, if for example some other kind of 
> > generation notation becomes advantageous to use.
> 
> I fully agree.

I think I'd agree, also because without 'generation' (and 'roots')
commit object header being there from beginning it is not clear when
to calculate it: the first few commits with it would be costly.
Besides graft and shallow information is local, and it affect commit
traversal.

I was thinking about pack-index like file, either directly mapping
"sha1 -> generation + roots", or indirectly "sha1 -> offset", 
"offset -> generation + roots".

P.S. Would having generation + roots be enough?

  gen(rev) = max_i { gen(i) + 1 : i in parents(rev) } || 1

  roots(rev) = { rev  if rev is a root (parentless) commit 
               { union of roots of parents of a commit otherwise

Union in the sense of set sum.

Note that if roots was to be saved in commit header, then it couldn't
be as simple as commit-id for root commits: no self links. It would
have to be empty for root commits, and the "union of roots" would have
to be modified to "union of roots or commit ids for root commits, of
each of parents of a commit".

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10 20:17       ` Jakub Narebski
@ 2008-02-10 20:50         ` Linus Torvalds
  2008-02-10 21:04           ` Nicolas Pitre
  2008-02-10 22:53           ` Jakub Narebski
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10 20:50 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List

On Sun, 10 Feb 2008, Jakub Narebski wrote:
> 
> P.S. Would having generation + roots be enough?

I'm wavering here. Maybe just "generation" works (the longest path from 
any root), because what we are looking for is essentially "guarantee that 
this commit cannot possibly be reached from that other commit", and I 
guess a simple generation count actually does work for that (if the 
generation of "x" is smaller than the generation of "y", we definitely 
cannot reach y from x).

At the same time, I'm still not really convinced we need to add the 
redundant info. I do think I *should* have designed it that way to start 
with (and I thought so two years ago - blaah), so the strongest reason for 
"we should add generation numbers" at least for me is that I actually 
think it's a GoodThing(tm) to have.

But adding it is a pretty invasive thing, and would force people to 
upgrade (it really isn't backwards compatible - old versions of git would 
immediately refuse to touch archives with even just a single top commit 
that has a generation number in it, unless we'd hide it at the end of the 
buffer and just uglify things in general).

So even if it does work, the advantage isn't big enough for me to think 
it's really worth it. I'd *really* prefer to not have a flag day here, and 
the fact is, it's redundant information _and_ it fundamentally doesn't 
work with grafting anyway, so even if we were able to start from scratch, 
we'd then have to seriously consider dropping grafts.

So let's work at not needing it, rather than argue whether we could use it 
or not ;)

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10 20:50         ` Linus Torvalds
@ 2008-02-10 21:04           ` Nicolas Pitre
  2008-02-10 22:53           ` Jakub Narebski
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-10 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Narebski, Johannes Schindelin, Junio C Hamano,
	Git Mailing List

On Sun, 10 Feb 2008, Linus Torvalds wrote:

> At the same time, I'm still not really convinced we need to add the 
> redundant info. I do think I *should* have designed it that way to start 
> with (and I thought so two years ago - blaah), so the strongest reason for 
> "we should add generation numbers" at least for me is that I actually 
> think it's a GoodThing(tm) to have.
> 
> But adding it is a pretty invasive thing, and would force people to 
> upgrade (it really isn't backwards compatible - old versions of git would 
> immediately refuse to touch archives with even just a single top commit 
> that has a generation number in it, unless we'd hide it at the end of the 
> buffer and just uglify things in general).

Repeating myself: for one, I'm rather against any such generation 
headers in the commit object, and so is Dscho.

Why?  Because it doesn't have to live in the commit object at all.

Just like we have a locally managed pack index file, we can have a 
locally managed "index of generations" file just fine.


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10 20:50         ` Linus Torvalds
  2008-02-10 21:04           ` Nicolas Pitre
@ 2008-02-10 22:53           ` Jakub Narebski
  2008-02-10 23:11             ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-02-10 22:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List

On Sat, 10 Feb 2008, Linus Torvalds wrote:
> 
> On Sun, 10 Feb 2008, Jakub Narebski wrote:
> > 
> > P.S. Would having generation + roots be enough?
>
> I'm wavering here. Maybe just "generation" works (the longest path from 
> any root), because what we are looking for is essentially "guarantee that 
> this commit cannot possibly be reached from that other commit", and I 
> guess a simple generation count actually does work for that (if the 
> generation of "x" is smaller than the generation of "y", we definitely 
> cannot reach y from x).

Well, "generation" number alone would work quote well as an exclusion
mechanism; generation + roots would work better, I think.

Lets take for an example the following revision graph:

roots: a    a    a    aA   aA
gen:   1    2    3    4    5

       a----b----c----d----e
                     /
            A----B--/

gen:        1    2
roots:      A    A

For example lone generation number is enough to decide that 'c'
(generation 3) cannot be reached from 'a' (generation 1 < 3), and
that 'c' (generation 3) cannot be reached from 'B' (generation 2 < 3).
Roots allow for easy check that 'B' (gen: 2, roots: A) cannot be
reached from 'c' (roots: a, and A \not\in a), but can be reached
from 'e' (gen: 5 > 2, roots: aA \ni a).

What I don't know if generation number would be enough to avoid
"going to root" or "going to common ancestor" costly case when
calculating excluded commits.

> At the same time, I'm still not really convinced we need to add the 
> redundant info. I do think I *should* have designed it that way to start 
> with (and I thought so two years ago - blaah), so the strongest reason for 
> "we should add generation numbers" at least for me is that I actually 
> think it's a GoodThing(tm) to have.

While this information can be calculated from revision graph it is
I think costly enough that it truly would be better to have it in
commit object.

Well, we could always start using core.repositoryFormatVersion ;-)

> But adding it is a pretty invasive thing, and would force people to 
> upgrade (it really isn't backwards compatible - old versions of git would 
> immediately refuse to touch archives with even just a single top commit 
> that has a generation number in it, unless we'd hide it at the end of the 
> buffer and just uglify things in general).

Well, we could always add it as a local (per repository) "cache".
With only generation numbers we could use pack-index-like format
to store a mapping "commit sha-1 => generation number", just like
now pack index stores mapping "object sha-1 => offset in pack".

If we want to store also roots, we could either map 
"commit sha-1 => generation number, roots set offset / id" (constant
length value)[*1*], or have gen-*.gen file with generation numbers
and roots, and gen-*.idx as index to that file.


[*1*] If I understand math correctly it would limit us in theory to
up to 64 roots (git.git has 8 roots IIRC).
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10 22:53           ` Jakub Narebski
@ 2008-02-10 23:11             ` Linus Torvalds
  2008-02-11  1:24               ` Jakub Narebski
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10 23:11 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List



On Sun, 10 Feb 2008, Jakub Narebski wrote:
> 
> Well, we could always add it as a local (per repository) "cache".
> With only generation numbers we could use pack-index-like format
> to store a mapping "commit sha-1 => generation number", just like
> now pack index stores mapping "object sha-1 => offset in pack".

Yes, and even embedding this in the pack-file (or perhaps, the index, as a 
new index extension) is probably a good idea.

> If we want to store also roots, we could either map 
> "commit sha-1 => generation number, roots set offset / id" (constant
> length value)[*1*], or have gen-*.gen file with generation numbers
> and roots, and gen-*.idx as index to that file.

The thing is, the roots are almost never actually interesting. Very seldom 
do you have any issue with different commits having different roots. So I 
really don't think it's worth it, considering that it's also much more 
complicated than just adding a sequence number to each commit.

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-10 23:11             ` Linus Torvalds
@ 2008-02-11  1:24               ` Jakub Narebski
  2008-02-11  1:59                 ` Nicolas Pitre
  2008-02-11 15:59                 ` Linus Torvalds
  0 siblings, 2 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-02-11  1:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List

Linus Torvalds wrote:
> On Sun, 10 Feb 2008, Jakub Narebski wrote:
> > 
> > Well, we could always add it as a local (per repository) "cache".
> > With only generation numbers we could use pack-index-like format
> > to store a mapping "commit sha-1 => generation number", just like
> > now pack index stores mapping "object sha-1 => offset in pack".
> 
> Yes, and even embedding this in the pack-file (or perhaps, the index, as a 
> new index extension) is probably a good idea.

Errr... index is per workarea (per checkout), and this information
is per repository, so IMHO storing this info in an index (dircache)
is a layering violation. Unless you were talking about pack-file-index.

I think that separate file is better idea, this way we can update
generation info lazily: when we went to the root or project whose
all parents have generatrion number, then we save gen numbers to
such "cache" file. If we had to calculate it (even accidentally),
save it.

> > If we want to store also roots, we could either map 
> > "commit sha-1 => generation number, roots set offset / id" (constant
> > length value)[*1*], or have gen-*.gen file with generation numbers
> > and roots, and gen-*.idx as index to that file.
> 
> The thing is, the roots are almost never actually interesting. Very seldom 
> do you have any issue with different commits having different roots. So I 
> really don't think it's worth it, considering that it's also much more 
> complicated than just adding a sequence number to each commit.

Weren't the cases of multiple roots that were difficult? Storing roots
would help with 'hard' (if seldom happening) cases then.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-11  1:24               ` Jakub Narebski
@ 2008-02-11  1:59                 ` Nicolas Pitre
  2008-02-11 15:59                 ` Linus Torvalds
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-11  1:59 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Linus Torvalds, Johannes Schindelin, Junio C Hamano,
	Git Mailing List

On Mon, 11 Feb 2008, Jakub Narebski wrote:

> Linus Torvalds wrote:
> > On Sun, 10 Feb 2008, Jakub Narebski wrote:
> > > 
> > > Well, we could always add it as a local (per repository) "cache".
> > > With only generation numbers we could use pack-index-like format
> > > to store a mapping "commit sha-1 => generation number", just like
> > > now pack index stores mapping "object sha-1 => offset in pack".
> > 
> > Yes, and even embedding this in the pack-file (or perhaps, the index, as a 
> > new index extension) is probably a good idea.
> 
> Errr... index is per workarea (per checkout), and this information
> is per repository, so IMHO storing this info in an index (dircache)
> is a layering violation. Unless you were talking about pack-file-index.

The pack index isn't a good idea since one pack doesn't necessarily 
cover the whole of the repository.  A separate new file is the only 
sensible thing to do, if done at all.


Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-11  1:24               ` Jakub Narebski
  2008-02-11  1:59                 ` Nicolas Pitre
@ 2008-02-11 15:59                 ` Linus Torvalds
  2008-02-11 16:26                   ` Jakub Narebski
  2008-02-11 16:39                   ` Nicolas Pitre
  1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-11 15:59 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List



On Mon, 11 Feb 2008, Jakub Narebski wrote:
> 
> Errr... index is per workarea (per checkout), and this information
> is per repository, so IMHO storing this info in an index (dircache)
> is a layering violation. Unless you were talking about pack-file-index.

I did mean the pack-file index, not the "cache" index.

> Weren't the cases of multiple roots that were difficult? Storing roots
> would help with 'hard' (if seldom happening) cases then.

It's not that they aren't difficult, it's that they are so rare (ie having 
ranges that really *are* separate never happens in practice). So it's not 
worth worrying about from a performance angle.

The thing that worried me about multiple roots was that they make the 
generation numbers essentially "meaningless" when compared across totally 
unrelated commits, and might give incorrect results for generation number 
comparisons as a result.

However, I decided that if two commits really *are* totally unrelated and 
don't share a commit, then:

 - yes, the generation number comparison is "meaningless"

 - BUT: we don't actually care if it's correct or not, because it will 
   never matter: whatever we choose to do, it's correct. Because there are 
   just two choices:

    (a) stop early because everything we have left is uninteresting

        This is correct, because if they have two separate roots, they'll 
        never meet anyway, so from a correctness standpoint they will 
        never change a interesting commit into an uninteresting one.

    (b) continue to the root because we think we might turn something 
        interesting into an uninteresting commit. This may be wasteful 
        (because the generation numbers happened to fool us into thinking 
        we migth care and one is older than the other), but it's still 
        technically _correct_.

        And we really don't need to care about the performance issues 
        since continuing down to the root is what we'd have had to do even 
        without the generation numbers anyway, but more importantly 
        because we simply don't care - if people start doing comparisons 
        across truly independent commits, they are doing something wrong.

So that's why a generation number is sufficient.

And yes, just generating the generation number when repacking is fine. It 
would mean that unpacked objects don't have generation numbers, but of you 
have tons and tons of unpacked objects, you have more serious problems 
anyway!

			Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-11 15:59                 ` Linus Torvalds
@ 2008-02-11 16:26                   ` Jakub Narebski
  2008-02-11 16:39                   ` Nicolas Pitre
  1 sibling, 0 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-02-11 16:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
	Git Mailing List

On Mon, 11 Feb 2008, Linus Torvalds wrote:
> On Mon, 11 Feb 2008, Jakub Narebski wrote:
>> 
>> Errr... index is per workarea (per checkout), and this information
>> is per repository, so IMHO storing this info in an index (dircache)
>> is a layering violation. Unless you were talking about pack-file-index.
> 
> I did mean the pack-file index, not the "cache" index.

> And yes, just generating the generation number when repacking is fine. It 
> would mean that unpacked objects don't have generation numbers, but of you 
> have tons and tons of unpacked objects, you have more serious problems 
> anyway!

With generation number info in pack index, we could generate them
on repack (adding some time to repack).

With generation number info in separate pack-index like file, we
could add generation info whenever during browsing history we get
to root or to commit with generation number, saving generation
numbers in the by-the-way way, generating them lazily.

>> Weren't the cases of multiple roots that were difficult? Storing roots
>> would help with 'hard' (if seldom happening) cases then.

> The thing that worried me about multiple roots was that they make the 
> generation numbers essentially "meaningless" when compared across totally 
> unrelated commits, and might give incorrect results for generation number 
> comparisons as a result.
> 
> However, I decided that if two commits really *are* totally unrelated and 
> don't share a commit, then:
> 
>  - yes, the generation number comparison is "meaningless"
> 
>  - BUT: we don't actually care if it's correct or not, because it will 
>    never matter: whatever we choose to do, it's correct. Because there are 
>    just two choices:
> 
>     (a) stop early because everything we have left is uninteresting
> 
>     (b) continue to the root because we think we might turn something 
>         interesting into an uninteresting commit.

By the way, with generation number we can always limit walk length
to difference between generation numbers, or distance to root if
it is smaller.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add "--show-all" revision walker flag for debugging
  2008-02-11 15:59                 ` Linus Torvalds
  2008-02-11 16:26                   ` Jakub Narebski
@ 2008-02-11 16:39                   ` Nicolas Pitre
  1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-11 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Narebski, Johannes Schindelin, Junio C Hamano,
	Git Mailing List

On Mon, 11 Feb 2008, Linus Torvalds wrote:

> On Mon, 11 Feb 2008, Jakub Narebski wrote:
> > 
> > Errr... index is per workarea (per checkout), and this information
> > is per repository, so IMHO storing this info in an index (dircache)
> > is a layering violation. Unless you were talking about pack-file-index.
> 
> I did mean the pack-file index, not the "cache" index.

I think this has nothing to do with pack index either.  Your repo may 
have one pack, or multiple packs, or even no pack at all.  They may or 
may not contain commit objects, etc.

The generation indexing is OTOH a repo wide issue, completely orthogonal 
to object packing. It has to be centralized in a single global file, or 
possibly one file per branch, and updated whenever new commits are added 
to the repository.  I think the best format is to have a list of 
(generation_nr, commit_sha1) tuples in assending order, so whenever a 
fetch or a commit is performed then updating this file is only a matter 
of appending to it.

And optimizing it even further, only merge commits (or rather merge 
direct parents as well as branch heads) should have their generation 
number cached in a file since the linear commits are trivially deduced 
from there.

And regenerating this generation cache should not take longer than a 
full history walk.

Nicolas

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-02-11 16:39 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
2008-02-09 23:52 ` Linus Torvalds
2008-02-10  4:44   ` Junio C Hamano
2008-02-10  1:12 ` Johannes Schindelin
2008-02-10  1:22   ` Linus Torvalds
2008-02-10  1:29     ` Johannes Schindelin
2008-02-10  4:09     ` Junio C Hamano
2008-02-10  4:21       ` Junio C Hamano
2008-02-10  1:28   ` Nicolas Pitre
2008-02-10  1:30     ` Johannes Schindelin
2008-02-10 20:17       ` Jakub Narebski
2008-02-10 20:50         ` Linus Torvalds
2008-02-10 21:04           ` Nicolas Pitre
2008-02-10 22:53           ` Jakub Narebski
2008-02-10 23:11             ` Linus Torvalds
2008-02-11  1:24               ` Jakub Narebski
2008-02-11  1:59                 ` Nicolas Pitre
2008-02-11 15:59                 ` Linus Torvalds
2008-02-11 16:26                   ` Jakub Narebski
2008-02-11 16:39                   ` Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).