* Add "--show-all" revision walker flag for debugging
@ 2008-02-09 22:02 Linus Torvalds
2008-02-09 23:52 ` Linus Torvalds
2008-02-10 1:12 ` Johannes Schindelin
0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-09 22:02 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
So the kernel merge window for 2.6.25 is closing today (or possibly
tomorrow if I feel like it), and I've already mentally closed it, so I'm
starting to free up some resources to look at the interesting problem with
screwed-up commit dates confusing our commit walker into thinking that
some uninteresting commit isn't actually uninteresting due to not
traversing the uninteresting chain deep enough.
But one of the things I noticed is that it's really not very easy to
visualize the commit walker, because - on purpose - it obvously doesn't
show the uninteresting commits!
So here's a patch that adds a "--show-all" flag to the revision walker,
which will make it show uninteresting commits too, and they'll have a '*'
in front of them (it also fixes a logic error for !verbose_header for
boundary commits - we should show the '-' even if left_right isn't shown).
It also updates 'gitk' to show those negative commits in gray. Whether
that's the right color choice or not, I'll leave to bikeshedders, but it
looks fairly neutral, and shows the differences pretty well.
It actually is interesting even for the cases that git doesn't have any
problems with, ie for the kernel you can do
gitk -d --show-all v2.6.24..
and you see just how far down it has to parse things to see it all. The
use of "-d" is a good idea, since the date-ordered toposort is much better
at showing why it goes deep down (ie the date of some of those commits
after 2.6.24 is much older, because they were merged from trees that
weren't rebased).
So I think this is a useful feature even for non-debugging - just to
visualize what git does internally more.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
Heh. Amusingly, the changes to gitk were much bigger than the trivial
changes to core git routines themselves. The log-tree.c diffstat is a bit
bigger just because of the logic fix (making it more than just another
added "else if ()" statement), while much of the revision.c patch is just
the argument parsing ;)
builtin-rev-list.c | 2 ++
gitk-git/gitk | 15 ++++++++-------
log-tree.c | 12 ++++++++----
revision.c | 9 ++++++++-
revision.h | 1 +
5 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index de80158..5966199 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -60,6 +60,8 @@ static void show_commit(struct commit *commit)
fputs(header_prefix, stdout);
if (commit->object.flags & BOUNDARY)
putchar('-');
+ else if (commit->object.flags & UNINTERESTING)
+ putchar('^');
else if (revs.left_right) {
if (commit->object.flags & SYMMETRIC_LEFT)
putchar('<');
diff --git a/gitk-git/gitk b/gitk-git/gitk
index 5560e4d..90d9b61 100644
--- a/gitk-git/gitk
+++ b/gitk-git/gitk
@@ -240,11 +240,12 @@ proc getcommitlines {fd view} {
set listed 1
if {$j >= 0 && [string match "commit *" $cmit]} {
set ids [string range $cmit 7 [expr {$j - 1}]]
- if {[string match {[-<>]*} $ids]} {
+ if {[string match {[-^<>]*} $ids]} {
switch -- [string index $ids 0] {
"-" {set listed 0}
- "<" {set listed 2}
- ">" {set listed 3}
+ "^" {set listed 2}
+ "<" {set listed 3}
+ ">" {set listed 4}
}
set ids [string range $ids 1 end]
}
@@ -3627,23 +3628,23 @@ proc drawcmittext {id row col} {
global linehtag linentag linedtag selectedline
global canvxmax boldrows boldnamerows fgcolor nullid nullid2
- # listed is 0 for boundary, 1 for normal, 2 for left, 3 for right
+ # listed is 0 for boundary, 1 for normal, 2 for negative, 3 for left, 4 for right
set listed [lindex $commitlisted $row]
if {$id eq $nullid} {
set ofill red
} elseif {$id eq $nullid2} {
set ofill green
} else {
- set ofill [expr {$listed != 0? "blue": "white"}]
+ set ofill [expr {$listed != 0 ? $listed == 2 ? "gray" : "blue" : "white"}]
}
set x [xc $row $col]
set y [yc $row]
set orad [expr {$linespc / 3}]
- if {$listed <= 1} {
+ if {$listed <= 2} {
set t [$canv create oval [expr {$x - $orad}] [expr {$y - $orad}] \
[expr {$x + $orad - 1}] [expr {$y + $orad - 1}] \
-fill $ofill -outline $fgcolor -width 1 -tags circle]
- } elseif {$listed == 2} {
+ } elseif {$listed == 3} {
# triangle pointing left for left-side commits
set t [$canv create polygon \
[expr {$x - $orad}] $y \
diff --git a/log-tree.c b/log-tree.c
index 1f3fcf1..dd110ca 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -149,10 +149,12 @@ void show_log(struct rev_info *opt, const char *sep)
opt->loginfo = NULL;
if (!opt->verbose_header) {
- if (opt->left_right) {
- if (commit->object.flags & BOUNDARY)
- putchar('-');
- else if (commit->object.flags & SYMMETRIC_LEFT)
+ if (commit->object.flags & BOUNDARY)
+ putchar('-');
+ else if (commit->object.flags & UNINTERESTING)
+ putchar('^');
+ else if (opt->left_right) {
+ if (commit->object.flags & SYMMETRIC_LEFT)
putchar('<');
else
putchar('>');
@@ -250,6 +252,8 @@ void show_log(struct rev_info *opt, const char *sep)
fputs("commit ", stdout);
if (commit->object.flags & BOUNDARY)
putchar('-');
+ else if (commit->object.flags & UNINTERESTING)
+ putchar('^');
else if (opt->left_right) {
if (commit->object.flags & SYMMETRIC_LEFT)
putchar('<');
diff --git a/revision.c b/revision.c
index 6e85aaa..158727c 100644
--- a/revision.c
+++ b/revision.c
@@ -581,7 +581,8 @@ static int limit_list(struct rev_info *revs)
mark_parents_uninteresting(commit);
if (everybody_uninteresting(list))
break;
- continue;
+ if (!revs->show_all)
+ continue;
}
if (revs->min_age != -1 && (commit->date > revs->min_age))
continue;
@@ -1055,6 +1056,10 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
revs->dense = 0;
continue;
}
+ if (!strcmp(arg, "--show-all")) {
+ revs->show_all = 1;
+ continue;
+ }
if (!strcmp(arg, "--remove-empty")) {
revs->remove_empty_trees = 1;
continue;
@@ -1438,6 +1443,8 @@ enum commit_action simplify_commit(struct rev_info *revs, struct commit *commit)
return commit_ignore;
if (revs->unpacked && has_sha1_pack(commit->object.sha1, revs->ignore_packed))
return commit_ignore;
+ if (revs->show_all)
+ return commit_show;
if (commit->object.flags & UNINTERESTING)
return commit_ignore;
if (revs->min_age != -1 && (commit->date > revs->min_age))
diff --git a/revision.h b/revision.h
index 8572315..b5f01f8 100644
--- a/revision.h
+++ b/revision.h
@@ -33,6 +33,7 @@ struct rev_info {
prune:1,
no_merges:1,
no_walk:1,
+ show_all:1,
remove_empty_trees:1,
simplify_history:1,
lifo:1,
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
@ 2008-02-09 23:52 ` Linus Torvalds
2008-02-10 4:44 ` Junio C Hamano
2008-02-10 1:12 ` Johannes Schindelin
1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-02-09 23:52 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
On Sat, 9 Feb 2008, Linus Torvalds wrote:
>
> So here's a patch that adds a "--show-all" flag to the revision walker,
> which will make it show uninteresting commits too, and they'll have a '*'
> in front of them (it also fixes a logic error for !verbose_header for
> boundary commits - we should show the '-' even if left_right isn't shown).
And here's a slight expansion patch on top of the above
It does:
- when it actually breaks out due to the "everybody_uninteresting()"
case, it adds the uninteresting commits (both the one it's looking at
now, and the list of pending ones) to the list
This way, we really list *all* the commits we've looked at
- Because we now end up listing commits we may not even have been parsed
at all "show_log" and "show_commit" need to protect against commits
that don't have a commit buffer entry.
That second part is debatable just how it should work. Maybe we shouldn't
show such entries at all (with this patch those entries do get shown, they
just don't get any message shown with them). But I think this is a useful
case.
Linus
---
builtin-rev-list.c | 2 +-
log-tree.c | 3 +++
revision.c | 11 ++++++++++-
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 5966199..7163116 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -86,7 +86,7 @@ static void show_commit(struct commit *commit)
else
putchar('\n');
- if (revs.verbose_header) {
+ if (revs.verbose_header && commit->buffer) {
struct strbuf buf;
strbuf_init(&buf, 0);
pretty_print_commit(revs.commit_format, commit,
diff --git a/log-tree.c b/log-tree.c
index dd110ca..e9ba6df 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -282,6 +282,9 @@ void show_log(struct rev_info *opt, const char *sep)
}
}
+ if (!commit->buffer)
+ return;
+
/*
* And then the pretty-printed message itself
*/
diff --git a/revision.c b/revision.c
index 158727c..76faf4b 100644
--- a/revision.c
+++ b/revision.c
@@ -558,6 +558,12 @@ static void cherry_pick_list(struct commit_list *list, struct rev_info *revs)
free_patch_ids(&ids);
}
+static void add_to_list(struct commit_list **p, struct commit *commit, struct commit_list *n)
+{
+ p = &commit_list_insert(commit, p)->next;
+ *p = n;
+}
+
static int limit_list(struct rev_info *revs)
{
struct commit_list *list = revs->commits;
@@ -579,8 +585,11 @@ static int limit_list(struct rev_info *revs)
return -1;
if (obj->flags & UNINTERESTING) {
mark_parents_uninteresting(commit);
- if (everybody_uninteresting(list))
+ if (everybody_uninteresting(list)) {
+ if (revs->show_all)
+ add_to_list(p, commit, list);
break;
+ }
if (!revs->show_all)
continue;
}
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-09 23:52 ` Linus Torvalds
@ 2008-02-10 4:44 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10 4:44 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> And here's a slight expansion patch on top of the above
>
> It does:
>
> - when it actually breaks out due to the "everybody_uninteresting()"
> case, it adds the uninteresting commits (both the one it's looking at
> now, and the list of pending ones) to the list
>
> This way, we really list *all* the commits we've looked at
>
> - Because we now end up listing commits we may not even have been parsed
> at all "show_log" and "show_commit" need to protect against commits
> that don't have a commit buffer entry.
>
> That second part is debatable just how it should work. Maybe we shouldn't
> show such entries at all (with this patch those entries do get shown, they
> just don't get any message shown with them). But I think this is a useful
> case.
Showing them would probably be useful (definitely for
debugging), but I do not think it is useful not to show the
buffer contents (i.e. adding without parsing).
It _might_ mean that the excess uninteresting ones we take out
from "list" can be distinguished from the ones that we did look
at (and once thought they were worth keeping), but I do not
think it is reliable information. The command line parser may
have caused them to be parsed independent from what the revision
traversal did.
So I think it would make more sense to parse them while we add
them to the "newlist", and if we really want to distinguish them
from others we would want to mark them separately with another
flag bit.
Or unparse them when we add them to the "newlist".
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
2008-02-09 23:52 ` Linus Torvalds
@ 2008-02-10 1:12 ` Johannes Schindelin
2008-02-10 1:22 ` Linus Torvalds
2008-02-10 1:28 ` Nicolas Pitre
1 sibling, 2 replies; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10 1:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Sat, 9 Feb 2008, Linus Torvalds wrote:
> I'm starting to free up some resources to look at the interesting
> problem with screwed-up commit dates confusing our commit walker into
> thinking that some uninteresting commit isn't actually uninteresting due
> to not traversing the uninteresting chain deep enough.
I was thinking the other night why I did not like the generation header.
And I found out why: it is redundant information.
So why not have some local "cache" which maintains the generation numbers
for the commits? (Much like the "notes" cache I showed last year?)
Hmm?
Ciao,
Dscho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:12 ` Johannes Schindelin
@ 2008-02-10 1:22 ` Linus Torvalds
2008-02-10 1:29 ` Johannes Schindelin
2008-02-10 4:09 ` Junio C Hamano
2008-02-10 1:28 ` Nicolas Pitre
1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10 1:22 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Junio C Hamano, Git Mailing List
On Sun, 10 Feb 2008, Johannes Schindelin wrote:
>
> I was thinking the other night why I did not like the generation header.
> And I found out why: it is redundant information.
Actually, that's not the real issue.
The real issue is that it doesn't work. I thought about it, and with
multiple roots (which _can_ get merged together) it just isn't something
that actually helps.
If you couldn't merge across roots, you could have a "uuid+generation
header", but the moment you have multiple roots it actually gets quite
complex.
So scratch the generation header. It's not the answer.
And I do think that we can do it without it. I'm still thinking about how
to do it efficiently, but I think I can get there. I just haven't had the
time to sit down and really think it through and try out my ideas yet.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:22 ` Linus Torvalds
@ 2008-02-10 1:29 ` Johannes Schindelin
2008-02-10 4:09 ` Junio C Hamano
1 sibling, 0 replies; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10 1:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
Hi,
On Sat, 9 Feb 2008, Linus Torvalds wrote:
> On Sun, 10 Feb 2008, Johannes Schindelin wrote:
> >
> > I was thinking the other night why I did not like the generation
> > header. And I found out why: it is redundant information.
>
> Actually, that's not the real issue.
>
> The real issue is that it doesn't work. I thought about it, and with
> multiple roots (which _can_ get merged together) it just isn't something
> that actually helps.
I don't see it. If G(c) >= 1 + max(G(c^) | c^ is a parent of c), then it
_should_ work. And then, I think it is _much_ better to just cache it
locally (if you care enough about it, i.e. if you set a certain config
option that triggers possibly expensive one-shot caching) than have it in
the commit object (where the information could be just plain wrong).
Ciao,
Dscho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:22 ` Linus Torvalds
2008-02-10 1:29 ` Johannes Schindelin
@ 2008-02-10 4:09 ` Junio C Hamano
2008-02-10 4:21 ` Junio C Hamano
1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10 4:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> The real issue is that it doesn't work. I thought about it, and with
> multiple roots (which _can_ get merged together) it just isn't something
> that actually helps.
>
> If you couldn't merge across roots, you could have a "uuid+generation
> header", but the moment you have multiple roots it actually gets quite
> complex.
>
> So scratch the generation header. It's not the answer.
I do not think multiple roots can be helped without going all
the way down to the roots, and I think it can be proven.
> And I do think that we can do it without it. I'm still thinking about how
> to do it efficiently, but I think I can get there. I just haven't had the
> time to sit down and really think it through and try out my ideas yet.
Generation number would on the other hand eliminate the issue of
clock skew if heads are connected, and I think it also can be
proven.
Both proofs need the axiom "there is no loop in the commit
ancestry graph", but that is something we already rely on in the
merge-base computation.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 4:09 ` Junio C Hamano
@ 2008-02-10 4:21 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2008-02-10 4:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Johannes Schindelin, Git Mailing List
Junio C Hamano <gitster@pobox.com> writes:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> The real issue is that it doesn't work. I thought about it, and with
>> multiple roots (which _can_ get merged together) it just isn't something
>> that actually helps.
>>
>> If you couldn't merge across roots, you could have a "uuid+generation
>> header", but the moment you have multiple roots it actually gets quite
>> complex.
>>
>> So scratch the generation header. It's not the answer.
>
> I do not think multiple roots can be helped without going all
> the way down to the roots, and I think it can be proven.
Ah, please scratch that comment. I was puzzled but did not
understand what you meant by uuid+generation.
You are right. If we cannot tell if we are dealing with
disconnected history, we would always need to play safe and do
the clean-up like what I suggested earlier using merge-base
traversal, which could be costly, and we do not want to pay that
penalty in a connected history, which should be the normal case.
Generation header does not help us detect that case, so it is
not useful.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:12 ` Johannes Schindelin
2008-02-10 1:22 ` Linus Torvalds
@ 2008-02-10 1:28 ` Nicolas Pitre
2008-02-10 1:30 ` Johannes Schindelin
1 sibling, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-10 1:28 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List
On Sun, 10 Feb 2008, Johannes Schindelin wrote:
> Hi,
>
> On Sat, 9 Feb 2008, Linus Torvalds wrote:
>
> > I'm starting to free up some resources to look at the interesting
> > problem with screwed-up commit dates confusing our commit walker into
> > thinking that some uninteresting commit isn't actually uninteresting due
> > to not traversing the uninteresting chain deep enough.
>
> I was thinking the other night why I did not like the generation header.
> And I found out why: it is redundant information.
>
> So why not have some local "cache" which maintains the generation numbers
> for the commits? (Much like the "notes" cache I showed last year?)
Absolutely.
I, too, have some reservations about adding any kind of generation
header to commit objects. First because it can be generated and
maintained locally, just like the pack index. But also because its
usefulness has not been proven in all possible graph topologies, and
adding it to the commit header pretty much deny any further
modifications/improvements on it, if for example some other kind of
generation notation becomes advantageous to use.
So please don't put it into the commit object format. The object
database should ultimately contain only data that cannot be regenerated.
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:28 ` Nicolas Pitre
@ 2008-02-10 1:30 ` Johannes Schindelin
2008-02-10 20:17 ` Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: Johannes Schindelin @ 2008-02-10 1:30 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List
Hi,
On Sat, 9 Feb 2008, Nicolas Pitre wrote:
> I, too, have some reservations about adding any kind of generation
> header to commit objects. First because it can be generated and
> maintained locally, just like the pack index. But also because its
> usefulness has not been proven in all possible graph topologies, and
> adding it to the commit header pretty much deny any further
> modifications/improvements on it, if for example some other kind of
> generation notation becomes advantageous to use.
I fully agree.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 1:30 ` Johannes Schindelin
@ 2008-02-10 20:17 ` Jakub Narebski
2008-02-10 20:50 ` Linus Torvalds
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-02-10 20:17 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Nicolas Pitre, Linus Torvalds, Junio C Hamano, Git Mailing List
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Sat, 9 Feb 2008, Nicolas Pitre wrote:
>
> > I, too, have some reservations about adding any kind of generation
> > header to commit objects. First because it can be generated and
> > maintained locally, just like the pack index. But also because its
> > usefulness has not been proven in all possible graph topologies, and
> > adding it to the commit header pretty much deny any further
> > modifications/improvements on it, if for example some other kind of
> > generation notation becomes advantageous to use.
>
> I fully agree.
I think I'd agree, also because without 'generation' (and 'roots')
commit object header being there from beginning it is not clear when
to calculate it: the first few commits with it would be costly.
Besides graft and shallow information is local, and it affect commit
traversal.
I was thinking about pack-index like file, either directly mapping
"sha1 -> generation + roots", or indirectly "sha1 -> offset",
"offset -> generation + roots".
P.S. Would having generation + roots be enough?
gen(rev) = max_i { gen(i) + 1 : i in parents(rev) } || 1
roots(rev) = { rev if rev is a root (parentless) commit
{ union of roots of parents of a commit otherwise
Union in the sense of set sum.
Note that if roots was to be saved in commit header, then it couldn't
be as simple as commit-id for root commits: no self links. It would
have to be empty for root commits, and the "union of roots" would have
to be modified to "union of roots or commit ids for root commits, of
each of parents of a commit".
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 20:17 ` Jakub Narebski
@ 2008-02-10 20:50 ` Linus Torvalds
2008-02-10 21:04 ` Nicolas Pitre
2008-02-10 22:53 ` Jakub Narebski
0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10 20:50 UTC (permalink / raw)
To: Jakub Narebski
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
On Sun, 10 Feb 2008, Jakub Narebski wrote:
>
> P.S. Would having generation + roots be enough?
I'm wavering here. Maybe just "generation" works (the longest path from
any root), because what we are looking for is essentially "guarantee that
this commit cannot possibly be reached from that other commit", and I
guess a simple generation count actually does work for that (if the
generation of "x" is smaller than the generation of "y", we definitely
cannot reach y from x).
At the same time, I'm still not really convinced we need to add the
redundant info. I do think I *should* have designed it that way to start
with (and I thought so two years ago - blaah), so the strongest reason for
"we should add generation numbers" at least for me is that I actually
think it's a GoodThing(tm) to have.
But adding it is a pretty invasive thing, and would force people to
upgrade (it really isn't backwards compatible - old versions of git would
immediately refuse to touch archives with even just a single top commit
that has a generation number in it, unless we'd hide it at the end of the
buffer and just uglify things in general).
So even if it does work, the advantage isn't big enough for me to think
it's really worth it. I'd *really* prefer to not have a flag day here, and
the fact is, it's redundant information _and_ it fundamentally doesn't
work with grafting anyway, so even if we were able to start from scratch,
we'd then have to seriously consider dropping grafts.
So let's work at not needing it, rather than argue whether we could use it
or not ;)
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 20:50 ` Linus Torvalds
@ 2008-02-10 21:04 ` Nicolas Pitre
2008-02-10 22:53 ` Jakub Narebski
1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-10 21:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jakub Narebski, Johannes Schindelin, Junio C Hamano,
Git Mailing List
On Sun, 10 Feb 2008, Linus Torvalds wrote:
> At the same time, I'm still not really convinced we need to add the
> redundant info. I do think I *should* have designed it that way to start
> with (and I thought so two years ago - blaah), so the strongest reason for
> "we should add generation numbers" at least for me is that I actually
> think it's a GoodThing(tm) to have.
>
> But adding it is a pretty invasive thing, and would force people to
> upgrade (it really isn't backwards compatible - old versions of git would
> immediately refuse to touch archives with even just a single top commit
> that has a generation number in it, unless we'd hide it at the end of the
> buffer and just uglify things in general).
Repeating myself: for one, I'm rather against any such generation
headers in the commit object, and so is Dscho.
Why? Because it doesn't have to live in the commit object at all.
Just like we have a locally managed pack index file, we can have a
locally managed "index of generations" file just fine.
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 20:50 ` Linus Torvalds
2008-02-10 21:04 ` Nicolas Pitre
@ 2008-02-10 22:53 ` Jakub Narebski
2008-02-10 23:11 ` Linus Torvalds
1 sibling, 1 reply; 20+ messages in thread
From: Jakub Narebski @ 2008-02-10 22:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
On Sat, 10 Feb 2008, Linus Torvalds wrote:
>
> On Sun, 10 Feb 2008, Jakub Narebski wrote:
> >
> > P.S. Would having generation + roots be enough?
>
> I'm wavering here. Maybe just "generation" works (the longest path from
> any root), because what we are looking for is essentially "guarantee that
> this commit cannot possibly be reached from that other commit", and I
> guess a simple generation count actually does work for that (if the
> generation of "x" is smaller than the generation of "y", we definitely
> cannot reach y from x).
Well, "generation" number alone would work quote well as an exclusion
mechanism; generation + roots would work better, I think.
Lets take for an example the following revision graph:
roots: a a a aA aA
gen: 1 2 3 4 5
a----b----c----d----e
/
A----B--/
gen: 1 2
roots: A A
For example lone generation number is enough to decide that 'c'
(generation 3) cannot be reached from 'a' (generation 1 < 3), and
that 'c' (generation 3) cannot be reached from 'B' (generation 2 < 3).
Roots allow for easy check that 'B' (gen: 2, roots: A) cannot be
reached from 'c' (roots: a, and A \not\in a), but can be reached
from 'e' (gen: 5 > 2, roots: aA \ni a).
What I don't know if generation number would be enough to avoid
"going to root" or "going to common ancestor" costly case when
calculating excluded commits.
> At the same time, I'm still not really convinced we need to add the
> redundant info. I do think I *should* have designed it that way to start
> with (and I thought so two years ago - blaah), so the strongest reason for
> "we should add generation numbers" at least for me is that I actually
> think it's a GoodThing(tm) to have.
While this information can be calculated from revision graph it is
I think costly enough that it truly would be better to have it in
commit object.
Well, we could always start using core.repositoryFormatVersion ;-)
> But adding it is a pretty invasive thing, and would force people to
> upgrade (it really isn't backwards compatible - old versions of git would
> immediately refuse to touch archives with even just a single top commit
> that has a generation number in it, unless we'd hide it at the end of the
> buffer and just uglify things in general).
Well, we could always add it as a local (per repository) "cache".
With only generation numbers we could use pack-index-like format
to store a mapping "commit sha-1 => generation number", just like
now pack index stores mapping "object sha-1 => offset in pack".
If we want to store also roots, we could either map
"commit sha-1 => generation number, roots set offset / id" (constant
length value)[*1*], or have gen-*.gen file with generation numbers
and roots, and gen-*.idx as index to that file.
[*1*] If I understand math correctly it would limit us in theory to
up to 64 roots (git.git has 8 roots IIRC).
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 22:53 ` Jakub Narebski
@ 2008-02-10 23:11 ` Linus Torvalds
2008-02-11 1:24 ` Jakub Narebski
0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2008-02-10 23:11 UTC (permalink / raw)
To: Jakub Narebski
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
On Sun, 10 Feb 2008, Jakub Narebski wrote:
>
> Well, we could always add it as a local (per repository) "cache".
> With only generation numbers we could use pack-index-like format
> to store a mapping "commit sha-1 => generation number", just like
> now pack index stores mapping "object sha-1 => offset in pack".
Yes, and even embedding this in the pack-file (or perhaps, the index, as a
new index extension) is probably a good idea.
> If we want to store also roots, we could either map
> "commit sha-1 => generation number, roots set offset / id" (constant
> length value)[*1*], or have gen-*.gen file with generation numbers
> and roots, and gen-*.idx as index to that file.
The thing is, the roots are almost never actually interesting. Very seldom
do you have any issue with different commits having different roots. So I
really don't think it's worth it, considering that it's also much more
complicated than just adding a sequence number to each commit.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-10 23:11 ` Linus Torvalds
@ 2008-02-11 1:24 ` Jakub Narebski
2008-02-11 1:59 ` Nicolas Pitre
2008-02-11 15:59 ` Linus Torvalds
0 siblings, 2 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-02-11 1:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
Linus Torvalds wrote:
> On Sun, 10 Feb 2008, Jakub Narebski wrote:
> >
> > Well, we could always add it as a local (per repository) "cache".
> > With only generation numbers we could use pack-index-like format
> > to store a mapping "commit sha-1 => generation number", just like
> > now pack index stores mapping "object sha-1 => offset in pack".
>
> Yes, and even embedding this in the pack-file (or perhaps, the index, as a
> new index extension) is probably a good idea.
Errr... index is per workarea (per checkout), and this information
is per repository, so IMHO storing this info in an index (dircache)
is a layering violation. Unless you were talking about pack-file-index.
I think that separate file is better idea, this way we can update
generation info lazily: when we went to the root or project whose
all parents have generatrion number, then we save gen numbers to
such "cache" file. If we had to calculate it (even accidentally),
save it.
> > If we want to store also roots, we could either map
> > "commit sha-1 => generation number, roots set offset / id" (constant
> > length value)[*1*], or have gen-*.gen file with generation numbers
> > and roots, and gen-*.idx as index to that file.
>
> The thing is, the roots are almost never actually interesting. Very seldom
> do you have any issue with different commits having different roots. So I
> really don't think it's worth it, considering that it's also much more
> complicated than just adding a sequence number to each commit.
Weren't the cases of multiple roots that were difficult? Storing roots
would help with 'hard' (if seldom happening) cases then.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-11 1:24 ` Jakub Narebski
@ 2008-02-11 1:59 ` Nicolas Pitre
2008-02-11 15:59 ` Linus Torvalds
1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-11 1:59 UTC (permalink / raw)
To: Jakub Narebski
Cc: Linus Torvalds, Johannes Schindelin, Junio C Hamano,
Git Mailing List
On Mon, 11 Feb 2008, Jakub Narebski wrote:
> Linus Torvalds wrote:
> > On Sun, 10 Feb 2008, Jakub Narebski wrote:
> > >
> > > Well, we could always add it as a local (per repository) "cache".
> > > With only generation numbers we could use pack-index-like format
> > > to store a mapping "commit sha-1 => generation number", just like
> > > now pack index stores mapping "object sha-1 => offset in pack".
> >
> > Yes, and even embedding this in the pack-file (or perhaps, the index, as a
> > new index extension) is probably a good idea.
>
> Errr... index is per workarea (per checkout), and this information
> is per repository, so IMHO storing this info in an index (dircache)
> is a layering violation. Unless you were talking about pack-file-index.
The pack index isn't a good idea since one pack doesn't necessarily
cover the whole of the repository. A separate new file is the only
sensible thing to do, if done at all.
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-11 1:24 ` Jakub Narebski
2008-02-11 1:59 ` Nicolas Pitre
@ 2008-02-11 15:59 ` Linus Torvalds
2008-02-11 16:26 ` Jakub Narebski
2008-02-11 16:39 ` Nicolas Pitre
1 sibling, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2008-02-11 15:59 UTC (permalink / raw)
To: Jakub Narebski
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
On Mon, 11 Feb 2008, Jakub Narebski wrote:
>
> Errr... index is per workarea (per checkout), and this information
> is per repository, so IMHO storing this info in an index (dircache)
> is a layering violation. Unless you were talking about pack-file-index.
I did mean the pack-file index, not the "cache" index.
> Weren't the cases of multiple roots that were difficult? Storing roots
> would help with 'hard' (if seldom happening) cases then.
It's not that they aren't difficult, it's that they are so rare (ie having
ranges that really *are* separate never happens in practice). So it's not
worth worrying about from a performance angle.
The thing that worried me about multiple roots was that they make the
generation numbers essentially "meaningless" when compared across totally
unrelated commits, and might give incorrect results for generation number
comparisons as a result.
However, I decided that if two commits really *are* totally unrelated and
don't share a commit, then:
- yes, the generation number comparison is "meaningless"
- BUT: we don't actually care if it's correct or not, because it will
never matter: whatever we choose to do, it's correct. Because there are
just two choices:
(a) stop early because everything we have left is uninteresting
This is correct, because if they have two separate roots, they'll
never meet anyway, so from a correctness standpoint they will
never change a interesting commit into an uninteresting one.
(b) continue to the root because we think we might turn something
interesting into an uninteresting commit. This may be wasteful
(because the generation numbers happened to fool us into thinking
we migth care and one is older than the other), but it's still
technically _correct_.
And we really don't need to care about the performance issues
since continuing down to the root is what we'd have had to do even
without the generation numbers anyway, but more importantly
because we simply don't care - if people start doing comparisons
across truly independent commits, they are doing something wrong.
So that's why a generation number is sufficient.
And yes, just generating the generation number when repacking is fine. It
would mean that unpacked objects don't have generation numbers, but of you
have tons and tons of unpacked objects, you have more serious problems
anyway!
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-11 15:59 ` Linus Torvalds
@ 2008-02-11 16:26 ` Jakub Narebski
2008-02-11 16:39 ` Nicolas Pitre
1 sibling, 0 replies; 20+ messages in thread
From: Jakub Narebski @ 2008-02-11 16:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Johannes Schindelin, Nicolas Pitre, Junio C Hamano,
Git Mailing List
On Mon, 11 Feb 2008, Linus Torvalds wrote:
> On Mon, 11 Feb 2008, Jakub Narebski wrote:
>>
>> Errr... index is per workarea (per checkout), and this information
>> is per repository, so IMHO storing this info in an index (dircache)
>> is a layering violation. Unless you were talking about pack-file-index.
>
> I did mean the pack-file index, not the "cache" index.
> And yes, just generating the generation number when repacking is fine. It
> would mean that unpacked objects don't have generation numbers, but of you
> have tons and tons of unpacked objects, you have more serious problems
> anyway!
With generation number info in pack index, we could generate them
on repack (adding some time to repack).
With generation number info in separate pack-index like file, we
could add generation info whenever during browsing history we get
to root or to commit with generation number, saving generation
numbers in the by-the-way way, generating them lazily.
>> Weren't the cases of multiple roots that were difficult? Storing roots
>> would help with 'hard' (if seldom happening) cases then.
> The thing that worried me about multiple roots was that they make the
> generation numbers essentially "meaningless" when compared across totally
> unrelated commits, and might give incorrect results for generation number
> comparisons as a result.
>
> However, I decided that if two commits really *are* totally unrelated and
> don't share a commit, then:
>
> - yes, the generation number comparison is "meaningless"
>
> - BUT: we don't actually care if it's correct or not, because it will
> never matter: whatever we choose to do, it's correct. Because there are
> just two choices:
>
> (a) stop early because everything we have left is uninteresting
>
> (b) continue to the root because we think we might turn something
> interesting into an uninteresting commit.
By the way, with generation number we can always limit walk length
to difference between generation numbers, or distance to root if
it is smaller.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Add "--show-all" revision walker flag for debugging
2008-02-11 15:59 ` Linus Torvalds
2008-02-11 16:26 ` Jakub Narebski
@ 2008-02-11 16:39 ` Nicolas Pitre
1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2008-02-11 16:39 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jakub Narebski, Johannes Schindelin, Junio C Hamano,
Git Mailing List
On Mon, 11 Feb 2008, Linus Torvalds wrote:
> On Mon, 11 Feb 2008, Jakub Narebski wrote:
> >
> > Errr... index is per workarea (per checkout), and this information
> > is per repository, so IMHO storing this info in an index (dircache)
> > is a layering violation. Unless you were talking about pack-file-index.
>
> I did mean the pack-file index, not the "cache" index.
I think this has nothing to do with pack index either. Your repo may
have one pack, or multiple packs, or even no pack at all. They may or
may not contain commit objects, etc.
The generation indexing is OTOH a repo wide issue, completely orthogonal
to object packing. It has to be centralized in a single global file, or
possibly one file per branch, and updated whenever new commits are added
to the repository. I think the best format is to have a list of
(generation_nr, commit_sha1) tuples in assending order, so whenever a
fetch or a commit is performed then updating this file is only a matter
of appending to it.
And optimizing it even further, only merge commits (or rather merge
direct parents as well as branch heads) should have their generation
number cached in a file since the linear commits are trivially deduced
from there.
And regenerating this generation cache should not take longer than a
full history walk.
Nicolas
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2008-02-11 16:39 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-09 22:02 Add "--show-all" revision walker flag for debugging Linus Torvalds
2008-02-09 23:52 ` Linus Torvalds
2008-02-10 4:44 ` Junio C Hamano
2008-02-10 1:12 ` Johannes Schindelin
2008-02-10 1:22 ` Linus Torvalds
2008-02-10 1:29 ` Johannes Schindelin
2008-02-10 4:09 ` Junio C Hamano
2008-02-10 4:21 ` Junio C Hamano
2008-02-10 1:28 ` Nicolas Pitre
2008-02-10 1:30 ` Johannes Schindelin
2008-02-10 20:17 ` Jakub Narebski
2008-02-10 20:50 ` Linus Torvalds
2008-02-10 21:04 ` Nicolas Pitre
2008-02-10 22:53 ` Jakub Narebski
2008-02-10 23:11 ` Linus Torvalds
2008-02-11 1:24 ` Jakub Narebski
2008-02-11 1:59 ` Nicolas Pitre
2008-02-11 15:59 ` Linus Torvalds
2008-02-11 16:26 ` Jakub Narebski
2008-02-11 16:39 ` Nicolas Pitre
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).