* [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code @ 2006-03-31 0:52 Linus Torvalds 2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 0:52 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List Move "--parent" parsing into generic revision.c library code Not only do we do it in both rev-list.c and git.c, the revision walking code will soon want to know whether we should rewrite parenthood information or not. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- This is the trivial part. The next email will contain the real meat of the change, which starts doing path limiting incrementally, and makes doing things like git log drivers/ on the kernel a hell of a lot more pleasant, because it starts outputting the log immediately instead of pausing for about 20 seconds before it has parsed all of the history and then outputting it all in one go. There's a HUGE difference to the "feel" of doing pathname limiting git-rev-list options. Not very well tested, but at least this initial preparatory patch is "obviously safe". diff --git a/git.c b/git.c index 0b40e30..72039c6 100644 --- a/git.c +++ b/git.c @@ -283,7 +283,6 @@ static int cmd_log(int argc, const char char *buf = xmalloc(LOGSIZE); static enum cmit_fmt commit_format = CMIT_FMT_DEFAULT; int abbrev = DEFAULT_ABBREV; - int show_parents = 0; const char *commit_prefix = "commit "; argc = setup_revisions(argc, argv, &rev, "HEAD"); @@ -294,9 +293,6 @@ static int cmd_log(int argc, const char if (commit_format == CMIT_FMT_ONELINE) commit_prefix = ""; } - else if (!strcmp(arg, "--parents")) { - show_parents = 1; - } else if (!strcmp(arg, "--no-abbrev")) { abbrev = 0; } @@ -317,7 +313,7 @@ static int cmd_log(int argc, const char while ((commit = get_revision(&rev)) != NULL) { printf("%s%s", commit_prefix, sha1_to_hex(commit->object.sha1)); - if (show_parents) { + if (rev.parents) { struct commit_list *parents = commit->parents; while (parents) { struct object *o = &(parents->item->object); diff --git a/rev-list.c b/rev-list.c index f3a989c..19a547a 100644 --- a/rev-list.c +++ b/rev-list.c @@ -39,7 +39,6 @@ struct rev_info revs; static int bisect_list = 0; static int verbose_header = 0; static int abbrev = DEFAULT_ABBREV; -static int show_parents = 0; static int show_timestamp = 0; static int hdr_termination = 0; static const char *commit_prefix = ""; @@ -54,7 +53,7 @@ static void show_commit(struct commit *c if (commit->object.flags & BOUNDARY) putchar('-'); fputs(sha1_to_hex(commit->object.sha1), stdout); - if (show_parents) { + if (revs.parents) { struct commit_list *parents = commit->parents; while (parents) { struct object *o = &(parents->item->object); @@ -336,10 +335,6 @@ int main(int argc, const char **argv) commit_prefix = ""; else commit_prefix = "commit "; - continue; - } - if (!strcmp(arg, "--parents")) { - show_parents = 1; continue; } if (!strcmp(arg, "--timestamp")) { diff --git a/revision.c b/revision.c index abc8745..0330f9f 100644 --- a/revision.c +++ b/revision.c @@ -596,6 +596,10 @@ int setup_revisions(int argc, const char revs->limited = 1; continue; } + if (!strcmp(arg, "--parents")) { + revs->parents = 1; + continue; + } if (!strcmp(arg, "--dense")) { revs->dense = 1; continue; diff --git a/revision.h b/revision.h index 61e6bc9..0caeecf 100644 --- a/revision.h +++ b/revision.h @@ -34,7 +34,8 @@ struct rev_info { edge_hint:1, limited:1, unpacked:1, - boundary:1; + boundary:1, + parents:1; /* special limits */ int max_count; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 0:52 [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code Linus Torvalds @ 2006-03-31 1:05 ` Linus Torvalds 2006-03-31 6:05 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 1:05 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List This makes git-rev-list able to do path-limiting without having to parse all of history before it starts showing the results. This makes it things like "git log -- pathname" much more pleasant to use. This is actually a pretty small patch, and the biggest part of it is purely cleanups (turning the "goto next" statements into "continue"), but it's conceptually a lot bigger than it looks. What it does is that if you do a path-limited revision list, and you do _not_ ask for pseudo-parenthood information, it won't do all the path-limiting up-front, but instead do it incrementally in "get_revision()". This is an absolutely huge deal for anything like "git log -- <pathname>", but also for some things that we don't do yet - like the "find where things changed" logic I've described elsewhere, where we want to find the previous revision that changed a file. The reason I put "RFC" in the subject line is that while I've validated it various ways, like doing git-rev-list HEAD -- drivers/char/ | md5sum before-and-after on the kernel archive, it's "git-rev-list" after all. In other words, it's that really really subtle and complex central piece of software. So while I think this is important and should go in asap, I also think it should get lots of testing and eyeballs looking at the code. Btw, don't even bother testing this with the git archive. git itself is so small that parsing the whole revision history for it takes about a second even with path limiting. The thing that _really_ shows this off is doing git log drivers/ on the kernel archive, or even better, on the _historic_ kernel archive. With this change, the response is instantaneous (although seeking to the end of the result will obviously take as long as it ever did). Before this change, the command would think about the result for tens of seconds - or even minutes, in the case of the bigger old kernel archive - before starting to output the results. NOTE NOTE NOTE! Using path limiting with things like "gitk", which uses the "--parents" flag to actually generate a pseudo-history of the resulting commits won't actually see the improvement in interactivity, since that forces git-rev-list to do the whole-history thing after all. MAYBE we can fix that too at some point, but I won't promise anything. Signed-off-by: Linus Torvalds <torvalds@osdl.org> --- diff --git a/revision.c b/revision.c index 0330f9f..0e3f074 100644 --- a/revision.c +++ b/revision.c @@ -702,7 +702,13 @@ int setup_revisions(int argc, const char if (revs->prune_data) { diff_tree_setup_paths(revs->prune_data); revs->prune_fn = try_to_simplify_commit; - revs->limited = 1; + + /* + * If we fix up parent data, we currently cannot + * do that on-the-fly. + */ + if (revs->parents) + revs->limited = 1; } return left; @@ -763,23 +769,32 @@ struct commit *get_revision(struct rev_i do { struct commit *commit = revs->commits->item; + + revs->commits = revs->commits->next; + /* + * If we haven't done the list limiting, we need to look at + * the parents here + */ + if (!revs->limited) + add_parents_to_list(revs, commit, &revs->commits); if (commit->object.flags & SHOWN) - goto next; + continue; if (!(commit->object.flags & BOUNDARY) && (commit->object.flags & UNINTERESTING)) - goto next; + continue; if (revs->min_age != -1 && (commit->date > revs->min_age)) - goto next; + continue; if (revs->max_age != -1 && (commit->date < revs->max_age)) return NULL; if (revs->no_merges && commit->parents && commit->parents->next) - goto next; + continue; if (revs->prune_fn && revs->dense) { if (!(commit->object.flags & TREECHANGE)) - goto next; - rewrite_parents(commit); + continue; + if (revs->parents) + rewrite_parents(commit); } /* More to go? */ if (revs->max_count) { @@ -792,13 +807,9 @@ struct commit *get_revision(struct rev_i revs->commits = it->next; free(it); } - else - pop_most_recent_commit(&revs->commits, SEEN); } commit->object.flags |= SHOWN; return commit; -next: - pop_most_recent_commit(&revs->commits, SEEN); } while (revs->commits); return NULL; } ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds @ 2006-03-31 6:05 ` Linus Torvalds 2006-03-31 6:45 ` Junio C Hamano 2006-03-31 6:16 ` Junio C Hamano 2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano 2 siblings, 1 reply; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 6:05 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List On Thu, 30 Mar 2006, Linus Torvalds wrote: > > This makes git-rev-list able to do path-limiting without having to parse > all of history before it starts showing the results. > > This makes it things like "git log -- pathname" much more pleasant to use. Sadly, it seems to react really badly to Junio's new --boundary logic for some reason that I haven't quite figured out yet. That reaction is independent of the actual pathname restriction, and seems to be related to how the --boundary logic expected pop_most_recent_commit() to work. In particular: ... if (commit->object.flags & BOUNDARY) { /* this is already uninteresting, * so there is no point popping its * parents into the list. */ that code is magic, and seems to depend on something subtle going on with the list, and the incremental thing already popped the parent earlier and screwed up whatever magic that the BOUNDARY code depends on. Junio? I think you did some funky special case with BOUNDARY commits, and I broke it for you, can you look at the patch and see if you can see it? I'd really like to have the incremental path-limiter, because it really makes a huge difference in the usability of "git log pathname". Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 6:05 ` Linus Torvalds @ 2006-03-31 6:45 ` Junio C Hamano 2006-03-31 19:39 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-03-31 6:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds <torvalds@osdl.org> writes: > Sadly, it seems to react really badly to Junio's new --boundary logic for > some reason that I haven't quite figured out yet. There already was a report that --boundary stuff is not quite right, so what you are seeing might be that the new code exposes its original breakage even more. I haven't looked into the breakage of the original version yet either, so I cannot really say how your change breaks it. > That reaction is independent of the actual pathname restriction, and seems > to be related to how the --boundary logic expected > pop_most_recent_commit() to work. In particular: > > ... > if (commit->object.flags & BOUNDARY) { > /* this is already uninteresting, > * so there is no point popping its > * parents into the list. > */ > > that code is magic, and seems to depend on something subtle going on with > the list, and the incremental thing already popped the parent earlier and > screwed up whatever magic that the BOUNDARY code depends on. This was not so magic, but the magic was actually in the code added to limit_list(). Usually, "newlist" consists interesting commits, and what are found interesting initially but marked as uninteresting when a different ancestry chain coming from an uninteresting head leading to it was later discovered. The magic code looks at still-interesting commits, and re-injects its parents that are uninteresting to the list (and I just spotted a bug there -- it does not check if what is being "re-" injected are already on the list -- should I check for SEEN flag there perhaps?), while marking them as boundary. This was done to make sure that all the open-circle (in gitk) commits are on the resulting list. The part of the code you quoted is just a short-cut for not doing pop_most_recent_commit() -- we used to have pop_most_recent_commit() there, which pushed the parents of the commit being processed into the list. Because we are processing a boundary commit, we know it is uninteresting -- and by definition, its parents are uninteresting and that is why it just advances the list without calling pop_most_recent_commit(), bypassing its side effect to push its parents into the list. Since the new code has already advanced the list immediately after the beginning of do {} block, I think you can remove the entire "if (revs->max_count) {}" block. As the code currently stands, you are skipping what happens to be next to the boundary commit on the result list. > Junio? I think you did some funky special case with BOUNDARY commits, and > I broke it for you, can you look at the patch and see if you can see it? > I'd really like to have the incremental path-limiter, because it really > makes a huge difference in the usability of "git log pathname". Oh, there is no question about making it streamable in more cases is a good thing. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 6:45 ` Junio C Hamano @ 2006-03-31 19:39 ` Linus Torvalds 2006-03-31 20:35 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 19:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On Thu, 30 Mar 2006, Junio C Hamano wrote: > > There already was a report that --boundary stuff is not quite > right, so what you are seeing might be that the new code exposes > its original breakage even more. I haven't looked into the > breakage of the original version yet either, so I cannot really > say how your change breaks it. [ Awake and thinking about this again ] No, I think the new breakage was just because I was being a stupid ass. When I converted get_revision() to do the parent parsing up at the top instead of at the bottom, I just didn't think correctly about your new BOUNDARY code, and my conversion for that was just wrong. Part of the "do the parents early" code was also removing the current commit early, so suddenly your BOUNDARY case for doing that thing was just removing some other random commit instead, which was obviously wrong. The fix is trivial - with the new get_revision() organization, the BOUNDARY case special-case actually goes away entirely, and this trivial patch (on top of my 2/2 patch) should just fix it. At least it passes my tests again now, and looking at the code everything seems sane. Linus --- diff --git a/revision.c b/revision.c index 0e3f074..753633e 100644 --- a/revision.c +++ b/revision.c @@ -796,18 +796,6 @@ struct commit *get_revision(struct rev_i if (revs->parents) rewrite_parents(commit); } - /* More to go? */ - if (revs->max_count) { - if (commit->object.flags & BOUNDARY) { - /* this is already uninteresting, - * so there is no point popping its - * parents into the list. - */ - struct commit_list *it = revs->commits; - revs->commits = it->next; - free(it); - } - } commit->object.flags |= SHOWN; return commit; } while (revs->commits); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 19:39 ` Linus Torvalds @ 2006-03-31 20:35 ` Junio C Hamano 2006-03-31 20:50 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-03-31 20:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > The fix is trivial - with the new get_revision() organization, the > BOUNDARY case special-case actually goes away entirely, and this trivial > patch (on top of my 2/2 patch) should just fix it. Yes, that is exactly the fix I have in "pu" -- I suspect you replied before getting to my response last night. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 20:35 ` Junio C Hamano @ 2006-03-31 20:50 ` Linus Torvalds 0 siblings, 0 replies; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 20:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Fri, 31 Mar 2006, Junio C Hamano wrote: > > Yes, that is exactly the fix I have in "pu" -- I suspect you > replied before getting to my response last night. No, I just get too much email, and hadn't read your response carefully enough, so I just missed the fact that you had already found the problem. Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds 2006-03-31 6:05 ` Linus Torvalds @ 2006-03-31 6:16 ` Junio C Hamano 2006-03-31 6:42 ` Linus Torvalds 2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano 2 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-03-31 6:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > The reason I put "RFC" in the subject line is that while I've validated it > various ways, like doing > > git-rev-list HEAD -- drivers/char/ | md5sum > > before-and-after on the kernel archive, it's "git-rev-list" after all. In > other words, it's that really really subtle and complex central piece of > software. So while I think this is important and should go in asap, I also > think it should get lots of testing and eyeballs looking at the code. Let me make sure I understand what is going on. Currently, limit_list(), which is called when revs->limited is set, serves two different purposes: * traverse the ancestry chain and mark ancestors of uninteresting commits also uninteresting; * "simplify" each commit that is still interesting, by comparing the tree object of it and its parents. We used to call limit_list() when we are limiting the commits by paths, because that was what the latter does as a side effect of add_parents_to_list(). You made it streamable by moving the call to get_revision() and not calling limit_list() when you do not have other reasons to call it. You currently do not do this streaming optimization when you have to show simplified parents, because the streaming version traverses ancestry chain one step at a time as it goes, and you cannot obviously see who the final "fake" parent is until you simplify the parents enouogh. Now, I have some observations, not necessarily limited to this patch but also to the code from the "master" version. * get_commit_reference() sets "limited" when the user gives ^exclude commit. no question about this --- we would need to see the reachability in this case. * when we are going to sort the result we are going to show, we set "limited" -- we cannot sort without knowing the set we are sorting. * giving commit timestamp limits via --max-age, --min-age etc. sets "limited". I have doubts about this. We currently look at the commit timestamp in limit_list() and mark a commit old enough to be "uninteresting" -- which makes ancestor of such commit also uninteresting. Similarly commits that are too young are not pushed into the result. There is a filter in get_revision() to omit ineligible commits by time range already, so I wonder if we can remove that code from limit_list() and not set "limited" when these timestamp ranges are given. This would let us stream even more. Admittably, ancestors of an old commit had better be even older, so marking them uninteresting to stop the traversal early is a good way to optimize the full-traversal case, but not having to call limit_list() would help streaming usage. Also I have doubts about correctness issue about the max-age handling. Is it correct to mark a commit that is old enough to be uninteresting, implying that its ancestors are all uninteresting? With clock skew among people, it is possible to make a merge commit that has parents one of which is "in the future". * As to handling of --unpacked, exactly the same comment as "max-age" applies, but it might be even worse. Marking an unpacked one "uninteresting" means if a packed commit refers to loose commit, the loose one will be also marked uninteresting, I think. "--objects --unpacked" is an idiom to repack objects that are still loose, but I suspect it would do interesting thing if the commit is in a pack but its trees and blobs are still loose. Am I confused, or have I just spotted a potential (but hard to trigger) bug? -- diff --git a/revision.c b/revision.c index 0e3f074..a55c0d1 100644 --- a/revision.c +++ b/revision.c @@ -404,10 +404,6 @@ static void limit_list(struct rev_info * list = list->next; free(entry); - if (revs->max_age != -1 && (commit->date < revs->max_age)) - obj->flags |= UNINTERESTING; - if (revs->unpacked && has_sha1_pack(obj->sha1)) - obj->flags |= UNINTERESTING; add_parents_to_list(revs, commit, &list); if (obj->flags & UNINTERESTING) { mark_parents_uninteresting(commit); @@ -415,8 +411,6 @@ static void limit_list(struct rev_info * break; continue; } - if (revs->min_age != -1 && (commit->date > revs->min_age)) - continue; p = &commit_list_insert(commit, p)->next; } if (revs->boundary) { @@ -543,32 +537,26 @@ int setup_revisions(int argc, const char } if (!strncmp(arg, "--max-age=", 10)) { revs->max_age = atoi(arg + 10); - revs->limited = 1; continue; } if (!strncmp(arg, "--min-age=", 10)) { revs->min_age = atoi(arg + 10); - revs->limited = 1; continue; } if (!strncmp(arg, "--since=", 8)) { revs->max_age = approxidate(arg + 8); - revs->limited = 1; continue; } if (!strncmp(arg, "--after=", 8)) { revs->max_age = approxidate(arg + 8); - revs->limited = 1; continue; } if (!strncmp(arg, "--before=", 9)) { revs->min_age = approxidate(arg + 9); - revs->limited = 1; continue; } if (!strncmp(arg, "--until=", 8)) { revs->min_age = approxidate(arg + 8); - revs->limited = 1; continue; } if (!strcmp(arg, "--all")) { @@ -635,7 +623,6 @@ int setup_revisions(int argc, const char } if (!strcmp(arg, "--unpacked")) { revs->unpacked = 1; - revs->limited = 1; continue; } *unrecognized++ = arg; @@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i if (revs->min_age != -1 && (commit->date > revs->min_age)) continue; if (revs->max_age != -1 && (commit->date < revs->max_age)) - return NULL; + continue; + if (revs->unpacked && has_sha1_pack(commit->object.sha1)) + continue; if (revs->no_merges && commit->parents && commit->parents->next) continue; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 6:16 ` Junio C Hamano @ 2006-03-31 6:42 ` Linus Torvalds 2006-03-31 7:02 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 6:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 30 Mar 2006, Junio C Hamano wrote: > > Linus Torvalds <torvalds@osdl.org> writes: > > > The reason I put "RFC" in the subject line is that while I've validated it > > various ways, like doing > > > > git-rev-list HEAD -- drivers/char/ | md5sum > > > > before-and-after on the kernel archive, it's "git-rev-list" after all. In > > other words, it's that really really subtle and complex central piece of > > software. So while I think this is important and should go in asap, I also > > think it should get lots of testing and eyeballs looking at the code. > > Let me make sure I understand what is going on. > > Currently, limit_list(), which is called when revs->limited is > set, serves two different purposes: > > * traverse the ancestry chain and mark ancestors of > uninteresting commits also uninteresting; Right. We _also_ traverse the ancestry chain in the "walking" stage later in get_revision(), but if we have a "limit" case, we'll pre-traverse everything early in limit_list. > * "simplify" each commit that is still interesting, by > comparing the tree object of it and its parents. Through "add_parent_to_list()", yes. > We used to call limit_list() when we are limiting the commits by > paths, because that was what the latter does as a side effect of > add_parents_to_list(). You made it streamable by moving the > call to get_revision() and not calling limit_list() when you do > not have other reasons to call it. Yes, and by using "add_parent_to_list()" instead of the non-pathname-limit-aware "pop_most_recent_commit()". So we don't call limit-list any more, because we do the same thing in get_revision() that it used to do at limit time. > You currently do not do this streaming optimization when you > have to show simplified parents, because the streaming version > traverses ancestry chain one step at a time as it goes, and you > cannot obviously see who the final "fake" parent is until you > simplify the parents enouogh. Yes. For exactly the same reason some other things ause us to do limit_list(): some ops simply need the fully connected end result in order to be able to work correctly. > * get_commit_reference() sets "limited" when the user gives ^exclude > commit. no question about this --- we would need to see the > reachability in this case. Right. > * when we are going to sort the result we are going to show, > we set "limited" -- we cannot sort without knowing the set we > are sorting. Right. > * giving commit timestamp limits via --max-age, --min-age > etc. sets "limited". I have doubts about this. I agree. I looked at it when I did the "rev-list.c" vs "revision.c" split, but I wanted to do as few changes as possible, and for some reason we've always considered "time" to cause us to limit things. However, I think your patch is wrong. Even if you don't limit things when we have a date specifier, you still want to stop doing traversal in limit_list when you hit that date. Why? Because it might be limited for some _other_ reason, eg due to --topo-order or or some other issue. > * As to handling of --unpacked, exactly the same comment as > "max-age" applies, but it might be even worse. Marking an > unpacked one "uninteresting" means if a packed commit refers > to loose commit, the loose one will be also marked > uninteresting, I think. > > "--objects --unpacked" is an idiom to repack objects that are > still loose, but I suspect it would do interesting thing if the > commit is in a pack but its trees and blobs are still loose. Am > I confused, or have I just spotted a potential (but hard to > trigger) bug? Regardless of where you do the "unpacked" test, it will always really end up stopping when it hits a packed commit. So you won't ever get away from that. So same logic applies. Once you hit a packed commit, you should stop, both in limit_list _and_ in get_revision(). Otherwise you'll do too much work when trying to limit things, for no gain. The fact is, "--unpacked" means that we traverse the part of the chain that hasn't been packed yet. Anything that is packed automatically cuts off parsing, whether there is anything else unpacked below it or not. It's not a bug, it's a feature, and if you want to pack everything, you should just do git repack -a -d and not use --unpacked. Same goes for dates, btw. We've always stopped when we reached a certain date, even though there _could_ be commits before it that are from a more recent date (due to clocks being set wrong). Neither --unpacked nor --since=xyzzy are meant to be any kind of "guarantees" that you get all commits that match some pattern. They are just "stop early" rules. Linus So the following is wrong: > diff --git a/revision.c b/revision.c > index 0e3f074..a55c0d1 100644 > --- a/revision.c > +++ b/revision.c > @@ -404,10 +404,6 @@ static void limit_list(struct rev_info * > list = list->next; > free(entry); > > - if (revs->max_age != -1 && (commit->date < revs->max_age)) > - obj->flags |= UNINTERESTING; > - if (revs->unpacked && has_sha1_pack(obj->sha1)) > - obj->flags |= UNINTERESTING; > add_parents_to_list(revs, commit, &list); > if (obj->flags & UNINTERESTING) { > mark_parents_uninteresting(commit); > @@ -415,8 +411,6 @@ static void limit_list(struct rev_info * > break; > continue; > } > - if (revs->min_age != -1 && (commit->date > revs->min_age)) > - continue; > p = &commit_list_insert(commit, p)->next; > } > if (revs->boundary) { ..but the later part of the patch looks fine (modulo testing, of course): > @@ -543,32 +537,26 @@ int setup_revisions(int argc, const char > } > if (!strncmp(arg, "--max-age=", 10)) { > revs->max_age = atoi(arg + 10); > - revs->limited = 1; > continue; > } > if (!strncmp(arg, "--min-age=", 10)) { > revs->min_age = atoi(arg + 10); > - revs->limited = 1; > continue; > } > if (!strncmp(arg, "--since=", 8)) { > revs->max_age = approxidate(arg + 8); > - revs->limited = 1; > continue; > } > if (!strncmp(arg, "--after=", 8)) { > revs->max_age = approxidate(arg + 8); > - revs->limited = 1; > continue; > } > if (!strncmp(arg, "--before=", 9)) { > revs->min_age = approxidate(arg + 9); > - revs->limited = 1; > continue; > } > if (!strncmp(arg, "--until=", 8)) { > revs->min_age = approxidate(arg + 8); > - revs->limited = 1; > continue; > } > if (!strcmp(arg, "--all")) { > @@ -635,7 +623,6 @@ int setup_revisions(int argc, const char > } > if (!strcmp(arg, "--unpacked")) { > revs->unpacked = 1; > - revs->limited = 1; > continue; > } > *unrecognized++ = arg; > @@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i > if (revs->min_age != -1 && (commit->date > revs->min_age)) > continue; > if (revs->max_age != -1 && (commit->date < revs->max_age)) > - return NULL; > + continue; > + if (revs->unpacked && has_sha1_pack(commit->object.sha1)) > + continue; > if (revs->no_merges && > commit->parents && commit->parents->next) > continue; > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 6:42 ` Linus Torvalds @ 2006-03-31 7:02 ` Junio C Hamano 2006-04-02 0:35 ` Linus Torvalds 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-03-31 7:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > So the following is wrong: > >> diff --git a/revision.c b/revision.c >> index 0e3f074..a55c0d1 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -404,10 +404,6 @@ static void limit_list(struct rev_info * >> list = list->next; >> free(entry); >> >> - if (revs->max_age != -1 && (commit->date < revs->max_age)) >> - obj->flags |= UNINTERESTING; >> - if (revs->unpacked && has_sha1_pack(obj->sha1)) >> - obj->flags |= UNINTERESTING; >> add_parents_to_list(revs, commit, &list); >> if (obj->flags & UNINTERESTING) { >> mark_parents_uninteresting(commit); >... > ..but the later part of the patch looks fine (modulo testing, of course): > >> @@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i >> if (revs->min_age != -1 && (commit->date > revs->min_age)) >> continue; >> if (revs->max_age != -1 && (commit->date < revs->max_age)) >> - return NULL; >> + continue; >> + if (revs->unpacked && has_sha1_pack(commit->object.sha1)) >> + continue; OK, so let's say I agree with you that --unpacked and --since are "stop early" rules. I fully agree with that from usability and implementation point of view, but I now wonder if the "output filter" in get_revision() and "stop early" in limit_list() would do the same thing. That is, if we are _otherwise_ limited, limit_list() would use them for "stop early" rule, but if we end up not calling limit_list() we would simply filter them and keep going (which is what my patch did for both timestamp and packedness), or we could also stop there. I am not sure which one is correct, but I suspect whichever we do in get_revision() we would get inconsistent results between a case where we call limit_list() and we do not. That is, the set of commits we are going to show when --(topo|date)-order is given and not given can be different. Is this acceptable? I would say, from the implementation point of view, it should be tolerated, but... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 7:02 ` Junio C Hamano @ 2006-04-02 0:35 ` Linus Torvalds 2006-04-02 3:11 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Linus Torvalds @ 2006-04-02 0:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 30 Mar 2006, Junio C Hamano wrote: > > OK, so let's say I agree with you that --unpacked and --since > are "stop early" rules. I fully agree with that from usability > and implementation point of view, but I now wonder if the > "output filter" in get_revision() and "stop early" in limit_list() > would do the same thing. They don't. What ends up not working very well at all is the combination of "--topo-order" and the output filter in get_revision. It will return NULL when we see the first commit out of date-order, even if we have other commits coming. So we really should do the "past the date order" thing in get_revision() only if we have _not_ done it already in limit_list(). Something like this. The easiest way to test this is with just gitk --since=3.days.ago on the kernel tree. Without this patch, it tends to be pretty obviously broken. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Linus --- diff --git a/revision.c b/revision.c index a8a54b6..558ed01 100644 --- a/revision.c +++ b/revision.c @@ -783,10 +783,14 @@ struct commit *get_revision(struct rev_i /* * If we haven't done the list limiting, we need to look at - * the parents here + * the parents here. We also need to do the date-based limiting + * that we'd otherwise have done in limit_list(). */ - if (!revs->limited) + if (!revs->limited) { + if (revs->max_age != -1 && (commit->date < revs->max_age)) + continue; add_parents_to_list(revs, commit, &revs->commits); + } if (commit->object.flags & SHOWN) continue; if (!(commit->object.flags & BOUNDARY) && @@ -794,8 +798,6 @@ struct commit *get_revision(struct rev_i continue; if (revs->min_age != -1 && (commit->date > revs->min_age)) continue; - if (revs->max_age != -1 && (commit->date < revs->max_age)) - return NULL; if (revs->no_merges && commit->parents && commit->parents->next) continue; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-04-02 0:35 ` Linus Torvalds @ 2006-04-02 3:11 ` Junio C Hamano 2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano 2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano 2 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2006-04-02 3:11 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > What ends up not working very well at all is the combination of > "--topo-order" and the output filter in get_revision. It will return NULL > when we see the first commit out of date-order, even if we have other > commits coming. > > So we really should do the "past the date order" thing in get_revision() > only if we have _not_ done it already in limit_list(). Right now --max-age causes "limited" so there should not be a difference, if I am not mistaken. But I've been meaning to change that, so the patch makes sense. Similarly,... -- >8 -- [PATCH] revision: --topo-order and --unpacked Now, using --unpacked without limit_list() does not make much sense, but this is parallel to the earlier --max-age fix. Signed-off-by: Junio C Hamano <junkio@cox.net> --- revision.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) 22c31bf183bff576c7807f9d67abfc11ee8e1fa4 diff --git a/revision.c b/revision.c index 558ed01..07cc86f 100644 --- a/revision.c +++ b/revision.c @@ -787,7 +787,10 @@ struct commit *get_revision(struct rev_i * that we'd otherwise have done in limit_list(). */ if (!revs->limited) { - if (revs->max_age != -1 && (commit->date < revs->max_age)) + if ((revs->unpacked && + has_sha1_pack(commit->object.sha1)) || + (revs->max_age != -1 && + (commit->date < revs->max_age))) continue; add_parents_to_list(revs, commit, &revs->commits); } -- 1.3.0.rc1.gf385 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH] revision: simplify argument parsing. 2006-04-02 0:35 ` Linus Torvalds 2006-04-02 3:11 ` Junio C Hamano @ 2006-04-02 3:17 ` Junio C Hamano [not found] ` <443063E2.1040904@lsrfire.ath.cx> 2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano 2 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-04-02 3:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: git This just moves code around to consolidate the part that sets revs->limited to one place based on various flags. Signed-off-by: Junio C Hamano <junkio@cox.net> --- * Just the preparation for the next step... revision.c | 20 +++++++------------- 1 files changed, 7 insertions(+), 13 deletions(-) 53069686601d156dea3787a100ffc4e35c78040f diff --git a/revision.c b/revision.c index 07cc86f..cdd29c5 100644 --- a/revision.c +++ b/revision.c @@ -552,32 +552,26 @@ int setup_revisions(int argc, const char } if (!strncmp(arg, "--max-age=", 10)) { revs->max_age = atoi(arg + 10); - revs->limited = 1; continue; } - if (!strncmp(arg, "--min-age=", 10)) { - revs->min_age = atoi(arg + 10); - revs->limited = 1; - continue; - } if (!strncmp(arg, "--since=", 8)) { revs->max_age = approxidate(arg + 8); - revs->limited = 1; continue; } if (!strncmp(arg, "--after=", 8)) { revs->max_age = approxidate(arg + 8); - revs->limited = 1; continue; } + if (!strncmp(arg, "--min-age=", 10)) { + revs->min_age = atoi(arg + 10); + continue; + } if (!strncmp(arg, "--before=", 9)) { revs->min_age = approxidate(arg + 9); - revs->limited = 1; continue; } if (!strncmp(arg, "--until=", 8)) { revs->min_age = approxidate(arg + 8); - revs->limited = 1; continue; } if (!strcmp(arg, "--all")) { @@ -596,13 +590,11 @@ int setup_revisions(int argc, const char } if (!strcmp(arg, "--topo-order")) { revs->topo_order = 1; - revs->limited = 1; continue; } if (!strcmp(arg, "--date-order")) { revs->lifo = 0; revs->topo_order = 1; - revs->limited = 1; continue; } if (!strcmp(arg, "--parents")) { @@ -644,7 +636,6 @@ int setup_revisions(int argc, const char } if (!strcmp(arg, "--unpacked")) { revs->unpacked = 1; - revs->limited = 1; continue; } *unrecognized++ = arg; @@ -707,6 +698,9 @@ int setup_revisions(int argc, const char commit = get_commit_reference(revs, def, sha1, 0); add_one_commit(commit, revs); } + + if ((revs->max_age != -1) || revs->topo_order || revs->unpacked) + revs->limited = 1; if (revs->prune_data) { diff_tree_setup_paths(revs->prune_data); -- 1.3.0.rc1.gf385 ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <443063E2.1040904@lsrfire.ath.cx>]
* Re: [PATCH] revision: simplify argument parsing. [not found] ` <443063E2.1040904@lsrfire.ath.cx> @ 2006-04-03 4:22 ` Junio C Hamano 0 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2006-04-03 4:22 UTC (permalink / raw) To: Rene Scharfe; +Cc: Linus Torvalds, git Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes: > The comment is misleading because the patch not only moves stuff around, > it also removes setting of revs->limited when revs->min_age has been > set. You are correct that the patch does more than it claims to. That is an independent fix mentioned in my earlier message <7vr74jxhp3.fsf@assigned-by-dhcp.cox.net>. We did not have to pay the overhead of having to call limit_list() when only min_age was specified because we did not do traversal filtering using UNINTERESTING logic for min_age, but we called it anyway. Unlike --max-age that marks an old one _and_ its parents uninteresting, it needs to filter a young one without making its parents uninteresting. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] revision: --max-age alone does not need limit_list() anymore. 2006-04-02 0:35 ` Linus Torvalds 2006-04-02 3:11 ` Junio C Hamano 2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano @ 2006-04-02 3:17 ` Junio C Hamano 2 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2006-04-02 3:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: git This makes git log --since=7.days to be streamable. Signed-off-by: Junio C Hamano <junkio@cox.net> --- revision.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) bbbc8c3a8d455e1f5d15c3764eba70250b5479e9 diff --git a/revision.c b/revision.c index cdd29c5..728b6d1 100644 --- a/revision.c +++ b/revision.c @@ -699,7 +699,7 @@ int setup_revisions(int argc, const char add_one_commit(commit, revs); } - if ((revs->max_age != -1) || revs->topo_order || revs->unpacked) + if (revs->topo_order || revs->unpacked) revs->limited = 1; if (revs->prune_data) { -- 1.3.0.rc1.gf385 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds 2006-03-31 6:05 ` Linus Torvalds 2006-03-31 6:16 ` Junio C Hamano @ 2006-03-31 8:28 ` Junio C Hamano 2006-03-31 19:44 ` Linus Torvalds 2 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2006-03-31 8:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > This is an absolutely huge deal for anything like "git log -- <pathname>", > but also for some things that we don't do yet - like the "find where > things changed" logic I've described elsewhere, where we want to find the > previous revision that changed a file. >... > Btw, don't even bother testing this with the git archive. git itself is so > small that parsing the whole revision history for it takes about a second > even with path limiting. By the way, I forgot to praise you ;-). Even on a fast machine, the old one was not very useful, but this one is _instantaneous_. Very good job. $ PAGER=cat GIT_DIR=/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ \ /usr/bin/time git log -1 --pretty=short -- drivers/ commit ce362c009250340358a7221f3cdb7954cbf19c01 Merge: 064c94f... cd7a920... Author: Linus Torvalds <torvalds@g5.osdl.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 15.44user 0.19system 0:25.11elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+18050minor)pagefaults 0swaps $ PAGER=cat GIT_DIR=/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ \ /usr/bin/time ./git.pu log -1 --pretty=short -- drivers/ commit ce362c009250340358a7221f3cdb7954cbf19c01 Merge: 064c94f... cd7a920... Author: Linus Torvalds <torvalds@g5.osdl.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+388minor)pagefaults 0swaps ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible. 2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano @ 2006-03-31 19:44 ` Linus Torvalds 0 siblings, 0 replies; 17+ messages in thread From: Linus Torvalds @ 2006-03-31 19:44 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Fri, 31 Mar 2006, Junio C Hamano wrote: > Linus Torvalds <torvalds@osdl.org> writes: > > > This is an absolutely huge deal for anything like "git log -- <pathname>", > > but also for some things that we don't do yet - like the "find where > > things changed" logic I've described elsewhere, where we want to find the > > previous revision that changed a file. > >... > > Btw, don't even bother testing this with the git archive. git itself is so > > small that parsing the whole revision history for it takes about a second > > even with path limiting. > > By the way, I forgot to praise you ;-). > > Even on a fast machine, the old one was not very useful, but > this one is _instantaneous_. Very good job. Indeed. It's why I'd really like this to be merged before 1.3.0 - it moves a certain class of problems from "it works" to "it's actually usable". Now, the _real_ usage I foresee (which just wasn't practical before) is the interactive annotation thing - this won't help a _full_file_ annotate (which usually needs to go back to the very first version of a file anyway), but it should make it possible to play with an incremental one (the "graphical git-whatchanged" kind). But even just the "git log" difference makes it worth it. Linus ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2006-04-03 4:22 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-31 0:52 [PATCH/RFC 1/2] Move "--parent" parsing into generic revision.c library code Linus Torvalds
2006-03-31 1:05 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Linus Torvalds
2006-03-31 6:05 ` Linus Torvalds
2006-03-31 6:45 ` Junio C Hamano
2006-03-31 19:39 ` Linus Torvalds
2006-03-31 20:35 ` Junio C Hamano
2006-03-31 20:50 ` Linus Torvalds
2006-03-31 6:16 ` Junio C Hamano
2006-03-31 6:42 ` Linus Torvalds
2006-03-31 7:02 ` Junio C Hamano
2006-04-02 0:35 ` Linus Torvalds
2006-04-02 3:11 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: simplify argument parsing Junio C Hamano
[not found] ` <443063E2.1040904@lsrfire.ath.cx>
2006-04-03 4:22 ` Junio C Hamano
2006-04-02 3:17 ` [PATCH] revision: --max-age alone does not need limit_list() anymore Junio C Hamano
2006-03-31 8:28 ` [PATCH/RFC 2/2] Make path-limiting be incremental when possible Junio C Hamano
2006-03-31 19:44 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).