* [PATCH] revision: avoid work after --max-count is reached
@ 2012-07-13 7:50 Jeff King
2012-07-13 7:53 ` Jeff King
2012-07-13 21:10 ` Junio C Hamano
0 siblings, 2 replies; 6+ messages in thread
From: Jeff King @ 2012-07-13 7:50 UTC (permalink / raw)
To: git; +Cc: git-dev
During a revision traversal in which --max-count has been
specified, we decrement a counter for each revision returned
by get_revision. When it hits 0, we typically return NULL
(the exception being if we still have boundary commits to
show).
However, before we check the counter, we call get_revision_1
to get the next commit. This might involve looking at a
large number of commits if we have restricted the traversal
(e.g., we might traverse until we find the next commit whose
diff actually matches a pathspec).
There's no need to make this get_revision_1 call when our
counter runs out. If we are not in --boundary mode, we will
just throw away the result and immediately return NULL. If
we are in --boundary mode, then we will still throw away the
result, and then start showing the boundary commits.
However, as git_revision_1 does not impact the boundary
list, it should not have an impact.
In most cases, avoiding this work will not be especially
noticeable. However, in some cases, it can make a big
difference:
[before]
$ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
8d141a1d562abb31f27f599dbf6e10a6c06ed73e
real 0m0.301s
user 0m0.280s
sys 0m0.016s
[after]
$ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
8d141a1d562abb31f27f599dbf6e10a6c06ed73e
real 0m0.010s
user 0m0.008s
sys 0m0.000s
Note that the output is produced almost instantaneously in
the first case, and then git uselessly spends a long time
looking for the next commit to touch that file (but there
isn't one, and we traverse all the way down to the roots).
Signed-off-by: Jeff King <peff@peff.net>
---
I'd really like more sets of eyes to make sure this won't
cause a weird regression. Calling get_revision_1 does tweak
the flags on objects as it traverses, so I'm worried about
some other code relying on this side effect of get_revision
in some way. That does seem a bit crazy to me, though.
revision.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)
diff --git a/revision.c b/revision.c
index 5b81a92..7e39655 100644
--- a/revision.c
+++ b/revision.c
@@ -2361,29 +2361,28 @@ static struct commit *get_revision_internal(struct rev_info *revs)
}
/*
- * Now pick up what they want to give us
+ * If our max_count counter has reached zero, then we are done. We
+ * don't simply return NULL because we still might need to show
+ * boundary commits. But we want to avoid calling get_revision_1, which
+ * might do a considerable amount of work finding the next commit only
+ * for us to throw it away.
+ *
+ * If it is non-zero, then either we don't have a max_count at all
+ * (-1), or it is still counting, in which case we decrement.
*/
- c = get_revision_1(revs);
- if (c) {
- while (0 < revs->skip_count) {
- revs->skip_count--;
- c = get_revision_1(revs);
- if (!c)
- break;
+ if (revs->max_count) {
+ c = get_revision_1(revs);
+ if (c) {
+ while (0 < revs->skip_count) {
+ revs->skip_count--;
+ c = get_revision_1(revs);
+ if (!c)
+ break;
+ }
}
- }
- /*
- * Check the max_count.
- */
- switch (revs->max_count) {
- case -1:
- break;
- case 0:
- c = NULL;
- break;
- default:
- revs->max_count--;
+ if (revs->max_count > 0)
+ revs->max_count--;
}
if (c)
--
1.7.11.35.gbaf554e.dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] revision: avoid work after --max-count is reached
2012-07-13 7:50 [PATCH] revision: avoid work after --max-count is reached Jeff King
@ 2012-07-13 7:53 ` Jeff King
2012-07-13 21:10 ` Junio C Hamano
1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2012-07-13 7:53 UTC (permalink / raw)
To: git; +Cc: git-dev
On Fri, Jul 13, 2012 at 03:50:23AM -0400, Jeff King wrote:
> revision.c | 39 +++++++++++++++++++--------------------
> 1 file changed, 19 insertions(+), 20 deletions(-)
BTW, the patch is slightly hard to read because of the re-indentation.
Here it is with "-w -U5":
diff --git a/revision.c b/revision.c
index 5b81a92..7e39655 100644
--- a/revision.c
+++ b/revision.c
@@ -2359,32 +2359,31 @@ static struct commit *get_revision_internal(struct rev_info *revs)
c->object.flags |= SHOWN;
return c;
}
/*
- * Now pick up what they want to give us
+ * If our max_count counter has reached zero, then we are done. We
+ * don't simply return NULL because we still might need to show
+ * boundary commits. But we want to avoid calling get_revision_1, which
+ * might do a considerable amount of work finding the next commit only
+ * for us to throw it away.
+ *
+ * If it is non-zero, then either we don't have a max_count at all
+ * (-1), or it is still counting, in which case we decrement.
*/
+ if (revs->max_count) {
c = get_revision_1(revs);
if (c) {
while (0 < revs->skip_count) {
revs->skip_count--;
c = get_revision_1(revs);
if (!c)
break;
}
}
- /*
- * Check the max_count.
- */
- switch (revs->max_count) {
- case -1:
- break;
- case 0:
- c = NULL;
- break;
- default:
+ if (revs->max_count > 0)
revs->max_count--;
}
if (c)
c->object.flags |= SHOWN;
--
1.7.11.35.gbaf554e.dirty
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] revision: avoid work after --max-count is reached
2012-07-13 7:50 [PATCH] revision: avoid work after --max-count is reached Jeff King
2012-07-13 7:53 ` Jeff King
@ 2012-07-13 21:10 ` Junio C Hamano
2012-07-13 21:20 ` Jeff King
1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2012-07-13 21:10 UTC (permalink / raw)
To: Jeff King; +Cc: git, git-dev
Jeff King <peff@peff.net> writes:
> There's no need to make this get_revision_1 call when our
> counter runs out. If we are not in --boundary mode, we will
> just throw away the result and immediately return NULL. If
> we are in --boundary mode, then we will still throw away the
> result, and then start showing the boundary commits.
>
> However, as git_revision_1 does not impact the boundary
> list, it should not have an impact.
We used to reset 'c' to NULL ("throw away the result"), run
create_boundary_commit_list(), and ask ourselves to pop the boundary
it computed.
Now we don't call get_revision_1() and leave 'c' to NULL as
initialized ("avoid work"), and do the same.
The state create_boundary_commit_list() sees would slightly be
different, as we used to dig one level deeper, smudging more commits
with bits, but the only bits that may be smudged by this digging
that may matter in create_commit_list() is CHILD_SHOWN and SHOWN,
but by definition the commits around the commit the extra call to
get_revision_1() would have returned are marked with neither during
that extra call, so I think this conversion does not affect the
boundary list.
So I think I like this change. If anything, it makes the loop
structure simpler and a bit easier to understand.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] revision: avoid work after --max-count is reached
2012-07-13 21:10 ` Junio C Hamano
@ 2012-07-13 21:20 ` Jeff King
2012-07-13 22:12 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2012-07-13 21:20 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, git-dev
On Fri, Jul 13, 2012 at 02:10:54PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > There's no need to make this get_revision_1 call when our
> > counter runs out. If we are not in --boundary mode, we will
> > just throw away the result and immediately return NULL. If
> > we are in --boundary mode, then we will still throw away the
> > result, and then start showing the boundary commits.
> >
> > However, as git_revision_1 does not impact the boundary
> > list, it should not have an impact.
>
> We used to reset 'c' to NULL ("throw away the result"), run
> create_boundary_commit_list(), and ask ourselves to pop the boundary
> it computed.
>
> Now we don't call get_revision_1() and leave 'c' to NULL as
> initialized ("avoid work"), and do the same.
Right.
> The state create_boundary_commit_list() sees would slightly be
> different, as we used to dig one level deeper, smudging more commits
> with bits, but the only bits that may be smudged by this digging
> that may matter in create_commit_list() is CHILD_SHOWN and SHOWN,
> but by definition the commits around the commit the extra call to
> get_revision_1() would have returned are marked with neither during
> that extra call, so I think this conversion does not affect the
> boundary list.
Yeah, this was my analysis, too. Though reading get_revision-1, it seems
like we can actually set SHOWN, but I wasn't able to trigger any change
of behavior in practice. I think it is because we must set both SHOWN
and BOUNDARY to have an effect, and we do not do so.
So the only questionable thing would be: are there commits with BOUNDARY
set but not SHOWN that could be affected by calling get_revision_1? For
that matter, if such a commit existed, would the current behavior even
be correct? We would not have actually shown the commit, so if such a
case did exist, I wonder if we would be fixing a bug.
-Peff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] revision: avoid work after --max-count is reached
2012-07-13 21:20 ` Jeff King
@ 2012-07-13 22:12 ` Junio C Hamano
2012-07-14 8:10 ` Jeff King
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2012-07-13 22:12 UTC (permalink / raw)
To: Jeff King; +Cc: git, git-dev
Jeff King <peff@peff.net> writes:
> Yeah, this was my analysis, too. Though reading get_revision-1, it seems
> like we can actually set SHOWN, but I wasn't able to trigger any change
> of behavior in practice. I think it is because we must set both SHOWN
> and BOUNDARY to have an effect, and we do not do so.
In principle, SHOWN is only given when get_revision_internal gives
the commit back to be shown, and the parents of the returned commit
are painted CHILD_SHOWN. This should be the only place to paint
commit as CHILD_SHOWN.
A handful of places set the bit to commits that would be shown if
some options that further limit what is shown by topological
property (e.g. --left-only, --cherry-pick), which may cause that a
parent of a commit that was omitted due to these conditions may
later be marked incorrectly as a boundary inside
create_boundary_commit_list().
BOUNDARY is only given in create_boundary_commit_list() using
CHILD_SHOWN and SHOWN, and that should happen only once when
get_revision_1() runs out of the commits.
By the way, cherry_pick_list() pays attention to BOUNDARY, but I
think it is written overly defensively to protect itself from future
callsites. With the current code structure, it is only called from
limit_list() and get_revision_*() functions are never called until
limit_list() returns (and again create_boundary_commit_list() that
is called from get_revision_internal() is the only place that sets
BOUNDARY, so the commits cherry_pick_list() sees would never have
that bit set.
> So the only questionable thing would be: are there commits with BOUNDARY
> set but not SHOWN that could be affected by calling get_revision_1? For
> that matter, if such a commit existed, would the current behavior even
> be correct? We would not have actually shown the commit, so if such a
> case did exist, I wonder if we would be fixing a bug.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] revision: avoid work after --max-count is reached
2012-07-13 22:12 ` Junio C Hamano
@ 2012-07-14 8:10 ` Jeff King
0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2012-07-14 8:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, git-dev
On Fri, Jul 13, 2012 at 03:12:47PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Yeah, this was my analysis, too. Though reading get_revision-1, it seems
> > like we can actually set SHOWN, but I wasn't able to trigger any change
> > of behavior in practice. I think it is because we must set both SHOWN
> > and BOUNDARY to have an effect, and we do not do so.
>
> In principle, SHOWN is only given when get_revision_internal gives
> the commit back to be shown, and the parents of the returned commit
> are painted CHILD_SHOWN. This should be the only place to paint
> commit as CHILD_SHOWN.
>
> A handful of places set the bit to commits that would be shown if
> some options that further limit what is shown by topological
> property (e.g. --left-only, --cherry-pick), which may cause that a
> parent of a commit that was omitted due to these conditions may
> later be marked incorrectly as a boundary inside
> create_boundary_commit_list().
I think what confused me is that I thought I saw get_revision_1 setting
SHOWN in the reflog case. But of course it is _clearing_ the flag, since
the reflog walker may show the commit multiple times. And no other code
paths in get_revision_1 set it (nor should they, since as you say, it is
about us actually handing back the commit to be shown).
> BOUNDARY is only given in create_boundary_commit_list() using
> CHILD_SHOWN and SHOWN, and that should happen only once when
> get_revision_1() runs out of the commits.
Yeah, agreed.
> By the way, cherry_pick_list() pays attention to BOUNDARY, but I
> think it is written overly defensively to protect itself from future
> callsites. With the current code structure, it is only called from
> limit_list() and get_revision_*() functions are never called until
> limit_list() returns (and again create_boundary_commit_list() that
> is called from get_revision_internal() is the only place that sets
> BOUNDARY, so the commits cherry_pick_list() sees would never have
> that bit set.
I think we wouldn't be impacting it even so, because the commits that
make it to create_boundary_commit_list (and therefore get marked with
BOUNDARY) should be identical. We only put items into the
boundary_commits list when we are about to return them from
get_revision_internal, and with my patch that will not change (because
prior to my patch, we would have thrown away the commit after checking
max_count anyway).
So I think after looking again that this change is the right
thing to do, and there shouldn't be any side effects. Thanks for the
careful review.
-Peff
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-07-14 8:11 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-13 7:50 [PATCH] revision: avoid work after --max-count is reached Jeff King
2012-07-13 7:53 ` Jeff King
2012-07-13 21:10 ` Junio C Hamano
2012-07-13 21:20 ` Jeff King
2012-07-13 22:12 ` Junio C Hamano
2012-07-14 8:10 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).