* [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits
2014-01-20 21:28 [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
@ 2014-01-20 21:31 ` Jeff King
2014-01-20 22:11 ` Jeff King
2014-01-20 21:32 ` [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2014-01-20 21:31 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
We time a straight "rev-list --all" and its "--object"
counterpart, both going all the way to the root. However, we
do not time a partial history walk. This patch adds an
extreme case: a walk over a very small slice of history, but
with a very large set of UNINTERESTING tips. This is similar
to the connectivity check run by git on a small fetch, or
the walk done by any pre-receive hooks that want to check
incoming commits.
This test reveals a performance regression in git v1.8.4.2,
caused by fbd4a70 (list-objects: mark more commits as edges
in mark_edges_uninteresting, 2013-08-16):
Test fbd4a703^ fbd4a703
------------------------------------------------------------------------------------------
0001.1: rev-list --all 0.69(0.67+0.02) 0.69(0.68+0.01) +0.0%
0001.2: rev-list --all --objects 3.47(3.44+0.02) 3.48(3.44+0.03) +0.3%
0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all 0.04(0.03+0.00) 0.27(0.24+0.02) +575.0%
Signed-off-by: Jeff King <peff@peff.net>
---
Sorry for the long lines in the commit message. I'm open to suggestions
on reformatting.
t/perf/p0001-rev-list.sh | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
index 4f71a63..b7258a7 100755
--- a/t/perf/p0001-rev-list.sh
+++ b/t/perf/p0001-rev-list.sh
@@ -14,4 +14,21 @@ test_perf 'rev-list --all --objects' '
git rev-list --all --objects >/dev/null
'
+test_expect_success 'create new unreferenced commit' '
+ git checkout --detach HEAD &&
+ echo content >>file &&
+ git add file &&
+ git commit -m detached &&
+ commit=$(git rev-parse --verify HEAD) &&
+ git checkout -
+'
+
+test_perf 'rev-list $commit --not --all' '
+ git rev-list $commit --not --all >/dev/null
+'
+
+test_perf 'rev-list --objects $commit --not --all' '
+ git rev-list --objects $commit --not --all >/dev/null
+'
+
test_done
--
1.8.5.2.500.g8060133
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits
2014-01-20 21:31 ` [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits Jeff King
@ 2014-01-20 22:11 ` Jeff King
2014-01-20 22:32 ` Thomas Rast
0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2014-01-20 22:11 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
On Mon, Jan 20, 2014 at 04:31:01PM -0500, Jeff King wrote:
> diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
> index 4f71a63..b7258a7 100755
> --- a/t/perf/p0001-rev-list.sh
> +++ b/t/perf/p0001-rev-list.sh
> @@ -14,4 +14,21 @@ test_perf 'rev-list --all --objects' '
> git rev-list --all --objects >/dev/null
> '
>
> +test_expect_success 'create new unreferenced commit' '
> + git checkout --detach HEAD &&
> + echo content >>file &&
> + git add file &&
> + git commit -m detached &&
> + commit=$(git rev-parse --verify HEAD) &&
> + git checkout -
> +'
This is bad to be touching the repo and assuming it is non-bare. For
some reason I assumed that the perf suite made a copy of the repo, but
it doesn't. If you point to a bare repo via GIT_PERF_REPO, this part of
the test fails.
It's actually enough to demonstrate the problem without changing the
tree at all. So this produces the same numbers, and works everywhere:
diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
index b7258a7..16359d5 100755
--- a/t/perf/p0001-rev-list.sh
+++ b/t/perf/p0001-rev-list.sh
@@ -15,12 +15,7 @@ test_perf 'rev-list --all --objects' '
'
test_expect_success 'create new unreferenced commit' '
- git checkout --detach HEAD &&
- echo content >>file &&
- git add file &&
- git commit -m detached &&
- commit=$(git rev-parse --verify HEAD) &&
- git checkout -
+ commit=$(git commit-tree HEAD^{tree} -p HEAD)
'
test_perf 'rev-list $commit --not --all' '
It still modifies the test repo, but at least in a fairly innocuous way.
-Peff
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits
2014-01-20 22:11 ` Jeff King
@ 2014-01-20 22:32 ` Thomas Rast
2014-01-20 22:39 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Thomas Rast @ 2014-01-20 22:32 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> On Mon, Jan 20, 2014 at 04:31:01PM -0500, Jeff King wrote:
>
>> diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
>> index 4f71a63..b7258a7 100755
>> --- a/t/perf/p0001-rev-list.sh
>> +++ b/t/perf/p0001-rev-list.sh
>> @@ -14,4 +14,21 @@ test_perf 'rev-list --all --objects' '
>> git rev-list --all --objects >/dev/null
>> '
>>
>> +test_expect_success 'create new unreferenced commit' '
>> + git checkout --detach HEAD &&
>> + echo content >>file &&
>> + git add file &&
>> + git commit -m detached &&
>> + commit=$(git rev-parse --verify HEAD) &&
>> + git checkout -
>> +'
>
> This is bad to be touching the repo and assuming it is non-bare. For
> some reason I assumed that the perf suite made a copy of the repo, but
> it doesn't. If you point to a bare repo via GIT_PERF_REPO, this part of
> the test fails.
It does make a copy, but with cp -Rl. I haven't actually ever tried
what happens if you point it at a bare though. It *should* fail because
it tries to cd $repo/.git, but if that was itself bare...
--
Thomas Rast
tr@thomasrast.ch
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits
2014-01-20 22:32 ` Thomas Rast
@ 2014-01-20 22:39 ` Jeff King
0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2014-01-20 22:39 UTC (permalink / raw)
To: Thomas Rast; +Cc: Nguyễn Thái Ngọc Duy, git
On Mon, Jan 20, 2014 at 11:32:12PM +0100, Thomas Rast wrote:
> > This is bad to be touching the repo and assuming it is non-bare. For
> > some reason I assumed that the perf suite made a copy of the repo, but
> > it doesn't. If you point to a bare repo via GIT_PERF_REPO, this part of
> > the test fails.
>
> It does make a copy, but with cp -Rl. I haven't actually ever tried
> what happens if you point it at a bare though. It *should* fail because
> it tries to cd $repo/.git, but if that was itself bare...
Oh, hmph. I checked my linux repo, which I had used as GIT_PERF_REPO,
and noticed that it had the test commit in its reflog. But I forgot that
is because I did the test manually there right before writing up the
t/perf script! So yes, it copies, and it's totally fine to be modifying
the repo.
Bare repos seem to work just fine for me. It looks like we use `git
rev-parse --git-dir` to get the source, and then copy that to `.git` in
the temporary directory. So that works fine either way, and we do have a
directory available as the working dir. But of course the config from
the bare repo says `core.bare = true`, so some commands will bail.
We could perhaps just set GIT_WORK_TREE in the perf scripts, which I
believe would override the bare setting in the .git/config. And then we
know the repos will be consistently non-bare.
Whether we do that or not, I think the update I posted is preferable, as
it reproduces the problem in a much simpler manner.
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint
2014-01-20 21:28 [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
2014-01-20 21:31 ` [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits Jeff King
@ 2014-01-20 21:32 ` Jeff King
2014-01-20 23:57 ` Duy Nguyen
2014-01-20 22:29 ` [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
2014-01-21 2:24 ` [PATCH v2 " Jeff King
3 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2014-01-20 21:32 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
When rev-list is given a command-line like:
git rev-list --objects $commit --not --all
the most accurate answer is the difference between the set
of objects reachable from $commit and the set reachable from
all of the existing refs. However, we have not historically
provided that answer, because it is very expensive to
calculate. We would have to open every tree of every commit
in the entire history.
Instead, we find the accurate set difference of the
reachable commits, and then mark the trees at the boundaries
as uninteresting. This misses objects which appear in the
trees of both the interesting commits and deep within the
uninteresting history.
Commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting, 2013-08-16) noticed that we miss
those objects during pack-objects, and added code to examine
the trees of all of the "--not" refs given on the
command-line. Note that this is still not the complete set
difference, because we look only at the tips of the
command-line arguments, not all of their reachable commits.
But it increases the set of boundary objects we consider,
which is especially important for shallow fetches. So we
are trading extra CPU time for a larger set of boundary
objects, which can improve the resulting pack size for a
--thin pack.
This tradeoff probably makes sense in the context of
pack-objects, where we have set revs->edge_hint to have the
traversal feed us the set of boundary objects. For a
regular rev-list, though, it is probably not a good
tradeoff. It is true that it makes our list slightly closer
to a true set difference, but it is a rare case where this
is important. And because we do not have revs->edge_hint
set, we do nothing useful with the larger set of boundary
objects.
This patch therefore ties the extra tree examination to the
revs->edge_hint flag; it is the presence of that flag that
makes the tradeoff worthwhile.
Here is output from the p0001-rev-list showing the
improvement in performance:
Test HEAD^ HEAD
-----------------------------------------------------------------------------------------
0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0%
0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3%
0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2%
Signed-off-by: Jeff King <peff@peff.net>
---
list-objects.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/list-objects.c b/list-objects.c
index 6cbedf0..43ce1d9 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -162,15 +162,17 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
}
mark_edge_parents_uninteresting(commit, revs, show_edge);
}
- for (i = 0; i < revs->cmdline.nr; i++) {
- struct object *obj = revs->cmdline.rev[i].item;
- struct commit *commit = (struct commit *)obj;
- if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
- continue;
- mark_tree_uninteresting(commit->tree);
- if (revs->edge_hint && !(obj->flags & SHOWN)) {
- obj->flags |= SHOWN;
- show_edge(commit);
+ if (revs->edge_hint) {
+ for (i = 0; i < revs->cmdline.nr; i++) {
+ struct object *obj = revs->cmdline.rev[i].item;
+ struct commit *commit = (struct commit *)obj;
+ if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
+ continue;
+ mark_tree_uninteresting(commit->tree);
+ if (revs->edge_hint && !(obj->flags & SHOWN)) {
+ obj->flags |= SHOWN;
+ show_edge(commit);
+ }
}
}
}
--
1.8.5.2.500.g8060133
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint
2014-01-20 21:32 ` [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
@ 2014-01-20 23:57 ` Duy Nguyen
2014-01-21 2:22 ` Jeff King
0 siblings, 1 reply; 13+ messages in thread
From: Duy Nguyen @ 2014-01-20 23:57 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List
On Tue, Jan 21, 2014 at 4:32 AM, Jeff King <peff@peff.net> wrote:
> This patch therefore ties the extra tree examination to the
> revs->edge_hint flag; it is the presence of that flag that
> makes the tradeoff worthwhile.
>
> Here is output from the p0001-rev-list showing the
> improvement in performance:
>
> Test HEAD^ HEAD
> -----------------------------------------------------------------------------------------
> 0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0%
> 0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3%
> 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
> 0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2%
You must have so much fun (or headache, depending on your view) with
the variety of repos on github :)
> diff --git a/list-objects.c b/list-objects.c
> index 6cbedf0..43ce1d9 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -162,15 +162,17 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
> }
> mark_edge_parents_uninteresting(commit, revs, show_edge);
> }
> - for (i = 0; i < revs->cmdline.nr; i++) {
> - struct object *obj = revs->cmdline.rev[i].item;
> - struct commit *commit = (struct commit *)obj;
> - if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
> - continue;
> - mark_tree_uninteresting(commit->tree);
> - if (revs->edge_hint && !(obj->flags & SHOWN)) {
> - obj->flags |= SHOWN;
> - show_edge(commit);
> + if (revs->edge_hint) {
> + for (i = 0; i < revs->cmdline.nr; i++) {
> + struct object *obj = revs->cmdline.rev[i].item;
> + struct commit *commit = (struct commit *)obj;
> + if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
> + continue;
> + mark_tree_uninteresting(commit->tree);
> + if (revs->edge_hint && !(obj->flags & SHOWN)) {
Not really important, but perhaps remove revs->edge_hint here because
it's already checked?
> + obj->flags |= SHOWN;
> + show_edge(commit);
> + }
> }
> }
> }
--
Duy
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint
2014-01-20 23:57 ` Duy Nguyen
@ 2014-01-21 2:22 ` Jeff King
0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2014-01-21 2:22 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Git Mailing List
On Tue, Jan 21, 2014 at 06:57:08AM +0700, Duy Nguyen wrote:
> You must have so much fun (or headache, depending on your view) with
> the variety of repos on github :)
It's fun on days like these when the solution is fairly straightforward.
:)
> > + if (revs->edge_hint) {
> > + for (i = 0; i < revs->cmdline.nr; i++) {
> > + struct object *obj = revs->cmdline.rev[i].item;
> > + struct commit *commit = (struct commit *)obj;
> > + if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
> > + continue;
> > + mark_tree_uninteresting(commit->tree);
> > + if (revs->edge_hint && !(obj->flags & SHOWN)) {
>
> Not really important, but perhaps remove revs->edge_hint here because
> it's already checked?
Yes, I think that is a good idea. Thanks.
Re-roll coming in a moment.
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] performance regression in mark_edges_uninteresting
2014-01-20 21:28 [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
2014-01-20 21:31 ` [PATCH 1/2] t/perf: time rev-list with UNINTERESTING commits Jeff King
2014-01-20 21:32 ` [PATCH 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
@ 2014-01-20 22:29 ` Jeff King
2014-01-21 2:24 ` [PATCH v2 " Jeff King
3 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2014-01-20 22:29 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
On Mon, Jan 20, 2014 at 04:28:45PM -0500, Jeff King wrote:
> [1/2]: t/perf: time rev-list with UNINTERESTING commits
> [2/2]: list-objects: only look at cmdline trees with edge_hint
>
> Here's t/perf/p0001 output that shows the problem:
>
> 0001.5: rev-list --objects $commit --not --all
> fbd4a703^ fbd4a703 HEAD
> 0.04(0.04+0.00) 0.28(0.27+0.00) +600.0% 0.04(0.04+0.00) +0.0%
Those numbers are on git.git. Obviously 600% is a lot, but 24ms is not.
However, the cost is a factor of the tree size times the number of refs.
For the "homebrew.git" repository stored at GitHub, which has ~28,000
refs (mostly pointing at pull-request tips in refs/pull), the numbers
are much more dramatic:
fbd4a703^ fbd4a703 HEAD
0.50(0.46+0.02) 8.23(8.17+0.06) +1546.0% 0.50(0.48+0.01) +0.0%
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 0/2] performance regression in mark_edges_uninteresting
2014-01-20 21:28 [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
` (2 preceding siblings ...)
2014-01-20 22:29 ` [PATCH 0/2] performance regression in mark_edges_uninteresting Jeff King
@ 2014-01-21 2:24 ` Jeff King
2014-01-21 2:25 ` [PATCH v2 1/2] t/perf: time rev-list with UNINTERESTING commits Jeff King
2014-01-21 2:25 ` [PATCH v2 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
3 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2014-01-21 2:24 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: Junio C Hamano, git
On Mon, Jan 20, 2014 at 04:28:45PM -0500, Jeff King wrote:
> This series fixes a rev-list performance regression in fbd4a70 (list-objects:
> mark more commits as edges in mark_edges_uninteresting, 2013-08-16). That
> commit is a little tricky because it actually _knows_ it's trading off CPU for
> a better packfile, but I think we're performing the tradeoff in too many
> places. See the second commit for details.
>
> [1/2]: t/perf: time rev-list with UNINTERESTING commits
> [2/2]: list-objects: only look at cmdline trees with edge_hint
>
> Here's t/perf/p0001 output that shows the problem:
>
> 0001.5: rev-list --objects $commit --not --all
> fbd4a703^ fbd4a703 HEAD
> 0.04(0.04+0.00) 0.28(0.27+0.00) +600.0% 0.04(0.04+0.00) +0.0%
Here's v2 that addresses the minor comments on the list (simpler test in
the first patch, and dropping the now-redundant revs->edge_hint check in
the second patch).
-Peff
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/2] t/perf: time rev-list with UNINTERESTING commits
2014-01-21 2:24 ` [PATCH v2 " Jeff King
@ 2014-01-21 2:25 ` Jeff King
2014-01-21 2:25 ` [PATCH v2 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
1 sibling, 0 replies; 13+ messages in thread
From: Jeff King @ 2014-01-21 2:25 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: Junio C Hamano, git
We time a straight "rev-list --all" and its "--object"
counterpart, both going all the way to the root. However, we
do not time a partial history walk. This patch adds an
extreme case: a walk over a very small slice of history, but
with a very large set of UNINTERESTING tips. This is similar
to the connectivity check run by git on a small fetch, or
the walk done by any pre-receive hooks that want to check
incoming commits.
This test reveals a performance regression in git v1.8.4.2,
caused by fbd4a70 (list-objects: mark more commits as edges
in mark_edges_uninteresting, 2013-08-16):
Test fbd4a703^ fbd4a703
------------------------------------------------------------------------------------------
0001.1: rev-list --all 0.69(0.67+0.02) 0.69(0.68+0.01) +0.0%
0001.2: rev-list --all --objects 3.47(3.44+0.02) 3.48(3.44+0.03) +0.3%
0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all 0.04(0.03+0.00) 0.27(0.24+0.02) +575.0%
Signed-off-by: Jeff King <peff@peff.net>
---
t/perf/p0001-rev-list.sh | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/t/perf/p0001-rev-list.sh b/t/perf/p0001-rev-list.sh
index 4f71a63..16359d5 100755
--- a/t/perf/p0001-rev-list.sh
+++ b/t/perf/p0001-rev-list.sh
@@ -14,4 +14,16 @@ test_perf 'rev-list --all --objects' '
git rev-list --all --objects >/dev/null
'
+test_expect_success 'create new unreferenced commit' '
+ commit=$(git commit-tree HEAD^{tree} -p HEAD)
+'
+
+test_perf 'rev-list $commit --not --all' '
+ git rev-list $commit --not --all >/dev/null
+'
+
+test_perf 'rev-list --objects $commit --not --all' '
+ git rev-list --objects $commit --not --all >/dev/null
+'
+
test_done
--
1.8.5.2.500.g8060133
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v2 2/2] list-objects: only look at cmdline trees with edge_hint
2014-01-21 2:24 ` [PATCH v2 " Jeff King
2014-01-21 2:25 ` [PATCH v2 1/2] t/perf: time rev-list with UNINTERESTING commits Jeff King
@ 2014-01-21 2:25 ` Jeff King
2014-01-21 22:49 ` Junio C Hamano
1 sibling, 1 reply; 13+ messages in thread
From: Jeff King @ 2014-01-21 2:25 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: Junio C Hamano, git
When rev-list is given a command-line like:
git rev-list --objects $commit --not --all
the most accurate answer is the difference between the set
of objects reachable from $commit and the set reachable from
all of the existing refs. However, we have not historically
provided that answer, because it is very expensive to
calculate. We would have to open every tree of every commit
in the entire history.
Instead, we find the accurate set difference of the
reachable commits, and then mark the trees at the boundaries
as uninteresting. This misses objects which appear in the
trees of both the interesting commits and deep within the
uninteresting history.
Commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting, 2013-08-16) noticed that we miss
those objects during pack-objects, and added code to examine
the trees of all of the "--not" refs given on the
command-line. Note that this is still not the complete set
difference, because we look only at the tips of the
command-line arguments, not all of their reachable commits.
But it increases the set of boundary objects we consider,
which is especially important for shallow fetches. So we
are trading extra CPU time for a larger set of boundary
objects, which can improve the resulting pack size for a
--thin pack.
This tradeoff probably makes sense in the context of
pack-objects, where we have set revs->edge_hint to have the
traversal feed us the set of boundary objects. For a
regular rev-list, though, it is probably not a good
tradeoff. It is true that it makes our list slightly closer
to a true set difference, but it is a rare case where this
is important. And because we do not have revs->edge_hint
set, we do nothing useful with the larger set of boundary
objects.
This patch therefore ties the extra tree examination to the
revs->edge_hint flag; it is the presence of that flag that
makes the tradeoff worthwhile.
Here is output from the p0001-rev-list showing the
improvement in performance:
Test HEAD^ HEAD
-----------------------------------------------------------------------------------------
0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0%
0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3%
0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2%
Signed-off-by: Jeff King <peff@peff.net>
---
list-objects.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/list-objects.c b/list-objects.c
index 6cbedf0..206816f 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -162,15 +162,17 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
}
mark_edge_parents_uninteresting(commit, revs, show_edge);
}
- for (i = 0; i < revs->cmdline.nr; i++) {
- struct object *obj = revs->cmdline.rev[i].item;
- struct commit *commit = (struct commit *)obj;
- if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
- continue;
- mark_tree_uninteresting(commit->tree);
- if (revs->edge_hint && !(obj->flags & SHOWN)) {
- obj->flags |= SHOWN;
- show_edge(commit);
+ if (revs->edge_hint) {
+ for (i = 0; i < revs->cmdline.nr; i++) {
+ struct object *obj = revs->cmdline.rev[i].item;
+ struct commit *commit = (struct commit *)obj;
+ if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
+ continue;
+ mark_tree_uninteresting(commit->tree);
+ if (!(obj->flags & SHOWN)) {
+ obj->flags |= SHOWN;
+ show_edge(commit);
+ }
}
}
}
--
1.8.5.2.500.g8060133
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 2/2] list-objects: only look at cmdline trees with edge_hint
2014-01-21 2:25 ` [PATCH v2 2/2] list-objects: only look at cmdline trees with edge_hint Jeff King
@ 2014-01-21 22:49 ` Junio C Hamano
0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2014-01-21 22:49 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> This tradeoff probably makes sense in the context of
> pack-objects, where we have set revs->edge_hint to have the
> traversal feed us the set of boundary objects. For a
> regular rev-list, though, it is probably not a good
> tradeoff. It is true that it makes our list slightly closer
> to a true set difference, but it is a rare case where this
> is important. And because we do not have revs->edge_hint
> set, we do nothing useful with the larger set of boundary
> objects.
>
> This patch therefore ties the extra tree examination to the
> revs->edge_hint flag; it is the presence of that flag that
> makes the tradeoff worthwhile.
Makes sense. Thanks. Will queue.
>
> Here is output from the p0001-rev-list showing the
> improvement in performance:
>
> Test HEAD^ HEAD
> -----------------------------------------------------------------------------------------
> 0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0%
> 0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3%
> 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
> 0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2%
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> list-objects.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/list-objects.c b/list-objects.c
> index 6cbedf0..206816f 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -162,15 +162,17 @@ void mark_edges_uninteresting(struct rev_info *revs, show_edge_fn show_edge)
> }
> mark_edge_parents_uninteresting(commit, revs, show_edge);
> }
> - for (i = 0; i < revs->cmdline.nr; i++) {
> - struct object *obj = revs->cmdline.rev[i].item;
> - struct commit *commit = (struct commit *)obj;
> - if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
> - continue;
> - mark_tree_uninteresting(commit->tree);
> - if (revs->edge_hint && !(obj->flags & SHOWN)) {
> - obj->flags |= SHOWN;
> - show_edge(commit);
> + if (revs->edge_hint) {
> + for (i = 0; i < revs->cmdline.nr; i++) {
> + struct object *obj = revs->cmdline.rev[i].item;
> + struct commit *commit = (struct commit *)obj;
> + if (obj->type != OBJ_COMMIT || !(obj->flags & UNINTERESTING))
> + continue;
> + mark_tree_uninteresting(commit->tree);
> + if (!(obj->flags & SHOWN)) {
> + obj->flags |= SHOWN;
> + show_edge(commit);
> + }
> }
> }
> }
^ permalink raw reply [flat|nested] 13+ messages in thread