* git filter-branch not removing commits when it should in 2.7.0
@ 2016-01-19 20:48 John Fultz
2016-01-19 21:14 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: John Fultz @ 2016-01-19 20:48 UTC (permalink / raw)
To: git
This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases.
Make an empty repo and put an empty commit in the history. E.g.,
echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2"
Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4:
git filter-branch --prune-empty
git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'
Let me know if you need any more information. Thanks.
Sincerely,
John Fultz
jfultz@wolfram.com
User Interface Group
Wolfram Research, Inc.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0
2016-01-19 20:48 git filter-branch not removing commits when it should in 2.7.0 John Fultz
@ 2016-01-19 21:14 ` Junio C Hamano
2016-01-19 21:35 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-19 21:14 UTC (permalink / raw)
To: Jeff King; +Cc: git, John Fultz
John Fultz <jfultz@wolfram.com> writes:
> This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases.
>
> Make an empty repo and put an empty commit in the history. E.g.,
>
> echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2"
>
> Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4:
>
> git filter-branch --prune-empty
> git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'
>
> Let me know if you need any more information. Thanks.
>
> Sincerely,
>
> John Fultz
> jfultz@wolfram.com
> User Interface Group
> Wolfram Research, Inc.
Thanks.
Since there were only 5 changes to git-filter-branch.sh between
v2.6.0 and v2.7.0, it was fairly easy to pinpoint.
Reverting the following commit from v2.7.0 seems to give the same
result as v2.6.0 for "--prune-empty" experiment.
commit 348d4f2fc5d3c4f7ba47079b96676b4e2dd831fc
Author: Jeff King <peff@peff.net>
Date: Fri Nov 6 01:24:29 2015 -0500
filter-branch: skip index read/write when possible
If the user specifies an index filter but not a tree filter,
filter-branch cleverly avoids checking out the tree
entirely. But we don't do the next level of optimization: if
you have no index or tree filter, we do not need to read the
index at all.
This can greatly speed up cases where we are only changing
the commit objects (e.g., cementing a graft into place).
Here are numbers from the newly-added perf test:
Test HEAD^ HEAD
---------------------------------------------------------------
7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0
2016-01-19 21:14 ` Junio C Hamano
@ 2016-01-19 21:35 ` Junio C Hamano
2016-01-19 21:37 ` Jeff King
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-19 21:35 UTC (permalink / raw)
To: Jeff King; +Cc: git, John Fultz
Junio C Hamano <gitster@pobox.com> writes:
> John Fultz <jfultz@wolfram.com> writes:
>
>> This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases.
>>
>> Make an empty repo and put an empty commit in the history. E.g.,
>>
>> echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2"
>>
>> Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4:
>>
>> git filter-branch --prune-empty
>> git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'
>
> Thanks.
>
> Since there were only 5 changes to git-filter-branch.sh between
> v2.6.0 and v2.7.0, it was fairly easy to pinpoint.
>
> Reverting the following commit from v2.7.0 seems to give the same
> result as v2.6.0 for "--prune-empty" experiment.
>
> commit 348d4f2fc5d3c4f7ba47079b96676b4e2dd831fc
> Author: Jeff King <peff@peff.net>
> Date: Fri Nov 6 01:24:29 2015 -0500
>
> filter-branch: skip index read/write when possible
>
> If the user specifies an index filter but not a tree filter,
> filter-branch cleverly avoids checking out the tree
> entirely. But we don't do the next level of optimization: if
> you have no index or tree filter, we do not need to read the
> index at all.
>
> This can greatly speed up cases where we are only changing
> the commit objects (e.g., cementing a graft into place).
> Here are numbers from the newly-added perf test:
>
> Test HEAD^ HEAD
> ---------------------------------------------------------------
> 7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7%
>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
OK, is this because git_commit_non_empty_tree() does this to decide
that it should skip the commit:
git_commit_non_empty_tree()
{
if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}");
then
map "$3"
else
git commit-tree "$@"
fi
}
where its parameters when --prune-empty is in use (or when the
function is used as the commit-filter), $1 is "$tree", $2 and $3 are
"-p" and its sole commit object name $commit (which is read from the
revs file, which is an output from rev-list, so it is known to be
40-hex) and tree after the said patch is computed like so:
if test -n "$need_index"
then
tree=$(git write-tree)
else
tree="$commit^{tree}"
fi
i.e. the helper does textual comparison between "$FOURTY_HEX^{tree}"
and 40-hex from rev-parse "$3^{tree}"?
In other words, would the fix be a one-liner like this?
git-filter-branch.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 98f1779..86b2ff1 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -404,7 +404,7 @@ while read commit parents; do
then
tree=$(git write-tree)
else
- tree="$commit^{tree}"
+ tree=$(git rev-parse "$commit^{tree}")
fi
workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
"$tree" $parentstr < ../message > ../map/$commit ||
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0
2016-01-19 21:35 ` Junio C Hamano
@ 2016-01-19 21:37 ` Jeff King
2016-01-19 21:46 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2016-01-19 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 01:35:25PM -0800, Junio C Hamano wrote:
> In other words, would the fix be a one-liner like this?
> [...]
> - tree="$commit^{tree}"
> + tree=$(git rev-parse "$commit^{tree}")
Yes, I was just writing up the commit message for it. :-/
It _is_ slower, though, because it introduces an extra rev-parse. When
we could in fact be getting rid of one. Give me a moment to complete a
few timing tests and post the results.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0
2016-01-19 21:37 ` Jeff King
@ 2016-01-19 21:46 ` Junio C Hamano
2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-19 21:46 UTC (permalink / raw)
To: Jeff King; +Cc: git, John Fultz
Jeff King <peff@peff.net> writes:
> On Tue, Jan 19, 2016 at 01:35:25PM -0800, Junio C Hamano wrote:
>
>> In other words, would the fix be a one-liner like this?
>> [...]
>> - tree="$commit^{tree}"
>> + tree=$(git rev-parse "$commit^{tree}")
>
> Yes, I was just writing up the commit message for it. :-/
>
> It _is_ slower, though, because it introduces an extra rev-parse. When
> we could in fact be getting rid of one. Give me a moment to complete a
> few timing tests and post the results.
Good point.
We should do that rev-parse in the helper function. That rev-parse
is there only because the skip-empty code wants to know the exact
object name when comparing. There is no reason for this code to do
it for the helper--the helper, if (and only if) it is called, can
do the rev-parse itself, and we can still omit the overhead when
we are not skipping empty ones.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 21:46 ` Junio C Hamano
@ 2016-01-19 21:51 ` Jeff King
2016-01-19 21:59 ` Jeff King
2016-01-20 0:47 ` Jonathan Nieder
0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2016-01-19 21:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 01:46:35PM -0800, Junio C Hamano wrote:
> > It _is_ slower, though, because it introduces an extra rev-parse. When
> > we could in fact be getting rid of one. Give me a moment to complete a
> > few timing tests and post the results.
>
> Good point.
>
> We should do that rev-parse in the helper function. That rev-parse
> is there only because the skip-empty code wants to know the exact
> object name when comparing. There is no reason for this code to do
> it for the helper--the helper, if (and only if) it is called, can
> do the rev-parse itself, and we can still omit the overhead when
> we are not skipping empty ones.
Here's the patch I came up with. It takes the conservative choice (see
the argument below), and shows the performance impact. I'll work up the
non-conservative one on top, which I think can do even better than the
original.
-- >8 --
Subject: filter-branch: resolve $commit^{tree} in no-index case
Commit 348d4f2 (filter-branch: skip index read/write when
possible, 2015-11-06) taught filter-branch to optimize out
the final "git write-tree" when we know we haven't touched
the tree with any of our filters. It does by simply putting
the literal text "$commit^{tree}" into the "$tree" variable,
avoiding a useless rev-parse call.
However, when we pass this to git_commit_non_empty_tree(),
it gets confused; it resolves "$commit^{tree}" itself, and
compares our string to the 40-hex sha1, which obviously
doesn't match. As a result, "--prune-empty" (or any custom
filter using git_commit_non_empty_tree) will fail to drop
an empty commit (when filter-branch is used without a tree
or index filter).
Let's resolve $tree to the 40-hex ourselves, so that
git_commit_non_empty_tree can work. Unfortunately, this is a
bit slower due to the extra process overhead:
$ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh
[...]
Test 348d4f2 HEAD
--------------------------------------------------------------
7000.2: noop filter 3.76(0.24+0.26) 4.54(0.28+0.24) +20.7%
However, the value of $tree here is technically
user-visible. The user can provide arbitrary shell code at
this stage, which could itself have a similar assumption to
what is in git_commit_non_empty_tree. So the conservative
choice to fix this regression is to take the 20% hit and
give the pre-348d4f2 behavior. We still end up much faster
than before the optimization:
$ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh
[...]
Test 348d4f2^ HEAD
--------------------------------------------------------------
7000.2: noop filter 9.51(4.32+0.40) 4.51(0.28+0.23) -52.6%
Signed-off-by: Jeff King <peff@peff.net>
---
git-filter-branch.sh | 2 +-
t/t7003-filter-branch.sh | 8 ++++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d61f9ba..5e094ce 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -404,7 +404,7 @@ while read commit parents; do
then
tree=$(git write-tree)
else
- tree="$commit^{tree}"
+ tree=$(git rev-parse "$commit^{tree}")
fi
workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
"$tree" $parentstr < ../message > ../map/$commit ||
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 377c648..97c23c2 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -333,6 +333,14 @@ test_expect_success 'prune empty collapsed merges' '
test_cmp expect actual
'
+test_expect_success 'prune empty works even without index/tree filters' '
+ git rev-list HEAD >expect &&
+ git commit --allow-empty -m empty &&
+ git filter-branch -f --prune-empty HEAD &&
+ git rev-list HEAD >actual &&
+ test_cmp expect actual
+'
+
test_expect_success '--remap-to-ancestor with filename filters' '
git checkout master &&
git reset --hard A &&
--
2.7.0.248.g5eafd77
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King
@ 2016-01-19 21:59 ` Jeff King
2016-01-19 22:07 ` Jeff King
2016-01-19 22:28 ` Jeff King
2016-01-20 0:47 ` Jonathan Nieder
1 sibling, 2 replies; 19+ messages in thread
From: Jeff King @ 2016-01-19 21:59 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 04:51:00PM -0500, Jeff King wrote:
> Here's the patch I came up with. It takes the conservative choice (see
> the argument below), and shows the performance impact. I'll work up the
> non-conservative one on top, which I think can do even better than the
> original.
Oh, never mind. I had thought we could do away with _both_ rev-parses
and compare "foo^{tree}" to "foo^{tree}".
But of course that does not work. Because it is really "$commit^{tree}"
versus "$parent^{tree}" here. I need to update the commit message below.
We could get away "git diff --exit-code $1 $2" to do a single process
invocation (rather than two rev-parses), but I don't know if it is worth
the complexity.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 21:59 ` Jeff King
@ 2016-01-19 22:07 ` Jeff King
2016-01-19 22:23 ` Junio C Hamano
2016-01-19 22:28 ` Jeff King
1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2016-01-19 22:07 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote:
> Oh, never mind. I had thought we could do away with _both_ rev-parses
> and compare "foo^{tree}" to "foo^{tree}".
>
> But of course that does not work. Because it is really "$commit^{tree}"
> versus "$parent^{tree}" here. I need to update the commit message below.
Actually, I ended up not including the part about doing that comparison.
So the commit message isn't _wrong_, but it is incomplete. This
paragraph:
However, the value of $tree here is technically
user-visible. The user can provide arbitrary shell code at
this stage, which could itself have a similar assumption to
what is in git_commit_non_empty_tree. So the conservative
choice to fix this regression is to take the 20% hit and
give the pre-348d4f2 behavior.
should start with something like:
We could try to make git_commit_non_empty_tree more clever.
to make it clear why we care about the user-visibility of the variable
in the first place.
Here is a re-roll with that fixup.
-- >8 --
Subject: [PATCH v2] filter-branch: resolve $commit^{tree} in no-index case
Commit 348d4f2 (filter-branch: skip index read/write when
possible, 2015-11-06) taught filter-branch to optimize out
the final "git write-tree" when we know we haven't touched
the tree with any of our filters. It does by simply putting
the literal text "$commit^{tree}" into the "$tree" variable,
avoiding a useless rev-parse call.
However, when we pass this to git_commit_non_empty_tree(),
it gets confused; it resolves "$commit^{tree}" itself, and
compares our string to the 40-hex sha1, which obviously
doesn't match. As a result, "--prune-empty" (or any custom
filter using git_commit_non_empty_tree) will fail to drop
an empty commit (when filter-branch is used without a tree
or index filter).
Let's resolve $tree to the 40-hex ourselves, so that
git_commit_non_empty_tree can work. Unfortunately, this is a
bit slower due to the extra process overhead:
$ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh
[...]
Test 348d4f2 HEAD
--------------------------------------------------------------
7000.2: noop filter 3.76(0.24+0.26) 4.54(0.28+0.24) +20.7%
We could try to make git_commit_non_empty_tree more clever.
However, the value of $tree here is technically
user-visible. The user can provide arbitrary shell code at
this stage, which could itself have a similar assumption to
what is in git_commit_non_empty_tree. So the conservative
choice to fix this regression is to take the 20% hit and
give the pre-348d4f2 behavior. We still end up much faster
than before the optimization:
$ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh
[...]
Test 348d4f2^ HEAD
--------------------------------------------------------------
7000.2: noop filter 9.51(4.32+0.40) 4.51(0.28+0.23) -52.6%
Signed-off-by: Jeff King <peff@peff.net>
---
git-filter-branch.sh | 2 +-
t/t7003-filter-branch.sh | 8 ++++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d61f9ba..5e094ce 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -404,7 +404,7 @@ while read commit parents; do
then
tree=$(git write-tree)
else
- tree="$commit^{tree}"
+ tree=$(git rev-parse "$commit^{tree}")
fi
workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
"$tree" $parentstr < ../message > ../map/$commit ||
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 377c648..97c23c2 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -333,6 +333,14 @@ test_expect_success 'prune empty collapsed merges' '
test_cmp expect actual
'
+test_expect_success 'prune empty works even without index/tree filters' '
+ git rev-list HEAD >expect &&
+ git commit --allow-empty -m empty &&
+ git filter-branch -f --prune-empty HEAD &&
+ git rev-list HEAD >actual &&
+ test_cmp expect actual
+'
+
test_expect_success '--remap-to-ancestor with filename filters' '
git checkout master &&
git reset --hard A &&
--
2.7.0.248.g5eafd77
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 22:07 ` Jeff King
@ 2016-01-19 22:23 ` Junio C Hamano
0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2016-01-19 22:23 UTC (permalink / raw)
To: Jeff King; +Cc: git, John Fultz
Jeff King <peff@peff.net> writes:
> Actually, I ended up not including the part about doing that comparison.
> So the commit message isn't _wrong_, but it is incomplete. This
> paragraph:
>
> However, the value of $tree here is technically
> user-visible. The user can provide arbitrary shell code at
> this stage, which could itself have a similar assumption to
> what is in git_commit_non_empty_tree. So the conservative
> choice to fix this regression is to take the 20% hit and
> give the pre-348d4f2 behavior.
>
> should start with something like:
>
> We could try to make git_commit_non_empty_tree more clever.
>
> to make it clear why we care about the user-visibility of the variable
> in the first place.
>
> Here is a re-roll with that fixup.
Thanks. I failed to spot and realize that $tree is also part of
what we expose and end-user scripts may already be looking at when I
commented, and I agree with the reasoning behind this conservative
route.
Will queue.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 21:59 ` Jeff King
2016-01-19 22:07 ` Jeff King
@ 2016-01-19 22:28 ` Jeff King
2016-01-19 22:48 ` Jeff King
2016-01-20 1:22 ` Jonathan Nieder
1 sibling, 2 replies; 19+ messages in thread
From: Jeff King @ 2016-01-19 22:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote:
> We could get away "git diff --exit-code $1 $2" to do a single process
> invocation (rather than two rev-parses), but I don't know if it is worth
> the complexity.
And here's that patch.
I'm actually a little iffy on it because it switches to "diff-tree" from
a raw-sha1 comparison. For a well-formed repo, that shouldn't matter.
But what if you had a commit that was replacing a malformed tree object,
but not otherwise changing the diff? We might drop it as "empty", even
though you'd prefer to keep it.
I guess some alternative ways of doing the same thing are:
1. Use "git rev-parse $1 $3^{tree}", parse the output and compare.
Surprisingly tedious to do in a shell without invoking any extra
sub-processes.
2. When we do the initial rev-list, output the tree, too. Then we get
the old tree sha1 for "free" (no extra process, but at the cost of
slightly more I/O by the shell).
I suspect we could also cache the tree of the parent and avoid the
rev-parse in git_commit_non_empty_tree(), too.
I dunno. I'm inclined to say that none of this is worth it to try to
drop one or two processes. Writing filter-branch in a better language
(or just using BFG) would probably be a more productive use of time.
20% looks like a lot, but that's because it's pretty fast in the first
place. The timings I showed earlier (and below) are for git.git, which
is not that huge. But the savings from 348d4f2 are really about avoiding
looking at the trees entirely; the bigger your tree, the more you save.
Running it on linux.git should show that we're still reclaiming most of
the original optimization.
So I'm inclined to go with the "conservative" fix I sent initially, and
drop this micro-optimization.
-- >8 --
Subject: [PATCH] filter-branch: optimize out rev-parse for unchanged tree
The prior commit noticed that we started feeding
"$commit^{tree}" to git_commit_non_empty_tree instead of its
resolved 40-hex sha1. It made the conservative fix of
resolving and passing the 40-hex sha1, even though it does
add some overhead.
This patch takes the less conservative choice, under the
assumption that we are unlikely to break anybody's
commit-filter by showing them the unresolved name
"$commit^{tree}" (which is probably the case, because most
people would end their filter with "git commit-tree" or
"git_commit_non_empty_tree").
We can make the comparison in the latter function more
clever by using "diff-tree --quiet", which will compare the
two trees with only a single process invocation.
Here are the resulting numbers from p7000:
Test 348d4f2^ 348d4f2 HEAD^ HEAD
----------------------------------------------------------------------------------------------------------------
7000.2: noop filter 9.48(4.29+0.41) 3.73(0.22+0.26) -60.7% 4.56(0.27+0.26) -51.9% 3.74(0.25+0.25) -60.5%
Signed-off-by: Jeff King <peff@peff.net>
---
git-filter-branch.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5e094ce..d1879e3 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -44,7 +44,7 @@ skip_commit()
# it will skip commits that leave the tree untouched, commit the other.
git_commit_non_empty_tree()
{
- if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
+ if test $# = 3 && git diff-tree --quiet "$1" "$3"; then
map "$3"
else
git commit-tree "$@"
@@ -404,7 +404,7 @@ while read commit parents; do
then
tree=$(git write-tree)
else
- tree=$(git rev-parse "$commit^{tree}")
+ tree="$commit^{tree}"
fi
workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
"$tree" $parentstr < ../message > ../map/$commit ||
--
2.7.0.248.g5eafd77
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 22:28 ` Jeff King
@ 2016-01-19 22:48 ` Jeff King
2016-01-20 1:22 ` Jonathan Nieder
1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2016-01-19 22:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, John Fultz
On Tue, Jan 19, 2016 at 05:28:02PM -0500, Jeff King wrote:
> I dunno. I'm inclined to say that none of this is worth it to try to
> drop one or two processes. Writing filter-branch in a better language
> (or just using BFG) would probably be a more productive use of time.
> 20% looks like a lot, but that's because it's pretty fast in the first
> place. The timings I showed earlier (and below) are for git.git, which
> is not that huge. But the savings from 348d4f2 are really about avoiding
> looking at the trees entirely; the bigger your tree, the more you save.
> Running it on linux.git should show that we're still reclaiming most of
> the original optimization.
In case anyone is curious, here are the linux.git numbers (re-wrapped,
because t/perf produces some really long lines; that might be worth
addressing):
348d4f2^ 295.32(269.61+14.36)
348d4f2 7.92( 0.85+ 0.72) -97.3%
HEAD^ 9.37( 0.87+ 0.80) -96.8%
HEAD 7.71( 0.92+ 0.62) -97.4%
So yes, the conservative fix costs us about 1.5 seconds, or 18%, over
the micro-optimized one. But the original point was to save over 280
seconds, which we still do.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King
2016-01-19 21:59 ` Jeff King
@ 2016-01-20 0:47 ` Jonathan Nieder
1 sibling, 0 replies; 19+ messages in thread
From: Jonathan Nieder @ 2016-01-20 0:47 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, John Fultz
Jeff King wrote:
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> git-filter-branch.sh | 2 +-
> t/t7003-filter-branch.sh | 8 ++++++++
> 2 files changed, 9 insertions(+), 1 deletion(-)
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-19 22:28 ` Jeff King
2016-01-19 22:48 ` Jeff King
@ 2016-01-20 1:22 ` Jonathan Nieder
2016-01-20 1:34 ` Jeff King
1 sibling, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2016-01-20 1:22 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git, John Fultz
Jeff King wrote:
> On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote:
>> We could get away "git diff --exit-code $1 $2" to do a single process
>> invocation (rather than two rev-parses), but I don't know if it is worth
>> the complexity.
>
> And here's that patch.
>
> I'm actually a little iffy on it because it switches to "diff-tree" from
> a raw-sha1 comparison. For a well-formed repo, that shouldn't matter.
> But what if you had a commit that was replacing a malformed tree object,
> but not otherwise changing the diff? We might drop it as "empty", even
> though you'd prefer to keep it.
Mph. We could get the best of both worlds by introducing a "git
rev-parse --compare <a> <b>" that compares object ids. Actually...
How about something like this?
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
git-filter-branch.sh | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 86b2ff1..06f4e0f 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -44,7 +44,15 @@ skip_commit()
# it will skip commits that leave the tree untouched, commit the other.
git_commit_non_empty_tree()
{
- if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
+ if
+ test $# = 3 &&
+ git rev-parse "$1" "$3^{tree}" |
+ {
+ read a
+ read b
+ test "$a" = "$b"
+ }
+ then
map "$3"
else
git commit-tree "$@"
@@ -404,7 +412,7 @@ while read commit parents; do
then
tree=$(git write-tree)
else
- tree=$(git rev-parse "$commit^{tree}")
+ tree="$commit^{tree}"
fi
workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
"$tree" $parentstr < ../message > ../map/$commit ||
--
2.7.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 1:22 ` Jonathan Nieder
@ 2016-01-20 1:34 ` Jeff King
2016-01-20 1:51 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2016-01-20 1:34 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Junio C Hamano, git, John Fultz
On Tue, Jan 19, 2016 at 05:22:53PM -0800, Jonathan Nieder wrote:
> > I'm actually a little iffy on it because it switches to "diff-tree" from
> > a raw-sha1 comparison. For a well-formed repo, that shouldn't matter.
> > But what if you had a commit that was replacing a malformed tree object,
> > but not otherwise changing the diff? We might drop it as "empty", even
> > though you'd prefer to keep it.
>
> Mph. We could get the best of both worlds by introducing a "git
> rev-parse --compare <a> <b>" that compares object ids. Actually...
>
> How about something like this?
Thanks. I had in my head that we could do something like that, but
hadn't quite worked it out. I think what you wrote works.
If you want to wrap it up into a patch, I'd be OK with it, but note that
it still falls afoul of changing $tree in a user-visible way (so you
should note that in the commit message).
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 1:34 ` Jeff King
@ 2016-01-20 1:51 ` Junio C Hamano
2016-01-20 2:00 ` Jeff King
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-20 1:51 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz
Jeff King <peff@peff.net> writes:
>> Mph. We could get the best of both worlds by introducing a "git
>> rev-parse --compare <a> <b>" that compares object ids. Actually...
>>
>> How about something like this?
>
> Thanks. I had in my head that we could do something like that, but
> hadn't quite worked it out. I think what you wrote works.
But wouldn't "diff-tree --quiet" essentially be that command?
> If you want to wrap it up into a patch, I'd be OK with it, but note that
> it still falls afoul of changing $tree in a user-visible way (so you
> should note that in the commit message).
Yes, I think we should take your conservative variant for that exact
reason.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 1:51 ` Junio C Hamano
@ 2016-01-20 2:00 ` Jeff King
2016-01-20 2:43 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2016-01-20 2:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, John Fultz
On Tue, Jan 19, 2016 at 05:51:58PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> >> Mph. We could get the best of both worlds by introducing a "git
> >> rev-parse --compare <a> <b>" that compares object ids. Actually...
> >>
> >> How about something like this?
> >
> > Thanks. I had in my head that we could do something like that, but
> > hadn't quite worked it out. I think what you wrote works.
>
> But wouldn't "diff-tree --quiet" essentially be that command?
I think Jonathan was responding to my point that "diff-tree --quiet"
_isn't_ quite the same, if you have mis-formatted tree objects. If the
sha1s are different, a rev-parse comparison will keep the commit. But
"diff-tree" will actually do the diff, and may consider different sha1s
to have the same content, dropping the second one.
It's a minor point, but I find one of my primary uses for filter-branch
these days is massaging out bogus objects made by older or buggy git
clients (not that I see _that_ many of them; I think it speaks more to
the fact that I don't really use filter-branch much these days).
> > If you want to wrap it up into a patch, I'd be OK with it, but note that
> > it still falls afoul of changing $tree in a user-visible way (so you
> > should note that in the commit message).
>
> Yes, I think we should take your conservative variant for that exact
> reason.
I'm fine with that, too. We could also do the conservative variant for
"maint", and then do the other with a deprecation warning. I think I
decided I don't care enough to go through those motions myself, but I
don't mind if somebody else wants to.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 2:00 ` Jeff King
@ 2016-01-20 2:43 ` Junio C Hamano
2016-01-20 3:23 ` Junio C Hamano
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-20 2:43 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz
Jeff King <peff@peff.net> writes:
> On Tue, Jan 19, 2016 at 05:51:58PM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> >> Mph. We could get the best of both worlds by introducing a "git
>> >> rev-parse --compare <a> <b>" that compares object ids. Actually...
>> >>
>> >> How about something like this?
>> >
>> > Thanks. I had in my head that we could do something like that, but
>> > hadn't quite worked it out. I think what you wrote works.
>>
>> But wouldn't "diff-tree --quiet" essentially be that command?
>
> I think Jonathan was responding to my point that "diff-tree --quiet"
> _isn't_ quite the same, if you have mis-formatted tree objects. If the
> sha1s are different, a rev-parse comparison will keep the commit. But
> "diff-tree" will actually do the diff, and may consider different sha1s
> to have the same content, dropping the second one.
>
> It's a minor point, but I find one of my primary uses for filter-branch
> these days is massaging out bogus objects made by older or buggy git
> clients (not that I see _that_ many of them; I think it speaks more to
> the fact that I don't really use filter-branch much these days).
I (think I) understand that use case, but this function compares the
parent tree and the tree of the commit we just created, and it does
so in order to skip the one we just created (when --prune-empty is
given). It is not about "tree-filter returned a tree, let's compare
them and if the old one and the new one looks the same (even though
they do not have identical object name) and replace the old one with
the new before giving it to commit-filter, so that we have a larger
chance of noticing a case where we did not have to rewrite and instead
were able to reuse the old commit and tree".
In other words, I do not think "broken tree object may look the same
to diff-tree, but I do want to replace it" is relevant to this
codepath; it is not something this function handles, I think.
What am I not seeing?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 2:43 ` Junio C Hamano
@ 2016-01-20 3:23 ` Junio C Hamano
2016-01-20 4:14 ` Jeff King
0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2016-01-20 3:23 UTC (permalink / raw)
To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz
Junio C Hamano <gitster@pobox.com> writes:
> In other words, I do not think "broken tree object may look the same
> to diff-tree, but I do want to replace it" is relevant to this
> codepath; it is not something this function handles, I think.
>
> What am I not seeing?
One thing that I was not seeing is that "diff-tree --quiet" (without
any magic options like -ignore-whitespace that may make two
different tree object judged to be the same, and without -C -C that
forces it to recurse into identical subtrees) lacks an obvious and
stupid optimization to say "they are the same" or "they are
different when given two tree object names without opening the
tree(s). So if you make it compare a rewritten and corrected tree
and an incorrect and possibly broken one, it may try to read the
tree data from the latter and comparison can lead to incorrect
results.
A consequence of this is if you are running filter-branch without
any tree filters (i.e. no_index) but with "--prune-empty", and a
commit and its parent actually have different trees but one (or
both) of them are broken, "diff-tree" _might_ say "they are the
same" and you end up skipping a commit when you do not want to. If
your plan was to run another round of filter-branch, this time with
a "broken tree encoding correction" tree-filter, on the result of
the first "--prune-empty" filtering, we would definitely end up with
a wrong history.
But for such a case, I would say you should be running the
correction filter as the very first thing. So I am not sure if it
matters in practice.
One possible action item out of this is that we might want to think
about giving the obvious and stupid optimization to such invocation
of "diff-tree --quiet". I _think_ we correctly avoid descending
into the identical subtrees while doing a recursive diff-tree by
saying "hey these two corresponding directories have the same tree
object names", but there is no fundamental reason why we shouldn't
be doing the same optimization at the top-level of the comparison.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
2016-01-20 3:23 ` Junio C Hamano
@ 2016-01-20 4:14 ` Jeff King
0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2016-01-20 4:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jonathan Nieder, git, John Fultz
On Tue, Jan 19, 2016 at 07:23:35PM -0800, Junio C Hamano wrote:
> A consequence of this is if you are running filter-branch without
> any tree filters (i.e. no_index) but with "--prune-empty", and a
> commit and its parent actually have different trees but one (or
> both) of them are broken, "diff-tree" _might_ say "they are the
> same" and you end up skipping a commit when you do not want to. If
> your plan was to run another round of filter-branch, this time with
> a "broken tree encoding correction" tree-filter, on the result of
> the first "--prune-empty" filtering, we would definitely end up with
> a wrong history.
Yeah, that is the case I was thinking about.
> But for such a case, I would say you should be running the
> correction filter as the very first thing. So I am not sure if it
> matters in practice.
I'd also agree with that.
> One possible action item out of this is that we might want to think
> about giving the obvious and stupid optimization to such invocation
> of "diff-tree --quiet". I _think_ we correctly avoid descending
> into the identical subtrees while doing a recursive diff-tree by
> saying "hey these two corresponding directories have the same tree
> object names", but there is no fundamental reason why we shouldn't
> be doing the same optimization at the top-level of the comparison.
Yeah, I just assumed we did, but a simple strace shows that is not the
case. I doubt it matters _too_ much in practice, but it would probably
be easy to do.
-Peff
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-01-20 4:14 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-19 20:48 git filter-branch not removing commits when it should in 2.7.0 John Fultz
2016-01-19 21:14 ` Junio C Hamano
2016-01-19 21:35 ` Junio C Hamano
2016-01-19 21:37 ` Jeff King
2016-01-19 21:46 ` Junio C Hamano
2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King
2016-01-19 21:59 ` Jeff King
2016-01-19 22:07 ` Jeff King
2016-01-19 22:23 ` Junio C Hamano
2016-01-19 22:28 ` Jeff King
2016-01-19 22:48 ` Jeff King
2016-01-20 1:22 ` Jonathan Nieder
2016-01-20 1:34 ` Jeff King
2016-01-20 1:51 ` Junio C Hamano
2016-01-20 2:00 ` Jeff King
2016-01-20 2:43 ` Junio C Hamano
2016-01-20 3:23 ` Junio C Hamano
2016-01-20 4:14 ` Jeff King
2016-01-20 0:47 ` Jonathan Nieder
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).