* git filter-branch not removing commits when it should in 2.7.0 @ 2016-01-19 20:48 John Fultz 2016-01-19 21:14 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: John Fultz @ 2016-01-19 20:48 UTC (permalink / raw) To: git This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases. Make an empty repo and put an empty commit in the history. E.g., echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2" Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4: git filter-branch --prune-empty git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' Let me know if you need any more information. Thanks. Sincerely, John Fultz jfultz@wolfram.com User Interface Group Wolfram Research, Inc. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0 2016-01-19 20:48 git filter-branch not removing commits when it should in 2.7.0 John Fultz @ 2016-01-19 21:14 ` Junio C Hamano 2016-01-19 21:35 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-19 21:14 UTC (permalink / raw) To: Jeff King; +Cc: git, John Fultz John Fultz <jfultz@wolfram.com> writes: > This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases. > > Make an empty repo and put an empty commit in the history. E.g., > > echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2" > > Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4: > > git filter-branch --prune-empty > git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' > > Let me know if you need any more information. Thanks. > > Sincerely, > > John Fultz > jfultz@wolfram.com > User Interface Group > Wolfram Research, Inc. Thanks. Since there were only 5 changes to git-filter-branch.sh between v2.6.0 and v2.7.0, it was fairly easy to pinpoint. Reverting the following commit from v2.7.0 seems to give the same result as v2.6.0 for "--prune-empty" experiment. commit 348d4f2fc5d3c4f7ba47079b96676b4e2dd831fc Author: Jeff King <peff@peff.net> Date: Fri Nov 6 01:24:29 2015 -0500 filter-branch: skip index read/write when possible If the user specifies an index filter but not a tree filter, filter-branch cleverly avoids checking out the tree entirely. But we don't do the next level of optimization: if you have no index or tree filter, we do not need to read the index at all. This can greatly speed up cases where we are only changing the commit objects (e.g., cementing a graft into place). Here are numbers from the newly-added perf test: Test HEAD^ HEAD --------------------------------------------------------------- 7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0 2016-01-19 21:14 ` Junio C Hamano @ 2016-01-19 21:35 ` Junio C Hamano 2016-01-19 21:37 ` Jeff King 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-19 21:35 UTC (permalink / raw) To: Jeff King; +Cc: git, John Fultz Junio C Hamano <gitster@pobox.com> writes: > John Fultz <jfultz@wolfram.com> writes: > >> This seems to be a 2.7.0 regression in filter-branch. The bug is reproducible on Mac/Windows (haven't tried Linux) in the 2.7.0 production releases. >> >> Make an empty repo and put an empty commit in the history. E.g., >> >> echo > foo && git add . && git commit -m "commit 1" && git commit --allow-empty -m "commit 2" >> >> Now try to use filter-branch to remove the empty commit. Both of the following methods leave master unchanged, but both worked in 2.6.4: >> >> git filter-branch --prune-empty >> git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' > > Thanks. > > Since there were only 5 changes to git-filter-branch.sh between > v2.6.0 and v2.7.0, it was fairly easy to pinpoint. > > Reverting the following commit from v2.7.0 seems to give the same > result as v2.6.0 for "--prune-empty" experiment. > > commit 348d4f2fc5d3c4f7ba47079b96676b4e2dd831fc > Author: Jeff King <peff@peff.net> > Date: Fri Nov 6 01:24:29 2015 -0500 > > filter-branch: skip index read/write when possible > > If the user specifies an index filter but not a tree filter, > filter-branch cleverly avoids checking out the tree > entirely. But we don't do the next level of optimization: if > you have no index or tree filter, we do not need to read the > index at all. > > This can greatly speed up cases where we are only changing > the commit objects (e.g., cementing a graft into place). > Here are numbers from the newly-added perf test: > > Test HEAD^ HEAD > --------------------------------------------------------------- > 7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7% > > Signed-off-by: Jeff King <peff@peff.net> > Signed-off-by: Junio C Hamano <gitster@pobox.com> OK, is this because git_commit_non_empty_tree() does this to decide that it should skip the commit: git_commit_non_empty_tree() { if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then map "$3" else git commit-tree "$@" fi } where its parameters when --prune-empty is in use (or when the function is used as the commit-filter), $1 is "$tree", $2 and $3 are "-p" and its sole commit object name $commit (which is read from the revs file, which is an output from rev-list, so it is known to be 40-hex) and tree after the said patch is computed like so: if test -n "$need_index" then tree=$(git write-tree) else tree="$commit^{tree}" fi i.e. the helper does textual comparison between "$FOURTY_HEX^{tree}" and 40-hex from rev-parse "$3^{tree}"? In other words, would the fix be a one-liner like this? git-filter-branch.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 98f1779..86b2ff1 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -404,7 +404,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree="$commit^{tree}" + tree=$(git rev-parse "$commit^{tree}") fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0 2016-01-19 21:35 ` Junio C Hamano @ 2016-01-19 21:37 ` Jeff King 2016-01-19 21:46 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Jeff King @ 2016-01-19 21:37 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 01:35:25PM -0800, Junio C Hamano wrote: > In other words, would the fix be a one-liner like this? > [...] > - tree="$commit^{tree}" > + tree=$(git rev-parse "$commit^{tree}") Yes, I was just writing up the commit message for it. :-/ It _is_ slower, though, because it introduces an extra rev-parse. When we could in fact be getting rid of one. Give me a moment to complete a few timing tests and post the results. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: git filter-branch not removing commits when it should in 2.7.0 2016-01-19 21:37 ` Jeff King @ 2016-01-19 21:46 ` Junio C Hamano 2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-19 21:46 UTC (permalink / raw) To: Jeff King; +Cc: git, John Fultz Jeff King <peff@peff.net> writes: > On Tue, Jan 19, 2016 at 01:35:25PM -0800, Junio C Hamano wrote: > >> In other words, would the fix be a one-liner like this? >> [...] >> - tree="$commit^{tree}" >> + tree=$(git rev-parse "$commit^{tree}") > > Yes, I was just writing up the commit message for it. :-/ > > It _is_ slower, though, because it introduces an extra rev-parse. When > we could in fact be getting rid of one. Give me a moment to complete a > few timing tests and post the results. Good point. We should do that rev-parse in the helper function. That rev-parse is there only because the skip-empty code wants to know the exact object name when comparing. There is no reason for this code to do it for the helper--the helper, if (and only if) it is called, can do the rev-parse itself, and we can still omit the overhead when we are not skipping empty ones. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 21:46 ` Junio C Hamano @ 2016-01-19 21:51 ` Jeff King 2016-01-19 21:59 ` Jeff King 2016-01-20 0:47 ` Jonathan Nieder 0 siblings, 2 replies; 19+ messages in thread From: Jeff King @ 2016-01-19 21:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 01:46:35PM -0800, Junio C Hamano wrote: > > It _is_ slower, though, because it introduces an extra rev-parse. When > > we could in fact be getting rid of one. Give me a moment to complete a > > few timing tests and post the results. > > Good point. > > We should do that rev-parse in the helper function. That rev-parse > is there only because the skip-empty code wants to know the exact > object name when comparing. There is no reason for this code to do > it for the helper--the helper, if (and only if) it is called, can > do the rev-parse itself, and we can still omit the overhead when > we are not skipping empty ones. Here's the patch I came up with. It takes the conservative choice (see the argument below), and shows the performance impact. I'll work up the non-conservative one on top, which I think can do even better than the original. -- >8 -- Subject: filter-branch: resolve $commit^{tree} in no-index case Commit 348d4f2 (filter-branch: skip index read/write when possible, 2015-11-06) taught filter-branch to optimize out the final "git write-tree" when we know we haven't touched the tree with any of our filters. It does by simply putting the literal text "$commit^{tree}" into the "$tree" variable, avoiding a useless rev-parse call. However, when we pass this to git_commit_non_empty_tree(), it gets confused; it resolves "$commit^{tree}" itself, and compares our string to the 40-hex sha1, which obviously doesn't match. As a result, "--prune-empty" (or any custom filter using git_commit_non_empty_tree) will fail to drop an empty commit (when filter-branch is used without a tree or index filter). Let's resolve $tree to the 40-hex ourselves, so that git_commit_non_empty_tree can work. Unfortunately, this is a bit slower due to the extra process overhead: $ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh [...] Test 348d4f2 HEAD -------------------------------------------------------------- 7000.2: noop filter 3.76(0.24+0.26) 4.54(0.28+0.24) +20.7% However, the value of $tree here is technically user-visible. The user can provide arbitrary shell code at this stage, which could itself have a similar assumption to what is in git_commit_non_empty_tree. So the conservative choice to fix this regression is to take the 20% hit and give the pre-348d4f2 behavior. We still end up much faster than before the optimization: $ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh [...] Test 348d4f2^ HEAD -------------------------------------------------------------- 7000.2: noop filter 9.51(4.32+0.40) 4.51(0.28+0.23) -52.6% Signed-off-by: Jeff King <peff@peff.net> --- git-filter-branch.sh | 2 +- t/t7003-filter-branch.sh | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index d61f9ba..5e094ce 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -404,7 +404,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree="$commit^{tree}" + tree=$(git rev-parse "$commit^{tree}") fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 377c648..97c23c2 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -333,6 +333,14 @@ test_expect_success 'prune empty collapsed merges' ' test_cmp expect actual ' +test_expect_success 'prune empty works even without index/tree filters' ' + git rev-list HEAD >expect && + git commit --allow-empty -m empty && + git filter-branch -f --prune-empty HEAD && + git rev-list HEAD >actual && + test_cmp expect actual +' + test_expect_success '--remap-to-ancestor with filename filters' ' git checkout master && git reset --hard A && -- 2.7.0.248.g5eafd77 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King @ 2016-01-19 21:59 ` Jeff King 2016-01-19 22:07 ` Jeff King 2016-01-19 22:28 ` Jeff King 2016-01-20 0:47 ` Jonathan Nieder 1 sibling, 2 replies; 19+ messages in thread From: Jeff King @ 2016-01-19 21:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 04:51:00PM -0500, Jeff King wrote: > Here's the patch I came up with. It takes the conservative choice (see > the argument below), and shows the performance impact. I'll work up the > non-conservative one on top, which I think can do even better than the > original. Oh, never mind. I had thought we could do away with _both_ rev-parses and compare "foo^{tree}" to "foo^{tree}". But of course that does not work. Because it is really "$commit^{tree}" versus "$parent^{tree}" here. I need to update the commit message below. We could get away "git diff --exit-code $1 $2" to do a single process invocation (rather than two rev-parses), but I don't know if it is worth the complexity. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 21:59 ` Jeff King @ 2016-01-19 22:07 ` Jeff King 2016-01-19 22:23 ` Junio C Hamano 2016-01-19 22:28 ` Jeff King 1 sibling, 1 reply; 19+ messages in thread From: Jeff King @ 2016-01-19 22:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote: > Oh, never mind. I had thought we could do away with _both_ rev-parses > and compare "foo^{tree}" to "foo^{tree}". > > But of course that does not work. Because it is really "$commit^{tree}" > versus "$parent^{tree}" here. I need to update the commit message below. Actually, I ended up not including the part about doing that comparison. So the commit message isn't _wrong_, but it is incomplete. This paragraph: However, the value of $tree here is technically user-visible. The user can provide arbitrary shell code at this stage, which could itself have a similar assumption to what is in git_commit_non_empty_tree. So the conservative choice to fix this regression is to take the 20% hit and give the pre-348d4f2 behavior. should start with something like: We could try to make git_commit_non_empty_tree more clever. to make it clear why we care about the user-visibility of the variable in the first place. Here is a re-roll with that fixup. -- >8 -- Subject: [PATCH v2] filter-branch: resolve $commit^{tree} in no-index case Commit 348d4f2 (filter-branch: skip index read/write when possible, 2015-11-06) taught filter-branch to optimize out the final "git write-tree" when we know we haven't touched the tree with any of our filters. It does by simply putting the literal text "$commit^{tree}" into the "$tree" variable, avoiding a useless rev-parse call. However, when we pass this to git_commit_non_empty_tree(), it gets confused; it resolves "$commit^{tree}" itself, and compares our string to the 40-hex sha1, which obviously doesn't match. As a result, "--prune-empty" (or any custom filter using git_commit_non_empty_tree) will fail to drop an empty commit (when filter-branch is used without a tree or index filter). Let's resolve $tree to the 40-hex ourselves, so that git_commit_non_empty_tree can work. Unfortunately, this is a bit slower due to the extra process overhead: $ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh [...] Test 348d4f2 HEAD -------------------------------------------------------------- 7000.2: noop filter 3.76(0.24+0.26) 4.54(0.28+0.24) +20.7% We could try to make git_commit_non_empty_tree more clever. However, the value of $tree here is technically user-visible. The user can provide arbitrary shell code at this stage, which could itself have a similar assumption to what is in git_commit_non_empty_tree. So the conservative choice to fix this regression is to take the 20% hit and give the pre-348d4f2 behavior. We still end up much faster than before the optimization: $ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh [...] Test 348d4f2^ HEAD -------------------------------------------------------------- 7000.2: noop filter 9.51(4.32+0.40) 4.51(0.28+0.23) -52.6% Signed-off-by: Jeff King <peff@peff.net> --- git-filter-branch.sh | 2 +- t/t7003-filter-branch.sh | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index d61f9ba..5e094ce 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -404,7 +404,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree="$commit^{tree}" + tree=$(git rev-parse "$commit^{tree}") fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh index 377c648..97c23c2 100755 --- a/t/t7003-filter-branch.sh +++ b/t/t7003-filter-branch.sh @@ -333,6 +333,14 @@ test_expect_success 'prune empty collapsed merges' ' test_cmp expect actual ' +test_expect_success 'prune empty works even without index/tree filters' ' + git rev-list HEAD >expect && + git commit --allow-empty -m empty && + git filter-branch -f --prune-empty HEAD && + git rev-list HEAD >actual && + test_cmp expect actual +' + test_expect_success '--remap-to-ancestor with filename filters' ' git checkout master && git reset --hard A && -- 2.7.0.248.g5eafd77 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 22:07 ` Jeff King @ 2016-01-19 22:23 ` Junio C Hamano 0 siblings, 0 replies; 19+ messages in thread From: Junio C Hamano @ 2016-01-19 22:23 UTC (permalink / raw) To: Jeff King; +Cc: git, John Fultz Jeff King <peff@peff.net> writes: > Actually, I ended up not including the part about doing that comparison. > So the commit message isn't _wrong_, but it is incomplete. This > paragraph: > > However, the value of $tree here is technically > user-visible. The user can provide arbitrary shell code at > this stage, which could itself have a similar assumption to > what is in git_commit_non_empty_tree. So the conservative > choice to fix this regression is to take the 20% hit and > give the pre-348d4f2 behavior. > > should start with something like: > > We could try to make git_commit_non_empty_tree more clever. > > to make it clear why we care about the user-visibility of the variable > in the first place. > > Here is a re-roll with that fixup. Thanks. I failed to spot and realize that $tree is also part of what we expose and end-user scripts may already be looking at when I commented, and I agree with the reasoning behind this conservative route. Will queue. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 21:59 ` Jeff King 2016-01-19 22:07 ` Jeff King @ 2016-01-19 22:28 ` Jeff King 2016-01-19 22:48 ` Jeff King 2016-01-20 1:22 ` Jonathan Nieder 1 sibling, 2 replies; 19+ messages in thread From: Jeff King @ 2016-01-19 22:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote: > We could get away "git diff --exit-code $1 $2" to do a single process > invocation (rather than two rev-parses), but I don't know if it is worth > the complexity. And here's that patch. I'm actually a little iffy on it because it switches to "diff-tree" from a raw-sha1 comparison. For a well-formed repo, that shouldn't matter. But what if you had a commit that was replacing a malformed tree object, but not otherwise changing the diff? We might drop it as "empty", even though you'd prefer to keep it. I guess some alternative ways of doing the same thing are: 1. Use "git rev-parse $1 $3^{tree}", parse the output and compare. Surprisingly tedious to do in a shell without invoking any extra sub-processes. 2. When we do the initial rev-list, output the tree, too. Then we get the old tree sha1 for "free" (no extra process, but at the cost of slightly more I/O by the shell). I suspect we could also cache the tree of the parent and avoid the rev-parse in git_commit_non_empty_tree(), too. I dunno. I'm inclined to say that none of this is worth it to try to drop one or two processes. Writing filter-branch in a better language (or just using BFG) would probably be a more productive use of time. 20% looks like a lot, but that's because it's pretty fast in the first place. The timings I showed earlier (and below) are for git.git, which is not that huge. But the savings from 348d4f2 are really about avoiding looking at the trees entirely; the bigger your tree, the more you save. Running it on linux.git should show that we're still reclaiming most of the original optimization. So I'm inclined to go with the "conservative" fix I sent initially, and drop this micro-optimization. -- >8 -- Subject: [PATCH] filter-branch: optimize out rev-parse for unchanged tree The prior commit noticed that we started feeding "$commit^{tree}" to git_commit_non_empty_tree instead of its resolved 40-hex sha1. It made the conservative fix of resolving and passing the 40-hex sha1, even though it does add some overhead. This patch takes the less conservative choice, under the assumption that we are unlikely to break anybody's commit-filter by showing them the unresolved name "$commit^{tree}" (which is probably the case, because most people would end their filter with "git commit-tree" or "git_commit_non_empty_tree"). We can make the comparison in the latter function more clever by using "diff-tree --quiet", which will compare the two trees with only a single process invocation. Here are the resulting numbers from p7000: Test 348d4f2^ 348d4f2 HEAD^ HEAD ---------------------------------------------------------------------------------------------------------------- 7000.2: noop filter 9.48(4.29+0.41) 3.73(0.22+0.26) -60.7% 4.56(0.27+0.26) -51.9% 3.74(0.25+0.25) -60.5% Signed-off-by: Jeff King <peff@peff.net> --- git-filter-branch.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 5e094ce..d1879e3 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -44,7 +44,7 @@ skip_commit() # it will skip commits that leave the tree untouched, commit the other. git_commit_non_empty_tree() { - if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then + if test $# = 3 && git diff-tree --quiet "$1" "$3"; then map "$3" else git commit-tree "$@" @@ -404,7 +404,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree=$(git rev-parse "$commit^{tree}") + tree="$commit^{tree}" fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || -- 2.7.0.248.g5eafd77 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 22:28 ` Jeff King @ 2016-01-19 22:48 ` Jeff King 2016-01-20 1:22 ` Jonathan Nieder 1 sibling, 0 replies; 19+ messages in thread From: Jeff King @ 2016-01-19 22:48 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, John Fultz On Tue, Jan 19, 2016 at 05:28:02PM -0500, Jeff King wrote: > I dunno. I'm inclined to say that none of this is worth it to try to > drop one or two processes. Writing filter-branch in a better language > (or just using BFG) would probably be a more productive use of time. > 20% looks like a lot, but that's because it's pretty fast in the first > place. The timings I showed earlier (and below) are for git.git, which > is not that huge. But the savings from 348d4f2 are really about avoiding > looking at the trees entirely; the bigger your tree, the more you save. > Running it on linux.git should show that we're still reclaiming most of > the original optimization. In case anyone is curious, here are the linux.git numbers (re-wrapped, because t/perf produces some really long lines; that might be worth addressing): 348d4f2^ 295.32(269.61+14.36) 348d4f2 7.92( 0.85+ 0.72) -97.3% HEAD^ 9.37( 0.87+ 0.80) -96.8% HEAD 7.71( 0.92+ 0.62) -97.4% So yes, the conservative fix costs us about 1.5 seconds, or 18%, over the micro-optimized one. But the original point was to save over 280 seconds, which we still do. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 22:28 ` Jeff King 2016-01-19 22:48 ` Jeff King @ 2016-01-20 1:22 ` Jonathan Nieder 2016-01-20 1:34 ` Jeff King 1 sibling, 1 reply; 19+ messages in thread From: Jonathan Nieder @ 2016-01-20 1:22 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, John Fultz Jeff King wrote: > On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote: >> We could get away "git diff --exit-code $1 $2" to do a single process >> invocation (rather than two rev-parses), but I don't know if it is worth >> the complexity. > > And here's that patch. > > I'm actually a little iffy on it because it switches to "diff-tree" from > a raw-sha1 comparison. For a well-formed repo, that shouldn't matter. > But what if you had a commit that was replacing a malformed tree object, > but not otherwise changing the diff? We might drop it as "empty", even > though you'd prefer to keep it. Mph. We could get the best of both worlds by introducing a "git rev-parse --compare <a> <b>" that compares object ids. Actually... How about something like this? Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- git-filter-branch.sh | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/git-filter-branch.sh b/git-filter-branch.sh index 86b2ff1..06f4e0f 100755 --- a/git-filter-branch.sh +++ b/git-filter-branch.sh @@ -44,7 +44,15 @@ skip_commit() # it will skip commits that leave the tree untouched, commit the other. git_commit_non_empty_tree() { - if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then + if + test $# = 3 && + git rev-parse "$1" "$3^{tree}" | + { + read a + read b + test "$a" = "$b" + } + then map "$3" else git commit-tree "$@" @@ -404,7 +412,7 @@ while read commit parents; do then tree=$(git write-tree) else - tree=$(git rev-parse "$commit^{tree}") + tree="$commit^{tree}" fi workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \ "$tree" $parentstr < ../message > ../map/$commit || -- 2.7.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 1:22 ` Jonathan Nieder @ 2016-01-20 1:34 ` Jeff King 2016-01-20 1:51 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Jeff King @ 2016-01-20 1:34 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Junio C Hamano, git, John Fultz On Tue, Jan 19, 2016 at 05:22:53PM -0800, Jonathan Nieder wrote: > > I'm actually a little iffy on it because it switches to "diff-tree" from > > a raw-sha1 comparison. For a well-formed repo, that shouldn't matter. > > But what if you had a commit that was replacing a malformed tree object, > > but not otherwise changing the diff? We might drop it as "empty", even > > though you'd prefer to keep it. > > Mph. We could get the best of both worlds by introducing a "git > rev-parse --compare <a> <b>" that compares object ids. Actually... > > How about something like this? Thanks. I had in my head that we could do something like that, but hadn't quite worked it out. I think what you wrote works. If you want to wrap it up into a patch, I'd be OK with it, but note that it still falls afoul of changing $tree in a user-visible way (so you should note that in the commit message). -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 1:34 ` Jeff King @ 2016-01-20 1:51 ` Junio C Hamano 2016-01-20 2:00 ` Jeff King 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-20 1:51 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz Jeff King <peff@peff.net> writes: >> Mph. We could get the best of both worlds by introducing a "git >> rev-parse --compare <a> <b>" that compares object ids. Actually... >> >> How about something like this? > > Thanks. I had in my head that we could do something like that, but > hadn't quite worked it out. I think what you wrote works. But wouldn't "diff-tree --quiet" essentially be that command? > If you want to wrap it up into a patch, I'd be OK with it, but note that > it still falls afoul of changing $tree in a user-visible way (so you > should note that in the commit message). Yes, I think we should take your conservative variant for that exact reason. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 1:51 ` Junio C Hamano @ 2016-01-20 2:00 ` Jeff King 2016-01-20 2:43 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Jeff King @ 2016-01-20 2:00 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git, John Fultz On Tue, Jan 19, 2016 at 05:51:58PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > >> Mph. We could get the best of both worlds by introducing a "git > >> rev-parse --compare <a> <b>" that compares object ids. Actually... > >> > >> How about something like this? > > > > Thanks. I had in my head that we could do something like that, but > > hadn't quite worked it out. I think what you wrote works. > > But wouldn't "diff-tree --quiet" essentially be that command? I think Jonathan was responding to my point that "diff-tree --quiet" _isn't_ quite the same, if you have mis-formatted tree objects. If the sha1s are different, a rev-parse comparison will keep the commit. But "diff-tree" will actually do the diff, and may consider different sha1s to have the same content, dropping the second one. It's a minor point, but I find one of my primary uses for filter-branch these days is massaging out bogus objects made by older or buggy git clients (not that I see _that_ many of them; I think it speaks more to the fact that I don't really use filter-branch much these days). > > If you want to wrap it up into a patch, I'd be OK with it, but note that > > it still falls afoul of changing $tree in a user-visible way (so you > > should note that in the commit message). > > Yes, I think we should take your conservative variant for that exact > reason. I'm fine with that, too. We could also do the conservative variant for "maint", and then do the other with a deprecation warning. I think I decided I don't care enough to go through those motions myself, but I don't mind if somebody else wants to. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 2:00 ` Jeff King @ 2016-01-20 2:43 ` Junio C Hamano 2016-01-20 3:23 ` Junio C Hamano 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-20 2:43 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz Jeff King <peff@peff.net> writes: > On Tue, Jan 19, 2016 at 05:51:58PM -0800, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> >> Mph. We could get the best of both worlds by introducing a "git >> >> rev-parse --compare <a> <b>" that compares object ids. Actually... >> >> >> >> How about something like this? >> > >> > Thanks. I had in my head that we could do something like that, but >> > hadn't quite worked it out. I think what you wrote works. >> >> But wouldn't "diff-tree --quiet" essentially be that command? > > I think Jonathan was responding to my point that "diff-tree --quiet" > _isn't_ quite the same, if you have mis-formatted tree objects. If the > sha1s are different, a rev-parse comparison will keep the commit. But > "diff-tree" will actually do the diff, and may consider different sha1s > to have the same content, dropping the second one. > > It's a minor point, but I find one of my primary uses for filter-branch > these days is massaging out bogus objects made by older or buggy git > clients (not that I see _that_ many of them; I think it speaks more to > the fact that I don't really use filter-branch much these days). I (think I) understand that use case, but this function compares the parent tree and the tree of the commit we just created, and it does so in order to skip the one we just created (when --prune-empty is given). It is not about "tree-filter returned a tree, let's compare them and if the old one and the new one looks the same (even though they do not have identical object name) and replace the old one with the new before giving it to commit-filter, so that we have a larger chance of noticing a case where we did not have to rewrite and instead were able to reuse the old commit and tree". In other words, I do not think "broken tree object may look the same to diff-tree, but I do want to replace it" is relevant to this codepath; it is not something this function handles, I think. What am I not seeing? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 2:43 ` Junio C Hamano @ 2016-01-20 3:23 ` Junio C Hamano 2016-01-20 4:14 ` Jeff King 0 siblings, 1 reply; 19+ messages in thread From: Junio C Hamano @ 2016-01-20 3:23 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Nieder, git, John Fultz Junio C Hamano <gitster@pobox.com> writes: > In other words, I do not think "broken tree object may look the same > to diff-tree, but I do want to replace it" is relevant to this > codepath; it is not something this function handles, I think. > > What am I not seeing? One thing that I was not seeing is that "diff-tree --quiet" (without any magic options like -ignore-whitespace that may make two different tree object judged to be the same, and without -C -C that forces it to recurse into identical subtrees) lacks an obvious and stupid optimization to say "they are the same" or "they are different when given two tree object names without opening the tree(s). So if you make it compare a rewritten and corrected tree and an incorrect and possibly broken one, it may try to read the tree data from the latter and comparison can lead to incorrect results. A consequence of this is if you are running filter-branch without any tree filters (i.e. no_index) but with "--prune-empty", and a commit and its parent actually have different trees but one (or both) of them are broken, "diff-tree" _might_ say "they are the same" and you end up skipping a commit when you do not want to. If your plan was to run another round of filter-branch, this time with a "broken tree encoding correction" tree-filter, on the result of the first "--prune-empty" filtering, we would definitely end up with a wrong history. But for such a case, I would say you should be running the correction filter as the very first thing. So I am not sure if it matters in practice. One possible action item out of this is that we might want to think about giving the obvious and stupid optimization to such invocation of "diff-tree --quiet". I _think_ we correctly avoid descending into the identical subtrees while doing a recursive diff-tree by saying "hey these two corresponding directories have the same tree object names", but there is no fundamental reason why we shouldn't be doing the same optimization at the top-level of the comparison. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-20 3:23 ` Junio C Hamano @ 2016-01-20 4:14 ` Jeff King 0 siblings, 0 replies; 19+ messages in thread From: Jeff King @ 2016-01-20 4:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jonathan Nieder, git, John Fultz On Tue, Jan 19, 2016 at 07:23:35PM -0800, Junio C Hamano wrote: > A consequence of this is if you are running filter-branch without > any tree filters (i.e. no_index) but with "--prune-empty", and a > commit and its parent actually have different trees but one (or > both) of them are broken, "diff-tree" _might_ say "they are the > same" and you end up skipping a commit when you do not want to. If > your plan was to run another round of filter-branch, this time with > a "broken tree encoding correction" tree-filter, on the result of > the first "--prune-empty" filtering, we would definitely end up with > a wrong history. Yeah, that is the case I was thinking about. > But for such a case, I would say you should be running the > correction filter as the very first thing. So I am not sure if it > matters in practice. I'd also agree with that. > One possible action item out of this is that we might want to think > about giving the obvious and stupid optimization to such invocation > of "diff-tree --quiet". I _think_ we correctly avoid descending > into the identical subtrees while doing a recursive diff-tree by > saying "hey these two corresponding directories have the same tree > object names", but there is no fundamental reason why we shouldn't > be doing the same optimization at the top-level of the comparison. Yeah, I just assumed we did, but a simple strace shows that is not the case. I doubt it matters _too_ much in practice, but it would probably be easy to do. -Peff ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case 2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King 2016-01-19 21:59 ` Jeff King @ 2016-01-20 0:47 ` Jonathan Nieder 1 sibling, 0 replies; 19+ messages in thread From: Jonathan Nieder @ 2016-01-20 0:47 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, John Fultz Jeff King wrote: > Signed-off-by: Jeff King <peff@peff.net> > --- > git-filter-branch.sh | 2 +- > t/t7003-filter-branch.sh | 8 ++++++++ > 2 files changed, 9 insertions(+), 1 deletion(-) Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2016-01-20 4:14 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-01-19 20:48 git filter-branch not removing commits when it should in 2.7.0 John Fultz 2016-01-19 21:14 ` Junio C Hamano 2016-01-19 21:35 ` Junio C Hamano 2016-01-19 21:37 ` Jeff King 2016-01-19 21:46 ` Junio C Hamano 2016-01-19 21:51 ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King 2016-01-19 21:59 ` Jeff King 2016-01-19 22:07 ` Jeff King 2016-01-19 22:23 ` Junio C Hamano 2016-01-19 22:28 ` Jeff King 2016-01-19 22:48 ` Jeff King 2016-01-20 1:22 ` Jonathan Nieder 2016-01-20 1:34 ` Jeff King 2016-01-20 1:51 ` Junio C Hamano 2016-01-20 2:00 ` Jeff King 2016-01-20 2:43 ` Junio C Hamano 2016-01-20 3:23 ` Junio C Hamano 2016-01-20 4:14 ` Jeff King 2016-01-20 0:47 ` Jonathan Nieder
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).