From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, John Fultz <jfultz@wolfram.com>
Subject: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
Date: Tue, 19 Jan 2016 16:51:00 -0500
Message-ID: <20160119215100.GB28656@sigill.intra.peff.net>
In-Reply-To: <xmqq37tt9r9g.fsf@gitster.mtv.corp.google.com>

On Tue, Jan 19, 2016 at 01:46:35PM -0800, Junio C Hamano wrote:

> > It _is_ slower, though, because it introduces an extra rev-parse. When
> > we could in fact be getting rid of one. Give me a moment to complete a
> > few timing tests and post the results.
> 
> Good point.
> 
> We should do that rev-parse in the helper function.  That rev-parse
> is there only because the skip-empty code wants to know the exact
> object name when comparing.  There is no reason for this code to do
> it for the helper--the helper, if (and only if) it is called, can
> do the rev-parse itself, and we can still omit the overhead when
> we are not skipping empty ones.

Here's the patch I came up with. It takes the conservative choice (see
the argument below), and shows the performance impact. I'll work up the
non-conservative one on top, which I think can do even better than the
original.

-- >8 --
Subject: filter-branch: resolve $commit^{tree} in no-index case

Commit 348d4f2 (filter-branch: skip index read/write when
possible, 2015-11-06) taught filter-branch to optimize out
the final "git write-tree" when we know we haven't touched
the tree with any of our filters. It does so by simply putting
the literal text "$commit^{tree}" into the "$tree" variable,
avoiding a useless rev-parse call.

However, when we pass this to git_commit_non_empty_tree(),
it gets confused; it resolves "$commit^{tree}" itself, and
compares our string to the 40-hex sha1, which obviously
doesn't match. As a result, "--prune-empty" (or any custom
filter using git_commit_non_empty_tree) will fail to drop
an empty commit (when filter-branch is used without a tree
or index filter).
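
To make the mismatch concrete, here is a minimal sketch run against a
throwaway repository. The helper name is_same_tree is hypothetical; it
only imitates the kind of comparison described above (a string compared
against the output of rev-parse) and is not the code from
git-filter-branch.sh.

```shell
# Throwaway repository with a base commit and an empty commit on top.
repo=$(mktemp -d)
git init -q "$repo"
for msg in base empty
do
	# Identity is passed inline so the sketch ignores global config.
	git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
		-c commit.gpgsign=false commit -q --allow-empty -m "$msg"
done

commit=$(git -C "$repo" rev-parse HEAD)
parent=$(git -C "$repo" rev-parse "HEAD^")

# Hypothetical stand-in for the helper's check: is the string $1 equal
# to the resolved tree of parent commit $2?
is_same_tree () {
	test "$1" = "$(git -C "$repo" rev-parse "$2^{tree}")"
}

# The unresolved spelling versus the 40-hex name of the same tree.
literal="$commit^{tree}"
resolved=$(git -C "$repo" rev-parse "$commit^{tree}")

if is_same_tree "$literal" "$parent"
then literal_match=yes
else literal_match=no
fi

if is_same_tree "$resolved" "$parent"
then resolved_match=yes
else resolved_match=no
fi

echo "literal spelling matches: $literal_match"
echo "resolved name matches:    $resolved_match"
rm -rf "$repo"
```

Both strings name the same tree, but only the resolved one should
compare equal, which is why the empty commit is not dropped when the
literal spelling is passed along.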

Let's resolve $tree to the 40-hex ourselves, so that
git_commit_non_empty_tree can work. Unfortunately, this is a
bit slower due to the extra process overhead:

  $ cd t/perf && ./run 348d4f2 HEAD p7000-filter-branch.sh
  [...]
  Test                  348d4f2           HEAD
  --------------------------------------------------------------
  7000.2: noop filter   3.76(0.24+0.26)   4.54(0.28+0.24) +20.7%

However, the value of $tree here is technically
user-visible. The user can provide arbitrary shell code at
this stage, which could itself make the same assumption
that git_commit_non_empty_tree does. So the conservative
choice to fix this regression is to take the 20% hit and
give the pre-348d4f2 behavior. We still end up much faster
than before the optimization:

  $ cd t/perf && ./run 348d4f2^ HEAD p7000-filter-branch.sh
  [...]
  Test                  348d4f2^          HEAD
  --------------------------------------------------------------
  7000.2: noop filter   9.51(4.32+0.40)   4.51(0.28+0.23) -52.6%
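
As an illustration of the user-visible concern, the sketch below shows
the sort of check a custom filter might plausibly contain. The function
name and the commit id are made up for the example; the point is only
that code written against the older behavior may expect a full 40-hex
object name in $tree.

```shell
# Hypothetical user-side check: succeed only if $1 is exactly 40
# lowercase hex digits, i.e. looks like a full object name.
looks_like_object_name () {
	case "$1" in
	*[!0-9a-f]*) return 1 ;;   # contains a non-hex character
	esac
	test ${#1} -eq 40
}

# Made-up commit id, spelled the way the unresolved form would look.
literal='1234567890abcdef1234567890abcdef12345678^{tree}'
# The well-known id of the empty tree, as a resolved 40-hex name.
resolved=4b825dc642cb6eb9a060e54bf8d69288fbee4904

if looks_like_object_name "$literal"
then literal_ok=yes
else literal_ok=no
fi

if looks_like_object_name "$resolved"
then resolved_ok=yes
else resolved_ok=no
fi

echo "literal accepted:  $literal_ok"
echo "resolved accepted: $resolved_ok"
```

A filter with a check like this would reject the unresolved spelling
and accept the resolved name, so handing it 40-hex keeps such scripts
working.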

Signed-off-by: Jeff King <peff@peff.net>
---
 git-filter-branch.sh     | 2 +-
 t/t7003-filter-branch.sh | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index d61f9ba..5e094ce 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -404,7 +404,7 @@ while read commit parents; do
 	then
 		tree=$(git write-tree)
 	else
-		tree="$commit^{tree}"
+		tree=$(git rev-parse "$commit^{tree}")
 	fi
 	workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
 		"$tree" $parentstr < ../message > ../map/$commit ||
diff --git a/t/t7003-filter-branch.sh b/t/t7003-filter-branch.sh
index 377c648..97c23c2 100755
--- a/t/t7003-filter-branch.sh
+++ b/t/t7003-filter-branch.sh
@@ -333,6 +333,14 @@ test_expect_success 'prune empty collapsed merges' '
 	test_cmp expect actual
 '
 
+test_expect_success 'prune empty works even without index/tree filters' '
+	git rev-list HEAD >expect &&
+	git commit --allow-empty -m empty &&
+	git filter-branch -f --prune-empty HEAD &&
+	git rev-list HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '--remap-to-ancestor with filename filters' '
 	git checkout master &&
 	git reset --hard A &&
-- 
2.7.0.248.g5eafd77

