git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, John Fultz <jfultz@wolfram.com>
Subject: Re: [PATCH] filter-branch: resolve $commit^{tree} in no-index case
Date: Tue, 19 Jan 2016 17:28:02 -0500	[thread overview]
Message-ID: <20160119222802.GC6556@sigill.intra.peff.net> (raw)
In-Reply-To: <20160119215928.GA6556@sigill.intra.peff.net>

On Tue, Jan 19, 2016 at 04:59:28PM -0500, Jeff King wrote:

> We could get away "git diff --exit-code $1 $2" to do a single process
> invocation (rather than two rev-parses), but I don't know if it is worth
> the complexity.

And here's that patch.

I'm actually a little iffy on it because it switches to "diff-tree" from
a raw-sha1 comparison. For a well-formed repo, that shouldn't matter.
But what if you had a commit that was replacing a malformed tree object,
but not otherwise changing the diff?  We might drop it as "empty", even
though you'd prefer to keep it.

I guess some alternative ways of doing the same thing are:

  1. Use "git rev-parse $1 $3^{tree}", parse the output and compare.
     Surprisingly tedious to do in a shell without invoking any extra
     sub-processes.

  2. When we do the initial rev-list, output the tree, too. Then we get
     the old tree sha1 for "free" (no extra process, but at the cost of
     slightly more I/O by the shell).

I suspect we could also cache the tree of the parent and avoid the
rev-parse in git_commit_non_empty_tree(), too.

I dunno. I'm inclined to say that none of this is worth it to try to
drop one or two processes. Writing filter-branch in a better language
(or just using BFG) would probably be a more productive use of time.
20% looks like a lot, but that's because it's pretty fast in the first
place.  The timings I showed earlier (and below) are for git.git, which
is not that huge. But the savings from 348d4f2 are really about avoiding
looking at the trees entirely; the bigger your tree, the more you save.
Running it on linux.git should show that we're still reclaiming most of
the original optimization.

So I'm inclined to go with the "conservative" fix I sent initially, and
drop this micro-optimization.

-- >8 --
Subject: [PATCH] filter-branch: optimize out rev-parse for unchanged tree

The prior commit noticed that we started feeding
"$commit^{tree}" to git_commit_non_empty_tree instead of its
resolved 40-hex sha1. It made the conservative fix of
resolving and passing the 40-hex sha1, even though it does
add some overhead.

This patch takes the less conservative choice, under the
assumption that we are unlikely to break anybody's
commit-filter by showing them the unresolved name
"$commit^{tree}" (which is probably the case, because most
people would end their filter with "git commit-tree" or
"git_commit_non_empty_tree").

We can make the comparison in the latter function more
clever by using "diff-tree --quiet", which will compare the
two trees with only a single process invocation.

Here are the resulting numbers from p7000:

  Test                  348d4f2^          348d4f2                  HEAD^                    HEAD
  ----------------------------------------------------------------------------------------------------------------
  7000.2: noop filter   9.48(4.29+0.41)   3.73(0.22+0.26) -60.7%   4.56(0.27+0.26) -51.9%   3.74(0.25+0.25) -60.5%

Signed-off-by: Jeff King <peff@peff.net>
---
 git-filter-branch.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 5e094ce..d1879e3 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -44,7 +44,7 @@ skip_commit()
 # it will skip commits that leave the tree untouched, commit the other.
 git_commit_non_empty_tree()
 {
-	if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
+	if test $# = 3 && git diff-tree --quiet "$1" "$3"; then
 		map "$3"
 	else
 		git commit-tree "$@"
@@ -404,7 +404,7 @@ while read commit parents; do
 	then
 		tree=$(git write-tree)
 	else
-		tree=$(git rev-parse "$commit^{tree}")
+		tree="$commit^{tree}"
 	fi
 	workdir=$workdir @SHELL_PATH@ -c "$filter_commit" "git commit-tree" \
 		"$tree" $parentstr < ../message > ../map/$commit ||
-- 
2.7.0.248.g5eafd77

  parent reply	other threads:[~2016-01-19 22:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-19 20:48 git filter-branch not removing commits when it should in 2.7.0 John Fultz
2016-01-19 21:14 ` Junio C Hamano
2016-01-19 21:35   ` Junio C Hamano
2016-01-19 21:37     ` Jeff King
2016-01-19 21:46       ` Junio C Hamano
2016-01-19 21:51         ` [PATCH] filter-branch: resolve $commit^{tree} in no-index case Jeff King
2016-01-19 21:59           ` Jeff King
2016-01-19 22:07             ` Jeff King
2016-01-19 22:23               ` Junio C Hamano
2016-01-19 22:28             ` Jeff King [this message]
2016-01-19 22:48               ` Jeff King
2016-01-20  1:22               ` Jonathan Nieder
2016-01-20  1:34                 ` Jeff King
2016-01-20  1:51                   ` Junio C Hamano
2016-01-20  2:00                     ` Jeff King
2016-01-20  2:43                       ` Junio C Hamano
2016-01-20  3:23                         ` Junio C Hamano
2016-01-20  4:14                           ` Jeff King
2016-01-20  0:47           ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160119222802.GC6556@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jfultz@wolfram.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).