git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Thomas Rast <trast@student.ethz.ch>
Cc: git@vger.kernel.org, Sebastian Hahn <mail@sebastianhahn.net>
Subject: Re: gitk "find commit adding/removing string"/possible pickaxe bug?
Date: Mon, 31 Jan 2011 15:00:03 -0500	[thread overview]
Message-ID: <20110131200002.GA7859@sigill.intra.peff.net> (raw)
In-Reply-To: <20110118205040.GA20970@sigill.intra.peff.net>

On Tue, Jan 18, 2011 at 03:50:40PM -0500, Jeff King wrote:

> > doesn't; it only gives a diff for the commit that introduced 'bar'.  I
> > guess this makes sense: -S notices that the number of 'bar's is
> > actually the same as in *one* merge parent, hence the merge cannot be
> > all that interesting.  OTOH it still shows the merge commit in the
> > history, which is a bit strange.  --pickaxe-all does not make a
> > difference either;
> 
> Hrm. What I expected[1] to happen would be for the diff machinery to
> look at each filepair individually, one of them to trigger -S, which
> shows the commit, and then to fail to produce a combined diff because we
> threw away the other uninteresting filepair. But in that case,
> --pickaxe-all _should_ show something, as its point is to keep all of
> the filepairs.  And that's clearly not happening.
> 
> So now I don't know what's going on. I'll try to trace through the diff
> machinery and see if that gives a clue.
> 
> -Peff
> 
> [1] That's what I expect, but not necessarily what I want. I think what
> I would want is for it to do a token count of the merge commit, and if
> it fails to match _every_ parent, then it it interesting. Otherwise, the
> content presumably came from that parent.

I looked into this, and sadly the "wanted" behavior I described above is
not easy to do. It turns out that we never actually see the whole 3-way
diff as a single unit in diffcore-pickaxe. Instead, log-tree calls into
diff_tree_combined, which diffs each parent _individually_, including
running diffcore magic on it. And then if one of those appears
interesting, we show the merge.

So diffcore-pickaxe never even knows that we are doing a combined diff.
It just sees the difference between M and M^, and then separately the
difference between M and M^2. This works OK in my example:

  commit() {
    echo $1 >file && git add file && git commit -m $1
  }
  commit base
  commit master
  git checkout -b other HEAD^
  commit other
  git merge master
  commit resolved

as doing "git log -Sother -c" will show both the commit "other" _and_
the merge commit (since it removed "other" in favor of "resolved"). But
you could also construct a case where it isn't true. For example,
consider a case where two sides add the same token, and the resolution
is to keep both. E.g.:

  echo base >file && git add file && git commit -m base
  echo foo bar >file && git commit -a -m master
  git checkout -b other HEAD^
  echo foo baz >file && git commit -a -m other
  git merge master
  (echo foo bar; echo foo baz) >file && git commit -a -m resolved

That shows the merge commit, even though it didn't actually introduce or
delete that token at all. OTOH, it is part of a conflict region, so it
is really difficult to say whether it is interesting or not. I dunno
what the right semantics are (and note that the definition I gave in the
above email would also trigger on this case).

I have the nagging feeling there is another less ambiguous corner case
that is wrong, but I'm having trouble constructing one.


Anyway, the real point is that we can't do anything special to pickaxe
merge commits at the diffcore level without some pretty major diff
surgery. So where does that leave us? You can still get pretty
reasonable results from turning on "-c". I was curious what the CPU cost
was of turning "-c" on by default, and was very surprised by the
results (in git.git):

  $ time git log -Sfoo >/dev/null
  real    0m11.532s
  user    0m11.273s
  sys     0m0.116s

  $ time git log -c -Sfoo >/dev/null
  real    3m7.530s
  user    3m3.991s
  sys     0m2.948s

A 1700% slowdown? Wow. There are ~20000 non-merge commits in git.git and
~4500 merge commits. Each merge commit has two parents (since we don't
tend to octopus merge), each of which is diffed individually. So I'd
expect it to add about 9000 diffs, or roughly 50% on top of the
11-second case.

My guess is that the subtree merges from gitk and git-gui are very
expensive to look at, since from one parent's perspective we will have
created the entire git project from scratch. On every merge. Yikes.

-Peff

  parent reply	other threads:[~2011-01-31 20:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-18 16:16 gitk "find commit adding/removing string"/possible pickaxe bug? Sebastian Hahn
2011-01-18 16:44 ` Thomas Rast
2011-01-18 18:39   ` Junio C Hamano
2011-01-18 18:50   ` Jeff King
2011-01-18 20:39     ` Thomas Rast
2011-01-18 20:50       ` Jeff King
2011-01-18 21:26         ` Junio C Hamano
2011-01-18 21:33           ` Jeff King
2011-01-31 20:00         ` Jeff King [this message]
2011-01-18 21:16       ` Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110131200002.GA7859@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mail@sebastianhahn.net \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).