From: Kevin Bracey <kevin@bracey.fi>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC/PATCH 1/3] revision.c: tighten up TREESAME handling of merges
Date: Mon, 29 Apr 2013 20:46:45 +0300
Message-ID: <517EB205.6090804@bracey.fi>
In-Reply-To: <7v61z6sdpz.fsf@alter.siamese.dyndns.org>

On 28/04/2013 21:38, Junio C Hamano wrote:
>
>>>>    @@ -773,6 +861,9 @@ static void limit_to_ancestry(struct
>>>> commit_list *bottom, struct commit_list *li
>>>>    	 * NEEDSWORK: decide if we want to remove parents that are
>>>>    	 * not marked with TMP_MARK from commit->parents for commits
>>>>    	 * in the resulting list.  We may not want to do that, though.
>>>> +	 *
>>>> +	 * Maybe it should be considered if we are TREESAME to such
>>>> +	 * parents - now possible with stored per-parent flags.
>>>>    	 */
>>> Hmm, that is certainly a thought.
>> My comment's wrong though. Reconsidering, what I think needs removing
>> is actually off-ancestry parents that we are !TREESAME to, when we are
>> TREESAME on the ancestry path.
> I thought I read you meant exactly that, i.e. !TREESAME, but now I
> re-read what is quoted, you did say "we are TREESAME" ;-).  I think
> I agree with you that we do not want any side branch that is not on
> the ancestry path we are interested in to affect the sameness
> assigned to the merge commit.

I did a trial implementation of this in limit_to_ancestry(), and the 
result was lovely, but in the end I decided it's not actually the right 
place to do it. The logic is more general than that; this isn't just an 
ancestry-path issue, and I think "hiding" parents isn't the right way to 
go about it anyway.

To slightly generalise your own wording: I think the rule is "we do not 
want any side branch that is UNINTERESTING to affect the sameness 
assigned to the merge commit". I think that rule applies to all dense, 
pruned modes.

Having experimented with some of the annoyingly complex merge paths that 
originally prompted this series, it seems this rule makes a huge 
difference, and it's useful whether asking "--simplify-merges A..B 
<file>" or "--ancestry-path A..B <file>".

At present, either query will show lots of really boring merge commits 
of topic branches at the boundary, with 1 INTERESTING parent that 
they're TREESAME to, and 1 UNINTERESTING parent that they may or may 
not be TREESAME to, depending on how old the base of that topic branch 
was. Most such commits are of no relevance to our history whatsoever. In 
the case of "--simplify-merges", the fact that they're UNINTERESTING 
actually _prevented_ their simplification - if it had been allowed to 
follow the UNINTERESTING path back further, it would have reached an 
ancestor, and been found redundant. So limiting the rev-list actually 
increases the number of merges shown.

We can lose all those boring commits with these two changes:

1) Previously TREESAME was defined as "this commit matches at least 1 
parent". My first patch changes it to "this commit matches all parents". 
It should be refined further to "this commit matches all INTERESTING 
parents, if it has any, else all (UNINTERESTING) parents". (Can we word 
that better?) Note that this fancy rule collapses to the same 
straightforward TREESAME check as ever for 0- or 1-parent commits.

2) simplify_merges currently will not simplify commits unless they have 
exactly 1 parent. That's not what we want. We only need to preserve 
commits that don't have exactly 1 INTERESTING parent.

Those 2 rules produce the desirable result: if we have a merge commit 
with exactly 1 INTERESTING parent it is TREESAME to, it is always 
simplified away - any other UNINTERESTING parents it may have did not 
affect our code, so we don't care about whether we were TREESAME to them 
or not, and as we don't want to see any of the UNINTERESTING parents 
themselves, the merge is not worth showing.
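
As a rough illustration of the two rules above, here is a minimal C 
sketch. It is not code from revision.c: struct parent_info and the 
functions commit_is_treesame(), count_interesting() and 
can_simplify_away() are hypothetical names over a toy model in which 
each parent carries only an "interesting" flag and a per-parent 
"treesame" flag. Root commits are left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one parent edge; not git's real structures. */
struct parent_info {
	bool interesting;	/* parent is not marked UNINTERESTING */
	bool treesame;		/* commit matches this parent on the limited paths */
};

/* Number of parents that are not UNINTERESTING. */
size_t count_interesting(const struct parent_info *parents, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (parents[i].interesting)
			count++;
	return count;
}

/*
 * Rule 1 as refined above: the commit is TREESAME if it matches all
 * INTERESTING parents, if it has any, else all (UNINTERESTING) parents.
 * Root commits (n == 0) are outside this toy model.
 */
bool commit_is_treesame(const struct parent_info *parents, size_t n)
{
	bool have_interesting;
	size_t i;

	assert(parents != NULL && n > 0);
	have_interesting = count_interesting(parents, n) > 0;

	for (i = 0; i < n; i++) {
		/* Ignore UNINTERESTING side branches when an INTERESTING parent exists. */
		if (have_interesting && !parents[i].interesting)
			continue;
		if (!parents[i].treesame)
			return false;
	}
	return true;
}

/*
 * Rules 1 and 2 combined: a commit with exactly 1 INTERESTING parent
 * that it is TREESAME to can be simplified away.
 */
bool can_simplify_away(const struct parent_info *parents, size_t n)
{
	return count_interesting(parents, n) == 1 &&
	       commit_is_treesame(parents, n);
}
```

For the boring boundary merge described earlier (one INTERESTING parent 
it is TREESAME to, one UNINTERESTING parent it differs from), both 
commit_is_treesame() and can_simplify_away() return true in this model. 
A merge with two INTERESTING parents is never simplified away here, 
whether or not it is TREESAME.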

This makes a massive difference on some of my searches, reducing the 
total commits shown by a factor of 5 to 10, greatly improving the 
signal-to-noise ratio.

I'll put together a trial patch at the end of the next iteration of the 
series that implements this logic. I need to think a bit more - I think 
"get_commit_action" needs a similar INTERESTING check for merges too, to 
get the same sort of effect without relying on simplify_merges. Parent 
rewriting shouldn't necessitate keeping all merges - only merges with 2+ 
INTERESTING parents.

>>
>>                    *   *
>>            .-A---M---N---O---P
>>           /*    /   /*  /*  /*
>>          I     B   C   D   E
>>           \   /*  /   /*  /
>>            `-------------'
> I've added '*' next to each arc between a commit-pair whose contents
> at 'foo' are different to the illustration, following the set-up the
> manual describes.  E is the same as I for 'foo' and P would resolve
> 'foo' to be the same as O.

I think that sort of thing could be a useful patch to the docs.
>
>> Given this error, and this change, I think this example may want a
>> slight rethink. Do we want a proper "messing with other paths but
>> TREESAME merge" example? Say if E's parent was O, P would not be
>> TREESAME and not included in --full-history.
> I am not sure if I follow your last sentence.
>
> Do you mean this topology, where E's sole parent is O, i.e.
>
>         E
>        / \
>   N---O---P
>      /*
>     D
>
> and E does not change 'foo' from O?  Then P is TREESAME to all its
> parents and would not have to appear in the full history for the
> same reason M does not appear in your earlier IABNDOP output, no?

That's the topology I was thinking of.  Yes, P is then "full-TREESAME" 
like M, but it's a more typical example than M of a real merge and of 
why TREESAMEness arises.  M didn't appear in full-history because both 
parents made the same change to foo - indeed both parents were 
identical. Whereas P wouldn't appear because E differs from O, but only 
in something other than foo.
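
To make the M versus P comparison concrete, here is a small C sketch 
under stated assumptions. struct toy_commit, same_on_foo() and 
matches_all_parents() are hypothetical names, and file contents are 
modelled as plain strings rather than real tree objects. The only 
point is that a history limited to 'foo' compares 'foo' and nothing 
else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy commit: contents stand in for real tree objects. */
struct toy_commit {
	const char *foo;	/* contents of 'foo' at this commit */
	const char *other;	/* everything else; ignored when limiting to 'foo' */
};

/* Two commits are the same for a 'foo'-limited history if 'foo' matches. */
bool same_on_foo(const struct toy_commit *a, const struct toy_commit *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a->foo, b->foo) == 0;
}

/* "Full-TREESAME": the merge matches every one of its parents on 'foo'. */
bool matches_all_parents(const struct toy_commit *merge,
			 const struct toy_commit *parents, size_t n)
{
	size_t i;

	assert(merge != NULL && parents != NULL);
	for (i = 0; i < n; i++)
		if (!same_on_foo(merge, &parents[i]))
			return false;
	return true;
}
```

In this model M's parents A and B are identical, so M trivially matches 
both. P's parents O and E differ in "other" but agree on "foo", so 
matches_all_parents() is still true for P, even though E is a genuinely 
different commit from O.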

Kevin
