git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Distinguishing trivial and non-trivial merge commits
@ 2010-04-30 16:35 Eli Barzilay
  2010-05-02 18:18 ` Jonathan Nieder
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Barzilay @ 2010-04-30 16:35 UTC (permalink / raw)
  To: git

I'm trying to get a goot notification script, and got stuck with merge
commits.  My script shows a list of modifications for each commit
(based on the diffstat output) -- and if I show all commits, then
merges are misleading in that a quick glance through the email makes
it look like a lot more was touched.  Using `--no-merges' helps in
avoiding the confusing parts, but that's dangerous in omitting
non-trivial merges too -- and those are probably even more worth
noting than other changes (just because they'll highlights changes
that are "hotter" in the sense of more people working on that code).

The only way I've seen to distinguish the two is to use `git show' and
see if there is no diff output (eg, "git show --pretty=format: $rev").
But that doesn't help in getting the list of modified files.  So I add
`--stat' to that, and that goes back to showing all files again, the
same stuff that "git diff $rev^!" shows.

Is there *any* way to get `git diff --stat' to do the same thing that
`git show' does?  (Or a way to get `git show --stat' not show all
files again...)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-04-30 16:35 Distinguishing trivial and non-trivial merge commits Eli Barzilay
@ 2010-05-02 18:18 ` Jonathan Nieder
  2010-05-02 19:32   ` Eli Barzilay
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Nieder @ 2010-05-02 18:18 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: git

Eli Barzilay wrote:

> Using `--no-merges' helps in
> avoiding the confusing parts, but that's dangerous in omitting
> non-trivial merges too -- and those are probably even more worth
> noting than other changes (just because they'll highlights changes
> that are "hotter" in the sense of more people working on that code).
> 
> The only way I've seen to distinguish the two is to use `git show' and
> see if there is no diff output (eg, "git show --pretty=format: $rev").
> But that doesn't help in getting the list of modified files.

Maybe --name-only or --name-status can help.

Note that most conflicts will not show up here: if the merge result
matches either parent, then git diff --cc and friends will not
consider it interesting at all.  A command to list conflicts and their
resolutions would be expensive but valuable, I think.  A naïve
implementation would involve redoing the merge.

[...]
> Is there *any* way to get `git diff --stat' to do the same thing that
> `git show' does?  (Or a way to get `git show --stat' not show all
> files again...)

A “merge diffstat” sounds like an interesting idea, but the detailed
semantics are not obvious to me (maybe separate counts for nontrivial
added and removed lines from each parent?).

Thanks for the food for thought,
Jonathan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 18:18 ` Jonathan Nieder
@ 2010-05-02 19:32   ` Eli Barzilay
  2010-05-02 20:12     ` Jakub Narebski
  2010-05-02 20:29     ` Jonathan Nieder
  0 siblings, 2 replies; 8+ messages in thread
From: Eli Barzilay @ 2010-05-02 19:32 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On May  2, Jonathan Nieder wrote:
> Maybe --name-only or --name-status can help.
> 
> Note that most conflicts will not show up here: if the merge result
> matches either parent, then git diff --cc and friends will not
> consider it interesting at all.

Isn't that a good definition of a trivial commit?  I'm not talking
about the whole commit -- just any file that is not identical to one
of its parents.


> A command to list conflicts and their resolutions would be expensive
> but valuable, I think.  A naïve implementation would involve redoing
> the merge.
> [...]
> A “merge diffstat” sounds like an interesting idea, but the detailed
> semantics are not obvious to me (maybe separate counts for
> nontrivial added and removed lines from each parent?).

OK, thanks for clarifying that.  For my purpose, I basically just want
to know whether there was manual tweaking involved in the merge.  (For
my thing I don't even need to see those changes, since I show the
overall push diff only.)  What I ended up doing is pretty bad:

  git show --pretty=short --name-only "$r" | grep -q '^Merge: '
    --> test if it's a merge commit

  git show --pretty=format:"" --name-only "$r" | grep -q "."
    --> test if it's trivial

  git show --pretty=format:"" "$r" | diffstat -p1
    --> get the diffstated output

(My script generally "compensates" for git being fast by running a ton
of them for each email...)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 19:32   ` Eli Barzilay
@ 2010-05-02 20:12     ` Jakub Narebski
  2010-05-02 21:03       ` Eli Barzilay
  2010-05-02 20:29     ` Jonathan Nieder
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Narebski @ 2010-05-02 20:12 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: Jonathan Nieder, git

Eli Barzilay <eli@barzilay.org> writes:

> OK, thanks for clarifying that.  For my purpose, I basically just want
> to know whether there was manual tweaking involved in the merge.  (For
> my thing I don't even need to see those changes, since I show the
> overall push diff only.)  What I ended up doing is pretty bad:

Well, the compact combined output is thought in such way that 
"git diff --cc" should be empty except for evil merges, when change
comes from neither of parents.  See description of --cc format in
git-diff(1) manpage.

The other side is "git diff --raw" output for merges.
-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 19:32   ` Eli Barzilay
  2010-05-02 20:12     ` Jakub Narebski
@ 2010-05-02 20:29     ` Jonathan Nieder
  1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Nieder @ 2010-05-02 20:29 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: git

Eli Barzilay wrote:
> On May  2, Jonathan Nieder wrote:

>> Note that most conflicts will not show up here: if the merge result
>> matches either parent, then git diff --cc and friends will not
>> consider it interesting at all.
>
> Isn't that a good definition of a trivial commit?  I'm not talking
> about the whole commit -- just any file that is not identical to one
> of its parents.
[...]
> For my purpose, I basically just want
> to know whether there was manual tweaking involved in the merge.

diff --name-only follows exactly the example heuristic you described.
It still does not catch all manual merge resolutions[1].

Sometimes two branches introduce different changes to completely
separate parts of a file.  This is not a conflict, and diff --cc will
correctly report the merge as trivial (whereas diff --name-only does
not pay enough attention to do the same).

On the other hand, sometimes two branches introduce conflicting
changes, but the correct resolution for each conflict hunk is to pick
one as winner.  Though simple, this can be error-prone, because
rejecting one change from branch A might end up breaking another
change that was accepted from the same branch.  diff --cc examines
only the selected revision and its parents and for all it knows, this
is just another trivial merge.

>  git show --pretty=format:"" --name-only "$r" | grep -q "."
>    --> test if it's trivial

I would have expected

	git show --name-only --exit-code --quiet "$r"

to take care of this, but apparently it always exits zero.  Probably
no one had tried it before.

> (My script generally "compensates" for git being fast by running a ton
> of them for each email...)

:)

Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/89415

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 20:12     ` Jakub Narebski
@ 2010-05-02 21:03       ` Eli Barzilay
  2010-05-02 23:07         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Barzilay @ 2010-05-02 21:03 UTC (permalink / raw)
  To: Jonathan Nieder, Jakub Narebski; +Cc: git

On May  2, Jakub Narebski wrote:
> 
> Well, the compact combined output is thought in such way that "git
> diff --cc" should be empty except for evil merges, when change comes
> from neither of parents.  See description of --cc format in
> git-diff(1) manpage.

Is there a way to convince `git diff' to show *only* these diffs?
Hopefully something that will also play with --stat...  Actually, I'm
fine with `git show' instead -- and it does show these things, but if
I add a --stat it shows all changed files again.


On May  2, Jonathan Nieder wrote:
> 
> diff --name-only follows exactly the example heuristic you
> described.  It still does not catch all manual merge resolutions[1].
> 
> Sometimes two branches introduce different changes to completely
> separate parts of a file.  This is not a conflict, and diff --cc
> will correctly report the merge as trivial (whereas diff --name-only
> does not pay enough attention to do the same).

Good -- the first paragraph is what I thought it would do, the second
makes it even better.  Perhaps it would be nice to show such manual
reoslutions from one parent, but I think that that certainly qualifies
as stuff that is much less interesting than work that was done to
combine content inside a file, or one of those evil merges.  (Since
it's a notification script -- so I really want it to highlight only
"real" work that was done.)


> On the other hand, sometimes two branches introduce conflicting
> changes, but the correct resolution for each conflict hunk is to
> pick one as winner.  Though simple, this can be error-prone, because
> rejecting one change from branch A might end up breaking another
> change that was accepted from the same branch.  diff --cc examines
> only the selected revision and its parents and for all it knows,
> this is just another trivial merge.

To translate the "danger" here, I think that if one person commits
some changed files, another one does a merge with the earlier content
and essentially re-commits all previous versions -- undoing this
commit -- IIUC, then in this case my strategy wouldn't show anything.
I think that what I do would work "well enough", since such things
would be rare to do (eg, it makes more sense to merge first, then
another commit that undoes the changes), and in any case there's also
the overall push diff, and those changes would show up there.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 21:03       ` Eli Barzilay
@ 2010-05-02 23:07         ` Junio C Hamano
  2010-05-02 23:23           ` Eli Barzilay
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2010-05-02 23:07 UTC (permalink / raw)
  To: Eli Barzilay; +Cc: Jonathan Nieder, Jakub Narebski, git

Eli Barzilay <eli@barzilay.org> writes:

> Good -- the first paragraph is what I thought it would do, the second
> makes it even better.  Perhaps it would be nice to show such manual
> reoslutions from one parent, but I think that that certainly qualifies
> as stuff that is much less interesting than work that was done to
> combine content inside a file, or one of those evil merges.

You need to be careful here.

If you see a conflicted hunk and the result ends up being the same from
one of the parents in that area, then the _result_ is trivial as far as
the merge (and hence --cc) is concerned, even though you may have spent
considerable braincycle to stare at the conflict to decide that taking a
change from one side is the right thing to do.

You keep saying "manual resolutions", but --cc doesn't have much to do
with the resolution being manual, nor if the initial mechanical merge
attempt left conflict markers.  You would need to redo the merge to find
that out, and it can be fairly cheaply done with a temporary index and an
empty temporary working tree elsewhere in the filesystem.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Distinguishing trivial and non-trivial merge commits
  2010-05-02 23:07         ` Junio C Hamano
@ 2010-05-02 23:23           ` Eli Barzilay
  0 siblings, 0 replies; 8+ messages in thread
From: Eli Barzilay @ 2010-05-02 23:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, Jakub Narebski, git

On May  2, Junio C Hamano wrote:
> Eli Barzilay <eli@barzilay.org> writes:
> 
> > Good -- the first paragraph is what I thought it would do, the
> > second makes it even better.  Perhaps it would be nice to show
> > such manual reoslutions from one parent, but I think that that
> > certainly qualifies as stuff that is much less interesting than
> > work that was done to combine content inside a file, or one of
> > those evil merges.
> 
> You need to be careful here.
> 
> If you see a conflicted hunk and the result ends up being the same
> from one of the parents in that area, then the _result_ is trivial
> as far as the merge (and hence --cc) is concerned, even though you
> may have spent considerable braincycle to stare at the conflict to
> decide that taking a change from one side is the right thing to do.

Yes, that's clear to me now -- thanks.


> You keep saying "manual resolutions", but --cc doesn't have much to
> do with the resolution being manual, nor if the initial mechanical
> merge attempt left conflict markers.

(Right, my original terminology was broken...)


> You would need to redo the merge to find that out, and it can be
> fairly cheaply done with a temporary index and an empty temporary
> working tree elsewhere in the filesystem.

The thing is that I spent so much time for something that is "only the
notification emails"...  And doing something like this sounds like it
requires knowing more git bits.  Is there something similar to this
somewhere that I can stare at to see how it's done?  Pointers would be
much appreciated.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-05-02 23:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-30 16:35 Distinguishing trivial and non-trivial merge commits Eli Barzilay
2010-05-02 18:18 ` Jonathan Nieder
2010-05-02 19:32   ` Eli Barzilay
2010-05-02 20:12     ` Jakub Narebski
2010-05-02 21:03       ` Eli Barzilay
2010-05-02 23:07         ` Junio C Hamano
2010-05-02 23:23           ` Eli Barzilay
2010-05-02 20:29     ` Jonathan Nieder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).