From: Linus Torvalds <torvalds@linux-foundation.org>
To: Sam Vilain <sam@vilain.net>
Cc: martin f krafft <madduck@madduck.net>,
git discussion list <git@vger.kernel.org>
Subject: Re: inexplicable failure to merge recursively across cherry-picks
Date: Thu, 11 Oct 2007 15:33:04 -0700 (PDT) [thread overview]
Message-ID: <alpine.LFD.0.999.0710111455220.20690@woody.linux-foundation.org> (raw)
In-Reply-To: <470E9AD5.2090002@vilain.net>
On Fri, 12 Oct 2007, Sam Vilain wrote:
>
> 1. do a --cherry-pick rev-list on just the file being merged and see if
> all the changes on one side disappear, in which case just take the result.
>
> 2. see if the files were identical at some point, in which case use a
> new merge base for that file based on the changes since that revision.
>
> I actually thought #2 was already the way recursive worked!
Actually, I think both of these are fundamentally wrong.
The reason is that you talk about "the file".
Anything that is based on per-file heuristics is going to mean that you
use a history that is not necessarily compatible with the _other_ files in
the project.
I agree 100% that per-file history information is going to find more
things to merge automatically. But the point I was trying to make was that
"automatic merges" aren't always *good*.
I realize that pretty much every single theoretical merge algorithm out
there tries to make merges happen automatically as much as possible, but
they all en dup having strange issues.
For example, take a patch that cherry-picked into mainline from a
development branch, but that partly depended on some support-feature that
wasn't in mainline yet. So then there is another patch that removes part
of that patch from mainline. So mainline is fine.
Now, three months later, the development branch is stable, and is fully
merged. What happens?
Git will largely get this right. Git will look at the last *global* common
base, and will just look at the contents, and do a reasonable job. Yes,
there will probably be conflicts (because both the development branch and
the mainline ended up touching the same parts of the files thanks to the
cherry-pick, and yet mainline has some added hacks on top to disable it),
but on the whole that's exactly what you want!
(Alternatively, maybe the "remove the part that wasn't supported yet"
ended up meaning that that particular part of the patch was excised
entirely from mainline, and there was no conflict at all, and git just
merged the new stuff from the development branch cleanly! So I'm not
saying that it *has* to conflict, I'm just saying that it might have).
In other words: git always "does the right thing". Assuming both branches
are stable and working, git does a very reasonable thing. It's obviously
not always the thing people may *want*, but it's guaranteed to be a
reasonable and simple guess, and there's no way it's "too clever for its
own good, and just screwed the pooch entirely".
In contrast, your suggested merge strategy would be HORRIBLY BROKEN!
Why? Because it doesn't look at the *common* history to the project, it
looks at some per-file state that is totally bogus and has no relevance.
Think it through: what happens if there were files with the same content
(because of the cherry-pick), and then the file history for one of the
branches was later changed to disable something because the support for it
wasn't in the "whole history"?
Right: the final merge will contain that change! Because there as a time
where the file was identical (the cherry-pick), so you're taking all the
later changes to that file (the undo)!
Notice? Totally the wrong thing to do!
So this is a classic case of trying to make "easier" merges, but where the
whole approach is totally broken! You simply MUST NOT add logic like that.
It's a lot better to give a conflict, than to try to be "clever", and
silently do the wrong thing.
Yes, you can be really stupid, and silently do the wrong thing too, but if
you're stupid, at least the "silent wrong thing" is never really subtle,
it's pretty much guaranteed to easily explainable. And the good news is
that you didn't have a complicated and fragile algorithm just to get the
wrong answer.
(Put another way: if you are always going to have situations where you get
the wrong answer, you'd better take the simple and stupid algorithm,
because people are more likely to then be able to _predict_ that wrong
answer and are thus more ready to handle it!)
So being clever really is the wrong thing to do. And using history that
isn't global and true history (ie just looking at one file, and deciding
that matching that one file "means" something) is fundamnetally broken.
In fact, in general, individual pieces of history are totally worthless.
The fact that some individual change was done in one branch doesn't really
tell you *anything*. The reason that change was done may be implied by all
the previous changes, or conversely, later changes may have undone the
change, so any merge algorithm that starts to look at individual commits
is likely to be pure and utter crap - exactly because it's starting to
make decisions based on local information that may not be valid in the big
picture.
(Where the "big picture" may be either about "space" - other files - or
about "time" - other commits, that simply mean that the individual changes
of one commit are meaningless on their own).
Btw, one thing to note is how well the simple and stupid git merge
strategy works. It turns out that doing things with the "big picture"
model actually does work really well. People think that they need
"finegrained history" to make good merges, but I think most people who
have actually done a fair number of merges with git have noticed that it's
actually pretty dang painless.
But to be honest, there are cases where git isn't being very helpful. In
particular, I think there *are* things that git could probably be more
helpful with, but looking at local history is not one of them, I think.
So here are some suggestions on things I think we could improve on:
- I think it would be wonderful to have a helper tool for handling
conflicts.
In particular, while I don't think per-file history is good for
resolving conflicts *automatically*, I actually do think that per-file
history can be a good way to *manually* resolve conflicts.
In other words, it you have a conflict, I think it would be wonderful
to have some git-gui-like thing that can show the history (with
patches) for that file, and basically combine a three-way graphical
merge *with* some per-commit information where you can say "choose the
thing that that commit did for this conflict".
So I think git already has tools to help resolve conflicts, and I
personally love doing "gitk --merge" or "git log -p --merge" when a
conflict does happen, but I think some smart GUI person could do
something even much better!
And notice how I think that it's *really* wrong to use per-file history
automatically, but that I think it's not wrong at all to use it when
there's a human that says "ok, obviously pick that case". Things that
are horrible when they cause subtle and automatic resolves can be very
good when they cause subtle resolves that a human looked at!
- I suspect we have issues with common whitespace changes, where we again
could probably help people resolve whitespace changes etc better.
Again, I don't think those are necessarily things you want to do
automatically, but I know from personal experience that handling things
like one side having done a re-indent can be *really* annoying, just
because you end up doing tons of mindless stuff when you fix up all the
totally idiotic and usually trivial conflicts.
.. and I'm sure there are other things we could do better too, but the
above two are things that while they haven't happened for me for the
kernel (probably because we have learnt how to not cause them over the
years), I've seen them in other places.
And yes, the above two suggestions fall solidly in the "conflicts aren't
bad per se, but you want to make the tool really help you resolve them!"
camp.
Linus
prev parent reply other threads:[~2007-10-11 22:33 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-10 1:55 inexplicable failure to merge recursively across cherry-picks martin f krafft
2007-10-10 2:54 ` Linus Torvalds
2007-10-10 10:25 ` martin f krafft
2007-10-10 10:33 ` David Kastrup
2007-10-10 15:25 ` Linus Torvalds
2007-10-10 15:48 ` David Brown
2007-10-10 19:07 ` Miklos Vajna
2007-10-10 19:35 ` Linus Torvalds
2007-10-11 0:08 ` Miklos Vajna
2007-10-11 21:51 ` Sam Vilain
2007-10-11 22:33 ` Linus Torvalds [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.0.999.0710111455220.20690@woody.linux-foundation.org \
--to=torvalds@linux-foundation.org \
--cc=git@vger.kernel.org \
--cc=madduck@madduck.net \
--cc=sam@vilain.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).