inexplicable failure to merge recursively across cherry-picks

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* inexplicable failure to merge recursively across cherry-picks
@ 2007-10-10  1:55 martin f krafft
  2007-10-10  2:54 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: martin f krafft @ 2007-10-10  1:55 UTC (permalink / raw)
  To: git discussion list

[-- Attachment #1: Type: text/plain, Size: 2533 bytes --]

Hi folks,

I hope this is not a daily series of mine, being confused about Git
merging, but I've run my head against a wall again and before
I crush my skull, I'd prefer to reach out to you to help me regain
an understanding.

This is about the new mdadm for Debian packaging effort. You can
clone from git://git.debian.org/git/pkg-mdadm/mdadm-new.git and see
the repo at
http://git.debian.org/?p=pkg-mdadm/mdadm-new.git;a=summary. Do not
track as this repo is subject to change.

My master branch was last merged with upstream's mdadm-2.6.2 tag
(commit 263a535). Since then, I've committed a couple of changes to
master including three cherry-picks from upstream since mdadm-2.6.2.

I tagged upstream at the point which I want to merge into master:
mdadm-2.6.3+200709292116+4450e59. When I merge that tag into master,
I get a merge conflict on Monitor.c:

  <<<<<<< HEAD:Monitor.c
                                  if (mse->devnum != MAXINT &&
  =======
                                  if (mse->devnum != INT_MAX &&
  >>>>>>> upstream:Monitor.c

There are five commits between mdadm-2.6.2 and
mdadm-2.6.3+200709292116+4450e59 that affect Monitor.c:

  01d9299
  e4dc510
* 66f8bbb
  98127a6
  4450e59

The third commit, the one with the asterisk is the one that
I cherry-picked (as 845eef9); the other two cherries I picked do not
touch Monitor.c.

The fifth/last commit (4450e59) is the one responsible for the
change which seems to cause the conflict. It is the *only* commit
since the common ancestor of *both* branches that touches the
conflicting lines.

The fourth commit (98127a6) inserts a single line at the top of the
file, so that's nothing that would cause a conflict.

To be honest, I can't explain it. But I didn't give up.

I branched master2 off 845eef9b~1, cherry-picked the first two
commits that touch Monitor.c, cherry-picked all the commits
845eef9b..master into master2 and merge upstream...

... to get exactly the same conflict in exactly the same line in
exactly the same file.

What is going on. Am I seriously overestimating Git's merging
capacities, or do I have a bug in my brain?

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

"the only difference between shakespeare and you
 was the size of his idiom list -- not the size of his vocabulary."
                                                      -- alan perlis

spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10  1:55 inexplicable failure to merge recursively across cherry-picks martin f krafft
@ 2007-10-10  2:54 ` Linus Torvalds
  2007-10-10 10:25   ` martin f krafft
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-10-10  2:54 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

On Wed, 10 Oct 2007, martin f krafft wrote:
> 
> There are five commits between mdadm-2.6.2 and
> mdadm-2.6.3+200709292116+4450e59 that affect Monitor.c:
> 
>   01d9299
>   e4dc510
> * 66f8bbb
>   98127a6
>   4450e59
> 
> The third commit, the one with the asterisk is the one that
> I cherry-picked (as 845eef9); the other two cherries I picked do not
> touch Monitor.c.

Side note - run

	gitk --merge

when you have a merge conflict, and it will basically show you the thing 
graphically (ie history as it is relevant to the merge, and only to the 
files that get conflicts).

But basically, both sides have modified the code *around* that line, and 
they have modified it differently.

Do this in your partial merge tree on 'master':

	git diff ...mdadm-2.6.3+200709292116+4450e59 Monitor.c
	git diff mdadm-2.6.3+200709292116+4450e59... Monitor.c

which will show you the diff from the common base ancestor. And in 
particular, it will show how one branch did this:

	@@ -399,9 +401,8 @@ int Monitor(mddev_dev_t devlist,
	                        struct mdstat_ent *mse;
	                        for (mse=mdstat; mse; mse=mse->next)
	                                if (mse->devnum != MAXINT &&
	-                                   (strcmp(mse->level, "raid1")==0 ||
	-                                    strcmp(mse->level, "raid5")==0 ||
	-                                    strcmp(mse->level, "multipath")==0)
	+                                   (strcmp(mse->level, "raid0")!=0 &&
	+                                    strcmp(mse->level, "linear")!=0)
	                                        ) {
	                                        struct state *st = malloc(sizeof *st);
	                                        mdu_array_info_t array;

and the other one did

	@@ -398,10 +402,9 @@ int Monitor(mddev_dev_t devlist,
	                if (scan) {
	                        struct mdstat_ent *mse;
	                        for (mse=mdstat; mse; mse=mse->next)
	-                               if (mse->devnum != MAXINT &&
	-                                   (strcmp(mse->level, "raid1")==0 ||
	-                                    strcmp(mse->level, "raid5")==0 ||
	-                                    strcmp(mse->level, "multipath")==0)
	+                               if (mse->devnum != INT_MAX &&
	+                                   (strcmp(mse->level, "raid0")!=0 &&
	+                                    strcmp(mse->level, "linear")!=0)
	                                        ) {
	                                        struct state *st = malloc(sizeof *st);
	                                        mdu_array_info_t array;

And now maybe git's behaviour makes more sense. See? You basically had two 
different branches that made *almost* the same changes to the same area, 
but not quite. So how is git to know which one was the *right* one to 
pick? The one that changed the "if (mse->devnum != MAXINT &&" line, or the 
one that left it alone?

> I branched master2 off 845eef9b~1, cherry-picked the first two
> commits that touch Monitor.c, cherry-picked all the commits
> 845eef9b..master into master2 and merge upstream...

Cherry-picking is immaterial. It doesn't matter how the changes come into 
the tree. It doesn't matter what the history is. The only thing git cares 
about is the content, and the end result.

Git knows that the two branches got to two different end results. They 
were identical except for that one line, and it asks you to say which 
branch was "right" wrt that one line.

In other words, git never looks at individual commits when trying to 
merge. It doesn't try to figure out what the "meaning" of the changes are, 
it purely looks at the content.

And btw, it *has* to work that way, because if you don't work that way, 
then you get different results depending on which path the development 
took (eg you might get different results if something was considered a 
"revert", for example, or if something was split up into two patches on 
one side but not the other etc etc).

But in this case it's pretty obvious: the commit from one side (the one 
that changes MAXINT->INT_MAX: "Monitor.c s/MAXINT/INT_MAX/g") merged fine 
into the result *except* for that one section that had touched the same 
general area for other reasons. And that one area was seen as a conflict 
because of those other reasons being in the same hunk.

And yes, in this case the "other reasons" happened to be cherry-picked and 
thus "the same" on both sides, but that doesn't mean that they should have 
been considered in any way special: the result of cherry-picking is 
context-dependent and is a part of the history of the target, and does not 
equate any kind of "identity" with the source. Cherry-picking is 100% 
equivalent to re-doing the commit entirely, and for all git knows, there 
was a reason why it wasn't done together with that s/MAXINT/INT_MAX/g 
change.

(Not that git even thinks in those terms - git literally just says: "I 
cannot resolve the _content_ independently and without understanding the 
history and thinking behind the differences, so you'd better help me")

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10  2:54 ` Linus Torvalds
@ 2007-10-10 10:25   ` martin f krafft
  2007-10-10 10:33     ` David Kastrup
  2007-10-10 15:25     ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: martin f krafft @ 2007-10-10 10:25 UTC (permalink / raw)
  To: git discussion list; +Cc: Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

also sprach Linus Torvalds <torvalds@linux-foundation.org> [2007.10.10.0354 +0100]:
> Cherry-picking is immaterial. It doesn't matter how the changes
> come into the tree. It doesn't matter what the history is. The
> only thing git cares about is the content, and the end result.

This is the part I over-estimated. I thought that Git would figure
out that commits 1-3 had been merged into the target and thus apply,
in sequence, only the commits from the source which had not been
merged.

Many thanks (again), Linus! Looking forward to your next content
manager; you know, the one with artificial intelligence built in!
You could call it "wit" :)

-- 
martin;              (greetings from the heart of the sun.)
  \____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck

dies ist eine manuell generierte email. sie beinhaltet
tippfehler und ist auch ohne großbuchstaben gültig.

spamtraps: madduck.bogus@madduck.net

[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 10:25   ` martin f krafft
@ 2007-10-10 10:33     ` David Kastrup
  2007-10-10 15:25     ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: David Kastrup @ 2007-10-10 10:33 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list, Linus Torvalds

martin f krafft <madduck@madduck.net> writes:

> also sprach Linus Torvalds <torvalds@linux-foundation.org> [2007.10.10.0354 +0100]:
>> Cherry-picking is immaterial. It doesn't matter how the changes
>> come into the tree. It doesn't matter what the history is. The
>> only thing git cares about is the content, and the end result.
>
> This is the part I over-estimated. I thought that Git would figure
> out that commits 1-3 had been merged into the target and thus apply,
> in sequence, only the commits from the source which had not been
> merged.
>
> Many thanks (again), Linus! Looking forward to your next content
> manager; you know, the one with artificial intelligence built in!
> You could call it "wit" :)

Well, there is also an obvious name choice when the distinguishing
innovation is a well-rounded feature set, but it would cause a name
collision for the equivalent of "tig".

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 10:25   ` martin f krafft
  2007-10-10 10:33     ` David Kastrup
@ 2007-10-10 15:25     ` Linus Torvalds
  2007-10-10 15:48       ` David Brown
  2007-10-11 21:51       ` Sam Vilain
  1 sibling, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-10-10 15:25 UTC (permalink / raw)
  To: martin f krafft; +Cc: git discussion list

On Wed, 10 Oct 2007, martin f krafft wrote:
> also sprach Linus Torvalds <torvalds@linux-foundation.org> [2007.10.10.0354 +0100]:
> > Cherry-picking is immaterial. It doesn't matter how the changes
> > come into the tree. It doesn't matter what the history is. The
> > only thing git cares about is the content, and the end result.
> 
> This is the part I over-estimated. I thought that Git would figure
> out that commits 1-3 had been merged into the target and thus apply,
> in sequence, only the commits from the source which had not been
> merged.

Yes, *some* SCM's have tried to do that. In particular, the ones that are 
"patch-based" tend to think that patches are "identical" regardless of 
where they are, and while re-ordering of them is a special event, it's not 
somethign that changes the fundamental 'ID' of the patch.

For example, I think the darcs "patch algebra" works that way.

It's a really horrible model. Not only doesn't it scale, but it leads to 
various very strange linkages between patches, and it fails the most 
important part: it means that merges get different results just because 
people are doing the same changes two different ways.

> Many thanks (again), Linus! Looking forward to your next content
> manager; you know, the one with artificial intelligence built in!
> You could call it "wit" :)

Well, the git model is really largely the reverse: the system is supposed 
to be as *stupid* as humanly possible, but:

 - make it predictable exactly because it's stupid and doesn't do anything 
   even half-ways smart.

   This is part of the "it doesn't matter *how* you got to a particular 
   state, git will always do the same thing regardless of whether you 
   moved an existing patch around or whether you re-did the changes as 
   (possibly more than one) new and unrelated commits".

 - conflicts aren't bad - they're *good*. Trying to aggressively resolve 
   them automatically when two branches have done slightly different 
   things in the same area is stupid and just results in more problems.

   Instead, git tries to do what I don't think *anybody* else has done: 
   make the conflicts easy to resolve, by allowing you to work with them 
   in your normal working tree, and still giving you a lot of tools to 
   help you see what's going on.

So git doesn't try to avoid conflicts per se: the merge strategies are 
fundamentally pretty simple (rename detection and the whole "recursive 
merge" thing may not be simple code, but the concepts are pretty 
straightforward), and they handle all the really *obvious* cases, but at 
the same time, I feel strongly that anything even half-way subtle should 
not be left to the SCM - the SCM should show it and make it really easy 
for the user to then fix it up.

Side note: even with a totally obvious three-way merge, with absolutely 
zero conflicts even remotely close to each other, you can have the merge 
algorithm generate a good merge that doesn't actually *work*.

For example, it's happened a few times that one branch renames a structure 
member name (and changes all the uses) and another branch adds new code 
that uses the old member name. The end result: the code will *merge* fine, 
and there are zero conflicts in the content, because all the changes were 
totally disjoint, but the end result doesn't actually work or even 
compile!

So no merge strategy is ever perfect. The git approach is to be simple and 
predictable, and also to make it easy to fix up (ie even if you get the 
above kind of automatic merge problem, if you catch it in compiling, you 
can fix it up, and do a "git commit --amend" to fix up the merge itself 
before you push it out).

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 15:25     ` Linus Torvalds
@ 2007-10-10 15:48       ` David Brown
  2007-10-10 19:07         ` Miklos Vajna
  2007-10-11  0:08         ` Miklos Vajna
  2007-10-11 21:51       ` Sam Vilain
  1 sibling, 2 replies; 11+ messages in thread
From: David Brown @ 2007-10-10 15:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: martin f krafft, git discussion list

On Wed, Oct 10, 2007 at 08:25:15AM -0700, Linus Torvalds wrote:

>Yes, *some* SCM's have tried to do that. In particular, the ones that are 
>"patch-based" tend to think that patches are "identical" regardless of 
>where they are, and while re-ordering of them is a special event, it's not 
>somethign that changes the fundamental 'ID' of the patch.
>
>For example, I think the darcs "patch algebra" works that way.
>
>It's a really horrible model. Not only doesn't it scale, but it leads to 
>various very strange linkages between patches, and it fails the most 
>important part: it means that merges get different results just because 
>people are doing the same changes two different ways.

Actually, specifically darcs, different merges _always_ result in the same
data.  It's a fundamental part of is patch algebra.  No matter what order
you apply a given set of patches, even with conflicts and reordering, you
always get the same result, or no result.  Conflicts are "resolved" by
inserting conflict markers in the file, ordered by the patch ID.  It
doesn't matter which order you apply them in, you get the same markers.
Then there will be a merge patch which fixes the markers that someone could
apply, no matter what order the applied the previous patches.

Darcs breaks down in a few places, though.

   - The no result.  Sometimes, it just can't figure out how to reorder
     patches.  Even worse, occasionally, the implementation will fail to
     terminate try to figure this out.  There isn't much to do at this
     point, except manually apply the patch, hence generating a new patch
     ID.

   - It doesn't scale well.

The strange linkages between patches could be thought of as a feature,
since it is basically constraining the order that the patches can be
applied in.

There is a darcs-git project that tries to do the darcs things on top of
git.

Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 15:48       ` David Brown
@ 2007-10-10 19:07         ` Miklos Vajna
  2007-10-10 19:35           ` Linus Torvalds
  2007-10-11  0:08         ` Miklos Vajna
  1 sibling, 1 reply; 11+ messages in thread
From: Miklos Vajna @ 2007-10-10 19:07 UTC (permalink / raw)
  To: David Brown; +Cc: Linus Torvalds, martin f krafft, git discussion list

[-- Attachment #1: Type: text/plain, Size: 2199 bytes --]

On Wed, Oct 10, 2007 at 08:48:31AM -0700, David Brown <git@davidb.org> wrote:
> On Wed, Oct 10, 2007 at 08:25:15AM -0700, Linus Torvalds wrote:

> >Yes, *some* SCM's have tried to do that. In particular, the ones that are "patch-based" tend to think that patches are "identical" regardless of where they are, and while re-ordering of them is a special event, it's not somethign that changes the fundamental 'ID' of the patch.

> >For example, I think the darcs "patch algebra" works that way.

> >It's a really horrible model. Not only doesn't it scale, but it leads to various very strange linkages between patches, and it fails the most important part: it means that merges get different results just because people are doing the same changes two different ways.

> Actually, specifically darcs, different merges _always_ result in the same
> data.  It's a fundamental part of is patch algebra.  No matter what order
> you apply a given set of patches, even with conflicts and reordering, you
> always get the same result, or no result.  Conflicts are "resolved" by
> inserting conflict markers in the file, ordered by the patch ID.  It
> doesn't matter which order you apply them in, you get the same markers.
> Then there will be a merge patch which fixes the markers that someone could
> apply, no matter what order the applied the previous patches.

> Darcs breaks down in a few places, though.

>   - The no result.  Sometimes, it just can't figure out how to reorder
>     patches.  Even worse, occasionally, the implementation will fail to
>     terminate try to figure this out.  There isn't much to do at this
>     point, except manually apply the patch, hence generating a new patch
>     ID.

>   - It doesn't scale well.

> The strange linkages between patches could be thought of as a feature,
> since it is basically constraining the order that the patches can be
> applied in.

> There is a darcs-git project that tries to do the darcs things on top of
> git.

> Dave
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

thanks,
- VMiklos

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 19:07         ` Miklos Vajna
@ 2007-10-10 19:35           ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-10-10 19:35 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: David Brown, martin f krafft, git discussion list

On Wed, 10 Oct 2007, Miklos Vajna wrote:
> 
> Actually, specifically darcs, different merges _always_ result in the same
> data.

No they don't. You don't understand the problem.

Yes, different merges WITH THE SAME PATCHES always result in the same 
data.

But that's not a realistic - or even very interesting - schenario.

What's much more common is that the same problem gets solved slightly 
differently in two different branches. For example, maybe somebody does it 
as two different patches - where the second one fixes a bug in the first 
fix. And another person does the same fix, but without the bug in the 
first place.

See? A patch-based system gets confused by those kinds of issues (or they 
turn into various special cases). And that is fundamentally why you MUST 
NOT take history into account (where "history" is some series of 
individual patches).

Yes, history is interesting for historical reasons, and to explain what 
the context was, but in many ways, history is exactly the *wrong* thing to 
use when it comes to merging. You should look at the end result, since 
people can - and do - come to the same result through different ways.

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 15:48       ` David Brown
  2007-10-10 19:07         ` Miklos Vajna
@ 2007-10-11  0:08         ` Miklos Vajna
  1 sibling, 0 replies; 11+ messages in thread
From: Miklos Vajna @ 2007-10-11  0:08 UTC (permalink / raw)
  To: David Brown; +Cc: Linus Torvalds, martin f krafft, git discussion list

[-- Attachment #1: Type: text/plain, Size: 408 bytes --]

[ ehh, sorry for my previous mail, i must be doing something wrong.. ]

On Wed, Oct 10, 2007 at 08:48:31AM -0700, David Brown <git@davidb.org> wrote:
> There is a darcs-git project that tries to do the darcs things on top of
> git.

actually it's broken, according to its author:

http://www.mail-archive.com/darcs-users@darcs.net/msg03161.html

though i loved darcs, but yes - it scales horribly

- VMiklos

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-10 15:25     ` Linus Torvalds
  2007-10-10 15:48       ` David Brown
@ 2007-10-11 21:51       ` Sam Vilain
  2007-10-11 22:33         ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Sam Vilain @ 2007-10-11 21:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: martin f krafft, git discussion list

Linus Torvalds wrote:
> So git doesn't try to avoid conflicts per se: the merge strategies are 
> fundamentally pretty simple (rename detection and the whole "recursive 
> merge" thing may not be simple code, but the concepts are pretty 
> straightforward), and they handle all the really *obvious* cases, but at 
> the same time, I feel strongly that anything even half-way subtle should 
> not be left to the SCM - the SCM should show it and make it really easy 
> for the user to then fix it up.

This is true.  However I think there are some obvious places for
improvement that does look at the file history, when the regular
algorithm fails;

1. do a --cherry-pick rev-list on just the file being merged and see if
all the changes on one side disappear, in which case just take the result.

2. see if the files were identical at some point, in which case use a
new merge base for that file based on the changes since that revision.

I actually thought #2 was already the way recursive worked!

Sam.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: inexplicable failure to merge recursively across cherry-picks
  2007-10-11 21:51       ` Sam Vilain
@ 2007-10-11 22:33         ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-10-11 22:33 UTC (permalink / raw)
  To: Sam Vilain; +Cc: martin f krafft, git discussion list

On Fri, 12 Oct 2007, Sam Vilain wrote:
> 
> 1. do a --cherry-pick rev-list on just the file being merged and see if
> all the changes on one side disappear, in which case just take the result.
> 
> 2. see if the files were identical at some point, in which case use a
> new merge base for that file based on the changes since that revision.
> 
> I actually thought #2 was already the way recursive worked!

Actually, I think both of these are fundamentally wrong.

The reason is that you talk about "the file".

Anything that is based on per-file heuristics is going to mean that you 
use a history that is not necessarily compatible with the _other_ files in 
the project.

I agree 100% that per-file history information is going to find more 
things to merge automatically. But the point I was trying to make was that 
"automatic merges" aren't always *good*. 

I realize that pretty much every single theoretical merge algorithm out 
there tries to make merges happen automatically as much as possible, but 
they all en dup having strange issues.

For example, take a patch that cherry-picked into mainline from a 
development branch, but that partly depended on some support-feature that 
wasn't in mainline yet. So then there is another patch that removes part 
of that patch from mainline. So mainline is fine.

Now, three months later, the development branch is stable, and is fully 
merged. What happens?

Git will largely get this right. Git will look at the last *global* common 
base, and will just look at the contents, and do a reasonable job. Yes, 
there will probably be conflicts (because both the development branch and 
the mainline ended up touching the same parts of the files thanks to the 
cherry-pick, and yet mainline has some added hacks on top to disable it), 
but on the whole that's exactly what you want!

(Alternatively, maybe the "remove the part that wasn't supported yet" 
ended up meaning that that particular part of the patch was excised 
entirely from mainline, and there was no conflict at all, and git just 
merged the new stuff from the development branch cleanly! So I'm not 
saying that it *has* to conflict, I'm just saying that it might have).

In other words: git always "does the right thing". Assuming both branches 
are stable and working, git does a very reasonable thing. It's obviously 
not always the thing people may *want*, but it's guaranteed to be a 
reasonable and simple guess, and there's no way it's "too clever for its 
own good, and just screwed the pooch entirely".

In contrast, your suggested merge strategy would be HORRIBLY BROKEN!

Why? Because it doesn't look at the *common* history to the project, it 
looks at some per-file state that is totally bogus and has no relevance. 
Think it through: what happens if there were files with the same content 
(because of the cherry-pick), and then the file history for one of the 
branches was later changed to disable something because the support for it 
wasn't in the "whole history"?

Right: the final merge will contain that change! Because there as a time 
where the file was identical (the cherry-pick), so you're taking all the 
later changes to that file (the undo)!

Notice? Totally the wrong thing to do!

So this is a classic case of trying to make "easier" merges, but where the 
whole approach is totally broken! You simply MUST NOT add logic like that. 
It's a lot better to give a conflict, than to try to be "clever", and 
silently do the wrong thing.

Yes, you can be really stupid, and silently do the wrong thing too, but if 
you're stupid, at least the "silent wrong thing" is never really subtle, 
it's pretty much guaranteed to easily explainable. And the good news is 
that you didn't have a complicated and fragile algorithm just to get the 
wrong answer.

(Put another way: if you are always going to have situations where you get 
the wrong answer, you'd better take the simple and stupid algorithm, 
because people are more likely to then be able to _predict_ that wrong 
answer and are thus more ready to handle it!)

So being clever really is the wrong thing to do. And using history that 
isn't global and true history (ie just looking at one file, and deciding 
that matching that one file "means" something) is fundamnetally broken.

In fact, in general, individual pieces of history are totally worthless. 
The fact that some individual change was done in one branch doesn't really 
tell you *anything*. The reason that change was done may be implied by all 
the previous changes, or conversely, later changes may have undone the 
change, so any merge algorithm that starts to look at individual commits 
is likely to be pure and utter crap - exactly because it's starting to 
make decisions based on local information that may not be valid in the big 
picture.

(Where the "big picture" may be either about "space" - other files - or 
about "time" - other commits, that simply mean that the individual changes 
of one commit are meaningless on their own).

Btw, one thing to note is how well the simple and stupid git merge 
strategy works. It turns out that doing things with the "big picture" 
model actually does work really well. People think that they need 
"finegrained history" to make good merges, but I think most people who 
have actually done a fair number of merges with git have noticed that it's 
actually pretty dang painless.

But to be honest, there are cases where git isn't being very helpful. In 
particular, I think there *are* things that git could probably be more 
helpful with, but looking at local history is not one of them, I think.

So here are some suggestions on things I think we could improve on:

 - I think it would be wonderful to have a helper tool for handling 
   conflicts. 

   In particular, while I don't think per-file history is good for 
   resolving conflicts *automatically*, I actually do think that per-file 
   history can be a good way to *manually* resolve conflicts.

   In other words, it you have a conflict, I think it would be wonderful 
   to have some git-gui-like thing that can show the history (with 
   patches) for that file, and basically combine a three-way graphical 
   merge *with* some per-commit information where you can say "choose the 
   thing that that commit did for this conflict".

   So I think git already has tools to help resolve conflicts, and I 
   personally love doing "gitk --merge" or "git log -p --merge" when a 
   conflict does happen, but I think some smart GUI person could do 
   something even much better!

   And notice how I think that it's *really* wrong to use per-file history 
   automatically, but that I think it's not wrong at all to use it when 
   there's a human that says "ok, obviously pick that case". Things that 
   are horrible when they cause subtle and automatic resolves can be very 
   good when they cause subtle resolves that a human looked at!

 - I suspect we have issues with common whitespace changes, where we again 
   could probably help people resolve whitespace changes etc better. 

   Again, I don't think those are necessarily things you want to do 
   automatically, but I know from personal experience that handling things 
   like one side having done a re-indent can be *really* annoying, just 
   because you end up doing tons of mindless stuff when you fix up all the 
   totally idiotic and usually trivial conflicts.

.. and I'm sure there are other things we could do better too, but the 
above two are things that while they haven't happened for me for the 
kernel (probably because we have learnt how to not cause them over the 
years), I've seen them in other places.

And yes, the above two suggestions fall solidly in the "conflicts aren't 
bad per se, but you want to make the tool really help you resolve them!" 
camp. 

			Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-10-11 22:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-10  1:55 inexplicable failure to merge recursively across cherry-picks martin f krafft
2007-10-10  2:54 ` Linus Torvalds
2007-10-10 10:25   ` martin f krafft
2007-10-10 10:33     ` David Kastrup
2007-10-10 15:25     ` Linus Torvalds
2007-10-10 15:48       ` David Brown
2007-10-10 19:07         ` Miklos Vajna
2007-10-10 19:35           ` Linus Torvalds
2007-10-11  0:08         ` Miklos Vajna
2007-10-11 21:51       ` Sam Vilain
2007-10-11 22:33         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).