Rebase, please help

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Rebase, please help
@ 2007-04-11  1:52 Alexander Litvinov
  2007-04-11  7:38 ` Junio C Hamano
  2007-04-11  7:48 ` Alex Riesen
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Litvinov @ 2007-04-11  1:52 UTC (permalink / raw)
  To: git

Hello list.

I have found that rebase have (new) option : --merge
Looking at the code show me that regular rebase is a simply format-patch and 
am but --merge (or -s) use some merge stratyegy to merge changes between two 
commits into current head.

What is --merge for ? Will the result be the same ?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11  1:52 Rebase, please help Alexander Litvinov
@ 2007-04-11  7:38 ` Junio C Hamano
  2007-04-11 10:10   ` Andy Parkins
  2007-04-11  7:48 ` Alex Riesen
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-04-11  7:38 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git

Alexander Litvinov <litvinov2004@gmail.com> writes:

> I have found that rebase have (new) option : --merge
> Looking at the code show me that regular rebase is a simply format-patch and 
> am but --merge (or -s) use some merge stratyegy to merge changes between two 
> commits into current head.
>
> What is --merge for ? Will the result be the same ?

Regular "rebase" uses "format-patch" piped to "am -3", so if you
do not have renames the file-level patch conflict can be
resolved using the 3-way merge logic.  However, because we do
not give -M to format-patch, it does not deal with case where
you have renames in the series of commits you are rebasing, nor
where you have renames between the current base commit and the
commit you are rebasing onto (the latter won't be solved with
giving -M to format-patch anyway, so we do not even try).

In cases involving such renames, giving --merge option would
probably be nicer to work with.  It invokes merge-recursive
logic to deal with the renames.

I find that the regular rebase without --merge is faster (at
least it feels to me that it is, and I kind of understand why;
patch application to write out a tree is optimized to take
advantage of cache-tree extension, as opposed to merging three
trees which clobbers it), when there is no patch conflict.
Since most rebases do not involve patch conflict for me and
seldom involve rebases, I almost never use --merge myself, but
this would depend highly on personal taste and project.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11  7:38 ` Junio C Hamano
@ 2007-04-11 10:10   ` Andy Parkins
  2007-04-12 20:37     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Parkins @ 2007-04-11 10:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Alexander Litvinov

On Wednesday 2007 April 11 08:38, Junio C Hamano wrote:

> I find that the regular rebase without --merge is faster (at
> least it feels to me that it is, and I kind of understand why;

This is interesting and brings to mind a difficult I've had.  I had problems 
with rebase when rebasing chains with a file that was self-similar.  Indulge 
me for a while with this example (forgive the C++, but that's where I had 
this problem):

class A : public C
{
   // ...

   int someVirtualOverride(n) { return ArrayA[n]; }

   // ...
}

class B : public C
{
   // ...

   int someVirtualOverride(n) { return ArrayB[n]; }

   // ...
}

One patch changed "ArrayX[n]" to "Array.at(n)" and another inserted more 
similar classes around these two.

When I was rebasing, some strange things happened (without any conflict 
warnings):

class D : public C
{
   int someVirtualOverride(n) { return ArrayA.at(n); }
}

class A : public C
{
   int someVirtualOverride(n) { return ArrayB.at(n); }
}

class B : public C
{
   int someVirtualOverride(n) { return ArrayB[n]; }
}

Notice that the arrays don't match up with the classes.  By some crazy 
coincidence and the strong similarity between localities within the file, the 
patch successfully applied in the wrong place.  The fix was easy enough to do 
manually, but it needed a bit of untangling as this was in a longish chain of 
revisions that I was rebasing.

I didn't mind much, and hence didn't report it as a bug as I guessed it was to 
do with git-rebase using git-am.  The annoying part was actually that there 
was no conflict warning and hence the rest of the chain applied, making it 
all the more difficult to untangle.

My question then is this: given that I don't care about speed of rebase, is it 
safe to permanently use --merge with rebase, and would that have caught the 
error in the above case?

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11 10:10   ` Andy Parkins
@ 2007-04-12 20:37     ` Junio C Hamano
  2007-04-12 21:22       ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-04-12 20:37 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Alexander Litvinov, Linus Torvalds

Andy Parkins <andyparkins@gmail.com> writes:

> This is interesting and brings to mind a difficult I've had.
> I had problems with rebase when rebasing chains with a file
> that was self-similar.  Indulge me for a while with this
> example (forgive the C++, but that's where I had this
> problem):
>
> class A : public C
> {
>    // ...
>
>    int someVirtualOverride(n) { return ArrayA[n]; }
>
>    // ...
> }
>
> class B : public C
> {
>    // ...
>
>    int someVirtualOverride(n) { return ArrayB[n]; }
>
>    // ...
> }
>
> One patch changed "ArrayX[n]" to "Array.at(n)" and another inserted more 
> similar classes around these two.
>
> When I was rebasing, some strange things happened (without any conflict 
> warnings):
>
> class D : public C
> {
>    int someVirtualOverride(n) { return ArrayA.at(n); }
> }
>
> class A : public C
> {
>    int someVirtualOverride(n) { return ArrayB.at(n); }
> }
>
> class B : public C
> {
>    int someVirtualOverride(n) { return ArrayB[n]; }
> }
>
> Notice that the arrays don't match up with the classes.  By
> some crazy coincidence and the strong similarity between
> localities within the file, the patch successfully applied in
> the wrong place.

A patch that can ambiguously apply to multiple places is indeed
a problem, and in such situations merge based rebase is probably
safer as it can take advantage of the whole file as the context.

But it brings up another interesting point.  The ambiguous patch
is a problem even more so outside the context of rebasing, for
another reason.  When rebasing, you are dealing with your own
changes and you know what and how you want each of them to
change the tree state, as opposed to applying somebody else's
patch outside the context of rebasing.

When we only have the patch text (i.e. applymbox), there is no
"merge to use the whole file as the context" fallback.  I wonder
if this is a common enough problem that we would want to make it
safer somehow...

[jc: Since I happen to know somebody who applies more patches in
     one week than anybody else would ever apply in the lifetime
     ;-), I am CC'ing that person]

I can see two possible improvements.

 - On the diff generation side, we could notice that the hunk
   we are going to output can be applied to more than one
   location, and automatically widen the context for it.

   This is only a half-solution, as many patches do not even
   come from git.

 - Inside git-apply, apply_one_fragment(), ask find_offset() if
   the hunk can match more than one location, and exit with an
   error status (still writing out the patch results if it
   otherwise applies cleanly) so that the user can manually
   inspect and confirm.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-12 20:37     ` Junio C Hamano
@ 2007-04-12 21:22       ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2007-04-12 21:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andy Parkins, git, Alexander Litvinov

On Thu, 12 Apr 2007, Junio C Hamano wrote:
> Andy Parkins <andyparkins@gmail.com> writes:
> >
> > When I was rebasing, some strange things happened (without any conflict 
> > warnings):
> 
> A patch that can ambiguously apply to multiple places is indeed
> a problem, and in such situations merge based rebase is probably
> safer as it can take advantage of the whole file as the context.

Indeed. It's one of the reasons I think the default "patch" behaviour of 
allowing some fuzz in the patch is totally broken, and why "git apply" 
defaults to a much stricter "no context differences allowed" behaviour.

That still doesn't entirely get *rid* of the problem, but it at least 
makes it slightly less common. You can still get a patch that applies 
cleanly if you have certain multi-line patterns that are very common, and 
that end up causing patch to find the wrong place to apply a patch, and 
still make it look right enough.

This is just one reason why patch-based systems are totally and utterly 
broken, and why I detested the original cogito patch-based merging.

The three-way merge behaviour ("git rebase --merge") is a lot safer, but 
it's obviously more expensive too.

Also, the reason a lot of people like patch-based setups is not only is it 
efficient, but it turns out too many people prefer "clean merges" even 
*despite* the dangers. See the recent patches that were floating around on 
"stgit" that actually make applying patches default to the same 
*broken*defaults* as standard "patch" does (see the emails with subject 
line

	"Pass -C1 to git-apply in StGIT ..."

for more on that).

So a lot of people prefer the less strict patch application rules, because 
*most* of the time it actually does the right thing. Never mind that it 
makes a fundamental problem with patches even worse.

Me, I'd prefer to have the patch fail early. But even failing early does 
not _guarantee_ that it applies in the right place. 

> But it brings up another interesting point.  The ambiguous patch
> is a problem even more so outside the context of rebasing, for
> another reason.  When rebasing, you are dealing with your own
> changes and you know what and how you want each of them to
> change the tree state, as opposed to applying somebody else's
> patch outside the context of rebasing.
> 
> When we only have the patch text (i.e. applymbox), there is no
> "merge to use the whole file as the context" fallback.  I wonder
> if this is a common enough problem that we would want to make it
> safer somehow...

We could make "git apply" also refuse to move more than a few lines by 
default (ie not only does the context have to be exact, it has to actually 
show up where the patch claims it should be!)

git-apply still allows arbitrary line offset differences (see 
"find_offset()").

> I can see two possible improvements.
> 
>  - On the diff generation side, we could notice that the hunk
>    we are going to output can be applied to more than one
>    location, and automatically widen the context for it.
> 
>    This is only a half-solution, as many patches do not even
>    come from git.

It's not even a solution _within_ git. Since a patch will always apply at 
the right place if no changes have been done (because we always start 
looking at the line number where the patch fragment _claims_ it should 
go), the problem only occurs when independent changes have been done to 
the target.

And those independent changes may obviously be the ones that *introduce* 
the new location that the patch can (incorrectly) apply to.

>  - Inside git-apply, apply_one_fragment(), ask find_offset() if
>    the hunk can match more than one location, and exit with an
>    error status (still writing out the patch results if it
>    otherwise applies cleanly) so that the user can manually
>    inspect and confirm.

Yes, that might be a good idea, but is going to be pretty expensive. 
"find_offset()" wasn't exactly written to be a model of efficiency. See 
the comment about me not being one of the "smart and beautiful" people.

So you'd want to have some smarter method of finding potential places to 
apply the fragment if you want to do those kinds of things. Like creating 
hashes of the line contents in order to not have to compare the whole 
fragment..

And EVEN THEN it wouldn't actually solve the problem. The most common case 
is simply:
 - somebody *already* fixed the same bug by a very similar patch (or an 
   identical one), and thus the patch obviously won't apply, since the 
   place it *should* apply to got changed.
 - so git-apply will look for another place to apply it, and it's quite 
   possible that there is just one such place - even though it's the exact 
   wrong place!

So it really does boil down to: patches will sometimes falsely apply at 
the wrong place. That's just in the fundamental nature of patches. The 
same way a three-way merge can sometimes generate total crap, even when it 
merged totally cleanly.

It's unusual enough that most of the time it doesn't happen (and I think 
it happens less with three-way merges than with patches), and hopefully 
most of the time the error will be obvious enough that it gets noticed 
quickly (ie the badly patched sources simply won't build any more!).

But there is no practical way to guarantee it cannot happen. The only way 
to guarantee that patches apply correctly is to make sure the source file 
matches *exactly*. But that kind of defeats the whole point of sending 
patches around in the first place.

			Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11  1:52 Rebase, please help Alexander Litvinov
  2007-04-11  7:38 ` Junio C Hamano
@ 2007-04-11  7:48 ` Alex Riesen
  2007-04-11  9:46   ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Alex Riesen @ 2007-04-11  7:48 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git

On 4/11/07, Alexander Litvinov <litvinov2004@gmail.com> wrote:
>
> What is --merge for ? Will the result be the same ?

Maybe, maybe not. It uses merge strategies instead of git-am
and has advantages over blindly applying the patches (it can
know how a change got in, and it uses resolved conflict cache).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11  7:48 ` Alex Riesen
@ 2007-04-11  9:46   ` Junio C Hamano
  2007-04-11 11:32     ` Alex Riesen
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-04-11  9:46 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Alexander Litvinov, git

"Alex Riesen" <raa.lkml@gmail.com> writes:

> On 4/11/07, Alexander Litvinov <litvinov2004@gmail.com> wrote:
>>
>> What is --merge for ? Will the result be the same ?
>
> Maybe, maybe not. It uses merge strategies instead of git-am
> and has advantages over blindly applying the patches (it can
> know how a change got in, and it uses resolved conflict cache).

I think "blindly applying the patches" is a gross overstatement,
as merge-recursive will get confused the same way as git-apply,
when a difference that comes from the two commits can be applied
to two places.  When the forward-ported change gets conflict,
3-way merge logic in "git-am -3" kicks in and does a fall back
to merge-recursive on a reconstructed tree that has only the
paths relevant to the case.

With "format-patch piped to am-3", we could give the -M option
to "format-patch" to deal with renames that happen in the series
you are rebasing, but renames between the bases (the original
base commit for the series and the new "onto" commit) is not
something it can handle sensibly.

That is the true advantage --merge has over "format-patch piped
to am-3", as it always drives merge-recursive and it can notice
renames between the two bases.

But always driving merge-recursive is also its weakness.  When
the series being rebased is simple and long, especially on a big
tree, applying many patches without conflicts tends to
outperform running the same number of merges, as the patch
application is tuned to take advantage of cache-tree while
read-tree based merge essentially trashes cache-tree, and has to
pay the full cost of write-tree for every commit it makes.

Also there is that small D/F conflict problem merge-recursive
has that I told you about, which does not exist in git-apply ;-)
Did you have a chance to take a look at it yet?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Rebase, please help
  2007-04-11  9:46   ` Junio C Hamano
@ 2007-04-11 11:32     ` Alex Riesen
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Riesen @ 2007-04-11 11:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexander Litvinov, git

On 4/11/07, Junio C Hamano <junkio@cox.net> wrote:
> Also there is that small D/F conflict problem merge-recursive
> has that I told you about, which does not exist in git-apply ;-)
> Did you have a chance to take a look at it yet?

not yet. I was greatly distracted by subprojects

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-04-12 21:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-11  1:52 Rebase, please help Alexander Litvinov
2007-04-11  7:38 ` Junio C Hamano
2007-04-11 10:10   ` Andy Parkins
2007-04-12 20:37     ` Junio C Hamano
2007-04-12 21:22       ` Linus Torvalds
2007-04-11  7:48 ` Alex Riesen
2007-04-11  9:46   ` Junio C Hamano
2007-04-11 11:32     ` Alex Riesen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).