rebase-with-history -- a technique for rebasing without trashing your repo history

* rebase-with-history -- a technique for rebasing without trashing your repo history
@ 2009-08-13 12:46 Michael Haggerty
  2009-08-13 16:12 ` Björn Steinbrink
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Michael Haggerty @ 2009-08-13 12:46 UTC (permalink / raw)
  To: Bazaar, Git Mailing List, mercurial mailing list

Sorry to cross-post, but I think this might be interesting to all three
projects...

I've been thinking a lot about the problems of tracking upstream changes
while developing a feature branch.  As I think everybody knows, both
rebasing and merging have serious disadvantages for this use case.
Rebasing discards history and makes it difficult to share
work-in-progress with others, whereas merging makes it difficult to
prepare a clean patch series that is suitable for submission upstream.

I've written some articles describing another possibility, which
combines the advantages of both methods.  The key idea is to retain
rebase history correctly, on a patch-by-patch level.  The resulting DAG
retains enough history to prevent problems with merge conflicts
downstream, while also allowing the patch series to be kept tidy.

(Please note that this technique only works for the typical "tracking
upstream" type of rebase; it doesn't help with rebases whose goals are
changing the order of commits, moving only part of a branch, rewriting
commits, etc.)

For more information, please see the full articles:

* A truce in the merge vs. rebase war? [1]
* Upstream rebase Just Works™ if history is retained [2]
* Rebase with history -- implementation ideas [3]

I'd appreciate feedback!

Michael

[1]
http://softwareswirl.blogspot.com/2009/04/truce-in-merge-vs-rebase-war.html
[2]
http://softwareswirl.blogspot.com/2009/08/upstream-rebase-just-works-if-history.html
[3]
http://softwareswirl.blogspot.com/2009/08/rebase-with-history-implementation.html

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-13 12:46 rebase-with-history -- a technique for rebasing without trashing your repo history Michael Haggerty
@ 2009-08-13 16:12 ` Björn Steinbrink
  2009-08-13 22:39   ` Michael Haggerty
  2009-08-13 17:39 ` Bryan O'Sullivan
  2009-08-13 20:31 ` Abderrahim Kitouni
  2 siblings, 1 reply; 10+ messages in thread
From: Björn Steinbrink @ 2009-08-13 16:12 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Bazaar, Git Mailing List, mercurial mailing list

On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
> Sorry to cross-post, but I think this might be interesting to all three
> projects...
> 
> I've been thinking a lot about the problems of tracking upstream changes
> while developing a feature branch.  As I think everybody knows, both
> rebasing and merging have serious disadvantages for this use case.
> Rebasing discards history and makes it difficult to share
> work-in-progress with others, whereas merging makes it difficult to
> prepare a clean patch series that is suitable for submission upstream.
> 
> I've written some articles describing another possibility, which
> combines the advantages of both methods.  The key idea is to retain
> rebase history correctly, on a patch-by-patch level.  The resulting DAG
> retains enough history to prevent problems with merge conflicts
> downstream, while also allowing the patch series to be kept tidy.
> 
> (Please note that this technique only works for the typical "tracking
> upstream" type of rebase; it doesn't help with rebases whose goals are
> changing the order of commits, moving only part of a branch, rewriting
> commits, etc.)

Hm, so that pretty much doesn't work at all for creating a clean patch
series, which usually involves rewriting commits, squashing bug fixes
into the original commits that introduced the bug etc.

And even for just continuously forward porting a series of commits, a
common case might be that upstream applied some patches, but not all.
Can you deal with that?

Example:

A---B---C (upstream)
     \
      H---I---J---K (yours)

Upstream takes some changes:

A---B---C---I'--K'--D (upstream)
     \
      H---I---J---K (yours)

rebase leads to:

A---B---C---I'--K'--D (upstream)
                     \
                      H'--J' (yours)

What would your approach generate in that case?

> For more information, please see the full articles:
> 
> * Upstream rebase Just Works™ if history is retained [2]

In this one you have two DAGs:
(I fixed the second one to also have the merge commit in "subsystem"
instead of "topic", so they only differ WRT the rebased stuff)

A)
m---N---m---m---m---m---m---M  (master)
     \                       \
      o---o---O---o---o       o'--o'--o'--o'--o'--S  (subsystem)
                       \                         /
                        *---*---*-..........-*--T (topic)

B)
m---N---m---m---m---m---m---M  (master)
     \                       \
      \                       o'--o'--o'--o'--o'----------S  (subsystem)
       \                     /   /   /   /   /           /
        --------------------o---o---O---o---o---*---*---T (topic)

And you say that the former creates problems when you want to merge
again. How so?

Merging "master" to "subsystem" is a no-op in both cases.
Merging "subsystem" to "master" is a fast-forward in both cases.
Merging "subsystem" to "topic" is a fast-forward in both cases.
Merging "topic" to "subsystem" is a no-op in both cases.

Merging "topic" and "master" (in either direction) has merge base N in
both cases.

Let's assume that there's another dev, having his own history based on
the old "O" commit. So:

A)
m---N---m---m---m---m---m---M  (master)
     \                       \
      o---o---O---o---o       o'--o'--o'--o'--o'--S  (subsystem)
               \       \                         /
                \       *---*---*-..........-*--T (topic)
                 \
                  X---Y---Z (outsider)

B)
m---N---m---m---m---m---m---M  (master)
     \                       \
      \                       o'--o'--o'--o'--o'----------S  (subsystem)
       \                     /   /   /   /   /           /
        --------------------o---o---O---o---o---*---*---T (topic)
                                     \
                                      X---Y---Z (outsider)

Merging "master" and "outsider" has merge base N in both cases.
Merging "subsystem" and "outsider" has merge base O in both cases.
Merging "topic" and "outsider" has merge base O in both cases.

The only thing that really makes a difference is when you have another
dev having based his history upon one of the o' commits. If that history
is merged with "topic", then you get merge base N in A) and one of the
o's in B).
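
The merge-base claims above can be checked on a toy model. This is a minimal sketch, not git itself: commits are plain strings in a parent map, the master line is abbreviated to four commits, and the names m0, m1, t1, t2, o1..o5 and W (the hypothetical second dev's commit based on a rebased o' commit) are labels chosen here for illustration only.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def build(with_history):
    """Build DAG A) (with_history=False) or DAG B) (with_history=True)."""
    parents = {"m0": [], "N": ["m0"], "m1": ["N"], "M": ["m1"]}
    old = ["o1", "o2", "O", "o4", "o5"]
    prev = "N"
    for c in old:
        parents[c] = [prev]
        prev = c
    prev = "M"
    for c in old:
        new = c + "'"
        # In B) every rebased commit also records its original as a parent.
        parents[new] = [prev, c] if with_history else [prev]
        prev = new
    parents.update({"t1": ["o5"], "t2": ["t1"], "T": ["t2"], "S": ["o5'", "T"]})
    parents.update({"X": ["O"], "Y": ["X"], "Z": ["Y"]})  # outsider, based on old O
    parents["W"] = ["O'"]  # hypothetical second dev, based on a rebased o' commit
    return parents


dag_a, dag_b = build(False), build(True)
for name, dag in (("A", dag_a), ("B", dag_b)):
    print(name,
          merge_bases(dag, "M", "Z"),   # master vs. outsider
          merge_bases(dag, "S", "Z"),   # subsystem vs. outsider
          merge_bases(dag, "T", "Z"),   # topic vs. outsider
          merge_bases(dag, "T", "W"))   # topic vs. work based on o'
```

In this model only the last column differs between A) and B), which matches the argument above.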

But, in that case, merging "topic" is basically just a complicated way
of merging "subsystem". Both contain the full series of "o" commits, and
all the "topic" commits (due to the merge). So you could just trivially
merge "subsystem" instead, which leads to the o' commit as the merge base
in both cases.

And, that merge of "topic" to "subsystem" was wrong to begin with. If
you rewrite history, that has to trickle down. So A) should really have
been:

m---m---m---m-...--m---m (master)
                        \
                         o'--o'-...-o'--*'--*'--*' (topic) (subsystem)

(topic was rebased, and subsystem fast-forwarded)

So AFAICT, what your system achieves WRT ease of rebasing is that it
obsoletes the need to use "--onto" with git's rebase.

Instead of "git rebase --onto subsystem old_subsystem", you can just say
"git rebase subsystem", at the cost of a very complicated DAG.

Björn

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-13 16:12 ` Björn Steinbrink
@ 2009-08-13 22:39   ` Michael Haggerty
  2009-08-13 23:30     ` Björn Steinbrink
  2009-08-14  3:17     ` Sitaram Chamarty
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Haggerty @ 2009-08-13 22:39 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Git Mailing List

Björn Steinbrink wrote:
> On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
>> (Please note that this technique only works for the typical "tracking
>> upstream" type of rebase; it doesn't help with rebases whose goals are
>> changing the order of commits, moving only part of a branch, rewriting
>> commits, etc.)
> 
> Hm, so that pretty much doesn't work at all for creating a clean patch
> series, which usually involves rewriting commits, squashing bug fixes
> into the original commits that introduced the bug etc.

Now that you mention it, there are some other uses of rebase whose
history could be recorded correctly, or at least better, in the DAG.  I
am not ready to advocate any of these changes, but I think they are
worth discussing.

A squash of two adjacent commits currently transforms this:

A---B1---B2---C

to this

A---B12---C'

(C' has the same contents and commit message as C but a different
history and therefore a different SHA1.)

But B12 includes both B1 and B2, so the correct history is this:

A---B1---B2----C
 \         \    \
  ---------B12---C'

The fact that B12 has B1 and B2 as ancestors tells git that it
incorporates both of their changes, which correctly describes reality.
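
As a rough illustration of what "has B1 and B2 as ancestors" buys, here is a small sketch that models the two squash results as parent maps and asks which old commits are reachable from B12. The parent-map representation and the helper name `ancestors` are assumptions of this sketch, not anything git exposes under that name.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


old = {"A": [], "B1": ["A"], "B2": ["B1"], "C": ["B2"]}

# Ordinary squash: B12 and C' know nothing about B1, B2 or C.
plain = dict(old, **{"B12": ["A"], "C'": ["B12"]})

# Squash with history: B12 also lists B2 as a parent, C' also lists C.
with_history = dict(old, **{"B12": ["A", "B2"], "C'": ["B12", "C"]})

print({"B1", "B2"} <= ancestors(plain, "B12"))
print({"B1", "B2"} <= ancestors(with_history, "B12"))
```

Only the second graph lets a later merge see that the work in B1 and B2 is already contained in B12.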

Splitting a commit (not really an elementary rebase operation, but
achievable with "edit") transforms this:

A---B12---C

to this:

A---B1---B2---C'

It is not possible to represent this correctly in the DAG, because there
is no way to express that "B1" includes part of "B12" as a parent.  But
the following would be more accurate than discarding all history:

A---B12---C
 \     \   \
  B1---B2---C'

It would of course be difficult for the user-interface layer to be
confident that the changes in B1 and B2 are really equivalent to B12
unless the content of B12 and B2 are identical.

Inserting a new commit into the history (for example as part of
reordering later commits) transforms this:

A---C

to this:

A---B---C'

In this case C' includes everything that is in C, so the correct history is

A---C
 \   \
  B---C'

However, deleting a commit, which transforms this:

A---B---C

to this:

A---C'

cannot be represented in the DAG, because there is no way to express
that C' includes the changes from C without also implying that it
includes the changes from B.
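
One way to state the difference between the insertion and deletion cases is that a new commit can only claim to include a set of old commits if that set is closed under ancestry. The sketch below is my own framing of that argument, with a hypothetical helper `is_representable`; it is not part of any proposed implementation.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def is_representable(parents, included):
    """True if a new commit could list exactly `included` as its old ancestry.

    Pointing a parent edge at any commit drags in all of that commit's
    ancestors, so the wanted set must already contain them.
    """
    closure = set()
    for c in included:
        closure |= ancestors(parents, c)
    return closure == set(included)


# Insertion: old history A---C; C' should include A and C.
insertion = {"A": [], "C": ["A"]}
print(is_representable(insertion, {"A", "C"}))

# Deletion: old history A---B---C; C' should include A and C but not B.
deletion = {"A": [], "B": ["A"], "C": ["B"]}
print(is_representable(deletion, {"A", "C"}))
```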

Rewriting a single commit, under the assumption that the rewritten
commit is the logical equivalent of the original, transforms this:

A---B1---C

to this:

A---B2---C'

where B2 is a hand-rewritten version of B1, and C' is the version of C
produced by the rebase.  In this case, the history could be recorded as:

A---B1---C
 \    \   \
  ----B2---C'

but again, it is impossible for the user-interface layer to ascertain
that B2 is equivalent to B1 without help from the user.

All of this extra history would currently create far more clutter than
it is worth, but if there were a way to suppress the display of rebased
commits (as discussed in the third article I quoted), then the extra
information would be there to help git without overwhelming users.

> And even for just continuously forward porting a series of commits, a
> common case might be that upstream applied some patches, but not all.
> Can you deal with that?
> 
> Example:
> 
> A---B---C (upstream)
>      \
>       H---I---J---K (yours)
> 
> Upstream takes some changes:
> 
> A---B---C---I'--K'--D (upstream)
>      \
>       H---I---J---K (yours)
> 
> rebase leads to:
> 
> A---B---C---I'--K'--D (upstream)
>                      \
>                       H'--J' (yours)
> 
> What would your approach generate in that case?

There *is no way* to represent this history in a DAG, and therefore the
history of this operation will necessarily be lost.  (Well, of course it
could be recorded in metadata supplemental to the DAG, but since the
history would not affect future merges it would be pointless.)  The
problem is that there is no way to claim that I' is derived from I
without also implying that I' includes the change in H (which it
doesn't).  I discuss this sort of thing in another article [1].

>> For more information, please see the full articles: [...]
> 
> In this one you have two DAGs:
> (I fixed the second one to also have the merge commit in "subsystem"
> instead of "topic", so they only differ WRT the rebased stuff)
> 
> A)
> m---N---m---m---m---m---m---M  (master)
>      \                       \
>       o---o---O---o---o       o'--o'--o'--o'--o'--S  (subsystem)
>                        \                         /
>                         *---*---*-..........-*--T (topic)
> 
> 
> B)
> m---N---m---m---m---m---m---M  (master)
>      \                       \
>       \                       o'--o'--o'--o'--o'----------S  (subsystem)
>        \                     /   /   /   /   /           /
>         --------------------o---o---O---o---o---*---*---T (topic)
> 
> 
> And you say that the former creates problems when you want to merge
> again. How so?

As you very clearly showed (thanks!), the merge problems that I claimed
only occur in some obscure edge cases.

What I *should* have emphasized is that the merge S itself is much more
prone to conflicts in case A) (with merge base N) than in case B) (with
the last "o" as merge base).  That is the first advantage of
rebase-with-history.
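
To make the two merge bases concrete, here is a small sketch on an abbreviated version of A) and B), with two "o" commits and two topic commits. The commit labels o1, o2, t1, t2 and the parent-map model are assumptions of the sketch. It only computes the merge base used for S; it says nothing by itself about how many conflicts result.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def build(with_history):
    """Abbreviated A) (with_history=False) or B) (with_history=True), before S."""
    parents = {"N": [], "M": ["N"],
               "o1": ["N"], "o2": ["o1"],
               "t1": ["o2"], "t2": ["t1"]}
    parents["o1'"] = ["M", "o1"] if with_history else ["M"]
    parents["o2'"] = ["o1'", "o2"] if with_history else ["o1'"]
    return parents


# Merge base used when merging the topic tip into the rebased subsystem tip.
base_a = merge_bases(build(False), "o2'", "t2")
base_b = merge_bases(build(True), "o2'", "t2")
print(base_a, base_b)
```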

And please note that I really advocate C), not B):

C)
m---N---m---m---m---m---m---M  master
     \                       \
      \                       o'--o'--o'--o'--o'  subsystem
       \                     /   /   /   /   / \
        --------------------o---o---O---o---o   \
                                             \   \
                                              \   *'--*'--*'  topic
                                               \ /   /   /
                                                *---*---*

where the topic branch is not merged into the subsystem branch but
rather rebased-with-history.  C) has the significant advantage over A)
or B) that the topic branch can be converted to a series of patches (the
*' patches) that apply cleanly to the rebased subsystem branch and can
therefore be submitted upstream.  In the case of A) or B), the only
available patch that applies cleanly to the rebased subsystem branch is
S, which is a single commit that squashes together the entire topic
branch and is therefore difficult to review.

So rebasing in a public repository makes it difficult for downstream
developers to apply their work to the rebased branch (because they have
to repeat the conflict resolution that was done in the upstream rebase),
and merging in a topic branch makes it more difficult to create an
easily-reviewable patch series.  rebase-with-history has neither of
these problems.

Michael

[1] "Git, Mercurial, and Bazaar—simplicity through inflexibility",
http://softwareswirl.blogspot.com/2009/08/git-mercurial-and-bazaarsimplicity.html

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-13 22:39   ` Michael Haggerty
@ 2009-08-13 23:30     ` Björn Steinbrink
  2009-08-14 21:21       ` Michael Haggerty
  2009-08-14  3:17     ` Sitaram Chamarty
  1 sibling, 1 reply; 10+ messages in thread
From: Björn Steinbrink @ 2009-08-13 23:30 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Git Mailing List

On 2009.08.14 00:39:48 +0200, Michael Haggerty wrote:
> Björn Steinbrink wrote:
> > On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
> > And even for just continuously forward porting a series of commits, a
> > common case might be that upstream applied some patches, but not all.
> > Can you deal with that?
> > 
> > Example:
> > 
> > A---B---C (upstream)
> >      \
> >       H---I---J---K (yours)
> > 
> > Upstream takes some changes:
> > 
> > A---B---C---I'--K'--D (upstream)
> >      \
> >       H---I---J---K (yours)
> > 
> > rebase leads to:
> > 
> > A---B---C---I'--K'--D (upstream)
> >                      \
> >                       H'--J' (yours)
> > 
> > What would your approach generate in that case?
> 
> There *is no way* to represent this history in a DAG, and therefore the
> history of this operation will necessarily be lost.  (Well, of course it
> could be recorded in metadata supplemental to the DAG, but since the
> history would not affect future merges it would be pointless.)  The
> problem is that there is no way to claim that I' is derived from I
> without also implying that I' includes the change in H (which it
> doesn't).  I discuss this sort of thing in another article [1].

Well, I' isn't even interesting here. I' is _upstream's_ commit, likely
created from an email. Upstream didn't have "I" when I' was created to
begin with. The interesting commits are H' and J', which were actually
rebased. And the situation is even "worse" than for I' or K'. H' does
include H, I, and K, but not J, and J' contains H, I, J and K.

So possibly you'd have:

A---B---C---I'--K'--D (upstream)
     \               \
      \       --------H'--J'
       \     /           /
        H---I---J---K----

Which only ignores that K is "contained" in H'.

But you only get to know that I is in H' (and that K is in J') _after_
H' (or J') have been created. So you'd need a preprocessing run.

The non-preprocessing result would be something like:

A---B---C---I'--K'--D (upstream)
     \               \
      \  -------------H'--J'
       \/                /
        H---I---J--------
                 \
                  K

But that's obviously total crap.

> >> For more information, please see the full articles: [...]
> > 
> > In this one you have two DAGs:
> > (I fixed the second one to also have the merge commit in "subsystem"
> > instead of "topic", so they only differ WRT the rebased stuff)
> > 
> > A)
> > m---N---m---m---m---m---m---M  (master)
> >      \                       \
> >       o---o---O---o---o       o'--o'--o'--o'--o'--S  (subsystem)
> >                        \                         /
> >                         *---*---*-..........-*--T (topic)
> > 
> > 
> > B)
> > m---N---m---m---m---m---m---M  (master)
> >      \                       \
> >       \                       o'--o'--o'--o'--o'----------S  (subsystem)
> >        \                     /   /   /   /   /           /
> >         --------------------o---o---O---o---o---*---*---T (topic)
> > 
> > 
> > And you say that the former creates problems when you want to merge
> > again. How so?
> 
> As you very clearly showed (thanks!), the merge problems that I claimed
> only occur in some obscure edge cases.
> 
> What I *should* have emphasized is that the merge S itself is much more
> prone to conflicts in case A) (with merge base N) than in case B) (with
> the last "o" as merge base).  That is the first advantage of
> rebase-with-history.

Yeah, but as I said, actually, topic should have been rebased, not
merged, and then that ends up the same, as you'd do:
git rebase --onto subsystem $last_o topic

Taking the last o commit as upstream.

> And please note that I really advocate C), not B):
> 
> C)
> m---N---m---m---m---m---m---M  master
>      \                       \
>       \                       o'--o'--o'--o'--o'  subsystem
>        \                     /   /   /   /   / \
>         --------------------o---o---O---o---o   \
>                                              \   \
>                                               \   *'--*'--*'  topic
>                                                \ /   /   /
>                                                 *---*---*
> 

Fine, but this should then be compared to the result from the above
rebase command, which is:

D)
m--...--m (master)
         \
          o'-...-o' (subsystem)
                  \
                   *'-...-*' (topic)

> where the topic branch is not merged into the subsystem branch but
> rather rebased-with-history.  C) has the significant advantage over A)
> or B) that the topic branch can be converted to a series of patches (the
> *' patches) that apply cleanly to the rebased subsystem branch and can
> therefore be submitted upstream.  In the case of A) or B), the only
> available patch that applies cleanly to the rebased subsystem branch is
> S, which is a single commit that squashes together the entire topic
> branch and is therefore difficult to review.

Same for D)

> So rebasing in a public repository makes it difficult for downstream
> developers to apply their work to the rebased branch (because they have
> to repeat the conflict resolution that was done in the upstream rebase),

There's no difference between C) and D) there, except for the fact that
D) requires you to use --onto, because that needs to differ from
<upstream>.

Let's take the pre-topic-rebase history:

m---m---m (master)
     \   \
      \   o'--o'--O' (subsystem)
       \
        o---o---O---*---*---* (topic)

Doing a plain "git rebase subsystem topic" would of course also try to
rebase the "o" commits, so that's problematic. Instead, you do:

git rebase --onto subsystem O topic

That turns O..topic (the * commits) into patches, and applies them on
top of O'. So the "o" commits aren't to be rebased.

And that's exactly what your rebase-with-history would do as well. Just
that O is naturally a common ancestor of subsystem and topic, and so
just using "git rebase-w-h subsystem topic" would be enough. Conflicts
etc. should be 100% the same.
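
The three cases can be compared by computing which commits fall in the range <upstream>..topic. This is a toy sketch under stated assumptions: commits are strings in a parent map, the labels m1..m3, o1, o2, t1..t3 are invented for the pre-topic-rebase picture above, and `rev_range` is a hypothetical helper that mimics the set of commits a rebase would replay.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rev_range(parents, upstream, branch):
    """Commits reachable from branch but not from upstream (upstream..branch)."""
    return ancestors(parents, branch) - ancestors(parents, upstream)


def build(with_history):
    """Pre-topic-rebase history; with_history adds o -> o' parent links."""
    parents = {"m1": [], "m2": ["m1"], "m3": ["m2"],
               "o1": ["m2"], "o2": ["o1"], "O": ["o2"],
               "t1": ["O"], "t2": ["t1"], "t3": ["t2"]}
    prev = "m3"
    for c in ("o1", "o2", "O"):
        new = c + "'"
        parents[new] = [prev, c] if with_history else [prev]
        prev = new
    return parents


plain = build(False)
hist = build(True)

naive = rev_range(plain, "O'", "t3")      # like "git rebase subsystem topic"
onto = rev_range(plain, "O", "t3")        # like "git rebase --onto subsystem O topic"
with_hist = rev_range(hist, "O'", "t3")   # same plain command, history retained
print(sorted(naive), sorted(onto), sorted(with_hist))
```

In this model the naive range also picks up the old "o" commits, while the --onto range and the history-retaining range both contain only the topic commits.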

If you know that your upstream is going to rebase/rewrite history, you
can tag (or otherwise mark) the current branching point of your branch,
so you can easily specify it for the --onto rebase. IOW: This is
primarily a social problem (tell your downstream that you rebase this or
that branch), but having built-in support to store the branching point
for rebasing _might_ be worth a thought.

> and merging in a topic branch makes it more difficult to create an
> easily-reviewable patch series.  rebase-with-history has neither of
> these problems.

Sure, merging is a no-go if you submit patches by email (or other,
similar means). But you compared that to an "enhanced" rebase approach,
instead of comparing your rebase approach to the currently available
one.

So, as I see it, your approach does:
 * Saves the need to use --onto, allowing you to just specify <upstream>, as
   if <upstream> was not rewritten.
 * Allows keeping older versions of commits more easily accessible for
   inspection, e.g. creating interdiffs.

The latter is (to me) of limited use, and the former could be done by
tracking the branching point. I'm not sure how well that would work out, but
it may be worth investigating.

Björn

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-13 23:30     ` Björn Steinbrink
@ 2009-08-14 21:21       ` Michael Haggerty
  2009-08-14 21:40         ` Nanako Shiraishi
  2009-08-15  3:36         ` Björn Steinbrink
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Haggerty @ 2009-08-14 21:21 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Git Mailing List

Björn Steinbrink wrote:
> On 2009.08.14 00:39:48 +0200, Michael Haggerty wrote:
>> Björn Steinbrink wrote:
>>> On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
>>> And even for just continuously forward porting a series of commits, a
>>> common case might be that upstream applied some patches, but not all.
>>> Can you deal with that?

[A discussion of various unsatisfactory approaches omitted...]

> But that's obviously total crap.

So I think we agree that it is not possible to retain history for a case
like this (which is essentially a general cherry-pick).

> [...]
> Doing a plain "git rebase subsystem topic" would of course also try to
> rebase the "o" commits, so that's problematic. Instead, you do:
> 
> git rebase --onto subsystem O topic
> 
> That turns O..topic (the * commits) into patches, and applies them on
> top of O'. So the "o" commits aren't to be rebased.
> 
> And that's exactly what your rebase-with-history would do as well. Just
> that O is naturally a common ancestor of subsystem and topic, and so
> just using "git rebase-w-h subsystem topic" would be enough. Conflicts
> etc. should be 100% the same.
> 
> If you know that your upstream is going to rebase/rewrite history, you
> can tag (or otherwise mark) the current branching point of your branch,
> so you can easily specify it for the --onto rebase. IOW: This is
> primarily a social problem (tell your downstream that you rebase this or
> that branch), but having built-in support to store the branching point
> for rebasing _might_ be worth a thought.

Recording branch points manually, coordinating merges via email -- OMG
you are giving me flashbacks of CVS ;-)

*Of course* you can get around all of these problems if you put the
burden of bookkeeping on the user.  The whole point of
rebase-with-history is to have the VCS handle it automatically!

>> and merging in a topic branch makes it more difficult to create an
>> easily-reviewable patch series.  rebase-with-history has neither of
>> these problems.
> 
> Sure, merging is a no-go if you submit patches by email (or other,
> similar means). But you compared that to an "enhanced" rebase approach,
> instead of comparing your rebase approach to the currently available
> one.

In [1] I compared rebase-with-history with both of the
currently-available options (rebase and merge).  Rebase and merge can
each deal with some of the issues that come up, but each one falls flat
on others.  I believe that rebase-with-history has the advantages of both.

The example in [2] was taken straight from the git-rebase man page [3];
I did not want to claim that current practice would use merging in this
situation, but rather just to show that rebase-with-history removes the
pain from this well-known example.

I think we are mostly in agreement.  Rebase-with-history is obviously
not an earth-shattering revolution in DVCS technology, but my hope is
that it could unobtrusively assist with a few minor pain points.

Michael

[1]
http://softwareswirl.blogspot.com/2009/04/truce-in-merge-vs-rebase-war.html
[2]
http://softwareswirl.blogspot.com/2009/08/upstream-rebase-just-works-if-history.html
[3] http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-14 21:21       ` Michael Haggerty
@ 2009-08-14 21:40         ` Nanako Shiraishi
  2009-08-15  3:36         ` Björn Steinbrink
  1 sibling, 0 replies; 10+ messages in thread
From: Nanako Shiraishi @ 2009-08-14 21:40 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Björn Steinbrink, Git Mailing List

Quoting Michael Haggerty <mhagger@alum.mit.edu>

> In [1] I compared rebase-with-history with both of the
> currently-available options (rebase and merge).  Rebase and merge can
> each deal with some of the issues that come up, but each one falls flat
> on others.  I believe that rebase-with-history has the advantages of both.
> ....  Rebase-with-history is obviously
> not an earth-shattering revolution in DVCS technology, but my hope is
> that it could unobtrusively assist with a few minor pain points.

The saddest part is that your [1] works only in a case that a user can
easily handle manually, and doesn't help with cases more complex than the
most trivial ones, such as reordering and squashing commits, where the
user might benefit if automated support from the VCS were available.

-- 
Nanako Shiraishi
http://ivory.ap.teacup.com/nanako3/

* Re: rebase-with-history -- a technique for rebasing without trashing your repo history
  2009-08-14 21:21       ` Michael Haggerty
  2009-08-14 21:40         ` Nanako Shiraishi
@ 2009-08-15  3:36         ` Björn Steinbrink
  1 sibling, 0 replies; 10+ messages in thread
From: Björn Steinbrink @ 2009-08-15  3:36 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Git Mailing List

On 2009.08.14 23:21:01 +0200, Michael Haggerty wrote:
> Björn Steinbrink wrote:
> > On 2009.08.14 00:39:48 +0200, Michael Haggerty wrote:
> >> Björn Steinbrink wrote:
> >>> On 2009.08.13 14:46:07 +0200, Michael Haggerty wrote:
> > [...]
> > Doing a plain "git rebase subsystem topic" would of course also try to
> > rebase the "o" commits, so that's problematic. Instead, you do:
> > 
> > git rebase --onto subsystem O topic
> > 
> > That turns O..topic (the * commits) into patches, and applies them on
> > top of O'. So the "o" commits aren't to be rebased.
> > 
> > And that's exactly what your rebase-with-history would do as well. Just
> > that O is naturally a common ancestor of subsystem and topic, and so
> > just using "git rebase-w-h subsystem topic" would be enough. Conflicts
> > etc. should be 100% the same.
> > 
> > If you know that your upstream is going to rebase/rewrite history, you
> > can tag (or otherwise mark) the current branching point of your branch,
> > so you can easily specify it for the --onto rebase. IOW: This is
> > primarily a social problem (tell your downstream that you rebase this or
> > that branch), but having built-in support to store the branching point
> > for rebasing _might_ be worth a thought.
> 
> Recording branch points manually, coordinating merges via email -- OMG
> you are giving me flashbacks of CVS ;-)

Not merging, but rewriting history. One of the primary purposes of
rebasing is to forget the old history, the new version overrides it. And
telling someone to forget something is a social problem. You can help
the user to forget the history by tracking the branching points and I
said that git could maybe learn to do that, so the user doesn't have
to do so. Quick idea:

On branch creation, create refs/bases/<branchname> (let's call that
<base>) referencing the commit the branch initially references.

On rebase, check if <branchname>..<onto> is not empty. If so, update
refs/bases/<branchname> to reference <onto>.

On reset, check if the commit the branch head is being reset to is
reachable through the commit the branch head currently references. If
not, update <base> to reference the commit we're resetting to.

Find some sane syntax for rebase that implicitly uses <base> as the
<upstream> argument, e.g. just "git rebase --onto <whatever>" could work
as "git rebase --onto <whatever> <base>".

Most likely, I missed a bunch of corner cases though...
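
A toy model of that bookkeeping, to see how the pieces fit. Everything here is hypothetical: the `Repo` class, its method names and the refs/bases namespace do not exist in git, branches are assumed to be linear, and the rebase step is assumed to record <onto> as the new base.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


class Repo:
    """Toy repository with a hypothetical refs/bases/<branch> namespace."""

    def __init__(self):
        self.parents = {}  # commit -> list of parents
        self.heads = {}    # branch -> head commit
        self.bases = {}    # branch -> recorded base (hypothetical refs/bases/)

    def create_branch(self, branch, at):
        self.heads[branch] = at
        self.bases[branch] = at  # base starts at the branching point

    def commit(self, branch, name):
        head = self.heads.get(branch)
        self.parents[name] = [head] if head is not None else []
        self.heads[branch] = name

    def rebase(self, branch, onto, upstream=None):
        """Replay <upstream>..<branch> onto <onto>; upstream defaults to the base."""
        upstream = self.bases[branch] if upstream is None else upstream
        old_head = self.heads[branch]
        stop = ancestors(self.parents, upstream)
        chain, c = [], old_head
        while c not in stop:  # walk the (linear) branch back to its base
            chain.append(c)
            c = self.parents[c][0]
        # Non-empty <branch>..<onto> means the branch really moves.
        moved = bool(ancestors(self.parents, onto) - ancestors(self.parents, old_head))
        tip = onto
        for c in reversed(chain):
            self.parents[c + "'"] = [tip]
            tip = c + "'"
        self.heads[branch] = tip
        if moved:
            self.bases[branch] = onto

    def reset(self, branch, target):
        if target not in ancestors(self.parents, self.heads[branch]):
            self.bases[branch] = target
        self.heads[branch] = target


repo = Repo()
for c in ("m1", "m2"):
    repo.commit("master", c)
repo.create_branch("subsystem", "m2")
for c in ("o1", "o2"):
    repo.commit("subsystem", c)
repo.create_branch("topic", "o2")
for c in ("t1", "t2"):
    repo.commit("topic", c)
repo.commit("master", "m3")

repo.rebase("subsystem", repo.heads["master"])   # no <upstream> given
repo.rebase("topic", repo.heads["subsystem"])    # no <upstream> given
print(repo.heads, repo.bases)
```

In this model the second rebase replays only t1 and t2, because the recorded base of "topic" is still the old o2.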

> *Of course* you can get around all of these problems if you put the
> burden of bookkeeping on the user.  The whole point of
> rebase-with-history is to have the VCS handle it automatically!

What your approach does is simply move the "just forget the
history" part. Instead of forgetting it at rebase time, you have to
forget it when you want to submit patches. It's obviously a bit easier
though, as you can just say "--first-parent <upstream>", assuming that
you teach format-patch to use a special first-parent diff mode for the
merge commits (see below).
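
A sketch of what such a first-parent walk would select, on an abbreviated rebase-with-history DAG. The parent-map model, the commit labels and the helper `first_parent_series` are assumptions of this sketch; it is loosely analogous to what "git log --first-parent" walks, and it assumes each rebased commit lists its new predecessor as first parent and its original as second parent.

```python
def ancestors(parents, commit):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def first_parent_series(parents, upstream, tip):
    """Follow first parents from tip until reaching upstream's history."""
    stop = ancestors(parents, upstream)
    chain, c = [], tip
    while c not in stop:
        chain.append(c)
        c = parents[c][0]
    return chain[::-1]  # oldest first, i.e. patch order


# Abbreviated rebase-with-history graph: one "o" commit, two topic commits.
parents = {
    "N": [], "M": ["N"],
    "o": ["N"], "o'": ["M", "o"],
    "t1": ["o"], "t2": ["t1"],
    "t1'": ["o'", "t1"], "t2'": ["t1'", "t2"],
}

series = first_parent_series(parents, "o'", "t2'")
everything = ancestors(parents, "t2'") - ancestors(parents, "o'")
print(series, sorted(everything))
```

The first-parent walk yields only the rebased commits, while the full range also contains the pre-rebase versions that would have to be ignored when submitting.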

> >> and merging in a topic branch makes it more difficult to create an
> >> easily-reviewable patch series.  rebase-with-history has neither of
> >> these problems.
> > 
> > Sure, merging is a no-go if you submit patches by email (or other,
> > similar means). But you compared that to an "enhanced" rebase approach,
> > instead of comparing your rebase approach to the currently available
> > one.
> 
> In [1] I compared rebase-with-history with both of the
> currently-available options (rebase and merge).  Rebase and merge can
> each deal with some of the issues that come up, but each one falls flat
> on others.  I believe that rebase-with-history has the advantages of both.

And some disadvantages.

1) Cluttered history, which needs to be rewritten again when the emailed
patches are just for review, but the maintainer will actually merge from
you later.

Taking the old master, subsystem, topic example, you get (for example):

          o2--o2 (subsystem)
         /     \
m---m---m---m---m (master)
     \   \
      \   o'--o'
       \ /   / \
        o---o   *'--*' (topic)
             \ /   /
              *---*

Now the user that maintains "topic" is back at the hard case. He now
needs to rebase onto master, using the last o' as <upstream>. The
DAG doesn't help here; the base-tracking would handle that.

2) Merge commits, which are usually displayed in a special format. So
for "git show" or "git log -p" to give useful output for those special
merges, you'd have to introduce a new "diff only against first-parent"
mode, and mark those merges in a special way, so that diff mode is used
for them, but not for real merges. And users of old git versions would
have to deal with the basic -m merge diff mode, ignoring the useless
diff for the second parent and the fact that the real merges also get
shown in that format. The base tracking doesn't have this problem
either.
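
As an illustration of what such a first-parent diff would have to show, the
sketch below builds one rebase-with-history style commit by hand and diffs it
against each parent. The commit is assembled with plumbing commands purely
for the demonstration; all names are made up, and this is not how the
proposed feature would necessarily be implemented.

```sh
#!/bin/sh
# One hand-built "rebased commit with history" and its two parent diffs.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit m1
trunk=$(git symbolic-ref --short HEAD)
git checkout -q -b topic
commit t1
git checkout -q "$trunk"
commit m2

# Build t1': the tree is "m2 plus t1's change", the first parent is m2 and
# the second parent is the original t1 (the recorded rebase history).
git checkout -q --detach "$trunk"
git cherry-pick -n topic
tree=$(git write-tree)
new=$(git commit-tree "$tree" -p "$trunk" -p topic -m "t1'")
git reset -q --hard

# Against the first parent we see only the patch itself ...
git diff --name-only "$new^1" "$new"
# ... against the second parent we see the upstream changes picked up.
git diff --name-only "$new^2" "$new"
```

Spelling the diff out as "<commit>^1 <commit>" sidesteps the question of
which merge diff mode a given git version uses by default.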

> The example in [2] was taken straight from the git-rebase man page [3];
> I did not want to claim that current practice would use merging in this
> situation, but rather just to show that rebase-with-history removes the
> pain from this well-known example.

Well, the man page says: Don't merge, rebase needs to trickle down, but
you'll likely need to use "git rebase --onto subsystem subsystem@{1}".
So the rebase-with-history really just saves that "use --onto and the
right <upstream>" from the hard case. The plain base-tracking does the
same.
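
For reference, the hard case can be replayed in a scratch repository roughly
like this. The branch names follow the master/subsystem/topic example; the
file names are made up, and the sketch relies on the reflog of the local
subsystem branch still holding the pre-rebase tip as subsystem@{1}.

```sh
#!/bin/sh
# Recovering from an upstream rebase with --onto and the reflog.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit m1
trunk=$(git symbolic-ref --short HEAD)
git checkout -q -b subsystem
commit o1
commit o2
git checkout -q -b topic
commit t1
commit t2
git checkout -q "$trunk"
commit m2

# The subsystem maintainer rebases onto the new upstream tip.
git rebase -q "$trunk" subsystem

# The topic maintainer names the old subsystem tip as <upstream>, so only
# the topic's own commits are replayed onto the rewritten subsystem.
git rebase -q --onto subsystem "subsystem@{1}" topic

git log --oneline subsystem..topic
```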

Another way to reach the same goal would be just to explicitly override
the old history.

m---m---m (master)
     \
      o---o (subsystem)
           \
            *---* (topic)

(Hypothetical): git rebase --override master subsystem

Leads to:

m---m---m---- (master)
     \       \
      o---o---O---o'--o' (subsystem)
           \
            *---* (topic)

Where O is an --ours merge, that just marks the old o commits as merged,
but has the same tree as the last m commit.

Now topic can be rebased using: git rebase --override subsystem topic

m---m---m---- (master)
     \       \
      o---o---O---o'--o' (subsystem)
           \           \
            *---*-------X---*'--*' (topic)

Again, X being an ours merge.
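
Since --override doesn't exist, here is a rough emulation with existing
commands, as a sketch only: "git merge -s ours" creates the O and X commits,
and cherry-pick recreates the o' and *' commits. It assumes the old tips
become the second parents of O and X; file names are made up.

```sh
#!/bin/sh
# Emulating the hypothetical "git rebase --override" by hand.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit m1
trunk=$(git symbolic-ref --short HEAD)
git checkout -q -b subsystem
commit o1
commit o2
git checkout -q -b topic
commit t1
commit t2
git checkout -q "$trunk"
commit m2

# "git rebase --override master subsystem", spelled out:
old_sub=$(git rev-parse subsystem)
base=$(git merge-base "$trunk" "$old_sub")
git checkout -q -B subsystem "$trunk"
git merge -q -s ours -m "O: override old subsystem" "$old_sub"
git cherry-pick "$base..$old_sub" >/dev/null    # recreates o1', o2'

# "git rebase --override subsystem topic", spelled out:
old_topic=$(git rev-parse topic)
git checkout -q -B topic subsystem
git merge -q -s ours -m "X: override old topic" "$old_topic"
git cherry-pick "$old_sub..$old_topic" >/dev/null   # recreates t1', t2'

# The first-parent walk sees only the rewritten topic commits.
git log --oneline --first-parent --no-merges subsystem..topic
```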

As the O and X commits have the last o and * commits as their second
parents, this doesn't even break things like "git show" and "git log
-p", as the interesting commits aren't merge commits. So "git
log -p --first-parent subsystem..topic" would do the right thing
(optionally with --no-merges to avoid the merge commit, but seeing that
doesn't hurt that much I guess).

This also trivially supports the reorder, squash, edit whatever stuff,
as it doesn't rely on 1:1 commit counterparts to exist. But it also
falls flat on its face as soon as subsystem gets "really" rewritten, so
that the old history is no longer reachable from the new history.

Björn


* Re: rebase-with-history -- a technique for rebasing without trashing  your repo history
  2009-08-13 22:39   ` Michael Haggerty
  2009-08-13 23:30     ` Björn Steinbrink
@ 2009-08-14  3:17     ` Sitaram Chamarty
  1 sibling, 0 replies; 10+ messages in thread
From: Sitaram Chamarty @ 2009-08-14  3:17 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Björn Steinbrink, Git Mailing List

Hi,

I'm one of those wannabe experts who thinks he knows enough about git
to teach people in his workplace but obviously pales in this group,
but with that caveat, let me say:

2009/8/14 Michael Haggerty <mhagger@alum.mit.edu>:

> Now that you mention it, there are some other uses of rebase whose
> history could be recorded correctly, or at least better, in the DAG.  I
> am not ready to advocate any of these changes, but I think they are

I see you've made your own caveat :-)

> worth discussing.

[snip]

> A---B1---B2----C
>  \         \    \
>  ---------B12---C'

> A---B12---C
>  \     \   \
>  B1---B2---C'

[snip]

> A---C
>  \   \
>  B---C'

[etc etc... many such snipped]

To me, the ability to *forget* the mistakes I made (for whatever
definition of "mistake" you wish) as long as it's private to my repo,
is one of the main attractions of git.  I'm one of those guys who
saves early, and saves often, when editing files.  This translates to
commit early, commit often, in the git world.

I see no earthly reason why I would ever *want* those commits
preserved, so I hope that, if this sort of thing ever gets into the
code, it is definitely *not* the default :-)  It is not sufficient for
me that the GUI knows how to suppress their display, it is necessary
that they *disappear completely*.

And that reminds me.  You often hear people on #git ask how to get rid
of some files (maybe containing passwords etc) that inadvertently got
into the repo, and the answer, a lot of the time, is filter-branch,
because the "bad" commit is pretty old.  I suspect that for every
person who asks that question on the list because he already pushed,
there are 4 who discovered such an error much earlier, (when the file
went into only a couple of commits at the top maybe), did a rebase -i
with "edit" or whatever, and got rid of the evidence, err I mean
password :-)  If this sort of thing were to be the default, they'd
have to use a filter-branch even for such simple cases.

Finally, speaking as someone who teaches git, this adds enormous
complexity to the basic concepts.  Complexity is good when the
benefits are obvious, but to me they are not obvious [see *my* caveat
at the top before you react to this statement]


* Re: rebase-with-history -- a technique for rebasing without trashing  your repo history
  2009-08-13 12:46 rebase-with-history -- a technique for rebasing without trashing your repo history Michael Haggerty
  2009-08-13 16:12 ` Björn Steinbrink
@ 2009-08-13 17:39 ` Bryan O'Sullivan
  2009-08-13 20:31 ` Abderrahim Kitouni
  2 siblings, 0 replies; 10+ messages in thread
From: Bryan O'Sullivan @ 2009-08-13 17:39 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Bazaar, mercurial mailing list, Git Mailing List



On Thu, Aug 13, 2009 at 5:46 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:

> Sorry to cross-post, but I think this might be interesting to all three
> projects...
>

Please do not cross post, no matter how interesting you think the topic
might be.


* Re: rebase-with-history -- a technique for rebasing without trashing  your repo history
  2009-08-13 12:46 rebase-with-history -- a technique for rebasing without trashing your repo history Michael Haggerty
  2009-08-13 16:12 ` Björn Steinbrink
  2009-08-13 17:39 ` Bryan O'Sullivan
@ 2009-08-13 20:31 ` Abderrahim Kitouni
  2 siblings, 0 replies; 10+ messages in thread
From: Abderrahim Kitouni @ 2009-08-13 20:31 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Bazaar, mercurial mailing list, Git Mailing List

2009/8/13 Michael Haggerty <mhagger@alum.mit.edu>:
> I've been thinking a lot about the problems of tracking upstream changes
> while developing a feature branch.
Isn't this the purpose of pbranch [1] (and I believe bzr's loom [2] and
git's topgit [3])?

Peace,
Abderrahim

[1] http://arrenbrecht.ch/mercurial/pbranch/
[2] https://launchpad.net/bzr-loom
[3] http://repo.or.cz/w/topgit.git


end of thread, other threads:[~2009-08-15  3:36 UTC | newest]

Thread overview: 10+ messages
2009-08-13 12:46 rebase-with-history -- a technique for rebasing without trashing your repo history Michael Haggerty
2009-08-13 16:12 ` Björn Steinbrink
2009-08-13 22:39   ` Michael Haggerty
2009-08-13 23:30     ` Björn Steinbrink
2009-08-14 21:21       ` Michael Haggerty
2009-08-14 21:40         ` Nanako Shiraishi
2009-08-15  3:36         ` Björn Steinbrink
2009-08-14  3:17     ` Sitaram Chamarty
2009-08-13 17:39 ` Bryan O'Sullivan
2009-08-13 20:31 ` Abderrahim Kitouni
