Using Origin hashes to improve rebase behavior

git.vger.kernel.org archive mirror

* Using Origin hashes to improve rebase behavior
@ 2011-02-10 21:13 John Wiegley
  2011-02-10 22:16 ` Johan Herland
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: John Wiegley @ 2011-02-10 21:13 UTC (permalink / raw)
  To: git

The following proposal is a check to see if this approach would be sane and
whether someone is already doing similar work.  If not, I offer to implement
this solution.

THE PROBLEM

Say I have a master from which I have branched locally, and that this private
branch has four commits:

    a   b   c
    o---o---o
             \
              o---o---o---o
              1   2   3   4

I then decide to cherry pick commit 3 onto master.  Please believe that my
situation is such that I cannot immediately rebase the private branch to drop
the now-duplicated change.  I end up with this:

    a   b   c   3'
    o---o---o---o
             \
              o---o---o---o
              1   2   3   4

Later, there is work on master which changes the same lines of code that 3'
has changed.  The commit which changes 3' is e*:

    a   b   c   3'  d   e*  f
    o---o---o---o---o---o---o
             \
              o---o---o---o
              1   2   3   4

At a later date, I want to rebase the private branch onto master.  What will
happen is that the changes in 3 will conflict with the rewritten changes in
e*.  However, I'd like Git to know that 3 was already incorporated at some
earlier time, and *not consider it during the rebase*, since it doesn't need
to.

THE SOLUTION

For the purposes of this discussion, I'd like to define the term "aggregate
identity" (insert better name here) as a set including: a commit's sha, and
zero or more shas stored in a new field named "Origin-Ids".

If, when cherry-picking, the originating commit's id is stored in the
Origin-Ids field of the cherry-picked commit, then rebase could know whether a
given commit's changes had already been applied.  The logic would look like
this:

  1. When rebasing a branch A onto B, find the common ancestor of A and B.
  2. Examine every commit on B since that common ancestor, collecting a
     set of their aggregate identities.
  3. For each commit on A, ignore it if its aggregate identity occurs in
     that set.

This would cause commit 3 to be ignored during the rebase above, since 3'
would have an origin id referring to 3.
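
The three steps above can be sketched in shell.  This is only an
illustration under stated assumptions: "Origin-Ids:" is a hypothetical
line at the end of the commit message (no such field exists in Git), the
cherry-pick is copied by hand so that the line can be written, and the
helper names commit and ids_of are made up for the example.

```shell
#!/bin/sh
# Sketch only: "Origin-Ids:" is a hypothetical message line, not a Git feature.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

# master: a-b-c, private branch "topic": 1-2-3-4
commit a; commit b; commit c
git checkout -q -b topic
commit 1; commit 2; commit 3; commit 4
orig=$(git rev-parse topic^)          # commit 3 on the private branch

# Copy commit 3 onto master by hand and record where it came from.
git checkout -q master
git cherry-pick -n "$orig"
git commit -q -m "3" -m "Origin-Ids: $orig"
commit d

# A commit's aggregate identity: its own sha plus any recorded origin ids.
ids_of() {
    echo "$1"
    git log -1 --format=%B "$1" | sed -n 's/^Origin-Ids: *//p' | tr -s ' ' '\n'
}

# Steps 1-2: collect the identities of everything on master since the
# common ancestor.
base=$(git merge-base master topic)
seen=$(mktemp)
for c in $(git rev-list "$base..master"); do ids_of "$c"; done | sort -u > "$seen"

# Step 3: ignore topic commits whose identity intersects that set.
picked="" skipped=""
for c in $(git rev-list --reverse "$base..topic"); do
    subject=$(git log -1 --format=%s "$c")
    if ids_of "$c" | grep -qxF -f "$seen"; then
        skipped="${skipped:+$skipped }$subject"
    else
        picked="${picked:+$picked }$subject"
    fi
done
echo "pick: $picked"
echo "skip: $skipped"
```

The sketch only decides which commits to replay; it does not perform the
rebase, and it trusts the recorded ids without checking that the copied
change is still equivalent to the original.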

IMPLEMENTATION

A few things need to be done:

 - Extend commit objects to have an Origin field, which can be zero, one or a
   list of hashes.

 - Add an option to git commit so that one or more origin ids can be specified
   at the time any commit is made.  There may be occasions when it's useful to
   explicitly state that a new commit should somehow 'override' the contents
   of another during a rebase.

 - git cherry-pick and git am should add this Origin field, showing the commit
   their contents originated from.

 - git merge --squash would store the commit ids, and the origin ids, of every
   commit involved in the merge into the resulting commit's Origin field.

   Note that nothing can be done about rebasing a squashed merge commit onto
   another squashed merge commit, even though it could be detected that they
   had common changes.  I don't believe it would even be useful to warn about
   this; the user would just have to resolve the conflicts manually.

 - git log could be extended to show the "parentage" (really, the aunt/uncle)
   of commits with origin info, assuming those origin commits are not dangling
   (which is OK, and likely to occur after the originating branch is deleted,
   or if the originating branch is in another repository).

   Where there are multiple Origin ids, a search could be done to find the set
   of most descendent commits, so that history could be usefully shown after
   an octopus squash, for example.

QUESTIONS

Is it allowable to add new metadata fields to a commit, and would this require
bumping the repository version number?  Or should this be implemented by
appending a Header-style textual field at the end of the commit message?

--
  John Wiegley
  BoostPro Computing
  http://www.boostpro.com

* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
@ 2011-02-10 22:16 ` Johan Herland
  2011-02-10 22:54 ` Jeff King
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Johan Herland @ 2011-02-10 22:16 UTC (permalink / raw)
  To: John Wiegley; +Cc: git

On Thursday 10 February 2011, John Wiegley wrote:
> Is it allowable to add new metadata fields to a commit, and would this
> require bumping the repository version number?  Or should this be
> implemented by appending a Header-style textual field at the end of the
> commit message?

Many have tried before you to add such fields to the commit objects 
(including, literally, storing the origin of cherry-picks to help with 
rebases; search the archives for several examples). They have not succeeded. 
With good reason. This information does not belong in the commit object 
header section (see earlier discussions for a more complete rationale). 
Putting them at the end of the commit message is your best bet. Or even 
better: as a note object stored in a special-purpose notes ref (e.g. 
refs/notes/cherry-picks). The note approach also allows you to retroactively 
add this field to previous cherry-picks. AND it allows you to remove Origin-
IDs that refer to no-longer-existing commits. AND it pretty much solves the 
"git log should show this info" for you as well. In short, this is exactly 
the thing that notes were created to do.

Also, don't forget that the existing -x option to cherry-pick pretty much 
does exactly what you want to add to the commit object.
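
As a small illustration of the two existing mechanisms mentioned here, the
sketch below records the origin once with "cherry-pick -x" and once as a
note under refs/notes/cherry-picks.  The repository layout, the commit
helper and the "Origin-Id:" note text are assumptions made for the example,
not an established convention.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

commit base
git checkout -q -b topic
commit fix
orig=$(git rev-parse HEAD)
git checkout -q master
commit other

# -x appends "(cherry picked from commit <sha>)" to the new commit message.
git cherry-pick -x "$orig" >/dev/null
recorded=$(git log -1 --format=%B |
    sed -n 's/^(cherry picked from commit \([0-9a-f]*\))$/\1/p')
echo "recorded in message: $recorded"

# The same fact as a note.  Notes live outside the commit object, so they
# can be added after the fact and removed again later.
git notes --ref=cherry-picks add -m "Origin-Id: $orig" HEAD
noted=$(git notes --ref=cherry-picks show HEAD)
echo "recorded in note:    $noted"
git notes --ref=cherry-picks remove HEAD >/dev/null 2>&1
```

The message line becomes part of the commit and changes its sha; the note
does not, which is why it can be attached retroactively.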

As for making use of this information in other git commands (e.g. rebase, 
log, etc.), you should show the list that the feature works well and solves 
Real Problems(tm) in the real world. If you can do so (with patches), and 
are willing to work with the list to address issues raised, and improve your 
patches, I'd guess you have a pretty good shot at getting this accepted.

AFAIK, nobody else is working in this area right now, although I don't read 
the mailing list religiously, so I may have missed things. As I said, others 
have previously proposed similar features, so you'll want to search the 
archive for those discussions to make sure you don't repeat the same 
mistakes.

Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
  2011-02-10 22:16 ` Johan Herland
@ 2011-02-10 22:54 ` Jeff King
  2011-02-11  3:14   ` John Wiegley
  2011-02-12 14:36   ` Thomas Rast
  2011-02-11 10:02 ` skillzero
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Jeff King @ 2011-02-10 22:54 UTC (permalink / raw)
  To: John Wiegley; +Cc: git

On Thu, Feb 10, 2011 at 04:13:10PM -0500, John Wiegley wrote:

> For the purposes of this discussion, I'd like to define the term "aggregate
> identity" (insert better name here) as a set including: a commit's sha, and
> zero or more shas stored in a new field named "Origin-Ids".
> 
> If, when cherry-picking, the originating's commit id is stored in the
> Origin-Ids field of the cherry-picked commit, then rebase could know whether a
> given commit's changes had already been applied.  The logic would look like
> this:
> 
>   1. When rebasing a branch A onto B, find the common ancestor of A and B.
>   2. Examine every commit on B since that common ancestor, collecting a
>      set of their aggregate identities.
>   3. For each commit on A, ignore it if its aggregate identity occurs in
>      that set.
> 
> This would cause commit 3 to be ignored during the rebase above, since 3'
> would have an origin id referring to 3.

This can work in some cases, but there are other cases where it might
not. For example, consider:

   1. I cherry-pick commit X from some branch "topic" onto master as X'.
      We record "Origin-ID: X" in X'.

   2. I rebase "topic" (either onto some other branch, or perhaps I use
      rebase -i to rewrite some earlier commit). X now becomes some X''.

   3. I now rebase "topic" onto master. But we fail to note that X''
      matches X, so we try to rebase it.

Now, in step 2 we could record "X" as an origin ID of X'' and during the
rebase in step 3, calculate the intersection of the origins of X'' and
X', and see that they are both just X. And I think maybe you already
realize that, since you talk about Origin-IDs as sets.

But now you have an interesting question: during which operations does a
commit retain its Origin-ID on a source? I think it is pretty clear that
a cherry-pick that cleanly applies is probably a good candidate. But
what if there is a conflict, and I fix up the conflict? Should I still
skip the original commit during the rebase? Maybe, but there are cases
where you wouldn't want to. For example, consider this sequence:

  1. On my master branch, I have a function foo() which takes one
     argument.

  2. I branch a topic from master. The first commit adds a new caller to
     foo().

  3. The second commit changes foo() to take two arguments. I fix up the
     function itself, any old callers, and the new caller.

  4. I cherry-pick the second commit onto master. There is a conflict,
     since one of the callers it updates doesn't exist in master. So I
     drop that part of the patch.

  5. On master I update the implementation of foo().

  6. Now I try to rebase my topic on top of master. We could get a
     conflict because the second commit from (3) will conflict with the
     updated implementation in (5). This is more or less the case you
     described in your initial email, and we'd like git to automatically
     realize that the conflict is uninteresting.

     So let's imagine we recorded Origin-IDs as you describe, and we
     skip it. But that means we are _also_ skipping the part where we
     update the new caller from the commit in (2), the part that was
     dropped during conflict resolution. So our end result is broken,
     because the new caller is still calling with one argument.

And there are lots of other cases. What about "git cherry-pick -n"? What
about rebasing? If there are no conflicts, is it OK to copy the origin
field? How about if there are conflicts? How about in a "git rebase -i",
where we may stop and the user can arbitrarily split, amend, or add new
commits? How do the old commits map to the new ones with respect to
origin fields?

So there are lots of corner cases where it won't work, because git is
more than happy to give you lots of ways to tweak tree state and
history, and it fundamentally doesn't care as much about process as it
does about the end states that you reach. That's part of what makes git
so flexible, but it also makes niceties like "did I already apply this
commit on this branch" much harder to make sense of.

Now, I don't want to discourage you from working on this. Because while
there are lots of cases where it won't work, there are plenty of cases
where it _will_, and it will save rebasers time and effort. So it is
worth pursuing, but I think it is also worth keeping things simple and
conservative, and not affecting the people who have cases where this
won't help.

>  - Extend commit objects to have an Origin field, which can be zero, one or a
>    list of hashes.

It probably shouldn't be a new header field, but rather a text-style
pseudo-header at the end of the commit.

But consider for a moment whether you actually want this field in the
resulting commit at all, or whether it should be an external annotation.
For example, let's say I cherry-pick from a private branch that is going
to end up rebased anyway. Now the history for all time will have a
commit that refers to some totally useless sha1 that nobody even knows
about.

We already went through this with cherry-pick. It used to always put
"cherry-picked from X..." in the commit message. And then we realized
that in many cases, that information is not interesting, because X is
not something people actually know about. So now we don't do it by
default, but for cases where you are cherry-picking from one
long-running branch to another, you can use "cherry-pick -x".

So consider instead putting this information into a commit-note for the
new commit. Possibly even reversing the direction of the mapping (so
that the old commit says "I was cherry-picked to X"). And then when the
old, rebased commit goes away, the note will automagically get pruned by
the notes-pruning mechanism.

There may be reasons why that isn't a good idea, and I haven't thought
it through. But I think you should consider it as an alternate
implementation and tell me why I'm dumb in that case. ;)

>  - Add an option to git commit so that one or more origin ids can be specified
>    at the time any commit is made.  There may be occasions when it's useful to
>    explicitly state that a new commit should somehow 'override' the contents
>    of another during a rebase.
> 
>  - git cherry-pick and git am should add this Origin field, showing the commit
>    their contents originated from.

We already have this to some degree, in the form of "cherry-pick -x".
You could do it with "git am", but you would need "git format-patch" to
actually generate the information (well, technically speaking it is in
the mbox "From " header, but that usually doesn't make it through mail
transports for obvious reasons).

So I wonder if your proposal can be restructured as:

  1. Change rebase to look for cherry-picked-from headers on the --onto
     side, and skip source commits that appear to exist already. That
     will start helping people immediately using existing history.

     You can also deal with uncertainty by leaving this decision to the
     last minute, or even leaving it up to the user. The usual patch-id
     detection works in a lot of cases. Let it work when it does. When
     it fails, check if the conflicted commit exists in a
     cherry-picked-from line. If it does, either do the skip then, or
     when we barf with the "there was a conflict; fix it up and rebase
     --continue" message, mention the cherry-picked-from line and let
     the user inspect the commits themselves and make a decision.

     They can always do "git rebase --skip" even now, so all we are
     really doing is saying "By the way, you might want the extra
     information that this was cherry-picked earlier". And that makes
     this a very low-risk change, since we are just giving the user
     extra information for a decision they are already making.

  2. (Optional) Start adding the "Cherry picked from" message in a more
     machine-readable format, like an "Origin-ID: ..." header. This has
     already been discussed before. People were generally positive, but
     it didn't seem especially useful. This is a use.

     And obviously make the corresponding change in rebase to also parse
     these kinds of headers (but don't drop parsing the original format,
     obviously, for compatibility).

  3. For people who don't want the "cherry picked from" (or "origin-id")
     in their commit, because they are cherry-picking from a private
     source, start recording "cherry picked from" in a git-note. You
     could even do this by default, since you are not impacting the
     commits themselves in any way.

     And then make the corresponding change in rebase to start using
     these notes as a source.

  4. We already have some functionality to copy notes about commit A to
     commit B during certain operations (like rebasing and
     cherry-picking). Check out how these interact with the notes
     introduced in (3) to see if transitive stuff works (like
     cherry-picking A to A', and then A' to A''; you should still be
     able to figure out that A'' came from A).
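
Step 1 of the list above could look roughly like the following sketch.  It
is not how rebase is implemented: it only scans the messages on the
upstream side for the line that "cherry-pick -x" writes and prints a
made-up hint for matching source commits.  The amended cherry-pick stands
in for a conflict that was resolved by hand, so that patch-id detection no
longer catches the duplicate.

```shell
#!/bin/sh
# Sketch of step 1 only; the hint text is invented, not Git output.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

commit a
git checkout -q -b topic
commit 1; commit 2; commit 3; commit 4
orig=$(git rev-parse topic^)          # commit 3

# Cherry-pick 3 with -x, then alter the result as a conflict fix-up would.
git checkout -q master
git cherry-pick -x "$orig" >/dev/null
echo "resolved differently" >> 3
git commit -q -a --amend --no-edit

# Collect every "(cherry picked from commit X)" on the upstream side.
base=$(git merge-base master topic)
upstream_origins=$(git log --format=%B "$base..master" |
    sed -n 's/^(cherry picked from commit \([0-9a-f]*\))$/\1/p')

# Mention, but do not skip, source commits that were picked earlier.
hinted=""
for c in $(git rev-list --reverse "$base..topic"); do
    if echo "$upstream_origins" | grep -qxF "$c"; then
        subject=$(git log -1 --format=%s "$c")
        hinted="${hinted:+$hinted }$subject"
        echo "hint: '$subject' was cherry-picked onto master earlier;"
        echo "      inspect it and consider 'git rebase --skip' if it conflicts"
    fi
done
```

Leaving the decision to the user keeps the change low-risk, as described
above: the sketch adds information and never drops a commit on its own.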

And I think at that point we have more or less the functionality you
were asking for, though we arrived in several non-controversial steps.
And there are lots of enhancements you could add on top, like skipping
without bothering the user about it, or better heuristics for when to
record an origin-id or not to. But we can do those once we see how the
basic dumb part performs. I.e., how useful it is in practice, and how
often it is wrong about when to skip.

>  - git merge --squash would store the commit ids, and the origin ids, of every
>    commit involved in the merge into the resulting commit's Origin field.

I hadn't thought about merge --squash as a commit copying operation, but
I think it is. I wonder if squash merges (or squash rebases) should also
be copying notes (or if they do already, I haven't checked).

>  - git log could be extended to show the "parentage" (really, the aunt/uncle)
>    of commits with origin info, assuming those origin commits are not dangling
>    (which is OK, and likely to occur after the originating branch is deleted,
>    or if the originating branch is in another repository).

If you do it with a combination of text in the commit message and
git-notes, then this is all done for you. The commit message you
obviously see by default, and you could explicitly ask for it to show
the refs/notes/origin-id notes tree.
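
For illustration, a note kept under refs/notes/origin-id can be displayed
by naming that ref on the log command line.  The repository layout, the
commit helper and the wording of the note are assumptions for this sketch.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Hypothetical helper: one commit that adds one file named after its argument.
commit() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

commit base
git checkout -q -b topic
commit fix
orig=$(git rev-parse HEAD)
git checkout -q master
commit other
git cherry-pick "$orig" >/dev/null

# Record the origin outside the commit message.
git notes --ref=origin-id add -m "cherry picked from $orig" HEAD

# By default "git log" shows only refs/notes/commits; name the ref to see
# this note alongside the commit message.
shown=$(git log -1 --notes=origin-id)
echo "$shown"
```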

Whew, that turned out long. I hope it's helpful. I think the problem
you're trying to solve is a real one, and I think your approach is the
right direction. I just think we can leverage existing git features to
do most of it, and because it is sort of a heuristic, we should be
conservative in how it's introduced.

-Peff

* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 22:54 ` Jeff King
@ 2011-02-11  3:14   ` John Wiegley
  2011-02-11  4:45     ` Jeff King
  2011-02-12 14:36   ` Thomas Rast
  1 sibling, 1 reply; 14+ messages in thread
From: John Wiegley @ 2011-02-11  3:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Now, in step 2 we could record "X" as an origin ID of X'' and during the
> rebase in step 3, calculate the intersection of the origins of X'' and X',
> and see that they are both just X. And I think maybe you already realize
> that, since you talk about Origin-IDs as sets.

Right.

> And there are lots of other cases. What about "git cherry-pick -n"? What
> about rebasing? If there are no conflicts, is it OK to copy the origin
> field? How about if there are conflicts? How about in a "git rebase -i",
> where we may stop and the user can arbitrarily split, amend, or add new
> commits. How do the old commits map to the new ones with respect to
> origin fields?

During rebasing, any commits which can be rebased without conflict have their
origin transferred (and each time it would cause the origin id list to grow by
one), but any commits which are squashed or edited would not transfer.

For cherry-pick -n, if the index is empty at the time the cherry-pick is done
(is this required?), then a file is created under .git/ with the SHA of the
changes placed in the index, so that when git-commit is later run and the
index has not been changed, then the Origin-Id for that originating commit
gets placed at the bottom of the commit message.

> So there are lots of corner cases where it won't work, because git is
> more than happy to give you lots of ways to tweak tree state and
> history, and it fundamentally doesn't care as much about process as it
> does about the end states that you reach. That's part of what makes git
> so flexible, but it also makes niceties like "did I already apply this
> commit on this branch" much harder to make sense of.

I think we'd want to restrict this system to those commits which were
automatically rewritten without conflicts.  Any user intervention in the
process would invalidate the meaning of the Origin-Id.

> It probably shouldn't be a new header field, but rather a text-style
> pseudo-header at the end of the commit.

I understand.

> But consider for a moment whether you actually want this field in the
> resulting commit at all, or whether it should be an external annotation.
> For example, let's say I cherry-pick from a private branch that is going
> to end up rebased anyway. Now the history for all time will have a
> commit that refers to some totally useless sha1 that nobody even knows
> about.

The problem with an external annotation is that if developers are sharing
feature branches, as a branch maintainer I want to know whether commits coming
from those feature branches are already in the branch I'm maintaining.

> There may be reasons why that isn't a good idea, and I haven't thought it
> through. But I think you should consider it as an alternate implementation
> and tell me why I'm dumb in that case. ;)

I'll give it a bit more thought as I consider the implementation of this.

> Whew, that turned out long. I hope it's helpful. I think the problem
> you're trying to solve is a real one, and I think your approach is the
> right direction. I just think we can leverage existing git features to
> do most of it, and because it is sort of a heuristic, we should be
> conservative in how it's introduced.

That's all extremely helpful, thank you!  You've brought up several use cases
I hadn't thought of, and perhaps this feature will indeed never cover
everything, but if it can reliably ease maintenance 80% of the time, I think
it's a relatively simple addition.

John

* Re: Using Origin hashes to improve rebase behavior
  2011-02-11  3:14   ` John Wiegley
@ 2011-02-11  4:45     ` Jeff King
  2011-02-11  5:26       ` John Wiegley
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2011-02-11  4:45 UTC (permalink / raw)
  To: John Wiegley; +Cc: git

On Thu, Feb 10, 2011 at 10:14:15PM -0500, John Wiegley wrote:

> > And there are lots of other cases. What about "git cherry-pick -n"? What
> > about rebasing? If there are no conflicts, is it OK to copy the origin
> > field? How about if there are conflicts? How about in a "git rebase -i",
> > where we may stop and the user can arbitrarily split, amend, or add new
> > commits. How do the old commits map to the new ones with respect to
> > origin fields?
> 
> During rebasing, any commits which can be rebased without conflict have their
> origin transferred (and each time it would cause the origin id list to grow by
> one), but any commits which are squashed or edited would not transfer.

OK. That's certainly the conservative answer, and where we should start.
But I wonder in practice how many times we'll hit all the criteria just
right for this feature to kick in (i.e., a cherry pick or rebase with no
conflicts, followed by one that would cause a conflict). But I think
there's nothing to do but implement and see how it works.

After thinking about this a bit more, the whole idea of "is this
cherry-picked/rebased/whatever commit the same as the one before" is
really the same as the notes-rewriting case (i.e., copying notes on
commit A when it is rebased into A'). Which makes me excited about using
notes for this, because the rules that you do figure out to work in
practice will be good rules for notes rewriting in general.

> The problem with an external annotation is that if developers are sharing
> feature branches, as a branch maintainer I want to know whether commits coming
> from those feature branches are already in the branch I'm maintaining.

In that case, I would suggest putting it in git-notes and sharing the
notes with each other. The notes code should happily merge them all
together, and then everyone gets to see everybody else's
cherry-pick/rebase annotations.

-Peff

* Re: Using Origin hashes to improve rebase behavior
  2011-02-11  4:45     ` Jeff King
@ 2011-02-11  5:26       ` John Wiegley
  0 siblings, 0 replies; 14+ messages in thread
From: John Wiegley @ 2011-02-11  5:26 UTC (permalink / raw)
  To: git

Jeff King <peff@peff.net> writes:

> OK. That's certainly the conservative answer, and where we should start.
> But I wonder in practice how many times we'll hit all the criteria just
> right for this feature to kick in (i.e., a cherry pick or rebase with no
> conflicts, followed by one that would cause a conflict). But I think
> there's nothing to do but implement and see how it works.
>
> After thinking about this a bit more, the whole idea of "is this
> cherry-picked/rebased/whatever commit the same as the one before" is
> really the same as the notes-rewriting case (i.e., copying notes on
> commit A when it is rebased into A'). Which makes me excited about using
> notes for this, because the rules that you do figure out to work in
> practice will be good rules for notes rewriting in general.
>
>> The problem with an external annotation is that if developers are sharing
>> feature branches, as a branch maintainer I want to know whether commits coming
>> from those feature branches are already in the branch I'm maintaining.
>
> In that case, I would suggest putting it in git-notes and sharing the
> notes with each other. The notes code should happily merge them all
> together, and then everyone gets to see everybody else's
> cherry-pick/rebase annotations.

The more I've talked this over with my friend, the more we discover how
difficult this is to get right in certain situations, and also how rare the
actual use cases that require storage within the commit message are -- but at
the same time, how valuable that information is when those cases occur!

This may be a bit more than I can chew right now, so thank you for bringing to
my attention the depth of this problem.  That's exactly why I posted here
before beginning to punch out code that might solve just the naive cases. :)

Thanks, John

* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 22:54 ` Jeff King
  2011-02-11  3:14   ` John Wiegley
@ 2011-02-12 14:36   ` Thomas Rast
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Rast @ 2011-02-12 14:36 UTC (permalink / raw)
  To: Jeff King; +Cc: John Wiegley, git

[I skipped most of the thread, so here's just one minor point.]

Jeff King wrote:
> I hadn't thought about merge --squash as a commit copying operation, but
> I think it is. I wonder if squash merges (or squash rebases) should also
> be copying notes (or if they do already, I haven't checked).

Squash rebases do but squash merges don't.  Doing it "elegantly" would
probably involve caching the list of commits since the merge-bases,
which would be a bit of work.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
  2011-02-10 22:16 ` Johan Herland
  2011-02-10 22:54 ` Jeff King
@ 2011-02-11 10:02 ` skillzero
  2011-02-11 11:40   ` Johan Herland
  2011-02-20 17:49 ` Enrico Weigelt
  2011-02-21 23:49 ` Dave Abrahams
  4 siblings, 1 reply; 14+ messages in thread
From: skillzero @ 2011-02-11 10:02 UTC (permalink / raw)
  To: John Wiegley; +Cc: git

On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley <johnw@boostpro.com> wrote:
> The following proposal is a check to see if this approach would be sane and
> whether someone is already doing similar work.  If not, I offer to implement
> this solution.
>
> THE PROBLEM
>
> Say I have a master from which I have branched locally, and that this private
> branch has four commits:
>
>    a   b   c
>    o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> I then decide to cherry pick commit 3 onto master.  Please believe that my
> situation is such that I cannot immediately rebase the private branch to drop
> the now-duplicated change.  I end up with this:
>
>    a   b   c   3'
>    o---o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> Later, there is work on master which changes the same lines of code that 3'
> has changed.  The commit which changes 3' is e*
>
>    a   b   c   3'  d   e*  f
>    o---o---o---o---o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> At a later date, I want to rebase the private branch onto master.  What will
> happen is that the changes in 3 will conflict with the rewritten changes in
> e*.  However, I'd like Git to know that 3 was already incorporated at some
> earlier time, and *not consider it during the rebase*, since it doesn't need
> to.

I don't know very much about how git really works so what I'm saying
may be dumb, but rather than record where a commit came from, would it
be reasonable for rebase to look at the patch-id for each change on
the topic branch after the merge base and automatically remove topic
branch commits that match that patch-id? So in your example, rebase
would check each topic branch commit against 3', d, e*, and f and see
that the 3' patch-id is the same as the topic branch 3 and remove
topic branch 3 before it gets to e*?

* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 10:02 ` skillzero
@ 2011-02-11 11:40   ` Johan Herland
  2011-02-11 19:03     ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Johan Herland @ 2011-02-11 11:40 UTC (permalink / raw)
  To: skillzero; +Cc: git, John Wiegley

On Friday 11 February 2011, skillzero@gmail.com wrote:
> On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley <johnw@boostpro.com> wrote:
> >    a   b   c   3'  d   e*  f
> >    o---o---o---o---o---o---o
> >             \
> >              o---o---o---o
> >              1   2   3   4
> > 
> > At a later date, I want to rebase the private branch onto master.  What
> > will happen is that the changes in 3 will conflict with the rewritten
> > changes in e*.  However, I'd like Git to know that 3 was already
> > incorporated at some earlier time, and *not consider it during the
> > rebase*, since it doesn't need to.
> 
> I don't know very much about how git really works so what I'm saying
> may be dumb, but rather than record where a commit came from, would it
> be reasonable for rebase to look at the patch-id for each change on
> the topic branch after the merge base and automatically remove topic
> branch commits that match that patch-id? So in your example, rebase
> would check each topic branch commit against 3', d, e*, and f and see
> that the 3' patch-id is the same as the topic branch 3 and remove
> topic branch 3 before it gets to e*?

I believe "git rebase" already does exactly what you describe [1].

However, comparing patch-ids stops working when the cherry-pick (3 -> 3') 
has conflicts. IINM, it is the conflicting cases that John is interested in 
solving...


...Johan

[1]: I tested the above scenario, and got no conflicts:

$ git init
$ FOO=a && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=b && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=c && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ git checkout -b topic
$ FOO=1 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=2 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=3 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=4 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ git checkout master
$ git cherry-pick topic^
$ FOO=d && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ echo e >> 3 && git add 3
$ FOO=e && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ FOO=f && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
$ git checkout topic
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: 1
Applying: 2
Applying: 4
$ # Look, no conflicts.

-- 
Johan Herland, <johan@herland.net>
www.herland.net


* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 11:40   ` Johan Herland
@ 2011-02-11 19:03     ` Jeff King
  2011-02-11 19:32       ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2011-02-11 19:03 UTC (permalink / raw)
  To: Johan Herland; +Cc: skillzero, git, John Wiegley

On Fri, Feb 11, 2011 at 12:40:29PM +0100, Johan Herland wrote:

> > I don't know very much about how git really works so what I'm saying
> > may be dumb, but rather than record where a commit came from, would it
> > be reasonable for rebase to look at the patch-id for each change on
> > the topic branch after the merge base and automatically remove topic
> > branch commits that match that patch-id? So in your example, rebase
> > would check each topic branch commit against 3', d, e*, and f and see
> > that the 3' patch-id is the same as the topic branch 3 and remove
> > topic branch 3 before it gets to e*?
> 
> I believe "git rebase" already does exactly what you describe [1].

Yep. It uses format-patch's "--ignore-if-in-upstream", which computes
patch-ids (you can get the same list with "git cherry").
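The patch-id comparison can be reproduced in a throwaway repository. The following is a minimal sketch, not part of the original message: the repository layout, the file names and the `commit_file` helper are assumptions made for illustration. It cherry-picks one topic commit onto master cleanly and then asks "git cherry" which topic commits already have an equivalent upstream.

```shell
#!/bin/sh
# Sketch: mark topic commits whose patch-id already exists on master.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: create a file named after its argument and commit it.
commit_file() {
    echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"
}

commit_file a
git checkout -q -b topic
commit_file 1
commit_file 2
git checkout -q master
git cherry-pick topic > /dev/null   # copy commit "2" onto master without conflicts
commit_file d

# "-" means an equivalent patch is already in master; "+" means it is not.
cherry_out=$(git cherry -v master topic)
echo "$cherry_out"

# The same comparison done by hand for commit "2" and its cherry-picked copy.
id_topic=$(git show topic | git patch-id | cut -d' ' -f1)
id_master=$(git show master^ | git patch-id | cut -d' ' -f1)
echo "topic:  $id_topic"
echo "master: $id_master"
```

In this setup commit "2" should be listed with "-" and commit "1" with "+", which is the list rebase uses to decide what to skip.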

> However, comparing patch-ids stops working when the cherry-pick (3 -> 3') 
> has conflicts. IINM, it is the conflicting cases that John is interested in 
> solving...

Exactly. One other possible solution to this problem would be to somehow
make patch-ids handle fuzzy situations better. I doubt it is possible to
do that without introducing a lot of false positives, though.
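The conflicting case is easy to reproduce. The sketch below is an illustration under assumed file names and commit messages, not something taken from the thread: both branches rewrite the same line, so the cherry-pick needs a manual resolution, and the resolved commit no longer shares a patch-id with the original.

```shell
#!/bin/sh
# Sketch: a cherry-pick that needed manual resolution gets a different patch-id.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > f && git add f && git commit -q -m base
git checkout -q -b topic
echo topic > f && git commit -q -a -m "topic change"
git checkout -q master
echo master > f && git commit -q -a -m "master change"

# The pick conflicts because both sides rewrote the same line; resolve it by hand.
git cherry-pick topic > /dev/null 2>&1 || true
echo topic > f && git add f
GIT_EDITOR=true git cherry-pick --continue > /dev/null

# The resolved commit's diff is "master -> topic", not "base -> topic",
# so the patch-ids differ and the topic commit is still marked "+".
cherry_out=$(git cherry -v master topic)
echo "$cherry_out"
```

Because the topic commit is still reported as "+", a later rebase would try to replay it, which is the situation the Origin-Ids proposal is meant to address.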

-Peff


* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 19:03     ` Jeff King
@ 2011-02-11 19:32       ` Junio C Hamano
  2011-02-11 19:45         ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2011-02-11 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Johan Herland, skillzero, git, John Wiegley

Jeff King <peff@peff.net> writes:

> Exactly. One other possible solution to this problem would be to somehow
> make patch-ids handle fuzzy situations better. I doubt it is possible to
> do that without introducing a lot of false positives, though.

We need to remember that we would want to tolerate _no_ false positives.

We try hard to err on the safer side and leave the hard case to the users
for a reason.  A tool that records correct results 99.9% of the time but
produces wrong results for the rest of the time _silently_ is a tool that
cannot be trusted, and forces the user to inspect its output carefully to
make sure it is correct, not just for the 0.1% cases but for all of them.

Among the many automation support facilities we have gained over time, the
three-way merge, recursive merge to come up with a synthetic merge base
tree, detecting change similarity with patch-id, and detecting renames by
content inspection all proved themselves to be reasonably trustworthy
without false positives, even though they sometimes fail with false
negatives and they do so rather loudly by failing.  I find the heuristics
in rerere trustworthy most of the time, but I still do not completely
trust them myself.

Patching with fuzz and a user declaration that "this change came from
that", especially if the user can declare the correspondence even when
conflict resolution is involved during the porting of changes from totally
different context, fall into a different, a lot less trustworthy, basket.
It needs to start from totally trivial cases and punt _loudly_ when there
is any doubt.


* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 19:32       ` Junio C Hamano
@ 2011-02-11 19:45         ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2011-02-11 19:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johan Herland, skillzero, git, John Wiegley

On Fri, Feb 11, 2011 at 11:32:03AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Exactly. One other possible solution to this problem would be to somehow
> > make patch-ids handle fuzzy situations better. I doubt it is possible to
> > do that without introducing a lot of false positives, though.
> 
> We need to remember that we would want to tolerate _no_ false positives.

Yeah, I agree with everything you say here. My original message should
have been s/a lot of// in the last line.

-Peff


* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
                   ` (2 preceding siblings ...)
  2011-02-11 10:02 ` skillzero
@ 2011-02-20 17:49 ` Enrico Weigelt
  2011-02-21 23:49 ` Dave Abrahams
  4 siblings, 0 replies; 14+ messages in thread
From: Enrico Weigelt @ 2011-02-20 17:49 UTC (permalink / raw)
  To: git

* John Wiegley <johnw@boostpro.com> wrote:

<snip>

> Later, there is work on master which changes the same lines of code that 3'
> has changed.  The commit which changes 3' is e*
> 
>     a   b   c   3'  d   e*  f
>     o---o---o---o---o---o---o
>              \
>               o---o---o---o
>               1   2   3   4
> 
> At a later date, I want to rebase the private branch onto master.  What will
> happen is that the changes in 3 will conflict with the rewritten changes in
> e*.  However, I'd like Git to know that 3 was already incorporated at some
> earlier time, and *not consider it during the rebase*, since it doesn't need
> to.

I solve these situations with an incremental rebase (iteratively rebasing
onto commits earlier than the head). A command for that would be nice.
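One way such an incremental rebase could be scripted is sketched below. This is only an assumption about what the loop might look like, not an existing git command; the branch names, the `commit_file` helper and the demo history are made up for the example.

```shell
#!/bin/sh
# Sketch: rebase topic onto each new master commit in turn, oldest first.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: create a file named after its argument and commit it.
commit_file() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

commit_file a
git checkout -q -b topic
commit_file 1
git checkout -q master
commit_file b
commit_file c

# Each step is a small rebase; stop at the first one that conflicts so the
# conflict can be resolved against a single upstream commit.
base=$(git merge-base master topic)
for target in $(git rev-list --reverse "$base..master"); do
    if ! git rebase -q "$target" topic > /dev/null 2>&1; then
        echo "conflict while rebasing onto $target; resolve it, then rerun"
        break
    fi
done
result_base=$(git merge-base master topic)
```

In this conflict-free demo the loop runs to the end and topic ends up on top of master's tip.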


cu
-- 
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 151 27565287  icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------


* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
                   ` (3 preceding siblings ...)
  2011-02-20 17:49 ` Enrico Weigelt
@ 2011-02-21 23:49 ` Dave Abrahams
  4 siblings, 0 replies; 14+ messages in thread
From: Dave Abrahams @ 2011-02-21 23:49 UTC (permalink / raw)
  To: git

Johan Herland <johan@herland.net> writes:

> 
> On Friday 11 February 2011, skillzero@gmail.com wrote:
> > On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley
> > <johnw@boostpro.com> wrote:
>
> > I don't know very much about how git really works so what I'm saying
> > may be dumb, but rather than record where a commit came from, would it
> > be reasonable for rebase to look at the patch-id for each change on
> > the topic branch after the merge base and automatically remove topic
> > branch commits that match that patch-id? So in your example, rebase
> > would check each topic branch commit against 3', d, e*, and f and see
> > that the 3' patch-id is the same as the topic branch 3 and remove
> > topic branch 3 before it gets to e*?
> 
> I believe "git rebase" already does exactly what you describe [1].

I can imagine that we could make merges do something similar:

git merge <sources> :=

    attempt the merge as it works today
    if there are conflicts
        for s in <sources>
            rebase s onto HEAD
        if there are no conflicts
            use the current tree as the result of
                the merge (with the merge's heritage)
            commit
        else
            reset to the conflicted merge state
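
The outline above could be prototyped in shell roughly as follows. This is a sketch under several assumptions: it handles a single source branch only, `merge_or_rebase` and `commit_file` are hypothetical helpers rather than git features, and the demo history mirrors the cherry-pick scenario from earlier in the thread.

```shell
#!/bin/sh
# Sketch: fall back to a rebase when a merge conflicts, then record the result as a merge.
set -e

merge_or_rebase() {   # hypothetical helper, single source branch only
    src=$1
    branch=$(git symbolic-ref --short HEAD)
    orig=$(git rev-parse HEAD)

    # 1. Attempt the merge as it works today.
    if git merge -q --no-edit "$src" > /dev/null 2>&1; then
        return 0
    fi
    git merge --abort

    # 2. Replay the source's commits on top of HEAD, using a detached copy.
    git checkout -q --detach "$src"
    if git rebase -q "$orig" > /dev/null 2>&1; then
        # 3. No conflicts: keep the rebased tree, but with both parents.
        tree=$(git rev-parse 'HEAD^{tree}')
        merge=$(git commit-tree "$tree" -p "$orig" -p "$src" -m "Merge $src (via rebase)")
        git checkout -q "$branch"
        git reset -q --hard "$merge"
        return 0
    fi

    # 4. Otherwise go back to the conflicted merge state.
    git rebase --abort
    git checkout -q "$branch"
    git merge --no-edit "$src" > /dev/null 2>&1 || true
    return 1
}

# Demo: 3 is cherry-picked cleanly onto master as 3', then e* edits the same file.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of init.defaultBranch
git config user.name "Example"
git config user.email "example@example.invalid"
commit_file() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

commit_file a
git checkout -q -b topic
commit_file 1
commit_file 3
git checkout -q master
git cherry-pick topic > /dev/null          # 3'
echo e >> 3 && git commit -q -a -m e       # e* changes what 3' added

merge_or_rebase topic
```

Here the plain merge hits an add/add conflict on file 3, while the rebase skips the already-applied commit 3 by patch-id, so the fallback path is the one that produces the merge commit.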



Thread overview: 14+ messages
2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
2011-02-10 22:16 ` Johan Herland
2011-02-10 22:54 ` Jeff King
2011-02-11  3:14   ` John Wiegley
2011-02-11  4:45     ` Jeff King
2011-02-11  5:26       ` John Wiegley
2011-02-12 14:36   ` Thomas Rast
2011-02-11 10:02 ` skillzero
2011-02-11 11:40   ` Johan Herland
2011-02-11 19:03     ` Jeff King
2011-02-11 19:32       ` Junio C Hamano
2011-02-11 19:45         ` Jeff King
2011-02-20 17:49 ` Enrico Weigelt
2011-02-21 23:49 ` Dave Abrahams
