* Using Origin hashes to improve rebase behavior
From: John Wiegley @ 2011-02-10 21:13 UTC
To: git
0 siblings, 5 replies; 14+ messages in thread

The following proposal is a check to see if this approach would be sane and
whether someone is already doing similar work.  If not, I offer to implement
this solution.

THE PROBLEM

Say I have a master from which I have branched locally, and that this private
branch has four commits:

    a   b   c
    o---o---o
             \
              o---o---o---o
              1   2   3   4

I then decide to cherry-pick commit 3 onto master.  Please believe that my
situation is such that I cannot immediately rebase the private branch to drop
the now-duplicated change.  I end up with this:

    a   b   c   3'
    o---o---o---o
             \
              o---o---o---o
              1   2   3   4

Later, there is work on master which changes the same lines of code that 3'
has changed.  The commit which changes 3' is e*

    a   b   c   3'  d   e*  f
    o---o---o---o---o---o---o
             \
              o---o---o---o
              1   2   3   4

At a later date, I want to rebase the private branch onto master.  What will
happen is that the changes in 3 will conflict with the rewritten changes in
e*.  However, I'd like Git to know that 3 was already incorporated at some
earlier time, and *not consider it during the rebase*, since it doesn't need
to.

THE SOLUTION

For the purposes of this discussion, I'd like to define the term "aggregate
identity" (insert better name here) as a set including: a commit's sha, and
zero or more shas stored in a new field named "Origin-Ids".

If, when cherry-picking, the originating commit's id is stored in the
Origin-Ids field of the cherry-picked commit, then rebase could know whether a
given commit's changes had already been applied.  The logic would look like
this:

  1. When rebasing a branch A onto B, find the common ancestor of A and B.
  2. Examine every commit on B since that common ancestor, collecting a
     set of their aggregate identities.
  3. For each commit on A, ignore it if its aggregate identity occurs in
     that set.

This would cause commit 3 to be ignored during the rebase above, since 3'
would have an origin id referring to 3.

IMPLEMENTATION

A few things need to be done:

 - Extend commit objects to have an Origin field, which can be zero, one or a
   list of hashes.

 - Add an option to git commit so that one or more origin ids can be specified
   at the time any commit is made.  There may be occasions when it's useful to
   explicitly state that a new commit should somehow 'override' the contents
   of another during a rebase.

 - git cherry-pick and git am should add this Origin field, showing the commit
   their contents originated from.

 - git merge --squash would store the commit ids, and the origin ids, of every
   commit involved in the merge into the resulting commit's Origin field.

   Note that nothing can be done about rebasing a squashed merge commit onto
   another squashed merge commit, even though it could be detected that they
   had common changes.  I don't believe it would even be useful to warn about
   this; the user would just have to resolve the conflicts manually.

 - git log could be extended to show the "parentage" (really, the aunt/uncle)
   of commits with origin info, assuming those origin commits are not dangling
   (which is OK, and likely to occur after the originating branch is deleted,
   or if the originating branch is in another repository).

   Where there are multiple Origin ids, a search could be done to find the set
   of most descendent commits, so that history could be usefully shown after
   an octopus squash, for example.

QUESTIONS

Is it allowable to add new metadata fields to a commit, and would this require
bumping the repository version number?  Or should this be implemented by
appending a Header-style textual field at the end of the commit message?

--
John Wiegley
BoostPro Computing
http://www.boostpro.com
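
The three rebase steps proposed above can be sketched as a small shell
function.  This is only an illustration under stated assumptions: the input
format (one commit per line, its own id followed by its Origin-Ids) and the
name plan_rebase are hypothetical, not anything git provides, and "occurs in
that set" is read here as "shares at least one id with that set".

```shell
# Hypothetical input format (not a git format): one commit per line,
#   <sha> [<origin-id> ...]
# i.e. the commit's own id followed by the ids in its Origin-Ids field.

# plan_rebase UPSTREAM_FILE TOPIC_FILE
#   UPSTREAM_FILE: commits on B since the common ancestor (step 1 is assumed
#                  to have been done by whoever produced the two files)
#   TOPIC_FILE:    commits on A since the common ancestor
# Prints "skip <sha>" or "pick <sha>" for each commit on A.
plan_rebase() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        # Step 2: collect the aggregate identities of the upstream commits.
        FILENAME == ARGV[1] {
            for (i = 1; i <= NF; i++) seen[$i] = 1
            next
        }
        # Step 3: skip a topic commit if any of its ids was collected.
        {
            action = "pick"
            for (i = 1; i <= NF; i++) if ($i in seen) action = "skip"
            print action, $1
        }
    ' "$1" "$2"
}
```

With the example history, the upstream file would contain a line for 3'
listing 3 as its origin, so commit 3 on the topic side comes out as "skip"
while 1, 2 and 4 come out as "pick".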
* Re: Using Origin hashes to improve rebase behavior
From: Johan Herland @ 2011-02-10 22:16 UTC
To: John Wiegley
Cc: git

On Thursday 10 February 2011, John Wiegley wrote:
> Is it allowable to add new metadata fields to a commit, and would this
> require bumping the repository version number?  Or should this be
> implemented by appending a Header-style textual field at the end of the
> commit message?

Many have tried before you to add such fields to the commit objects
(including, literally, storing the origin of cherry-picks to help with
rebases; search the archives for several examples). They have not succeeded.
With good reason. This information does not belong in the commit object
header section (see earlier discussions for a more complete rationale).

Putting them at the end of the commit message is your best bet. Or even
better: as a note object stored in a special-purpose notes ref (e.g.
refs/notes/cherry-picks). The note approach also allows you to retroactively
add this field to previous cherry-picks. AND it allows you to remove
Origin-IDs that refer to no-longer-existing commits. AND it pretty much
solves the "git log should show this info" problem for you as well. In
short, this is exactly the thing that notes were created to do.

Also, don't forget that the existing -x option to cherry-pick pretty much
does exactly what you want to add to the commit object.

As for making use of this information in other git commands (e.g. rebase,
log, etc.), you should show the list that the feature works well and solves
Real Problems(tm) in the real world. If you can do so (with patches), and
are willing to work with the list to address issues raised, and improve your
patches, I'd guess you have a pretty good shot at getting this accepted.

AFAIK, nobody else is working in this area right now, although I don't read
the mailing list religiously, so I may have missed things. As I said, others
have previously proposed similar features, so you'll want to search the
archive for those discussions to make sure you don't repeat the same
mistakes.


Have fun! :)

...Johan

--
Johan Herland, <johan@herland.net>
www.herland.net
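
A rough sketch of the notes approach, using the existing "git notes" command
with the notes ref suggested above.  The "Origin-Id:" line format and the
function names record_origin and show_origins are assumptions made for
illustration; git itself attaches no meaning to such a note.

```shell
# record_origin NEW_COMMIT ORIGINAL_COMMIT
# Append an "Origin-Id:" line (hypothetical format) to the note attached to
# NEW_COMMIT in refs/notes/cherry-picks. The commit object is left untouched,
# so this can also be done retroactively for old cherry-picks.
record_origin() {
    git notes --ref=cherry-picks append -m "Origin-Id: $2" "$1"
}

# show_origins COMMIT
# Print the note recorded for COMMIT in refs/notes/cherry-picks, if any.
show_origins() {
    git notes --ref=cherry-picks show "$1"
}
```

After recording, "git log --notes=cherry-picks" displays the note next to the
commit message, which is the "git log should show this info" part.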
* Re: Using Origin hashes to improve rebase behavior
From: Jeff King @ 2011-02-10 22:54 UTC
To: John Wiegley
Cc: git

On Thu, Feb 10, 2011 at 04:13:10PM -0500, John Wiegley wrote:

> For the purposes of this discussion, I'd like to define the term "aggregate
> identity" (insert better name here) as a set including: a commit's sha, and
> zero or more shas stored in a new field named "Origin-Ids".
>
> If, when cherry-picking, the originating commit's id is stored in the
> Origin-Ids field of the cherry-picked commit, then rebase could know whether a
> given commit's changes had already been applied.  The logic would look like
> this:
>
>   1. When rebasing a branch A onto B, find the common ancestor of A and B.
>   2. Examine every commit on B since that common ancestor, collecting a
>      set of their aggregate identities.
>   3. For each commit on A, ignore it if its aggregate identity occurs in
>      that set.
>
> This would cause commit 3 to be ignored during the rebase above, since 3'
> would have an origin id referring to 3.

This can work in some cases, but there are other cases where it might
not. For example, consider:

  1. I cherry-pick commit X from some branch "topic" onto master as X'.
     We record "Origin-ID: X" in X'.

  2. I rebase "topic" (either onto some other branch, or perhaps I use
     rebase -i to rewrite some earlier commit). X now becomes some X''.

  3. I now rebase "topic" onto master. But we fail to note that X''
     matches X, so we try to rebase it.

Now, in step 2 we could record "X" as an origin ID of X'' and during the
rebase in step 3, calculate the intersection of the origins of X'' and
X', and see that they are both just X. And I think maybe you already
realize that, since you talk about Origin-IDs as sets.

But now you have an interesting question: during which operations does a
commit retain its Origin-ID on a source?

I think it is pretty clear that a cherry-pick that cleanly applies is
probably a good candidate. But what if there is a conflict, and I fix up
the conflict? Should I still skip the original commit during the rebase?
Maybe, but there are cases where you wouldn't want to. For example,
consider this sequence:

  1. On my master branch, I have a function foo() which takes one
     argument.

  2. I branch a topic from master. The first commit adds a new caller to
     foo().

  3. The second commit changes foo() to take two arguments. I fix up the
     function itself, any old callers, and the new caller.

  4. I cherry-pick the second commit onto master. There is a conflict,
     since one of the callers it updates doesn't exist in master. So I
     drop that part of the patch.

  5. On master I update the implementation of foo().

  6. Now I try to rebase my topic on top of master.

We could get a conflict because the second commit from (3) will conflict
with the updated implementation in (5). This is more or less the case
you described in your initial email, and we'd like git to automatically
realize that the conflict is uninteresting.

So let's imagine we recorded Origin-IDs as you describe, and we skip it.
But that means we are _also_ skipping the part where we update the new
caller from the commit in (2), the part that was dropped during conflict
resolution. So our end result is broken, because the new caller is still
calling with one argument.

And there are lots of other cases. What about "git cherry-pick -n"? What
about rebasing? If there are no conflicts, is it OK to copy the origin
field? How about if there are conflicts? How about in a "git rebase -i",
where we may stop and the user can arbitrarily split, amend, or add new
commits. How do the old commits map to the new ones with respect to
origin fields?

So there are lots of corner cases where it won't work, because git is
more than happy to give you lots of ways to tweak tree state and
history, and it fundamentally doesn't care as much about process as it
does about the end states that you reach. That's part of what makes git
so flexible, but it also makes niceties like "did I already apply this
commit on this branch" much harder to make sense of.

Now, I don't want to discourage you from working on this. Because while
there are lots of cases where it won't work, there are plenty of cases
where it _will_, and it will save rebasers time and effort. So it is
worth pursuing, but I think it is also worth keeping things simple and
conservative, and not affecting the people who have cases where this
won't help.

>  - Extend commit objects to have an Origin field, which can be zero, one or a
>    list of hashes.

It probably shouldn't be a new header field, but rather a text-style
pseudo-header at the end of the commit.

But consider for a moment whether you actually want this field in the
resulting commit at all, or whether it should be an external annotation.
For example, let's say I cherry-pick from a private branch that is going
to end up rebased anyway. Now the history for all time will have a
commit that refers to some totally useless sha1 that nobody even knows
about.

We already went through this with cherry-pick. It used to always put
"cherry-picked from X..." in the commit message. And then we realized
that in many cases, that information is not interesting, because X is
not something people actually know about. So now we don't do it by
default, but for cases where you are cherry-picking from one
long-running branch to another, you can use "cherry-pick -x".

So consider instead putting this information into a commit-note for the
new commit. Possibly even reversing the direction of the mapping (so
that the old commit says "I was cherry-picked to X"). And then when the
old, rebased commit goes away, the note will automagically get pruned by
the notes-pruning mechanism.

There may be reasons why that isn't a good idea, and I haven't thought
it through. But I think you should consider it as an alternate
implementation and tell me why I'm dumb in that case. ;)

>  - Add an option to git commit so that one or more origin ids can be specified
>    at the time any commit is made.  There may be occasions when it's useful to
>    explicitly state that a new commit should somehow 'override' the contents
>    of another during a rebase.
>
>  - git cherry-pick and git am should add this Origin field, showing the commit
>    their contents originated from.

We already have this to some degree, in the form of "cherry-pick -x".
You could do it with "git am", but you would need "git format-patch" to
actually generate the information (well, technically speaking it is in
the mbox "From " header, but that usually doesn't make it through mail
transports for obvious reasons).

So I wonder if your proposal can be restructured as:

  1. Change rebase to look for cherry-picked-from headers on the --onto
     side, and skip source commits that appear to exist already. That
     will start helping people immediately using existing history.

     You can also deal with uncertainty by leaving this decision to the
     last minute, or even leaving it up to the user. The usual patch-id
     detection works in a lot of cases. Let it work when it does. When
     it fails, check if the conflicted commit exists in a
     cherry-picked-from line. If it does, either do the skip then, or
     when we barf with the "there was a conflict; fix it up and rebase
     --continue" message, mention the cherry-picked-from line and let
     the user inspect the commits themselves and make a decision. They
     can always do "git rebase --skip" even now, so all we are really
     doing is saying "By the way, you might want the extra information
     that this was cherry-picked earlier". And that makes this a very
     low-risk change, since we are just giving the user extra
     information for a decision they are already making.

  2. (Optional) Start adding the "Cherry picked from" message in a more
     machine-readable format, like an "Origin-ID: ..." header. This has
     already been discussed before. People were generally positive, but
     it didn't seem especially useful. This is a use.

     And obviously make the corresponding change in rebase to also parse
     these kinds of headers (but don't drop parsing the original format,
     obviously, for compatibility).

  3. For people who don't want the "cherry picked from" (or "origin-id")
     in their commit, because they are cherry-picking from a private
     source, start recording "cherry picked from" in a git-note. You
     could even do this by default, since you are not impacting the
     commits themselves in any way.

     And then make the corresponding change in rebase to start using
     these notes as a source.

  4. We already have some functionality to copy notes about commit A to
     commit B during certain operations (like rebasing and
     cherry-picking). Check out how these interact with the notes
     introduced in (3) to see if transitive stuff works (like
     cherry-picking A to A', and then A' to A''; you should still be
     able to figure out that A'' came from A).

And I think at that point we have more or less the functionality you
were asking for, though we arrived in several non-controversial steps.
And there are lots of enhancements you could add on top, like skipping
without bothering the user about it, or better heuristics for when to
record an origin-id or not to. But we can do those once we see how the
basic dumb part performs. I.e., how useful it is in practice, and how
often it is wrong about when to skip.

>  - git merge --squash would store the commit ids, and the origin ids, of every
>    commit involved in the merge into the resulting commit's Origin field.

I hadn't thought about merge --squash as a commit copying operation, but
I think it is. I wonder if squash merges (or squash rebases) should also
be copying notes (or if they do already, I haven't checked).

>  - git log could be extended to show the "parentage" (really, the aunt/uncle)
>    of commits with origin info, assuming those origin commits are not dangling
>    (which is OK, and likely to occur after the originating branch is deleted,
>    or if the originating branch is in another repository).

If you do it with a combination of text in the commit message and
git-notes, then this is all done for you. The commit message you
obviously see by default, and you could explicitly ask for it to show
the refs/notes/origin-id notes tree.


Whew, that turned out long. I hope it's helpful. I think the problem
you're trying to solve is a real one, and I think your approach is the
right direction. I just think we can leverage existing git features to
do most of it, and because it is sort of a heuristic, we should be
conservative in how it's introduced.

-Peff
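
The first restructured step (looking for cherry-picked-from lines on the
upstream side) might be prototyped outside of rebase along these lines.
This is a sketch, not how rebase works today: it relies only on the
"(cherry picked from commit <sha>)" line that "cherry-pick -x" appends, and
the function names cherry_pick_origins and was_cherry_picked are made up for
illustration.

```shell
# cherry_pick_origins
# Read commit messages on stdin and print every full commit id recorded in a
# "(cherry picked from commit <sha>)" line, as written by "cherry-pick -x".
cherry_pick_origins() {
    sed -n 's/^(cherry picked from commit \([0-9a-f]\{40\}\))$/\1/p'
}

# was_cherry_picked SHA
# Read the upstream commit messages on stdin; succeed (exit status 0) if one
# of them says it was cherry-picked from SHA.
was_cherry_picked() {
    cherry_pick_origins | grep -Fqx "$1"
}
```

When a rebase stops on a conflicted commit, the upstream messages could be
fed in with something like "git log --format=%B <upstream>" and the result
used as a hint before deciding on "git rebase --skip"; the decision stays
with the user.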
* Re: Using Origin hashes to improve rebase behavior
From: John Wiegley @ 2011-02-11 3:14 UTC
To: Jeff King
Cc: git

Jeff King <peff@peff.net> writes:

> Now, in step 2 we could record "X" as an origin ID of X'' and during the
> rebase in step 3, calculate the intersection of the origins of X'' and X',
> and see that they are both just X. And I think maybe you already realize
> that, since you talk about Origin-IDs as sets.

Right.

> And there are lots of other cases. What about "git cherry-pick -n"? What
> about rebasing? If there are no conflicts, is it OK to copy the origin
> field? How about if there are conflicts? How about in a "git rebase -i",
> where we may stop and the user can arbitrarily split, amend, or add new
> commits. How do the old commits map to the new ones with respect to
> origin fields?

During rebasing, any commits which can be rebased without conflict have their
origin transferred (and each time it would cause the origin id list to grow by
one), but any commits which are squashed or edited would not transfer.

For cherry-pick -n, if the index is empty at the time the cherry-pick is done
(is this required?), then a file is created under .git/ with the SHA of the
changes placed in the index, so that when git-commit is later run and the
index has not been changed, then the Origin-Id for that originating commit
gets placed at the bottom of the commit message.

> So there are lots of corner cases where it won't work, because git is
> more than happy to give you lots of ways to tweak tree state and
> history, and it fundamentally doesn't care as much about process as it
> does about the end states that you reach. That's part of what makes git
> so flexible, but it also makes niceties like "did I already apply this
> commit on this branch" much harder to make sense of.

I think we'd want to restrict this system to those commits which were
automatically rewritten without conflicts.  Any user intervention in the
process would invalidate the meaning of the Origin-Id.

> It probably shouldn't be a new header field, but rather a text-style
> pseudo-header at the end of the commit.

I understand.

> But consider for a moment whether you actually want this field in the
> resulting commit at all, or whether it should be an external annotation.
> For example, let's say I cherry-pick from a private branch that is going
> to end up rebased anyway. Now the history for all time will have a
> commit that refers to some totally useless sha1 that nobody even knows
> about.

The problem with an external annotation is that if developers are sharing
feature branches, as a branch maintainer I want to know whether commits coming
from those feature branches are already in the branch I'm maintaining.

> There may be reasons why that isn't a good idea, and I haven't thought it
> through. But I think you should consider it as an alternate implementation
> and tell me why I'm dumb in that case. ;)

I'll give it a bit more thought as I consider the implementation of this.

> Whew, that turned out long. I hope it's helpful. I think the problem
> you're trying to solve is a real one, and I think your approach is the
> right direction. I just think we can leverage existing git features to
> do most of it, and because it is sort of a heuristic, we should be
> conservative in how it's introduced.

That's all extremely helpful, thank you!  You've brought up several use cases
I hadn't thought of, and perhaps this feature will indeed never cover
everything, but if it can reliably ease maintenance 80% of the time, I think
it's a relatively simple addition.

John
* Re: Using Origin hashes to improve rebase behavior
From: Jeff King @ 2011-02-11 4:45 UTC
To: John Wiegley
Cc: git

On Thu, Feb 10, 2011 at 10:14:15PM -0500, John Wiegley wrote:

> > And there are lots of other cases. What about "git cherry-pick -n"? What
> > about rebasing? If there are no conflicts, is it OK to copy the origin
> > field? How about if there are conflicts? How about in a "git rebase -i",
> > where we may stop and the user can arbitrarily split, amend, or add new
> > commits. How do the old commits map to the new ones with respect to
> > origin fields?
>
> During rebasing, any commits which can be rebased without conflict have their
> origin transferred (and each time it would cause the origin id list to grow by
> one), but any commits which are squashed or edited would not transfer.

OK. That's certainly the conservative answer, and where we should start.
But I wonder in practice how many times we'll hit all the criteria just
right for this feature to kick in (i.e., a cherry pick or rebase with no
conflicts, followed by one that would cause a conflict). But I think
there's nothing to do but implement and see how it works.

After thinking about this a bit more, the whole idea of "is this
cherry-picked/rebased/whatever commit the same as the one before" is
really the same as the notes-rewriting case (i.e., copying notes on
commit A when it is rebased into A'). Which makes me excited about using
notes for this, because the rules that you do figure out to work in
practice will be good rules for notes rewriting in general.

> The problem with an external annotation is that if developers are sharing
> feature branches, as a branch maintainer I want to know whether commits coming
> from those feature branches are already in the branch I'm maintaining.

In that case, I would suggest putting it in git-notes and sharing the
notes with each other. The notes code should happily merge them all
together, and then everyone gets to see everybody else's
cherry-pick/rebase annotations.

-Peff
* Re: Using Origin hashes to improve rebase behavior
From: John Wiegley @ 2011-02-11 5:26 UTC
To: git

Jeff King <peff@peff.net> writes:

> OK. That's certainly the conservative answer, and where we should start.
> But I wonder in practice how many times we'll hit all the criteria just
> right for this feature to kick in (i.e., a cherry pick or rebase with no
> conflicts, followed by one that would cause a conflict). But I think
> there's nothing to do but implement and see how it works.
>
> After thinking about this a bit more, the whole idea of "is this
> cherry-picked/rebased/whatever commit the same as the one before" is
> really the same as the notes-rewriting case (i.e., copying notes on
> commit A when it is rebased into A'). Which makes me excited about using
> notes for this, because the rules that you do figure out to work in
> practice will be good rules for notes rewriting in general.
>
>> The problem with an external annotation is that if developers are sharing
>> feature branches, as a branch maintainer I want to know whether commits coming
>> from those feature branches are already in the branch I'm maintaining.
>
> In that case, I would suggest putting it in git-notes and sharing the
> notes with each other. The notes code should happily merge them all
> together, and then everyone gets to see everybody else's
> cherry-pick/rebase annotations.

The more I've talked this over with my friend, the more we discover how
difficult this is to get right in certain situations, and also how rare the
actual use cases that require storage within the commit message are -- but at
the same time, how valuable that information is when those cases occur!

This may be a bit more than I can chew right now, so thank you for bringing to
my attention the depth of this problem.  That's exactly why I posted here
before beginning to punch out code that might solve just the naive cases. :)

Thanks,
  John
* Re: Using Origin hashes to improve rebase behavior
From: Thomas Rast @ 2011-02-12 14:36 UTC
To: Jeff King
Cc: John Wiegley, git

[I skipped most of the thread, so here's just one minor point.]

Jeff King wrote:
> I hadn't thought about merge --squash as a commit copying operation, but
> I think it is. I wonder if squash merges (or squash rebases) should also
> be copying notes (or if they do already, I haven't checked).

Squash rebases do but squash merges don't.  Doing it "elegantly" would
probably involve caching the list of commits since the merge-bases,
which would be a bit of work.

--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: Using Origin hashes to improve rebase behavior
From: skillzero @ 2011-02-11 10:02 UTC
To: John Wiegley
Cc: git

On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley <johnw@boostpro.com> wrote:
> The following proposal is a check to see if this approach would be sane and
> whether someone is already doing similar work.  If not, I offer to implement
> this solution.
>
> THE PROBLEM
>
> Say I have a master from which I have branched locally, and that this private
> branch has four commits:
>
>    a   b   c
>    o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> I then decide to cherry pick commit 3 onto master.  Please believe that my
> situation is such that I cannot immediately rebase the private branch to drop
> the now-duplicated change.  I end up with this:
>
>    a   b   c   3'
>    o---o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> Later, there is work on master which changes the same lines of code that 3'
> has changed.  The commit which changes 3' is e*
>
>    a   b   c   3'  d   e*  f
>    o---o---o---o---o---o---o
>             \
>              o---o---o---o
>              1   2   3   4
>
> At a later date, I want to rebase the private branch onto master.  What will
> happen is that the changes in 3 will conflict with the rewritten changes in
> e*.  However, I'd like Git to know that 3 was already incorporated at some
> earlier time, and *not consider it during the rebase*, since it doesn't need
> to.

I don't know very much about how git really works so what I'm saying
may be dumb, but rather than record where a commit came from, would it
be reasonable for rebase to look at the patch-id for each change on
the topic branch after the merge base and automatically remove topic
branch commits that match that patch-id? So in your example, rebase
would check each topic branch commit against 3', d, e*, and f and see
that the 3' patch-id is the same as the topic branch 3 and remove
topic branch 3 before it gets to e*?
* Re: Using Origin hashes to improve rebase behavior
From: Johan Herland @ 2011-02-11 11:40 UTC
To: skillzero
Cc: git, John Wiegley

On Friday 11 February 2011, skillzero@gmail.com wrote:
> On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley <johnw@boostpro.com> wrote:
> >    a   b   c   3'  d   e*  f
> >    o---o---o---o---o---o---o
> >             \
> >              o---o---o---o
> >              1   2   3   4
> >
> > At a later date, I want to rebase the private branch onto master.  What
> > will happen is that the changes in 3 will conflict with the rewritten
> > changes in e*.  However, I'd like Git to know that 3 was already
> > incorporated at some earlier time, and *not consider it during the
> > rebase*, since it doesn't need to.
>
> I don't know very much about how git really works so what I'm saying
> may be dumb, but rather than record where a commit came from, would it
> be reasonable for rebase to look at the patch-id for each change on
> the topic branch after the merge base and automatically remove topic
> branch commits that match that patch-id? So in your example, rebase
> would check each topic branch commit against 3', d, e*, and f and see
> that the 3' patch-id is the same as the topic branch 3 and remove
> topic branch 3 before it gets to e*?

I believe "git rebase" already does exactly what you describe [1].

However, comparing patch-ids stops working when the cherry-pick (3 -> 3')
has conflicts. IINM, it is the conflicting cases that John is interested in
solving...


...Johan


[1]: I tested the above scenario, and got no conflicts:

 $ git init
 $ FOO=a && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=b && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=c && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ git checkout -b topic
 $ FOO=1 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=2 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=3 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=4 && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ git checkout master
 $ git cherry-pick topic^
 $ FOO=d && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ echo e >> 3 && git add 3
 $ FOO=e && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ FOO=f && echo $FOO > $FOO && git add $FOO && git commit -m $FOO
 $ git checkout topic
 $ git rebase master
 First, rewinding head to replay your work on top of it...
 Applying: 1
 Applying: 2
 Applying: 4
 $ # Look, no conflicts.

--
Johan Herland, <johan@herland.net>
www.herland.net
* Re: Using Origin hashes to improve rebase behavior
From: Jeff King @ 2011-02-11 19:03 UTC
To: Johan Herland
Cc: skillzero, git, John Wiegley

On Fri, Feb 11, 2011 at 12:40:29PM +0100, Johan Herland wrote:

> > I don't know very much about how git really works so what I'm saying
> > may be dumb, but rather than record where a commit came from, would it
> > be reasonable for rebase to look at the patch-id for each change on
> > the topic branch after the merge base and automatically remove topic
> > branch commits that match that patch-id? So in your example, rebase
> > would check each topic branch commit against 3', d, e*, and f and see
> > that the 3' patch-id is the same as the topic branch 3 and remove
> > topic branch 3 before it gets to e*?
>
> I believe "git rebase" already does exactly what you describe [1].

Yep. It uses format-patch's "--ignore-if-in-upstream", which computes
patch-ids (you can get the same list with "git cherry").

> However, comparing patch-ids stops working when the cherry-pick (3 -> 3')
> has conflicts. IINM, it is the conflicting cases that John is interested in
> solving...

Exactly. One other possible solution to this problem would be to somehow
make patch-ids handle fuzzy situations better. I doubt it is possible to
do that without introducing a lot of false positives, though.

-Peff
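
To see the patch-id comparison at work, the output of "git cherry" can be
split by its leading marker: it prints "- <sha>" for a commit whose patch-id
already has an equivalent upstream and "+ <sha>" for one that does not.  The
two helper names below are hypothetical; they only filter that output.

```shell
# Both functions read "git cherry <upstream> <head>" output on stdin.

# commits_to_replay: ids marked "+", i.e. no equivalent patch found upstream.
commits_to_replay() {
    awk '$1 == "+" { print $2 }'
}

# commits_already_upstream: ids marked "-", i.e. an upstream commit has the
# same patch-id, so a rebase would drop them.
commits_already_upstream() {
    awk '$1 == "-" { print $2 }'
}
```

In the scenario tested earlier in the thread, "git cherry master topic |
commits_already_upstream" would be expected to list only commit 3, matching
the "Applying: 1, 2, 4" output of the rebase.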
* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 19:03 ` Jeff King
@ 2011-02-11 19:32 ` Junio C Hamano
  2011-02-11 19:45 ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2011-02-11 19:32 UTC
To: Jeff King; +Cc: Johan Herland, skillzero, git, John Wiegley

Jeff King <peff@peff.net> writes:

> Exactly. One other possible solution to this problem would be to somehow
> make patch-ids handle fuzzy situations better. I doubt it is possible to
> do that without introducing a lot of false positives, though.

We need to remember that we would want to tolerate _no_ false positive.
We try hard to err on the safer side and leave the hard case to the users
for a reason.

A tool that records correct results 99.9% of the time but produces wrong
results for the rest of the time _silently_ is a tool that cannot be
trusted, and forces the user to inspect its output carefully to make sure
it is correct, not just for the 0.1% cases but for all of them.

Among the many automation support facilities we have gained over time,
the three-way merge, recursive merge to come up with a synthetic merge
base tree, detecting change similarity with patch-id, and detecting
renames by content inspection all proved themselves to be reasonably
trustworthy without false positives, even though they sometimes fail with
false negatives and they do so rather loudly by failing. I find the
heuristics in rerere trustworthy most of the time, but I still do not
completely trust them myself.

Patching with fuzz and a user declaration that "this change came from
that", especially if the user can declare the correspondence even when
conflict resolution is involved during the porting of changes from
totally different context, fall into a different, a lot less trustworthy,
basket. It needs to start from totally trivial cases and punt _loudly_
when there is any doubt.
* Re: Using Origin hashes to improve rebase behavior
  2011-02-11 19:32 ` Junio C Hamano
@ 2011-02-11 19:45 ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2011-02-11 19:45 UTC
To: Junio C Hamano; +Cc: Johan Herland, skillzero, git, John Wiegley

On Fri, Feb 11, 2011 at 11:32:03AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > Exactly. One other possible solution to this problem would be to somehow
> > make patch-ids handle fuzzy situations better. I doubt it is possible to
> > do that without introducing a lot of false positives, though.
>
> We need to remember that we would want to tolerate _no_ false positive.

Yeah, I agree with everything you say here. My original message should
have been s/a lot of// in the last line.

-Peff
* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
  ` (2 preceding siblings ...)
  2011-02-11 10:02 ` skillzero
@ 2011-02-20 17:49 ` Enrico Weigelt
  2011-02-21 23:49 ` Dave Abrahams
  4 siblings, 0 replies; 14+ messages in thread
From: Enrico Weigelt @ 2011-02-20 17:49 UTC
To: git

* John Wiegley <johnw@boostpro.com> wrote:

<snip>

> Later, there is work on master which changes the same lines of code that 3'
> has changed. The commit which changes 3' is e*
>
> a   b   c   3'  d   e*  f
> o---o---o---o---o---o---o
>          \
>           o---o---o---o
>           1   2   3   4
>
> At a later date, I want to rebase the private branch onto master. What will
> happen is that the changes in 3 will conflict with the rewritten changes in
> e*. However, I'd like Git to know that 3 was already incorporated at some
> earlier time, and *not consider it during the rebase*, since it doesn't need
> to.

I'm solving these situations by incremental rebase (rebasing onto
earlier commits than the head, iteratively). A command for that
would be nice.

cu
-- 
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 151 27565287  icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------
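The incremental rebase described above could be scripted roughly as follows. This is only a sketch of one possible reading of the idea, not an existing git command: incremental_rebase and commit_file are hypothetical helpers, the upstream history is assumed to be linear, and the working tree is assumed to be clean.

```shell
#!/bin/sh
# Sketch: rebase the current branch onto each new upstream commit in turn,
# so that any conflict is met against one upstream commit at a time.
set -e

repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used below
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create file NAME containing NAME and commit it.
commit_file() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

# Hypothetical helper, not part of git. Assumes linear upstream history.
incremental_rebase() {
    upstream=$1
    # Upstream commits missing from the current branch, oldest first.
    for step in $(git rev-list --reverse "HEAD..$upstream"); do
        if ! git rebase -q "$step"; then
            echo "stopped at $step: resolve, run 'git rebase --continue'," \
                 "then rerun" >&2
            return 1
        fi
    done
}

commit_file a
git checkout -q -b topic
commit_file 1
git checkout -q master
commit_file b
commit_file c
git checkout -q topic
incremental_rebase master
git log --format=%s                      # subjects, newest first
```

The list of upstream commits is computed once, before the first rebase, so the loop visits each upstream commit exactly once. When a step stops on a conflict, the conflict involves only the changes introduced up to that upstream commit, which is the point of doing the rebase in small steps.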
* Re: Using Origin hashes to improve rebase behavior
  2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
  ` (3 preceding siblings ...)
  2011-02-20 17:49 ` Enrico Weigelt
@ 2011-02-21 23:49 ` Dave Abrahams
  4 siblings, 0 replies; 14+ messages in thread
From: Dave Abrahams @ 2011-02-21 23:49 UTC
To: git

Johan Herland <johan <at> herland.net> writes:

>
> On Friday 11 February 2011, skillzero <at> gmail.com wrote:
> > On Thu, Feb 10, 2011 at 1:13 PM, John Wiegley
> > <johnw <at> boostpro.com> wrote:
>
> > I don't know very much about how git really works so what I'm saying
> > may be dumb, but rather than record where a commit came from, would it
> > be reasonable for rebase to look at the patch-id for each change on
> > the topic branch after the merge base and automatically remove topic
> > branch commits that match that patch-id? So in your example, rebase
> > would check each topic branch commit against 3', d, e*, and f and see
> > that the 3' patch-id is the same as the topic branch 3 and remove
> > topic branch 3 before it gets to e*?
>
> I believe "git rebase" already does exactly what you describe [1].

I can imagine that we could make merges do something similar:

    git merge <sources> :=

        Attempt the merge as it works today
        If there are conflicts
            for s in <sources>
                rebase s onto HEAD
            if there are no conflicts
                use the current tree as the result of the merge
                (with the merge's heritage)
                commit
            else
                reset to the conflicted merge state
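One way to try the merge idea above with today's commands is sketched below, restricted to a single source branch. Everything here is an assumption made for illustration: merge_or_rebase and commit_file are hypothetical helpers, the working tree is assumed clean, and the choice of commit-tree to record "the merge's heritage" is one interpretation of the proposal, not something the thread specifies. The demo history follows the scenario from the start of the thread: commit 3 is cherry-picked onto master and then changed there.

```shell
#!/bin/sh
# Sketch: merge; on conflict, replay the source on HEAD via rebase and, if
# that is clean, record the resulting tree as a two-parent merge commit.
set -e

repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used below
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create file NAME containing NAME and commit it.
commit_file() { echo "$1" > "$1" && git add "$1" && git commit -q -m "$1"; }

# Hypothetical helper, single source only. Assumes a clean working tree.
merge_or_rebase() {
    src=$1
    branch=$(git symbolic-ref --short HEAD)
    ours=$(git rev-parse HEAD)
    theirs=$(git rev-parse "$src")
    if git merge -q --no-edit "$src" >/dev/null 2>&1; then
        return 0                         # the ordinary merge worked
    fi
    git merge --abort
    # Replay the source's commits on top of ours, on a detached HEAD so
    # that neither branch moves. Commits with a matching patch-id upstream
    # are skipped by rebase.
    if git rebase -q "$ours" "$theirs" >/dev/null 2>&1; then
        tree=$(git rev-parse 'HEAD^{tree}')
        git checkout -q "$branch"
        merge=$(git commit-tree -p "$ours" -p "$theirs" \
                    -m "Merge $src (tree taken from a clean rebase)" "$tree")
        git reset -q --hard "$merge"
    else
        git rebase --abort
        git checkout -q "$branch"
        # Redo the merge so the caller is left in the conflicted state.
        git merge -q --no-edit "$src" >/dev/null 2>&1 || true
        return 1
    fi
}

commit_file a
git checkout -q -b topic
commit_file 1
commit_file 3
commit_file 4
git checkout -q master
git cherry-pick topic^ >/dev/null        # 3' on master
echo e >> 3 && git add 3 && git commit -q -m e   # e* changes 3'
merge_or_rebase topic
```

In this history the plain merge is expected to conflict on file 3, since both sides add it with different content, while the rebase skips commit 3 by patch-id and applies the rest cleanly. The sketch carries the same caveat raised earlier in the thread: it only helps when the cherry-pick was conflict-free, and a real implementation would need to fail loudly in every case it cannot prove safe.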
end of thread, other threads:[~2011-02-21 23:50 UTC | newest]

Thread overview: 14+ messages
2011-02-10 21:13 Using Origin hashes to improve rebase behavior John Wiegley
2011-02-10 22:16 ` Johan Herland
2011-02-10 22:54 ` Jeff King
2011-02-11  3:14 ` John Wiegley
2011-02-11  4:45 ` Jeff King
2011-02-11  5:26 ` John Wiegley
2011-02-12 14:36 ` Thomas Rast
2011-02-11 10:02 ` skillzero
2011-02-11 11:40 ` Johan Herland
2011-02-11 19:03 ` Jeff King
2011-02-11 19:32 ` Junio C Hamano
2011-02-11 19:45 ` Jeff King
2011-02-20 17:49 ` Enrico Weigelt
2011-02-21 23:49 ` Dave Abrahams