* [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-09 13:22 UTC
To: git

I've read and digested the old threads about prior and related links.
Here's a new proposal which should be able to pass muster, if I read all
the relevant suggestions and objections in the old threads:

Consider an origin field as such:

    commit bbb896d8e10f736bfda8f587c0009c358c9a8599
    tree b83f28279a68439b9b044bccc313bbeaa3e973f5
    parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b
    origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd
    origin a1184d85e8752658f02746982822f43f32316803 2
    author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
    committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700

The definition of the origin field reads as follows:

- There can be an arbitrary number of origin fields per commit.
  Typically there is going to be at most one origin field per commit.

- At the time of creation, the origin field contains a hash B which refers
  to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
  being referred to needs to be e.g. (B, B~2), then the hash is followed by
  a space and followed by an integer (base10, two in this case),
  which designates the proper parentnr of B (see: mainline in git
  cherry-pick/revert).

- In an existing repository gc/prune shall not delete commits being
  referred to by origin links.

- During fetch/push/pull the full commit including the origin fields is
  transmitted, however, the objects the origin links are referring to
  are not (unless they are being transmitted because of other reasons).

- When fetching/pulling it is optionally possible to tell git to
  actually transmit objects referred to by origin links even if it would
  otherwise not have done so.

- git cherry-pick/revert allow for the creation of origin links only if
  the object they are referring to is presently reachable.

- git fsck will traverse origin links, but will stay silent if the
  object an origin link points to is unreachable (kind of like a shallow
  repository).

- git rev-list --topo-order will take origin links into account to
  ensure proper ordering.

- gitk allows for (e.g.) dotted lines to show the origin links.

- git log would show something like:

    commit bbb896d8e10f736bfda8f587c0009c358c9a8599
    Origin: d2b9dff..53d1589
    Origin: a1184d8..e596cdd
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Sat Aug 30 14:35:15 2008 -0700

  Note that for easy viewing: git diff d2b9dff..53d1589
  will show the exact diff the origin link is referring to.

- git log --graph will show a dotted line of some sort just like gitk.

- git blame will follow and use the origin link if the object exists.

- git merge disregards the whole origin field entirely, just like all
  the rest of git-core.

Anything I missed?
--
Sincerely,
           Stephen R. van den Berg.
"Be spontaneous!"
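A minimal sketch, assuming the single-hash format exactly as written in the RFC above (`origin <hash> [<parentnr>]`), of how a reader of raw commit objects might extract origin links. No such header exists in git; `OriginLink`, `parse_origin_links` and `pair` are hypothetical names used only for illustration, and the parent number is assumed to default to 1 when omitted.

```python
import re
from typing import List, NamedTuple, Tuple


class OriginLink(NamedTuple):
    """One proposed origin header (hypothetical; not a real git field)."""
    commit: str     # hash B
    parent_nr: int  # which parent of B completes the pair (1 = first parent)


# "origin <40 hex digits>" optionally followed by a space and a base-10 parent number.
_ORIGIN_RE = re.compile(r"^origin ([0-9a-f]{40})(?: ([1-9][0-9]*))?$")


def parse_origin_links(raw_commit: str) -> List[OriginLink]:
    """Collect origin headers; stop at the blank line that ends the header block."""
    links = []
    for line in raw_commit.split("\n"):
        if line == "":
            break  # everything after this is the free-form commit message
        m = _ORIGIN_RE.match(line)
        if m:
            links.append(OriginLink(m.group(1), int(m.group(2) or 1)))
    return links


def pair(link: OriginLink) -> Tuple[str, str]:
    """Return the commit pair as revision expressions: (B, B^<parentnr>)."""
    return (link.commit, f"{link.commit}^{link.parent_nr}")
```

Stopping at the first blank line matters: text in the commit message that happens to start with "origin" is never mistaken for a header, which is the parsing advantage a dedicated field has over a free-form convention.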
* Re: [RFC] origin link for cherry-pick and revert
From: Paolo Bonzini @ 2008-09-09 13:38 UTC
To: Stephen R. van den Berg; +Cc: git

> - At the time of creation, the origin field contains a hash B which refers
>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>   a space and followed by an integer (base10, two in this case),
>   which designates the proper parentnr of B (see: mainline in git
>   cherry-pick/revert).

What about just storing *two* hashes?  This way cherry-pick can store
B~1..B and revert can store B..B~1.  The two cases can be distinguished
by checking which commit is an ancestor of which.

> - git cherry-pick/revert allow for the creation of origin links only if
>   the object they are referring to is presently reachable.

Will cherry-pick -x create origin links?  Also, does the origin link
propagate through multiple cherry picks?  If not, how can the origin
object not be reachable?

> [snip good stuff]

git cherry will use origin links to mark a commit as present, and will
only use patch-ids for commits that have no origin links.  Bonus points
for an extra command-line/configuration option to only use origin links:

    --source=default           << default: get setting from core.cherrysource
    --source=patch-id
    --source=origin
    --source=origin,patch-id

    core.cherrysource = patch-id
    core.cherrysource = origin
    core.cherrysource = origin,patch-id

Thanks!

Paolo
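The two-hash idea can be sketched on a toy commit graph. This is an illustration only: the graph is a plain dict from commit name to parent list, `is_ancestor` and `classify` are hypothetical helpers, and the field order (`origin <first> <second>`, meaning the change from first to second) is an assumption taken from the "cherry-pick stores B~1..B, revert stores B..B~1" wording above.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], anc: str, desc: str) -> bool:
    """True if `anc` is reachable from `desc` by following parent links."""
    stack, seen = [desc], set()
    while stack:
        commit = stack.pop()
        if commit == anc:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def classify(parents: Dict[str, List[str]], first: str, second: str) -> str:
    """Classify a hypothetical 'origin <first> <second>' pair by ancestry."""
    if first == second:
        return "empty"        # no change between identical endpoints
    if is_ancestor(parents, first, second):
        return "cherry-pick"  # change applied forwards (older -> newer)
    if is_ancestor(parents, second, first):
        return "revert"       # change applied backwards (newer -> older)
    return "unrelated"        # neither commit is an ancestor of the other
```

The "unrelated" case is what the more general reading of the pair would allow: two arbitrary tree states with no ancestry between them.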
* Re: [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-09 14:04 UTC
To: Paolo Bonzini; +Cc: git

Paolo Bonzini wrote:
>> - At the time of creation, the origin field contains a hash B which refers
>>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>>   a space and followed by an integer (base10, two in this case),
>>   which designates the proper parentnr of B (see: mainline in git
>>   cherry-pick/revert).

>What about just storing *two* hashes?  This way cherry-pick can store
>B~1..B and revert can store B..B~1.  The two cases can be distinguished
>by checking which commit is an ancestor of which.

Valid point, but consider:
The new commit to receive hash A.  The diff between A~1..A and B~1..B
actually defines the relation.  Revert and cherry-pick are symmetrical
operations as far as git is concerned since git tracks content.
So I'm not quite sure if we actually need this extra information, git
already knows it all.

>> - git cherry-pick/revert allow for the creation of origin links only if
>>   the object they are referring to is presently reachable.

>Will cherry-pick -x create origin links?

I'd propose a cherry-pick -o and revert -o for that.
I wouldn't want to force the text which -x generates into the commit
when origin links are used.

>  Also, does the origin link
>propagate through multiple cherry picks?

The origin link is created point-to-point from the object referenced by
cherry-pick/revert to the new commit.  The link creation specifically does not
follow any existing origin links.  If you want the origin link to point
to a deeper origin behind the current, then cherry-pick from there, no
need to fake it.

>  If not, how can the origin
>object not be reachable?

That can only happen during a fetch/pull, which doesn't use origin links
to determine transmittability by default.
--
Sincerely,
           Stephen R. van den Berg.
"Be spontaneous!"
* Re: [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-09 13:48 UTC
To: git

Stephen R. van den Berg wrote:
>Anything I missed?

I think I forgot two:

- git rebase will fixup any origin pointers which point back into the
  strain being rebased.

- git filter-branch will rewrite origin pointers which point to commits
  that receive a new hash.
--
Sincerely,
           Stephen R. van den Berg.
"Be spontaneous!"
* Re: [RFC] origin link for cherry-pick and revert
From: Jakub Narebski @ 2008-09-09 15:44 UTC
To: Stephen R. van den Berg; +Cc: git

"Stephen R. van den Berg" <srb@cuci.nl> writes:

> I've read and digested the old threads about prior and related links.
> Here's a new proposal which should be able to pass muster, if I read all
> the relevant suggestions and objections in the old threads:
>
> Consider an origin field as such:
>
>     commit bbb896d8e10f736bfda8f587c0009c358c9a8599
>     tree b83f28279a68439b9b044bccc313bbeaa3e973f5
>     parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b
>     origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd
>     origin a1184d85e8752658f02746982822f43f32316803 2
>     author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
>     committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700
>
> The definition of the origin field reads as follows:
>
> - There can be an arbitrary number of origin fields per commit.
>   Typically there is going to be at most one origin field per commit.

I understand that multiple origin fields occur if you do a squash
merge, or if you cherry-pick multiple commits into a single commit.
For example:

  $ git cherry-pick -n <a1>
  $ git cherry-pick <a2>
  $ git commit --amend   #; to correct commit message

I'm not sure if you plan to automatically add 'origin' field for
rebase, and for interactive rebase...

> - At the time of creation, the origin field contains a hash B which refers
>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>   a space and followed by an integer (base10, two in this case),
>   which designates the proper parentnr of B (see: mainline in git
>   cherry-pick/revert).

I think you wanted to use "(B, B^2)", which means B and the second parent
of B.  B~2 means the grandparent of B in the straight line:

    ... <--- B~2 <--- B^1 = B^ = B~1 <--- B
                                         /
                        ... <--- B^2 <--/

Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
in the neighbouring subthread), which would mean together with
'parent <parent>' (assuming that there are no other parents; if there
are it gets even more complicated), that the following is true

    <current> ~= <parent> + (<sha2> - <sha1>),

where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps
with some fixups, corrections or the like).  Perhaps 'origin' should
be then called 'changeset'.

It would also be easier on implementation to check if
'origin'/'changeset' weak links are not broken, and to get to know
which commits are to be protected against pruning than your proposal
of

    origin <"cousin" id> [<mainline = parent number>]

where <mainline> can be omitted if it is 1 (the default).

This can also lead to replacing

    origin <b> <a>
    origin <c> <b>

by

    origin <c> <a>

for squash merge, or squash in rebase interactive.

> - In an existing repository gc/prune shall not delete commits being
>   referred to by origin links.
>
> - During fetch/push/pull the full commit including the origin fields is
>   transmitted, however, the objects the origin links are referring to
>   are not (unless they are being transmitted because of other reasons).
>
> - When fetching/pulling it is optionally possible to tell git to
>   actually transmit objects referred to by origin links even if it would
>   otherwise not have done so.
>
> - git fsck will traverse origin links, but will stay silent if the
>   object an origin link points to is unreachable (kind of like a shallow
>   repository).

The above means that it is a 'weak' link, i.e. it is protecting
against pruning (perhaps influenced by some configuration variable),
but it is not considered an error for it to be broken.

> - git cherry-pick/revert allow for the creation of origin links only if
>   the object they are referring to is presently reachable.

Errr... shouldn't objects referenced by 'origin' links be reachable in
order for "cherry-pick" or "revert" to succeed?

On the other hand this leads to the following question: what happens
if you cherry-pick or revert a commit which has its own 'origin'
links?

> - git rev-list --topo-order will take origin links into account to
>   ensure proper ordering.

What do you mean by that?

> - gitk allows for (e.g.) dotted lines to show the origin links.
>
> - git log would show something like:
>
>     commit bbb896d8e10f736bfda8f587c0009c358c9a8599
>     Origin: d2b9dff..53d1589
>     Origin: a1184d8..e596cdd
>     Author: Junio C Hamano <gitster@pobox.com>
>     Date:   Sat Aug 30 14:35:15 2008 -0700
>
>   Note that for easy viewing: git diff d2b9dff..53d1589
>   will show the exact diff the origin link is referring to.
>
> - git log --graph will show a dotted line of some sort just like gitk.

That is I guess the whole and main reason for 'origin' links to exist,
as having this information in the free-form part, i.e. in the commit
message, might lead to problems (with parsing and extracting, and
finding spurious links).

> - git blame will follow and use the origin link if the object exists.

Hmmmm... I'm not sure about that.

> - git merge disregards the whole origin field entirely, just like all
>   the rest of git-core.

Unless of course one uses a more complex merge strategy, which doesn't
take into account only endpoints (branches to be merged and merge
bases), but is also affected in some way by history...

> Anything I missed?

How would git-rebase make use of 'origin' links?

--
Jakub Narebski
Poland
ShadeHawk on #git
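The difference between B~2 and B^2 pointed out in the message above can be checked with a small resolver over a toy graph. This is a sketch, not git's revision parser: `resolve` is a hypothetical helper, the graph is a dict from commit name to its ordered parent list, and only the `~<n>` and `^<n>` suffixes are handled.

```python
import re
from typing import Dict, List


def resolve(parents: Dict[str, List[str]], rev: str) -> str:
    """Resolve '<name>' followed by any mix of '~<n>' and '^<n>' suffixes."""
    m = re.fullmatch(r"([^~^]+)((?:[~^][0-9]*)*)", rev)
    if m is None:
        raise ValueError(f"cannot parse revision {rev!r}")
    commit = m.group(1)
    for op, digits in re.findall(r"([~^])([0-9]*)", m.group(2)):
        n = int(digits) if digits else 1  # a bare '~' or '^' means 1
        if op == "~":
            # '~n': follow the FIRST parent n times (n-th generation ancestor)
            for _ in range(n):
                commit = parents[commit][0]
        elif n > 0:
            # '^n': pick the n-th parent of this commit ('^0' is the commit itself)
            commit = parents[commit][n - 1]
    return commit
```

With a merge commit B whose parents are P1 and P2, and P1 having parent G, `resolve(graph, "B~2")` yields G while `resolve(graph, "B^2")` yields P2, which is why the pair for a non-default mainline is (B, B^2).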
* Re: [RFC] origin link for cherry-pick and revert
From: Steven Grimm @ 2008-09-09 16:38 UTC
To: Jakub Narebski; +Cc: Stephen R. van den Berg, git

On Sep 9, 2008, at 8:44 AM, Jakub Narebski wrote:
> This can also lead to replacing
>
>     origin <b> <a>
>     origin <c> <b>
>
> by
>
>     origin <c> <a>
>
> for squash merge, or squash in rebase interactive.

And, incidentally, the above representation will potentially mesh well
with svn integration, making it possible to cleanly represent svn 1.5
merge-tracking metadata directly in git.

> Unless of course one uses a more complex merge strategy, which doesn't
> take into account only endpoints (branches to be merged and merge
> bases), but is also affected in some way by history...

It does intuitively (but perhaps incorrectly) seem like the origin
information could be used to make more intelligent decisions about
automatic conflict resolution, if nothing else. Though obviously that
might, as you suggest, be a pretty big departure from the way git
merges currently work.

-Steve
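The replacement of chained origin pairs quoted above can be sketched as a small folding step. This assumes the two-hash form `origin <sha1> <sha2>` read as "the change from sha1 to sha2", and it treats consecutive changes as composing exactly, which is an idealization; `collapse` is a hypothetical helper name.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (<sha1>, <sha2>): the change from sha1 to sha2


def collapse(pairs: List[Pair]) -> List[Pair]:
    """Merge pairs that chain end-to-start: (x, y) + (y, z) becomes (x, z)."""
    pairs = list(pairs)
    changed = True
    while changed:
        changed = False
        for i, (start1, end1) in enumerate(pairs):
            for j, (start2, end2) in enumerate(pairs):
                if i != j and end1 == start2:
                    merged = (start1, end2)
                    # Drop both inputs and put the merged pair where the first stood.
                    pairs = [p for k, p in enumerate(pairs) if k not in (i, j)]
                    pairs.insert(min(i, j), merged)
                    changed = True
                    break
            if changed:
                break
    return pairs
```

Each merge shortens the list by one, so the loop terminates; pairs that do not chain are left untouched.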
* Re: [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-09 19:43 UTC
To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> The definition of the origin field reads as follows:

>> - There can be an arbitrary number of origin fields per commit.
>>   Typically there is going to be at most one origin field per commit.

>I understand that multiple origin fields occur if you do a squash
>merge, or if you cherry-pick multiple commits into a single commit.
>For example:
>  $ git cherry-pick -n <a1>
>  $ git cherry-pick <a2>
>  $ git commit --amend   #; to correct commit message

Correct.

>I'm not sure if you plan to automatically add 'origin' field for
>rebase, and for interactive rebase...

That is not part of the plan so far.  Can you explain what you would be
expecting in the best case?

>> - At the time of creation, the origin field contains a hash B which refers
>>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>>   a space and followed by an integer (base10, two in this case),
>>   which designates the proper parentnr of B (see: mainline in git
>>   cherry-pick/revert).

>I think you wanted to use "(B, B^2)", which means B and the second parent
>of B.  B~2 means the grandparent of B in the straight line:

Correct, sorry about the confusion, I meant B^2 instead of B~2.

>Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
>in the neighbouring subthread), which would mean together with
>'parent <parent>' (assuming that there are no other parents; if there
>are it gets even more complicated), that the following is true

>    <current> ~= <parent> + (<sha2> - <sha1>),

>where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps
>with some fixups, corrections or the like).  Perhaps 'origin' should
>be then called 'changeset'.

The simplicity sounds inviting.  I'd like to hear from others who have
more experience (than I have) with the git vs. changeset paradigms about
this.  This allows a bit more flexibility in specifying the origin, the
question is if it's needed.

>It would also be easier on implementation to check if
>'origin'/'changeset' weak links are not broken, and to get to know
>which commits are to be protected against pruning than your proposal
>of

>    origin <"cousin" id> [<mainline = parent number>]

>where <mainline> can be omitted if it is 1 (the default).

On the contrary, my current proposal only needs to verify the validity
of a single commit, changing it like this will require the system to
verify the validity of two commits.  Given the rareness of the origin
links this will hardly present a problem, but it *does* increase
the overhead in checking a bit.

>This can also lead to replacing

>    origin <b> <a>
>    origin <c> <b>

>by

>    origin <c> <a>

>for squash merge, or squash in rebase interactive.

Ok, *that* is not possible with the original proposal.
This might just be the reason why we'd like to go with the dual-hash
link.

>> - git cherry-pick/revert allow for the creation of origin links only if
>>   the object they are referring to is presently reachable.

>Errr... shouldn't objects referenced by 'origin' links be reachable in
>order for "cherry-pick" or "revert" to succeed?

True.  But sometimes it's necessary to emphasize the obvious; call it a
preemptive strike against possible objections to the proposal.

>On the other hand this leads to the following question: what happens
>if you cherry-pick or revert a commit which has its own 'origin'
>links?

Nothing special.  cherry-pick/revert behave as if the existing origin links
were not present in the first place.

>> - git rev-list --topo-order will take origin links into account to
>>   ensure proper ordering.

>What do you mean by that?

The order in which commits are listed is defined by the fact that
descendant commits are shown before any of their parents.
The presence of an origin link will make sure that the current commit
will always appear *before* the origin-commit it is referring to (if the
origin-commit is in the displayed set, that is).

>> - git log would show something like:

>>     commit bbb896d8e10f736bfda8f587c0009c358c9a8599
>>     Origin: d2b9dff..53d1589
>>     Origin: a1184d8..e596cdd
>>     Author: Junio C Hamano <gitster@pobox.com>
>>     Date:   Sat Aug 30 14:35:15 2008 -0700

>>   Note that for easy viewing: git diff d2b9dff..53d1589
>>   will show the exact diff the origin link is referring to.

>> - git log --graph will show a dotted line of some sort just like gitk.

>That is I guess the whole and main reason for 'origin' links to exist,
>as having this information in the free-form part, i.e. in the commit
>message, might lead to problems (with parsing and extracting, and
>finding spurious links).

Quite.  Also, having them in a well-defined place will allow for easy
fixups in case of rebase/filter-branch.

>> - git blame will follow and use the origin link if the object exists.

>Hmmmm... I'm not sure about that.

Care to explain your doubts?
The reason I want this behaviour is because it's all about tracking
content, and that part of the content happens to come from somewhere
else, and therefore blame should look there to "dig deeper" into it.

>> - git merge disregards the whole origin field entirely, just like all
>>   the rest of git-core.

>Unless of course one uses a more complex merge strategy, which doesn't
>take into account only endpoints (branches to be merged and merge
>bases), but is also affected in some way by history...

Quite, but that is not a part of the definition of the origin field.
I can only try and make sure that we have a well-defined, well-behaved
mechanism in core git.  If someone wants to get creative with the
information presented, by all means, be my guest.

>> Anything I missed?

>How would git-rebase make use of 'origin' links?

As far as I can imagine, git rebase should alter the origin links during
rebase if they point to a commit within the strain being rebased.
Are there any other desirable use cases (for rebase)?
--
Sincerely,
           Stephen R. van den Berg.
"Be spontaneous!"
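The ordering rule described above (a commit is listed before its parents, and also before any origin commit it refers to, as long as that commit is in the displayed set) can be sketched as a topological sort with one extra edge kind. This is an illustration over toy dicts, not how git's revision walker is implemented; `topo_order` is a hypothetical helper, and ties are broken alphabetically only to make the output deterministic.

```python
from typing import Dict, List


def topo_order(parents: Dict[str, List[str]],
               origins: Dict[str, List[str]]) -> List[str]:
    """List every commit before its parents and before its origin targets."""
    commits = set(parents)
    later = {c: [] for c in commits}    # commits that must appear after c
    blockers = {c: 0 for c in commits}  # how many commits must appear before c
    for c in commits:
        targets = set(parents[c]) | set(origins.get(c, ()))
        for t in targets:
            if t in commits:            # links leaving the displayed set are ignored
                later[c].append(t)
                blockers[t] += 1
    ready = sorted(c for c in commits if blockers[c] == 0)
    order = []
    while ready:
        c = ready.pop(0)
        order.append(c)
        for t in sorted(later[c]):
            blockers[t] -= 1
            if blockers[t] == 0:
                ready.append(t)
    if len(order) != len(commits):
        raise ValueError("parent/origin links form a cycle")
    return order
```

For a root R with two children A and C, where C carries an origin link to A, the plain ordering may list A first, while the origin-aware ordering is forced to list C before A.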
* Re: [RFC] origin link for cherry-pick and revert
From: Jeff King @ 2008-09-09 19:59 UTC
To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Tue, Sep 09, 2008 at 09:43:54PM +0200, Stephen R. van den Berg wrote:

> >Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
>
> The simplicity sounds inviting.  I'd like to hear from others who have
> more experience (than I have) with the git vs. changeset paradigms about
> this.  This allows a bit more flexibility in specifying the origin, the
> question is if it's needed.

One thing to keep in mind is that you are not just proposing some new
behavior for a command, but rather a new header for the data structure
that we will live with from now until eternity. So I think it makes
sense to allow the general case even if nobody is generating it yet, if
there is some chance that it may be useful for somebody to generate in
the future.

And yes, you can get _too_ general to the point where your semantics
become meaningless. But I don't think that is the case here. You are
defining the origin field as "by the way, the difference between state X
and state Y was used to make this commit". cherry-pick just happens to
make Y=X^, but something like rebase could use a series.

As for "git vs changeset": this is git. So you have a sequence of tree
states whether that is what you want or not. Thus you are specifying
the difference between _some_ pair of commits. I don't see any benefit
to restricting it to a commit and one of its parents.

> On the contrary, my current proposal only needs to verify the validity
> of a single commit, changing it like this will require the system to
> verify the validity of two commits.  Given the rareness of the origin
> links this will hardly present a problem, but it *does* increase
> the overhead in checking a bit.

Actually, it could decrease it. If I tell you that you must have "X" and
"X^2", then you could get away with just checking if you have "X". But
you might also want to check whether "X" even _has_ a second parent. And
that means not just looking up the object, but accessing it (resolving
deltas if need be, uncompressing, parsing the object). With "X" and "Y",
it is just two object lookups.

Now obviously you don't have to be quite so careful in the "hash plus
parent" case. And if you are going to _do_ anything with the origin
field, you will end up accessing those objects anyway. But in that case,
you end up with the same number of lookups and accesses anyway: 2 of
each.

> >On the other hand this leads to the following question: what happens
> >if you cherry-pick or revert a commit which has its own 'origin'
> >links?
>
> Nothing special.  cherry-pick/revert behave as if the existing origin links
> were not present in the first place.

I think that is smart; if somebody wants to drill down into the history
of origin links, they can do so at lookup time.

-Peff
* Re: [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-09 20:25 UTC
To: Jeff King; +Cc: Jakub Narebski, git

Jeff King wrote:
>On Tue, Sep 09, 2008 at 09:43:54PM +0200, Stephen R. van den Berg wrote:
>> >Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed

>> The simplicity sounds inviting.  I'd like to hear from others who have
>> more experience (than I have) with the git vs. changeset paradigms about
>> this.  This allows a bit more flexibility in specifying the origin, the
>> question is if it's needed.

>And yes, you can get _too_ general to the point where your semantics
>become meaningless. But I don't think that is the case here. You are
>defining the origin field as "by the way, the difference between state X
>and state Y was used to make this commit". cherry-pick just happens to
>make Y=X^, but something like rebase could use a series.

>As for "git vs changeset": this is git. So you have a sequence of tree
>states whether that is what you want or not. Thus you are specifying
>the difference between _some_ pair of commits. I don't see any benefit
>to restricting it to a commit and one of its parents.

Quite.  I'll drop the old format and adapt my proposal to use the double
hash.

As far as the naming of the field is concerned: a changeset is what the
field describes, but changeset implies no sense of direction; origin
makes it clear that the current commit was derived *from* the changeset
represented by "origin".
--
Sincerely,
           Stephen R. van den Berg.
"Be spontaneous!"
* Re: [RFC] origin link for cherry-pick and revert
From: Junio C Hamano @ 2008-09-09 20:42 UTC
To: Jeff King; +Cc: Stephen R. van den Berg, Jakub Narebski, git

Jeff King <peff@peff.net> writes:

> And yes, you can get _too_ general to the point where your semantics
> become meaningless. But I don't think that is the case here. You are
> defining the origin field as "by the way, the difference between state X
> and state Y was used to make this commit". cherry-pick just happens to
> make Y=X^, but something like rebase could use a series.
>
> As for "git vs changeset": this is git. So you have a sequence of tree
> states whether that is what you want or not. Thus you are specifying
> the difference between _some_ pair of commits. I don't see any benefit
> to restricting it to a commit and one of its parents.

As for "by the way ... was used to make this commit": this is git.  So how
you arrived at the tree state you record in a commit *does not matter*.

Not only that, it is not just "the difference between state X and Y" that
you used to come to that tree.  Another thing that is involved is the
specific cherry-pick implementation back when the commit was made.  That
was what gave you the tree.

To my ears, it rhymes rather well with a famous quote from $gmane/217:

    You're freezing your (crappy) algorithm at tree creation time, and
    basically making it pointless to ever create something better later,
    because even if hardware and software improves, you've codified that
    "we have to have crappy information".

After reading the discussion so far, I am still not convinced that this is a
good idea, nor that this time around it is that much different from what the
previous "prior" link discussion tried to do.
* Re: [RFC] origin link for cherry-pick and revert
From: Shawn O. Pearce @ 2008-09-09 20:47 UTC
To: Junio C Hamano; +Cc: Jeff King, Stephen R. van den Berg, Jakub Narebski, git

Junio C Hamano <gitster@pobox.com> wrote:
>
> To my ears, it rhymes rather well with a famous quote from $gmane/217:
>
>     You're freezing your (crappy) algorithm at tree creation time, and
>     basically making it pointless to ever create something better later,
>     because even if hardware and software improves, you've codified that
>     "we have to have crappy information".
>
> After reading the discussion so far, I am still not convinced that this is a
> good idea, nor that this time around it is that much different from what the
> previous "prior" link discussion tried to do.

Yup.  Same here.

I didn't see any information about why this "origin" link is
needed here, just how it might work.  And some of that "how"
scared me because it was doing some sort of "soft" reachability,
where errors aren't noticed but we are expected to protect the
data from prune/repack forever once it has entered the repository.

--
Shawn.
* Re: [RFC] origin link for cherry-pick and revert
From: Jeff King @ 2008-09-09 20:50 UTC
To: Junio C Hamano; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote:

> As for "by the way ... was used to make this commit": this is git.  So how
> you arrived at the tree state you record in a commit *does not matter*.

But it _does_ matter, which is why we have commit messages to explain
how you arrived at this tree state.

Now, that being said:

> After reading the discussion so far, I am still not convinced that this is a
> good idea, nor that this time around it is that much different from what the
> previous "prior" link discussion tried to do.

For the record, I am not convinced it is a good idea either; I was
hoping to steer it in a direction where somebody could say "and now this
is the useful thing we can do now that we could not do before." If the
ultimate goal is to put links to other commits into history viewers,
then the commit message is a reasonable place to do so. The only thing I
see improving with a header is that it makes more sense for pruning and
object transfer.

-Peff
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 20:50 ` Jeff King @ 2008-09-09 22:35 ` Jakub Narebski 2008-09-09 23:07 ` Jakub Narebski 0 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-09 22:35 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Stephen R. van den Berg, git On Tue, 9 Sep 2008, Jeff King wrote: > On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote: > > > As for "by the way ... was used to make this commit": this is git. So how > > you arrived at the tree state you record in a commit *does not matter*. > > But it _does_ matter, which is why we have commit messages to explain > how you arrived at this tree state. Well, that is why I was carefull to say that "origin <rev1> <rev2>" (or 'changeset', or 'cset') means that tree state for given commit is created out of parent commit (or parent commits in the case of merge) and of (<rev2> - <rev1>) patch. This is a bit of enhancement to "parent <rev>" meaning that tree state for current commit is derived from tree state of <rev>. This is nice generalization... > Now, that being said: > > > After reading the discussion so far, I am still not convinced if this is a > > good idea, nor this time around it is that much different from what the > > previous "prior" link discussion tried to do. > > For the record, I am not convinced it is a good idea either; I was > hoping to steer it in a direction where somebody could say "and now this > is the useful thing we can do now that we could not do before." If the > ultimate goal is to put links to other commits into history viewers, > then the commit message is a reasonable place to do so. The only thing I > see improving with a header is that it makes more sense for pruning and > object transfer. I'm also not all convinced that 'cousin'/'origin'/'changeset'/'cset' header is a good idea. I only tried to steer discussion in good direction if it is somewhat a good idea. 
First, if the only goal were to add extra links (extra edges) to a [graphical] history viewer, then the full sha-1 of a commit, which can be recorded in the commit message for cherry-picks and reverts, should be enough. It does mean parsing the commit message, with all the possibilities for mistakes that come with using conventions in the free-form part of a commit object; on the other hand it is not _that_ critical. If however 'origin' links are more (perhaps only a tiny bit more), for example the discussed "weak" links... then I'm not sure if the tradeoffs are worth it. First, if it is full connectivity like in the 'parent' header case, then a) why not use 'parent' anyway, b) it pins the history indefinitely. Second, if it is a "weak" link, i.e. only protected locally on prune, then a) there are problems with transferring the data, and protecting links on transfer, as somewhere in the middle or at the end there might be a repository which uses an older git (backwards compatibility strikes again), b) git in many, many places assumes that an object is valid if it passes its checks and all objects linked to from that object are valid; we would have to either use some kind of separate 'not strictly checked' packfile/storage, or have a grafts-like thingy. So I'm not sure if 'origin' links are worth the trouble. About the much, much earlier "prior" link discussion: I think the discussion about the "prior" header link was done before reflogs, or at least before reflogs got turned on by default. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 22:35 ` Jakub Narebski @ 2008-09-09 23:07 ` Jakub Narebski 2008-09-10 8:10 ` Paolo Bonzini 0 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-09 23:07 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Stephen R. van den Berg, git Jakub Narebski wrote: > On Tue, 9 Sep 2008, Jeff King wrote: > > On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote: > > Now, that being said: > > > > > After reading the discussion so far, I am still not convinced if this is a > > > good idea, nor this time around it is that much different from what the > > > previous "prior" link discussion tried to do. > > > > For the record, I am not convinced it is a good idea either; I was > > hoping to steer it in a direction where somebody could say "and now this > > is the useful thing we can do now that we could not do before." If the > > ultimate goal is to put links to other commits into history viewers, > > then the commit message is a reasonable place to do so. The only thing I > > see improving with a header is that it makes more sense for pruning and > > object transfer. > > I'm also not all convinced that 'cousin'/'origin'/'changeset'/'cset' > header is a good idea. I only tried to steer discussion in good > direction if it is somewhat a good idea. By the way, beside graphical history viewers it would also help rebase (and git-cherry) notice when patch was already applied better. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 23:07 ` Jakub Narebski @ 2008-09-10 8:10 ` Paolo Bonzini 0 siblings, 0 replies; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 8:10 UTC (permalink / raw) To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Stephen R. van den Berg, git > By the way, beside graphical history viewers it would also help rebase > (and git-cherry) notice when patch was already applied better. I think that rebase had better not trust the origin links in deciding whether a patch was already applied; it already does it well enough. git-cherry is another story, as that tool is not faking "changeset mode" so well (because it cannot attempt merges, and these are what allows git-rebase to fake changesets much better). Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 20:42 ` Junio C Hamano 2008-09-09 20:47 ` Shawn O. Pearce 2008-09-09 20:50 ` Jeff King @ 2008-09-10 0:13 ` Stephen R. van den Berg 2008-09-10 1:59 ` Junio C Hamano 2 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 0:13 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Jakub Narebski, git Junio C Hamano wrote: >As for "by the way ... was used to make this commit": this is git. So how >you arrived at the tree state you record in a commit *does not matter*. The typical use case for the origin links is in a project with several long-lived branches which use cherry-picks to backport amongst them. There is no real other way to solve this case, except for some rather kludgy stuff in the free-form commit message which doesn't mesh well with rebase/filter-branch/stgit etc. As to "does not matter": then why does git store parent links? >To my ears, it rhymes rather well with a famous quote from $gmane/217: > You're freezing your (crappy) algorithm at tree creation time, and > basically making it pointless to ever create something better later, > because even if hardware and software improves, you've codified that > "we have to have crappy information". I tried to accommodate this approach by overloading the parent link and then making git more intelligent to figure out if it is a cherry-pick or not. That was deemed undesirable, so using the origin links is the next best thing (IMHO). >good idea, nor this time around it is that much different from what the >previous "prior" link discussion tried to do. It is well-defined this time, and doesn't bleed across fetch/pull. -- Sincerely, Stephen R. van den Berg. "Be spontaneous!" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 0:13 ` Stephen R. van den Berg @ 2008-09-10 1:59 ` Junio C Hamano 2008-09-10 5:38 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Junio C Hamano @ 2008-09-10 1:59 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jeff King, Jakub Narebski, git "Stephen R. van den Berg" <srb@cuci.nl> writes: > Junio C Hamano wrote: >>As for "by the way ... was used to make this commit": this is git. So how >>you arrived at the tree state you record in a commit *does not matter*. > > The typical use case for the origin links is in a project with several > long-lived branches which use cherry-picks to backport amongst them. > There is no real other way to solve this case, except for some rather > kludgy stuff in the free-form commit message which doesn't mesh well > with rebase/filter-branch/stgit etc. > > As to "does not matter": then why does git store parent links? The parent links describe *where* you came from, not *how*. And if you think the difference is just "semantics", then you haven't grokked the first lesson I gave in this thread. "parents" record the reference points against which you make "this resulting commit suits the purpose of my branch better than any histories leading to these commits". ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 1:59 ` Junio C Hamano @ 2008-09-10 5:38 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 5:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Jakub Narebski, git Junio C Hamano wrote: >"Stephen R. van den Berg" <srb@cuci.nl> writes: >> Junio C Hamano wrote: >>>As for "by the way ... was used to make this commit": this is git. So how >>>you arrived at the tree state you record in a commit *does not matter*. >> The typical use case for the origin links is in a project with several >> long-lived branches which use cherry-picks to backport amongst them. >> There is no real other way to solve this case, except for some rather >> kludgy stuff in the free-form commit message which doesn't mesh well >> with rebase/filter-branch/stgit etc. >> As to "does not matter": then why does git store parent links? >The parent links describe *where* you came from, not *how*. >And if you think the difference is just "semantics", then you haven't >grokked the first lesson I gave in this thread. "parents" record the >reference points against which you make "this resulting commit suits the >purpose of my branch better than any histories leading to these commits". The last question of mine was/is a rhetorical one. Consider the typical use case I describe above. The developer usually has just created a commit in the development branch, tested it, and deems the patch worthwhile enough to backport it to the latest stable branch. So he cherry-picks the commit from the development branch to the latest stable branch. Then he tests it, and decides to backport it to the older stable branch, so he cherry-picks it again, and commits it there too. This is repeated in rapid succession on three older stable branches as well. Basically that means that for the patch itself, there is a path in history to follow as well. I.e. the patch itself evolves over time. 
Now, when another developer makes an additional change to this patch in one of the stable versions, it is very helpful to actually be able to have git tell you where the original patch came from and to follow back the chain upward. It allows you to forward/backward port the new change more easily. Basically, the normal parent links allow you to follow evolving snapshots of the complete source-tree, whereas the origin links allow you to follow evolving snapshots of a patch. As it happens, the shortest way to describe a patch in git is by specifying two commits of which the difference is exactly your patch. -- Sincerely, Stephen R. van den Berg. "Am I paying for this abuse or is it extra?" ^ permalink raw reply [flat|nested] 137+ messages in thread
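The "follow the chain back" idea in the message above can be sketched roughly as follows. This is only an illustration, not anything git implements: the `ORIGIN_OF` table, the commit names and `origin_chain` are hypothetical, and each commit is assumed to carry at most one origin link.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory model: commit id -> id of the commit it was
# cherry-picked from (None if the commit carries no origin link).
ORIGIN_OF: Dict[str, Optional[str]] = {
    "dev1": None,          # original patch on the development branch
    "stable3": "dev1",     # backport to the latest stable branch
    "stable2": "stable3",  # backport of the backport
    "stable1": "stable2",  # backport to the oldest stable branch
}


def origin_chain(commit: str, origin_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow origin links back from `commit`, oldest patch version last."""
    chain = [commit]
    seen = {commit}
    current = origin_of.get(commit)
    # Stop when there is no link, when the linked commit is not present
    # locally (the proposal wants that case to be silent), or on a repeat.
    while current is not None and current in origin_of and current not in seen:
        chain.append(current)
        seen.add(current)
        current = origin_of[current]
    return chain


chain = origin_chain("stable1", ORIGIN_OF)
```

Starting from the oldest stable branch, the walk visits each earlier incarnation of the patch in turn, which is the information a developer would need when forward- or backward-porting a follow-up fix.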
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 19:59 ` Jeff King 2008-09-09 20:25 ` Stephen R. van den Berg 2008-09-09 20:42 ` Junio C Hamano @ 2008-09-09 21:05 ` Junio C Hamano 2008-09-09 21:09 ` Jeff King 2 siblings, 1 reply; 137+ messages in thread From: Junio C Hamano @ 2008-09-09 21:05 UTC (permalink / raw) To: Jeff King; +Cc: Stephen R. van den Berg, Jakub Narebski, git Jeff King <peff@peff.net> writes: > And yes, you can get _too_ general to the point where your semantics > become meaningless. But I don't think that is the case here. You are > defining the origin field as "by the way, the difference between state X > and state Y was used to make this commit". cherry-pick just happens to > make Y=X^, but something like rebase could use a series. Another thing that made me wonder... To be consistent, when you are at HEAD and are merging side branch B, because that merge is to incorporate what happened on the side branch while you are looking the other way, we should say "by the way, the difference between state $(git merge-base HEAD B) and state B was used to make this commit." in the resulting merge commit, shouldn't we? What happens if there is more than one merge base? ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 21:05 ` Junio C Hamano @ 2008-09-09 21:09 ` Jeff King 2008-09-09 23:36 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Jeff King @ 2008-09-09 21:09 UTC (permalink / raw) To: Junio C Hamano; +Cc: Stephen R. van den Berg, Jakub Narebski, git On Tue, Sep 09, 2008 at 02:05:28PM -0700, Junio C Hamano wrote: > To be consistent, when you are at HEAD and are merging side branch B, > because that merge is to incorporate what happened on the side branch > while you are looking the other way, we should say "by the way, the > difference between state $(git merge-base HEAD B) and state B was used to > make this commit." in the resulting merge commit, shouldn't we? I suppose you could, though in that case you can obviously calculate the merge base yourself from the parents of the merge. The difference with cherry-picking (or rebasing) is that you might not otherwise know about the "by the way" commits. -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 21:09 ` Jeff King @ 2008-09-09 23:36 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-09 23:36 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Jakub Narebski, git Jeff King wrote: >On Tue, Sep 09, 2008 at 02:05:28PM -0700, Junio C Hamano wrote: >> To be consistent, when you are at HEAD and are merging side branch B, >> because that merge is to incorporate what happened on the side branch >> while you are looking the other way, we should say "by the way, the >> difference between state $(git merge-base HEAD B) and state B was used to >> make this commit." in the resulting merge commit, shouldn't we? >I suppose you could, though in that case you can obviously calculate the >merge base yourself from the parents of the merge. The difference with >cherry-picking (or rebasing) is that you might not otherwise know about >the "by the way" commits. Quite. The origin link is primarily intended for cherry-picks/reverts which are otherwise difficult to find. Anything that can use the normal parent mechanism has no business using the origin links. -- Sincerely, Stephen R. van den Berg. "Be spontaneous!" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 19:43 ` Stephen R. van den Berg 2008-09-09 19:59 ` Jeff King @ 2008-09-09 20:54 ` Jakub Narebski 2008-09-09 23:08 ` Stephen R. van den Berg 2008-09-09 23:35 ` Linus Torvalds 2 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-09 20:54 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: git On Tue, 9 Sep 2008, Stephen R. van den Berg wrote: > Jakub Narebski wrote: >>"Stephen R. van den Berg" <srb@cuci.nl> writes: >>> >>> The definition of the origin field reads as follows: [...] >>> - There can be an arbitrary number of origin fields per commit. >>> Typically there is going to be at most one origin field per commit. >> >> I understand that multiple origin fields occur if you do a squash >> merge, or if you cherry-pick multiple commits into single commit. [...] >> I'm not sure if you plan to automatically add 'origin' field for >> rebase, and for interactive rebase... > > That is not part of the plan so far. > Can you explain what you would be expecting in the best case? After thinking about this a bit, I don't think that recording origin(al) commits for rebased commits would be a good idea. While one can reasonably expect that cherry-picked changes should stay, and reverted changes even more so (usually one reverts a commit from the history), usually the original commits being rebased are meant to be pruned; they are rebased. >>> - At the time of creation, the origin field contains a hash B which refers >>> to a reachable commit pair (B, B~1). If B has multiple parents and the pair >>> being referred to needs to be e.g. (B, B~2), then the hash is followed by >>> a space and followed by an integer (base10, two in this case), >>> which designates the proper parentnr of B (see: mainline in git >>> cherry-pick/revert). [...] 
>> Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed >> in the neighbouring subthread), which would mean together with >> 'parent <parent>' (assuming that there are no other parents; if they >> are it gets even more complicated), that the following is true >> >> <current> ~= <parent> + (<sha2> - <sha1>), > >> where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps >> with some fixups, corrections or the like). Perhaps 'origin' should >> be then called 'changeset'. > > The simplicity sounds inviting. I'd like to hear from others who have > more experience (than I have) with the git vs. changeset paradigms about > this. This allows a bit more flexibility in specifying the origin, the > question is if it's needed. It is the simplicity that is the most compelling aspect of this solution. For revert we have "origin B B^", for cherry-pick we have "origin A^ A"; (or 'changeset') and always we have <rev> =~ <rev>^ + (<r2> - <r1>), where '-' denotes the diff operation (<diff> = <tree1> - <tree2>), and '+' denotes patch application (<tree1> = <tree2> + <diff>). [ADDED LATER] Also it could be useful for patch management interfaces using Git as an engine, such as StGIT, Guilt (formerly gq), TopGit, or the now defunct, obsoleted and no longer maintained Patchy Git aka 'pg'. The "weak" 'origin'/'changeset' header would allow some sort of operating on patches instead of the usual operating on tree states. >> It would also be easier on implementation to check if >> 'origin'/'changeset' weak links are not broken, and to get to know >> which commits are to be protected against pruning than your proposal >> of >> >> origin <"cousin" id> [<mainline = parent number>] >> >> where <mainline> can be omitted if it is 1 (the default). > > On the contrary, my current proposal only needs to verify the validity > of a single commit, changing it like this will require the system to > verify the validity of two commits. 
Given the rareness of the origin > links this will hardly present a problem, but it *does* increase > the overhead in checking a bit. Errr... weren't you proposing to keep/protect against pruning <cousin> AND <cousin>^<mainline>? You want to have the _diff_ (changeset) protected, not a single tree state. And having "origin <r1> <r2>" makes it easier then to check validity; you don't need to get <r1>, check if it has a <mainline> parent and what it is, and then check if <r1>^<mainline> exists (and is not for example behind a shallow clone barrier). >>> - git cherry-pick/revert allow for the creation of origin links only if >>> the object they are referring to is presently reachable. > >> Errr... shouldn't objects referenced by 'origin' links be reachable in >> order for "cherry-pick" or "revert" to succeed? > > True. But sometimes it's necessary to emphasize the obvious; call it a > preemptive strike against possible objections to the proposal. I don't think that it is true in this case. This sentence _looks_ like it offers / requires additional protection, while this "protection" is already ensured by the fact of cherry-picking or reverting a commit. >> On the other hand this leads to the following question: what happens >> if you cherry-pick or revert a commit which has its own 'origin' >> links? > > Nothing special. cherry-pick/revert behave as if the existing origin links > were not present in the first place. O.K. >>> - git rev-list --topo-order will take origin links into account to >>> ensure proper ordering. >> >> What do you mean by that? > > The order in which commits are listed is defined by the fact that > descendent commits are shown before any of their parents. The presence > of an origin link will make sure that the current commit will always > appear *before* the origin-commit it is referring to (if the > origin-commit is in the displayed set, that is). Hmmm... while I think it might be a good idea, I'm not sure about its overhead. Should be much, I guess. 
>>> - git log would show something like: [...] >>> - git log --graph will show a dotted line of somesort just like gitk. >> >> That is I guess the whole and main reason for 'origin' links to exist, >> as having this information in free-form part, i.e. in the commit >> message might lead to problems (with parsing and extracting, and >> finding spurious links). > > Quite. Also, having them in a well-defined place will allow for easy > fixups in case of rebase/filter-branch. True. >>> - git blame will follow and use the origin link if the object exists. >> >> Hmmmm... I'm not sure about that. > > Care to explain your doubts? > The reason I want this behaviour, is because it's all about tracking > content, and that part of the content happens to come from somewhere > else, and therefore blame should look there to "dig deeper" into it. But blame is all about what commit brought some line to the current version. So the cherry-pick itself, or revert of a commit itself would be blamed, and should be blamed, not its parents, nor commit which got cherry-picked, or commit which got reversed. It would be nice to be able to follow 'origin'/'changeset' lines in the _graphical_ blame browser like blameview or "git gui blame". -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
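The relation `<rev> =~ <rev>^ + (<r2> - <r1>)` from the message above can be made concrete with a toy model. The sketch below is a simplification under loud assumptions: a tree is a flat dictionary from path to whole-file content, `diff` and `apply_diff` are hypothetical helpers, and real cherry-picks use a three-way merge rather than this strict whole-file patching.

```python
from typing import Dict, Optional, Tuple

Tree = Dict[str, str]                                   # path -> file content
Diff = Dict[str, Tuple[Optional[str], Optional[str]]]   # path -> (old, new)


def diff(new: Tree, old: Tree) -> Diff:
    """Return new - old: the changes that turn `old` into `new`."""
    changes: Diff = {}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before != after:
            changes[path] = (before, after)
    return changes


def apply_diff(tree: Tree, changes: Diff) -> Tree:
    """Return tree + changes; raise if the patch does not apply cleanly."""
    result = dict(tree)
    for path, (before, after) in changes.items():
        if result.get(path) != before:
            raise ValueError(f"patch does not apply at {path}")
        if after is None:
            result.pop(path, None)   # the patch deletes this file
        else:
            result[path] = after     # the patch adds or rewrites this file
    return result


# Commit A on the development branch adds g on top of its parent A^.
a_parent: Tree = {"f": "1"}
a: Tree = {"f": "1", "g": "fix"}
stable: Tree = {"f": "1", "h": "old"}

# Cherry-pick, "origin A^ A": new tree = parent tree + (A - A^).
picked = apply_diff(stable, diff(a, a_parent))
# Revert, "origin A A^": new tree = parent tree + (A^ - A).
reverted = apply_diff(picked, diff(a_parent, a))
```

In this model a revert is simply a cherry-pick with the two ends of the pair swapped, which is the symmetry that makes the two-hash form attractive.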
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 20:54 ` Jakub Narebski @ 2008-09-09 23:08 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-09 23:08 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski wrote: >On Tue, 9 Sep 2008, Stephen R. van den Berg wrote: >> On the contrary, my current proposal only needs to verify the validity >> of a single commit, changing it like this will require the system to >> verify the validity of two commits. Given the rareness of the origin >> links this will hardly present a problem, but it *does* increase >> the overhead in checking a bit. >Errr... weren't you proposing to keep/protect against pruning <cousin> >AND <cousin>^<mainline>? You want to have the _diff_ (changeset) protected, >not a single tree state. Actually, by making sure that the commit we reference in the origin link exists, we implicitly prove that all the parents of that commit exist as well. Then again, this point is moot since I already conceded (in a different thread) that storing two hashes is better. >>>> - git rev-list --topo-order will take origin links into account to >>>> ensure proper ordering. >Hmmm... while I think it might be a good idea, I'm not sure about its >overhead. Should be much, I guess. Actually, I have already programmed this part, and the overhead is close to zero. >>>> - git blame will follow and use the origin link if the object exists. >>> Hmmmm... I'm not sure about that. >> Care to explain your doubts? >> The reason I want this behaviour, is because it's all about tracking >> content, and that part of the content happens to come from somewhere >> else, and therefore blame should look there to "dig deeper" into it. >But blame is all about what commit brought some line to the current version. 
>So the cherry-pick itself, or revert of a commit itself would be blamed, >and should be blamed, not its parents, nor commit which got cherry-picked, >or commit which got reversed. Well, it depends, I guess. If you'd go for a "committer" based display, then following origin links is bad. If you'd go for an "author" based display, then following origin links should be the default (IMHO). -- Sincerely, Stephen R. van den Berg. "Be spontaneous!" ^ permalink raw reply [flat|nested] 137+ messages in thread
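The ordering rule discussed above (a commit is listed before its parents, and also before any origin commit it refers to, if that commit is in the displayed set) amounts to a topological sort with a few extra edges. The sketch below uses Kahn's algorithm on a hypothetical five-commit history. It is not the author's patch and not git's `rev-list` code, and the `parents` and `origins` tables are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def topo_order(parents: Dict[str, List[str]],
               origins: Dict[str, List[str]]) -> List[str]:
    """List commits so each one precedes its parents and origin targets."""
    # must_follow[c] holds the commits that may only be shown after c.
    must_follow: Dict[str, List[str]] = {c: [] for c in parents}
    for commit in parents:
        for target in parents[commit] + origins.get(commit, []):
            if target in must_follow:       # ignore commits outside the set
                must_follow[commit].append(target)

    indegree = {c: 0 for c in parents}
    for targets in must_follow.values():
        for target in targets:
            indegree[target] += 1

    ready = deque(sorted(c for c, n in indegree.items() if n == 0))
    order: List[str] = []
    while ready:
        commit = ready.popleft()
        order.append(commit)
        for target in must_follow[commit]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return order


# Hypothetical history: D1, D2 on a development branch, S1, S2 on a stable
# branch; S2 is a backport whose origin pair is (D1, D2).
parents = {"root": [], "D1": ["root"], "D2": ["D1"],
           "S1": ["root"], "S2": ["S1"]}
origins = {"S2": ["D1", "D2"]}

with_links = topo_order(parents, origins)
without_links = topo_order(parents, {})
```

The extra work is one additional edge per origin link, which fits the claim above that the overhead stays small when origin links are rare.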
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 19:43 ` Stephen R. van den Berg 2008-09-09 19:59 ` Jeff King 2008-09-09 20:54 ` Jakub Narebski @ 2008-09-09 23:35 ` Linus Torvalds 2008-09-09 23:58 ` Stephen R. van den Berg 2008-09-09 23:59 ` Jakub Narebski 2 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2008-09-09 23:35 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, git On Tue, 9 Sep 2008, Stephen R. van den Berg wrote: > Jakub Narebski wrote: > >"Stephen R. van den Berg" <srb@cuci.nl> writes: > >> The definition of the origin field reads as follows: > > >> - There can be an arbitrary number of origin fields per commit. > >> Typically there is going to be at most one origin field per commit. > > >I understand that multiple origin fields occur if you do a squash > >merge, or if you cherry-pick multiple commits into single commit. > >For example: > > $ git cherry-pick -n <a1> > > $ git cherry-pick <a2> > > $ git commit --amend #; to correct commit message > > Correct. Quite frankly, recording the origins for _any_ of the above sounds like a horrible mistake. All those operations are commonly used (along with "git rebase -i") to clean up history in order to show a nicer version. The whole point of "origin" seems to be to _destroy_ that. I would refuse to ever touch anything that had an "origin" pointer, so if git were to add that feature, it would be a huge disappointment to me. I'd have to have a version that makes sure that anything it pulls hasn't been crapped on by somebody who added a stupid link to some dirty history that I'm not at all interested in seeing. IOW, I'm seeing a _lot_ of downsides, and not any actual upsides. What are the upsides again? Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 23:35 ` Linus Torvalds @ 2008-09-09 23:58 ` Stephen R. van den Berg 2008-09-10 0:23 ` Linus Torvalds 2008-09-09 23:59 ` Jakub Narebski 1 sibling, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-09 23:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git Linus Torvalds wrote: >On Tue, 9 Sep 2008, Stephen R. van den Berg wrote: >> Jakub Narebski wrote: >> >I understand that multiple origin fields occur if you do a squash >> >merge, or if you cherry-pick multiple commits into single commit. >> >For example: >> > $ git cherry-pick -n <a1> >> > $ git cherry-pick <a2> >> > $ git commit --amend #; to correct commit message >> Correct. >All those operations are commonly used (along with "git rebase -i") to >clean up history in order to show a nicer version. Actually, I'd suggest that cherry-pick takes an -o flag which turns on the origin link. This needs to be a conscious decision because one deems the history relevant. This typically is a Good Thing when the development has several long-term-stable branches which never get merged with each other, yet they receive frequent backports (using cherry-pick) between them. >The whole point of "origin" seems to be to _destroy_ that. Only in the case where the committer thinks the history is of interest, and even then, since the origin link is in the header, displaying it or not suddenly is under the control of git. Had it been in the free-form textarea, there'd be no way to suppress the display of it. >I would refuse to ever touch anything that had an "origin" pointer, so if >git were to add that feature, it would be a huge disappointment to me. I'd >have to have a version that makes sure that anything it pulls hasn't been >crapped on by somebody who added a stupid link to some dirty history that >I'm not at all interested in seeing. 
As you might have noticed, the actual process of pulling/fetching explicitly does *not* pull in the objects being pointed to. That, in turn, will cause the origin link output to be automatically suppressed. I.e. you'll never know the difference. OTOH, if someone adds a free-form link to the commit message, you essentially cannot hide that and are just suffering the clutter without having any use for it. >IOW, I'm seeing a _lot_ of downsides, and not any actual upsides. What are >the upsides again? The upsides are: - If your repository contains the proper branches, it will show a richer content. - If your repository lacks the proper branches, it will show a *reduced* clutter content (because actual free-text references in the commit messages will decrease). I see a lot of upsides, what were the downsides again? -- Sincerely, Stephen R. van den Berg. "Be spontaneous!" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 23:58 ` Stephen R. van den Berg @ 2008-09-10 0:23 ` Linus Torvalds 2008-09-10 5:42 ` Stephen R. van den Berg 2008-09-10 8:30 ` Paolo Bonzini 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2008-09-10 0:23 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, git On Wed, 10 Sep 2008, Stephen R. van den Berg wrote: > > As you might have noticed, the actual process of pulling/fetching > explicitly does *not* pull in the objects being pointed to. .. which makes them _local_ data, which in turn means that they should not be in the object database at all. IOW, if you want this for local reasons, you should use a local database, like the index or the reflogs (and I don't mean "like the index" in the sense that it would look _anything_ like that file, but in the sense that it's a purely local thing and doesn't show up in the object database). Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 0:23 ` Linus Torvalds @ 2008-09-10 5:42 ` Stephen R. van den Berg 2008-09-10 15:30 ` Linus Torvalds 2008-09-10 8:30 ` Paolo Bonzini 1 sibling, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 5:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git Linus Torvalds wrote: >On Wed, 10 Sep 2008, Stephen R. van den Berg wrote: >> As you might have noticed, the actual process of pulling/fetching >> explicitly does *not* pull in the objects being pointed to. >.. which makes them _local_ data, which in turn means that they should not >be in the object database at all. >IOW, if you want this for local reasons, you should use a local database, >like the index or the reflogs (and I don't mean "like the index" in the >sense that it would look _anything_ like that file, but in the sense that >it's a purely local thing and doesn't show up in the object database). But then how would someone who clones the repository get at the information? The information is essential to understand backports between the various stable branches. The origin links describe the evolving state of a patch (i.e. just like regular commits/parents store snapshots of the whole tree, the origin links store snapshots of a patch as it evolves through time). -- Sincerely, Stephen R. van den Berg. "Am I paying for this abuse or is it extra?" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 5:42 ` Stephen R. van den Berg @ 2008-09-10 15:30 ` Linus Torvalds 2008-09-10 23:09 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2008-09-10 15:30 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, git On Wed, 10 Sep 2008, Stephen R. van den Berg wrote: > > But then how would someone who clones the repository get at the information? You just said it wouldn't get there with fetches. If clone acts differently from a "full" fetch, something is really really wrong. > The information is essential to understand backports between the various > stable branches. No it's not. You can mention the backport explicitly in the commit message, and then you get hyperlinks in the graphical viewers. That works when people _want_ it to work, instead of in some hidden automatic manner that does entirely the wrong thing in all the common cases. What more do you want? Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:30 ` Linus Torvalds @ 2008-09-10 23:09 ` Stephen R. van den Berg 2008-09-11 0:39 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 23:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git Linus Torvalds wrote: >On Wed, 10 Sep 2008, Stephen R. van den Berg wrote: >> But then how would someone who clones the repository get at the information? >You just said it wouldn't get there with fetches. True and still valid. >If clone acts differently from a "full" fetch, something is really really >wrong. It does not act differently. Let me elaborate: - The origin field is part of the commit (and only present if *consciously* added by the committer), and therefore is transmitted along with the rest of a commit upon a fetch. - The commits being referred to by the origin field are *not* transmitted upon a fetch. - Given a repository with 4 long lived published branches called A, B, C and D and a backport from development branch D cherry-picked -o into branch A which creates an origin field pointing back to (D^,D^^) - Now you fetch just branch A from this repository. This will not cause branch D to be pulled in as well. - However, if you explicitly pull D, the origin information from A to D can be used. People doing a generic clone get all four branches, and therefore have all the important commits which normally could contain origin links. Note that even during a clone, commits pointed to by origin links are not being transmitted (unless there already are other reasons to send them along). >> The information is essential to understand backports between the various >> stable branches. >No it's not. You can mention the backport explicitly in the commit >message, and then you get hyperlinks in the graphical viewers. 
That works >when people _want_ it to work, instead of in some hidden automatic manner >that does entirely the wrong thing in all the common cases. Could you spell out one of the common cases where it would do entirely the wrong thing? -- Sincerely, Stephen R. van den Berg. "Am I paying for this abuse or is it extra?" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 23:09 ` Stephen R. van den Berg @ 2008-09-11 0:39 ` Linus Torvalds 2008-09-11 6:22 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2008-09-11 0:39 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > > - However, if you explicitly pull D, the origin information from A to D can > be used. People doing a generic clone get all four branches, and > therefore have all the important commits which normally could contain > origin links. Note that even during a clone, commits pointed to by > origin links are not being transmitted (unless there already are other > reasons to send them along). IOW, it's not actually transferring them and saving them, since a simple delete of the origin branch will basically make them unreachable. Fine. At least it works the same way as fetch, then. But it's still a huge mistake, because it really does mean that it is technically no different at all to just mentioning the SHA1 in the commit message, the way we already do for backports. The "origin" link has no _meaning_ for git, in other words. > >No it's not. You can mention the backport explicitly in the commit > >message, and then you get hyperlinks in the graphical viewers. That works > >when people _want_ it to work, instead of in some hidden automatic manner > >that does entirely the wrong thing in all the common cases. > > Could you spell out one of the common cases where it would do entirely > the wrong thing? It carries along information that is worthless and meaningless and hidden. I refuse to touch such an obviously braindamaged design. It has no sane _semantics_. If it doesn't have semantics, it shouldn't exist, certainly not as some architected feature. Nobody has shown any actual sane meaning for it. 
The only ones that have been mentioned have been for things like avoiding re-picking commits during a "git rebase", but (a) the patch SHA1 does that already for things that are truly identical and (b) since that information isn't reliable _anyway_, and since it's apparently a user choice, it's just "random". I'm sorry, but "good design" is a hell of a lot more important than some made-up use case that isn't even reliable, and doesn't match any actual real problems that anybody can explain. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 0:39 ` Linus Torvalds @ 2008-09-11 6:22 ` Stephen R. van den Berg 2008-09-11 8:20 ` Jakub Narebski ` (2 more replies) 0 siblings, 3 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 6:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git Linus Torvalds wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> - However, if you explicitly pull D, the origin information from A to D can >> be used. People doing a generic clone get all four branches, and >> therefore have all the important commits which normally could contain >> origin links. Note that even during a clone, commits pointed to by >> origin links are not being transmitted (unless there already are other >> reasons to send them along). >IOW, it's not actually transferring them and saving them, since a simple Correct. >delete of the origin branch will basically make them unreachable. False. If you fetch just branches A, B and C, but not D, the origin link from A to D is dangling. Once you have fetched D as well, the origin link from A to D is not dangling anymore. Subsequently deleting branch D but keeping branch A will keep everything in branch D up to the commits the origin link is pointing to alive and prevent those from being deleted. >Fine. At least it works the same way as fetch, then. But it's still a huge >mistake, because it really does mean that it is technically no different >at all to just mentioning the SHA1 in the commit message, the way we >already do for backports. Not quite. >The "origin" link has no _meaning_ for git, in other words. Git will keep alive commits based on origin links once you (the fetcher) have shown interest by fetching the appropriate branches. As to "meaning" for git, it's there in the form of: - --topo-order uses the information to order the output (but only if the target commits of the link are present in the repository). >> >No it's not.
You can mention the backport explicitly in the commit >> >message, and then you get hyperlinks in the graphical viewers. That works >> >when people _want_ it to work, instead of in some hidden automatic manner >> >that does entirely the wrong thing in all the common cases. >> Could you spell out one of the common cases where it would do entirely >> the wrong thing? >It carries along information that is worthless and meaningless and hidden. The common cases would be: a. "hidden": It doesn't need to be hidden. It can be hidden if you want it to be. We can decide if git hides it sometimes, always or never. So this point is moot. b. "meaningless": Git is all about taking snapshots of source trees and linking them in an orderly fashion. The origin link is all about taking snapshots of patches and linking them in an orderly fashion. This allows you to see the patch evolve over time, and it allows for diffs between patches. We're not actually storing patches, we merely store snapshots. As it happens, the snapshot of a patch is defined by two commit hashes. Doesn't sound meaningless to me. Just as one needs normal history between commits in a branch to follow development, there is a history of a backport as it "travels" from stable branch to stable branch. c. "worthless": Without the tracking of a backport through a series of well-defined patch-snapshots, it becomes kind of haphazard to actually figure out which piece of code came from where. Having this information in the form of a series of origin links increases the efficiency of a developer maintaining the backports between branches. Maybe you consider that worthless; I consider anything that improves code quality by giving access to a concise history of how the code evolved a Good Thing. Having history of how code evolved is actually one of the main reasons why people use git. It's just that git lacks support for tracking backports. The origin link fills that gap.
If you don't do backports in your trees, then fine, the origin link will never materialise in your repositories. >I refuse to touch such an obviously braindamaged design. It has no sane >_semantics_. If it doesn't have semantics, it shouldn't exist, certainly >not as some architected feature. It does have sane semantics, quite well defined, actually. I'm just not good at explaining them apparently. Try reading the explanation I gave above. >Nobody has shown any actual sane meaning for it. The only ones that have >been mentioned have been for things like avoiding re-picking commits >during a "git rebase", but (a) the patch SHA1 does that already for things >that are truly identical an (b) since that information isn't reliable >_anyway_, and since it's apparently a user choice, it's just "random". Quite frankly I don't see the application for rebase either (yet). I'm focusing on sane semantics first, any implications that has for usability by rebase will follow from that. The origin links track content (patches), nothing else. They assist the developer in understanding how patches evolve, any use cases follow from that. >I'm sorry, but "good design" is a hell of a lot more important than some >made-up use case that isn't even reliable, and doesn't match any actual >real problems that anybody can explain. Please focus on the semantics and on the *non*-made up use case of development of several stable branches with backports between them. Discussing made-up use cases is wasting energy at this point. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 6:22 ` Stephen R. van den Berg @ 2008-09-11 8:20 ` Jakub Narebski 2008-09-11 12:31 ` Stephen R. van den Berg 2008-09-11 12:28 ` A Large Angry SCM 2008-09-11 15:39 ` Linus Torvalds 2 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-11 8:20 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Linus Torvalds, git Stephen R. van den Berg wrote: [...] > Please focus on the semantics and on the *non*-made up use case of > development of several stable branches with backports between them. > Discussing made-up use cases is wasting energy at this point. By the way, I would really consider trying first to host 'origin' links not in the repository database itself, but in some extra database inside the git repository, like the reflog or the index. The Git community is _very_ reluctant to modify or extend the format of persistent objects. Of all the proposals to add some extra header to a 'commit' object: the 'prior' link to the previous version of a rebased, cherry-picked or redone commit (superseded somewhat by the local reflog, on by default in modern git); the generic 'note' header, with examples of usage including _non-linking_ cherry-pick and reverted commit-id, merge strategy used, and hints for rename detection, i.e. something like #pragma in C (rejected on the grounds that it was too generic and didn't have well-defined semantics); the 'generation' header which was meant to help and speed up sorting commits, with a root (parentless) commit having a generation of 1, and each commit having a generation 1 more than the maximum of the generations of its parents (I think that backwards compatibility killed it, and the fact that the date-based heuristics were improved); only the 'encoding' header was accepted. So I think you should go the route of externally (outside 'commit' objects) maintaining 'origin'/'changeset'/'cset' links (like XLink extended links ;-)) as a prototype to examine consequences of the idea.
That was the way _submodule_ support was added to Git, by the way. First there were (at least) two implementations maintaining submodules outside object database (see http://git.or.cz/gitwiki/SubprojectSupport especially "References" section), then it was officially added first at the level of plumbing support, as extension of a 'tree' object (and index format, I think). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
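Editorial sketch of the 'generation' rule described in the message above (a root commit has generation 1, every other commit has 1 more than the maximum generation of its parents). This is only an illustration in Python; the toy history and the names PARENTS and generations are hypothetical and are not part of git or of any proposal in this thread.

```python
# Hypothetical toy history: commit name -> list of parent names.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["a"],
    "merge": ["b", "c"],
}


def generations(parents):
    """Return {commit: generation}; roots get 1, others 1 + max over parents."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            # Parents must be numbered before the commit itself.
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                stack.extend(missing)
                continue
            gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
            stack.pop()
    return gen


if __name__ == "__main__":
    for name, g in sorted(generations(PARENTS).items(), key=lambda kv: kv[1]):
        print(g, name)
```

Sorting by such a number always places a commit after all of its ancestors, which is the property the rejected header was meant to provide cheaply.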
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 8:20 ` Jakub Narebski @ 2008-09-11 12:31 ` Stephen R. van den Berg 2008-09-11 13:51 ` Theodore Tso 2008-09-11 15:02 ` Nicolas Pitre 0 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 12:31 UTC (permalink / raw) To: Jakub Narebski; +Cc: Linus Torvalds, git Jakub Narebski wrote: >Stephen R. van den Berg wrote: >[...] >> Please focus on the semantics and on the *non*-made up use case of >> development of several stable branches with backports between them. >> Discussing made-up use cases is wasting energy at this point. >By the way, I would really consider trying first to host 'origin' links >not in repository database itself, but in some extra database inside >git repository, like reflog or index. Git community is _very_ >reluctant to modifying / extending format of persistent objects. From Rightfully so, of course. >So I think you should go the route of externally (outside 'commit' >objects) maintaing 'origin'/'changeset'/'cset' links (like XLink >extended links ;-)) as a prototype to examine consequences of the idea. >That was the way _submodule_ support was added to Git, by the way. >First there were (at least) two implementations maintaining submodules >outside object database (see http://git.or.cz/gitwiki/SubprojectSupport >especially "References" section), then it was officially added first at >the level of plumbing support, as extension of a 'tree' object (and >index format, I think). Well, the train of thought here goes as follows: 1. Sure, why not add a field (zero or more) at the bottom of the free-form commit message reading like: Origin: bbb896d8e10f736bfda8f587c0009c358c9a8599 ee837244df2e2e4e9171f508f83f353730db9e53 2. Add support to cherry-pick/revert to actually generate the field upon demand. 3. Then add support to prune/gc/fsck/blame/log --graph to take the field into account. 4. Add support to filter-branch/rebase to renumber the field if necessary. 
5. Add support to --topo-order to use the field if present and reachable. 6. For bonus points: add support to log to suppress the display of the field at the end of the commit message, and redisplay the field as Origin: bbb896d..ee83724 next to the Parent/Merge fields. Well, and after having done steps 1 to 5, the net result is that it works almost as if the field is present in the header, except that: - It is now at the end of the body in the commit message. - It takes more time to find and parse it. So that gives two minuses, and no pluses. So short-circuiting the reasoning suggests that since the only thing that actually changes now is the position of the field (at the top or end of the commit message), we might as well do it right and put it in the top, that gets rid of the two minuses. Anything I missed? Basically it means that: a. If there is a better solution to tracking the backports, I'll gladly use that instead, but simply using the current really freeform approach doesn't cut it (it currently refers to a single commit, instead of a pair of commits, and takes too long to parse out in a --topo-order or blame command). Better solutions I haven't heard so far. b. I need the integrity protection of a commit to make sure that the origin fields cannot be altered later; blame would be too easy to fool otherwise. So using the notes solution seems to be out (it would also be quite a performance hit again). c. I consider the Origin: field at the end of the commit message a workable solution, but it smells like X-header-extension-messes as in E-mail headers, and it incurs a small performance hit (in case of --topo-order/blame/prune/fsck), but maybe this performance hit can be minimised by making sure that the fields are *always* at the end of the commit message. d.
Using the proposed origin header in the standard commit header has close to zero overhead (in most commits the field is not present), yet codecomplexitywise it is almost identical with the Origin: field at the end of the commit message. I find it remarkable though that people are dragging their feet at solution d, yet are quite ok with solution c. IMO solution c and d are almost identical, except that solution c is ugly, and solution d is elegant. But if it makes it easier to prove the usefulness by implementing the ugly solution first, that's fine. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
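Editorial sketch of option c from the message above: reading "Origin:" fields that sit at the very end of a free-form commit message. The field shape (two full 40-hex hashes separated by a space) is taken from the example in step 1; the names ORIGIN_RE and parse_origin_trailers are hypothetical, and nothing here is existing git behaviour. The thread does not say which of the two hashes is the commit and which is its parent, so the pair is returned in the order written.

```python
import re

# Assumed field shape, copied from the example in the message:
# "Origin: <40 hex digits> <40 hex digits>" on a line of its own.
ORIGIN_RE = re.compile(r"^Origin: ([0-9a-f]{40}) ([0-9a-f]{40})$")


def parse_origin_trailers(message):
    """Return the (hash, hash) pairs found in the trailing Origin: lines."""
    lines = message.rstrip("\n").split("\n")
    pairs = []
    # Walk backwards and stop at the first line that is not an Origin field,
    # so only fields that are *always* at the end of the message are seen.
    for line in reversed(lines):
        match = ORIGIN_RE.match(line)
        if match is None:
            break
        pairs.append((match.group(1), match.group(2)))
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    example = (
        "Backport fix to stable branch\n"
        "\n"
        "Origin: bbb896d8e10f736bfda8f587c0009c358c9a8599 "
        "ee837244df2e2e4e9171f508f83f353730db9e53\n"
    )
    print(parse_origin_trailers(example))
```

Stopping at the first non-matching line is what keeps the scan short, which is the performance point made under c; a field in the commit header (option d) would avoid scanning the message body at all.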
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 12:31 ` Stephen R. van den Berg @ 2008-09-11 13:51 ` Theodore Tso 2008-09-11 15:32 ` Stephen R. van den Berg 2008-09-11 15:02 ` Nicolas Pitre 1 sibling, 1 reply; 137+ messages in thread From: Theodore Tso @ 2008-09-11 13:51 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, Sep 11, 2008 at 02:31:48PM +0200, Stephen R. van den Berg wrote: > > Well, the train of thought here goes as follows: > 1. Sure, why not add a field (zero or more) at the bottom of the free-form > commit message reading like: > > Origin: bbb896d8e10f736bfda8f587c0009c358c9a8599 ee837244df2e2e4e9171f508f83f353730db9e53 > > 2. Add support to cherry-pick/revert to actually generate the field upon > demand. "git cherry-pick -x" already generates the field you want. > > 3. Then add support to prune/gc/fsck/blame/log --graph to take the field > into account. > Um, why should "git fsck", or "git prune" or "git gc" need to understand about this field? What were you saying about unclean semantics, again? I thought you claimed that dangling origin links were OK? So why the heck should git fsck care? And why shouldn't gc/prune drop objects that are only referenced via the origin link. > 4. Add support to filter-branch/rebase to renumber the field if necessary. As we discussed earlier in some cases renumbering the field is not the right thing to do, especially if the commit in question has already been cherry-picked --- and you don't know that. Again, this is why prototyping it outside of the core git is so useful; it will show up some of these fundamental flaws in the origin link proposal. > Well, and after having done steps 1 to 5, the net result is that it > works almost as if the field is present in the header, except that: > - It is now at the end of the body in the commit message. > - It takes more time to find and parse it. 
A proof of concept, even if it isn't fully performant, is useful to prove that an idea actually has merit --- which clearly not everyone believes at this point. I'll also note that having a ***local*** database to cache the origin link is a great way of short-circuiting the performance difficulties. If it works, then it will be a lot easier to convince people that perhaps it should be done git-core, and by modifying core git functions. Alternatively, if you think this is such a great idea, why don't you grab a copy of the git repository, and start hacking the idea yourself? If you have running code, it tends to make the idea much more concrete, and much easier to evaluate. Or were you hoping to convince other people to do all of this programming for you? - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 13:51 ` Theodore Tso @ 2008-09-11 15:32 ` Stephen R. van den Berg 2008-09-11 18:00 ` Theodore Tso 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 15:32 UTC (permalink / raw) To: Theodore Tso; +Cc: Jakub Narebski, Linus Torvalds, git Theodore Tso wrote: >On Thu, Sep 11, 2008 at 02:31:48PM +0200, Stephen R. van den Berg wrote: >> Well, the train of thought here goes as follows: >> 2. Add support to cherry-pick/revert to actually generate the field upon >> demand. >"git cherry-pick -x" already generates the field you want. Well, sort of. In order for swift parsing it should be a real field, i.e. it should not be an English sentence (in order to avoid people accidentally translating it); and it should list a pair of hashes (patches/changesets are defined by the difference between two tree snapshots). So it would be a -o option most likely, in order to provide backward compatibility to the users of -x. >> 3. Then add support to prune/gc/fsck/blame/log --graph to take the field >> into account. >Um, why should "git fsck", or "git prune" or "git gc" need to >understand about this field? What were you saying about unclean >semantics, again? I thought you claimed that dangling origin links >were OK? So why the heck should git fsck care? And why shouldn't >gc/prune drop objects that are only referenced via the origin link. Dangling origin links are ok only if the developer in charge of the repository doesn't care about the commits/branches they point to. The definition of a "caring developer" is formalised by the fact that the offending commits are already present in the repository or not. This implies that fsck will skip the field if the hashes in question are unreachable in the current repository. If they are reachable though, fsck will follow the link and check the whole tree referenced by the origin link. 
Obviously there are only two conditions for an origin link: either the hash points to an unreachable object or the hash points to a reachable object of type commit (and all associated checks that go with any commit). gc will preserve the commits the origin links point to once they are reachable. I.e. if the developer doesn't care about the commits the origin links point to (i.e. if the branches are not reachable) then gc just skips them, if the developer *does* care, the origin links are used to keep those objects alive (and, of course, all their parenthood). >> 4. Add support to filter-branch/rebase to renumber the field if necessary. >As we discussed earlier in some cases renumbering the field is not the >right thing to do, especially if the commit in question has already >been cherry-picked --- and you don't know that. Again, this is why >prototyping it outside of the core git is so useful; it will show up >some of these fundamental flaws in the origin link proposal. I agree that the behaviour of especially rebase with respect to the origin links is still something that needs to be thought through. I'm not convinced you are right, but I'm not convinced you are wrong either. >> Well, and after having done steps 1 to 5, the net result is that it >> works almost as if the field is present in the header, except that: >> - It is now at the end of the body in the commit message. >> - It takes more time to find and parse it. >A proof of concept, even if it isn't fully performant, is useful to >prove that an idea actually has merit --- which clearly not everyone >believes at this point. Quite. >I'll also note that having a ***local*** database to cache the origin >link is a great way of short-circuiting the performance difficulties. >If it works, then it will be a lot easier to convince people that >perhaps it should be done git-core, and by modifying core git functions. 
Creating local databases for these kinds of structures feels kludgy somehow, since the git hash objects essentially *are* a working database. I have not checked yet if git already has some kind of ready-to-use local database lib inside which I could reuse for that. >Alternatively, if you think this is such a great idea, why don't you >grab a copy of the git repository, and start hacking the idea >yourself? Actually, in the first hour after posting the initial mail/proposal I already had altered a local version of git to support the origin links in commit.[ch], --topo-order and fsck. Before hacking further I decided to get some feedback first to see if someone would come up with something better. And they did, instead of the mainline number, I decided that using two hashes is better. Once the dust has settled, I'll fill in the rest of the code. > If you have running code, it tends to make the idea much >more concrete, and much easier to evaluate. Agreed, but then again, most of the programming is done without touching any code (the design phase), which is where we are now. Once the design is scrutinised (as far as possible), the coding can begin (continue). The feedback so far was very helpful, and caused me to explore (and dismiss) some of the alternate avenues to achieve the desired functionality. > Or were you hoping to >convince other people to do all of this programming for you? I've never needed that so far, and will not need that here either. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 15:32 ` Stephen R. van den Berg @ 2008-09-11 18:00 ` Theodore Tso 2008-09-11 19:03 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Theodore Tso @ 2008-09-11 18:00 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote: > gc will preserve the commits the origin links point to once they are > reachable. I.e. if the developer doesn't care about the commits the > origin links point to (i.e. if the branches are not reachable) then gc > just skips them, if the developer *does* care, the origin links are used > to keep those objects alive (and, of course, all their parenthood). This seems wrong. OK, suppose you have branches A, B, C, and D, while you are on branch C, you cherry pick commit 'p' from branch B, so that there is a new commit q on branch C which has an origin link containing the commit IDs p^ and p. Now suppose branch B gets deleted, and you do a "git gc". All of the commits that were part of branch B will vanish except for p^ and p, which in your model will stick around because they are the origin links of commit q on branch C. But what good are these two commits? They represent two snapshots in time, with no context now that branch B has been deleted. 99% of the time, the diff between p^ and p will result in the equivalent of the diff between q^ and q. But even if they aren't, what use are these isolated, disconnected commits? So having "git gc" retain the commits that are pointed to by this proposed origin link doesn't seem to make any sense, and doesn't seem to be well thought through. Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit q while you are on branch D. Presumably this will create a new commit, r. But will the origin-link of commit r be p^ and p, or q^ and q? And will this change depending on whether or not -o is specified?
> >I'll also note that having a ***local*** database to cache the origin > >link is a great way of short-circuiting the performance difficulties. > >If it works, then it will be a lot easier to convince people that > >perhaps it should be done git-core, and by modifying core git functions. > > Creating local databases for these kinds of structures feels kludgy > somehow, since the git hash objects essentially *are* a working > database. I have not checked yet if git already has some kind of > ready-to-use local database lib inside which I could reuse for that. Gitk already keeps a cache (.git/gitk.cache) to speed up some of its operations. And in some ways the index file is a cache, although it does far more than that. - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 18:00 ` Theodore Tso @ 2008-09-11 19:03 ` Stephen R. van den Berg 2008-09-11 19:33 ` Nicolas Pitre 2008-09-11 20:04 ` Theodore Tso 0 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 19:03 UTC (permalink / raw) To: Theodore Tso; +Cc: Jakub Narebski, Linus Torvalds, git Theodore Tso wrote: >On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote: >> gc will preserve the commits the origin links point to once they are >> reachable. I.e. if the developer doesn't care about the commits the >> origin links point to (i.e. if the branches are not reachable) then gc >> just skips them, if the developer *does* care, the origin links are used >> to keep those objects alive (and, of course, all their parenthood). >This seems wrong. OK, suppose you have branches A, B, C, and D, while >you are on branch C, you cherry pick commit 'p' from branch B, so that >there is a new commit q on branch C which has an origin link >containing the commit ID's p^ and 'p. Ok. >Now suppose branch B gets deleted, and you do a "git gc". All of the >commits that were part of branch B will vanish except for p^ and p, Not quite. Obviously all parents of p and p^ will continue to exist. I.e. deleting branch B will cause all commits from p till the tip of B (except p itself) to vanish. Keeping p implies that the whole chain of parents below p will continue to exist and be reachable. That's the way a git repository works. >which in your model will stick around because they are origin links >commit q on branch C. But what good is are these two commits? They >represent two snapshots in time, with no context now that branch B has The context are all their ancestors, which continue to exist, and that is all you need. >been deleted. 99% of the time, the diff between p^ and p will result >in the equivalent of the diff between q^ and q. 
But even if they >aren't, what use are these isolated, disconnected commits? So having >"git gc" retain the commits that are pointed to by this proposed >origin link doesn't seem to make any sense, and doesn't seem to be >well thought through. I beg to differ, but I presume you agree with me now? >Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit >q while you are on branch D. Presumably this will create a new >commit, r. But will the origin-link of commit r be p^ and p, or q^ >and q? It will be q^..q, and specifically not p^..p; using p^..p would be lying. We aim to document the evolution of the patch over time. Cherry-pick itself will always ignore the origin links present on the old commit, it simply creates new ones as if the old ones didn't exist. > And will this change depending on whether or not -o is >specified? No. Actually, cherry-pick will never generate origin links unless -o is specified. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
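Editorial sketch of the retention rule argued over in the two messages above, modelled as plain graph reachability. It only illustrates the *proposed* semantics: real git has no origin links, and the names reachable, PARENTS and ORIGINS plus the toy commits are hypothetical. An origin edge is followed only when its target exists locally, which stands in for the "dangling links are ignored" rule.

```python
def reachable(tips, parents, origins=None):
    """Commits kept alive from `tips` via parent edges and, if given,
    via origin edges whose target is present in `parents`."""
    origins = origins or {}
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        # Unknown objects model dangling links: they are silently skipped.
        if commit in seen or commit not in parents:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
        stack.extend(origins.get(commit, ()))
    return seen


# Toy history: branch B is root <- pp <- p <- btip (pp plays the role of p^),
# branch C is root <- c0 <- q, where q was cherry-picked from p.
PARENTS = {
    "root": [], "pp": ["root"], "p": ["pp"], "btip": ["p"],
    "c0": ["root"], "q": ["c0"],
}
ORIGINS = {"q": ["pp", "p"]}

if __name__ == "__main__":
    # Branch B has been deleted; only the tip of C remains as a ref.
    print(sorted(reachable(["q"], PARENTS)))
    print(sorted(reachable(["q"], PARENTS, ORIGINS)))
```

Under this model, deleting branch B drops only the commits between p and the old tip of B; p, p^ and all their ancestors stay, which is the behaviour Stephen describes and the one Ted and Nicolas object to.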
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:03 ` Stephen R. van den Berg @ 2008-09-11 19:33 ` Nicolas Pitre 2008-09-11 19:44 ` Stephen R. van den Berg 2008-09-11 20:04 ` Theodore Tso 1 sibling, 1 reply; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 19:33 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > Theodore Tso wrote: > >On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote: > >> gc will preserve the commits the origin links point to once they are > >> reachable. I.e. if the developer doesn't care about the commits the > >> origin links point to (i.e. if the branches are not reachable) then gc > >> just skips them, if the developer *does* care, the origin links are used > >> to keep those objects alive (and, of course, all their parenthood). > > >This seems wrong. OK, suppose you have branches A, B, C, and D, while > >you are on branch C, you cherry pick commit 'p' from branch B, so that > >there is a new commit q on branch C which has an origin link > >containing the commit ID's p^ and 'p. > > Ok. > > >Now suppose branch B gets deleted, and you do a "git gc". All of the > >commits that were part of branch B will vanish except for p^ and p, > > Not quite. Obviously all parents of p and p^ will continue to exist. > I.e. deleting branch B will cause all commits from p till the tip of B > (except p itself) to vanish. Keeping p implies that the whole chain of > parents below p will continue to exist and be reachable. That's the way > a git repository works. And that's what I called stupid in my earlier reply to you. Either you have proper branches or tags keeping P around, or deleting B brings everything not reachable through other branches or tags (or reflog) away too. Otherwise there is no point making a dangling origin link valid. Nicolas ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:33 ` Nicolas Pitre @ 2008-09-11 19:44 ` Stephen R. van den Berg 2008-09-11 20:03 ` Nicolas Pitre 2008-09-11 20:05 ` Jakub Narebski 0 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 19:44 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git Nicolas Pitre wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> Not quite. Obviously all parents of p and p^ will continue to exist. >> I.e. deleting branch B will cause all commits from p till the tip of B >> (except p itself) to vanish. Keeping p implies that the whole chain of >> parents below p will continue to exist and be reachable. That's the way >> a git repository works. >And that's what I called stupid in my earlier reply to you. Either you >have proper branches or tags keeping P around, or deleting B brings >everything not reachable through other branches or tags (or reflog) >away too. Otherwise there is no point making a dangling origin link >valid. Well, the principle of least surprise dictates that they should be kept by gc as described above, however... I can envision an option to gc say "--drop-weak-links" which does exactly what you describe. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:44 ` Stephen R. van den Berg @ 2008-09-11 20:03 ` Nicolas Pitre 2008-09-11 20:24 ` Stephen R. van den Berg 2008-09-11 20:05 ` Jakub Narebski 1 sibling, 1 reply; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 20:03 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > Nicolas Pitre wrote: > >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > >> Not quite. Obviously all parents of p and p^ will continue to exist. > >> I.e. deleting branch B will cause all commits from p till the tip of B > >> (except p itself) to vanish. Keeping p implies that the whole chain of > >> parents below p will continue to exist and be reachable. That's the way > >> a git repository works. > > >And that's what I called stupid in my earlier reply to you. Either you > >have proper branches or tags keeping P around, or deleting B brings > >everything not reachable through other branches or tags (or reflog) > >away too. Otherwise there is no point making a dangling origin link > >valid. > > Well, the principle of least surprise dictates that they should be kept > by gc as described above, however... > I can envision an option to gc say "--drop-weak-links" which does > exactly what you describe. Don't you think this starts to look silly at that point? Nicolas ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 20:03 ` Nicolas Pitre @ 2008-09-11 20:24 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 20:24 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git Nicolas Pitre wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> Well, the principle of least surprise dictates that they should be kept >> by gc as described above, however... >> I can envision an option to gc say "--drop-weak-links" which does >> exactly what you describe. >Don't you think this starts to look silly at that point? No, it's the developer's vote, controlling his own repository, saying: Ok, I expressed interest in the other branches and their backport/forwardport relationships, but I changed my mind. Drop all backport/forwardport information on branches I don't explicitly have. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:44 ` Stephen R. van den Berg 2008-09-11 20:03 ` Nicolas Pitre @ 2008-09-11 20:05 ` Jakub Narebski 2008-09-11 20:22 ` Stephen R. van den Berg 1 sibling, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-11 20:05 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Nicolas Pitre, Theodore Tso, Linus Torvalds, git Stephen R. van den Berg wrote: > Nicolas Pitre wrote: >>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >>> Not quite. Obviously all parents of p and p^ will continue to exist. >>> I.e. deleting branch B will cause all commits from p till the tip of B >>> (except p itself) to vanish. Keeping p implies that the whole chain of >>> parents below p will continue to exist and be reachable. That's the way >>> a git repository works. > >>And that's what I called stupid in my earlier reply to you. Either you >>have proper branches or tags keeping P around, or deleting B brings >>everything not reachable through other branches or tags (or reflog) >>away too. Otherwise there is no point making a dangling origin link >>valid. > > Well, the principle of least surprise dictates that they should be kept > by gc as described above, however... > I can envision an option to gc say "--drop-weak-links" which does > exactly what you describe. Well, IIRC the need for this was one of the causes of "death" of 'prior' header link proposal... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 20:05 ` Jakub Narebski @ 2008-09-11 20:22 ` Stephen R. van den Berg 2008-09-12 0:30 ` A Large Angry SCM 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 20:22 UTC (permalink / raw) To: Jakub Narebski; +Cc: Nicolas Pitre, Theodore Tso, Linus Torvalds, git Jakub Narebski wrote: >Stephen R. van den Berg wrote: >> Well, the principle of least surprise dictates that they should be kept >> by gc as described above, however... >> I can envision an option to gc say "--drop-weak-links" which does >> exactly what you describe. >Well, IIRC the need for this was one of the causes of "death" of 'prior' >header link proposal... As I understood it, one of the causes of death of the "prior" link proposal was that it was unclear if it pulled in the linked-to commits upon fetch. In the "origin" case, the default is *not* to fetch them. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 20:22 ` Stephen R. van den Berg @ 2008-09-12 0:30 ` A Large Angry SCM 2008-09-12 5:39 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: A Large Angry SCM @ 2008-09-12 0:30 UTC (permalink / raw) To: Stephen R. van den Berg Cc: Jakub Narebski, Nicolas Pitre, Theodore Tso, Linus Torvalds, git Stephen R. van den Berg wrote: > Jakub Narebski wrote: >> Stephen R. van den Berg wrote: >>> Well, the principle of least surprise dictates that they should be kept >>> by gc as described above, however... >>> I can envision an option to gc say "--drop-weak-links" which does >>> exactly what you describe. > >> Well, IIRC the need for this was one of the causes of "death" of 'prior' >> header link proposal... > > As I understood it, one of the causes of death of the "prior" link > proposal was that it was unclear if it pulled in the linked-to commits > upon fetch. In the "origin" case, the default is *not* to fetch them. And that's WRONG. Both prior and origin must fetch them if they're referenced in the header. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 0:30 ` A Large Angry SCM @ 2008-09-12 5:39 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 5:39 UTC (permalink / raw) To: A Large Angry SCM Cc: Jakub Narebski, Nicolas Pitre, Theodore Tso, Linus Torvalds, git A Large Angry SCM wrote: >Stephen R. van den Berg wrote: >>Jakub Narebski wrote: >>>Stephen R. van den Berg wrote: >>>>Well, the principle of least surprise dictates that they should be kept >>>>by gc as described above, however... >>>>I can envision an option to gc say "--drop-weak-links" which does >>>>exactly what you describe. >>>Well, IIRC the need for this was one of the causes of "death" of 'prior' >>>header link proposal... >>As I understood it, one of the causes of death of the "prior" link >>proposal was that it was unclear if it pulled in the linked-to commits >>upon fetch. In the "origin" case, the default is *not* to fetch them. >And that's WRONG. Both prior and origin must fetch them if they're >reference in the header. By definition of the origin headerfield that is not wrong, there are no other rules. But the point is moot at the moment, since I'm going to create a proof of concept which puts the field in the free-form trailer. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:03 ` Stephen R. van den Berg 2008-09-11 19:33 ` Nicolas Pitre @ 2008-09-11 20:04 ` Theodore Tso 2008-09-11 21:46 ` Jeff King 1 sibling, 1 reply; 137+ messages in thread From: Theodore Tso @ 2008-09-11 20:04 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, Sep 11, 2008 at 09:03:35PM +0200, Stephen R. van den Berg wrote: > >This seems wrong. OK, suppose you have branches A, B, C, and D, while > >you are on branch C, you cherry pick commit 'p' from branch B, so that > >there is a new commit q on branch C which has an origin link > >containing the commit IDs p^ and 'p'. > > >Now suppose branch B gets deleted, and you do a "git gc". All of the > >commits that were part of branch B will vanish except for p^ and p, > > Not quite. Obviously all parents of p and p^ will continue to exist. > I.e. deleting branch B will cause all commits from p till the tip of B > (except p itself) to vanish. Keeping p implies that the whole chain of > parents below p will continue to exist and be reachable. That's the way > a git repository works. That's still not very useful, since you still don't have a label for this anonymous chain of commits that just dead ends at commit p. How would anyone find this useful? > >which in your model will stick around because they are origin links > >of commit q on branch C. But what good are these two commits? They > >represent two snapshots in time, with no context now that branch B has > > The context are all their ancestors, which continue to exist, and that > is all you need. Need for what? What useful information would you divine from it? > >Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit > >q while you are on branch D. Presumably this will create a new > >commit, r. But will the origin-link of commit r be p^ and p, or q^ > >and q? 
> > It will be q^..q, and specifically not p^..p, using p^..p would be > lying. We aim to document the evolution of the patch in time. > Cherry-pick itself will always ignore the origin links present on the > old commit, it simply creates new ones as if the old ones didn't exist. So if you never pull branch C (where commit q resides), there is no way for you to know that commits p and r are related. How... not useful. If the scenario was being able to tell which stable branches had a particular bug fix, I think my proposal of attaching a bug identifier is a far superior solution. Again, what's the use case of "trying to document the development of the patch in time"? Aside from drawing pretty dotted lines everywhere, what *good* does this actually achieve? How would it affect other git commands' behavior, and how would this change in behavior actually be considered a net improvement over what we have now? - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
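Ted's objection about p, q and r can be sketched in a few lines. This is only an illustration under stated assumptions: `origin_of` is a hypothetical table of one origin link per commit, and `present` is the set of commits whose headers can actually be read in the local repository. A link can only be followed if the commit carrying it is present.

```python
from typing import Dict, List, Set


def origin_chain(start: str, origin_of: Dict[str, str], present: Set[str]) -> List[str]:
    """Follow origin links from `start` as far as local objects allow."""
    chain = [start]
    seen = {start}
    current = start
    # A commit's origin header is readable only if the commit itself is present.
    while current in present and current in origin_of:
        current = origin_of[current]
        if current in seen:  # defensive: never loop on malformed data
            break
        seen.add(current)
        chain.append(current)
    return chain


# r was cherry-picked from q, and q was cherry-picked from p.
origin_of = {"r": "q", "q": "p"}

full = origin_chain("r", origin_of, present={"p", "q", "r"})
without_c = origin_chain("r", origin_of, present={"p", "r"})
print(full)
print(without_c)
```

When branch C (and therefore q) has never been fetched, the walk stops at the dangling reference to q, so nothing ties r to p even though p is available locally.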
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 20:04 ` Theodore Tso @ 2008-09-11 21:46 ` Jeff King 2008-09-11 22:56 ` Stephen R. van den Berg 2008-09-11 23:10 ` Linus Torvalds 0 siblings, 2 replies; 137+ messages in thread From: Jeff King @ 2008-09-11 21:46 UTC (permalink / raw) To: Theodore Tso; +Cc: Stephen R. van den Berg, Jakub Narebski, Linus Torvalds, git On Thu, Sep 11, 2008 at 04:04:53PM -0400, Theodore Tso wrote: > > It will be q^..q, and specifically not p^..p, using p^..p would be > > lying. We aim to document the evolution of the patch in time. > > Cherry-pick itself will always ignore the origin links present on the > > old commit, it simply creates new ones as if the old ones didn't exist. > > So if you never pull branch C (where commit q resides), there is no > way for you to know that commits p and r are related. How... not > useful. That is a good point. Stephen has explained his workflow, and I can see why he wants to reference the cherry-picked commits, and how he thinks that the referenced commits will always be available in that workflow. And obviously in Linus's workflow such references are basically useless, and they should just not be generated. But what about workflows in between? When I pull from some developer who has added a weak reference to a particular commit SHA1, but I _don't_ have that commit, my next question is "OK, so what was in that commit?". What is the mechanism by which I find out more information on that SHA1? Using a key that is meaningful to an external database (like a bug tracker) means that you can go to that database to look up more information. -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 21:46 ` Jeff King @ 2008-09-11 22:56 ` Stephen R. van den Berg 2008-09-11 23:01 ` Jeff King 2008-09-11 23:10 ` Linus Torvalds 1 sibling, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 22:56 UTC (permalink / raw) To: Jeff King; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git Jeff King wrote: >On Thu, Sep 11, 2008 at 04:04:53PM -0400, Theodore Tso wrote: >> > It will be q^..q, and specifically not p^..p, using p^..p would be >> > lying. We aim to document the evolution of the patch in time. >> > Cherry-pick itself will always ignore the origin links present on the >> > old commit, it simply creates new ones as if the old ones didn't exist. >> So if you never pull branch C (where commit q resides), there is no >> way for you to know that commits p and r are related. How... not >> useful. >But what about workflows in between? When I pull from some developer who >has added a weak reference to a particular commit SHA1, but I _don't_ >have that commit, my next question is "OK, so what was in that commit?". >What is the mechanism by which I find out more information on that SHA1? Well, the usual way to fix this is to actually start up fetch and tell it to try and fetch all the weak links (or just fetch a single hash (the offending origin link)) from upstream; this is by no means the default operating mode of fetch, but I don't see any harm in allowing those to be fetched if one really wants to. >Using a key that is meaningful to an external database (like a bug >tracker) means that you can go to that database to look up more >information. True. And also a Good Thing, I concur. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 22:56 ` Stephen R. van den Berg @ 2008-09-11 23:01 ` Jeff King 2008-09-11 23:17 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Jeff King @ 2008-09-11 23:01 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git On Fri, Sep 12, 2008 at 12:56:48AM +0200, Stephen R. van den Berg wrote: > Well, the usual way to fix this is to actually startup fetch and tell it > to try and fetch all the weak links (or just fetch a single hash (the > offending origin link)) from upstream; this is by no means the default > operatingmode of fetch, but I don't see any harm in allowing to fetch > those if one really wants to. Maybe I am misremembering the details of fetching, but I believe you cannot fetch an arbitrary SHA-1, and that is by design. So: 1. You would have to argue the merits of changing that design. I believe the rationale relates to exposing some subset of the content via refs, but I have personally never felt that is very compelling. 2. Even if we did make a change, that means that _both_ sides need the upgraded version. -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:01 ` Jeff King @ 2008-09-11 23:17 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 23:17 UTC (permalink / raw) To: Jeff King; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git Jeff King wrote: >On Fri, Sep 12, 2008 at 12:56:48AM +0200, Stephen R. van den Berg wrote: >> Well, the usual way to fix this is to actually start up fetch and tell it >> to try and fetch all the weak links (or just fetch a single hash (the >> offending origin link)) from upstream; this is by no means the default >> operating mode of fetch, but I don't see any harm in allowing those to >> be fetched if one really wants to. >Maybe I am misremembering the details of fetching, but I believe you >cannot fetch an arbitrary SHA-1, and that is by design. So: I see, didn't know that. > 1. You would have to argue the merits of changing that design. I > believe the rationale relates to exposing some subset of the > content via refs, but I have personally never felt that is very > compelling. Well, I can understand why it is done this way, I think. > 2. Even if we did make a change, that means that _both_ sides need the > upgraded version. If you're using origin links, you'd need that anyway, so that's a given. I could imagine the minimum would be something like: Allow direct SHA1 fetches (which obviously pull in all parents as well) if the ref is part of one of the public branches (either as a commit, or as an origin link). -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
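The minimum rule Stephen proposes at the end of this message could look roughly like the check below. It is a sketch of the policy only, not of any existing git server behaviour: `may_serve`, the toy graph and the commit names are invented for illustration. A requested hash is served when it is either a commit reachable from a public branch tip, or the target of an origin link carried by such a commit.

```python
from typing import Dict, Iterable, List, Set


def may_serve(
    wanted: str,
    public_tips: Iterable[str],
    parents: Dict[str, List[str]],
    origins: Dict[str, List[str]],
) -> bool:
    """Decide whether a direct fetch of `wanted` is allowed under the proposed rule."""
    seen: Set[str] = set()
    stack = list(public_tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        # Allowed if it is a public commit, or named by a public commit's origin link.
        if commit == wanted or wanted in origins.get(commit, []):
            return True
        stack.extend(parents.get(commit, []))
    return False


# Hypothetical server-side history: only branch C (tip q) is public.
PARENTS = {
    "a": [],
    "p0": ["a"], "p": ["p0"], "b1": ["p"],
    "c1": ["a"], "q": ["c1"],
}
ORIGINS = {"q": ["p"]}

print(may_serve("c1", ["q"], PARENTS, ORIGINS))
print(may_serve("p", ["q"], PARENTS, ORIGINS))
print(may_serve("b1", ["q"], PARENTS, ORIGINS))
```

Under this rule p can be requested because q points at it, while b1, which only the unpublished branch reaches, stays hidden. Note that the check does not treat p's own ancestors as requestable by hash unless they are otherwise public; they would only travel along when p itself is fetched.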
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 21:46 ` Jeff King 2008-09-11 22:56 ` Stephen R. van den Berg @ 2008-09-11 23:10 ` Linus Torvalds 2008-09-11 23:26 ` Jeff King 2008-09-11 23:36 ` Stephen R. van den Berg 1 sibling, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2008-09-11 23:10 UTC (permalink / raw) To: Jeff King; +Cc: Theodore Tso, Stephen R. van den Berg, Jakub Narebski, git On Thu, 11 Sep 2008, Jeff King wrote: > > And obviously in Linus's workflow such references are basically useless, > and they should just not be generated. This has _nothing_ to do with workflows or anything else. Why are people claiming these total red herrings? I have asked several times what it is that makes it so important that the "origin" information be in the headers. Nobody has been able to explain why it's so different from just doing it in the free-form part. NOBODY. If somebody has a workflow where they want to track "origin" commits, then they can do it today with the in-body approach. But that has nothing what-so-ever to do with the question of "let's change object file format to some odd special-case that we just made up and is only apparently useful for some special workflow that uses special tools and special rules". I want the git object database to have really clear semantics. The fields we have now, we have because we _require_ them. There is nothing unclear what-so-ever about the semantics of author/commiter-ship, parenthood, trees, or anything else. And there are _zero_ issues about "workflow". The workflow doesn't matter, the objects always make sense, and they always work exactly the same way. There are no special magic cases that are in the least questionable in any way. So this argument is about more than just "minimalism", although I'll also admit to that being an issue - I want to be able to basically explain how git data structures work to any CS student, and not have any extra fat or any gray areas. 
It's about everything having a clear design, and a clear meaning, and there never being any question what-so-ever about what the real "meaning" of something is. Then, if you have some special use case or rules for your particular project, well that's where you can have things like formatting rules for what the commit messages should look like. If somebody wants to use fixed format rules for their project, that's fine. And THAT is where "workflow" issues come up. But "workflow" has nothing to do with core git data structures. They were designed for speed, stability, simplicity and good taste. The _workflow_ part has been designed separately on top of that (example: the whole thing with a single-line top summary of a commit so that we can have "git shortlog" and the "gitk" single-line commit view etc). Of course, good and generally useful workflows can then be reflected in how tools work, where that single line commit summary is an example of that: it's not something that git data structures _enforce_ or even care about, but it's obviously something that a lot of the porcelain expects, and without it, lots of tools will output less useful information. The same goes for the existing SHA1-in-comment support: some tools already support it and help view it in certain ways, even though it is in no way a core data structure issue. And _extending_ on that kind of helpful porcelain support certainly makes sense. The only thing I have ever argued against is adding commit headers that have no sane semantics and don't make sense as internal git data structures. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:10 ` Linus Torvalds @ 2008-09-11 23:26 ` Jeff King 2008-09-11 23:36 ` Stephen R. van den Berg 1 sibling, 0 replies; 137+ messages in thread From: Jeff King @ 2008-09-11 23:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Theodore Tso, Stephen R. van den Berg, Jakub Narebski, git On Thu, Sep 11, 2008 at 04:10:26PM -0700, Linus Torvalds wrote: > > And obviously in Linus's workflow such references are basically useless, > > and they should just not be generated. > > This has _nothing_ to do with workflows or anything else. > > Why are people claiming these total red herrings? > > I have asked several times what it is that makes it so important that the > "origin" information be in the headers. Nobody has been able to explain > why it's so different from just doing it in the free-form part. NOBODY. The message you are responding to has nothing to do with an origin header versus putting it in the free-form part. It is equally a problem with both approaches. I was purely commenting on the "if I mention an arbitrary sha-1, what is the person reading it supposed to _do_ with it, if they may never have seen that sha-1" issue. So yes, it has _everything_ to do with workflows. In Stephen's case, he claims that all references will be to commits on long-lived branches. In which case, it is a non-issue because they will have the referenced commits. But in the general case, people will not have them, and there is potential head-scratching. My point is that even if a feature works for Stephen's workflow, it may not be a good feature for everyone, since other solutions handle the general case (as well as his case) much better. > [ranting about how the origin header is bad] > The only thing I have ever argued against is adding commit headers that > have no sane semantics and don't make sense as internal git data > structures. Yes, and I totally agree with everything you said. 
If you read the mail you are responding to carefully, you will see that I never mention an origin header versus the free-form commit. -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:10 ` Linus Torvalds 2008-09-11 23:26 ` Jeff King @ 2008-09-11 23:36 ` Stephen R. van den Berg 1 sibling, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 23:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Theodore Tso, Jakub Narebski, git Linus Torvalds wrote: >I have asked several times what it is that makes it so important that the >"origin" information be in the headers. Nobody has been able to explain >why it's so different from just doing it in the free-form part. NOBODY. That's because the difference is small: In the header is slightly faster and more elegant (both designwise and displaywise), that's it. Other than that, it hardly matters. >The only thing I have ever argued against is adding commit headers that >have no sane semantics and don't make sense as internal git data >structures. Of course. In any case, I think I got enough feedback from the list to create a working implementation/concept which is going to use the free-form trailer to implement the origin field. -- Sincerely, Stephen R. van den Berg. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 12:31 ` Stephen R. van den Berg 2008-09-11 13:51 ` Theodore Tso @ 2008-09-11 15:02 ` Nicolas Pitre 2008-09-11 16:00 ` Stephen R. van den Berg 1 sibling, 1 reply; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 15:02 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > So short-circuiting the reasoning suggests that since the only thing > that actually changes now is the position of the field (at the top or > end of the commit message), we might as well do it right and put it in > the top, that gets rid of the two minuses. > > Anything I missed? A good convincing demonstration that this is actually worth doing in the first place. And here I'm talking about the _feature_ and not the _implementation_. > Basically it means that: > > a. If there is a better solution to tracking the backports, I'll gladly > use that instead, but simply using the current really freeform > approach doesn't cut it (it currently refers to a single commit, > instead of a pair of commits, and takes too long to parse out in a > --topo-order or blame command). Better solutions I haven't heard so > far. > > b. I need the integrity protection of a commit to make sure that the > origin fields cannot be altered later; blame would be too easy to fool > otherwise. So using the notes solution seems to be out (it would also > be quite a performance hit again). > > c. I consider the Origin: field at the end of the commit message a > workable solution, but it smells like X-header-extension-messes as in > E-mail headers, and it incurs a small performance hit (in case of > --topo-order/blame/prune/fsck), but maybe this performance hit can be > minimised by making sure that the fields are *always* at the end > of the commit message. > > d. 
> Using the proposed origin header in the standard commit header has > close to zero overhead (in most commits the field is not present), yet > in terms of code complexity it is almost identical with the Origin: field at > the end of the commit message. > > I find it remarkable though that people are dragging their feet at > solution d, yet are quite ok with solution c. IMO solution c and d are > almost identical, except that solution c is ugly, and solution d is > elegant. But if it makes it easier to prove the usefulness by > implementing the ugly solution first, that's fine. Technically speaking, implementation d is obviously the most efficient. But, as mentioned above, the actual need for this feature has not been convincing so far. Until then, it is not wise to add random stuff to the very structure of a commit object, while c can be done even externally from git which is a good way to demonstrate and convince people about the usefulness of such a feature. Nicolas ^ permalink raw reply [flat|nested] 137+ messages in thread
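Solution c, with the fields always at the very end of the commit message, can be prototyped outside git as Nicolas suggests. The sketch below assumes a trailer syntax of `Origin: <40-hex hash> [parentnr]`, mirroring the header format from the original proposal; that syntax, the function name and the default parent number of 1 are assumptions for illustration, not an agreed format. Scanning from the last line upwards and stopping at the first non-matching line is what keeps the parsing cost small.

```python
import re
from typing import List, Tuple

# Hypothetical trailer syntax: "Origin: <hash>" optionally followed by a parent number.
ORIGIN_RE = re.compile(r"^Origin: ([0-9a-f]{40})(?: (\d+))?$")


def trailing_origins(message: str) -> List[Tuple[str, int]]:
    """Return (hash, parentnr) pairs from the Origin: lines that end the message."""
    found: List[Tuple[str, int]] = []
    for line in reversed(message.rstrip("\n").split("\n")):
        match = ORIGIN_RE.match(line)
        if match is None:
            break  # only the contiguous block at the very end counts
        found.append((match.group(1), int(match.group(2) or 1)))
    found.reverse()  # restore the order in which the lines were written
    return found


A = "a" * 40
B = "b" * 40
message = f"Fix overflow in parser\n\nBackported from the development branch.\n\nOrigin: {A}\nOrigin: {B} 2\n"
stray = f"Subject\n\nOrigin: {A}\nthis line follows the field, so it is not a trailer\n"

print(trailing_origins(message))
print(trailing_origins(stray))
```

The second message shows the cost of the "always at the end" rule: an Origin: line followed by ordinary text is deliberately ignored rather than searched for.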
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 15:02 ` Nicolas Pitre @ 2008-09-11 16:00 ` Stephen R. van den Berg 2008-09-11 17:02 ` Nicolas Pitre 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 16:00 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jakub Narebski, Linus Torvalds, git Nicolas Pitre wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> Anything I missed? >Technically speaking, implementation d is obviously the most efficient. >but, as mentioned above, the actual need for this feature has not been >convincing so far. Until then, it is not wise to add random stuff to >the very structure of a commit object, while c can be done even >externally from git which is a good way to demonstrate and convince >people about the usefulness of such feature. The actual need for the feature seems to be dependent on one's workflow habits. This is also the problem I sense throughout the thread: some people know exactly what I'm talking about, and would come up with the almost identical design specs for the feature independent of myself, and others need to be explained every tiny detail of the spec because they are not familiar with the concept and can't imagine why/how it would be used. Let me try and describe once more the typical environment this origin field is vital in: Imagine a repository with: - 33774 commits total - 13 years of history - 1 development branch - 9 stable branches (forked off of the development branch at regular intervals during the past 13 years). - The stable branches are never merged with each other or with the development branch. - 2787 individual back/forward ports between the development and stable branches. In order to have meaningful output for git-blame, it needs to follow the chain across cherry-picks reliably. Once you alter a piece of code, in order to figure out what more to alter, you need to verify if this piece of code was or wasn't forward/backported. 
Reliable and fast reporting of this, and actual comparison of the different forward/backports between the 9 branches is essential. It basically means that you need to view the diffs of the patches across 9 branches on a regular basis. Without the origin links, this workflow will cost a lot more time to pursue (I know it, because I'm living it at the moment, and no, I'm not the only developer, it's a development team). This development model is not unique to my situation, it occurs at more places. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 16:00 ` Stephen R. van den Berg @ 2008-09-11 17:02 ` Nicolas Pitre 2008-09-11 18:44 ` Stephen R. van den Berg 2008-09-11 21:05 ` Junio C Hamano 0 siblings, 2 replies; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 17:02 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > Let me try and describe once more the typical environment this origin field > is vital in: > > Imagine a repository with: > - 33774 commits total > - 13 years of history > - 1 development branch > - 9 stable branches (forked off of the development branch at regular > intervals during the past 13 years). > - The stable branches are never merged with each other or with the > development branch. > - 2787 individual back/forward ports between the development and stable > branches. > > In order to have meaningful output for git-blame, it needs to follow the > chain across cherry-picks reliably. > Once you alter a piece of code, in order to figure out what more to alter, > you need to verify if this piece of code was or wasn't forward/backported. > Reliable and fast reporting of this, and actual comparison of the > different forward/backports between the 9 branches is essential. It > basically means that you need to view the diffs of the patches across 9 > branches on a regular basis. > > Without the origin links, this workflow will cost a lot more time to > pursue (I know it, because I'm living it at the moment, and no, I'm not > the only developer, it's a development team). > > This development model is not unique to my situation, it occurs at more > places. OK. I think I might be able to believe you. Where I feel uncomfortable is with the real semantics of your "origin" link proposal. First, its name. The word "origin" probably has a too narrow meaning that creates confusion. 
I'd suggest something like a "may-be-related-to" field that would be like a weak link. The format of a may-be-related-to field would be the same as the parent field, except that the object pointed to by the sha1 could have its type relaxed, i.e. it could be anything like a blob or a tag. The semantics of a "may-be-related-to" link would be defined for object reachability only: - If the may-be-related-to link is dangling then it is ignored. - If it is not dangling then usual reachability rules apply. That's all the core git might care about, and the only real argument for not having this information in the free form commit message. Still, in your case, you probably won't get rid of your stable branches, hence the reachability argument is rather weak for your usage scenario, meaning that you could as well have that info in the free form text (like cherry-pick -x), and even generate a special graft file from that locally for visualization/blame purposes. Sure the indirection will add some overhead, but I doubt it'll be measurable. People fetching your main branch won't have to carry the whole repository because those weak links would otherwise be followed if they're formally part of the commit header. And if they want to benefit from the information those weak links carry then they just have to also fetch the branch(es) where those links are pointing. At that point it is trivial to regenerate the special graft file locally which would also have the benefit of only containing links to actually reachable commits, hence you'd never have dangling "origin" links. Conclusion: the only fundamental reason for having this weak link information in the commit header is for reachability convenience for when the actual branch that contained the referenced commits is gone, which IMHO is a bad justification. Having lines of developments hanging off of a weak link alone is just plain stupid if you can't reach it via proper branches or tags. 
So I think that your usage scenario is a valid one, but I think that you should implement it some other ways, like this special graft file I mentioned above, which can be generated and updated from custom information found in the free form comment text of a commit object, and that proper reachability issues should continue to be handled through proper branches and tags as it is done today. Nicolas ^ permalink raw reply [flat|nested] 137+ messages in thread
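The graft-file approach described above can be sketched roughly as follows. This is only an illustration, assuming the link is recorded by "git cherry-pick -x" as a "(cherry picked from commit <sha1>)" line in the commit message; the `Commit` record and the `build_grafts` helper are hypothetical names, not part of git. Only origins that are actually present locally are emitted, so the generated file never contains a dangling link.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Line appended to the commit message by "git cherry-pick -x".
PICKED = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")


@dataclass
class Commit:  # hypothetical in-memory record, for illustration only
    sha: str
    parents: List[str]
    message: str


def build_grafts(commits: Dict[str, Commit]) -> List[str]:
    """Return graft lines "<commit> <parent>... <origin>..." for every commit
    whose recorded origin is present locally; absent origins are skipped."""
    grafts = []
    for c in commits.values():
        origins = [o for o in PICKED.findall(c.message)
                   if o in commits and o not in c.parents]
        if origins:
            grafts.append(" ".join([c.sha, *c.parents, *origins]))
    return grafts


# Made-up object names; GONE stands for a commit that was never fetched.
A, B, X, Y, W, GONE = ("a" * 40, "b" * 40, "1" * 40, "2" * 40, "3" * 40, "f" * 40)
repo = {
    A: Commit(A, [], "base"),
    B: Commit(B, [A], "fix on development branch"),
    X: Commit(X, [], "stable base"),
    Y: Commit(Y, [X], f"backport fix\n\n(cherry picked from commit {B})"),
    W: Commit(W, [Y], f"other fix\n\n(cherry picked from commit {GONE})"),
}
grafts = build_grafts(repo)
print("\n".join(grafts))
```

The output lines follow the "<commit> <parent> ..." layout of a graft file, so they could be kept in a purely local file (conventionally .git/info/grafts) and regenerated whenever more branches are fetched.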
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 17:02 ` Nicolas Pitre @ 2008-09-11 18:44 ` Stephen R. van den Berg 2008-09-11 20:00 ` Nicolas Pitre 2008-09-11 21:05 ` Junio C Hamano 1 sibling, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 18:44 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jakub Narebski, Linus Torvalds, git Nicolas Pitre wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> Let me try and describe once more the typical environment this origin field >> is vital in: >> Without the origin links, this workflow will cost a lot more time to >> pursue (I know it, because I'm living it at the moment, and no, I'm not >First, its name. The word "origin" probably has a too narrow meaning >that creates confusion. I'd suggest something like a >"may-be-related-to" field that would be like a weak link. Well, the important properties of the name/field would be: - It should be as specific as possible, in order to minimise the potential for abuse in the future. I distill the desirability of this requirement out of the various earlier discussions about commitheaders in the past on this mailinglist held by others. - It should convey a sense of direction (it's a directed graph). Any generic may-be-related-to field is therefore probably a non-starter. >The semantics of a "may-be-related-to" link would be defined for object >reachability only: >- If the may-be-related-to link is dangling then it is ignored. >- If it is not dangling then usual reachability rules apply. >That's all the core git might care about, and the only real argument for >not having this information in the free form commit message. The origin field as currently proposed tightens the requirements that it either is dangling and ignored or points to a commit. rev-list --topo-order should use the origin links to order the output. gc/prune won't delete commits referenced *by* an origin link. 
The only two other arguments one might give to actually keep the field in the header of the commit as opposed to the trailer is that the physical field can be kept machine readable, and the actual display can be beautified like: Origin: 2abcdef..1234567 The output of the field could be suppressed (if so desired) if the target commit isn't reachable. All this is of course possible for a trailer field in the free-form area as well, but it seems a bit silly to have two places for "headers". >Still, in your case, you probably won't get rid of your stable branches, True. >hence the reachability argument is rather weak for your usage scenario, Then again, I don't want to be bothered by stupid free-form origin links made to local branches by a developer. If the developer creates them using cherry-pick -o which creates an origin link, I'll never have to see his silly commit hashes where he is referring to commits in his local branch (and never waste time wondering where those commits are). >meaning that you could as well have that info in the free form text >(like cherry-pick -x), and even generate a special graft file from that >locally for visualization/blame purposes. Sure the indirection will add >some overhead, but I doubt it'll be measurable. The free-form equivalent looks like: Origin: df85f7855da44c730f942b330ada181209d09d7a ff1e8bfcd69e5e0ee1a3167e80ef75b611f72123 You need a pair of hashes, which is, a bit bulky, for my taste. What special graft file would I need to visualise? Isn't having the origin link information enough? >People fetching your main branch won't have to carry the whole >repository because those weak links would otherwise be followed if >they're formally part of the commit header. And if they want >to benefit from the information those weak links carry then they just >have to also fetch the branch(es) where those links are pointing. 
At >that point it is trivial to regenerate the special graft file locally >which would also have the benefit of only containing links to actually >reachable commits, hence you'd never have dangling "origin" links. You lost me here somewhere. Could you give a concrete example with one commit, one origin link (your style) and a special graftfile entry? >Conclusion: the only fundamental reason for having this weak link >information in the commit header is for reachability convenience for >when the actual branch that contained the referenced commits is gone, Erm. Quite the opposite, actually. The practical use for the origin link in case the target is unreachable is zero to none, so it can gleefully be ignored in that case. But maybe the semantics of your "related" link and my origin link are sufficiently distinct. For the arguments why it should be in the header of a commit, see above. >which IMHO is a bad justification. Having lines of developments hanging >off of a weak link alone is just plain stupid if you can't reach it via >proper branches or tags. Agreed. But this is in reference to your "related" link proposal. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 18:44 ` Stephen R. van den Berg @ 2008-09-11 20:00 ` Nicolas Pitre 0 siblings, 0 replies; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 20:00 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > Nicolas Pitre wrote: > >First, its name. The word "origin" probably has a too narrow meaning > >that creates confusion. I'd suggest something like a > >"may-be-related-to" field that would be like a weak link. > > Well, the important properties of the name/field would be: > - It should be as specific as possible, in order to minimise the > potential for abuse in the future. I distill the desirability of this > requirement out of the various earlier discussions about commitheaders > in the past on this mailinglist held by others. Well, sure. But being too specific sometimes limits its usefulness. There could be other usages for such a link which IMHO should be defined in terms of graph connectivity semantics rather than high level purpose. > - It should convey a sense of direction (it's a directed graph). Well, isn't that already obvious? > Any generic may-be-related-to field is therefore probably a non-starter. Well, my whole argument is that if it has no generic purpose then it probably doesn't belong in the commit header. > The origin field as currently proposed tightens the requirements that > it either is dangling and ignored or points to a commit. > rev-list --topo-order should use the origin links to order the output. > gc/prune won't delete commits referenced *by* an origin link. And I disagree on the gc/prune point, as mentioned previously. As to rev-list --topo-order, it doesn't need for this link to actually be part of the commit object to accomplish the desired effect. 
> The only two other arguments one might give to actually keep the field
> in the header of the commit as opposed to the trailer is that the
> physical field can be kept machine readable, and the actual display can be
> beautified like: Origin: 2abcdef..1234567
> The output of the field could be suppressed (if so desired) if the
> target commit isn't reachable.
> All this is of course possible for a trailer field in the free-form
> area as well, but it seems a bit silly to have two places for "headers".

And I think you should simply create a file within the repository with
that info instead of either the commit header or the free form text. It
gives all the usability advantages you wish for and more.

Nicolas
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 17:02 ` Nicolas Pitre 2008-09-11 18:44 ` Stephen R. van den Berg @ 2008-09-11 21:05 ` Junio C Hamano 2008-09-11 22:32 ` Stephen R. van den Berg 1 sibling, 1 reply; 137+ messages in thread From: Junio C Hamano @ 2008-09-11 21:05 UTC (permalink / raw) To: Nicolas Pitre Cc: Stephen R. van den Berg, Jakub Narebski, Linus Torvalds, git Nicolas Pitre <nico@cam.org> writes: > Still, in your case, you probably won't get rid of your stable branches, > hence the reachability argument is rather weak for your usage scenario, > meaning that you could as well have that info in the free form text > (like cherry-pick -x), and even generate a special graft file from that > locally for visualization/blame purposes. Sure the indirection will add > some overhead, but I doubt it'll be measurable. I keep hearing "blame" in this discussion, but I do not understand why people think blame should _follow_ this "origin" information (in the usual sense of "following"). Suppose you cherry-pick an existing commit from unrelated context: ...---A---B . (origin) . ...---o---X---Y---Z i.e. on top of X the difference to bring A to B is applied to produce Y, and a new development Z is made on top. You start digging from Z. Without any "origin", here is how blame works: * What Z did is blamed on Z; what Z did not change is passed to Y; * Y needs to: (1) take responsibility for what it changed; and/or (2) the remaining contents came from X --- pass the blame to it. Let's see how we would want "origin" get involved. Instead of the above, what Y would do would be: (1) if the contents (excluding the part Z changed) is different from X, instead of taking the blame itself, give the _final_ blame to B. (2) the remainder is passed to X as usual. 
This is different from the normal "following" in that B is not allowed to
pass the blame to its parents (should it be allowed to pass it to its
"origin"?), because the _only thing_ cherry-pick did was to transport what
B did (relative to A) to the unrelated history that led to X.

IOW, you did not look at the contents outside "diff A..B" when you made
the cherry-pick. There could well be parts of the content that are common
across all of A, Y, X and Z, but as far as Y and Z are concerned, they did
not get any part of that common content from A (otherwise "origin" is no
different from "parent", but you did not merge).

The output from "origin" aware blame would be identical to the normal
blame, except that lines that usually are labeled with Y are labeled with
B. However:

 (1) If you _are_ interested in the line that says Y, you can look at the
     commit object Y and see "cherry-pick -x" information to learn it came
     from B already; and

 (2) More importantly, if you want to dig deeper by peeling the blamed
     line (I think gitweb allows this, and probably git-gui), you
     shouldn't peel that line blamed on B to start running blame at A.
     That would continue digging the history of A, which is wrong when you
     are examining the history that led to Z.

So please leave "blame" out of this discussion.
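The blame behaviour discussed in this message can be illustrated with a toy model. This is a rough sketch only: it walks a linear list of snapshots forward with Python's difflib instead of digging backwards the way git blame does, and the `blame` and `origin_aware` helpers are hypothetical names. It shows the one difference described above: lines normally labeled Y are labeled B, and B receives the final blame.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def blame(history: List[Tuple[str, List[str]]]) -> List[str]:
    """history is oldest-first [(commit, lines)]; returns one commit label
    per line of the newest snapshot."""
    name, lines = history[0]
    labels = [name] * len(lines)
    for (_, old), (name, new) in zip(history, history[1:]):
        new_labels = [name] * len(new)  # changed lines are blamed on this commit
        matcher = SequenceMatcher(None, old, new, autojunk=False)
        for m in matcher.get_matching_blocks():
            for k in range(m.size):  # unchanged lines keep their earlier blame
                new_labels[m.b + k] = labels[m.a + k]
        labels = new_labels
    return labels


def origin_aware(labels: List[str], origin: Dict[str, str]) -> List[str]:
    """Give the final blame to the cherry-picked commit instead of the pick."""
    return [origin.get(label, label) for label in labels]


# ...---o---X---Y---Z, where Y is a cherry-pick of B (made-up contents).
history = [
    ("X", ["a", "b", "c"]),
    ("Y", ["a", "b", "fix", "c"]),
    ("Z", ["a", "b", "fix", "c", "new"]),
]
plain = blame(history)
picked = origin_aware(plain, {"Y": "B"})
print(plain)   # ['X', 'X', 'Y', 'X', 'Z']
print(picked)  # ['X', 'X', 'B', 'X', 'Z']
```

Note that the relabeling is a final step: nothing here continues into the history of A, which matches point (2) above.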
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 21:05 ` Junio C Hamano @ 2008-09-11 22:32 ` Stephen R. van den Berg 2008-09-11 22:40 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 22:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, Jakub Narebski, Linus Torvalds, git

Junio C Hamano wrote:
>I keep hearing "blame" in this discussion, but I do not understand why
>people think blame should _follow_ this "origin" information (in the usual
>sense of "following").

>Suppose you cherry-pick an existing commit from unrelated context:

>    ...---A---B
>               .  (origin)
>                .
>   ...---o---X---Y---Z

>i.e. on top of X the difference to bring A to B is applied to produce Y,
>and a new development Z is made on top. You start digging from Z.

>Let's see how we would want "origin" get involved. Instead of the above,
>what Y would do would be:

> (1) if the contents (excluding the part Z changed) is different from X,
>     instead of taking the blame itself, give the _final_ blame to B.

> (2) the remainder is passed to X as usual.

Sounds reasonable.

>This is different from the normal "following" in that B is not allowed to
>pass the blame to its parents (should it be allowed to pass it to its
>"origin"?), because the _only thing_ cherry-pick did was to transport what
>B did (relative to A) to the unrelated history that led to X.

Well, I'd expect:
a. That B should be able to pass blame onto its origin.
b. That B should be able to pass blame onto A (and deeper).

Let me show another example:

...-C---D---E---F---G
                 .  (origin)
                  .
         ...---A---B
                    .  (origin)
                     .
        ...---o---X---Y---Z

Now suppose there is a piece of source code which evolves from C to F,
then when I dig into G using blame I get something like:

CCCFFEGGDDDCC

(Every letter represents a line in the source code)

Digging into Z I'd expect to see the following:

ZZCCCFFEDDYDCCB

All this assumes that there were minimal changes to the patch when
creating B, and also minimal changes to the patch when creating Y.
I.e. large parts of that code were developed during C, D, E and F, so
that is what I expect to see; is that illogical?
--
Sincerely,
           Stephen R. van den Berg.

"There are three types of people in the world;
 those who can count, and those who can't."
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 22:32 ` Stephen R. van den Berg @ 2008-09-11 22:40 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 22:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, Jakub Narebski, Linus Torvalds, git

Stephen R. van den Berg wrote:
>Junio C Hamano wrote:
>>This is different from the normal "following" in that B is not allowed to
>>pass the blame to its parents (should it be allowed to pass it to its
>>"origin"?), because the _only thing_ cherry-pick did was to transport what
>>B did (relative to A) to the unrelated history that led to X.

>Well, I'd expect:
>a. That B should be able to pass blame onto its origin.
>b. That B should be able to pass blame onto A (and deeper).

>Let me show another example:

>....-C---D---E---F---G
>                  .  (origin)
>                   .
>          ...---A---B
>                     .  (origin)
>                      .
>         ...---o---X---Y---Z

>Now suppose there is a piece of source code which evolves from C to F,
>then when I dig into G using blame I get something like:

>CCCFFEGGDDDCC

>(Every letter represents a line in the source code)

>Digging into Z I'd expect to see the following:

>ZZCCCFFEDDYDCCB

>All this assumes that there were minimal changes to the patch when
>creating B, and also minimal changes to the patch when creating Y.
>I.e. large parts of that code were developed during C, D, E and F, so
>that is what I expect to see; is that illogical?

I'm sorry, you're right, I'm confusing things here.
The case I'm describing here can only happen when you do this:

....-C---D---E---F---G
      \...\...\..\  (origin)
                   .
          ...---A---B
                     .  (origin)
                      .
         ...---o---X---Y---Z

I.e. the first cherry-pick needs to cherry-pick C, D, E *and* F into B,
that will result in four origin fields there.

And yes, that means that:
- blame follows origin links (repeatedly).
- blame does *not* travel to parents of commits found through an origin
  link.

Does that mean that blame uses origin fields? Yes, it does, and it has
to check for origin links at every commit it traverses.
--
Sincerely,
           Stephen R. van den Berg.

"There are three types of people in the world;
 those who can count, and those who can't."
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 6:22 ` Stephen R. van den Berg 2008-09-11 8:20 ` Jakub Narebski @ 2008-09-11 12:28 ` A Large Angry SCM 2008-09-11 12:39 ` Stephen R. van den Berg 2008-09-11 15:39 ` Linus Torvalds 2 siblings, 1 reply; 137+ messages in thread From: A Large Angry SCM @ 2008-09-11 12:28 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git Stephen R. van den Berg wrote: > If you fetch just branches A, B and C, but not D, the origin link from A > to D is dangling. I do not understand how this can be considered an acceptable behavior. If an object ID is referenced in an object header, particularly commit objects, fetch must gather those objects also because to do otherwise breaks the cryptographic authentication in git. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 12:28 ` A Large Angry SCM @ 2008-09-11 12:39 ` Stephen R. van den Berg 2008-09-12 0:03 ` A Large Angry SCM 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 12:39 UTC (permalink / raw) To: A Large Angry SCM; +Cc: Linus Torvalds, Jakub Narebski, git A Large Angry SCM wrote: >Stephen R. van den Berg wrote: >>If you fetch just branches A, B and C, but not D, the origin link from A >>to D is dangling. >I do not understand how this can be considered an acceptable behavior. >If an object ID is referenced in an object header, particularly commit >objects, fetch must gather those objects also because to do otherwise >breaks the cryptographic authentication in git. No it does not. The cryptographic seal is calculated over the content of the commit, which includes the hashes of all referenced objects, but doesn't include the objects themselves. The content of the commit is not violated. Do not forget though: - origin links are a rare occurrence. - When they occur, they usually were made to point into other (deemed) important public branches. - Due to the fact that the branches they are pointing into are important and public, in most cases the origin links *will* point to objects you actually already have (even if you fetched from someone else). - The only time you're going to have dangling origin links is when they were pointing at someone's private branches, in which case it was not very prudent of the committer to actually record the link in the first place. But nothing breaks if you don't have his private branch locally. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
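The point about the seal covering only the commit's own text can be checked with a small sketch. Git names an object by the SHA-1 of "<type> <size>\0" followed by the object's content; the "origin" line below is the field proposed in this RFC, not something git defines, and the helper name `git_object_id` is made up for illustration. The commit text is taken from the example in the original proposal (timestamps as given there).

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over "<type> <size>\\0" + content, the way git names objects."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_text(origin_lines: str) -> bytes:
    # "origin" is the header proposed in this thread (an assumption, not git).
    return (
        "tree b83f28279a68439b9b044bccc313bbeaa3e973f5\n"
        "parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b\n"
        f"{origin_lines}"
        "author Junio C Hamano <gitster@pobox.com> 1220132115 -0700\n"
        "committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700\n"
        "\n"
        "example message\n"
    ).encode()


without_origin = git_object_id("commit", commit_text(""))
with_origin = git_object_id(
    "commit", commit_text("origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd\n"))

# The id changes with the header text, but computing it never needed the
# object that the origin hash names: only the 40 hex digits are hashed.
print(without_origin != with_origin)
```

So a commit whose origin target is absent still verifies against its own id; what cannot be verified in that case is anything about the target itself.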
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 12:39 ` Stephen R. van den Berg @ 2008-09-12 0:03 ` A Large Angry SCM 2008-09-12 0:13 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: A Large Angry SCM @ 2008-09-12 0:03 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git Stephen R. van den Berg wrote: > A Large Angry SCM wrote: >> Stephen R. van den Berg wrote: >>> If you fetch just branches A, B and C, but not D, the origin link from A >>> to D is dangling. > >> I do not understand how this can be considered an acceptable behavior. >> If an object ID is referenced in an object header, particularly commit >> objects, fetch must gather those objects also because to do otherwise >> breaks the cryptographic authentication in git. > > No it does not. > The cryptographic seal is calculated over the content of the commit, > which includes the hashes of all referenced objects, but doesn't include > the objects themselves. > The content of the commit is not violated. The fetch MUST gather the referenced objects ALWAYS or I can't verify the history. To do otherwise means that ID strings on the origin lines are nothing more than an arbitrary text tag and not pointer to a specific history. > > Do not forget though: > - origin links are a rare occurrence. > - When they occur, they usually were made to point into other (deemed) > important public branches. > - Due to the fact that the branches they are pointing into are important > and public, in most cases the origin links *will* point to objects you > actually already have (even if you fetched from someone else). > - The only time you're going to have dangling origin links is when > they were pointing at someone's private branches, in which case it was > not very prudent of the committer to actually record the link in the > first place. But nothing breaks if you don't have his private branch > locally. 
How do I verify (think git-fsck) that what the origin lines refer to are, in fact, commits with the proper relationships? Either they HAVE to be in the repository or the references do not belong in the header. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 0:03 ` A Large Angry SCM @ 2008-09-12 0:13 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 0:13 UTC (permalink / raw) To: A Large Angry SCM; +Cc: Linus Torvalds, Jakub Narebski, git A Large Angry SCM wrote: >Stephen R. van den Berg wrote: >>No it does not. >>The cryptographic seal is calculated over the content of the commit, >>which includes the hashes of all referenced objects, but doesn't include >>the objects themselves. >>The content of the commit is not violated. >The fetch MUST gather the referenced objects ALWAYS or I can't verify >the history. To do otherwise means that ID strings on the origin lines >are nothing more than an arbitrary text tag and not pointer to a >specific history. To fetch, by default, the origin lines *are* nothing more than arbitrary text and not a pointer to a specific history. >How do I verify (think git-fsck) that what the origin lines refer to >are, in fact, commits with the proper relationships? Either they HAVE to >be in the repository or the references do not belong in the header. If the origin hashes are not reachable, then fsck is required to silently skip them, according to spec. If the origin hashes *are* reachable, then fsck is required to verify that they refer to proper commits with a normal history. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
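The fsck rule stated in this reply could look roughly like the sketch below. It is an illustration under stated assumptions, not git code: `check_origin_links` is a hypothetical helper, `objects` stands for whatever lookup tells us the type of a locally present object, and the further check that the target has "a normal history" is left out.

```python
from typing import Dict, List


def check_origin_links(origins: List[str], objects: Dict[str, str]) -> List[str]:
    """objects maps object id -> type for objects present locally.
    Missing targets are skipped silently; present ones must be commits."""
    errors = []
    for sha in origins:
        kind = objects.get(sha)
        if kind is None:
            continue  # target not available: stay silent, as the proposal requires
        if kind != "commit":
            errors.append(f"origin {sha} is a {kind}, not a commit")
    return errors


# Made-up object ids for the three cases.
present_commit, present_blob, missing = "a" * 40, "b" * 40, "c" * 40
store = {present_commit: "commit", present_blob: "blob"}
report = check_origin_links([present_commit, present_blob, missing], store)
print(report)
```

Only the link that resolves to a non-commit is reported; the missing target produces no output at all.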
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 6:22 ` Stephen R. van den Berg 2008-09-11 8:20 ` Jakub Narebski 2008-09-11 12:28 ` A Large Angry SCM @ 2008-09-11 15:39 ` Linus Torvalds 2008-09-11 16:01 ` Paolo Bonzini ` (2 more replies) 2 siblings, 3 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11 15:39 UTC (permalink / raw)
To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>
> >delete of the origin branch will basically make them unreachable.
>
> False.

Stephen, here's a f*cking clue:

 - I know how git works.

> If you fetch just branches A, B and C, but not D, the origin link from A
> to D is dangling. Once you have fetched D as well [..]

So I just said we deleted branch 'D', so there's no way to ever fetch it
again.

Get it?

The fact is, a big part of git is temporary branches. It's one of the
*best* features of git. Throw-away stuff. Those throw-away branches are
often done for initial development, and then the final result is often a
cleaned-up version. Often using rebase or cherry-picking or any number of
things.

And this is why "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE
COMMENT FIELD BY DEFAULT.

(Although you can use "-x" to make it do so for when you actually _want_
to say "cherry-picked from xyzzy")

Can you not understand that? The "origin" field is _garbage_. It's garbage
for all normal cases. The original commit will not ever even EXIST in the
result, because it has long since been thrown away and will never exist
anywhere else.

Garbage should be _avoided_, not added.

Linus
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 15:39 ` Linus Torvalds @ 2008-09-11 16:01 ` Paolo Bonzini 2008-09-11 16:23 ` Linus Torvalds 2008-09-11 16:53 ` Jakub Narebski 2008-09-11 19:23 ` Stephen R. van den Berg 2 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11 16:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git

>> If you fetch just branches A, B and C, but not D, the origin link from A
>> to D is dangling. Once you have fetched D as well [..]
>
> So I just said we deleted branch 'D', so there's no way to ever fetch it
> again.
>
> Get it?

Yes, but you should not have used Stephen's proposed new option to git
cherry-pick, just like you shouldn't have used the existing -x option.
"-x" would not have created a dangling reference, but it would have
created a puzzling commit message.

> The fact is, a big part of git is temporary branches. It's one of the
> *best* features of git. Throw-away stuff. Those throw-away branches are
> often done for initial development, and then the final result is often a
> cleaned-up version. Often using rebase or cherry-picking or any number of
> things.

These days I doubt people would use cherry-pick, they would probably use
interactive rebase. But anyway, exactly for the same reason...

> "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE
> COMMENT FIELD BY DEFAULT.

... neither should cherry-picking create the origin link by default.
Only if requested by the user, using a new option that is basically "-x"
done in a different way. Just like "-x", it should not be used when
cherry-picking from private branches.

But say someone does it, then what happens? If people clone the branch,
the reference will be basically unusable. But since "git gc" does not
delete the referenced commit, at least the origin commit is still
available in the repository where the cherry-pick was made. It is
debatable whether it is better or worse than "-x".

Can we discuss instead a generic way to have porcelain-level metadata,
immutable or at least versioned, for the commit objects? (This is the
same kind of metadata as the author or committer, which clearly have
nothing to do with the git plumbing.) Do you have any proposal of saner
semantics, not for the origin link but for commit references within this
kind of metadata in general?

Paolo
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 16:01 ` Paolo Bonzini @ 2008-09-11 16:23 ` Linus Torvalds 2008-09-11 20:16 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2008-09-11 16:23 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git On Thu, 11 Sep 2008, Paolo Bonzini wrote: > > Yes, but you should not have used Stephen's proposed new option to git > cherry-pick, just like you shouldn't have used the existing -x option. > "-x" would not have created a dangling reference, but it would have > created a puzzling commit message. But my point is, _none_ of what Stephen proposes has _any_ advantage over the already existing functionality. IOW, absolutely *everything* is actually done better with existing data structures, and then just adding tools to perhaps follow those SHA1's in the commit message. The whole "origin" field doesn't have any semantics that make sense for core git. It's basically ignored by all normal git operations, and the _only_ things that people seem to point out as being features are things that can - and obviously in my opinion should - be done by much higher levels. For example, the claim was that it's hard to follow the chain of cherry-picks. That's not _true_. Use gitweb and gitk, and you can already see them. Sure, you need to use "-x", BUT YOU'D HAVE TO USE THAT WITH Steven's MODEL TOO! Exactly because it would be a frigging _disaster_ if that "origin" field was done by default. And the only thing that "origin" does is: - hide the information - make it easier to make mistakes (either enable the feature by default, or not notice that you didn't enable it when you wanted to) - add a requirement for a backwards-incompatible field that is just guaranteed to confuse any old git binaries. - make it _harder_ to do things like send revert/cherry-pick information by email. See? There are only downsides. Look at the kernel -stable trees. 
They explicitly add that cherry-pick information, and can add *more*. For example, they go look at http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.26.y.git;a=commit;h=cb09de4542ad75cc3b66d0cf1a86217bf5633416 and then go to its parent commit (just click on the parent SHA). And notice how the stable kernel tree commits talk about where they were back-ported from, or _why_ they aren't back-ports at all! IOW, there are really two main cases: - the common case for cherry-picking: you do not want any origin information, because it's irrelevant, pointless, and *wrong*. - you _do_ want origin information, but you actually want to _explain_ explicitly why it's not irrelevant, pointless, or wrong. And yes, the latter case is about a lot more than "this was cherry-picked". It's about "this fixes that other commit we did", or it's about "this was anti-cherry-picked - ie reverted". They are all "origins" for the commit in the sense that they are relevant to the commit, but they all need some explanation of what _kind_ of origins they are. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 16:23 ` Linus Torvalds @ 2008-09-11 20:16 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 20:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paolo Bonzini, Jakub Narebski, git Linus Torvalds wrote: >But my point is, _none_ of what Stephen proposes has _any_ advantage over >the already existing functionality. I think you're missing some of the advantages because you don't have a lot of experience with cherry-pick workflows between multiple permanent branches. >IOW, absolutely *everything* is actually done better with existing data >structures, and then just adding tools to perhaps follow those SHA1's in >the commit message. The best way to explain the difference is probably by implementing the free-form support, so I think I'll do that. >For example, the claim was that it's hard to follow the chain of >cherry-picks. That's not _true_. Use gitweb and gitk, and you can already >see them. Sure, you need to use "-x", BUT YOU'D HAVE TO USE THAT WITH >Steven's MODEL TOO! The existing cherry-pick -x option doesn't cut it, it helps for the simple cases, yes, but there are cherry-pick situations where it just adds to the confusion. >Exactly because it would be a frigging _disaster_ if that "origin" field >was done by default. That never was the intention, and never will be happening. >And the only thing that "origin" does is: > - hide the information Only if you want to hide it, you control if it does, this point is moot. > - make it easier to make mistakes (either enable the feature by default, > or not notice that you didn't enable it when you wanted to) The same holds for -x, so this point is moot as well. > - add a requirement for a backwards-incompatible field that is just > guaranteed to confuse any old git binaries. This is a problem, I admit, but maybe this can be solved in the future. 
Then again, since use of the feature is a *very* conscious decision, anyone using the feature can advise their users to use git version xxx at least. > - make it _harder_ to do things like send revert/cherry-pick information > by email. Not necessarily, adding an Origin field in the patch sent by mail is easy. I don't see how it would be more difficult otherwise. Please explain. >See? There are only downsides. I think I just neutralised all but one of the mentioned downsides, and the backward compatibility issue is at least mitigated. >and then go to its parent commit (just click on the parent SHA). And >notice how the stable kernel tree commits talk about where they were >back-ported from, or _why_ they aren't back-ports at all! And this is impossible when using the origin link? The usage with an origin link would be just as flexible, even more so. >IOW, there are really two main cases: > - the common case for cherry-picking: you do not want any origin > information, because it's irrelevant, pointless, and *wrong*. Quite, and my proposal is not generating those anyway. > - you _do_ want origin information, but you actually want to _explain_ > explicitly why it's not irrelevant, pointless, or wrong. >And yes, the latter case is about a lot more than "this was >cherry-picked". It's about "this fixes that other commit we did", or it's >about "this was anti-cherry-picked - ie reverted". They are all "origins" >for the commit in the sense that they are relevant to the commit, but they >all need some explanation of what _kind_ of origins they are. Yes, and that *extra* information can and should go into the free-form commit message, alongside of the origin field inside the header (or trailer), just edit the commit message before committing after a cherry-pick -o. What's your point? -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 15:39 ` Linus Torvalds 2008-09-11 16:01 ` Paolo Bonzini @ 2008-09-11 16:53 ` Jakub Narebski 2008-09-11 19:23 ` Stephen R. van den Berg 2 siblings, 0 replies; 137+ messages in thread From: Jakub Narebski @ 2008-09-11 16:53 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stephen R. van den Berg, git Linus Torvalds wrote: > On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: > >> If you fetch just branches A, B and C, but not D, the origin link from A >> to D is dangling. Once you have fetched D as well [..] > > So I just said we deleted beanch 'D', so there's no way to ever fetch it > again. > > Get it? > > The fact is, a big part of git is temporary branches. It's one of the > *best* features of git. Throw-away stuff. Those throw-away branches are > often done for initial development, and then the final result is often a > cleaned-up version. Often using rebase or cherry-picking or any number of > things. > > And this is why "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE > COMMENT FIELD BY DEFAULT. > > (Although you can use "-x" to make it do so for when you actually _want_ > to say "cherry-picked from xyzzy") And that is why the proposal was to use "-o" option to git-cherry-pick to add 'origin'/'changeset' header, exactly because git-cherry-pick is _abused_ to clean up branches and reorder commits; although I think that "git rebase --interactive" (and patch management interfaces) do replace using git-cherry-pick for that purpose. git-revert would add 'origin'/'changeset' header unconditionally, just like by default it seeds commit message with SHA-1 id of reverted commit. > Can you not understand that? The "origin" field is _garbage_. It's garbage > for all normal cases. The original commit will not ever even EXIST in the > result, because it has long since been thrown away and will never exist > anywhere else. > > Garbage should be _avoided_, not added. Hmmm... 
the difference between having 'origin' in a commit object header and having it in the commit message is like the difference between the 'Signed-off-by:' convention and the 'author' header. The first is a matter of workflow; the second is an inherent, required and unavoidable part of revision information.

On the other hand, git-cherry and git-blame would then have to rely on correctly parsing the free-form part of a commit object to take advantage of 'origin' information, which is what the 'origin' info is for.

P.S. 'generation' header was not added... just saying... :-)

--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 15:39 ` Linus Torvalds 2008-09-11 16:01 ` Paolo Bonzini 2008-09-11 16:53 ` Jakub Narebski @ 2008-09-11 19:23 ` Stephen R. van den Berg 2008-09-11 19:45 ` Nicolas Pitre 2 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 19:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jakub Narebski, git Linus Torvalds wrote: >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote: >> >delete of the origin branch will basically make them unreachable. >> False. >Stephen, here's a f*cking clue: > - I know how git works. I'd presume you do, but that doesn't mean you always accurately express yourself. >> If you fetch just branches A, B and C, but not D, the origin link from A >> to D is dangling. Once you have fetched D as well [..] >So I just said we deleted beanch 'D', so there's no way to ever fetch it >again. You did not state you deleted branch 'D' on the repository being fetched *FROM*. I assumed you meant you deleted branch 'D' on the repository doing the fetching (after having fetched 'D' in the past). >Get it? "You stupid git". >The fact is, a big part of git is temporary branches. It's one of the >*best* features of git. Throw-away stuff. Those throw-away branches are >often done for initial development, and then the final result is often a >cleaned-up version. Often using rebase or cherry-picking or any number of >things. Indeed, features I value in git very much, and use every day, thanks. [...portions of man git-cherry-pick stripped...] >Can you not understand that? The "origin" field is _garbage_. It's garbage >for all normal cases. The original commit will not ever even EXIST in the >result, because it has long since been thrown away and will never exist >anywhere else. The origin field will *not* be created on regular cherry-picks, this *would* create garbage. The origin field is not meant to be generated when doing things with temporary branches. 
The origin field is meant to be filled *ONLY* when cherry-picking from one permanent branch to another permanent branch. This is a *rare* operation. >Garbage should be _avoided_, not added. Quite. I do understand that "normal cases" in your case mean cherry-picks among temporary branches. Well, you are completely right that *your* normal cases should not (and will not) generate an origin field. The origin field is intended for the *abnormal* cases, which means cherry-picking between permanent branches (which, apparently, you rarely do, if ever), this is something that (depending on your workflow) can be a more frequent event. For *those* cases, the origin field will not contain garbage. -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:23 ` Stephen R. van den Berg @ 2008-09-11 19:45 ` Nicolas Pitre 2008-09-11 19:55 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 19:45 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> The origin field will *not* be created on regular cherry-picks, this
> *would* create garbage. The origin field is not meant to be generated
> when doing things with temporary branches. The origin field is meant to
> be filled *ONLY* when cherry-picking from one permanent branch to
> another permanent branch. This is a *rare* operation.

... and therefore you might as well just have a separate file (which might or might not be tracked by git like the .gitignore files are) to keep that information? Since this is a rare operation, modifying the core database structure for this doesn't appear that appealing to most so far.

And, while recording this origin link is optional, you are likely to make mistakes like forgetting to record it, or you might even wish to fix it with better links after the fact. Having it versioned also means that older git versions will be able to carry that information even if they won't make any use of it, and that also solves the cryptographic issue since that data is part of the top commit SHA1.

Nicolas
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:45 ` Nicolas Pitre @ 2008-09-11 19:55 ` Stephen R. van den Berg 2008-09-11 20:27 ` Nicolas Pitre 2008-09-11 21:01 ` Theodore Tso 0 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 19:55 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Jakub Narebski, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> when doing things with temporary branches. The origin field is meant to
>> be filled *ONLY* when cherry-picking from one permanent branch to
>> another permanent branch. This is a *rare* operation.
>... and therefore you might as well just have a separate file (which
>might or might not be tracked by git like the .gitignore files are)
>to keep that information? Since this is a rare operation, modifying the
>core database structure for this doesn't appear that appealing to most
>so far.

For various reasons, the best alternate place would be at the trailing end of the free-form field. Using a separate structure causes (performance) problems (mostly).

>And, while recording this origin link is optional, you are likely to
>make mistakes like forgetting to record it,

That is just as likely as filling in the wrong commit message.

> or you might even wish to
>fix it with better links after the fact.

That is not possible for commit messages, and should not be possible for origin links either (same reasons).

> Having it versioned also
>means that older git versions will be able to carry that information
>even if they won't make any use of it, and that also solves the
>cryptographic issue since that data is part of the top commit SHA1.

It would allow the data to be faked, that is undesirable for "git blame".

--
Sincerely, Stephen R. van den Berg.
"There are three types of people in the world; those who can count, and those who can't."
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:55 ` Stephen R. van den Berg @ 2008-09-11 20:27 ` Nicolas Pitre 2008-09-12 8:50 ` Stephen R. van den Berg 2008-09-11 21:01 ` Theodore Tso 1 sibling, 1 reply; 137+ messages in thread From: Nicolas Pitre @ 2008-09-11 20:27 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> Nicolas Pitre wrote:
> >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> >> when doing things with temporary branches. The origin field is meant to
> >> be filled *ONLY* when cherry-picking from one permanent branch to
> >> another permanent branch. This is a *rare* operation.
> >
> >... and therefore you might as well just have a separate file (which
> >might or might not be tracked by git like the .gitignore files are)
> >to keep that information? Since this is a rare operation, modifying the
> >core database structure for this doesn't appear that appealing to most
> >so far.
>
> For various reasons, the best alternate place would be at the trailing
> end of the free-form field. Using a separate structure causes
> (performance) problems (mostly).

Did you try it? I don't particularly buy this performance argument, and the bulk of my contributions to git so far were about performance. It is quite easy to load a flat file with sorted commit SHA1s, and given that origin links are the result of a rare operation, there shouldn't be too many entries to search through. Hell, doing 213647 lookups (and many other things like inflating zlib deflated data) with each of them for commit objects in my Linux repository which has 1355167 total entries takes only 6 seconds here, or about a quarter of a millisecond for each lookup. I doubt doing an extra lookup in a much smaller table would show on the radar.

Nicolas
^ permalink raw reply [flat|nested] 137+ messages in thread
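The flat-file idea from the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the one-record-per-line format "<commit-sha1> <origin-sha1>" and both function names are hypothetical and do not exist in git.

```python
import bisect

def load_origin_table(text):
    """Parse lines of '<commit-sha1> <origin-sha1>' (hypothetical format)
    into two parallel lists sorted by commit SHA1."""
    rows = sorted(line.split() for line in text.splitlines() if line.strip())
    keys = [commit for commit, _origin in rows]
    values = [origin for _commit, origin in rows]
    return keys, values

def lookup_origin(table, commit):
    """Binary search over the sorted keys; returns the origin SHA1,
    or None when the commit has no entry (the common case)."""
    keys, values = table
    i = bisect.bisect_left(keys, commit)
    if i < len(keys) and keys[i] == commit:
        return values[i]
    return None
```

Because the table only holds commits that actually carry an origin link, a miss costs one binary search over a small list.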
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 20:27 ` Nicolas Pitre @ 2008-09-12 8:50 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 8:50 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, Jakub Narebski, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Nicolas Pitre wrote:
>> >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> >> when doing things with temporary branches. The origin field is meant to
>> >> be filled *ONLY* when cherry-picking from one permanent branch to
>> >> another permanent branch. This is a *rare* operation.
>> >... and therefore you might as well just have a separate file (which
>> >might or might not be tracked by git like the .gitignore files are)
>> >to keep that information? Since this is a rare operation, modifying the
>> >core database structure for this doesn't appear that appealing to most
>> >so far.
>> For various reasons, the best alternate place would be at the trailing
>> end of the free-form field. Using a separate structure causes
>> (performance) problems (mostly).
>Did you try it?

No.

> I don't particularly buy this performance argument, and
>the bulk of my contributions to git so far were about performance. It
>is quite easy to load a flat file with sorted commit SHA1s, and given
>that origin links are the result of a rare operation, then there
>shouldn't be too many entries to search through. Hell, doing 213647

True.

>lookups (and many other things like inflating zlib deflated data) with
>each of them for commit objects in my Linux repository which has 1355167
>total entries takes only 6 seconds here, or about a quarter of a
>millisecond for each lookup. I doubt doing an extra lookup in a much
>smaller table would show on the radar.

Maybe you're right. The reason why my first knee-jerk reaction is "performance problem" is because:
- The field is rarely present.
- When it is used, we look for it on every commit we traverse.
- This means that finding out the field does *not* exist is the most common operation, and that effort rises linearly with the number of commits visited.

Whereas if the information is present in the header or trailer of the commit, finding out that the field does not exist there is rather cheap.

But you could very well be right that the absolute extra time spent might be negligible for all intents and purposes.

Nonetheless, the data-integrity argument still holds, i.e. placing it in the commit (header or trailer) automatically protects it. External files need extra care if you want the same integrity protection.

--
Sincerely, Stephen R. van den Berg.
"Father's Day: Nine months before Mother's Day."
^ permalink raw reply [flat|nested] 137+ messages in thread
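To illustrate why checking a commit header for an absent field is cheap, here is a minimal sketch that scans only the header part of a raw commit object for the proposed origin fields. It assumes the layout from the original RFC ("origin <hash>" optionally followed by a space and a base-10 parent number); no released git writes such a field, and parse_origin_fields is a hypothetical helper.

```python
def parse_origin_fields(raw_commit):
    """Return a list of (sha1, parent_nr) for every proposed 'origin' header.

    Only the header block (everything before the first blank line) is
    scanned, so a commit without origin fields costs a few line checks.
    """
    header, _sep, _message = raw_commit.partition("\n\n")
    origins = []
    for line in header.split("\n"):
        if line.startswith("origin "):
            fields = line.split()
            # The RFC makes the parent number optional; it defaults to 1.
            parent_nr = int(fields[2]) if len(fields) > 2 else 1
            origins.append((fields[1], parent_nr))
    return origins

# Header taken from the example in the RFC; the message body is made up.
EXAMPLE = (
    "tree b83f28279a68439b9b044bccc313bbeaa3e973f5\n"
    "parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b\n"
    "origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd\n"
    "origin a1184d85e8752658f02746982822f43f32316803 2\n"
    "author Junio C Hamano <gitster@pobox.com> 1220132115 -0700\n"
    "committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700\n"
    "\n"
    "Subject line\n"
)
```

Text in the free-form message that merely starts with "origin " is ignored, because the scan stops at the blank line that ends the header.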
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 19:55 ` Stephen R. van den Berg 2008-09-11 20:27 ` Nicolas Pitre @ 2008-09-11 21:01 ` Theodore Tso 2008-09-12 8:40 ` Stephen R. van den Berg 1 sibling, 1 reply; 137+ messages in thread From: Theodore Tso @ 2008-09-11 21:01 UTC (permalink / raw) To: Stephen R. van den Berg Cc: Nicolas Pitre, Linus Torvalds, Jakub Narebski, git

On Thu, Sep 11, 2008 at 09:55:16PM +0200, Stephen R. van den Berg wrote:
> > Having it versioned also
> >means that older git versions will be able to carry that information
> >even if they won't make any use of it, and that also solves the
> >cryptographic issue since that data is part of the top commit SHA1.
>
> It would allow the data to be faked, that is undesirable for "git blame".

Why would this matter? The information is largely self-authenticating. If a commit claims to have come from some other cherry-pick, a human taking a quick look at it would know instantly that this wasn't true. So what's the harm done if some incorrect information gets introduced? "git blame" is something which is generally used by humans, not by automated programs.

Also, what's the attack scenario? The person who originally makes the commit can easily fake the origin link information. They can hack git to fill in some other commit ID, for example. So what you are protecting against is someone after the fact adding the annotation that this commit was related to this other commit. When would this be a bad thing to do? If they are adding correct information, it's a good thing. If they add incorrect information, what's the harm they can do as a result of being able to add the incorrect information?

(Noting that if this annotation file is kept under git control, you can use whatever access controls and/or process controls verify that a new cherry-pick --- or a commit claiming to be a cherry-pick --- is valid and should be accepted into the master git repository for that project.)

- Ted
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 21:01 ` Theodore Tso @ 2008-09-12 8:40 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 8:40 UTC (permalink / raw) To: Theodore Tso; +Cc: Nicolas Pitre, Linus Torvalds, Jakub Narebski, git

Theodore Tso wrote:
>On Thu, Sep 11, 2008 at 09:55:16PM +0200, Stephen R. van den Berg wrote:
>> > Having it versioned also
>> >means that older git versions will be able to carry that information
>> >even if they won't make any use of it, and that also solves the
>> >cryptographic issue since that data is part of the top commit SHA1.
>> It would allow the data to be faked, that is undesirable for "git blame".
>Why would this matter? The information is largely
>self-authenticating. If a commit claims to have come from some other

Attack-wise, you're right, it's not a big deal. I think the comforting feeling one gets about the hashes protecting integrity is what matters more for me here.

--
Sincerely, Stephen R. van den Berg.
"Father's Day: Nine months before Mother's Day."
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 0:23 ` Linus Torvalds 2008-09-10 5:42 ` Stephen R. van den Berg @ 2008-09-10 8:30 ` Paolo Bonzini 2008-09-10 15:32 ` Linus Torvalds 1 sibling, 1 reply; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 8:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git Linus Torvalds wrote: > > On Wed, 10 Sep 2008, Stephen R. van den Berg wrote: >> As you might have noticed, the actual process of pulling/fetching >> explicitly does *not* pull in the objects being pointed to. > > .. which makes them _local_ data, which in turn means that they should not > be in the object database at all. Not really local data. More like _weakly referenced_ data. If it is there, cool. If it is not there, no big deal. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 8:30 ` Paolo Bonzini @ 2008-09-10 15:32 ` Linus Torvalds 2008-09-10 15:37 ` Paolo Bonzini 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2008-09-10 15:32 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git On Wed, 10 Sep 2008, Paolo Bonzini wrote: > > Not really local data. More like _weakly referenced_ data. If it is > there, cool. If it is not there, no big deal. You think it's "cool". I think it is "unreliable, random, and depends on the phase of the moon". My definition of "cool" is a totally different thing. What you describe is the very anti-thesis of cool. If you want unreliable and random, use CVS. Please. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:32 ` Linus Torvalds @ 2008-09-10 15:37 ` Paolo Bonzini 2008-09-10 15:43 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 15:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git Linus Torvalds wrote: > > On Wed, 10 Sep 2008, Paolo Bonzini wrote: >> Not really local data. More like _weakly referenced_ data. If it is >> there, cool. If it is not there, no big deal. > > You think it's "cool". > > I think it is "unreliable, random, and depends on the phase of the moon". I think that shallow clones are not any different from this. If the required piece of history is there, cool. If they're not there, no big deal. I understood the hyperbole, but I think that it's not unreliable, because all it relies on is the uniqueness of SHA1 values. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:37 ` Paolo Bonzini @ 2008-09-10 15:43 ` Linus Torvalds 2008-09-10 15:46 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Linus Torvalds @ 2008-09-10 15:43 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git On Wed, 10 Sep 2008, Paolo Bonzini wrote: > > I think that shallow clones are not any different from this. If the > required piece of history is there, cool. If they're not there, no big > deal. Sure. I don't use them either. But because I don't use them, it doesn't affect me. It also doesn't change the core git data structures in any way to introduce any new problems. Also, if there isn't a required piece of history, things generally break very loudly. IOW, there are only certain things you can do with a shallow repo. In general it's absolutely _not_ a "no big deal" issue, quite the reverse - it's a deal-breaker. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:43 ` Linus Torvalds @ 2008-09-10 15:46 ` Linus Torvalds 2008-09-10 15:57 ` Paolo Bonzini 2008-09-10 16:23 ` Jakub Narebski 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2008-09-10 15:46 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git On Wed, 10 Sep 2008, Linus Torvalds wrote: > > Sure. I don't use them either. But because I don't use them, it doesn't > affect me. It also doesn't change the core git data structures in any way > to introduce any new problems. Btw, so far nobody has even _explained_ what the advantage of the origin link is. It apparently has no effect for most things, and for other things it has some (unspecified) effect when it can be resolved. Apart from the "dotted line" in graphical history viewers, I haven't actually heard any single concrete example of exactly what it would *do*. And that dotted line really does sound like something you could do with just the existing "hyperlink" functionality in the commit message. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:46 ` Linus Torvalds @ 2008-09-10 15:57 ` Paolo Bonzini 2008-09-10 23:15 ` Stephen R. van den Berg 2008-09-10 16:23 ` Jakub Narebski 1 sibling, 1 reply; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 15:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git > Btw, so far nobody has even _explained_ what the advantage of the origin > link is. It apparently has no effect for most things, and for other things > it has some (unspecified) effect when it can be resolved. > > Apart from the "dotted line" in graphical history viewers, I haven't > actually heard any single concrete example of exactly what it would *do*. I mentioned git-cherry as an additional use case. Automatic rename detection works because it might have the occasional false negative, but it has practically no false positive, and those are what screws up merges. But automatic changeset detection a la git-patch-id has too many false negatives to make the current implementation of git-cherry practical, and here's when the origin link comes in. Also, automatic changeset detection does not work with reverts, only with cherry-picks. Blame could also use the origin link to go backwards in the history and find the origin of the code, without being fooled by reverts. I'll quote another message I sent in the thread: >> And why are the notes created by git cherry-pick -x insufficient for that? > > For example, these notes (or the ones created by "git revert") are > *wrong* because they talk about commits instead of changesets (deltas > between two commits). > > Why is only one commit present? Because these messages are meant for > users, not for programs. That's easy to see: users think of commits as > deltas anyway, even though git stores them as snapshots---"git show > HEAD" shows a delta, not a snapshot. > > And what does this mean for programs? 
That they must resort to > commit-message scraping to distinguish the two cases. (*) > > (*) A GUI blame program, for example, would need to distinguish > whether code added by a commit is taken from commit 4329bd8, or is > reverting commit 4329bd8. (In the first case, the author of that > code is whoever was responsible for that code in 4329bd8; in the > second case, it is whoever was responsible for that code in > 4329bd8^). If recording changesets, you see 4329bd8^..4329bd8 in > the first case, and 4329bd8..4329bd8^ in the second, so it is trivial > to follow the chain. > > And scraping is bad. Imagine people that are writing commit messages in > their native language. What if they patch git to translate the magic > notes created by "git cherry-pick -x" or "git revert" (maybe a future > version of git will do that automatically)? Should they translate also > every program that scrapes the messages? > > > Whenever there is a piece of data that could be useful to programs (no > matter if plumbing or porcelain), I consider free form notes to be bad. > Because data is data, and metadata is metadata. > > If there was a generic way to put porcelain-level metadata in commit > messages (e.g. Signed-Off-By and Acknowledged-By can be already > considered metadata), I would not be so much in favor of "origin" links > being part of the commit object's format. Now if you think about it, > commit references within this kind of metadata would have mostly the > properties that Stephen explained in his first message: > > 1) they would be rewritten by git-filter-branch > > 2) these references, albeit weak by default (Note to Linus: reinforcement of your disagreement will be implicitly assumed :-) > could optionally be > followed when fetching (either with command-line or configuration options) > > 3) they would not be pruned by git-gc, unlike notes > > 4) possibly, git rev-list --topo-order would sort commits by taking into > account metadata references too. 
> > So the implementation effort would be roughly the same. > > But, can you think of any other such metadata? Personally I can't, so > while I understand the opposition to a new commit header field that > would be there from here to eternity (or until the LHC starts), I do > think it is the simplest thing that can possibly work. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
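The direction argument in the quoted footnote (4329bd8^..4329bd8 for a cherry-pick, 4329bd8..4329bd8^ for a revert) can be sketched as below. This is only an illustration: classify_changeset is a hypothetical helper working on revision strings, not something git provides, and it only recognizes the simple first-parent "X^" spelling.

```python
def classify_changeset(start, end):
    """Classify a recorded changeset 'start..end'.

    Returns (kind, blame_target), where blame_target is the revision a
    blame-like tool would continue from, following the quoted footnote.
    """
    if start == end + "^":
        # X^..X: the delta introduced by X was re-applied (cherry-pick),
        # so the code is attributed to whoever wrote it in X.
        return ("cherry-pick", end)
    if end == start + "^":
        # X..X^: the delta of X was undone (revert), so the restored code
        # is attributed to whoever was responsible for it in X^.
        return ("revert", end)
    return ("other", None)
```

In both recognized cases the tool continues from the right-hand end of the range, so no commit-message scraping is needed to tell the two apart.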
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:57 ` Paolo Bonzini @ 2008-09-10 23:15 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 23:15 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Linus Torvalds, Jakub Narebski, git

Paolo Bonzini wrote:
>> Btw, so far nobody has even _explained_ what the advantage of the origin
>> link is. It apparently has no effect for most things, and for other things
>> it has some (unspecified) effect when it can be resolved.
>> Apart from the "dotted line" in graphical history viewers, I haven't
>> actually heard any single concrete example of exactly what it would *do*.

It allows one to follow and view the evolution of a patch over time during the various backports.

--
Sincerely, Stephen R. van den Berg.
"Am I paying for this abuse or is it extra?"
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:46 ` Linus Torvalds 2008-09-10 15:57 ` Paolo Bonzini @ 2008-09-10 16:23 ` Jakub Narebski 2008-09-11 23:28 ` Sam Vilain 1 sibling, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-10 16:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paolo Bonzini, Stephen R. van den Berg, git

Linus Torvalds wrote:
> On Wed, 10 Sep 2008, Linus Torvalds wrote:
> >
> > Sure. I don't use them either. But because I don't use them, it doesn't
> > affect me. It also doesn't change the core git data structures in any way
> > to introduce any new problems.
>
> Btw, so far nobody has even _explained_ what the advantage of the origin
> link is. It apparently has no effect for most things, and for other things
> it has some (unspecified) effect when it can be resolved.
>
> Apart from the "dotted line" in graphical history viewers, I haven't
> actually heard any single concrete example of exactly what it would *do*.
>
> And that dotted line really does sound like something you could do with
> just the existing "hyperlink" functionality in the commit message.

As far as I understand (note: I'm neither for, nor against the proposal; although I think it has a slim chance of being accepted, especially soon), it is for graphical history viewers, for git-cherry to make it more precise (to detect duplicated/cherry-picked changes better), and in the future possibly to help history-aware merge strategies. And probably to help patch management interfaces.

On the theoretical front it looks like an extension/generalization of a parent link, marking a given commit as derived not only from some set of trees, or some line of history, but also from some changeset.

--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 16:23 ` Jakub Narebski @ 2008-09-11 23:28 ` Sam Vilain 2008-09-11 23:44 ` Linus Torvalds 0 siblings, 1 reply; 137+ messages in thread From: Sam Vilain @ 2008-09-11 23:28 UTC (permalink / raw) To: Jakub Narebski Cc: Linus Torvalds, Paolo Bonzini, Stephen R. van den Berg, git

Jakub Narebski wrote:
>> And that dotted line really does sound like something you could do with
>> just the existing "hyperlink" functionality in the commit message.
>
> As far as I understand (note: I'm neither for, nor against the proposal;
> although I think it has a slim chance of being accepted, especially soon),
> it is for graphical history viewers, for git-cherry to make it more
> precise (to detect duplicated/cherry-picked changes better), and in
> the future possibly to help history-aware merge strategies. And probably
> help patch management interfaces.

Can I suggest:

1. bury this origin link idea
2. make git-cherry-pick have a similar option to '-x', but instead of recording the original commit ID, record the original *patch* ID, *if* there was a merge conflict for that cherry pick.
3. tools can build indexes from patch ID => (commit IDs) to make this other form of history navigation fast.

Sam
^ permalink raw reply [flat|nested] 137+ messages in thread
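Point 3 above, an index from patch ID to commit IDs, might look roughly like this. It is a minimal sketch, assuming input lines of the form "<patch-id> <commit-id>" (the pair format that `git patch-id` prints); build_patch_index is a hypothetical helper and the short IDs in the usage are invented.

```python
from collections import defaultdict

def build_patch_index(lines):
    """Map each patch ID to the list of commit IDs that carry that patch.

    `lines` is an iterable of '<patch-id> <commit-id>' strings;
    blank lines are skipped.
    """
    index = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        patch_id, commit_id = line.split()
        index[patch_id].append(commit_id)
    return dict(index)
```

A tool could then answer "which commits carry this patch?" with a single dictionary lookup instead of recomputing diffs across long-lived branches.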
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:28 ` Sam Vilain @ 2008-09-11 23:44 ` Linus Torvalds 2008-09-12 2:24 ` Sam Vilain 2008-09-12 5:47 ` Stephen R. van den Berg 0 siblings, 2 replies; 137+ messages in thread From: Linus Torvalds @ 2008-09-11 23:44 UTC (permalink / raw) To: Sam Vilain; +Cc: Jakub Narebski, Paolo Bonzini, Stephen R. van den Berg, git On Fri, 12 Sep 2008, Sam Vilain wrote: > > 2. make git-cherry-pick have a similar option to '-x', but instead of > recording the original commit ID, record the original *patch* ID, > *if* there was a merge conflict for that cherry pick. Actually, don't make it dependent on merge conflicts. Just make it depend on whether the patch ID is _different_. It can happen even without any conflicts, just because the context changed. So it really isn't about merge conflicts per se, just the fact that a patch can change when it is applied in a new area with a three-way diff - or because it got applied with fuzz. You could add it as a Original-patch-id: <sha1> or something. And then you just need to teach "git cherry/rebase" to take both the original ID and the new one into account when deciding whether it has already seen that patch. Linus ^ permalink raw reply [flat|nested] 137+ messages in thread
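The suggestion above, having "git cherry/rebase" consider both the original and the new patch ID, could be sketched like this. It is only an illustration under assumptions: the "Original-patch-id:" trailer is the hypothetical one floated in the message, and both helper names are invented for the example.

```python
def patch_ids_for(message, current_patch_id):
    """Collect the commit's own patch ID plus any IDs recorded in a
    hypothetical 'Original-patch-id: <sha1>' line of its message."""
    ids = {current_patch_id}
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "original-patch-id":
            ids.add(value.strip())
    return ids

def already_applied(upstream_patch_ids, message, current_patch_id):
    """True if upstream already contains this patch, under either its
    current patch ID or the recorded original one."""
    candidates = patch_ids_for(message, current_patch_id)
    return not upstream_patch_ids.isdisjoint(candidates)
```

With this, a cherry-pick whose patch ID changed because of context or fuzz would still be recognized as already present upstream.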
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:44 ` Linus Torvalds @ 2008-09-12 2:24 ` Sam Vilain 2008-09-12 5:47 ` Stephen R. van den Berg 1 sibling, 0 replies; 137+ messages in thread From: Sam Vilain @ 2008-09-12 2:24 UTC (permalink / raw) To: Linus Torvalds Cc: Jakub Narebski, Paolo Bonzini, Stephen R. van den Berg, git Linus Torvalds wrote: > > On Fri, 12 Sep 2008, Sam Vilain wrote: >> 2. make git-cherry-pick have a similar option to '-x', but instead of >> recording the original commit ID, record the original *patch* ID, >> *if* there was a merge conflict for that cherry pick. > > Actually, don't make it dependent on merge conflicts. Just make it depend > on whether the patch ID is _different_. > > It can happen even without any conflicts, just because the context > changed. So it really isn't about merge conflicts per se, just the fact > that a patch can change when it is applied in a new area with a three-way > diff - or because it got applied with fuzz. > > You could add it as a > > Original-patch-id: <sha1> > > or something. And then you just need to teach "git cherry/rebase" to take > both the original ID and the new one into account when deciding whether it > has already seen that patch. Yes, right - it's the patch ID changing that's the problem for git-cherry / rev-list --cherry-pick to be able to spot changes as the 'same'. Someone else pointed out that git-rebase -i might want to have this as well. I actually looked into coding this, but there was a little problem with the way git-revert worked - it builds the commit message before the diff is calculated. So there would probably need to be a little trivial refactoring first before this can be implemented. Sam. ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 23:44 ` Linus Torvalds 2008-09-12 2:24 ` Sam Vilain @ 2008-09-12 5:47 ` Stephen R. van den Berg 2008-09-12 6:19 ` Rogan Dawes 2008-09-12 14:58 ` Theodore Tso 1 sibling, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 5:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Sam Vilain, Jakub Narebski, Paolo Bonzini, git Linus Torvalds wrote: >On Fri, 12 Sep 2008, Sam Vilain wrote: >It can happen even without any conflicts, just because the context >changed. So it really isn't about merge conflicts per se, just the fact >that a patch can change when it is applied in a new area with a three-way >diff - or because it got applied with fuzz. Quite. >You could add it as a > Original-patch-id: <sha1> That will probably work fine when operating locally on (short) temporary branches. It would probably become computationally prohibitive to use it between long lived permanent branches. In that case it would need to be augmented by the sha1 of the originating commit. Which gives you two hashes as reference, and in that case you might as well use the two commit hashes of which the difference yields the patch. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 5:47 ` Stephen R. van den Berg @ 2008-09-12 6:19 ` Rogan Dawes 2008-09-12 6:56 ` Stephen R. van den Berg 2008-09-12 14:58 ` Theodore Tso 1 sibling, 1 reply; 137+ messages in thread From: Rogan Dawes @ 2008-09-12 6:19 UTC (permalink / raw) To: Stephen R. van den Berg Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git Stephen R. van den Berg wrote: > Linus Torvalds wrote: >> On Fri, 12 Sep 2008, Sam Vilain wrote: >> It can happen even without any conflicts, just because the context >> changed. So it really isn't about merge conflicts per se, just the fact >> that a patch can change when it is applied in a new area with a three-way >> diff - or because it got applied with fuzz. > > Quite. > >> You could add it as a > >> Original-patch-id: <sha1> > > That will probably work fine when operating locally on (short) temporary > branches. > > It would probably become computationally prohibitive to use it between > long lived permanent branches. In that case it would need to be > augmented by the sha1 of the originating commit. Which gives you two > hashes as reference, and in that case you might as well use the two > commit hashes of which the difference yields the patch. Pardon my confusion, but why include two commit hashes? Surely the commit already has its parent, so there is no need to include that in your "cherry pick". And if the commit has more than one parent, then I doubt you could/should really cherry-pick it anyway. Besides, you could always augment your local repo with a mapping of patch ids to commits/commit pairs to reduce lookup time. Rogan ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 6:19 ` Rogan Dawes @ 2008-09-12 6:56 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 6:56 UTC (permalink / raw) To: Rogan Dawes Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git Rogan Dawes wrote: >Stephen R. van den Berg wrote: >>It would probably become computationally prohibitive to use it between >>long lived permanent branches. In that case it would need to be >>augmented by the sha1 of the originating commit. Which gives you two >>hashes as reference, and in that case you might as well use the two >>commit hashes of which the difference yields the patch. >Pardon my confusion, but why include two commit hashes? Surely the >commit already has its parent, so there is no need to include that in >your "cherry pick". And if the commit has more than one parent, then I >doubt you could/should really cherry-pick it anyway. Well, actually, sometimes cherry-pick does pick just one of the (multiple) parents to diff with; also, some people (not I) envisioned using two commits which were not a direct parent and child of one another (I'm not quite sure how that would work, but the model would support it). >Besides, you could always augment your local repo with a mapping of >patch ids to commits/commit pairs to reduce lookup time. Yes, possible. But then after cloning, this mapping-cache needs to be recreated, and that would mean that one would have to walk through all commits and calculate all patch-id's, of which then only those few which are referenced need to be stored. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 5:47 ` Stephen R. van den Berg 2008-09-12 6:19 ` Rogan Dawes @ 2008-09-12 14:58 ` Theodore Tso 2008-09-12 15:05 ` Paolo Bonzini ` (2 more replies) 1 sibling, 3 replies; 137+ messages in thread From: Theodore Tso @ 2008-09-12 14:58 UTC (permalink / raw) To: Stephen R. van den Berg Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote: > >You could add it as a > > > Original-patch-id: <sha1> > > That will probably work fine when operating locally on (short) temporary > branches. > > It would probably become computationally prohibitive to use it between > long lived permanent branches. In that case it would need to be > augmented by the sha1 of the originating commit. Nope, as Sam suggested in his original message (but which got clipped by Linus when he was replying) all you have to do is to have a separate local database which ties commits and patch-id's together as a cache/index. I know you seem to be resistant to caches, but caches are **good** because they are local information, which by definition can be implementation-dependent; you can always generate the cache from the git repository if for some reason you need to extend it. It also means that if it turns out you need to index relationships a different way, you can do that without having to make fundamental (incompatible) changes in the git object. It's much like SQL databases; you have your database tables, where making changes to the database schema is painful --- and indexes, which can be added and dropped with much less effort. Think of these local caches as database indexes. Just because you need an index in a particular direction to optimize a query or lookup operation does ***not*** imply that you need to make a fundamental, globally visible, database schema change or git object layout which breaks compatibility for everybody. 
- Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 14:58 ` Theodore Tso @ 2008-09-12 15:05 ` Paolo Bonzini 2008-09-12 15:11 ` Jakub Narebski 2008-09-12 15:54 ` Stephen R. van den Berg 2 siblings, 0 replies; 137+ messages in thread From: Paolo Bonzini @ 2008-09-12 15:05 UTC (permalink / raw) To: Theodore Tso Cc: Stephen R. van den Berg, Linus Torvalds, Sam Vilain, Jakub Narebski, git Theodore Tso wrote: > On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote: >>> You could add it as a >>> Original-patch-id: <sha1> >> That will probably work fine when operating locally on (short) temporary >> branches. >> >> It would probably become computationally prohibitive to use it between >> long lived permanent branches. In that case it would need to be >> augmented by the sha1 of the originating commit. > > Nope, as Sam suggested in his original message (but which got clipped > by Linus when he was replying) all you have to do is to have a > separate local database which ties commits and patch-id's together as > a cache/index. Yeah, I must admit I am okay with *this* cache. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 14:58 ` Theodore Tso 2008-09-12 15:05 ` Paolo Bonzini @ 2008-09-12 15:11 ` Jakub Narebski 2008-09-12 15:40 ` Paolo Bonzini 2008-09-12 15:54 ` Stephen R. van den Berg 2 siblings, 1 reply; 137+ messages in thread From: Jakub Narebski @ 2008-09-12 15:11 UTC (permalink / raw) To: Theodore Tso Cc: Stephen R. van den Berg, Linus Torvalds, Sam Vilain, Paolo Bonzini, git Theodore Tso wrote: > On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote: >>> >>>You could add it as a >>> >>> Original-patch-id: <sha1> >> >> That will probably work fine when operating locally on (short) temporary >> branches. >> >> It would probably become computationally prohibitive to use it between >> long lived permanent branches. In that case it would need to be >> augmented by the sha1 of the originating commit. > > Nope, as Sam suggested in his original message (but which got clipped > by Linus when he was replying) all you have to do is to have a > separate local database which ties commits and patch-id's together as > a cache/index. > > I know you seem to be resistent to caches, but caches are **good** > because they are local information, which by definition can be > implementation-dependent; you can always generate the cache from the > git repository if for some reason you need to extend it. [...] But it is not true that "you can always generate the cache from the git repository" in this case; the patch-id that is to be saved is _original_ patch-id of cherry-picked (or reverted) changeset. OTOH it is not much different from reflog information, which also cannot be regenerated from object database. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 15:11 ` Jakub Narebski @ 2008-09-12 15:40 ` Paolo Bonzini 2008-09-12 16:00 ` Theodore Tso 0 siblings, 1 reply; 137+ messages in thread From: Paolo Bonzini @ 2008-09-12 15:40 UTC (permalink / raw) To: Jakub Narebski Cc: Theodore Tso, Stephen R. van den Berg, Linus Torvalds, Sam Vilain, git >> I know you seem to be resistent to caches, but caches are **good** >> because they are local information, which by definition can be >> implementation-dependent; you can always generate the cache from the >> git repository if for some reason you need to extend it. [...] > > But it is not true that "you can always generate the cache from the > git repository" in this case; the patch-id that is to be saved is > _original_ patch-id of cherry-picked (or reverted) changeset. He's proposing storing the original patch id in the commit message, and caching the commit SHA->patch id association on the side. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 15:40 ` Paolo Bonzini @ 2008-09-12 16:00 ` Theodore Tso 0 siblings, 0 replies; 137+ messages in thread From: Theodore Tso @ 2008-09-12 16:00 UTC (permalink / raw) To: Paolo Bonzini Cc: Jakub Narebski, Stephen R. van den Berg, Linus Torvalds, Sam Vilain, git On Fri, Sep 12, 2008 at 05:40:26PM +0200, Paolo Bonzini wrote: > > But it is not true that "you can always generate the cache from the > > git repository" in this case; the patch-id that is to be saved is > > _original_ patch-id of cherry-picked (or reverted) changeset. > > He's proposing storing the original patch id in the commit message, and > caching the commit SHA->patch id association on the side. > Actually it's the association in the other direction which you'd want to cache. It's fast given the commit SHA to dig the original patch id out of the commit message. What is harder is given a patch id X, to find all of the commits which either (a) have a patch id of X, or (b) have a commit message indicating that the original patch-id was X. So having a database which caches this information, so given a patch-id, you can quickly look up the related commits, is what I believe Sam was proposing, and which I think would solve the problem quite nicely. - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
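The reverse lookup described above could be sketched as an in-memory index along these lines. This is only an illustration under assumptions: the `Original-patch-id:` trailer is the hypothetical one from earlier in the thread, `build_reverse_index` is a made-up name, and a real cache would need on-disk storage and invalidation that the sketch ignores.

```python
import re
from collections import defaultdict

# Hypothetical trailer recorded in the commit message by a cherry-pick.
_ORIGINAL_RE = re.compile(r"^Original-patch-id:\s*([0-9a-f]{40})\s*$", re.MULTILINE)


def build_reverse_index(commits):
    """Map patch ID -> set of commit IDs.

    commits: iterable of (commit_id, patch_id, message) triples. How the
    triples are produced is left open; `git log -p | git patch-id` prints
    "<patch-id> <commit-id>" pairs that could supply the first two fields.
    """
    index = defaultdict(set)
    for commit_id, patch_id, message in commits:
        # Case (a): the commit's own patch ID.
        index[patch_id].add(commit_id)
        # Case (b): an original patch ID recorded in the message.
        for original in _ORIGINAL_RE.findall(message):
            index[original].add(commit_id)
    return index
```

Because everything in the index is derived from commit contents and messages, it can be thrown away and rebuilt locally, which is the property argued for in the message above.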
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 14:58 ` Theodore Tso 2008-09-12 15:05 ` Paolo Bonzini 2008-09-12 15:11 ` Jakub Narebski @ 2008-09-12 15:54 ` Stephen R. van den Berg 2008-09-12 16:19 ` Jeff King 2008-09-15 12:21 ` Sam Vilain 2 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 15:54 UTC (permalink / raw) To: Theodore Tso Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git Theodore Tso wrote: >On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote: >> It would probably become computationally prohibitive to use it between >> long lived permanent branches. In that case it would need to be >> augmented by the sha1 of the originating commit. >Nope, as Sam suggested in his original message (but which got clipped >by Linus when he was replying) all you have to do is to have a >separate local database which ties commits and patch-id's together as >a cache/index. True. But repopulating this cache after cloning means that you have to calculate the patch-id of *every* commit in the repository. It sounds like something to avoid, but maybe I'm overly concerned, I have only a vague idea on how computationally intensive this is. >I know you seem to be resistant to caches, but caches are **good** >because they are local information, which by definition can be >implementation-dependent; you can always generate the cache from the >git repository if for some reason you need to extend it. It also >means that if it turns out you need to index relationships a different >way, you can do that without having to make fundamental (incompatible) >changes in the git object. I fully agree that caches are good. And yes I seem to resist the idea to create a cache at every whim, but that mostly is because I want to avoid everyone inventing their own mini-database for each and every data access they want to accelerate. 
I mean, ideally, any database/index/accelerator structure you'd need can reuse the SHA1 object database index, or maybe one or two other semi-standard index types, and git would provide suitable library functions for all three solutions. And if that would be the case, I'll gladly throw in an extra cache or index at any time to speed up the particular access pattern I'm trying to make usable. But as far as I can see, those library functions have not materialised yet, so I'm hesitant to create yet another private database structure just for my access patterns; and simply pulling in libdb or sqlite without agreement that those libs are (re)used in a lot of places in git seems a bit bloat-prone. >local caches are database indexes. Just because you need an index in >a particular direction to optimize a query or lookup operation does >***not*** imply that you need to make a fundamental, globally visible, >database schema change or git object layout which breaks compatibility >for everybody. It's not a certainty that changing the git object layout has to break compatibility (it should be reasonably possible to add columns to the schema without breaking anything, to stay with the database paradigm), but I agree that creating another index can be considered better than extending the schema. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 15:54 ` Stephen R. van den Berg @ 2008-09-12 16:19 ` Jeff King 2008-09-12 16:43 ` Stephen R. van den Berg 2008-09-15 12:21 ` Sam Vilain 1 sibling, 1 reply; 137+ messages in thread From: Jeff King @ 2008-09-12 16:19 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: git On Fri, Sep 12, 2008 at 05:54:27PM +0200, Stephen R. van den Berg wrote: > True. But repopulating this cache after cloning means that you have to > calculate the patch-id of *every* commit in the repository. It sounds > like something to avoid, but maybe I'm overly concerned, I have only a > vague idea on how computationally intensive this is. For a rough estimate, try: time git log -p | git patch-id >/dev/null -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 16:19 ` Jeff King @ 2008-09-12 16:43 ` Stephen R. van den Berg 2008-09-12 18:44 ` Theodore Tso 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 16:43 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King wrote: >On Fri, Sep 12, 2008 at 05:54:27PM +0200, Stephen R. van den Berg wrote: >> True. But repopulating this cache after cloning means that you have to >> calculate the patch-id of *every* commit in the repository. It sounds >> like something to avoid, but maybe I'm overly concerned, I have only a >> vague idea on how computationally intensive this is. >For a rough estimate, try: > time git log -p | git patch-id >/dev/null On my system that results in 2ms per commit on average. Not huge, but not small either, I guess. Running it results in real waiting time, it all depends on how patient the user is. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
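To put the 2 ms figure in perspective, the arithmetic is simply commits times cost per commit. The repository sizes below are illustrative assumptions, not measurements from the thread, and `rebuild_seconds` is a name made up for this sketch.

```python
def rebuild_seconds(commit_count, ms_per_commit=2.0):
    """Estimated wall-clock seconds to compute a patch ID for every commit."""
    return commit_count * ms_per_commit / 1000.0


# Illustrative repository sizes only; the 2 ms default is the figure quoted above.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} commits: about {rebuild_seconds(n):.0f} s")
```

At that rate a full rebuild stays in the range of seconds for small histories and minutes for very large ones, which is why the question becomes one of user patience rather than feasibility.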
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 16:43 ` Stephen R. van den Berg @ 2008-09-12 18:44 ` Theodore Tso 2008-09-12 20:56 ` Stephen R. van den Berg 0 siblings, 1 reply; 137+ messages in thread From: Theodore Tso @ 2008-09-12 18:44 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Jeff King, git On Fri, Sep 12, 2008 at 06:43:48PM +0200, Stephen R. van den Berg wrote: > >> True. But repopulating this cache after cloning means that you have to > >> calculate the patch-id of *every* commit in the repository. It sounds > >> like something to avoid, but maybe I'm overly concerned, I have only a > >> vague idea on how computationally intensive this is. > > >For a rough estimate, try: > > > time git log -p | git patch-id >/dev/null > > On my system that results in 2ms per commit on average. Not huge, but > not small either, I guess. Running it results in real waiting time, it > all depends on how patient the user is. For a local clone, git could be taught to copy the cache file. For a network-based clone, the percentage of time needed to download is roughly 2-3 times that (although that will obviously depend on your network connectivity). Building this cache can be done in the background, though, or delayed until the first time the cache is needed. - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 18:44 ` Theodore Tso @ 2008-09-12 20:56 ` Stephen R. van den Berg 0 siblings, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-12 20:56 UTC (permalink / raw) To: Theodore Tso; +Cc: Jeff King, git Theodore Tso wrote: >On Fri, Sep 12, 2008 at 06:43:48PM +0200, Stephen R. van den Berg wrote: >> On my system that results in 2ms per commit on average. Not huge, but >> not small either, I guess. Running it results in real waiting time, it >> all depends on how patient the user is. >For a local clone, git could be taught to copy the cache file. For a >network-based clone, the percentage of time needed to download is >roughly 2-3 times that (although that will obviously depend on your >network connectivity). Building this cache can be done in the >background, though, or delayed until the first time the cache is >needed. Fair enough. If no one beats me to it, I'll probably take a stab at implementing something like this and see how it fares for my own application. -- Sincerely, Stephen R. van den Berg. "Father's Day: Nine months before Mother's Day." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-12 15:54 ` Stephen R. van den Berg 2008-09-12 16:19 ` Jeff King @ 2008-09-15 12:21 ` Sam Vilain 1 sibling, 0 replies; 137+ messages in thread From: Sam Vilain @ 2008-09-15 12:21 UTC (permalink / raw) To: Stephen R. van den Berg Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Paolo Bonzini, git On Fri, 2008-09-12 at 17:54 +0200, Stephen R. van den Berg wrote: > Theodore Tso wrote: > >Nope, as Sam suggested in his original message (but which got clipped > >by Linus when he was replying) all you have to do is to have a > >separate local database which ties commits and patch-id's together as > >a cache/index. > > True. But repopulating this cache after cloning means that you have to > calculate the patch-id of *every* commit in the repository. It sounds > like something to avoid, but maybe I'm overly concerned, I have only a > vague idea on how computationally intensive this is. You don't necessarily need to do that. If the tool decides that the sha1 it finds in the message is a patch-id reference, well it can just start hunting around, caching the patch-ids it calculates as it finds them, until it either finds one that matches, or determines you don't have it. You can probably find it first try just based on the author name and date 90% of the time anyway. Maybe the machinery could be adequately tilted such that if someone is really desperate to make sure they are found quickly they can put the information at refs/patches/PATCHID/COMMITID, but that sounds a bit abusive. Sam. ^ permalink raw reply [flat|nested] 137+ messages in thread
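The "hunt around and cache as you go" approach described above might look roughly like this. It is a sketch under assumptions: `find_by_patch_id` and its parameters are hypothetical, the expensive patch-ID computation is passed in as a callable rather than tied to any git command, and the ordering of candidates (for instance by author name and date) is left to the caller.

```python
def find_by_patch_id(wanted, candidates, compute_patch_id, cache):
    """Search candidates lazily for a commit whose patch ID equals `wanted`.

    candidates: commit IDs, ideally ordered most-likely-first.
    compute_patch_id: callable commit_id -> patch ID (the expensive step).
    cache: dict commit_id -> patch ID, filled in as a side effect.
    """
    for commit_id in candidates:
        if commit_id not in cache:
            # Only pay for commits we actually visit; remember the result.
            cache[commit_id] = compute_patch_id(commit_id)
        if cache[commit_id] == wanted:
            return commit_id
    return None  # not found among the commits we have
```

If the first candidate is usually the right one, most lookups cost a single patch-ID computation, and repeated lookups reuse whatever earlier searches already cached.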
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 23:35 ` Linus Torvalds 2008-09-09 23:58 ` Stephen R. van den Berg @ 2008-09-09 23:59 ` Jakub Narebski 1 sibling, 0 replies; 137+ messages in thread From: Jakub Narebski @ 2008-09-09 23:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: Stephen R. van den Berg, git Linus Torvalds wrote: > On Tue, 9 Sep 2008, Stephen R. van den Berg wrote: >> Jakub Narebski wrote: >>>"Stephen R. van den Berg" <srb@cuci.nl> writes: >>>> The definition of the origin field reads as follows: >> >>>> - There can be an arbitrary number of origin fields per commit. >>>> Typically there is going to be at most one origin field per commit. >> >>> I understand that multiple origin fields occur if you do a squash >>> merge, or if you cherry-pick multiple commits into single commit. >>> For example: >>> $ git cherry-pick -n <a1> >>> $ git cherry-pick <a2> >>> $ git commit --amend #; to correct commit message >> >> Correct. > > Quite frankly, recording the origins for _any_ of the above sounds like a > horrible mistake. Actually the above is _not_ a good example of using 'origin', or of why to use 'origin'; it is just a somewhat convoluted example of multiple 'origin' headers. > All those operations are commonly used (along with "git rebase -i") to > clean up history in order to show a nicer version. > > The whole point of "origin" seems to be to _destroy_ that. If I understand correctly the point is to record those 'origin' headers for git-revert (when 'origin'-ed commit is somewhere in the history), and for git-cherry-pick from other long lived branch and thus require additional option to git-cherry-pick to record 'origin' (denoting that this is a "true" cherry-pick, and not reordering of commits and cleaning up a history, better done with interactive rebase). /me is playing advocatus diaboli here, 'cause I'm not that convinced of the necessity of this feature. 
-- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg ` (2 preceding siblings ...) 2008-09-09 15:44 ` Jakub Narebski @ 2008-09-09 21:13 ` Petr Baudis 2008-09-09 22:56 ` Stephen R. van den Berg ` (2 more replies) 2008-09-10 20:32 ` [RFC] origin link for cherry-pick and revert Miklos Vajna 4 siblings, 3 replies; 137+ messages in thread From: Petr Baudis @ 2008-09-09 21:13 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: git On Tue, Sep 09, 2008 at 03:22:12PM +0200, Stephen R. van den Berg wrote: > - During fetch/push/pull the full commit including the origin fields is > transmitted, however, the objects the origin links are referring to > are not (unless they are being transmitted because of other reasons). > > - When fetching/pulling it is optionally possible to tell git to > actually transmit objects referred to by origin links even if it would > otherwise not have done so. I think this is misguided. In general case, cherrypicks can be from completely unrelated histories, and if you are doing the cherry pick, you are saying that actually, the history *does not matter*. In that case, this kind of link tries to impose a meaning where there is none, and in an ill-defined way when whether the commit is actually around anywhere is essentially random. Why do you actually *follow* the origin link at all anyway? Without its parents, the associated tree etc., the object is essentially useless for you; the authorship information and commit message should've been preserved by a proper cherry-pick anyway. You're cluttering the object store with invalid objects, which also breaks quite some fundamental logic within Git (which assumes that if an object exists, all its references are valid - give or take few special cases like shallow repositories, but this would have very different characteristics). 
Having history browsers draw fancy lines is fine but I see nothing wrong with them extracting this from the free-form part of the commit message. For informative purposes, we don't shy away from heuristics anyway, c.f. our renames detection (heck, we are even brave enough to use that for merges). -- Petr "Pasky" Baudis The next generation of interesting software will be done on the Macintosh, not the IBM PC. -- Bill Gates ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 21:13 ` Petr Baudis @ 2008-09-09 22:56 ` Stephen R. van den Berg 2008-09-09 23:05 ` Petr Baudis 2008-09-10 12:21 ` [RFC] origin link for cherry-pick and revert Theodore Tso 2008-09-10 8:45 ` Paolo Bonzini 2008-09-23 13:51 ` Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert) Peter Krefting 2 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-09 22:56 UTC (permalink / raw) To: Petr Baudis; +Cc: git Petr Baudis wrote: >On Tue, Sep 09, 2008 at 03:22:12PM +0200, Stephen R. van den Berg wrote: >> - During fetch/push/pull the full commit including the origin fields is >> transmitted, however, the objects the origin links are referring to >> are not (unless they are being transmitted because of other reasons). >> - When fetching/pulling it is optionally possible to tell git to >> actually transmit objects referred to by origin links even if it would >> otherwise not have done so. >I think this is misguided. In general case, cherrypicks can be from >completely unrelated histories, and if you are doing the cherry pick, >you are saying that actually, the history *does not matter*. In that That is a false assumption in general, I'd say. >case, this kind of link tries to impose a meaning where there is none, >and in an ill-defined way when whether the commit is actually around >anywhere is essentially random. The purpose I'd use the origin links for is to manage software projects that consist of 7 main branches which have branched in (on average) two year intervals, which never get merged anymore. The only thing that happens is that there are backports amongst the branches about two per week. The only way to perform the backports is by using cherry-pick. The history of each backport *is* important though. 
Since all the developers who care about the multiple release branches have all the relevant branches in their repository, the presence of a origin object is by no means random, it's a certainty. >Why do you actually *follow* the origin link at all anyway? Without its >parents, the associated tree etc., the object is essentially useless for >you; the authorship information and commit message should've been >preserved by a proper cherry-pick anyway. You're cluttering the object >store with invalid objects, which also breaks quite some fundamental >logic within Git (which assumes that if an object exists, all its >references are valid - give or take few special cases like shallow >repositories, but this would have very different characteristics). I'd prefer to formalise the (weak) relationship of an origin link, instead of relying on vague assumptions when parsing the free-form commit message and then guessing what the mentioned hash might mean. >Having history browsers draw fancy lines is fine but I see nothing wrong >with them extracting this from the free-form part of the commit message. >For informative purposes, we don't shy away from heuristics anyway, c.f. >our renames detection (heck, we are even brave enough to use that for >merges). It's not just that. If I make a change to an area that was cherrypicked from another branch, then I find it rather important to check if any changes to this area need to be backported/forwardported to the branches the origin links are pointing to. I.e. the origin link allows me to improve my efficiency as a programmer. -- Sincerely, Stephen R. van den Berg. "Be spontaneous!" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 22:56 ` Stephen R. van den Berg @ 2008-09-09 23:05 ` Petr Baudis 2008-09-09 23:32 ` Stephen R. van den Berg 2008-09-10 9:35 ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini 2008-09-10 12:21 ` [RFC] origin link for cherry-pick and revert Theodore Tso 1 sibling, 2 replies; 137+ messages in thread From: Petr Baudis @ 2008-09-09 23:05 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: git On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote: > The only way to perform the backports is by using cherry-pick. > The history of each backport *is* important though. > Since all the developers who care about the multiple release branches > have all the relevant branches in their repository, the presence of > a origin object is by no means random, it's a certainty. Recording cherry-picks in your workflow certainly makes sense, but I'm not talking about workflow-level issues here. You are adding an extra header to the commit object. I'm talking about the object database and low-level Git model implications this has. In other words, I think this is purely a porcelain matter and recording this information in the free-form area is more than enough. > >Why do you actually *follow* the origin link at all anyway? Without its > >parents, the associated tree etc., the object is essentially useless for > >you; the authorship information and commit message should've been > >preserved by a proper cherry-pick anyway. You're cluttering the object > >store with invalid objects, which also breaks quite some fundamental > >logic within Git (which assumes that if an object exists, all its > >references are valid - give or take few special cases like shallow > >repositories, but this would have very different characteristics). 
> > I'd prefer to formalise the (weak) relationship of an origin link, instead of > relying on vague assumptions when parsing the free-form commit message > and then guessing what the mentioned hash might mean. Why? > >Having history browsers draw fancy lines is fine but I see nothing wrong > >with them extracting this from the free-form part of the commit message. > >For informative purposes, we don't shy away from heuristics anyway, c.f. > >our renames detection (heck, we are even brave enough to use that for > >merges). > > It's not just that. If I make a change to an area that was cherrypicked > from another branch, then I find it rather important to check if any > changes to this area need to be backported/forwardported to the branches > the origin links are pointing to. > I.e. the origin link allows me to improve my efficiency as a programmer. And why are the notes created by git cherry-pick -x insufficient for that? -- Petr "Pasky" Baudis The next generation of interesting software will be done on the Macintosh, not the IBM PC. -- Bill Gates ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-09 23:05 ` Petr Baudis @ 2008-09-09 23:32 ` Stephen R. van den Berg 2008-09-10 9:35 ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini 1 sibling, 0 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-09 23:32 UTC (permalink / raw) To: Petr Baudis; +Cc: git Petr Baudis wrote: >On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote: >In other way, I think this is purely a porcelain matter and recording >this information in the free-form area is more than enough. >> I'd prefer to formalise the (weak) relationship of an origin link, instead of >> relying on vague assumptions when parsing the free-form commit message >> and then guessing what the mentioned hash might mean. >Why? Using special references in the free-form area of a commit is akin to using X-... headerfields in E-mail with all the assorted mess: - No strict definition of what it means. - Diverging porcelain implementations making use of the field in ever so slightly changing ways over the years. - You cannot rely on the field being always available. - Automated "renumbering" becomes difficult at best. What we want are concise and unambiguous definitions which allow us to build tools that operate predictably on them now, and will operate predictably on them in the future. >> >Having history browsers draw fancy lines is fine but I see nothing wrong >> It's not just that. If I make a change to an area that was cherrypicked >> from another branch, then I find it rather important to check if any >> changes to this area need to be backported/forwardported to the branches >> the origin links are pointing to. >> I.e. the origin link allows me to improve my efficiency as a programmer. >And why are the notes created by git cherry-pick -x insufficient for that? 
Things like rebase/filter-branch/stgit mess that up because they don't know if the hash in the free-form should be altered.

Also, there is no automated way to actually fetch missing branches we cherry-picked from this way.
--
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"
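The renumbering argument above can be sketched as follows. This is only an illustration, assuming the `origin <hash> [parentnr]` header layout proposed at the top of this thread; the function name `rewrite_origin_fields` and the old-to-new hash map are hypothetical and not part of git.

```python
def rewrite_origin_fields(header_lines, rewritten):
    """Remap proposed 'origin <hash> [parentnr]' header lines.

    header_lines: commit header lines, one string per line.
    rewritten:    dict mapping old commit hashes to new ones, as a
                  history-rewriting tool might produce (assumption).
    """
    out = []
    for line in header_lines:
        parts = line.split(" ")
        # Only the dedicated field is touched; no guessing is needed
        # about whether a hex string elsewhere refers to a commit.
        if len(parts) >= 2 and parts[0] == "origin" and parts[1] in rewritten:
            parts[1] = rewritten[parts[1]]
        out.append(" ".join(parts))
    return out


header = [
    "parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b",
    "origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd",
    "origin a1184d85e8752658f02746982822f43f32316803 2",
]
mapping = {"a1184d85e8752658f02746982822f43f32316803": "0" * 40}
new_header = rewrite_origin_fields(header, mapping)
```

The point of the sketch is that a structured field needs no heuristics to locate, whereas a hash embedded in free-form text has to be found by pattern matching first.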
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-09 23:05 ` Petr Baudis 2008-09-09 23:32 ` Stephen R. van den Berg @ 2008-09-10 9:35 ` Paolo Bonzini 2008-09-10 10:44 ` Petr Baudis 2008-09-10 14:33 ` Dmitry Potapov 1 sibling, 2 replies; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 9:35 UTC (permalink / raw) To: Petr Baudis; +Cc: Stephen R. van den Berg, git > Why do you actually *follow* the origin link at all anyway? Without its > parents, the associated tree etc., the object is essentially useless for > you Stephen posed the origin links as weak, but it is not necessarily true that you don't have the parents and the associated tree. For example, if you download a repository that includes a "master" branch and a few stable branches, you *will* have the objects cherry-picked into stable branches, because they are commits in the master branch. Junio explained that the way achieves the same effect in git is by forking the topic branch off the "oldest" branch where the patch will possibly be of interest. Then he can merge it in that branch and all the newest ones. That's great, but not all people are as forward-looking (he did say that sometimes he needs to cherrypick). Another problem is that in some projects actually there are two "maint" branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do not care about what goes in the older "maint" branch; they develop for trunk and for the newer "maint" branch, and then one person comes and cherry-picks into the older "maint" branch. This has two problems: 1) Having to fork topic branches off the older branch would force extra testing on the developers. 2) Besides this, topic branches are not cloned, so if I am the integrator on the older "maint" branch, I need to dig manually in the commits to find bugfixes. True, I could use Bugzilla, but what if I want to use git instead? There is "git cherry -v ... 
| grep -w ^+.*PR", except that it has too many false negatives (fixes that have already been backported, but do show up in the list). > And why are the notes created by git cherry-pick -x insufficient for that? For example, these notes (or the ones created by "git revert") are *wrong* because they talk about commits instead of changesets (deltas between two commits). Why is only one commit present? Because these messages are meant for users, not for programs. That's easy to show: users think of commits as deltas anyway, even though git stores them as snapshots---"git show HEAD" shows a delta, not a snapshot. And what does this mean for programs? That they must resort to commit-message scraping to distinguish the two cases. (*) (*) A GUI blame program, for example, would need to distinguish whether code added by a commit is taken from commit 4329bd8, or is reverting commit 4329bd8. (In the first case, the author of that code is whoever was responsible for that code in 4329bd8; in the second case, it is whoever was responsible for that code in 4329bd8^). If recording changesets, you see 4329bd8^..4329bd8 in the first case, and 4329bd8..4329bd8^ in the second, so it is trivial to follow the chain. And scraping is bad. Imagine people that are writing commit messages in their native language. What if they patch git to translate the magic notes created by "git cherry-pick -x" or "git revert" (maybe a future version of git will do that automatically)? Should they translate also every program that scrapes the messages? Whenever there is a piece of data that could be useful to programs (no matter if plumbing or porcelain), I consider free form notes to be bad. Because data is data, and metadata is metadata. If there was a generic way to put porcelain-level metadata in commit messages (e.g. Signed-Off-By and Acknowledged-By can be already considered metadata), I would not be so much in favor of "origin" links being part of the commit object's format. 
Now if you think about it, commit references within this kind of metadata would have mostly the properties that Stephen explained in his first message: 1) they would be rewritten by git-filter-branch 2) these references, albeit weak by default, could optionally be followed when fetching (either with command-line or configuration options) 3) they would not be pruned by git-gc 4) possibly, git rev-list --topo-order would sort commits by taking into account metadata references too. So the implementation effort would be roughly the same. But, can you think of any other such metadata? Personally I can't, so while I understand the opposition to a new commit header field that would be there from here to eternity (or until the LHC starts), I do think it is the simplest thing that can possibly work. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
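The changeset distinction drawn above (4329bd8^..4329bd8 for a cherry-pick, 4329bd8..4329bd8^ for a revert) can be written down as a small sketch. The function name `changeset_range` is hypothetical, and the optional parent number is assumed to correspond to the parentnr of the proposed origin field, expressed here with git's `^n` notation for the n-th parent.

```python
def changeset_range(commit, parentnr=1, revert=False):
    """Return the 'from..to' range a recorded changeset would describe.

    commit:   (abbreviated) hash of the commit being picked or reverted.
    parentnr: which parent of `commit` the delta is taken against.
    revert:   True if the change was applied in reverse.
    """
    # '^' alone means the first parent; '^n' selects the n-th parent.
    suffix = "^" if parentnr == 1 else f"^{parentnr}"
    parent = commit + suffix
    if revert:
        # A revert applies the delta backwards: from the commit to its parent.
        return f"{commit}..{parent}"
    return f"{parent}..{commit}"


picked = changeset_range("4329bd8")
reverted = changeset_range("4329bd8", revert=True)
merge_side = changeset_range("a1184d8", parentnr=2)
```

A blame-style tool could then tell the two cases apart from the direction of the range alone, without reading the commit message.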
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-10 9:35 ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini @ 2008-09-10 10:44 ` Petr Baudis 2008-09-10 11:49 ` Stephen R. van den Berg 2008-09-10 14:33 ` Dmitry Potapov 1 sibling, 1 reply; 137+ messages in thread From: Petr Baudis @ 2008-09-10 10:44 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Stephen R. van den Berg, git On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote: > > > Why do you actually *follow* the origin link at all anyway? Without its > > parents, the associated tree etc., the object is essentially useless for > > you > > Stephen posed the origin links as weak, but it is not necessarily true > that you don't have the parents and the associated tree. For example, > if you download a repository that includes a "master" branch and a few > stable branches, you *will* have the objects cherry-picked into stable > branches, because they are commits in the master branch. But that is irrelevant. If you already have the objects, whether to follow the origin link does not matter at all. I argue that the following the origin link by one step is harmful as it violated the internal Git object model and does not have real benefits. If you want to have the origin links, do not follow them at all - the commit objects themselves are not useful. (Or, optionally, follow them fully - that of course can make sense.) > > And why are the notes created by git cherry-pick -x insufficient for that? > > For example, these notes (or the ones created by "git revert") are > *wrong* because they talk about commits instead of changesets (deltas > between two commits). (BTW, I don't feel strongly enough about the header-freeform distinction to argue about it and some of your and others' points are good. But even if we have the origin links, I think we should only follow them not at all or fully.) 
--
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-10 10:44 ` Petr Baudis @ 2008-09-10 11:49 ` Stephen R. van den Berg 2008-09-10 12:30 ` Petr Baudis 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 11:49 UTC (permalink / raw) To: Petr Baudis; +Cc: Paolo Bonzini, git Petr Baudis wrote: >On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote: >But that is irrelevant. If you already have the objects, whether to >follow the origin link does not matter at all. >I argue that the following the origin link by one step is harmful as it >violated the internal Git object model and does not have real benefits. >If you want to have the origin links, do not follow them at all - the >commit objects themselves are not useful. (Or, optionally, follow them >fully - that of course can make sense.) The origin links are rarely followed, not even by one step. They are only followed if a certain operation requires them (not a lot do). >> > And why are the notes created by git cherry-pick -x insufficient for that? >> For example, these notes (or the ones created by "git revert") are >> *wrong* because they talk about commits instead of changesets (deltas >> between two commits). >(BTW, I don't feel strongly enough about the header-freeform distinction >to argue about it and some of your and others' points are good. But even >if we have the origin links, I think we should only follow them not at >all or fully.) Maybe we have a misunderstanding about what "follow a link" means and when it is done. During most normal git operation, the origin links are just read, but not followed. The only commands that I expect to follow them are log --graph, gitk, fsck and blame. I may have missed some corner use-cases, but this should cover most of it; i.e. most of git ignores them or just makes note of the hashvalues provided. -- Sincerely, Stephen R. van den Berg. "Am I paying for this abuse or is it extra?" 
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
From: Petr Baudis @ 2008-09-10 12:30 UTC
To: Stephen R. van den Berg; +Cc: Paolo Bonzini, git

On Wed, Sep 10, 2008 at 01:49:40PM +0200, Stephen R. van den Berg wrote:
> Maybe we have a misunderstanding about what "follow a link" means and
> when it is done.
> During most normal git operation, the origin links are just read, but
> not followed.
> The only commands that I expect to follow them are log --graph, gitk, fsck
> and blame. I may have missed some corner use-cases, but this should
> cover most of it; i.e. most of git ignores them or just makes note of
> the hashvalues provided.

Oh, I'm sorry. By

 - During fetch/push/pull the full commit including the origin fields is
   transmitted, however, the objects the origin links are referring to
   are not (unless they are being transmitted because of other reasons).

I have understood that you fetch the origin target but not commits
referred from it, but instead you meant that you do not follow the
origin link at all.

				Petr "Pasky" Baudis
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
From: Stephen R. van den Berg @ 2008-09-10 13:14 UTC
To: Petr Baudis; +Cc: Paolo Bonzini, git

Petr Baudis wrote:
>On Wed, Sep 10, 2008 at 01:49:40PM +0200, Stephen R. van den Berg wrote:
>> Maybe we have a misunderstanding about what "follow a link" means and
>> when it is done.

>Oh, I'm sorry. By

> - During fetch/push/pull the full commit including the origin fields is
>   transmitted, however, the objects the origin links are referring to
>   are not (unless they are being transmitted because of other reasons).

>I have understood that you fetch the origin target but not commits
>referred from it, but instead you meant that you do not follow the
>origin link at all.

Indeed.
--
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-10 9:35 ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini 2008-09-10 10:44 ` Petr Baudis @ 2008-09-10 14:33 ` Dmitry Potapov 2008-09-10 15:15 ` Stephen R. van den Berg 1 sibling, 1 reply; 137+ messages in thread From: Dmitry Potapov @ 2008-09-10 14:33 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Petr Baudis, Stephen R. van den Berg, git On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote: > > Junio explained that the way achieves the same effect in git is by > forking the topic branch off the "oldest" branch where the patch will > possibly be of interest. Then he can merge it in that branch and all > the newest ones. That's great, but not all people are as > forward-looking (he did say that sometimes he needs to cherrypick). Those who base their work on the newest ones must be very forward- looking :) but, seriously, cherry-picking is *not* a normal workflow with Git. Git is optimized for easy merging while cherry-picking is a rare operation reserved for correcting some past mistakes. > > Another problem is that in some projects actually there are two "maint" > branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do > not care about what goes in the older "maint" branch; they develop for > trunk and for the newer "maint" branch, and then one person comes and > cherry-picks into the older "maint" branch. This has two problems: > > 1) Having to fork topic branches off the older branch would force extra > testing on the developers. If a branch is meant to included in the oldest version, it must be tested with that version anyway, and it is better when it is written for the old version, because functions tend to be more backward compatible than forward compatible. 
In other words, functions may often acquire some extra functionality over time without changing their signature, so the code written for a new version will merge without any conflict to the old one, but it won't work correctly under some conditions. It is certainly possible to have a problem in the opposite direction, but it is much less likely, and usually bugs introduced in the development version are not as bad as destabilizing a stable branch. Thus starting branch that is clearly meant for inclusion to the old version from that version is the right thing do. Of course, if you have more than one stable branch for a long time then you may want some branches forked from the new stable. You can do that by merging uninteresting changes from the new stable with the 'ours' strategy (so they will be ignored), and after that merging actually interesting features from the new stable. In contrast to cherry-picking, the real merge creates the history that can be easily visualized and understood. > > 2) Besides this, topic branches are not cloned, so if I am the > integrator on the older "maint" branch, I need to dig manually in the > commits to find bugfixes. True, I could use Bugzilla, but what if I > want to use git instead? There is "git cherry -v ... | grep -w ^+.*PR", > except that it has too many false negatives (fixes that have already > been backported, but do show up in the list). If you clearly mark all bugs in the commit message, there will be no problem to find them by grepping log. There is a lot of potentially useful information, and the 'origin' link is just one of many. It may be okay to do some general mechanism for custom commit attributes (if it's really necessary), but making a hack for one specific item of information feels very wrong. In fact, I have not convinced at all that the free-form text is not suitable to store this information. Dmitry ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-10 14:33 ` Dmitry Potapov @ 2008-09-10 15:15 ` Stephen R. van den Berg 2008-09-10 15:24 ` Paolo Bonzini 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 15:15 UTC (permalink / raw) To: Dmitry Potapov; +Cc: Paolo Bonzini, Petr Baudis, git Dmitry Potapov wrote: >On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote: >> Another problem is that in some projects actually there are two "maint" >> branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do >> not care about what goes in the older "maint" branch; they develop for >> trunk and for the newer "maint" branch, and then one person comes and >> cherry-picks into the older "maint" branch. This has two problems: >> 1) Having to fork topic branches off the older branch would force extra >> testing on the developers. >If a branch is meant to included in the oldest version, it must be >tested with that version anyway, and it is better when it is written for >the old version, because functions tend to be more backward compatible >than forward compatible. In other words, functions may often acquire >some extra functionality over time without changing their signature, so >the code written for a new version will merge without any conflict to >the old one, but it won't work correctly under some conditions. It is >certainly possible to have a problem in the opposite direction, but it >is much less likely, and usually bugs introduced in the development >version are not as bad as destabilizing a stable branch. Thus starting >branch that is clearly meant for inclusion to the old version from that >version is the right thing do. >Of course, if you have more than one stable branch for a long time then >you may want some branches forked from the new stable. 
You can do that >by merging uninteresting changes from the new stable with the 'ours' >strategy (so they will be ignored), and after that merging actually >interesting features from the new stable. >In contrast to cherry-picking, the real merge creates the history that >can be easily visualized and understood. Could you explain how the above mechanisms work based on the following cherry-pick action: A -- B -- C -- D -- L \ / E -- F -- G -- H -- K D is the stable branch. K is the development branch. G is cherry-picked and applied to D producing L. The origin link of L would have contained (G, F). How would such a workflow be implemented using the temporary branches you describe? >If you clearly mark all bugs in the commit message, there will be no >problem to find them by grepping log. There is a lot of potentially Sometimes they're not bugs, yet they still are backported and thus carry no special marks. >useful information, and the 'origin' link is just one of many. It may True, but it's one of the few machine-useable ones. >be okay to do some general mechanism for custom commit attributes (if >it's really necessary), That's the problem, a general mechanism is undesirable, that we already have the free-form textfield for. > but making a hack for one specific item of >information feels very wrong. It's a rather well-defined usefull property (which precludes it from being a hack, I suppose). -- Sincerely, Stephen R. van den Berg. "Am I paying for this abuse or is it extra?" ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata 2008-09-10 15:15 ` Stephen R. van den Berg @ 2008-09-10 15:24 ` Paolo Bonzini 0 siblings, 0 replies; 137+ messages in thread From: Paolo Bonzini @ 2008-09-10 15:24 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Dmitry Potapov, git > > > fork topic branches off the older branch [and merge them] > > Could you explain how the above mechanisms work based on the following > cherry-pick action: > > A -- B -- C -- D -- L > \ / > E -- F -- G -- H -- K > > D is the stable branch. > K is the development branch. > G is cherry-picked and applied to D producing L. > The origin link of L would have contained (G, F). > > How would such a workflow be implemented using the temporary branches > you describe? You don't. You do everything in topic branches based off the stable branch, and you merge them. That's the other way round, compared to what you (and I) are used to. Paolo ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert
From: Theodore Tso @ 2008-09-10 12:21 UTC
To: Stephen R. van den Berg; +Cc: Petr Baudis, git

On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:
> The purpose I'd use the origin links for is to manage software projects
> that consist of 7 main branches which have branched in (on average) two
> year intervals, which never get merged anymore. The only thing that
> happens is that there are backports amongst the branches about two per
> week.
>
> The only way to perform the backports is by using cherry-pick.
> The history of each backport *is* important though.
> Since all the developers who care about the multiple release branches
> have all the relevant branches in their repository, the presence of
> a origin object is by no means random, it's a certainty.

I'd argue that the origin link is a bit too general for your proposed
use. One of the problems with the origin link is that it is only a
one-way pointer. Given a newer commit, you know that it is (somehow)
weakly related to an older commit. So your proposed workflow only
works if cherry-picks only happen in one direction. That isn't always
true, especially in distributed environments where the bugfix might
happen on someone else's development branch, and then it gets pulled
in, or perhaps rebased in, and you want to know they are related.

I would argue the best way to do that is to store (either in the
object or in the free-form text area) not the link, which would have
to get renumbered, but rather the identifier for the bug(s) that this
commit fixes.
So for example, consider a convention where in the body
of the free-form text area, before the Signed-off-by:, Acked-by:, and
CC: headers for those projects that use them, we add something like
the following:

Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167

or

Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023

Once you have this information, it is not difficult to maintain a
berk_db database which maps a particular Bug identifier (i.e.,
Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of
commits.

The advantage of this scheme is that if a bug has been fixed in
multiple branches, you can see the association between two commits in
two different branches very easily. Furthermore, you get a link back
to the actual bug in one or more bug tracking systems, which some
porcelain program could transform into a hot-link which when
clicked opens up a browser window to the bug in question.

In contrast, using your proposed origin scheme, if the bug was
originally created in some development branch, and then cherry picked
into two separate maintenance branches, if you don't have the
development branch in your repository (maybe that development branch
wasn't kept for some reason), the origin link in
the two maintenance branches would point to a non-existent commit ID,
and you wouldn't be able to establish a linkage between them. By using
an independent bug identifier as the way of creating the linkage,
you're preserving *much* more useful information, and you can reliably
establish a relationship between two commits.

In terms of your arguments about why free-form is bad, in another message:

>- No strict definition of what it means.
>- Diverging porcelain implementations making use of the field in ever so
>  slightly changing ways over the years.

This can be a problem regardless of where you store the information.
Whether you store it in the free-form text or in the git object header, if you don't make sure it is well-defined, you're in trouble. >- You cannot rely on the field being always available. This is true regardless of where you store it; older versions of git won't store the git origin link, for example, unless you plan to break backwards compatibility with all existing git repositories, which would be a bad idea. :-) One nice thing of using text in free-form text fields is that anyone can enter it without needing a new version of git. The downside is that people could typo the header in some fashion. But that can be dealt with in a newer version of the git porcelain validates the bug identifier and/or checks for obvious spelling mistakes and issues a warning ("Looks like you may have mispelled 'Adresses-Bug'; perhaps you should fix this via git commit --amend?"). In contrast, if you put it in the git object header, there is no possibility of using the field at all until you update to a version of git that supports it. And some developer on your project is using an older version of git when they rebase or cherry-pick a commit, the origin header will be completely lost; but if it is stored in the free-form area, the information will be brought along for the ride for free. >- Automated "renumbering" becomes difficult at best. This is actually one of the reasons why I don't like the origin link. If you use the origin link, it's *still* not obvious whether you should rewrite the commit ID or not. For example, in some workflows, you have two branches pointing to the same commit before you do the rebase, where the rebase will only update the current branch pointer, but there is another branch still pointing at the original series of commits. Worse yet, someone may have done a cherry-pick *before* the rebase. Hence, the only thing you can do is keep *both* commit ID's. 
This means that over time, you can't get rid of any commit ID's when
you do a rebase, which means the number of commit ID's in the origin
link will always increase whenever you do a rebase or a cherry-pick.
This is why for the use case where you are trying to figure out
whether a bug exists in a particular branch, it is ***much*** better to
rendezvous using a bug identifier; it provides an extra layer of
indirection which results in a much more stable identifier that is
guaranteed to work. I understand it won't work for those cases where
you don't have a bug tracking identifier, but in fact, if you need this
functionality at all (and I am not convinced that you do), the
***much*** better approach is to use the same approach as the bug
tracking identifier, and add a level of indirection.

How would that work in practice? Whenever you create a new commit,
create a UUID which is assigned to the patch. This UUID is not
modified by git rebase or git cherry-pick, and it should be optionally
kept or modified on a git commit --amend. Ideally, said UUID would be
exported via git-format-patch, and imported via git-am, and via
systems that use patches, such as guilt or stg. This becomes a handy
way of recognizing patches even if they aren't being stored in git ---
for example, Andrew Morton's mm patch series.

Now, whether you store this UUID in the free-form text area, or in the
git object header, in the long run really doesn't matter. You can
just as easily have porcelain suppress a line in the free-form text
area, as you can have the porcelain print the UUID when it is stored
in the object header.
Yes, it means that you have to maintain a separate database so you can easily find the list of commits that contain a particular UUID, but I suspect you would need this in the case of the origin link concept anyway, since sooner or later some of the more useful uses of said link would require you to be able to find the commits which had origin links to the original commit, which means you would need to create and maintain this database anyway. And the maintenance of this database is purely optional; you only need it if you care about efficiently looking up UUID's, and given "time git log > /dev/null" on the kernel tree only takes six seconds on my laptop, and "git log > /dev/null" only takes 0.148 seconds for e2fsprogs, for many projects you might not even need the database to accelerate lookups via UUID. - Ted ^ permalink raw reply [flat|nested] 137+ messages in thread
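The Addresses-Bug convention described above can be sketched as a small indexer. This is a minimal illustration, not an existing tool: the function name `bug_index` is hypothetical, the input is assumed to be (commit id, commit message) pairs obtained by some other means, and a plain in-memory dict stands in for the berk_db database mentioned in the message.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Addresses-Bug: Debian/432865, Launchpad/203323
BUG_LINE = re.compile(r"^Addresses-Bug:[ \t]*(.+)$", re.MULTILINE)


def bug_index(commits):
    """Map each bug identifier to the commits whose messages mention it.

    commits: iterable of (commit_id, message) pairs.
    Returns a dict {bug_id: [commit_id, ...]} in input order.
    """
    index = defaultdict(list)
    for commit_id, message in commits:
        for match in BUG_LINE.finditer(message):
            for bug in match.group(1).split(","):
                bug = bug.strip()
                if bug:
                    index[bug].append(commit_id)
    return dict(index)


sample = [
    ("aaa111", "Fix crash\n\nAddresses-Bug: Debian/432865, Launchpad/203323\n"),
    ("bbb222", "Backport crash fix\n\nAddresses-Bug: Debian/432865\n"),
    ("ccc333", "Unrelated cleanup\n"),
]
index = bug_index(sample)
```

With such an index, two commits on different branches are related whenever they share a key, regardless of whether either commit's original counterpart is still present in the repository.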
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 12:21 ` [RFC] origin link for cherry-pick and revert Theodore Tso @ 2008-09-10 14:16 ` Stephen R. van den Berg 2008-09-10 15:10 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 14:16 UTC (permalink / raw) To: Theodore Tso; +Cc: Petr Baudis, git Theodore Tso wrote: >On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote: >use. One of the problems with the origin link is that it is only a >one way pointer. Given a newer commit, you know that it is (somehow) >weekly related to a older commit. So your proposed workflow only >works if cherry-picks only happen in one direction. That isn't always >true, especially in distributed environments where the bugfix might >happen on someone else's development branch, and then it gets pulled >in, or perhaps rebased in, and you want to know they are related. Well, the definition of the origin link (and a back/forwardport) is that: - You (as a developer) consider the link relevant for posterity (IOW, you consider it to be a proper back/forwardport which should be recognisable as such). - The back/forwardport always has to reference some existing (stable) commit. Especially the second condition always holds at the time of creation of the backport (or forwardport, for that matter). I'm not quite sure which circumstances you allude to above which would violate this requirement, can you elaborate on that? >I would argue the best way to do that is to store (either in the >object or in the free-form text area) not the link, which would have >to get renumbered but rather the identifier for the bug(s) that this The renumbering is not a problem, renumbering is a rare operation since a project's history is supposed to be stable. 
And even if renumbering is performed, it is a well understood operation of which the renumbering of the origin links imposes a negligible overhead on top of the existing renumbering overhead. >commit fixes. So for example, consider a convention where in the body >of the free-form text area, before the Signed-off-by:, Acked-by:, and >CC: headers for those projects that use them, we add something like >the following: >Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167 >or >Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023 >Once you have this information, it is not difficult to maintain a >berk_db database which maps a particular Bug identifier (i.e., >Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of >commits. This is nice, I admit, but it has the following downsides: - It is nontrivial to automate this on execution of "git cherry-pick". - In a distributed environment this requires a network-reachable bug database. - A network-reachable bug database means that suddenly git needs network access for e.g. cherry-pick, revert, gitk, log --graph, blame. - Network queries for commits containing references kind of kills performance. - Some backports don't have entries in a bug database because they weren't bugs to begin with, in which case it becomes impossible to add an identifier to the commit message after the fact. - It relies heavily on tools outside of git-core, which raises the threshold for using it. >The advantage of this scheme is that if a bug has been fixed in >multiple branches, you can see the association between two commits in >two different branches very easily. Furthermore, you get a link back >to the actual bug in one or more bug tracking systems, which the some >porcelain program could use to transform into a hot-link which when >clicked opens up a browser window to the bug in question. I'm not opposed to links like this, but I consider them a useful extra. 
The link back is computationally of the same order of magnitude to find all existing children of a certain commit; which is well understood and within reach in most cases. >In contrast, using your proposed origin scheme, if the bug was >originally created in some development branch, and then cherry picked >into two separate maintenance branches, if you don't have the >development branch in your repository (maybe that >development branch wasn't kept for some reason), the origin link in >the two maintenance branches would point to a non-existent commit ID, >and you wouldn't be able to establish a linkage between them. By using Yes, you would. You'd notice that either: - One origin will point to the other commit (recommended practice, cherry-pick ripple-through, so to speak). - Both origin links point to the same non-existent commit. >In terms of your arguments about why free-form is bad, in another message: >>- No strict definition of what it means. >>- Diverging porcelain implementations making use of the field in ever so >> slightly changing ways over the years. >This can be a problem regardless of where you store the information. True. The point is that specifying a definition for an origin headerfield will narrow down how it is and can be used. Free-form is just that, free-form, and merely defines things by convention. >Whether you store it in the free-form text or in the git object >header, if you don't make sure it is well-defined, you're in trouble. Free form can take the form of plaintext explanations detailing the relationship in a foreign language (worst case example). >>- You cannot rely on the field being always available. >This is true regardless of where you store it; older versions of git >won't store the git origin link, for example, unless you plan to break >backwards compatibility with all existing git repositories, which >would be a bad idea. :-) True.
What I was alluding to, is that if someone includes a back/forwardport link in the free-form part of the commit message, then you cannot predict how they'll do that. In case of the origin link, *if* it is used, it will always look the same. >One nice thing of using text in free-form text fields is that anyone >can enter it without needing a new version of git. The downside is Git is rather portable, I'd say, so anyone wanting to use the new feature can be bothered to upgrade. >that people could typo the header in some fashion. But that can be >dealt with in a newer version of the git porcelain validates the bug >identifier and/or checks for obvious spelling mistakes and issues a >warning ("Looks like you may have mispelled 'Adresses-Bug'; perhaps >you should fix this via git commit --amend?"). You mean you'd prefer some kind of AI solution to aid the user in writing misspelling-free bug identifiers over a simple clean origin link in the header of a commit message? >In contrast, if you put it in the git object header, there is no >possibility of using the field at all until you update to a version of >git that supports it. And some developer on your project is using an >older version of git when they rebase or cherry-pick a commit, the >origin header will be completely lost; but if it is stored in the >free-form area, the information will be brought along for the ride for >free. Same as above: If developers care about the backport information, they *can* be bothered to upgrade git. It's not rocketscience. >>- Automated "renumbering" becomes difficult at best. >This is actually one of the reasons why I don't like the origin link. >If you use the origin link, it's *still* not obvious whether you >should rewrite the commit ID or not. 
For example, in some workflows, >you have two branches pointing to the same commit before you do the >rebase, where the rebase will only update the current branch pointer, >but there is another branch still pointing at the original series of >commits. Worse yet, someone may have done a cherry-pick *before* the >rebase. Hence, the only thing you can do is keep *both* commit ID's. >This means that over time, you can't get rid of any commit ID's when >you do a rebase, which means the number of commit ID's in the origin >link will always increase whenever you do a rebase or a cherry-pick. The recommended practice here is quite simple: - Origin links should only be created pointing to stable commits (i.e. commits which you'd be willing to publish or already have published). - This implies that pointing an origin link at a commit in a strain that you still want to rebase is asking for trouble. Doing this is akin to doing a merge between two branches and then you start rebasing 4 commits *below* the mergepoint. Don't do that. - The only special case I'd allow is if you rebase a strain and the origin link points from one of the commits in the strain to be rebased back *into* the same strain being rebased (most likely a revert). Rebase can be bothered to renumber the origin link in this case. And when you stick to those rules, the problem you're describing doesn't happen. >This is why for the use case where you are trying to figure out >whether a bug exists in a particular branch, it is ***much*** better >to rendevous using a bug identifier; it provides an extra layer of >indirection which results in a much more stable identifer that is >guaranteed to work. Unless that commit already lies in the past, and you have no way to actually add the bugid to the commit. >(and I am not convinced that you do), the ***much*** better approach >is to use the same approach as the bug tracking identifier, and add a >level of indirection. How would that work in practice? 
Whenever you >create a new commit, create a UUID which is assigned to the patch. This only works if you know at time of commit that you want to backport it at some later date. >Now, whether you store this UUID in the free-form text area, or in the >git object header, in the long run really doesn't matter. You can >just as easily have porcelein suppress a line in the free-form text >area, as you can have the procelain print the UUID when it is stored >in the object header. True. It's almost as much work. Though it seems rather silly to start suppressing lines in the free-form text area, if one can add a proper headerfield. >Yes, it means that you have to maintain a separate database so you can >easily find the list of commits that contain a particular UUID, but I >suspect you would need this in the case of the origin link concept >anyway, since sooner or later some of the more useful uses of said >link would require you to be able to find the commits which had origin >links to the original commit, which means you would need to create and >maintain this database anyway. That isn't true. Finding commits which have origin links to a certain commit is just as hard as finding all children of a certain commit. It's not exactly instant, but it is not a big problem, and depending on the amount of repositorytraversal you already are doing, it might even be a negligible amount of extra overhead. > And the maintenance of this database >is purely optional; you only need it if you care about efficiently >looking up UUID's, and given "time git log > /dev/null" on the kernel >tree only takes six seconds on my laptop, and "git log > /dev/null" >only takes 0.148 seconds for e2fsprogs, for many projects you might >not even need the database to accelerate lookups via UUID. The database needs to be available to anyone doing a clone of the repository, which implies that: - It needs to be network based. - It needs controlled write access (which is a mess). 
- It is slow during blame/gitk operations.
- It is rather nontrivial to get things setup such that someone (after
  cloning the repository) is able to run cherry-pick/gitk/blame/revert
  and have those commands use the database transparently.
--
Sincerely,
           Stephen R. van den Berg.
"Am I paying for this abuse or is it extra?"
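The claim in the message above, that finding the commits whose origin links point at a given commit costs about as much as finding that commit's children, can be pictured as one pass over commit headers. The following is only a minimal sketch in Python under stated assumptions: the `origin <hash> [parentnr]` header is the field proposed in this RFC (git itself does not write it), commits are supplied as raw text, and the names `parse_origins` and `build_reverse_index` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_origins(raw_commit: str) -> List[Tuple[str, int]]:
    """Collect (target_hash, parent_number) pairs from a raw commit header.

    Assumes the header layout proposed in the RFC: "origin <hash> [parentnr]".
    """
    origins = []
    for line in raw_commit.split("\n"):
        if line == "":  # the first blank line ends the header
            break
        fields = line.split(" ")
        if fields[0] == "origin" and len(fields) >= 2:
            # The parent number defaults to 1 when it is omitted.
            parent_nr = int(fields[2]) if len(fields) >= 3 else 1
            origins.append((fields[1], parent_nr))
    return origins


def build_reverse_index(commits: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each origin target to the commits whose headers point at it."""
    index = defaultdict(list)
    for commit_id, raw in commits.items():
        for target, _parent_nr in parse_origins(raw):
            index[target].append(commit_id)
    return dict(index)


# Two maintenance commits that both name the same (possibly absent) origin.
commits = {
    "maint1-fix": "tree t1\nparent p1\norigin d2b9dff\nauthor A\n\nfix\n",
    "maint2-fix": "tree t2\nparent p2\norigin d2b9dff 2\nauthor A\n\nfix\n",
}
index = build_reverse_index(commits)
```

Because the index is keyed by the target hash, the two maintenance commits are grouped together even when the target commit itself is not present in the repository, which is the second case listed in the reply ("Both origin links point to the same non-existent commit").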
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 14:16 ` Stephen R. van den Berg @ 2008-09-10 15:10 ` Jeff King 2008-09-10 21:50 ` Stephen R. van den Berg 2008-09-10 16:18 ` Theodore Tso 2008-09-10 16:40 ` Petr Baudis 2 siblings, 1 reply; 137+ messages in thread From: Jeff King @ 2008-09-10 15:10 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote: > >Once you have this information, it is not difficult to maintain a > >berk_db database which maps a particular Bug identifier (i.e., > >Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of > >commits. > > This is nice, I admit, but it has the following downsides: > - It is nontrivial to automate this on execution of "git cherry-pick". Maybe a cherry-picking hook? > - In a distributed environment this requires a network-reachable bug > database. Use a distributed bug tracking system (DBTS). > - A network-reachable bug database means that suddenly git needs network > access for e.g. cherry-pick, revert, gitk, log --graph, blame. Use a DBTS. > - Network queries for commits containing references kind of kills > performance. Use a DBTS. > - Some backports don't have entries in a bug database because they > weren't bugs to begin with, in which case it becomes impossible to add > an identifier to the commit message after the fact. Use a DBTS, since then you can generally make up a new UUID on the spot. > - It relies heavily on tools outside of git-core, which raises the > threshold for using it. True. But maybe Ted is on to something here. Rather than adding the information to the commit object itself, why not maintain a separate mapping, but keep it _within git_. That is how most of the DBTS's work that I have seen. Maybe it is possible to implement some subset of the features in a tool that could become part of core git. 
There was a proposal at some point for a "notes" feature which would allow after-the-fact annotation of commits. I don't recall the exact details, but I think it stored its information as a git tree of blobs. You could choose whether or not to transfer the notes based on transferring a ref pointing to the notes tree. I'm not sure how applicable this is to your problem, but if you want to investigate you can find discussion in the list archive under the name "notes". -Peff ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 15:10 ` Jeff King @ 2008-09-10 21:50 ` Stephen R. van den Berg 2008-09-10 21:54 ` Jeff King 0 siblings, 1 reply; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-10 21:50 UTC (permalink / raw) To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git Jeff King wrote: >On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote: >> This is nice, I admit, but it has the following downsides: >> - It is nontrivial to automate this on execution of "git cherry-pick". >Maybe a cherry-picking hook? Yes, that works, but it is non-trivial, especially since it needs to work for gitk, log --graph, blame and revert as well. >> - In a distributed environment this requires a network-reachable bug >> database. >Use a distributed bug tracking system (DBTS). If it were part of git-core, that would work. >But maybe Ted is on to something here. Rather than adding the >information to the commit object itself, why not maintain a separate >mapping, but keep it _within git_. That is how most of the DBTS's work >that I have seen. Maybe it is possible to implement some subset of the >features in a tool that could become part of core git. Interesting thought. >There was a proposal at some point for a "notes" feature which would >allow after-the-fact annotation of commits. I don't recall the exact >details, but I think it stored its information as a git tree of blobs. >You could choose whether or not to transfer the notes based on >transferring a ref pointing to the notes tree. The idea is nice, but if we were to use it to store the origin link information, the following happens: - Origin link information is rare. - Yet during a log/gitk/blame run the information might need to be queried for at every commit. - Since in most cases the origin information does not exist, this will cause misses to fill the dentry cache for directory lookups, and thus killing performance. 
- In order to make this efficient, a different database lookup system is
  needed that is fast for misses.

Whereas if the information is part of the commit, it costs nothing in
the typical case (no origin information present).
--
Sincerely,
           Stephen R. van den Berg.
"Am I paying for this abuse or is it extra?"
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 21:50 ` Stephen R. van den Berg
@ 2008-09-10 21:54 ` Jeff King
  2008-09-10 22:34 ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-10 21:54 UTC
To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git

On Wed, Sep 10, 2008 at 11:50:45PM +0200, Stephen R. van den Berg wrote:

> >There was a proposal at some point for a "notes" feature which would
> >allow after-the-fact annotation of commits. I don't recall the exact
> >details, but I think it stored its information as a git tree of blobs.
> >You could choose whether or not to transfer the notes based on
> >transferring a ref pointing to the notes tree.
>
> The idea is nice, but if we were to use it to store the origin link
> information, the following happens:
> - Origin link information is rare.
> - Yet during a log/gitk/blame run the information might need to
>   be queried for at every commit.
> - Since in most cases the origin information does not exist, this
>   will cause misses to fill the dentry cache for directory lookups, and
>   thus killing performance.
> - In order to make this efficient, a different database lookup system is
>   needed that is fast for misses.

I think you are misunderstanding what I meant by "git tree" here. It is
literally a git tree object, so you don't ask the filesystem at all. You
are looking up within the single object file. If it's a miss, you know
after seeing that object. If not, then you dereference the blob object
that contains the notes.

-Peff
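The lookup described in the message above (search inside one tree object, and only dereference a blob on a hit) can be sketched roughly as follows. This is an illustrative model in Python, not git's implementation: the tree is represented as an in-memory list of `(commit id, blob id)` entries, the blob store is a plain dictionary, and the name `lookup_note` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def lookup_note(
    tree_entries: List[Tuple[str, str]],
    blobs: Dict[str, str],
    commit_id: str,
) -> Optional[str]:
    """Return the note for commit_id, or None when the tree has no entry.

    A miss is decided by scanning the single tree object; no filesystem
    lookup per commit is involved in this model.
    """
    for name, blob_id in tree_entries:
        if name == commit_id:
            # Hit: dereference the blob that holds the note text.
            return blobs[blob_id]
    return None


# Hypothetical notes tree with a single annotated commit.
notes_tree = [("a1184d8", "blob-1")]
blobs = {"blob-1": "cherry-picked from d2b9dff"}
```

In this model a miss costs one scan of the already-loaded tree, which is the point being made against the dentry-cache objection.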
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 21:54 ` Jeff King
@ 2008-09-10 22:34 ` Stephen R. van den Berg
  2008-09-10 22:55 ` Jeff King
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 22:34 UTC
To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git

Jeff King wrote:
>On Wed, Sep 10, 2008 at 11:50:45PM +0200, Stephen R. van den Berg wrote:
>> >There was a proposal at some point for a "notes" feature which would
>> >allow after-the-fact annotation of commits. I don't recall the exact
>> >details, but I think it stored its information as a git tree of blobs.
>> >You could choose whether or not to transfer the notes based on
>> >transferring a ref pointing to the notes tree.

>> The idea is nice, but if we were to use it to store the origin link
>> information, the following happens:
>> - Origin link information is rare.

>I think you are misunderstanding what I meant by "git tree" here. It is
>literally a git tree object, so you don't ask the filesystem at all. You
>are looking up within the single object file. If it's a miss, you know
>after seeing that object. If not, then you dereference the blob object
>that contains the notes.

I see. Indeed. That's a lot better.
Did the binary search inside tree objects ever get implemented?

It is unclear why the latest commit notes proposal didn't make it,
though I admit that storing the origin link information in there seems
feasible.

The downsides when doing that are:
- The lookup cost is small, but still noticeable, since it is sometimes
  done on every commit; using the in-commit origin headerfield solves
  this at negligible cost.
- The origin information is no longer cryptographically protected (under
  certain circumstances this could be considered an advantage and a
  disadvantage at the same time).
--
Sincerely,
           Stephen R. van den Berg.
"Am I paying for this abuse or is it extra?"
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:34 ` Stephen R. van den Berg
@ 2008-09-10 22:55 ` Jeff King
  2008-09-10 23:19 ` Stephen R. van den Berg
  2008-09-11 2:46 ` Nicolas Pitre
  0 siblings, 2 replies; 137+ messages in thread
From: Jeff King @ 2008-09-10 22:55 UTC
To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git

On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:

> I see. Indeed. That's a lot better.
> Did the binary search inside tree objects ever get implemented?

I believe it's still linear (and skimming tree-walk.c:find_tree_entry
seems to confirm). However, one advantage of such an approach is that it
will improve as tree lookup improves (e.g., I believe the pack v4 work
included improvements in this area).

> The downsides when doing that are:
> - The lookup cost is small, but still noticeable, since it is sometimes
>   done on every commit; using the in-commit origin headerfield solves
>   this at negligible cost.
> - The origin information is no longer cryptographically protected (under
>   certain circumstances this could be considered an advantage and a
>   disadvantage at the same time).

Yes, those are inherent in the scheme, as is the upside that one can
make and distribute such annotations separately from commit creation.

I haven't thought enough about it to decide whether there is a scenario
where making such a "cherry-picked from" annotation might make use of
that property.

-Peff
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:55 ` Jeff King
@ 2008-09-10 23:19 ` Stephen R. van den Berg
  2008-09-11 5:16 ` Paolo Bonzini
  2008-09-11 2:46 ` Nicolas Pitre
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 23:19 UTC
To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git

Jeff King wrote:
>On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:
>> - The origin information is no longer cryptographically protected (under
>>   certain circumstances this could be considered an advantage and a
>>   disadvantage at the same time).

>I haven't thought enough about it to decide whether there is a scenario
>where making such a "cherry-picked from" annotation might make use of
>that property.

Being able to subvert the authenticity of git blame by providing fake
origin information is not very appealing.
--
Sincerely,
           Stephen R. van den Berg.
"Am I paying for this abuse or is it extra?"
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 23:19 ` Stephen R. van den Berg
@ 2008-09-11 5:16 ` Paolo Bonzini
  2008-09-11 7:55 ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11 5:16 UTC
To: Stephen R. van den Berg; +Cc: Jeff King, Theodore Tso, Petr Baudis, git

>>> - The origin information is no longer cryptographically protected (under
>>>   certain circumstances this could be considered an advantage and a
>>>   disadvantage at the same time).
>
>> I haven't thought enough about it to decide whether there is a scenario
>> where making such a "cherry-picked from" annotation might make use of
>> that property.
>
> Being able to subvert the authenticity of git blame by providing fake
> origin information is not very appealing.

You could use a dummy submodule to ensure that each commit pointed to
the right set of notes. It would force to create a separate commit
whenever you modified the notes, which is actually not bad.

Alternatively, the header of the commit can be modified to add a pointer
to a tree object for the notes; I suppose this is more palatable than
the origin link. The tree could be organized in directories+blobs like
.git/objects to speed up the lookup.

I actually like the commit notes idea, but then I wonder: why are the
author and committer part of the commit object? How does the plumbing
use them? Isn't that metadata that could live in the "notes"? And so,
why should the origin link have less privileges?

Paolo
* Re: [RFC] origin link for cherry-pick and revert 2008-09-11 5:16 ` Paolo Bonzini @ 2008-09-11 7:55 ` Stephen R. van den Berg 2008-09-11 8:45 ` Paolo Bonzini 2008-09-11 12:33 ` A Large Angry SCM 0 siblings, 2 replies; 137+ messages in thread From: Stephen R. van den Berg @ 2008-09-11 7:55 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Jeff King, Theodore Tso, Petr Baudis, git Paolo Bonzini wrote: >> Being able to subvert the authenticity of git blame by providing fake >> origin information is not very appealing. >You could use a dummy submodule to ensure that each commit pointed to >the right set of notes. It would force to create a separate commit >whenever you modified the notes, which is actually not bad. Possibly, yes. But we'd have to be careful not to incur too much overhead because every indirection will cost, especially since the origin link sometimes is checked for on every commit during a treewalk. The fact that it rarely exists means that it should be fast to find out that there are no origin links (which obviously is the common case). >Alternatively, the header of the commit can be modified to add a pointer >to a tree object for the notes; I suppose this is more palatable than >the origin link. This won't work for the original notes concept, because it makes the notes immutable after commit. For the origin links this would be fine, since they don't change once committed. The problem with fitting the origin links in the notes is twofold: - They become mutable, which is undesirable, I'd like to preserve history as is (just like parent links). - There is a performance hit, since origin links need to be found not to exist on every commit (sometimes, depending on the operation of course). > The tree could be organized in directories+blobs like >..git/objects to speed up the lookup. Yes, that was already in the latest proposal for notes, I believe. 
>I actually like the commit notes idea, but then I wonder: why are the >author and committer part of the commit object? How does the plumbing >use them? Isn't that metadata that could live in the "notes"? And so, It would fit with a non-mutable version of the notes. Then again, we already *have* the non-mutable version of the notes, it's called the header of the commit message. >why should the origin link have less privileges? They both belong in the non-mutable notes, and those happen to live in the header of the commit (which *is* the most efficient spot, of course). -- Sincerely, Stephen R. van den Berg. "There are three types of people in the world; those who can count, and those who can't." ^ permalink raw reply [flat|nested] 137+ messages in thread
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 7:55 ` Stephen R. van den Berg
@ 2008-09-11 8:45 ` Paolo Bonzini
  2008-09-11 12:33 ` A Large Angry SCM
  1 sibling, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11 8:45 UTC
To: Stephen R. van den Berg; +Cc: Jeff King, Theodore Tso, Petr Baudis, git

>> I actually like the commit notes idea, but then I wonder: why are the
>> author and committer part of the commit object? How does the plumbing
>> use them? Isn't that metadata that could live in the "notes"? And so,
>
> we already *have* the non-mutable version of the notes, it's called the
> header of the commit message.

Yes, that was my point. I don't see how the author and committer fit in
the header of the commit message, if the origin does not.

Paolo
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 7:55 ` Stephen R. van den Berg
  2008-09-11 8:45 ` Paolo Bonzini
@ 2008-09-11 12:33 ` A Large Angry SCM
  1 sibling, 0 replies; 137+ messages in thread
From: A Large Angry SCM @ 2008-09-11 12:33 UTC
To: Stephen R. van den Berg
Cc: Paolo Bonzini, Jeff King, Theodore Tso, Petr Baudis, git

Stephen R. van den Berg wrote:
> It would fit with a non-mutable version of the notes. Then again, we
> already *have* the non-mutable version of the notes, it's called the
> header of the commit message.

Almost correct. Remove "header of" from the above and you'd be correct.
* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:55 ` Jeff King
  2008-09-10 23:19 ` Stephen R. van den Berg
@ 2008-09-11 2:46 ` Nicolas Pitre
  1 sibling, 0 replies; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 2:46 UTC
To: Jeff King; +Cc: Stephen R. van den Berg, Theodore Tso, Petr Baudis, git

On Wed, 10 Sep 2008, Jeff King wrote:

> On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:
>
> > I see. Indeed. That's a lot better.
> > Did the binary search inside tree objects ever get implemented?
>
> I believe it's still linear (and skimming tree-walk.c:find_tree_entry
> seems to confirm). However, one advantage of such an approach is that it
> will improve as tree lookup improves (e.g., I believe the pack v4 work
> included improvements in this area).

No, not yet. Actually that's the part that still needs serious
thinking.

Nicolas
* Re: [RFC] origin link for cherry-pick and revert 2008-09-10 14:16 ` Stephen R. van den Berg 2008-09-10 15:10 ` Jeff King @ 2008-09-10 16:18 ` Theodore Tso 2008-09-10 16:40 ` Petr Baudis 2 siblings, 0 replies; 137+ messages in thread From: Theodore Tso @ 2008-09-10 16:18 UTC (permalink / raw) To: Stephen R. van den Berg; +Cc: Petr Baudis, git On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote: > The renumbering is not a problem, renumbering is a rare operation since > a project's history is supposed to be stable. And even if renumbering > is performed, it is a well understood operation of which the renumbering > of the origin links imposes a negligible overhead on top of the existing > renumbering overhead. Well *you* were the one using this as an argument for using the origin link. But I'll note that in some workflows, rebasing happens all the time when a patch is being developed and moved around. Sometimes patches are created in git, exported as a patch, and then it re-enters git again later (which is another reason why using an external UUID or bug tracking identifier is a good thing). > >Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167 > >or > >Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023 > > >Once you have this information, it is not difficult to maintain a > >berk_db database which maps a particular Bug identifier (i.e., > >Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of > >commits. > > This is nice, I admit, but it has the following downsides: > - It is nontrivial to automate this on execution of "git cherry-pick". It's trivial if it's in the free-form text. In fact, it happens automatically. If it's stored within the git commit object, then it will be done in the C code (if you've updated to the latest git; again, one of the advantages of doing it in free-form text). > - In a distributed environment this requires a network-reachable bug > database. 
> - A network-reachable bug database means that suddenly git needs network > access for e.g. cherry-pick, revert, gitk, log --graph, blame. > - Network queries for commits containing references kind of kills > performance. No, because you don't need to look up the bug identifier unless you want to, you know, actually look at the bug. Otherwise, we are just using something like "debian/432865" as an identifier; you only need to look them up if you want to look up the bug. Any time you have a collaborative development environment, you will need either a centralized, network accessible bug tracking system, or use a distributed bug tracking system. Either way, though, if it's just matter of seeing whether or not a bug fix such as debian/432865 is fixed by some commit in some branch, using the bug identifier actually makes this *easier*, not harder. > - Some backports don't have entries in a bug database because they > weren't bugs to begin with, in which case it becomes impossible to add > an identifier to the commit message after the fact. This is true. The transition is a little easier if you are pointing to a pre-existing commit, whereas if you need some kind of rendevous identifer (whether it is a bug ID or some UUID). On the other hand, you've cherry-picked some bug fix using a git that didn't support the origin link, you'd also be screwed, so > - It relies heavily on tools outside of git-core, which raises the > threshold for using it. Well, it relies on changes to git --- just like the origin link requires changes to git. If the it is implemented using free-form text, which is a great way to prototype it, you have the *option* of implementing it via either git porcelain changes or outside tools like emacs or vi macros (just as most of us who are kernel developers have editor macros that insert Signed-off-by: into git commit messages, as well as changes in git porcelain such that "git am -s" automatically adds the Signed-off-by header). 
But given the wildly successful use of Signed-off-by in the kernel sources, this objection seems not very credible, to say the least. > The recommended practice here is quite simple: > > - Origin links should only be created pointing to stable commits (i.e. > commits which you'd be willing to publish or already have published). > > - This implies that pointing an origin link at a commit in a strain that > you still want to rebase is asking for trouble. Doing this is akin to > doing a merge between two branches and then you start rebasing 4 > commits *below* the mergepoint. Don't do that. Right. And if we use a UUID to identify commits, then we don't have to have these restrictions. > - The only special case I'd allow is if you rebase a strain and the > origin link points from one of the commits in the strain to be rebased > back *into* the same strain being rebased (most likely a revert). > Rebase can be bothered to renumber the origin link in this case. Nope, because you might have a branch to the original origin link, and some body else may have already done a cherry-pick to the original origin commit. You've hand-waved around the problem by saying, "don't do that", but it just points out how **fragile** the origin link scheme really is. It's just not robust. In contrast, generating a UUID per commit is much more robust, since you can now export it out of git in a patch, and then re-import it later, and have the right thing happen. > >(and I am not convinced that you do), the ***much*** better approach > >is to use the same approach as the bug tracking identifier, and add a > >level of indirection. How would that work in practice? Whenever you > >create a new commit, create a UUID which is assigned to the patch. > > This only works if you know at time of commit that you want to backport > it at some later date. 
I'm suggesting that all commits (once you upgrade to a version of git
that supports this --- and you've already handwaved away the question
of whether you can get all of the developers for a project to upgrade
to the latest git, remember) would have a UUID generated. That UUID
could be stored internal to git, or (perhaps as an initial prototype
and proof of concept, before we add something into the git commit
record **forever**) could be in the free-form text.

> >Yes, it means that you have to maintain a separate database so you can
> >easily find the list of commits that contain a particular UUID, but I
> >suspect you would need this in the case of the origin link concept
> >anyway, since sooner or later some of the more useful uses of said
> >link would require you to be able to find the commits which had origin
> >links to the original commit, which means you would need to create and
> >maintain this database anyway.
>
> That isn't true. Finding commits which have origin links to a certain
> commit is just as hard as finding all children of a certain commit.
> It's not exactly instant, but it is not a big problem, and depending on
> the amount of repositorytraversal you already are doing, it might even
> be a negligible amount of extra overhead.

My point is you'll need this separate database anyway, in order to
deal with the cases where you have two commits that point to the same
(non-existent) origin link, one in maint1, and one in maint2, and
given the commit in the maint1 branch, you want to see if there is a
related commit in the maint2 branch (or you do a brute force search of
the repository, which isn't too bad for modest databases). It's
identical in both cases --- but having a UUID field in the commit is
much *cleaner*, since it merely states that these two commits
introduce the same semantic change; it doesn't imply some kind of
parent/child relationship which an origin link implies.

> The database needs to be available to anyone doing a clone of the
> repository, which implies that:
> - It needs to be network based.
> - It needs controlled write access (which is a mess).
> - It is slow during blame/gitk operations.
> - It is rather nontrivial to get things setup such that someone (after
>   cloning the repository) is able to run cherry-pick/gitk/blame/revert
>   and have those commands use the database transparently.

No, it doesn't, since the database can be inferred from the objects in
the repository. So you can generate it locally if you need it, merely
as an optimization. The same is true for the origin link proposal, as
I've said.

						- Ted
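
A minimal sketch of the point that the lookup table can be inferred locally from the commits themselves, assuming the prototype stores the UUID in the free-form commit text. The trailer name `Commit-UUID:` and the `(sha, message)` input shape are assumptions made for illustration; the thread does not define either.

```python
import re
from collections import defaultdict

# Hypothetical trailer name: the thread only says the UUID could live in the
# free-form text of the commit message, not what the line would look like.
UUID_TRAILER = re.compile(
    r"^Commit-UUID:[ \t]*([0-9a-fA-F-]{36})[ \t]*$", re.MULTILINE
)


def build_uuid_index(commits):
    """Build a UUID -> [commit hash, ...] table from (sha, message) pairs.

    The table is derived purely from data already in the commits, so it can
    be regenerated in any clone and treated as a disposable cache.
    """
    index = defaultdict(list)
    for sha, message in commits:
        for uuid in UUID_TRAILER.findall(message):
            index[uuid.lower()].append(sha)
    return dict(index)


def related_commits(index, sha, message):
    """Return the other commits that carry the same UUID(s) as this commit."""
    related = []
    for uuid in UUID_TRAILER.findall(message):
        for other in index.get(uuid.lower(), []):
            if other != sha and other not in related:
                related.append(other)
    return related
```

Given a commit on maint1, `related_commits` answers the "is there a related commit in maint2?" question without any shared, network-hosted database; the brute-force scan is simply paid once when the index is built.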

* Re: [RFC] origin link for cherry-pick and revert
From: Petr Baudis @ 2008-09-10 16:40 UTC
To: Stephen R. van den Berg; +Cc: Theodore Tso, git

On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote:
> Theodore Tso wrote:
> > And the maintenance of this database
> >is purely optional; you only need it if you care about efficiently
> >looking up UUID's, and given "time git log > /dev/null" on the kernel
> >tree only takes six seconds on my laptop, and "git log > /dev/null"
> >only takes 0.148 seconds for e2fsprogs, for many projects you might
> >not even need the database to accelerate lookups via UUID.
>
> The database needs to be available to anyone doing a clone of the
> repository, which implies that:
> - It needs to be network based.
> - It needs controlled write access (which is a mess).
> - It is slow during blame/gitk operations.
> - It is rather nontrivial to get things setup such that someone (after
>   cloning the repository) is able to run cherry-pick/gitk/blame/revert
>   and have those commands use the database transparently.

The database can just live in a special branch, with trees organized the
same way the object database is, possibly in a more optimized way
(having the HEAD trees cached around inside Git, etc.). This should be
no rocket science if the design is given a little thought, and should be
fairly fast afterwards.

I'm not endorsing assigning UUIDs to commits now at all (but I don't
have time to formulate a comprehensive argument against that either).

However, having a commit -> nonessential_volatile_metadata database
would be useful for many other things as well! For example amending
commit messages later, maintaining general linkage between related
commits, tracking explicit rename hints for Git (like the Samba guys
would appreciate right now, and me many times in the past - note that
this is NOT the same as directly tracking renames within Git history) or
caching expensive computations with mostly static results (like the
rename detection or maybe pickaxe indexes - that could be quite large,
so we might want to actually separate different kinds of data to
separate branches).

--
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates
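
One way to read "trees organized the same way the object database is" is the two-hex-digit fan-out that Git's loose object store uses. The sketch below only computes the path a commit's metadata file would get inside such a branch; the function name and the idea of one file per commit are assumptions for illustration, not something the thread specifies.

```python
import re

_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def metadata_path(commit_sha: str, fanout: int = 2) -> str:
    """Return the path of a commit's metadata file inside the metadata tree.

    Mirrors the loose-object layout: the first `fanout` hex digits become a
    directory, the remaining digits become the file name.
    """
    sha = commit_sha.strip().lower()
    if not _SHA1_HEX.fullmatch(sha):
        raise ValueError(f"not a full SHA-1 hex digest: {commit_sha!r}")
    if not 0 < fanout < len(sha):
        raise ValueError("fanout must leave a non-empty directory and file name")
    return f"{sha[:fanout]}/{sha[fanout:]}"
```

Keeping the fan-out narrow bounds the size of each tree object, so looking up one commit's metadata touches only a small directory rather than a flat list of every annotated commit.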

* Re: [RFC] origin link for cherry-pick and revert
From: Paolo Bonzini @ 2008-09-10 17:58 UTC
To: Petr Baudis; +Cc: Stephen R. van den Berg, Theodore Tso, git

> I'm not endorsing assigning UUIDs to commits now at all (but I don't
> have time to formulate a comprehensive argument against that either).
>
> However, having a commit -> nonessential_volatile_metadata database
> would be useful for many other things as well!

100 points to Petr. :-)

Paolo

* Re: [RFC] origin link for cherry-pick and revert
From: Stephen R. van den Berg @ 2008-09-10 22:44 UTC
To: Paolo Bonzini; +Cc: Petr Baudis, Theodore Tso, git

Petr wrote:
>> However, having a commit -> nonessential_volatile_metadata database
>> would be useful for many other things as well!

Which brings us back to the "commit notes" proposal.
--
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

* Re: [RFC] origin link for cherry-pick and revert
From: Paolo Bonzini @ 2008-09-10 8:45 UTC
To: Petr Baudis; +Cc: Stephen R. van den Berg, git

> Having history browsers draw fancy lines is fine but I see nothing wrong
> with them extracting this from the free-form part of the commit message.
> For informative purposes, we don't shy away from heuristics anyway, c.f.
> our renames detection (heck, we are even brave enough to use that for
> merges).

... and it works only because false positives (which make the merge
actively wrong) are extremely rare. If there is the occasional false
negative, well, it just makes merges more complicated and screws up
visualization a bit, so you live with it.

Move/copy detection almost always worked for me, but there are two
cases where it didn't:

1) empty files. Each of them is marked as copied from a
seemingly-picked-at-random one.

2) renaming a Gtk+ class. You rename it (e.g. from gtkclassnamea.c to
gtkclassnameb.c) and at the same time do

    s/GtkClassNameA/GtkClassNameB/
    s/GTK_CLASS_NAME_A(?:\>|?=_)/GTK_CLASS_NAME_B/
    s/gtk_class_name_a(?:\>|?=_)/gtk_class_name_b/

and reindent everything. Guaranteed to have a similarity index around
30-40%, not more.

I don't care much about it, but face it, it is *not* perfect.

Paolo
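
A rough illustration of why the class-rename case scores so poorly. This is not Git's similarity algorithm: Python's `difflib.SequenceMatcher` over whole lines is swapped in as a stand-in, and the helper name is hypothetical. It only shows the mechanism, namely that a rename plus a reindent touches most lines, so few lines survive unchanged.

```python
import difflib


def line_similarity_percent(old: str, new: str) -> float:
    """Percentage of lines shared by two texts (a stand-in metric, not Git's)."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines())
    return 100.0 * matcher.ratio()


# A made-up fragment in the style of a Gtk+ class implementation.
ORIGINAL = """\
static void
gtk_class_name_a_init (GtkClassNameA *self)
{
    self->count = 0;
    self->name = NULL;
}
"""

# Rename the class and reindent, as in the message above.
RENAMED = (
    ORIGINAL.replace("gtk_class_name_a", "gtk_class_name_b")
    .replace("GtkClassNameA", "GtkClassNameB")
    .replace("    ", "  ")
)
```

Only the lines that mention neither the class name nor any indentation survive the rewrite, so the score drops well below that of an untouched copy even though the file is recognisably the same code.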

* Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert)
From: Peter Krefting @ 2008-09-23 13:51 UTC
To: Git Mailing List

Petr Baudis:

> I think this is misguided. In general case, cherrypicks can be from
> completely unrelated histories, and if you are doing the cherry pick,
> you are saying that actually, the history *does not matter*.

As my workflow sometimes makes me do cherry-picks where history does
matter, in the form of me doing a "partial merge" of one or more
commits from branch A into branch B, which does not necessarily have to
be directly related, is there a way to perform something like that,
while keeping history?

Perhaps I'm damaged by having used CVS for too long, and merging just
some files, or abusing CVS internals to make some files on branch A
also be part of branch B by having their branches point to the same RCS
branch revision number, but sometimes I find that I miss being able to
do it in Git.

Not really a big deal, just curious.

--
\\// Peter - http://www.softwolves.pp.se/

* Re: [RFC] origin link for cherry-pick and revert
From: Miklos Vajna @ 2008-09-10 20:32 UTC
To: Stephen R. van den Berg; +Cc: git

On Tue, Sep 09, 2008 at 03:22:12PM +0200, "Stephen R. van den Berg" <srb@cuci.nl> wrote:
> origin a1184d85e8752658f02746982822f43f32316803 2
> author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
> committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700

First, sorry for joining the thread late; as far as I can see, the idea
I want to share here has not been mentioned by anybody yet.

So, git revert already includes the "origin" of the commit in the commit
message, and I think that is fine for most people.

What about adding an option to cherry-pick to add a similar
"commit 7b27718bdb1b70166383dec91391df5534d449ee upstream" or similar
string to the commit message?

As far as I can see, the kernel -stable tree already has this, but it is
added manually and in many different forms, like:

[ Upstream commit 5f3a9a207f1fccde476dd31b4c63ead2967d934f ]

commit 7b27718bdb1b70166383dec91391df5534d449ee upstream

Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b25b791b13aaa336b56c4f9bd417ff126363f80b

etc.

Once git provides a standard way to do this, that could be used to
avoid such variation.

* Re: [RFC] origin link for cherry-pick and revert
From: Nicolas Pitre @ 2008-09-10 20:55 UTC
To: Miklos Vajna; +Cc: Stephen R. van den Berg, git

On Wed, 10 Sep 2008, Miklos Vajna wrote:

> So, git revert already includes the "origin" of the commit in the commit
> message, and I think that is fine for most people.
>
> What about adding an option to cherry-pick to add a similar
> "commit 7b27718bdb1b70166383dec91391df5534d449ee upstream" or similar
> string to the commit message?

It's already there: -x

Nicolas

* Re: [RFC] origin link for cherry-pick and revert
From: Miklos Vajna @ 2008-09-10 21:06 UTC
To: Nicolas Pitre; +Cc: Stephen R. van den Berg, git

On Wed, Sep 10, 2008 at 04:55:19PM -0400, Nicolas Pitre <nico@cam.org> wrote:
> It's already there: -x

Thanks, and sorry for not reading the manpage carefully before sending
the mail.
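
A small sketch of how a tool might recover the source commit from the free-form text, covering the line that `git cherry-pick -x` appends ("(cherry picked from commit ...)") and the two hand-written -stable forms quoted earlier in this subthread. The function name is hypothetical and the patterns are only as strict as those three examples; other spellings found in the wild would need their own patterns.

```python
import re

_SHA = r"([0-9a-f]{40})"

# One pattern per convention mentioned in the thread.
_PATTERNS = [
    # Appended by `git cherry-pick -x`.
    re.compile(r"\(cherry picked from commit " + _SHA + r"\)", re.IGNORECASE),
    # Hand-written form: "[ Upstream commit <sha> ]".
    re.compile(r"\[ ?Upstream commit " + _SHA + r" ?\]", re.IGNORECASE),
    # Hand-written form: "commit <sha> upstream" on a line of its own.
    re.compile(r"^commit " + _SHA + r" upstream\.?$", re.IGNORECASE | re.MULTILINE),
]


def upstream_commits(message: str) -> list:
    """Return the source commit hashes named in a commit message.

    Hashes are lowercased and de-duplicated; order follows the pattern list,
    then position in the message.
    """
    found = []
    for pattern in _PATTERNS:
        for sha in pattern.findall(message):
            sha = sha.lower()
            if sha not in found:
                found.append(sha)
    return found
```

This is the heuristic route argued for elsewhere in the thread: the information stays in the message, and a viewer that cares can extract it without any change to the commit object format.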