* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
@ 2006-04-29 16:51 linux
  2006-04-29 17:35 ` Linus Torvalds
  2006-04-29 18:27 ` Jakub Narebski
  0 siblings, 2 replies; 16+ messages in thread
From: linux @ 2006-04-29 16:51 UTC (permalink / raw)
  To: git

Boy, this is an interesting discussion!

On the one hand, it seems "obvious" to me that extra links might be useful. But Linus's minimalist points have a lot of merit. I have to agree, it's important to think of a single practical use before adding the feature.

So let's do a little brainstorming...

For just referring to another commit, there's no problem putting it in the body. A sensible porcelain GUI will, when it sees something that looks like an object identifier in a comment, and that object identifier exists, make it a clickable link. So a comment like:

"This fixes the same problem as <commit>, but is a cleaner (albeit more invasive) fix."

would do the right thing: the user reading it could easily jump to the other commit.

A "header" link as opposed to a "comment" link just has the property of being unambiguous. No heuristic will guess that a link should exist when there isn't one.

So, what is that property useful for?

Now, one thing that porcelains provide, in addition to "parent" links, is "child" links. Useful. But it could be done with commit comment links as well, and it's not clear that having the link in the commit header as opposed to the comment would help much. You still have to find and uncompress part of each commit to generate the history tree. Does uncompressing the rest of it and running a heuristic over the text really cost that much?

I'm not convinced it's needed for that feature.

(I'd sooner argue for never compressing commit objects in packs, on the grounds that saving the repeated uncompression while browsing is worth more than the relatively minor disk space.)
So to be valuable, and inadvisable to express with a specially formatted comment, it has to be something that would be Very Bad to get wrong. What qualifies? Maybe some merge algorithm information? If the merge could be told that this change "is the same" as that change, so it can be skipped when cherry-picking that branch, and the information was wrong, that could cause lots of problems. But given that git-cherry already uses (imperfect) heuristics to detect already-merged patches, and they seem to work well enough, is that a strong enough argument? Is there some other merge application where it would help? Now, the "this other object should exist in the repository, and it's an error if you can't fetch it" link obviously needs to be unambiguously distinguished from, say, a reference to the (Linux kernel) dodecapus merge in a git tree checkin comment. But, as Linus says, what reason is there for including it? What do you need the commit in the repository for? Well, the only reason that you need ANY commit in the repository is because it's part of history, and comparing it with other versions is meaningful. So what trees, not already in the ancestry graph of a given commit, are useful to compare to? In particular, useful for some automated process; manual comparisons can always be done manually. Nothing's jumping out at me. Any suggestions? ^ permalink raw reply [flat|nested] 16+ messages in thread
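The "clickable link" heuristic described in this message can be sketched in a few lines of C. This is only an illustration, not code from any porcelain: the name next_candidate_id is made up here, and it assumes that object names appear in comments as runs of exactly 40 lowercase hex digits. A real tool would also check that the object exists before turning it into a link, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEX_SHA1_LEN 40

/* Return 1 if c is a lowercase hex digit, as git prints object names. */
int is_hex_digit(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/*
 * Hypothetical helper: find the next run of exactly 40 hex digits in
 * text, starting at offset *pos.  On success copy it into out (which
 * must hold 41 bytes), advance *pos past the run and return 1.
 * Return 0 when no further candidate exists.
 */
int next_candidate_id(const char *text, size_t *pos, char *out)
{
	size_t i;

	assert(text != NULL && pos != NULL && out != NULL);
	i = *pos;
	while (text[i]) {
		size_t start;

		if (!is_hex_digit(text[i])) {
			i++;
			continue;
		}
		start = i;
		while (is_hex_digit(text[i]))
			i++;	/* stops at NUL, which is not a hex digit */
		if (i - start == HEX_SHA1_LEN) {
			memcpy(out, text + start, HEX_SHA1_LEN);
			out[HEX_SHA1_LEN] = '\0';
			*pos = i;
			return 1;
		}
		/* shorter or longer runs are ordinary words, keep scanning */
	}
	*pos = i;
	return 0;
}
```

A caller would loop on next_candidate_id() over the commit comment and look up each candidate. The point made in the message still holds: this is a guess, and a word that happens to be 40 hex digits long would be picked up as well.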
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-29 16:51 [RFC] [PATCH 0/5] Implement 'prior' commit object links (and linux @ 2006-04-29 17:35 ` Linus Torvalds 2006-04-29 18:07 ` Jakub Narebski 2006-04-29 18:27 ` Jakub Narebski 1 sibling, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2006-04-29 17:35 UTC (permalink / raw) To: linux; +Cc: git On Sat, 29 Apr 2006, linux@horizon.com wrote: > > Well, the only reason that you need ANY commit in the repository is > because it's part of history, and comparing it with other versions is > meaningful. So what trees, not already in the ancestry graph of a > given commit, are useful to compare to? In particular, useful for some > automated process; manual comparisons can always be done manually. > > Nothing's jumping out at me. Any suggestions? The only thing that I've ever wondered about is the "base commit of a merge". Now, the thing is, we can always compute it. That's true _iff_ we've merged using the standard merge mechanism, but it wasn't always true historically (eg the original merges were computed with the original "git-merge-base" algorithm, which just picked the _first_ merge base it would find, while these days we use multiple ones for criss-cross merges). So I would not totally object if a merge algorithm added a merge-base <sha1> notation. But while it _could_ be just a "note merge-base <sha1>", it should _not_ be a "link <sha1> merge-base". Let me explain why I think there are differences between those three options, and why I actually think that two of them are "valid" ideas, while the third one is not. - Case 1: the merge-base <sha1> is a "valid" idea (where there might of course be more than one <sha1>, and possibly more than one "merge-base" line: you'd have to have some rule for what happens for a recursive merge), although it has the generally big down-side of being redundant information in all current setups. 
It's redundant, but at the same time it's information that in _theory_ might not be redundant, because I can see a situation where a merge was forced by manually specifying a merge base (eg a special merge like the original "gitk" merge, merging two initially unrelated projects together). In theory. So it could be real information for a merge commit. And we'd enforce some kind of real semantics for it - and it would have a really solid technical meaning: assuming we define the multi-merge-base semantics properly it would NEVER have any question about "what are best practices?" or "what does this mean?". So this "case 1" actually has technical consequences, but you can, for example, actually _check_ them. You can make fsck literally complain if the merge base doesn't make sense. There's a clear "technical violation", which might not be entirely trivial to figure out, but thanks to it having a good meaning and a strict definition, it's _there_. Now, in all honesty, I don't think "case 1" is a _good_ thing to do. I'm just saying that I wouldn't be as upset about it as I've been over this "link" discussion. The reason I think "case 1" sucks is simply that I think you can in _practice_ get all the benefits much better with "case 2", even if that one doesn't imply any actual git semantics: - Case 2: the note merge-base <sha1> thing is _also_ a perfectly valid idea, because now it's also very well-defined: the "note" part tells you that git doesn't actually impose any semantics what-so-ever on it, so it's really just a comment, and as in case 1 above, once you see it as a comment, the _meaning_ of it is immediately clear. It's literally just a note from the merge algorithm saying "I used this as a merge base". The "note" syntax actually has a huge advantage. When you see it as a comment from the merge algorithm, you immediately think it might also be a good idea to add a few other notes. 
So a merge commit might actually have

    note merge-algorithm recursive
    note merge-conflicts none
    note merge-base <sha1>

and all of that makes total sense. It's telling you what the algorithm used was, and that it didn't need any manual fixups. It's also telling you that none of this has _any_ impact what-so-ever from a "git semantics" angle, and that this is nothing but a note for anybody who starts digging into it.

So now I've shown _two_ examples of some kind of header that I think actually makes sense, and that I would not argue against on those grounds. Especially the "note" thing I think is fine.

So why, oh why, do I hate the "link" thing so much?

- Case 3: the

    link <sha1> merge-base

  thing is a horrible and nasty thing that we should never ever support.

Why? Because it's literally designed to both have some semantic meaning ("git will fetch the <sha1> and use it for connectivity analysis") _and_ at the same time the whole syntax is designed to _not_ have any real meaning ("you can have any kind of link, and I don't know what it actually means from a conceptual standpoint").

So it has a meaning from an _implementation_ angle, but at the same time it does not have a "higher cause". That is EVIL.

When they say "The road to hell is paved with good intentions", the implication there is not that good intentions are bad per se, but that you should understand that there are "Unintended Consequences". And if you cannot limit the thing to a very _specific_ higher-level meaning, you by definition will have those "unintended consequences".

In short, the difference between three headers that on the face of it say exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and "link <sha1> merge-base" is not that they have different syntax (hey, even the syntax itself is almost identical), but exactly the fact that they have different implications and _meaning_.

Two of the three have no unintended consequences.
One ("note") has no technical "consequences" at _all_, by definition. The other ("merge-base") has no "unintended" technical consequences at all, because it's thought through, and has been fully defined.

The third? "unintended consequences". It doesn't have a clear definition ("It's cool. You can use it for any link you want"). So pretty much BY DESIGN, it's set up so that you don't know what the consequences of it will be for a project.

And that's why "case 3" is bad. Even though it looks very much like the two other ones.

		Linus

^ permalink raw reply [flat|nested] 16+ messages in thread
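To make the distinction between the three cases concrete, here is a minimal C sketch of how a reader of commit headers might treat them. None of these headers were part of git at the time of this thread, so everything below is hypothetical: the enum, the function names, and the rule that a "merge-base" argument must be exactly 40 lowercase hex digits are assumptions made for illustration. A "note" line is accepted whatever it says (case 2), a "merge-base" line is strictly checked (case 1), and a generic "link" line is simply not recognized (case 3).

```c
#include <assert.h>
#include <string.h>

enum header_kind {
	HDR_NOTE,	/* "note ...": a comment, no semantics imposed */
	HDR_MERGE_BASE,	/* "merge-base <sha1>": strict and checkable */
	HDR_MALFORMED,	/* known keyword with a bad argument */
	HDR_UNKNOWN	/* anything else, including "link ..." */
};

/* Return 1 if s starts with 40 lowercase hex digits, else 0. */
int is_hex_sha1(const char *s)
{
	int i;

	for (i = 0; i < 40; i++) {
		char c = s[i];

		/* a NUL byte fails this test, so we never read past it */
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return 1;
}

/* Hypothetical classifier for a single commit header line. */
enum header_kind classify_header(const char *line)
{
	assert(line != NULL);
	if (!strncmp(line, "note ", 5))
		return HDR_NOTE;	/* never validated, by definition */
	if (!strncmp(line, "merge-base ", 11)) {
		const char *arg = line + 11;

		if (is_hex_sha1(arg) && (arg[40] == '\0' || arg[40] == '\n'))
			return HDR_MERGE_BASE;
		return HDR_MALFORMED;	/* something fsck could complain about */
	}
	return HDR_UNKNOWN;
}
```

Only the HDR_MERGE_BASE branch gives a checker anything to enforce. The "note" branch cannot fail, and that is exactly why it has no technical consequences.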
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-29 17:35 ` Linus Torvalds @ 2006-04-29 18:07 ` Jakub Narebski 2006-04-29 19:30 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2006-04-29 18:07 UTC (permalink / raw) To: git Linus Torvalds wrote: > - Case 1: the > > merge-base <sha1> [...] > - Case 2: the > > note merge-base <sha1> [...] > - Case 3: the > > link <sha1> merge-base [...] > In short, the difference between three headers that on the face of it say > exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and > "link merge-base <sha1>" is not that they have different syntax (hey, even > the syntax itself is almost identical), but exactly the fact that they > have different implications and _meaning_. > > Two of the three have no unintended consequences. One ("note") has no > technical "consequences" at _all_, by definition. The other "merge-base" > has no technical "unintended" at all, because it's throught through, and > has been fully defined. > > The third? "unintended consequences". It doesn't have a clear definition > ("It's cool. You can use it for any link you want"). So pretty much BY > DESIGN, it's set up so that you don't know what the consequences of it > will be for a project. > > And that's why "case 3" it's bad. Even though it looks very much like the > two other ones. IF (and that is big if) git commit header will be extended to have some extra "link" (enforcing connectivity) headers, like proposed "bind" for subprojects, "prev" for pu-like union branches, "merge-base" for merges, there would be repeated work on enforcing connectivity. Hence generic "link" header (formerly "related") proposal. Having fsck report broken links (or not), having purge removing commits (objects) reachable only via "link" headers, having pull download commits via "link" headers... have I forgot anything? It _seems_ that this part is common, and does not depend on semantics. 
But with "links" (connectivity headers) there always would be some other consequences. For example info/grafts deals for now only with commit parents, and extending the format could be difficult. And of course if we want connectivity, this is for some reason, so the "link" has some other consequences, for example "prev" and "merge-base" for merging, "bind" for checkout, merge (but differently), etc. I think that if it is 'helper' information (i.e. information which is helpful, but we can do without it) and of no real importance to user then use "note". If it is of importance to user (for example "cherrypick" or "reverted") and of use to git, then repeat such info in "note" header to avoid relying on parsing free-form part aka. commit comment. If connectivity is needed... hmmm... -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-29 18:07 ` Jakub Narebski @ 2006-04-29 19:30 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2006-04-29 19:30 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > IF (and that is big if) git commit header will be extended to have some > extra "link" (enforcing connectivity) headers, like proposed "bind" for > subprojects, "prev" for pu-like union branches, "merge-base" for merges, > there would be repeated work on enforcing connectivity. Hence generic > "link" header (formerly "related") proposal. The "link <sha1> <type> <meta>" header extension was done primarily for that reason this way. I carried it in my "pu" branch for a few days but Linus convinced me privately that it was a bad idea, so it is not merged in "pu" anymore. Just to make it easy for people to view what we are discussing, I pushed the branch head to jc/bind-2 topic branch, but the code will _not_ be merged. 
The code in commit.c to recognize and link the related objects pointed to by the "link" header to the commit looked like below (see 11bbee26 commit on that branch):

    +	optr = &item->links;
    +	while (!memcmp(bufptr, "link ", 5)) {
    +		struct object *object;
    +
    +		if (!get_sha1_hex(bufptr + 5, parent) &&
    +		    bufptr[45] == ' ' &&
    +		    (object = lookup_unknown_object(parent)) != NULL) {
    +			struct object_list *l = xmalloc(sizeof(*l));
    +			l->item = object;
    +			l->next = *optr;
    +			l->name = NULL;
    +			*optr = l;
    +			optr = &l->next;
    +			n_refs++;
    +			bufptr += 45;
    +		}
    +		else
    +			return error("bad link in commit %s",
    +				     sha1_to_hex(item->object.sha1));
    +		while (*bufptr++ != '\n')
    +			; /* skip over subdirectory name */
    +	}

But if you are going to introduce "merge-base" and similar headers that have an impact on connectivity traversal code, you can easily change the !memcmp(bufptr, "link ", 5) with a sequence of "memcmp(foo) || memcmp(bar) || ...", and use the "l->name" field to point at the header itself, so that the user of the resulting commit object can easily tell what kind of link-like header it is, and enforce further semantics that are specific to each kind of such header on it. The revision traversal change that was done in a later commit (7091fd commit) does not have to change.

The code sharing aspect you brought up is a very important issue. This is revision traversal, which is really the central part of git and needs deep thought to touch without breaking, so we would like to avoid risking breaking it by repeatedly touching it. But that can be done without making the recorded header something like "link <sha1> <type> <metainfo>" which is too generic.

^ permalink raw reply [flat|nested] 16+ messages in thread
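The suggestion above, matching a fixed set of specific header names instead of one generic "link " prefix and remembering which one matched, might look roughly like the following. This is a sketch under assumptions, not code from git: the table contents ("merge-base ", "bind ") and the function name match_link_like_header are placeholders, and strncmp is used in place of the memcmp in the quoted patch so that short NUL-terminated test strings are handled safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical list of header keywords that would take part in
 * connectivity traversal.  Each has its own, separately defined
 * semantics; there is deliberately no catch-all "link " entry.
 */
static const char *const link_like_headers[] = {
	"merge-base ",
	"bind ",
	NULL
};

/*
 * Return the keyword that bufptr starts with, or NULL if the line is
 * not one of the link-like headers.  The returned pointer plays the
 * role of the "l->name" field in the quoted patch: it lets later code
 * tell which kind of header produced the link.
 */
const char *match_link_like_header(const char *bufptr)
{
	size_t i;

	assert(bufptr != NULL);
	for (i = 0; link_like_headers[i]; i++) {
		size_t len = strlen(link_like_headers[i]);

		if (!strncmp(bufptr, link_like_headers[i], len))
			return link_like_headers[i];
	}
	return NULL;
}
```

With a helper like this, the shared traversal code only needs to know that a header is link-like, while code that cares about semantics can switch on the returned name.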
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 16:51 [RFC] [PATCH 0/5] Implement 'prior' commit object links (and linux
  2006-04-29 17:35 ` Linus Torvalds
@ 2006-04-29 18:27 ` Jakub Narebski
  2006-04-29 20:44   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2006-04-29 18:27 UTC (permalink / raw)
  To: git

On Sat, 29 Apr 2006, linux@horizon.com wrote:
>
> Well, the only reason that you need ANY commit in the repository is
> because it's part of history, and comparing it with other versions is
> meaningful. So what trees, not already in the ancestry graph of a
> given commit, are useful to compare to? In particular, useful for some
> automated process; manual comparisons can always be done manually.
>
> Nothing's jumping out at me. Any suggestions?

See below. Not necessarily all of those require connectivity. Most of them are not my ideas.

* "prior" - heads that represent topic branch merges

  This is the "pu" branch case, where the head is a merge of several topic branches that is continually moved forward.

   topic branches     head
    ,___.   ,___.
   | TA1 | | TB1 |
    `---'   `---'    ,__.
       ^\_____^\____| H1 |
                     `--'

  + some topic branch changes and a republish:

    ,___.   ,___.
   | TA1 | | TB1 |
    `---'   `---'^   ,__.
      |^\_____^\____| H1 |
      |       |      `--'
    ,_|_.   ,_|_.     P
   | TA2 | | TB2 |    |
    `---'   `---'^    |
      ^       ^       |
    ,_|_.     |       |
   | TA3 |    |       |
    `---'     |      ,__.
       ^\______\____| H2 |
                     `--'

  key:
    ^ = parent
    P = prior

* "bind" - for subprojects

  bind links from a master project commit to an externally managed, embedded third-party project, for example the Linux kernel for some mainly-userspace project, or a library or engine for some application. Additionally it provides the root dir where to attach the subproject.

* "original" for rebase

  before rebase:

            A---B---C topic
           /
          /
         /
    D---E---F---G master

  after rebase:

             ------A---B---C
            /      ^   ^   ^
           /       :   :   :
          /        A'--B'--C' topic
         /        /
    D---E---F---G master

  where ':' denotes "original" link.
  Note that the old branch is not pointed to by any head, and would be pruned without connectivity.

* "original" or "cherrypick" for cherry-picking

           A--------B---C bugfix
          /         ^
         /          :
    D---E---F---G---B'---H main

* "revert" for reverting commits

-- 
Jakub Narebski

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 18:27 ` Jakub Narebski
@ 2006-04-29 20:44   ` Junio C Hamano
  2006-04-29 20:58     ` Jakub Narebski
  2006-05-01  0:05     ` Sam Vilain
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2006-04-29 20:44 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> * "prior" - heads that represent topic branch merges

This is not any different from usual "parent" at all (but you have to think about it a bit to realize it).

Before talking about making a new commit object that links to other related commits, let's first talk about what it means to update the branch head ($GIT_DIR/refs/heads/<branch>) from commit A to commit B. Understanding what it means is more fundamental.

A git "branch" points at the tip of one possible history of a development. As the often-used word "topic branch" tells you, a "branch", i.e. that history, has a specific purpose. The purpose of my "master" branch is to give a reasonably stable new feature set and bugfixes, "next" to give testable ones, and "pu" to collect remaining bits that are worthy of discussion.

When your branch head points at commit A and you update the head to point at a different commit B, you are making this statement:

    The commit B suits the purpose of the branch better than the commit A.

Notice there may or may not be an ancestry relation between these two commits at this point of the discussion. B may be a direct child of commit A, a merge that has A as its first parent, a merge that has A as one of its parents (but not necessarily the first), or an Nth-generation descendant if the update was a fast forward merge from another branch. It might even be an ancestor if the update rewinds the history.

Among the above cases (and there may be others), in only two cases you actually create a new commit to record that statement [*1*].
The simplest case is when commit B is a direct, single-parent child of commit A, and that statement is in your commit log message. "I started out from the commit A, and the result is this tree. The result suits what I am doing better than the previous commit and I made the world a better place." -- the "I started out from the commit A" part is on the parent header and the rest is in the free-text.

When you are creating a merge of N parents, the principle is the same. Although in pure core-git terms all parents are equal, in practice, the first parent has somewhat special meaning to you. When the parents of commit B are A and X, you started out from the commit A. Then what are other parents? You can read such a commit this way:

    I started out from commit A and came up with this tree, which suits my purpose better. While doing so, I have also considered what X has; and this result, commit B, suits my purpose better than X, too.

This is why a later merge with another branch that further builds on top of X works so well.

    ----A----B
           /
    ----X----Y

If somebody built Y on X independently from us, when we merge with Y, we say the merge base is X because B says "I've already considered what X has" to do a 3-way merge. While that is what happens at the mechanical level, what is happening at the philosophical level is that we are taking the "I consider that B is better than X" part of the message seriously, which means "I want to keep the changes I made between X and B". Also the other person who made Y made a similar statement that she considers Y is better than X, and we try to preserve the changes between X and Y in the automated part of the merge while preparing the tree to commit the merge between B and Y.

Once you start reading the commit parent to mean "considering what all of these commits have, what this new commit has suits my purpose better", it becomes clear that the "previous" pointer for a branch like my "pu" is just another "parent".
I rebuild "pu" from the tip of then-current "next", and merge other topics in, and discard the previous "pu". So it results in this kind of graph:

             o---o---o---o---o  (updated "pu")
            /   /   /   /
    ---o---o---o---o
        \   \   \   \   \
         o---------------o---o---o---o  (previous "pu")

But theoretically, I could include the previous "pu" tip as one of the parents of the updated "pu" branch. At the mechanical level, I start from then-current "next" and merge each topic branch one-by-one on top of it. But at the philosophical level, what I am doing is to publish material that shows a set of proposed changes that are more appropriate for review by the curious than the previous round of "pu" head used to have. So the previous "pu" _is_ in the consideration while I publish the updated "pu", although it is _not_ recorded anywhere.

After I come up with a fully merged tree, I could make a fake Octopus that has the previous "pu" as its first parent and each of the topic branch heads merged as second and subsequent parents, with the resulting tree. That would be more "honest" at the philosophical level.

I am not going to actually suggest anybody doing this as a good practice, but we can make such a commit with the current tool like this:

    git checkout pu
    git tag -f prev-pu              ;# remember where we were
    git reset --hard next           ;# start at next
    git pull . topic-1              ;# merge all remaining topics
    git pull . topic-2              ;# ...
    git pull . topic-3
    ...
    git tag -f next-pu              ;# this tree is what we want
    git reset --hard prev-pu        ;# start from previous
    git pull --no-commit -s ours . next topic-1 topic-2 ...
    git read-tree -m -u next-pu     ;# record a merge whose first
    git commit                      ;# parent is previous pu and
                                    ;# has all the topics merged.

[Footnote]

*1* IOW, we _are_ losing some information by not recording the fact that fast-forward was done while doing so. That record should _not_ be in the commit chain.
At the mechanical level, recording that in the commit chain means two criss-crossing branches never converge at the commit chain level, which is already bad. At the philosophical level, the commit chain is a mesh of many possible "global" histories, and the record that somebody (a particular branch in a particular repository) was at what point in the mesh at given time does not belong there. But from the repository-owner's point of view, that _might_ be a useful information to keep. I am just saying this preemptively so that if somebody wants to record it, that should not be recorded in the commit object. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-29 20:44 ` Junio C Hamano @ 2006-04-29 20:58 ` Jakub Narebski 2006-04-30 15:21 ` Jakub Narebski 2006-05-01 0:05 ` Sam Vilain 1 sibling, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2006-04-29 20:58 UTC (permalink / raw) To: git Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > >> * "prior" - heads that represent topic branch merges > > This is not any different from usual "parent" at all (but you > have to think about it a bit to realize it). [cut] Thanks for an explanation. I would say that "prior" is not THAT different from usual "parent", rather than it is not ANY different. My doubts about recording previous head of a "union" (pu-like) branch is that for merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu' rebase) is that for merge algorithm all parents are equivalent, with eventual exception of first which can be treated special ('ours'). -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-29 20:58 ` Jakub Narebski @ 2006-04-30 15:21 ` Jakub Narebski 2006-04-30 23:19 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jakub Narebski @ 2006-04-30 15:21 UTC (permalink / raw) To: git Jakub Narebski wrote: > Junio C Hamano wrote: > >> Jakub Narebski <jnareb@gmail.com> writes: >> >>> * "prior" - heads that represent topic branch merges >> >> This is not any different from usual "parent" at all (but you >> have to think about it a bit to realize it). > [cut] > Thanks for an explanation. > > I would say that "prior" is not THAT different from usual "parent", > rather than it is not ANY different. > > My doubts about recording previous head of a "union" (pu-like) branch > is that for merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu' > rebase) is that for merge algorithm all parents are equivalent, with > eventual exception of first which can be treated special ('ours'). Additionally with "prior" (or at least some convention on which of parents is to prior head of "union (pu-like) branch) I think we could fast-forward such branches... -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-04-30 15:21 ` Jakub Narebski @ 2006-04-30 23:19 ` Junio C Hamano 2006-05-01 0:50 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2006-04-30 23:19 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: >>> This is not any different from usual "parent" at all (but you >>> have to think about it a bit to realize it). >> >> I would say that "prior" is not THAT different from usual "parent", >> rather than it is not ANY different. >> >> My doubts about recording previous head of a "union" (pu-like) branch >> is that for merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu' >> rebase) is that for merge algorithm all parents are equivalent, with >> eventual exception of first which can be treated special ('ours'). > > Additionally with "prior" (or at least some convention on which of parents > is to prior head of "union (pu-like) branch) I think we could fast-forward > such branches... This is why I said you have to think about it a bit to realize that the "prior" is not _ANY_ different from the ordinary parent for something like "pu". We can fast-forward if (1) you pulled from "pu" the last time, and (2) you haven't added anything on top of it on your own, and (3) you pull from "pu" again, if the previous "pu" (i.e. your "pu") is a parent of the updated "pu". We do not need "prior" for that. The old "pu" being _one_ _of_ the parents, not even necessarily be the first one, would do just fine. If you have built on top of the last "pu", obviously we do not want to fast-forward with or without "prior". Your doubts about the merge is also unfounded. The current "pu" head is (against my own recommendation not to do so) a hydra cap. 
It is a direct child of the previous "pu" that merges all the leftover bits along with what was in 'next' when the commit was made, so you could do something like this to experiment:

    git branch test-1 pu^1
    echo >>Makefile '# End of Makefile'
    git commit -m 'build on top of previous "pu"' Makefile
    git pull . pu           ;# Merge whatever happened in "pu"

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-30 23:19 ` Junio C Hamano
@ 2006-05-01  0:50   ` Junio C Hamano
  2006-05-01  1:25     ` Sam Vilain
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2006-05-01 0:50 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> We can fast-forward if (1) you pulled from "pu" the last time,
> and (2) you haven't added anything on top of it on your own, and
> (3) you pull from "pu" again, if the previous "pu" (i.e. your
> "pu") is a parent of the updated "pu". We do not need "prior"
> for that. The old "pu" being _one_ _of_ the parents, not even
> necessarily be the first one, would do just fine.

This part may want a bit more elaboration.

Often, we see in the Linus kernel tree a fast forward of his tip from a recent commit Linus made to a bunch of networking commits made by David S Miller. For example, Linus fast forwarded to 18118c from David's tree before making this commit:

    commit 454ac778459bc70f0a9818a6a8fd974ced11de66
    Merge: 18118cd... 301dc3e...
    Author:     Linus Torvalds <torvalds@g5.osdl.org>
    AuthorDate: Mon Apr 24 20:08:08 2006 -0700
    Commit:     Linus Torvalds <torvalds@g5.osdl.org>
    CommitDate: Mon Apr 24 20:08:08 2006 -0700

The first parent of this commit is one not made by Linus; that is how we can tell he fast forwarded. We cannot easily tell where the tip of Linus tree was before he made this fast forward (it is not recorded anywhere), but if we look at 18118c commit:

    commit 18118cdbfd1f855e09ee511d764d6c9df3d4f952
    Author:     Patrick McHardy <kaber@trash.net>
    AuthorDate: Mon Apr 24 17:18:59 2006 -0700
    Commit:     David S. Miller <davem@sunset.davemloft.net>
    CommitDate: Mon Apr 24 17:27:34 2006 -0700

        [NETFILTER]: ipt action: use xt_check_target for basic verification

we could sort-of make a guess, by looking at merge-base of 18118c and 301dc3. By looking at

    gitk 6b426e..18118c 454ac7

we can tell that David "forked" from Linus at 6b426e commit.
What does it mean for Linus to fast-forward to the tip of David? Earlier I said that each branch has a purpose, and replacing the current tip commit of the branch with another commit is a statement by the repository owner that the new commit suits the purpose of the branch better. To David, the commits he has in the chain between 6b426e to 18118c obviously suited the purpose of his tree better, and that was why these commits were made. And the fact Linus fast forwarded to the tip of David is an implicit statement by Linus that that results suits the purpose of Linus tree better as well compared to his old tip, presumably 6b426e. Earlier I suggested (or at least may have sounded as if I was suggesting) that not recording that statement in fast-forward situation was a bad thing, but that is not necessarily so. Having 18118c commit as part of the history that leads to the tip is enough as such a statement by Linus. Now, David's tree has a tendency to be extra clean (no merges but straight commits on top of then-current tip of Linus), but if he had his own merge from Linus's tree, such a commit would have had a commit from Linus tree as its second parent. If Linus tip remained at that "second parent" commit until David is done and asked Linus to pull, it would result in a fast forward via non-first-parent ancestry. But even if that happened, the above discussion still applies. ^ permalink raw reply [flat|nested] 16+ messages in thread
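The fast-forward condition discussed in this message and the one it follows up on boils down to "the old tip is reachable from the new tip through some parent, not necessarily the first". A toy C model of that check is sketched below. It is not how git implements revision traversal: struct toy_commit and is_ancestor are names invented here, and the plain recursion is an assumption that only suits tiny in-memory graphs, since it does not mark visited commits and may revisit shared history.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 4

/* A toy stand-in for a commit: just a label and its parent pointers. */
struct toy_commit {
	const char *name;
	struct toy_commit *parents[TOY_MAX_PARENTS];
	int nr_parents;
};

/*
 * Return 1 if old is tip itself or can be reached from tip by
 * following any parent link, first or not; return 0 otherwise.
 * Terminates because the commit graph is acyclic.
 */
int is_ancestor(const struct toy_commit *tip, const struct toy_commit *old)
{
	int i;

	assert(tip != NULL && old != NULL);
	if (tip == old)
		return 1;
	for (i = 0; i < tip->nr_parents; i++)
		if (is_ancestor(tip->parents[i], old))
			return 1;
	return 0;
}
```

If the previous "pu" is any parent of the updated "pu", is_ancestor(updated, previous) is true and a repository still sitting at the previous tip can fast-forward. If local work was committed on top of the previous tip, that local commit is not an ancestor of the updated "pu", so a real merge is needed.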
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and 2006-05-01 0:50 ` Junio C Hamano @ 2006-05-01 1:25 ` Sam Vilain 2006-05-01 4:44 ` Jakub Narebski 2006-05-01 6:58 ` Junio C Hamano 0 siblings, 2 replies; 16+ messages in thread From: Sam Vilain @ 2006-05-01 1:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jakub Narebski, git Junio C Hamano wrote: >>We can fast-forward if (1) you pulled from "pu" the last time, >>and (2) you haven't added anything on top of it on your own, and >>(3) you pull from "pu" again, if the previous "pu" (i.e. your >>"pu") is a parent of the updated "pu". We do not need "prior" >>for that. The old "pu" being _one_ _of_ the parents, not even >>necessarily be the first one, would do just fine. >> >> > >This part may want a bit more elaboration. > >Often, we see in the Linus kernel tree a fast forward of his tip >from a recent commit Linus made to bunch of networking commits >made by David S Miller. For example, Linus fast forwarded to >18118c from David's tree before making this commit: > [...] >To David, the commits he has in the chain between 6b426e to >18118c obviously suited the purpose of his tree better, and that >was why these commits were made. And the fact Linus fast >forwarded to the tip of David is an implicit statement by Linus >that that results suits the purpose of Linus tree better as well >compared to his old tip, presumably 6b426e. > > Aha, now I see reason in the madness. So, the "prior" head is not stored in the trees, and tracking the progress of actual head transitions is loosely defined / a research topic. But demonstrably derivable. That works for me. Sam. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Jakub Narebski @ 2006-05-01 4:44 UTC
To: git

Take a look at the complexity of that explanation. And the need for an additional commit. Balance that against all the headaches of having a connectivity header other than "parent".

Perhaps it would be better (and easier) just to say

  note prior parent^1

or

  note prior <sha1>

repeating the <sha1> found in parent.

Just a thought.

--
Jakub Narebski
Warsaw, Poland
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Junio C Hamano @ 2006-05-01 6:58 UTC
To: Sam Vilain; +Cc: git, Jakub Narebski

Sam Vilain <sam@vilain.net> writes:

> Junio C Hamano wrote:
>
>>To David, the commits he has in the chain between 6b426e and
>>18118c obviously suited the purpose of his tree better, and that
>>was why these commits were made. And the fact Linus fast
>>forwarded to the tip of David is an implicit statement by Linus
>>that the result suits the purpose of Linus's tree better as well
>>compared to his old tip, presumably 6b426e.
>
> Aha, now I see reason in the madness. So, the "prior" head is not stored
> in the trees, and tracking the progress of actual head transitions is
> loosely defined / a research topic. But demonstrably derivable. That
> works for me.

I do not think there is any madness involved here, but I should point out that the above example happens to work only because Linus and David are two different people. If Linus did David's work in a separate repository, or even in the same repository but on a separate branch, people following the Linus tip might still want to know about the fast-forward, but that is something you cannot truly tell by digging the way I did in the previous message.

That is why I earlier said this:

    *1* IOW, we _are_ losing some information by not recording the fact that a fast-forward was done while doing so.

    That record should _not_ be in the commit chain. At the mechanical level, recording that in the commit chain means two criss-crossing branches never converge at the commit chain level, which is already bad. At the philosophical level, the commit chain is a mesh of many possible "global" histories, and the record that somebody (a particular branch in a particular repository) was at what point in the mesh at a given time does not belong there.

    But from the repository-owner's point of view, that _might_ be useful information to keep. I am just saying this preemptively so that if somebody wants to record it, that should not be recorded in the commit object.

I do not think the commit object is the place to record it, even with a purely-comment field like "note prior". The commit ancestry DAG is global in nature, and the information under discussion, "before pointing at this commit, the branch that made this commit happened to point at this other commit", is not. That information describes only one branch's view of the world, and would not work in the fast-forward case because no new commit is created. An important property of a fast-forward is that we do not create an extra commit object that makes it impossible for two criss-crossing branches to ever converge.

On the other hand, a "note" field that records on which branch of which repository each commit was made (you need to give each repository-branch a UUID) when you do create a new commit would be a sensible thing to have if somebody cares deeply enough. It is information that is global in nature, and with that, you could do the digging like I did without relying on the committer identity, but instead using the branch identity.
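The convergence argument rests on how commit IDs are derived: in a SHA-1 repository the ID is the hash of the object header ("commit", the size, a NUL byte) followed by the commit text, so any extra per-branch line in the commit text yields a different ID even when tree, parent, author and message are identical. The Python sketch below illustrates this under stated assumptions: the "prior" header line and its position are hypothetical (not taken from the actual patch), and the tree, parent and author values are placeholders.

```python
import hashlib

def commit_id(body: bytes) -> str:
    """SHA-1 object ID of a commit: hash of 'commit <size>\\0' + body."""
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def make_commit(tree, parent, extra_headers=(), message="Fix a bug\n"):
    """Build the text of a commit object (placeholder author/committer)."""
    lines = [f"tree {tree}", f"parent {parent}"]
    lines += list(extra_headers)  # hypothetical extra header lines go here
    lines += [
        "author A U Thor <author@example.com> 1146500000 +0000",
        "committer A U Thor <author@example.com> 1146500000 +0000",
    ]
    return ("\n".join(lines) + "\n\n" + message).encode()

TREE = "a" * 40       # placeholder tree ID
PARENT = "b" * 40     # placeholder parent ID
OLD_TIP = "c" * 40    # placeholder for one branch's previous tip

plain = make_commit(TREE, PARENT)
same = make_commit(TREE, PARENT)
with_prior = make_commit(TREE, PARENT, extra_headers=[f"prior {OLD_TIP}"])

print(commit_id(plain) == commit_id(same))        # True: same text, same ID
print(commit_id(plain) == commit_id(with_prior))  # False: extra header, new ID
```

Two repositories that record different "prior" values for otherwise identical work would therefore end up with distinct commit objects, which is the failure to converge described above.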
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Sam Vilain @ 2006-05-02 0:21 UTC
To: Junio C Hamano; +Cc: git, Jakub Narebski

Junio C Hamano wrote:

>>Aha, now I see reason in the madness. So, the "prior" head is not stored
>>in the trees, and tracking the progress of actual head transitions is
>>loosely defined / a research topic. But demonstrably derivable. That
>>works for me.
>
>I do not think there is any madness involved here, but I should

Sorry, it was a figure of speech. It's more like, what appeared to be madness no longer looks so.

>point out that the above example happens to work only because
>Linus and David are two different people. If Linus did the
>David's work in a separate repository, or even in the same
>repository but on a separate branch, people following the Linus
>tip might still want to know about the fast-forward, but that is
>something you cannot truly tell by the digging like what I did
>in the previous message.
>
>That is why I earlier said this:
>
> *1* IOW, we _are_ losing some information by not recording the
> fact that fast-forward was done while doing so.
>
> That record should _not_ be in the commit chain. At the
> mechanical level, recording that in the commit chain means two
> criss-crossing branches never converge at the commit chain
> level, which is already bad.

Here I'm a little bit confused still. Surely criss-crossing branches already don't converge unless the commits are in the same order.

Oh, I see. Even if they *are* in the same order, the commit IDs would end up different due to these extra headers.

> At the philosophical level, the
> commit chain is a mesh of many possible "global" histories, and
> the record that somebody (a particular branch in a particular
> repository) was at what point in the mesh at given time does not
> belong there.
>
> But from the repository-owner's point of view, that _might_ be a
> useful information to keep. I am just saying this preemptively
> so that if somebody wants to record it, that should not be
> recorded in the commit object.

That makes sense.

>On the other hand, a "note" field that records on which branch
>of which repository each commit was made (you need to give each
>repository-branch an UUID) when you do create a new commit would
>be a sensible thing to have if somebody cares deeply enough. It
>is an information that is global in nature, and with that, you
>could do the digging like I did without relying on the committer
>identity, but instead using the branch identity.

That sounds reasonable. The UUID doesn't need to replicate, either, just tag the commits that were made against it.

This extra information falls into the informational, "forensic" history tracing category. That is, we don't know now whether we'll need it, but we'll store it anyway just to be sure not to make later operations impossible.

I think the large remaining question is around what conventions apply to the use of the "note" field. We have perhaps the first example of a well-formed piece of "forensic" information that belongs in the commit chain and could possibly be added by plumbing.

I can't think of any more of those, but the rename/copy tracking case is a bit different. In this case, it doesn't belong in the plumbing, yet you want a reasonable convention for storing this information to apply. Also the other cases outlined in the original post might do well to have a common convention so that the information is more portable between porcelains.

Sam.
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Martin Langhoff @ 2006-05-02 7:08 UTC
To: Sam Vilain; +Cc: Junio C Hamano, git, Jakub Narebski

On 5/2/06, Sam Vilain <sam@vilain.net> wrote:
> Here I'm a little bit confused still. Surely criss-crossing branches
> already don't converge unless the commits are in the same order.

They do under GIT. No matter how much you criss-cross, every time you identify a merge base for the next merge, you are identifying the last commit in common on both branches. While maybe you didn't have that commit being the tip of a head in your repo, it _is_ the last common point.

If your criss-crossing is messy and a few commits are out of order or cherry picked, git-merge has a good chance of spotting it. The whole mechanism tends to pull quite consistently towards convergence.

If the notes in the commit msg aren't consistent and we lose the natural tendency towards convergence, that's a major drawback. On the other hand, if two branches have exchanged patches "out of band", git-merge still gets it right most of the time, so perhaps slightly different headers in the commit messages are tolerable?

Junio had written:
> >On the other hand, a "note" field that records on which branch
> >of which repository each commit was made (you need to give each
> >repository-branch an UUID) when you do create a new commit would
> >be a sensible thing to have if somebody cares deeply enough.

I really don't like that -- goes against the grain of really simple, portable repos. I cp -pr repo{,_tmp} all the time to do risky merges or save a heavy download from a remote server. Let me run away from this idea... quick before Linus kills us all ;-)

I did feel a couple of times the need of remembering where I had checked this in -- but it went away quite quickly, must have been a leftover of my Arch days ;-). And it actually got solved by agreeing within my team to a commit message format pretty much like what's used in the kernel. Because the truth is that most of my heads and branches have very "local" names.

cheers,

martin
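To make the merge-base point earlier in this message concrete, here is a small Python sketch that finds the "last common point(s)" of two tips in a toy history. It is a simplified, assumed model (the commit names and the `merge_bases` helper are hypothetical, and git's own merge-base computation is more efficient): a merge base is taken to be a common ancestor that is not itself an ancestor of another common ancestor.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including the commit itself."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }

# Hypothetical criss-cross history (commit -> list of parents).
parents = {
    "root": [],
    "a1": ["root"],
    "b1": ["root"],
    "a2": ["a1", "b1"],   # branch A merges B
    "b2": ["b1", "a1"],   # branch B merges A (the criss-cross)
    "a3": ["a2"],
    "b3": ["b2"],
    "a4": ["a3", "b3"],   # A merges B again
}

print(merge_bases(parents, "a2", "b1"))          # {'b1'}
print(sorted(merge_bases(parents, "a3", "b3")))  # ['a1', 'b1']
print(merge_bases(parents, "a4", "a4"))          # {'a4'}
```

The criss-cross case yields two candidate bases rather than one, and once branch B fast-forwards to `a4` both tips are the same commit, which is the convergence being described. That last step only works because no per-branch data is baked into the commit objects.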
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Sam Vilain @ 2006-05-01 0:05 UTC
To: Junio C Hamano; +Cc: Jakub Narebski, git

Junio C Hamano wrote:

>> * "prior" - heads that represent topic branch merges
>
>This is not any different from usual "parent" at all (but you
>have to think about it a bit to realize it).
>
[...]
>Once you start reading the commit parent to mean " considering
>what all of these commits have, what this new commit has suits
>my purpose better", it becomes clear that the "previous" pointer
>for a branch like my "pu" is just another "parent".

How can you look back at the merge history and determine which of these scenarios is the case?

It still looks to me like you are recording two distinct types of parent using the same type of link. You're now just expanding the definition of parent so they look to be the same.

Actually it might be alright if you have an extra merge commit object, i.e., make a complete merge of the new tips, then make a second merge that merges the two heads.

It's still a little bit of a research topic to look at that mess and figure out which type of relationship each parent actually is, but if you really want to decide that is that and done is done then I guess we'll all just have to live with it or fork.

Sam.