* What's the meaning of `parenthood' in git commits?
@ 2006-11-08 0:39 Nix
2006-11-08 0:52 ` Jakub Narebski
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Nix @ 2006-11-08 0:39 UTC (permalink / raw)
To: git
So I'm back on the weird porcelain I mentioned months and months ago,
the one which treats source trees as named collections of patches merged
together in different ways, almost like stgit on steroids, only not.
It occurred to me recently that packed refs provide about 50% of what I
need (efficient handling of lots and lots of refs); most of the other
50% consists of a new extremely weird git merge strategy,
`git-merge-patched', which merges branches A and B by finding the most
recent merge-base between branch B and any branch listed in
.git/refs/trunks (`trunks' being a directory holding heads which are
treated this way by this weird merge strategy; the porcelain will have
to keep it up to date, which shouldn't be too terribly hard), and
patch(1)ing the diff between that base and the tip of branch B into
branch A. (A patch rejection, of course, means merge-by-hand and commit,
as usual with merge conflicts.)
The idea being that if you have a tree like this:

            B
 ------------- ref trunks/latest
             \
              ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty

then this merge strategy, when asked to merge heads/some-change-foo into
trunks/old-and-grotty would spot that point B was the most recent
merge point into anything in trunks/, generate a diff between point B
and heads/some-change-foo, and patch it into trunks/old-and-grotty.
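The base-selection step described above can be sketched on a toy commit graph. This is only an illustration in Python, not the proposed `git-merge-patched` itself: the graph, the commit names, the integer timestamps used to approximate "most recent", and the helpers `ancestors` and `patch_base` are all assumptions made up for the example.

```python
# Toy commit graph: commit id -> (timestamp, list of parent ids).
# All names and timestamps are invented for illustration.
COMMITS = {
    "root": (0, []),
    "old1": (1, ["root"]),   # history of trunks/old-and-grotty
    "old2": (2, ["old1"]),
    "new1": (3, ["root"]),   # history of trunks/latest
    "B":    (4, ["new1"]),   # fork point of the change branch
    "new2": (5, ["B"]),
    "foo1": (6, ["B"]),      # heads/some-change-foo
    "foo2": (7, ["foo1"]),
}
TRUNKS = {"trunks/latest": "new2", "trunks/old-and-grotty": "old2"}


def ancestors(commit):
    """Return the set of commits reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(COMMITS[c][1])
    return seen


def patch_base(change_tip):
    """Most recent commit shared by `change_tip` and any trunk."""
    mine = ancestors(change_tip)
    shared = set()
    for tip in TRUNKS.values():
        shared |= mine & ancestors(tip)
    return max(shared, key=lambda c: COMMITS[c][0])


print(patch_base("foo2"))  # B
```

The diff to be patched into trunks/old-and-grotty would then be taken between the returned commit and the tip of the change branch.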
(I *know* this is really weird, but I've got a choice of doing this or
continuing to use SCCS with the world's most horrible shell script
wrapper as the source code repository for ~5Gb of source, with tens of
thousands of files in a flat directory structure, expanded to 50Gb
because we're storing binary files in there by the astonishingly
inefficient means of uuencoding them and sccsing the result: you may be
sick now. I know which I'd prefer. I may be distorting git into
something unrecognisable to its own father but it's that or I go insane
*and* run out of disk space.)
After all that setup, my question's simple. Does a `parent' in git
terminology simply mean `this commit was derived in some way from the
commit listed here'? If so, I suppose I can list heads/some-change-foo
as one parent on these merge commits, even though the `merging'
mechanism is so odd that I expect to be pelted with rotten vegetables as
soon as I post this.
But it's that or SCCS.
(Of course this will go into a public git repository for people to laugh
at. I don't expect anyone to actually *use* it.)
--
Rich industrial heritage: lifeless wasteland. `The land
* Re: What's the meaning of `parenthood' in git commits?
From: Jakub Narebski @ 2006-11-08 0:52 UTC (permalink / raw)
To: git

Nix wrote:

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'? If so, I suppose I can list heads/some-change-foo
> as one parent on these merge commits, even though the `merging'
> mechanism is so odd that I expect to be pelted with rotten vegetables as
> soon as I post this.

Yes, being a parent means that this commit was derived in some way from
the commit listed here. It need not mean that this commit is the result
of a merge of the commits listed here... there was a discussion some
time ago about using one of the parents (the first, for example) instead
of a special header for a "prev" link to the previous value of the ref
(a discussion which was made obsolete by reflog).

There are two things you have to think about if you use 'parenthood'
for something a bit unexpected. First, parents are connectivity, so even
if you delete trunks/some-name and then prune, everything that was
merged into some branch or tag which still exists would not get pruned.
Second, the information about merges is used in merge strategies:
consider whether having this information would help your strange merge
strategy.

And of course there is the question of whether the graph as visualized
by, for example, gitk would make more sense or not with the "strange
merges" marked as merges.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
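The connectivity point in the message above can be illustrated with a small reachability walk. This is a sketch on an invented four-commit graph, not a description of how `git prune` is implemented; the commit names and the `reachable` helper are assumptions for the example.

```python
def reachable(commits, refs):
    """Commits reachable from any ref by following parent links."""
    seen, stack = set(), list(refs.values())
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c])
    return seen


# commit id -> parent ids; "M" is the patch-merge on the old trunk.
commits = {
    "root": [],
    "old": ["root"],
    "foo": ["root"],
    "M": ["old", "foo"],   # second parent records the source of the patch
}
refs = {"trunks/old-and-grotty": "M", "heads/some-change-foo": "foo"}

del refs["heads/some-change-foo"]          # delete the change branch
kept_alive = set(commits) - reachable(commits, refs)
print(sorted(kept_alive))                  # []: "foo" is still reachable via M

commits["M"] = ["old"]                     # same commit without the extra parent
prunable = set(commits) - reachable(commits, refs)
print(sorted(prunable))                    # ['foo']
```

With the second parent recorded, the change branch survives deletion of its ref; without it, nothing points at "foo" any more.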
* Re: What's the meaning of `parenthood' in git commits?
From: Linus Torvalds @ 2006-11-08 0:58 UTC (permalink / raw)
To: Nix; +Cc: git

On Wed, 8 Nov 2006, Nix wrote:
>
> [ Nix explains what he's doing now with SCCS ]: you may be
> sick now.

Wow. You've got some strange setup there, Nix.

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'?

Well, strictly speaking, git doesn't itself assign much of any real
meaning to "parent" at all. It has the obvious meanings:

 - the parent pointers act as reachability graph edges (so fsck cares
   about it a lot, of course)

 - listing the "log" of a commit will show everything reachable from
   that commit and its parents, of course (with the commit date-stamp
   being used as an "ordering" when having multiple choices of commits
   to show)

 - it has the obvious meanings for the "revision arithmetic", ie
   revision name parsing (ie "commit~3^2")

 - parenthood will be used to show the diff ("git show", "git log -p"
   and friends)

 - the "merge-base" algorithms obviously use it to find the most recent
   common ancestor, and that in turn impacts the normal merge
   strategies, of course.

so parenthood does obviously have a number of very specific technical
meanings for different programs, but at the same time, no, git doesn't
really "care". You can happily generate your own parenthood if you want
to, and git will just continue to follow the above rules.

> If so, I suppose I can list heads/some-change-foo as one parent on these
> merge commits, even though the `merging' mechanism is so odd that I
> expect to be pelted with rotten vegetables as soon as I post this.

Yeah, git won't care. If you screw up parenthood, you have a few
problems:

 - the diffs may look really strange. In particular, if you list
   multiple parents, the git "diff" functions will all just assume that
   it's a merge, and a "git show" will start showing the combined diff
   (which is usually empty).

   So if you end up having multiple parents, not because it was "really"
   a merge, but because you use the other parent pointer to point to
   some "source" for the patch, things like "git log -p" won't give nice
   output any more. You need to manually ask for the diff with something
   like

	# show diff from second parent
	git diff commit^2..commit

   instead.

 - listing too _few_ parents is potentially more serious, if you have
   reachability issues (ie you wanted to keep the other source around,
   but since you didn't list it as a parent, git won't know that it had
   anything to do with your commit, so it may be pruned away unless you
   have some other way to reach it)

but if you just have a really strange merge algorithm, and the _data_
associated with the parents is "surprising" from the standpoint of the
default merge, git really won't care at all.

Your usage does sound a bit strange.
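One of the meanings listed in the message above is revision name parsing, as in "commit~3^2". The sketch below resolves such names on an invented toy graph: `~n` walks n steps along first parents and `^n` picks the n-th parent, with a missing number meaning 1. The graph and the `resolve` helper are illustrative assumptions; real git accepts many more forms than this handles.

```python
import re

# commit id -> ordered parent ids (first parent first); names are invented.
PARENTS = {
    "a": [],
    "b": ["a"],
    "s": ["a"],          # side branch
    "m": ["b", "s"],     # merge: first parent b, second parent s
    "c": ["m"],
    "d": ["c"],
    "e": ["d"],
}


def resolve(spec):
    """Resolve a name like 'e~3^2' by walking parent links."""
    name, suffix = re.fullmatch(r"(\w+)((?:[~^]\d*)*)", spec).groups()
    commit = name
    for op, digits in re.findall(r"([~^])(\d*)", suffix):
        n = int(digits) if digits else 1
        if op == "~":
            for _ in range(n):           # n steps along first parents
                commit = PARENTS[commit][0]
        elif n > 0:                      # '^n' picks the n-th parent
            commit = PARENTS[commit][n - 1]
    return commit


print(resolve("e~3^2"))  # s
print(resolve("e~3^"))   # b
```

This is why the order of parents matters for a patch-merge: whichever commit is listed second is the one reached by `^2`.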
* Re: What's the meaning of `parenthood' in git commits?
From: Nix @ 2006-11-08 1:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git

On 8 Nov 2006, Linus Torvalds uttered the following:
> On Wed, 8 Nov 2006, Nix wrote:
>>
>> [ Nix explains what he's doing now with SCCS ]: you may be
>> sick now.
>
> Wow. You've got some strange setup there, Nix.

It's what happens when a version-control system gets implemented as an
emergency hack when moving from VMS, by people who don't really grok
Unix shell scripting... and then you let fifteen years pass, and nobody
dares touch the hack because it's so damned delicate.

It took months of agony to implement crude half-functional branching in
this. Writing a git porcelain should be vastly simpler, even with the
overhead of a conversion tool as well.

Writing that conversion tool will be fun :( e.g. I'm going to have to
identify branches by diffing/xdeltaing each version of a file with every
single previous version of that file, and if the diff is smallest
against a version other than the immediate ancestor, it's assumed to be
a branch against that version. (I'm going to have to fake up packed refs
for these tiny branches so that they're at least accessible in
emergencies, gah.)

It's all, well, nasty. But all will be so much happier in the shining
world of git.

>> After all that setup, my question's simple. Does a `parent' in git
>> terminology simply mean `this commit was derived in some way from the
>> commit listed here'?
>
> Well, strictly speaking, git doesn't itself assign much of any real
> meaning to "parent" at all. It has the obvious meanings:

Oh *good*, that's what I thought.
[snip more things which match my understanding]

>  - parenthood will be used to show the diff ("git show", "git log -p" and
>    friends)

I'll list the patch-merged parent as the second parent, so that you'll
only get the mostly-useless huge diff from that if you actually ask for
it, and will get a more useful result with ^.

>  - the "merge-base" algorithms obviously use it to find the most recent
>    common ancestor, and that in turn impacts the normal merge strategies,
>    of course.

Hm, yeah, if merging iterates down patch-merged branches it might have
interesting consequences, because the trees on one side of patch-merges
are likely to be very different to trees on the other side (years of
development separate them). I'd like a way to specify that those parents
are *not* to be traversed by the merge-base algorithms, really.

A series of

not-merge-base: <sha1 id>

headers, perhaps? (I think that's likely to involve much less code churn
than introducing a new `not-merge-base-parent' tag).

> Yeah, git won't care. If you screw up parenthood, you have a few problems:
>
>  - the diffs may look really strange. In particular, if you list multiple
>    parents, the git "diff" functions will all just assume that it's a
>    merge, and a "git show" will start showing the combined diff (which is
>    usually empty).

It is a merge, so that's right. It's just a rather odd merge. (I don't
envisage actual *changes* being made in these commits except to resolve
conflicts.)

>    So if you end up having multiple parents, not because it was "really" a
>    merge, but because you use the other parent pointer to point to some
>    "source" for the patch, things like "git log -p" won't give nice output
>    any more. You need to manually ask for the diff with something like

Well, I was envisaging that the other parent pointer would point to the
tip of the changes tree. Going back to that graph again:

            B
 ------------- ref trunks/latest
             \
              ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty

The idea is that the patch-merge of trunks/old-and-grotty and
heads/some-change-foo would consist textually of the diff between B and
heads/some-change-foo, applied to trunks/old-and-grotty, and would list
as its parents trunks/old-and-grotty, *and heads/some-change-foo*.

(Perhaps this isn't really a merge after all? Should merge parents be
treated as differently as this? It'll all be covered over by the
porcelain in any case: it won't be possible to confuse a trunk/ with a
normal head and accidentally patch-merge in the wrong direction.)

>  - listing too _few_ parents is potentially more serious, if you have
>    reachability issues (ie you wanted to keep the other source around, but
>    since you didn't list it as a parent, git won't know that it had
>    anything to do with your commit, so it may be pruned away unless you
>    have some other way to reach it)

Yeah, that would be bad.

> but if you just have a really strange merge algorithm, and the _data_
> associated with the parents is "surprising" from the standpoint of the
> default merge, git really won't care at all.

Good.

> Your usage does sound a bit strange.

Agreed. But there are hundreds of people banging on my door asking for a
proper version control system, quilt isn't a proper version control
system in that sense, and stgit has... issues when you try to distribute
it and when you have a lot of people working on one tree at once: plus
it doesn't fit our weird workflow with multiple parallel release
branches, at least one active development trunk, and all changes done
under a carefully-controlled bug tracking system (it's as if *every*
change has a bugzilla ticket associated, *always*, and we expect to be
able to get from ticket to change efficiently).

(We do both distribution and working-copy-sharing: the trees are too
large to have one tree per person, not least because each tree requires
an entire Oracle instance of its own to play with and massive amounts of
memory; and we have geographically distributed sites with trees of their
own.)

--
Rich industrial heritage: lifeless wasteland. `The land
* Re: What's the meaning of `parenthood' in git commits?
From: Nix @ 2006-11-08 3:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git

On 8 Nov 2006, nix@esperi.org.uk spake thusly:
> On 8 Nov 2006, Linus Torvalds uttered the following:
>>  - the "merge-base" algorithms obviously use it to find the most recent
>>    common ancestor, and that in turn impacts the normal merge strategies,
>>    of course.
>
> Hm, yeah, if merging iterates down patch-merged branches it might have
> interesting consequences, because the trees on one side of patch-merges
> are likely to be very different to trees on the other side (years of
> development separate them). I'd like a way to specify that those parents
> are *not* to be traversed by the merge-base algorithms, really.
>
> A series of
>
> not-merge-base: <sha1 id>
>
> headers, perhaps? (I think that's likely to involve much less code churn
> than introducing a new `not-merge-base-parent' tag).

Wrong. Sort of.

When doing normal merges you don't want to consider patch-merged parents
as real merges: but there is one situation when you *do* want merge-base
checking to traverse such links.

Say you have the tree just described:

            B
 ------------- ref trunks/latest
             \
              ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty

and you want to patch-merge heads/some-change-foo with
trunks/old-and-grotty. It doesn't quite apply, so you end up with a
conflict-resolution. This will normally be in the merge commit, but
there's no guarantee of that: perhaps you knew the source tree would
conflict in advance and fixed it up so that it wouldn't, leaving the old
heads/some-change-foo pointing before that fixup:

            B
 ------------- ref trunks/latest
             \
              ------- ref heads/some-change-foo
                D  \
                    c
                    |
 ... -------------- ref trunks/old-and-grotty

Later on, you find a bug in that change.
It's still the same conceptual change, so you fix it, and you want to
patch-merge the fix across:

            B
 ------------- ref trunks/latest
             \
              -----------\ ref heads/some-change-foo
                C  \      .
                    c     . (link under construction)
                    |     .
 ... -------------- ref trunks/old-and-grotty
                    E     F

What patch-merge must do in order to produce a diff-merge at point F is
therefore rather more involved than I'd hoped:

 - determine B as above (most recent merge-base of heads/some-change-foo
   with anything in trunks/).

 - determine the merge-base of trunks/old-and-grotty with
   heads/some-change-foo, *traversing patch-merge parents*. Call this
   base C. (This is the only circumstance in which merge-base
   determination should traverse patch-merged parents.)

 - Iff that base C is topologically a child of B, then we have already
   merged part of this change in the past. In that case, instead of the
   merge consisting of the diff between B and F, it consists of the diff
   between C and the head, minus the set of changes c. So it remains to
   determine c.

 - scan backwards along F with git-rev-list, searching specifically for
   the most recent patch-merge naming any commit which has C as a
   transitive parent: that is point E. (Such a point must exist as long
   as only patch-merges have been used to merge heads/some-change-foo
   with trunks/old-and-grotty: if other sorts of merge have been used,
   all bets are off and I think we can legitimately fail the merge.)
   (This requires the ability to distinguish patch-merges from normal
   merges, but that's easy if we have any tag at all to distinguish
   them, which we must for merge-base traversal to avoid such parents
   normally.)

 - Reverse out the diff between C and E (if the two are not the same
   commit) and remember it temporarily as c.

 - Apply the forwards diff between point C and heads/some-change-foo,
   and then apply c in the forwards direction (if c is already present,
   this is not an error: it just means that whatever conflict-resolution
   was necessary as a one-off was later needed on the change trunk).

I think that should cope with just about everything. I've tried to mock
up all sorts of contrived trees and I can't find anything that doesn't
reduce to that case or a simplification of it. (And no, this case is not
contrived: we test on trunks, so we deal with it whenever anything fails
testing and has to be fixed...)

(Now all I have to do is write it... enough words, time for action.
Actually time for sleep, it's three in the morning here. Action
tomorrow.)

--
Rich industrial heritage: lifeless wasteland. `The land
* Re: What's the meaning of `parenthood' in git commits?
From: Junio C Hamano @ 2006-11-08 1:13 UTC (permalink / raw)
To: Nix; +Cc: git

Nix <nix@esperi.org.uk> writes:

> The idea being that if you have a tree like this:
>
>             B
>  ------------- ref trunks/latest
>              \
>               ------ ref heads/some-change-foo
>
>  ... -------- ref trunks/old-and-grotty
>
>
> then this merge strategy, when asked to merge heads/some-change-foo into
> trunks/old-and-grotty would spot that point B was the most recent
> merge point into anything in trunks/, generate a diff between point B
> and heads/some-change-foo, and patch it into trunks/old-and-grotty.

This is a standard "cherry-picking" practice.

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'?

When you think about commit ancestry, think of it this way:

    These commits I list as the parents of this new commit, and
    everything that leads to them, are what I considered when I
    derived this commit. This new child commit of them suits the
    purpose of _my_ branch better than any of these parent
    commits I took into consideration because of such and such
    reasons that I stated in its commit log message.

If you mark the resulting commit on old-and-grotty to have
some-change-foo as one of its parents, because some-change-foo
has almost everything 'latest' has (up to point B), you are also
saying "I have considered everything that happened between
old-and-grotty and B when making this commit".

What's implied by that statement is this, even though you do not
explicitly say:

    I reject everything that happened on the development line
    that led to 'latest' up to point B since old-and-grotty was
    forked.

This is not necessarily a bad thing, by the way. Somebody who is
trying to maintain an extremely-stable branch by cherry picking
only changes in a few narrow areas from the mainline would
_want_ to leave most of the "new good stuff" out from his
branch. That's why I emphasized _my_ a few paragraphs above.

But it is _so_ different from the usual mindset of "every branch
makes progress _forward_, perhaps at a different pace". In this
example, this branch is actively choosing to stay behind and
refusing to take changes from the 'latest'. So your users need
to really understand what they are doing.

For example, if there is another topic forked off of B (or at a
later commit from there that leads to 'latest'), after your
"funny merge" took place, even the usual merge strategies would
work as expected by you --- it would still ignore the changes up
to B because you told git to do so.

Also, if you make a good change on top of the resulting merge
that _should_ be applicable to some-change-foo which is based on
the 'latest', you cannot merge that back in the usual way. Usual
git merge will find your first "funny merge" as the merge base,
and because it chooses to reject everything leading to B, the
merge result would look very similar to the set of changes based
on old-and-grotty. Actually, that would even fast forward to the
version you made into a phony "merge" out of the cherry-picked
result.

But that is at least consistent with the statement you made when
you created that commit. Staying behind at old-and-grotty
suited _your_ branch's purpose better than being based on
'latest'. And a person who is merging _your_ branch into
some-change-foo, by choosing to merge that branch into the
latter, is choosing to share your branch's purpose, so it is
natural that a lot of the "good things" that happened up to B
are rewound by that merge.

So I think as long as you and your users understand what is
going on, I do not see a problem at either the mechanical level
or the philosophical level. But I am sure it would confuse a
lot of people, so please do not come back complaining that you
ended up making your users' heads explode ;-).
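The fast-forward consequence described in the message above can be checked on a toy graph. This is a sketch with invented commit names and hypothetical helpers (`ancestors`, `is_fast_forward`); it models only the ancestry test, not what `git merge` does with the trees.

```python
def ancestors(parents, commit):
    """Commits reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def is_fast_forward(parents, ours, theirs):
    """Merging `theirs` into `ours` is a fast-forward if `ours` is its ancestor."""
    return ours in ancestors(parents, theirs)


# Invented toy history: commit id -> parent ids.
parents = {
    "fork": [],
    "old": ["fork"],        # trunks/old-and-grotty before the merge
    "B": ["fork"],          # on the line leading to trunks/latest
    "foo": ["B"],           # heads/some-change-foo
    "M": ["old", "foo"],    # cherry-picked result recorded as a merge
}

merged_back = is_fast_forward(parents, "foo", "M")
claims_b = "B" in ancestors(parents, "M")
print(merged_back, claims_b)   # True True

# Recording the same result as an ordinary commit makes no such claim.
parents["M"] = ["old"]
plain_ff = is_fast_forward(parents, "foo", "M")
print(plain_ff)                # False
```

With the extra parent, the history says that M already contains everything up to B, so merging M back into the change branch simply moves the branch to M.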
* Re: What's the meaning of `parenthood' in git commits?
From: Nix @ 2006-11-08 1:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git

On 8 Nov 2006, Junio C. Hamano spake thusly:
> Nix <nix@esperi.org.uk> writes:
>
>> The idea being that if you have a tree like this:
>>
>>             B
>>  ------------- ref trunks/latest
>>              \
>>               ------ ref heads/some-change-foo
>>
>>  ... -------- ref trunks/old-and-grotty
>>
>>
>> then this merge strategy, when asked to merge heads/some-change-foo into
>> trunks/old-and-grotty would spot that point B was the most recent
>> merge point into anything in trunks/, generate a diff between point B
>> and heads/some-change-foo, and patch it into trunks/old-and-grotty.
>
> This is a standard "cherry-picking" practice.

Yes, pretty much, except that we do *everything* by cherry-picking, and
we want to track the cherry-picks in the same way that all other changes
are tracked (i.e., a small branch for each (numbered) change, patching
madly in all directions into a variety of trunks and release branches,
with all those patches tracked.)

>     These commits I list as the parents of this new commit, and
>     everything that leads to them, are what I considered when I
>     derived this commit. This new child commit of them suits the
>     purpose of _my_ branch better than any of these parent
>     commits I took into consideration because of such and such
>     reasons that I stated in its commit log message.
>
> If you mark the resulting commit on old-and-grotty to have
> some-change-foo as one of its parents, because some-change-foo
> has almost everything 'latest' has (up to point B), you are also
> saying "I have considered everything that happened between
> old-and-grotty and B when making this commit".

Yeah. This is the merge-base tracking that Linus mentioned, and it's not
quite what I'm looking for :/ it's a sort of step-parent, really...

> What's implied by that statement is this, even though you do not
> explicitly say:
>
>     I reject everything that happened on the development line
>     that led to 'latest' up to point B since old-and-grotty was
>     forked.

(which is not necessarily true: we might want to backport an earlier
change, also on another `small change branch', later on. Stuff on the
trunks themselves will never want to get backported, but if the
merge-base algorithm traverses patch-merge parent links, it might
consider that a `small change branch' has been merged when it actually
hasn't.)

> This is not necessarily a bad thing, by the way. For somebody
> who is trying to maintain extremely-stable branch by cherry
> picking only changes in a few narrow areas from the mainline
> would _want_ to leave most of the "new good stuff" out from his
> branch. That's why I emphasized _my_ a few paragraphs above.

That's exactly what we're doing, across-the-board.

> But it is _so_ different from the mindset of usual "every branch
> makes progress _forward_ perhaps with different pace". In this
> example, this branch is actively choosing to stay behind and
> refusing to take changes from the 'latest'. So your users need
> to really understand what they are doing.

*hahahaaaaa*... hang on, that *was* a joke, right? ;)

> So I think as long as you and your users understand what is
> going on, I do not see a problem at either the mechanical level
> or the philosophical level. But I am sure it would confuse a
> lot of people, so please do not come back complaining that you
> ended up getting your users heads explode ;-).

OK, I think I need to find a way to notate in the patch-merged commit
that one or more parents should be disregarded when searching for merge
bases (and *only* when searching for merge bases).
I think that will do what's wanted in all areas: i.e., it'll act like a
cherry-pick that shows up in the logs/revlist and so on, but doesn't
affect the semantics of later merges of stuff from anywhere except for
the same limited branch.

(obviously trying to patch-merge B to A twice is always going to fail,
whether or not merge-base traversal jumps into B: I don't think there's
any real need to protect against that.)

--
Rich industrial heritage: lifeless wasteland. `The land
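The idea of disregarding certain parents only when searching for merge bases can be sketched as an ancestry walk that skips a given set of edges. Everything here is hypothetical: the thread only proposes such a notation, and the `skip` parameter, the helper names and the toy graph are assumptions made up for illustration.

```python
def ancestors(parents, commit, skip=frozenset()):
    """Commits reachable from `commit`, ignoring (child, parent) edges in `skip`."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(p for p in parents[c] if (c, p) not in skip)
    return seen


def common_ancestors(parents, a, b, skip=frozenset()):
    """Commits that are ancestors of both `a` and `b` under the same skip set."""
    return ancestors(parents, a, skip) & ancestors(parents, b, skip)


# Invented toy history: commit id -> parent ids.
parents = {
    "root": [],
    "old": ["root"],
    "B": ["root"],
    "foo": ["B"],
    "bar": ["B"],           # another small change branch forked at B
    "M": ["old", "foo"],    # patch-merge of foo into the old trunk
}
patch_edges = {("M", "foo")}   # edges to disregard for merge-base purposes

with_edges = sorted(common_ancestors(parents, "M", "bar"))
without_edges = sorted(common_ancestors(parents, "M", "bar", patch_edges))
print(with_edges)      # ['B', 'root']
print(without_edges)   # ['root']
```

Following the patch-merge edge makes B look like shared history between the old trunk and the other change branch; skipping it leaves only the real fork point, while ordinary log and reachability walks could still follow the edge.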