git.vger.kernel.org archive mirror
* What's the meaning of `parenthood' in git commits?
@ 2006-11-08  0:39 Nix
  2006-11-08  0:52 ` Jakub Narebski
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nix @ 2006-11-08  0:39 UTC (permalink / raw)
  To: git

So I'm back on the weird porcelain I mentioned months and months ago,
the one which treats source trees as named collections of patches merged
together in different ways, almost like stgit on steroids, only not.

It occurred to me recently that packed refs provide about 50% of what I
need (efficient handling of lots and lots of refs); most of the other
50% consists of a new, extremely weird git merge strategy,
`git-merge-patched', which merges branches A and B by finding the most
recent merge-base between branch B and any branch listed in
.git/refs/trunks (`trunks' being a directory holding heads which are
treated this way by this weird merge strategy; the porcelain will have
to keep it up to date, which shouldn't be too terribly hard), and
patch(1)ing the diff between that base and the tip of branch B into
branch A. (A patch rejection, of course, means merge-by-hand and commit,
as usual with merge conflicts.)

The idea being that if you have a tree like this:

     B
------------- ref trunks/latest
     \
      ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty


then this merge strategy, when asked to merge heads/some-change-foo into
trunks/old-and-grotty would spot that point B was the most recent
merge point into anything in trunks/, generate a diff between point B
and heads/some-change-foo, and patch it into trunks/old-and-grotty.
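
To make the base selection concrete, here is a minimal Python sketch over
a toy commit graph. The commit names, the DATE ordering and the helper
functions are all invented for illustration, and picking the latest common
ancestor by date is only a rough stand-in for what git-merge-base really
does.

```python
# Toy commit graph: each commit maps to its list of parents.
# Names and dates are invented for illustration only.
PARENTS = {
    "root": [],
    "old1": ["root"],   # trunks/old-and-grotty
    "t1": ["root"],
    "B": ["t1"],
    "t3": ["B"],        # trunks/latest
    "f1": ["B"],
    "f2": ["f1"],       # heads/some-change-foo
}
DATE = {"root": 0, "old1": 1, "t1": 2, "B": 3, "f1": 4, "t3": 5, "f2": 6}

def ancestors(commit):
    """Return the set of commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def merge_base(a, b):
    """Latest common ancestor by date (a simplification of git-merge-base)."""
    common = ancestors(a) & ancestors(b)
    return max(common, key=DATE.get) if common else None

def patch_base(head, trunks):
    """Most recent merge base between `head` and any of the trunk tips."""
    bases = [merge_base(head, t) for t in trunks]
    bases = [b for b in bases if b is not None]
    return max(bases, key=DATE.get) if bases else None

print(patch_base("f2", ["t3", "old1"]))  # B
```

Diffing that base against heads/some-change-foo and applying the result to
trunks/old-and-grotty would be the job of the strategy itself and is left
out here.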

(I *know* this is really weird, but I've got a choice of doing this or
continuing to use SCCS with the world's most horrible shell script
wrapper as the source code repository for ~5Gb of source, with tens of
thousands of files in a flat directory structure, expanded to 50Gb
because we're storing binary files in there by the astonishingly
inefficient means of uuencoding them and sccsing the result: you may be
sick now. I know which I'd prefer. I may be distorting git into
something unrecognisable to its own father but it's that or I go insane
*and* run out of disk space.)


After all that setup, my question's simple. Does a `parent' in git
terminology simply mean `this commit was derived in some way from the
commit listed here'? If so, I suppose I can list heads/some-change-foo
as one parent on these merge commits, even though the `merging'
mechanism is so odd that I expect to be pelted with rotten vegetables as
soon as I post this.

But it's that or SCCS.

(Of course this will go into a public git repository for people to laugh
at. I don't expect anyone to actually *use* it.)

-- 
Rich industrial heritage: lifeless wasteland. `The land

* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  0:39 What's the meaning of `parenthood' in git commits? Nix
@ 2006-11-08  0:52 ` Jakub Narebski
  2006-11-08  0:58 ` Linus Torvalds
  2006-11-08  1:13 ` Junio C Hamano
  2 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2006-11-08  0:52 UTC (permalink / raw)
  To: git

Nix wrote:

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'? If so, I suppose I can list heads/some-change-foo
> as one parent on these merge commits, even though the `merging'
> mechanism is so odd that I expect to be pelted with rotten vegetables as
> soon as I post this.

Yes, being a parent means that this commit was derived in some way from the
commit listed here. It need not be that this commit is the result of a merge
of the commits listed here... there was a discussion some time ago about
using one of the parents (the first, for example) instead of a special header
for a "prev" link to the previous value of the ref (a discussion made
obsolete by reflog).

There are two things you have to think about if you use 'parenthood' for
something a bit unexpected. First, parents are connectivity, so even if you
delete trunks/some-name and then prune, everything that was merged into some
branch or tag which still lives wouldn't get pruned. Second, the
information about merges is used in merge strategies: consider whether having
this information would help your strange merge strategy.

And of course there is the question of whether the graph as visualized by,
for example, gitk would make more sense or not with the "strange merges"
marked as merges.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  0:39 What's the meaning of `parenthood' in git commits? Nix
  2006-11-08  0:52 ` Jakub Narebski
@ 2006-11-08  0:58 ` Linus Torvalds
  2006-11-08  1:28   ` Nix
  2006-11-08  1:13 ` Junio C Hamano
  2 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2006-11-08  0:58 UTC (permalink / raw)
  To: Nix; +Cc: git



On Wed, 8 Nov 2006, Nix wrote:
> 
> [ Nix explains what he's doing now with SCCS ]: you may be
> sick now.

Wow. You've got some strange setup there, Nix.

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'?

Well, strictly speaking, git doesn't itself assign much of any real meaning 
to "parent" at all. It has the obvious meanings:

 - the parent pointers act as reachability graph edges (so fsck cares 
   about it a lot, of course)

 - listing the "log" of a commit will show everything reachable from that 
   commit and its parents, of course (with the commit date-stamp being 
   used as an "ordering" when having multiple choices of commits to show)

 - it has the obvious meanings for the "revision arithmetic", ie revision 
   name parsing (ie "commit~3^2")

 - parenthood will be used to show the diff ("git show", "git log -p" and 
   friends)

 - the "merge-base" algorithms obviously use it to find the most recent 
   common ancestor, and that in turn impacts the normal merge strategies, 
   of course.

so parenthood does obviously have a number of very specific technical 
meanings for different programs, but at the same time, no, git doesn't 
really "care". You can happily generate your own parenthood if you want 
to, and git will just continue to follow the above rules.
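
As a rough illustration of the "revision arithmetic" item above, here is a
small Python sketch that resolves names such as "commit~3^2" against a
made-up parent table. The history and the resolve() helper are invented
for the example; it handles only the ~n and ^n suffixes, not the rest of
git's revision syntax.

```python
import re

# Hypothetical history: each commit name maps to its ordered list of parents.
PARENTS = {
    "commit": ["c1"],
    "c1": ["c2"],
    "c2": ["m"],
    "m": ["left", "right"],   # a merge: first parent, then second parent
    "left": [],
    "right": [],
}

def resolve(spec, parents=PARENTS):
    """Resolve a name followed by any number of ~n / ^n suffixes."""
    name = re.match(r"[^~^]+", spec).group(0)
    commit = name
    for op, digits in re.findall(r"([~^])(\d*)", spec[len(name):]):
        n = int(digits) if digits else 1
        if op == "~":
            # ~n: follow the first parent n times
            for _ in range(n):
                commit = parents[commit][0]
        elif n > 0:
            # ^n: take the n-th parent (^0 would be the commit itself)
            commit = parents[commit][n - 1]
    return commit

print(resolve("commit~3^2"))  # right
```

The point is only that both suffixes are defined purely in terms of the
order of the parent pointers, whatever those pointers happen to mean.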

> If so, I suppose I can list heads/some-change-foo as one parent on these 
> merge commits, even though the `merging' mechanism is so odd that I 
> expect to be pelted with rotten vegetables as soon as I post this.

Yeah, git won't care. If you screw up parenthood, you have a few problems:

 - the diffs may look really strange. In particular, if you list multiple 
   parents, the git "diff" functions will all just assume that it's a 
   merge, and a "git show" will start showing the combined diff (which is 
   usually empty).

   So if you end up having multiple parents, not because it was "really" a 
   merge, but because you use the other parent pointer to point to some 
   "source" for the patch, things like "git log -p" won't give nice output 
   any more. You need to manually ask for the diff with something like

	# show diff from second parent
	git diff commit^2..commit

   instead.

 - listing too _few_ parents is potentially more serious, if you have 
   reachability issues (ie you wanted to keep the other source around, but 
   since you didn't list it as a parent, git won't know that it had 
   anything to do with your commit, so it may be pruned away unless you 
   have some other way to reach it)

but if you just have a really strange merge algorithm, and the _data_ 
associated with the parents is "surprising" from the standpoint of the 
default merge, git really won't care at all.
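
The reachability point can be sketched in a few lines of Python. This is
a toy model with invented commit and ref names, not how git's prune is
actually implemented: a commit counts as prunable when no ref reaches it
through parent pointers.

```python
def reachable(parents, tips):
    """All commits reachable from the given tips by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def prunable(parents, refs):
    """Commits that no ref can reach, i.e. candidates for pruning."""
    return set(parents) - reachable(parents, refs.values())

# Invented history: P is the patch commit on trunks/old-and-grotty, f1 and f2
# are the commits of a change branch whose own ref has since been deleted.
base = {"root": [], "old1": ["root"], "B": ["root"], "f1": ["B"], "f2": ["f1"]}
refs = {"trunks/old-and-grotty": "P", "trunks/latest": "B"}

one_parent = dict(base, P=["old1"])          # source not recorded as a parent
two_parents = dict(base, P=["old1", "f2"])   # source recorded as second parent

print(sorted(prunable(one_parent, refs)))    # ['f1', 'f2']
print(sorted(prunable(two_parents, refs)))   # []
```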

Your usage does sound a bit strange.


* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  0:39 What's the meaning of `parenthood' in git commits? Nix
  2006-11-08  0:52 ` Jakub Narebski
  2006-11-08  0:58 ` Linus Torvalds
@ 2006-11-08  1:13 ` Junio C Hamano
  2006-11-08  1:36   ` Nix
  2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2006-11-08  1:13 UTC (permalink / raw)
  To: Nix; +Cc: git

Nix <nix@esperi.org.uk> writes:

> The idea being that if you have a tree like this:
>
>      B
> ------------- ref trunks/latest
>      \
>       ------ ref heads/some-change-foo
>
>  ... -------- ref trunks/old-and-grotty
>
>
> then this merge strategy, when asked to merge heads/some-change-foo into
> trunks/old-and-grotty would spot that point B was the most recent
> merge point into anything in trunks/, generate a diff between point B
> and heads/some-change-foo, and patch it into trunks/old-and-grotty.

This is a standard "cherry-picking" practice.

> After all that setup, my question's simple. Does a `parent' in git
> terminology simply mean `this commit was derived in some way from the
> commit listed here'?

When you think about commit ancestry, think of it this way:

   These commits I list as the parents of this new commit, and
   everything that leads to them, are what I considered when I
   derived this commit.  This new child commit of them suits the
   purpose of _my_ branch better than any of these parent
   commits I took into consideration because of such and such
   reasons that I stated in its commit log message.

If you mark the resulting commit on old-and-grotty to have
some-change-foo as one of its parents, because some-change-foo
has almost everything 'latest' has (up to point B), you are also
saying "I have considered everything that happened between
old-and-grotty and B when making this commit".

What's implied by that statement is this, even though you do not
explicitly say:

   I reject everything that happened on the development line
   that led to 'latest' up to point B since old-and-grotty was
   forked.

This is not necessarily a bad thing, by the way.  Somebody who
is trying to maintain an extremely stable branch by cherry-picking
only changes in a few narrow areas from the mainline would _want_
to leave most of the "new good stuff" out of his branch.  That's
why I emphasized _my_ a few paragraphs above.

But it is _so_ different from the mindset of the usual "every branch
makes progress _forward_, perhaps at a different pace".  In this
example, this branch is actively choosing to stay behind and
refusing to take changes from the 'latest'.  So your users need
to really understand what they are doing.  For example, if there
is another topic forked off of B (or at a later commit from
there that leads to 'latest'), after your "funny merge" took
place, even the usual merge strategies would work as expected by
you --- it would still ignore the changes up to B because you
told git to do so.

Also, if you make a good change on top of the resulting merge
that _should_ be applicable to some-change-foo which is based on
the 'latest', you cannot merge that back in the usual way.
Usual git merge will find your first "funny merge" as the merge
base, and because it chooses to reject everything leading to B,
the merge result would look very similar to the set of changes
based on old-and-grotty.  Actually, that would even fast forward
to the version you made into a phony "merge" out of the
cherry-picked result.
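
Both effects can be checked on a toy graph. The following Python sketch
uses invented commit names and a simple merge_bases() helper (common
ancestors that are not ancestors of another common ancestor), which is
only an approximation of what git-merge-base computes.

```python
def ancestors(parents, commit):
    """Commits reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(o != c and c in ancestors(parents, o) for o in common)}

HISTORY = {
    "root": [],
    "old1": ["root"],        # old-and-grotty before the funny merge
    "t1": ["root"],
    "B": ["t1"],
    "foo": ["B"],            # some-change-foo
    "M": ["old1", "foo"],    # cherry-picked result recorded as a merge
    "G": ["M"],              # a good change made on top of M
    "topic": ["B"],          # another topic forked off of B
}

print(merge_bases(HISTORY, "M", "topic"))  # {'B'}
print(merge_bases(HISTORY, "foo", "G"))    # {'foo'}
```

The first result says that a later merge of the topic into old-and-grotty
starts from B, so the changes leading up to B stay out. The second says
that some-change-foo is its own merge base with G, which is exactly the
fast-forward case.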

But that is at least consistent with the statement you made when
you created that commit.  Staying behind at old-and-grotty
suited _your_ branch's purpose better than being based on
'latest'.  And a person who is merging _your_ branch into
some-change-foo, by choosing to merge that branch into the
latter, is choosing to share your branch's purpose, so it is
natural that a lot of the "good things" that happened up to B
are rewound by that merge.

So I think as long as you and your users understand what is
going on, I do not see a problem at either the mechanical level
or the philosophical level.  But I am sure it would confuse a
lot of people, so please do not come back complaining that you
ended up making your users' heads explode ;-).



* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  0:58 ` Linus Torvalds
@ 2006-11-08  1:28   ` Nix
  2006-11-08  3:04     ` Nix
  0 siblings, 1 reply; 7+ messages in thread
From: Nix @ 2006-11-08  1:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 8 Nov 2006, Linus Torvalds uttered the following:
> On Wed, 8 Nov 2006, Nix wrote:
>> 
>> [ Nix explains what he's doing now with SCCS ]: you may be
>> sick now.
>
> Wow. You've got some strange setup there, Nix.

It's what happens when a version-control system gets implemented as
an emergency hack when moving from VMS, by people who don't really
grok Unix shell scripting... and then you let fifteen years pass,
and nobody dares touch the hack because it's so damned delicate.
It took months of agony to implement crude half-functional branching
in this. Writing a git porcelain should be vastly simpler, even with
the overhead of a conversion tool as well.

Writing that conversion tool will be fun :( e.g. I'm going to have to
identify branches by diffing/xdeltaing each version of a file with every
single previous version of that file, and if the diff is smallest
against a version other than the immediate ancestor, it's assumed to be
a branch against that version. (I'm going to have to fake up packed refs
for these tiny branches so that they're at least accessible in
emergencies, gah.)

It's all, well, nasty. But all will be so much happier in the shining
world of git.

>> After all that setup, my question's simple. Does a `parent' in git
>> terminology simply mean `this commit was derived in some way from the
>> commit listed here'?
>
> Well, strictly speaking, git doesn't itself assign much of any real meaning 
> to "parent" at all. It has the obvious meanings:

Oh *good*, that's what I thought.

[snip more things which match my understanding]

>  - parenthood will be used to show the diff ("git show", "git log -p" and 
>    friends)

I'll list the patch-merged parent as the second parent, so that you'll only
get the mostly-useless huge diff from that if you actually ask for it, and
will get a more useful result with ^.

>  - the "merge-base" algorithms obviously use it to find the most recent 
>    common ancestor, and that in turn impacts the normal merge strategies, 
>    of course.

Hm, yeah, if merging iterates down patch-merged branches it might have
interesting consequences, because the trees on one side of patch-merges
are likely to be very different to trees on the other side (years of
development separate them). I'd like a way to specify that those parents
are *not* to be traversed by the merge-base algorithms, really.

A series of

not-merge-base: <sha1 id>

headers, perhaps? (I think that's likely to involve much less code churn
than introducing a new `not-merge-base-parent' tag).

> Yeah, git won't care. If you screw up parenthood, you have a few problems:
>
>  - the diffs may look really strange. In particular, if you list multiple 
>    parents, the git "diff" functions will all just assume that it's a 
>    merge, and a "git show" will start showing the combined diff (which is 
>    usually empty).

It is a merge, so that's right. It's just a rather odd merge.

(I don't envisage actual *changes* being made in these commits except to
resolve conflicts.)

>    So if you end up having multiple parents, not because it was "really" a 
>    merge, but because you use the other parent pointer to point to some 
>    "source" for the patch, things like "git log -p" won't give nice output 
>    any more. You need to manually ask for the diff with something like

Well, I was envisaging that the other parent pointer would point to the
tip of the changes tree. Going back to that graph again:

     B
------------- ref trunks/latest
     \
      ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty

The idea is that the patch-merge of trunks/old-and-grotty and
heads/some-change-foo would consist textually of the diff between B and
heads/some-change-foo, applied to trunks/old-and-grotty, and would list
as its parents trunks/old-and-grotty, *and heads/some-change-foo*.

(Perhaps this isn't really a merge after all? Should merge parents be
treated as differently as this? It'll all be covered over by the
porcelain in any case: it won't be possible to confuse a trunk/ with a
normal head and accidentally patch-merge in the wrong direction.)

>  - listing too _few_ parents is potentially more serious, if you have 
>    reachability issues (ie you wanted to keep the other source around, but 
>    since you didn't list it as a parent, git won't know that it had 
>    anything to do with your commit, so it may be pruned away unless you 
>    have some other way to reach it)

Yeah, that would be bad.

> but if you just have a really strange merge algorithm, and the _data_ 
> associated with the parents is "surprising" from the standpoint of the 
> default merge, git really won't care at all.

Good.

> Your usage does sound a bit strange.

Agreed. But there are hundreds of people banging on my door asking for a
proper version control system, quilt isn't a proper version control
system in that sense, and stgit has... issues when you try to distribute
it and when you have a lot of people working on one tree at once: plus
it doesn't fit our weird workflow with multiple parallel release
branches, at least one active development trunk, and all changes done
under a carefully-controlled bug tracking system (it's as if *every*
change has a bugzilla ticket associated, *always*, and we expect to be
able to get from ticket to change efficiently).

(We do both distribution and working-copy-sharing: the trees are too
large to have one tree per person, not least because each tree requires
an entire Oracle instance of its own to play with and massive amounts of
memory; and we have geographically distributed sites with trees of their
own.)

-- 
Rich industrial heritage: lifeless wasteland. `The land

* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  1:13 ` Junio C Hamano
@ 2006-11-08  1:36   ` Nix
  0 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2006-11-08  1:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 8 Nov 2006, Junio C. Hamano spake thusly:
> Nix <nix@esperi.org.uk> writes:
>
>> The idea being that if you have a tree like this:
>>
>>      B
>> ------------- ref trunks/latest
>>      \
>>       ------ ref heads/some-change-foo
>>
>>  ... -------- ref trunks/old-and-grotty
>>
>>
>> then this merge strategy, when asked to merge heads/some-change-foo into
>> trunks/old-and-grotty would spot that point B was the most recent
>> merge point into anything in trunks/, generate a diff between point B
>> and heads/some-change-foo, and patch it into trunks/old-and-grotty.
>
> This is a standard "cherry-picking" practice.

Yes, pretty much, except that we do *everything* by cherry-picking, and
we want to track the cherry-picks in the same way that all other changes
are tracked (i.e., a small branch for each (numbered) change, patching
madly in all directions into a variety of trunks and release branches,
with all those patches tracked.)

>    These commits I list as the parents of this new commit, and
>    everything that leads to them, are what I considered when I
>    derived this commit.  This new child commit of them suits the
>    purpose of _my_ branch better than any of these parent
>    commits I took into consideration because of such and such
>    reasons that I stated in its commit log message.
>
> If you mark the resulting commit on old-and-grotty to have
> some-change-foo as one of its parents, because some-change-foo
> has almost everything 'latest' has (up to point B), you are also
> saying "I have considered everything that happened between
> old-and-grotty and B when making this commit".

Yeah. This is the merge-base tracking that Linus mentioned, and it's not
quite what I'm looking for :/ it's a sort of step-parent, really...

> What's implied by that statement is this, even though you do not
> explicitly say:
>
>    I reject everything that happened on the development line
>    that led to 'latest' up to point B since old-and-grotty was
>    forked.

(which is not necessarily true: we might want to backport an earlier
change, also on another `small change branch', later on. Stuff on the
trunks themselves will never want to get backported, but if the
merge-base algorithm traverses patch-merge parent links, it might
consider that a `small change branch' has been merged when it actually
hasn't.)

> This is not necessarily a bad thing, by the way.  Somebody who
> is trying to maintain an extremely stable branch by cherry-picking
> only changes in a few narrow areas from the mainline would _want_
> to leave most of the "new good stuff" out of his branch.  That's
> why I emphasized _my_ a few paragraphs above.

That's exactly what we're doing, across-the-board.

> But it is _so_ different from the mindset of the usual "every branch
> makes progress _forward_, perhaps at a different pace".  In this
> example, this branch is actively choosing to stay behind and
> refusing to take changes from the 'latest'.  So your users need
> to really understand what they are doing.

*hahahaaaaa*... hang on, that *was* a joke, right? ;)

> So I think as long as you and your users understand what is
> going on, I do not see a problem at either the mechanical level
> or the philosophical level.  But I am sure it would confuse a
> lot of people, so please do not come back complaining that you
> ended up making your users' heads explode ;-).

OK, I think I need to find a way to notate in the patch-merged commit
that one or more parents should be disregarded when searching for merge
bases (and *only* when searching for merge bases). I think that will
do what's wanted in all areas: i.e., it'll act like a cherry-pick
that shows up in the logs/revlist and so on, but doesn't affect the
semantics of later merges of stuff from anywhere except for the
same limited branch.

(obviously trying to patch-merge B to A twice is always going to
fail, whether or not merge-base traversal jumps into B: I don't
think there's any real need to protect against that.)
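
As a sketch of what "disregard some parents, but only for merge-base
purposes" might look like, here is a toy Python traversal. The
PATCH_PARENTS table stands in for whatever notation ends up in the commit;
it and all the commit names are hypothetical, and nothing like it exists
in git today.

```python
def ancestors(parents, commit, skip=None):
    """Commits reachable from `commit`; `skip` maps a commit to parents to ignore."""
    skip = skip or {}
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(p for p in parents[c] if p not in skip.get(c, ()))
    return seen

HISTORY = {
    "root": [], "old1": ["root"], "t1": ["root"],
    "bar": ["t1"],            # an earlier small change branch
    "t2": ["t1", "bar"],      # bar merged normally into the latest trunk
    "B": ["t2"],
    "foo": ["B"],             # some-change-foo
    "M": ["old1", "foo"],     # patch-merge of foo into old-and-grotty
}
PATCH_PARENTS = {"M": {"foo"}}   # parents to disregard in merge-base searches

print("bar" in ancestors(HISTORY, "M"))                  # True
print("bar" in ancestors(HISTORY, "M", PATCH_PARENTS))   # False
```

With every parent followed, bar looks as though it has already reached
old-and-grotty, although only foo's patch was ever applied there; with the
patch-merged parent skipped, it does not. Log and rev-list style walks
would keep using the unfiltered traversal.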

-- 
Rich industrial heritage: lifeless wasteland. `The land

* Re: What's the meaning of `parenthood' in git commits?
  2006-11-08  1:28   ` Nix
@ 2006-11-08  3:04     ` Nix
  0 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2006-11-08  3:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On 8 Nov 2006, nix@esperi.org.uk spake thusly:
> On 8 Nov 2006, Linus Torvalds uttered the following:
>>  - the "merge-base" algorithms obviously use it to find the most recent 
>>    common ancestor, and that in turn impacts the normal merge strategies, 
>>    of course.
>
> Hm, yeah, if merging iterates down patch-merged branches it might have
> interesting consequences, because the trees on one side of patch-merges
> are likely to be very different to trees on the other side (years of
> development separate them). I'd like a way to specify that those parents
> are *not* to be traversed by the merge-base algorithms, really.
>
> A series of
>
> not-merge-base: <sha1 id>
>
> headers, perhaps? (I think that's likely to involve much less code churn
> than introducing a new `not-merge-base-parent' tag).

Wrong. Sort of.

When doing normal merges you don't want to consider patch-merged parents
as real merges: but there is one situation when you *do* want merge-base
checking to traverse such links.

Say you have the tree just described:

     B
------------- ref trunks/latest
     \
      ------ ref heads/some-change-foo

 ... -------- ref trunks/old-and-grotty

and you want to patch-merge heads/some-change-foo with
trunks/old-and-grotty.

It doesn't quite apply, so you end up with a conflict-resolution. This
will normally be in the merge commit, but there's no guarantee of that:
perhaps you knew the source tree would conflict in advance and fixed it
up so that it wouldn't, leaving the old heads/some-change-foo pointing
before that fixup:

     B
------------- ref trunks/latest
     \
      ------- ref heads/some-change-foo
          C \
            c
            |
 ... -------------- ref trunks/old-and-grotty

Later on, you find a bug in that change. It's still the same conceptual
change, so you fix it, and you want to patch-merge the fix across:

     B
------------- ref trunks/latest
     \
      -----------\ ref heads/some-change-foo
          C \    .
            c    . (link under construction)
            |    .
 ... -------------- ref trunks/old-and-grotty
            E    F

What patch-merge must do in order to produce a diff-merge at point F is
therefore rather more involved than I'd hoped:

 - determine B as above (most recent merge-base of heads/some-change-foo
   with anything in trunks/).

 - determine the merge-base of trunks/old-and-grotty with
   heads/some-change-foo, *traversing patch-merge parents*. Call this
   base C. (This is the only circumstance in which merge-base
   determination should traverse patch-merged parents.)

 - Iff that base C is topologically a child of B, then we have already
   merged part of this change in the past. In that case, instead of the
   merge consisting of the diff between B and F, it consists of the diff
   between C and the head, minus the set of changes c. So it remains to
   determine c.

 - scan backwards along F with git-rev-list, searching specifically for
   the most recent patch-merge naming any commit which has C as a
   transitive parent: that is point E. (Such a point must exist as long
   as only patch-merges have been used to merge heads/some-change-foo
   with trunks/old-and-grotty: if other sorts of merge have been used,
   all bets are off and I think we can legitimately fail the merge.)
   (This requires the ability to distinguish patch-merges from normal
   merges, but that's easy if we have any tag at all to distinguish
   them, which we must for merge-base traversal to avoid such parents
   normally.)

 - Reverse out the diff between C and E (if the two are not the same
   commit) and remember it temporarily as c.

 - Apply the forwards diff between point C and heads/some-change-foo,
   and then apply c in the forwards direction (if c is already present,
   this is not an error: it just means that whatever conflict-
   resolution was necessary as a one-off was later needed on the change
   trunk).
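
The graph-walking part of those steps (finding B, C and E, and deciding
whether this is a repeat patch-merge) can be sketched in Python as below.
Everything here is an assumption made for illustration: the commit names,
the GRAPH layout with a separate set of patch-merged parents, and the use
of "number of ancestors" as a stand-in for "most recent". Producing and
applying the actual diffs is not covered.

```python
# Each commit maps to (parents, patch_parents); patch_parents is the subset
# of parents that were recorded by a patch-merge. All names are invented.
GRAPH = {
    "root": ([], set()),
    "o1":   (["root"], set()),
    "t1":   (["root"], set()),
    "B":    (["t1"], set()),
    "t3":   (["B"], set()),              # trunks/latest
    "C":    (["B"], set()),              # old tip of heads/some-change-foo
    "cfix": (["C"], set()),              # one-off conflict fixup (the change c)
    "E":    (["o1", "cfix"], {"cfix"}),  # the earlier patch-merge
    "o2":   (["E"], set()),              # trunks/old-and-grotty
    "f3":   (["C"], set()),              # heads/some-change-foo after the fix
}

def ancestors(commit, follow_patch):
    """Reachable commits, optionally ignoring patch-merged parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        parents, patch = GRAPH[c]
        stack.extend(p for p in parents if follow_patch or p not in patch)
    return seen

def depth(commit):
    return len(ancestors(commit, True))

def merge_base(a, b, follow_patch):
    """Deepest common ancestor, or None if there is none."""
    common = ancestors(a, follow_patch) & ancestors(b, follow_patch)
    return max(common, key=depth, default=None)

def previous_patch_merge(tip, c_base):
    """Walk first parents back from `tip` to the latest patch-merge of c_base's line."""
    commit = tip
    while commit is not None:
        parents, patch = GRAPH[commit]
        if any(c_base in ancestors(p, True) for p in patch):
            return commit
        commit = parents[0] if parents else None
    return None

def plan_patch_merge(head, target, trunks):
    bases = [merge_base(head, t, False) for t in trunks]
    b = max((x for x in bases if x is not None), key=depth)
    c = merge_base(head, target, True)
    if c != b and b in ancestors(c, True):
        e = previous_patch_merge(target, c)
        if e is None:
            raise ValueError("merged by something other than a patch-merge")
        return {"mode": "incremental", "B": b, "C": c, "E": e}
    return {"mode": "full", "B": b}

print(plan_patch_merge("f3", "o2", ["t3", "o2"]))
```

On this toy history the plan comes out as an incremental merge with B, C
and E as in the diagram; asking for a first-time merge of C into o1
instead yields a plain "full" plan based on B.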

I think that should cope with just about everything. I've tried to mock
up all sorts of contrived trees and I can't find anything that doesn't
reduce to that case or a simplification of it. (And no, this case is not
contrived: we test on trunks, so we deal with it whenever anything fails
testing and has to be fixed...)

(Now all I have to do is write it... enough words, time for action.
Actually time for sleep, it's three in the morning here. Action
tomorrow.)

-- 
Rich industrial heritage: lifeless wasteland. `The land
