Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
@ 2006-04-29 16:51 linux
  2006-04-29 17:35 ` Linus Torvalds
  2006-04-29 18:27 ` Jakub Narebski
  0 siblings, 2 replies; 16+ messages in thread
From: linux @ 2006-04-29 16:51 UTC (permalink / raw)
  To: git

Boy, this is an interesting discussion!
On the one hand, it seems "obvious" to me that extra links might be
useful.  But Linus's minimalist points have a lot of merit.

I have to agree, it's important to think of a single practical use before
adding the feature.  So let's do a little brainstorming...

For just referring to another commit, there's no problem putting
it in the body.  A sensible porcelain GUI will, when it sees something
that looks like an object identifier in a comment, and that object
identifier exists, make it a clickable link.  So a comment like:

"This fixes the same problem as <commit>, but is a cleaner
(albeit more invasive) fix."

Would do the right thing: the user reading it could easily jump
to the other commit.  A "header" link as opposed to a "comment"
link just has the property of being unambiguous.  No heuristic
will guess that a link should exist when there isn't.
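
The "comment link" heuristic described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name `find_candidate_ids` and the rule "exactly 40 lowercase hex digits, bounded by non-alphanumeric characters" are made up here, and the second half of the heuristic (checking that the object actually exists in the repository) is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HEX_ID_LEN 40

/* True for the characters used when an object name is printed in hex. */
int is_hex_digit(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/*
 * Hypothetical helper: split `text` into alphanumeric tokens and copy
 * every token that is exactly HEX_ID_LEN hex digits into `out`, up to
 * `max` entries.  Returns the number of candidates found.  Whether a
 * candidate names an existing object is NOT checked here.
 */
size_t find_candidate_ids(const char *text, char out[][HEX_ID_LEN + 1],
			  size_t max)
{
	size_t found = 0;
	const char *p = text;

	assert(text != NULL);
	while (*p && found < max) {
		const char *start;
		size_t len, i;
		int all_hex = 1;

		if (!isalnum((unsigned char)*p)) {
			p++;
			continue;
		}
		start = p;
		while (isalnum((unsigned char)*p))
			p++;
		len = (size_t)(p - start);
		if (len != HEX_ID_LEN)
			continue;
		for (i = 0; i < len; i++)
			if (!is_hex_digit(start[i]))
				all_hex = 0;
		if (all_hex) {
			memcpy(out[found], start, HEX_ID_LEN);
			out[found][HEX_ID_LEN] = '\0';
			found++;
		}
	}
	return found;
}
```

A short word that happens to be valid hex (such as "deadbeef") is skipped because of the length test, which is the kind of guess a header link would never have to make.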

So, what is that property useful for?

Now, one thing that porcelains provide, in addition to "parent" links,
is "child" links.  Useful.  But it could be done with commit comment
links as well, and it's not clear that having the link in the commit
header as opposed to the comment would help much.  You still have to
find and uncompress part of each commit to generate the history
tree.  Does uncompressing the rest of it and running a heuristic
over the text really cost that much?

I'm not convinced it's needed for that feature.  (I'd sooner argue for
never compressing commit objects in packs on the grounds that the
repeated uncompression while browsing is worth saving more than the
relatively minor disk space.)

So to be valuable, and inadvisable to express with a specially
formatted comment, it has to be something that would be Very Bad
to get wrong.  What qualifies?

Maybe some merge algorithm information?  If the merge could be told that
this change "is the same" as that change, so it can be skipped when
cherry-picking that branch, and the information was wrong, that could
cause lots of problems.

But given that git-cherry already uses (imperfect) heuristics to
detect already-merged patches, and they seem to work well enough, is
that a strong enough argument?  Is there some other merge application
where it would help?

Now, the "this other object should exist in the repository, and it's an
error if you can't fetch it" link obviously needs to be unambiguously
distinguished from, say, a reference to the (Linux kernel) dodecapus merge
in a git tree checkin comment.  But, as Linus says, what reason is there
for including it?  What do you need the commit in the repository for?

Well, the only reason that you need ANY commit in the repository is
because it's part of history, and comparing it with other versions is
meaningful.  So what trees, not already in the ancestry graph of a
given commit, are useful to compare to?  In particular, useful for some
automated process; manual comparisons can always be done manually.

Nothing's jumping out at me.  Any suggestions?


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 16:51 [RFC] [PATCH 0/5] Implement 'prior' commit object links (and linux
@ 2006-04-29 17:35 ` Linus Torvalds
  2006-04-29 18:07   ` Jakub Narebski
  2006-04-29 18:27 ` Jakub Narebski
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2006-04-29 17:35 UTC (permalink / raw)
  To: linux; +Cc: git

On Sat, 29 Apr 2006, linux@horizon.com wrote:
> 
> Well, the only reason that you need ANY commit in the repository is
> because it's part of history, and comparing it with other versions is
> meaningful.  So what trees, not already in the ancestry graph of a
> given commit, are useful to compare to?  In particular, useful for some
> automated process; manual comparisons can always be done manually.
> 
> Nothing's jumping out at me.  Any suggestions?

The only thing that I've ever wondered about is the "base commit of a 
merge".

Now, the thing is, we can always compute it. That's true _iff_ we've 
merged using the standard merge mechanism, but it wasn't always true 
historically (eg the original merges were computed with the original 
"git-merge-base" algorithm, which just picked the _first_ merge base it 
would find, while these days we use multiple ones for criss-cross merges).
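
As a rough illustration of "first merge base" versus "all merge bases", the following toy C sketch finds every best common ancestor on a criss-cross history. Everything in it is an assumption made for the example: `struct toy_commit`, `reaches` and `merge_bases` are invented names, the candidate list is passed in by hand, and the naive recursion is not how git computes merge bases.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2

/* Toy commit: just parent pointers, no tree or message. */
struct toy_commit {
	const struct toy_commit *parents[MAX_PARENTS];
	size_t nr_parents;
};

/* Return 1 if `target` is `from` itself or one of its ancestors. */
int reaches(const struct toy_commit *from, const struct toy_commit *target)
{
	size_t i;

	assert(from != NULL && target != NULL);
	if (from == target)
		return 1;
	for (i = 0; i < from->nr_parents; i++)
		if (reaches(from->parents[i], target))
			return 1;
	return 0;
}

/*
 * Collect every commit in `all` that is a common ancestor of `a` and `b`
 * and is not itself an ancestor of another common ancestor.  Stores up
 * to `max_out` results in `out` and returns how many were stored.
 */
size_t merge_bases(const struct toy_commit *a, const struct toy_commit *b,
		   const struct toy_commit *const *all, size_t nr_all,
		   const struct toy_commit **out, size_t max_out)
{
	size_t i, j, n = 0;

	for (i = 0; i < nr_all; i++) {
		int best = reaches(a, all[i]) && reaches(b, all[i]);

		for (j = 0; best && j < nr_all; j++)
			if (all[j] != all[i] &&
			    reaches(a, all[j]) && reaches(b, all[j]) &&
			    reaches(all[j], all[i]))
				best = 0; /* a better common ancestor exists */
		if (best && n < max_out)
			out[n++] = all[i];
	}
	return n;
}
```

On a criss-cross history (two branches that have each merged the other's earlier tip) this returns two bases, whereas an algorithm that stops at the first one would report only one of them.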

So I would not totally object if a merge algorithm added a

	merge-base <sha1>

notation. But while it _could_ be just a "note merge-base <sha1>", it 
should _not_ be a "link <sha1> merge-base".

Let me explain why I think there are differences between those three 
options, and why I actually think that two of them are "valid" ideas, 
while the third one is not.

 - Case 1: the

	merge-base <sha1>

   is a "valid" idea (where there might of course be more than one <sha1>, 
   and possibly more than one "merge-base" line: you'd have to have some 
   rule for what happens for a recursive merge), although it has the 
   generally big down-side of being redundant information in all current 
   setups.

   It's redundant, but at the same time it's information that in _theory_ 
   might not be redundant, because I can see a situation where a merge was 
   forced by manually specifying a merge base (eg a special merge like the 
   original "gitk" merge, merging two initially unrelated projects 
   together).

   In theory. So it could be real information for a merge commit. And we'd 
   enforce some kind of real semantics for it - and it would have a really 
   solid technical meaning: assuming we define the multi-merge-base 
   semantics properly it would NEVER have any question about "what are 
   best practices?" or "what does this mean?".

   So this "case 1" actually has technical consequences, but you can, for 
   example, actually _check_ them. You can make fsck literally complain if 
   the merge base doesn't make sense. There's a clear "technical 
   violation", which might not be entirely trivial to figure out, but 
   thanks to it having a good meaning and a strict definition, it's 
   _there_.

Now, in all honesty, I don't think "case 1" is a _good_ thing to do. I'm 
just saying that I wouldn't be as upset about it as I've been over this 
"link" discussion. The reason I think "case 1" sucks is simply that I 
think you can in _practice_ get all the benefits much better with "case 
2", even if that one doesn't imply any actual git semantics:

 - Case 2: the

	note merge-base <sha1>

   thing is _also_ a perfectly valid idea, because now it's also very 
   well-defined: the "note" part tells you that git doesn't actually 
   impose any semantics what-so-ever on it, so it's really just a comment, 
   and as in case 1 above, once you see it as a comment, the _meaning_ of 
   it is immediately clear. It's literally just a note from the merge 
   algorithm saying "I used this as a merge base".

   The "note" syntax actually has a huge advantage. When you see it as a 
   comment from the merge algorithm, you immediately think it might also 
   be a good idea to add a few other notes. So a merge commit might 
   actually have

	note merge-algorithm recursive
	note merge-conflicts none
	note merge-base <sha1>

   all make total sense. It's telling you what the algorithm used was, and 
   that it didn't need any manual fixups. It's also telling you that none 
   of this has _any_ impact what-so-ever from a "git semantics" angle, and 
   that this is nothing but a note for anybody who starts digging into it.
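
To show how little machinery a "note" header would need on the reading side, here is a minimal C sketch that looks up one note by key. It is an assumption-laden illustration: the "note <key> <value>" header is only a proposal from this thread, `find_note` is an invented name, and the commit buffer is assumed to be a NUL-terminated string whose header ends at the first empty line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: search the header part of a commit buffer for a
 * line "note <key> <value>".  On success copy <value> into `value`
 * (truncated to fit `size`) and return 1; otherwise return 0.  The
 * free-form message after the first empty line is never examined.
 */
int find_note(const char *commit, const char *key, char *value, size_t size)
{
	size_t keylen;
	const char *line = commit;

	assert(commit != NULL && key != NULL && value != NULL && size > 0);
	keylen = strlen(key);

	while (*line && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		size_t prefix = 5 + keylen + 1; /* "note " + key + " " */

		if (len > prefix &&
		    !strncmp(line, "note ", 5) &&
		    !strncmp(line + 5, key, keylen) &&
		    line[5 + keylen] == ' ') {
			size_t vlen = len - prefix;

			if (vlen >= size)
				vlen = size - 1;
			memcpy(value, line + prefix, vlen);
			value[vlen] = '\0';
			return 1;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```

Nothing here touches connectivity or object lookup, which matches the point that a note carries no git semantics at all.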

So now I've shown _two_ examples of some kind of header that I think 
actually makes sense, and that I would not argue against on those grounds. 
Especially the "note" thing I think is fine. So why, oh why, do I hate the 
"link" thing so much?

 - Case 3: the

	link <sha1> merge-base

   thing is a horrible and nasty thing that we should never ever support.

   Why? Because it's literally designed to both have some semantic meaning 
   ("git will fetch the <sha1> and use it for connectivity analysis") 
   _and_ at the same time the whole syntax is designed to _not_ have any 
   real meaning ("you can have any kind of link, and I don't know what 
   it actually means from a conceptual standpoint").

   So it has a meaning from an _implementation_ angle, but at the same 
   time it does not have a "higher cause". That is EVIL. When they say 
   "The road to hell is paved with good intentions", the implication there 
   is not that good intentions are bad per se, but that you should 
   understand that there are "Unintended Consequences".

   And if you cannot limit the thing to a very _specific_ higher-level 
   meaning, you by definition will have those "unintended consequences".

In short, the difference between three headers that on the face of it say 
exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and 
"link <sha1> merge-base" is not that they have different syntax (hey, even 
the syntax itself is almost identical), but exactly the fact that they 
have different implications and _meaning_.

Two of the three have no unintended consequences. One ("note") has no 
technical "consequences" at _all_, by definition. The other ("merge-base") 
has no "unintended" technical consequences at all, because it's thought 
through, and has been fully defined.

The third? "unintended consequences". It doesn't have a clear definition 
("It's cool. You can use it for any link you want"). So pretty much BY 
DESIGN, it's set up so that you don't know what the consequences of it 
will be for a project.

And that's why "case 3" is bad. Even though it looks very much like the 
two other ones.

			Linus


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 17:35 ` Linus Torvalds
@ 2006-04-29 18:07   ` Jakub Narebski
  2006-04-29 19:30     ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2006-04-29 18:07 UTC (permalink / raw)
  To: git

Linus Torvalds wrote:

>  - Case 1: the
> 
>         merge-base <sha1>
[...]
>  - Case 2: the
> 
>         note merge-base <sha1>
[...]
>  - Case 3: the
> 
>         link <sha1> merge-base
[...]

> In short, the difference between three headers that on the face of it say
> exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and
> "link <sha1> merge-base" is not that they have different syntax (hey, even
> the syntax itself is almost identical), but exactly the fact that they
> have different implications and _meaning_.
> 
> Two of the three have no unintended consequences. One ("note") has no
> technical "consequences" at _all_, by definition. The other ("merge-base")
> has no "unintended" technical consequences at all, because it's thought
> through, and has been fully defined.
> 
> The third? "unintended consequences". It doesn't have a clear definition
> ("It's cool. You can use it for any link you want"). So pretty much BY
> DESIGN, it's set up so that you don't know what the consequences of it
> will be for a project.
> 
> And that's why "case 3" is bad. Even though it looks very much like the
> two other ones.

IF (and that is a big if) the git commit header is extended with some
extra "link" (connectivity-enforcing) headers, like the proposed "bind" for
subprojects, "prev" for pu-like union branches, "merge-base" for merges,
there would be repeated work on enforcing connectivity. Hence the generic
"link" header (formerly "related") proposal. Having fsck report broken
links (or not), having purge remove commits (objects) reachable only via
"link" headers, having pull download commits via "link" headers... have I
forgotten anything? It _seems_ that this part is common, and does not depend
on semantics.

But with "links" (connectivity headers) there always would be some other
consequences. For example info/grafts deals for now only with commit
parents, and extending the format could be difficult.

And of course if we want connectivity, this is for some reason, so the
"link" has some other consequences, for example "prev" and "merge-base" for
merging, "bind" for checkout, merge (but differently), etc.

I think that if it is 'helper' information (i.e. information which is
helpful, but we can do without it) and of no real importance to the user, then
use "note". If it is of importance to the user (for example "cherrypick" or
"reverted") and of use to git, then repeat such info in a "note" header to
avoid relying on parsing the free-form part, a.k.a. the commit comment. If
connectivity is needed... hmmm...

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 16:51 [RFC] [PATCH 0/5] Implement 'prior' commit object links (and linux
  2006-04-29 17:35 ` Linus Torvalds
@ 2006-04-29 18:27 ` Jakub Narebski
  2006-04-29 20:44   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2006-04-29 18:27 UTC (permalink / raw)
  To: git

On Sat, 29 Apr 2006, linux@horizon.com wrote:
> 
> Well, the only reason that you need ANY commit in the repository is
> because it's part of history, and comparing it with other versions is
> meaningful.  So what trees, not already in the ancestry graph of a
> given commit, are useful to compare to?  In particular, useful for some
> automated process; manual comparisons can always be done manually.
> 
> Nothing's jumping out at me.  Any suggestions?

See below.

Not necessarily all of these require connectivity.
Most of them are not my ideas.

 * "prior" - heads that represent topic branch merges

    This is the "pu" branch case, where the head is a merge of several
    topic branches that is continually moved forward.

    topic branches     head
      ,___.   ,___.
     | TA1 | | TB1 |
      `---'   `---'    ,__.
         ^\_____^\____| H1 |
                       `--'

    + some topic branch changes and a republish:

      ,___.   ,___.
     | TA1 | | TB1 |
      `---'   `---'^   ,__.
        |^\_____^\____| H1 |
        |       |      `--'
      ,_|_.   ,_|_.      P
     | TA2 | | TB2 |     |
      `---'   `---'^     |
        ^       ^        |
      ,_|_.     |        |
     | TA3 |    |        |
      `---'     |      ,__.
         ^\______\____| H2 |
                       `--'

    key:  ^ = parent   P = prior


 * "bind" - for subprojects

   "bind" links a master project commit to an externally managed, embedded
   third-party project, for example the Linux kernel for some mainly userspace
   project, or a library or engine for some application. Additionally, it
   provides the root directory where the subproject is attached.

 
 * "original" for rebase

   before rebase:

             A---B---C topic
            /
           /
          /
     D---E---F---G master

   after rebase

              ------A---B---C
             /      ^   ^   ^ 
            /       :   :   :
           /        A'--B'--C' topic
          /       /
     D---E---F---G master


   where ':' denotes "original" link. Note that the old branch is not pointed
   to by any head, and would be pruned without connectivity.


 * "original" or "cherrypick" for cherry-picking

            A--------B---C bugfix
           /         ^
          /          :
     D---E---F---G---B'---H main


 * "revert" for reverting commits

-- 
Jakub Narebski


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 18:07   ` Jakub Narebski
@ 2006-04-29 19:30     ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2006-04-29 19:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> IF (and that is a big if) the git commit header is extended with some
> extra "link" (connectivity-enforcing) headers, like the proposed "bind" for
> subprojects, "prev" for pu-like union branches, "merge-base" for merges,
> there would be repeated work on enforcing connectivity. Hence the generic
> "link" header (formerly "related") proposal.

The "link <sha1> <type> <meta>" header extension was done
primarily for that reason this way.  I carried it in my "pu"
branch for a few days but Linus convinced me privately that it
was a bad idea, so it is not merged in "pu" anymore.  Just to
make it easy for people to view what we are discussing, I pushed
the branch head to jc/bind-2 topic branch, but the code will
_not_ be merged.

The code in commit.c to recognize and link the related objects
pointed to by the "link" header to the commit looked like below
(see 11bbee26 commit on that branch):

+       optr = &item->links;
+       while (!memcmp(bufptr, "link ", 5)) {
+               struct object *object;
+
+               if (!get_sha1_hex(bufptr + 5, parent) &&
+                   bufptr[45] == ' ' &&
+                   (object = lookup_unknown_object(parent)) != NULL) {
+                       struct object_list *l = xmalloc(sizeof(*l));
+                       l->item = object;
+                       l->next = *optr;
+                       l->name = NULL;
+                       *optr = l;
+                       optr = &l->next;
+                       n_refs++;
+                       bufptr += 45;
+               }
+               else
+                       return error("bad link in commit %s",
+                                    sha1_to_hex(item->object.sha1));
+               while (*bufptr++ != '\n')
+                       ; /* skip over subdirectory name */
+       }

But if you are going to introduce "merge-base" and similar
headers that have an impact on connectivity traversal code, you can
easily replace the !memcmp(bufptr, "link ", 5) with a sequence of
"!memcmp(foo) || !memcmp(bar) || ...", and use the "l->name" field
to point at the header itself, so that the user of the resulting
commit object can easily tell what kind of link-like header it
is, and enforce further semantics that are specific to each kind
of such header on it.  The revision traversal change that was
done in a later commit (7091fd commit) does not have to change.
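
One way to picture that suggestion is a small table of specific header keywords, with the matched keyword handed back so that the caller could store it in something like "l->name". The sketch below is illustrative only: `link_like_headers` and `match_link_like_header` are invented names, the keyword list merely repeats proposals from this thread, and none of it is taken from the jc/bind-2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Header keywords proposed in this thread; purely illustrative. */
static const char *const link_like_headers[] = {
	"bind", "prior", "merge-base",
};

/*
 * Hypothetical helper: if the line at `bufptr` starts with one of the
 * keywords above followed by a space, return that keyword and set *rest
 * to the text just past the space.  Otherwise return NULL and leave
 * *rest untouched.
 */
const char *match_link_like_header(const char *bufptr, const char **rest)
{
	size_t i;
	size_t count = sizeof(link_like_headers) / sizeof(link_like_headers[0]);

	assert(bufptr != NULL && rest != NULL);
	for (i = 0; i < count; i++) {
		size_t len = strlen(link_like_headers[i]);

		if (!strncmp(bufptr, link_like_headers[i], len) &&
		    bufptr[len] == ' ') {
			*rest = bufptr + len + 1;
			return link_like_headers[i];
		}
	}
	return NULL;
}
```

The connectivity code would be shared across all keywords in the table, while the returned keyword lets each kind of header keep its own, narrowly defined semantics.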

The code sharing aspect you brought up is a very important
issue.  This is revision traversal, which is really the central
part of git and needs deep thought to touch without breaking, so
we would like to avoid risking breaking it by repeatedly
touching it.  But that can be done without making the recorded
header something like "link <sha1> <type> <metainfo>" which is
too generic.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 18:27 ` Jakub Narebski
@ 2006-04-29 20:44   ` Junio C Hamano
  2006-04-29 20:58     ` Jakub Narebski
  2006-05-01  0:05     ` Sam Vilain
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2006-04-29 20:44 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

>  * "prior" - heads that represent topic branch merges

This is not any different from usual "parent" at all (but you
have to think about it a bit to realize it).

Before talking about making a new commit object that links to
other related commits, let's first talk about what it means to
update the branch head ($GIT_DIR/refs/heads/<branch>) from
commit A to commit B.  Understanding what it means is more
fundamental.

A git "branch" points at the tip of one possible history of a
development.  As the often-used word "topic branch" tells you, a
"branch", i.e. that history, has a specific purpose.  The
purpose of my "master" branch is to give reasonably stable new
feature set and bugfixes, "next" to give testable ones, and "pu"
to collect remaining bits that are worthy of discussion.

When your branch head points at commit A and you update the head
to point at a different commit B, you are making this statement:

	The commit B suits the purpose of the branch better
	than the commit A.

Notice there may or may not be ancestry relation between these
two commits at this point of the discussion.  B may be a direct
child of commit A, a merge that has A as its first parent, a
merge that has A as one of its parents (but not necessarily
the first), or a Nth-generation descendant if the update was a
fast forward merge from another branch.  It might even be an
ancestor if the update rewinds the history.

Among the above cases (and there may be others), in only two
cases you actually create a new commit to record that
statement [*1*].

The simplest case is when commit B is a direct, single-parent
child of commit A, and that statement is in your commit log
message.  "I started out from the commit A, and the result is
this tree.  The result suits what I am doing better than the
previous commit and I made the world a better place." -- the "I
started out from the commit A" part is on the parent header and
the rest is in the free-text.

When you are creating a merge of N parents, the principle is the
same.  Although in pure core-git terms all parents are equal, in
practice, the first parent has somewhat special meaning to you.
When the parents of commit B are A and X, you started out from
the commit A.  Then what are other parents?  You can read such a
commit this way:

	I started out from commit A and came up with this tree,
	which suits my purpose better.  While doing so, I have
	also considered what X has; and this result, commit B,
	suits my purpose better than X, too.

This is why a later merge with another branch that further
builds on top of X works so well.

    ----A----B
            /
       ----X----Y

If somebody built Y on X independently from us, when we merge
with Y, we say the merge base is X because B says "I've already
considered what X has" to do a 3-way merge.  While that is what
happens at the mechanical level, what is happening at the
philosophical level is we are taking the "I consider that B is
better than X" part of the message seriously, which means "I
want to keep changes I made between X and B".  Also the other person
who made Y made a similar statement that she considers Y is
better than X, and we try to preserve the changes between X and
Y in the automated part of the merge while preparing the tree to
commit the merge between B and Y.

Once you start reading the commit parent to mean "considering
what all of these commits have, what this new commit has suits
my purpose better", it becomes clear that the "previous" pointer
for a branch like my "pu" is just another "parent".

I rebuild "pu" from the tip of then-current "next", and merge
other topics in, and discard the previous "pu".  So it results
in this kind of graph:

                         o---o---o---o---o (updated "pu")
                        /   /   /   /  
        ---o---o---o---o
            \               \   \   \   \
             o---------------o---o---o---o (previous "pu")

But theoretically, I could include the previous "pu" tip as one
of the parents of the updated "pu" branch.

At the mechanical level, I start from then-current "next" and
merge each topic branch one-by-one on top of it.  But at the
philosophical level, what I am doing is to publish material that
shows a set of proposed changes that are more appropriate for
review by the curious than the previous round of "pu" head used
to have.  So the previous "pu" _is_ in the consideration while I
publish the updated "pu", although it is _not_ recorded anywhere.

After I come up with a fully merged tree, I could make a fake
Octopus that has the previous "pu" as its first parent and each
of the topic branch heads merged as second and subsequent
parents, with the resulting tree.  That would be more "honest"
at the philosophical level.

I am not going to actually suggest anybody doing this as a good
practice, but we can make such a commit with the current tool
like this:

	git checkout pu
	git tag -f prev-pu		;# remember where we were
	git reset --hard next		;# start at next
	git pull . topic-1		;# merge all remaining topics
	git pull . topic-2		;# ...
	git pull . topic-3
	...
	git tag -f next-pu		;# this tree is what we want
	git reset --hard prev-pu	;# start from previous
	git pull --no-commit -s ours . next topic-1 topic-2 ...
	git read-tree -m -u next-pu	;# record a merge whose first
	git commit			;# parent is previous pu and
					;# has all the topics merged.

[Footnote]

*1* IOW, we _are_ losing some information by not recording the
fact that fast-forward was done while doing so.  

That record should _not_ be in the commit chain.  At the
mechanical level, recording that in the commit chain means two
criss-crossing branches never converge at the commit chain
level, which is already bad.  At the philosophical level, the
commit chain is a mesh of many possible "global" histories, and
the record that somebody (a particular branch in a particular
repository) was at what point in the mesh at given time does not
belong there.

But from the repository-owner's point of view, that _might_ be
useful information to keep.  I am just saying this preemptively
so that if somebody wants to record it, that should not be
recorded in the commit object.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 20:44   ` Junio C Hamano
@ 2006-04-29 20:58     ` Jakub Narebski
  2006-04-30 15:21       ` Jakub Narebski
  2006-05-01  0:05     ` Sam Vilain
  1 sibling, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2006-04-29 20:58 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>>  * "prior" - heads that represent topic branch merges
> 
> This is not any different from usual "parent" at all (but you
> have to think about it a bit to realize it).
[cut]
Thanks for an explanation.

I would say that "prior" is not THAT different from usual "parent",
rather than it is not ANY different.

My doubt about recording the previous head of a "union" (pu-like) branch
is that for a merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu'
rebase) the merge algorithm treats all parents as equivalent, with the
possible exception of the first, which can be treated specially ('ours').

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 20:58     ` Jakub Narebski
@ 2006-04-30 15:21       ` Jakub Narebski
  2006-04-30 23:19         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Narebski @ 2006-04-30 15:21 UTC (permalink / raw)
  To: git

Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>>>  * "prior" - heads that represent topic branch merges
>> 
>> This is not any different from usual "parent" at all (but you
>> have to think about it a bit to realize it).
> [cut]
> Thanks for an explanation.
> 
> I would say that "prior" is not THAT different from usual "parent",
> rather than it is not ANY different.
> 
> My doubt about recording the previous head of a "union" (pu-like) branch
> is that for a merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu'
> rebase) the merge algorithm treats all parents as equivalent, with the
> possible exception of the first, which can be treated specially ('ours').

Additionally, with "prior" (or at least some convention on which of the parents
is the prior head of a "union" (pu-like) branch) I think we could fast-forward
such branches...

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-30 15:21       ` Jakub Narebski
@ 2006-04-30 23:19         ` Junio C Hamano
  2006-05-01  0:50           ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2006-04-30 23:19 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

>>> This is not any different from usual "parent" at all (but you
>>> have to think about it a bit to realize it).
>> 
>> I would say that "prior" is not THAT different from usual "parent",
>> rather than it is not ANY different.
>> 
>> My doubt about recording the previous head of a "union" (pu-like) branch
>> is that for a merge (e.g. 'pu' to 'next', cherrypick to/from 'pu', 'pu'
>> rebase) the merge algorithm treats all parents as equivalent, with the
>> possible exception of the first, which can be treated specially ('ours').
>
> Additionally, with "prior" (or at least some convention on which of the parents
> is the prior head of a "union" (pu-like) branch) I think we could fast-forward
> such branches...

This is why I said you have to think about it a bit to realize
that the "prior" is not _ANY_ different from the ordinary parent
for something like "pu".

We can fast-forward if (1) you pulled from "pu" the last time,
and (2) you haven't added anything on top of it on your own, and
(3) you pull from "pu" again, if the previous "pu" (i.e. your
"pu") is a parent of the updated "pu".  We do not need "prior"
for that.  The old "pu" being _one_ _of_ the parents, not even
necessarily the first one, would do just fine.
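
The fast-forward condition above boils down to an ancestry test: the old tip must be reachable from the new tip through any parent. Here is a toy C sketch of that test, with the caveat that `struct toy_rev`, `is_ancestor` and `can_fast_forward` are names invented for this example and the unmemoized recursion is only suitable for tiny graphs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 8

/* Toy revision: parent pointers only. */
struct toy_rev {
	const struct toy_rev *parents[MAX_PARENTS];
	size_t nr_parents;
};

/* Return 1 if `ancestor` is `tip` itself or reachable via any parent. */
int is_ancestor(const struct toy_rev *ancestor, const struct toy_rev *tip)
{
	size_t i;

	assert(ancestor != NULL && tip != NULL);
	if (tip == ancestor)
		return 1;
	for (i = 0; i < tip->nr_parents; i++)
		if (is_ancestor(ancestor, tip->parents[i]))
			return 1;
	return 0;
}

/* A pull can fast-forward when our tip is already in their history. */
int can_fast_forward(const struct toy_rev *ours, const struct toy_rev *theirs)
{
	return is_ancestor(ours, theirs);
}
```

If the updated "pu" lists the previous "pu" as any one of its parents, `can_fast_forward` succeeds; if "pu" was rebuilt from scratch without that parent, it fails, with no separate "prior" link involved either way.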

If you have built on top of the last "pu", obviously we do not
want to fast-forward with or without "prior".

Your doubts about the merge is also unfounded.  The current "pu"
head is (against my own recommendation not to do so) a hydra
cap.  It is a direct child of the previous "pu" that merges all
the leftover bits along with what was in 'next' when the commit
was made, so you could do something like this to experiment:

	git checkout -b test-1 pu^1
	echo >>Makefile '# End of Makefile'
	git commit -m 'build on top of previous "pu"' Makefile
	git pull . pu ;# Merge whatever happened in "pu"


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-29 20:44   ` Junio C Hamano
  2006-04-29 20:58     ` Jakub Narebski
@ 2006-05-01  0:05     ` Sam Vilain
  1 sibling, 0 replies; 16+ messages in thread
From: Sam Vilain @ 2006-05-01  0:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git

Junio C Hamano wrote:

>> * "prior" - heads that represent topic branch merges
>>    
>>
>
>This is not any different from usual "parent" at all (but you
>have to think about it a bit to realize it).
> [...]
>Once you start reading the commit parent to mean " considering
>what all of these commits have, what this new commit has suits
>my purpose better", it becomes clear that the "previous" pointer
>for a branch like my "pu" is just another "parent".
>  
>

How can you look back at the merge history and determine which of these
scenarios is the case?

It still looks to me like you are recording two distinct types of
parent using the same type of link.  You're now just expanding the
definition of parent so they look to be the same.

Actually it might be alright if you have an extra merge commit object. 
ie, make a complete merge of the new tips, then make a second merge that
merges the two heads.  It's still a little bit of a research topic to
look at that mess and figure out which type of relationship each parent
actually is, but if you really want to decide that is that and done is
done then I guess we'll all just have to live with it or fork.

Sam.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-04-30 23:19         ` Junio C Hamano
@ 2006-05-01  0:50           ` Junio C Hamano
  2006-05-01  1:25             ` Sam Vilain
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2006-05-01  0:50 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> We can fast-forward if (1) you pulled from "pu" the last time,
> and (2) you haven't added anything on top of it on your own, and
> (3) you pull from "pu" again, if the previous "pu" (i.e. your
> "pu") is a parent of the updated "pu".  We do not need "prior"
> for that.  The old "pu" being _one_ _of_ the parents, not even
> necessarily be the first one, would do just fine.

This part may want a bit more elaboration.  

Often, we see in the Linus kernel tree a fast forward of his tip
from a recent commit Linus made to a bunch of networking commits
made by David S Miller.  For example, Linus fast forwarded to
18118c from David's tree before making this commit:

    commit 454ac778459bc70f0a9818a6a8fd974ced11de66
    Merge: 18118cd... 301dc3e...
    Author:     Linus Torvalds <torvalds@g5.osdl.org>
    AuthorDate: Mon Apr 24 20:08:08 2006 -0700
    Commit:     Linus Torvalds <torvalds@g5.osdl.org>
    CommitDate: Mon Apr 24 20:08:08 2006 -0700

The first parent of this commit is one not made by Linus; that
is how we can tell he fast forwarded.  We cannot easily tell
where the tip of Linus tree was before he made this fast forward
(it is not recorded anywhere), but if we look at 18118c commit:

    commit 18118cdbfd1f855e09ee511d764d6c9df3d4f952
    Author:     Patrick McHardy <kaber@trash.net>
    AuthorDate: Mon Apr 24 17:18:59 2006 -0700
    Commit:     David S. Miller <davem@sunset.davemloft.net>
    CommitDate: Mon Apr 24 17:27:34 2006 -0700

        [NETFILTER]: ipt action: use xt_check_target for basic verification

we could sort-of make a guess, by looking at merge-base of
18118c and 301dc3.  By looking at

	gitk 6b426e..18118c 454ac7

we can tell that David "forked" from Linus at 6b426e commit.
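The merge-base digging above can be sketched in a few lines of Python.  This
is only an illustration on a toy graph: the abbreviated IDs come from the
discussion, but the intermediate commits (d1, x1) and the exact shape of the
history are assumptions, and this is not how git itself computes a merge base.

```python
# Toy commit graph: child -> list of parents.  Only the abbreviated IDs
# 6b426e, 18118c, 301dc3 and 454ac7 come from the discussion; d1, x1 and
# the shape of the graph are assumptions made for illustration.
PARENTS = {
    "6b426e": [],
    "d1": ["6b426e"],
    "18118c": ["d1"],
    "x1": ["6b426e"],
    "301dc3": ["x1"],
    "454ac7": ["18118c", "301dc3"],
}


def ancestors(commit, parents):
    """Return the set of commits reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    }


print(merge_bases("18118c", "301dc3", PARENTS))
```

On this assumed graph the only merge base of 18118c and 301dc3 is 6b426e,
which is the "fork point" guess described above.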

What does it mean for Linus to fast-forward to the tip of David?
Earlier I said that each branch has a purpose, and replacing the
current tip commit of the branch with another commit is a
statement by the repository owner that the new commit suits the
purpose of the branch better.

To David, the commits he has in the chain between 6b426e and
18118c obviously suited the purpose of his tree better, and that
was why these commits were made.  And the fact Linus fast
forwarded to the tip of David is an implicit statement by Linus
that that result suits the purpose of Linus tree better as well
compared to his old tip, presumably 6b426e.

Earlier I suggested (or at least may have sounded as if I was
suggesting) that not recording that statement in fast-forward
situation was a bad thing, but that is not necessarily so.
Having 18118c commit as part of the history that leads to the
tip is enough as such a statement by Linus.

Now, David's tree has a tendency to be extra clean (no merges
but straight commits on top of then-current tip of Linus), but
if he had his own merge from Linus's tree, such a commit would
have had a commit from Linus tree as its second parent.  If
Linus tip remained at that "second parent" commit until David was
done and asked Linus to pull, it would result in a fast forward
via non-first-parent ancestry.  But even if that happened, the
above discussion still applies.
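
As a rough sketch of the fast-forward rule discussed here: a pull can
fast-forward when the current tip is reachable from the pulled tip through
any parent link, first or not.  The history below is entirely made up, with
the old tip placed as the second parent of a merge to match the last case.

```python
# Hypothetical history: child -> list of parents.  All commit names are
# invented for illustration; "old_tip" is where the puller's branch sits.
PARENTS = {
    "base": [],
    "old_tip": ["base"],
    "work": ["base"],
    "merge": ["work", "old_tip"],   # old_tip is the *second* parent
    "new_tip": ["merge"],
}


def is_ancestor(old, new, parents):
    """True if `old` is reachable from `new` through any parent link."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False


def can_fast_forward(current_tip, pulled_tip, parents):
    """A pull is a fast-forward when the current tip is already part of
    the pulled history, so no new commit has to be created."""
    return is_ancestor(current_tip, pulled_tip, parents)


print(can_fast_forward("old_tip", "new_tip", PARENTS))
print(can_fast_forward("new_tip", "old_tip", PARENTS))
```

The first call reports that a fast-forward is possible even though old_tip
is only reachable through a second parent; the second shows the reverse
direction is not one.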


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-05-01  0:50           ` Junio C Hamano
@ 2006-05-01  1:25             ` Sam Vilain
  2006-05-01  4:44               ` Jakub Narebski
  2006-05-01  6:58               ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Sam Vilain @ 2006-05-01  1:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git

Junio C Hamano wrote:

>>We can fast-forward if (1) you pulled from "pu" the last time,
>>and (2) you haven't added anything on top of it on your own, and
>>(3) you pull from "pu" again, if the previous "pu" (i.e. your
>>"pu") is a parent of the updated "pu".  We do not need "prior"
>>for that.  The old "pu" being _one_ _of_ the parents, not even
>>necessarily be the first one, would do just fine.
>>    
>>
>
>This part may want a bit more elaboration.  
>
>Often, we see in the Linus kernel tree a fast forward of his tip
>from a recent commit Linus made to bunch of networking commits
>made by David S Miller.  For example, Linus fast forwarded to
>18118c from David's tree before making this commit:
> [...]
>To David, the commits he has in the chain between 6b426e to
>18118c obviously suited the purpose of his tree better, and that
>was why these commits were made.  And the fact Linus fast
>forwarded to the tip of David is an implicit statement by Linus
>that that results suits the purpose of Linus tree better as well
>compared to his old tip, presumably 6b426e.
>  
>

Aha, now I see reason in the madness. So, the "prior" head is not stored
in the trees, and tracking the progress of actual head transitions is
loosely defined / a research topic. But demonstrably derivable. That
works for me.

Sam.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-05-01  1:25             ` Sam Vilain
@ 2006-05-01  4:44               ` Jakub Narebski
  2006-05-01  6:58               ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2006-05-01  4:44 UTC (permalink / raw)
  To: git

Take a look at complexity of that explanation. And the need for additional
commit. That balanced against all the headaches of having connectivity
header other than "parent".

Perhaps it would be better (and easier) just to say

   note prior parent^1

or

   note prior <sha1>

repeating <sha1> found in parent.


Just a thought.

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-05-01  1:25             ` Sam Vilain
  2006-05-01  4:44               ` Jakub Narebski
@ 2006-05-01  6:58               ` Junio C Hamano
  2006-05-02  0:21                 ` Sam Vilain
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2006-05-01  6:58 UTC (permalink / raw)
  To: Sam Vilain; +Cc: git, Jakub Narebski

Sam Vilain <sam@vilain.net> writes:

> Junio C Hamano wrote:
>
>>To David, the commits he has in the chain between 6b426e to
>>18118c obviously suited the purpose of his tree better, and that
>>was why these commits were made.  And the fact Linus fast
>>forwarded to the tip of David is an implicit statement by Linus
>>that that results suits the purpose of Linus tree better as well
>>compared to his old tip, presumably 6b426e.
>
> Aha, now I see reason in the madness. So, the "prior" head is not stored
> in the trees, and tracking the progress of actual head transitions is
> loosely defined / a research topic. But demonstrably derivable. That
> works for me.

I do not think there is any madness involved here, but I should
point out that the above example happens to work only because
Linus and David are two different people.  If Linus did
David's work in a separate repository, or even in the same
repository but on a separate branch, people following the Linus
tip might still want to know about the fast-forward, but that is
something you cannot truly tell by the kind of digging I did
in the previous message.

That is why I earlier said this:

    *1* IOW, we _are_ losing some information by not recording the
    fact that fast-forward was done while doing so.  

    That record should _not_ be in the commit chain.  At the
    mechanical level, recording that in the commit chain means two
    criss-crossing branches never converge at the commit chain
    level, which is already bad.  At the philosophical level, the
    commit chain is a mesh of many possible "global" histories, and
    the record that somebody (a particular branch in a particular
    repository) was at what point in the mesh at given time does not
    belong there.

    But from the repository-owner's point of view, that _might_ be
    useful information to keep.  I am just saying this preemptively
    so that if somebody wants to record it, that should not be
    recorded in the commit object.

I do not think the commit object is the place to record it, even
with a purely-comment field like "note prior".  The commit
ancestry DAG is global in nature, and the information under
discussion, "before pointing at this commit, the branch that
made this commit happened to point at this other commit", is
not.  That information describes only one branch's view of the
world, and would not work in the fast-forward case because no
new commit is created.  An important property of a fast-forward
is that we do not create an extra commit object that makes it
impossible for two criss-crossing branches to ever converge.

On the other hand, a "note" field that records on which branch
of which repository each commit was made (you need to give each
repository-branch a UUID) when you do create a new commit would
be a sensible thing to have if somebody cares deeply enough.  It
is information that is global in nature, and with that, you
could do the digging like I did without relying on the committer
identity, but instead using the branch identity.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-05-01  6:58               ` Junio C Hamano
@ 2006-05-02  0:21                 ` Sam Vilain
  2006-05-02  7:08                   ` Martin Langhoff
  0 siblings, 1 reply; 16+ messages in thread
From: Sam Vilain @ 2006-05-02  0:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jakub Narebski

Junio C Hamano wrote:

>>Aha, now I see reason in the madness. So, the "prior" head is not stored
>>in the trees, and tracking the progress of actual head transitions is
>>loosely defined / a research topic. But demonstrably derivable. That
>>works for me.
>>    
>>
>I do not think there is any madness involved here, but I should
>  
>

Sorry, it was a figure of speech. It's more like, what appeared to be
madness no longer looks so.

>point out that the above example happens to work only because
>Linus and David are two different people.  If Linus did the
>David's work in a separate repository, or even in the same
>repository but on a separate branch, people following the Linus
>tip might still want to know about the fast-forward, but that is
>something you cannot truly tell by the digging like what I did
>in the previous message.
>
>That is why I earlier said this:
>
>    *1* IOW, we _are_ losing some information by not recording the
>    fact that fast-forward was done while doing so.  
>
>    That record should _not_ be in the commit chain.  At the
>    mechanical level, recording that in the commit chain means two
>    criss-crossing branches never converge at the commit chain
>    level, which is already bad.
>  
>

Here I'm a little bit confused still. Surely criss-crossing branches
already don't converge unless the commits are in the same order.

Oh, I see. Even if they *are* in the same order, the commit IDs would
end up different due to these extra headers.
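
A small sketch of why that happens, assuming the usual way an object name is
derived (SHA-1 over a "<type> <size>" header, a NUL byte, and the object
body).  The tree and parent IDs, the identity, and the "note prior" header
line are all placeholders; "note" is the header proposed in this thread, not
an existing one.

```python
import hashlib


def git_object_id(obj_type, body):
    """Object name: SHA-1 of '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Two commits with the same tree, parent, identity and message.  Every
# value below is a made-up placeholder used only for illustration.
plain = (
    b"tree " + b"1" * 40 + b"\n"
    b"parent " + b"2" * 40 + b"\n"
    b"author A U Thor <author@example.com> 1146441600 +0000\n"
    b"committer A U Thor <author@example.com> 1146441600 +0000\n"
    b"\n"
    b"same change\n"
)

# Same commit, plus a hypothetical extra header before the blank line.
noted = plain.replace(b"\n\n", b"\nnote prior " + b"3" * 40 + b"\n\n", 1)

print(git_object_id("commit", plain))
print(git_object_id("commit", noted))
```

The two printed names differ although the tree, parent and message are
identical, so two branches carrying different extra headers could never end
up at the same commit.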

>  At the philosophical level, the
>    commit chain is a mesh of many possible "global" histories, and
>    the record that somebody (a particular branch in a particular
>    repository) was at what point in the mesh at given time does not
>    belong there.
>
>    But from the repository-owner's point of view, that _might_ be a
>    useful information to keep.  I am just saying this preemptively
>    so that if somebody wants to record it, that should not be
>    recorded in the commit object.
>  
>

That makes sense.

>On the other hand, a "note" field that records on which branch
>of which repository each commit was made (you need to give each
>repository-branch an UUID) when you do create a new commit would
>be a sensible thing to have if somebody cares deeply enough.  It
>is an information that is global in nature, and with that, you
>could do the digging like I did without relying on the committer
>identity, but instead using the branch identity.
>  
>

That sounds reasonable. The UUID doesn't need to replicate, either, just
tag the commits that were made against it.

This extra information falls into the informational, "forensic" history
tracing category. ie, we don't know now whether we'll need it, but we'll
store it anyway just to be sure to not make later operations impossible.

I think the large remaining question is around what conventions apply to
the use of the "note" field. We have perhaps the first example of a well
formed piece of "forensic" information that belongs in the commit chain
and could possibly be added by plumbing. I can't think of any more of
those, but the rename/copy tracking case is a bit different. In this
case, it doesn't belong in the plumbing, yet you want a reasonable
convention for storing this information to apply. Also the other cases
outlined in the original post might do well to have a common convention
so that the information is more portable between porcelain.

Sam.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
  2006-05-02  0:21                 ` Sam Vilain
@ 2006-05-02  7:08                   ` Martin Langhoff
  0 siblings, 0 replies; 16+ messages in thread
From: Martin Langhoff @ 2006-05-02  7:08 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Junio C Hamano, git, Jakub Narebski

On 5/2/06, Sam Vilain <sam@vilain.net> wrote:
> Here I'm a little bit confused still. Surely criss-crossing branches
> already don't converge unless the commits are in the same order.

They do under GIT. No matter how much you criss-cross, every time you
identify a merge base for the next merge, you are identifying the last
commit in common on both branches.

While maybe you didn't have that commit being the tip of a head in
your repo, it _is_ the last common point. If your criss-crossing is
messy and a few commits are out of order or cherry picked, git-merge
has a good chance of spotting it. The whole mechanism tends to pull
quite consistently towards convergence.

If the notes in the commit msg aren't consistent and we lose the
natural tendency towards convergence that's a major drawback. On the
other hand, if two branches have exchanged patches "out of band",
git-merge still gets it right most of the time, so perhaps slightly
different headers in the commit messages are tolerable?

Junio had written:
> >On the other hand, a "note" field that records on which branch
> >of which repository each commit was made (you need to give each
> >repository-branch an UUID) when you do create a new commit would
> >be a sensible thing to have if somebody cares deeply enough.

I really don't like that -- goes against the grain of really simple,
portable repos. I cp -pr repo{,_tmp} all the time to do risky merges
or save a heavy download from a remote server. Let me run away from
this idea... quick before Linus kills us all ;-)

I did feel a couple of times the need of remembering where I had
checked this in -- but it went away quite quickly, must have been a
leftover of my Arch days ;-). And it actually got solved by agreeing
within my team to a commit message format pretty much like what's used
in the kernel. Because the truth is that most of my heads and branches
have very "local" names.

cheers,

martin


Thread overview: 16+ messages
2006-04-29 16:51 [RFC] [PATCH 0/5] Implement 'prior' commit object links (and linux
2006-04-29 17:35 ` Linus Torvalds
2006-04-29 18:07   ` Jakub Narebski
2006-04-29 19:30     ` Junio C Hamano
2006-04-29 18:27 ` Jakub Narebski
2006-04-29 20:44   ` Junio C Hamano
2006-04-29 20:58     ` Jakub Narebski
2006-04-30 15:21       ` Jakub Narebski
2006-04-30 23:19         ` Junio C Hamano
2006-05-01  0:50           ` Junio C Hamano
2006-05-01  1:25             ` Sam Vilain
2006-05-01  4:44               ` Jakub Narebski
2006-05-01  6:58               ` Junio C Hamano
2006-05-02  0:21                 ` Sam Vilain
2006-05-02  7:08                   ` Martin Langhoff
2006-05-01  0:05     ` Sam Vilain
