[RFC] origin link for cherry-pick and revert

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] origin link for cherry-pick and revert
@ 2008-09-09 13:22 Stephen R. van den Berg
  2008-09-09 13:38 ` Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 13:22 UTC (permalink / raw)
  To: git

I've read and digested the old threads about prior and related links.
Here's a new proposal which should be able to pass muster, if I read all
the relevant suggestions and objections in the old threads:

Consider an origin field as such:

commit bbb896d8e10f736bfda8f587c0009c358c9a8599
tree b83f28279a68439b9b044bccc313bbeaa3e973f5
parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b
origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd
origin a1184d85e8752658f02746982822f43f32316803 2
author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700

The definition of the origin field reads as follows:

- There can be an arbitrary number of origin fields per commit.
  Typically there is going to be at most one origin field per commit.

- At the time of creation, the origin field contains a hash B which refers
  to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
  being referred to needs to be e.g. (B, B~2), then the hash is followed by
  a space and followed by an integer (base10, two in this case),
  which designates the proper parentnr of B (see: mainline in git
  cherry-pick/revert).

- In an existing repository gc/prune shall not delete commits being
  referred to by origin links.

- During fetch/push/pull the full commit including the origin fields is
  transmitted, however, the objects the origin links are referring to
  are not (unless they are being transmitted because of other reasons).

- When fetching/pulling it is optionally possible to tell git to
  actually transmit objects referred to by origin links even if it would
  otherwise not have done so.

- git cherry-pick/revert allow for the creation of origin links only if
  the object they are referring to is presently reachable.

- git fsck will traverse origin links, but will stay silent if the
  object an origin link points to is unreachable (kind of like a shallow
  repository).

- git rev-list --topo-order will take origin links into account to
  ensure proper ordering.

- gitk allows for (e.g.) dotted lines to show the origin links.

- git log would show something like:

  commit bbb896d8e10f736bfda8f587c0009c358c9a8599
  Origin: d2b9dff..53d1589
  Origin: a1184d8..e596cdd
  Author: Junio C Hamano <gitster@pobox.com>
  Date:   Sat Aug 30 14:35:15 2008 -0700

  Note that for easy viewing: git diff d2b9dff..53d1589
  will show the exact diff the origin link is referring to.

- git log --graph will show a dotted line of somesort just like gitk.

- git blame will follow and use the origin link if the object exists.

- git merge disregards the whole origin field entirely, just like all
  the rest of git-core.

Anything I missed?
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
@ 2008-09-09 13:38 ` Paolo Bonzini
  2008-09-09 14:04   ` Stephen R. van den Berg
  2008-09-09 13:48 ` Stephen R. van den Berg
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-09 13:38 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

> - At the time of creation, the origin field contains a hash B which refers
>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>   a space and followed by an integer (base10, two in this case),
>   which designates the proper parentnr of B (see: mainline in git
>   cherry-pick/revert).

What about just storing *two* hashes?  This way cherry-pick can store
B~1..B and revert can store B..B~1.  The two cases can be distinguished
by checking which commit is an ancestor of which.

> - git cherry-pick/revert allow for the creation of origin links only if
>   the object they are referring to is presently reachable.

Will cherry-pick -x create origin links?  Also, does the origin link
propagate through multiple cherry picks?  If not, how can the origin
object not be reachable?

> [snip good stuff]

git cherry will use origin links to mark a commit as present, and will
only use patch-ids for commits that have no origin links.  Bonus points
for an extra command-line/configuration option to only use origin links:

  --source=default	  << default: get setting from core.cherrysource
  --source=patch-id
  --source=origin
  --source=origin,patch-id

  core.cherrysource = patch-id
  core.cherrysource = origin
  core.cherrysource = origin,patch-id

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
  2008-09-09 13:38 ` Paolo Bonzini
@ 2008-09-09 13:48 ` Stephen R. van den Berg
  2008-09-09 15:44 ` Jakub Narebski
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 13:48 UTC (permalink / raw)
  To: git

Stephen R. van den Berg wrote:
>Anything I missed?

I think I forgot two:
- git rebase will fixup any origin pointers which point back into the
  strain being rebased.

- git filter-branch will rewrite origin pointers which point to commits
  that receive a new hash.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:38 ` Paolo Bonzini
@ 2008-09-09 14:04   ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 14:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: git

Paolo Bonzini wrote:
>> - At the time of creation, the origin field contains a hash B which refers
>>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>>   a space and followed by an integer (base10, two in this case),
>>   which designates the proper parentnr of B (see: mainline in git
>>   cherry-pick/revert).

>What about just storing *two* hashes?  This way cherry-pick can store
>B~1..B and revert can store B..B~1.  The two cases can be distinguished
>by checking which commit is an ancestor of which.

Valid point, but consider:
The new commit to receive hash A.  The diff between A~1..A and B~1..B
actually defines the relation.  Revert and cherry-pick are symmetrical
operations as far as git is concerned since git tracks content.
So I'm not quite sure if we actually need this extra information, git
already knows it all.

>> - git cherry-pick/revert allow for the creation of origin links only if
>>   the object they are referring to is presently reachable.

>Will cherry-pick -x create origin links?

I'd propose a cherry-pick -o and revert -o for that.
I wouldn't want to force the text which -x generates into the commit
when origin links are used.

>  Also, does the origin link
>propagate through multiple cherry picks?

The origin link is created point-to-point from the object referenced by
cherry-pick/revert to the new commit.  The link creation specifically
does not follow any existing origin links.  If you want the origin link
to point to a deeper origin behind the current, then cherry-pick from
there, no need to fake it.

>  If not, how can the origin
>object not be reachable?

That can only happen during a fetch/pull, which doesn't use origin links
to determine transmittability by default.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
  2008-09-09 13:38 ` Paolo Bonzini
  2008-09-09 13:48 ` Stephen R. van den Berg
@ 2008-09-09 15:44 ` Jakub Narebski
  2008-09-09 16:38   ` Steven Grimm
  2008-09-09 19:43   ` Stephen R. van den Berg
  2008-09-09 21:13 ` Petr Baudis
  2008-09-10 20:32 ` [RFC] origin link for cherry-pick and revert Miklos Vajna
  4 siblings, 2 replies; 137+ messages in thread
From: Jakub Narebski @ 2008-09-09 15:44 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

"Stephen R. van den Berg" <srb@cuci.nl> writes:

> I've read and digested the old threads about prior and related links.
> Here's a new proposal which should be able to pass muster, if I read all
> the relevant suggestions and objections in the old threads:
> 
> Consider an origin field as such:
> 
> commit bbb896d8e10f736bfda8f587c0009c358c9a8599
> tree b83f28279a68439b9b044bccc313bbeaa3e973f5
> parent ed0f47a8c431f27e0bd131ea1cf9cabbd580745b
> origin d2b9dff8a08cc2037a7ba0463e90791f07cb49dd
> origin a1184d85e8752658f02746982822f43f32316803 2
> author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
> committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700
> 
> The definition of the origin field reads as follows:
> 
> - There can be an arbitrary number of origin fields per commit.
>   Typically there is going to be at most one origin field per commit.

I understand that multiple origin fields occur if you do a squash
merge, or if you cherry-pick multiple commits into single commit.
For example:
 $ git cherry-pick -n <a1>
 $ git cherry-pick    <a2>
 $ git commit --amend        #; to correct commit message

I'm not sure if you plan to automatically add 'origin' field for
rebase, and for interactive rebase...
 
> - At the time of creation, the origin field contains a hash B which refers
>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>   a space and followed by an integer (base10, two in this case),
>   which designates the proper parentnr of B (see: mainline in git
>   cherry-pick/revert).

I think you wanted to use "(B, B^2)", which mean B and second parent
of B.  B~2 means grandparent of B in the straight line:

      ... <--- B~2 <--- B^1 = B^ = B~1 <--- B
                                           /
                          ... <--- B^2 <--/

Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
in the neighbouring subthread), which would mean together with
'parent <parent>' (assuming that there are no other parents; if they
are it gets even more complicated), that the following is true

  <current> ~= <parent> + (<sha2> - <sha1>),

where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps
with some fixups, corrections or the like).  Perhaps 'origin' should
be then called 'changeset'.

It would also be easier on implementation to check if
'origin'/'changeset' weak links are not broken, and to get to know
which commits are to be protected against pruning than your proposal
of

  origin <"cousin" id> [<mainline = parent number>]

where <mainline> can be omitted if it is 1 (the default).


This can also lead to replacing

  origin <b> <a>
  origin <c> <b>

by

  origin <c> <a>

for squash merge, or squash in rebase interactive.

> - In an existing repository gc/prune shall not delete commits being
>   referred to by origin links.
> 
> - During fetch/push/pull the full commit including the origin fields is
>   transmitted, however, the objects the origin links are referring to
>   are not (unless they are being transmitted because of other reasons).
> 
> - When fetching/pulling it is optionally possible to tell git to
>   actually transmit objects referred to by origin links even if it would
>   otherwise not have done so.
>
> - git fsck will traverse origin links, but will stay silent if the
>   object an origin link points to is unreachable (kind of like a shallow
>   repository).

The above means that it is a 'weak' link, i.e. it is protecting
against pruning (perhaps influenced by some configuration variable),
but it is not considered an error for it to be broken.

> - git cherry-pick/revert allow for the creation of origin links only if
>   the object they are referring to is presently reachable.

Errr... shouldn't objects referenced by 'origin' links be reachable in
order for "cherry-pick" or "revert" to succeed?

On the other hand this leads to the following question: what happens
if you cherry-pick or revert a commit which has its own 'origin'
links?

> - git rev-list --topo-order will take origin links into account to
>   ensure proper ordering.

What do you mean by that?
 
> - gitk allows for (e.g.) dotted lines to show the origin links.
> 
> - git log would show something like:
> 
>   commit bbb896d8e10f736bfda8f587c0009c358c9a8599
>   Origin: d2b9dff..53d1589
>   Origin: a1184d8..e596cdd
>   Author: Junio C Hamano <gitster@pobox.com>
>   Date:   Sat Aug 30 14:35:15 2008 -0700
> 
>   Note that for easy viewing: git diff d2b9dff..53d1589
>   will show the exact diff the origin link is referring to.
> 
> - git log --graph will show a dotted line of somesort just like gitk.

That is I guess the whole and main reason for 'origin' links to exist,
as having this information in free-form part, i.e. in the commit
message might lead to problems (with parsing and extracting, and
finding spurious links).
 
> - git blame will follow and use the origin link if the object exists.

Hmmmm... I'm not sure about that.

> - git merge disregards the whole origin field entirely, just like all
>   the rest of git-core.

Unless of course one uses more complex merge strategy, which doesn't
take into account only endpoints (branches to be merged and merge
bases), but is also affected in some by history...
 
> Anything I missed?

How would git-rebase make use of 'origin' links.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 15:44 ` Jakub Narebski
@ 2008-09-09 16:38   ` Steven Grimm
  2008-09-09 19:43   ` Stephen R. van den Berg
  1 sibling, 0 replies; 137+ messages in thread
From: Steven Grimm @ 2008-09-09 16:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Stephen R. van den Berg, git

On Sep 9, 2008, at 8:44 AM, Jakub Narebski wrote:
> This can also lead to replacing
>
> origin <b> <a>
> origin <c> <b>
>
> by
>
> origin <c> <a>
>
> for squash merge, or squash in rebase interactive.

And, incidentally, the above representation will potentially mesh well  
with svn integration, making it possible to cleanly represent svn 1.5  
merge-tracking metadata directly in git.

> Unless of course one uses more complex merge strategy, which doesn't
> take into account only endpoints (branches to be merged and merge
> bases), but is also affected in some by history...

It does intuitively (but perhaps incorrectly) seem like the origin  
information could be used to make more intelligent decisions about  
automatic conflict resolution, if nothing else. Though obviously that  
might, as you suggest, be a pretty big departure from the way git  
merges currently work.

-Steve

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 15:44 ` Jakub Narebski
  2008-09-09 16:38   ` Steven Grimm
@ 2008-09-09 19:43   ` Stephen R. van den Berg
  2008-09-09 19:59     ` Jeff King
                       ` (2 more replies)
  1 sibling, 3 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 19:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> The definition of the origin field reads as follows:

>> - There can be an arbitrary number of origin fields per commit.
>>   Typically there is going to be at most one origin field per commit.

>I understand that multiple origin fields occur if you do a squash
>merge, or if you cherry-pick multiple commits into single commit.
>For example:
> $ git cherry-pick -n <a1>
> $ git cherry-pick    <a2>
> $ git commit --amend        #; to correct commit message

Correct.

>I'm not sure if you plan to automatically add 'origin' field for
>rebase, and for interactive rebase...

That is not part of the plan so far.
Can you explain what you would be expecting in the best case?

>> - At the time of creation, the origin field contains a hash B which refers
>>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>>   a space and followed by an integer (base10, two in this case),
>>   which designates the proper parentnr of B (see: mainline in git
>>   cherry-pick/revert).

>I think you wanted to use "(B, B^2)", which mean B and second parent
>of B.  B~2 means grandparent of B in the straight line:

Correct, sorry about the confusion, I meant B^2 instead of B~2.

>Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
>in the neighbouring subthread), which would mean together with
>'parent <parent>' (assuming that there are no other parents; if they
>are it gets even more complicated), that the following is true

>  <current> ~= <parent> + (<sha2> - <sha1>),

>where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps
>with some fixups, corrections or the like).  Perhaps 'origin' should
>be then called 'changeset'.

The simplicity sounds inviting.  I'd like to hear from others who have
more experience (than I have) with the git vs. changeset paradigms about
this.  This allows a bit more flexibility in specifying the origin, the
question is if it's needed.

>It would also be easier on implementation to check if
>'origin'/'changeset' weak links are not broken, and to get to know
>which commits are to be protected against pruning than your proposal
>of
>  origin <"cousin" id> [<mainline = parent number>]
>where <mainline> can be omitted if it is 1 (the default).

On the contrary, my current proposal only needs to verify the validity
of a single commit, changing it like this will require the system to
verify the validity of two commits.  Given the rareness of the origin
links this will hardly present a problem, but it *does* increase
the overhead in checking a bit.

>This can also lead to replacing
>  origin <b> <a>
>  origin <c> <b>
>by
>  origin <c> <a>
>for squash merge, or squash in rebase interactive.

Ok, *that* is not possible with the original proposal.  This might just
be the reason why we'd like to go with the dual-hash link.

>> - git cherry-pick/revert allow for the creation of origin links only if
>>   the object they are referring to is presently reachable.

>Errr... shouldn't objects referenced by 'origin' links be reachable in
>order for "cherry-pick" or "revert" to succeed?

True.  But sometimes it's necessary to emphasize the obvious; call it a
preemptive strike against possible objections to the proposal.

>On the other hand this leads to the following question: what happens
>if you cherry-pick or revert a commit which has its own 'origin'
>links?

Nothing special.  cherry-pick/revert behave as if the existing origin links
were not present in the first place.

>> - git rev-list --topo-order will take origin links into account to
>>   ensure proper ordering.

>What do you mean by that?

The order in which commits are listed is defined by the fact that
descendent commits are shown before any of their parents.  The presence
of an origin link will make sure that the current commit will always
appear *before* the origin-commit it is referring to (if the
origin-commit is in the displayed set, that is).

>> - git log would show something like:

>>   commit bbb896d8e10f736bfda8f587c0009c358c9a8599
>>   Origin: d2b9dff..53d1589
>>   Origin: a1184d8..e596cdd
>>   Author: Junio C Hamano <gitster@pobox.com>
>>   Date:   Sat Aug 30 14:35:15 2008 -0700

>>   Note that for easy viewing: git diff d2b9dff..53d1589
>>   will show the exact diff the origin link is referring to.

>> - git log --graph will show a dotted line of somesort just like gitk.

>That is I guess the whole and main reason for 'origin' links to exist,
>as having this information in free-form part, i.e. in the commit
>message might lead to problems (with parsing and extracting, and
>finding spurious links).

Quite.  Also, having them in a well-defined place will allow for easy
fixups in case of rebase/filter-branch.

>> - git blame will follow and use the origin link if the object exists.

>Hmmmm... I'm not sure about that.

Care to explain your doubts?
The reason I want this behaviour, is because it's all about tracking
content, and that part of the content happens to come from somewhere
else, and therefore blame should look there to "dig deeper" into it.

>> - git merge disregards the whole origin field entirely, just like all
>>   the rest of git-core.

>Unless of course one uses more complex merge strategy, which doesn't
>take into account only endpoints (branches to be merged and merge
>bases), but is also affected in some by history...

Quite, but that is not a part of the definition of the origin field.
I can only try and make sure that we have a well-defined, well-behaved
mechanism in core git.  If someone wants to get creative with the
information presented, by all means, be my guest.

>> Anything I missed?

>How would git-rebase make use of 'origin' links.

As far as I can imagine, git rebase should alter the origin links during
rebase if they point to a commit within the strain being rebased.
Are there any other desirable use cases (for rebase)?
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:43   ` Stephen R. van den Berg
@ 2008-09-09 19:59     ` Jeff King
  2008-09-09 20:25       ` Stephen R. van den Berg
                         ` (2 more replies)
  2008-09-09 20:54     ` Jakub Narebski
  2008-09-09 23:35     ` Linus Torvalds
  2 siblings, 3 replies; 137+ messages in thread
From: Jeff King @ 2008-09-09 19:59 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Tue, Sep 09, 2008 at 09:43:54PM +0200, Stephen R. van den Berg wrote:

> >Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
> 
> The simplicity sounds inviting.  I'd like to hear from others who have
> more experience (than I have) with the git vs. changeset paradigms about
> this.  This allows a bit more flexibility in specifying the origin, the
> question is if it's needed.

One thing to keep in mind is that you are not just proposing some new
behavior for a command, but rather a new header for the data structure
that we will live with from now until eternity. So I think it makes
sense to allow the general case even if nobody is generating it yet, if
there is some chance that it may be useful for somebody to generate in
the future.

And yes, you can get _too_ general to the point where your semantics
become meaningless. But I don't think that is the case here. You are
defining the origin field as "by the way, the difference between state X
and state Y was used to make this commit". cherry-pick just happens to
make Y=X^, but something like rebase could use a series.

As for "git vs changeset": this is git. So you have a sequence of tree
states whether that is what you want or not. Thus you are specifying
the difference between _some_ pair of commits. I don't see any benefit
to restricting it to a commit and one of its parents.

> On the contrary, my current proposal only needs to verify the validity
> of a single commit, changing it like this will require the system to
> verify the validity of two commits.  Given the rareness of the origin
> links this will hardly present a problem, but it *does* increase
> the overhead in checking a bit.

Actually, it could decrease it. If I tell you that you must have "X" and
"X^2", then you could get away with just checking if you have "X". But
you might also want to check whether "X" even _has_ a second parent. And
that means not just looking up the object, but accessing it (resolving
deltas if need be, uncompressing, parsing the object).  With "X" and
"Y", it is just two object lookups.

Now obviously you don't have to be quite so careful in the "hash plus
parent" case. And if you are going to _do_ anything with the origin
field, you will end up accessing those objects anyway. But in that case,
you end up with the same number of lookups and accesses anyway: 2 of
each.

> >On the other hand this leads to the following question: what happens
> >if you cherry-pick or revert a commit which has its own 'origin'
> >links?
> 
> Nothing special.  cherry-pick/revert behave as if the existing origin links
> were not present in the first place.

I think that is smart; if somebody wants to drill down into the history
of origin links, they can do so at lookup time.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:59     ` Jeff King
@ 2008-09-09 20:25       ` Stephen R. van den Berg
  2008-09-09 20:42       ` Junio C Hamano
  2008-09-09 21:05       ` Junio C Hamano
  2 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 20:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Jakub Narebski, git

Jeff King wrote:
>On Tue, Sep 09, 2008 at 09:43:54PM +0200, Stephen R. van den Berg wrote:
>> >Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed

>> The simplicity sounds inviting.  I'd like to hear from others who have
>> more experience (than I have) with the git vs. changeset paradigms about
>> this.  This allows a bit more flexibility in specifying the origin, the
>> question is if it's needed.

>And yes, you can get _too_ general to the point where your semantics
>become meaningless. But I don't think that is the case here. You are
>defining the origin field as "by the way, the difference between state X
>and state Y was used to make this commit". cherry-pick just happens to
>make Y=X^, but something like rebase could use a series.

>As for "git vs changeset": this is git. So you have a sequence of tree
>states whether that is what you want or not. Thus you are specifying
>the difference between _some_ pair of commits. I don't see any benefit
>to restricting it to a commit and one of its parents.

Quite.  I'll drop the old format and adapt my proposal to use the double
hash.

As far as the naming of the field is concerned: a changeset is what the
field describes, but changeset implies no sense of direction; origin
makes it clear that the current commit was derived *from* the changeset
represented by "origin".
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:59     ` Jeff King
  2008-09-09 20:25       ` Stephen R. van den Berg
@ 2008-09-09 20:42       ` Junio C Hamano
  2008-09-09 20:47         ` Shawn O. Pearce
                           ` (2 more replies)
  2008-09-09 21:05       ` Junio C Hamano
  2 siblings, 3 replies; 137+ messages in thread
From: Junio C Hamano @ 2008-09-09 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Stephen R. van den Berg, Jakub Narebski, git

Jeff King <peff@peff.net> writes:

> And yes, you can get _too_ general to the point where your semantics
> become meaningless. But I don't think that is the case here. You are
> defining the origin field as "by the way, the difference between state X
> and state Y was used to make this commit". cherry-pick just happens to
> make Y=X^, but something like rebase could use a series.
>
> As for "git vs changeset": this is git. So you have a sequence of tree
> states whether that is what you want or not. Thus you are specifying
> the difference between _some_ pair of commits. I don't see any benefit
> to restricting it to a commit and one of its parents.

As for "by the way ... was used to make this commit": this is git.  So how
you arrived at the tree state you record in a commit *does not matter*.

Not only that, it is not just "the difference between state X and Y" that
you used to come to that tree.  Another thing that is involved is the
specific cherry-pick implementation back when the commit was made.  That
was what gave you the tree.

To my ears, it rhymes rather well with a famous quote from $gmane/217:

    You're freezing your (crappy) algorithm at tree creation time, and
    basically making it pointless to ever create something better later,
    because even if hardware and software improves, you've codified that
    "we have to have crappy information".

After reading the discussion so far, I am still not convinced if this is a
good idea, nor this time around it is that much different from what the
previous "prior" link discussion tried to do.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 20:42       ` Junio C Hamano
@ 2008-09-09 20:47         ` Shawn O. Pearce
  2008-09-09 20:50         ` Jeff King
  2008-09-10  0:13         ` Stephen R. van den Berg
  2 siblings, 0 replies; 137+ messages in thread
From: Shawn O. Pearce @ 2008-09-09 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Stephen R. van den Berg, Jakub Narebski, git

Junio C Hamano <gitster@pobox.com> wrote:
> 
> To my ears, it rhymes rather well with a famous quote from $gmane/217:
> 
>     You're freezing your (crappy) algorithm at tree creation time, and
>     basically making it pointless to ever create something better later,
>     because even if hardware and software improves, you've codified that
>     "we have to have crappy information".
> 
> After reading the discussion so far, I am still not convinced if this is a
> good idea, nor this time around it is that much different from what the
> previous "prior" link discussion tried to do.

Yup.  Same here.

I didn't see any information about why this "origin" link is
needed here, just how it might work.

And some of that "how" scared me because it was doing some sort of
"soft" reachability, where errors aren't noticed but we are expected
to protect the data from prune/repack forever once it has entered
the repository.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 20:42       ` Junio C Hamano
  2008-09-09 20:47         ` Shawn O. Pearce
@ 2008-09-09 20:50         ` Jeff King
  2008-09-09 22:35           ` Jakub Narebski
  2008-09-10  0:13         ` Stephen R. van den Berg
  2 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-09 20:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote:

> As for "by the way ... was used to make this commit": this is git.  So how
> you arrived at the tree state you record in a commit *does not matter*.

But it _does_ matter, which is why we have commit messages to explain
how you arrived at this tree state.

Now, that being said:

> After reading the discussion so far, I am still not convinced if this is a
> good idea, nor this time around it is that much different from what the
> previous "prior" link discussion tried to do.

For the record, I am not convinced it is a good idea either; I was
hoping to steer it in a direction where somebody could say "and now this
is the useful thing we can do now that we could not do before." If the
ultimate goal is to put links to other commits into history viewers,
then the commit message is a reasonable place to do so. The only thing I
see improving with a header is that it makes more sense for pruning and
object transfer.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:43   ` Stephen R. van den Berg
  2008-09-09 19:59     ` Jeff King
@ 2008-09-09 20:54     ` Jakub Narebski
  2008-09-09 23:08       ` Stephen R. van den Berg
  2008-09-09 23:35     ` Linus Torvalds
  2 siblings, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-09 20:54 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

On Tue, 9 Sep 2008, Stephen R. van den Berg wrote:
> Jakub Narebski wrote:
>>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>>>
>>> The definition of the origin field reads as follows:
[...] 
>>> - There can be an arbitrary number of origin fields per commit.
>>>   Typically there is going to be at most one origin field per commit.
>> 
>> I understand that multiple origin fields occur if you do a squash
>> merge, or if you cherry-pick multiple commits into single commit.
[...]
>> I'm not sure if you plan to automatically add 'origin' field for
>> rebase, and for interactive rebase...
> 
> That is not part of the plan so far.
> Can you explain what you would be expecting in the best case?

After thinking about this a bit, I don't think that (recording
origin(al) commits) for rebased commits would be good idea.  While one
can reasonably expect that cherry-picked changes should stay, and
reverted changes even more so (usually one reverts commit from
a history), usually the original commits being rebased are meant
to be pruned; the are rebased.

>>> - At the time of creation, the origin field contains a hash B which refers
>>>   to a reachable commit pair (B, B~1).  If B has multiple parents and the pair
>>>   being referred to needs to be e.g. (B, B~2), then the hash is followed by
>>>   a space and followed by an integer (base10, two in this case),
>>>   which designates the proper parentnr of B (see: mainline in git
>>>   cherry-pick/revert).
[...]
>> Besides I very much prefer using 'origin <sha1> <sha2>' (as proposed
>> in the neighbouring subthread), which would mean together with
>> 'parent <parent>' (assuming that there are no other parents; if they
>> are it gets even more complicated), that the following is true
>> 
>>  <current> ~= <parent> + (<sha2> - <sha1>),
> 
>> where '<rev1> ~= <rev2>' means that <rev1> is based on <rev2> (perhaps
>> with some fixups, corrections or the like).  Perhaps 'origin' should
>> be then called 'changeset'.
> 
> The simplicity sounds inviting.  I'd like to hear from others who have
> more experience (than I have) with the git vs. changeset paradigms about
> this.  This allows a bit more flexibility in specifying the origin, the
> question is if it's needed.

It is the simplicity that it is the most compelling of this solution.
For revert we have "origin B B^", for cherry-pick we have "origin A^ A";
(or 'changeset') and always we have <rev> =~ <rev>^ + (<r2> - <r1>),
where '-' denote diff operation (<diff> = <tree1> - <tree2>), and '+'
denote patch application (<tree1> = <tree2> + <diff>).


[ADDED LATER]
Also it could be useful for patch management interfaces using Git
as engine, such as StGIT, Guilt (formerly gq), TopGit, or now defunct,
obsoleted and no longer maintained Patchy Git aka 'pg'.

The "weak" 'origin'/'changeset' header would allow some sort of
operating on patches instead of usual operating on tree states.

>> It would also be easier on implementation to check if
>> 'origin'/'changeset' weak links are not broken, and to get to know
>> which commits are to be protected against pruning than your proposal
>> of
>>
>>   origin <"cousin" id> [<mainline = parent number>]
>>
>> where <mainline> can be omitted if it is 1 (the default).
> 
> On the contrary, my current proposal only needs to verify the validity
> of a single commit, changing it like this will require the system to
> verify the validity of two commits.  Given the rareness of the origin
> links this will hardly present a problem, but it *does* increase
> the overhead in checking a bit.

Errr... wasn't you proposing to keep/protect against pruning <cousin>
AND <cousin>^<mainline>? You want to have _diff_ (changeset) protected,
not a single tree state.

And having "origin <r1> <r2>" makes it easier then to check validity; you
don't need to get <r1>, check if it has <mainline> parent and what it is,
and then check if <r1>^<mainline> exists (and is not for example behind
shallow clone barrier).

>>> - git cherry-pick/revert allow for the creation of origin links only if
>>>   the object they are referring to is presently reachable.
> 
>> Errr... shouldn't objects referenced by 'origin' links be reachable in
>> order for "cherry-pick" or "revert" to succeed?
> 
> True.  But sometimes it's necessary to emphasize the obvious; call it a
> preemptive strike against possible objections to the proposal.

I don't think that it is true in this case.  This sentence _looks_ like
it offers / requires additional protection, while this "protection" is
already ensured by the fact of cherry-picking or reverting a commit.

>> On the other hand this leads to the following question: what happens
>> if you cherry-pick or revert a commit which has its own 'origin'
>> links?
> 
> Nothing special.  cherry-pick/revert behave as if the existing origin links
> were not present in the first place.

O.K.

>>> - git rev-list --topo-order will take origin links into account to
>>>   ensure proper ordering.
>> 
>> What do you mean by that?
> 
> The order in which commits are listed is defined by the fact that
> descendent commits are shown before any of their parents.  The presence
> of an origin link will make sure that the current commit will always
> appear *before* the origin-commit it is referring to (if the
> origin-commit is in the displayed set, that is).

Hmmm... while I think it might be a good idea, I'm not sure about its
overhead. Should be much, I guess.
 
>>> - git log would show something like: [...]
>>> - git log --graph will show a dotted line of somesort just like gitk.
>> 
>> That is I guess the whole and main reason for 'origin' links to exist,
>> as having this information in free-form part, i.e. in the commit
>> message might lead to problems (with parsing and extracting, and
>> finding spurious links).
> 
> Quite.  Also, having them in a well-defined place will allow for easy
> fixups in case of rebase/filter-branch.

True.

>>> - git blame will follow and use the origin link if the object exists.
>> 
>> Hmmmm... I'm not sure about that.
> 
> Care to explain your doubts?
> The reason I want this behaviour, is because it's all about tracking
> content, and that part of the content happens to come from somewhere
> else, and therefore blame should look there to "dig deeper" into it.

But blame is all about what commit brought some line to currents version.
So the cherry-pick itself, or revert of a commit itself would be blamed,
and should be blamed, not its parents, nor commit which got cherry-picked,
or commit which got reversed.

It would be nice to be able to follow 'origin'/'changeset' lines in the
_graphical_ blame browser like blameview or "git gui blame".

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:59     ` Jeff King
  2008-09-09 20:25       ` Stephen R. van den Berg
  2008-09-09 20:42       ` Junio C Hamano
@ 2008-09-09 21:05       ` Junio C Hamano
  2008-09-09 21:09         ` Jeff King
  2 siblings, 1 reply; 137+ messages in thread
From: Junio C Hamano @ 2008-09-09 21:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Stephen R. van den Berg, Jakub Narebski, git

Jeff King <peff@peff.net> writes:

> And yes, you can get _too_ general to the point where your semantics
> become meaningless. But I don't think that is the case here. You are
> defining the origin field as "by the way, the difference between state X
> and state Y was used to make this commit". cherry-pick just happens to
> make Y=X^, but something like rebase could use a series.

Another thing that made me wonder...

To be consistent, when you are at HEAD and are merging side branch B,
because that merge is to incorporate what happened on the side branch
while you are looking the other way, we should say "by the way, the
difference between state $(git merge-base HEAD B) and state B was used to
make this commit." in the resulting merge commit, shouldn't we?

What happens if there is more than one merge base?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 21:05       ` Junio C Hamano
@ 2008-09-09 21:09         ` Jeff King
  2008-09-09 23:36           ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-09 21:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Tue, Sep 09, 2008 at 02:05:28PM -0700, Junio C Hamano wrote:

> To be consistent, when you are at HEAD and are merging side branch B,
> because that merge is to incorporate what happened on the side branch
> while you are looking the other way, we should say "by the way, the
> difference between state $(git merge-base HEAD B) and state B was used to
> make this commit." in the resulting merge commit, shouldn't we?

I suppose you could, though in that case you can obviously calculate the
merge base yourself from the parents of the merge. The difference with
cherry-picking (or rebasing) is that you might not otherwise know about
the "by the way" commits.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
                   ` (2 preceding siblings ...)
  2008-09-09 15:44 ` Jakub Narebski
@ 2008-09-09 21:13 ` Petr Baudis
  2008-09-09 22:56   ` Stephen R. van den Berg
                     ` (2 more replies)
  2008-09-10 20:32 ` [RFC] origin link for cherry-pick and revert Miklos Vajna
  4 siblings, 3 replies; 137+ messages in thread
From: Petr Baudis @ 2008-09-09 21:13 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

On Tue, Sep 09, 2008 at 03:22:12PM +0200, Stephen R. van den Berg wrote:
> - During fetch/push/pull the full commit including the origin fields is
>   transmitted, however, the objects the origin links are referring to
>   are not (unless they are being transmitted because of other reasons).
> 
> - When fetching/pulling it is optionally possible to tell git to
>   actually transmit objects referred to by origin links even if it would
>   otherwise not have done so.

I think this is misguided. In general case, cherrypicks can be from
completely unrelated histories, and if you are doing the cherry pick,
you are saying that actually, the history *does not matter*. In that
case, this kind of link tries to impose a meaning where there is none,
and in an ill-defined way when whether the commit is actually around
anywhere is essentially random.

Why do you actually *follow* the origin link at all anyway? Without its
parents, the associated tree etc., the object is essentially useless for
you; the authorship information and commit message should've been
preserved by a proper cherry-pick anyway. You're cluttering the object
store with invalid objects, which also breaks quite some fundamental
logic within Git (which assumes that if an object exists, all its
references are valid - give or take few special cases like shallow
repositories, but this would have very different characteristics).

Having history browsers draw fancy lines is fine but I see nothing wrong
with them extracting this from the free-form part of the commit message.
For informative purposes, we don't shy away from heuristics anyway, c.f.
our renames detection (heck, we are even brave enough to use that for
merges).

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 20:50         ` Jeff King
@ 2008-09-09 22:35           ` Jakub Narebski
  2008-09-09 23:07             ` Jakub Narebski
  0 siblings, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-09 22:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Stephen R. van den Berg, git

On Tue, 9 Sep 2008, Jeff King wrote:
> On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote:
> 
> > As for "by the way ... was used to make this commit": this is git.  So how
> > you arrived at the tree state you record in a commit *does not matter*.
> 
> But it _does_ matter, which is why we have commit messages to explain
> how you arrived at this tree state.

Well, that is why I was carefull to say that "origin <rev1> <rev2>"
(or 'changeset', or 'cset') means that tree state for given commit
is created out of parent commit (or parent commits in the case of merge)
and of (<rev2> - <rev1>) patch.  This is a bit of enhancement to
"parent <rev>" meaning that tree state for current commit is derived
from tree state of <rev>.

This is nice generalization...

> Now, that being said:
> 
> > After reading the discussion so far, I am still not convinced if this is a
> > good idea, nor this time around it is that much different from what the
> > previous "prior" link discussion tried to do.
> 
> For the record, I am not convinced it is a good idea either; I was
> hoping to steer it in a direction where somebody could say "and now this
> is the useful thing we can do now that we could not do before." If the
> ultimate goal is to put links to other commits into history viewers,
> then the commit message is a reasonable place to do so. The only thing I
> see improving with a header is that it makes more sense for pruning and
> object transfer.

I'm also not all convinced that 'cousin'/'origin'/'changeset'/'cset'
header is a good idea.  I only tried to steer discussion in good
direction if it is somewhat a good idea.

First, if the only goal would be to add extra links (extra edges) to
[graphical] history viewer, then full sha-1 of a commit which can be
recorded in commit message for cherry-picks and reverts should be
enough.  It does mean parsing commit message, and all possibilities
for mistake which are connected to using conventions in free-form part
of commit object; on the other hand it is not _that_ critical.

If however 'origin' links are more (perhaps only a tiny bit more),
for example discussed "weak" links... then I'm not sure if
the tradeoffs are worth it. First, if it is full connectivity like
in 'parent' header case, then a) why not use 'parent' anyway,
b) it pins the history indefinitely long. Second, if it is "weak"
link, i.e. local protect it on prune, then a) there are problems
with transferring the data, and protecting links on transfer,
as somewhere in the middle or at the end there might be repository
which uses older git (backwards compatibility strikes again),
b) git in many, many places assumes that object is valid if it passes,
and all objects linked to from object are valid; we would have either
use some kind of separate 'not strictly checked' packfile/storage,
or have grafts-like thingy.

So I'm not sure if 'origin' links are worth the trouble.

About much, much earlier "prior" link discussion: I think the discussion
about "prior" header link was done before reflogs, or at least before
reflogs got turned on by default.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 21:13 ` Petr Baudis
@ 2008-09-09 22:56   ` Stephen R. van den Berg
  2008-09-09 23:05     ` Petr Baudis
  2008-09-10 12:21     ` [RFC] origin link for cherry-pick and revert Theodore Tso
  2008-09-10  8:45   ` Paolo Bonzini
  2008-09-23 13:51   ` Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert) Peter Krefting
  2 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 22:56 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis wrote:
>On Tue, Sep 09, 2008 at 03:22:12PM +0200, Stephen R. van den Berg wrote:
>> - During fetch/push/pull the full commit including the origin fields is
>>   transmitted, however, the objects the origin links are referring to
>>   are not (unless they are being transmitted because of other reasons).

>> - When fetching/pulling it is optionally possible to tell git to
>>   actually transmit objects referred to by origin links even if it would
>>   otherwise not have done so.

>I think this is misguided. In general case, cherrypicks can be from
>completely unrelated histories, and if you are doing the cherry pick,
>you are saying that actually, the history *does not matter*. In that

That is a false assumption in general, I'd say.

>case, this kind of link tries to impose a meaning where there is none,
>and in an ill-defined way when whether the commit is actually around
>anywhere is essentially random.

The purpose I'd use the origin links for is to manage software projects
that consist of 7 main branches which have branched in (on average) two
year intervals, which never get merged anymore.  The only thing that
happens is that there are backports amongst the branches about two per
week.

The only way to perform the backports is by using cherry-pick.
The history of each backport *is* important though.
Since all the developers who care about the multiple release branches
have all the relevant branches in their repository, the presence of
a origin object is by no means random, it's a certainty.

>Why do you actually *follow* the origin link at all anyway? Without its
>parents, the associated tree etc., the object is essentially useless for
>you; the authorship information and commit message should've been
>preserved by a proper cherry-pick anyway. You're cluttering the object
>store with invalid objects, which also breaks quite some fundamental
>logic within Git (which assumes that if an object exists, all its
>references are valid - give or take few special cases like shallow
>repositories, but this would have very different characteristics).

I'd prefer to formalise the (weak) relationship of an origin link, instead of
relying on vague assumptions when parsing the free-form commit message
and then guessing what the mentioned hash might mean.

>Having history browsers draw fancy lines is fine but I see nothing wrong
>with them extracting this from the free-form part of the commit message.
>For informative purposes, we don't shy away from heuristics anyway, c.f.
>our renames detection (heck, we are even brave enough to use that for
>merges).

It's not just that.  If I make a change to an area that was cherrypicked
from another branch, then I find it rather important to check if any
changes to this area need to be backported/forwardported to the branches
the origin links are pointing to.
I.e. the origin link allows me to improve my efficiency as a programmer.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 22:56   ` Stephen R. van den Berg
@ 2008-09-09 23:05     ` Petr Baudis
  2008-09-09 23:32       ` Stephen R. van den Berg
  2008-09-10  9:35       ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini
  2008-09-10 12:21     ` [RFC] origin link for cherry-pick and revert Theodore Tso
  1 sibling, 2 replies; 137+ messages in thread
From: Petr Baudis @ 2008-09-09 23:05 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:
> The only way to perform the backports is by using cherry-pick.
> The history of each backport *is* important though.
> Since all the developers who care about the multiple release branches
> have all the relevant branches in their repository, the presence of
> a origin object is by no means random, it's a certainty.

Recording cherry-picks in your workflow certainly makes sense, but I'm
not talking about workflow-level issues here. You are adding an extra
header to the commit object. I'm talking about the object database and
low-level Git model implications this has.

In other way, I think this is purely a porcelain matter and recording
this information in the free-form area is more than enough.

> >Why do you actually *follow* the origin link at all anyway? Without its
> >parents, the associated tree etc., the object is essentially useless for
> >you; the authorship information and commit message should've been
> >preserved by a proper cherry-pick anyway. You're cluttering the object
> >store with invalid objects, which also breaks quite some fundamental
> >logic within Git (which assumes that if an object exists, all its
> >references are valid - give or take few special cases like shallow
> >repositories, but this would have very different characteristics).
> 
> I'd prefer to formalise the (weak) relationship of an origin link, instead of
> relying on vague assumptions when parsing the free-form commit message
> and then guessing what the mentioned hash might mean.

Why?

> >Having history browsers draw fancy lines is fine but I see nothing wrong
> >with them extracting this from the free-form part of the commit message.
> >For informative purposes, we don't shy away from heuristics anyway, c.f.
> >our renames detection (heck, we are even brave enough to use that for
> >merges).
> 
> It's not just that.  If I make a change to an area that was cherrypicked
> from another branch, then I find it rather important to check if any
> changes to this area need to be backported/forwardported to the branches
> the origin links are pointing to.
> I.e. the origin link allows me to improve my efficiency as a programmer.

And why are the notes created by git cherry-pick -x insufficient for that?

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 22:35           ` Jakub Narebski
@ 2008-09-09 23:07             ` Jakub Narebski
  2008-09-10  8:10               ` Paolo Bonzini
  0 siblings, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-09 23:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Stephen R. van den Berg, git

Jakub Narebski wrote:
> On Tue, 9 Sep 2008, Jeff King wrote:
> > On Tue, Sep 09, 2008 at 01:42:52PM -0700, Junio C Hamano wrote:

> > Now, that being said:
> > 
> > > After reading the discussion so far, I am still not convinced if this is a
> > > good idea, nor this time around it is that much different from what the
> > > previous "prior" link discussion tried to do.
> > 
> > For the record, I am not convinced it is a good idea either; I was
> > hoping to steer it in a direction where somebody could say "and now this
> > is the useful thing we can do now that we could not do before." If the
> > ultimate goal is to put links to other commits into history viewers,
> > then the commit message is a reasonable place to do so. The only thing I
> > see improving with a header is that it makes more sense for pruning and
> > object transfer.
> 
> I'm also not all convinced that 'cousin'/'origin'/'changeset'/'cset'
> header is a good idea.  I only tried to steer discussion in good
> direction if it is somewhat a good idea.

By the way, beside graphical history viewers it would also help rebase
(and git-cherry) notice when patch was already applied better.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 20:54     ` Jakub Narebski
@ 2008-09-09 23:08       ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 23:08 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski wrote:
>On Tue, 9 Sep 2008, Stephen R. van den Berg wrote:
>> On the contrary, my current proposal only needs to verify the validity
>> of a single commit, changing it like this will require the system to
>> verify the validity of two commits.  Given the rareness of the origin
>> links this will hardly present a problem, but it *does* increase
>> the overhead in checking a bit.

>Errr... wasn't you proposing to keep/protect against pruning <cousin>
>AND <cousin>^<mainline>? You want to have _diff_ (changeset) protected,
>not a single tree state.

Actually, making sure that the commit we reference in the origin link
exists, we implicitly prove that all the parents of that commit exist as
well.  Then again, this point is moot since I already conceded (in a
different thread) that storing two hashes is better.

>>>> - git rev-list --topo-order will take origin links into account to
>>>>   ensure proper ordering.

>Hmmm... while I think it might be a good idea, I'm not sure about its
>overhead. Should be much, I guess.

Actually, I have already programmed this part, and the overhead is close
to zero.

>>>> - git blame will follow and use the origin link if the object exists.

>>> Hmmmm... I'm not sure about that.

>> Care to explain your doubts?
>> The reason I want this behaviour, is because it's all about tracking
>> content, and that part of the content happens to come from somewhere
>> else, and therefore blame should look there to "dig deeper" into it.

>But blame is all about what commit brought some line to currents version.
>So the cherry-pick itself, or revert of a commit itself would be blamed,
>and should be blamed, not its parents, nor commit which got cherry-picked,
>or commit which got reversed.

Well, it depends, I guess.
If you'd go for a "committer" based display, then following origin links
is bad.
If you'd go for an "author" based display, then following origin links
should be the default (IMHO).
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 23:05     ` Petr Baudis
@ 2008-09-09 23:32       ` Stephen R. van den Berg
  2008-09-10  9:35       ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini
  1 sibling, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 23:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis wrote:
>On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:

>In other way, I think this is purely a porcelain matter and recording
>this information in the free-form area is more than enough.

>> I'd prefer to formalise the (weak) relationship of an origin link, instead of
>> relying on vague assumptions when parsing the free-form commit message
>> and then guessing what the mentioned hash might mean.

>Why?

Using special references in the free-form area of a commit is akin to
using X-... headerfields in E-mail with all the assorted mess:
- No strict definition of what it means.
- Diverging porcelain implementations making use of the field in ever so
  slightly changing ways over the years.
- You cannot rely on the field being always available.
- Automated "renumbering" becomes difficult at best.

What we want are concise and unambiguous definitions which allow us to
build tools that operate predictably on them now, and will operate
predictably on them in the future.

>> >Having history browsers draw fancy lines is fine but I see nothing wrong

>> It's not just that.  If I make a change to an area that was cherrypicked
>> from another branch, then I find it rather important to check if any
>> changes to this area need to be backported/forwardported to the branches
>> the origin links are pointing to.
>> I.e. the origin link allows me to improve my efficiency as a programmer.

>And why are the notes created by git cherry-pick -x insufficient for that?

Things like rebase/filter-branch/stgit mess that up because they don't
know if the hash in the free-form should be altered.
Also, there is no automated way to actually fetch missing branches we
cherry-picked from this way.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 19:43   ` Stephen R. van den Berg
  2008-09-09 19:59     ` Jeff King
  2008-09-09 20:54     ` Jakub Narebski
@ 2008-09-09 23:35     ` Linus Torvalds
  2008-09-09 23:58       ` Stephen R. van den Berg
  2008-09-09 23:59       ` Jakub Narebski
  2 siblings, 2 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-09 23:35 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Tue, 9 Sep 2008, Stephen R. van den Berg wrote:

> Jakub Narebski wrote:
> >"Stephen R. van den Berg" <srb@cuci.nl> writes:
> >> The definition of the origin field reads as follows:
> 
> >> - There can be an arbitrary number of origin fields per commit.
> >>   Typically there is going to be at most one origin field per commit.
> 
> >I understand that multiple origin fields occur if you do a squash
> >merge, or if you cherry-pick multiple commits into single commit.
> >For example:
> > $ git cherry-pick -n <a1>
> > $ git cherry-pick    <a2>
> > $ git commit --amend        #; to correct commit message
> 
> Correct.

Quite frankly, recording the origins for _any_ of the above sounds like a 
horribly mistake.

All those operations are commonly used (along with "git rebase -i") to 
clean up history in order to show a nicer version.

The whole point of "origin" seems to be to _destroy_ that.

I would refuse to ever touch anything that had an "origin" pointer, so if 
git were to add that feature, it would be a huge disappointment to me. I'd 
have to have a version that makes sure that anything it pulls hasn't been 
crapped on by somebody who added a stupid link to some dirty history that 
I'm not at all interested in seeing.

IOW, I'm seeing a _lot_ of downsides, and not any actual upsides. What are 
the upsides again? 

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 21:09         ` Jeff King
@ 2008-09-09 23:36           ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 23:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Jakub Narebski, git

Jeff King wrote:
>On Tue, Sep 09, 2008 at 02:05:28PM -0700, Junio C Hamano wrote:
>> To be consistent, when you are at HEAD and are merging side branch B,
>> because that merge is to incorporate what happened on the side branch
>> while you are looking the other way, we should say "by the way, the
>> difference between state $(git merge-base HEAD B) and state B was used to
>> make this commit." in the resulting merge commit, shouldn't we?

>I suppose you could, though in that case you can obviously calculate the
>merge base yourself from the parents of the merge. The difference with
>cherry-picking (or rebasing) is that you might not otherwise know about
>the "by the way" commits.

Quite.  The origin link is primarily intended for cherry-picks/reverts
which are otherwise difficult to find.  Anything that can use the normal
parent mechanism has no business using the origin links.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 23:35     ` Linus Torvalds
@ 2008-09-09 23:58       ` Stephen R. van den Berg
  2008-09-10  0:23         ` Linus Torvalds
  2008-09-09 23:59       ` Jakub Narebski
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-09 23:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Linus Torvalds wrote:
>On Tue, 9 Sep 2008, Stephen R. van den Berg wrote:
>> Jakub Narebski wrote:
>> >I understand that multiple origin fields occur if you do a squash
>> >merge, or if you cherry-pick multiple commits into single commit.
>> >For example:
>> > $ git cherry-pick -n <a1>
>> > $ git cherry-pick    <a2>
>> > $ git commit --amend        #; to correct commit message

>> Correct.

>All those operations are commonly used (along with "git rebase -i") to 
>clean up history in order to show a nicer version.

Actually, I'd suggest that cherry-pick takes an -o flag which turns on
the origin link.  This needs to be a concious decision because one deems
the history relevant.  This typically is a Good Thing when the
development has several long-term-stable branches which never get merged
with each other, yet they receive frequent backports (using cherry-pick)
between them.

>The whole point of "origin" seems to be to _destroy_ that.

Only in the case where the committer thinks the history is of interest,
and even then, since the origin link is in the header, displaying it or
not suddenly is under the control of git.
Had it been in the free-form textarea, there'd be no way suppress the
display of it.

>I would refuse to ever touch anything that had an "origin" pointer, so if 
>git were to add that feature, it would be a huge disappointment to me. I'd 
>have to have a version that makes sure that anything it pulls hasn't been 
>crapped on by somebody who added a stupid link to some dirty history that 
>I'm not at all interested in seeing.

As you might have noticed, the actual process of pulling/fetching
explicitly does *not* pull in the objects being pointed to.
That, in turn, will cause the origin link output to be automatically
suppressed.  I.e. you'll never know the difference.

OTOH, if someone adds a free-form link to the commit message, you
essentially cannot hide that and are just suffering the clutter without
having any use for it.

>IOW, I'm seeing a _lot_ of downsides, and not any actual upsides. What are 
>the upsides again? 

The upsides are:
- If your repository contains the proper branches, it will show a richer
  content.
- If your repository lacks the proper branches, it will show a *reduced*
  clutter content (because actual free-text references in the commit
  messages will decrease).

I see a lot of upsides, what were the downsides again?
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 23:35     ` Linus Torvalds
  2008-09-09 23:58       ` Stephen R. van den Berg
@ 2008-09-09 23:59       ` Jakub Narebski
  1 sibling, 0 replies; 137+ messages in thread
From: Jakub Narebski @ 2008-09-09 23:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, git

Linus Torvalds wrote:
> On Tue, 9 Sep 2008, Stephen R. van den Berg wrote:
>> Jakub Narebski wrote:
>>>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>>>> The definition of the origin field reads as follows:
>> 
>>>> - There can be an arbitrary number of origin fields per commit.
>>>>   Typically there is going to be at most one origin field per commit.
>> 
>>> I understand that multiple origin fields occur if you do a squash
>>> merge, or if you cherry-pick multiple commits into single commit.
>>> For example:
>>> $ git cherry-pick -n <a1>
>>> $ git cherry-pick    <a2>
>>> $ git commit --amend        #; to correct commit message
>> 
>> Correct.
> 
> Quite frankly, recording the origins for _any_ of the above sounds like a 
> horribly mistake.

Actually the above is _not_ a good example for using 'origin', and why
using 'origin'; just a bit convoluted example of multiple 'origin'
headers.

> All those operations are commonly used (along with "git rebase -i") to 
> clean up history in order to show a nicer version.
> 
> The whole point of "origin" seems to be to _destroy_ that.

If I understand correctly the point is to record those 'origin' headers
for git-revert (when 'origin'-ed commit is somewhere in the history),
and for git-cherry-pick from other long lived branch and thus require
additional option to git-cherry-pick to record 'origin' (denoting that
you this is "true" cherry-pick, and not reordering of commits and
cleaning up a history, better done with interactive rebase).

/me is playing advocatus diaboli here, 'cause I'm not that convinced
to necessity of this feature.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 20:42       ` Junio C Hamano
  2008-09-09 20:47         ` Shawn O. Pearce
  2008-09-09 20:50         ` Jeff King
@ 2008-09-10  0:13         ` Stephen R. van den Berg
  2008-09-10  1:59           ` Junio C Hamano
  2 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10  0:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Jakub Narebski, git

Junio C Hamano wrote:
>As for "by the way ... was used to make this commit": this is git.  So how
>you arrived at the tree state you record in a commit *does not matter*.

The typical use case for the origin links is in a project with several
long-lived branches which use cherry-picks to backport amongst them.
There is no real other way to solve this case, except for some rather
kludgy stuff in the free-form commit message which doesn't mesh well
with rebase/filter-branch/stgit etc.

As to "does not matter": then why does git store parent links?

>To my ears, it rhymes rather well with a famous quote from $gmane/217:

>    You're freezing your (crappy) algorithm at tree creation time, and
>    basically making it pointless to ever create something better later,
>    because even if hardware and software improves, you've codified that
>    "we have to have crappy information".

I tried to accomodate this approach by overloading the parent link and
then making git more intelligent to figure out if it is a cherry-pick or
not.  That was deemed undesirable, so using the origin links is the next
best thing (IMHO).

>good idea, nor this time around it is that much different from what the
>previous "prior" link discussion tried to do.

It is well-defined this time, and doesn't bleed across fetch/pull.
-- 
Sincerely,
           Stephen R. van den Berg.

"Be spontaneous!"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 23:58       ` Stephen R. van den Berg
@ 2008-09-10  0:23         ` Linus Torvalds
  2008-09-10  5:42           ` Stephen R. van den Berg
  2008-09-10  8:30           ` Paolo Bonzini
  0 siblings, 2 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-10  0:23 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Wed, 10 Sep 2008, Stephen R. van den Berg wrote:
> 
> As you might have noticed, the actual process of pulling/fetching
> explicitly does *not* pull in the objects being pointed to.

.. which makes them _local_ data, which in turn means that they should not 
be in the object database at all.

IOW, i you want this for local reasons, you should use a local database, 
like the index or the reflogs (and I don't mean "like the index" in the 
sense that it would look _anything_ like that file, but in the sense that 
it's a purely local thing and doesn't show up in the object database).

		Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  0:13         ` Stephen R. van den Berg
@ 2008-09-10  1:59           ` Junio C Hamano
  2008-09-10  5:38             ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Junio C Hamano @ 2008-09-10  1:59 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jeff King, Jakub Narebski, git

"Stephen R. van den Berg" <srb@cuci.nl> writes:

> Junio C Hamano wrote:
>>As for "by the way ... was used to make this commit": this is git.  So how
>>you arrived at the tree state you record in a commit *does not matter*.
>
> The typical use case for the origin links is in a project with several
> long-lived branches which use cherry-picks to backport amongst them.
> There is no real other way to solve this case, except for some rather
> kludgy stuff in the free-form commit message which doesn't mesh well
> with rebase/filter-branch/stgit etc.
>
> As to "does not matter": then why does git store parent links?

The parent links describe *where* you came from, not *how*.

And if you think the difference is just "semantics", then you haven't
grokked the first lesson I gave in this thread.  "parents" record the
reference points against which you make "this resulting commit suits the
purpose of my branch better than any histories leading to these commits".

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  1:59           ` Junio C Hamano
@ 2008-09-10  5:38             ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10  5:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Jakub Narebski, git

Junio C Hamano wrote:
>"Stephen R. van den Berg" <srb@cuci.nl> writes:
>> Junio C Hamano wrote:
>>>As for "by the way ... was used to make this commit": this is git.  So how
>>>you arrived at the tree state you record in a commit *does not matter*.

>> The typical use case for the origin links is in a project with several
>> long-lived branches which use cherry-picks to backport amongst them.
>> There is no real other way to solve this case, except for some rather
>> kludgy stuff in the free-form commit message which doesn't mesh well
>> with rebase/filter-branch/stgit etc.

>> As to "does not matter": then why does git store parent links?

>The parent links describe *where* you came from, not *how*.

>And if you think the difference is just "semantics", then you haven't
>grokked the first lesson I gave in this thread.  "parents" record the
>reference points against which you make "this resulting commit suits the
>purpose of my branch better than any histories leading to these commits".

The last question of mine was/is a rethorical one.

Consider the typical use case I describe above.  The developer usually
has just created a commit in the developmentbranch, tested it, and deems
the patch worthwhile enough to backport it to the latest stable branch.
So he cherry picks the from the development branch to the latest stable
branch.
Then tests it, and decides to backport it to the older stable branch,
so he cherry-picks it again, and commits it there too.  The is repeated
in rapid succession on three older stable branches as well.

Basically that means that for the patch itself, there is a path in
history to follow as well.  I.e. the patch itself evolves over time.

Now, when another developer makes an additional change to this patch
in one of the stable versions, it is very helpful to actually be able to
have git tell you where the original patch came from and to follow back
the chain upward.  It allows you to forward/backward port the new change
more easily.

Basically, the normal parent links allow you to follow evolving
snapshots of the complete source-tree, whereas the origin links allow
you to follow evolving snapshots of a patch.
As it happens, the shortest way to describe a patch in git is by
specifying two commits of which the difference is exactly your patch.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  0:23         ` Linus Torvalds
@ 2008-09-10  5:42           ` Stephen R. van den Berg
  2008-09-10 15:30             ` Linus Torvalds
  2008-09-10  8:30           ` Paolo Bonzini
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10  5:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Linus Torvalds wrote:
>On Wed, 10 Sep 2008, Stephen R. van den Berg wrote:

>> As you might have noticed, the actual process of pulling/fetching
>> explicitly does *not* pull in the objects being pointed to.

>.. which makes them _local_ data, which in turn means that they should not 
>be in the object database at all.

>IOW, i you want this for local reasons, you should use a local database, 
>like the index or the reflogs (and I don't mean "like the index" in the 
>sense that it would look _anything_ like that file, but in the sense that 
>it's a purely local thing and doesn't show up in the object database).

But then how would someone who clones the repository get at the information?
The information is essential to understand backports between the various
stable branches.

The origin links describe the evolving state of a patch (i.e. just like
regular commits/parents store snapshots of the whole tree, the origin
links store snapshots of a patch as it evolves through time).
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 23:07             ` Jakub Narebski
@ 2008-09-10  8:10               ` Paolo Bonzini
  0 siblings, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10  8:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Jeff King, Junio C Hamano, Stephen R. van den Berg, git

> By the way, beside graphical history viewers it would also help rebase
> (and git-cherry) notice when patch was already applied better.

I think that rebase had better not trust the origin links in deciding
whether a patch was already applied; it already does it well enough.

git-cherry is another story, as that tool is not faking "changeset mode"
so well (because it cannot attempt merges, and these are what allows
git-rebase to fake changesets much better).

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  0:23         ` Linus Torvalds
  2008-09-10  5:42           ` Stephen R. van den Berg
@ 2008-09-10  8:30           ` Paolo Bonzini
  2008-09-10 15:32             ` Linus Torvalds
  1 sibling, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10  8:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git

Linus Torvalds wrote:
> 
> On Wed, 10 Sep 2008, Stephen R. van den Berg wrote:
>> As you might have noticed, the actual process of pulling/fetching
>> explicitly does *not* pull in the objects being pointed to.
> 
> .. which makes them _local_ data, which in turn means that they should not 
> be in the object database at all.

Not really local data.  More like _weakly referenced_ data.  If it is
there, cool.  If it is not there, no big deal.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 21:13 ` Petr Baudis
  2008-09-09 22:56   ` Stephen R. van den Berg
@ 2008-09-10  8:45   ` Paolo Bonzini
  2008-09-23 13:51   ` Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert) Peter Krefting
  2 siblings, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10  8:45 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Stephen R. van den Berg, git

> Having history browsers draw fancy lines is fine but I see nothing wrong
> with them extracting this from the free-form part of the commit message.
> For informative purposes, we don't shy away from heuristics anyway, c.f.
> our renames detection (heck, we are even brave enough to use that for
> merges).

... and it works only because false positives (which make the merge
actively wrong) are extremely rare.  If there is the occasional false
negative, well, it just makes merges more complicated and screws up
visualization a bit, so you live with it.

Move/copy detection almost always worked for me, but there are two cases
where it didn't:

1) empty files.  Each of them is marked as copied from a
seemingly-picked-at-random one.

2) renaming a Gtk+ class.  You rename it (e.g. from gtkclassnamea.c to
gtkclassnameb.c) and at the same time do

  s/GtkClassNameA/GtkClassNameB/
  s/GTK_CLASS_NAME_A(?:\>|?=_)/GTK_CLASS_NAME_B/
  s/gtk_class_name_a(?:\>|?=_)/gtk_class_name_b/

and reindent everything.  Guaranteed to have a similarity index around
30-40%, not more.

I don't care much about it, but face it, it is *not* perfect.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-09 23:05     ` Petr Baudis
  2008-09-09 23:32       ` Stephen R. van den Berg
@ 2008-09-10  9:35       ` Paolo Bonzini
  2008-09-10 10:44         ` Petr Baudis
  2008-09-10 14:33         ` Dmitry Potapov
  1 sibling, 2 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10  9:35 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Stephen R. van den Berg, git

> Why do you actually *follow* the origin link at all anyway? Without its
> parents, the associated tree etc., the object is essentially useless for
> you

Stephen posed the origin links as weak, but it is not necessarily true
that you don't have the parents and the associated tree.  For example,
if you download a repository that includes a "master" branch and a few
stable branches, you *will* have the objects cherry-picked into stable
branches, because they are commits in the master branch.

Junio explained that the way achieves the same effect in git is by
forking the topic branch off the "oldest" branch where the patch will
possibly be of interest.  Then he can merge it in that branch and all
the newest ones.  That's great, but not all people are as
forward-looking (he did say that sometimes he needs to cherrypick).

Another problem is that in some projects actually there are two "maint"
branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do
not care about what goes in the older "maint" branch; they develop for
trunk and for the newer "maint" branch, and then one person comes and
cherry-picks into the older "maint" branch.  This has two problems:

1) Having to fork topic branches off the older branch would force extra
testing on the developers.

2) Besides this, topic branches are not cloned, so if I am the
integrator on the older "maint" branch, I need to dig manually in the
commits to find bugfixes.  True, I could use Bugzilla, but what if I
want to use git instead?  There is "git cherry -v ... | grep -w ^+.*PR",
except that it has too many false negatives (fixes that have already
been backported, but do show up in the list).

> And why are the notes created by git cherry-pick -x insufficient for that?

For example, these notes (or the ones created by "git revert") are
*wrong* because they talk about commits instead of changesets (deltas
between two commits).

Why is only one commit present?  Because these messages are meant for
users, not for programs.  That's easy to show: users think of commits as
deltas anyway, even though git stores them as snapshots---"git show
HEAD" shows a delta, not a snapshot.

And what does this mean for programs?  That they must resort to
commit-message scraping to distinguish the two cases. (*)

   (*) A GUI blame program, for example, would need to distinguish
   whether code added by a commit is taken from commit 4329bd8, or is
   reverting commit 4329bd8.  (In the first case, the author of that
   code is whoever was responsible for that code in 4329bd8; in the
   second case, it is whoever was responsible for that code in
   4329bd8^).  If recording changesets, you see 4329bd8^..4329bd8 in
   the first case, and 4329bd8..4329bd8^ in the second, so it is trivial
   to follow the chain.

And scraping is bad.  Imagine people that are writing commit messages in
their native language.  What if they patch git to translate the magic
notes created by "git cherry-pick -x" or "git revert" (maybe a future
version of git will do that automatically)?  Should they translate also
every program that scrapes the messages?

Whenever there is a piece of data that could be useful to programs (no
matter if plumbing or porcelain), I consider free form notes to be bad.
 Because data is data, and metadata is metadata.

If there was a generic way to put porcelain-level metadata in commit
messages (e.g. Signed-Off-By and Acknowledged-By can be already
considered metadata), I would not be so much in favor of "origin" links
being part of the commit object's format.  Now if you think about it,
commit references within this kind of metadata would have mostly the
properties that Stephen explained in his first message:

1) they would be rewritten by git-filter-branch

2) these references, albeit weak by default, could optionally be
followed when fetching (either with command-line or configuration options)

3) they would not be pruned by git-gc

4) possibly, git rev-list --topo-order would sort commits by taking into
account metadata references too.

So the implementation effort would be roughly the same.

But, can you think of any other such metadata?  Personally I can't, so
while I understand the opposition to a new commit header field that
would be there from here to eternity (or until the LHC starts), I do
think it is the simplest thing that can possibly work.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10  9:35       ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini
@ 2008-09-10 10:44         ` Petr Baudis
  2008-09-10 11:49           ` Stephen R. van den Berg
  2008-09-10 14:33         ` Dmitry Potapov
  1 sibling, 1 reply; 137+ messages in thread
From: Petr Baudis @ 2008-09-10 10:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stephen R. van den Berg, git

On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote:
> 
> > Why do you actually *follow* the origin link at all anyway? Without its
> > parents, the associated tree etc., the object is essentially useless for
> > you
> 
> Stephen posed the origin links as weak, but it is not necessarily true
> that you don't have the parents and the associated tree.  For example,
> if you download a repository that includes a "master" branch and a few
> stable branches, you *will* have the objects cherry-picked into stable
> branches, because they are commits in the master branch.

But that is irrelevant. If you already have the objects, whether to
follow the origin link does not matter at all.

I argue that the following the origin link by one step is harmful as it
violated the internal Git object model and does not have real benefits.
If you want to have the origin links, do not follow them at all - the
commit objects themselves are not useful. (Or, optionally, follow them
fully - that of course can make sense.)

> > And why are the notes created by git cherry-pick -x insufficient for that?
> 
> For example, these notes (or the ones created by "git revert") are
> *wrong* because they talk about commits instead of changesets (deltas
> between two commits).

(BTW, I don't feel strongly enough about the header-freeform distinction
to argue about it and some of your and others' points are good. But even
if we have the origin links, I think we should only follow them not at
all or fully.)

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10 10:44         ` Petr Baudis
@ 2008-09-10 11:49           ` Stephen R. van den Berg
  2008-09-10 12:30             ` Petr Baudis
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 11:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Paolo Bonzini, git

Petr Baudis wrote:
>On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote:
>But that is irrelevant. If you already have the objects, whether to
>follow the origin link does not matter at all.

>I argue that the following the origin link by one step is harmful as it
>violated the internal Git object model and does not have real benefits.
>If you want to have the origin links, do not follow them at all - the
>commit objects themselves are not useful. (Or, optionally, follow them
>fully - that of course can make sense.)

The origin links are rarely followed, not even by one step.  They are
only followed if a certain operation requires them (not a lot do).

>> > And why are the notes created by git cherry-pick -x insufficient for that?

>> For example, these notes (or the ones created by "git revert") are
>> *wrong* because they talk about commits instead of changesets (deltas
>> between two commits).

>(BTW, I don't feel strongly enough about the header-freeform distinction
>to argue about it and some of your and others' points are good. But even
>if we have the origin links, I think we should only follow them not at
>all or fully.)

Maybe we have a misunderstanding about what "follow a link" means and
when it is done.
During most normal git operation, the origin links are just read, but
not followed.
The only commands that I expect to follow them are log --graph, gitk, fsck
and blame.  I may have missed some corner use-cases, but this should
cover most of it; i.e. most of git ignores them or just makes note of
the hashvalues provided.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 22:56   ` Stephen R. van den Berg
  2008-09-09 23:05     ` Petr Baudis
@ 2008-09-10 12:21     ` Theodore Tso
  2008-09-10 14:16       ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-10 12:21 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Petr Baudis, git

On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:
> The purpose I'd use the origin links for is to manage software projects
> that consist of 7 main branches which have branched in (on average) two
> year intervals, which never get merged anymore.  The only thing that
> happens is that there are backports amongst the branches about two per
> week.
> 
> The only way to perform the backports is by using cherry-pick.
> The history of each backport *is* important though.
> Since all the developers who care about the multiple release branches
> have all the relevant branches in their repository, the presence of
> a origin object is by no means random, it's a certainty.

I'd argue that the origin link is a bit too general for your proposed
use.  One of the problems with the origin link is that it is only a
one way pointer.  Given a newer commit, you know that it is (somehow)
weekly related to a older commit.  So your proposed workflow only
works if cherry-picks only happen in one direction.  That isn't always
true, especially in distributed environments where the bugfix might
happen on someone else's development branch, and then it gets pulled
in, or perhaps rebased in, and you want to know they are related.

I would argue the best way to do that is to store (either in the
object or in the free-form text area) not the link, which would have
to get renumbered but rather the identifier for the bug(s) that this
commit fixes.  So for example, consider a convention where in the body
of the free-form text area, before the Signed-off-by:, Acked-by:, and
CC: headers for those projects that use them, we add something like
the following:

Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167

or

Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023

Once you have this information, it is not difficult to maintain a
berk_db database which maps a particular Bug identifier (i.e.,
Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of
commits.

The advantage of this scheme is that if a bug has been fixed in
multiple branches, you can see the association between two commits in
two different branches very easily.  Furthermore, you get a link back
to the actual bug in one or more bug tracking systems, which the some
porcelain program could use to transform into a hot-link which when
clicked opens up a browser window to the bug in question.

In contrast, using your proposed origin scheme, if the bug was
originally created in some development branch, and then cherry picked
into two separate maintenance branches, if you don't have the
development branch in your repository (maybe for some reason that
development branch wasn't kept for some reason), the origin link in
the two maintenance branches would point to a non-existent commit ID,
and you wouldn't be able to estabish a linkage between them.  By using
an independent bug identifer as the way of creating the linkage,
you're preserving *much* more useful information, and you can reliably
establish a relationship between two commits.

In terms of your arguments about why free-form is bad, in another message:

>- No strict definition of what it means.
>- Diverging porcelain implementations making use of the field in ever so
>  slightly changing ways over the years.

This can be a problem regardless of where you store the information.
Whether you store it in the free-form text or in the git object
header, if you don't make sure it is well-defined, you're in trouble.

>- You cannot rely on the field being always available.

This is true regardless of where you store it; older versions of git
won't store the git origin link, for example, unless you plan to break
backwards compatibility with all existing git repositories, which
would be a bad idea.  :-)

One nice thing of using text in free-form text fields is that anyone
can enter it without needing a new version of git.  The downside is
that people could typo the header in some fashion.  But that can be
dealt with in a newer version of the git porcelain validates the bug
identifier and/or checks for obvious spelling mistakes and issues a
warning ("Looks like you may have mispelled 'Adresses-Bug'; perhaps
you should fix this via git commit --amend?").  

In contrast, if you put it in the git object header, there is no
possibility of using the field at all until you update to a version of
git that supports it.  And some developer on your project is using an
older version of git when they rebase or cherry-pick a commit, the
origin header will be completely lost; but if it is stored in the
free-form area, the information will be brought along for the ride for
free.

>- Automated "renumbering" becomes difficult at best.

This is actually one of the reasons why I don't like the origin link.
If you use the origin link, it's *still* not obvious whether you
should rewrite the commit ID or not.  For example, in some workflows,
you have two branches pointing to the same commit before you do the
rebase, where the rebase will only update the current branch pointer,
but there is another branch still pointing at the original series of
commits.  Worse yet, someone may have done a cherry-pick *before* the
rebase.  Hence, the only thing you can do is keep *both* commit ID's.
This means that over time, you can't get rid of any commit ID's when
you do a rebase, which means the number of commit ID's in the origin
link will always increase whenever you do a rebase or a cherry-pick.

This is why for the use case where you are trying to figure out
whether a bug exists in a particular branch, it is ***much*** better
to rendevous using a bug identifier; it provides an extra layer of
indirection which results in a much more stable identifer that is
guaranteed to work.

I understand it won't work for those cases where you don't have a bug
tracking identifer, but in fact, if you need this functionality at all
(and I am not convinced that you do), the ***much*** better approach
is to use the same approach as the bug tracking identifier, and add a
level of indirection.  How would that work in practice?  Whenever you
create a new commit, create a UUID which is assigned to the patch.
This UUID is not modified by git rebase or git cherry pick, and it
should be optionally kept or modified on a git commit --amend.
Ideally, said UUID would exported via git-format-patch, and imported
via git-am, and via systems that use patches, such as guilt or stg.
This becomes a handy way of recognizing patches even if they aren't
being stored in git --- for example, Andrew Morton's mm patch series.

Now, whether you store this UUID in the free-form text area, or in the
git object header, in the long run really doesn't matter.  You can
just as easily have porcelein suppress a line in the free-form text
area, as you can have the procelain print the UUID when it is stored
in the object header.

Yes, it means that you have to maintain a separate database so you can
easily find the list of commits that contain a particular UUID, but I
suspect you would need this in the case of the origin link concept
anyway, since sooner or later some of the more useful uses of said
link would require you to be able to find the commits which had origin
links to the original commit, which means you would need to create and
maintain this database anyway.  And the maintenance of this database
is purely optional; you only need it if you care about efficiently
looking up UUID's, and given "time git log > /dev/null" on the kernel
tree only takes six seconds on my laptop, and "git log > /dev/null"
only takes 0.148 seconds for e2fsprogs, for many projects you might
not even need the database to accelerate lookups via UUID.

    	      	  	      		 	     - Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10 11:49           ` Stephen R. van den Berg
@ 2008-09-10 12:30             ` Petr Baudis
  2008-09-10 13:14               ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Petr Baudis @ 2008-09-10 12:30 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Paolo Bonzini, git

On Wed, Sep 10, 2008 at 01:49:40PM +0200, Stephen R. van den Berg wrote:
> Maybe we have a misunderstanding about what "follow a link" means and
> when it is done.
> During most normal git operation, the origin links are just read, but
> not followed.
> The only commands that I expect to follow them are log --graph, gitk, fsck
> and blame.  I may have missed some corner use-cases, but this should
> cover most of it; i.e. most of git ignores them or just makes note of
> the hashvalues provided.

Oh, I'm sorry. By

	- During fetch/push/pull the full commit including the origin fields is
	  transmitted, however, the objects the origin links are referring to
	  are not (unless they are being transmitted because of other reasons).

I have understood that you fetch the origin target but not commits
referred from it, but instead you meant that you do not follow the
origin link at all.

				Petr "Pasky" Baudis

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10 12:30             ` Petr Baudis
@ 2008-09-10 13:14               ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 13:14 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Paolo Bonzini, git

Petr Baudis wrote:
>On Wed, Sep 10, 2008 at 01:49:40PM +0200, Stephen R. van den Berg wrote:
>> Maybe we have a misunderstanding about what "follow a link" means and
>> when it is done.

>Oh, I'm sorry. By

>	- During fetch/push/pull the full commit including the origin fields is
>	  transmitted, however, the objects the origin links are referring to
>	  are not (unless they are being transmitted because of other reasons).

>I have understood that you fetch the origin target but not commits
>referred from it, but instead you meant that you do not follow the
>origin link at all.

Indeed.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 12:21     ` [RFC] origin link for cherry-pick and revert Theodore Tso
@ 2008-09-10 14:16       ` Stephen R. van den Berg
  2008-09-10 15:10         ` Jeff King
                           ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 14:16 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Petr Baudis, git

Theodore Tso wrote:
>On Wed, Sep 10, 2008 at 12:56:03AM +0200, Stephen R. van den Berg wrote:

>use.  One of the problems with the origin link is that it is only a
>one way pointer.  Given a newer commit, you know that it is (somehow)
>weekly related to a older commit.  So your proposed workflow only
>works if cherry-picks only happen in one direction.  That isn't always
>true, especially in distributed environments where the bugfix might
>happen on someone else's development branch, and then it gets pulled
>in, or perhaps rebased in, and you want to know they are related.

Well, the definition of the origin link (and a back/forwardport) is that:
- You (as a developer) consider the link relevant for posterity (IOW,
  you consider it to be a proper back/forwardport which should be
  recognisable as such).
- The back/forwardport always has to reference some existing (stable) commit.

Especially the second condition always holds at the time of creation of
the backport (or forwardport, for that matter).  I'm not quite sure
which circumstances you allude to above which would violate this
requirement, can you elaborate on that?

>I would argue the best way to do that is to store (either in the
>object or in the free-form text area) not the link, which would have
>to get renumbered but rather the identifier for the bug(s) that this

The renumbering is not a problem, renumbering is a rare operation since
a project's history is supposed to be stable.  And even if renumbering
is performed, it is a well understood operation of which the renumbering
of the origin links imposes a negligible overhead on top of the existing
renumbering overhead.

>commit fixes.  So for example, consider a convention where in the body
>of the free-form text area, before the Signed-off-by:, Acked-by:, and
>CC: headers for those projects that use them, we add something like
>the following:

>Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167
>or
>Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023

>Once you have this information, it is not difficult to maintain a
>berk_db database which maps a particular Bug identifier (i.e.,
>Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of
>commits.

This is nice, I admit, but it has the following downsides:
- It is nontrivial to automate this on execution of "git cherry-pick".
- In a distributed environment this requires a network-reachable bug
  database.
- A network-reachable bug database means that suddenly git needs network
  access for e.g. cherry-pick, revert, gitk, log --graph, blame.
- Network queries for commits containing references kind of kills
  performance.
- Some backports don't have entries in a bug database because they
  weren't bugs to begin with, in which case it becomes impossible to add
  an identifier to the commit message after the fact.
- It relies heavily on tools outside of git-core, which raises the
  threshold for using it.

>The advantage of this scheme is that if a bug has been fixed in
>multiple branches, you can see the association between two commits in
>two different branches very easily.  Furthermore, you get a link back
>to the actual bug in one or more bug tracking systems, which the some
>porcelain program could use to transform into a hot-link which when
>clicked opens up a browser window to the bug in question.

I'm not opposed to links like this, but I consider them a useful extra.
The link back is computationally of the same order of magnitude to find
all existing children of a certain commit; which is well understood and
within reach in most cases.

>In contrast, using your proposed origin scheme, if the bug was
>originally created in some development branch, and then cherry picked
>into two separate maintenance branches, if you don't have the
>development branch in your repository (maybe for some reason that
>development branch wasn't kept for some reason), the origin link in
>the two maintenance branches would point to a non-existent commit ID,
>and you wouldn't be able to estabish a linkage between them.  By using

Yes, you would.  You'd notice that either:
- One origin will point to the other commit (recommended practice,
  cherry-pick ripple-through, so to speak).
- Both origin links point to the same non-existent commit.

>In terms of your arguments about why free-form is bad, in another message:

>>- No strict definition of what it means.
>>- Diverging porcelain implementations making use of the field in ever so
>>  slightly changing ways over the years.

>This can be a problem regardless of where you store the information.

True.  The point is that specifying a definition for a origin
headerfield will narrow down how it is and can be used.  Free-form is
just that, free-form, and merely defines things by convention.

>Whether you store it in the free-form text or in the git object
>header, if you don't make sure it is well-defined, you're in trouble.

Free form can take the form of plaintext explanations detailing the
relationship in a foreign language (worst case example).

>>- You cannot rely on the field being always available.

>This is true regardless of where you store it; older versions of git
>won't store the git origin link, for example, unless you plan to break
>backwards compatibility with all existing git repositories, which
>would be a bad idea.  :-)

True.  What I was alluding to, is that if someone includes a
back/forwardport link in the free-form part of the commit message, then
you cannot predict how they'll do that.  In case of the origin link,
*if* it is used, it will always look the same.

>One nice thing of using text in free-form text fields is that anyone
>can enter it without needing a new version of git.  The downside is

Git is rather portable, I'd say, so anyone wanting to use the new
feature can be bothered to upgrade.

>that people could typo the header in some fashion.  But that can be
>dealt with in a newer version of the git porcelain validates the bug
>identifier and/or checks for obvious spelling mistakes and issues a
>warning ("Looks like you may have mispelled 'Adresses-Bug'; perhaps
>you should fix this via git commit --amend?").  

You mean you'd prefer some kind of AI solution to aid the user in
writing misspelling-free bug identifiers over a simple clean origin link
in the header of a commit message?

>In contrast, if you put it in the git object header, there is no
>possibility of using the field at all until you update to a version of
>git that supports it.  And some developer on your project is using an
>older version of git when they rebase or cherry-pick a commit, the
>origin header will be completely lost; but if it is stored in the
>free-form area, the information will be brought along for the ride for
>free.

Same as above:
If developers care about the backport information, they *can* be
bothered to upgrade git.  It's not rocketscience.

>>- Automated "renumbering" becomes difficult at best.

>This is actually one of the reasons why I don't like the origin link.
>If you use the origin link, it's *still* not obvious whether you
>should rewrite the commit ID or not.  For example, in some workflows,
>you have two branches pointing to the same commit before you do the
>rebase, where the rebase will only update the current branch pointer,
>but there is another branch still pointing at the original series of
>commits.  Worse yet, someone may have done a cherry-pick *before* the
>rebase.  Hence, the only thing you can do is keep *both* commit ID's.
>This means that over time, you can't get rid of any commit ID's when
>you do a rebase, which means the number of commit ID's in the origin
>link will always increase whenever you do a rebase or a cherry-pick.

The recommended practice here is quite simple:

- Origin links should only be created pointing to stable commits (i.e.
  commits which you'd be willing to publish or already have published).

- This implies that pointing an origin link at a commit in a strain that
  you still want to rebase is asking for trouble.  Doing this is akin to
  doing a merge between two branches and then you start rebasing 4
  commits *below* the mergepoint.  Don't do that.

- The only special case I'd allow is if you rebase a strain and the
  origin link points from one of the commits in the strain to be rebased
  back *into* the same strain being rebased (most likely a revert).
  Rebase can be bothered to renumber the origin link in this case.

And when you stick to those rules, the problem you're describing doesn't
happen.

>This is why for the use case where you are trying to figure out
>whether a bug exists in a particular branch, it is ***much*** better
>to rendevous using a bug identifier; it provides an extra layer of
>indirection which results in a much more stable identifer that is
>guaranteed to work.

Unless that commit already lies in the past, and you have no way to
actually add the bugid to the commit.

>(and I am not convinced that you do), the ***much*** better approach
>is to use the same approach as the bug tracking identifier, and add a
>level of indirection.  How would that work in practice?  Whenever you
>create a new commit, create a UUID which is assigned to the patch.

This only works if you know at time of commit that you want to backport
it at some later date.

>Now, whether you store this UUID in the free-form text area, or in the
>git object header, in the long run really doesn't matter.  You can
>just as easily have porcelein suppress a line in the free-form text
>area, as you can have the procelain print the UUID when it is stored
>in the object header.

True.  It's almost as much work.  Though it seems rather silly to start
suppressing lines in the free-form text area, if one can add a proper
headerfield.

>Yes, it means that you have to maintain a separate database so you can
>easily find the list of commits that contain a particular UUID, but I
>suspect you would need this in the case of the origin link concept
>anyway, since sooner or later some of the more useful uses of said
>link would require you to be able to find the commits which had origin
>links to the original commit, which means you would need to create and
>maintain this database anyway.

That isn't true.  Finding commits which have origin links to a certain
commit is just as hard as finding all children of a certain commit.
It's not exactly instant, but it is not a big problem, and depending on
the amount of repositorytraversal you already are doing, it might even
be a negligible amount of extra overhead.

>  And the maintenance of this database
>is purely optional; you only need it if you care about efficiently
>looking up UUID's, and given "time git log > /dev/null" on the kernel
>tree only takes six seconds on my laptop, and "git log > /dev/null"
>only takes 0.148 seconds for e2fsprogs, for many projects you might
>not even need the database to accelerate lookups via UUID.

The database needs to be available to anyone doing a clone of the
repository, which implies that:
- It needs to be network based.
- It needs controlled write access (which is a mess).
- It is slow during blame/gitk operations.
- It is rather nontrivial to get things setup such that someone (after
  cloning the repository) is able to run cherry-pick/gitk/blame/revert
  and have those commands use the database transparently.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10  9:35       ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini
  2008-09-10 10:44         ` Petr Baudis
@ 2008-09-10 14:33         ` Dmitry Potapov
  2008-09-10 15:15           ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Dmitry Potapov @ 2008-09-10 14:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Petr Baudis, Stephen R. van den Berg, git

On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote:
> 
> Junio explained that the way achieves the same effect in git is by
> forking the topic branch off the "oldest" branch where the patch will
> possibly be of interest.  Then he can merge it in that branch and all
> the newest ones.  That's great, but not all people are as
> forward-looking (he did say that sometimes he needs to cherrypick).

Those who base their work on the newest ones must be very forward-
looking :) but, seriously, cherry-picking is *not* a normal workflow
with Git. Git is optimized for easy merging while cherry-picking is
a rare operation reserved for correcting some past mistakes.

> 
> Another problem is that in some projects actually there are two "maint"
> branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do
> not care about what goes in the older "maint" branch; they develop for
> trunk and for the newer "maint" branch, and then one person comes and
> cherry-picks into the older "maint" branch.  This has two problems:
> 
> 1) Having to fork topic branches off the older branch would force extra
> testing on the developers.

If a branch is meant to included in the oldest version, it must be
tested with that version anyway, and it is better when it is written for
the old version, because functions tend to be more backward compatible
than forward compatible. In other words, functions may often acquire
some extra functionality over time without changing their signature, so
the code written for a new version will merge without any conflict to
the old one, but it won't work correctly under some conditions. It is
certainly possible to have a problem in the opposite direction, but it
is much less likely, and usually bugs introduced in the development
version are not as bad as destabilizing a stable branch. Thus starting
branch that is clearly meant for inclusion to the old version from that
version is the right thing do.

Of course, if you have more than one stable branch for a long time then
you may want some branches forked from the new stable. You can do that
by merging uninteresting changes from the new stable with the 'ours'
strategy (so they will be ignored), and after that merging actually
interesting features from the new stable.

In contrast to cherry-picking, the real merge creates the history that
can be easily visualized and understood.

> 
> 2) Besides this, topic branches are not cloned, so if I am the
> integrator on the older "maint" branch, I need to dig manually in the
> commits to find bugfixes.  True, I could use Bugzilla, but what if I
> want to use git instead?  There is "git cherry -v ... | grep -w ^+.*PR",
> except that it has too many false negatives (fixes that have already
> been backported, but do show up in the list).

If you clearly mark all bugs in the commit message, there will be no
problem to find them by grepping log. There is a lot of potentially
useful information, and the 'origin' link is just one of many. It may
be okay to do some general mechanism for custom commit attributes (if
it's really necessary), but making a hack for one specific item of
information feels very wrong. In fact, I have not convinced at all
that the free-form text is not suitable to store this information.

Dmitry

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 14:16       ` Stephen R. van den Berg
@ 2008-09-10 15:10         ` Jeff King
  2008-09-10 21:50           ` Stephen R. van den Berg
  2008-09-10 16:18         ` Theodore Tso
  2008-09-10 16:40         ` Petr Baudis
  2 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-10 15:10 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git

On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote:

> >Once you have this information, it is not difficult to maintain a
> >berk_db database which maps a particular Bug identifier (i.e.,
> >Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of
> >commits.
> 
> This is nice, I admit, but it has the following downsides:
> - It is nontrivial to automate this on execution of "git cherry-pick".

Maybe a cherry-picking hook?

> - In a distributed environment this requires a network-reachable bug
>   database.

Use a distributed bug tracking system (DBTS).

> - A network-reachable bug database means that suddenly git needs network
>   access for e.g. cherry-pick, revert, gitk, log --graph, blame.

Use a DBTS.

> - Network queries for commits containing references kind of kills
>   performance.

Use a DBTS.

> - Some backports don't have entries in a bug database because they
>   weren't bugs to begin with, in which case it becomes impossible to add
>   an identifier to the commit message after the fact.

Use a DBTS, since then you can generally make up a new UUID on the spot.

> - It relies heavily on tools outside of git-core, which raises the
>   threshold for using it.

True.

But maybe Ted is on to something here. Rather than adding the
information to the commit object itself, why not maintain a separate
mapping, but keep it _within git_. That is how most of the DBTS's work
that I have seen. Maybe it is possible to implement some subset of the
features in a tool that could become part of core git.

There was a proposal at some point for a "notes" feature which would
allow after-the-fact annotation of commits. I don't recall the exact
details, but I think it stored its information as a git tree of blobs.
You could choose whether or not to transfer the notes based on
transferring a ref pointing to the notes tree.

I'm not sure how applicable this is to your problem, but if you want to
investigate you can find discussion in the list archive under the name
"notes".

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10 14:33         ` Dmitry Potapov
@ 2008-09-10 15:15           ` Stephen R. van den Berg
  2008-09-10 15:24             ` Paolo Bonzini
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 15:15 UTC (permalink / raw)
  To: Dmitry Potapov; +Cc: Paolo Bonzini, Petr Baudis, git

Dmitry Potapov wrote:
>On Wed, Sep 10, 2008 at 11:35:18AM +0200, Paolo Bonzini wrote:
>> Another problem is that in some projects actually there are two "maint"
>> branches (e.g. currently GCC 4.2 and GCC 4.3), and most developers do
>> not care about what goes in the older "maint" branch; they develop for
>> trunk and for the newer "maint" branch, and then one person comes and
>> cherry-picks into the older "maint" branch.  This has two problems:

>> 1) Having to fork topic branches off the older branch would force extra
>> testing on the developers.

>If a branch is meant to included in the oldest version, it must be
>tested with that version anyway, and it is better when it is written for
>the old version, because functions tend to be more backward compatible
>than forward compatible. In other words, functions may often acquire
>some extra functionality over time without changing their signature, so
>the code written for a new version will merge without any conflict to
>the old one, but it won't work correctly under some conditions. It is
>certainly possible to have a problem in the opposite direction, but it
>is much less likely, and usually bugs introduced in the development
>version are not as bad as destabilizing a stable branch. Thus starting
>branch that is clearly meant for inclusion to the old version from that
>version is the right thing do.

>Of course, if you have more than one stable branch for a long time then
>you may want some branches forked from the new stable. You can do that
>by merging uninteresting changes from the new stable with the 'ours'
>strategy (so they will be ignored), and after that merging actually
>interesting features from the new stable.

>In contrast to cherry-picking, the real merge creates the history that
>can be easily visualized and understood.

Could you explain how the above mechanisms work based on the following
cherry-pick action:

A -- B -- C -- D -- L
      \            /
       E -- F -- G -- H -- K

D is the stable branch.
K is the development branch.
G is cherry-picked and applied to D producing L.
The origin link of L would have contained (G, F).

How would such a workflow be implemented using the temporary branches
you describe?

>If you clearly mark all bugs in the commit message, there will be no
>problem to find them by grepping log. There is a lot of potentially

Sometimes they're not bugs, yet they still are backported and thus carry
no special marks.

>useful information, and the 'origin' link is just one of many. It may

True, but it's one of the few machine-useable ones.

>be okay to do some general mechanism for custom commit attributes (if
>it's really necessary),

That's the problem, a general mechanism is undesirable, that we already
have the free-form textfield for.

> but making a hack for one specific item of
>information feels very wrong.

It's a rather well-defined usefull property (which precludes it from
being a hack, I suppose).
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata
  2008-09-10 15:15           ` Stephen R. van den Berg
@ 2008-09-10 15:24             ` Paolo Bonzini
  0 siblings, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10 15:24 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Dmitry Potapov, git

> > > fork topic branches off the older branch [and merge them]
>
> Could you explain how the above mechanisms work based on the following
> cherry-pick action:
> 
> A -- B -- C -- D -- L
>       \            /
>        E -- F -- G -- H -- K
> 
> D is the stable branch.
> K is the development branch.
> G is cherry-picked and applied to D producing L.
> The origin link of L would have contained (G, F).
> 
> How would such a workflow be implemented using the temporary branches
> you describe?

You don't.  You do everything in topic branches based off the stable
branch, and you merge them.  That's the other way round, compared to
what you (and I) are used to.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  5:42           ` Stephen R. van den Berg
@ 2008-09-10 15:30             ` Linus Torvalds
  2008-09-10 23:09               ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2008-09-10 15:30 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Wed, 10 Sep 2008, Stephen R. van den Berg wrote:
> 
> But then how would someone who clones the repository get at the information?

You just said it wouldn't get there with fetches.

If clone acts differently from a "full" fetch, something is really really 
wrong.

> The information is essential to understand backports between the various
> stable branches.

No it's not. You can mention the backport explicitly in the commit 
message, and then you get hyperlinks in the graphical viewers. That works 
when people _want_ it to work, instead of in some hidden automatic manner 
that does entirely the wrong thing in all the common cases.

What more do you want?

		Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10  8:30           ` Paolo Bonzini
@ 2008-09-10 15:32             ` Linus Torvalds
  2008-09-10 15:37               ` Paolo Bonzini
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2008-09-10 15:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git



On Wed, 10 Sep 2008, Paolo Bonzini wrote:
> 
> Not really local data.  More like _weakly referenced_ data.  If it is
> there, cool.  If it is not there, no big deal.

You think it's "cool".

I think it is "unreliable, random, and depends on the phase of the moon".

My definition of "cool" is a totally different thing. What you describe is 
the very anti-thesis of cool.

If you want unreliable and random, use CVS. Please.

		Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:32             ` Linus Torvalds
@ 2008-09-10 15:37               ` Paolo Bonzini
  2008-09-10 15:43                 ` Linus Torvalds
  0 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10 15:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git

Linus Torvalds wrote:
> 
> On Wed, 10 Sep 2008, Paolo Bonzini wrote:
>> Not really local data.  More like _weakly referenced_ data.  If it is
>> there, cool.  If it is not there, no big deal.
> 
> You think it's "cool".
> 
> I think it is "unreliable, random, and depends on the phase of the moon".

I think that shallow clones are not any different from this.  If the
required piece of history is there, cool.  If they're not there, no big
deal.

I understood the hyperbole, but I think that it's not unreliable,
because all it relies on is the uniqueness of SHA1 values.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:37               ` Paolo Bonzini
@ 2008-09-10 15:43                 ` Linus Torvalds
  2008-09-10 15:46                   ` Linus Torvalds
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2008-09-10 15:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Wed, 10 Sep 2008, Paolo Bonzini wrote:
> 
> I think that shallow clones are not any different from this.  If the
> required piece of history is there, cool.  If they're not there, no big
> deal.

Sure. I don't use them either. But because I don't use them, it doesn't 
affect me. It also doesn't change the core git data structures in any way 
to introduce any new problems.

Also, if there isn't a required piece of history, things generally break 
very loudly. IOW, there are only certain things you can do with a shallow 
repo. In general it's absolutely _not_ a "no big deal" issue, quite the 
reverse - it's a deal-breaker.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:43                 ` Linus Torvalds
@ 2008-09-10 15:46                   ` Linus Torvalds
  2008-09-10 15:57                     ` Paolo Bonzini
  2008-09-10 16:23                     ` Jakub Narebski
  0 siblings, 2 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-10 15:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Wed, 10 Sep 2008, Linus Torvalds wrote:
>
> Sure. I don't use them either. But because I don't use them, it doesn't 
> affect me. It also doesn't change the core git data structures in any way 
> to introduce any new problems.

Btw, so far nobody has even _explained_ what the advantage of the origin 
link is. It apparently has no effect for most things, and for other things 
it has some (unspecified) effect when it can be resolved.

Apart from the "dotted line" in graphical history viewers, I haven't 
actually heard any single concrete example of exactly what it would *do*.

And that dotted line really does sound like something you could do with 
just the existing "hyperlink" functionality in the commit message.

		Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:46                   ` Linus Torvalds
@ 2008-09-10 15:57                     ` Paolo Bonzini
  2008-09-10 23:15                       ` Stephen R. van den Berg
  2008-09-10 16:23                     ` Jakub Narebski
  1 sibling, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10 15:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git


> Btw, so far nobody has even _explained_ what the advantage of the origin 
> link is. It apparently has no effect for most things, and for other things 
> it has some (unspecified) effect when it can be resolved.
> 
> Apart from the "dotted line" in graphical history viewers, I haven't 
> actually heard any single concrete example of exactly what it would *do*.

I mentioned git-cherry as an additional use case.  Automatic rename
detection works because it might have the occasional false negative, but
it has practically no false positive, and those are what screws up
merges.  But automatic changeset detection a la git-patch-id has too
many false negatives to make the current implementation of git-cherry
practical, and here's when the origin link comes in.  Also, automatic
changeset detection does not work with reverts, only with cherry-picks.

Blame could also use the origin link to go backwards in the history and
find the origin of the code, without being fooled by reverts.

I'll quote another message I sent in the thread:

>> And why are the notes created by git cherry-pick -x insufficient for that?
> 
> For example, these notes (or the ones created by "git revert") are
> *wrong* because they talk about commits instead of changesets (deltas
> between two commits).
> 
> Why is only one commit present?  Because these messages are meant for
> users, not for programs.  That's easy to see: users think of commits as
> deltas anyway, even though git stores them as snapshots---"git show
> HEAD" shows a delta, not a snapshot.
> 
> And what does this mean for programs?  That they must resort to
> commit-message scraping to distinguish the two cases. (*)
> 
>    (*) A GUI blame program, for example, would need to distinguish
>    whether code added by a commit is taken from commit 4329bd8, or is
>    reverting commit 4329bd8.  (In the first case, the author of that
>    code is whoever was responsible for that code in 4329bd8; in the
>    second case, it is whoever was responsible for that code in
>    4329bd8^).  If recording changesets, you see 4329bd8^..4329bd8 in
>    the first case, and 4329bd8..4329bd8^ in the second, so it is trivial
>    to follow the chain.
> 
> And scraping is bad.  Imagine people that are writing commit messages in
> their native language.  What if they patch git to translate the magic
> notes created by "git cherry-pick -x" or "git revert" (maybe a future
> version of git will do that automatically)?  Should they translate also
> every program that scrapes the messages?
> 
> 
> Whenever there is a piece of data that could be useful to programs (no
> matter if plumbing or porcelain), I consider free form notes to be bad.
> Because data is data, and metadata is metadata.
> 
> If there was a generic way to put porcelain-level metadata in commit
> messages (e.g. Signed-Off-By and Acknowledged-By can be already
> considered metadata), I would not be so much in favor of "origin" links
> being part of the commit object's format.  Now if you think about it,
> commit references within this kind of metadata would have mostly the
> properties that Stephen explained in his first message:
> 
> 1) they would be rewritten by git-filter-branch
> 
> 2) these references, albeit weak by default

(Note to Linus: reinforcement of your disagreement will be implicitly
assumed :-)

> could optionally be
> followed when fetching (either with command-line or configuration options)
> 
> 3) they would not be pruned by git-gc, unlike notes
> 
> 4) possibly, git rev-list --topo-order would sort commits by taking into
> account metadata references too.
> 
> So the implementation effort would be roughly the same.
> 
> But, can you think of any other such metadata?  Personally I can't, so
> while I understand the opposition to a new commit header field that
> would be there from here to eternity (or until the LHC starts), I do
> think it is the simplest thing that can possibly work.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 14:16       ` Stephen R. van den Berg
  2008-09-10 15:10         ` Jeff King
@ 2008-09-10 16:18         ` Theodore Tso
  2008-09-10 16:40         ` Petr Baudis
  2 siblings, 0 replies; 137+ messages in thread
From: Theodore Tso @ 2008-09-10 16:18 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Petr Baudis, git

On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote:
> The renumbering is not a problem, renumbering is a rare operation since
> a project's history is supposed to be stable.  And even if renumbering
> is performed, it is a well understood operation of which the renumbering
> of the origin links imposes a negligible overhead on top of the existing
> renumbering overhead.

Well *you* were the one using this as an argument for using the origin
link.  But I'll note that in some workflows, rebasing happens all the
time when a patch is being developed and moved around.  Sometimes
patches are created in git, exported as a patch, and then it re-enters
git again later (which is another reason why using an external UUID or
bug tracking identifier is a good thing).

> >Addresses-Bug: Red_Hat/149480, Sourceforge_Feature/120167
> >or
> >Addresses-Bug: Debian/432865, Launchpad/203323, Sourceforge_Bug/1926023
> 
> >Once you have this information, it is not difficult to maintain a
> >berk_db database which maps a particular Bug identifier (i.e.,
> >Red_Hat/149480, or Debian/471977, or Launchpad/203323) to a series of
> >commits.
> 
> This is nice, I admit, but it has the following downsides:
> - It is nontrivial to automate this on execution of "git cherry-pick".

It's trivial if it's in the free-form text.  In fact, it happens
automatically.  If it's stored within the git commit object, then it
will be done in the C code (if you've updated to the latest git;
again, one of the advantages of doing it in free-form text).

> - In a distributed environment this requires a network-reachable bug
>   database.
> - A network-reachable bug database means that suddenly git needs network
>   access for e.g. cherry-pick, revert, gitk, log --graph, blame.
> - Network queries for commits containing references kind of kills
>   performance.

No, because you don't need to look up the bug identifier unless you
want to, you know, actually look at the bug.  Otherwise, we are just
using something like "debian/432865" as an identifier; you only need
to look them up if you want to look up the bug.  Any time you have a
collaborative development environment, you will need either a
centralized, network accessible bug tracking system, or use a
distributed bug tracking system.  Either way, though, if it's just
matter of seeing whether or not a bug fix such as debian/432865 is
fixed by some commit in some branch, using the bug identifier actually
makes this *easier*, not harder.

> - Some backports don't have entries in a bug database because they
>   weren't bugs to begin with, in which case it becomes impossible to add
>   an identifier to the commit message after the fact.

This is true.  The transition is a little easier if you are pointing
to a pre-existing commit, whereas if you need some kind of rendevous
identifer (whether it is a bug ID or some UUID).  On the other hand,
you've cherry-picked some bug fix using a git that didn't support the
origin link, you'd also be screwed, so 

> - It relies heavily on tools outside of git-core, which raises the
>   threshold for using it.

Well, it relies on changes to git --- just like the origin link
requires changes to git.  If the it is implemented using free-form
text, which is a great way to prototype it, you have the *option* of
implementing it via either git porcelain changes or outside tools like
emacs or vi macros (just as most of us who are kernel developers have
editor macros that insert Signed-off-by: into git commit messages, as
well as changes in git porcelain such that "git am -s" automatically
adds the Signed-off-by header).  But given the wildly successful use
of Signed-off-by in the kernel sources, this objection seems not very
credible, to say the least.

> The recommended practice here is quite simple:
> 
> - Origin links should only be created pointing to stable commits (i.e.
>   commits which you'd be willing to publish or already have published).
> 
> - This implies that pointing an origin link at a commit in a strain that
>   you still want to rebase is asking for trouble.  Doing this is akin to
>   doing a merge between two branches and then you start rebasing 4
>   commits *below* the mergepoint.  Don't do that.

Right.  And if we use a UUID to identify commits, then we don't have
to have these restrictions.

> - The only special case I'd allow is if you rebase a strain and the
>   origin link points from one of the commits in the strain to be rebased
>   back *into* the same strain being rebased (most likely a revert).
>   Rebase can be bothered to renumber the origin link in this case.

Nope, because you might have a branch to the original origin link, and
some body else may have already done a cherry-pick to the original
origin commit.  You've hand-waved around the problem by saying, "don't
do that", but it just points out how **fragile** the origin link
scheme really is.  It's just not robust.

In contrast, generating a UUID per commit is much more robust, since
you can now export it out of git in a patch, and then re-import it
later, and have the right thing happen.

> >(and I am not convinced that you do), the ***much*** better approach
> >is to use the same approach as the bug tracking identifier, and add a
> >level of indirection.  How would that work in practice?  Whenever you
> >create a new commit, create a UUID which is assigned to the patch.
> 
> This only works if you know at time of commit that you want to backport
> it at some later date.

I'm suggesting that all commits (once you upgrade to a version of git
that supports this --- and you've already handwaved away the question
on whether you can get all of developers for a project to upgrade to
the latest git, remember) would have a UUID generated.  That UUID
could be stored internal to git, or (perhaps as an initial prototyping
as a proof of concept, before we add something into the git commit
record **forever**) could be in the free-form text.

> >Yes, it means that you have to maintain a separate database so you can
> >easily find the list of commits that contain a particular UUID, but I
> >suspect you would need this in the case of the origin link concept
> >anyway, since sooner or later some of the more useful uses of said
> >link would require you to be able to find the commits which had origin
> >links to the original commit, which means you would need to create and
> >maintain this database anyway.
> 
> That isn't true.  Finding commits which have origin links to a certain
> commit is just as hard as finding all children of a certain commit.
> It's not exactly instant, but it is not a big problem, and depending on
> the amount of repositorytraversal you already are doing, it might even
> be a negligible amount of extra overhead.

My point is you'll need this separate database anyway, in order to
deal with the cases where you have two commits that point to the same
(non-existent) origin link, one in maint1, and one in maint2, and
given the commit in the maint1 branch, you want to see if there is
related comit in the maint2 branch you'll need this database anyway
(or you do a brute force search of the repository, which isn't too bad
for modest datbases).  It's identical in both cases --- but having a
UUID field in the commit is much *cleaner*, since it merely states
that these two commits introduce the same semantic change; it doesn't
imply some kind of parent/child relationship which an origin link
implies.

> The database needs to be available to anyone doing a clone of the
> repository, which implies that:
> - It needs to be network based.
> - It needs controlled write access (which is a mess).
> - It is slow during blame/gitk operations.
> - It is rather nontrivial to get things setup such that someone (after
>   cloning the repository) is able to run cherry-pick/gitk/blame/revert
>   and have those commands use the database transparently.

No it doesn't, since the database can be inferred from the objects in
the repository.  So you can generate it locally if you need it, merely
as an optimization.  The same is true for the origin link proposal, as
I've said.

    		    	    	     - Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:46                   ` Linus Torvalds
  2008-09-10 15:57                     ` Paolo Bonzini
@ 2008-09-10 16:23                     ` Jakub Narebski
  2008-09-11 23:28                       ` Sam Vilain
  1 sibling, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-10 16:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paolo Bonzini, Stephen R. van den Berg, git

Linus Torvalds wrote:
> On Wed, 10 Sep 2008, Linus Torvalds wrote:
> >
> > Sure. I don't use them either. But because I don't use them, it doesn't 
> > affect me. It also doesn't change the core git data structures in any way 
> > to introduce any new problems.
> 
> Btw, so far nobody has even _explained_ what the advantage of the origin 
> link is. It apparently has no effect for most things, and for other things 
> it has some (unspecified) effect when it can be resolved.
> 
> Apart from the "dotted line" in graphical history viewers, I haven't 
> actually heard any single concrete example of exactly what it would *do*.
> 
> And that dotted line really does sound like something you could do with 
> just the existing "hyperlink" functionality in the commit message.

As far as I understand (note: I'm neither for, nor against the proposal;
although I think it has thin chance to be accepted, especially soon),
it is for graphical history viewers, for git-cherry to make it more
precise (to detect duplicated/cherry-picked changes better), and in
the future possibly to help history-aware merge strategies. And probably
help patch management interfaces.

On the theoretical front it looks like extension/generalization of
a parent link, marking given commit do be derivative not only some
set of trees, or some line of history, but also on some changeset.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 14:16       ` Stephen R. van den Berg
  2008-09-10 15:10         ` Jeff King
  2008-09-10 16:18         ` Theodore Tso
@ 2008-09-10 16:40         ` Petr Baudis
  2008-09-10 17:58           ` Paolo Bonzini
  2 siblings, 1 reply; 137+ messages in thread
From: Petr Baudis @ 2008-09-10 16:40 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, git

On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote:
> Theodore Tso wrote:
> >  And the maintenance of this database
> >is purely optional; you only need it if you care about efficiently
> >looking up UUID's, and given "time git log > /dev/null" on the kernel
> >tree only takes six seconds on my laptop, and "git log > /dev/null"
> >only takes 0.148 seconds for e2fsprogs, for many projects you might
> >not even need the database to accelerate lookups via UUID.
> 
> The database needs to be available to anyone doing a clone of the
> repository, which implies that:
> - It needs to be network based.
> - It needs controlled write access (which is a mess).
> - It is slow during blame/gitk operations.
> - It is rather nontrivial to get things setup such that someone (after
>   cloning the repository) is able to run cherry-pick/gitk/blame/revert
>   and have those commands use the database transparently.

The database can just live in a special branch, with trees organized the
same way the object database is, possibly in a more optimized way
(having the HEAD trees cached around inside Git, etc.).  This should be
no rocked science if the design is given a little thought, and should be
fairly fast afterwards.

I'm not endorsing assigning UUIDs to commits now at all (but I don't
have time to formulate a comprehensive argument against that either).

However, having a commit -> nonessential_volatile_metadata database
would be useful for many other things as well! For example amending
commit messages later, maintaining general linkage between related
commits, tracking explicit rename hints for Git (like the Samba guys
would appreciate right now, and me many times in the past - note that
this is NOT the same as directly tricking renames within Git history)
or caching expensive computations with mostly static results (like the
rename detection or maybe pickaxe indexes - that could be quite large,
so we might want to actually separate different kinds of data to
separate branches).

-- 
				Petr "Pasky" Baudis
The next generation of interesting software will be done
on the Macintosh, not the IBM PC.  -- Bill Gates

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 16:40         ` Petr Baudis
@ 2008-09-10 17:58           ` Paolo Bonzini
  2008-09-10 22:44             ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-10 17:58 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Stephen R. van den Berg, Theodore Tso, git


> I'm not endorsing assigning UUIDs to commits now at all (but I don't
> have time to formulate a comprehensive argument against that either).
> 
> However, having a commit -> nonessential_volatile_metadata database
> would be useful for many other things as well!

100 points to Petr. :-)

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
                   ` (3 preceding siblings ...)
  2008-09-09 21:13 ` Petr Baudis
@ 2008-09-10 20:32 ` Miklos Vajna
  2008-09-10 20:55   ` Nicolas Pitre
  4 siblings, 1 reply; 137+ messages in thread
From: Miklos Vajna @ 2008-09-10 20:32 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

On Tue, Sep 09, 2008 at 03:22:12PM +0200, "Stephen R. van den Berg" <srb@cuci.nl> wrote:
> origin a1184d85e8752658f02746982822f43f32316803 2
> author Junio C Hamano <gitster@pobox.com> 1220132115 -0700
> committer Junio C Hamano <gitster@pobox.com> 1220153445 -0700

First, sorry for joining the thread lately, as far as I see the idea I
want to shere here was not mentioned by anybody yet.

So, git revert already includes the "origin" of the commit in the commit
message, and I think that is fine for most people.

What about adding an option to cherry-pick to add a similar
"commit 7b27718bdb1b70166383dec91391df5534d449ee upstream" or similar
string to the commit message?

As far as I see the kernel -stable tree already have this, but it is
added manually and in many different forms, like:

[ Upstream commit 5f3a9a207f1fccde476dd31b4c63ead2967d934f ]

commit 7b27718bdb1b70166383dec91391df5534d449ee upstream

Already in Linus' tree:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b25b791b13aaa336b56c4f9bd417ff126363f80b

etc.

Once git would provide a standard way to do this, that could be used to
avoid this.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 20:32 ` [RFC] origin link for cherry-pick and revert Miklos Vajna
@ 2008-09-10 20:55   ` Nicolas Pitre
  2008-09-10 21:06     ` Miklos Vajna
  0 siblings, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-10 20:55 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Stephen R. van den Berg, git

On Wed, 10 Sep 2008, Miklos Vajna wrote:

> So, git revert already includes the "origin" of the commit in the commit
> message, and I think that is fine for most people.
> 
> What about adding an option to cherry-pick to add a similar
> "commit 7b27718bdb1b70166383dec91391df5534d449ee upstream" or similar
> string to the commit message?

It's already there: -x

Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 20:55   ` Nicolas Pitre
@ 2008-09-10 21:06     ` Miklos Vajna
  0 siblings, 0 replies; 137+ messages in thread
From: Miklos Vajna @ 2008-09-10 21:06 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Stephen R. van den Berg, git

[-- Attachment #1: Type: text/plain, Size: 185 bytes --]

On Wed, Sep 10, 2008 at 04:55:19PM -0400, Nicolas Pitre <nico@cam.org> wrote:
> It's already there: -x

Thanks, and sorry for not reading carefully the manpage before sending
the mail.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:10         ` Jeff King
@ 2008-09-10 21:50           ` Stephen R. van den Berg
  2008-09-10 21:54             ` Jeff King
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git

Jeff King wrote:
>On Wed, Sep 10, 2008 at 04:16:30PM +0200, Stephen R. van den Berg wrote:
>> This is nice, I admit, but it has the following downsides:
>> - It is nontrivial to automate this on execution of "git cherry-pick".

>Maybe a cherry-picking hook?

Yes, that works, but it is non-trivial, especially since it needs to
work for gitk, log --graph, blame and revert as well.

>> - In a distributed environment this requires a network-reachable bug
>>   database.

>Use a distributed bug tracking system (DBTS).

If it were part of git-core, that would work.

>But maybe Ted is on to something here. Rather than adding the
>information to the commit object itself, why not maintain a separate
>mapping, but keep it _within git_. That is how most of the DBTS's work
>that I have seen. Maybe it is possible to implement some subset of the
>features in a tool that could become part of core git.

Interesting thought.

>There was a proposal at some point for a "notes" feature which would
>allow after-the-fact annotation of commits. I don't recall the exact
>details, but I think it stored its information as a git tree of blobs.
>You could choose whether or not to transfer the notes based on
>transferring a ref pointing to the notes tree.

The idea is nice, but if we were to use it to store the origin link
information, the following happens:
- Origin link information is rare.
- Yet during a log/gitk/blame run the information might need to
  be queried for at every commit.
- Since in most cases the origin information does not exist, this
  will cause misses to fill the dentry cache for directory lookups, and
  thus killing performance.
- In order to make this efficient, a different database lookup system is
  needed that is fast for misses.

Whereas if the information is part of the commit, it costs nothing in
the typical case (no origin information present).
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 21:50           ` Stephen R. van den Berg
@ 2008-09-10 21:54             ` Jeff King
  2008-09-10 22:34               ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-10 21:54 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git

On Wed, Sep 10, 2008 at 11:50:45PM +0200, Stephen R. van den Berg wrote:

> >There was a proposal at some point for a "notes" feature which would
> >allow after-the-fact annotation of commits. I don't recall the exact
> >details, but I think it stored its information as a git tree of blobs.
> >You could choose whether or not to transfer the notes based on
> >transferring a ref pointing to the notes tree.
> 
> The idea is nice, but if we were to use it to store the origin link
> information, the following happens:
> - Origin link information is rare.
> - Yet during a log/gitk/blame run the information might need to
>   be queried for at every commit.
> - Since in most cases the origin information does not exist, this
>   will cause misses to fill the dentry cache for directory lookups, and
>   thus killing performance.
> - In order to make this efficient, a different database lookup system is
>   needed that is fast for misses.

I think you are misunderstanding what I meant by "git tree" here. It is
literally a git tree object, so you don't ask the filesystem at all. You
are looking up within the single object file. If it's a miss, you know
after seeing that object. If not, then you dereference the blob object
that contains the notes.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 21:54             ` Jeff King
@ 2008-09-10 22:34               ` Stephen R. van den Berg
  2008-09-10 22:55                 ` Jeff King
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 22:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git

Jeff King wrote:
>On Wed, Sep 10, 2008 at 11:50:45PM +0200, Stephen R. van den Berg wrote:
>> >There was a proposal at some point for a "notes" feature which would
>> >allow after-the-fact annotation of commits. I don't recall the exact
>> >details, but I think it stored its information as a git tree of blobs.
>> >You could choose whether or not to transfer the notes based on
>> >transferring a ref pointing to the notes tree.

>> The idea is nice, but if we were to use it to store the origin link
>> information, the following happens:
>> - Origin link information is rare.

>I think you are misunderstanding what I meant by "git tree" here. It is
>literally a git tree object, so you don't ask the filesystem at all. You
>are looking up within the single object file. If it's a miss, you know
>after seeing that object. If not, then you dereference the blob object
>that contains the notes.

I see.  Indeed.  That's a lot better.
Did the binary search inside tree objects ever get implemented?

It is unclear why the latest commit notes proposal didn't make it,
though I admit that storing the origin link information in there seems
feasible.

The downsides when doing that are:
- The lookup cost is small, but still noticable, since it is sometimes
  done on every commit; using the in-commit origin headerfield solves
  this at negligible cost.
- The origin information is no longer cryptographically protected (under
  certain circumstances this could be considered an advantage and a
  disadvantage at the same time).
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 17:58           ` Paolo Bonzini
@ 2008-09-10 22:44             ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 22:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Petr Baudis, Theodore Tso, git

Petr wrote:
>> However, having a commit -> nonessential_volatile_metadata database
>> would be useful for many other things as well!

Which brings us back to the "commit notes" proposal.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:34               ` Stephen R. van den Berg
@ 2008-09-10 22:55                 ` Jeff King
  2008-09-10 23:19                   ` Stephen R. van den Berg
  2008-09-11  2:46                   ` Nicolas Pitre
  0 siblings, 2 replies; 137+ messages in thread
From: Jeff King @ 2008-09-10 22:55 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Petr Baudis, git

On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:

> I see.  Indeed.  That's a lot better.
> Did the binary search inside tree objects ever get implemented?

I believe it's still linear (and skimming tree-walk.c:find_tree_entry
seems to confirm). However, one advantage of such an approach is that it
will improve as tree lookup improves (e.g., I believe the pack v4 work
included improvements in this area).

> The downsides when doing that are:
> - The lookup cost is small, but still noticable, since it is sometimes
>   done on every commit; using the in-commit origin headerfield solves
>   this at negligible cost.
> - The origin information is no longer cryptographically protected (under
>   certain circumstances this could be considered an advantage and a
>   disadvantage at the same time).

Yes, those are inherent in the scheme, as is the upside that one can
make and distribute such annotations separately from commit creation.

I haven't thought enough about it to decide whether there is a scenario
where making such a "cherry-picked from" annotation might make use of
that property.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:30             ` Linus Torvalds
@ 2008-09-10 23:09               ` Stephen R. van den Berg
  2008-09-11  0:39                 ` Linus Torvalds
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 23:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Linus Torvalds wrote:
>On Wed, 10 Sep 2008, Stephen R. van den Berg wrote:
>> But then how would someone who clones the repository get at the information?

>You just said it wouldn't get there with fetches.

True and still valid.

>If clone acts differently from a "full" fetch, something is really really 
>wrong.

It does not act differently.

Let me elaborate:

- The origin field is part of the commit (and only present if
  *consciously* added by the committer), and therefore is transmitted
  along with the rest of a commit upon a fetch.

- The commits being referred to by the origin field are *not*
  transmitted upon a fetch.

- Given a repository with 4 long lived published branches called A, B, C and D
  and a backport from development branch D cherry-picked -o into branch A
  which creates an origin field pointing back to (D^,D^^)

- Now you fetch just branch A from this repository.  This will not cause
  branch D to be pulled in as well.

- However, if you explicitly pull D, the origin information from A to D can
  be used.  People doing a generic clone get all four branches, and
  therefore have all the important commits which normally could contain
  origin links.  Note that even during a clone, commits pointed to by
  origin links are not being transmitted (unless there already are other
  reasons to send them along).

>> The information is essential to understand backports between the various
>> stable branches.

>No it's not. You can mention the backport explicitly in the commit 
>message, and then you get hyperlinks in the graphical viewers. That works 
>when people _want_ it to work, instead of in some hidden automatic manner 
>that does entirely the wrong thing in all the common cases.

Could you spell out one of the common cases where it would do entirely
the wrong thing?
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 15:57                     ` Paolo Bonzini
@ 2008-09-10 23:15                       ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 23:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Linus Torvalds, Jakub Narebski, git

Paolo Bonzini wrote:
>> Btw, so far nobody has even _explained_ what the advantage of the origin 
>> link is. It apparently has no effect for most things, and for other things 
>> it has some (unspecified) effect when it can be resolved.

>> Apart from the "dotted line" in graphical history viewers, I haven't 
>> actually heard any single concrete example of exactly what it would *do*.

It allows one to follow and view the evolvement of a patch over time during
the various backports.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:55                 ` Jeff King
@ 2008-09-10 23:19                   ` Stephen R. van den Berg
  2008-09-11  5:16                     ` Paolo Bonzini
  2008-09-11  2:46                   ` Nicolas Pitre
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-10 23:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Petr Baudis, git

Jeff King wrote:
>On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:
>> - The origin information is no longer cryptographically protected (under
>>   certain circumstances this could be considered an advantage and a
>>   disadvantage at the same time).

>I haven't thought enough about it to decide whether there is a scenario
>where making such a "cherry-picked from" annotation might make use of
>that property.

Being able to subvert the authenticity of git blame by providing fake
origin information is not very appealing.
-- 
Sincerely,
           Stephen R. van den Berg.

"Am I paying for this abuse or is it extra?"

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 23:09               ` Stephen R. van den Berg
@ 2008-09-11  0:39                 ` Linus Torvalds
  2008-09-11  6:22                   ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11  0:39 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> 
> - However, if you explicitly pull D, the origin information from A to D can
>   be used.  People doing a generic clone get all four branches, and
>   therefore have all the important commits which normally could contain
>   origin links.  Note that even during a clone, commits pointed to by
>   origin links are not being transmitted (unless there already are other
>   reasons to send them along).

IOW, it's not actually transferring them and saving them, since a simple 
delete of the origin branch will basically make them unreachable.

Fine. At least it works the same way as fetch, then. But it's still a huge 
mistake, because it really does mean that it is technically no different 
at all to just mentioning the SHA1 in the commit message, the way we 
already do for backports.

The "origin" link has no _meaning_ for git, in other words.

> >No it's not. You can mention the backport explicitly in the commit 
> >message, and then you get hyperlinks in the graphical viewers. That works 
> >when people _want_ it to work, instead of in some hidden automatic manner 
> >that does entirely the wrong thing in all the common cases.
> 
> Could you spell out one of the common cases where it would do entirely
> the wrong thing?

It carries along information that is worthless and meaningless and hidden.

I refuse to touch such an obviously braindamaged design. It has no sane 
_semantics_. If it doesn't have semantics, it shouldn't exist, certainly 
not as some architected feature.

Nobody has shown any actual sane meaning for it. The only ones that have 
been mentioned have been for things like avoiding re-picking commits 
during a "git rebase", but (a) the patch SHA1 does that already for things 
that are truly identical an (b) since that information isn't reliable 
_anyway_, and since it's apparently a user choice, it's just "random".

I'm sorry, but "good design" is a hell of a lot more important than some 
made-up use case that isn't even reliable, and doesn't match any actual 
real problems that anybody can explain.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 22:55                 ` Jeff King
  2008-09-10 23:19                   ` Stephen R. van den Berg
@ 2008-09-11  2:46                   ` Nicolas Pitre
  1 sibling, 0 replies; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11  2:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Stephen R. van den Berg, Theodore Tso, Petr Baudis, git

On Wed, 10 Sep 2008, Jeff King wrote:

> On Thu, Sep 11, 2008 at 12:34:27AM +0200, Stephen R. van den Berg wrote:
> 
> > I see.  Indeed.  That's a lot better.
> > Did the binary search inside tree objects ever get implemented?
> 
> I believe it's still linear (and skimming tree-walk.c:find_tree_entry
> seems to confirm). However, one advantage of such an approach is that it
> will improve as tree lookup improves (e.g., I believe the pack v4 work
> included improvements in this area).

No, not yet.  Actually that's the part that still needs serious 
thinking.


Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 23:19                   ` Stephen R. van den Berg
@ 2008-09-11  5:16                     ` Paolo Bonzini
  2008-09-11  7:55                       ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11  5:16 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jeff King, Theodore Tso, Petr Baudis, git

>>> - The origin information is no longer cryptographically protected (under
>>>   certain circumstances this could be considered an advantage and a
>>>   disadvantage at the same time).
> 
>> I haven't thought enough about it to decide whether there is a scenario
>> where making such a "cherry-picked from" annotation might make use of
>> that property.
> 
> Being able to subvert the authenticity of git blame by providing fake
> origin information is not very appealing.

You could use a dummy submodule to ensure that each commit pointed to
the right set of notes.  It would force to create a separate commit
whenever you modified the notes, which is actually not bad.

Alternatively, the header of the commit can be modified to add a pointer
to a tree object for the notes; I suppose this is more palatable than
the origin link.  The tree could be organized in directories+blobs like
.git/objects to speed up the lookup.

I actually like the commit notes idea, but then I wonder: why are the
author and committer part of the commit object?  How does the plumbing
use them?  Isn't that metadata that could live in the "notes"?  And so,
why should the origin link have less privileges?

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  0:39                 ` Linus Torvalds
@ 2008-09-11  6:22                   ` Stephen R. van den Berg
  2008-09-11  8:20                     ` Jakub Narebski
                                       ` (2 more replies)
  0 siblings, 3 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11  6:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Linus Torvalds wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

>> - However, if you explicitly pull D, the origin information from A to D can
>>   be used.  People doing a generic clone get all four branches, and
>>   therefore have all the important commits which normally could contain
>>   origin links.  Note that even during a clone, commits pointed to by
>>   origin links are not being transmitted (unless there already are other
>>   reasons to send them along).

>IOW, it's not actually transferring them and saving them, since a simple 

Correct.

>delete of the origin branch will basically make them unreachable.

False.

If you fetch just branches A, B and C, but not D, the origin link from A
to D is dangling.  Once you have fetched D as well, the origin link from
A to D is not dangling anymore.  Subsequently deleting branch D but
keeping branch A will keep everything in branch D up till the commits
the origin link is pointing to alive and prevent those from being
deleted.

>Fine. At least it works the same way as fetch, then. But it's still a huge 
>mistake, because it really does mean that it is technically no different 
>at all to just mentioning the SHA1 in the commit message, the way we 
>already do for backports.

Not quite.

>The "origin" link has no _meaning_ for git, in other words.

Git will keep alive commits based on origin links once you (the fetcher)
has shown interest by fetching the appropriate branches.

As to "meaning" for git, it's there in the form of:
- --topo-order uses the information to order the output (but only if the
  target commits of the link are present in the repository).

>> >No it's not. You can mention the backport explicitly in the commit 
>> >message, and then you get hyperlinks in the graphical viewers. That works 
>> >when people _want_ it to work, instead of in some hidden automatic manner 
>> >that does entirely the wrong thing in all the common cases.

>> Could you spell out one of the common cases where it would do entirely
>> the wrong thing?

>It carries along information that is worthless and meaningless and hidden.

The common cases would be:

a. "hidden": It doesn't need to be hidden.  It can be hidden if you want it
   to be.  We can decide if git hides it sometimes, always or never.
   So this point is moot.

b. "meaningless": Git is all about taking snapshots of sourcetrees and
   linking them in an orderly fashion.  The origin link is all about
   taking snapshots of patches and linking them in an orderly fashion.
   This allows you to see the patch evolve over time, and it allows for
   diffs between patches.  We're not actually storing patches, we merely
   store snapshots.  As it happens, the snapshot of a patch is defined
   by two commit hashes.
   Doesn't sound meaningless to me.  Just as one needs normal history
   between commits in a branch to follow development, there is a history
   of a backport as it "travels" from stable branch to stable branch.

c. "worthless": Without the tracking of a backport through a series of
   well-defined patch-snapshots, it becomes kind of haphazard to
   actually figure out which piece of code came from where.  Having this
   information in the form of a series of origin links increases the
   efficiency of a developer maintaining the backports between branches.
   Maybe you consider that worthless, I consider anything that improves
   code quality because having access to a concise history of how the
   code evolved a Good Thing.  Having history of how code evolved is
   actually one of the main reasons why people use git.  It's just that
   git lacks support in the tracking of backports.  The origin link
   fills that gap.  If you don't do backports in your trees, then fine,
   the origin link will never materialise in your repositories.

>I refuse to touch such an obviously braindamaged design. It has no sane 
>_semantics_. If it doesn't have semantics, it shouldn't exist, certainly 
>not as some architected feature.

It does have sane semantics, quite well defined, actually.  I'm just not
good at explaining them apparently.  Try reading the explanation I gave
above.

>Nobody has shown any actual sane meaning for it. The only ones that have 
>been mentioned have been for things like avoiding re-picking commits 
>during a "git rebase", but (a) the patch SHA1 does that already for things 
>that are truly identical an (b) since that information isn't reliable 
>_anyway_, and since it's apparently a user choice, it's just "random".

Quite frankly I don't see the application for rebase either (yet).
I'm focusing on sane semantics first, any implications that has for
usability by rebase will follow from that.  The origin links track
content (patches), nothing else.  They assist the developer in
understanding how patches evolve, any use cases follow from that.

>I'm sorry, but "good design" is a hell of a lot more important than some 
>made-up use case that isn't even reliable, and doesn't match any actual 
>real problems that anybody can explain.

Please focus on the semantics and on the *non*-made up use case of
development of several stable branches with backports between them.
Discussing made-up use cases is wasting energy at this point.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  5:16                     ` Paolo Bonzini
@ 2008-09-11  7:55                       ` Stephen R. van den Berg
  2008-09-11  8:45                         ` Paolo Bonzini
  2008-09-11 12:33                         ` A Large Angry SCM
  0 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11  7:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jeff King, Theodore Tso, Petr Baudis, git

Paolo Bonzini wrote:
>> Being able to subvert the authenticity of git blame by providing fake
>> origin information is not very appealing.

>You could use a dummy submodule to ensure that each commit pointed to
>the right set of notes.  It would force to create a separate commit
>whenever you modified the notes, which is actually not bad.

Possibly, yes.  But we'd have to be careful not to incur too much
overhead because every indirection will cost, especially since the
origin link sometimes is checked for on every commit during a treewalk.
The fact that it rarely exists means that it should be fast to find out
that there are no origin links (which obviously is the common case).

>Alternatively, the header of the commit can be modified to add a pointer
>to a tree object for the notes; I suppose this is more palatable than
>the origin link.

This won't work for the original notes concept, because it makes the
notes immutable after commit.  For the origin links this would be fine,
since they don't change once committed.
The problem with fitting the origin links in the notes is twofold:
- They become mutable, which is undesirable, I'd like to preserve
  history as is (just like parent links).
- There is a performance hit, since origin links need to be found not to
  exist on every commit (sometimes, depending on the operation of course).

>  The tree could be organized in directories+blobs like
>..git/objects to speed up the lookup.

Yes, that was already in the latest proposal for notes, I believe.

>I actually like the commit notes idea, but then I wonder: why are the
>author and committer part of the commit object?  How does the plumbing
>use them?  Isn't that metadata that could live in the "notes"?  And so,

It would fit with a non-mutable version of the notes.  Then again, we
already *have* the non-mutable version of the notes, it's called the
header of the commit message.

>why should the origin link have less privileges?

They both belong in the non-mutable notes, and those happen to live in
the header of the commit (which *is* the most efficient spot, of course).
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  6:22                   ` Stephen R. van den Berg
@ 2008-09-11  8:20                     ` Jakub Narebski
  2008-09-11 12:31                       ` Stephen R. van den Berg
  2008-09-11 12:28                     ` A Large Angry SCM
  2008-09-11 15:39                     ` Linus Torvalds
  2 siblings, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-11  8:20 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Linus Torvalds, git

Stephen R. van den Berg wrote:

[...]
> Please focus on the semantics and on the *non*-made up use case of
> development of several stable branches with backports between them.
> Discussing made-up use cases is wasting energy at this point.

By the way, I would really consider trying first to host 'origin' links 
not in repository database itself, but in some extra database inside 
git repository, like reflog or index.  Git community is _very_ 
reluctant to modifying / extending format of persistent objects.  From 
all the proposals to add some extra header to a 'commit' object: 
the 'prior' link to previous version of rebased, cherry-picked or redone 
commit (superceded somewhat by local reflog, on by default in modern 
git); the generic 'note' header, with examples of usage including 
_non-linking_ cherry-pick and reverted commit-id, merge strategy used, 
and hints for rename detection, i.e. something like #pragma in C 
(rejected on the grounds that it was too generic and didn't have well 
defined semantic); the 'generation' header which was meant to help and 
speed up sorting commits, with root (parentless) commit having 
generation of 1, and each commit having generation being 1 more than 
maximum of generations of its parents (I think that backwards 
compatibility killed it, and the fact that date-based heuristics was 
improved); only the 'encoding' header was accepted.

So I think you should go the route of externally (outside 'commit' 
objects) maintaing 'origin'/'changeset'/'cset' links (like XLink 
extended links ;-)) as a prototype to examine consequences of the idea. 
That was the way _submodule_ support was added to Git, by the way.  
First there were (at least) two implementations maintaining submodules 
outside object database (see http://git.or.cz/gitwiki/SubprojectSupport
especially "References" section), then it was officially added first at 
the level of plumbing support, as extension of a 'tree' object (and 
index format, I think).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  7:55                       ` Stephen R. van den Berg
@ 2008-09-11  8:45                         ` Paolo Bonzini
  2008-09-11 12:33                         ` A Large Angry SCM
  1 sibling, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11  8:45 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jeff King, Theodore Tso, Petr Baudis, git


>> I actually like the commit notes idea, but then I wonder: why are the
>> author and committer part of the commit object?  How does the plumbing
>> use them?  Isn't that metadata that could live in the "notes"?  And so,
> 
> we already *have* the non-mutable version of the notes, it's called the
> header of the commit message.

Yes, that was my point.  I don't see how the author and committer fit in
the header of the commit message, if the origin does not.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  6:22                   ` Stephen R. van den Berg
  2008-09-11  8:20                     ` Jakub Narebski
@ 2008-09-11 12:28                     ` A Large Angry SCM
  2008-09-11 12:39                       ` Stephen R. van den Berg
  2008-09-11 15:39                     ` Linus Torvalds
  2 siblings, 1 reply; 137+ messages in thread
From: A Large Angry SCM @ 2008-09-11 12:28 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

Stephen R. van den Berg wrote:
> If you fetch just branches A, B and C, but not D, the origin link from A
> to D is dangling. 

I do not understand how this can be considered an acceptable behavior. 
If an object ID is referenced in an object header, particularly commit 
objects, fetch must gather those objects also because to do otherwise 
breaks the cryptographic authentication in git.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  8:20                     ` Jakub Narebski
@ 2008-09-11 12:31                       ` Stephen R. van den Berg
  2008-09-11 13:51                         ` Theodore Tso
  2008-09-11 15:02                         ` Nicolas Pitre
  0 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 12:31 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Linus Torvalds, git

Jakub Narebski wrote:
>Stephen R. van den Berg wrote:
>[...]
>> Please focus on the semantics and on the *non*-made up use case of
>> development of several stable branches with backports between them.
>> Discussing made-up use cases is wasting energy at this point.

>By the way, I would really consider trying first to host 'origin' links 
>not in repository database itself, but in some extra database inside 
>git repository, like reflog or index.  Git community is _very_ 
>reluctant to modifying / extending format of persistent objects.  From 

Rightfully so, of course.

>So I think you should go the route of externally (outside 'commit' 
>objects) maintaing 'origin'/'changeset'/'cset' links (like XLink 
>extended links ;-)) as a prototype to examine consequences of the idea. 
>That was the way _submodule_ support was added to Git, by the way.  
>First there were (at least) two implementations maintaining submodules 
>outside object database (see http://git.or.cz/gitwiki/SubprojectSupport
>especially "References" section), then it was officially added first at 
>the level of plumbing support, as extension of a 'tree' object (and 
>index format, I think).

Well, the train of thought here goes as follows:
1. Sure, why not add a field (zero or more) at the bottom of the free-form
   commit message reading like:

   Origin: bbb896d8e10f736bfda8f587c0009c358c9a8599 ee837244df2e2e4e9171f508f83f353730db9e53

2. Add support to cherry-pick/revert to actually generate the field upon
   demand.

3. Then add support to prune/gc/fsck/blame/log --graph to take the field
   into account.

4. Add support to filter-branch/rebase to renumber the field if necessary.

5. Add support to --topo-order to use the field if present and reachable.

6. For bonus points: add support to log to suppress the display of the
   field at the end of the commit message, and redisplay the field
   as Origin: bbb896d..ee83724
   next to the Parent/Merge fields.

Well, and after having done steps 1 to 5, the net result is that it
works almost as if the field is present in the header, except that:
- It is now at the end of the body in the commit message.
- It takes more time to find and parse it.

So that gives two minuses, and no pluses.
So short-circuiting the reasoning suggests that since the only thing
that actually changes now is the position of the field (at the top or
end of the commit message), we might as well do it right and put it in
the top, that gets rid of the two minuses.

Anything I missed?

Basically it means that:

a. If there is a better solution to tracking the backports, I'll gladly
   use that instead, but simply using the current really freeform
   approach doesn't cut it (it currently refers to a single commit,
   instead of a pair of commits, and takes too long to parse out in a
   --top-order or blame command).  Better solutions I haven't heard so
   far.

b. I need the integrity protection of a commit to make sure that the
   origin fields cannot be altered later; blame would be too easy to fool
   otherwise.  So using the notes solution seems to be out (it would also
   be quite a performance hit again).

c. I consider the Origin: field at the end of the commit message a
   workable solution, but it smells like X-header-extension-messes as in
   E-mail headers, and it incurs a small performance hit (in case of
   --topo-order/blame/prune/fsck), but maybe this performance hit can be
   minimised by making sure that the fields are *always* at the end
   of the commit message.

d. Using the proposed origin header in the standard commit header has
   close to zero overhead (in most commits the field is not present), yet
   codecomplexitywise it is almost identical with the Origin: field at
   the end of the commit message.

I find it remarkable though that people are dragging their feet at
solution d, yet are quite ok with solution c.  IMO solution c and d are
almost identical, except that solution c is ugly, and solution d is
elegant.  But if it makes it easier to prove the usefulness by
implementing the ugly solution first, that's fine.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  7:55                       ` Stephen R. van den Berg
  2008-09-11  8:45                         ` Paolo Bonzini
@ 2008-09-11 12:33                         ` A Large Angry SCM
  1 sibling, 0 replies; 137+ messages in thread
From: A Large Angry SCM @ 2008-09-11 12:33 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Paolo Bonzini, Jeff King, Theodore Tso, Petr Baudis, git

Stephen R. van den Berg wrote:
> It would fit with a non-mutable version of the notes.  Then again, we
> already *have* the non-mutable version of the notes, it's called the
> header of the commit message.

Almost correct. Remove "header of" from the above and you'd be correct.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 12:28                     ` A Large Angry SCM
@ 2008-09-11 12:39                       ` Stephen R. van den Berg
  2008-09-12  0:03                         ` A Large Angry SCM
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 12:39 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Linus Torvalds, Jakub Narebski, git

A Large Angry SCM wrote:
>Stephen R. van den Berg wrote:
>>If you fetch just branches A, B and C, but not D, the origin link from A
>>to D is dangling. 

>I do not understand how this can be considered an acceptable behavior. 
>If an object ID is referenced in an object header, particularly commit 
>objects, fetch must gather those objects also because to do otherwise 
>breaks the cryptographic authentication in git.

No it does not.
The cryptographic seal is calculated over the content of the commit,
which includes the hashes of all referenced objects, but doesn't include
the objects themselves.
The content of the commit is not violated.

Do not forget though:
- origin links are a rare occurrence.
- When they occur, they usually were made to point into other (deemed)
  important public branches.
- Due to the fact that the branches they are pointing into are important
  and public, in most cases the origin links *will* point to objects you
  actually already have (even if you fetched from someone else).
- The only time you're going to have dangling origin links is when
  they were pointing at someone's private branches, in which case it was
  not very prudent of the committer to actually record the link in the
  first place.  But nothing breaks if you don't have his private branch
  locally.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 12:31                       ` Stephen R. van den Berg
@ 2008-09-11 13:51                         ` Theodore Tso
  2008-09-11 15:32                           ` Stephen R. van den Berg
  2008-09-11 15:02                         ` Nicolas Pitre
  1 sibling, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-11 13:51 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, Sep 11, 2008 at 02:31:48PM +0200, Stephen R. van den Berg wrote:
> 
> Well, the train of thought here goes as follows:
> 1. Sure, why not add a field (zero or more) at the bottom of the free-form
>    commit message reading like:
> 
>    Origin: bbb896d8e10f736bfda8f587c0009c358c9a8599 ee837244df2e2e4e9171f508f83f353730db9e53
> 
> 2. Add support to cherry-pick/revert to actually generate the field upon
>    demand.

"git cherry-pick -x" already generates the field you want.

> 
> 3. Then add support to prune/gc/fsck/blame/log --graph to take the field
>    into account.
> 

Um, why should "git fsck", or "git prune" or "git gc" need to
understand about this field?  What were you saying about unclean
semantics, again?  I thought you claimed that dangling origin links
were OK?  So why the heck should git fsck care?  And why shouldn't
gc/prune drop objects that are only referenced via the origin link.

> 4. Add support to filter-branch/rebase to renumber the field if necessary.

As we discussed earlier in some cases renumbering the field is not the
right thing to do, especially if the commit in question has already
been cherry-picked --- and you don't know that.  Again, this is why
prototyping it outside of the core git is so useful; it will show up
some of these fundamental flaws in the origin link proposal.

> Well, and after having done steps 1 to 5, the net result is that it
> works almost as if the field is present in the header, except that:
> - It is now at the end of the body in the commit message.
> - It takes more time to find and parse it.

A proof of concept, even if it isn't fully performant, is useful to
prove that an idea actually has merit --- which clearly not everyone
believes at this point.

I'll also note that having a ***local*** database to cache the origin
link is a great way of short-circuiting the performance difficulties.
If it works, then it will be a lot easier to convince people that
perhaps it should be done git-core, and by modifying core git functions.

Alternatively, if you think this is such a great idea, why don't you
grab a copy of the git repository, and start hacking the idea
yourself?  If you have running code, it tends to make the idea much
more concrete, and much easier to evaluate.  Or were you hoping to
convince other people to do all of this programming for you?

						- Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 12:31                       ` Stephen R. van den Berg
  2008-09-11 13:51                         ` Theodore Tso
@ 2008-09-11 15:02                         ` Nicolas Pitre
  2008-09-11 16:00                           ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 15:02 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> So short-circuiting the reasoning suggests that since the only thing
> that actually changes now is the position of the field (at the top or
> end of the commit message), we might as well do it right and put it in
> the top, that gets rid of the two minuses.
> 
> Anything I missed?

A good convincing demonstration that this is actually worth doing in the 
first place.  And here I'm talking about the _feature_ and not the 
_implementation_.

> Basically it means that:
> 
> a. If there is a better solution to tracking the backports, I'll gladly
>    use that instead, but simply using the current really freeform
>    approach doesn't cut it (it currently refers to a single commit,
>    instead of a pair of commits, and takes too long to parse out in a
>    --top-order or blame command).  Better solutions I haven't heard so
>    far.
> 
> b. I need the integrity protection of a commit to make sure that the
>    origin fields cannot be altered later; blame would be too easy to fool
>    otherwise.  So using the notes solution seems to be out (it would also
>    be quite a performance hit again).
> 
> c. I consider the Origin: field at the end of the commit message a
>    workable solution, but it smells like X-header-extension-messes as in
>    E-mail headers, and it incurs a small performance hit (in case of
>    --topo-order/blame/prune/fsck), but maybe this performance hit can be
>    minimised by making sure that the fields are *always* at the end
>    of the commit message.
> 
> d. Using the proposed origin header in the standard commit header has
>    close to zero overhead (in most commits the field is not present), yet
>    codecomplexitywise it is almost identical with the Origin: field at
>    the end of the commit message.
> 
> I find it remarkable though that people are dragging their feet at
> solution d, yet are quite ok with solution c.  IMO solution c and d are
> almost identical, except that solution c is ugly, and solution d is
> elegant.  But if it makes it easier to prove the usefulness by
> implementing the ugly solution first, that's fine.

Technically speaking, implementation d is obviously the most efficient.  
but, as mentioned above, the actual need for this feature has not been 
convincing so far.  Until then, it is not wise to add random stuff to 
the very structure of a commit object, while c can be done even 
externally from git which is a good way to demonstrate and convince 
people about the usefulness of such feature.


Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 13:51                         ` Theodore Tso
@ 2008-09-11 15:32                           ` Stephen R. van den Berg
  2008-09-11 18:00                             ` Theodore Tso
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 15:32 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jakub Narebski, Linus Torvalds, git

Theodore Tso wrote:
>On Thu, Sep 11, 2008 at 02:31:48PM +0200, Stephen R. van den Berg wrote:
>> Well, the train of thought here goes as follows:
>> 2. Add support to cherry-pick/revert to actually generate the field upon
>>    demand.

>"git cherry-pick -x" already generates the field you want.

Well, sort of.  In order for swift parsing it should be a real field,
i.e. it should not be an English sentence (in order to avoid people
accidentally translating it); and it should list a pair of hashes
(patches/changesets are defined by the difference between two tree
snapshots).  So it would be a -o option most likely, in order to provide
backward compatibility to the users of -x.

>> 3. Then add support to prune/gc/fsck/blame/log --graph to take the field
>>    into account.

>Um, why should "git fsck", or "git prune" or "git gc" need to
>understand about this field?  What were you saying about unclean
>semantics, again?  I thought you claimed that dangling origin links
>were OK?  So why the heck should git fsck care?  And why shouldn't
>gc/prune drop objects that are only referenced via the origin link.

Dangling origin links are ok only if the developer in charge of the
repository doesn't care about the commits/branches they point to.
The definition of a "caring developer" is formalised by the fact that
the offending commits are already present in the repository or not.

This implies that fsck will skip the field if the hashes in question are
unreachable in the current repository.
If they are reachable though, fsck will follow the link and check the
whole tree referenced by the origin link.  Obviously there are only two
conditions for an origin link: either the hash points to an unreachable
object or the hash points to a reachable object of type commit (and all
associated checks that go with any commit).

gc will preserve the commits the origin links point to once they are
reachable.  I.e. if the developer doesn't care about the commits the
origin links point to (i.e. if the branches are not reachable) then gc
just skips them, if the developer *does* care, the origin links are used
to keep those objects alive (and, of course, all their parenthood).

>> 4. Add support to filter-branch/rebase to renumber the field if necessary.

>As we discussed earlier in some cases renumbering the field is not the
>right thing to do, especially if the commit in question has already
>been cherry-picked --- and you don't know that.  Again, this is why
>prototyping it outside of the core git is so useful; it will show up
>some of these fundamental flaws in the origin link proposal.

I agree that the behaviour of especially rebase with respect to the
origin links is still something that needs to be thought through.
I'm not convinced you are right, but I'm not convinced you are wrong
either.

>> Well, and after having done steps 1 to 5, the net result is that it
>> works almost as if the field is present in the header, except that:
>> - It is now at the end of the body in the commit message.
>> - It takes more time to find and parse it.

>A proof of concept, even if it isn't fully performant, is useful to
>prove that an idea actually has merit --- which clearly not everyone
>believes at this point.

Quite.

>I'll also note that having a ***local*** database to cache the origin
>link is a great way of short-circuiting the performance difficulties.
>If it works, then it will be a lot easier to convince people that
>perhaps it should be done git-core, and by modifying core git functions.

Creating local databases for these kinds of structures feels kludgy
somehow, since the git hash objects essentially *are* a working
database.  I have not checked yet if git already has some kind of
ready-to-use local database lib inside which I could reuse for that.

>Alternatively, if you think this is such a great idea, why don't you
>grab a copy of the git repository, and start hacking the idea
>yourself?

Actually, in the first hour after posting the initial mail/proposal I
already had altered a local version of git to support the origin links
in commit.[ch], --topo-order and fsck.  Before hacking further I decided
to get some feedback first to see if someone would come up with
something better.  And they did, instead of the mainline number, I
decided that using two hashes is better.  Once the dust has settled,
I'll fill in the rest of the code.

>  If you have running code, it tends to make the idea much
>more concrete, and much easier to evaluate.

Agreed, but then again, most of the programming is done without touching
any code (the design phase), which is where we are now.  Once the design
is scrutinised (as far as possible), the coding can begin (continue).
The feedback so far was very helpful, and caused me to explore (and
dismiss) some of the alternate avenues to achieve the desired
functionality.

>  Or were you hoping to
>convince other people to do all of this programming for you?

I've never needed that so far, and will not need that here either.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11  6:22                   ` Stephen R. van den Berg
  2008-09-11  8:20                     ` Jakub Narebski
  2008-09-11 12:28                     ` A Large Angry SCM
@ 2008-09-11 15:39                     ` Linus Torvalds
  2008-09-11 16:01                       ` Paolo Bonzini
                                         ` (2 more replies)
  2 siblings, 3 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11 15:39 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> 
> >delete of the origin branch will basically make them unreachable.
> 
> False.

Stephen, here's a f*cking clue:

 - I know how git works.

> If you fetch just branches A, B and C, but not D, the origin link from A
> to D is dangling.  Once you have fetched D as well [..]

So I just said we deleted beanch 'D', so there's no way to ever fetch it 
again.

Get it?

The fact is, a big part of git is temporary branches. It's one of the 
*best* features of git. Throw-away stuff. Those throw-away branches are 
often done for initial development, and then the final result is often a 
cleaned-up version. Often using rebase or cherry-picking or any number of 
things.

And this is why "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE 
COMMENT FIELD BY DEFAULT.

(Although you can use "-x" to make it do so for when you actually _want_ 
to say "cherry-picked from xyzzy")

Can you not understand that? The "origin" field is _garbage_. It's garbage 
for all normal cases. The original commit will not ever even EXIST in the 
result, because it has long since been thrown away and will never exist 
anywhere else.

Garbage should be _avoided_, not added.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 15:02                         ` Nicolas Pitre
@ 2008-09-11 16:00                           ` Stephen R. van den Berg
  2008-09-11 17:02                             ` Nicolas Pitre
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 16:00 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jakub Narebski, Linus Torvalds, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Anything I missed?

>Technically speaking, implementation d is obviously the most efficient.  
>but, as mentioned above, the actual need for this feature has not been 
>convincing so far.  Until then, it is not wise to add random stuff to 
>the very structure of a commit object, while c can be done even 
>externally from git which is a good way to demonstrate and convince 
>people about the usefulness of such feature.

The actual need for the feature seems to be dependent on one's workflow
habits.  This is also the problem I sense throughout the thread: some
people know exactly what I'm talking about, and would come up with the
almost identical design specs for the feature independent of myself, and
others need to be explained every tiny detail of the spec because they
are not familiar with the concept and can't imagine why/how it would be
used.

Let me try and describe once more the typical environment this origin field
is vital in:

Imagine a repository with:
- 33774 commits total
- 13 years of history
- 1 development branch
- 9 stable branches (forked off of the development branch at regular
  intervals during the past 13 years).
- The stable branches are never merged with each other or with the
  development branch.
- 2787 individual back/forward ports between the development and stable
  branches.

In order to have meaningful output for git-blame, it needs to follow the
chain across cherry-picks reliably.
Once you alter a piece of code, in order to figure out what more to alter,
you need to verify if this piece of code was or wasn't forward/backported.
Reliable and fast reporting of this, and actual comparison of the
different forward/backports between the 9 branches is essential.  It
basically means that you need to view the diffs of the patches across 9
branches on a regular basis.

Without the origin links, this workflow will cost a lot more time to
pursue (I know it, because I'm living it at the moment, and no, I'm not
the only developer, it's a development team).

This development model is not unique to my situation, it occurs at more
places.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 15:39                     ` Linus Torvalds
@ 2008-09-11 16:01                       ` Paolo Bonzini
  2008-09-11 16:23                         ` Linus Torvalds
  2008-09-11 16:53                       ` Jakub Narebski
  2008-09-11 19:23                       ` Stephen R. van den Berg
  2 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-11 16:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, Jakub Narebski, git

>> If you fetch just branches A, B and C, but not D, the origin link from A
>> to D is dangling.  Once you have fetched D as well [..]
> 
> So I just said we deleted beanch 'D', so there's no way to ever fetch it 
> again.
> 
> Get it?

Yes, but you should not have used Stephen's proposed new option to git
cherry-pick, just like you shouldn't have used the existing -x option.
"-x" would not have created a dangling reference, but it would have
created a puzzling commit message.

> The fact is, a big part of git is temporary branches. It's one of the 
> *best* features of git. Throw-away stuff. Those throw-away branches are 
> often done for initial development, and then the final result is often a 
> cleaned-up version. Often using rebase or cherry-picking or any number of 
> things.

These days I doubt people would use cherry-pick, they would probably use
interactive rebase.  But anyway, exactly for the same reason...

> "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE 
> COMMENT FIELD BY DEFAULT.

... neither should cherry-picking create the origin link by default.
Only if requested by the user, using a new option that is basically "-x"
done in a different way.  Just like "-x", it should not be used when
cherry-picking from private branches.

But say someone does it, then what happens?  If people clone the branch,
the reference will be basically unusable.  But since "git gc" does not
delete the referenced commit, at least the origin commit is still
available in the repository where the cherry-pick was made.  It is
debatable whether it is better or worse than "-x".

Can we discuss instead a generic way to have porcelain-level metadata,
immutable or at least versioned, for the commit objects?  (This is the
same kind of metadata as the author or committer, which clearly have
nothing to do with the git plumbing.)  Do you have any proposal of saner
semantics, not for the origin link but for commit references within this
kind of metadata in general?

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 16:01                       ` Paolo Bonzini
@ 2008-09-11 16:23                         ` Linus Torvalds
  2008-09-11 20:16                           ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11 16:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stephen R. van den Berg, Jakub Narebski, git

On Thu, 11 Sep 2008, Paolo Bonzini wrote:
> 
> Yes, but you should not have used Stephen's proposed new option to git
> cherry-pick, just like you shouldn't have used the existing -x option.
> "-x" would not have created a dangling reference, but it would have
> created a puzzling commit message.

But my point is, _none_ of what Stephen proposes has _any_ advantage over 
the already existing functionality.

IOW, absolutely *everything* is actually done better with existing data 
structures, and then just adding tools to perhaps follow those SHA1's in 
the commit message.

The whole "origin" field doesn't have any semantics that make sense for 
core git. It's basically ignored by all normal git operations, and the 
_only_ things that people seem to point out as being features are things 
that can - and obviously in my opinion should - be done by much higher 
levels.

For example, the claim was that it's hard to follow the chain of 
cherry-picks. That's not _true_. Use gitweb and gitk, and you can already 
see them. Sure, you need to use "-x", BUT YOU'D HAVE TO USE THAT WITH 
Steven's MODEL TOO!

Exactly because it would be a frigging _disaster_ if that "origin" field 
was done by default.

And the only thing that "origin" does is:

 - hide the information

 - make it easier to make mistakes (either enable the feature by default, 
   or not notice that you didn't enable it when you wanted to)

 - add a requirement for a backwards-incompatible field that is just 
   guaranteed to confuse any old git binaries.

 - make it _harder_ to do things like send revert/cherry-pick information 
   by email.

See? There are only downsides.

Look at the kernel -stable trees. They explicitly add that cherry-pick 
information, and can add *more*. For example, they go look at

	http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.26.y.git;a=commit;h=cb09de4542ad75cc3b66d0cf1a86217bf5633416

and then go to its parent commit (just click on the parent SHA). And 
notice how the stable kernel tree commits talk about where they were 
back-ported from, or _why_ they aren't back-ports at all!

IOW, there are really two main cases:

 - the common case for cherry-picking: you do not want any origin 
   information, because it's irrelevant, pointless, and *wrong*.

 - you _do_ want origin information, but you actually want to _explain_ 
   explicitly why it's not irrelevant, pointless, or wrong.

And yes, the latter case is about a lot more than "this was 
cherry-picked". It's about "this fixes that other commit we did", or it's 
about "this was anti-cherry-picked - ie reverted". They are all "origins" 
for the commit in the sense that they are relevant to the commit, but they 
all need some explanation of what _kind_ of origins they are.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 15:39                     ` Linus Torvalds
  2008-09-11 16:01                       ` Paolo Bonzini
@ 2008-09-11 16:53                       ` Jakub Narebski
  2008-09-11 19:23                       ` Stephen R. van den Berg
  2 siblings, 0 replies; 137+ messages in thread
From: Jakub Narebski @ 2008-09-11 16:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen R. van den Berg, git

Linus Torvalds wrote:
> On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> 
>> If you fetch just branches A, B and C, but not D, the origin link from A
>> to D is dangling.  Once you have fetched D as well [..]
> 
> So I just said we deleted beanch 'D', so there's no way to ever fetch it 
> again.
> 
> Get it?
> 
> The fact is, a big part of git is temporary branches. It's one of the 
> *best* features of git. Throw-away stuff. Those throw-away branches are 
> often done for initial development, and then the final result is often a 
> cleaned-up version. Often using rebase or cherry-picking or any number of 
> things.
> 
> And this is why "git cherry-pick" DOES NOT PUT THE ORIGINAL SHA1 IN THE 
> COMMENT FIELD BY DEFAULT.
> 
> (Although you can use "-x" to make it do so for when you actually _want_ 
> to say "cherry-picked from xyzzy")

And that is why the proposal was to use "-o" option to git-cherry-pick
to add 'origin'/'changeset' header, exactly because git-cherry-pick is
_abused_ to clean up branches and reorder commits; although I think that
"git rebase --interactive" (and patch management interfaces) do replace
using git-cherry-pick for that purpose.

git-revert would add 'origin'/'changeset' header unconditionally,
just like by default it seeds commit message with SHA-1 id of reverted
commit.

> Can you not understand that? The "origin" field is _garbage_. It's garbage 
> for all normal cases. The original commit will not ever even EXIST in the 
> result, because it has long since been thrown away and will never exist 
> anywhere else.
> 
> Garbage should be _avoided_, not added.

Hmmm... the difference between having 'origin' in a commit object header,
and having it in commit mesage is like difference between 'Signed-off-by:'
convention and 'author' header.  First is the matter of workflow, second
is inherent, required and non-avoidable part of revision information.

On the other hand git-cherry and git-blame would then have rely on
parsing correctly free-form part of a commit object, to take advantage
of 'origin' information: something what 'origin' info is for.

P.S. 'generation' header was not added... just saying... :-)

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 16:00                           ` Stephen R. van den Berg
@ 2008-09-11 17:02                             ` Nicolas Pitre
  2008-09-11 18:44                               ` Stephen R. van den Berg
  2008-09-11 21:05                               ` Junio C Hamano
  0 siblings, 2 replies; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 17:02 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> Let me try and describe once more the typical environment this origin field
> is vital in:
> 
> Imagine a repository with:
> - 33774 commits total
> - 13 years of history
> - 1 development branch
> - 9 stable branches (forked off of the development branch at regular
>   intervals during the past 13 years).
> - The stable branches are never merged with each other or with the
>   development branch.
> - 2787 individual back/forward ports between the development and stable
>   branches.
> 
> In order to have meaningful output for git-blame, it needs to follow the
> chain across cherry-picks reliably.
> Once you alter a piece of code, in order to figure out what more to alter,
> you need to verify if this piece of code was or wasn't forward/backported.
> Reliable and fast reporting of this, and actual comparison of the
> different forward/backports between the 9 branches is essential.  It
> basically means that you need to view the diffs of the patches across 9
> branches on a regular basis.
> 
> Without the origin links, this workflow will cost a lot more time to
> pursue (I know it, because I'm living it at the moment, and no, I'm not
> the only developer, it's a development team).
> 
> This development model is not unique to my situation, it occurs at more
> places.

OK.  I think I might be able to believe you.

Where I feel uncomfortable is with the real semantics of your "origin" 
link proposal.

First, its name.  The word "origin" probably has a too narrow meaning 
that creates confusion.  I'd suggest something like a 
"may-be-related-to" field that would be like a weak link.

The format of a may-be-related-to field would be the same as the parent 
field, except that the object pointed to by the sha1 could have its type 
relaxed, i.e. it could be anything like a blob or a tag.

The semantics of a "may-be-related-to" link would be defined for object 
reachability only:

- If the may-be-related-to link is dangling then it is ignored.

- If it is not dangling then usual reachability rules apply.

That's all the core git might care about, and the only real argument for 
not having this information in the free form commit message.

Still, in your case, you probably won't get rid of your stable branches, 
hence the reachability argument is rather weak for your usage scenario, 
meaning that you could as well have that info in the free form text 
(like cherry-pick -x), and even generate a special graft file from that 
locally for visualization/blame purposes.  Sure the indirection will add 
some overhead, but I doubt it'll be measurable.

People fetching your main branch won't have to carry the whole 
repository because those weak links would otherwise be followed if 
they're formally part of the commit header.  And if they want 
to benefit from the information those weak links carry then they just 
have to also fetch the branch(es) where those links are pointing.  At 
that point it is trivial to regenerate the special graft file locally 
which would also have the benefit of only containing links to actually 
reachable commits, hence you'd never have dangling "origin" links.

Conclusion: the only fundamental reason for having this weak link 
information in the commit header is for reachability convenience for 
when the actual branch that contained the referenced commits is gone, 
which IMHO is a bad justification.  Having lines of developments hanging 
off of a weak link alone is just plain stupid if you can't reach it via 
proper branches or tags.

So I think that your usage scenario is a valid one, but I think that you 
should implement it some other ways, like this special graft file I 
mentioned above, which can be generated and updated from custom 
information found in the free form comment text of a commit object, and 
that proper reachability issues should continue to be handled through 
proper branches and tags as it is done today.

Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 15:32                           ` Stephen R. van den Berg
@ 2008-09-11 18:00                             ` Theodore Tso
  2008-09-11 19:03                               ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-11 18:00 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote:
> gc will preserve the commits the origin links point to once they are
> reachable.  I.e. if the developer doesn't care about the commits the
> origin links point to (i.e. if the branches are not reachable) then gc
> just skips them, if the developer *does* care, the origin links are used
> to keep those objects alive (and, of course, all their parenthood).

This seems wrong.  OK, suppose you have branches A, B, C, and D, while
you are on branch C, you cherry pick commit 'p' from branch B, so that
there is a new commit q on branch C which has an origin link
containing the commit ID's p^ and 'p.    

Now suppose branch B gets deleted, and you do a "git gc".  All of the
commits that were part of branch B will vanish except for p^ and p,
which in your model will stick around because they are origin links
commit q on branch C.  But what good is are these two commits?  They
represent two snapshots in time, with no context now that branch B has
been deleted.  99% of the time, the diff between p^ and p will result
in the equivalent of the diff between q^ and q.  But even if they
aren't, what use are these isolated, disconnected commits?  So having
"git gc" retain them commits that are pointed to be this proposed
origin link doesn't seem to make any sense, and doesn't seem to be
well thought through.

Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit
q while you are on branch D.  Presumably this will create a new
commit, r.  But will the origin-link of commit r be p^ and p, or q^
and q?  And will this change depending on whether or not -o is
specified?

> >I'll also note that having a ***local*** database to cache the origin
> >link is a great way of short-circuiting the performance difficulties.
> >If it works, then it will be a lot easier to convince people that
> >perhaps it should be done git-core, and by modifying core git functions.
> 
> Creating local databases for these kinds of structures feels kludgy
> somehow, since the git hash objects essentially *are* a working
> database.  I have not checked yet if git already has some kind of
> ready-to-use local database lib inside which I could reuse for that.

Gitk already keeps a cache (.git/gitk.cache) to speed up some of its
operations.  And in some ways the index file is a cache, although it
does far more than that.

     	      	       	 		    - Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 17:02                             ` Nicolas Pitre
@ 2008-09-11 18:44                               ` Stephen R. van den Berg
  2008-09-11 20:00                                 ` Nicolas Pitre
  2008-09-11 21:05                               ` Junio C Hamano
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 18:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jakub Narebski, Linus Torvalds, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Let me try and describe once more the typical environment this origin field
>> is vital in:

>> Without the origin links, this workflow will cost a lot more time to
>> pursue (I know it, because I'm living it at the moment, and no, I'm not

>First, its name.  The word "origin" probably has a too narrow meaning 
>that creates confusion.  I'd suggest something like a 
>"may-be-related-to" field that would be like a weak link.

Well, the important properties of the name/field would be:
- It should be as specific as possible, in order to minimise the
  potential for abuse in the future.  I distill the desirability of this
  requirement out of the various earlier discussions about commitheaders
  in the past on this mailinglist held by others.
- It should convey a sense of direction (it's a directed graph).

Any generic may-be-related-to field is therefore probably a non-starter.

>The semantics of a "may-be-related-to" link would be defined for object 
>reachability only:
>- If the may-be-related-to link is dangling then it is ignored.
>- If it is not dangling then usual reachability rules apply.
>That's all the core git might care about, and the only real argument for 
>not having this information in the free form commit message.

The origin field as currently proposed tightens the requirements that
it either is dangling and ignored or points to a commit.
rev-list --topo-order should use the origin links to order the output.
gc/prune won't delete commits referenced *by* an origin link.

The only two other arguments one might give to actually keep the field
in the header of the commit as opposed to the trailer is that the 
physical field can be kept machine readable, and the actual display can be
beautified like:  Origin: 2abcdef..1234567
The output of the field could be suppressed (if so desired) if the
target commit isn't reachable.
All this is of course possible for a trailer field in the free-form
area as well, but it seems a bit silly to have two places for "headers".

>Still, in your case, you probably won't get rid of your stable branches, 

True.

>hence the reachability argument is rather weak for your usage scenario, 

Then again, I don't want to be bothered by stupid free-form origin links
made to local branches by a developer.  If the developer creates them
using cherry-pick -o which creates an origin link, I'll never have to
see his silly commit hashes where he is referring to commits in his
local branch (and never waste time wondering where those commits are).

>meaning that you could as well have that info in the free form text 
>(like cherry-pick -x), and even generate a special graft file from that 
>locally for visualization/blame purposes.  Sure the indirection will add 
>some overhead, but I doubt it'll be measurable.

The free-form equivalent looks like:
Origin: df85f7855da44c730f942b330ada181209d09d7a ff1e8bfcd69e5e0ee1a3167e80ef75b611f72123
You need a pair of hashes, which is, a bit bulky, for my taste.

What special graft file would I need to visualise?  Isn't having the
origin link information enough?

>People fetching your main branch won't have to carry the whole 
>repository because those weak links would otherwise be followed if 
>they're formally part of the commit header.  And if they want 
>to benefit from the information those weak links carry then they just 
>have to also fetch the branch(es) where those links are pointing.  At 
>that point it is trivial to regenerate the special graft file locally 
>which would also have the benefit of only containing links to actually 
>reachable commits, hence you'd never have dangling "origin" links.

You lost me here somewhere.  Could you give a concrete example with one
commit, one origin link (your style) and a special graftfile entry?

>Conclusion: the only fundamental reason for having this weak link 
>information in the commit header is for reachability convenience for 
>when the actual branch that contained the referenced commits is gone, 

Erm.  Quite the opposite, actually.
The practical use for the origin link in case the target is unreachable
is zero to none, so it can gleefully be ignored in that case.
But maybe the semantics of your "related" link and my origin link are
sufficiently distinct.
For the arguments why it should be in the header of a commit, see above.

>which IMHO is a bad justification.  Having lines of developments hanging 
>off of a weak link alone is just plain stupid if you can't reach it via 
>proper branches or tags.

Agreed.  But this is in reference to your "related" link proposal.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 18:00                             ` Theodore Tso
@ 2008-09-11 19:03                               ` Stephen R. van den Berg
  2008-09-11 19:33                                 ` Nicolas Pitre
  2008-09-11 20:04                                 ` Theodore Tso
  0 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 19:03 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jakub Narebski, Linus Torvalds, git

Theodore Tso wrote:
>On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote:
>> gc will preserve the commits the origin links point to once they are
>> reachable.  I.e. if the developer doesn't care about the commits the
>> origin links point to (i.e. if the branches are not reachable) then gc
>> just skips them, if the developer *does* care, the origin links are used
>> to keep those objects alive (and, of course, all their parenthood).

>This seems wrong.  OK, suppose you have branches A, B, C, and D, while
>you are on branch C, you cherry pick commit 'p' from branch B, so that
>there is a new commit q on branch C which has an origin link
>containing the commit ID's p^ and 'p.    

Ok.

>Now suppose branch B gets deleted, and you do a "git gc".  All of the
>commits that were part of branch B will vanish except for p^ and p,

Not quite.  Obviously all parents of p and p^ will continue to exist.
I.e. deleting branch B will cause all commits from p till the tip of B
(except p itself) to vanish.  Keeping p implies that the whole chain of
parents below p will continue to exist and be reachable.  That's the way
a git repository works.

>which in your model will stick around because they are origin links
>commit q on branch C.  But what good is are these two commits?  They
>represent two snapshots in time, with no context now that branch B has

The context are all their ancestors, which continue to exist, and that
is all you need.

>been deleted.  99% of the time, the diff between p^ and p will result
>in the equivalent of the diff between q^ and q.  But even if they
>aren't, what use are these isolated, disconnected commits?  So having
>"git gc" retain them commits that are pointed to be this proposed
>origin link doesn't seem to make any sense, and doesn't seem to be
>well thought through.

I beg to differ, but I presume you agree with me now?

>Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit
>q while you are on branch D.  Presumably this will create a new
>commit, r.  But will the origin-link of commit r be p^ and p, or q^
>and q?

It will be q^..q, and specifically not p^..p, using ^p..p would be
lying.  We aim to document the evolvement of the patch in time.
Cherry-pick itself will always ignore the origin links present on the
old commit, it simply creates new ones as if the old ones didn't exist.

>  And will this change depending on whether or not -o is
>specified?

No.  Actually, cherry-pick will never generate origin links unless -o is
specified.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 15:39                     ` Linus Torvalds
  2008-09-11 16:01                       ` Paolo Bonzini
  2008-09-11 16:53                       ` Jakub Narebski
@ 2008-09-11 19:23                       ` Stephen R. van den Berg
  2008-09-11 19:45                         ` Nicolas Pitre
  2 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 19:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jakub Narebski, git

Linus Torvalds wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> >delete of the origin branch will basically make them unreachable.

>> False.

>Stephen, here's a f*cking clue:
> - I know how git works.

I'd presume you do, but that doesn't mean you always accurately express
yourself.

>> If you fetch just branches A, B and C, but not D, the origin link from A
>> to D is dangling.  Once you have fetched D as well [..]

>So I just said we deleted beanch 'D', so there's no way to ever fetch it 
>again.

You did not state you deleted branch 'D' on the repository being fetched
*FROM*.  I assumed you meant you deleted branch 'D' on the repository
doing the fetching (after having fetched 'D' in the past).

>Get it?

"You stupid git".

>The fact is, a big part of git is temporary branches. It's one of the 
>*best* features of git. Throw-away stuff. Those throw-away branches are 
>often done for initial development, and then the final result is often a 
>cleaned-up version. Often using rebase or cherry-picking or any number of 
>things.

Indeed, features I value in git very much, and use every day, thanks.

[...portions of man git-cherry-pick stripped...]

>Can you not understand that? The "origin" field is _garbage_. It's garbage 
>for all normal cases. The original commit will not ever even EXIST in the 
>result, because it has long since been thrown away and will never exist 
>anywhere else.

The origin field will *not* be created on regular cherry-picks, this
*would* create garbage.  The origin field is not meant to be generated
when doing things with temporary branches.  The origin field is meant to
be filled *ONLY* when cherry-picking from one permanent branch to
another permanent branch.  This is a *rare* operation.

>Garbage should be _avoided_, not added.

Quite.

I do understand that "normal cases" in your case mean cherry-picks among
temporary branches.
Well, you are completely right that *your* normal cases should not (and
will not) generate an origin field.
The origin field is intended for the *abnormal* cases, which means
cherry-picking between permanent branches (which, apparently, you rarely
do, if ever), this is something that (depending on your workflow) can be
a more frequent event.  For *those* cases, the origin field will not
contain garbage.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:03                               ` Stephen R. van den Berg
@ 2008-09-11 19:33                                 ` Nicolas Pitre
  2008-09-11 19:44                                   ` Stephen R. van den Berg
  2008-09-11 20:04                                 ` Theodore Tso
  1 sibling, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 19:33 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> Theodore Tso wrote:
> >On Thu, Sep 11, 2008 at 05:32:02PM +0200, Stephen R. van den Berg wrote:
> >> gc will preserve the commits the origin links point to once they are
> >> reachable.  I.e. if the developer doesn't care about the commits the
> >> origin links point to (i.e. if the branches are not reachable) then gc
> >> just skips them, if the developer *does* care, the origin links are used
> >> to keep those objects alive (and, of course, all their parenthood).
> 
> >This seems wrong.  OK, suppose you have branches A, B, C, and D, while
> >you are on branch C, you cherry pick commit 'p' from branch B, so that
> >there is a new commit q on branch C which has an origin link
> >containing the commit ID's p^ and 'p.    
> 
> Ok.
> 
> >Now suppose branch B gets deleted, and you do a "git gc".  All of the
> >commits that were part of branch B will vanish except for p^ and p,
> 
> Not quite.  Obviously all parents of p and p^ will continue to exist.
> I.e. deleting branch B will cause all commits from p till the tip of B
> (except p itself) to vanish.  Keeping p implies that the whole chain of
> parents below p will continue to exist and be reachable.  That's the way
> a git repository works.

And that's what I called stupid in my earlier reply to you.  Either you 
have proper branches or tags keeping P around, or deleting B brings 
everything not reachable through other branches or tags (or reflog) 
away too.  Otherwise there is no point making a dangling origin link 
valid.


Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:33                                 ` Nicolas Pitre
@ 2008-09-11 19:44                                   ` Stephen R. van den Berg
  2008-09-11 20:03                                     ` Nicolas Pitre
  2008-09-11 20:05                                     ` Jakub Narebski
  0 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 19:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Not quite.  Obviously all parents of p and p^ will continue to exist.
>> I.e. deleting branch B will cause all commits from p till the tip of B
>> (except p itself) to vanish.  Keeping p implies that the whole chain of
>> parents below p will continue to exist and be reachable.  That's the way
>> a git repository works.

>And that's what I called stupid in my earlier reply to you.  Either you 
>have proper branches or tags keeping P around, or deleting B brings 
>everything not reachable through other branches or tags (or reflog) 
>away too.  Otherwise there is no point making a dangling origin link 
>valid.

Well, the principle of least surprise dictates that they should be kept
by gc as described above, however...
I can envision an option to gc say "--drop-weak-links" which does
exactly what you describe.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:23                       ` Stephen R. van den Berg
@ 2008-09-11 19:45                         ` Nicolas Pitre
  2008-09-11 19:55                           ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 19:45 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> The origin field will *not* be created on regular cherry-picks, this
> *would* create garbage.  The origin field is not meant to be generated
> when doing things with temporary branches.  The origin field is meant to
> be filled *ONLY* when cherry-picking from one permanent branch to
> another permanent branch.  This is a *rare* operation.

... and therefore you might as well just have a separate file (which 
might or might not be tracked by git like the .gitignore files are) 
to keep that information?  Since this is a rare operation, modifying the 
core database structure for this doesn't appear that appealing to most 
so far.

And, while recording this origin link is optional, you are likely to 
make mistakes like forgotting to record it, or you might even wish to 
fix it with better links after the facts.  Having it versionned also 
means that older git versions will be able to carry that information 
even if they won't make any use of it, and that also solves the 
cryptographic issue since that data is part of the top commit SHA1.

Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:45                         ` Nicolas Pitre
@ 2008-09-11 19:55                           ` Stephen R. van den Berg
  2008-09-11 20:27                             ` Nicolas Pitre
  2008-09-11 21:01                             ` Theodore Tso
  0 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 19:55 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Jakub Narebski, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> when doing things with temporary branches.  The origin field is meant to
>> be filled *ONLY* when cherry-picking from one permanent branch to
>> another permanent branch.  This is a *rare* operation.

>... and therefore you might as well just have a separate file (which 
>might or might not be tracked by git like the .gitignore files are) 
>to keep that information?  Since this is a rare operation, modifying the 
>core database structure for this doesn't appear that appealing to most 
>so far.

For various reasons, the best alternate place would be at the trailing
end of the free-form field.  Using a separate structure causes
(performance) problems (mostly).

>And, while recording this origin link is optional, you are likely to 
>make mistakes like forgotting to record it,

That is just as likely filling in the wrong commit message.

> or you might even wish to 
>fix it with better links after the facts.

That is not possible for commit messages, and should not be possible for
origin links either (same reasons).

>  Having it versionned also 
>means that older git versions will be able to carry that information 
>even if they won't make any use of it, and that also solves the 
>cryptographic issue since that data is part of the top commit SHA1.

It would allow the data to be faked, that is undesirable for "git blame".
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 18:44                               ` Stephen R. van den Berg
@ 2008-09-11 20:00                                 ` Nicolas Pitre
  0 siblings, 0 replies; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 20:00 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> Nicolas Pitre wrote:
> >First, its name.  The word "origin" probably has a too narrow meaning 
> >that creates confusion.  I'd suggest something like a 
> >"may-be-related-to" field that would be like a weak link.
> 
> Well, the important properties of the name/field would be:
> - It should be as specific as possible, in order to minimise the
>   potential for abuse in the future.  I distill the desirability of this
>   requirement out of the various earlier discussions about commitheaders
>   in the past on this mailinglist held by others.

Well, sure.  But being too specific sometimes limits its usefulness.  
There could be other usages for such a link which IMHO should be defined 
in terms of graph connectivity semantics rather than high level purpose.

> - It should convey a sense of direction (it's a directed graph).

Well, isn't that already obvious?

> Any generic may-be-related-to field is therefore probably a non-starter.

Well, my whole argument is that if it has no generic purpose then it 
probably doesn't belong in the commit header.

> The origin field as currently proposed tightens the requirements that
> it either is dangling and ignored or points to a commit.
> rev-list --topo-order should use the origin links to order the output.
> gc/prune won't delete commits referenced *by* an origin link.

And I disagree on the gc/prune point, as mentioned previously.

As to rev-list --topo-order, it doesn't need for this link to actually 
be part of the commit object to accomplish the desired effect.

> The only two other arguments one might give to actually keep the field
> in the header of the commit as opposed to the trailer is that the 
> physical field can be kept machine readable, and the actual display can be
> beautified like:  Origin: 2abcdef..1234567
> The output of the field could be suppressed (if so desired) if the
> target commit isn't reachable.
> All this is of course possible for a trailer field in the free-form
> area as well, but it seems a bit silly to have two places for "headers".

And I think you should simply create a file within the repository with 
that info instead of either thecommit header or the free form text.  It 
gives all the usability advantages you wish for and more.


Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:44                                   ` Stephen R. van den Berg
@ 2008-09-11 20:03                                     ` Nicolas Pitre
  2008-09-11 20:24                                       ` Stephen R. van den Berg
  2008-09-11 20:05                                     ` Jakub Narebski
  1 sibling, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 20:03 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> Nicolas Pitre wrote:
> >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> >> Not quite.  Obviously all parents of p and p^ will continue to exist.
> >> I.e. deleting branch B will cause all commits from p till the tip of B
> >> (except p itself) to vanish.  Keeping p implies that the whole chain of
> >> parents below p will continue to exist and be reachable.  That's the way
> >> a git repository works.
> 
> >And that's what I called stupid in my earlier reply to you.  Either you 
> >have proper branches or tags keeping P around, or deleting B brings 
> >everything not reachable through other branches or tags (or reflog) 
> >away too.  Otherwise there is no point making a dangling origin link 
> >valid.
> 
> Well, the principle of least surprise dictates that they should be kept
> by gc as described above, however...
> I can envision an option to gc say "--drop-weak-links" which does
> exactly what you describe.

Don't you think this starts to look silly at that point?


Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:03                               ` Stephen R. van den Berg
  2008-09-11 19:33                                 ` Nicolas Pitre
@ 2008-09-11 20:04                                 ` Theodore Tso
  2008-09-11 21:46                                   ` Jeff King
  1 sibling, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-11 20:04 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jakub Narebski, Linus Torvalds, git

On Thu, Sep 11, 2008 at 09:03:35PM +0200, Stephen R. van den Berg wrote:
> >This seems wrong.  OK, suppose you have branches A, B, C, and D, while
> >you are on branch C, you cherry pick commit 'p' from branch B, so that
> >there is a new commit q on branch C which has an origin link
> >containing the commit ID's p^ and 'p.    
> 
> >Now suppose branch B gets deleted, and you do a "git gc".  All of the
> >commits that were part of branch B will vanish except for p^ and p,
> 
> Not quite.  Obviously all parents of p and p^ will continue to exist.
> I.e. deleting branch B will cause all commits from p till the tip of B
> (except p itself) to vanish.  Keeping p implies that the whole chain of
> parents below p will continue to exist and be reachable.  That's the way
> a git repository works.

That's still not very useful, since you still don't have a label for
this anonymous series of commit chain that just dead ends at commit p.
How would anyone find this useful?

> >which in your model will stick around because they are origin links
> >commit q on branch C.  But what good is are these two commits?  They
> >represent two snapshots in time, with no context now that branch B has
> 
> The context are all their ancestors, which continue to exist, and that
> is all you need.

Need for what?  What useful information would you devine from it?

> >Oh, BTW, suppose you then further do a "git cherry-pick -o" of commit
> >q while you are on branch D.  Presumably this will create a new
> >commit, r.  But will the origin-link of commit r be p^ and p, or q^
> >and q?
> 
> It will be q^..q, and specifically not p^..p, using ^p..p would be
> lying.  We aim to document the evolvement of the patch in time.
> Cherry-pick itself will always ignore the origin links present on the
> old commit, it simply creates new ones as if the old ones didn't exist.

So if you never pull branch C (where commit q resides), there is no
way for you to know that commits p and r are related.  How.... not
useful.

If the scenario was being able to tell which stable branches had a
particular bug fixes, I think my proposal of attaching a bug
identifier is a far superior solution.

Again, what's the use case of "trying to document the development of
the patch in time?"  Aside from drawing pretty dotted lines
everywhere, what *good* does this actually achieve?  How would it
affect other git commands' behavior, and how would this change in
behavior actually be considered a net improvement over what we have
now?

						- Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:44                                   ` Stephen R. van den Berg
  2008-09-11 20:03                                     ` Nicolas Pitre
@ 2008-09-11 20:05                                     ` Jakub Narebski
  2008-09-11 20:22                                       ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-11 20:05 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Nicolas Pitre, Theodore Tso, Linus Torvalds, git

Stephen R. van den Berg wrote:
> Nicolas Pitre wrote:
>>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>>> Not quite.  Obviously all parents of p and p^ will continue to exist.
>>> I.e. deleting branch B will cause all commits from p till the tip of B
>>> (except p itself) to vanish.  Keeping p implies that the whole chain of
>>> parents below p will continue to exist and be reachable.  That's the way
>>> a git repository works.
> 
>>And that's what I called stupid in my earlier reply to you.  Either you 
>>have proper branches or tags keeping P around, or deleting B brings 
>>everything not reachable through other branches or tags (or reflog) 
>>away too.  Otherwise there is no point making a dangling origin link 
>>valid.
> 
> Well, the principle of least surprise dictates that they should be kept
> by gc as described above, however...
> I can envision an option to gc say "--drop-weak-links" which does
> exactly what you describe.

Well, IIRC the need for this was one of the causes of "death" of 'prior'
header link proposal...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 16:23                         ` Linus Torvalds
@ 2008-09-11 20:16                           ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 20:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paolo Bonzini, Jakub Narebski, git

Linus Torvalds wrote:
>But my point is, _none_ of what Stephen proposes has _any_ advantage over 
>the already existing functionality.

I think you're missing some of the advantages because you don't have a
lot of experience with cherry-pick workflows between multiple permanent
branches.

>IOW, absolutely *everything* is actually done better with existing data 
>structures, and then just adding tools to perhaps follow those SHA1's in 
>the commit message.

The best way to explain the difference is probably by implementing the
free-form support, so I think I'll do that.

>For example, the claim was that it's hard to follow the chain of 
>cherry-picks. That's not _true_. Use gitweb and gitk, and you can already 
>see them. Sure, you need to use "-x", BUT YOU'D HAVE TO USE THAT WITH 
>Steven's MODEL TOO!

The existing cherry-pick -x option doesn't cut it, it helps for the
simple cases, yes, but there are cherry-pick situations where it just
adds to the confusion.

>Exactly because it would be a frigging _disaster_ if that "origin" field 
>was done by default.

That never was the intention, and never will be happening.

>And the only thing that "origin" does is:

> - hide the information

Only if you want to hide it, you control if it does, this point is moot.

> - make it easier to make mistakes (either enable the feature by default, 
>   or not notice that you didn't enable it when you wanted to)

The same holds for -x, so this point is moot as well.

> - add a requirement for a backwards-incompatible field that is just 
>   guaranteed to confuse any old git binaries.

This is a problem, I admit, but maybe this can be solved in the future.
Then again, since use of the feature is a *very* conscious decision, anyone
using the feature can advise their users to use git version xxx at least.

> - make it _harder_ to do things like send revert/cherry-pick information 
>   by email.

Not necessarily, adding an Origin field in the patch sent by mail is
easy.  I don't see how it would be more difficult otherwise.  Please
explain.

>See? There are only downsides.

I think I just neutralised all but one of the mentioned downsides, and
the backward compatibility issue is at least mitigated.

>and then go to its parent commit (just click on the parent SHA). And 
>notice how the stable kernel tree commits talk about where they were 
>back-ported from, or _why_ they aren't back-ports at all!

And this is impossible when using the origin link?  The usage with an
origin link would be just as flexible, even more so.

>IOW, there are really two main cases:

> - the common case for cherry-picking: you do not want any origin 
>   information, because it's irrelevant, pointless, and *wrong*.

Quite, and my proposal is not generating those anyway.

> - you _do_ want origin information, but you actually want to _explain_ 
>   explicitly why it's not irrelevant, pointless, or wrong.

>And yes, the latter case is about a lot more than "this was 
>cherry-picked". It's about "this fixes that other commit we did", or it's 
>about "this was anti-cherry-picked - ie reverted". They are all "origins" 
>for the commit in the sense that they are relevant to the commit, but they 
>all need some explanation of what _kind_ of origins they are.

Yes, and that *extra* information can and should go into the free-form
commit message, alongside of the origin field inside the header (or
trailer), just edit the commit message before committing after a
cherry-pick -o.  What's your point?
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 20:05                                     ` Jakub Narebski
@ 2008-09-11 20:22                                       ` Stephen R. van den Berg
  2008-09-12  0:30                                         ` A Large Angry SCM
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 20:22 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Nicolas Pitre, Theodore Tso, Linus Torvalds, git

Jakub Narebski wrote:
>Stephen R. van den Berg wrote:
>> Well, the principle of least surprise dictates that they should be kept
>> by gc as described above, however...
>> I can envision an option to gc say "--drop-weak-links" which does
>> exactly what you describe.

>Well, IIRC the need for this was one of the causes of "death" of 'prior'
>header link proposal...

As I understood it, one of the causes of death of the "prior" link
proposal was that it was unclear if it pulled in the linked-to commits
upon fetch.  In the "origin" case, the default is *not* to fetch them.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 20:03                                     ` Nicolas Pitre
@ 2008-09-11 20:24                                       ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 20:24 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Well, the principle of least surprise dictates that they should be kept
>> by gc as described above, however...
>> I can envision an option to gc say "--drop-weak-links" which does
>> exactly what you describe.

>Don't you think this starts to look silly at that point?

No, it's the developers vote controlling his own repository saying:
Ok, I expressed interest in the other branches and their
backport/forwardport relationships, but I changed my mind.  Drop all
backport/forwardport information on branches I don't explicitly have.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:55                           ` Stephen R. van den Berg
@ 2008-09-11 20:27                             ` Nicolas Pitre
  2008-09-12  8:50                               ` Stephen R. van den Berg
  2008-09-11 21:01                             ` Theodore Tso
  1 sibling, 1 reply; 137+ messages in thread
From: Nicolas Pitre @ 2008-09-11 20:27 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:

> Nicolas Pitre wrote:
> >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
> >> when doing things with temporary branches.  The origin field is meant to
> >> be filled *ONLY* when cherry-picking from one permanent branch to
> >> another permanent branch.  This is a *rare* operation.
> 
> >... and therefore you might as well just have a separate file (which 
> >might or might not be tracked by git like the .gitignore files are) 
> >to keep that information?  Since this is a rare operation, modifying the 
> >core database structure for this doesn't appear that appealing to most 
> >so far.
> 
> For various reasons, the best alternate place would be at the trailing
> end of the free-form field.  Using a separate structure causes
> (performance) problems (mostly).

Did you try it?  I don't particularly buy this performance argument, and 
the bulk of my contributions to git so far were about performances.  It 
is quite easy to load a flat file with sorted commit SHA1s, and given 
that origin links are the result of a rare operation, then there 
shouldn't be too many entries to search through.  Hell, doing 213647 
lookups (and many other things like inflating zlib deflated data)  with 
each of them for commit objects in my Linux repository which has 1355167 
total entries takes only 6 seconds here, or about a quarter of a 
milisecond for each lookup.  I doubt doing an extra lookup in a much 
smaller table would show on the radar.

Nicolas

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 19:55                           ` Stephen R. van den Berg
  2008-09-11 20:27                             ` Nicolas Pitre
@ 2008-09-11 21:01                             ` Theodore Tso
  2008-09-12  8:40                               ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-11 21:01 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Nicolas Pitre, Linus Torvalds, Jakub Narebski, git

On Thu, Sep 11, 2008 at 09:55:16PM +0200, Stephen R. van den Berg wrote:
> >  Having it versionned also 
> >means that older git versions will be able to carry that information 
> >even if they won't make any use of it, and that also solves the 
> >cryptographic issue since that data is part of the top commit SHA1.
> 
> It would allow the data to be faked, that is undesirable for "git blame".

Why would this matter?  The information is largely
self-authenticating.  If a commit claims to have come from some other
cherry-pick, a human taking a quick look at it would know instantly
that this wasn't true.  So what's the harm done if some incorrect
information gets introduced?  "git blame" is something which is
generally used by humans, not by automated programs.

Also, what's the attack scenario?  The person who originally makes the
commit can easily fake the origin link information.  They can hack git
to fill on some other commit ID, for example.  So what you are
protecting against is someone after the fact adding the annotation
that this commit was related to this other commit.  When would this be
a bad thing to do?  If they are adding correct information, it's a
good thing.  If they add incorrect information, what's the harm they
can as a result of being able to add the incorrect information.
(Noting that if this annotation file is kept under git control, you
can use what ever access controls and/or process controls that verify
that a new cherry-pick --- or a commit claiming to be a cherry-pick
--- is valid and should be accepted into the master git repository for
that project.

						- Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 17:02                             ` Nicolas Pitre
  2008-09-11 18:44                               ` Stephen R. van den Berg
@ 2008-09-11 21:05                               ` Junio C Hamano
  2008-09-11 22:32                                 ` Stephen R. van den Berg
  1 sibling, 1 reply; 137+ messages in thread
From: Junio C Hamano @ 2008-09-11 21:05 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephen R. van den Berg, Jakub Narebski, Linus Torvalds, git

Nicolas Pitre <nico@cam.org> writes:

> Still, in your case, you probably won't get rid of your stable branches, 
> hence the reachability argument is rather weak for your usage scenario, 
> meaning that you could as well have that info in the free form text 
> (like cherry-pick -x), and even generate a special graft file from that 
> locally for visualization/blame purposes.  Sure the indirection will add 
> some overhead, but I doubt it'll be measurable.

I keep hearing "blame" in this discussion, but I do not understand why
people think blame should _follow_ this "origin" information (in the usual
sense of "following").

Suppose you cherry-pick an existing commit from unrelated context:

         ...---A---B
                    . (origin)
                     .
        ...---o---X---Y---Z

i.e. on top of X the difference to bring A to B is applied to produce Y,
and a new development Z is made on top.  You start digging from Z.

Without any "origin", here is how blame works:

 * What Z did is blamed on Z; what Z did not change is passed to Y;

 * Y needs to:

   (1) take responsibility for what it changed; and/or

   (2) the remaining contents came from X --- pass the blame to it.

Let's see how we would want "origin" get involved.  Instead of the above,
what Y would do would be:

   (1) if the contents (excluding the part Z changed) is different from X,
       instead of taking the blame itself, give the _final_ blame to B.

   (2) the remainder is passed to X as usual.

This is different from the normal "following" in that B is not allowed to
pass the blame to its parents (should it be allowed to pass it to its
"origin"?), because the _only thing_ cherry-pick did was to transport what
B did (relative to A) to the unrelated history that led to X.

IOW, you did not look at the contents outside "diff A..B" when you made
the cherry-pick.  There could well be parts of the content that are common
across all of A, Y, X and Z, but as far as Y and Z are concerned, they did
not get any part of that common common content from A (otherwise "origin"
is no different from "parent", but you did not merge).

The output from "origin" aware blame would be identical to the normal
blame, except that lines that usually are labeled with Y are labeled with
B.  However:

   (1) If you _are_ interested in the line that says Y, you can look at
       the commit object Y and see "cherry-pick -x" information to learn
       it came from B already; and

   (2) More importantly, if you want to dig deeper by peeling the blamed
       line (I think gitweb allows this, and probably git-gui), you
       shouldn't peel that line blamed on B to start running blame at A.
       That would continue digging the history of A, which is wrong when
       you are examining the history that led to Z.

So please leave "blame" out of this discussion.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 20:04                                 ` Theodore Tso
@ 2008-09-11 21:46                                   ` Jeff King
  2008-09-11 22:56                                     ` Stephen R. van den Berg
  2008-09-11 23:10                                     ` Linus Torvalds
  0 siblings, 2 replies; 137+ messages in thread
From: Jeff King @ 2008-09-11 21:46 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Stephen R. van den Berg, Jakub Narebski, Linus Torvalds, git

On Thu, Sep 11, 2008 at 04:04:53PM -0400, Theodore Tso wrote:

> > It will be q^..q, and specifically not p^..p, using ^p..p would be
> > lying.  We aim to document the evolvement of the patch in time.
> > Cherry-pick itself will always ignore the origin links present on the
> > old commit, it simply creates new ones as if the old ones didn't exist.
> 
> So if you never pull branch C (where commit q resides), there is no
> way for you to know that commits p and r are related.  How.... not
> useful.

That is a good point. Stephen has explained his workflow, and I can see
why he wants to reference the cherry-picked commits, and how he thinks
that the referenced commits will always be available in that workflow.
And obviously in Linus's workflow such references are basically useless,
and they should just not be generated.

But what about workflows in between? When I pull from some developer who
has added a weak reference to a particular commit SHA1, but I _don't_
have that commit, my next question "OK, so what was in that commit?".
What is the mechanism by which I find out more information on that SHA1?

Using a key that is meaningful to an external database (like a bug
tracker) means that you can go to that database to look up more
information.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 21:05                               ` Junio C Hamano
@ 2008-09-11 22:32                                 ` Stephen R. van den Berg
  2008-09-11 22:40                                   ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 22:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Jakub Narebski, Linus Torvalds, git

Junio C Hamano wrote:
>I keep hearing "blame" in this discussion, but I do not understand why
>people think blame should _follow_ this "origin" information (in the usual
>sense of "following").

>Suppose you cherry-pick an existing commit from unrelated context:

>         ...---A---B
>                    . (origin)
>                     .
>        ...---o---X---Y---Z

>i.e. on top of X the difference to bring A to B is applied to produce Y,
>and a new development Z is made on top.  You start digging from Z.

>Let's see how we would want "origin" get involved.  Instead of the above,
>what Y would do would be:

>   (1) if the contents (excluding the part Z changed) is different from X,
>       instead of taking the blame itself, give the _final_ blame to B.

>   (2) the remainder is passed to X as usual.

Sounds reasonable.

>This is different from the normal "following" in that B is not allowed to
>pass the blame to its parents (should it be allowed to pass it to its
>"origin"?), because the _only thing_ cherry-pick did was to transport what
>B did (relative to A) to the unrelated history that led to X.

Well, I'd expect:
a. That B should be able to pass blame onto it's origin.
b. That B should be able to pass blame onto A (and deeper).

Let me show another example:

...-C---D---E---F---G
                 . (origin)
                  .
         ...---A---B
                    . (origin)
                     .
        ...---o---X---Y---Z


Now suppose there is a piece of sourcecode which evolves from C to F,
then when I dig into G using blame I get something like:  CCCFFEGGDDDCC
(Every letter represents a line in the sourcecode)

Digging into Z I'd expect to see the following:  ZZCCCFFEDDYDCCB

All this assumes that there were minimal changes to the patch when
creating B, and also minimal changes to the patch when creating Y.

I.e. large parts of that code where developed during C, D, E and F, so
that is what I expect to see; is that illogical?
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 22:32                                 ` Stephen R. van den Berg
@ 2008-09-11 22:40                                   ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 22:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Jakub Narebski, Linus Torvalds, git

Stephen R. van den Berg wrote:
>Junio C Hamano wrote:
>>This is different from the normal "following" in that B is not allowed to
>>pass the blame to its parents (should it be allowed to pass it to its
>>"origin"?), because the _only thing_ cherry-pick did was to transport what
>>B did (relative to A) to the unrelated history that led to X.

>Well, I'd expect:
>a. That B should be able to pass blame onto it's origin.
>b. That B should be able to pass blame onto A (and deeper).

>Let me show another example:

>....-C---D---E---F---G
>                 . (origin)
>                  .
>         ...---A---B
>                    . (origin)
>                     .
>        ...---o---X---Y---Z

>Now suppose there is a piece of sourcecode which evolves from C to F,
>then when I dig into G using blame I get something like:  CCCFFEGGDDDCC
>(Every letter represents a line in the sourcecode)

>Digging into Z I'd expect to see the following:  ZZCCCFFEDDYDCCB

>All this assumes that there were minimal changes to the patch when
>creating B, and also minimal changes to the patch when creating Y.

>I.e. large parts of that code where developed during C, D, E and F, so
>that is what I expect to see; is that illogical?

I'm sorry, you're right, I'm confusing things here.  The case I'm describing
here can only happen when you do this:

....-C---D---E---F---G
      \...\...\..\ (origin)
                  .
         ...---A---B
                    . (origin)
                     .
        ...---o---X---Y---Z

I.e. the first cherry-pick needs to cherry-pick C, D, E *and* F into B,
that will result in four origin fields there.
And yes, that means that:
- blame follows origin links (repeatedly).
- blame does *not* travel to parents of commits found through an origin
  link.

Does that mean that blame uses origin fields?  Yes, it does, and it has
to check for origin links at every commit it traverses.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 21:46                                   ` Jeff King
@ 2008-09-11 22:56                                     ` Stephen R. van den Berg
  2008-09-11 23:01                                       ` Jeff King
  2008-09-11 23:10                                     ` Linus Torvalds
  1 sibling, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 22:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

Jeff King wrote:
>On Thu, Sep 11, 2008 at 04:04:53PM -0400, Theodore Tso wrote:
>> > It will be q^..q, and specifically not p^..p, using ^p..p would be
>> > lying.  We aim to document the evolvement of the patch in time.
>> > Cherry-pick itself will always ignore the origin links present on the
>> > old commit, it simply creates new ones as if the old ones didn't exist.

>> So if you never pull branch C (where commit q resides), there is no
>> way for you to know that commits p and r are related.  How.... not
>> useful.

>But what about workflows in between? When I pull from some developer who
>has added a weak reference to a particular commit SHA1, but I _don't_
>have that commit, my next question "OK, so what was in that commit?".
>What is the mechanism by which I find out more information on that SHA1?

Well, the usual way to fix this is to actually startup fetch and tell it
to try and fetch all the weak links (or just fetch a single hash (the
offending origin link)) from upstream; this is by no means the default
operatingmode of fetch, but I don't see any harm in allowing to fetch
those if one really wants to.

>Using a key that is meaningful to an external database (like a bug
>tracker) means that you can go to that database to look up more
>information.

True.  And also a Good Thing, I concur.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 22:56                                     ` Stephen R. van den Berg
@ 2008-09-11 23:01                                       ` Jeff King
  2008-09-11 23:17                                         ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-11 23:01 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

On Fri, Sep 12, 2008 at 12:56:48AM +0200, Stephen R. van den Berg wrote:

> Well, the usual way to fix this is to actually startup fetch and tell it
> to try and fetch all the weak links (or just fetch a single hash (the
> offending origin link)) from upstream; this is by no means the default
> operatingmode of fetch, but I don't see any harm in allowing to fetch
> those if one really wants to.

Maybe I am misremembering the details of fetching, but I believe you
cannot fetch an arbitrary SHA-1, and that is by design. So:

  1. You would have to argue the merits of changing that design. I
     believe the rationale relates to exposing some subset of the
     content via refs, but I have personally never felt that is very
     compelling.

  2. Even if we did make a change, that means that _both_ sides need the
     upgraded version.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 21:46                                   ` Jeff King
  2008-09-11 22:56                                     ` Stephen R. van den Berg
@ 2008-09-11 23:10                                     ` Linus Torvalds
  2008-09-11 23:26                                       ` Jeff King
  2008-09-11 23:36                                       ` Stephen R. van den Berg
  1 sibling, 2 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11 23:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Stephen R. van den Berg, Jakub Narebski, git

On Thu, 11 Sep 2008, Jeff King wrote:
>
> And obviously in Linus's workflow such references are basically useless,
> and they should just not be generated.

This has _nothing_ to do with workflows or anything else.

Why are people claiming these total red herrings?

I have asked several times what it is that makes it so important that the 
"origin" information be in the headers. Nobody has been able to explain 
why it's so different from just doing it in the free-form part. NOBODY.

If somebody has a workflow where they want to track "origin" commits, then 
they can do it today with the in-body approach. But that has nothing 
what-so-ever to do with the question of "let's change object file format 
to some odd special-case that we just made up and is only apparently 
useful for some special workflow that uses special tools and special 
rules".

I want the git object database to have really clear semantics. The fields 
we have now, we have because we _require_ them. There is nothing unclear 
what-so-ever about the semantics of author/commiter-ship, parenthood, 
trees, or anything else.

And there are _zero_ issues about "workflow". The workflow doesn't matter, 
the objects always make sense, and they always work exactly the same way. 
There are no special magic cases that are in the least questionable in any 
way.

So this argument is about more than just "minimalism", although I'll also 
admit to that being an issue - I want to be able to basically explain how 
git data structures work to any CS student, and not have any extra fat or 
any gray areas. It's about everything having a clear design, and a clear 
meaning, and there never being any question what-so-ever about what the 
real "meaning" of something is.

Then, if you have some special use case or rules for your particular 
project, well that's where you can have things like formatting rules for 
how the commit messages should look like. If somebody wants to use fixed 
format rules for their project, that's fine. And THAT is where "workflow" 
issues come up. 

But "workflow" has nothing to do with core git data structures. They were 
designed for speed, stability, simplicity and good taste. The _workflow_ 
part has been designed on separately on top of that (example: the whole 
thing with a single-line top summary of a commit so that we can have "git 
shortlog" and the "gitk" single-line commit view etc).

Of course, good and generally useful workflows can then be reflected in 
how tools work, where that single line commit summary is an example of 
that: it's not something that git data structures _enforce_ or even care 
about, but it's obviously something that a lot of the porcelain expects, 
and without it, lots of tools will output less useful information.

The same goes for the existing SHA1-in-comment support: some tools already 
support it and help view it in certain ways, even though it is in no way a 
core data structure issue. And _extending_ on that kind of helpful 
porcelain support certainly makes sense.

The only thing I have ever argued against is adding commit headers that 
have no sane semantics and don't make sense as internal git data 
structures.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:01                                       ` Jeff King
@ 2008-09-11 23:17                                         ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 23:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Tso, Jakub Narebski, Linus Torvalds, git

Jeff King wrote:
>On Fri, Sep 12, 2008 at 12:56:48AM +0200, Stephen R. van den Berg wrote:
>> Well, the usual way to fix this is to actually startup fetch and tell it
>> to try and fetch all the weak links (or just fetch a single hash (the
>> offending origin link)) from upstream; this is by no means the default
>> operatingmode of fetch, but I don't see any harm in allowing to fetch
>> those if one really wants to.

>Maybe I am misremembering the details of fetching, but I believe you
>cannot fetch an arbitrary SHA-1, and that is by design. So:

I see, didn't know that.

>  1. You would have to argue the merits of changing that design. I
>     believe the rationale relates to exposing some subset of the
>     content via refs, but I have personally never felt that is very
>     compelling.

Well, I can understand why it is done this way, I think.

>  2. Even if we did make a change, that means that _both_ sides need the
>     upgraded version.

If you're using origin links, you'd need that anyway, so that's a given.
I could imagine the minimum would be something like:

 Allow direct SHA1 fetches (which obviously pull in all parents as well)
 if the ref is part of one of the public branches (either as a commit,
 or as an origin link).
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:10                                     ` Linus Torvalds
@ 2008-09-11 23:26                                       ` Jeff King
  2008-09-11 23:36                                       ` Stephen R. van den Berg
  1 sibling, 0 replies; 137+ messages in thread
From: Jeff King @ 2008-09-11 23:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Theodore Tso, Stephen R. van den Berg, Jakub Narebski, git

On Thu, Sep 11, 2008 at 04:10:26PM -0700, Linus Torvalds wrote:

> > And obviously in Linus's workflow such references are basically useless,
> > and they should just not be generated.
> 
> This has _nothing_ to do with workflows or anything else.
> 
> Why are people claiming these total red herrings?
> 
> I have asked several times what it is that makes it so important that the 
> "origin" information be in the headers. Nobody has been able to explain 
> why it's so different from just doing it in the free-form part. NOBODY.

The message you are responding to has nothing to do with an origin
header versus putting it in the free-form part. It is equally a problem
with both approaches.

I was purely commenting on the "if I mention an arbitrary sha-1, what is
the person reading it supposed to _do_ with it, if they may never have
seen that sha-1" issue.

So yes, it has _everything_ to do with workflows. In Stephen's case, he
claims that all references will be to commits on long-lived branches. In
which case, it is a non-issue because they will have the referenced
commits.

But in the general case, people will not have them, and there is
potential head-scratching. My point is that even if a feature works for
Stephen's workflow, it may not be a good feature for everyone, since
other solutions handle the general case (as well as his case) much
better.

> [ranting about how the origin header is bad]
> The only thing I have ever argued against is adding commit headers that 
> have no sane semantics and don't make sense as internal git data 
> structures.

Yes, and I totally agree with everything you said. If you read the mail
you are responding to carefully, you will see that I never mention an
origin header versus the free-form commit.

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-10 16:23                     ` Jakub Narebski
@ 2008-09-11 23:28                       ` Sam Vilain
  2008-09-11 23:44                         ` Linus Torvalds
  0 siblings, 1 reply; 137+ messages in thread
From: Sam Vilain @ 2008-09-11 23:28 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Linus Torvalds, Paolo Bonzini, Stephen R. van den Berg, git

Jakub Narebski wrote:
>> And that dotted line really does sound like something you could do with 
>> just the existing "hyperlink" functionality in the commit message.
> 
> As far as I understand (note: I'm neither for, nor against the proposal;
> although I think it has thin chance to be accepted, especially soon),
> it is for graphical history viewers, for git-cherry to make it more
> precise (to detect duplicated/cherry-picked changes better), and in
> the future possibly to help history-aware merge strategies. And probably
> help patch management interfaces.

Can I suggest,

 1. bury this origin link idea

 2. make git-cherry-pick have a similar option to '-x', but instead of
    recording the original commit ID, record the original *patch* ID,
    *if* there was a merge conflict for that cherry pick.

 3. tools can build indexes from patch ID => (commit IDs) to make this
    other form of history navigation fast.

Sam

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:10                                     ` Linus Torvalds
  2008-09-11 23:26                                       ` Jeff King
@ 2008-09-11 23:36                                       ` Stephen R. van den Berg
  1 sibling, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-11 23:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Theodore Tso, Jakub Narebski, git

Linus Torvalds wrote:
>I have asked several times what it is that makes it so important that the 
>"origin" information be in the headers. Nobody has been able to explain 
>why it's so different from just doing it in the free-form part. NOBODY.

That's because the difference is small:
In the header is slightly faster and more elegant (both designwise and
displaywise), that's it.
Other than that, it hardly matters.

>The only thing I have ever argued against is adding commit headers that 
>have no sane semantics and don't make sense as internal git data 
>structures.

Of course.

In any case, I think I got enough feedback from the list to create
a working implementation/concept which is going to use the free-form
trailer to implement the origin field.
-- 
Sincerely,
           Stephen R. van den Berg.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:28                       ` Sam Vilain
@ 2008-09-11 23:44                         ` Linus Torvalds
  2008-09-12  2:24                           ` Sam Vilain
  2008-09-12  5:47                           ` Stephen R. van den Berg
  0 siblings, 2 replies; 137+ messages in thread
From: Linus Torvalds @ 2008-09-11 23:44 UTC (permalink / raw)
  To: Sam Vilain; +Cc: Jakub Narebski, Paolo Bonzini, Stephen R. van den Berg, git

On Fri, 12 Sep 2008, Sam Vilain wrote:
> 
>  2. make git-cherry-pick have a similar option to '-x', but instead of
>     recording the original commit ID, record the original *patch* ID,
>     *if* there was a merge conflict for that cherry pick.

Actually, don't make it dependent on merge conflicts. Just make it depend 
on whether the patch ID is _different_.

It can happen even without any conflicts, just because the context 
changed. So it really isn't about merge conflicts per se, just the fact 
that a patch can change when it is applied in a new area with a three-way 
diff - or because it got applied with fuzz.

You could add it as a 

	Original-patch-id: <sha1>

or something. And then you just need to teach "git cherry/rebase" to take 
both the original ID and the new one into account when deciding whether it 
has already seen that patch.

			Linus

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 12:39                       ` Stephen R. van den Berg
@ 2008-09-12  0:03                         ` A Large Angry SCM
  2008-09-12  0:13                           ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: A Large Angry SCM @ 2008-09-12  0:03 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Linus Torvalds, Jakub Narebski, git

Stephen R. van den Berg wrote:
> A Large Angry SCM wrote:
>> Stephen R. van den Berg wrote:
>>> If you fetch just branches A, B and C, but not D, the origin link from A
>>> to D is dangling. 
> 
>> I do not understand how this can be considered an acceptable behavior. 
>> If an object ID is referenced in an object header, particularly commit 
>> objects, fetch must gather those objects also because to do otherwise 
>> breaks the cryptographic authentication in git.
> 
> No it does not.
> The cryptographic seal is calculated over the content of the commit,
> which includes the hashes of all referenced objects, but doesn't include
> the objects themselves.
> The content of the commit is not violated.

The fetch MUST gather the referenced objects ALWAYS or I can't verify 
the history. To do otherwise means that ID strings on the origin lines 
are nothing more than an arbitrary text tag and not pointer to a 
specific history.

> 
> Do not forget though:
> - origin links are a rare occurrence.
> - When they occur, they usually were made to point into other (deemed)
>   important public branches.
> - Due to the fact that the branches they are pointing into are important
>   and public, in most cases the origin links *will* point to objects you
>   actually already have (even if you fetched from someone else).
> - The only time you're going to have dangling origin links is when
>   they were pointing at someone's private branches, in which case it was
>   not very prudent of the committer to actually record the link in the
>   first place.  But nothing breaks if you don't have his private branch
>   locally.

How do I verify (think git-fsck) that what the origin lines refer to 
are, in fact, commits with the proper relationships? Either they HAVE to 
be in the repository or the references do not belong in the header.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12  0:03                         ` A Large Angry SCM
@ 2008-09-12  0:13                           ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  0:13 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: Linus Torvalds, Jakub Narebski, git

A Large Angry SCM wrote:
>Stephen R. van den Berg wrote:
>>No it does not.
>>The cryptographic seal is calculated over the content of the commit,
>>which includes the hashes of all referenced objects, but doesn't include
>>the objects themselves.
>>The content of the commit is not violated.

>The fetch MUST gather the referenced objects ALWAYS or I can't verify 
>the history. To do otherwise means that ID strings on the origin lines 
>are nothing more than an arbitrary text tag and not pointer to a 
>specific history.

To fetch, by default, the origin lines *are* nothing more than arbitrary
text and not a pointer to a specific history.

>How do I verify (think git-fsck) that what the origin lines refer to 
>are, in fact, commits with the proper relationships? Either they HAVE to 
>be in the repository or the references do not belong in the header.

If the origin hashes are not reachable, then fsck is required to silently
skip them, according to spec.
If the origin hashes *are* reachable, then fsck is required to verify
that they refer to proper commits with a normal history.
-- 
Sincerely,
           Stephen R. van den Berg.
"There are three types of people in the world;
 those who can count, and those who can't."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 20:22                                       ` Stephen R. van den Berg
@ 2008-09-12  0:30                                         ` A Large Angry SCM
  2008-09-12  5:39                                           ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: A Large Angry SCM @ 2008-09-12  0:30 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Jakub Narebski, Nicolas Pitre, Theodore Tso, Linus Torvalds, git

Stephen R. van den Berg wrote:
> Jakub Narebski wrote:
>> Stephen R. van den Berg wrote:
>>> Well, the principle of least surprise dictates that they should be kept
>>> by gc as described above, however...
>>> I can envision an option to gc say "--drop-weak-links" which does
>>> exactly what you describe.
> 
>> Well, IIRC the need for this was one of the causes of "death" of 'prior'
>> header link proposal...
> 
> As I understood it, one of the causes of death of the "prior" link
> proposal was that it was unclear if it pulled in the linked-to commits
> upon fetch.  In the "origin" case, the default is *not* to fetch them.

And that's WRONG. Both prior and origin must fetch them if they're 
reference in the header.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:44                         ` Linus Torvalds
@ 2008-09-12  2:24                           ` Sam Vilain
  2008-09-12  5:47                           ` Stephen R. van den Berg
  1 sibling, 0 replies; 137+ messages in thread
From: Sam Vilain @ 2008-09-12  2:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jakub Narebski, Paolo Bonzini, Stephen R. van den Berg, git

Linus Torvalds wrote:
> 
> On Fri, 12 Sep 2008, Sam Vilain wrote:
>>  2. make git-cherry-pick have a similar option to '-x', but instead of
>>     recording the original commit ID, record the original *patch* ID,
>>     *if* there was a merge conflict for that cherry pick.
> 
> Actually, don't make it dependent on merge conflicts. Just make it depend 
> on whether the patch ID is _different_.
> 
> It can happen even without any conflicts, just because the context 
> changed. So it really isn't about merge conflicts per se, just the fact 
> that a patch can change when it is applied in a new area with a three-way 
> diff - or because it got applied with fuzz.
> 
> You could add it as a 
> 
> 	Original-patch-id: <sha1>
> 
> or something. And then you just need to teach "git cherry/rebase" to take 
> both the original ID and the new one into account when deciding whether it 
> has already seen that patch.

Yes, right - it's the patch ID changing that's the problem for
git-cherry / rev-list --cherry-pick to be able to spot changes as the
'same'.

Someone else pointed out that git-rebase -i might want to have this as well.

I actually looked into coding this, but there was a little problem with
the way git-revert worked - it builds the commit message before the diff
is calculated.  So there would probably need to be a little trivial
refactoring first before this can be implemented.

Sam.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12  0:30                                         ` A Large Angry SCM
@ 2008-09-12  5:39                                           ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  5:39 UTC (permalink / raw)
  To: A Large Angry SCM
  Cc: Jakub Narebski, Nicolas Pitre, Theodore Tso, Linus Torvalds, git

A Large Angry SCM wrote:
>Stephen R. van den Berg wrote:
>>Jakub Narebski wrote:
>>>Stephen R. van den Berg wrote:
>>>>Well, the principle of least surprise dictates that they should be kept
>>>>by gc as described above, however...
>>>>I can envision an option to gc say "--drop-weak-links" which does
>>>>exactly what you describe.

>>>Well, IIRC the need for this was one of the causes of "death" of 'prior'
>>>header link proposal...

>>As I understood it, one of the causes of death of the "prior" link
>>proposal was that it was unclear if it pulled in the linked-to commits
>>upon fetch.  In the "origin" case, the default is *not* to fetch them.

>And that's WRONG. Both prior and origin must fetch them if they're 
>reference in the header.

By definition of the origin headerfield that is not wrong, there are no
other rules.  But the point is moot at the moment, since I'm going to
create a proof of concept which puts the field in the free-form trailer.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 23:44                         ` Linus Torvalds
  2008-09-12  2:24                           ` Sam Vilain
@ 2008-09-12  5:47                           ` Stephen R. van den Berg
  2008-09-12  6:19                             ` Rogan Dawes
  2008-09-12 14:58                             ` Theodore Tso
  1 sibling, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  5:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sam Vilain, Jakub Narebski, Paolo Bonzini, git

Linus Torvalds wrote:
>On Fri, 12 Sep 2008, Sam Vilain wrote:
>It can happen even without any conflicts, just because the context 
>changed. So it really isn't about merge conflicts per se, just the fact 
>that a patch can change when it is applied in a new area with a three-way 
>diff - or because it got applied with fuzz.

Quite.

>You could add it as a 

>	Original-patch-id: <sha1>

That will probably work fine when operating locally on (short) temporary
branches.

It would probably become computationally prohibitive to use it between
long lived permanent branches.  In that case it would need to be
augmented by the sha1 of the originating commit.  Which gives you two
hashes as reference, and in that case you might as well use the two
commit hashes of which the difference yields the patch.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12  5:47                           ` Stephen R. van den Berg
@ 2008-09-12  6:19                             ` Rogan Dawes
  2008-09-12  6:56                               ` Stephen R. van den Berg
  2008-09-12 14:58                             ` Theodore Tso
  1 sibling, 1 reply; 137+ messages in thread
From: Rogan Dawes @ 2008-09-12  6:19 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git

Stephen R. van den Berg wrote:
> Linus Torvalds wrote:
>> On Fri, 12 Sep 2008, Sam Vilain wrote:
>> It can happen even without any conflicts, just because the context 
>> changed. So it really isn't about merge conflicts per se, just the fact 
>> that a patch can change when it is applied in a new area with a three-way 
>> diff - or because it got applied with fuzz.
> 
> Quite.
> 
>> You could add it as a 
> 
>> 	Original-patch-id: <sha1>
> 
> That will probably work fine when operating locally on (short) temporary
> branches.
> 
> It would probably become computationally prohibitive to use it between
> long lived permanent branches.  In that case it would need to be
> augmented by the sha1 of the originating commit.  Which gives you two
> hashes as reference, and in that case you might as well use the two
> commit hashes of which the difference yields the patch.

Pardon my confusion, but why include two commit hashes? Surely the 
commit already has its parent, so there is no need to include that in 
your "cherry pick". And if the commit has more than one parent, then I 
doubt you could/should really cherry-pick it anyway.

Besides, you could always augment your local repo with a mapping of 
patch ids to commits/commit pairs to reduce lookup time.

Rogan

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12  6:19                             ` Rogan Dawes
@ 2008-09-12  6:56                               ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  6:56 UTC (permalink / raw)
  To: Rogan Dawes
  Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git

Rogan Dawes wrote:
>Stephen R. van den Berg wrote:
>>It would probably become computationally prohibitive to use it between
>>long lived permanent branches.  In that case it would need to be
>>augmented by the sha1 of the originating commit.  Which gives you two
>>hashes as reference, and in that case you might as well use the two
>>commit hashes of which the difference yields the patch.

>Pardon my confusion, but why include two commit hashes? Surely the 
>commit already has its parent, so there is no need to include that in 
>your "cherry pick". And if the commit has more than one parent, then I 
>doubt you could/should really cherry-pick it anyway.

Well, actually, sometimes cherry-pick does pick just one of the
(multiple) parents to diff with; also, some people (not I) envisioned
using two commits which were not a direct parent and child of one
another (I'm not quite sure how that would work, but the model would
support it).

>Besides, you could always augment your local repo with a mapping of 
>patch ids to commits/commit pairs to reduce lookup time.

Yes, possible.  But then after cloning, this mapping-cache needs to be
recreated, and that would mean that one would have to walk through all
commits and calculate all patch-id's, of which then only those few which are
referenced need to be stored.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 21:01                             ` Theodore Tso
@ 2008-09-12  8:40                               ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  8:40 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Nicolas Pitre, Linus Torvalds, Jakub Narebski, git

Theodore Tso wrote:
>On Thu, Sep 11, 2008 at 09:55:16PM +0200, Stephen R. van den Berg wrote:
>> >  Having it versionned also 
>> >means that older git versions will be able to carry that information 
>> >even if they won't make any use of it, and that also solves the 
>> >cryptographic issue since that data is part of the top commit SHA1.

>> It would allow the data to be faked, that is undesirable for "git blame".

>Why would this matter?  The information is largely
>self-authenticating.  If a commit claims to have come from some other

Attack-wise, you're right, it's not a big deal.
I think the comforting feeling one gets about the hashes protecting
integrity is what matters more for me here.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-11 20:27                             ` Nicolas Pitre
@ 2008-09-12  8:50                               ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12  8:50 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Jakub Narebski, git

Nicolas Pitre wrote:
>On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> Nicolas Pitre wrote:
>> >On Thu, 11 Sep 2008, Stephen R. van den Berg wrote:
>> >> when doing things with temporary branches.  The origin field is meant to
>> >> be filled *ONLY* when cherry-picking from one permanent branch to
>> >> another permanent branch.  This is a *rare* operation.

>> >... and therefore you might as well just have a separate file (which 
>> >might or might not be tracked by git like the .gitignore files are) 
>> >to keep that information?  Since this is a rare operation, modifying the 
>> >core database structure for this doesn't appear that appealing to most 
>> >so far.

>> For various reasons, the best alternate place would be at the trailing
>> end of the free-form field.  Using a separate structure causes
>> (performance) problems (mostly).

>Did you try it?

No.

>  I don't particularly buy this performance argument, and 
>the bulk of my contributions to git so far were about performances.  It 
>is quite easy to load a flat file with sorted commit SHA1s, and given 
>that origin links are the result of a rare operation, then there 
>shouldn't be too many entries to search through.  Hell, doing 213647 

True.

>lookups (and many other things like inflating zlib deflated data)  with 
>each of them for commit objects in my Linux repository which has 1355167 
>total entries takes only 6 seconds here, or about a quarter of a 
>milisecond for each lookup.  I doubt doing an extra lookup in a much 
>smaller table would show on the radar.

Maybe you're right.  The reason why my first knee-jerk reaction is
"performance problem" is because:
- The field is rarely present.
- When it is used, we look for it on every commit we traverse.
- This means that finding out the field does *not* exist is the most
  common operation, and that effort rises linearly with the number of
  commits visited.

Whereas if the information is present in the header or trailer of the
commit, finding out that the field does not exist there is rather
cheap.  But you could very well be right, that the absolute extra time
spent might be negligible for all intents and purposes.

Nonetheless, the data-integrity argument still holds, i.e. placing it in
the commit (header or trailer) automatically protects it.  External
files need extra care if you want the same integrity protection.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12  5:47                           ` Stephen R. van den Berg
  2008-09-12  6:19                             ` Rogan Dawes
@ 2008-09-12 14:58                             ` Theodore Tso
  2008-09-12 15:05                               ` Paolo Bonzini
                                                 ` (2 more replies)
  1 sibling, 3 replies; 137+ messages in thread
From: Theodore Tso @ 2008-09-12 14:58 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git

On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote:
> >You could add it as a 
> 
> >	Original-patch-id: <sha1>
> 
> That will probably work fine when operating locally on (short) temporary
> branches.
> 
> It would probably become computationally prohibitive to use it between
> long lived permanent branches.  In that case it would need to be
> augmented by the sha1 of the originating commit.

Nope, as Sam suggested in his original message (but which got clipped
by Linus when he was replying) all you have to do is to have a
separate local database which ties commits and patch-id's together as
a cache/index.

I know you seem to be resistent to caches, but caches are **good**
because they are local information, which by definition can be
implementation-dependent; you can always generate the cache from the
git repository if for some reason you need to extend it.  It also
means that if it turns out you need to index reationships a different
way, you can do that without having to make fundamental (incompatible)
changes in the git object.  

It's much like SQL databases; you have your database tables, where
making changes to the database schema is painful --- and indexes,
which can be added and dropped with much less effort.  Think of these
local caches are database indexes.  Just because you need an index in
a particular direction to optimize a query or loopup operation does
***not*** imply that you need to make a fundamental, globally visible,
database schema change or git object layout which breaks compatibility
for everybody.

						- Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 14:58                             ` Theodore Tso
@ 2008-09-12 15:05                               ` Paolo Bonzini
  2008-09-12 15:11                               ` Jakub Narebski
  2008-09-12 15:54                               ` Stephen R. van den Berg
  2 siblings, 0 replies; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-12 15:05 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Stephen R. van den Berg, Linus Torvalds, Sam Vilain,
	Jakub Narebski, git

Theodore Tso wrote:
> On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote:
>>> You could add it as a 
>>> 	Original-patch-id: <sha1>
>> That will probably work fine when operating locally on (short) temporary
>> branches.
>>
>> It would probably become computationally prohibitive to use it between
>> long lived permanent branches.  In that case it would need to be
>> augmented by the sha1 of the originating commit.
> 
> Nope, as Sam suggested in his original message (but which got clipped
> by Linus when he was replying) all you have to do is to have a
> separate local database which ties commits and patch-id's together as
> a cache/index.

Yeah, I must admit I am okay with *this* cache.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 14:58                             ` Theodore Tso
  2008-09-12 15:05                               ` Paolo Bonzini
@ 2008-09-12 15:11                               ` Jakub Narebski
  2008-09-12 15:40                                 ` Paolo Bonzini
  2008-09-12 15:54                               ` Stephen R. van den Berg
  2 siblings, 1 reply; 137+ messages in thread
From: Jakub Narebski @ 2008-09-12 15:11 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Stephen R. van den Berg, Linus Torvalds, Sam Vilain,
	Paolo Bonzini, git

Theodore Tso wrote:
> On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote:
>>>
>>>You could add it as a 
>>>
>>>	Original-patch-id: <sha1>
>> 
>> That will probably work fine when operating locally on (short) temporary
>> branches.
>> 
>> It would probably become computationally prohibitive to use it between
>> long lived permanent branches.  In that case it would need to be
>> augmented by the sha1 of the originating commit.
> 
> Nope, as Sam suggested in his original message (but which got clipped
> by Linus when he was replying) all you have to do is to have a
> separate local database which ties commits and patch-id's together as
> a cache/index.
> 
> I know you seem to be resistent to caches, but caches are **good**
> because they are local information, which by definition can be
> implementation-dependent; you can always generate the cache from the
> git repository if for some reason you need to extend it. [...]

But it is not true that "you can always generate the cache from the
git repository" in this case; the patch-id that is to be saved is
_original_ patch-id of cherry-picked (or reverted) changeset.

OTOH it is not much different from reflog information, which also
cannot be regenerated from object database.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 15:11                               ` Jakub Narebski
@ 2008-09-12 15:40                                 ` Paolo Bonzini
  2008-09-12 16:00                                   ` Theodore Tso
  0 siblings, 1 reply; 137+ messages in thread
From: Paolo Bonzini @ 2008-09-12 15:40 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Theodore Tso, Stephen R. van den Berg, Linus Torvalds, Sam Vilain,
	git


>> I know you seem to be resistent to caches, but caches are **good**
>> because they are local information, which by definition can be
>> implementation-dependent; you can always generate the cache from the
>> git repository if for some reason you need to extend it. [...]
> 
> But it is not true that "you can always generate the cache from the
> git repository" in this case; the patch-id that is to be saved is
> _original_ patch-id of cherry-picked (or reverted) changeset.

He's proposing storing the original patch id in the commit message, and
caching the commit SHA->patch id association on the side.

Paolo

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 14:58                             ` Theodore Tso
  2008-09-12 15:05                               ` Paolo Bonzini
  2008-09-12 15:11                               ` Jakub Narebski
@ 2008-09-12 15:54                               ` Stephen R. van den Berg
  2008-09-12 16:19                                 ` Jeff King
  2008-09-15 12:21                                 ` Sam Vilain
  2 siblings, 2 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12 15:54 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Linus Torvalds, Sam Vilain, Jakub Narebski, Paolo Bonzini, git

Theodore Tso wrote:
>On Fri, Sep 12, 2008 at 07:47:39AM +0200, Stephen R. van den Berg wrote:
>> It would probably become computationally prohibitive to use it between
>> long lived permanent branches.  In that case it would need to be
>> augmented by the sha1 of the originating commit.

>Nope, as Sam suggested in his original message (but which got clipped
>by Linus when he was replying) all you have to do is to have a
>separate local database which ties commits and patch-id's together as
>a cache/index.

True.  But repopulating this cache after cloning means that you have to
calculate the patch-id of *every* commit in the repository.  It sounds
like something to avoid, but maybe I'm overly concerned, I have only a
vague idea on how computationally intensive this is.

>I know you seem to be resistent to caches, but caches are **good**
>because they are local information, which by definition can be
>implementation-dependent; you can always generate the cache from the
>git repository if for some reason you need to extend it.  It also
>means that if it turns out you need to index reationships a different
>way, you can do that without having to make fundamental (incompatible)
>changes in the git object.  

I fully agree that caches are good.
And yes I seem to resist the idea to create a cache at every whim, but
that mostly is because I want to avoid that everyone invents their own
mini-database for each and every data access they want to accellerate.

I mean, ideally, any database/index/accellerator structure you'd need
can reuse the SHA1 object database index, or maybe one or two other
semi-standard index types, and git would provide suitable library
functions for all three solutions.  And if that would be the case, I'll
gladly throw in an extra cache or index at anytime to speed up the
particular access pattern I'm trying to make useable.  But as far as I
can see, those library functions have not materialised yet, so I'm
hesitant to create yet another private database structure just for my
access patterns; and simply pulling in libdb or sqlite without agreement
that those libs are (re)used in a lot of places in git seems a bit
bloat-prone.

>local caches are database indexes.  Just because you need an index in
>a particular direction to optimize a query or loopup operation does
>***not*** imply that you need to make a fundamental, globally visible,
>database schema change or git object layout which breaks compatibility
>for everybody.

It's not a certainty that changing the git object layout has to break
compatibility (it should be reasonably possible to add columns to the
schema without breaking anything, to stay with the database paradigm),
but I agree that creating another index can be considered better than
extending the schema.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 15:40                                 ` Paolo Bonzini
@ 2008-09-12 16:00                                   ` Theodore Tso
  0 siblings, 0 replies; 137+ messages in thread
From: Theodore Tso @ 2008-09-12 16:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Jakub Narebski, Stephen R. van den Berg, Linus Torvalds,
	Sam Vilain, git

On Fri, Sep 12, 2008 at 05:40:26PM +0200, Paolo Bonzini wrote:
> > But it is not true that "you can always generate the cache from the
> > git repository" in this case; the patch-id that is to be saved is
> > _original_ patch-id of cherry-picked (or reverted) changeset.
> 
> He's proposing storing the original patch id in the commit message, and
> caching the commit SHA->patch id association on the side.
> 

Actually its the association in the other direction which you'd want
to cache.  It's fast given the commit SHA to dig the original patch id
out of the commit message.  What is harder is given a patch id X, to
find all of the commits which either (a) have a patch id of X, or (b)
have a commit message indicating that the original patch-id was X.  So
having a database which caches this information, so given a patch-id,
you can quickly look up the related commits, is what I believe Sam was
proposing, and which I think would solve the problem quite nicely.

	       	       	     	   	     - Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 15:54                               ` Stephen R. van den Berg
@ 2008-09-12 16:19                                 ` Jeff King
  2008-09-12 16:43                                   ` Stephen R. van den Berg
  2008-09-15 12:21                                 ` Sam Vilain
  1 sibling, 1 reply; 137+ messages in thread
From: Jeff King @ 2008-09-12 16:19 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: git

On Fri, Sep 12, 2008 at 05:54:27PM +0200, Stephen R. van den Berg wrote:

> True.  But repopulating this cache after cloning means that you have to
> calculate the patch-id of *every* commit in the repository.  It sounds
> like something to avoid, but maybe I'm overly concerned, I have only a
> vague idea on how computationally intensive this is.

For a rough estimate, try:

  time git log -p | git patch-id >/dev/null

-Peff

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 16:19                                 ` Jeff King
@ 2008-09-12 16:43                                   ` Stephen R. van den Berg
  2008-09-12 18:44                                     ` Theodore Tso
  0 siblings, 1 reply; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12 16:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King wrote:
>On Fri, Sep 12, 2008 at 05:54:27PM +0200, Stephen R. van den Berg wrote:

>> True.  But repopulating this cache after cloning means that you have to
>> calculate the patch-id of *every* commit in the repository.  It sounds
>> like something to avoid, but maybe I'm overly concerned, I have only a
>> vague idea on how computationally intensive this is.

>For a rough estimate, try:

>  time git log -p | git patch-id >/dev/null

On my system that results in 2ms per commit on average.  Not huge, but
not small either, I guess.  Running it results in real waiting time, it
all depends on how patient the user is.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 16:43                                   ` Stephen R. van den Berg
@ 2008-09-12 18:44                                     ` Theodore Tso
  2008-09-12 20:56                                       ` Stephen R. van den Berg
  0 siblings, 1 reply; 137+ messages in thread
From: Theodore Tso @ 2008-09-12 18:44 UTC (permalink / raw)
  To: Stephen R. van den Berg; +Cc: Jeff King, git

On Fri, Sep 12, 2008 at 06:43:48PM +0200, Stephen R. van den Berg wrote:
> >> True.  But repopulating this cache after cloning means that you have to
> >> calculate the patch-id of *every* commit in the repository.  It sounds
> >> like something to avoid, but maybe I'm overly concerned, I have only a
> >> vague idea on how computationally intensive this is.
> 
> >For a rough estimate, try:
> 
> >  time git log -p | git patch-id >/dev/null
> 
> On my system that results in 2ms per commit on average.  Not huge, but
> not small either, I guess.  Running it results in real waiting time, it
> all depends on how patient the user is.

For a local clone, git could be taught to copy the cache file.  For a
network-based clone, the percentage of time needed to download is
roughly 2-3 times that (although that will obviously depend on your
network connectivity).  Building this cache can be done in the
background, though, or delayed until the first time the cache is
needed.

						- Ted

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 18:44                                     ` Theodore Tso
@ 2008-09-12 20:56                                       ` Stephen R. van den Berg
  0 siblings, 0 replies; 137+ messages in thread
From: Stephen R. van den Berg @ 2008-09-12 20:56 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jeff King, git

Theodore Tso wrote:
>On Fri, Sep 12, 2008 at 06:43:48PM +0200, Stephen R. van den Berg wrote:
>> On my system that results in 2ms per commit on average.  Not huge, but
>> not small either, I guess.  Running it results in real waiting time, it
>> all depends on how patient the user is.

>For a local clone, git could be taught to copy the cache file.  For a
>network-based clone, the percentage of time needed to download is
>roughly 2-3 times that (although that will obviously depend on your
>network connectivity).  Building this cache can be done in the
>background, though, or delayed until the first time the cache is
>needed.

Fair enough.  If noone beats me to it, I'll probably take a stab at
implementing something like this and see how it fares for my own
application.
-- 
Sincerely,
           Stephen R. van den Berg.

"Father's Day: Nine months before Mother's Day."

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [RFC] origin link for cherry-pick and revert
  2008-09-12 15:54                               ` Stephen R. van den Berg
  2008-09-12 16:19                                 ` Jeff King
@ 2008-09-15 12:21                                 ` Sam Vilain
  1 sibling, 0 replies; 137+ messages in thread
From: Sam Vilain @ 2008-09-15 12:21 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Theodore Tso, Linus Torvalds, Jakub Narebski, Paolo Bonzini, git

On Fri, 2008-09-12 at 17:54 +0200, Stephen R. van den Berg wrote:
> Theodore Tso wrote:
> >Nope, as Sam suggested in his original message (but which got clipped
> >by Linus when he was replying) all you have to do is to have a
> >separate local database which ties commits and patch-id's together as
> >a cache/index.
> 
> True.  But repopulating this cache after cloning means that you have to
> calculate the patch-id of *every* commit in the repository.  It sounds
> like something to avoid, but maybe I'm overly concerned, I have only a
> vague idea on how computationally intensive this is.

You don't necessarily need to do that.  If the tool decides that the
sha1 it finds in the message is a patch-id reference, well it can just
start hunting around, caching the patch-ids it calculates as it finds
them, until it either finds one that matches, or determines you don't
have it.  You can probably find it first try just based on the author
name and date 90% of the time anyway.

Maybe the machinery could be adequately tilted such that if someone is
really desperate to make sure they are found quickly they can put the
information at refs/patches/PATCHID/COMMITID, but that sounds a bit
abusive.

Sam.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert)
  2008-09-09 21:13 ` Petr Baudis
  2008-09-09 22:56   ` Stephen R. van den Berg
  2008-09-10  8:45   ` Paolo Bonzini
@ 2008-09-23 13:51   ` Peter Krefting
  2 siblings, 0 replies; 137+ messages in thread
From: Peter Krefting @ 2008-09-23 13:51 UTC (permalink / raw)
  To: Git Mailing List

Petr Baudis:

> I think this is misguided. In general case, cherrypicks can be from
> completely unrelated histories, and if you are doing the cherry pick,
> you are saying that actually, the history *does not matter*.

As my workflow sometimes make me do cherry-picks where history does
matter, in form of me doing a "partial merge" of one or more than one
commit from branch A into branch B, which does not necessarily have to
be directly related,

 is there a way to perform something like that, while keeping history?

Perhaps I'm damaged by having used CVS for too long, and merging just
some files, or abusing CVS internals to make some files on branch A
also be part of branch B by having their branches point to the same RCS
branch revision number, but sometimes I find that I miss being able to
do it in Git.

Not really a big deal, just curious.

-- 
\\// Peter - http://www.softwolves.pp.se/

^ permalink raw reply	[flat|nested] 137+ messages in thread

end of thread, other threads:[~2008-09-23 13:52 UTC | newest]

Thread overview: 137+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-09 13:22 [RFC] origin link for cherry-pick and revert Stephen R. van den Berg
2008-09-09 13:38 ` Paolo Bonzini
2008-09-09 14:04   ` Stephen R. van den Berg
2008-09-09 13:48 ` Stephen R. van den Berg
2008-09-09 15:44 ` Jakub Narebski
2008-09-09 16:38   ` Steven Grimm
2008-09-09 19:43   ` Stephen R. van den Berg
2008-09-09 19:59     ` Jeff King
2008-09-09 20:25       ` Stephen R. van den Berg
2008-09-09 20:42       ` Junio C Hamano
2008-09-09 20:47         ` Shawn O. Pearce
2008-09-09 20:50         ` Jeff King
2008-09-09 22:35           ` Jakub Narebski
2008-09-09 23:07             ` Jakub Narebski
2008-09-10  8:10               ` Paolo Bonzini
2008-09-10  0:13         ` Stephen R. van den Berg
2008-09-10  1:59           ` Junio C Hamano
2008-09-10  5:38             ` Stephen R. van den Berg
2008-09-09 21:05       ` Junio C Hamano
2008-09-09 21:09         ` Jeff King
2008-09-09 23:36           ` Stephen R. van den Berg
2008-09-09 20:54     ` Jakub Narebski
2008-09-09 23:08       ` Stephen R. van den Berg
2008-09-09 23:35     ` Linus Torvalds
2008-09-09 23:58       ` Stephen R. van den Berg
2008-09-10  0:23         ` Linus Torvalds
2008-09-10  5:42           ` Stephen R. van den Berg
2008-09-10 15:30             ` Linus Torvalds
2008-09-10 23:09               ` Stephen R. van den Berg
2008-09-11  0:39                 ` Linus Torvalds
2008-09-11  6:22                   ` Stephen R. van den Berg
2008-09-11  8:20                     ` Jakub Narebski
2008-09-11 12:31                       ` Stephen R. van den Berg
2008-09-11 13:51                         ` Theodore Tso
2008-09-11 15:32                           ` Stephen R. van den Berg
2008-09-11 18:00                             ` Theodore Tso
2008-09-11 19:03                               ` Stephen R. van den Berg
2008-09-11 19:33                                 ` Nicolas Pitre
2008-09-11 19:44                                   ` Stephen R. van den Berg
2008-09-11 20:03                                     ` Nicolas Pitre
2008-09-11 20:24                                       ` Stephen R. van den Berg
2008-09-11 20:05                                     ` Jakub Narebski
2008-09-11 20:22                                       ` Stephen R. van den Berg
2008-09-12  0:30                                         ` A Large Angry SCM
2008-09-12  5:39                                           ` Stephen R. van den Berg
2008-09-11 20:04                                 ` Theodore Tso
2008-09-11 21:46                                   ` Jeff King
2008-09-11 22:56                                     ` Stephen R. van den Berg
2008-09-11 23:01                                       ` Jeff King
2008-09-11 23:17                                         ` Stephen R. van den Berg
2008-09-11 23:10                                     ` Linus Torvalds
2008-09-11 23:26                                       ` Jeff King
2008-09-11 23:36                                       ` Stephen R. van den Berg
2008-09-11 15:02                         ` Nicolas Pitre
2008-09-11 16:00                           ` Stephen R. van den Berg
2008-09-11 17:02                             ` Nicolas Pitre
2008-09-11 18:44                               ` Stephen R. van den Berg
2008-09-11 20:00                                 ` Nicolas Pitre
2008-09-11 21:05                               ` Junio C Hamano
2008-09-11 22:32                                 ` Stephen R. van den Berg
2008-09-11 22:40                                   ` Stephen R. van den Berg
2008-09-11 12:28                     ` A Large Angry SCM
2008-09-11 12:39                       ` Stephen R. van den Berg
2008-09-12  0:03                         ` A Large Angry SCM
2008-09-12  0:13                           ` Stephen R. van den Berg
2008-09-11 15:39                     ` Linus Torvalds
2008-09-11 16:01                       ` Paolo Bonzini
2008-09-11 16:23                         ` Linus Torvalds
2008-09-11 20:16                           ` Stephen R. van den Berg
2008-09-11 16:53                       ` Jakub Narebski
2008-09-11 19:23                       ` Stephen R. van den Berg
2008-09-11 19:45                         ` Nicolas Pitre
2008-09-11 19:55                           ` Stephen R. van den Berg
2008-09-11 20:27                             ` Nicolas Pitre
2008-09-12  8:50                               ` Stephen R. van den Berg
2008-09-11 21:01                             ` Theodore Tso
2008-09-12  8:40                               ` Stephen R. van den Berg
2008-09-10  8:30           ` Paolo Bonzini
2008-09-10 15:32             ` Linus Torvalds
2008-09-10 15:37               ` Paolo Bonzini
2008-09-10 15:43                 ` Linus Torvalds
2008-09-10 15:46                   ` Linus Torvalds
2008-09-10 15:57                     ` Paolo Bonzini
2008-09-10 23:15                       ` Stephen R. van den Berg
2008-09-10 16:23                     ` Jakub Narebski
2008-09-11 23:28                       ` Sam Vilain
2008-09-11 23:44                         ` Linus Torvalds
2008-09-12  2:24                           ` Sam Vilain
2008-09-12  5:47                           ` Stephen R. van den Berg
2008-09-12  6:19                             ` Rogan Dawes
2008-09-12  6:56                               ` Stephen R. van den Berg
2008-09-12 14:58                             ` Theodore Tso
2008-09-12 15:05                               ` Paolo Bonzini
2008-09-12 15:11                               ` Jakub Narebski
2008-09-12 15:40                                 ` Paolo Bonzini
2008-09-12 16:00                                   ` Theodore Tso
2008-09-12 15:54                               ` Stephen R. van den Berg
2008-09-12 16:19                                 ` Jeff King
2008-09-12 16:43                                   ` Stephen R. van den Berg
2008-09-12 18:44                                     ` Theodore Tso
2008-09-12 20:56                                       ` Stephen R. van den Berg
2008-09-15 12:21                                 ` Sam Vilain
2008-09-09 23:59       ` Jakub Narebski
2008-09-09 21:13 ` Petr Baudis
2008-09-09 22:56   ` Stephen R. van den Berg
2008-09-09 23:05     ` Petr Baudis
2008-09-09 23:32       ` Stephen R. van den Berg
2008-09-10  9:35       ` [RFC] origin link for cherry-pick and revert, and more about porcelain-level metadata Paolo Bonzini
2008-09-10 10:44         ` Petr Baudis
2008-09-10 11:49           ` Stephen R. van den Berg
2008-09-10 12:30             ` Petr Baudis
2008-09-10 13:14               ` Stephen R. van den Berg
2008-09-10 14:33         ` Dmitry Potapov
2008-09-10 15:15           ` Stephen R. van den Berg
2008-09-10 15:24             ` Paolo Bonzini
2008-09-10 12:21     ` [RFC] origin link for cherry-pick and revert Theodore Tso
2008-09-10 14:16       ` Stephen R. van den Berg
2008-09-10 15:10         ` Jeff King
2008-09-10 21:50           ` Stephen R. van den Berg
2008-09-10 21:54             ` Jeff King
2008-09-10 22:34               ` Stephen R. van den Berg
2008-09-10 22:55                 ` Jeff King
2008-09-10 23:19                   ` Stephen R. van den Berg
2008-09-11  5:16                     ` Paolo Bonzini
2008-09-11  7:55                       ` Stephen R. van den Berg
2008-09-11  8:45                         ` Paolo Bonzini
2008-09-11 12:33                         ` A Large Angry SCM
2008-09-11  2:46                   ` Nicolas Pitre
2008-09-10 16:18         ` Theodore Tso
2008-09-10 16:40         ` Petr Baudis
2008-09-10 17:58           ` Paolo Bonzini
2008-09-10 22:44             ` Stephen R. van den Berg
2008-09-10  8:45   ` Paolo Bonzini
2008-09-23 13:51   ` Recording "partial merges" (was: Re: [RFC] origin link for cherry-pick and revert) Peter Krefting
2008-09-10 20:32 ` [RFC] origin link for cherry-pick and revert Miklos Vajna
2008-09-10 20:55   ` Nicolas Pitre
2008-09-10 21:06     ` Miklos Vajna

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).