git.vger.kernel.org archive mirror
* [RFC] New commit object headers: generation and note headers
@ 2008-02-09 16:46 Jakub Narebski
  2008-02-09 17:35 ` Daniel Barkalow
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Narebski @ 2008-02-09 16:46 UTC (permalink / raw)
  To: git

As the new major git release 1.6.0 is close (BTW, I wonder if git will 
ever reach a 2.0.0 release...), I'd like to sum up here the ideas about 
extending the commit object with new headers, adding my own thoughts 
and comments. I think it would be better to have such a major feature 
introduced in a major release, and not in one where only the minor 
number changes. For some headers, the sooner they are introduced the better.


1. 'generation' header

In the "[BUG?] git log picks up bad commit" thread:
  http://permalink.gmane.org/gmane.comp.version-control.git/72274
(later "[RFH] revision limiting sometimes ignored") the idea of the 
'generation' header was resurrected. This header is meant to simplify 
removing uninteresting commits in the presence of clock skew, and to 
replace the various commit-time related heuristics.

The proposed solution (which was discussed at least once in the past on 
the git mailing list) is to use a "generation number" for this:
 1. For parentless (root) commits it equals 1 (or 0).
 2. For every other commit, it equals the maximum of the generation 
    numbers of its parents, plus 1.
Of course, to avoid recalculating it from the beginning each time, it 
must be saved somewhere. The best solution is to use a 'generation' 
header for that.
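
The two rules above can be sketched in a few lines. This is only an
illustration, not git code: the function name generation_numbers, the
root_value parameter and the toy history are made up here, and the
history is assumed to be a dict that maps every commit name to the list
of its parents (every parent must itself be a key).

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]],
                       root_value: int = 1) -> Dict[str, int]:
    """Return the generation number of every commit in an acyclic history.

    Roots get root_value; every other commit gets
    max(generation of its parents) + 1.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        # Explicit stack instead of recursion, so long histories
        # do not hit Python's recursion limit.
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be numbered before the commit itself.
                stack.extend(missing)
                continue
            ps = parents[commit]
            gen[commit] = root_value if not ps else max(gen[p] for p in ps) + 1
            stack.pop()
    return gen


# Toy history: A is the root, D merges B and C, E merges D and A.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D", "A"]}
print(generation_numbers(history))
```

Each commit is numbered once, so the cost is linear in the size of the
history, which is exactly the work one would like to avoid repeating.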

Unfortunately there is a complication: commits written before this 
header was introduced don't have a generation number handy. It was then 
proposed to use the generation number where it exists, to fall back to 
the old date based heuristic where it does not, and not to (re)calculate 
it; the idea is to avoid that cost.
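
A rough sketch of that fallback, under loud assumptions: the Commit
class and may_be_ancestor are hypothetical names, and the date test is
a deliberately simplified stand-in for git's real commit-time
heuristics. The only fact relied on is that a proper ancestor always
has a strictly smaller generation number than its descendant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    commit_time: int                  # committer timestamp, seconds since epoch
    generation: Optional[int] = None  # None: written before the header existed


def may_be_ancestor(candidate: Commit, descendant: Commit) -> bool:
    """Cheap pre-check; False means 'candidate' is ruled out as an ancestor."""
    if candidate.generation is not None and descendant.generation is not None:
        # Reliable: generation numbers strictly increase from parent to child.
        return candidate.generation < descendant.generation
    # Fallback: date heuristic, which can give wrong answers under clock skew.
    return candidate.commit_time <= descendant.commit_time


# Clock skew: 'old' claims a later timestamp than its descendant 'new'.
old = Commit("old", commit_time=100, generation=3)
new = Commit("new", commit_time=50, generation=5)
print(may_be_ancestor(old, new))  # generations decide, timestamps are ignored
```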

My comments:
============
The problem is twofold: when to calculate the generation header, and 
what to do with commits that lack it. We could require calculating the 
generation header when creating a commit (commit, amend, rebase, 
filter-branch), but this might mean that the first few commits after the 
'generation' header is introduced would be much slower.

As for older commits which lack the generation number header: perhaps 
some (pack)-index-like external storage/cache, where generation numbers 
would be saved as we generate them? And perhaps some command to generate 
generation numbers in advance, in idle time.

Note that keeping generation numbers outside the object database is 
more error prone (the cache can get out of sync), and they would not 
propagate.
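
To make the external storage idea concrete, here is a minimal sketch
of such a cache. Everything in it is an assumption for illustration:
the GenerationCache class, the JSON file format and the file name are
invented here, and a real implementation would more likely use a
binary, pack-index-like layout.

```python
import json
import os
from typing import Dict, Optional


class GenerationCache:
    """Hypothetical on-disk cache: commit name -> generation number."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.numbers: Dict[str, int] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.numbers = json.load(fh)

    def get(self, commit: str) -> Optional[int]:
        # None means "not computed yet"; the caller decides whether to
        # compute it now or fall back to the date heuristic.
        return self.numbers.get(commit)

    def put(self, commit: str, generation: int) -> None:
        self.numbers[commit] = generation

    def save(self) -> None:
        # Write to a temporary file and rename it, so an interrupted
        # save does not leave a truncated cache behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.numbers, fh)
        os.replace(tmp, self.path)
```

Such a file lives next to the object database rather than in it, which
is why it has to be kept in sync by hand and is not transferred along
with the commits.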

The question is whether to take grafts and shallows into account when 
computing generation numbers: if they are to be saved in the object 
database, then no. If they are saved in external pack-index-like 
storage, then perhaps.


2. 'note' header (no semantic meaning)

Some time ago there was a discussion about adding a 'note' header, 
initially to save the original sha-1 of a commit for cherry-picking and 
rebase; then for saving explicit rename or corrected rename info, for 
saving the chosen merge strategy, and for saving the original ID from an 
SCM import.

My comments:
============

Of all those, I think what makes the most sense is saving the foreign 
SCM ID for commits imported from another SCM. This way we do not have to 
parse the commit message (fragile and ugly, and it makes two-way 
exchange harder: no pristine commit message), or store the IDs 
externally (not propagated, prone to be lost).

Another would be to save rename and copy info when importing from 
another SCM which tracks renames rather than detecting code movement. 
This would allow (at least theoretically) for lossless import. When 
detecting renames, in the process of finding common merge base(s), we 
could check and take such information into account. It would be purely 
advisory...

-- 
Jakub Narebski
Poland


* Re: [RFC] New commit object headers: generation and note headers
  2008-02-09 16:46 [RFC] New commit object headers: generation and note headers Jakub Narebski
@ 2008-02-09 17:35 ` Daniel Barkalow
  2008-02-09 17:50   ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Barkalow @ 2008-02-09 17:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Sat, 9 Feb 2008, Jakub Narebski wrote:

> As new major git release 1.6.0 is close (BTW. I wonder if git would ever 
> reach/get 2.0.0 release...), I'd like to sum up here, adding my own 
> thoughts and comments, ideas about extending commit object by adding 
> new headers. I think it would be better to have such major feature 
> introduced in major release, and not with only minor number changed.
> For some headers the faster it is introduced the better.
> 
> 
> 1. 'generation' header
> 
> In the "[BUG?] git log picks up bad commit" thread:
>   http://permalink.gmane.org/gmane.comp.version-control.git/72274
> later "[RFH] revision limiting sometimes ignored" there was resurrected 
> idea of the 'generation' header. This header is meant to simplify 
> removing uninteresting commits in the presence of clock skew, to 
> replace various commit-time related heuristics.
> 
> The proposed solution (which was at least once discussed in the past on 
> git mailing list) is to use for this "generation number":
>  1. For parentless (root) commits it equals 1 (or 0)
>  2. For each commit, it equals maximum of generation numbers of parents,
>     plus 1.
> Of course to not to have to recalculate it from beginning it must be 
> saved somewhere. Best solution is to use 'generation' header for that.
> 
> Unfortunately there is complication that commits written before this 
> header introduced doesn't have generation number handy. It was proposed
> then to use generation number if possible, and fallback to old date 
> based heuristic if it does not exist, and do not (re)calculate it;
> the idea is to avoid such cost.
> 
> My comments:
> ============
> The problem is twofold: when to calculate generation header, and what to 
> do with commits that lacks it. We could require to calculate generation 
> header when creating a commit (commit, amend, rebase, filter-branch), 
> but this might mean that a few first commits after 'generation' header 
> is introduced would be much slower.

Surely, at least somebody has to do the slow commit that's the first one 
with a generation number. Maybe make it optional? If Linus calculates the 
generation number of some v2.6.x and only merges trees that have been 
rebased on it by people with sufficiently recent git, it should only take 
a lot of time once.

It's probably best to start by only including them if all parents have 
them, then having a cycle where figuring them out slowly is optional and 
defaults to off, and then make it default to on once projects are likely 
to have them in general, or at least would be likely to retain them once a 
few slow commits are done.
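
The first step of that rollout (write the header only when every parent
already has one) might look like the sketch below. The function name
and the use of None for "parent has no generation header" are
assumptions made for this illustration.

```python
from typing import List, Optional


def generation_for_new_commit(parent_generations: List[Optional[int]],
                              root_value: int = 1) -> Optional[int]:
    """Generation to record in a new commit, or None to omit the header.

    parent_generations holds one entry per parent; None marks a parent
    that was written without a generation header.
    """
    if not parent_generations:
        # A root commit has no parents, so nothing is missing.
        return root_value
    if any(g is None for g in parent_generations):
        # Conservative phase: never walk history to fill in the gap.
        return None
    return max(g for g in parent_generations if g is not None) + 1


print(generation_for_new_commit([3, 5]))     # both parents numbered
print(generation_for_new_commit([3, None]))  # one legacy parent: header omitted
```

The later phases would replace the "return None" branch with an
optional slow computation, off by default at first.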

> 2. 'note' header (no semantical meaning)
> 
> There was some time ago discussion about adding 'note' header, initially 
> to save original sha-1 of a commit for cherry-picking and rebase; then 
> for saving explicit rename or corrected rename info, for saving chosen 
> merge strategy, and for saving original ID of SCM import.

Probably want to have a prescribed syntax for specifying what note this 
is, so that different programs using notes don't confuse each other.

	-Daniel
*This .sig left intentionally blank*


* Re: [RFC] New commit object headers: generation and note headers
  2008-02-09 17:35 ` Daniel Barkalow
@ 2008-02-09 17:50   ` Nguyen Thai Ngoc Duy
  2008-02-09 21:03     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2008-02-09 17:50 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Jakub Narebski, git

On Sun, Feb 10, 2008 at 12:35 AM, Daniel Barkalow <barkalow@iabervon.org> wrote:
>
> On Sat, 9 Feb 2008, Jakub Narebski wrote:
>  > 2. 'note' header (no semantical meaning)
>  >
>  > There was some time ago discussion about adding 'note' header, initially
>  > to save original sha-1 of a commit for cherry-picking and rebase; then
>  > for saving explicit rename or corrected rename info, for saving chosen
>  > merge strategy, and for saving original ID of SCM import.
>
>  Probably want to have a prescribed syntax for specifying what note this
>  is, so that different programs using notes don't confuse each other.

How about git ignoring all X- headers and letting programs freely add
them? For example, X-SVN may be used for git-svn.

-- 
Duy


* Re: [RFC] New commit object headers: generation and note headers
  2008-02-09 17:50   ` Nguyen Thai Ngoc Duy
@ 2008-02-09 21:03     ` Junio C Hamano
  2008-02-09 23:26       ` [RFC] New commit object headers: " Jakub Narebski
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-02-09 21:03 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Daniel Barkalow, Jakub Narebski, git

"Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes:

> On Sun, Feb 10, 2008 at 12:35 AM, Daniel Barkalow <barkalow@iabervon.org> wrote:
>>
>> On Sat, 9 Feb 2008, Jakub Narebski wrote:
>>  > 2. 'note' header (no semantical meaning)
>>  >
>>  > There was some time ago discussion about adding 'note' header, initially
>>  > to save original sha-1 of a commit for cherry-picking and rebase; then
>>  > for saving explicit rename or corrected rename info, for saving chosen
>>  > merge strategy, and for saving original ID of SCM import.
>>
>>  Probably want to have a prescribed syntax for specifying what note this
>>  is, so that different programs using notes don't confuse each other.
>
> How about git ignoring all X- headers and let programs freely add
> them? For example, X-SVN may be used for git-svn.

Please don't.

When two people/programs create otherwise identical (for the
purpose of git) commits that have two different object names,
there'd better be a very good reason other than "I felt like
adding an extra header that I can use willy-nilly".
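
The effect on object names is easy to demonstrate, because an object
name is the SHA-1 of "<type> <size>\0" followed by the object's
contents. In the sketch below the author/committer lines, the commit
message and the "X-SVN" header value are made-up examples; the tree is
the well-known empty tree.

```python
import hashlib
from typing import Sequence


def git_object_name(obj_type: str, body: bytes) -> str:
    """SHA-1 object name: hash of '<type> <size>\\0' plus the contents."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(extra_headers: Sequence[str] = ()) -> bytes:
    """Build a toy commit object; identity and message are illustrative."""
    lines = [
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904",  # the empty tree
        "author A U Thor <author@example.com> 1202575560 +0000",
        "committer A U Thor <author@example.com> 1202575560 +0000",
    ]
    lines.extend(extra_headers)
    return ("\n".join(lines) + "\n\nImport revision 42\n").encode("utf-8")


plain = git_object_name("commit", commit_body())
annotated = git_object_name("commit", commit_body(["X-SVN r42"]))  # hypothetical header
print(plain)
print(annotated)
print(plain != annotated)
```

Same tree, same authorship and same message, yet the two commits get
different names as soon as one importer adds a header the other
does not.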

Please separate the 'note' part and the 'generation' part and
make two separate discussion threads.

And kill 'note' part altogether, but that can be done in that
thread ;-).


* Re: [RFC] New commit object headers: note headers
  2008-02-09 21:03     ` Junio C Hamano
@ 2008-02-09 23:26       ` Jakub Narebski
  2008-02-10  1:08         ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Narebski @ 2008-02-09 23:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, Daniel Barkalow, git

On Sat, 9 Feb 2008, Junio C Hamano wrote:
> "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> writes:
>> On Sun, Feb 10, 2008 at 12:35 AM, Daniel Barkalow <barkalow@iabervon.org> wrote:
>>> On Sat, 9 Feb 2008, Jakub Narebski wrote:
>>>>
>>>> 2. 'note' header (no semantical meaning)
>>>>
>>>> There was some time ago discussion about adding 'note' header, initially
>>>> to save original sha-1 of a commit for cherry-picking and rebase; then
>>>> for saving explicit rename or corrected rename info, for saving chosen
>>>> merge strategy, and for saving original ID of SCM import.
>>>
>>>  Probably want to have a prescribed syntax for specifying what note this
>>>  is, so that different programs using notes don't confuse each other.
>>
>> How about git ignoring all X- headers and let programs freely add
>> them? For example, X-SVN may be used for git-svn.
> 
> Please don't.
> 
> When two people/programs create an otherwise identical (for the
> purpose of git) commits that have two different object names,
> there'd better be a very good reason other than "I felt like
> adding an extra header that I can use willy-nilly".
> 
> Please separate the 'note' part and the 'generation' part and
> make two separate discussion threads.
> 
> And kill 'note' part altogether, but that can be done in that
> thread ;-).

Ah, well... a very good objection. So there goes the generic 'note'
(or 'X-*', similar to non-standardized email headers) header...

Still, I think it would be nice to have the original commit id in
a header when importing from a foreign SCM. First, it would not pollute
the commit message, which would be identical to the original commit
message (which allows easy two-way interaction). Second, it is
much easier and much less error prone to extract it by machine.

As to explicitly marking renames and copies...
-- 
Jakub Narebski
Poland


* Re: [RFC] New commit object headers: note headers
  2008-02-09 23:26       ` [RFC] New commit object headers: " Jakub Narebski
@ 2008-02-10  1:08         ` Johannes Schindelin
  2008-02-11 10:08           ` Jakub Narebski
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2008-02-10  1:08 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, Daniel Barkalow, git

Hi,

On Sun, 10 Feb 2008, Jakub Narebski wrote:

> Still I think it is would be nice to have original commit id in a header 
> when importing from foreign SCM. First, it would not pollute commit 
> message, which would be identical with the original commit message 
> (which allows easy two-way interaction). Second, it is much easier and 
> much less error prone to extract it by machine.

I cannot agree to either reason.  It is _not_ a git specific header, so it 
does not belong in the commit header.

Also, I find it does not clutter the commit message _at all_, but adds 
information that the user might find useful.

Lastly, I cannot see _any_ reason why it should be _easier_ or _less error 
prone_ to put an "original commit id" into the commit header than into the 
commit body.

Ciao,
Dscho


* Re: [RFC] New commit object headers: note headers
  2008-02-10  1:08         ` Johannes Schindelin
@ 2008-02-11 10:08           ` Jakub Narebski
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Narebski @ 2008-02-11 10:08 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, Daniel Barkalow, git

On Sunday, 10 February 2008, Johannes Schindelin wrote:
> On Sun, 10 Feb 2008, Jakub Narebski wrote:
> 
>> Still I think it is would be nice to have original commit id in a header 
>> when importing from foreign SCM. First, it would not pollute commit 
>> message, which would be identical with the original commit message 
>> (which allows easy two-way interaction). Second, it is much easier and 
>> much less error prone to extract it by machine.
> 
> I cannot agree to either reason.  It is _not_ a git specific header, so it 
> does not belong in the commit header.

Well, that, and the fact that the same commit imported using two
different tools, one using this header and one not, would result
in different commit objects... although if they differ in adding
the original revision id to the commit message, the commit objects
would differ too.

> Also, I find it does not clutter the commit message _at all_, but adds 
> information that the user might find useful.

Revision ids can be long, and together with the prefix introducing
the original SCM revision identifier they can be longer than the
customary 80 characters.

Besides, "git cherry-pick" was changed to _not_ add information
about the original commit id by default. Shouldn't this also apply
to imports?

> Lastly, I cannot see _any_ reason why it should be _easier_ or _less error 
> prone_ to put an "original commit id" into the commit header than into the 
> commit body.

Well, if commit message talks about foreign commit IDs...

-- 
Jakub Narebski
Poland

