* extra headers in commit objects
@ 2010-02-03 17:40 Shawn O. Pearce
2010-02-03 18:15 ` Nicolas Pitre
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 17:40 UTC (permalink / raw)
To: git
Am I correct that core C developers are still of the opinion
that extra headers in a commit object aren't encouraged?
That is, we shouldn't see something like this made-up example:
$ git cat-file commit HEAD
tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902
parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a
parent 89d61592bddda4dfcb90314be9e06479f712bb7f
author Junio C Hamano <gitster@pobox.com> 1265176189 -0800
committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800
bug 18389
url http://example.com/some/mailing/list/post
message-id <gitster-182819131@gitster.computer>
Merge git://repo.or.cz/git-gui into next
(Sorry Junio for picking on your latest next merge...)
Today I came across this "bug fix" [1,2] in Dulwich, which is
claiming to be a pure-Python implementation of Git.
[1] http://git.samba.org/?p=jelmer/dulwich.git;a=commit;h=bc8d73f1146afba8828a7dadbb4320f592cddcab
[2] http://git.samba.org/?p=jelmer/dulwich.git;a=commitdiff;h=bc8d73f1146afba8828a7dadbb4320f592cddcab;hp=4e50426fb72e6c9259feecbba5bfcf053af62335
I haven't spoken with Jelmer Vernooij directly about it, but after
some indirect email through a 3rd party, it seems he might be under
the impression that this really is a bug in Dulwich, because "other
git implementations do it".
Uhm.
I thought the canonical reference implementation was C Git
(aka git-core), as maintained by Junio Hamano, and the object
formats, core data structures, and network protocols were
fairly well documented between the Git Community Book and the
Documentation/technical/ directory.
The only other widely used Git implementation that I know of is JGit.
It sure as hell doesn't do this, and it sure as hell isn't what I
would call the reference implementation for Git... and that project
is my own baby.
Yes, there are many other Git implementations. But I thought nearly
all of them were toys, and none of them were even close to serving
the kind of production volume that JGit serves, and JGit isn't even
considered a production library by most. Yet JGit always tries to
conform to whatever standard is set by the C implementation.
Basically, aside from having a pretty horrible morning thus far,
and being in a really bad mood, I'm starting to get a bit worried
about the proliferation of Git implementations, and what the notion
of the standard network protocol and file formats is.
We're starting to see a fork in the basic protocols happen. Hell,
Dulwich 0.4.1 isn't even capable of speaking over the network to
C Git, but it does talk to itself, so it's valid, right? :-(
$ PYTHONPATH=`pwd` ./bin/dul-daemon . &
$ git clone git://localhost/.git
Initialized empty Git repository in /usr/local/google/users/sop/tmp/localhost/.git/
fetch-pack: protocol error: bad band #78
fatal: early EOF
fatal: index-pack failed
Fortunately a friend of mine is spending some time trying to patch
it up... trying to get it back in compliance with the C reference
implementation.
At the end of the day, is it a bug that C git doesn't support
working with extra commit headers? IMHO, no, because we've
rejected these in the past, and it's not part of the Git standard.
And other implementations shouldn't be trying to sell it that way.
</rather-pissed-off-rant>
--
Shawn.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: extra headers in commit objects
2010-02-03 17:40 extra headers in commit objects Shawn O. Pearce
@ 2010-02-03 18:15 ` Nicolas Pitre
2010-02-03 19:01 ` demerphq
2010-02-03 19:53 ` Sverre Rabbelier
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Nicolas Pitre @ 2010-02-03 18:15 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
> Am I correct that core C developers are still under the opinion
> that extra headers in a commit object aren't encouraged?
I would say so.
[...]
> At the end of the day, is it a bug that C git doesn't support
> working with extra commit headers? IMHO, no, because, we've
> rejected these in the past, and its not part of the Git standard.
> And other implementations shouldn't be trying to sell it that way.
Agreed. And this was discussed at great length on this list on a few
occasions already (probably more than a year back).
Nicolas
* Re: extra headers in commit objects
2010-02-03 18:15 ` Nicolas Pitre
@ 2010-02-03 19:01 ` demerphq
2010-02-03 19:26 ` Shawn O. Pearce
2010-02-03 19:26 ` Petr Baudis
0 siblings, 2 replies; 20+ messages in thread
From: demerphq @ 2010-02-03 19:01 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Shawn O. Pearce, git
On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
>
>> Am I correct that core C developers are still under the opinion
>> that extra headers in a commit object aren't encouraged?
>
> I would say so.
>
> [...]
>> At the end of the day, is it a bug that C git doesn't support
>> working with extra commit headers? IMHO, no, because, we've
>> rejected these in the past, and its not part of the Git standard.
>> And other implementations shouldn't be trying to sell it that way.
>
> Agreed. And this was discussed in great length on this list on few
> occasions already (probably more than a year back).
One problem is that if you take the approach you describe, then you
basically guarantee that a new git that DOES add new headers will
break an old git that doesn't know about the headers, and actually
doesn't care about them either.
So it would essentially mean that if you ever have to change the
commit format you will be in a position where new git commits will be
incompatible by design with old git commits.
Maybe I misunderstand, but this doesn't seem to accord with my reading
of the original design objectives and philosophy of git.
Shouldn't an old git just ignore headers from a new git?
I mean, forget about the fact that somebody is doing something naughty
with the git protocol, ask yourself if you want this rule to basically
prevent any backwards compatible changes with older gits.
As a lurker here I understand completely if you ignore this mail
entirely. But this seems to me to be a decision that could bite you
later.
cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
* Re: extra headers in commit objects
2010-02-03 19:01 ` demerphq
@ 2010-02-03 19:26 ` Shawn O. Pearce
2010-02-03 19:40 ` demerphq
` (2 more replies)
2010-02-03 19:26 ` Petr Baudis
1 sibling, 3 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 19:26 UTC (permalink / raw)
To: demerphq; +Cc: Nicolas Pitre, git
demerphq <demerphq@gmail.com> wrote:
> On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
> >
> >> Am I correct that core C developers are still under the opinion
> >> that extra headers in a commit object aren't encouraged?
> >
> > I would say so.
> >
> > [...]
> >> At the end of the day, is it a bug that C git doesn't support
> >> working with extra commit headers? IMHO, no, because, we've
> >> rejected these in the past, and its not part of the Git standard.
> >> And other implementations shouldn't be trying to sell it that way.
> >
> > Agreed. And this was discussed in great length on this list on few
> > occasions already (probably more than a year back).
>
> One problem, is that if you take the approach you say then you
> basically guarantee that a new git that DOES add new headers will
> break an old git that doesnt know about the headers, and actually
> doesnt care about them either.
As I understand it, the current stance is:
1) A compliant Git implementation ignores any headers it doesn't
recognize that appear *after* the optional "encoding" header.
2) A compliant Git implementation does not produce any additional
headers in a commit object, because other implementations cannot
perform any machine based reasoning on them.
3) All implementations would (eventually) treat all headers equally,
that is they all understand what author, committer, encoding are
and process them the same way. Any new headers should equally
be fully cross-implementation.
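Rule 1 can be pictured with a small reader that keeps the headers it
knows and sets aside the rest. This is only a sketch, not what C Git or
any other implementation does: the names `KNOWN_HEADERS` and
`parse_commit` are made up for illustration, every header is assumed to
fit on one line, and for brevity it tolerates unknown headers wherever
they appear rather than only after the known ones.

```python
# Illustrative only: header names taken from the thread, everything else assumed.
KNOWN_HEADERS = {"tree", "parent", "author", "committer", "encoding"}


def parse_commit(raw: str):
    """Split a raw commit into known headers, ignored extras, and the message."""
    # Headers end at the first blank line; the rest is the log message.
    head, _, message = raw.partition("\n\n")
    known, ignored = [], []
    for line in head.split("\n"):
        name, _, value = line.partition(" ")
        if name in KNOWN_HEADERS:
            known.append((name, value))
        else:
            # Unrecognized header: remember it, but never act on it.
            ignored.append((name, value))
    return known, ignored, message


raw = (
    "tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    "author A U Thor <author@example.com> 1265176189 -0800\n"
    "committer A U Thor <author@example.com> 1265176189 -0800\n"
    "bug 18389\n"
    "\n"
    "Merge git-gui into next\n"
)
known, ignored, message = parse_commit(raw)
print(ignored)
```

The reader never fails on "bug"; it simply has nothing useful to do with
it, which is the point of rule 2.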
> So it would essentially mean that if you ever have to change the
> commit format you will be in a position where new git commits will be
> incompatible by design with old git commits.
So, we can change the format by adding a new header, after the
optional "encoding" header.
But such a change needs to be something that an older Git will
safely ignore (due to rule 1), and something that a newer Git can
make really effective use of (due to rule 2 and 3). And that newer
Git must also safely deal with commits missing that new header, due
to the huge number of commits out in the wild without said header.
And don't even get me started on amending commits with new unknown
headers. Existing implementations of Git tools will drop the extra
headers during the amend, because the headers are viewed as part
of the commit object data... and during an amend you are making a
totally new object.
For example, git-gui would drop any extra headers during an amend,
because it's running `git commit-tree` directly without any way to
tell commit-tree this is for an amend of an existing commit, vs. a
completely new commit... because either way it's a new commit object.
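To make the amend problem concrete, here is a rough model of a tool that
builds a commit from a fixed set of inputs. It is a sketch under stated
assumptions, not git-gui's or commit-tree's actual code: `build_commit`
and `amend` are hypothetical names, and the builder only accepts a tree,
parents, author, committer and message, so there is simply no slot
through which an extra header could travel.

```python
def build_commit(tree, parents, author, committer, message):
    """Hypothetical builder: assembles a commit from the standard fields only."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    return "\n".join(lines) + "\n\n" + message


def amend(raw, new_message):
    """Hypothetical amend: read the old commit, then build a brand new one."""
    head, _, _ = raw.partition("\n\n")
    # (name, value) pairs for every header line of the old commit.
    fields = [line.partition(" ")[::2] for line in head.split("\n")]

    def values(name):
        return [v for n, v in fields if n == name]

    # Only the fields the builder knows about are carried over.
    return build_commit(
        values("tree")[0],
        values("parent"),
        values("author")[0],
        values("committer")[0],
        new_message,
    )


old = (
    "tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    "parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a\n"
    "author A U Thor <author@example.com> 1265176189 -0800\n"
    "committer A U Thor <author@example.com> 1265176189 -0800\n"
    "bug 18389\n"
    "\n"
    "Old message\n"
)
print(amend(old, "New message\n"))
```

The "bug" line is absent from the result even though nothing in the
sketch deliberately deletes it.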
> Shouldn't an old git just ignore headers from a new git?
Yes, see above.
--
Shawn.
* Re: extra headers in commit objects
2010-02-03 19:01 ` demerphq
2010-02-03 19:26 ` Shawn O. Pearce
@ 2010-02-03 19:26 ` Petr Baudis
2010-02-03 19:43 ` demerphq
2010-02-03 20:03 ` Nicolas Pitre
1 sibling, 2 replies; 20+ messages in thread
From: Petr Baudis @ 2010-02-03 19:26 UTC (permalink / raw)
To: demerphq; +Cc: Nicolas Pitre, Shawn O. Pearce, git
On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
> Shouldn't an old git just ignore headers from a new git?
>
> I mean, forget about the fact that somebody is doing something naughty
> with the git protocol, ask youself if you want this rule to basically
> prevent any backwards compatible changes with older gits.
We have done similar changes in the past and if there were such
a change, we could phase it in over the course of several releases.
I think the fall-out would not be that bad; we have some experience
with even making Debian-stable Git compatible with new stuff. ;-)
Also, what if an extra header were essential and we _wanted_
incompatible Git versions to break on it?
On the other hand, allowing this preventively would apparently have
the immediate effect of users of alternative implementations happily
starting to use it, and then to get to the data, people would demand
git-core support as well. _And_ so far everyone seems really really
fairly sure we don't want the headers and it's not likely to change.
P.S.: On the other hand, I think that change was probably just
misguided, not malicious. And I wouldn't be that hard on Dulwich,
it's early 0.x software after all, it's allowed to crash and have
protocol issues. ;-)
--
Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in
your nerd card. -- Dunbal (464142)
* Re: extra headers in commit objects
2010-02-03 19:26 ` Shawn O. Pearce
@ 2010-02-03 19:40 ` demerphq
2010-02-03 20:42 ` Junio C Hamano
2010-02-04 0:41 ` A Large Angry SCM
2 siblings, 0 replies; 20+ messages in thread
From: demerphq @ 2010-02-03 19:40 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Nicolas Pitre, git
On 3 February 2010 20:26, Shawn O. Pearce <spearce@spearce.org> wrote:
> demerphq <demerphq@gmail.com> wrote:
>> On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
>> > On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
>> >
>> >> Am I correct that core C developers are still under the opinion
>> >> that extra headers in a commit object aren't encouraged?
>> >
>> > I would say so.
>> >
>> > [...]
>> >> At the end of the day, is it a bug that C git doesn't support
>> >> working with extra commit headers? IMHO, no, because, we've
>> >> rejected these in the past, and its not part of the Git standard.
>> >> And other implementations shouldn't be trying to sell it that way.
>> >
>> > Agreed. And this was discussed in great length on this list on few
>> > occasions already (probably more than a year back).
>>
>> One problem, is that if you take the approach you say then you
>> basically guarantee that a new git that DOES add new headers will
>> break an old git that doesnt know about the headers, and actually
>> doesnt care about them either.
>
> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't
> recognize that appear *after* the optional "encoding" header.
Ignores but passes through?
> 2) A compliant Git implementation does not produce any additional
> headers in a commit object, because other implementations cannot
> perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally,
> that is they all understand what author, committer, encoding are
> and process them the same way. Any new headers should equally
> be fully cross-implementation.
>
>> So it would essentially mean that if you ever have to change the
>> commit format you will be in a position where new git commits will be
>> incompatible by design with old git commits.
>
> So, we can change the format by adding a new header, after the
> optional "encoding" header.
>
> But such a change needs to be something that an older Git will
> safely ignore (due to rule 1), and something that a newer Git can
> make really effective use of (due to rule 2 and 3). And that newer
> Git must also safely deal with commits missing that new header, due
> to the huge number of commits out in the wild without said header.
>
> And don't even get me started on amending commits with new unknown
> headers. Existing implementions of Git tools will drop the extra
> headers during the amend, because the headers are viewed as part
> of the commit object data... and during an amend you are making a
> totally new object.
>
> For example, git-gui would drop any extra headers during an amend,
> because its running `git commit-tree` directly without any way to
> tell commit-tree this is for an amend of an existing commit, vs. a
> completely new commit... because either way its a new commit object.
>
>> Shouldn't an old git just ignore headers from a new git?
>
> Yes, see above.
Right, which seems to sum up to "that boat sailed, forget about
it", which is fair enough.
Which I say from the point of view of arbitrary headers not approved
by the git dev team. You can ensure that any new *approved* headers
have the semantics that "if they aren't passed through it doesn't
matter", whereas you can't know whether a header that comes from
some other source should be passed through or not.
Well, unless you introduced a convention that some header prefix is to
be preserved on amend, but other prefixes shouldn't be.
I can imagine that might be a nasty place to go though. :-)
Anyway, thanks a lot for taking the time to explain this a bit more.
cheers,
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
* Re: extra headers in commit objects
2010-02-03 19:26 ` Petr Baudis
@ 2010-02-03 19:43 ` demerphq
2010-02-03 20:31 ` Shawn O. Pearce
2010-02-03 20:03 ` Nicolas Pitre
1 sibling, 1 reply; 20+ messages in thread
From: demerphq @ 2010-02-03 19:43 UTC (permalink / raw)
To: Petr Baudis; +Cc: Nicolas Pitre, Shawn O. Pearce, git
On 3 February 2010 20:26, Petr Baudis <pasky@suse.cz> wrote:
> On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
>> Shouldn't an old git just ignore headers from a new git?
>>
>> I mean, forget about the fact that somebody is doing something naughty
>> with the git protocol, ask youself if you want this rule to basically
>> prevent any backwards compatible changes with older gits.
>
> We have done similar changes in the past and if there would be such
> a change, we can phase-in it over the course of several releases.
> I think the fall-out would not be that bad; we have some experience
> with even making Debian-stable Git compatible with new stuff. ;-)
> Also, what if any extra header would be essential and we _wanted_
> non-compatible Git to break down on it?
Right. The only solution I can see would have had to have been
implemented already. And that would involve some headers being marked
"pass through", some marked "throw away on cherry-pick" and some
"choke horribly if you find this and don't know what it is".
And even with something like that one wonders if notes aren't really a
better alternative to user defined headers anyway?
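For what it's worth, the marking scheme described above might look
something like the following. To be clear, no such convention exists in
Git; the prefixes "keep-", "drop-" and "must-" and the function
`rewrite_extras` are invented purely to show why the marking would have
to live in the header name itself, where an older tool can see it.

```python
def rewrite_extras(extras, understood):
    """Decide which extra headers survive an amend or cherry-pick.

    extras:     list of (name, value) pairs found in the old commit
    understood: set of header names this (hypothetical) tool knows about
    """
    kept = []
    for name, value in extras:
        if name.startswith("must-") and name not in understood:
            # "choke horribly if you find this and don't know what it is"
            raise ValueError(f"unknown critical header: {name}")
        if name.startswith(("keep-", "must-")):
            # "pass through"
            kept.append((name, value))
        # Anything else, including "drop-" headers, is thrown away.
    return kept


extras = [("keep-origin", "hg:abc123"), ("drop-bug", "18389")]
print(rewrite_extras(extras, understood=set()))
```

Even in this toy form, every tool that rewrites commits would have to
agree on the prefixes up front, which is why it would have had to be
implemented already.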
> On the other hand, allowing this preventively would apparently have
> the immediate effect of alternative implementations users happily
> starting to use it, and then to get to the data, people would demand
> git-core support as well. _And_ so far everyone seems really really
> fairly sure we don't want the headers and it's not likely to change.
Yes, right, understood.
>
> P.S.: On the other hand, I think that change was probably just
> misguided, not malicious. And I wouldn't be that hard on Dulwich,
> it's an early-0.x software after all, it's allowed to crash and have
> protocol issues. ;-)
Heh. I have no opinion on Dulwich. Didn't even know it existed until this mail.
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"
* Re: extra headers in commit objects
2010-02-03 17:40 extra headers in commit objects Shawn O. Pearce
2010-02-03 18:15 ` Nicolas Pitre
@ 2010-02-03 19:53 ` Sverre Rabbelier
2010-02-03 19:58 ` Scott Chacon
2010-02-03 20:58 ` Jelmer Vernooij
3 siblings, 0 replies; 20+ messages in thread
From: Sverre Rabbelier @ 2010-02-03 19:53 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git, Jelmer Vernooij, Jelmer Vernooij
Heya,
[+cc Jelmer]
On Wed, Feb 3, 2010 at 18:40, Shawn O. Pearce <spearce@spearce.org> wrote:
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".
That would seem like the #1 thing to do. I'm sure Jelmer (cc-ed) can
both benefit from this discussion, and perhaps explain what is going
on first hand. Full thread as it's developing can be found here
[0]. Jelmer, you can just reply to this, no need to subscribe or such.
Also, it's customary on the git list to cc all involved, so you should be
in on the conversation for any emails that are a reply to mine.
[0] http://thread.gmane.org/gmane.comp.version-control.git/138848
--
Cheers,
Sverre Rabbelier
* Re: extra headers in commit objects
2010-02-03 17:40 extra headers in commit objects Shawn O. Pearce
2010-02-03 18:15 ` Nicolas Pitre
2010-02-03 19:53 ` Sverre Rabbelier
@ 2010-02-03 19:58 ` Scott Chacon
2010-02-03 22:48 ` Shawn O. Pearce
2010-02-03 20:58 ` Jelmer Vernooij
3 siblings, 1 reply; 20+ messages in thread
From: Scott Chacon @ 2010-02-03 19:58 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
Hey,
On Wed, Feb 3, 2010 at 9:40 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Today I came across this "bug fix" [1,2] in Dulwich, which is
> claiming to be a pure-Python implementation of Git.
>
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".
At the risk of pissing you off for the second time in as many days,
this is entirely my fault. I was having a beer with Jelmer in
Wellington a few weeks ago during LinuxConf.au and we were talking
about the difficulties in storing metadata having to do with cross-vcs
migrations - specifically his work with a bzr-git bridge and mine
with the hg-git project. He was noting that I kept all my metadata
about original Hg commits in Git as formatted text in the commit
message, which is pretty uggo (especially with the amount of sometimes
inconsistent denormalization of data Hg does on commit, explicitly
recording renames and manifests and whatnot).
Anyhow, I was saying that _technically_ you can artificially write
extra headers into the commit object (though at the time Dulwich
didn't support reading them because of how it parsed commit objects -
I believe it would actually explode if it saw something it didn't
expect). I said I was still going to keep the metadata in my
implementation in the message, but he was very interested in hiding
his in the commit headers. In my defense, we (you and I, Shawn)
talked about this at the GitTogether this year and you and a few
others told me that CGit would not blow up but would just ignore them,
which is fine for his purposes. I certainly did not get the
impression from that short discussion that this was something to be
absolutely avoided, but rather that it just wasn't really encouraged
or explicitly supported.
Oddly enough, this whole thing basically came up because we were
noting that you can hide extra data in Hg changesets, but it's a
ridiculous hack involving adding it after a null byte in the timestamp
field, much like we do in adding the capabilities after the first ref
in the negotiation phase of the transfer protocol. I was just casually
saying, "yeah, you can actually technically do that a lot cleaner in
Git"...
Sorry. So, for future reference, though CGit _can_ handle it, don't?
thanks,
Scott
* Re: extra headers in commit objects
2010-02-03 19:26 ` Petr Baudis
2010-02-03 19:43 ` demerphq
@ 2010-02-03 20:03 ` Nicolas Pitre
1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2010-02-03 20:03 UTC (permalink / raw)
To: Petr Baudis; +Cc: demerphq, Shawn O. Pearce, git
On Wed, 3 Feb 2010, Petr Baudis wrote:
> On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
> > Shouldn't an old git just ignore headers from a new git?
> >
> > I mean, forget about the fact that somebody is doing something naughty
> > with the git protocol, ask youself if you want this rule to basically
> > prevent any backwards compatible changes with older gits.
>
> We have done similar changes in the past and if there would be such
> a change, we can phase-in it over the course of several releases.
> I think the fall-out would not be that bad; we have some experience
> with even making Debian-stable Git compatible with new stuff. ;-)
Heh... That's because I was crazy enough to do that work so the new
features I implemented in the latest version could be enabled by default
sooner. And incidentally those features weren't controversial at all
which sorta helped.
Nicolas
* Re: extra headers in commit objects
2010-02-03 19:43 ` demerphq
@ 2010-02-03 20:31 ` Shawn O. Pearce
0 siblings, 0 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 20:31 UTC (permalink / raw)
To: demerphq; +Cc: Petr Baudis, Nicolas Pitre, git
demerphq <demerphq@gmail.com> wrote:
> On 3 February 2010 20:26, Petr Baudis <pasky@suse.cz> wrote:
> Right. The only solution i can see would have had to have been
> implemented already. And that would involved some headers being marked
> "pass through", some "marked throw away on cherry-pick" and some
> "choke horribly if you find this and dont know what it is".
>
> And even with somethng like that one wonders if notes arent really a
> better alternative to user defined headers anyway?
Yes, exactly.
I think notes turn out to be a much better way to store this extra
data, provided you are OK with them being disconnected during an
amend, cherry-pick, filter-branch, or rebase... :-)
And unlike additional headers, git implementations will likely
support notes, because they are a good way to attach additional
user data onto commits.
--
Shawn.
* Re: extra headers in commit objects
2010-02-03 19:26 ` Shawn O. Pearce
2010-02-03 19:40 ` demerphq
@ 2010-02-03 20:42 ` Junio C Hamano
2010-02-03 21:04 ` Shawn O. Pearce
2010-02-04 0:41 ` A Large Angry SCM
2 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2010-02-03 20:42 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git
"Shawn O. Pearce" <spearce@spearce.org> writes:
> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't
> recognize that appear *after* the optional "encoding" header.
I first read the above to mean that you need to add encoding if you want
to throw in other garbage.
I would say "*after* the mandatory 'tree', 'parent' (0 or more), 'author',
and 'committer' headers that must appear in this order", for clarity.
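Spelled out as code, that ordering could be checked roughly like this.
It is a sketch rather than what C Git's parser does; `first_extra_index`
is a made-up name, and the optional "encoding" slot is included because
it is the one extension mentioned in rule 1 above.

```python
def first_extra_index(header_names):
    """Return the index at which extra headers may begin.

    Raises ValueError if the mandatory headers are missing or out of order.
    """
    n = len(header_names)
    i = 0
    # Exactly one 'tree' must come first.
    if i >= n or header_names[i] != "tree":
        raise ValueError("commit must start with 'tree'")
    i += 1
    # Zero or more 'parent' headers follow.
    while i < n and header_names[i] == "parent":
        i += 1
    # Then 'author' and 'committer', in that order.
    for required in ("author", "committer"):
        if i >= n or header_names[i] != required:
            raise ValueError(f"expected '{required}' at position {i}")
        i += 1
    # 'encoding' is optional; if present it comes right after 'committer'.
    if i < n and header_names[i] == "encoding":
        i += 1
    return i


names = ["tree", "parent", "parent", "author", "committer", "bug", "url"]
print(first_extra_index(names))  # 5: 'bug' and 'url' are the extras
```

Anything from the returned index onward is what a reader would skip
without trying to interpret.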
> 2) A compliant Git implementation does not produce any additional
> headers in a commit object, because other implementations cannot
> perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally,
> that is they all understand what author, committer, encoding are
> and process them the same way. Any new headers should equally
> be fully cross-implementation.
These are very important points.
In your made-up example you added "bug" (presumably to mean "fixes this
bug") and "message-id" ("am-ed from this message"). The latter might make
sense, but the former does not belong in the header, as it is not a
statement of fact.
Forcing people to say "this fixes" at the commit time means you do not
allow mistakes---it may turn out to be an incorrect or non fix later.
When you are amending the commit to say "this does not really fix it", you
would want to lose the old "bug" header, but you would want to keep the
"message-id" one. There simply is not enough hint as to which ones must
be carried across amending in the "we allow people to randomly throw extra
headers into the commit object" model. It is not a model--it is chaos.
Also it wouldn't be obvious to other people what got changed while
comparing two commits (before and after the amend) if the information is
hidden in the header. The right place for that kind of information is in
the log message (if the nature of the information is for everybody to see)
or in notes.
Another major difference between extra random headers and notes is that
the former changes the commit's object name, and if it is due to "random
headers", it means you are breaking the object model for no good reason.
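The effect on the object name is easy to see by hashing two otherwise
identical commits. A commit's name in a SHA-1 repository is the SHA-1 of
the string "commit", a space, the body size in bytes, a NUL byte, and
then the body; the commit contents below are made up for illustration
and `object_name` is just a helper name chosen for this sketch.

```python
import hashlib


def object_name(body: bytes) -> str:
    """Name of a commit object: SHA-1 over 'commit <size>', a NUL byte, and the body."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


plain = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"author A U Thor <author@example.com> 1265176189 -0800\n"
    b"committer A U Thor <author@example.com> 1265176189 -0800\n"
    b"\n"
    b"Fix the frobnicator\n"
)
# Same commit, with one extra header added before the blank line.
with_extra = plain.replace(b"\n\nFix", b"\nbug 18389\n\nFix")

print(object_name(plain))
print(object_name(with_extra))
```

The two names differ, so every descendant of the commit would get a new
name as well. A note attached to the commit lives outside the commit
object and leaves its name alone.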
Introducing extra headers needs to be done _very_ carefully after thinking
things through, judging the pros and cons. Even though we kept the format
open to allow us to extend the format to add essential statements of fact
that we can make at the commit time (e.g. "encoding"), I do not foresee us
adding any official extra headers in the near future.
* Re: extra headers in commit objects
2010-02-03 17:40 extra headers in commit objects Shawn O. Pearce
` (2 preceding siblings ...)
2010-02-03 19:58 ` Scott Chacon
@ 2010-02-03 20:58 ` Jelmer Vernooij
2010-02-03 21:17 ` Nicolas Pitre
2010-02-03 22:39 ` Shawn O. Pearce
3 siblings, 2 replies; 20+ messages in thread
From: Jelmer Vernooij @ 2010-02-03 20:58 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
Hi Shawn,
On Wed, 2010-02-03 at 09:40 -0800, Shawn O. Pearce wrote:
> Am I correct that core C developers are still under the opinion
> that extra headers in a commit object aren't encouraged?
>
> That is, we shouldn't see something like this made-up example:
>
> $ git cat-file commit HEAD
> tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902
> parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a
> parent 89d61592bddda4dfcb90314be9e06479f712bb7f
> author Junio C Hamano <gitster@pobox.com> 1265176189 -0800
> committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800
> bug 18389
> url http://example.com/some/mailing/list/post
> message-id <gitster-182819131@gitster.computer>
>
> Merge git://repo.or.cz/git-gui into next
>
> (Sorry Junio for picking on your latest next merge...)
> Today I came across this "bug fix" [1,2] in Dulwich, which is
> claiming to be a pure-Python implementation of Git.
>
> [1] http://git.samba.org/?p=jelmer/dulwich.git;a=commit;h=bc8d73f1146afba8828a7dadbb4320f592cddcab
> [2] http://git.samba.org/?p=jelmer/dulwich.git;a=commitdiff;h=bc8d73f1146afba8828a7dadbb4320f592cddcab;hp=4e50426fb72e6c9259feecbba5bfcf053af62335
>
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".
If you have concerns like this in the future, please don't hesitate to
contact me directly. I don't follow the git list because it's a
high-volume list where pretty much all traffic is irrelevant to me. The
only reason I became aware of this thread was because Sverre CC'ed me.
> Uhm.
Originally I was under the impression that custom headers would break
(by reading the C Git source code) and so Dulwich made that assumption,
but after hearing from several people (among whom Scott, see his reply)
at Linux.Conf.Au that custom headers could be added and were ignored by
C git I made this change.
Since Dulwich would blow up when it encountered custom headers that
might be set by other Git implementations and since (as I understand) C git
ignores unknown headers, I called this a bug fix. This change made it
possible to deal with custom headers whenever they would appear *and*
allowed users of the Dulwich API to set custom headers.
(FWIW I haven't actually seen anybody setting custom headers)
If this is indeed a misunderstanding, I'll happily make this
data structure with custom headers read-only.
[...]
> Yes, there are many other Git implementations. But I thought nearly
> all of them were toys, and none of them were even close to serving
> the kind of production volume that JGit serves, and JGit isn't even
> considered a production library by most. Yet JGit always tries to
> conform to whatever standard is set by the C implementation.
So does Dulwich. I've fixed issues in the compatibility with C Git when
I've noticed them or have been made aware of them. Any incompatibilities
are the result of ignorance on my part rather than malicious intent.
[...]
> We're starting to see a fork in the basic protocols happen. Hell,
> Dulwich 0.4.1 isn't even capable of speaking over the network to
> C Git, but it does talk to itself, so its valid, right? :-(
I've been using Dulwich's client to talk to C Git servers for ages and
haven't seen issues. I would appreciate hearing about
incompatibilities.
If you're talking about the server side - we know it's broken, at least
dul-daemon. Nobody (except for API changes) has really cared about it
since John Carr originally hacked it up. I'd be surprised if it even
works with the Dulwich client.
Cheers,
Jelmer
* Re: extra headers in commit objects
2010-02-03 20:42 ` Junio C Hamano
@ 2010-02-03 21:04 ` Shawn O. Pearce
2010-02-04 0:38 ` Junio C Hamano
0 siblings, 1 reply; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 21:04 UTC (permalink / raw)
To: Junio C Hamano; +Cc: demerphq, Nicolas Pitre, git
Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
> > As I understand it, the current stance is:
> >
> > 1) A compliant Git implementation ignores any headers it doesn't
> > recognize that appear *after* the optional "encoding" header.
>
> I first read the above to mean that you need to add encoding if you want
> to throw in other garbage.
>
> I would say "*after* the mandatory 'tree', 'parent' (0 or more), 'author',
> and 'committer' headers that must appear in this order", for clarity.
Yes, sorry, of course that is what I meant. Thanks for the
clarification.
To add to that, "after encoding, if encoding is present".
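The reading side of that rule can be sketched in Python. This is only an illustration under stated assumptions, not how C Git or Dulwich actually parse commits: `parse_commit` and its return shape are hypothetical, the sample is a trimmed copy of the made-up commit from the start of the thread, and the sketch does not check that the mandatory headers appear in the required order.

```python
def parse_commit(raw: bytes):
    """Split a raw commit body into known headers, ignored extras, and message."""
    # Headers end at the first blank line; everything after is the message.
    header_block, _, message = raw.partition(b"\n\n")
    known = {"parent": []}
    ignored = []
    for line in header_block.split(b"\n"):
        if line.startswith(b" "):
            # Continuation of a multi-line header value; not needed here.
            continue
        name, _, value = line.partition(b" ")
        key = name.decode("ascii", "replace")
        if key == "parent":
            known["parent"].append(value.decode("ascii", "replace"))
        elif key in ("tree", "author", "committer", "encoding"):
            known[key] = value.decode("utf-8", "replace")
        else:
            # Unknown header: remember its name, but do not fail on it.
            ignored.append(key)
    return known, ignored, message


sample = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a\n"
    b"parent 89d61592bddda4dfcb90314be9e06479f712bb7f\n"
    b"author Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    b"committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    b"bug 18389\n"
    b"\n"
    b"Merge git://repo.or.cz/git-gui into next\n"
)

known, ignored, message = parse_commit(sample)
print(ignored)
```

The point of the sketch is only that an unrecognized header such as "bug" is skipped rather than treated as a parse error.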
> > 2) A compliant Git implementation does not produce any additional
> > headers in a commit object, because other implementations cannot
> > perform any machine based reasoning on them.
> >
> > 3) All implementations would (eventually) treat all headers equally,
> > that is they all understand what author, committer, encoding are
> > and process them the same way. Any new headers should equally
> > be fully cross-implementation.
>
> These are very important points.
>
> In your made-up example you added "bug" (presumably to mean "fixes this
> bug") and "message-id" ("am-ed from this message"). The latter might make
> sense, but the former does not belong to the header, as it is not a
> statement of the fact.
This all came out of what appears to be a tool to bridge another
VCS's data into Git, à la git-svn.
We all know that some other systems, e.g. SVN, permit adding
additional properties to commits, and that often these are used
to make statements like "Fixed bug NNNN", and bug tracking systems
integrate into SVN by reading or updating those properties.
So you, Nico, myself, might all agree that "bug" does not belong
in the header, but many others see it like SVN sees additional
properties on a revision, and thus it goes there.
Hence the artificial example. It seems that it is not that artificial
outside of our mailing list.
> Forcing people to say "this fixes" at the commit time means you do not
> allow mistakes---it may turn out to be an incorrect or non-fix later.
Yup, happens often.
> When you are amending the commit to say "this does not really fix it", you
> would want to lose the old "bug" header, but you would want to keep the
> "message-id" one. There simply is not enough hint as to which ones must
> be carried across amending in the "we allow people to randomly throw extra
> headers into the commit object" model. It is not a model--it is chaos.
Exactly. That's what I had thought our position was, for exactly
this reason: it very quickly devolves into a chaos we can't reason
about, let alone write code to support for end-users.
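To make the amend problem concrete, here is a small Python sketch of a tool that rebuilds a commit from only the fields it understands. It is a simplified model, not the behavior of any real Git command: `amend_message` is a hypothetical helper, and a real amend would also refresh the committer line.

```python
KNOWN = (b"tree", b"parent", b"author", b"committer", b"encoding")


def amend_message(raw: bytes, new_message: bytes) -> bytes:
    """Build a new commit body, carrying over only the headers in KNOWN."""
    header_block, _, _old_message = raw.partition(b"\n\n")
    kept = []
    for line in header_block.split(b"\n"):
        name = line.split(b" ", 1)[0]
        if name in KNOWN:
            kept.append(line)
        # Anything else is silently lost: the tool has no rule telling it
        # whether an unknown header should survive the amend.
    return b"\n".join(kept) + b"\n\n" + new_message


original = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"author A U Thor <author@example.com> 1265176189 -0800\n"
    b"committer A U Thor <author@example.com> 1265176189 -0800\n"
    b"bug 18389\n"
    b"\n"
    b"Fix the frobnicator\n"
)

amended = amend_message(original, b"Fix the frobnicator, really\n")
print(amended.decode())
```

In this model the "bug" header disappears from the amended commit, which is the outcome described above for tools that construct a fresh commit object.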
> Also it wouldn't be obvious to other people what got changed while
> comparing two commits (before and after the amend) if the information is
> hidden in the header. The right place for that kind of information is in
> the log message (if the nature of the information is for everybody to see)
> or in notes.
I'm afraid users might insert their own headers, then come report
the bug that `git log` and `git show` don't make those headers
visible when formatting the commit. After all, they show the author,
committer, and parent information when you use the right flags.
We'll of course say it's not in the message, and suggest using the
footer style like our Signed-off-by lines, or notes, which appear
below the message if requested.
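As an illustration of the footer style, the following Python sketch reads "Key: value" lines from the last paragraph of a commit message. It is a minimal sketch under assumptions: `parse_trailers`, the regular expression, and the "Bug" key are hypothetical and are not the rules C Git itself applies to footers.

```python
import re

# Hypothetical footer shape: a token, a colon, a space, then the value.
TRAILER = re.compile(r"^([A-Za-z][A-Za-z0-9-]*): (.+)$")


def parse_trailers(message: str):
    """Return (key, value) pairs from the final paragraph, or [] if none."""
    paragraphs = message.rstrip("\n").split("\n\n")
    if len(paragraphs) < 2:
        return []  # subject only, so there is no footer block
    trailers = []
    for line in paragraphs[-1].split("\n"):
        match = TRAILER.match(line)
        if match is None:
            return []  # last paragraph is prose, not a footer block
        trailers.append((match.group(1), match.group(2)))
    return trailers


message = (
    "Fix crash on empty input\n"
    "\n"
    "Longer explanation of the change.\n"
    "\n"
    "Bug: 18389\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
)

print(parse_trailers(message))
```

Because the footer lives in the message, it is shown by `git log` and `git show` without any extra support.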
> Introducing extra headers needs to be done _very_ carefully after thinking
> things through, judging the pros and cons. Even though we kept the format
> open to allow us to extend the format to add essential statement of fact
> that we can make at the commit time (e.g. "encoding"), I do not foresee us
> adding any official extra headers in near future.
Right, me neither, because everything that has been proposed for an
extra header (e.g. bug id, Message-Id from the email it was git-am'ed
from, rename tracking, ...) has all been suggested to be better
positioned in the message itself, or in a note, or not at all...
--
Shawn.
* Re: extra headers in commit objects
2010-02-03 20:58 ` Jelmer Vernooij
@ 2010-02-03 21:17 ` Nicolas Pitre
2010-02-03 22:39 ` Shawn O. Pearce
1 sibling, 0 replies; 20+ messages in thread
From: Nicolas Pitre @ 2010-02-03 21:17 UTC (permalink / raw)
To: Jelmer Vernooij; +Cc: Shawn O. Pearce, git
On Wed, 3 Feb 2010, Jelmer Vernooij wrote:
> Since Dulwich would blow up when it encountered custom headers that
> might be set by other Git implementations and since (as I understand) C git
> ignores unknown headers, I called this a bug fix. This change made it
> possible to deal with custom headers whenever they would appear *and*
> allowed users of the Dulwich API to set custom headers.
>
> (FWIW I haven't actually seen anybody setting custom headers)
>
> If this is indeed a misunderstanding, I'll happily make this
> datastructure with custom headers read-only.
Please do so.
It is best to consider the Git note facility for the addition of such
custom notations. Notes can be attached to commits and changed at will
while the commit objects themselves cannot (unless you rewrite history).
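The reason a commit cannot be changed in place is that its name is a hash over its full content, headers included. A minimal Python sketch of that calculation follows; `commit_id` is a hypothetical helper name and the two commit bodies are made up for illustration.

```python
import hashlib


def commit_id(body: bytes) -> str:
    """SHA-1 over 'commit <size>' + NUL + body, as Git names commit objects."""
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


plain = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"author A U Thor <author@example.com> 1265176189 -0800\n"
    b"committer A U Thor <author@example.com> 1265176189 -0800\n"
    b"\n"
    b"Fix the frobnicator\n"
)

# Same commit with one extra (made-up) header before the blank line.
with_extra = plain.replace(b"\n\n", b"\nbug 18389\n\n", 1)

print(commit_id(plain))
print(commit_id(with_extra))
```

The two printed names differ, so adding, editing, or dropping a header produces a different commit and therefore rewrites history, whereas a note is stored outside the commit object.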
Nicolas
* Re: extra headers in commit objects
2010-02-03 20:58 ` Jelmer Vernooij
2010-02-03 21:17 ` Nicolas Pitre
@ 2010-02-03 22:39 ` Shawn O. Pearce
1 sibling, 0 replies; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 22:39 UTC (permalink / raw)
To: Jelmer Vernooij; +Cc: git
Jelmer Vernooij <jelmer@samba.org> wrote:
> On Wed, 2010-02-03 at 09:40 -0800, Shawn O. Pearce wrote:
> >
> > I haven't spoken with Jelmer Vernooij directly about it, but after
> > some indirect email through a 3rd party, it seems he might be under
> > the impression that this really is a bug in Dulwich, because "other
> > git implementations do it".
>
> If you have concerns like this in the future, please don't hesitate to
> contact me directly.
OK.
> I don't follow the git list because it's a
> high-volume list where pretty much all traffic is irrelevant to me. The
> only reason I became aware of this thread was because Sverre CC'ed me.
I probably should have CC'd you in from the beginning, sorry.
It's true, this is a high-volume list. But we don't see much, if
anything, about Dulwich here. Yet I for one like to see discussion
about other implementations here, to some extent, so it's easier
to make sure everyone is staying close to the C implementation's
reference standard.
> Originally I was under the impression that custom headers would break
> (by reading the C Git source code) and so Dulwich made that assumption,
> but after hearing from several people (among whom Scott, see his reply)
> at Linux.Conf.Au that custom headers could be added and were ignored by
> C git I made this change.
Yes, apparently Scott didn't quite represent things accurately.
Oh well, it seems it's been raised now, and beaten to death.
> Since Dulwich would blow up when it encountered custom headers that
> might be set by other Git implementations and since (as I understand) C git
> ignores unknown headers, I called this a bug fix.
That's true, and I'm glad you have made that change to Dulwich. It is
a good bug fix to skip over headers you don't recognize.
But it's a new, incompatible feature to support writing extra headers.
> If this is indeed a misunderstanding, I'll happily make this
> datastructure with custom headers read-only.
Yes. Please see the other messages in this thread, especially from
Nico and Junio. Setting other headers is not a good idea, and you
shouldn't encourage it in Dulwich by making an API available.
> > Yes, there are many other Git implementations. But I thought nearly
> > all of them were toys, and none of them were even close to serving
> > the kind of production volume that JGit serves, and JGit isn't even
> > considered a production library by most. Yet JGit always tries to
> > conform to whatever standard is set by the C implementation.
>
> So does Dulwich. I've fixed issues in the compatibility with C Git when
> I've noticed them or have been made aware of them. Any incompatibilities
> are the result of ignorance on my part rather than malicious intent.
I'm glad to hear that.
See above about keeping discussion related to other Git implementations
here. We're happy to help explain something that is perhaps vague or
poorly specified. Not everyone has the answer right away, but usually
the list fills in everything.
> > We're starting to see a fork in the basic protocols happen. Hell,
> > Dulwich 0.4.1 isn't even capable of speaking over the network to
> > C Git, but it does talk to itself, so it's valid, right? :-(
>
> I've been using Dulwich's client to talk to C Git servers for ages and
> haven't seen issues. I would appreciate hearing about
> incompatibilities.
OK, I haven't actually looked at the Dulwich client code... so I
don't know what its current state is.
> If you're talking about the server side - we know it's broken, at least
> dul-daemon. Nobody (except for API changes) has really cared about it
> since John Carr originally hacked it up. I'd be surprised if it even
> works with the Dulwich client.
OK, then you may be interested in some of the patches my friend
Dave worked up (he said he was going to send them to you).
Dave discovered the server wasn't playing nice with C git, and
asked me for some protocol help to get it going again.
I'm glad it's only an issue of neglect (lack of time) and not
something else that has caused it to be incompatible.
--
Shawn.
* Re: extra headers in commit objects
2010-02-03 19:58 ` Scott Chacon
@ 2010-02-03 22:48 ` Shawn O. Pearce
2010-02-04 6:24 ` Mike Hommey
0 siblings, 1 reply; 20+ messages in thread
From: Shawn O. Pearce @ 2010-02-03 22:48 UTC (permalink / raw)
To: Scott Chacon; +Cc: git
Scott Chacon <schacon@gmail.com> wrote:
> On Wed, Feb 3, 2010 at 9:40 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> > Today I came across this "bug fix" [1,2] in Dulwich, which is
> > claiming to be a pure-Python implementation of Git.
> >
> > I haven't spoken with Jelmer Vernooij directly about it, but after
> > some indirect email through a 3rd party, it seems he might be under
> > the impression that this really is a bug in Dulwich, because "other
> > git implementations do it".
>
> At the risk of pissing you off for the second time in as many days,
> this is entirely my fault.
Apparently, s**t happens is a good phrase. One I need to learn.
> I was having a beer with Jelmer in Wellington a few weeks ago
And... beer doesn't promote clear thinking.
All is forgiven. As is yesterday's remark about not telling me
sooner about a JGit bug. You really didn't do anything bad, I
just woke up on the wrong side of the bed the past couple of days,
and sort of went off...
Sorry. :-\
> Anyhow, I was saying that _technically_ you can artificially write
> extra headers into the commit object (though at the time Dulwich
> didn't support reading them because of how it parsed commit objects -
> I believe it would actually explode if it saw something it didn't
> expect). I said I was still going to keep the metadata in my
> implementation in the message, but he was very interested in hiding
> his in the commit headers.
Yea, everyone wants to hide that extra metadata. I never get why.
Even in SVN. Why wouldn't I want to see the bug(s) fixed by
a commit? Difference of opinion. I also happen to prefer the
color blue. Dammit, everyone should prefer blue.
> In my defense, we (you and I, Shawn)
> talked about this at the GitTogether this year and you and a few
> others told me that CGit would not blow up but would just ignore them,
> which is fine for his purposes. I certainly did not get the
> impression from that short discussion that this was something to be
> absolutely avoided, but rather that it just wasn't really encouraged
> or explicitly supported.
Sorry. I've held this same opinion as Junio and Nico have expressed
in this thread, that although we ignore extra headers, it's only to
leave us an escape hatch in case we add something like "encoding"
in the future. Adding encoding was almost a nightmare because we
didn't have that escape hatch.
I also hold the opinion that the C implementation is correct,
and everyone else is wrong. Even JGit. Unless it's a bug in the
C implementation, in which case the bug fix is correct. :-)
Which in this case means, if the C implementation doesn't give
the user plumbing to do something (aside from using git mkobject),
you really should think twice before doing it.
So I apologize if I gave you the wrong impression at the GitTogether.
I claim stupidity as my only defense.
> Sorry. So, for future reference, though CGit _can_ handle it, don't?
C Git won't choke if there are extra headers.
But we _really_ don't want them. And C Git won't be writing any new
headers anytime soon. I think we're more likely to shift the entire
hashing scheme to SHA-512 or something before we add a new header.
--
Shawn.
* Re: extra headers in commit objects
2010-02-03 21:04 ` Shawn O. Pearce
@ 2010-02-04 0:38 ` Junio C Hamano
0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2010-02-04 0:38 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git
"Shawn O. Pearce" <spearce@spearce.org> writes:
> We all know that some other systems, e.g. SVN, permit adding
> additional properties to commits, and that often these are used
> to make statements like "Fixed bug NNNN", and bug tracking systems
> integrate into SVN by reading or updating those properties.
>
> So you, Nico, myself, might all agree that "bug" does not belong
> in the header, but many others see it like SVN sees additional
> properties on a revision, and thus it goes there.
>
> Hence the artificial example. It seems that it is not that artificial
> outside of our mailing list.
Aren't the meta-properties like "Fixed bug NNNN" something you can add
after the fact, even in SVN?
We have that in "notes". I never said people are wrong for wanting to
record additional information _about_ commits somewhere (and I didn't say
"artificial" at all---it was you who said it was a "made-up" example).
My point was that they do not belong to the commit _header_, and "but many
others see" doesn't contradict that. Many others may feel the need
to be able to express random things _about_ the commit; it does not mean
these random things have to go _in_ the commit.
* Re: extra headers in commit objects
2010-02-03 19:26 ` Shawn O. Pearce
2010-02-03 19:40 ` demerphq
2010-02-03 20:42 ` Junio C Hamano
@ 2010-02-04 0:41 ` A Large Angry SCM
2 siblings, 0 replies; 20+ messages in thread
From: A Large Angry SCM @ 2010-02-04 0:41 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git
Shawn O. Pearce wrote:
> demerphq <demerphq@gmail.com> wrote:
>> On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
>>> On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
>>>
>>>> Am I correct that core C developers are still under the opinion
>>>> that extra headers in a commit object aren't encouraged?
>>> I would say so.
>>>
>>> [...]
>>>> At the end of the day, is it a bug that C git doesn't support
>>>> working with extra commit headers? IMHO, no, because, we've
>>>> rejected these in the past, and it's not part of the Git standard.
>>>> And other implementations shouldn't be trying to sell it that way.
>>> Agreed. And this was discussed at great length on this list on a few
>>> occasions already (probably more than a year back).
>> One problem, is that if you take the approach you say then you
>> basically guarantee that a new git that DOES add new headers will
>> break an old git that doesn't know about the headers, and actually
>> doesn't care about them either.
>
> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't
> recognize that appear *after* the optional "encoding" header.
>
> 2) A compliant Git implementation does not produce any additional
> headers in a commit object, because other implementations cannot
> perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally,
> that is they all understand what author, committer, encoding are
> and process them the same way. Any new headers should equally
> be fully cross-implementation.
>
>> So it would essentially mean that if you ever have to change the
>> commit format you will be in a position where new git commits will be
>> incompatible by design with old git commits.
>
> So, we can change the format by adding a new header, after the
> optional "encoding" header.
>
> But such a change needs to be something that an older Git will
> safely ignore (due to rule 1), and something that a newer Git can
> make really effective use of (due to rule 2 and 3). And that newer
> Git must also safely deal with commits missing that new header, due
> to the huge number of commits out in the wild without said header.
>
> And don't even get me started on amending commits with new unknown
> headers. Existing implementations of Git tools will drop the extra
> headers during the amend, because the headers are viewed as part
> of the commit object data... and during an amend you are making a
> totally new object.
>
> For example, git-gui would drop any extra headers during an amend,
> because it's running `git commit-tree` directly without any way to
> tell commit-tree this is for an amend of an existing commit, vs. a
> completely new commit... because either way it's a new commit object.
>
>> Shouldn't an old git just ignore headers from a new git?
>
> Yes, see above.
>
4) C-git "owns" the header name space. The git ML is _the_ controlling
standards body.
* Re: extra headers in commit objects
2010-02-03 22:48 ` Shawn O. Pearce
@ 2010-02-04 6:24 ` Mike Hommey
0 siblings, 0 replies; 20+ messages in thread
From: Mike Hommey @ 2010-02-04 6:24 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Scott Chacon, git
On Wed, Feb 03, 2010 at 02:48:35PM -0800, Shawn O. Pearce wrote:
> > Anyhow, I was saying that _technically_ you can artificially write
> > extra headers into the commit object (though at the time Dulwich
> > didn't support reading them because of how it parsed commit objects -
> > I believe it would actually explode if it saw something it didn't
> > expect). I said I was still going to keep the metadata in my
> > implementation in the message, but he was very interested in hiding
> > his in the commit headers.
>
> Yea, everyone wants to hide that extra metadata. I never get why.
> Even in SVN. Why wouldn't I want to see the bug(s) fixed by
> a commit? Difference of opinion. I also happen to prefer the
> color blue. Dammit, everyone should prefer blue.
Note, though, that such information may change in the future, in which
case you can't rewrite the commit to fit that.
But for all that, there are git-notes now, aren't there?
Mike
Thread overview: 20+ messages
2010-02-03 17:40 extra headers in commit objects Shawn O. Pearce
2010-02-03 18:15 ` Nicolas Pitre
2010-02-03 19:01 ` demerphq
2010-02-03 19:26 ` Shawn O. Pearce
2010-02-03 19:40 ` demerphq
2010-02-03 20:42 ` Junio C Hamano
2010-02-03 21:04 ` Shawn O. Pearce
2010-02-04 0:38 ` Junio C Hamano
2010-02-04 0:41 ` A Large Angry SCM
2010-02-03 19:26 ` Petr Baudis
2010-02-03 19:43 ` demerphq
2010-02-03 20:31 ` Shawn O. Pearce
2010-02-03 20:03 ` Nicolas Pitre
2010-02-03 19:53 ` Sverre Rabbelier
2010-02-03 19:58 ` Scott Chacon
2010-02-03 22:48 ` Shawn O. Pearce
2010-02-04 6:24 ` Mike Hommey
2010-02-03 20:58 ` Jelmer Vernooij
2010-02-03 21:17 ` Nicolas Pitre
2010-02-03 22:39 ` Shawn O. Pearce