* extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 17:40 UTC
To: git

Am I correct that core C developers are still of the opinion that extra headers in a commit object aren't encouraged?

That is, we shouldn't see something like this made-up example:

  $ git cat-file commit HEAD
  tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902
  parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a
  parent 89d61592bddda4dfcb90314be9e06479f712bb7f
  author Junio C Hamano <gitster@pobox.com> 1265176189 -0800
  committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800
  bug 18389
  url http://example.com/some/mailing/list/post
  message-id <gitster-182819131@gitster.computer>

  Merge git://repo.or.cz/git-gui into next

(Sorry Junio for picking on your latest next merge...)

Today I came across this "bug fix" [1,2] in Dulwich, which is claiming to be a pure-Python implementation of Git.

[1] http://git.samba.org/?p=jelmer/dulwich.git;a=commit;h=bc8d73f1146afba8828a7dadbb4320f592cddcab
[2] http://git.samba.org/?p=jelmer/dulwich.git;a=commitdiff;h=bc8d73f1146afba8828a7dadbb4320f592cddcab;hp=4e50426fb72e6c9259feecbba5bfcf053af62335

I haven't spoken with Jelmer Vernooij directly about it, but after some indirect email through a 3rd party, it seems he might be under the impression that this really is a bug in Dulwich, because "other git implementations do it".

Uhm. I thought the canonical reference implementation was C Git (aka git-core), as maintained by Junio Hamano, and the object formats, core data structures, and network protocols were fairly well documented between the Git Community Book and the Documentation/technical/ directory.

The only other widely used Git implementation that I know of is JGit. It sure as hell doesn't do this, and it sure as hell isn't what I would call the reference implementation for Git... and that project is my own baby.

Yes, there are many other Git implementations. But I thought nearly all of them were toys, and none of them were even close to serving the kind of production volume that JGit serves, and JGit isn't even considered a production library by most. Yet JGit always tries to conform to whatever standard is set by the C implementation.

Basically, aside from having a pretty horrible morning thus far, and being in a really bad mood, I'm starting to get a bit worried about the proliferation of Git implementations, and what the notion of the standard network protocol and file formats is.

We're starting to see a fork in the basic protocols happen. Hell, Dulwich 0.4.1 isn't even capable of speaking over the network to C Git, but it does talk to itself, so it's valid, right? :-(

  $ PYTHONPATH=`pwd` ./bin/dul-daemon . &
  $ git clone git://localhost/.git
  Initialized empty Git repository in /usr/local/google/users/sop/tmp/localhost/.git/
  fetch-pack: protocol error: bad band #78
  fatal: early EOF
  fatal: index-pack failed

Fortunately a friend of mine is spending some time trying to patch it up... trying to get it back in compliance with the C reference implementation.

At the end of the day, is it a bug that C git doesn't support working with extra commit headers? IMHO, no, because, we've rejected these in the past, and it's not part of the Git standard. And other implementations shouldn't be trying to sell it that way.

</rather-pissed-off-rant>

--
Shawn.

* Re: extra headers in commit objects
From: Nicolas Pitre @ 2010-02-03 18:15 UTC
To: Shawn O. Pearce; +Cc: git

On Wed, 3 Feb 2010, Shawn O. Pearce wrote:

> Am I correct that core C developers are still of the opinion that extra headers in a commit object aren't encouraged?

I would say so.

[...]
> At the end of the day, is it a bug that C git doesn't support working with extra commit headers? IMHO, no, because, we've rejected these in the past, and it's not part of the Git standard. And other implementations shouldn't be trying to sell it that way.

Agreed. And this was discussed at great length on this list on a few occasions already (probably more than a year back).

Nicolas

* Re: extra headers in commit objects
From: demerphq @ 2010-02-03 19:01 UTC
To: Nicolas Pitre; +Cc: Shawn O. Pearce, git

On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
>
>> Am I correct that core C developers are still of the opinion that extra headers in a commit object aren't encouraged?
>
> I would say so.
>
> [...]
>> At the end of the day, is it a bug that C git doesn't support working with extra commit headers? IMHO, no, because, we've rejected these in the past, and it's not part of the Git standard. And other implementations shouldn't be trying to sell it that way.
>
> Agreed. And this was discussed at great length on this list on a few occasions already (probably more than a year back).

One problem is that if you take the approach you describe, you basically guarantee that a new git that DOES add new headers will break an old git that doesn't know about the headers, and actually doesn't care about them either.

So it would essentially mean that if you ever have to change the commit format you will be in a position where new git commits will be incompatible by design with old git commits.

Maybe I misunderstand, but this doesn't seem to accord with my reading of the original design objectives and philosophy of git.

Shouldn't an old git just ignore headers from a new git?

I mean, forget about the fact that somebody is doing something naughty with the git protocol, ask yourself if you want this rule to basically prevent any backwards compatible changes with older gits.

As a lurker here I understand completely if you ignore this mail entirely. But this seems to me to be a decision that could bite you later.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

* Re: extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 19:26 UTC
To: demerphq; +Cc: Nicolas Pitre, git

demerphq <demerphq@gmail.com> wrote:
> On 3 February 2010 19:15, Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Wed, 3 Feb 2010, Shawn O. Pearce wrote:
> >
> >> Am I correct that core C developers are still of the opinion that extra headers in a commit object aren't encouraged?
> >
> > I would say so.
> >
> > [...]
> >> At the end of the day, is it a bug that C git doesn't support working with extra commit headers? IMHO, no, because, we've rejected these in the past, and it's not part of the Git standard. And other implementations shouldn't be trying to sell it that way.
> >
> > Agreed. And this was discussed at great length on this list on a few occasions already (probably more than a year back).
>
> One problem is that if you take the approach you describe, you basically guarantee that a new git that DOES add new headers will break an old git that doesn't know about the headers, and actually doesn't care about them either.

As I understand it, the current stance is:

1) A compliant Git implementation ignores any headers it doesn't recognize that appear *after* the optional "encoding" header.

2) A compliant Git implementation does not produce any additional headers in a commit object, because other implementations cannot perform any machine based reasoning on them.

3) All implementations would (eventually) treat all headers equally, that is they all understand what author, committer, encoding are and process them the same way. Any new headers should equally be fully cross-implementation.

> So it would essentially mean that if you ever have to change the commit format you will be in a position where new git commits will be incompatible by design with old git commits.

So, we can change the format by adding a new header, after the optional "encoding" header.

But such a change needs to be something that an older Git will safely ignore (due to rule 1), and something that a newer Git can make really effective use of (due to rule 2 and 3). And that newer Git must also safely deal with commits missing that new header, due to the huge number of commits out in the wild without said header.

And don't even get me started on amending commits with new unknown headers. Existing implementations of Git tools will drop the extra headers during the amend, because the headers are viewed as part of the commit object data... and during an amend you are making a totally new object.

For example, git-gui would drop any extra headers during an amend, because it's running `git commit-tree` directly without any way to tell commit-tree this is for an amend of an existing commit, vs. a completely new commit... because either way it's a new commit object.

> Shouldn't an old git just ignore headers from a new git?

Yes, see above.

--
Shawn.
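
As a rough illustration of rule 1 and of the amend problem described above, here is a minimal Python sketch. It is not how C Git or JGit is written; `parse_commit`, `serialize_commit`, and the `ignored` field are hypothetical names made up for this example, and the sample text is the made-up commit from the first message. The sketch does not enforce header order and does not interpret multi-line header values.

```python
# Hypothetical sketch: only these header names are understood; anything else
# is skipped rather than treated as an error (rule 1 in the message above).
KNOWN = ("tree", "parent", "author", "committer", "encoding")

def parse_commit(raw: str) -> dict:
    """Split a commit body (as printed by `git cat-file commit`) into fields."""
    head, _, message = raw.partition("\n\n")
    commit = {"tree": None, "parent": [], "author": None, "committer": None,
              "encoding": None, "ignored": [], "message": message}
    for line in head.split("\n"):
        if line.startswith(" "):
            continue  # continuation of a multi-line value; not handled here
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parent"].append(value)   # zero or more parents
        elif key in KNOWN:
            commit[key] = value
        else:
            commit["ignored"].append(key)    # unknown header: skip, don't fail
    return commit

def serialize_commit(commit: dict) -> str:
    """Rebuild a commit body from the known fields only, as an amend would."""
    lines = ["tree " + commit["tree"]]
    lines += ["parent " + p for p in commit["parent"]]
    lines += ["author " + commit["author"], "committer " + commit["committer"]]
    if commit["encoding"]:
        lines.append("encoding " + commit["encoding"])
    return "\n".join(lines) + "\n\n" + commit["message"]

# The made-up commit from the start of the thread.
RAW = (
    "tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    "parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a\n"
    "parent 89d61592bddda4dfcb90314be9e06479f712bb7f\n"
    "author Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    "committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    "bug 18389\n"
    "url http://example.com/some/mailing/list/post\n"
    "message-id <gitster-182819131@gitster.computer>\n"
    "\n"
    "Merge git://repo.or.cz/git-gui into next\n"
)

commit = parse_commit(RAW)
rebuilt = serialize_commit(commit)  # the three extra headers are gone here
```

Because the rebuilt object is assembled only from fields the tool understands, the "bug", "url", and "message-id" lines silently disappear, which is the behaviour described for git-gui above.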

* Re: extra headers in commit objects
From: demerphq @ 2010-02-03 19:40 UTC
To: Shawn O. Pearce; +Cc: Nicolas Pitre, git

On 3 February 2010 20:26, Shawn O. Pearce <spearce@spearce.org> wrote:
> demerphq <demerphq@gmail.com> wrote:
>> [...]
>> One problem is that if you take the approach you describe, you basically guarantee that a new git that DOES add new headers will break an old git that doesn't know about the headers, and actually doesn't care about them either.
>
> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't recognize that appear *after* the optional "encoding" header.

Ignores but passes through?

> 2) A compliant Git implementation does not produce any additional headers in a commit object, because other implementations cannot perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally, that is they all understand what author, committer, encoding are and process them the same way. Any new headers should equally be fully cross-implementation.
>
>> So it would essentially mean that if you ever have to change the commit format you will be in a position where new git commits will be incompatible by design with old git commits.
>
> So, we can change the format by adding a new header, after the optional "encoding" header.
>
> But such a change needs to be something that an older Git will safely ignore (due to rule 1), and something that a newer Git can make really effective use of (due to rule 2 and 3). And that newer Git must also safely deal with commits missing that new header, due to the huge number of commits out in the wild without said header.
>
> And don't even get me started on amending commits with new unknown headers. Existing implementations of Git tools will drop the extra headers during the amend, because the headers are viewed as part of the commit object data... and during an amend you are making a totally new object.
>
> For example, git-gui would drop any extra headers during an amend, because it's running `git commit-tree` directly without any way to tell commit-tree this is for an amend of an existing commit, vs. a completely new commit... because either way it's a new commit object.
>
>> Shouldn't an old git just ignore headers from a new git?
>
> Yes, see above.

Right, which seems to sum up to "that boat sailed, forget about it", which is fair enough.

Which I say from the point of view of arbitrary headers not approved by the git dev team. You can ensure that any new *approved* headers have the semantics that "if they aren't passed through it doesn't matter", whereas you can't know whether a header that comes from some other source should be passed through or not.

Well, unless you introduced a convention that some header prefix is to be preserved on amend, but other prefixes shouldn't be. I can imagine that might be a nasty place to go though. :-)

Anyway, thanks a lot for taking the time to explain this a bit more.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

* Re: extra headers in commit objects
From: Junio C Hamano @ 2010-02-03 20:42 UTC
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't recognize that appear *after* the optional "encoding" header.

I first read the above to mean that you need to add encoding if you want to throw in other garbage.

I would say "*after* the mandatory 'tree', 'parent' (0 or more), 'author', and 'committer' headers that must appear in this order", for clarity.

> 2) A compliant Git implementation does not produce any additional headers in a commit object, because other implementations cannot perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally, that is they all understand what author, committer, encoding are and process them the same way. Any new headers should equally be fully cross-implementation.

These are very important points.

In your made-up example you added "bug" (presumably to mean "fixes this bug") and "message-id" ("am-ed from this message"). The latter might make sense, but the former does not belong in the header, as it is not a statement of fact.

Forcing people to say "this fixes" at the commit time means you do not allow mistakes---it may turn out to be an incorrect fix or a non-fix later.

When you are amending the commit to say "this does not really fix it", you would want to lose the old "bug" header, but you would want to keep the "message-id" one. There simply is not enough hint as to which ones must be carried across amending in the "we allow people to randomly throw extra headers into the commit object" model. It is not a model--it is chaos.

Also it wouldn't be obvious to other people what got changed while comparing two commits (before and after the amend) if the information is hidden in the header. The right place for that kind of information is in the log message (if the nature of the information is for everybody to see) or in notes.

Another major difference between extra random headers and notes is that the former changes the commit's object name, and if it is due to "random headers", it means you are breaking the object model for no good reason.

Introducing extra headers needs to be done _very_ carefully after thinking things through, judging the pros and cons. Even though we kept the format open to allow us to extend the format to add essential statements of fact that we can make at the commit time (e.g. "encoding"), I do not foresee us adding any official extra headers in the near future.
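
To make the point about object names concrete, here is a small Python sketch. It relies on the fact that a commit's SHA-1 object name is the hash of the string "commit", a space, the body size in decimal, a NUL byte, and then the body. The two bodies are the made-up example from the first message, so the resulting names do not correspond to real commits; `object_name`, `PLAIN`, and `WITH_BUG` are names invented for this illustration.

```python
import hashlib

def object_name(body: bytes) -> str:
    """SHA-1 object name of a commit body: 'commit', size, NUL, then the body."""
    prefix = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(prefix + body).hexdigest()

HEADERS = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a\n"
    b"author Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    b"committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
)
MESSAGE = b"\nMerge git://repo.or.cz/git-gui into next\n"

PLAIN = HEADERS + MESSAGE                    # no extra header
WITH_BUG = HEADERS + b"bug 18389\n" + MESSAGE  # same commit plus one header

# Same tree, parent, author, committer, and message, yet two different names.
print(object_name(PLAIN))
print(object_name(WITH_BUG))
```

Any byte added to the header block yields a different object name, so adding, changing, or dropping an extra header produces a new commit, while a note is attached to the existing name and leaves it untouched.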

* Re: extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 21:04 UTC
To: Junio C Hamano; +Cc: demerphq, Nicolas Pitre, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
> > As I understand it, the current stance is:
> >
> > 1) A compliant Git implementation ignores any headers it doesn't recognize that appear *after* the optional "encoding" header.
>
> I first read the above to mean that you need to add encoding if you want to throw in other garbage.
>
> I would say "*after* the mandatory 'tree', 'parent' (0 or more), 'author', and 'committer' headers that must appear in this order", for clarity.

Yes, sorry, of course that is what I meant. Thanks for the clarification.

To add to that, "after encoding, if encoding is present".

> > 2) A compliant Git implementation does not produce any additional headers in a commit object, because other implementations cannot perform any machine based reasoning on them.
> >
> > 3) All implementations would (eventually) treat all headers equally, that is they all understand what author, committer, encoding are and process them the same way. Any new headers should equally be fully cross-implementation.
>
> These are very important points.
>
> In your made-up example you added "bug" (presumably to mean "fixes this bug") and "message-id" ("am-ed from this message"). The latter might make sense, but the former does not belong in the header, as it is not a statement of fact.

This all came out of what appears to be a tool to bridge data from another VCS into Git, à la git-svn.

We all know that some other systems, e.g. SVN, permit adding additional properties to commits, and that often these are used to make statements like "Fixed bug NNNN", and bug tracking systems integrate into SVN by reading or updating those properties.

So you, Nico, myself, might all agree that "bug" does not belong in the header, but many others see it like SVN sees additional properties on a revision, and thus it goes there.

Hence the artificial example. It seems that it is not that artificial outside of our mailing list.

> Forcing people to say "this fixes" at the commit time means you do not allow mistakes---it may turn out to be an incorrect fix or a non-fix later.

Yup, happens often.

> When you are amending the commit to say "this does not really fix it", you would want to lose the old "bug" header, but you would want to keep the "message-id" one. There simply is not enough hint as to which ones must be carried across amending in the "we allow people to randomly throw extra headers into the commit object" model. It is not a model--it is chaos.

Exactly. That's what I had thought our position was, for exactly this reason: it very quickly devolves into a chaos we can't reason about, let alone write code to support for end-users.

> Also it wouldn't be obvious to other people what got changed while comparing two commits (before and after the amend) if the information is hidden in the header. The right place for that kind of information is in the log message (if the nature of the information is for everybody to see) or in notes.

I'm afraid users might insert their own headers, then come report the bug that `git log` and `git show` don't make those headers visible when formatting the commit. After all, they show the author, committer, and parent information when you use the right flags.

We'll of course say it's not in the message, and suggest using the footer style like our Signed-off-by lines, or notes, which appear below the message if requested.

> Introducing extra headers needs to be done _very_ carefully after thinking things through, judging the pros and cons. Even though we kept the format open to allow us to extend the format to add essential statements of fact that we can make at the commit time (e.g. "encoding"), I do not foresee us adding any official extra headers in the near future.

Right, me neither, because everything that has been proposed for an extra header (e.g. bug id, Message-Id from the email it was git-am'ed from, rename tracking, ...) has been judged better positioned in the message itself, or in a note, or not at all...

--
Shawn.
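
For the footer style mentioned above, a tool can read "Key: value" lines from the last paragraph of the log message instead of from the commit header. The Python sketch below is only an illustration under simplifying assumptions: `parse_footers` is a hypothetical helper, the "Bug" and "Message-Id" keys are invented for this example, and the matching rule (every line of the last paragraph must look like "Key: value") is simpler than whatever a real tool would apply.

```python
import re

# A footer line is assumed to look like "Some-Key: some value".
FOOTER = re.compile(r"^([A-Za-z][A-Za-z0-9-]*): (.+)$")

def parse_footers(message: str) -> list:
    """Return (key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p.strip() for p in message.split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []      # a lone subject line is never treated as a footer block
    footers = []
    for line in paragraphs[-1].split("\n"):
        match = FOOTER.match(line)
        if match is None:
            return []  # ordinary prose in the last paragraph: no footers
        footers.append((match.group(1), match.group(2)))
    return footers

MESSAGE = (
    "Merge git://repo.or.cz/git-gui into next\n"
    "\n"
    "Bug: 18389\n"
    "Message-Id: <gitster-182819131@gitster.computer>\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
)

for key, value in parse_footers(MESSAGE):
    print(key, "=", value)
```

Since the footers are part of the message, they are visible in `git log` output and a person amending the commit sees them in the editor and can decide which ones to keep.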

* Re: extra headers in commit objects
From: Junio C Hamano @ 2010-02-04 0:38 UTC
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> We all know that some other systems, e.g. SVN, permit adding additional properties to commits, and that often these are used to make statements like "Fixed bug NNNN", and bug tracking systems integrate into SVN by reading or updating those properties.
>
> So you, Nico, myself, might all agree that "bug" does not belong in the header, but many others see it like SVN sees additional properties on a revision, and thus it goes there.
>
> Hence the artificial example. It seems that it is not that artificial outside of our mailing list.

Aren't the meta-properties like "Fixed bug NNNN" something you can add after the fact, even in SVN? We have that in "notes".

I never said people are wrong for wanting to record additional information _about_ commits somewhere (and I didn't say "artificial" at all---it was you who said it was a "made-up" example). My point was that they do not belong in the commit _header_, and "but many others see" doesn't contradict that.

Many others may feel the need to be able to express random things _about_ the commit; it does not mean these random things have to go _in_ the commit.

* Re: extra headers in commit objects
From: A Large Angry SCM @ 2010-02-04 0:41 UTC
To: Shawn O. Pearce; +Cc: demerphq, Nicolas Pitre, git

Shawn O. Pearce wrote:
> [...]
> As I understand it, the current stance is:
>
> 1) A compliant Git implementation ignores any headers it doesn't recognize that appear *after* the optional "encoding" header.
>
> 2) A compliant Git implementation does not produce any additional headers in a commit object, because other implementations cannot perform any machine based reasoning on them.
>
> 3) All implementations would (eventually) treat all headers equally, that is they all understand what author, committer, encoding are and process them the same way. Any new headers should equally be fully cross-implementation.
>
> [...]

4) C-git "owns" the header name space. The git ML is _the_ controlling standards body.

* Re: extra headers in commit objects
From: Petr Baudis @ 2010-02-03 19:26 UTC
To: demerphq; +Cc: Nicolas Pitre, Shawn O. Pearce, git

On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
> Shouldn't an old git just ignore headers from a new git?
>
> I mean, forget about the fact that somebody is doing something naughty with the git protocol, ask yourself if you want this rule to basically prevent any backwards compatible changes with older gits.

We have done similar changes in the past and if there were such a change, we could phase it in over the course of several releases. I think the fall-out would not be that bad; we have some experience with even making Debian-stable Git compatible with new stuff. ;-) Also, what if any extra header would be essential and we _wanted_ non-compatible Git to break down on it?

On the other hand, allowing this preventively would apparently have the immediate effect of users of alternative implementations happily starting to use it, and then to get to the data, people would demand git-core support as well. _And_ so far everyone seems really really fairly sure we don't want the headers and it's not likely to change.

P.S.: On the other hand, I think that change was probably just misguided, not malicious. And I wouldn't be that hard on Dulwich, it's an early-0.x software after all, it's allowed to crash and have protocol issues. ;-)

--
				Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in your nerd card. -- Dunbal (464142)

* Re: extra headers in commit objects
From: demerphq @ 2010-02-03 19:43 UTC
To: Petr Baudis; +Cc: Nicolas Pitre, Shawn O. Pearce, git

On 3 February 2010 20:26, Petr Baudis <pasky@suse.cz> wrote:
> On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
>> Shouldn't an old git just ignore headers from a new git?
>>
>> I mean, forget about the fact that somebody is doing something naughty with the git protocol, ask yourself if you want this rule to basically prevent any backwards compatible changes with older gits.
>
> We have done similar changes in the past and if there were such a change, we could phase it in over the course of several releases. I think the fall-out would not be that bad; we have some experience with even making Debian-stable Git compatible with new stuff. ;-) Also, what if any extra header would be essential and we _wanted_ non-compatible Git to break down on it?

Right. The only solution I can see would have had to be implemented already. And that would have involved some headers being marked "pass through", some marked "throw away on cherry-pick" and some "choke horribly if you find this and don't know what it is".

And even with something like that one wonders if notes aren't really a better alternative to user defined headers anyway?

> On the other hand, allowing this preventively would apparently have the immediate effect of users of alternative implementations happily starting to use it, and then to get to the data, people would demand git-core support as well. _And_ so far everyone seems really really fairly sure we don't want the headers and it's not likely to change.

Yes, right, understood.

> P.S.: On the other hand, I think that change was probably just misguided, not malicious. And I wouldn't be that hard on Dulwich, it's an early-0.x software after all, it's allowed to crash and have protocol issues. ;-)

Heh. I have no opinion on Dulwich. Didn't even know it existed until this mail.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

* Re: extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 20:31 UTC
To: demerphq; +Cc: Petr Baudis, Nicolas Pitre, git

demerphq <demerphq@gmail.com> wrote:
> On 3 February 2010 20:26, Petr Baudis <pasky@suse.cz> wrote:
> Right. The only solution I can see would have had to be implemented already. And that would have involved some headers being marked "pass through", some marked "throw away on cherry-pick" and some "choke horribly if you find this and don't know what it is".
>
> And even with something like that one wonders if notes aren't really a better alternative to user defined headers anyway?

Yes, exactly.

I think notes turn out to be a much better way to store this extra data, provided you are OK with them being disconnected during an amend, cherry-pick, filter-branch, or rebase... :-)

And unlike additional headers, git implementations will likely support notes, because they are a good way to attach additional user data onto commits.

--
Shawn.

* Re: extra headers in commit objects
From: Nicolas Pitre @ 2010-02-03 20:03 UTC
To: Petr Baudis; +Cc: demerphq, Shawn O. Pearce, git

On Wed, 3 Feb 2010, Petr Baudis wrote:

> On Wed, Feb 03, 2010 at 08:01:17PM +0100, demerphq wrote:
> > Shouldn't an old git just ignore headers from a new git?
> >
> > I mean, forget about the fact that somebody is doing something naughty with the git protocol, ask yourself if you want this rule to basically prevent any backwards compatible changes with older gits.
>
> We have done similar changes in the past and if there were such a change, we could phase it in over the course of several releases. I think the fall-out would not be that bad; we have some experience with even making Debian-stable Git compatible with new stuff. ;-)

Heh... That's because I was crazy enough to do that work so the new features I implemented in the latest version could be enabled by default sooner. And incidentally those features weren't controversial at all, which sorta helped.

Nicolas
* Re: extra headers in commit objects
From: Sverre Rabbelier @ 2010-02-03 19:53 UTC
To: Shawn O. Pearce; +Cc: git, Jelmer Vernooij, Jelmer Vernooij

Heya,

[+cc Jelmer]

On Wed, Feb 3, 2010 at 18:40, Shawn O. Pearce <spearce@spearce.org> wrote:
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".

That would seem like the #1 thing to do. I'm sure Jelmer (cc-ed) can
both benefit from this discussion and perhaps explain first hand what
is going on. The full thread as it's developing can be found here [0].

Jelmer, you can just reply to this, no need to subscribe or such.
Also, it's customary on the git list to cc all involved, so you should
be in on the conversation for any emails that are a reply to mine.

[0] http://thread.gmane.org/gmane.comp.version-control.git/138848

--
Cheers,

Sverre Rabbelier
* Re: extra headers in commit objects
From: Scott Chacon @ 2010-02-03 19:58 UTC
To: Shawn O. Pearce; +Cc: git

Hey,

On Wed, Feb 3, 2010 at 9:40 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Today I came across this "bug fix" [1,2] in Dulwich, which is
> claiming to be a pure-Python implementation of Git.
>
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".

At the risk of pissing you off for the second time in as many days,
this is entirely my fault.

I was having a beer with Jelmer in Wellington a few weeks ago during
LinuxConf.au and we were talking about the difficulties in storing
metadata having to do with cross-vcs migrations - specifically his
work with a bzr-git bridge and mine with the hg-git project. He was
noting that I kept all my metadata about original Hg commits in Git as
formatted text in the commit message, which is pretty uggo (especially
with the amount of sometimes inconsistent denormalization of data Hg
does on commit, explicitly recording renames and manifests and
whatnot).

Anyhow, I was saying that _technically_ you can artificially write
extra headers into the commit object (though at the time Dulwich
didn't support reading them because of how it parsed commit objects -
I believe it would actually explode if it saw something it didn't
expect). I said I was still going to keep the metadata in my
implementation in the message, but he was very interested in hiding
his in the commit headers.

In my defense, we (you and I, Shawn) talked about this at the
GitTogether this year and you and a few others told me that CGit would
not blow up but would just ignore them, which is fine for his
purposes. I certainly did not get the impression from that short
discussion that this was something to be absolutely avoided, but
rather that it just wasn't really encouraged or explicitly supported.

Oddly enough, this whole thing basically came up because we were
noting that you can hide extra data in Hg changesets, but it's a
ridiculous hack involving adding it after a null byte in the timestamp
field, much like we do in adding the capabilities after the first ref
in the negotiation phase of the transfer protocol. I was just casually
saying, "yeah, you can actually technically do that a lot cleaner in
Git"...

Sorry. So, for future reference, though CGit _can_ handle it, don't?

Thanks,
Scott
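For readers who have not seen the trick Scott compares this to, here is a minimal sketch of how a client might split the capability list off the first advertised ref. It assumes the payload of that first line has the form `<sha> <refname>\0<capabilities>`, with capabilities separated by spaces. The function name `split_first_ref` is made up for illustration, and pkt-line length framing is deliberately left out.

```python
def split_first_ref(payload: bytes):
    """Split one ref advertisement payload into (sha, refname, capabilities).

    Assumes the pkt-line length prefix has already been removed.
    """
    line = payload.rstrip(b"\n")
    # Everything after the first NUL byte is the capability list.
    ref_part, sep, caps_part = line.partition(b"\0")
    sha, _, refname = ref_part.partition(b" ")
    caps = caps_part.split(b" ") if sep and caps_part else []
    return (
        sha.decode("ascii"),
        refname.decode("utf-8"),
        [c.decode("ascii") for c in caps],
    )


if __name__ == "__main__":
    first = b"ec0865178ad6d8dab9ccd82b07bc3f3dae20542a HEAD\0multi_ack side-band-64k\n"
    print(split_first_ref(first))
```

A reader that does not know about the NUL convention still sees a well-formed `<sha> <refname>` prefix, which is presumably why the hack was backwards compatible.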
* Re: extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 22:48 UTC
To: Scott Chacon; +Cc: git

Scott Chacon <schacon@gmail.com> wrote:
> On Wed, Feb 3, 2010 at 9:40 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> > Today I came across this "bug fix" [1,2] in Dulwich, which is
> > claiming to be a pure-Python implementation of Git.
> >
> > I haven't spoken with Jelmer Vernooij directly about it, but after
> > some indirect email through a 3rd party, it seems he might be under
> > the impression that this really is a bug in Dulwich, because "other
> > git implementations do it".
>
> At the risk of pissing you off for the second time in as many days,
> this is entirely my fault.

Apparently, "s**t happens" is a good phrase. One I need to learn.

> I was having a beer with Jelmer in Wellington a few weeks ago

And... beer doesn't promote clear thinking. All is forgiven.

As is yesterday's remark about not telling me sooner about a JGit bug.
You really didn't do anything bad, I just woke up on the wrong side
of the bed the past couple of days, and sort of went off... Sorry. :-\

> Anyhow, I was saying that _technically_ you can artificially write
> extra headers into the commit object (though at the time Dulwich
> didn't support reading them because of how it parsed commit objects -
> I believe it would actually explode if it saw something it didn't
> expect). I said I was still going to keep the metadata in my
> implementation in the message, but he was very interested in hiding
> his in the commit headers.

Yea, everyone wants to hide that extra metadata. I never get why.
Even in SVN. Why wouldn't I want to see the bug(s) fixed by
a commit? Difference of opinion. I also happen to prefer the
color blue. Dammit, everyone should prefer blue.

> In my defense, we (you and I, Shawn)
> talked about this at the GitTogether this year and you and a few
> others told me that CGit would not blow up but would just ignore them,
> which is fine for his purposes. I certainly did not get the
> impression from that short discussion that this was something to be
> absolutely avoided, but rather that it just wasn't really encouraged
> or explicitly supported.

Sorry. I hold the same opinion that Junio and Nico have expressed
in this thread: although we ignore extra headers, it's only to leave
us an escape hatch in case we add something like "encoding" in the
future. Adding encoding was almost a nightmare because we didn't have
that escape hatch.

I also hold the opinion that the C implementation is correct, and
everyone else is wrong. Even JGit. Unless it's a bug in the C
implementation, in which case the bug fix is correct. :-)

Which in this case means, if the C implementation doesn't give the
user plumbing to do something (aside from using git hash-object), you
really should think twice before doing it.

So I apologize if I gave you the wrong impression at the GitTogether.
I claim stupidity as my only defense.

> Sorry. So, for future reference, though CGit _can_ handle it, don't?

C Git won't choke if there are extra headers. But we _really_ don't
want them. And C Git won't be writing any new headers anytime soon.
I think we're more likely to shift the entire hashing scheme to
SHA-512 or something before we add a new header.

--
Shawn.
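To make "won't choke" concrete, here is a minimal sketch of a reader that tolerates unknown commit headers by setting them aside instead of failing. It is not how C Git or Dulwich actually parse commits. `parse_commit` and `KNOWN_HEADERS` are hypothetical names, the sketch assumes every header fits on a single line, and the sample data is the made-up commit from the first message in this thread.

```python
# Header names this sketch treats as "known"; anything else is kept aside.
KNOWN_HEADERS = {b"tree", b"parent", b"author", b"committer", b"encoding"}


def parse_commit(raw: bytes):
    """Return (known, extra, message) for a raw commit object body."""
    # Headers end at the first blank line; the rest is the commit message.
    header_block, _, message = raw.partition(b"\n\n")
    known, extra = [], []
    for line in header_block.split(b"\n"):
        key, _, value = line.partition(b" ")
        target = known if key in KNOWN_HEADERS else extra
        target.append((key.decode(), value.decode()))
    return known, extra, message.decode()


RAW = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a\n"
    b"parent 89d61592bddda4dfcb90314be9e06479f712bb7f\n"
    b"author Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    b"committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800\n"
    b"bug 18389\n"
    b"url http://example.com/some/mailing/list/post\n"
    b"message-id <gitster-182819131@gitster.computer>\n"
    b"\n"
    b"Merge git://repo.or.cz/git-gui into next\n"
)

known, extra, message = parse_commit(RAW)
print(extra)
```

The sketch only reads. In line with the thread's conclusion, it offers no way to write extra headers back out.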
* Re: extra headers in commit objects
From: Mike Hommey @ 2010-02-04 6:24 UTC
To: Shawn O. Pearce; +Cc: Scott Chacon, git

On Wed, Feb 03, 2010 at 02:48:35PM -0800, Shawn O. Pearce wrote:
> > Anyhow, I was saying that _technically_ you can artificially write
> > extra headers into the commit object (though at the time Dulwich
> > didn't support reading them because of how it parsed commit objects -
> > I believe it would actually explode if it saw something it didn't
> > expect). I said I was still going to keep the metadata in my
> > implementation in the message, but he was very interested in hiding
> > his in the commit headers.
>
> Yea, everyone wants to hide that extra metadata. I never get why.
> Even in SVN. Why wouldn't I want to see the bug(s) fixed by
> a commit? Difference of opinion. I also happen to prefer the
> color blue. Dammit, everyone should prefer blue.

Note, though, that such information may change in the future, in which
case you can't rewrite the commit to fit that. But for all that, there
are git-notes now, aren't there?

Mike
* Re: extra headers in commit objects
From: Jelmer Vernooij @ 2010-02-03 20:58 UTC
To: Shawn O. Pearce; +Cc: git

Hi Shawn,

On Wed, 2010-02-03 at 09:40 -0800, Shawn O. Pearce wrote:
> Am I correct that core C developers are still under the opinion
> that extra headers in a commit object aren't encouraged?
>
> That is, we shouldn't see something like this made-up example:
>
> $ git cat-file commit HEAD
> tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902
> parent ec0865178ad6d8dab9ccd82b07bc3f3dae20542a
> parent 89d61592bddda4dfcb90314be9e06479f712bb7f
> author Junio C Hamano <gitster@pobox.com> 1265176189 -0800
> committer Junio C Hamano <gitster@pobox.com> 1265176189 -0800
> bug 18389
> url http://example.com/some/mailing/list/post
> message-id <gitster-182819131@gitster.computer>
>
> Merge git://repo.or.cz/git-gui into next
>
> (Sorry Junio for picking on your latest next merge...)

> Today I came across this "bug fix" [1,2] in Dulwich, which is
> claiming to be a pure-Python implementation of Git.
>
> [1] http://git.samba.org/?p=jelmer/dulwich.git;a=commit;h=bc8d73f1146afba8828a7dadbb4320f592cddcab
> [2] http://git.samba.org/?p=jelmer/dulwich.git;a=commitdiff;h=bc8d73f1146afba8828a7dadbb4320f592cddcab;hp=4e50426fb72e6c9259feecbba5bfcf053af62335
>
> I haven't spoken with Jelmer Vernooij directly about it, but after
> some indirect email through a 3rd party, it seems he might be under
> the impression that this really is a bug in Dulwich, because "other
> git implementations do it".

If you have concerns like this in the future, please don't hesitate to
contact me directly. I don't follow the git list because it's a
high-volume list where pretty much all traffic is irrelevant to me. The
only reason I became aware of this thread was because Sverre CC'ed me.

> Uhm.

Originally I was under the impression that custom headers would break
(by reading the C Git source code) and so Dulwich made that assumption,
but after hearing from several people (among whom Scott, see his reply)
at Linux.Conf.Au that custom headers could be added and were ignored by
C git I made this change.

Since Dulwich would blow up when it encountered custom headers that
might be set by other Git implementations and since (as I understand)
C git ignores unknown headers, I called this a bug fix. This change
made it possible to deal with custom headers whenever they would appear
*and* allowed users of the Dulwich API to set custom headers.

(FWIW I haven't actually seen anybody setting custom headers)

If this is indeed a misunderstanding, I'll happily make this
data structure with custom headers read-only.

[...]

> Yes, there are many other Git implementations. But I thought nearly
> all of them were toys, and none of them were even close to serving
> the kind of production volume that JGit serves, and JGit isn't even
> considered a production library by most. Yet JGit always tries to
> conform to whatever standard is set by the C implementation.

So does Dulwich. I've fixed issues in the compatibility with C Git when
I've noticed them or have been made aware of them. Any incompatibilities
are the result of ignorance on my part rather than malicious intent.

[...]

> We're starting to see a fork in the basic protocols happen. Hell,
> Dulwich 0.4.1 isn't even capable of speaking over the network to
> C Git, but it does talk to itself, so it's valid, right? :-(

I've been using Dulwich's client to talk to C Git servers for ages and
haven't seen issues. I would appreciate hearing about
incompatibilities.

If you're talking about the server side - we know it's broken, at least
dul-daemon. Nobody (except for API changes) has really cared about it
since John Carr originally hacked it up. I'd be surprised if it even
works with the Dulwich client.

Cheers,

Jelmer
* Re: extra headers in commit objects
From: Nicolas Pitre @ 2010-02-03 21:17 UTC
To: Jelmer Vernooij; +Cc: Shawn O. Pearce, git

On Wed, 3 Feb 2010, Jelmer Vernooij wrote:

> Since Dulwich would blow up when it encountered custom headers that
> might be set by other Git implementations and since (as I understand)
> C git ignores unknown headers, I called this a bug fix. This change
> made it possible to deal with custom headers whenever they would
> appear *and* allowed users of the Dulwich API to set custom headers.
>
> (FWIW I haven't actually seen anybody setting custom headers)
>
> If this is indeed a misunderstanding, I'll happily make this
> data structure with custom headers read-only.

Please do so.

It is best to consider the Git note facility for the addition of such
custom notations. Notes can be attached to commits and changed at will
while the commit objects themselves cannot (unless you rewrite history).

Nicolas
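A short sketch of why a commit object cannot be changed in place: an object's name is a hash over its type, its length, and its exact bytes, so adding or editing a header yields a different object. The helper name `git_object_id` and the sample commit body are illustrative assumptions. The scheme shown is the SHA-1 naming in use at the time of this thread.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Name of a loose object: SHA-1 over '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Illustrative commit body; the tree id comes from the example in this thread.
BASE = (
    b"tree e0fb24d872e2daa1507ea5879e1cdce5c0da9902\n"
    b"author A U Thor <author@example.com> 1265176189 -0800\n"
    b"committer A U Thor <author@example.com> 1265176189 -0800\n"
    b"\n"
    b"Example commit\n"
)
# Same commit with one extra header inserted before the blank line.
WITH_EXTRA = BASE.replace(b"\n\n", b"\nbug 18389\n\n", 1)

base_id = git_object_id("commit", BASE)
extra_id = git_object_id("commit", WITH_EXTRA)
print(base_id != extra_id)
```

Every descendant commit records its parent by name, so a changed name ripples through all later history. A note lives outside the commit and can be edited without touching it.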
* Re: extra headers in commit objects
From: Shawn O. Pearce @ 2010-02-03 22:39 UTC
To: Jelmer Vernooij; +Cc: git

Jelmer Vernooij <jelmer@samba.org> wrote:
> On Wed, 2010-02-03 at 09:40 -0800, Shawn O. Pearce wrote:
> >
> > I haven't spoken with Jelmer Vernooij directly about it, but after
> > some indirect email through a 3rd party, it seems he might be under
> > the impression that this really is a bug in Dulwich, because "other
> > git implementations do it".
>
> If you have concerns like this in the future, please don't hesitate to
> contact me directly.

OK.

> I don't follow the git list because it's a
> high-volume list where pretty much all traffic is irrelevant to me. The
> only reason I became aware of this thread was because Sverre CC'ed me.

I probably should have CC'd you in from the beginning, sorry.

It's true, this is a high-volume list. But we don't see much, if
anything, about Dulwich here. Yet I for one like to see discussion
about other implementations here, to some extent, so it's easier to
make sure everyone is staying close to the C implementation's
reference standard.

> Originally I was under the impression that custom headers would break
> (by reading the C Git source code) and so Dulwich made that assumption,
> but after hearing from several people (among whom Scott, see his reply)
> at Linux.Conf.Au that custom headers could be added and were ignored by
> C git I made this change.

Yes, apparently Scott didn't quite represent things accurately.
Oh well, it seems it's been raised now, and beaten to death.

> Since Dulwich would blow up when it encountered custom headers that
> might be set by other Git implementations and since (as I understand)
> C git ignores unknown headers, I called this a bug fix.

That's true, and I'm glad you have made that change to Dulwich.
It is a good bug fix to skip over headers you don't recognize.

But, it's a new incompatible feature to support writing extra headers.

> If this is indeed a misunderstanding, I'll happily make this
> data structure with custom headers read-only.

Yes. Please see the other messages in this thread, especially from
Nico and Junio. Setting other headers is not a good idea, and you
shouldn't encourage it in Dulwich by making an API available.

> > Yes, there are many other Git implementations. But I thought nearly
> > all of them were toys, and none of them were even close to serving
> > the kind of production volume that JGit serves, and JGit isn't even
> > considered a production library by most. Yet JGit always tries to
> > conform to whatever standard is set by the C implementation.
>
> So does Dulwich. I've fixed issues in the compatibility with C Git when
> I've noticed them or have been made aware of them. Any incompatibilities
> are the result of ignorance on my part rather than malicious intent.

I'm glad to hear that. See above about keeping discussion related
to other Git implementations here. We're happy to help explain
something that is perhaps vague or poorly specified. Not everyone
has the answer right away, but usually the list fills in everything.

> > We're starting to see a fork in the basic protocols happen. Hell,
> > Dulwich 0.4.1 isn't even capable of speaking over the network to
> > C Git, but it does talk to itself, so it's valid, right? :-(
>
> I've been using Dulwich's client to talk to C Git servers for ages and
> haven't seen issues. I would appreciate hearing about
> incompatibilities.

OK, I haven't actually looked at the Dulwich client code... so I
don't know what its current state is.

> If you're talking about the server side - we know it's broken, at least
> dul-daemon. Nobody (except for API changes) has really cared about it
> since John Carr originally hacked it up. I'd be surprised if it even
> works with the Dulwich client.

OK, then you may be interested in some of the patches my friend Dave
worked up (he said he was going to send them to you). Dave discovered
the server wasn't playing nice with C git, and asked me for some
protocol help to get it going again.

I'm glad it's only an issue of neglect (lack of time) and not something
else that has caused it to be incompatible.

--
Shawn.