* Tree with leading '0' modes in 1.7.0.3 @ 2010-03-26 21:56 Shawn O. Pearce 2010-03-26 22:26 ` Jonathan Nieder 0 siblings, 1 reply; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-26 21:56 UTC (permalink / raw) To: git, mike.lifeguard Mike (CC'd) found a bad Git tree today, where the modes for subtrees where formatted using a leading '0': $ od -c tree 0000000 1 0 0 6 4 4 R E A D M E \0 244 \r 0000020 233 214 350 375 0 263 374 227 264 343 $ 031 027 ` 373 301 0000040 ! h 0 4 0 0 0 0 m o d u l e s 0000060 \0 262 z K 240 4 377 \ 245 C c " 231 377 \n t 0000100 , \n O R E 0 4 0 0 0 0 s t e w 0000120 a r d b o t \0 037 \b 5 262 345 234 034 303 C 0000140 373 335 207 300 u 341 277 \f ] 320 207 0000153 The '0' on the 3rd line after '! h' is wrong. It shouldn't be here. Likewise the '0' on the 5th line after "O R E" is also wrong. At least its consistently broken. But its still broken by fsck standards: $ git fsck --full a39aa6d warning in tree a39aa6d4a6dcfd6c14d8f818bbdf1dfcb3e11771: contains zero-padded file modes Mike claims this tree was created with git-core 1.7.0.3. This thread actually started over on Gerrit Code Review's mailing list [1], because JGit refuses to allow this malformed tree mode to pass its fsck implementation. Any ideas? Why is Git 1.7.0.3 jamming a leading '0' on a file mode? [1] https://groups.google.com/group/repo-discuss/browse_thread/thread/6ff8d7ffba5a9775 -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 21:56 Tree with leading '0' modes in 1.7.0.3 Shawn O. Pearce @ 2010-03-26 22:26 ` Jonathan Nieder 2010-03-26 22:29 ` Shawn O. Pearce 0 siblings, 1 reply; 33+ messages in thread From: Jonathan Nieder @ 2010-03-26 22:26 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git, mike.lifeguard Shawn O. Pearce wrote: > Any ideas? Why is Git 1.7.0.3 jamming a leading '0' on a file mode? See http://thread.gmane.org/gmane.comp.version-control.git/141028 and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24). The regression that that fixes appeared in 61a7cca0 (Notes API: write_notes_tree(): Store the notes tree in the database, 2010-02-13), which is not part of 1.7.0.3. Still, HTH, Jonathan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 22:26 ` Jonathan Nieder @ 2010-03-26 22:29 ` Shawn O. Pearce 2010-03-26 22:40 ` Jonathan Nieder 2010-03-26 22:59 ` Mike.lifeguard 0 siblings, 2 replies; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-26 22:29 UTC (permalink / raw) To: Jonathan Nieder; +Cc: git, mike.lifeguard Jonathan Nieder <jrnieder@gmail.com> wrote: > Shawn O. Pearce wrote: > > > Any ideas? Why is Git 1.7.0.3 jamming a leading '0' on a file mode? > > See http://thread.gmane.org/gmane.comp.version-control.git/141028 > and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24). > > The regression that that fixes appeared in 61a7cca0 (Notes API: > write_notes_tree(): Store the notes tree in the database, 2010-02-13), > which is not part of 1.7.0.3. That may be true... but I doubt the tree in question was a notes tree. The path entries were names like 'README', 'modules' and 'stewardbot'. Something I would assume was the project's source tree, not its notes tree. Unless someone abused the note tree editor to edit the README or something... -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 22:29 ` Shawn O. Pearce @ 2010-03-26 22:40 ` Jonathan Nieder 2010-03-26 23:09 ` Junio C Hamano 2010-03-26 22:59 ` Mike.lifeguard 1 sibling, 1 reply; 33+ messages in thread From: Jonathan Nieder @ 2010-03-26 22:40 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git, mike.lifeguard Shawn O. Pearce wrote: > Jonathan Nieder <jrnieder@gmail.com> wrote: >> Shawn O. Pearce wrote: >>> Any ideas? Why is Git 1.7.0.3 jamming a leading '0' on a file mode? >> >> See http://thread.gmane.org/gmane.comp.version-control.git/141028 >> and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24). >> >> The regression that that fixes appeared in 61a7cca0 (Notes API: >> write_notes_tree(): Store the notes tree in the database, 2010-02-13), >> which is not part of 1.7.0.3. > > That may be true... but I doubt the tree in question was a notes > tree. The path entries were names like 'README', 'modules' and > 'stewardbot'. Something I would assume was the project's source > tree, not its notes tree. Yes, true. The problem is probably elsewhere, especially because 1.7.0.3 doesn’t even have that commit. Still, I find this a bit strange because such breakage should have been noticeable if it happens often. What has changed recently that involves writing trees? Jonathan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 22:40 ` Jonathan Nieder @ 2010-03-26 23:09 ` Junio C Hamano 0 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2010-03-26 23:09 UTC (permalink / raw) To: Jonathan Nieder; +Cc: Shawn O. Pearce, git, mike.lifeguard Jonathan Nieder <jrnieder@gmail.com> writes: > Shawn O. Pearce wrote: >> Jonathan Nieder <jrnieder@gmail.com> wrote: >>> Shawn O. Pearce wrote: > >>>> Any ideas? Why is Git 1.7.0.3 jamming a leading '0' on a file mode? >>> >>> See http://thread.gmane.org/gmane.comp.version-control.git/141028 >>> and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24). >>> >>> The regression that that fixes appeared in 61a7cca0 (Notes API: >>> write_notes_tree(): Store the notes tree in the database, 2010-02-13), >>> which is not part of 1.7.0.3. >> >> That may be true... but I doubt the tree in question was a notes >> tree. The path entries were names like 'README', 'modules' and >> 'stewardbot'. Something I would assume was the project's source >> tree, not its notes tree. > > Yes, true. The problem is probably elsewhere, especially because > 1.7.0.3 doesn’t even have that commit. Still, I find this a bit > strange because such breakage should have been noticeable if it > happens often. > > What has changed recently that involves writing trees? Asking grep for "%06o" reveals nothing. Perhaps somebody else's imitation implementation? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 22:29 ` Shawn O. Pearce 2010-03-26 22:40 ` Jonathan Nieder @ 2010-03-26 22:59 ` Mike.lifeguard 2010-03-26 23:05 ` Shawn O. Pearce 1 sibling, 1 reply; 33+ messages in thread From: Mike.lifeguard @ 2010-03-26 22:59 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Jonathan Nieder, git, schacon -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10-03-26 07:29 PM, Shawn O. Pearce wrote: > Something I would assume was the project's source > tree, not its notes tree. Yes, it is the source tree. We don't even know what a notes tree is. Apparently Scott Chacon has some clue about this error: http://support.github.com/discussions/repos/2566-strange-warning-from-fsck-and-github-repo-using-too-much-diskspace so I've added him to CC. (Note that changing all SHA1s is not really a problem for us, there are only 3 copies of the repo, and the project has only been using version control for 2 days) Thanks, all - -Mike -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkutPG0ACgkQst0AR/DaKHuRtQCdEyy/KIWwpNYUA4EnkHGy2Y3D chwAoLDzdhD9dmmn5mdkJxGrL5Kjlf4/ =k2Eg -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 22:59 ` Mike.lifeguard @ 2010-03-26 23:05 ` Shawn O. Pearce 2010-03-26 23:22 ` Mike.lifeguard 2010-03-26 23:50 ` Junio C Hamano 0 siblings, 2 replies; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-26 23:05 UTC (permalink / raw) To: Mike.lifeguard, Scott Chacon; +Cc: Jonathan Nieder, git "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote: > Apparently Scott Chacon has some clue about this error: > http://support.github.com/discussions/repos/2566-strange-warning-from-fsck-and-github-repo-using-too-much-diskspace > so I've added him to CC. (Note that changing all SHA1s is not really a > problem for us, there are only 3 copies of the repo, and the project has > only been using version control for 2 days) Scott, please fix that library on GitHub. JGit's fsck has a hard failure on these malformed trees, because the leading '0' mode causes the tree to come up with the wrong SHA-1 hash given its logical content. They shouldn't be created like this. Mike, it sounds like you might be able to fix your project by just running something like: $ git filter-branch --index-filter '' --all It rewrites the trees, which will change their SHA-1s (and the commit SHA-1s downstream from there) with correctly formatted tree objects. -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 23:05 ` Shawn O. Pearce @ 2010-03-26 23:22 ` Mike.lifeguard 2010-03-26 23:49 ` Jonathan Nieder 2010-03-26 23:50 ` Junio C Hamano 1 sibling, 1 reply; 33+ messages in thread From: Mike.lifeguard @ 2010-03-26 23:22 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Scott Chacon, Jonathan Nieder, git -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10-03-26 08:05 PM, Shawn O. Pearce wrote: > $ git filter-branch --index-filter '' --all This and a few other variations I tried does rewrite things, but the problem persists: mikelifeguard@arbour:~/Code/git/stewbot (master)$ git filter-branch - --subdirectory-filter '' -- --all Rewrite 5b5d93ca1ebcbc90c3ad688b9b9751d014b452a8 (8/8) WARNING: Ref 'refs/heads/master' is unchanged WARNING: Ref 'refs/remotes/origin/master' is unchanged WARNING: Ref 'refs/remotes/origin/gh-pages' is unchanged WARNING: Ref 'refs/remotes/origin/master' is unchanged mikelifeguard@arbour:~/Code/git/stewbot (master)$ git fsck warning in tree a39aa6d4a6dcfd6c14d8f818bbdf1dfcb3e11771: contains zero-padded file modes - -Mike -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkutQcQACgkQst0AR/DaKHudmACgtZ67Tv1pO769BL5OvduZacix BvMAoLr1UTf5UTd6zowRDHLovBVSY+0R =n/G1 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 23:22 ` Mike.lifeguard @ 2010-03-26 23:49 ` Jonathan Nieder 0 siblings, 0 replies; 33+ messages in thread From: Jonathan Nieder @ 2010-03-26 23:49 UTC (permalink / raw) To: Mike.lifeguard; +Cc: Shawn O. Pearce, Scott Chacon, git Mike.lifeguard wrote: > On 10-03-26 08:05 PM, Shawn O. Pearce wrote: >> $ git filter-branch --index-filter '' --all > > This and a few other variations I tried does rewrite things, but the > problem persists: Yes, I think git write-tree does not rewrite subtrees. How about git fast-export --all | ( cd /empty/repository && git init && git fast-import ) ? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 23:05 ` Shawn O. Pearce 2010-03-26 23:22 ` Mike.lifeguard @ 2010-03-26 23:50 ` Junio C Hamano 2010-03-26 23:56 ` Avery Pennarun 1 sibling, 1 reply; 33+ messages in thread From: Junio C Hamano @ 2010-03-26 23:50 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Mike.lifeguard, Scott Chacon, Jonathan Nieder, git "Shawn O. Pearce" <spearce@spearce.org> writes: > Scott, please fix that library on GitHub. JGit's fsck has a hard > failure on these malformed trees, because the leading '0' mode > causes the tree to come up with the wrong SHA-1 hash given its > logical content. They shouldn't be created like this. What is curious is that even though 6407180 (git-fsck-cache: be stricter about "tree" objects, 2005-07-27) does talk about zero-padding, it appears that we never had a version of git that padded mode in '0' in the entire history of write-tree (except that "notes tree" one, but even that didn't escape the laboratory). But now we know there is a tool in the wild creating broken objects left and right, jgit's fsck routine might need to be more lenient (while warning loudly) in what it accepts. Scott, does your tool have outside users (i.e. being freely distributed and you have no control over the continued use of existing copies that create broken objects)? If not, then there won't be further damage once you fix it at Github, and we may not have to worry about changing jgit after all. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 23:50 ` Junio C Hamano @ 2010-03-26 23:56 ` Avery Pennarun 2010-03-27 0:00 ` Mike.lifeguard 0 siblings, 1 reply; 33+ messages in thread From: Avery Pennarun @ 2010-03-26 23:56 UTC (permalink / raw) To: Junio C Hamano Cc: Shawn O. Pearce, Mike.lifeguard, Scott Chacon, Jonathan Nieder, git On Fri, Mar 26, 2010 at 7:50 PM, Junio C Hamano <gitster@pobox.com> wrote: > "Shawn O. Pearce" <spearce@spearce.org> writes: > >> Scott, please fix that library on GitHub. JGit's fsck has a hard >> failure on these malformed trees, because the leading '0' mode >> causes the tree to come up with the wrong SHA-1 hash given its >> logical content. They shouldn't be created like this. > > What is curious is that even though 6407180 (git-fsck-cache: be stricter > about "tree" objects, 2005-07-27) does talk about zero-padding, it appears > that we never had a version of git that padded mode in '0' in the entire > history of write-tree (except that "notes tree" one, but even that didn't > escape the laboratory). It's apparently an easy mistake to make. bup did this for a while until I added a 'git fsck' to its automated tests :) The problem is that everything in git works perfectly with these invalid file modes *except* fsck, and there's rarely a need to run fsck, so this problem can hide for a long time. Have fun, Avery ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-26 23:56 ` Avery Pennarun @ 2010-03-27 0:00 ` Mike.lifeguard 2010-03-27 1:22 ` Shawn O. Pearce 0 siblings, 1 reply; 33+ messages in thread From: Mike.lifeguard @ 2010-03-27 0:00 UTC (permalink / raw) To: Avery Pennarun Cc: Junio C Hamano, Shawn O. Pearce, Scott Chacon, Jonathan Nieder, git -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10-03-26 08:49 PM, Jonathan Nieder wrote: > git fast-export --all | > ( > cd /empty/repository && > git init && > git fast-import > ) That one did something: *When I cd-ed into the repo, there were staged changes waiting for me (O.o) -- the changes would have simply deleted every file in the source tree. *git fsck had no warnings *As predicted, SHA1s changed On 10-03-26 08:56 PM, Avery Pennarun wrote: > The problem is that everything in git works perfectly with these > invalid file modes *except* fsck, and there's rarely a need to run > fsck, so this problem can hide for a long time. So, does the error matter or not? If it doesn't matter, then shouldn't Jgit stop whining? If it does, then whatever-it-is needs to be fixed. We're still not sure what was done with github to cause this. I've done nothing with github's web interface, and the project lead can't recall doing anything prior to this error (only stuff today, but this error cropped up yesterday). I suspect witchcraft :P - -Mike -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkutSoIACgkQst0AR/DaKHsblwCcC5j2jDuy95EOjhkK8adfWXl7 ZFEAnRn2bi9glDh6RR3xTwYkjxnMQYqx =YSTH -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 0:00 ` Mike.lifeguard @ 2010-03-27 1:22 ` Shawn O. Pearce 2010-03-27 1:30 ` Nicolas Pitre 0 siblings, 1 reply; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-27 1:22 UTC (permalink / raw) To: Mike.lifeguard Cc: Avery Pennarun, Junio C Hamano, Scott Chacon, Jonathan Nieder, git "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote: > On 10-03-26 08:56 PM, Avery Pennarun wrote: > > The problem is that everything in git works perfectly with these > > invalid file modes *except* fsck, and there's rarely a need to run > > fsck, so this problem can hide for a long time. > > So, does the error matter or not? If it doesn't matter, then shouldn't > Jgit stop whining? If it does, then whatever-it-is needs to be fixed. Its less harmful than other types of corruption. But its quite wrong from a format perspective. The hash of the tree differs even though there is no semantic difference in the tree content. Given that GitHub has blessed the world with this corruption, we may need to modify JGit to accept it. -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:22 ` Shawn O. Pearce @ 2010-03-27 1:30 ` Nicolas Pitre 2010-03-27 1:34 ` Shawn O. Pearce 0 siblings, 1 reply; 33+ messages in thread From: Nicolas Pitre @ 2010-03-27 1:30 UTC (permalink / raw) To: Shawn O. Pearce Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon, Jonathan Nieder, git On Fri, 26 Mar 2010, Shawn O. Pearce wrote: > "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote: > > On 10-03-26 08:56 PM, Avery Pennarun wrote: > > > The problem is that everything in git works perfectly with these > > > invalid file modes *except* fsck, and there's rarely a need to run > > > fsck, so this problem can hide for a long time. > > > > So, does the error matter or not? If it doesn't matter, then shouldn't > > Jgit stop whining? If it does, then whatever-it-is needs to be fixed. > > Its less harmful than other types of corruption. But its quite > wrong from a format perspective. The hash of the tree differs even > though there is no semantic difference in the tree content. > > Given that GitHub has blessed the world with this corruption, > we may need to modify JGit to accept it. Should we? This is going to screw up pack v4 (yes, someday I'll have the time to make it real). Nicolas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:30 ` Nicolas Pitre @ 2010-03-27 1:34 ` Shawn O. Pearce 2010-03-27 1:56 ` Nicolas Pitre 2010-03-27 5:16 ` Junio C Hamano 0 siblings, 2 replies; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-27 1:34 UTC (permalink / raw) To: Nicolas Pitre Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon, Jonathan Nieder, git Nicolas Pitre <nico@fluxnic.net> wrote: > On Fri, 26 Mar 2010, Shawn O. Pearce wrote: > > "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote: > > > On 10-03-26 08:56 PM, Avery Pennarun wrote: > > > > The problem is that everything in git works perfectly with these > > > > invalid file modes *except* fsck, and there's rarely a need to run > > > > fsck, so this problem can hide for a long time. > > > > > > So, does the error matter or not? If it doesn't matter, then shouldn't > > > Jgit stop whining? If it does, then whatever-it-is needs to be fixed. > > > > Its less harmful than other types of corruption. But its quite > > wrong from a format perspective. The hash of the tree differs even > > though there is no semantic difference in the tree content. > > > > Given that GitHub has blessed the world with this corruption, > > we may need to modify JGit to accept it. > > Should we? > > This is going to screw up pack v4 (yes, someday I'll have the time to > make it real). Exactly. I *really* don't want to permit this sort of corruption in a Git repository. But GitHub's approach here seems to be "Meh, its fine, don't worry about it". Its *NOT* fine. But Avery and Junio might disagree with me. :-) Though, FWIW, it might not screw up pack v4. IIRC from our discussions long ago on pack v4, we store "$mode $name" pairs in an indexed list, preciously because we needed to support odd modes like 10664 from ancient Git binaries. If we continue to allow this corruption, it means we have to ensure $mode is the octal string and not the binary value. -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:34 ` Shawn O. Pearce @ 2010-03-27 1:56 ` Nicolas Pitre 2010-03-27 2:33 ` Avery Pennarun 2010-03-27 12:44 ` Scott Chacon 2010-03-27 5:16 ` Junio C Hamano 1 sibling, 2 replies; 33+ messages in thread From: Nicolas Pitre @ 2010-03-27 1:56 UTC (permalink / raw) To: Shawn O. Pearce Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon, Jonathan Nieder, git On Fri, 26 Mar 2010, Shawn O. Pearce wrote: > Nicolas Pitre <nico@fluxnic.net> wrote: > > On Fri, 26 Mar 2010, Shawn O. Pearce wrote: > > > Given that GitHub has blessed the world with this corruption, > > > we may need to modify JGit to accept it. > > > > Should we? > > > > This is going to screw up pack v4 (yes, someday I'll have the time to > > make it real). > > Exactly. I *really* don't want to permit this sort of corruption > in a Git repository. > > But GitHub's approach here seems to be "Meh, its fine, don't worry > about it". It's up to GitHub to fork Git then, and while at it stop calling it Git compatible. Really. If we start to get slack about the pack format like this then every Git reimplementation du jour will make similar deviations except in different directions and we'll end up with a mess to support. And in this case there is _no_ excuse as 'git fsck' is actually complaining. My stance has always been that the C Git is authoritative with regards to formats and protocols. It's up to Github to fix their screw-up. > Its *NOT* fine. But Avery and Junio might disagree with me. :-) FWIW I agree with you. > Though, FWIW, it might not screw up pack v4. IIRC from our > discussions long ago on pack v4, we store "$mode $name" pairs in > an indexed list, preciously because we needed to support odd modes > like 10664 from ancient Git binaries. If we continue to allow this > corruption, it means we have to ensure $mode is the octal string > and not the binary value. Which is a real pity. In fact, my position is that pack v4 would simply refuse to optimize the encoding for such tree objects, period. Only the non ambiguously encoded tree objects would benefit from the v4 improvements. Nicolas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:56 ` Nicolas Pitre @ 2010-03-27 2:33 ` Avery Pennarun 2010-03-27 12:44 ` Scott Chacon 1 sibling, 0 replies; 33+ messages in thread From: Avery Pennarun @ 2010-03-27 2:33 UTC (permalink / raw) To: Nicolas Pitre Cc: Shawn O. Pearce, Mike.lifeguard, Junio C Hamano, Scott Chacon, Jonathan Nieder, git On Fri, Mar 26, 2010 at 9:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > On Fri, 26 Mar 2010, Shawn O. Pearce wrote: >> Its *NOT* fine. But Avery and Junio might disagree with me. :-) > > FWIW I agree with you. I would also like to remove my name from the "disagree" list. :) Producing nonstandard output isn't fine at all - I mentioned Postel's Law, but the neglected half of that law is that you're supposed to produce valid data in the first place. This is why (as I mentioned earlier) bup's automated tests now run 'git fsck' explicitly to verify that it gets it right. It was only the very first versions of bup, which thankfully nobody used for anything important, that screwed this up. Barring any new and improved screw-ups, anyway. I only brought it up to say that it's actually easy to make this mistake undetected. Very few people run git fsck nowadays. The world might benefit if git complained (albeit non-fatally) *whenever* it saw such an incorrect tree. > In fact, my position is that pack v4 would simply refuse to optimize the > encoding for such tree objects, period. Only the non ambiguously > encoded tree objects would benefit from the v4 improvements. This sounds very wise to me. Have fun, Avery ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:56 ` Nicolas Pitre 2010-03-27 2:33 ` Avery Pennarun @ 2010-03-27 12:44 ` Scott Chacon 2010-03-27 14:21 ` Nicolas Pitre 1 sibling, 1 reply; 33+ messages in thread From: Scott Chacon @ 2010-03-27 12:44 UTC (permalink / raw) To: Nicolas Pitre Cc: Shawn O. Pearce, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Hey, Sorry it's taken me a bit - I'm traveling right now. On Fri, Mar 26, 2010 at 6:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote: >> > > Given that GitHub has blessed the world with this corruption, >> > > we may need to modify JGit to accept it. Well, shouldn't it accept it just because CGit accepts it? Isn't that an incompatibility in implementation? >> But GitHub's approach here seems to be "Meh, its fine, don't worry >> about it". That isn't really my approach, I actually thought I had fixed this a while ago. It seems to be a pretty understandable mistake, since ls-tree and cat-file -p both output zero padded modes and it is only an issue on trees with subtrees, obviously, so we don't see it all the time at GitHub. I have fixed this and it's in the queue for deployment which should be in the next few days (I gotta get home first). > It's up to GitHub to fork Git then, and while at it stop calling it Git > compatible. Really. If we start to get slack about the pack format > like this then every Git reimplementation du jour will make similar > deviations except in different directions and we'll end up with a mess > to support. Really? It's not the pack format - we use stock Git servers and almost always have. It's the tree writing when someone edits a file inline - I was writing out zero-padded trees. And, it _is_ Git compatible - CGit only issues a warning, and that only if the circumstances align such that we write a tree with a subtree, which again is pretty rare. There are only a handful of projects like this and in all CGit circumstances makes no practical difference. > My stance has always been that the C Git is authoritative with regards to > formats and protocols. It's up to Github to fix their screw-up. It is fixed and will be deployed soon, but really, there is no reason to be snippy. It is a simple and minor mistake effecting very few repositories (maybe 100 out of 730k), and the only reason it's an issue at all is that JGit is not following the authoritative CGit implementation of basically ignoring it. Also, if we're all concerned about "Git reimplementation du jour" deviations, then we need to focus on libifying Git so there isn't a need for such re-implementations. I'm hoping to help with a possible GSoC project on libgit2, but the lack of a linkable library will ensure that re-implementations in nearly every useful language will continue. Scott ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 12:44 ` Scott Chacon @ 2010-03-27 14:21 ` Nicolas Pitre 2010-03-27 19:14 ` Shawn O. Pearce 0 siblings, 1 reply; 33+ messages in thread From: Nicolas Pitre @ 2010-03-27 14:21 UTC (permalink / raw) To: Scott Chacon Cc: Shawn O. Pearce, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git [-- Attachment #1: Type: TEXT/PLAIN, Size: 3481 bytes --] On Sat, 27 Mar 2010, Scott Chacon wrote: > Hey, > > Sorry it's taken me a bit - I'm traveling right now. > > On Fri, Mar 26, 2010 at 6:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote: > >> > > Given that GitHub has blessed the world with this corruption, > >> > > we may need to modify JGit to accept it. > > Well, shouldn't it accept it just because CGit accepts it? Isn't that > an incompatibility in implementation? CGit fsck complains about it. This should be sufficient a clue to avoid such things. > >> But GitHub's approach here seems to be "Meh, its fine, don't worry > >> about it". > > That isn't really my approach, I actually thought I had fixed this a > while ago. It seems to be a pretty understandable mistake, since > ls-tree and cat-file -p both output zero padded modes and it is only > an issue on trees with subtrees, obviously, so we don't see it all the > time at GitHub. I have fixed this and it's in the queue for > deployment which should be in the next few days (I gotta get home > first). Thanks. > > It's up to GitHub to fork Git then, and while at it stop calling it Git > > compatible. Really. If we start to get slack about the pack format > > like this then every Git reimplementation du jour will make similar > > deviations except in different directions and we'll end up with a mess > > to support. > > Really? It's not the pack format - we use stock Git servers and > almost always have. It's the tree writing when someone edits a file > inline - I was writing out zero-padded trees. And, it _is_ Git > compatible - CGit only issues a warning, and that only if the > circumstances align such that we write a tree with a subtree, which > again is pretty rare. There are only a handful of projects like this > and in all CGit circumstances makes no practical difference. It is still damn important to those with an interest in pack format improvements that only one way of creating a tree object exists, especially as we stamp a SHA1 hash on it. Whatever we do with the tree encoding in the future, it is essential that the canonical expression of any tree object be unambiguous and always produce the same hash. > > My stance has always been that the C Git is authoritative with regards to > > formats and protocols. It's up to Github to fix their screw-up. > > It is fixed and will be deployed soon, but really, there is no reason > to be snippy. It is a simple and minor mistake effecting very few > repositories (maybe 100 out of 730k), and the only reason it's an > issue at all is that JGit is not following the authoritative CGit > implementation of basically ignoring it. But again CGit's fsck is not ignoring this discrepancy. And if the CGit core is otherwise silently accepting it then it is a mistake. > Also, if we're all concerned about "Git reimplementation du jour" > deviations, then we need to focus on libifying Git so there isn't a > need for such re-implementations. I'm hoping to help with a possible > GSoC project on libgit2, but the lack of a linkable library will > ensure that re-implementations in nearly every useful language will > continue. Don't get me wrong. I'm not against Git reimplementations per se, as long as they rigorously implement the exact format and protocol from CGit. In that sense it is important that the CGit fsck and verify-pack tools be exploited on objects/packs produced by alternate Git implementation systematically to find such issues. Nicolas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 14:21 ` Nicolas Pitre @ 2010-03-27 19:14 ` Shawn O. Pearce 2010-03-27 19:30 ` A Large Angry SCM ` (3 more replies) 0 siblings, 4 replies; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-27 19:14 UTC (permalink / raw) To: Nicolas Pitre Cc: Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Nicolas Pitre <nico@fluxnic.net> wrote: > On Sat, 27 Mar 2010, Scott Chacon wrote: > > > My stance has always been that the C Git is authoritative with regards to > > > formats and protocols. ??It's up to Github to fix their screw-up. > > > > It is fixed and will be deployed soon, but really, there is no reason > > to be snippy. It is a simple and minor mistake effecting very few > > repositories (maybe 100 out of 730k) What is the C Git stance on these 100 repositories then? Are they now considered corrupt? Or is 100 enough in the wild that we have to accept the problem, just like we accept the 10664 mode issue from "ancient" Linux? I would love to say "those are corrupt, sorry, fix your repository". But we have traditionally tried to help our users, and not cause them pain. Forcing a rewrite on these 100 projects to fix up the corruption is going to be painful for them. > > , and the only reason it's an > > issue at all is that JGit is not following the authoritative CGit > > implementation of basically ignoring it. > > But again CGit's fsck is not ignoring this discrepancy. And if the CGit > core is otherwise silently accepting it then it is a mistake. Right. I tend to agree. CGit was too lax here, fsck shouldn't be issuing a warning, it should be a fatal error. Both CGit and JGit are too lax by not failing when reading that tree during normal processing. > > Also, if we're all concerned about "Git reimplementation du jour" > > deviations, then we need to focus on libifying Git so there isn't a > > need for such re-implementations. I'm hoping to help with a possible > > GSoC project on libgit2, but the lack of a linkable library will > > ensure that re-implementations in nearly every useful language will > > continue. > > Don't get me wrong. I'm not against Git reimplementations per se, as > long as they rigorously implement the exact format and protocol from > CGit. In that sense it is important that the CGit fsck and verify-pack > tools be exploited on objects/packs produced by alternate Git > implementation systematically to find such issues. When JGit had the tree sort order wrong, JGit was in the wrong, and any repository which contained those corrupt trees had to be fixed by rewriting them. IIRC it was only the JGit repository itself that had this problem in the wild. But we fixed our code. IMHO, this leading '0' thing is a similar breakage. We shouldn't relax CGit or JGit to accept it just because the Ruby implementation of Git got the tree encoding wrong. If anything, we should teach these implementations to catch these sorts of problems earlier. -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:14 ` Shawn O. Pearce @ 2010-03-27 19:30 ` A Large Angry SCM 2010-03-27 19:32 ` Shawn O. Pearce 2010-03-27 20:13 ` A Large Angry SCM ` (2 subsequent siblings) 3 siblings, 1 reply; 33+ messages in thread From: A Large Angry SCM @ 2010-03-27 19:30 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Shawn O. Pearce wrote: > Nicolas Pitre <nico@fluxnic.net> wrote: >> On Sat, 27 Mar 2010, Scott Chacon wrote: >>>> My stance has always been that the C Git is authoritative with regards to >>>> formats and protocols. ??It's up to Github to fix their screw-up. >>> It is fixed and will be deployed soon, but really, there is no reason >>> to be snippy. It is a simple and minor mistake effecting very few >>> repositories (maybe 100 out of 730k) > > What is the C Git stance on these 100 repositories then? Are they > now considered corrupt? Or is 100 enough in the wild that we have > to accept the problem, just like we accept the 10664 mode issue from > "ancient" Linux? > > I would love to say "those are corrupt, sorry, fix your repository". > > But we have traditionally tried to help our users, and not cause > them pain. Forcing a rewrite on these 100 projects to fix up the > corruption is going to be painful for them. > >>> , and the only reason it's an >>> issue at all is that JGit is not following the authoritative CGit >>> implementation of basically ignoring it. >> But again CGit's fsck is not ignoring this discrepancy. And if the CGit >> core is otherwise silently accepting it then it is a mistake. > > Right. I tend to agree. CGit was too lax here, fsck shouldn't > be issuing a warning, it should be a fatal error. Both CGit and > JGit are too lax by not failing when reading that tree during > normal processing. > >>> Also, if we're all concerned about "Git reimplementation du jour" >>> deviations, then we need to focus on libifying Git so there isn't a >>> need for such re-implementations. I'm hoping to help with a possible >>> GSoC project on libgit2, but the lack of a linkable library will >>> ensure that re-implementations in nearly every useful language will >>> continue. >> Don't get me wrong. I'm not against Git reimplementations per se, as >> long as they rigorously implement the exact format and protocol from >> CGit. In that sense it is important that the CGit fsck and verify-pack >> tools be exploited on objects/packs produced by alternate Git >> implementation systematically to find such issues. > > When JGit had the tree sort order wrong, JGit was in the wrong, > and any repository which contained those corrupt trees had to be > fixed by rewriting them. IIRC it was only the JGit repository > itself that had this problem in the wild. But we fixed our code. > > IMHO, this leading '0' thing is a similar breakage. We shouldn't > relax CGit or JGit to accept it just because the Ruby implementation > of Git got the tree encoding wrong. If anything, we should teach > these implementations to catch these sorts of problems earlier. > Just add an additional data point, it looks like up to 16 of these trees with zero-padded file modes are reachable from Linus' kernel master ref. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:30 ` A Large Angry SCM @ 2010-03-27 19:32 ` Shawn O. Pearce 2010-03-27 19:39 ` A Large Angry SCM 0 siblings, 1 reply; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-27 19:32 UTC (permalink / raw) To: A Large Angry SCM Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git A Large Angry SCM <gitzilla@gmail.com> wrote: > Shawn O. Pearce wrote: >> >> IMHO, this leading '0' thing is a similar breakage. We shouldn't >> relax CGit or JGit to accept it just because the Ruby implementation >> of Git got the tree encoding wrong. If anything, we should teach >> these implementations to catch these sorts of problems earlier. > > Just add an additional data point, it looks like up to 16 of these trees > with zero-padded file modes are reachable from Linus' kernel master ref. Frell. We can't ask Linus to rewrite his history to repair this breakage. The fact that its made it into the kernel history means we have to accept this. The kernel project is simply too large and move too fast for us to ask them to fix their repository history. Smaller projects of 1-2 people, we could have gotten away with asking them to fix their history. I guess that answers the questions then. CGit permits this with a warning, and must always continue to do that. And JGit needs to fix itself to do the same. -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:32 ` Shawn O. Pearce @ 2010-03-27 19:39 ` A Large Angry SCM 2010-03-27 19:44 ` A Large Angry SCM 0 siblings, 1 reply; 33+ messages in thread From: A Large Angry SCM @ 2010-03-27 19:39 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Shawn O. Pearce wrote: > A Large Angry SCM <gitzilla@gmail.com> wrote: >> Shawn O. Pearce wrote: >>> IMHO, this leading '0' thing is a similar breakage. We shouldn't >>> relax CGit or JGit to accept it just because the Ruby implementation >>> of Git got the tree encoding wrong. If anything, we should teach >>> these implementations to catch these sorts of problems earlier. >> Just add an additional data point, it looks like up to 16 of these trees >> with zero-padded file modes are reachable from Linus' kernel master ref. > > Frell. > > We can't ask Linus to rewrite his history to repair this breakage. > The fact that its made it into the kernel history means we have > to accept this. The kernel project is simply too large and move > too fast for us to ask them to fix their repository history. > Smaller projects of 1-2 people, we could have gotten away with > asking them to fix their history. > > I guess that answers the questions then. CGit permits this with > a warning, and must always continue to do that. And JGit needs to > fix itself to do the same. > Wait a minute, something strange is going on here. My combined kernel repository has 16 of these things according to git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' they do not appear in the output. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:39 ` A Large Angry SCM @ 2010-03-27 19:44 ` A Large Angry SCM 2010-03-27 19:57 ` A Large Angry SCM 2010-03-28 17:38 ` Sitaram Chamarty 0 siblings, 2 replies; 33+ messages in thread From: A Large Angry SCM @ 2010-03-27 19:44 UTC (permalink / raw) To: gitzilla Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git A Large Angry SCM wrote: > Shawn O. Pearce wrote: >> A Large Angry SCM <gitzilla@gmail.com> wrote: >>> Shawn O. Pearce wrote: >>>> IMHO, this leading '0' thing is a similar breakage. We shouldn't >>>> relax CGit or JGit to accept it just because the Ruby implementation >>>> of Git got the tree encoding wrong. If anything, we should teach >>>> these implementations to catch these sorts of problems earlier. >>> Just add an additional data point, it looks like up to 16 of these >>> trees with zero-padded file modes are reachable from Linus' kernel >>> master ref. >> >> Frell. >> >> We can't ask Linus to rewrite his history to repair this breakage. >> The fact that its made it into the kernel history means we have >> to accept this. The kernel project is simply too large and move >> too fast for us to ask them to fix their repository history. >> Smaller projects of 1-2 people, we could have gotten away with >> asking them to fix their history. >> >> I guess that answers the questions then. CGit permits this with >> a warning, and must always continue to do that. And JGit needs to >> fix itself to do the same. >> > > Wait a minute, something strange is going on here. > > My combined kernel repository has 16 of these things according to > git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the > same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' > they do not appear in the output. > Ignore my noise until some more checking is done! My combined repository also includes Scott's progit book and examples repositories. I'm guessing that git-fsck did not limit itself to just the object reachable from the torvalds/linux-2.6/master ref. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:44 ` A Large Angry SCM @ 2010-03-27 19:57 ` A Large Angry SCM 2010-03-28 17:38 ` Sitaram Chamarty 1 sibling, 0 replies; 33+ messages in thread From: A Large Angry SCM @ 2010-03-27 19:57 UTC (permalink / raw) To: gitzilla Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git A Large Angry SCM wrote: > A Large Angry SCM wrote: >> Shawn O. Pearce wrote: >>> A Large Angry SCM <gitzilla@gmail.com> wrote: >>>> Shawn O. Pearce wrote: >>>>> IMHO, this leading '0' thing is a similar breakage. We shouldn't >>>>> relax CGit or JGit to accept it just because the Ruby implementation >>>>> of Git got the tree encoding wrong. If anything, we should teach >>>>> these implementations to catch these sorts of problems earlier. >>>> Just add an additional data point, it looks like up to 16 of these >>>> trees with zero-padded file modes are reachable from Linus' kernel >>>> master ref. >>> >>> Frell. >>> >>> We can't ask Linus to rewrite his history to repair this breakage. >>> The fact that its made it into the kernel history means we have >>> to accept this. The kernel project is simply too large and move >>> too fast for us to ask them to fix their repository history. >>> Smaller projects of 1-2 people, we could have gotten away with >>> asking them to fix their history. >>> >>> I guess that answers the questions then. CGit permits this with >>> a warning, and must always continue to do that. And JGit needs to >>> fix itself to do the same. >>> >> >> Wait a minute, something strange is going on here. >> >> My combined kernel repository has 16 of these things according to >> git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the >> same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' >> they do not appear in the output. >> > > Ignore my noise until some more checking is done! > > My combined repository also includes Scott's progit book and examples > repositories. I'm guessing that git-fsck did not limit itself to just > the object reachable from the torvalds/linux-2.6/master ref. > The 16 offending trees are all git-rev-list reachable from Scott's progit book master ref and _NOT_ from Linus' Linux master ref. Sorry about the confusion!!! ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:44 ` A Large Angry SCM 2010-03-27 19:57 ` A Large Angry SCM @ 2010-03-28 17:38 ` Sitaram Chamarty 2010-03-28 23:28 ` A Large Angry SCM 1 sibling, 1 reply; 33+ messages in thread From: Sitaram Chamarty @ 2010-03-28 17:38 UTC (permalink / raw) To: gitzilla Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git On Sun, Mar 28, 2010 at 1:14 AM, A Large Angry SCM <gitzilla@gmail.com> wrote: > My combined repository also includes Scott's progit book and examples > repositories. I'm guessing that git-fsck did not limit itself to just the > object reachable from the torvalds/linux-2.6/master ref. <struck speechless> You have *one* repo containing both Scott's book and the Linux kernel tree? "large angry SCM" is probably an understatement then... any SCM has the right to be angry mixing up things that are so unrelated! Or did I totally, *totally* misunderstand...? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-28 17:38 ` Sitaram Chamarty @ 2010-03-28 23:28 ` A Large Angry SCM 0 siblings, 0 replies; 33+ messages in thread From: A Large Angry SCM @ 2010-03-28 23:28 UTC (permalink / raw) To: Sitaram Chamarty Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Sitaram Chamarty wrote: > On Sun, Mar 28, 2010 at 1:14 AM, A Large Angry SCM <gitzilla@gmail.com> wrote: > >> My combined repository also includes Scott's progit book and examples >> repositories. I'm guessing that git-fsck did not limit itself to just the >> object reachable from the torvalds/linux-2.6/master ref. > > <struck speechless> > > You have *one* repo containing both Scott's book and the Linux kernel > tree? "large angry SCM" is probably an understatement then... any SCM > has the right to be angry mixing up things that are so unrelated! > > Or did I totally, *totally* misunderstand...? > You understood correctly. I use that repository to track activities in a number of different but related projects/communities. Scott's stuff has since been purged until he fixes the corruption. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:14 ` Shawn O. Pearce 2010-03-27 19:30 ` A Large Angry SCM @ 2010-03-27 20:13 ` A Large Angry SCM 2010-03-27 20:16 ` Junio C Hamano 2010-03-27 22:16 ` Avery Pennarun 3 siblings, 0 replies; 33+ messages in thread From: A Large Angry SCM @ 2010-03-27 20:13 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano, Jonathan Nieder, git Shawn O. Pearce wrote: > Nicolas Pitre <nico@fluxnic.net> wrote: >> On Sat, 27 Mar 2010, Scott Chacon wrote: >>>> My stance has always been that the C Git is authoritative with regards to >>>> formats and protocols. ??It's up to Github to fix their screw-up. >>> It is fixed and will be deployed soon, but really, there is no reason >>> to be snippy. It is a simple and minor mistake effecting very few >>> repositories (maybe 100 out of 730k) > > What is the C Git stance on these 100 repositories then? Are they > now considered corrupt? Or is 100 enough in the wild that we have > to accept the problem, just like we accept the 10664 mode issue from > "ancient" Linux? > > I would love to say "those are corrupt, sorry, fix your repository". > > But we have traditionally tried to help our users, and not cause > them pain. Forcing a rewrite on these 100 projects to fix up the > corruption is going to be painful for them. > >>> , and the only reason it's an >>> issue at all is that JGit is not following the authoritative CGit >>> implementation of basically ignoring it. >> But again CGit's fsck is not ignoring this discrepancy. And if the CGit >> core is otherwise silently accepting it then it is a mistake. > > Right. I tend to agree. CGit was too lax here, fsck shouldn't > be issuing a warning, it should be a fatal error. Both CGit and > JGit are too lax by not failing when reading that tree during > normal processing. CGit should treat the object as corrupt, output a message to that effect, and continue checking the rest of the objects. Everything else that traverses graph should exit with an error as soon as it tries detects a corrupt object. This would allow someone to use git-for-each-ref and git-rev-list to prune the graph by deleting refs without trashing the entire repository. >>> Also, if we're all concerned about "Git reimplementation du jour" >>> deviations, then we need to focus on libifying Git so there isn't a >>> need for such re-implementations. I'm hoping to help with a possible >>> GSoC project on libgit2, but the lack of a linkable library will >>> ensure that re-implementations in nearly every useful language will >>> continue. >> Don't get me wrong. I'm not against Git reimplementations per se, as >> long as they rigorously implement the exact format and protocol from >> CGit. In that sense it is important that the CGit fsck and verify-pack >> tools be exploited on objects/packs produced by alternate Git >> implementation systematically to find such issues. > > When JGit had the tree sort order wrong, JGit was in the wrong, > and any repository which contained those corrupt trees had to be > fixed by rewriting them. IIRC it was only the JGit repository > itself that had this problem in the wild. But we fixed our code. > > IMHO, this leading '0' thing is a similar breakage. We shouldn't > relax CGit or JGit to accept it just because the Ruby implementation > of Git got the tree encoding wrong. If anything, we should teach > these implementations to catch these sorts of problems earlier. > I agree. Now how can the git community help them help themselves? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:14 ` Shawn O. Pearce 2010-03-27 19:30 ` A Large Angry SCM 2010-03-27 20:13 ` A Large Angry SCM @ 2010-03-27 20:16 ` Junio C Hamano 2010-03-27 22:16 ` Avery Pennarun 3 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2010-03-27 20:16 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun, Jonathan Nieder, git "Shawn O. Pearce" <spearce@spearce.org> writes: > Nicolas Pitre <nico@fluxnic.net> wrote: >> On Sat, 27 Mar 2010, Scott Chacon wrote: >> > > My stance has always been that the C Git is authoritative with regards to >> > > formats and protocols. ??It's up to Github to fix their screw-up. >> > >> > It is fixed and will be deployed soon, but really, there is no reason >> > to be snippy. It is a simple and minor mistake effecting very few >> > repositories (maybe 100 out of 730k) > > What is the C Git stance on these 100 repositories then? Are they > now considered corrupt? Or is 100 enough in the wild that we have > to accept the problem, just like we accept the 10664 mode issue from > "ancient" Linux? > > I would love to say "those are corrupt, sorry, fix your repository". This is why I first asked how widespread the copies of the implementation of that broken tool are. If it is only 100 and all breakages are confined to objects created at GitHub installation, and the owners of these 100 repositories are not locally creating corrupt objects with copies of broken reimplementation of git they have, I would say that we tell them to fix it, and GitHub can hopefully help them as their hosting site. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:14 ` Shawn O. Pearce ` (2 preceding siblings ...) 2010-03-27 20:16 ` Junio C Hamano @ 2010-03-27 22:16 ` Avery Pennarun 3 siblings, 0 replies; 33+ messages in thread From: Avery Pennarun @ 2010-03-27 22:16 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Junio C Hamano, Jonathan Nieder, git On Sat, Mar 27, 2010 at 3:14 PM, Shawn O. Pearce <spearce@spearce.org> wrote: > Nicolas Pitre <nico@fluxnic.net> wrote: >> On Sat, 27 Mar 2010, Scott Chacon wrote: >> > , and the only reason it's an >> > issue at all is that JGit is not following the authoritative CGit >> > implementation of basically ignoring it. >> >> But again CGit's fsck is not ignoring this discrepancy. And if the CGit >> core is otherwise silently accepting it then it is a mistake. > > Right. I tend to agree. CGit was too lax here, fsck shouldn't > be issuing a warning, it should be a fatal error. Both CGit and > JGit are too lax by not failing when reading that tree during > normal processing. Gah, no! Why would you want to make CGit work *less*? Print a warning, sure, but since the tree is perfectly readable, there's no reason to refuse to read it. That's just rude. Similarly, there should be no reason for fsck to treat *any* recoverable error as fatal. If it drops dead, it then misses the chance to diagnose later problems. When I run fsck, I want to see all the problems, not just the first one, especially if the first one can be fixed by filter-branch. Bonus points if (like e2fsck) it offers to fix it for me, though that's probably not worth implementing here. CGit works fine already. The only problem it has is that it works so well that Scott didn't notice his bug. This can be fixed by adding a simple warning. Have fun, Avery ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 1:34 ` Shawn O. Pearce 2010-03-27 1:56 ` Nicolas Pitre @ 2010-03-27 5:16 ` Junio C Hamano 2010-03-27 19:20 ` Shawn O. Pearce 1 sibling, 1 reply; 33+ messages in thread From: Junio C Hamano @ 2010-03-27 5:16 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon, Jonathan Nieder, git "Shawn O. Pearce" <spearce@spearce.org> writes: > But GitHub's approach here seems to be "Meh, its fine, don't worry > about it". > > Its *NOT* fine. But Avery and Junio might disagree with me. :-) Did I ever say it is _fine_? I thought I said "complain loudly". That would at least give poor jgit users who have hit such a corrupted object a chance to get a controlled notice and ask for help (and get an insn to recover with filter-branch that appeared in this thread). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 5:16 ` Junio C Hamano @ 2010-03-27 19:20 ` Shawn O. Pearce 2010-03-27 20:04 ` Junio C Hamano 0 siblings, 1 reply; 33+ messages in thread From: Shawn O. Pearce @ 2010-03-27 19:20 UTC (permalink / raw) To: Junio C Hamano Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon, Jonathan Nieder, git Junio C Hamano <gitster@pobox.com> wrote: > "Shawn O. Pearce" <spearce@spearce.org> writes: > > > But GitHub's approach here seems to be "Meh, its fine, don't worry > > about it". > > > > Its *NOT* fine. But Avery and Junio might disagree with me. :-) > > Did I ever say it is _fine_? I thought I said "complain loudly". I apologize if I misrepresented you above. > That would at least give poor jgit users who have hit such a corrupted > object a chance to get a controlled notice and ask for help (and get an > insn to recover with filter-branch that appeared in this thread). Well, there is "complain loudly but do it anyway" and "hard stop". JGit currently has the leading '0' be a "hard stop". Because this is the fsck code running inside of the receive-pack service, validating what the user sent is isn't malformed. Its clearly malformed. This only got discovered because Mike tried to take a repository from GitHub and push it into Gerrit Code Review, where JGit's fsck routine cannot be bypassed during receive-pack. Are you suggesting JGit should change its behavior to be "complain loudly but do it anyway"? I'm open to making the code change there if that is how you think a Git implementation should behave in this case. But I don't want to do it just to match CGit's behavior, sometimes CGit can be wrong. :-) -- Shawn. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Tree with leading '0' modes in 1.7.0.3 2010-03-27 19:20 ` Shawn O. Pearce @ 2010-03-27 20:04 ` Junio C Hamano 0 siblings, 0 replies; 33+ messages in thread From: Junio C Hamano @ 2010-03-27 20:04 UTC (permalink / raw) To: Shawn O. Pearce Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon, Jonathan Nieder, git "Shawn O. Pearce" <spearce@spearce.org> writes: > JGit currently has the leading '0' be a "hard stop". Because this is > the fsck code running inside of the receive-pack service, validating > what the user sent is isn't malformed. Its clearly malformed. The "complain loudly to let the user know about the need to fix the corrupt repository" comment was meant for the "git fsck" equivalent of jgit (if there such a thing). I think "hard stop to prevent corruption from getting propagated" is actually something we should do in receive-pack in the reference implementation if we don't do so already. ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2010-03-28 23:28 UTC | newest] Thread overview: 33+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-03-26 21:56 Tree with leading '0' modes in 1.7.0.3 Shawn O. Pearce 2010-03-26 22:26 ` Jonathan Nieder 2010-03-26 22:29 ` Shawn O. Pearce 2010-03-26 22:40 ` Jonathan Nieder 2010-03-26 23:09 ` Junio C Hamano 2010-03-26 22:59 ` Mike.lifeguard 2010-03-26 23:05 ` Shawn O. Pearce 2010-03-26 23:22 ` Mike.lifeguard 2010-03-26 23:49 ` Jonathan Nieder 2010-03-26 23:50 ` Junio C Hamano 2010-03-26 23:56 ` Avery Pennarun 2010-03-27 0:00 ` Mike.lifeguard 2010-03-27 1:22 ` Shawn O. Pearce 2010-03-27 1:30 ` Nicolas Pitre 2010-03-27 1:34 ` Shawn O. Pearce 2010-03-27 1:56 ` Nicolas Pitre 2010-03-27 2:33 ` Avery Pennarun 2010-03-27 12:44 ` Scott Chacon 2010-03-27 14:21 ` Nicolas Pitre 2010-03-27 19:14 ` Shawn O. Pearce 2010-03-27 19:30 ` A Large Angry SCM 2010-03-27 19:32 ` Shawn O. Pearce 2010-03-27 19:39 ` A Large Angry SCM 2010-03-27 19:44 ` A Large Angry SCM 2010-03-27 19:57 ` A Large Angry SCM 2010-03-28 17:38 ` Sitaram Chamarty 2010-03-28 23:28 ` A Large Angry SCM 2010-03-27 20:13 ` A Large Angry SCM 2010-03-27 20:16 ` Junio C Hamano 2010-03-27 22:16 ` Avery Pennarun 2010-03-27 5:16 ` Junio C Hamano 2010-03-27 19:20 ` Shawn O. Pearce 2010-03-27 20:04 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).