git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Tree with leading '0' modes in 1.7.0.3
@ 2010-03-26 21:56 Shawn O. Pearce
  2010-03-26 22:26 ` Jonathan Nieder
  0 siblings, 1 reply; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-26 21:56 UTC (permalink / raw)
  To: git, mike.lifeguard

Mike (CC'd) found a bad Git tree today, where the modes for subtrees
where formatted using a leading '0':

  $ od -c tree
  0000000   1   0   0   6   4   4       R   E   A   D   M   E  \0 244  \r
  0000020 233 214 350 375   0 263 374 227 264 343   $ 031 027   ` 373 301
  0000040   !   h   0   4   0   0   0   0       m   o   d   u   l   e   s
  0000060  \0 262   z   K 240   4 377   \ 245   C   c   " 231 377  \n   t
  0000100   ,  \n   O   R   E   0   4   0   0   0   0       s   t   e   w
  0000120   a   r   d   b   o   t  \0 037  \b   5 262 345 234 034 303   C
  0000140 373 335 207 300   u 341 277  \f   ] 320 207
  0000153

The '0' on the 3rd line after '! h' is wrong.  It shouldn't be here.
Likewise the '0' on the 5th line after "O R E" is also wrong.
At least its consistently broken.  But its still broken by fsck
standards:

 $ git fsck --full a39aa6d
 warning in tree a39aa6d4a6dcfd6c14d8f818bbdf1dfcb3e11771: contains zero-padded file modes

Mike claims this tree was created with git-core 1.7.0.3.  This thread
actually started over on Gerrit Code Review's mailing list [1],
because JGit refuses to allow this malformed tree mode to pass its
fsck implementation.

Any ideas?  Why is Git 1.7.0.3 jamming a leading '0' on a file mode?


[1] https://groups.google.com/group/repo-discuss/browse_thread/thread/6ff8d7ffba5a9775

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 21:56 Tree with leading '0' modes in 1.7.0.3 Shawn O. Pearce
@ 2010-03-26 22:26 ` Jonathan Nieder
  2010-03-26 22:29   ` Shawn O. Pearce
  0 siblings, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2010-03-26 22:26 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, mike.lifeguard

Shawn O. Pearce wrote:

> Any ideas?  Why is Git 1.7.0.3 jamming a leading '0' on a file mode?

See http://thread.gmane.org/gmane.comp.version-control.git/141028
and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24).

The regression that that fixes appeared in 61a7cca0 (Notes API:
write_notes_tree(): Store the notes tree in the database, 2010-02-13),
which is not part of 1.7.0.3.

Still, HTH,
Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 22:26 ` Jonathan Nieder
@ 2010-03-26 22:29   ` Shawn O. Pearce
  2010-03-26 22:40     ` Jonathan Nieder
  2010-03-26 22:59     ` Mike.lifeguard
  0 siblings, 2 replies; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-26 22:29 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, mike.lifeguard

Jonathan Nieder <jrnieder@gmail.com> wrote:
> Shawn O. Pearce wrote:
> 
> > Any ideas?  Why is Git 1.7.0.3 jamming a leading '0' on a file mode?
> 
> See http://thread.gmane.org/gmane.comp.version-control.git/141028
> and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24).
> 
> The regression that that fixes appeared in 61a7cca0 (Notes API:
> write_notes_tree(): Store the notes tree in the database, 2010-02-13),
> which is not part of 1.7.0.3.

That may be true... but I doubt the tree in question was a notes
tree.  The path entries were names like 'README', 'modules' and
'stewardbot'.  Something I would assume was the project's source
tree, not its notes tree.  Unless someone abused the note tree
editor to edit the README or something...

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 22:29   ` Shawn O. Pearce
@ 2010-03-26 22:40     ` Jonathan Nieder
  2010-03-26 23:09       ` Junio C Hamano
  2010-03-26 22:59     ` Mike.lifeguard
  1 sibling, 1 reply; 33+ messages in thread
From: Jonathan Nieder @ 2010-03-26 22:40 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git, mike.lifeguard

Shawn O. Pearce wrote:
> Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Shawn O. Pearce wrote:

>>> Any ideas?  Why is Git 1.7.0.3 jamming a leading '0' on a file mode?
>> 
>> See http://thread.gmane.org/gmane.comp.version-control.git/141028
>> and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24).
>> 
>> The regression that that fixes appeared in 61a7cca0 (Notes API:
>> write_notes_tree(): Store the notes tree in the database, 2010-02-13),
>> which is not part of 1.7.0.3.
>
> That may be true... but I doubt the tree in question was a notes
> tree.  The path entries were names like 'README', 'modules' and
> 'stewardbot'.  Something I would assume was the project's source
> tree, not its notes tree.

Yes, true.  The problem is probably elsewhere, especially because
1.7.0.3 doesn’t even have that commit.  Still, I find this a bit
strange because such breakage should have been noticeable if it
happens often.

What has changed recently that involves writing trees?

Jonathan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 22:29   ` Shawn O. Pearce
  2010-03-26 22:40     ` Jonathan Nieder
@ 2010-03-26 22:59     ` Mike.lifeguard
  2010-03-26 23:05       ` Shawn O. Pearce
  1 sibling, 1 reply; 33+ messages in thread
From: Mike.lifeguard @ 2010-03-26 22:59 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Jonathan Nieder, git, schacon

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10-03-26 07:29 PM, Shawn O. Pearce wrote:
> Something I would assume was the project's source
> tree, not its notes tree.

Yes, it is the source tree. We don't even know what a notes tree is.

Apparently Scott Chacon has some clue about this error:
http://support.github.com/discussions/repos/2566-strange-warning-from-fsck-and-github-repo-using-too-much-diskspace
so I've added him to CC. (Note that changing all SHA1s is not really a
problem for us, there are only 3 copies of the repo, and the project has
only been using version control for 2 days)

Thanks, all
- -Mike
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkutPG0ACgkQst0AR/DaKHuRtQCdEyy/KIWwpNYUA4EnkHGy2Y3D
chwAoLDzdhD9dmmn5mdkJxGrL5Kjlf4/
=k2Eg
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 22:59     ` Mike.lifeguard
@ 2010-03-26 23:05       ` Shawn O. Pearce
  2010-03-26 23:22         ` Mike.lifeguard
  2010-03-26 23:50         ` Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-26 23:05 UTC (permalink / raw)
  To: Mike.lifeguard, Scott Chacon; +Cc: Jonathan Nieder, git

"Mike.lifeguard" <mike.lifeguard@gmail.com> wrote:
> Apparently Scott Chacon has some clue about this error:
> http://support.github.com/discussions/repos/2566-strange-warning-from-fsck-and-github-repo-using-too-much-diskspace
> so I've added him to CC. (Note that changing all SHA1s is not really a
> problem for us, there are only 3 copies of the repo, and the project has
> only been using version control for 2 days)

Scott, please fix that library on GitHub.  JGit's fsck has a hard
failure on these malformed trees, because the leading '0' mode
causes the tree to come up with the wrong SHA-1 hash given its
logical content.  They shouldn't be created like this.


Mike, it sounds like you might be able to fix your project by just
running something like:

  $ git filter-branch --index-filter '' --all

It rewrites the trees, which will change their SHA-1s (and the commit
SHA-1s downstream from there) with correctly formatted tree objects.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 22:40     ` Jonathan Nieder
@ 2010-03-26 23:09       ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2010-03-26 23:09 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Shawn O. Pearce, git, mike.lifeguard

Jonathan Nieder <jrnieder@gmail.com> writes:

> Shawn O. Pearce wrote:
>> Jonathan Nieder <jrnieder@gmail.com> wrote:
>>> Shawn O. Pearce wrote:
>
>>>> Any ideas?  Why is Git 1.7.0.3 jamming a leading '0' on a file mode?
>>> 
>>> See http://thread.gmane.org/gmane.comp.version-control.git/141028
>>> and commit c88f0cc (notes: fix malformed tree entry, 2010-02-24).
>>> 
>>> The regression that that fixes appeared in 61a7cca0 (Notes API:
>>> write_notes_tree(): Store the notes tree in the database, 2010-02-13),
>>> which is not part of 1.7.0.3.
>>
>> That may be true... but I doubt the tree in question was a notes
>> tree.  The path entries were names like 'README', 'modules' and
>> 'stewardbot'.  Something I would assume was the project's source
>> tree, not its notes tree.
>
> Yes, true.  The problem is probably elsewhere, especially because
> 1.7.0.3 doesn’t even have that commit.  Still, I find this a bit
> strange because such breakage should have been noticeable if it
> happens often.
>
> What has changed recently that involves writing trees?

Asking grep for "%06o" reveals nothing.  Perhaps somebody else's imitation
implementation?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 23:05       ` Shawn O. Pearce
@ 2010-03-26 23:22         ` Mike.lifeguard
  2010-03-26 23:49           ` Jonathan Nieder
  2010-03-26 23:50         ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Mike.lifeguard @ 2010-03-26 23:22 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Scott Chacon, Jonathan Nieder, git

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10-03-26 08:05 PM, Shawn O. Pearce wrote:
>   $ git filter-branch --index-filter '' --all

This and a few other variations I tried does rewrite things, but the
problem persists:

mikelifeguard@arbour:~/Code/git/stewbot (master)$ git filter-branch
- --subdirectory-filter '' -- --all
Rewrite 5b5d93ca1ebcbc90c3ad688b9b9751d014b452a8 (8/8)
WARNING: Ref 'refs/heads/master' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/gh-pages' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
mikelifeguard@arbour:~/Code/git/stewbot (master)$ git fsck
warning in tree a39aa6d4a6dcfd6c14d8f818bbdf1dfcb3e11771: contains
zero-padded file modes

- -Mike
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkutQcQACgkQst0AR/DaKHudmACgtZ67Tv1pO769BL5OvduZacix
BvMAoLr1UTf5UTd6zowRDHLovBVSY+0R
=n/G1
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 23:22         ` Mike.lifeguard
@ 2010-03-26 23:49           ` Jonathan Nieder
  0 siblings, 0 replies; 33+ messages in thread
From: Jonathan Nieder @ 2010-03-26 23:49 UTC (permalink / raw)
  To: Mike.lifeguard; +Cc: Shawn O. Pearce, Scott Chacon, git

Mike.lifeguard wrote:
> On 10-03-26 08:05 PM, Shawn O. Pearce wrote:

>>   $ git filter-branch --index-filter '' --all
>
> This and a few other variations I tried does rewrite things, but the
> problem persists:

Yes, I think git write-tree does not rewrite subtrees.  How about

git fast-export --all |
(
	cd /empty/repository &&
	git init &&
	git fast-import
)

?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 23:05       ` Shawn O. Pearce
  2010-03-26 23:22         ` Mike.lifeguard
@ 2010-03-26 23:50         ` Junio C Hamano
  2010-03-26 23:56           ` Avery Pennarun
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2010-03-26 23:50 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mike.lifeguard, Scott Chacon, Jonathan Nieder, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Scott, please fix that library on GitHub.  JGit's fsck has a hard
> failure on these malformed trees, because the leading '0' mode
> causes the tree to come up with the wrong SHA-1 hash given its
> logical content.  They shouldn't be created like this.

What is curious is that even though 6407180 (git-fsck-cache: be stricter
about "tree" objects, 2005-07-27) does talk about zero-padding, it appears
that we never had a version of git that padded mode in '0' in the entire
history of write-tree (except that "notes tree" one, but even that didn't
escape the laboratory).

But now we know there is a tool in the wild creating broken objects left
and right, jgit's fsck routine might need to be more lenient (while
warning loudly) in what it accepts.

Scott, does your tool have outside users (i.e. being freely distributed
and you have no control over the continued use of existing copies that
create broken objects)?  If not, then there won't be further damage once
you fix it at Github, and we may not have to worry about changing jgit
after all.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 23:50         ` Junio C Hamano
@ 2010-03-26 23:56           ` Avery Pennarun
  2010-03-27  0:00             ` Mike.lifeguard
  0 siblings, 1 reply; 33+ messages in thread
From: Avery Pennarun @ 2010-03-26 23:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Shawn O. Pearce, Mike.lifeguard, Scott Chacon, Jonathan Nieder,
	git

On Fri, Mar 26, 2010 at 7:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
>
>> Scott, please fix that library on GitHub.  JGit's fsck has a hard
>> failure on these malformed trees, because the leading '0' mode
>> causes the tree to come up with the wrong SHA-1 hash given its
>> logical content.  They shouldn't be created like this.
>
> What is curious is that even though 6407180 (git-fsck-cache: be stricter
> about "tree" objects, 2005-07-27) does talk about zero-padding, it appears
> that we never had a version of git that padded mode in '0' in the entire
> history of write-tree (except that "notes tree" one, but even that didn't
> escape the laboratory).

It's apparently an easy mistake to make.  bup did this for a while
until I added a 'git fsck' to its automated tests :)

The problem is that everything in git works perfectly with these
invalid file modes *except* fsck, and there's rarely a need to run
fsck, so this problem can hide for a long time.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-26 23:56           ` Avery Pennarun
@ 2010-03-27  0:00             ` Mike.lifeguard
  2010-03-27  1:22               ` Shawn O. Pearce
  0 siblings, 1 reply; 33+ messages in thread
From: Mike.lifeguard @ 2010-03-27  0:00 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Junio C Hamano, Shawn O. Pearce, Scott Chacon, Jonathan Nieder,
	git

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10-03-26 08:49 PM, Jonathan Nieder wrote:
> git fast-export --all |
> (
> 	cd /empty/repository &&
> 	git init &&
> 	git fast-import
> )

That one did something:
*When I cd-ed into the repo, there were staged changes waiting for me
(O.o) -- the changes would have simply deleted every file in the source
tree.
*git fsck had no warnings
*As predicted, SHA1s changed

On 10-03-26 08:56 PM, Avery Pennarun wrote:
> The problem is that everything in git works perfectly with these
> invalid file modes *except* fsck, and there's rarely a need to run
> fsck, so this problem can hide for a long time.

So, does the error matter or not? If it doesn't matter, then shouldn't
Jgit stop whining? If it does, then whatever-it-is needs to be fixed.

We're still not sure what was done with github to cause this. I've done
nothing with github's web interface, and the project lead can't recall
doing anything prior to this error (only stuff today, but this error
cropped up yesterday). I suspect witchcraft :P

- -Mike
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkutSoIACgkQst0AR/DaKHsblwCcC5j2jDuy95EOjhkK8adfWXl7
ZFEAnRn2bi9glDh6RR3xTwYkjxnMQYqx
=YSTH
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  0:00             ` Mike.lifeguard
@ 2010-03-27  1:22               ` Shawn O. Pearce
  2010-03-27  1:30                 ` Nicolas Pitre
  0 siblings, 1 reply; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-27  1:22 UTC (permalink / raw)
  To: Mike.lifeguard
  Cc: Avery Pennarun, Junio C Hamano, Scott Chacon, Jonathan Nieder,
	git

"Mike.lifeguard" <mike.lifeguard@gmail.com> wrote:
> On 10-03-26 08:56 PM, Avery Pennarun wrote:
> > The problem is that everything in git works perfectly with these
> > invalid file modes *except* fsck, and there's rarely a need to run
> > fsck, so this problem can hide for a long time.
> 
> So, does the error matter or not? If it doesn't matter, then shouldn't
> Jgit stop whining? If it does, then whatever-it-is needs to be fixed.

Its less harmful than other types of corruption.  But its quite
wrong from a format perspective. The hash of the tree differs even
though there is no semantic difference in the tree content.

Given that GitHub has blessed the world with this corruption,
we may need to modify JGit to accept it.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:22               ` Shawn O. Pearce
@ 2010-03-27  1:30                 ` Nicolas Pitre
  2010-03-27  1:34                   ` Shawn O. Pearce
  0 siblings, 1 reply; 33+ messages in thread
From: Nicolas Pitre @ 2010-03-27  1:30 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon,
	Jonathan Nieder, git

On Fri, 26 Mar 2010, Shawn O. Pearce wrote:

> "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote:
> > On 10-03-26 08:56 PM, Avery Pennarun wrote:
> > > The problem is that everything in git works perfectly with these
> > > invalid file modes *except* fsck, and there's rarely a need to run
> > > fsck, so this problem can hide for a long time.
> > 
> > So, does the error matter or not? If it doesn't matter, then shouldn't
> > Jgit stop whining? If it does, then whatever-it-is needs to be fixed.
> 
> Its less harmful than other types of corruption.  But its quite
> wrong from a format perspective. The hash of the tree differs even
> though there is no semantic difference in the tree content.
> 
> Given that GitHub has blessed the world with this corruption,
> we may need to modify JGit to accept it.

Should we?

This is going to screw up pack v4 (yes, someday I'll have the time to 
make it real).


Nicolas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:30                 ` Nicolas Pitre
@ 2010-03-27  1:34                   ` Shawn O. Pearce
  2010-03-27  1:56                     ` Nicolas Pitre
  2010-03-27  5:16                     ` Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-27  1:34 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon,
	Jonathan Nieder, git

Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 26 Mar 2010, Shawn O. Pearce wrote:
> > "Mike.lifeguard" <mike.lifeguard@gmail.com> wrote:
> > > On 10-03-26 08:56 PM, Avery Pennarun wrote:
> > > > The problem is that everything in git works perfectly with these
> > > > invalid file modes *except* fsck, and there's rarely a need to run
> > > > fsck, so this problem can hide for a long time.
> > > 
> > > So, does the error matter or not? If it doesn't matter, then shouldn't
> > > Jgit stop whining? If it does, then whatever-it-is needs to be fixed.
> > 
> > Its less harmful than other types of corruption.  But its quite
> > wrong from a format perspective. The hash of the tree differs even
> > though there is no semantic difference in the tree content.
> > 
> > Given that GitHub has blessed the world with this corruption,
> > we may need to modify JGit to accept it.
> 
> Should we?
> 
> This is going to screw up pack v4 (yes, someday I'll have the time to 
> make it real).

Exactly.  I *really* don't want to permit this sort of corruption
in a Git repository.

But GitHub's approach here seems to be "Meh, its fine, don't worry
about it".

Its *NOT* fine.  But Avery and Junio might disagree with me.  :-)


Though, FWIW, it might not screw up pack v4.  IIRC from our
discussions long ago on pack v4, we store "$mode $name" pairs in
an indexed list, preciously because we needed to support odd modes
like 10664 from ancient Git binaries.  If we continue to allow this
corruption, it means we have to ensure $mode is the octal string
and not the binary value.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:34                   ` Shawn O. Pearce
@ 2010-03-27  1:56                     ` Nicolas Pitre
  2010-03-27  2:33                       ` Avery Pennarun
  2010-03-27 12:44                       ` Scott Chacon
  2010-03-27  5:16                     ` Junio C Hamano
  1 sibling, 2 replies; 33+ messages in thread
From: Nicolas Pitre @ 2010-03-27  1:56 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Mike.lifeguard, Avery Pennarun, Junio C Hamano, Scott Chacon,
	Jonathan Nieder, git

On Fri, 26 Mar 2010, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@fluxnic.net> wrote:
> > On Fri, 26 Mar 2010, Shawn O. Pearce wrote:
> > > Given that GitHub has blessed the world with this corruption,
> > > we may need to modify JGit to accept it.
> > 
> > Should we?
> > 
> > This is going to screw up pack v4 (yes, someday I'll have the time to 
> > make it real).
> 
> Exactly.  I *really* don't want to permit this sort of corruption
> in a Git repository.
> 
> But GitHub's approach here seems to be "Meh, its fine, don't worry
> about it".

It's up to GitHub to fork Git then, and while at it stop calling it Git 
compatible.  Really.  If we start to get slack about the pack format 
like this then every Git reimplementation du jour will make similar 
deviations except in different directions and we'll end up with a mess 
to support.

And in this case there is _no_ excuse as 'git fsck' is actually 
complaining.

My stance has always been that the C Git is authoritative with regards to 
formats and protocols.  It's up to Github to fix their screw-up.

> Its *NOT* fine.  But Avery and Junio might disagree with me.  :-)

FWIW I agree with you.

> Though, FWIW, it might not screw up pack v4.  IIRC from our
> discussions long ago on pack v4, we store "$mode $name" pairs in
> an indexed list, preciously because we needed to support odd modes
> like 10664 from ancient Git binaries.  If we continue to allow this
> corruption, it means we have to ensure $mode is the octal string
> and not the binary value.

Which is a real pity.

In fact, my position is that pack v4 would simply refuse to optimize the 
encoding for such tree objects, period.  Only the non ambiguously 
encoded tree objects would benefit from the v4 improvements.


Nicolas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:56                     ` Nicolas Pitre
@ 2010-03-27  2:33                       ` Avery Pennarun
  2010-03-27 12:44                       ` Scott Chacon
  1 sibling, 0 replies; 33+ messages in thread
From: Avery Pennarun @ 2010-03-27  2:33 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn O. Pearce, Mike.lifeguard, Junio C Hamano, Scott Chacon,
	Jonathan Nieder, git

On Fri, Mar 26, 2010 at 9:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote:
> On Fri, 26 Mar 2010, Shawn O. Pearce wrote:
>> Its *NOT* fine.  But Avery and Junio might disagree with me.  :-)
>
> FWIW I agree with you.

I would also like to remove my name from the "disagree" list. :)

Producing nonstandard output isn't fine at all - I mentioned Postel's
Law, but the neglected half of that law is that you're supposed to
produce valid data in the first place.  This is why (as I mentioned
earlier) bup's automated tests now run 'git fsck' explicitly to verify
that it gets it right.  It was only the very first versions of bup,
which thankfully nobody used for anything important, that screwed this
up.  Barring any new and improved screw-ups, anyway.

I only brought it up to say that it's actually easy to make this
mistake undetected.  Very few people run git fsck nowadays.  The world
might benefit if git complained (albeit non-fatally) *whenever* it saw
such an incorrect tree.

> In fact, my position is that pack v4 would simply refuse to optimize the
> encoding for such tree objects, period.  Only the non ambiguously
> encoded tree objects would benefit from the v4 improvements.

This sounds very wise to me.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:34                   ` Shawn O. Pearce
  2010-03-27  1:56                     ` Nicolas Pitre
@ 2010-03-27  5:16                     ` Junio C Hamano
  2010-03-27 19:20                       ` Shawn O. Pearce
  1 sibling, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2010-03-27  5:16 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon,
	Jonathan Nieder, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> But GitHub's approach here seems to be "Meh, its fine, don't worry
> about it".
>
> Its *NOT* fine.  But Avery and Junio might disagree with me.  :-)

Did I ever say it is _fine_?  I thought I said "complain loudly".

That would at least give poor jgit users who have hit such a corrupted
object a chance to get a controlled notice and ask for help (and get an
insn to recover with filter-branch that appeared in this thread).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  1:56                     ` Nicolas Pitre
  2010-03-27  2:33                       ` Avery Pennarun
@ 2010-03-27 12:44                       ` Scott Chacon
  2010-03-27 14:21                         ` Nicolas Pitre
  1 sibling, 1 reply; 33+ messages in thread
From: Scott Chacon @ 2010-03-27 12:44 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Shawn O. Pearce, Mike.lifeguard, Avery Pennarun, Junio C Hamano,
	Jonathan Nieder, git

Hey,

Sorry it's taken me a bit - I'm traveling right now.

On Fri, Mar 26, 2010 at 6:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote:
>> > > Given that GitHub has blessed the world with this corruption,
>> > > we may need to modify JGit to accept it.

Well, shouldn't it accept it just because CGit accepts it?  Isn't that
an incompatibility in implementation?

>> But GitHub's approach here seems to be "Meh, its fine, don't worry
>> about it".

That isn't really my approach, I actually thought I had fixed this a
while ago.  It seems to be a pretty understandable mistake, since
ls-tree and cat-file -p both output zero padded modes and it is only
an issue on trees with subtrees, obviously, so we don't see it all the
time at GitHub.  I have fixed this and it's in the queue for
deployment which should be in the next few days (I gotta get home
first).

> It's up to GitHub to fork Git then, and while at it stop calling it Git
> compatible.  Really.  If we start to get slack about the pack format
> like this then every Git reimplementation du jour will make similar
> deviations except in different directions and we'll end up with a mess
> to support.

Really?  It's not the pack format - we use stock Git servers and
almost always have.  It's the tree writing when someone edits a file
inline - I was writing out zero-padded trees. And, it _is_ Git
compatible - CGit only issues a warning, and that only if the
circumstances align such that we write a tree with a subtree, which
again is pretty rare.  There are only a handful of projects like this
and in all CGit circumstances makes no practical difference.

> My stance has always been that the C Git is authoritative with regards to
> formats and protocols.  It's up to Github to fix their screw-up.

It is fixed and will be deployed soon, but really, there is no reason
to be snippy.  It is a simple and minor mistake effecting very few
repositories (maybe 100 out of 730k), and the only reason it's an
issue at all is that JGit is not following the authoritative CGit
implementation of basically ignoring it.

Also, if we're all concerned about "Git reimplementation du jour"
deviations, then we need to focus on libifying Git so there isn't a
need for such re-implementations.  I'm hoping to help with a possible
GSoC project on libgit2, but the lack of a linkable library will
ensure that re-implementations in nearly every useful language will
continue.

Scott

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 12:44                       ` Scott Chacon
@ 2010-03-27 14:21                         ` Nicolas Pitre
  2010-03-27 19:14                           ` Shawn O. Pearce
  0 siblings, 1 reply; 33+ messages in thread
From: Nicolas Pitre @ 2010-03-27 14:21 UTC (permalink / raw)
  To: Scott Chacon
  Cc: Shawn O. Pearce, Mike.lifeguard, Avery Pennarun, Junio C Hamano,
	Jonathan Nieder, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3481 bytes --]

On Sat, 27 Mar 2010, Scott Chacon wrote:

> Hey,
> 
> Sorry it's taken me a bit - I'm traveling right now.
> 
> On Fri, Mar 26, 2010 at 6:56 PM, Nicolas Pitre <nico@fluxnic.net> wrote:
> >> > > Given that GitHub has blessed the world with this corruption,
> >> > > we may need to modify JGit to accept it.
> 
> Well, shouldn't it accept it just because CGit accepts it?  Isn't that
> an incompatibility in implementation?

CGit fsck complains about it.  This should be sufficient a clue to 
avoid such things.

> >> But GitHub's approach here seems to be "Meh, its fine, don't worry
> >> about it".
> 
> That isn't really my approach, I actually thought I had fixed this a
> while ago.  It seems to be a pretty understandable mistake, since
> ls-tree and cat-file -p both output zero padded modes and it is only
> an issue on trees with subtrees, obviously, so we don't see it all the
> time at GitHub.  I have fixed this and it's in the queue for
> deployment which should be in the next few days (I gotta get home
> first).

Thanks.

> > It's up to GitHub to fork Git then, and while at it stop calling it Git
> > compatible.  Really.  If we start to get slack about the pack format
> > like this then every Git reimplementation du jour will make similar
> > deviations except in different directions and we'll end up with a mess
> > to support.
> 
> Really?  It's not the pack format - we use stock Git servers and
> almost always have.  It's the tree writing when someone edits a file
> inline - I was writing out zero-padded trees. And, it _is_ Git
> compatible - CGit only issues a warning, and that only if the
> circumstances align such that we write a tree with a subtree, which
> again is pretty rare.  There are only a handful of projects like this
> and in all CGit circumstances makes no practical difference.

It is still damn important to those with an interest in pack format 
improvements that only one way of creating a tree object exists, 
especially as we stamp a SHA1 hash on it.  Whatever we do with the tree 
encoding in the future, it is essential that the canonical expression of 
any tree object be unambiguous and always produce the same hash.

> > My stance has always been that the C Git is authoritative with regards to
> > formats and protocols.  It's up to Github to fix their screw-up.
> 
> It is fixed and will be deployed soon, but really, there is no reason
> to be snippy.  It is a simple and minor mistake effecting very few
> repositories (maybe 100 out of 730k), and the only reason it's an
> issue at all is that JGit is not following the authoritative CGit
> implementation of basically ignoring it.

But again CGit's fsck is not ignoring this discrepancy.  And if the CGit 
core is otherwise silently accepting it then it is a mistake.

> Also, if we're all concerned about "Git reimplementation du jour"
> deviations, then we need to focus on libifying Git so there isn't a
> need for such re-implementations.  I'm hoping to help with a possible
> GSoC project on libgit2, but the lack of a linkable library will
> ensure that re-implementations in nearly every useful language will
> continue.

Don't get me wrong.  I'm not against Git reimplementations per se, as 
long as they rigorously implement the exact format and protocol from 
CGit.  In that sense it is important that the CGit fsck and verify-pack 
tools be exploited on objects/packs produced by alternate Git 
implementation systematically to find such issues.


Nicolas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 14:21                         ` Nicolas Pitre
@ 2010-03-27 19:14                           ` Shawn O. Pearce
  2010-03-27 19:30                             ` A Large Angry SCM
                                               ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-27 19:14 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Scott Chacon, Mike.lifeguard, Avery Pennarun, Junio C Hamano,
	Jonathan Nieder, git

Nicolas Pitre <nico@fluxnic.net> wrote:
> On Sat, 27 Mar 2010, Scott Chacon wrote:
> > > My stance has always been that the C Git is authoritative with regards to
> > > formats and protocols. ??It's up to Github to fix their screw-up.
> > 
> > It is fixed and will be deployed soon, but really, there is no reason
> > to be snippy.  It is a simple and minor mistake effecting very few
> > repositories (maybe 100 out of 730k)

What is the C Git stance on these 100 repositories then?  Are they
now considered corrupt?  Or is 100 enough in the wild that we have
to accept the problem, just like we accept the 10664 mode issue from
"ancient" Linux?

I would love to say "those are corrupt, sorry, fix your repository".

But we have traditionally tried to help our users, and not cause
them pain.  Forcing a rewrite on these 100 projects to fix up the
corruption is going to be painful for them.  

> > , and the only reason it's an
> > issue at all is that JGit is not following the authoritative CGit
> > implementation of basically ignoring it.
> 
> But again CGit's fsck is not ignoring this discrepancy.  And if the CGit 
> core is otherwise silently accepting it then it is a mistake.

Right.  I tend to agree.  CGit was too lax here, fsck shouldn't
be issuing a warning, it should be a fatal error.  Both CGit and
JGit are too lax by not failing when reading that tree during
normal processing.
 
> > Also, if we're all concerned about "Git reimplementation du jour"
> > deviations, then we need to focus on libifying Git so there isn't a
> > need for such re-implementations.  I'm hoping to help with a possible
> > GSoC project on libgit2, but the lack of a linkable library will
> > ensure that re-implementations in nearly every useful language will
> > continue.
> 
> Don't get me wrong.  I'm not against Git reimplementations per se, as 
> long as they rigorously implement the exact format and protocol from 
> CGit.  In that sense it is important that the CGit fsck and verify-pack 
> tools be exploited on objects/packs produced by alternate Git 
> implementation systematically to find such issues.

When JGit had the tree sort order wrong, JGit was in the wrong,
and any repository which contained those corrupt trees had to be
fixed by rewriting them.  IIRC it was only the JGit repository
itself that had this problem in the wild.  But we fixed our code.

IMHO, this leading '0' thing is a similar breakage.  We shouldn't
relax CGit or JGit to accept it just because the Ruby implementation
of Git got the tree encoding wrong.  If anything, we should teach
these implementations to catch these sorts of problems earlier.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27  5:16                     ` Junio C Hamano
@ 2010-03-27 19:20                       ` Shawn O. Pearce
  2010-03-27 20:04                         ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-27 19:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon,
	Jonathan Nieder, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > But GitHub's approach here seems to be "Meh, its fine, don't worry
> > about it".
> >
> > Its *NOT* fine.  But Avery and Junio might disagree with me.  :-)
> 
> Did I ever say it is _fine_?  I thought I said "complain loudly".

I apologize if I misrepresented you above.
 
> That would at least give poor jgit users who have hit such a corrupted
> object a chance to get a controlled notice and ask for help (and get an
> insn to recover with filter-branch that appeared in this thread).

Well, there is "complain loudly but do it anyway" and "hard stop".

JGit currently has the leading '0' be a "hard stop".  Because this is
the fsck code running inside of the receive-pack service, validating
what the user sent is isn't malformed.  Its clearly malformed.

This only got discovered because Mike tried to take a repository
from GitHub and push it into Gerrit Code Review, where JGit's fsck
routine cannot be bypassed during receive-pack.

Are you suggesting JGit should change its behavior to be "complain
loudly but do it anyway"?  I'm open to making the code change there
if that is how you think a Git implementation should behave in
this case.  But I don't want to do it just to match CGit's behavior,
sometimes CGit can be wrong.  :-)

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:14                           ` Shawn O. Pearce
@ 2010-03-27 19:30                             ` A Large Angry SCM
  2010-03-27 19:32                               ` Shawn O. Pearce
  2010-03-27 20:13                             ` A Large Angry SCM
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-27 19:30 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun,
	Junio C Hamano, Jonathan Nieder, git

Shawn O. Pearce wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
>> On Sat, 27 Mar 2010, Scott Chacon wrote:
>>>> My stance has always been that the C Git is authoritative with regards to
>>>> formats and protocols. ??It's up to Github to fix their screw-up.
>>> It is fixed and will be deployed soon, but really, there is no reason
>>> to be snippy.  It is a simple and minor mistake effecting very few
>>> repositories (maybe 100 out of 730k)
> 
> What is the C Git stance on these 100 repositories then?  Are they
> now considered corrupt?  Or is 100 enough in the wild that we have
> to accept the problem, just like we accept the 10664 mode issue from
> "ancient" Linux?
> 
> I would love to say "those are corrupt, sorry, fix your repository".
> 
> But we have traditionally tried to help our users, and not cause
> them pain.  Forcing a rewrite on these 100 projects to fix up the
> corruption is going to be painful for them.  
> 
>>> , and the only reason it's an
>>> issue at all is that JGit is not following the authoritative CGit
>>> implementation of basically ignoring it.
>> But again CGit's fsck is not ignoring this discrepancy.  And if the CGit 
>> core is otherwise silently accepting it then it is a mistake.
> 
> Right.  I tend to agree.  CGit was too lax here, fsck shouldn't
> be issuing a warning, it should be a fatal error.  Both CGit and
> JGit are too lax by not failing when reading that tree during
> normal processing.
>  
>>> Also, if we're all concerned about "Git reimplementation du jour"
>>> deviations, then we need to focus on libifying Git so there isn't a
>>> need for such re-implementations.  I'm hoping to help with a possible
>>> GSoC project on libgit2, but the lack of a linkable library will
>>> ensure that re-implementations in nearly every useful language will
>>> continue.
>> Don't get me wrong.  I'm not against Git reimplementations per se, as 
>> long as they rigorously implement the exact format and protocol from 
>> CGit.  In that sense it is important that the CGit fsck and verify-pack 
>> tools be exploited on objects/packs produced by alternate Git 
>> implementation systematically to find such issues.
> 
> When JGit had the tree sort order wrong, JGit was in the wrong,
> and any repository which contained those corrupt trees had to be
> fixed by rewriting them.  IIRC it was only the JGit repository
> itself that had this problem in the wild.  But we fixed our code.
> 
> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
> relax CGit or JGit to accept it just because the Ruby implementation
> of Git got the tree encoding wrong.  If anything, we should teach
> these implementations to catch these sorts of problems earlier.
> 

Just add an additional data point, it looks like up to 16 of these trees 
with zero-padded file modes are reachable from Linus' kernel master ref.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:30                             ` A Large Angry SCM
@ 2010-03-27 19:32                               ` Shawn O. Pearce
  2010-03-27 19:39                                 ` A Large Angry SCM
  0 siblings, 1 reply; 33+ messages in thread
From: Shawn O. Pearce @ 2010-03-27 19:32 UTC (permalink / raw)
  To: A Large Angry SCM
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun,
	Junio C Hamano, Jonathan Nieder, git

A Large Angry SCM <gitzilla@gmail.com> wrote:
> Shawn O. Pearce wrote:
>>
>> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
>> relax CGit or JGit to accept it just because the Ruby implementation
>> of Git got the tree encoding wrong.  If anything, we should teach
>> these implementations to catch these sorts of problems earlier.
>
> Just add an additional data point, it looks like up to 16 of these trees  
> with zero-padded file modes are reachable from Linus' kernel master ref.

Frell.

We can't ask Linus to rewrite his history to repair this breakage.
The fact that its made it into the kernel history means we have
to accept this.  The kernel project is simply too large and move
too fast for us to ask them to fix their repository history.
Smaller projects of 1-2 people, we could have gotten away with
asking them to fix their history.

I guess that answers the questions then.  CGit permits this with
a warning, and must always continue to do that.  And JGit needs to
fix itself to do the same.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:32                               ` Shawn O. Pearce
@ 2010-03-27 19:39                                 ` A Large Angry SCM
  2010-03-27 19:44                                   ` A Large Angry SCM
  0 siblings, 1 reply; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-27 19:39 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun,
	Junio C Hamano, Jonathan Nieder, git

Shawn O. Pearce wrote:
> A Large Angry SCM <gitzilla@gmail.com> wrote:
>> Shawn O. Pearce wrote:
>>> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
>>> relax CGit or JGit to accept it just because the Ruby implementation
>>> of Git got the tree encoding wrong.  If anything, we should teach
>>> these implementations to catch these sorts of problems earlier.
>> Just add an additional data point, it looks like up to 16 of these trees  
>> with zero-padded file modes are reachable from Linus' kernel master ref.
> 
> Frell.
> 
> We can't ask Linus to rewrite his history to repair this breakage.
> The fact that its made it into the kernel history means we have
> to accept this.  The kernel project is simply too large and move
> too fast for us to ask them to fix their repository history.
> Smaller projects of 1-2 people, we could have gotten away with
> asking them to fix their history.
> 
> I guess that answers the questions then.  CGit permits this with
> a warning, and must always continue to do that.  And JGit needs to
> fix itself to do the same.
> 

Wait a minute, something strange is going on here.

My combined kernel repository has 16 of these things according to 
git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the 
same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' 
they do not appear in the output.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:39                                 ` A Large Angry SCM
@ 2010-03-27 19:44                                   ` A Large Angry SCM
  2010-03-27 19:57                                     ` A Large Angry SCM
  2010-03-28 17:38                                     ` Sitaram Chamarty
  0 siblings, 2 replies; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-27 19:44 UTC (permalink / raw)
  To: gitzilla
  Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard,
	Avery Pennarun, Junio C Hamano, Jonathan Nieder, git

A Large Angry SCM wrote:
> Shawn O. Pearce wrote:
>> A Large Angry SCM <gitzilla@gmail.com> wrote:
>>> Shawn O. Pearce wrote:
>>>> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
>>>> relax CGit or JGit to accept it just because the Ruby implementation
>>>> of Git got the tree encoding wrong.  If anything, we should teach
>>>> these implementations to catch these sorts of problems earlier.
>>> Just add an additional data point, it looks like up to 16 of these 
>>> trees  with zero-padded file modes are reachable from Linus' kernel 
>>> master ref.
>>
>> Frell.
>>
>> We can't ask Linus to rewrite his history to repair this breakage.
>> The fact that its made it into the kernel history means we have
>> to accept this.  The kernel project is simply too large and move
>> too fast for us to ask them to fix their repository history.
>> Smaller projects of 1-2 people, we could have gotten away with
>> asking them to fix their history.
>>
>> I guess that answers the questions then.  CGit permits this with
>> a warning, and must always continue to do that.  And JGit needs to
>> fix itself to do the same.
>>
> 
> Wait a minute, something strange is going on here.
> 
> My combined kernel repository has 16 of these things according to 
> git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the 
> same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' 
> they do not appear in the output.
> 

Ignore my noise until some more checking is done!

My combined repository also includes Scott's progit book and examples 
repositories. I'm guessing that git-fsck did not limit itself to just 
the object reachable from the torvalds/linux-2.6/master ref.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:44                                   ` A Large Angry SCM
@ 2010-03-27 19:57                                     ` A Large Angry SCM
  2010-03-28 17:38                                     ` Sitaram Chamarty
  1 sibling, 0 replies; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-27 19:57 UTC (permalink / raw)
  To: gitzilla
  Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard,
	Avery Pennarun, Junio C Hamano, Jonathan Nieder, git

A Large Angry SCM wrote:
> A Large Angry SCM wrote:
>> Shawn O. Pearce wrote:
>>> A Large Angry SCM <gitzilla@gmail.com> wrote:
>>>> Shawn O. Pearce wrote:
>>>>> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
>>>>> relax CGit or JGit to accept it just because the Ruby implementation
>>>>> of Git got the tree encoding wrong.  If anything, we should teach
>>>>> these implementations to catch these sorts of problems earlier.
>>>> Just add an additional data point, it looks like up to 16 of these 
>>>> trees  with zero-padded file modes are reachable from Linus' kernel 
>>>> master ref.
>>>
>>> Frell.
>>>
>>> We can't ask Linus to rewrite his history to repair this breakage.
>>> The fact that its made it into the kernel history means we have
>>> to accept this.  The kernel project is simply too large and move
>>> too fast for us to ask them to fix their repository history.
>>> Smaller projects of 1-2 people, we could have gotten away with
>>> asking them to fix their history.
>>>
>>> I guess that answers the questions then.  CGit permits this with
>>> a warning, and must always continue to do that.  And JGit needs to
>>> fix itself to do the same.
>>>
>>
>> Wait a minute, something strange is going on here.
>>
>> My combined kernel repository has 16 of these things according to 
>> git-fsck. And when I do 'git-fsck torvalds/linux-2.6/master' I get the 
>> same 16 BUT when I 'git-rev-list --objects torvalds/linux-2.6/master' 
>> they do not appear in the output.
>>
> 
> Ignore my noise until some more checking is done!
> 
> My combined repository also includes Scott's progit book and examples 
> repositories. I'm guessing that git-fsck did not limit itself to just 
> the object reachable from the torvalds/linux-2.6/master ref.
> 

The 16 offending trees are all git-rev-list reachable from Scott's 
progit book master ref and _NOT_ from Linus' Linux master ref.

Sorry about the confusion!!!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:20                       ` Shawn O. Pearce
@ 2010-03-27 20:04                         ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2010-03-27 20:04 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Mike.lifeguard, Avery Pennarun, Scott Chacon,
	Jonathan Nieder, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> JGit currently has the leading '0' be a "hard stop".  Because this is
> the fsck code running inside of the receive-pack service, validating
> what the user sent is isn't malformed.  Its clearly malformed.

The "complain loudly to let the user know about the need to fix the
corrupt repository" comment was meant for the "git fsck" equivalent of
jgit (if there such a thing).

I think "hard stop to prevent corruption from getting propagated" is
actually something we should do in receive-pack in the reference
implementation if we don't do so already.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:14                           ` Shawn O. Pearce
  2010-03-27 19:30                             ` A Large Angry SCM
@ 2010-03-27 20:13                             ` A Large Angry SCM
  2010-03-27 20:16                             ` Junio C Hamano
  2010-03-27 22:16                             ` Avery Pennarun
  3 siblings, 0 replies; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-27 20:13 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun,
	Junio C Hamano, Jonathan Nieder, git

Shawn O. Pearce wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
>> On Sat, 27 Mar 2010, Scott Chacon wrote:
>>>> My stance has always been that the C Git is authoritative with regards to
>>>> formats and protocols. ??It's up to Github to fix their screw-up.
>>> It is fixed and will be deployed soon, but really, there is no reason
>>> to be snippy.  It is a simple and minor mistake effecting very few
>>> repositories (maybe 100 out of 730k)
> 
> What is the C Git stance on these 100 repositories then?  Are they
> now considered corrupt?  Or is 100 enough in the wild that we have
> to accept the problem, just like we accept the 10664 mode issue from
> "ancient" Linux?
> 
> I would love to say "those are corrupt, sorry, fix your repository".
> 
> But we have traditionally tried to help our users, and not cause
> them pain.  Forcing a rewrite on these 100 projects to fix up the
> corruption is going to be painful for them.  
> 
>>> , and the only reason it's an
>>> issue at all is that JGit is not following the authoritative CGit
>>> implementation of basically ignoring it.
>> But again CGit's fsck is not ignoring this discrepancy.  And if the CGit 
>> core is otherwise silently accepting it then it is a mistake.
> 
> Right.  I tend to agree.  CGit was too lax here, fsck shouldn't
> be issuing a warning, it should be a fatal error.  Both CGit and
> JGit are too lax by not failing when reading that tree during
> normal processing.

CGit should treat the object as corrupt, output a message to that 
effect, and continue checking the rest of the objects. Everything else 
that traverses graph should exit with an error as soon as it tries 
detects a corrupt object.

This would allow someone to use git-for-each-ref and git-rev-list to 
prune the graph by deleting refs without trashing the entire repository.

>>> Also, if we're all concerned about "Git reimplementation du jour"
>>> deviations, then we need to focus on libifying Git so there isn't a
>>> need for such re-implementations.  I'm hoping to help with a possible
>>> GSoC project on libgit2, but the lack of a linkable library will
>>> ensure that re-implementations in nearly every useful language will
>>> continue.
>> Don't get me wrong.  I'm not against Git reimplementations per se, as 
>> long as they rigorously implement the exact format and protocol from 
>> CGit.  In that sense it is important that the CGit fsck and verify-pack 
>> tools be exploited on objects/packs produced by alternate Git 
>> implementation systematically to find such issues.
> 
> When JGit had the tree sort order wrong, JGit was in the wrong,
> and any repository which contained those corrupt trees had to be
> fixed by rewriting them.  IIRC it was only the JGit repository
> itself that had this problem in the wild.  But we fixed our code.
> 
> IMHO, this leading '0' thing is a similar breakage.  We shouldn't
> relax CGit or JGit to accept it just because the Ruby implementation
> of Git got the tree encoding wrong.  If anything, we should teach
> these implementations to catch these sorts of problems earlier.
> 

I agree. Now how can the git community help them help themselves?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:14                           ` Shawn O. Pearce
  2010-03-27 19:30                             ` A Large Angry SCM
  2010-03-27 20:13                             ` A Large Angry SCM
@ 2010-03-27 20:16                             ` Junio C Hamano
  2010-03-27 22:16                             ` Avery Pennarun
  3 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2010-03-27 20:16 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Avery Pennarun,
	Jonathan Nieder, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Nicolas Pitre <nico@fluxnic.net> wrote:
>> On Sat, 27 Mar 2010, Scott Chacon wrote:
>> > > My stance has always been that the C Git is authoritative with regards to
>> > > formats and protocols. ??It's up to Github to fix their screw-up.
>> > 
>> > It is fixed and will be deployed soon, but really, there is no reason
>> > to be snippy.  It is a simple and minor mistake effecting very few
>> > repositories (maybe 100 out of 730k)
>
> What is the C Git stance on these 100 repositories then?  Are they
> now considered corrupt?  Or is 100 enough in the wild that we have
> to accept the problem, just like we accept the 10664 mode issue from
> "ancient" Linux?
>
> I would love to say "those are corrupt, sorry, fix your repository".

This is why I first asked how widespread the copies of the implementation
of that broken tool are.  If it is only 100 and all breakages are confined
to objects created at GitHub installation, and the owners of these 100
repositories are not locally creating corrupt objects with copies of
broken reimplementation of git they have, I would say that we tell them to
fix it, and GitHub can hopefully help them as their hosting site.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:14                           ` Shawn O. Pearce
                                               ` (2 preceding siblings ...)
  2010-03-27 20:16                             ` Junio C Hamano
@ 2010-03-27 22:16                             ` Avery Pennarun
  3 siblings, 0 replies; 33+ messages in thread
From: Avery Pennarun @ 2010-03-27 22:16 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Nicolas Pitre, Scott Chacon, Mike.lifeguard, Junio C Hamano,
	Jonathan Nieder, git

On Sat, Mar 27, 2010 at 3:14 PM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Nicolas Pitre <nico@fluxnic.net> wrote:
>> On Sat, 27 Mar 2010, Scott Chacon wrote:
>> > , and the only reason it's an
>> > issue at all is that JGit is not following the authoritative CGit
>> > implementation of basically ignoring it.
>>
>> But again CGit's fsck is not ignoring this discrepancy.  And if the CGit
>> core is otherwise silently accepting it then it is a mistake.
>
> Right.  I tend to agree.  CGit was too lax here, fsck shouldn't
> be issuing a warning, it should be a fatal error.  Both CGit and
> JGit are too lax by not failing when reading that tree during
> normal processing.

Gah, no!  Why would you want to make CGit work *less*?  Print a
warning, sure, but since the tree is perfectly readable, there's no
reason to refuse to read it.  That's just rude.

Similarly, there should be no reason for fsck to treat *any*
recoverable error as fatal.  If it drops dead, it then misses the
chance to diagnose later problems.  When I run fsck, I want to see all
the problems, not just the first one, especially if the first one can
be fixed by filter-branch.  Bonus points if (like e2fsck) it offers to
fix it for me, though that's probably not worth implementing here.

CGit works fine already.  The only problem it has is that it works so
well that Scott didn't notice his bug.  This can be fixed by adding a
simple warning.

Have fun,

Avery

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-27 19:44                                   ` A Large Angry SCM
  2010-03-27 19:57                                     ` A Large Angry SCM
@ 2010-03-28 17:38                                     ` Sitaram Chamarty
  2010-03-28 23:28                                       ` A Large Angry SCM
  1 sibling, 1 reply; 33+ messages in thread
From: Sitaram Chamarty @ 2010-03-28 17:38 UTC (permalink / raw)
  To: gitzilla
  Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard,
	Avery Pennarun, Junio C Hamano, Jonathan Nieder, git

On Sun, Mar 28, 2010 at 1:14 AM, A Large Angry SCM <gitzilla@gmail.com> wrote:

> My combined repository also includes Scott's progit book and examples
> repositories. I'm guessing that git-fsck did not limit itself to just the
> object reachable from the torvalds/linux-2.6/master ref.

<struck speechless>

You have *one* repo containing both Scott's book and the Linux kernel
tree?  "large angry SCM" is probably an understatement then... any SCM
has the right to be angry mixing up things that are so unrelated!

Or did I totally, *totally* misunderstand...?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Tree with leading '0' modes in 1.7.0.3
  2010-03-28 17:38                                     ` Sitaram Chamarty
@ 2010-03-28 23:28                                       ` A Large Angry SCM
  0 siblings, 0 replies; 33+ messages in thread
From: A Large Angry SCM @ 2010-03-28 23:28 UTC (permalink / raw)
  To: Sitaram Chamarty
  Cc: Shawn O. Pearce, Nicolas Pitre, Scott Chacon, Mike.lifeguard,
	Avery Pennarun, Junio C Hamano, Jonathan Nieder, git

Sitaram Chamarty wrote:
> On Sun, Mar 28, 2010 at 1:14 AM, A Large Angry SCM <gitzilla@gmail.com> wrote:
> 
>> My combined repository also includes Scott's progit book and examples
>> repositories. I'm guessing that git-fsck did not limit itself to just the
>> object reachable from the torvalds/linux-2.6/master ref.
> 
> <struck speechless>
> 
> You have *one* repo containing both Scott's book and the Linux kernel
> tree?  "large angry SCM" is probably an understatement then... any SCM
> has the right to be angry mixing up things that are so unrelated!
> 
> Or did I totally, *totally* misunderstand...?
> 

You understood correctly. I use that repository to track activities in a 
number of different but related projects/communities. Scott's stuff has 
since been purged until he fixes the corruption.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2010-03-28 23:28 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-26 21:56 Tree with leading '0' modes in 1.7.0.3 Shawn O. Pearce
2010-03-26 22:26 ` Jonathan Nieder
2010-03-26 22:29   ` Shawn O. Pearce
2010-03-26 22:40     ` Jonathan Nieder
2010-03-26 23:09       ` Junio C Hamano
2010-03-26 22:59     ` Mike.lifeguard
2010-03-26 23:05       ` Shawn O. Pearce
2010-03-26 23:22         ` Mike.lifeguard
2010-03-26 23:49           ` Jonathan Nieder
2010-03-26 23:50         ` Junio C Hamano
2010-03-26 23:56           ` Avery Pennarun
2010-03-27  0:00             ` Mike.lifeguard
2010-03-27  1:22               ` Shawn O. Pearce
2010-03-27  1:30                 ` Nicolas Pitre
2010-03-27  1:34                   ` Shawn O. Pearce
2010-03-27  1:56                     ` Nicolas Pitre
2010-03-27  2:33                       ` Avery Pennarun
2010-03-27 12:44                       ` Scott Chacon
2010-03-27 14:21                         ` Nicolas Pitre
2010-03-27 19:14                           ` Shawn O. Pearce
2010-03-27 19:30                             ` A Large Angry SCM
2010-03-27 19:32                               ` Shawn O. Pearce
2010-03-27 19:39                                 ` A Large Angry SCM
2010-03-27 19:44                                   ` A Large Angry SCM
2010-03-27 19:57                                     ` A Large Angry SCM
2010-03-28 17:38                                     ` Sitaram Chamarty
2010-03-28 23:28                                       ` A Large Angry SCM
2010-03-27 20:13                             ` A Large Angry SCM
2010-03-27 20:16                             ` Junio C Hamano
2010-03-27 22:16                             ` Avery Pennarun
2010-03-27  5:16                     ` Junio C Hamano
2010-03-27 19:20                       ` Shawn O. Pearce
2010-03-27 20:04                         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).