git.vger.kernel.org archive mirror
* Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04  9:25 UTC (permalink / raw)
  To: git

We recently discovered a corrupt git tree object in one of our busiest
repositories on the master server. From what I can tell "git cat-file -p"
output looked just fine, but "git gc" complained loudly about the object
being corrupt. I had the same repository cloned on my machine and found
(after unpacking the packfiles) that my object was different from the one
on the server. Same size and everything, but the second byte (and only
the second byte) differed between the good and the bad object.

$ head -n 5 /tmp/hexdump_corrupt.txt
00000000  78 9c 2b 29 4a 4d 55 30  32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
00000010  31 51 70 cc 4b 29 ca cf  4c d1 cb cd 66 a8 38 dd |1Qp.K)..L...f.8.|
00000020  76 77 82 ba af da a1 66  06 b9 b4 03 66 9d 27 18 |vw.....f....f.'.|
00000030  93 ec 50 55 f9 26 e6 65  a6 a5 16 97 e8 55 e4 e6 |..PU.&.e.....U..|
00000040  30 d8 98 fe a9 93 98 cc  be 24 a4 ac 93 3b 43 b7 |0........$...;C.|
$ head -n 5 /tmp/hexdump_okay.txt
00000000  78 01 2b 29 4a 4d 55 30  32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
00000010  31 51 70 cc 4b 29 ca cf  4c d1 cb cd 66 a8 38 dd |1Qp.K)..L...f.8.|
00000020  76 77 82 ba af da a1 66  06 b9 b4 03 66 9d 27 18 |vw.....f....f.'.|
00000030  93 ec 50 55 f9 26 e6 65  a6 a5 16 97 e8 55 e4 e6 |..PU.&.e.....U..|
00000040  30 d8 98 fe a9 93 98 cc  be 24 a4 ac 93 3b 43 b7 |0........$...;C.|

From what I gather from the community book and Pro Git, a git object
file is a deflated representation of the object type as a string, the
payload size, a null byte, and the payload. Is there a standard tool for
inflating the file back so that I can inspect what the actual difference
between these two is? Short of writing a tool utilizing zlib, at least.
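
A minimal sketch of the layout described above, assuming Python 3. The name `parse_loose_object` is illustrative and not part of git; the sample object is built synthetically rather than taken from the repository in question.

```python
import zlib

def parse_loose_object(raw):
    """Inflate raw loose-object bytes and split them into (type, size, payload)."""
    data = zlib.decompress(raw)
    # The header ("<type> <size>") ends at the first NUL byte.
    header, _, payload = data.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    return obj_type.decode("ascii"), int(size), payload

# Synthetic example: build a blob in the same layout and parse it back.
payload = b"hello\n"
raw = zlib.compress(b"blob %d\x00" % len(payload) + payload)
print(parse_loose_object(raw))  # ('blob', 6, b'hello\n')
```

For a real object, `raw` would be the bytes of a file under `.git/objects/`, read in binary mode.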

Any other ideas why we would see such a difference? Hardware
malfunction or memory corruption I guess, but something else?
I can supply the actual object files if necessary.

-- 
Magnus Bäck                      Opinions are my own and do not necessarily
SW Configuration Manager         represent the ones of my employer, etc.
Sony Ericsson

* Re: Inspecting a corrupt git object
From: Alejandro Riveira Fernández @ 2010-08-04  9:48 UTC (permalink / raw)
  To: git

On Wed, 04 Aug 2010 11:25:30 +0200, Magnus Bäck wrote:

[ ... ]
> 
> From what I gather from the community book and Pro Git, a git object
> file is a deflated representation of the object type as a string, the
> payload size, a null byte, and the payload. Is there a standard tool for
> inflating the file back so that I can inspect what the actual difference
> between these two is? Short of writing a tool utilizing zlib, at least.

 Maybe

 git cat-file -p <sha1>
 
 ?

> 
> Any other ideas why we would see such a difference? Hardware malfunction
> or memory corruption I guess, but something else? I can supply the
> actual object files if necessary.

Alejandro

* Re: Inspecting a corrupt git object
From: Thomas Rast @ 2010-08-04  9:48 UTC (permalink / raw)
  To: Magnus Bäck; +Cc: git

Magnus Bäck wrote:
> 
> $ head -n 1 /tmp/hexdump_corrupt.txt
> 00000000  78 9c 2b 29 4a 4d 55 30  32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
> $ head -n 1 /tmp/hexdump_okay.txt
> 00000000  78 01 2b 29 4a 4d 55 30  32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
> 
> From what I gather from the community book and Pro Git, a git object
> file is a deflated representation of the object type as a string, the
> payload size, a null byte, and the payload. Is there a standard tool for
> inflating the file back so that I can inspect what the actual difference
> between these two is? Short of writing a tool utilizing zlib, at least.

I'm sure it's a one-liner in almost any scripting language, e.g. you
can use

  python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1], "rb").read()))'

with a filename argument if you have Python at hand.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

* Re: Inspecting a corrupt git object
From: Holger Hellmuth @ 2010-08-04 11:11 UTC (permalink / raw)
  To: git

Magnus Bäck wrote:
> Any other ideas why we would see such a difference? Hardware
> malfunction or memory corruption I guess, but something else?
> I can supply the actual object files if necessary.
> 

I checked a repository here, and all objects seem to start with 78 01,
so that is a common prefix. Ergo no malicious tampering, as that would
only make sense if the contents of the object had changed.

So a random hardware or software malfunction is the remaining
explanation, IMHO.
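
A check like the one described above could be sketched as follows, assuming Python 3 and the usual loose-object layout of two-hex-digit directories under `.git/objects`. The name `loose_object_prefixes` is hypothetical.

```python
import collections
import os
import re

def loose_object_prefixes(objects_dir):
    """Count the first two bytes (as hex) of every loose object under objects_dir."""
    counts = collections.Counter()
    for fanout in sorted(os.listdir(objects_dir)):
        # Loose objects live in two-hex-digit directories; skip "pack", "info", etc.
        if not re.fullmatch(r"[0-9a-f]{2}", fanout):
            continue
        subdir = os.path.join(objects_dir, fanout)
        if not os.path.isdir(subdir):
            continue
        for name in os.listdir(subdir):
            with open(os.path.join(subdir, name), "rb") as f:
                counts[f.read(2).hex()] += 1
    return counts
```

Calling `loose_object_prefixes(".git/objects")` from the top of a working tree returns a counter keyed by prefix, so an unusual prefix such as `789c` would stand out next to `7801`.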

Holger

* Re: Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04 13:02 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git

On Wednesday, August 04, 2010 at 11:48 CEST,
     Thomas Rast <trast@student.ethz.ch> wrote:

> Magnus Bäck wrote:
>
> > From what I gather from the community book and Pro Git, a git object
> > file is a deflated representation of the object type as a string,
> > the payload size, a null byte, and the payload. Is there a standard
> > tool for inflating the file back so that I can inspect what the
> > actual difference between these two is? Short of writing a tool
> > utilizing zlib, at least.
> 
> I'm sure it's a one-liner in almost any scripting language, e.g. you
> can use
> 
>   python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1], "rb").read()))'
> 
> with a filename argument if you have Python at hand.

That worked fine, thanks. Apparently this difference in the second byte
of the compressed data makes no difference for the end result -- the two
inflated files are identical.
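
For context on why the inflated output can be identical: in the zlib stream format (RFC 1950) the second byte is a flags byte holding a header check value and a compression-level hint, and neither changes the inflated data. The sketch below assumes Python 3; `describe_zlib_header` is an illustrative helper, and none of this explains the original "git gc" complaint.

```python
import zlib

def describe_zlib_header(cmf, flg):
    """Decode the two-byte zlib stream header (CMF, FLG) per RFC 1950."""
    return {
        "method": cmf & 0x0F,                     # 8 means deflate
        "window_bits": (cmf >> 4) + 8,            # 15 means a 32 KiB window
        "check_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK makes the pair divisible by 31
        "preset_dict": bool(flg & 0x20),          # FDICT flag
        "level_hint": flg >> 6,                   # FLEVEL, 0 (fastest) to 3 (maximum)
    }

for flg in (0x01, 0x9C):
    print("78 %02x" % flg, describe_zlib_header(0x78, flg))

# Swapping between the two header variants leaves the inflated data unchanged.
data = b"tree 0\x00"
body = zlib.compress(data)[2:]  # deflate data plus Adler-32 trailer, header removed
assert zlib.decompress(b"\x78\x01" + body) == data
assert zlib.decompress(b"\x78\x9c" + body) == data
```

Both `78 01` and `78 9c` pass the header check; they differ only in the level hint (0 versus 2).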

Interestingly, just as we were about to transplant the loose object from
my working repository to the server where "git gc" failed and the object
was seemingly corrupt, the person doing the actual work (I don't have
access to the server) ran "git gc" to find the id of the bad object, and
suddenly it completed without errors. The object in question had now
been included in a packfile, and upon unpacking that packfile to inspect
the object it was identical to the file I had, i.e. the new loose object
was different from the original loose object. I had expected a loose
object -> packfile -> loose object cycle to not change anything.
Everything seems to be back to normal now, which is good, but I'd prefer
to understand why things got fixed.
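
One relevant detail: an object's ID is the SHA-1 of the inflated data (header plus payload), not of the compressed file, so two loose files with different compressed bytes can hold the same object. A sketch assuming Python 3 and SHA-1 object IDs; `object_id` is an illustrative name, and this is not offered as the explanation for what happened on the server.

```python
import hashlib
import zlib

def object_id(raw):
    """Return the SHA-1 object ID for raw (deflated) loose-object bytes."""
    return hashlib.sha1(zlib.decompress(raw)).hexdigest()

content = b"blob 0\x00"            # inflated form of an empty blob
fast = zlib.compress(content, 1)   # compressed bytes may differ between levels...
best = zlib.compress(content, 9)
print(fast.hex(), object_id(fast))
print(best.hex(), object_id(best))  # ...but the object ID is the same
```

Comparing `object_id()` of a loose file against the name implied by its path (directory name plus file name) is one way to confirm that the inflated content is intact.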

We did have some initial problems with reaching the per-process limit
for open files (as no repack had been done for an extended time and 5000
packfiles were lingering), but it seems weird for such a problem to be
related to the possible corruption of a single tree object.

-- 
Magnus Bäck                      Opinions are my own and do not necessarily
SW Configuration Manager         represent the ones of my employer, etc.
Sony Ericsson

* Re: Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04 13:09 UTC (permalink / raw)
  To: Alejandro Riveira Fernández; +Cc: git

On Wednesday, August 04, 2010 at 11:48 CEST,
     Alejandro Riveira Fernández <ariveira@gmail.com> wrote:

> On Wed, 04 Aug 2010 11:25:30 +0200, Magnus Bäck wrote:
>
> > From what I gather from the community book and Pro Git, a git object
> > file is a deflated representation of the object type as a string,
> > the payload size, a null byte, and the payload. Is there a standard
> > tool for inflating the file back so that I can inspect what the
> > actual difference between these two is? Short of writing a tool
> > utilizing zlib, at least.
>
>  Maybe
>
>  git cat-file -p <sha1>
>
>  ?

Sorry, I should've been more clear here. I know about cat-file's
pretty-printing abilities, but I just wanted to inflate the loose
object data and see *exactly* where the differing byte ended up.
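
Locating the differing bytes between two byte strings is a short comparison. This is a sketch assuming Python 3; `differing_offsets` is an illustrative name, and the sample input is the first 16 bytes of each hexdump from the start of the thread (still compressed), used only as test data. The same function applies to the inflated data.

```python
def differing_offsets(a, b):
    """Return the offsets at which a and b differ, including any length mismatch."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Offsets past the end of the shorter input count as differences too.
    shorter, longer = sorted((len(a), len(b)))
    diffs.extend(range(shorter, longer))
    return diffs

corrupt = bytes.fromhex("78 9c 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33")
okay = bytes.fromhex("78 01 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33")
print(differing_offsets(corrupt, okay))  # [1]
```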

-- 
Magnus Bäck                      Opinions are my own and do not necessarily
SW Configuration Manager         represent the ones of my employer, etc.
Sony Ericsson
