* Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04 9:25 UTC
To: git
We recently discovered a corrupt git tree object in one of our
busiest repositories on the master server. From what I can tell, the
"git cat-file -p" output looked just fine, but "git gc" complained loudly
about the object being corrupt. I had the same repository cloned on my
machine and found (after unpacking the packfiles) that my copy of the
object differed from the one on the server. The two files are the same
size, and only the second byte differs between the good and the bad object.
$ head -n 5 /tmp/hexdump_corrupt.txt
00000000 78 9c 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
00000010 31 51 70 cc 4b 29 ca cf 4c d1 cb cd 66 a8 38 dd |1Qp.K)..L...f.8.|
00000020 76 77 82 ba af da a1 66 06 b9 b4 03 66 9d 27 18 |vw.....f....f.'.|
00000030 93 ec 50 55 f9 26 e6 65 a6 a5 16 97 e8 55 e4 e6 |..PU.&.e.....U..|
00000040 30 d8 98 fe a9 93 98 cc be 24 a4 ac 93 3b 43 b7 |0........$...;C.|
$ head -n 5 /tmp/hexdump_okay.txt
00000000 78 01 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
00000010 31 51 70 cc 4b 29 ca cf 4c d1 cb cd 66 a8 38 dd |1Qp.K)..L...f.8.|
00000020 76 77 82 ba af da a1 66 06 b9 b4 03 66 9d 27 18 |vw.....f....f.'.|
00000030 93 ec 50 55 f9 26 e6 65 a6 a5 16 97 e8 55 e4 e6 |..PU.&.e.....U..|
00000040 30 d8 98 fe a9 93 98 cc be 24 a4 ac 93 3b 43 b7 |0........$...;C.|
From what I gather from the community book and Pro Git, a git object
file is a deflated representation of the object type as a string, the
payload size, a null byte, and the payload. Is there a standard tool for
inflating the file back so that I can inspect what the actual difference
between these two is? Short of writing a tool utilizing zlib, at least.
Any other ideas why we would see such a difference? Hardware
malfunction or memory corruption I guess, but something else?
I can supply the actual object files if necessary.
--
Magnus Bäck
SW Configuration Manager
Sony Ericsson
Opinions are my own and do not necessarily represent the ones of my employer, etc.
* Re: Inspecting a corrupt git object
From: Alejandro Riveira Fernández @ 2010-08-04 9:48 UTC
To: git
On Wed, 04 Aug 2010 11:25:30 +0200, Magnus Bäck wrote:
[ ... ]
>
> From what I gather from the community book and Pro Git, a git object
> file is a deflated representation of the object type as a string, the
> payload size, a null byte, and the payload. Is there a standard tool for
> inflating the file back so that I can inspect what the actual difference
> between these two is? Short of writing a tool utilizing zlib, at least.
Maybe
git cat-file -p <sha1>
?
>
> Any other ideas why we would see such a difference? Hardware malfunction
> or memory corruption I guess, but something else? I can supply the
> actual object files if necessary.
Alejandro
* Re: Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04 13:09 UTC
To: Alejandro Riveira Fernández; +Cc: git
On Wednesday, August 04, 2010 at 11:48 CEST,
Alejandro Riveira Fernández <ariveira@gmail.com> wrote:
> On Wed, 04 Aug 2010 11:25:30 +0200, Magnus Bäck wrote:
>
> > From what I gather from the community book and Pro Git, a git object
> > file is a deflated representation of the object type as a string,
> > the payload size, a null byte, and the payload. Is there a standard
> > tool for inflating the file back so that I can inspect what the
> > actual difference between these two is? Short of writing a tool
> > utilizing zlib, at least.
>
> Maybe
>
> git cat-file -p <sha1>
>
> ?
Sorry, I should have been clearer here. I know about cat-file's
pretty-printing abilities, but I wanted to inflate the loose
object data and see *exactly* where the differing byte ended up.
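A minimal sketch of that comparison, assuming both loose object files inflate cleanly. The helper names `inflate_file` and `first_difference` are hypothetical, and the two file paths are supplied on the command line:

```python
import sys
import zlib


def inflate_file(path: str) -> bytes:
    """Read a loose object file and return its inflated contents."""
    with open(path, "rb") as fh:
        return zlib.decompress(fh.read())


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input may be a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    good = inflate_file(sys.argv[1])
    bad = inflate_file(sys.argv[2])
    offset = first_difference(good, bad)
    if offset is None:
        print("inflated contents are identical")
    else:
        print("first difference at inflated offset %d" % offset)
```

If the damaged file no longer inflates at all, `zlib.decompress` raises `zlib.error`, which is itself a useful data point.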
--
Magnus Bäck
SW Configuration Manager
Sony Ericsson
Opinions are my own and do not necessarily represent the ones of my employer, etc.
* Re: Inspecting a corrupt git object
From: Thomas Rast @ 2010-08-04 9:48 UTC
To: Magnus Bäck; +Cc: git
Magnus Bäck wrote:
>
> $ head -n 1 /tmp/hexdump_corrupt.txt
> 00000000 78 9c 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
> $ head -n 1 /tmp/hexdump_okay.txt
> 00000000 78 01 2b 29 4a 4d 55 30 32 36 62 30 34 30 30 33 |x.+)JMU026b04003|
>
> From what I gather from the community book and Pro Git, a git object
> file is a deflated representation of the object type as a string, the
> payload size, a null byte, and the payload. Is there a standard tool for
> inflating the file back so that I can inspect what the actual difference
> between these two is? Short of writing a tool utilizing zlib, at least.
I'm sure it's a one-liner in almost any scripting language, e.g. you
can use
python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1],"rb").read()))'
with a filename argument if you have Python at hand.
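A slightly expanded sketch of the one-liner above, which also splits the inflated data into the parts described in the question: the type string, the payload size, a null byte, and the payload. The function name `parse_loose_object` is hypothetical; the type and size are separated by a single space in the header.

```python
import sys
import zlib


def parse_loose_object(raw: bytes):
    """Inflate a loose object and split it into (type, size, payload)."""
    data = zlib.decompress(raw)
    # The header is "<type> <size>" terminated by the first null byte.
    header, _, payload = data.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    return obj_type.decode("ascii"), int(size), payload


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as fh:
        obj_type, size, payload = parse_loose_object(fh.read())
    print("type=%s declared_size=%d actual_size=%d"
          % (obj_type, size, len(payload)))
    # Tree payloads are binary, so show an escaped preview only.
    print(repr(payload[:64]))
```

A mismatch between the declared size and the actual payload length would point at damage inside the deflate stream rather than in the zlib header.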
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: Inspecting a corrupt git object
From: Magnus Bäck @ 2010-08-04 13:02 UTC
To: Thomas Rast; +Cc: git
On Wednesday, August 04, 2010 at 11:48 CEST,
Thomas Rast <trast@student.ethz.ch> wrote:
> Magnus Bäck wrote:
>
> > From what I gather from the community book and Pro Git, a git object
> > file is a deflated representation of the object type as a string,
> > the payload size, a null byte, and the payload. Is there a standard
> > tool for inflating the file back so that I can inspect what the
> > actual difference between these two is? Short of writing a tool
> > utilizing zlib, at least.
>
> I'm sure it's a one-liner in almost any scripting language, e.g. you
> can use
>
> python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1],"rb").read()))'
>
> with a filename argument if you have Python at hand.
That worked fine, thanks. Apparently the differing second byte of the
compressed data has no effect on the end result -- the two inflated
files are identical.
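One way to see why the inflated data can be identical: the first two bytes of a zlib stream are a header (CMF and FLG in RFC 1950), and the second byte carries only a checksum of the header, a preset-dictionary flag, and a compression-level hint that is not needed for decompression. The sketch below decodes the two headers from the hexdumps, 78 01 and 78 9c. The function name `decode_zlib_header` is hypothetical, and this does not explain why "git gc" complained.

```python
def decode_zlib_header(cmf: int, flg: int):
    """Decode the two-byte zlib stream header described in RFC 1950."""
    return {
        "method": cmf & 0x0F,                       # 8 means deflate
        "window_bits": (cmf >> 4) + 8,              # 15 -> 32 KiB window
        "fcheck_ok": (cmf * 256 + flg) % 31 == 0,   # header checksum
        "fdict": bool(flg & 0x20),                  # preset dictionary flag
        "flevel": flg >> 6,                         # level hint, 0..3
    }


for name, header in (("okay", (0x78, 0x01)), ("corrupt", (0x78, 0x9C))):
    print(name, decode_zlib_header(*header))
```

Both headers pass the checksum and differ only in the level hint (0 versus 2), so by this reading both files are well-formed zlib streams that were written with different compression settings.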
Interestingly, just as we were about to transplant the loose object from
my working repository to the server where "git gc" failed and the object
was seemingly corrupt, the person doing the actual work (I don't have
access to the server) ran "git gc" to find the id of the bad object, and
suddenly it completed without errors. The object in question had now
been included in a packfile, and upon unpacking that packfile to inspect
the object it was identical to the file I had, i.e. the new loose object
was different from the original loose object. I had expected a loose
object -> packfile -> loose object cycle to not change anything.
Everything seems to be back to normal now, which is good, but I'd prefer
to understand why things got fixed.
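On the loose object -> packfile -> loose object cycle: an object's name is computed from the inflated bytes (header plus payload), not from the compressed file, so recompressing an object can change the bytes on disk without changing the object. A minimal sketch, assuming a SHA-1 repository; `object_id` is a hypothetical helper and the blob payload is made up for illustration:

```python
import hashlib
import zlib


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over "<type> <size>", a null byte, and the payload."""
    header = ("%s %d" % (obj_type, len(payload))).encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


payload = b"hello\n"
store = b"blob %d\x00" % len(payload) + payload

# Compress the same bytes at two different levels.
fast = zlib.compress(store, 1)
default = zlib.compress(store, 6)

# The compressed files may differ, but both inflate to the same bytes,
# so both represent the same object under the same name.
assert zlib.decompress(fast) == zlib.decompress(default) == store
print("compressed bytes equal:", fast == default)
print("object id:", object_id("blob", payload))
```

This only shows that differing compressed bytes are compatible with an intact object; it says nothing about what the server-side file looked like when "git gc" failed.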
We did have some initial problems with reaching the per-process limit
for open files (as no repack had been done for an extended time and 5000
packfiles were lingering), but it seems weird for such a problem to be
related to the possible corruption of a single tree object.
--
Magnus Bäck
SW Configuration Manager
Sony Ericsson
Opinions are my own and do not necessarily represent the ones of my employer, etc.
* Re: Inspecting a corrupt git object
From: Holger Hellmuth @ 2010-08-04 11:11 UTC
To: git
Magnus Bäck wrote:
> Any other ideas why we would see such a difference? Hardware
> malfunction or memory corruption I guess, but something else?
> I can supply the actual object files if necessary.
>
I checked a repository here, and all objects seem to start with 78
01, so that is a common prefix. Ergo no malicious tampering, as
that would only make sense if the contents of the object had changed.
So a random hardware or software malfunction is the only explanation
left, IMHO.
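The prefix check described above can be repeated on any repository with a short script. This is a sketch only: `loose_header_counts` is a hypothetical name, it looks at loose objects in the two-character fan-out directories and ignores anything stored in packfiles.

```python
import collections
import os
import sys


def loose_header_counts(objects_dir: str):
    """Count the first two bytes of every loose object under objects_dir."""
    counts = collections.Counter()
    for entry in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, entry)
        # Loose objects live in two-character fan-out directories;
        # this skips "pack", "info" and similar entries.
        if len(entry) != 2 or not os.path.isdir(subdir):
            continue
        for name in os.listdir(subdir):
            with open(os.path.join(subdir, name), "rb") as fh:
                counts[fh.read(2).hex()] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    for prefix, count in loose_header_counts(sys.argv[1]).most_common():
        print(prefix, count)
```

Run it with the path to a repository's objects directory, for example `.git/objects`, as the only argument.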
Holger