* fsck errors on newly cloned, newly imported git repository
@ 2010-10-24 15:54 Mike Herrick
2010-10-25 10:58 ` Drew Northup
0 siblings, 1 reply; 5+ messages in thread
From: Mike Herrick @ 2010-10-24 15:54 UTC (permalink / raw)
To: git
This weekend we're cutting over to use git for our source code control
system. I've imported about 20 years worth of previous history using
"git cvsimport" (takes about four hours). I then cloned the resulting
repository onto five different machines (four Linux, one Solaris).
I've set up a cron job to do a nightly "git fsck" on each of the five
machines, and last night, two of the machines reported fsck errors on
their initial run. Here's a sample of the errors:
error: packed cd00921f75f91985d1b67181632a4764af50d4e8 from .git/objects/pack/pack-b17f2e0a970084fed1f7a6c7664601e78059063f.pack is corrupt
error: sha1 mismatch 20abcd833a10aad51ff7f59b6a5e179d77e9a388
error: 20abcd833a10aad51ff7f59b6a5e179d77e9a388: object corrupt or missing
error: sha1 mismatch 343c28f127a5e5b9b85b0bdc5419e131b10ff2f0
...
broken link from tree df17bc72fd5f7cea686f97e14f71f8464149ed25
to blob d085a51be07285bec9ccf0323a7cf47856dbb31f
broken link from tree ab9e5b7383bcde71680abe552e30ae5abf64cf6d
to blob 83e8475441911692d1a63d0272e17d62d1b7b8d1
...
missing blob ad3209e27bbc3676bf06f889779908928948b65a
missing blob 4d4829314e64e2a0524fa520f59f7d18482e2b0a
missing blob 9ed481a1970e5f38b1479241ed21a2296c09cda0
...
The errors reported on these two machines were different, but what's
interesting is that all of the missing blobs refer to various
revisions of the same file, namely our "Changes" file (which is
updated with each change). It's also the largest file in our
repository (3.3M). I immediately started looking at logs to see if
there was any indication of disk corruption and found none (no SMART
errors either). Both of these machines have been stable over a
multi-year period of time (no unexplained crashes). They're also
older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
relatively little memory: 1GB and .5GB), but with a newly installed
version of git (1.7.3.1). I initially used git-daemon for the clone
process, but even using ssh, I still see fsck errors on the resulting
clones on these two machines.
If I don't find an explanation for this behavior, our conversion to
git will need to be backed out before development begins tomorrow :-(.
Thanks for any pointers.
Mike.
* Re: fsck errors on newly cloned, newly imported git repository
2010-10-24 15:54 fsck errors on newly cloned, newly imported git repository Mike Herrick
@ 2010-10-25 10:58 ` Drew Northup
2010-10-25 12:25 ` Mike Herrick
0 siblings, 1 reply; 5+ messages in thread
From: Drew Northup @ 2010-10-25 10:58 UTC (permalink / raw)
To: Mike Herrick; +Cc: git
On Sun, 2010-10-24 at 11:54 -0400, Mike Herrick wrote:
> This weekend we're cutting over to use git for our source code control
> system. I've imported about 20 years worth of previous history using
> "git cvsimport" (takes about four hours). I then cloned the resulting
> repository onto five different machines (four Linux, one Solaris).
> I've set up a cron job to do a nightly "git fsck" on each of the five
> machines, and last night, two of the machines reported fsck errors on
> their initial run.
<snip>
> The errors reported on these two machines were different, but what's
> interesting is that all of the missing blobs refer to various
> revisions of the same file, namely our "Changes" file (which is
> updated with each change). It's also the largest file in our
> repository (3.3M). I immediately started looking at logs to see if
> there was any indication of disk corruption and found none (no SMART
> errors either). Both of these machines have been stable over a
> multi-year period of time (no unexplained crashes). They're also
> older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
> relatively little memory: 1GB and .5GB), but with newly installed
> version of git (1.7.3.1). I initially used git-daemon for the clone
> process, but even using ssh, I still see fsck errors on the resulting
> clones on these two machines.
Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
showing errors? Alas, the lack of blatant evidence of disk corruption
is not evidence that there is no disk corruption.
--
-Drew Northup N1XIM
AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
* Re: fsck errors on newly cloned, newly imported git repository
2010-10-25 10:58 ` Drew Northup
@ 2010-10-25 12:25 ` Mike Herrick
2010-10-25 14:02 ` Mike Herrick
0 siblings, 1 reply; 5+ messages in thread
From: Mike Herrick @ 2010-10-25 12:25 UTC (permalink / raw)
To: Drew Northup; +Cc: git
On Mon, Oct 25, 2010 at 6:58 AM, Drew Northup <drew.northup@maine.edu> wrote:
>
> On Sun, 2010-10-24 at 11:54 -0400, Mike Herrick wrote:
>> This weekend we're cutting over to use git for our source code control
>> system. I've imported about 20 years worth of previous history using
>> "git cvsimport" (takes about four hours). I then cloned the resulting
>> repository onto five different machines (four Linux, one Solaris).
>> I've set up a cron job to do a nightly "git fsck" on each of the five
>> machines, and last night, two of the machines reported fsck errors on
>> their initial run.
> <snip>
>
>> The errors reported on these two machines were different, but what's
>> interesting is that all of the missing blobs refer to various
>> revisions of the same file, namely our "Changes" file (which is
>> updated with each change). It's also the largest file in our
>> repository (3.3M). I immediately started looking at logs to see if
>> there was any indication of disk corruption and found none (no SMART
>> errors either). Both of these machines have been stable over a
>> multi-year period of time (no unexplained crashes). They're also
>> older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
>> relatively little memory: 1GB and .5GB), but with newly installed
>> version of git (1.7.3.1). I initially used git-daemon for the clone
>> process, but even using ssh, I still see fsck errors on the resulting
>> clones on these two machines.
>
> Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
> showing errors? Alas, no blatant evidence of disk corruption is not
> evidence of no disk corruption as well.
Thanks for your reply.
Only two of the five clones exhibit fsck errors and the server
repository has no fsck errors.
The two machines report different sets of missing blobs, but always in
the "Changes" file (which has the somewhat unique characteristics that
it is the "most changed" file in the repository, the largest, and one
which is almost always only added to).
I've since created two more clones on one of the machines (one using
git-daemon and the other ssh) and both of these clones have the exact
same set of missing blobs! For me this rules out disk corruption.
The good(?) news is that the process is repeatable on one machine:
cloning from a known good repository results in different (but
repeatable) errors. Performing a second clone on the other "bad"
machine also results in missing blobs, but different ones than the
first (although all in the Changes file).
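The comparison of missing-blob sets between clones can be scripted. Below
is a minimal sketch, assuming the output of "git fsck" from each clone has
been saved to a log file; the function names and the log file names in the
usage note are hypothetical, not something used in this thread.

```shell
# Extract the sorted, de-duplicated object ids of missing blobs from
# saved "git fsck" output read on stdin.
missing_blobs() {
    sed -n 's/^missing blob \([0-9a-f]\{40\}\)$/\1/p' | sort -u
}

# Succeed only if two saved fsck logs report the same set of missing
# blobs. Two logs with no missing blobs at all also compare as equal.
same_missing_set() {
    a=$(missing_blobs < "$1")
    b=$(missing_blobs < "$2")
    [ "$a" = "$b" ]
}
```

For example, after running "git fsck > /tmp/clone-a.log 2>&1" in one clone
and the same with /tmp/clone-b.log in another, the check would be
"same_missing_set /tmp/clone-a.log /tmp/clone-b.log && echo same".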
My current thought is that somehow it's related to very old kernels?
Apparently these machines are FC2 vintage.
Mike.
* Re: fsck errors on newly cloned, newly imported git repository
2010-10-25 12:25 ` Mike Herrick
@ 2010-10-25 14:02 ` Mike Herrick
2010-10-28 23:40 ` Mike Herrick
0 siblings, 1 reply; 5+ messages in thread
From: Mike Herrick @ 2010-10-25 14:02 UTC (permalink / raw)
To: Drew Northup; +Cc: git
On Mon, Oct 25, 2010 at 8:25 AM, Mike Herrick <mike.herrick@gmail.com> wrote:
> On Mon, Oct 25, 2010 at 6:58 AM, Drew Northup <drew.northup@maine.edu> wrote:
>>
>> On Sun, 2010-10-24 at 11:54 -0400, Mike Herrick wrote:
>>> This weekend we're cutting over to use git for our source code control
>>> system. I've imported about 20 years worth of previous history using
>>> "git cvsimport" (takes about four hours). I then cloned the resulting
>>> repository onto five different machines (four Linux, one Solaris).
>>> I've set up a cron job to do a nightly "git fsck" on each of the five
>>> machines, and last night, two of the machines reported fsck errors on
>>> their initial run.
>> <snip>
>>
>>> The errors reported on these two machines were different, but what's
>>> interesting is that all of the missing blobs refer to various
>>> revisions of the same file, namely our "Changes" file (which is
>>> updated with each change). It's also the largest file in our
>>> repository (3.3M). I immediately started looking at logs to see if
>>> there was any indication of disk corruption and found none (no SMART
>>> errors either). Both of these machines have been stable over a
>>> multi-year period of time (no unexplained crashes). They're also
>>> older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
>>> relatively little memory: 1GB and .5GB), but with newly installed
>>> version of git (1.7.3.1). I initially used git-daemon for the clone
>>> process, but even using ssh, I still see fsck errors on the resulting
>>> clones on these two machines.
>>
>> Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
>> showing errors? Alas, no blatant evidence of disk corruption is not
>> evidence of no disk corruption as well.
>
> Thanks for your reply.
>
> Only two of the five clones exhibit fsck errors and the server
> repository has no fsck errors.
>
> The two machines report different sets of missing blobs, but always in
> the "Changes" file (which has the somewhat unique characteristics that
> it is the "most changed" file in the repository, the largest, and one
> which is almost always only added to).
>
> I've since created two more clones on one of the machines (one using
> git-daemon and the other ssh) and both of these clones have the exact
> same set of missing blobs! For me this rules out disk corruption.
>
> The good(?) news is that the process is repeatable on one machine:
> cloning from a known good repository results in different (but
> repeatable) errors. Performing a second clone on the other "bad"
> machine also results in missing blobs, but different ones than the
> first (although all in the Changes file).
>
> My current thought is that somehow it's related to very old kernels?
> Apparently these machines are FC2 vintage.
We've backed out of our git cutover due to these errors.
I should also point out that on the machine where the errors are
repeatable, two of the clones were made to a local disk and one to an
NFS disk, and all three showed the same missing blobs (another
indication that it is unlikely to be a disk problem).
It's also interesting that the missing blobs seem to be in the same
general timeframe, 2001-2002 on one machine and 2008-2009 on the other
machine (as evidenced by the file sizes of the missing blobs):
[mikeh@mac5 src]$ for i in `cat /tmp/lin4`; do git cat-file -s $i ; done
1494474
1667992
1496198
1643008
1666070
1724686
1494201
1643297
1665137
1640569
1726140
[mikeh@mac5 src]$ for i in `cat /tmp/toulouse`; do git cat-file -s $i ; done
3055178
2858902
3060252
2887177
3038051
3033691
3008232
2981567
3000575
3081501
2995707
3070232
3076036
3059223
3075351
3070343
3054573
3033120
3028284
3078443
2896078
2895094
2973070
2859356
I was hoping that these would be on some type of boundary (and hence
powers of two), but that doesn't seem to be the case.
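The boundary check can be done mechanically rather than by eye. This is
only a sketch: the function names are made up for illustration, and the
default block size of 4096 is an assumption, not something known about
these machines.

```shell
# Succeed if the integer argument is an exact power of two.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# For each size read from stdin, print the size, its remainder modulo a
# block size (first argument, default 4096), and a power-of-two note.
boundary_report() {
    block=${1:-4096}
    while read -r size; do
        if is_power_of_two "$size"; then
            note="power of two"
        else
            note="not a power of two"
        fi
        echo "$size $(( size % block )) $note"
    done
}
```

The output of either loop above can be piped straight into it, e.g.
"for i in `cat /tmp/lin4`; do git cat-file -s $i ; done | boundary_report".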
Mike.
* Re: fsck errors on newly cloned, newly imported git repository
2010-10-25 14:02 ` Mike Herrick
@ 2010-10-28 23:40 ` Mike Herrick
0 siblings, 0 replies; 5+ messages in thread
From: Mike Herrick @ 2010-10-28 23:40 UTC (permalink / raw)
To: Drew Northup; +Cc: git
Following up on my own post, after some serious debugging, it turned
out that the libcrypto library was to blame.
I found that after the clone, a git index-pack on the pack-file would
generate a different index each time (from the same pack-file). I
narrowed it down to this routine returning an incorrect sha1 on "bad"
machines:
static void write_sha1_file_prepare(const void *buf, unsigned long len,
                                    const char *type, unsigned char *sha1,
                                    char *hdr, int *hdrlen)
{
        git_SHA_CTX c;

        /* Generate the header */
        *hdrlen = sprintf(hdr, "%s %lu", type, len)+1;

        /* Sha1.. */
        git_SHA1_Init(&c);
        git_SHA1_Update(&c, hdr, *hdrlen);
        git_SHA1_Update(&c, buf, len);
        git_SHA1_Final(sha1, &c);
}
Upgrading libcrypto solved the problem on both machines.
FYI, the offending versions were openssl-0.9.7a-26 and
openssl-0.9.7a-35 (these were Fedora Core 2 vintage). Hopefully
no one has any systems this old lying around any longer.
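For anyone who does, the index-pack symptom is easy to test for. This is a
rough sketch rather than the exact procedure used here: it indexes the same
pack-file twice into scratch index files and compares them. The function
name and the scratch file names are hypothetical.

```shell
# Index the same pack twice into scratch files and compare the results.
# Succeeds only if both runs produce byte-identical index files.
index_is_repeatable() {
    pack=$1
    tmp=$(mktemp -d) || return 2
    git index-pack -o "$tmp/a.idx" "$pack" >/dev/null &&
    git index-pack -o "$tmp/b.idx" "$pack" >/dev/null &&
    cmp -s "$tmp/a.idx" "$tmp/b.idx"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Run from inside the clone, something like
"index_is_repeatable .git/objects/pack/pack-<id>.pack || echo unstable"
would flag a machine showing this behavior.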
Mike.