* fsck errors on newly cloned, newly imported git repository
From: Mike Herrick @ 2010-10-24 15:54 UTC
To: git

This weekend we're cutting over to use git for our source code control
system. I've imported about 20 years' worth of previous history using
"git cvsimport" (takes about four hours). I then cloned the resulting
repository onto five different machines (four Linux, one Solaris).
I've set up a cron job to do a nightly "git fsck" on each of the five
machines, and last night, two of the machines reported fsck errors on
their initial run.

Here's a sample of the errors:

error: packed cd00921f75f91985d1b67181632a4764af50d4e8 from .git/objects/pack/pack-b17f2e0a970084fed1f7a6c7664601e78059063f.pack is corrupt
error: sha1 mismatch 20abcd833a10aad51ff7f59b6a5e179d77e9a388
error: 20abcd833a10aad51ff7f59b6a5e179d77e9a388: object corrupt or missing
error: sha1 mismatch 343c28f127a5e5b9b85b0bdc5419e131b10ff2f0
...
broken link from tree df17bc72fd5f7cea686f97e14f71f8464149ed25
              to blob d085a51be07285bec9ccf0323a7cf47856dbb31f
broken link from tree ab9e5b7383bcde71680abe552e30ae5abf64cf6d
              to blob 83e8475441911692d1a63d0272e17d62d1b7b8d1
...
missing blob ad3209e27bbc3676bf06f889779908928948b65a
missing blob 4d4829314e64e2a0524fa520f59f7d18482e2b0a
missing blob 9ed481a1970e5f38b1479241ed21a2296c09cda0
...

The errors reported on these two machines were different, but what's
interesting is that all of the missing blobs refer to various
revisions of the same file, namely our "Changes" file (which is
updated with each change). It's also the largest file in our
repository (3.3M).

I immediately started looking at logs to see if there was any
indication of disk corruption and found none (no SMART errors either).
Both of these machines have been stable over a multi-year period of
time (no unexplained crashes). They're also older Linux machines
(running 2.6.5-1.358 and 2.6.1-1.65, with relatively little memory:
1GB and .5GB), but with a newly installed version of git (1.7.3.1).

I initially used git-daemon for the clone process, but even using ssh,
I still see fsck errors on the resulting clones on these two machines.

If I don't find an explanation for this behavior, our conversion to
git will need to be backed out before development begins tomorrow :-(.

Thanks for any pointers.

Mike.
* Re: fsck errors on newly cloned, newly imported git repository
From: Drew Northup @ 2010-10-25 10:58 UTC
To: Mike Herrick; +Cc: git

On Sun, 2010-10-24 at 11:54 -0400, Mike Herrick wrote:
> This weekend we're cutting over to use git for our source code control
> system. I've imported about 20 years worth of previous history using
> "git cvsimport" (takes about four hours). I then cloned the resulting
> repository onto five different machines (four Linux, one Solaris).
> I've set up a cron job to do a nightly "git fsck" on each of the five
> machines, and last night, two of the machines reported fsck errors on
> their initial run.

<snip>

> The errors reported on these two machines were different, but what's
> interesting is that all of the missing blobs refer to various
> revisions of the same file, namely our "Changes" file (which is
> updated with each change). It's also the largest file in our
> repository (3.3M). I immediately started looking at logs to see if
> there was any indication of disk corruption and found none (no SMART
> errors either). Both of these machines have been stable over a
> multi-year period of time (no unexplained crashes). They're also
> older Linux machines (running 2.6.5-1.358 and 2.6.1-1.65, with
> relatively little memory: 1GB and .5GB), but with newly installed
> version of git (1.7.3.1). I initially used git-daemon for the clone
> process, but even using ssh, I still see fsck errors on the resulting
> clones on these two machines.

Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
showing errors? Alas, no blatant evidence of disk corruption is not
evidence of no disk corruption as well.

--
-Drew Northup N1XIM AKA RvnPhnx on OPN
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
* Re: fsck errors on newly cloned, newly imported git repository
From: Mike Herrick @ 2010-10-25 12:25 UTC
To: Drew Northup; +Cc: git

On Mon, Oct 25, 2010 at 6:58 AM, Drew Northup <drew.northup@maine.edu> wrote:
> <snip>
>
> Did you "git fsck" BEFORE you attempted to clone? Is it ONLY clones
> showing errors? Alas, no blatant evidence of disk corruption is not
> evidence of no disk corruption as well.

Thanks for your reply.

Only two of the five clones exhibit fsck errors and the server
repository has no fsck errors.

The two machines report different sets of missing blobs, but always in
the "Changes" file (which has the somewhat unique characteristics that
it is the "most changed" file in the repository, the largest, and one
which is almost always only added to).

I've since created two more clones on one of the machines (one using
git-daemon and the other ssh) and both of these clones have the exact
same set of missing blobs! For me this rules out disk corruption.

The good(?) news is that the process is repeatable on one machine:
cloning from a known good repository results in different (but
repeatable) errors. Performing a second clone on the other "bad"
machine also results in missing blobs, but different ones than the
first (although all in the Changes file).

My current thought is that somehow it's related to very old kernels?
Apparently these machines are FC2 vintage.

Mike.
* Re: fsck errors on newly cloned, newly imported git repository
From: Mike Herrick @ 2010-10-25 14:02 UTC
To: Drew Northup; +Cc: git

On Mon, Oct 25, 2010 at 8:25 AM, Mike Herrick <mike.herrick@gmail.com> wrote:
> <snip>
>
> My current thought is that somehow it's related to very old kernels?
> Apparently these machines are FC2 vintage.

We've backed out of our git cutover due to these errors.

I should also point out that on the machine where the errors are
repeatable, two of the clones were made to a local disk and one to an
NFS disk, and all three showed the same missing blobs (another
indication that it is unlikely to be a disk problem).

It's also interesting that the missing blobs seem to be in the same
general timeframe, 2001-2002 on one machine and 2008-2009 on the other
machine (as evidenced by the file sizes of the missing blobs):

[mikeh@mac5 src]$ for i in `cat /tmp/lin4`; do git cat-file -s $i ; done
1494474
1667992
1496198
1643008
1666070
1724686
1494201
1643297
1665137
1640569
1726140
[mikeh@mac5 src]$ for i in `cat /tmp/toulouse`; do git cat-file -s $i ; done
3055178
2858902
3060252
2887177
3038051
3033691
3008232
2981567
3000575
3081501
2995707
3070232
3076036
3059223
3075351
3070343
3054573
3033120
3028284
3078443
2896078
2895094
2973070
2859356

I was hoping that these would be on some type of boundary (and hence
powers of two), but that doesn't seem to be the case.

Mike.
* Re: fsck errors on newly cloned, newly imported git repository
From: Mike Herrick @ 2010-10-28 23:40 UTC
To: Drew Northup; +Cc: git

Following up to my own post, after some serious debugging, it turned
out that the libcrypto library was to blame. I found that after the
clone, a git index-pack on the pack-file would generate a different
index each time (from the same pack-file). I narrowed it down to this
routine returning an incorrect sha1 on "bad" machines:

static void write_sha1_file_prepare(const void *buf, unsigned long len,
                                    const char *type, unsigned char *sha1,
                                    char *hdr, int *hdrlen)
{
        git_SHA_CTX c;

        /* Generate the header */
        *hdrlen = sprintf(hdr, "%s %lu", type, len)+1;

        /* Sha1.. */
        git_SHA1_Init(&c);
        git_SHA1_Update(&c, hdr, *hdrlen);
        git_SHA1_Update(&c, buf, len);
        git_SHA1_Final(sha1, &c);
}

Upgrading the libcrypto solved the problem on both machines. FYI, the
offending versions were openssl-0.9.7a-26 and openssl-0.9.7a-35 (these
were Fedora Core 2 vintage). Hopefully no-one has any systems this old
lying around any longer.

Mike.