* [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file @ 2008-12-09 8:36 Jan Krüger 2008-12-09 9:02 ` R. Tyler Ballance ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Jan Krüger @ 2008-12-09 8:36 UTC (permalink / raw) To: Git ML; +Cc: tyler For fixing a corrupted repository by using backup copies of individual files, allow write_sha1_file() to write loose files even if the object already exists in a pack file, but only if the existing entry is marked as corrupted. Signed-off-by: Jan Krüger <jk@jk.gs> --- On IRC I talked to rtyler who had a corrupted pack file and plenty of object backups by way of cloned repositories. We decided to try extracting the corrupted objects from the other object database and injecting them into the broken repo as loose objects, but this failed because sha1_write_file() refuses to write loose objects that are already present in a pack file. This patch expands the check to see if the pack entry has been marked as corrupted and, if so, allows writing a loose object with the same ID. Unfortunately, when Tyler tried a merge while using this patch, something we didn't manage to track down happened and now git doesn't consider the object corrupted anymore. I'm not sure enough that it wasn't caused by the patch to submit this patch without hesitation. Apart from that, I think the change is not all too great since it makes write_sha1_file() walk the list of pack entries twice. That's a bit of a waste. So those are the reasons why I wanted a few opinions first. Another reason is that there might be a way smarter method to fix this kind of problem, in which case I'd love hearing about it for future reference. sha1_file.c | 9 +++++---- 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 6c0e251..17085cc 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha char hdr[32]; int hdrlen; - /* Normally if we have it in the pack then we do not bother writing - * it out into .git/objects/??/?{38} file. - */ write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen); if (returnsha1) hashcpy(returnsha1, sha1); - if (has_sha1_file(sha1)) - return 0; + /* Normally if we have it in the pack then we do not bother writing + * it out into .git/objects/??/?{38} file. We do, though, if there + * is no chance that we have an uncorrupted version of the object. + */ + if (has_sha1_file(sha1)) { + if (has_loose_object(sha1) || !has_packed_and_bad(sha1)) + return 0; + } return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); } -- 1.6.0.4.766.g6fc4a ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger @ 2008-12-09 9:02 ` R. Tyler Ballance 2008-12-09 16:24 ` Shawn O. Pearce 2009-01-06 22:52 ` R. Tyler Ballance 2 siblings, 0 replies; 49+ messages in thread From: R. Tyler Ballance @ 2008-12-09 9:02 UTC (permalink / raw) To: Jan Krüger; +Cc: Git ML [-- Attachment #1: Type: text/plain, Size: 2313 bytes --] On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote: > For fixing a corrupted repository by using backup copies of individual > files, allow write_sha1_file() to write loose files even if the object > already exists in a pack file, but only if the existing entry is marked > as corrupted. > > Signed-off-by: Jan Krüger <jk@jk.gs> > --- > > On IRC I talked to rtyler who had a corrupted pack file and plenty of > object backups by way of cloned repositories. We decided to try > extracting the corrupted objects from the other object database and > injecting them into the broken repo as loose objects, but this failed > because sha1_write_file() refuses to write loose objects that are > already present in a pack file. Figured I'd chime in here with some anecdotal evidence with the error condition that I hit shortly after Jan sent the email. xdev3 (master-release)% git pull --no-ff . master From . * branch master -> FETCH_HEAD error: failed to read object befd9bc4d184b4383569909e4d245f3337c1f8ed at offset 1415784644 from .git/objects/pack/pack-f7eb06e39f01b528c1d1a2c413ac51b31b8515aa.pack fatal: object befd9bc4d184b4383569909e4d245f3337c1f8ed is corrupted Merge with strategy recursive failed. xdev3 (master-release)% I ran that command a couple of times to make sure it wasn't a fluke, I repeated the error numerous times (without switching branches or pulling from a remote). This pull was done with a slightly modified internal version of v1.6.0.4 xdev3 (master-release)% git --version git version 1.6.0.4-kb1 xdev3 (master-release) After consulting with Jan, I tried running the same command with a modified version of v1.6.0.5 with Jan's patch xdev3 (master-release)% ~/basket/bin/git pull --no-ff . master From . * branch master -> FETCH_HEAD Merge made by recursive. ** TOP SECRET MERGES! ;) ** 13 files changed, 51 insertions(+), 21 deletions(-) xdev3 (master-release)% Purely anecdotal as I'm not entirely clear what the hell is actually going on here :) Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger 2008-12-09 9:02 ` R. Tyler Ballance @ 2008-12-09 16:24 ` Shawn O. Pearce 2009-01-06 22:52 ` R. Tyler Ballance 2 siblings, 0 replies; 49+ messages in thread From: Shawn O. Pearce @ 2008-12-09 16:24 UTC (permalink / raw) To: Jan Krüger; +Cc: Git ML, tyler Jan Krüger <jk@jk.gs> wrote: > For fixing a corrupted repository by using backup copies of individual > files, allow write_sha1_file() to write loose files even if the object > already exists in a pack file, but only if the existing entry is marked > as corrupted. Huh. So I'm digging around sha1_file.c and I'm not yet sure why your patch makes a difference. has_sha1_file() calls find_pack_entry() to determine which pack has the object, and at what offset (if found). It doesn't care about the offset, but it does care about the successful match. find_pack_entry() already considers the bad_object_sha1 for each pack, before it even tries the binary search within the index. So if the entry was known to be bad has_sha1_file() will return 0, unless the object is loose. Where this breaks down is if the object is being created, its very likely we didn't attempt to read it in this process. The bad_object_sha1 table is transient and populated only when unpacking an object entry fails. So for example during a merge if a tree was stored in a pack and is corrupt and the merge result produces that same tree object we won't write it out with write_sha1_file() because it exists in a pack, but since we never read it we also don't know the pack entry is corrupt. Its horribly inefficient to read every object before we write it back out. The best thing to do when faced with corruption is to stop and repack, overlaying the object database from a known good copy of the repository so pack-objects can use the good copy when a corrupt object is identified. So I agree with you that changing this in write_sha1_file() is a bad idea for the normal good cases, but I also don't see how this patch changes anything at all... the code path you introduced is already implemented. > diff --git a/sha1_file.c b/sha1_file.c > index 6c0e251..17085cc 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha > char hdr[32]; > int hdrlen; > > - /* Normally if we have it in the pack then we do not bother writing > - * it out into .git/objects/??/?{38} file. > - */ > write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen); > if (returnsha1) > hashcpy(returnsha1, sha1); > - if (has_sha1_file(sha1)) > - return 0; > + /* Normally if we have it in the pack then we do not bother writing > + * it out into .git/objects/??/?{38} file. We do, though, if there > + * is no chance that we have an uncorrupted version of the object. > + */ > + if (has_sha1_file(sha1)) { > + if (has_loose_object(sha1) || !has_packed_and_bad(sha1)) > + return 0; > + } > return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); > } -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger 2008-12-09 9:02 ` R. Tyler Ballance 2008-12-09 16:24 ` Shawn O. Pearce @ 2009-01-06 22:52 ` R. Tyler Ballance 2009-01-07 1:25 ` Nicolas Pitre 2 siblings, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-06 22:52 UTC (permalink / raw) To: Jan Krüger; +Cc: Git ML [-- Attachment #1: Type: text/plain, Size: 3858 bytes --] On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote: > For fixing a corrupted repository by using backup copies of individual > files, allow write_sha1_file() to write loose files even if the object > already exists in a pack file, but only if the existing entry is marked > as corrupted. I figured I'd reply to this again, since the issue cropped up again. We started experiencing *large* numbers of corruptions like the ones that started the thread (one developer was receiving them once or twice a day) with v1.6.0.4 We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch (below) and the issues /seem/ to have resolved themselves. I'm not certain whether Jan's patch was really responsible, or if there was another issue that caused this to correct itself in v1.6.1. As it stands, I think it's safe to assume that given the frequency of the occurances that they were not tied to a memory or disk error (or other levels of the machine's stack would be suffering as well). The only thing I can think of is that /some/ developers who've experienced the issue are using Samba mount points and changing files in Mac OS X, but using Git on the mounted share (i.e. TextMate changes a file hosted on Samba, changes are committed in an SSH session on that machine), but that doesn't account for everything. If there was something else included in the v1.6.1 release please let me know so I can back Jan's patch out. Cheers > > Signed-off-by: Jan Krüger <jk@jk.gs> > --- > > On IRC I talked to rtyler who had a corrupted pack file and plenty of > object backups by way of cloned repositories. We decided to try > extracting the corrupted objects from the other object database and > injecting them into the broken repo as loose objects, but this failed > because sha1_write_file() refuses to write loose objects that are > already present in a pack file. > > This patch expands the check to see if the pack entry has been marked > as corrupted and, if so, allows writing a loose object with the same > ID. Unfortunately, when Tyler tried a merge while using this patch, > something we didn't manage to track down happened and now git doesn't > consider the object corrupted anymore. I'm not sure enough that it > wasn't caused by the patch to submit this patch without hesitation. > > Apart from that, I think the change is not all too great since it makes > write_sha1_file() walk the list of pack entries twice. That's a bit of > a waste. > > So those are the reasons why I wanted a few opinions first. Another > reason is that there might be a way smarter method to fix this kind of > problem, in which case I'd love hearing about it for future reference. > > sha1_file.c | 9 +++++---- > 1 files changed, 5 insertions(+), 4 deletions(-) > > diff --git a/sha1_file.c b/sha1_file.c > index 6c0e251..17085cc 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha > char hdr[32]; > int hdrlen; > > - /* Normally if we have it in the pack then we do not bother writing > - * it out into .git/objects/??/?{38} file. > - */ > write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen); > if (returnsha1) > hashcpy(returnsha1, sha1); > - if (has_sha1_file(sha1)) > - return 0; > + /* Normally if we have it in the pack then we do not bother writing > + * it out into .git/objects/??/?{38} file. We do, though, if there > + * is no chance that we have an uncorrupted version of the object. > + */ > + if (has_sha1_file(sha1)) { > + if (has_loose_object(sha1) || !has_packed_and_bad(sha1)) > + return 0; > + } > return write_loose_object(sha1, hdr, hdrlen, buf, len, 0); > } > -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-06 22:52 ` R. Tyler Ballance @ 2009-01-07 1:25 ` Nicolas Pitre 2009-01-07 1:39 ` R. Tyler Ballance 0 siblings, 1 reply; 49+ messages in thread From: Nicolas Pitre @ 2009-01-07 1:25 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML [-- Attachment #1: Type: TEXT/PLAIN, Size: 2646 bytes --] On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote: > > For fixing a corrupted repository by using backup copies of individual > > files, allow write_sha1_file() to write loose files even if the object > > already exists in a pack file, but only if the existing entry is marked > > as corrupted. > > I figured I'd reply to this again, since the issue cropped up again. > > We started experiencing *large* numbers of corruptions like the ones > that started the thread (one developer was receiving them once or twice > a day) with v1.6.0.4 > > We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch > (below) and the issues /seem/ to have resolved themselves. I'm not > certain whether Jan's patch was really responsible, or if there was > another issue that caused this to correct itself in v1.6.1. > > As it stands, I think it's safe to assume that given the frequency of > the occurances that they were not tied to a memory or disk error (or > other levels of the machine's stack would be suffering as well). The > only thing I can think of is that /some/ developers who've experienced > the issue are using Samba mount points and changing files in Mac OS X, > but using Git on the mounted share (i.e. TextMate changes a file hosted > on Samba, changes are committed in an SSH session on that machine), but > that doesn't account for everything. > > If there was something else included in the v1.6.1 release please let me > know so I can back Jan's patch out. Please back it out. As it stands, that patch is a no op because of the way git is used, and even if the patch was to work as intended, its purpose is not to magically fix corruptions without special action from your part. If you have corruption problems coming back only because of the removal of this patch then something is really really fishy and I would really like to know about it. There were indeed many changes between v1.6.0.4 and v1.6.1: the exact number is 1029. A couple of them are especially addressing increased robustness against some kind of pack corruptions. But in any case you still should see error messages appearing about them. And don't underestimate the power of disk corruptions. I started to work on git corruption resilience simply because I ended up with a corrupted pack at some point. Then a while later I got another corrupted pack. Then another while later I lost my filesystem entirely and had to reinstall my system (after buying a new disk). Turns out that my old disk is silently corrupting data without signaling any errors to the host. Nicolas ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 1:25 ` Nicolas Pitre @ 2009-01-07 1:39 ` R. Tyler Ballance 2009-01-07 2:09 ` Nicolas Pitre 2009-01-07 4:54 ` Linus Torvalds 0 siblings, 2 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 1:39 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 3010 bytes --] On Tue, 2009-01-06 at 20:25 -0500, Nicolas Pitre wrote: > On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote: > > > For fixing a corrupted repository by using backup copies of individual > > > files, allow write_sha1_file() to write loose files even if the object > > > already exists in a pack file, but only if the existing entry is marked > > > as corrupted. > > > > I figured I'd reply to this again, since the issue cropped up again. > > > > We started experiencing *large* numbers of corruptions like the ones > > that started the thread (one developer was receiving them once or twice > > a day) with v1.6.0.4 > > > > We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch > > (below) and the issues /seem/ to have resolved themselves. I'm not > > certain whether Jan's patch was really responsible, or if there was > > another issue that caused this to correct itself in v1.6.1. I'll back the patch out and redeploy, it's worth mentioning that a coworker of mine just got the issue as well (on 1.6.1). He was able to `git pull` and the error went away, but I doubt that it "magically fixed itself" > Please back it out. As it stands, that patch is a no op because of the > way git is used, and even if the patch was to work as intended, its > purpose is not to magically fix corruptions without special action from > your part. If you have corruption problems coming back only because of > the removal of this patch then something is really really fishy and I > would really like to know about it. > > There were indeed many changes between v1.6.0.4 and v1.6.1: the exact > number is 1029. A couple of them are especially addressing increased > robustness against some kind of pack corruptions. But in any case you > still should see error messages appearing about them. > > And don't underestimate the power of disk corruptions. I started to > work on git corruption resilience simply because I ended up with a > corrupted pack at some point. Then a while later I got another > corrupted pack. Then another while later I lost my filesystem entirely > and had to reinstall my system (after buying a new disk). Turns out > that my old disk is silently corrupting data without signaling any > errors to the host. I highly doubt this, I've got the issue appearing on at least 7 different development boxes (not workstations, 2U quad-core ECC RAM, etc machines), while that doesn't mean that they all don't have issues, the probability of them *all* having disk issues, and it somehow only manifesting itself with Git usage, is low ;) I've tarred one of the repositories that had it in a reproducible state so I can create a build and extract the tar and run against that to verify any patches anybody might have, but unfortunately at 7GB of company code and assets, I can't exactly share ;) Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 1:39 ` R. Tyler Ballance @ 2009-01-07 2:09 ` Nicolas Pitre 2009-01-07 2:47 ` R. Tyler Ballance 2009-01-07 4:54 ` Linus Torvalds 1 sibling, 1 reply; 49+ messages in thread From: Nicolas Pitre @ 2009-01-07 2:09 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML [-- Attachment #1: Type: TEXT/PLAIN, Size: 2165 bytes --] On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > On Tue, 2009-01-06 at 20:25 -0500, Nicolas Pitre wrote: > > On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > > > On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote: > > > > For fixing a corrupted repository by using backup copies of individual > > > > files, allow write_sha1_file() to write loose files even if the object > > > > already exists in a pack file, but only if the existing entry is marked > > > > as corrupted. > > > > > > I figured I'd reply to this again, since the issue cropped up again. > > > > > > We started experiencing *large* numbers of corruptions like the ones > > > that started the thread (one developer was receiving them once or twice > > > a day) with v1.6.0.4 > > > > > > We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch > > > (below) and the issues /seem/ to have resolved themselves. I'm not > > > certain whether Jan's patch was really responsible, or if there was > > > another issue that caused this to correct itself in v1.6.1. > > I'll back the patch out and redeploy, it's worth mentioning that a > coworker of mine just got the issue as well (on 1.6.1). He was able to > `git pull` and the error went away, but I doubt that it "magically fixed > itself" Please describe the "issue", ideally with transcripts of error messages, etc. Normally a simple pull operation should not provide any "fix" for corruptions. > I highly doubt this, I've got the issue appearing on at least 7 > different development boxes (not workstations, 2U quad-core ECC RAM, etc > machines), while that doesn't mean that they all don't have issues, the > probability of them *all* having disk issues, and it somehow only > manifesting itself with Git usage, is low ;) Agreed. > I've tarred one of the repositories that had it in a reproducible state That is wonderful. > so I can create a build and extract the tar and run against that to > verify any patches anybody might have, but unfortunately at 7GB of > company code and assets, I can't exactly share ;) First step is to understand what is going on. Only then could reliable patches be made. Nicolas ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 2:09 ` Nicolas Pitre @ 2009-01-07 2:47 ` R. Tyler Ballance 2009-01-07 3:21 ` Nicolas Pitre 0 siblings, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 2:47 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 751 bytes --] On Tue, 2009-01-06 at 21:09 -0500, Nicolas Pitre wrote: > > I've tarred one of the repositories that had it in a reproducible > state > > That is wonderful. > > > so I can create a build and extract the tar and run against that to > > verify any patches anybody might have, but unfortunately at 7GB of > > company code and assets, I can't exactly share ;) > > First step is to understand what is going on. Only then could reliable > patches be made. If you want to point me in the right direction, I have a few hours to kill this evening and fscking around with gdb(1) and printf() just might be some of my favorite things</sarcasm> ;) Looking forward to killing this issue Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 2:47 ` R. Tyler Ballance @ 2009-01-07 3:21 ` Nicolas Pitre 0 siblings, 0 replies; 49+ messages in thread From: Nicolas Pitre @ 2009-01-07 3:21 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > On Tue, 2009-01-06 at 21:09 -0500, Nicolas Pitre wrote: > > > I've tarred one of the repositories that had it in a reproducible > > state > > > > That is wonderful. > > > > > so I can create a build and extract the tar and run against that to > > > verify any patches anybody might have, but unfortunately at 7GB of > > > company code and assets, I can't exactly share ;) > > > > First step is to understand what is going on. Only then could reliable > > patches be made. > > If you want to point me in the right direction, I have a few hours to > kill this evening and fscking around with gdb(1) and printf() just might > be some of my favorite things</sarcasm> ;) Heh. ;-) To start with, a simple log of what you need to do to reproduce the issue would be nice. Just do script /tmp/foo then reproduce the issue and exit, after which I'd be interrested in the content of /tmp/foo. Nicolas ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 1:39 ` R. Tyler Ballance 2009-01-07 2:09 ` Nicolas Pitre @ 2009-01-07 4:54 ` Linus Torvalds 2009-01-07 7:41 ` R. Tyler Ballance 1 sibling, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-07 4:54 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > I'll back the patch out and redeploy, it's worth mentioning that a > coworker of mine just got the issue as well (on 1.6.1). He was able to > `git pull` and the error went away, but I doubt that it "magically fixed > itself" Quite frankly, that behaviour sounds like a disk _cache_ corruption issue. The fact that some corruption "comes and goes" and sometimes magically heals itself sounds very much like some disk cache problem, and then that particular part of the cache gets replaced and then when re-populated it is magically correct. We had that in one case with a Linux NFS client, where a rename across directories caused problems. This was a networked filesystem on OS X, right? File caching is much more "interesting" in networked filesystems than it is in normal private on-disk ones. > I've tarred one of the repositories that had it in a reproducible state > so I can create a build and extract the tar and run against that to > verify any patches anybody might have, but unfortunately at 7GB of > company code and assets, I can't exactly share ;) The thing to do is - untar it on some trusted machine with a local disk and a known-good filesystem. IOW, not that networked samba share. - verify that it really does happen on that machine, with that untarred image. Because maybe it doesn't. The hope is that you caught the corruption in the cache, and it actually got written out to the tar-file. But if it _is_ a disk cache (well, network cache) issue, maybe the IO required to tar everything up was enough to flush it, and the tar-file actually _works_ because it got repopulated correctly. So that's why you should double-check that it really ends up being corrupt after being untarred again. - go back and test the original git repo on the network share, preferably on another client. See if the error has gone away. - If so, try to compare that known-corrupt filesystem with the original one: and preferably do this on another machine over the network mount. See if they differ. They obviously should *not* differ, since it's an tar/untar of the same files, but ... The fact that you seem to get a _lot_ of these errors really does make it sound like something in your environment. It's actually really hard to get git to corrupt anything. Especially objects that got packed. They've been quiescent for a long time, they got repacked in a very simple way, they are totally read-only. But it is _not_ hard to corrupt network filesystems. It's downright trivial with some of them, especially with some hardware (eg there's no end-to-end checksumming except for the _extremely_ weak 16-bit IP csum, and even that has been known to be disabled, or screwed up by ethernet cards that do IP packet offloading and thus computing the csum not on the data that tee user actually wrote, but the data that the card received, which is not necessarily at all the same thing). And while ethernet uses a stronger CRC, that one is not end-to-end, so corruption on the card or in a switch in between easily defeats that too. Just google for something like "OS X" SMB "file corruption" and you'll find quite a bit of hits. Not all that unusual. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 4:54 ` Linus Torvalds @ 2009-01-07 7:41 ` R. Tyler Ballance 2009-01-07 8:16 ` Junio C Hamano ` (3 more replies) 0 siblings, 4 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 7:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 5231 bytes --] On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote: > > On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > > I'll back the patch out and redeploy, it's worth mentioning that a > > coworker of mine just got the issue as well (on 1.6.1). He was able to > > `git pull` and the error went away, but I doubt that it "magically fixed > > itself" > > Quite frankly, that behaviour sounds like a disk _cache_ corruption issue. > The fact that some corruption "comes and goes" and sometimes magically > heals itself sounds very much like some disk cache problem, and then that > particular part of the cache gets replaced and then when re-populated it > is magically correct. > > We had that in one case with a Linux NFS client, where a rename across > directories caused problems. > > This was a networked filesystem on OS X, right? File caching is much more > "interesting" in networked filesystems than it is in normal private > on-disk ones. Not quite, what I meant was that some users (not all) who've experienced this issue are using Samba to copy files over directly into the Git repository. I was mentioning this in case somewhere between Finder, Samba, ext3 and Git, some file system change events were pissing Git off and causing it. I don't think this is the case as the coworker that I mentioned earlier doesn't use Samba and neither do I (we both experience the issue today, mine disappeared by upgrading to 1.6.1, his by `git pull`). > > > I've tarred one of the repositories that had it in a reproducible state > > so I can create a build and extract the tar and run against that to > > verify any patches anybody might have, but unfortunately at 7GB of > > company code and assets, I can't exactly share ;) > > The thing to do is > > - untar it on some trusted machine with a local disk and a known-good > filesystem. > > IOW, not that networked samba share. > > - verify that it really does happen on that machine, with that untarred > image. Because maybe it doesn't. Unfortunately it doesn't, what I did notice was this when I did a `git status` in the directory right after untarring: tyler@grapefruit:~/jburgess_main> git status # # ---impressive amount of file names fly by--- # ----snip--- # # Untracked files: # (use "git add <file>..." to include in what will be committed) # # artwork/ # bt/ # flash/ tyler@grapefruit:~/jburgess_main> Basically, somehow Git thinks that *every* file in the repository is deleted at this point. I went ahead and performed a `git reset --hard` to see if the issue would manifest itself thereafter, but it did not. I did try to do a git-fsck(1), and this is what I got: tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full [1] 19381 segmentation fault /usr/local/bin/git fsck --full tyler@grapefruit:~/jburgess_main> > > The hope is that you caught the corruption in the cache, and it > actually got written out to the tar-file. But if it _is_ a disk cache > (well, network cache) issue, maybe the IO required to tar everything up > was enough to flush it, and the tar-file actually _works_ because it > got repopulated correctly. When I was working through this with Jan, one of the things that we did was move the actual object file in .git/objects, they existed so maybe I could look into those to check? > > So that's why you should double-check that it really ends up being > corrupt after being untarred again. > > - go back and test the original git repo on the network share, preferably > on another client. See if the error has gone away. Unfortunately the repository is being used by the original developer I tarred from with our 1.6.1 build, he hasn't reported any issues, but I can't exactly steal it back (that's why I made the tar) > The fact that you seem to get a _lot_ of these errors really does make > it > sound like something in your environment. It's actually really hard to get > git to corrupt anything. Especially objects that got packed. They've been > quiescent for a long time, they got repacked in a very simple way, they > are totally read-only. I checked with our operations team, and contrary to my suspicion (your NFS comment piqued my curiosity), these disks that are actually on the machines are not NFS mounts but rather local disk arrays. --> is it NFSd? or all local storage <== all local <== df -h <== mount <== /dev/sda5 705G 247G 423G 37% /nail --> hm, there goes that theory <== git corruption? --> yeah, looking into it <== sucks --> Linus had a theory about NFS/etc corruption of the disk cache <== when the company folds we can all blame you... and your silly git games <== (think positive, joel) --> thanks ;) Any thing else I can do to help debug this? :-/ Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 7:41 ` R. Tyler Ballance @ 2009-01-07 8:16 ` Junio C Hamano 2009-01-07 8:32 ` R. Tyler Ballance 2009-01-07 9:05 ` R. Tyler Ballance ` (2 subsequent siblings) 3 siblings, 1 reply; 49+ messages in thread From: Junio C Hamano @ 2009-01-07 8:16 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML "R. Tyler Ballance" <tyler@slide.com> writes: > Unfortunately it doesn't, what I did notice was this when I did a `git > status` in the directory right after untarring: > tyler@grapefruit:~/jburgess_main> git status > # > # ---impressive amount of file names fly by--- > # ----snip--- > ... > Basically, somehow Git thinks that *every* file in the repository is > deleted at this point. That makes me suspect that your .git/index file is corrupt. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 8:16 ` Junio C Hamano @ 2009-01-07 8:32 ` R. Tyler Ballance 2009-01-07 9:42 ` Junio C Hamano 0 siblings, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 8:32 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 823 bytes --] On Wed, 2009-01-07 at 00:16 -0800, Junio C Hamano wrote: > "R. Tyler Ballance" <tyler@slide.com> writes: > > > Unfortunately it doesn't, what I did notice was this when I did a `git > > status` in the directory right after untarring: > > tyler@grapefruit:~/jburgess_main> git status > > # > > # ---impressive amount of file names fly by--- > > # ----snip--- > > ... > > Basically, somehow Git thinks that *every* file in the repository is > > deleted at this point. > > That makes me suspect that your .git/index file is corrupt. Would this be tied to the corrupted pack file issue, or separate. Either way, how could I verify your assumptions? (i'll be lurking in #git for a while if you want to interactively help ;)) Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 8:32 ` R. Tyler Ballance @ 2009-01-07 9:42 ` Junio C Hamano 0 siblings, 0 replies; 49+ messages in thread From: Junio C Hamano @ 2009-01-07 9:42 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML "R. Tyler Ballance" <tyler@slide.com> writes: > On Wed, 2009-01-07 at 00:16 -0800, Junio C Hamano wrote: >> "R. Tyler Ballance" <tyler@slide.com> writes: >> >> > Unfortunately it doesn't, what I did notice was this when I did a `git >> > status` in the directory right after untarring: >> > tyler@grapefruit:~/jburgess_main> git status >> > # >> > # ---impressive amount of file names fly by--- >> > # ----snip--- >> > ... >> > Basically, somehow Git thinks that *every* file in the repository is >> > deleted at this point. >> >> That makes me suspect that your .git/index file is corrupt. > > Would this be tied to the corrupted pack file issue, or separate. If you have perfectly good set of packs, if your index is corrupt you may see "everything deleted", so in that sense it is independent. As Linus's earlier conjecture was that this is related to some sort of disk/cache corruption, I wouldn't be surprised if such a failure hit packs and the index file indiscriminatingly. So in that sense they are related. I think "git ls-files" (before doing anything else, such as resetting, of course) would report that the index is corrupt, if that is indeed the case. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 7:41 ` R. Tyler Ballance 2009-01-07 8:16 ` Junio C Hamano @ 2009-01-07 9:05 ` R. Tyler Ballance 2009-01-07 15:31 ` Nicolas Pitre 2009-01-07 16:07 ` Linus Torvalds 3 siblings, 0 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 9:05 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 2154 bytes --] On Tue, 2009-01-06 at 23:41 -0800, R. Tyler Ballance wrote: > I did try to do a git-fsck(1), and this is what I got: > tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full > [1] 19381 segmentation fault /usr/local/bin/git fsck --full > tyler@grapefruit:~/jburgess_main> > > Disregard this comment, Jan reminded me in IRC of the issue I had with this repository and the stack-size ulimit. A healthy 32M stack-size allows for the command to complete: tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full dangling blob 6e58a06cd0f027fce1fc8a923a8c81d6b55f1705 dangling blob 5e87e049f69ee06af1f4f92a3d4ddcd912f8535e dangling blob acb8e0937cea3f4068c5c67f60d3b97952654fa1 dangling blob 8bd540d657027ab12228e0522de86a97d3f8a7a9 dangling blob 90d93096369df4b2151df3e289952297c5f390dd dangling blob aaa4a1bf9e6d39991503b7908ff71605d6632fef dangling blob b003e20f06de7d0a7e11a1047fbb39e2deb84899 dangling blob bf207363786f08e888a725cfb8b2fe4bea11ceab dangling blob 642493a1bcf09f74551f0ce5b4c3ccc23acaf3c9 dangling blob ff80b45e48795d4544d0a5b2f18714123ab21746 dangling blob e3c0e67e23ec588acab733e2069fb09fa38a235d dangling blob 62efa65bcbb6d94f26be9b91dc98d7a7f6fbf602 dangling blob a6a2c717c230fe26263b9ce002f3f82fdab6f727 dangling blob d32988875aa90ef728dc1af6b71c130f2c8e8b94 dangling blob c8aad9fc37a27872bae18af78b1623bfb8f9a9f7 dangling blob 4cdbe97301e333d89037dc9f4a8440dab9e62049 dangling blob 2819bbcda3b0efe828709f5b22624712cb9ebdae dangling blob b156cba3ee89e1506d02fbb845e8dc0889ff4090 dangling blob ed658b8beca69ebf6e25ab1ed6217881e27d4e9a dangling blob d581cb2fb1cf2b4d15f8f9b90b08a3ebc619414f dangling blob a85c2e55d27c9fb1f5874f8bd81c65e597327b4f dangling blob 548f3e70c06f9306d71f6b8d11ed24ade581fa31 dangling blob 7a1c8fa8bc59c4e8fb347426cc11fabf8b0d1639 tyler@grapefruit:~/jburgess_main> Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 7:41 ` R. Tyler Ballance 2009-01-07 8:16 ` Junio C Hamano 2009-01-07 9:05 ` R. Tyler Ballance @ 2009-01-07 15:31 ` Nicolas Pitre 2009-01-07 16:07 ` Linus Torvalds 3 siblings, 0 replies; 49+ messages in thread From: Nicolas Pitre @ 2009-01-07 15:31 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Linus Torvalds, Jan Krüger, Git ML On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote: > > > > On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > > > > I'll back the patch out and redeploy, it's worth mentioning that a > > > coworker of mine just got the issue as well (on 1.6.1). He was able to > > > `git pull` and the error went away, but I doubt that it "magically fixed > > > itself" > > > > Quite frankly, that behaviour sounds like a disk _cache_ corruption issue. > > The fact that some corruption "comes and goes" and sometimes magically > > heals itself sounds very much like some disk cache problem, and then that > > particular part of the cache gets replaced and then when re-populated it > > is magically correct. > > > > We had that in one case with a Linux NFS client, where a rename across > > directories caused problems. > > > > This was a networked filesystem on OS X, right? File caching is much more > > "interesting" in networked filesystems than it is in normal private > > on-disk ones. > > Not quite, what I meant was that some users (not all) who've experienced > this issue are using Samba to copy files over directly into the Git > repository. I was mentioning this in case somewhere between Finder, > Samba, ext3 and Git, some file system change events were pissing Git off > and causing it. As long as those files are not within the .git directory that should be fine. > I don't think this is the case as the coworker that I > mentioned earlier doesn't use Samba and neither do I (we both experience > the issue today, mine disappeared by upgrading to 1.6.1, his by `git > pull`). Problem is that none of that "makes sense". If a real git corruption was there, it wouldn't disappear without explicit action from your part. What git v1.6.1 is able to do over earlier versions is to still function properly if some corrupted objects have a redundant copy in the same repository, but it wouldn't stop complaining about the existence of corrupted data. Doing a 'git gc' or 'git repack -a' might get rid of the corruption. And a 'git pull' might hide it if the received pack during the pull operation happens to contain another copy of the object that was corrupted before, but it wouldn't prevent 'git fsck --full' from seeing it. The fact that you cannot reproduce the corruption issues after unarchiving a repository that was known to have problems right before it was archived is really really strange. That does not rule out git bugs of course, but at least this shows that no actual corruption on disk was initially involved. Again, I'd suggest you perform your git usage within a script session so to capture the exact operation performed and error messages produced when/if similar problems do occur again. Otherwise we're only running after our tail. Nicolas ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 7:41 ` R. Tyler Ballance ` (2 preceding siblings ...) 2009-01-07 15:31 ` Nicolas Pitre @ 2009-01-07 16:07 ` Linus Torvalds 2009-01-07 16:08 ` Linus Torvalds 2009-01-07 22:55 ` R. Tyler Ballance 3 siblings, 2 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-07 16:07 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Tue, 6 Jan 2009, R. Tyler Ballance wrote: > > > > The thing to do is > > > > - untar it on some trusted machine with a local disk and a known-good > > filesystem. > > > > IOW, not that networked samba share. > > > > - verify that it really does happen on that machine, with that untarred > > image. Because maybe it doesn't. > > Unfortunately it doesn't Well, that's not necessarily "unfortunate". It does actually end up showing that the objects themselves were apparently never really corrupt. So there is no fundamental data structure corrupttion - because when you copy the repository, it's all good agin! In other words, that's not a worthless piece of information at all, and it does tell us a lot, namely that the corruption was never real long-term data corruption of the git object archive, but something local and temporary. Again, we're really back to either: - it could be some _temporary_ git corruption caused internally inside a git process - ie a wild pointer, or perhaps a race condition (but we don't really use threading in 1.6.0.4 unless you ask for it, and even then just for pack-file generation) - or it's the disk cache corruption, and the tar/untar ended up flushing it. And quite frankly, since the corruption seems to be site-specific, I really do suspect the second case. Although it's possible, of course, that it could be some compiler issue that makes _your_ binaries have issues even when nobody else sees it. > what I did notice was this when I did a `git > status` in the directory right after untarring: > tyler@grapefruit:~/jburgess_main> git status > # > # ---impressive amount of file names fly by--- > # ----snip--- > # > # Untracked files: > # (use "git add <file>..." to include in what will be > committed) > # > # artwork/ > # bt/ > # flash/ Hmm. That's actually _normal_ under some circumstances. At least with older git versions, or if your .git/index file couldn't be rewritten for some reason - your existing index file contains all the old stat information, and if git cannot (or, in the case of older git version, just will not) refresh it automatically, it will show all the files as changed, even if it's just the inode number that really changed. A _normal_ git install should have auto-refreshed the index, though. Unless the tar archive only contained the ".git" directory, and not the checkout? > tyler@grapefruit:~/jburgess_main> > > Basically, somehow Git thinks that *every* file in the repository is > deleted at this point. I went ahead and performed a `git reset --hard` > to see if the issue would manifest itself thereafter, but it did not. That would be what I'd expect if you had only tarred up .git, although then I wouldn't have expected those "Untracked files:". Hmm. Without being able to look at the archive, I'm just guessing randomly. > I did try to do a git-fsck(1), and this is what I got: > tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full > [1] 19381 segmentation fault /usr/local/bin/git fsck --full > tyler@grapefruit:~/jburgess_main> .. and that's the unrelated fsck bug that got fixed later. > > The hope is that you caught the corruption in the cache, and it > > actually got written out to the tar-file. But if it _is_ a disk cache > > (well, network cache) issue, maybe the IO required to tar everything up > > was enough to flush it, and the tar-file actually _works_ because it > > got repopulated correctly. > > When I was working through this with Jan, one of the things that we did > was move the actual object file in .git/objects, they existed so maybe I > could look into those to check? Yes. If you have any bad loose objects, if you compare them to the good objects with the same name, that's going to be interesting information. The pattern of corruption can be very telling. For example, on Linux, a disk cache corruption would usually be at 4kB block boundaries, because that's the granularity of the cache. While a bit error would be obvious etc etc. > I checked with our operations team, and contrary to my suspicion (your > NFS comment piqued my curiosity), these disks that are actually on the > machines are not NFS mounts but rather local disk arrays. Ok, that generally makes caching much simpler. What filesystem? Is there anything else that you do that is site-specific and/or slightly different? For example, a long time ago we had a bug related to CRLF conversion which would cause a use-after-free problem, and that would corrupt the data internally to git. And dobody else saw it than this one person, and it was a total mystery to everybody until we realized that he used this one feature that nobody else was using. So as you're on OS X, I assume you don't have CRLF conversion, but maybe you use some other feature that we support but nobody really actually uses. Like keyword expansion or something? Oh - that would also explain why you got all those entries in "git status" that went away when you did a "git reset --hard": if you had some keyword expansion (or CRLF) enabled in the original users "~/.gitconfig", that checkout would have had expansion/CRLF/whatever conversion, but then when you tarred/untarred it on another setup, the expansion would be seen as a difference because it wasn't enabled. Hmm? Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 16:07 ` Linus Torvalds @ 2009-01-07 16:08 ` Linus Torvalds 2009-01-07 22:55 ` R. Tyler Ballance 1 sibling, 0 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-07 16:08 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Wed, 7 Jan 2009, Linus Torvalds wrote: > > And dobody else saw it than this one person, and it was a total mystery to > everybody until we realized that he used this one feature that nobody else > was using. So as you're on OS X, I assume you don't have CRLF conversion, > but maybe you use some other feature that we support but nobody really > actually uses. Like keyword expansion or something? > > Oh - that would also explain why you got all those entries in "git status" > that went away when you did a "git reset --hard": if you had some keyword > expansion (or CRLF) enabled in the original users "~/.gitconfig", that > checkout would have had expansion/CRLF/whatever conversion, but then when > you tarred/untarred it on another setup, the expansion would be seen as a > difference because it wasn't enabled. Btw, if you untar it again, and just do a "git diff", that should show any such effects. Rather than showing just that something changed, it should show _how_ it changed. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 16:07 ` Linus Torvalds 2009-01-07 16:08 ` Linus Torvalds @ 2009-01-07 22:55 ` R. Tyler Ballance 2009-01-07 23:29 ` Linus Torvalds 1 sibling, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-07 22:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 4576 bytes --] On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote: > Well, that's not necessarily "unfortunate". It does actually end up > showing that the objects themselves were apparently never really corrupt. > > So there is no fundamental data structure corrupttion - because when you > copy the repository, it's all good agin! > - it could be some _temporary_ git corruption caused internally inside a > git process - ie a wild pointer, or perhaps a race condition (but we > don't really use threading in 1.6.0.4 unless you ask for it, and even > then just for pack-file generation) I have a feeling it's something like this, one of our operations guys did some research while I was looking at code and he came across this: On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote: git-merge is using too much RAM, and failing to malloc() but NOT > reporting it. This is all sorts of bad: > > A) using an unscalable amount of RAM > B) failing to detect malloc() failure > C) reporting file corruption instead > I was able to reproduce this. > > limit ~1.5GB -> corrupt file > limit ~3GB -> magically no longer corrupt. > > The false fail may be limited to git-merge, but git status also > allocates the same amount of RAM. > > To temporarily work around this problem, issue this once you log in to > a dev box: > > tcsh: > limit vmemoryuse 3000000 > bash: > ulimit -v 3000000 > > Be gentle. > And quite frankly, since the corruption seems to be site-specific, I > really do suspect the second case. Although it's possible, of course, that > it could be some compiler issue that makes _your_ binaries have issues > even when nobody else sees it. I think you're correct insofar that our major site-specific alteration has come up on the mailing list before (okay maybe two site-specific things). * Our Git repo is ~7.1GB * ulimit -v is set to ~1.5G I think I know how this could be failing and corrupting things (assuming it's malloc(2)) related. What I'm thinking is that in xmalloc() or one of the other x*)_ functions, the malloc(size) is failing because of the ulimits, and then the potentially somewhere it's silently failing or maybe even accidentally returning one of those "malloc(1)" pointers? I've got two new tarred repositories from two developers the issue happened to today, so I'm flush full of sample repositories to try stuff on :) > > Hmm. That's actually _normal_ under some circumstances. At least with > older git versions, or if your .git/index file couldn't be rewritten for > some reason - your existing index file contains all the old stat > information, and if git cannot (or, in the case of older git version, just > will not) refresh it automatically, it will show all the files as changed, > even if it's just the inode number that really changed. > > A _normal_ git install should have auto-refreshed the index, though. > Unless the tar archive only contained the ".git" directory, and not the > checkout? I believe the issues I noticed when untarring the repo were a red herring, I did the `git diff` after untarring and I noticed that only a certain set of files where changed, I'm willing to go so far as to guess that they were the files affected in the corrupted packs. Of the 32k files in our repository, 98 were actually different after untarring (according to git-diff(1)) > And dobody else saw it than this one person, and it was a total mystery to > everybody until we realized that he used this one feature that nobody else > was using. So as you're on OS X, I assume you don't have CRLF conversion, > but maybe you use some other feature that we support but nobody really > actually uses. Like keyword expansion or something? The two new folks this happened to today had nothing "special" about them other than the ulimit. I've got the script(1) output of performing git-ls-files(1) and some other commands that I tried, nothing they output was particular informative or interesting, and I don't think it will help if this really is a memory related issue, that said I'd be more than happy to send it to a couple of you (Junio, Linus, Nico). I'm *so* ready for this bug to die >=\ Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 22:55 ` R. Tyler Ballance @ 2009-01-07 23:29 ` Linus Torvalds 2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance 2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds 0 siblings, 2 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-07 23:29 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > > git process - ie a wild pointer, or perhaps a race condition (but we > > don't really use threading in 1.6.0.4 unless you ask for it, and even > > then just for pack-file generation) > > I have a feeling it's something like this, one of our operations guys > did some research while I was looking at code and he came across this: > > On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote: > git-merge is using too much RAM, and failing to malloc() but > NOT > > reporting it. This is all sorts of bad: > > > > A) using an unscalable amount of RAM > > B) failing to detect malloc() failure > > C) reporting file corruption instead Well, I dont' think that's exactly it. git internally doesn't really use malloc at all, and uses xmalloc() instead which will die() if the malloc fails. So there's almost certainly no "failing to detect failures" Yes, there's a few places that don't use the wrapper, but they should be safe (eg either they SIGSEGV, or they are like create_delta_index() and just create a sub-optimal pack with a warning). HOWEVER: > > I was able to reproduce this. > > > > limit ~1.5GB -> corrupt file > > limit ~3GB -> magically no longer corrupt. That is interesting, although I also worry that there might be other issues going on (ie since you've reported thigns magically fixing themselves, maybe the ulimit tests just _happened_ to show that, even if it wasn't the core reason). BUT! This is definitely worth looking at. For example, we do have some cases where we try to do "mmap()", and if it fails, we try to free some memory and try again. In particular, in xmmap(), if an mmap() fails - which may be due to running out of virtual address space - we'll actually try to release some pack-file memory and try again. Maybe there's a bug there - and it would be one that seldom triggers for others. > I think you're correct insofar that our major site-specific alteration > has come up on the mailing list before (okay maybe two site-specific > things). > * Our Git repo is ~7.1GB > * ulimit -v is set to ~1.5G It is certainly possible. It's too bad that it's private, because it makes it _much_ harder to try to pinpoint this. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 23:29 ` Linus Torvalds @ 2009-01-08 0:28 ` R. Tyler Ballance 2009-01-08 0:48 ` Linus Torvalds 2009-01-08 2:52 ` Linus Torvalds 2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds 1 sibling, 2 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 0:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb [-- Attachment #1: Type: text/plain, Size: 2370 bytes --] On Wed, 2009-01-07 at 15:29 -0800, Linus Torvalds wrote: > It is certainly possible. It's too bad that it's private, because it makes > it _much_ harder to try to pinpoint this. My most esteemed colleague (Ken aka kb) who pointed out the memory issue was on the right path (I think), and I have a reproduction case you can try with your very own Linux kernel tree! WOO! I set ulimit -v really low (150M), and the operations I made got an mmap(2) fatal error, but there is a sweet spot that I found, see the transcript below. I basically chose an arbitrary revision from a couple of weeks ago, and rolled the repository back to that point, then I tried with iterations of ulimit -v 150, 250, 450, and then back down to 350. tyler@grapefruit:~/source/git/linux-2.6> limit cputime unlimited filesize unlimited datasize unlimited stacksize 8MB coredumpsize 0kB memoryuse 2561MB maxproc 24564 descriptors 1024 memorylocked 64kB addressspace unlimited maxfilelocks unlimited sigpending 24564 msgqueue 819200 nice 0 rt_priority 0 tyler@grapefruit:~/source/git/linux-2.6> export START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666 tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START HEAD is now at 56d18e9 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 * 1024" | bc -l` tyler@grapefruit:~/source/git/linux-2.6> git pull error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923 from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted tyler@grapefruit:~/source/git/linux-2.6> I've tried this a couple of times, and it does seem to be reproducible, let me know if you have any issues reproducing it locally and I'll try to dig into it more with valgrind or something a bit more pin-pointing than "ulimit -v && try, try again" Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance @ 2009-01-08 0:48 ` Linus Torvalds 2009-01-08 0:57 ` R. Tyler Ballance 2009-01-08 2:21 ` James Pickens 2009-01-08 2:52 ` Linus Torvalds 1 sibling, 2 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 0:48 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > My most esteemed colleague (Ken aka kb) who pointed out the memory issue > was on the right path (I think), and I have a reproduction case you can > try with your very own Linux kernel tree! > > WOO! > > I set ulimit -v really low (150M), and the operations I made got an > mmap(2) fatal error, but there is a sweet spot that I found, see the > transcript below. This is indeed the packfile mapping. The sweet spot you found depends on how big the biggest two pack-files are, I do believe. And if you do that [core] packedgitwindowsize = 64M I think you'll find that it works. Of course, with a _really_ low ulimit, you'd need to make it even smaller, but at some point you start hitting other problems than the pack-file limits, ie just the simple fact that git wants and expects you to have a certain amount of memory available ;) Can you cnfirm that your "reproducible" case starts working with that addition to your ~/.gitconfig? If so, the solution is pretty simple: we should just lower the default pack windowsize. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:48 ` Linus Torvalds @ 2009-01-08 0:57 ` R. Tyler Ballance 2009-01-08 1:08 ` Linus Torvalds 2009-01-08 2:21 ` James Pickens 1 sibling, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 0:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb [-- Attachment #1: Type: text/plain, Size: 1584 bytes --] On Wed, 2009-01-07 at 16:48 -0800, Linus Torvalds wrote: > > On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > > > My most esteemed colleague (Ken aka kb) who pointed out the memory issue > > was on the right path (I think), and I have a reproduction case you can > > try with your very own Linux kernel tree! > > > > WOO! > > > > I set ulimit -v really low (150M), and the operations I made got an > > mmap(2) fatal error, but there is a sweet spot that I found, see the > > transcript below. > > This is indeed the packfile mapping. The sweet spot you found depends on > how big the biggest two pack-files are, I do believe. > > And if you do that > > [core] > packedgitwindowsize = 64M > > I think you'll find that it works. Of course, with a _really_ low ulimit, > you'd need to make it even smaller, but at some point you start hitting > other problems than the pack-file limits, ie just the simple fact that git > wants and expects you to have a certain amount of memory available ;) > > Can you cnfirm that your "reproducible" case starts working with that > addition to your ~/.gitconfig? If so, the solution is pretty simple: we > should just lower the default pack windowsize. This certainly corrected the issue, is there some magic packedgitwindowsize that i should be looking at my own repository (our internal one) in order to prevent the issue from occurring? Looking into .git/objects/pack, I think the two biggest pack files are 3.5G and 177MBG respectively :-! Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:57 ` R. Tyler Ballance @ 2009-01-08 1:08 ` Linus Torvalds 2009-01-08 1:29 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 1:08 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > > > Can you cnfirm that your "reproducible" case starts working with that > > addition to your ~/.gitconfig? If so, the solution is pretty simple: we > > should just lower the default pack windowsize. > > This certainly corrected the issue, is there some magic > packedgitwindowsize that i should be looking at my own repository (our > internal one) in order to prevent the issue from occurring? > > Looking into .git/objects/pack, I think the two biggest pack files are > 3.5G and 177MBG respectively :-! So there's a few rules to packedgitwindowsize: - we need to be able to have at least two windows open at a time, in addition to all the "normal" memory git needs just for objects, of course. And quite frankly, you'd be better off with a few more windows, even if that obviously implies smaller windows. - the window size really wants to be a round power-of-two number, and at _least_ it wants to be a nice multiple of the 2*page size. So if you have a virtual memory limit of 1.5GB, I'd hesitate to make the pack window size less than 512M, and 256M is probably better. That way, I'd expect you to be able to always have at least four windows open (assuming a reasonably generous half a gigabyte for "other stuff"). And quite frankly, there's not a huge downside to making them smaller. At "just" 32MB, you'll still fit plenty of data in one pack window, and while it will cost you a few mmap/unmap's to switch windows around, most operations simply will not likely ever notice. At least not under Linux, where mmap/munmap is pretty cheap. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 1:08 ` Linus Torvalds @ 2009-01-08 1:29 ` Linus Torvalds 2009-01-08 1:46 ` Shawn O. Pearce 0 siblings, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 1:29 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, Linus Torvalds wrote: > > So there's a few rules to packedgitwindowsize: > > - we need to be able to have at least two windows open at a time, in > addition to all the "normal" memory git needs just for objects, of > course. And quite frankly, you'd be better off with a few more windows, > even if that obviously implies smaller windows. Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe there are cases where we want more than two windows alive. And maybe there aren't even that, and we can always make do with just one. So I will _not_ guarantee that "at least two pack windows" is necessarily the right answer. The windowing code was mostly other people doing it. I think Shawn and Nico. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 1:29 ` Linus Torvalds @ 2009-01-08 1:46 ` Shawn O. Pearce 0 siblings, 0 replies; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 1:46 UTC (permalink / raw) To: Linus Torvalds Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, 7 Jan 2009, Linus Torvalds wrote: > > > > So there's a few rules to packedgitwindowsize: > > > > - we need to be able to have at least two windows open at a time, in > > addition to all the "normal" memory git needs just for objects, of > > course. And quite frankly, you'd be better off with a few more windows, > > even if that obviously implies smaller windows. > > Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe > there are cases where we want more than two windows alive. And maybe there > aren't even that, and we can always make do with just one. > > So I will _not_ guarantee that "at least two pack windows" is necessarily > the right answer. The windowing code was mostly other people doing it. I > think Shawn and Nico. I was fairly certain we needed at least two windows open at once, but reviewing the code in sha1_file.c I don't see a reason for that restriction anymore. I think it used to have to do with the delta reconstruction; to unpack a delta we would read the delta header from one window, but we may need base data from another. The delta unpack code kept the delta window pinned in use, so we couldn't replace it to access base data from elsewhere, hence we needed two windows. Thinking about it now I don't recall how we handled the recusion on a delta chain longer than 2. ;-) But looking at the code we have long since refactored it so this isn't an issue anymore. We release the window between reading the delta header and reading the base, so the delta window can be replaced if necessary. I think the "2 window minimum" is just a performance suggestion, not a requirement. -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:48 ` Linus Torvalds 2009-01-08 0:57 ` R. Tyler Ballance @ 2009-01-08 2:21 ` James Pickens 2009-01-08 2:43 ` Shawn O. Pearce 2009-01-08 2:52 ` Boyd Stephen Smith Jr. 1 sibling, 2 replies; 49+ messages in thread From: James Pickens @ 2009-01-08 2:21 UTC (permalink / raw) To: Git ML Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb, Linus Torvalds On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote: > Can you cnfirm that your "reproducible" case starts working with that > addition to your ~/.gitconfig? If so, the solution is pretty simple: we > should just lower the default pack windowsize. Umm... isn't that more of a workaround than a solution? I.e., if you lower the default pack windowsize, couldn't the corruption still happen under the right conditions? James ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 2:21 ` James Pickens @ 2009-01-08 2:43 ` Shawn O. Pearce 2009-01-08 5:40 ` Junio C Hamano 2009-01-08 2:52 ` Boyd Stephen Smith Jr. 1 sibling, 1 reply; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 2:43 UTC (permalink / raw) To: James Pickens Cc: Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb, Linus Torvalds James Pickens <jepicken@gmail.com> wrote: > On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Can you cnfirm that your "reproducible" case starts working with that > > addition to your ~/.gitconfig? If so, the solution is pretty simple: we > > should just lower the default pack windowsize. > > Umm... isn't that more of a workaround than a solution? I.e., if you lower > the default pack windowsize, couldn't the corruption still happen under the > right conditions? Uhm, yea. So I managed to reproduce it on a Linux system here. Different object ids than R. Tyler's case, but I'm going to try to debug it and see why we are getting these. For those following along at home, Linus' 2.6 tree: $ ulimit -v `echo '150 * 1024'|bc -l` $ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666 HEAD is now at 56d18e9... Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus $ git merge 9e42d0cf5020aaf217433cad1a224745241d212a Updating 56d18e9..9e42d0c error: failed to read delta base object ef135b90084f3c54fccea4e273aeff029db2d873 at offset 48342508 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack error: failed to read delta base object c4e828b71d96622bb258938d69aab9cec53d5cae at offset 128427683 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack error: failed to read object 3cd5a6463cfd9306095bf6312a9b7ab09d4f2f5d at offset 128427777 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack fatal: object 3cd5a6463cfd9306095bf6312a9b7ab09d4f2f5d is corrupted No, the repository is not corrupt. We f'd up our memory management somewhere. -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 2:43 ` Shawn O. Pearce @ 2009-01-08 5:40 ` Junio C Hamano 2009-01-08 6:04 ` Shawn O. Pearce 0 siblings, 1 reply; 49+ messages in thread From: Junio C Hamano @ 2009-01-08 5:40 UTC (permalink / raw) To: Shawn O. Pearce Cc: James Pickens, Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb, Linus Torvalds "Shawn O. Pearce" <spearce@spearce.org> writes: > For those following along at home, Linus' 2.6 tree: > > $ ulimit -v `echo '150 * 1024'|bc -l` > $ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666 Hmm, without any "wrap zlib to die on error" patch, this step already fails with: $ git checkout 56d18e9932ebf4e8eca42d2ce509450e6c9c1666 fatal: Out of memory? mmap failed: Cannot allocate memory I guess that is because our test repositories are packed differently. I'll retry after repacking.. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 5:40 ` Junio C Hamano @ 2009-01-08 6:04 ` Shawn O. Pearce 0 siblings, 0 replies; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 6:04 UTC (permalink / raw) To: Junio C Hamano Cc: James Pickens, Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb, Linus Torvalds Junio C Hamano <gitster@pobox.com> wrote: > "Shawn O. Pearce" <spearce@spearce.org> writes: > > For those following along at home, Linus' 2.6 tree: > > > > $ ulimit -v `echo '150 * 1024'|bc -l` > > $ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666 > > Hmm, without any "wrap zlib to die on error" patch, this step already > fails with: > > $ git checkout 56d18e9932ebf4e8eca42d2ce509450e6c9c1666 > fatal: Out of memory? mmap failed: Cannot allocate memory > > I guess that is because our test repositories are packed differently. > I'll retry after repacking.. Yup. I actually did something more like this to get the test repository: git clone git://android.git.kernel.org/kernel/common.git git fetch git://kernel.org/pub/.../torvalds/linux-2.6.git master The android kernel repository I had handy on my local system was quite a bit away from Linus' so I wound up with two different but sizable packs. I thought android was closer to upstream, but its apparently not. I started from there because it was local and I thought it would be a quick way to get a test environment, but sadly it didn't even have the base 56d18e we were talking about. -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 2:21 ` James Pickens 2009-01-08 2:43 ` Shawn O. Pearce @ 2009-01-08 2:52 ` Boyd Stephen Smith Jr. 1 sibling, 0 replies; 49+ messages in thread From: Boyd Stephen Smith Jr. @ 2009-01-08 2:52 UTC (permalink / raw) To: James Pickens Cc: Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 945 bytes --] On Wednesday 2009 January 07 20:21:18 James Pickens wrote: >On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote: >> Can you cnfirm that your "reproducible" case starts working with that >> addition to your ~/.gitconfig? If so, the solution is pretty simple: we >> should just lower the default pack windowsize. > >Umm... isn't that more of a workaround than a solution? I.e., if you lower >the default pack windowsize, couldn't the corruption still happen under the >right conditions? IMHO: I agree, somewhat. I'm fine with a "die()" message when there's not enough memory, but either corruption or just a spurious, but scary "<SHA> is corrupt" messages should be fixed. -- Boyd Stephen Smith Jr. ,= ,-_-. =. bss@iguanasuicide.net ((_/)o o(\_)) ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-' http://iguanasuicide.net/ \_/ [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance 2009-01-08 0:48 ` Linus Torvalds @ 2009-01-08 2:52 ` Linus Torvalds 2009-01-08 3:01 ` Shawn O. Pearce 1 sibling, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 2:52 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > tyler@grapefruit:~/source/git/linux-2.6> git pull > error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923 > from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack > fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted Btw, this is an interesting error message, mostly because of what is _not_ there. In particular, it doesn't report any reason _why_ it failed to read the object, which as far as I can tell can happen for only one reason: unpack_compressed_entry() returns NULL, and that path is the only thing that can do so without a message. And it only does it if zlib fails. Now, zlib can fail because the unpacking fails, but it can fail for other cases too. And the thing is, we don't check/report those kinds of failure cases very well. Which really bit us here, because if we had checked the return value of inflateInit(), we'd almost certainly would have gotten a nice big "you ran out of memory" thing, and we wouldn't have been chasing this down as a corruption issue. We probably should wrap all the "inflateInit()" calls, and do something like static void xinflateInit(z_streamp strm) { const char *err; switch (inflateInit(strm)) { case Z_OK: return; case Z_MEM_ERROR: err = "out of memory"; break; case Z_VERSION_ERROR: err = "wrong version"; break; default: err = "error"; } die("inflateInit: %s (%s)", err, strm->msg ? strm->msg : "no message"); } or similar. That way we'd get good error reports when we run out of memory, rather than consider it to be a corruption issue. We could also try to make a few of these wrappers actually release some of the memory (the way xmmap() does), but there are likely diminishing returns. And the much more important issue is the proper error reporting: if we had reported "out of memory", we'd not have spent so much time chasing disk corruption etc. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 2:52 ` Linus Torvalds @ 2009-01-08 3:01 ` Shawn O. Pearce 2009-01-08 3:06 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 3:01 UTC (permalink / raw) To: Linus Torvalds Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > > > tyler@grapefruit:~/source/git/linux-2.6> git pull > > error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923 > > from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack > > fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted > > Btw, this is an interesting error message, mostly because of what is > _not_ there. > > In particular, it doesn't report any reason _why_ it failed to read the > object, which as far as I can tell can happen for only one reason: > unpack_compressed_entry() returns NULL, and that path is the only thing > that can do so without a message. > > And it only does it if zlib fails. Ok, well, in this case I've been able to reproduce a zlib inflate failure on the base object in a 2 deep delta chain. We got back: #define Z_STREAM_ERROR (-2) this causes the buffer to be freed and NULL to come back out of unpack_compressed_entry(), and then everything is corrupt... -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 3:01 ` Shawn O. Pearce @ 2009-01-08 3:06 ` Linus Torvalds 2009-01-08 3:13 ` Shawn O. Pearce 0 siblings, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 3:06 UTC (permalink / raw) To: Shawn O. Pearce Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, Shawn O. Pearce wrote: > > Ok, well, in this case I've been able to reproduce a zlib inflate > failure on the base object in a 2 deep delta chain. We got back: > > #define Z_STREAM_ERROR (-2) > > this causes the buffer to be freed and NULL to come back out of > unpack_compressed_entry(), and then everything is corrupt... I bet you actually got an earlier error already from the inflateInit. The Z_STREAM_ERROR probably comes from inflate() itself - and could very easily be due to a allocation error in inflateInit leaving the stream data incomplete. Let me try wrapping that dang thing and send a patch. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 3:06 ` Linus Torvalds @ 2009-01-08 3:13 ` Shawn O. Pearce 2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce 0 siblings, 1 reply; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 3:13 UTC (permalink / raw) To: Linus Torvalds Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> wrote: > The Z_STREAM_ERROR probably comes from inflate() itself - and could very > easily be due to a allocation error in inflateInit leaving the stream data > incomplete. > > Let me try wrapping that dang thing and send a patch. Yup. I'm actually doing the same thing... -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 3:13 ` Shawn O. Pearce @ 2009-01-08 3:16 ` Shawn O. Pearce 2009-01-08 3:54 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 3:16 UTC (permalink / raw) To: Linus Torvalds, Junio C Hamano Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb If we are running low on virtual memory we should release pack windows if zlib's inflateInit fails due to an out of memory error. It may be that we are running under a low ulimit and are getting tight on address space. Shedding unused windows may get us sufficient working space to continue. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> --- "Shawn O. Pearce" <spearce@spearce.org> wrote: > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > The Z_STREAM_ERROR probably comes from inflate() itself - and could very > > easily be due to a allocation error in inflateInit leaving the stream data > > incomplete. > > > > Let me try wrapping that dang thing and send a patch. > > Yup. I'm actually doing the same thing... builtin-apply.c | 2 +- builtin-pack-objects.c | 2 +- builtin-unpack-objects.c | 2 +- cache.h | 1 + http-push.c | 4 ++-- http-walker.c | 4 ++-- index-pack.c | 4 ++-- sha1_file.c | 8 ++++---- wrapper.c | 20 ++++++++++++++++++++ 9 files changed, 34 insertions(+), 13 deletions(-) diff --git a/builtin-apply.c b/builtin-apply.c index af25ee9..cb2663e 100644 --- a/builtin-apply.c +++ b/builtin-apply.c @@ -1258,7 +1258,7 @@ static char *inflate_it(const void *data, unsigned long size, stream.avail_in = size; stream.next_out = out = xmalloc(inflated_size); stream.avail_out = inflated_size; - inflateInit(&stream); + xinflateInit(&stream); st = inflate(&stream, Z_FINISH); if ((st != Z_STREAM_END) || stream.total_out != inflated_size) { free(out); diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index e851534..09576c6 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -195,7 +195,7 @@ static int check_pack_inflate(struct packed_git *p, int st; memset(&stream, 0, sizeof(stream)); - inflateInit(&stream); + xinflateInit(&stream); do { in = use_pack(p, w_curs, offset, &stream.avail_in); stream.next_in = in; diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c index 47ed610..cb9edac 100644 --- a/builtin-unpack-objects.c +++ b/builtin-unpack-objects.c @@ -99,7 +99,7 @@ static void *get_data(unsigned long size) stream.avail_out = size; stream.next_in = fill(1); stream.avail_in = len; - inflateInit(&stream); + xinflateInit(&stream); for (;;) { int ret = inflate(&stream, 0); diff --git a/cache.h b/cache.h index 231c06d..7d5c38d 100644 --- a/cache.h +++ b/cache.h @@ -17,6 +17,7 @@ #if defined(NO_DEFLATE_BOUND) || ZLIB_VERNUM < 0x1200 #define deflateBound(c,s) ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11) #endif +extern void xinflateInit(z_stream *stream); #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT) #define DTYPE(de) ((de)->d_type) diff --git a/http-push.c b/http-push.c index a4b7d08..906ca48 100644 --- a/http-push.c +++ b/http-push.c @@ -269,7 +269,7 @@ static void start_fetch_loose(struct transfer_request *request) memset(&request->stream, 0, sizeof(request->stream)); - inflateInit(&request->stream); + xinflateInit(&request->stream); git_SHA1_Init(&request->c); @@ -310,7 +310,7 @@ static void start_fetch_loose(struct transfer_request *request) file; also rewind to the beginning of the local file. */ if (prev_read == -1) { memset(&request->stream, 0, sizeof(request->stream)); - inflateInit(&request->stream); + xinflateInit(&request->stream); git_SHA1_Init(&request->c); if (prev_posn>0) { prev_posn = 0; diff --git a/http-walker.c b/http-walker.c index 7271c7d..6aa8486 100644 --- a/http-walker.c +++ b/http-walker.c @@ -142,7 +142,7 @@ static void start_object_request(struct walker *walker, memset(&obj_req->stream, 0, sizeof(obj_req->stream)); - inflateInit(&obj_req->stream); + xinflateInit(&obj_req->stream); git_SHA1_Init(&obj_req->c); @@ -183,7 +183,7 @@ static void start_object_request(struct walker *walker, file; also rewind to the beginning of the local file. */ if (prev_read == -1) { memset(&obj_req->stream, 0, sizeof(obj_req->stream)); - inflateInit(&obj_req->stream); + xinflateInit(&obj_req->stream); git_SHA1_Init(&obj_req->c); if (prev_posn>0) { prev_posn = 0; diff --git a/index-pack.c b/index-pack.c index 2931511..c6bfc12 100644 --- a/index-pack.c +++ b/index-pack.c @@ -275,7 +275,7 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size) stream.avail_out = size; stream.next_in = fill(1); stream.avail_in = input_len; - inflateInit(&stream); + xinflateInit(&stream); for (;;) { int ret = inflate(&stream, 0); @@ -382,7 +382,7 @@ static void *get_data_from_pack(struct object_entry *obj) stream.avail_out = obj->size; stream.next_in = src; stream.avail_in = len; - inflateInit(&stream); + xinflateInit(&stream); while ((st = inflate(&stream, Z_FINISH)) == Z_OK); inflateEnd(&stream); if (st != Z_STREAM_END || stream.total_out != obj->size) diff --git a/sha1_file.c b/sha1_file.c index 52d1ead..9aabae2 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1196,7 +1196,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon stream->avail_out = bufsiz; if (legacy_loose_object(map)) { - inflateInit(stream); + xinflateInit(stream); return inflate(stream, 0); } @@ -1217,7 +1217,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon /* Set up the stream for the rest.. */ stream->next_in = map; stream->avail_in = mapsize; - inflateInit(stream); + xinflateInit(stream); /* And generate the fake traditional header */ stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu", @@ -1348,7 +1348,7 @@ unsigned long get_size_from_delta(struct packed_git *p, stream.next_out = delta_head; stream.avail_out = sizeof(delta_head); - inflateInit(&stream); + xinflateInit(&stream); do { in = use_pack(p, w_curs, curpos, &stream.avail_in); stream.next_in = in; @@ -1585,7 +1585,7 @@ static void *unpack_compressed_entry(struct packed_git *p, stream.next_out = buffer; stream.avail_out = size; - inflateInit(&stream); + xinflateInit(&stream); do { in = use_pack(p, w_curs, curpos, &stream.avail_in); stream.next_in = in; diff --git a/wrapper.c b/wrapper.c index 93562f0..f255eef 100644 --- a/wrapper.c +++ b/wrapper.c @@ -196,3 +196,23 @@ int xmkstemp(char *template) die("Unable to create temporary file: %s", strerror(errno)); return fd; } + +void xinflateInit(z_stream *stream) +{ + switch (inflateInit(stream)) { + case Z_OK: + return; + + case Z_MEM_ERROR: + release_pack_memory(128 * 1024, -1); + if (inflateInit(stream) == Z_OK) + return; + die("Out of memory? inflateInit failed"); + + case Z_VERSION_ERROR: + die("Wrong zlib version? inflateInit failed"); + + default: + die("Unknown inflateInit failure"); + } +} -- 1.6.1.141.gfe98e ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce @ 2009-01-08 3:54 ` Linus Torvalds 2009-01-08 5:23 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 3:54 UTC (permalink / raw) To: Shawn O. Pearce Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, Shawn O. Pearce wrote: > > If we are running low on virtual memory we should release pack > windows if zlib's inflateInit fails due to an out of memory error. > It may be that we are running under a low ulimit and are getting > tight on address space. Shedding unused windows may get us > sufficient working space to continue. Let's do this (more complete) wrapping instead, ok? This one _just_ wraps things, btw - it doesn't do the "retry on low memory error" part, at least not yet. I think that's an independent issue from the reporting. Hmm? Tyler - does this make the corruption errors go away, and be replaced by hard failures with "out of memory" reporting? This patch is potentially pretty noisy, on purpose. I didn't remove the reporting from places that already do so - some of them have stricter errors than this. For example: Z_BUF_ERROR can be valid depending on circumstance, so the wrapper doesn't complain about it, but the caller may not accept it. Linus --- builtin-apply.c | 5 ++- builtin-pack-objects.c | 6 ++-- builtin-unpack-objects.c | 6 ++-- cache.h | 4 +++ http-push.c | 8 +++--- http-walker.c | 8 +++--- index-pack.c | 12 ++++---- sha1_file.c | 24 +++++++++--------- wrapper.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 99 insertions(+), 34 deletions(-) diff --git a/builtin-apply.c b/builtin-apply.c index 07244b0..ed02b6d 100644 --- a/builtin-apply.c +++ b/builtin-apply.c @@ -1253,8 +1253,9 @@ static char *inflate_it(const void *data, unsigned long size, stream.avail_in = size; stream.next_out = out = xmalloc(inflated_size); stream.avail_out = inflated_size; - inflateInit(&stream); - st = inflate(&stream, Z_FINISH); + git_inflate_init(&stream); + st = git_inflate(&stream, Z_FINISH); + git_inflate_end(&stream); if ((st != Z_STREAM_END) || stream.total_out != inflated_size) { free(out); return NULL; diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index e851534..cb51916 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -195,16 +195,16 @@ static int check_pack_inflate(struct packed_git *p, int st; memset(&stream, 0, sizeof(stream)); - inflateInit(&stream); + git_inflate_init(&stream); do { in = use_pack(p, w_curs, offset, &stream.avail_in); stream.next_in = in; stream.next_out = fakebuf; stream.avail_out = sizeof(fakebuf); - st = inflate(&stream, Z_FINISH); + st = git_inflate(&stream, Z_FINISH); offset += stream.next_in - in; } while (st == Z_OK || st == Z_BUF_ERROR); - inflateEnd(&stream); + git_inflate_end(&stream); return (st == Z_STREAM_END && stream.total_out == expect && stream.total_in == len) ? 0 : -1; diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c index 47ed610..9a77323 100644 --- a/builtin-unpack-objects.c +++ b/builtin-unpack-objects.c @@ -99,10 +99,10 @@ static void *get_data(unsigned long size) stream.avail_out = size; stream.next_in = fill(1); stream.avail_in = len; - inflateInit(&stream); + git_inflate_init(&stream); for (;;) { - int ret = inflate(&stream, 0); + int ret = git_inflate(&stream, 0); use(len - stream.avail_in); if (stream.total_out == size && ret == Z_STREAM_END) break; @@ -118,7 +118,7 @@ static void *get_data(unsigned long size) stream.next_in = fill(1); stream.avail_in = len; } - inflateEnd(&stream); + git_inflate_end(&stream); return buf; } diff --git a/cache.h b/cache.h index 231c06d..49e54fb 100644 --- a/cache.h +++ b/cache.h @@ -18,6 +18,10 @@ #define deflateBound(c,s) ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11) #endif +void git_inflate_init(z_streamp strm); +void git_inflate_end(z_streamp strm); +int git_inflate(z_streamp strm, int flush); + #if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT) #define DTYPE(de) ((de)->d_type) #else diff --git a/http-push.c b/http-push.c index 7c64609..809002b 100644 --- a/http-push.c +++ b/http-push.c @@ -208,7 +208,7 @@ static size_t fwrite_sha1_file(void *ptr, size_t eltsize, size_t nmemb, do { request->stream.next_out = expn; request->stream.avail_out = sizeof(expn); - request->zret = inflate(&request->stream, Z_SYNC_FLUSH); + request->zret = git_inflate(&request->stream, Z_SYNC_FLUSH); git_SHA1_Update(&request->c, expn, sizeof(expn) - request->stream.avail_out); } while (request->stream.avail_in && request->zret == Z_OK); @@ -268,7 +268,7 @@ static void start_fetch_loose(struct transfer_request *request) memset(&request->stream, 0, sizeof(request->stream)); - inflateInit(&request->stream); + git_inflate_init(&request->stream); git_SHA1_Init(&request->c); @@ -309,7 +309,7 @@ static void start_fetch_loose(struct transfer_request *request) file; also rewind to the beginning of the local file. */ if (prev_read == -1) { memset(&request->stream, 0, sizeof(request->stream)); - inflateInit(&request->stream); + git_inflate_init(&request->stream); git_SHA1_Init(&request->c); if (prev_posn>0) { prev_posn = 0; @@ -741,7 +741,7 @@ static void finish_request(struct transfer_request *request) if (request->http_code == 416) fprintf(stderr, "Warning: requested range invalid; we may already have all the data.\n"); - inflateEnd(&request->stream); + git_inflate_end(&request->stream); git_SHA1_Final(request->real_sha1, &request->c); if (request->zret != Z_STREAM_END) { unlink(request->tmpfile); diff --git a/http-walker.c b/http-walker.c index 7271c7d..0dbad3c 100644 --- a/http-walker.c +++ b/http-walker.c @@ -82,7 +82,7 @@ static size_t fwrite_sha1_file(void *ptr, size_t eltsize, size_t nmemb, do { obj_req->stream.next_out = expn; obj_req->stream.avail_out = sizeof(expn); - obj_req->zret = inflate(&obj_req->stream, Z_SYNC_FLUSH); + obj_req->zret = git_inflate(&obj_req->stream, Z_SYNC_FLUSH); git_SHA1_Update(&obj_req->c, expn, sizeof(expn) - obj_req->stream.avail_out); } while (obj_req->stream.avail_in && obj_req->zret == Z_OK); @@ -142,7 +142,7 @@ static void start_object_request(struct walker *walker, memset(&obj_req->stream, 0, sizeof(obj_req->stream)); - inflateInit(&obj_req->stream); + git_inflate_init(&obj_req->stream); git_SHA1_Init(&obj_req->c); @@ -183,7 +183,7 @@ static void start_object_request(struct walker *walker, file; also rewind to the beginning of the local file. */ if (prev_read == -1) { memset(&obj_req->stream, 0, sizeof(obj_req->stream)); - inflateInit(&obj_req->stream); + git_inflate_init(&obj_req->stream); git_SHA1_Init(&obj_req->c); if (prev_posn>0) { prev_posn = 0; @@ -243,7 +243,7 @@ static void finish_object_request(struct object_request *obj_req) return; } - inflateEnd(&obj_req->stream); + git_inflate_end(&obj_req->stream); git_SHA1_Final(obj_req->real_sha1, &obj_req->c); if (obj_req->zret != Z_STREAM_END) { unlink(obj_req->tmpfile); diff --git a/index-pack.c b/index-pack.c index 60ed41a..c0a3d97 100644 --- a/index-pack.c +++ b/index-pack.c @@ -275,10 +275,10 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size) stream.avail_out = size; stream.next_in = fill(1); stream.avail_in = input_len; - inflateInit(&stream); + git_inflate_init(&stream); for (;;) { - int ret = inflate(&stream, 0); + int ret = git_inflate(&stream, 0); use(input_len - stream.avail_in); if (stream.total_out == size && ret == Z_STREAM_END) break; @@ -287,7 +287,7 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size) stream.next_in = fill(1); stream.avail_in = input_len; } - inflateEnd(&stream); + git_inflate_end(&stream); return buf; } @@ -382,9 +382,9 @@ static void *get_data_from_pack(struct object_entry *obj) stream.avail_out = obj->size; stream.next_in = src; stream.avail_in = len; - inflateInit(&stream); - while ((st = inflate(&stream, Z_FINISH)) == Z_OK); - inflateEnd(&stream); + git_inflate_init(&stream); + while ((st = git_inflate(&stream, Z_FINISH)) == Z_OK); + git_inflate_end(&stream); if (st != Z_STREAM_END || stream.total_out != obj->size) die("serious inflate inconsistency"); free(src); diff --git a/sha1_file.c b/sha1_file.c index 52d1ead..8600b04 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1196,8 +1196,8 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon stream->avail_out = bufsiz; if (legacy_loose_object(map)) { - inflateInit(stream); - return inflate(stream, 0); + git_inflate_init(stream); + return git_inflate(stream, 0); } @@ -1217,7 +1217,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon /* Set up the stream for the rest.. */ stream->next_in = map; stream->avail_in = mapsize; - inflateInit(stream); + git_inflate_init(stream); /* And generate the fake traditional header */ stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu", @@ -1254,11 +1254,11 @@ static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size stream->next_out = buf + bytes; stream->avail_out = size - bytes; while (status == Z_OK) - status = inflate(stream, Z_FINISH); + status = git_inflate(stream, Z_FINISH); } buf[size] = 0; if (status == Z_STREAM_END && !stream->avail_in) { - inflateEnd(stream); + git_inflate_end(stream); return buf; } @@ -1348,15 +1348,15 @@ unsigned long get_size_from_delta(struct packed_git *p, stream.next_out = delta_head; stream.avail_out = sizeof(delta_head); - inflateInit(&stream); + git_inflate_init(&stream); do { in = use_pack(p, w_curs, curpos, &stream.avail_in); stream.next_in = in; - st = inflate(&stream, Z_FINISH); + st = git_inflate(&stream, Z_FINISH); curpos += stream.next_in - in; } while ((st == Z_OK || st == Z_BUF_ERROR) && stream.total_out < sizeof(delta_head)); - inflateEnd(&stream); + git_inflate_end(&stream); if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) { error("delta data unpack-initial failed"); return 0; @@ -1585,14 +1585,14 @@ static void *unpack_compressed_entry(struct packed_git *p, stream.next_out = buffer; stream.avail_out = size; - inflateInit(&stream); + git_inflate_init(&stream); do { in = use_pack(p, w_curs, curpos, &stream.avail_in); stream.next_in = in; - st = inflate(&stream, Z_FINISH); + st = git_inflate(&stream, Z_FINISH); curpos += stream.next_in - in; } while (st == Z_OK || st == Z_BUF_ERROR); - inflateEnd(&stream); + git_inflate_end(&stream); if ((st != Z_STREAM_END) || stream.total_out != size) { free(buffer); return NULL; @@ -2017,7 +2017,7 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size status = error("unable to parse %s header", sha1_to_hex(sha1)); else if (sizep) *sizep = size; - inflateEnd(&stream); + git_inflate_end(&stream); munmap(map, mapsize); return status; } diff --git a/wrapper.c b/wrapper.c index 93562f0..29afa96 100644 --- a/wrapper.c +++ b/wrapper.c @@ -196,3 +196,63 @@ int xmkstemp(char *template) die("Unable to create temporary file: %s", strerror(errno)); return fd; } + +/* + * zlib wrappers to make sure we don't silently miss errors + * at init time. + */ +void git_inflate_init(z_streamp strm) +{ + const char *err; + + switch (inflateInit(strm)) { + case Z_OK: + return; + + case Z_MEM_ERROR: + err = "out of memory"; + break; + case Z_VERSION_ERROR: + err = "wrong version"; + break; + default: + err = "error"; + } + die("inflateInit: %s (%s)", err, strm->msg ? strm->msg : "no message"); +} + +void git_inflate_end(z_streamp strm) +{ + if (inflateEnd(strm) != Z_OK) + error("inflateEnd: %s", strm->msg ? strm->msg : "failed"); +} + +int git_inflate(z_streamp strm, int flush) +{ + int ret = inflate(strm, flush); + const char *err; + + switch (ret) { + /* Out of memory is fatal. */ + case Z_MEM_ERROR: + die("inflate: out of memory"); + + /* Data corruption errors: we may want to recover from them (fsck) */ + case Z_NEED_DICT: + err = "needs dictionary"; break; + case Z_DATA_ERROR: + err = "data stream error"; break; + case Z_STREAM_ERROR: + err = "stream consistency error"; break; + default: + err = "unknown error"; break; + + /* Z_BUF_ERROR: normal, needs a buffer output buffer */ + case Z_BUF_ERROR: + case Z_OK: + case Z_STREAM_END: + return ret; + } + error("inflate: %s (%s)", err, strm->msg ? strm->msg : "no message"); + return ret; +} ^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 3:54 ` Linus Torvalds @ 2009-01-08 5:23 ` Junio C Hamano 2009-01-08 15:35 ` Linus Torvalds 2009-01-08 15:34 ` Shawn O. Pearce 2009-01-08 18:15 ` R. Tyler Ballance 2 siblings, 1 reply; 49+ messages in thread From: Junio C Hamano @ 2009-01-08 5:23 UTC (permalink / raw) To: Linus Torvalds Cc: Shawn O. Pearce, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> writes: > +int git_inflate(z_streamp strm, int flush) > +{ > +... > + /* Z_BUF_ERROR: normal, needs a buffer output buffer */ > + case Z_BUF_ERROR: Thanks, but "needs a buffer output buffer" made me scratch my head somewhat. ... Z_BUF_ERROR if no progress is possible or if there was not enough room in the output buffer when Z_FINISH is used. Note that Z_BUF_ERROR is not fatal, and inflate() can be called again with more input and more output space to continue decompressing. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 5:23 ` Junio C Hamano @ 2009-01-08 15:35 ` Linus Torvalds 0 siblings, 0 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 15:35 UTC (permalink / raw) To: Junio C Hamano Cc: Shawn O. Pearce, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb On Wed, 7 Jan 2009, Junio C Hamano wrote: > > Thanks, but "needs a buffer output buffer" made me scratch my head > somewhat. That was just me editing it. It was originally "needs an output buffer" and then I was supposed to edit it to "needs a buffer" (because it can be either output or input, and in the case of git it's usually actually the input that was partial). And then I messed up, and it became that. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 3:54 ` Linus Torvalds 2009-01-08 5:23 ` Junio C Hamano @ 2009-01-08 15:34 ` Shawn O. Pearce 2009-01-08 16:14 ` Linus Torvalds 2009-01-08 18:15 ` R. Tyler Ballance 2 siblings, 1 reply; 49+ messages in thread From: Shawn O. Pearce @ 2009-01-08 15:34 UTC (permalink / raw) To: Linus Torvalds Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> wrote: > Let's do this (more complete) wrapping instead, ok? Ack. > This one _just_ wraps things, btw - it doesn't do the "retry on low memory > error" part, at least not yet. I think that's an independent issue from > the reporting. I still think we should try to reduce pack memory usage when we get oom from zlib and retry the current operation once. We do it almost everywhere else and it works relatively well. We may also want to consider dropping (e.g. halving) the window size and/or limit when we run out of memory. We'll run slower but if the OS has denied us further resources it may be a ulimit thing on a shared system, and we should try harder to work with what we have available to us. -- Shawn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 15:34 ` Shawn O. Pearce @ 2009-01-08 16:14 ` Linus Torvalds 0 siblings, 0 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 16:14 UTC (permalink / raw) To: Shawn O. Pearce Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb On Thu, 8 Jan 2009, Shawn O. Pearce wrote: > > I still think we should try to reduce pack memory usage when we get > oom from zlib and retry the current operation once. We do it almost > everywhere else and it works relatively well. Oh, I agree. It's just that I wanted to verify that people who see this problem actually see the message, and that we confirm that it is due to this and nothing else. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 3:54 ` Linus Torvalds 2009-01-08 5:23 ` Junio C Hamano 2009-01-08 15:34 ` Shawn O. Pearce @ 2009-01-08 18:15 ` R. Tyler Ballance 2009-01-08 20:22 ` Linus Torvalds 2 siblings, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 18:15 UTC (permalink / raw) To: Linus Torvalds Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger, Git ML, kb [-- Attachment #1: Type: text/plain, Size: 2597 bytes --] On Wed, 2009-01-07 at 19:54 -0800, Linus Torvalds wrote: > > On Wed, 7 Jan 2009, Shawn O. Pearce wrote: > > > > If we are running low on virtual memory we should release pack > > windows if zlib's inflateInit fails due to an out of memory error. > > It may be that we are running under a low ulimit and are getting > > tight on address space. Shedding unused windows may get us > > sufficient working space to continue. > > Let's do this (more complete) wrapping instead, ok? > > This one _just_ wraps things, btw - it doesn't do the "retry on low memory > error" part, at least not yet. I think that's an independent issue from > the reporting. > > Hmm? > > Tyler - does this make the corruption errors go away, and be replaced by > hard failures with "out of memory" reporting? Yeah, looks like it: tyler@grapefruit:~/source/git/linux-2.6> export START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666 tyler@grapefruit:~/source/git/linux-2.6> git reset --hard HEAD is now at 9e42d0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START HEAD is now at 56d18e9 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 * 1024" | bc -l` tyler@grapefruit:~/source/git/linux-2.6> limit cputime unlimited filesize unlimited datasize unlimited stacksize 8MB coredumpsize 0kB memoryuse 2561MB maxproc 24564 descriptors 1024 memorylocked 64kB addressspace 350MB maxfilelocks unlimited sigpending 24564 msgqueue 819200 nice 0 rt_priority 0 tyler@grapefruit:~/source/git/linux-2.6> git pull Updating 56d18e9..9e42d0c fatal: Out of memory? inflateInit failed tyler@grapefruit:~/source/git/linux-2.6> which git /home/tyler/bin/git tyler@grapefruit:~/source/git/linux-2.6> > > This patch is potentially pretty noisy, on purpose. I didn't remove the > reporting from places that already do so - some of them have stricter > errors than this. I'm assuming this patch is going to be reworked, if so, I'll back it out of our internal 1.6.1 build and anxiously await The Real Deal(tm) Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 18:15 ` R. Tyler Ballance @ 2009-01-08 20:22 ` Linus Torvalds 2009-01-08 20:37 ` R. Tyler Ballance 2009-01-09 1:43 ` Junio C Hamano 0 siblings, 2 replies; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 20:22 UTC (permalink / raw) To: R. Tyler Ballance Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger, Git ML, kb On Thu, 8 Jan 2009, R. Tyler Ballance wrote: > > > > Tyler - does this make the corruption errors go away, and be replaced by > > hard failures with "out of memory" reporting? > > Yeah, looks like it: Well, I was hoping that you'd have a confirmation from your own huge repo, but I do suspect it's all the same thing, so I guess this counts as confirmation too. > > This patch is potentially pretty noisy, on purpose. I didn't remove the > > reporting from places that already do so - some of them have stricter > > errors than this. > > I'm assuming this patch is going to be reworked, if so, I'll back it out > of our internal 1.6.1 build and anxiously await The Real Deal(tm) Oh, it shouldn't be any noisier under _normal_ load - it's more that certain real corruption cases will now report the error twice. That said, the new errors should actually be more informative than the old ones, so even that isn't necessarily all bad. Junio - I think we should apply this, and likely to the stable branch too. Add the re-trying the inflateInit() after shrinking pack windows on top of it. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 20:22 ` Linus Torvalds @ 2009-01-08 20:37 ` R. Tyler Ballance 2009-01-09 1:43 ` Junio C Hamano 1 sibling, 0 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 20:37 UTC (permalink / raw) To: Linus Torvalds Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger, Git ML, kb [-- Attachment #1: Type: text/plain, Size: 2325 bytes --] On Thu, 2009-01-08 at 12:22 -0800, Linus Torvalds wrote: > > On Thu, 8 Jan 2009, R. Tyler Ballance wrote: > > > > > > Tyler - does this make the corruption errors go away, and be replaced by > > > hard failures with "out of memory" reporting? > > > > Yeah, looks like it: > > Well, I was hoping that you'd have a confirmation from your own huge repo, > but I do suspect it's all the same thing, so I guess this counts as > confirmation too. I never got a real solid "consistent" reproduction case with our repository, just a lot of users that experienced the issue. I think the Linux repro case is a far better example, and yeah, it's sorta confirmation (waiting for operations here to deploy the patched 1.6.1 to dev machines). > > > > This patch is potentially pretty noisy, on purpose. I didn't remove the > > > reporting from places that already do so - some of them have stricter > > > errors than this. > > > > I'm assuming this patch is going to be reworked, if so, I'll back it out > > of our internal 1.6.1 build and anxiously await The Real Deal(tm) > > Oh, it shouldn't be any noisier under _normal_ load - it's more that > certain real corruption cases will now report the error twice. That said, > the new errors should actually be more informative than the old ones, so > even that isn't necessarily all bad. > > Junio - I think we should apply this, and likely to the stable branch too. > Add the re-trying the inflateInit() after shrinking pack windows on top of > it. I really appreciate this guys, this is one of the longer threads I've participated (spanning over a month) and I'm glad you guys were finally able to track the issue down. From now moving forward, I'll try to get a reproduction case with the kernel tree or something equally big since I know it's frustrating to play the game of telephone with a proprietary code base ("try this? what does that do? okay, then this?"). Linus, I'll have a chance to look at your comments on my "variable packed git window size" patch this weekend, and I'll follow-up in the appropriate thread. I'm relatively certain that after this witch hunt, I can get Slide to cover a round of beers at LinuxWorld or the nearest GitTogether ;) Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory 2009-01-08 20:22 ` Linus Torvalds 2009-01-08 20:37 ` R. Tyler Ballance @ 2009-01-09 1:43 ` Junio C Hamano 1 sibling, 0 replies; 49+ messages in thread From: Junio C Hamano @ 2009-01-09 1:43 UTC (permalink / raw) To: Linus Torvalds Cc: R. Tyler Ballance, Shawn O. Pearce, Nicolas Pitre, Jan Krüger, Git ML, kb Linus Torvalds <torvalds@linux-foundation.org> writes: > Junio - I think we should apply this, and likely to the stable branch too. Yeah, I didn't lose the patch. > Add the re-trying the inflateInit() after shrinking pack windows on top of > it. That too. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-07 23:29 ` Linus Torvalds 2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance @ 2009-01-08 0:37 ` Linus Torvalds 2009-01-08 0:49 ` R. Tyler Ballance 1 sibling, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 0:37 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Wed, 7 Jan 2009, Linus Torvalds wrote: > > > > limit ~1.5GB -> corrupt file > > > limit ~3GB -> magically no longer corrupt. > > That is interesting, although I also worry that there might be other > issues going on (ie since you've reported thigns magically fixing > themselves, maybe the ulimit tests just _happened_ to show that, even if > it wasn't the core reason). > > BUT! This is definitely worth looking at. > > For example, we do have some cases where we try to do "mmap()", and if it > fails, we try to free some memory and try again. In particular, in > xmmap(), if an mmap() fails - which may be due to running out of virtual > address space - we'll actually try to release some pack-file memory and > try again. Maybe there's a bug there - and it would be one that seldom > triggers for others. Ho humm. We really do have some interesting things there. Is this a 64-bit machine? I didn't think OS X did that, but if there is some limited 64-bit support there, maybe "sizeof(void *)" is 8, then we default the default git pack-window to a pretty healthy 1GB. I could easily see that if you have a virtual memory size limit of 1.5GB, and the pack window size is 1GB, we might have trouble. Because we could only keep one such pack window in memory at a time. I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we really had issues with the window handling. Anyway, _if_ your system has 64-bit pointers, then _maybe_ something the default 1GB pack window causes problem. If so, then adding a [core] packedgitwindowsize = 64M might make a difference. It would certainly be very interesting to hear if there's any impact. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds @ 2009-01-08 0:49 ` R. Tyler Ballance 2009-01-08 1:01 ` Linus Torvalds 0 siblings, 1 reply; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 0:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 2976 bytes --] On Wed, 2009-01-07 at 16:37 -0800, Linus Torvalds wrote: > > On Wed, 7 Jan 2009, Linus Torvalds wrote: > > > > > > limit ~1.5GB -> corrupt file > > > > limit ~3GB -> magically no longer corrupt. > > > > That is interesting, although I also worry that there might be other > > issues going on (ie since you've reported thigns magically fixing > > themselves, maybe the ulimit tests just _happened_ to show that, even if > > it wasn't the core reason). > > > > BUT! This is definitely worth looking at. > > > > For example, we do have some cases where we try to do "mmap()", and if it > > fails, we try to free some memory and try again. In particular, in > > xmmap(), if an mmap() fails - which may be due to running out of virtual > > address space - we'll actually try to release some pack-file memory and > > try again. Maybe there's a bug there - and it would be one that seldom > > triggers for others. > > Ho humm. We really do have some interesting things there. Always enjoyable when these mail threads get this deep ;) > > Is this a 64-bit machine? I didn't think OS X did that, but if there is > some limited 64-bit support there, maybe "sizeof(void *)" is 8, then we > default the default git pack-window to a pretty healthy 1GB. I was only mentioning OS X with regards to the Samba/NFS red herring, the rest of our operations are on 64-bit Linux machines. The machine I reproduced this on ("Public repo case!") is the following: tyler@grapefruit:~> uname -a Linux grapefruit.corp.slide.com 2.6.27.7-9-default #1 SMP 2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux tyler@grapefruit:~> cat /etc/issue Welcome to openSUSE 11.1 - Kernel \r (\l). The machines we're experiencing this issue on "in the wild" are: xdev3 (master)% uname -a Linux xdev3 2.6.24-22-server #1 SMP Mon Nov 24 20:06:28 UTC 2008 x86_64 GNU/Linux xdev3 (master)% cat /etc/issue Ubuntu 8.04.1 \n \l > > I could easily see that if you have a virtual memory size limit of 1.5GB, > and the pack window size is 1GB, we might have trouble. Because we could > only keep one such pack window in memory at a time. The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW > > I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we > really had issues with the window handling. > > Anyway, _if_ your system has 64-bit pointers, then _maybe_ something the > default 1GB pack window causes problem. > > If so, then adding a > > [core] > packedgitwindowsize = 64M > > might make a difference. It would certainly be very interesting to hear if > there's any impact. I can try this still if you'd like, but it doesn't seem like that'd be the issue since we're already lowering the window size system-wide Cheers -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 0:49 ` R. Tyler Ballance @ 2009-01-08 1:01 ` Linus Torvalds 2009-01-08 1:06 ` R. Tyler Ballance 0 siblings, 1 reply; 49+ messages in thread From: Linus Torvalds @ 2009-01-08 1:01 UTC (permalink / raw) To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > I was only mentioning OS X with regards to the Samba/NFS red herring, > the rest of our operations are on 64-bit Linux machines. Ahh, ok. Good. > > I could easily see that if you have a virtual memory size limit of 1.5GB, > > and the pack window size is 1GB, we might have trouble. Because we could > > only keep one such pack window in memory at a time. > > The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW Interesting. So you already had to lower it. However, now that you mention it, and now that I search for your emails about it on the mailing list (I don't normally read the mailing list except very occasionally), I see your patch that does #define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85 ... packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE); which is actually very bad. It's bad for several reasons: - 85% of the virtual address space is actually pessimal. You need space for AT LEAST two full-sized windows, so you need less than 50%. - the way that variable is used, it _has_ to be a multiple of the page size. In fact, it needs to be a multiple of _twice_ the page size. So just doing a random fraction of the rlimit is not correct. Setting it in the .gitconfig does it right, though. > > If so, then adding a > > > > [core] > > packedgitwindowsize = 64M > > > > might make a difference. It would certainly be very interesting to hear if > > there's any impact. > > I can try this still if you'd like, but it doesn't seem like that'd be > the issue since we're already lowering the window size system-wide Please do try, at least if your local git changes still match that patch I found, because that patch generates problems. Linus ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file 2009-01-08 1:01 ` Linus Torvalds @ 2009-01-08 1:06 ` R. Tyler Ballance 0 siblings, 0 replies; 49+ messages in thread From: R. Tyler Ballance @ 2009-01-08 1:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML [-- Attachment #1: Type: text/plain, Size: 2255 bytes --] On Wed, 2009-01-07 at 17:01 -0800, Linus Torvalds wrote: > > On Wed, 7 Jan 2009, R. Tyler Ballance wrote: > > > > I was only mentioning OS X with regards to the Samba/NFS red herring, > > the rest of our operations are on 64-bit Linux machines. > > Ahh, ok. Good. > > > > I could easily see that if you have a virtual memory size limit of 1.5GB, > > > and the pack window size is 1GB, we might have trouble. Because we could > > > only keep one such pack window in memory at a time. > > > > The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW > > Interesting. So you already had to lower it. However, now that you mention > it, and now that I search for your emails about it on the mailing list (I > don't normally read the mailing list except very occasionally), I see your > patch that does > > #define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85 > ... > packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE); > > which is actually very bad. > > It's bad for several reasons: > > - 85% of the virtual address space is actually pessimal. > > You need space for AT LEAST two full-sized windows, so you need less > than 50%. > > - the way that variable is used, it _has_ to be a multiple of the page > size. In fact, it needs to be a multiple of _twice_ the page size. So > just doing a random fraction of the rlimit is not correct. This patch never made it into any of our Git builds because my flight landed and it wasn't stable enough (and as you pointed out, it sucks ;)) > > Setting it in the .gitconfig does it right, though. > > > > If so, then adding a > > > > > > [core] > > > packedgitwindowsize = 64M > > > > > > might make a difference. It would certainly be very interesting to hear if > > > there's any impact. > > > > I can try this still if you'd like, but it doesn't seem like that'd be > > the issue since we're already lowering the window size system-wide > > Please do try, at least if your local git changes still match that patch I > found, because that patch generates problems. See my prior reply in "Public repo case!" sent at 4:57PST -- -R. Tyler Ballance Slide, Inc. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2009-01-09 1:45 UTC | newest] Thread overview: 49+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger 2008-12-09 9:02 ` R. Tyler Ballance 2008-12-09 16:24 ` Shawn O. Pearce 2009-01-06 22:52 ` R. Tyler Ballance 2009-01-07 1:25 ` Nicolas Pitre 2009-01-07 1:39 ` R. Tyler Ballance 2009-01-07 2:09 ` Nicolas Pitre 2009-01-07 2:47 ` R. Tyler Ballance 2009-01-07 3:21 ` Nicolas Pitre 2009-01-07 4:54 ` Linus Torvalds 2009-01-07 7:41 ` R. Tyler Ballance 2009-01-07 8:16 ` Junio C Hamano 2009-01-07 8:32 ` R. Tyler Ballance 2009-01-07 9:42 ` Junio C Hamano 2009-01-07 9:05 ` R. Tyler Ballance 2009-01-07 15:31 ` Nicolas Pitre 2009-01-07 16:07 ` Linus Torvalds 2009-01-07 16:08 ` Linus Torvalds 2009-01-07 22:55 ` R. Tyler Ballance 2009-01-07 23:29 ` Linus Torvalds 2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance 2009-01-08 0:48 ` Linus Torvalds 2009-01-08 0:57 ` R. Tyler Ballance 2009-01-08 1:08 ` Linus Torvalds 2009-01-08 1:29 ` Linus Torvalds 2009-01-08 1:46 ` Shawn O. Pearce 2009-01-08 2:21 ` James Pickens 2009-01-08 2:43 ` Shawn O. Pearce 2009-01-08 5:40 ` Junio C Hamano 2009-01-08 6:04 ` Shawn O. Pearce 2009-01-08 2:52 ` Boyd Stephen Smith Jr. 2009-01-08 2:52 ` Linus Torvalds 2009-01-08 3:01 ` Shawn O. Pearce 2009-01-08 3:06 ` Linus Torvalds 2009-01-08 3:13 ` Shawn O. Pearce 2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce 2009-01-08 3:54 ` Linus Torvalds 2009-01-08 5:23 ` Junio C Hamano 2009-01-08 15:35 ` Linus Torvalds 2009-01-08 15:34 ` Shawn O. Pearce 2009-01-08 16:14 ` Linus Torvalds 2009-01-08 18:15 ` R. Tyler Ballance 2009-01-08 20:22 ` Linus Torvalds 2009-01-08 20:37 ` R. Tyler Ballance 2009-01-09 1:43 ` Junio C Hamano 2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds 2009-01-08 0:49 ` R. Tyler Ballance 2009-01-08 1:01 ` Linus Torvalds 2009-01-08 1:06 ` R. Tyler Ballance
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).