* [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
@ 2008-12-09 8:36 Jan Krüger
2008-12-09 9:02 ` R. Tyler Ballance
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Jan Krüger @ 2008-12-09 8:36 UTC (permalink / raw)
To: Git ML; +Cc: tyler
For fixing a corrupted repository by using backup copies of individual
files, allow write_sha1_file() to write loose files even if the object
already exists in a pack file, but only if the existing entry is marked
as corrupted.
Signed-off-by: Jan Krüger <jk@jk.gs>
---
On IRC I talked to rtyler who had a corrupted pack file and plenty of
object backups by way of cloned repositories. We decided to try
extracting the corrupted objects from the other object database and
injecting them into the broken repo as loose objects, but this failed
because sha1_write_file() refuses to write loose objects that are
already present in a pack file.
This patch expands the check to see if the pack entry has been marked
as corrupted and, if so, allows writing a loose object with the same
ID. Unfortunately, when Tyler tried a merge while using this patch,
something we didn't manage to track down happened and now git doesn't
consider the object corrupted anymore. I'm not sure enough that it
wasn't caused by the patch to submit this patch without hesitation.
Apart from that, I think the change is not all too great since it makes
write_sha1_file() walk the list of pack entries twice. That's a bit of
a waste.
So those are the reasons why I wanted a few opinions first. Another
reason is that there might be a way smarter method to fix this kind of
problem, in which case I'd love hearing about it for future reference.
sha1_file.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 6c0e251..17085cc 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
char hdr[32];
int hdrlen;
- /* Normally if we have it in the pack then we do not bother writing
- * it out into .git/objects/??/?{38} file.
- */
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
- if (has_sha1_file(sha1))
- return 0;
+ /* Normally if we have it in the pack then we do not bother writing
+ * it out into .git/objects/??/?{38} file. We do, though, if there
+ * is no chance that we have an uncorrupted version of the object.
+ */
+ if (has_sha1_file(sha1)) {
+ if (has_loose_object(sha1) || !has_packed_and_bad(sha1))
+ return 0;
+ }
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
}
--
1.6.0.4.766.g6fc4a
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger
@ 2008-12-09 9:02 ` R. Tyler Ballance
2008-12-09 16:24 ` Shawn O. Pearce
2009-01-06 22:52 ` R. Tyler Ballance
2 siblings, 0 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2008-12-09 9:02 UTC (permalink / raw)
To: Jan Krüger; +Cc: Git ML
[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]
On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote:
> For fixing a corrupted repository by using backup copies of individual
> files, allow write_sha1_file() to write loose files even if the object
> already exists in a pack file, but only if the existing entry is marked
> as corrupted.
>
> Signed-off-by: Jan Krüger <jk@jk.gs>
> ---
>
> On IRC I talked to rtyler who had a corrupted pack file and plenty of
> object backups by way of cloned repositories. We decided to try
> extracting the corrupted objects from the other object database and
> injecting them into the broken repo as loose objects, but this failed
> because sha1_write_file() refuses to write loose objects that are
> already present in a pack file.
Figured I'd chime in here with some anecdotal evidence with the error
condition that I hit shortly after Jan sent the email.
xdev3 (master-release)% git pull --no-ff . master
From .
* branch master -> FETCH_HEAD
error: failed to read object
befd9bc4d184b4383569909e4d245f3337c1f8ed at offset 1415784644
from .git/objects/pack/pack-f7eb06e39f01b528c1d1a2c413ac51b31b8515aa.pack
fatal: object befd9bc4d184b4383569909e4d245f3337c1f8ed is
corrupted
Merge with strategy recursive failed.
xdev3 (master-release)%
I ran that command a couple of times to make sure it wasn't a fluke, I
repeated the error numerous times (without switching branches or pulling
from a remote). This pull was done with a slightly modified internal
version of v1.6.0.4
xdev3 (master-release)% git --version
git version 1.6.0.4-kb1
xdev3 (master-release)
After consulting with Jan, I tried running the same command with a
modified version of v1.6.0.5 with Jan's patch
xdev3 (master-release)% ~/basket/bin/git pull --no-ff . master
From .
* branch master -> FETCH_HEAD
Merge made by recursive.
** TOP SECRET MERGES! ;) **
13 files changed, 51 insertions(+), 21 deletions(-)
xdev3 (master-release)%
Purely anecdotal as I'm not entirely clear what the hell is actually going on here :)
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger
2008-12-09 9:02 ` R. Tyler Ballance
@ 2008-12-09 16:24 ` Shawn O. Pearce
2009-01-06 22:52 ` R. Tyler Ballance
2 siblings, 0 replies; 49+ messages in thread
From: Shawn O. Pearce @ 2008-12-09 16:24 UTC (permalink / raw)
To: Jan Krüger; +Cc: Git ML, tyler
Jan Krüger <jk@jk.gs> wrote:
> For fixing a corrupted repository by using backup copies of individual
> files, allow write_sha1_file() to write loose files even if the object
> already exists in a pack file, but only if the existing entry is marked
> as corrupted.
Huh. So I'm digging around sha1_file.c and I'm not yet sure why
your patch makes a difference.
has_sha1_file() calls find_pack_entry() to determine which pack has
the object, and at what offset (if found). It doesn't care about
the offset, but it does care about the successful match.
find_pack_entry() already considers the bad_object_sha1 for each
pack, before it even tries the binary search within the index.
So if the entry was known to be bad has_sha1_file() will return 0,
unless the object is loose.
Where this breaks down is if the object is being created,
its very likely we didn't attempt to read it in this process.
The bad_object_sha1 table is transient and populated only when
unpacking an object entry fails. So for example during a merge
if a tree was stored in a pack and is corrupt and the merge
result produces that same tree object we won't write it out with
write_sha1_file() because it exists in a pack, but since we never
read it we also don't know the pack entry is corrupt.
Its horribly inefficient to read every object before we write it
back out. The best thing to do when faced with corruption is to
stop and repack, overlaying the object database from a known good
copy of the repository so pack-objects can use the good copy when
a corrupt object is identified.
So I agree with you that changing this in write_sha1_file() is a
bad idea for the normal good cases, but I also don't see how this
patch changes anything at all... the code path you introduced is
already implemented.
> diff --git a/sha1_file.c b/sha1_file.c
> index 6c0e251..17085cc 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
> char hdr[32];
> int hdrlen;
>
> - /* Normally if we have it in the pack then we do not bother writing
> - * it out into .git/objects/??/?{38} file.
> - */
> write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
> - if (has_sha1_file(sha1))
> - return 0;
> + /* Normally if we have it in the pack then we do not bother writing
> + * it out into .git/objects/??/?{38} file. We do, though, if there
> + * is no chance that we have an uncorrupted version of the object.
> + */
> + if (has_sha1_file(sha1)) {
> + if (has_loose_object(sha1) || !has_packed_and_bad(sha1))
> + return 0;
> + }
> return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
> }
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger
2008-12-09 9:02 ` R. Tyler Ballance
2008-12-09 16:24 ` Shawn O. Pearce
@ 2009-01-06 22:52 ` R. Tyler Ballance
2009-01-07 1:25 ` Nicolas Pitre
2 siblings, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-06 22:52 UTC (permalink / raw)
To: Jan Krüger; +Cc: Git ML
[-- Attachment #1: Type: text/plain, Size: 3858 bytes --]
On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote:
> For fixing a corrupted repository by using backup copies of individual
> files, allow write_sha1_file() to write loose files even if the object
> already exists in a pack file, but only if the existing entry is marked
> as corrupted.
I figured I'd reply to this again, since the issue cropped up again.
We started experiencing *large* numbers of corruptions like the ones
that started the thread (one developer was receiving them once or twice
a day) with v1.6.0.4
We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch
(below) and the issues /seem/ to have resolved themselves. I'm not
certain whether Jan's patch was really responsible, or if there was
another issue that caused this to correct itself in v1.6.1.
As it stands, I think it's safe to assume that given the frequency of
the occurances that they were not tied to a memory or disk error (or
other levels of the machine's stack would be suffering as well). The
only thing I can think of is that /some/ developers who've experienced
the issue are using Samba mount points and changing files in Mac OS X,
but using Git on the mounted share (i.e. TextMate changes a file hosted
on Samba, changes are committed in an SSH session on that machine), but
that doesn't account for everything.
If there was something else included in the v1.6.1 release please let me
know so I can back Jan's patch out.
Cheers
>
> Signed-off-by: Jan Krüger <jk@jk.gs>
> ---
>
> On IRC I talked to rtyler who had a corrupted pack file and plenty of
> object backups by way of cloned repositories. We decided to try
> extracting the corrupted objects from the other object database and
> injecting them into the broken repo as loose objects, but this failed
> because sha1_write_file() refuses to write loose objects that are
> already present in a pack file.
>
> This patch expands the check to see if the pack entry has been marked
> as corrupted and, if so, allows writing a loose object with the same
> ID. Unfortunately, when Tyler tried a merge while using this patch,
> something we didn't manage to track down happened and now git doesn't
> consider the object corrupted anymore. I'm not sure enough that it
> wasn't caused by the patch to submit this patch without hesitation.
>
> Apart from that, I think the change is not all too great since it makes
> write_sha1_file() walk the list of pack entries twice. That's a bit of
> a waste.
>
> So those are the reasons why I wanted a few opinions first. Another
> reason is that there might be a way smarter method to fix this kind of
> problem, in which case I'd love hearing about it for future reference.
>
> sha1_file.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 6c0e251..17085cc 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2373,14 +2373,17 @@ int write_sha1_file(void *buf, unsigned long len, const char *type, unsigned cha
> char hdr[32];
> int hdrlen;
>
> - /* Normally if we have it in the pack then we do not bother writing
> - * it out into .git/objects/??/?{38} file.
> - */
> write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
> if (returnsha1)
> hashcpy(returnsha1, sha1);
> - if (has_sha1_file(sha1))
> - return 0;
> + /* Normally if we have it in the pack then we do not bother writing
> + * it out into .git/objects/??/?{38} file. We do, though, if there
> + * is no chance that we have an uncorrupted version of the object.
> + */
> + if (has_sha1_file(sha1)) {
> + if (has_loose_object(sha1) || !has_packed_and_bad(sha1))
> + return 0;
> + }
> return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
> }
>
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-06 22:52 ` R. Tyler Ballance
@ 2009-01-07 1:25 ` Nicolas Pitre
2009-01-07 1:39 ` R. Tyler Ballance
0 siblings, 1 reply; 49+ messages in thread
From: Nicolas Pitre @ 2009-01-07 1:25 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2646 bytes --]
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote:
> > For fixing a corrupted repository by using backup copies of individual
> > files, allow write_sha1_file() to write loose files even if the object
> > already exists in a pack file, but only if the existing entry is marked
> > as corrupted.
>
> I figured I'd reply to this again, since the issue cropped up again.
>
> We started experiencing *large* numbers of corruptions like the ones
> that started the thread (one developer was receiving them once or twice
> a day) with v1.6.0.4
>
> We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch
> (below) and the issues /seem/ to have resolved themselves. I'm not
> certain whether Jan's patch was really responsible, or if there was
> another issue that caused this to correct itself in v1.6.1.
>
> As it stands, I think it's safe to assume that given the frequency of
> the occurances that they were not tied to a memory or disk error (or
> other levels of the machine's stack would be suffering as well). The
> only thing I can think of is that /some/ developers who've experienced
> the issue are using Samba mount points and changing files in Mac OS X,
> but using Git on the mounted share (i.e. TextMate changes a file hosted
> on Samba, changes are committed in an SSH session on that machine), but
> that doesn't account for everything.
>
> If there was something else included in the v1.6.1 release please let me
> know so I can back Jan's patch out.
Please back it out. As it stands, that patch is a no op because of the
way git is used, and even if the patch was to work as intended, its
purpose is not to magically fix corruptions without special action from
your part. If you have corruption problems coming back only because of
the removal of this patch then something is really really fishy and I
would really like to know about it.
There were indeed many changes between v1.6.0.4 and v1.6.1: the exact
number is 1029. A couple of them are especially addressing increased
robustness against some kind of pack corruptions. But in any case you
still should see error messages appearing about them.
And don't underestimate the power of disk corruptions. I started to
work on git corruption resilience simply because I ended up with a
corrupted pack at some point. Then a while later I got another
corrupted pack. Then another while later I lost my filesystem entirely
and had to reinstall my system (after buying a new disk). Turns out
that my old disk is silently corrupting data without signaling any
errors to the host.
Nicolas
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 1:25 ` Nicolas Pitre
@ 2009-01-07 1:39 ` R. Tyler Ballance
2009-01-07 2:09 ` Nicolas Pitre
2009-01-07 4:54 ` Linus Torvalds
0 siblings, 2 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 1:39 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 3010 bytes --]
On Tue, 2009-01-06 at 20:25 -0500, Nicolas Pitre wrote:
> On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
>
> > On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote:
> > > For fixing a corrupted repository by using backup copies of individual
> > > files, allow write_sha1_file() to write loose files even if the object
> > > already exists in a pack file, but only if the existing entry is marked
> > > as corrupted.
> >
> > I figured I'd reply to this again, since the issue cropped up again.
> >
> > We started experiencing *large* numbers of corruptions like the ones
> > that started the thread (one developer was receiving them once or twice
> > a day) with v1.6.0.4
> >
> > We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch
> > (below) and the issues /seem/ to have resolved themselves. I'm not
> > certain whether Jan's patch was really responsible, or if there was
> > another issue that caused this to correct itself in v1.6.1.
I'll back the patch out and redeploy, it's worth mentioning that a
coworker of mine just got the issue as well (on 1.6.1). He was able to
`git pull` and the error went away, but I doubt that it "magically fixed
itself"
> Please back it out. As it stands, that patch is a no op because of the
> way git is used, and even if the patch was to work as intended, its
> purpose is not to magically fix corruptions without special action from
> your part. If you have corruption problems coming back only because of
> the removal of this patch then something is really really fishy and I
> would really like to know about it.
>
> There were indeed many changes between v1.6.0.4 and v1.6.1: the exact
> number is 1029. A couple of them are especially addressing increased
> robustness against some kind of pack corruptions. But in any case you
> still should see error messages appearing about them.
>
> And don't underestimate the power of disk corruptions. I started to
> work on git corruption resilience simply because I ended up with a
> corrupted pack at some point. Then a while later I got another
> corrupted pack. Then another while later I lost my filesystem entirely
> and had to reinstall my system (after buying a new disk). Turns out
> that my old disk is silently corrupting data without signaling any
> errors to the host.
I highly doubt this, I've got the issue appearing on at least 7
different development boxes (not workstations, 2U quad-core ECC RAM, etc
machines), while that doesn't mean that they all don't have issues, the
probability of them *all* having disk issues, and it somehow only
manifesting itself with Git usage, is low ;)
I've tarred one of the repositories that had it in a reproducible state
so I can create a build and extract the tar and run against that to
verify any patches anybody might have, but unfortunately at 7GB of
company code and assets, I can't exactly share ;)
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 1:39 ` R. Tyler Ballance
@ 2009-01-07 2:09 ` Nicolas Pitre
2009-01-07 2:47 ` R. Tyler Ballance
2009-01-07 4:54 ` Linus Torvalds
1 sibling, 1 reply; 49+ messages in thread
From: Nicolas Pitre @ 2009-01-07 2:09 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2165 bytes --]
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> On Tue, 2009-01-06 at 20:25 -0500, Nicolas Pitre wrote:
> > On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> >
> > > On Tue, 2008-12-09 at 09:36 +0100, Jan Krüger wrote:
> > > > For fixing a corrupted repository by using backup copies of individual
> > > > files, allow write_sha1_file() to write loose files even if the object
> > > > already exists in a pack file, but only if the existing entry is marked
> > > > as corrupted.
> > >
> > > I figured I'd reply to this again, since the issue cropped up again.
> > >
> > > We started experiencing *large* numbers of corruptions like the ones
> > > that started the thread (one developer was receiving them once or twice
> > > a day) with v1.6.0.4
> > >
> > > We went ahead and upgraded to a custom build of v1.6.1 with Jan's patch
> > > (below) and the issues /seem/ to have resolved themselves. I'm not
> > > certain whether Jan's patch was really responsible, or if there was
> > > another issue that caused this to correct itself in v1.6.1.
>
> I'll back the patch out and redeploy, it's worth mentioning that a
> coworker of mine just got the issue as well (on 1.6.1). He was able to
> `git pull` and the error went away, but I doubt that it "magically fixed
> itself"
Please describe the "issue", ideally with transcripts of error messages,
etc. Normally a simple pull operation should not provide any "fix" for
corruptions.
> I highly doubt this, I've got the issue appearing on at least 7
> different development boxes (not workstations, 2U quad-core ECC RAM, etc
> machines), while that doesn't mean that they all don't have issues, the
> probability of them *all* having disk issues, and it somehow only
> manifesting itself with Git usage, is low ;)
Agreed.
> I've tarred one of the repositories that had it in a reproducible state
That is wonderful.
> so I can create a build and extract the tar and run against that to
> verify any patches anybody might have, but unfortunately at 7GB of
> company code and assets, I can't exactly share ;)
First step is to understand what is going on. Only then could reliable
patches be made.
Nicolas
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 2:09 ` Nicolas Pitre
@ 2009-01-07 2:47 ` R. Tyler Ballance
2009-01-07 3:21 ` Nicolas Pitre
0 siblings, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 2:47 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 751 bytes --]
On Tue, 2009-01-06 at 21:09 -0500, Nicolas Pitre wrote:
> > I've tarred one of the repositories that had it in a reproducible
> state
>
> That is wonderful.
>
> > so I can create a build and extract the tar and run against that to
> > verify any patches anybody might have, but unfortunately at 7GB of
> > company code and assets, I can't exactly share ;)
>
> First step is to understand what is going on. Only then could reliable
> patches be made.
If you want to point me in the right direction, I have a few hours to
kill this evening and fscking around with gdb(1) and printf() just might
be some of my favorite things</sarcasm> ;)
Looking forward to killing this issue
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 2:47 ` R. Tyler Ballance
@ 2009-01-07 3:21 ` Nicolas Pitre
0 siblings, 0 replies; 49+ messages in thread
From: Nicolas Pitre @ 2009-01-07 3:21 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Jan Krüger, Git ML
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> On Tue, 2009-01-06 at 21:09 -0500, Nicolas Pitre wrote:
> > > I've tarred one of the repositories that had it in a reproducible
> > state
> >
> > That is wonderful.
> >
> > > so I can create a build and extract the tar and run against that to
> > > verify any patches anybody might have, but unfortunately at 7GB of
> > > company code and assets, I can't exactly share ;)
> >
> > First step is to understand what is going on. Only then could reliable
> > patches be made.
>
> If you want to point me in the right direction, I have a few hours to
> kill this evening and fscking around with gdb(1) and printf() just might
> be some of my favorite things</sarcasm> ;)
Heh. ;-)
To start with, a simple log of what you need to do to reproduce the
issue would be nice. Just do
script /tmp/foo
then reproduce the issue and exit, after which I'd be interrested in the
content of /tmp/foo.
Nicolas
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 1:39 ` R. Tyler Ballance
2009-01-07 2:09 ` Nicolas Pitre
@ 2009-01-07 4:54 ` Linus Torvalds
2009-01-07 7:41 ` R. Tyler Ballance
1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-07 4:54 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
>
> I'll back the patch out and redeploy, it's worth mentioning that a
> coworker of mine just got the issue as well (on 1.6.1). He was able to
> `git pull` and the error went away, but I doubt that it "magically fixed
> itself"
Quite frankly, that behaviour sounds like a disk _cache_ corruption issue.
The fact that some corruption "comes and goes" and sometimes magically
heals itself sounds very much like some disk cache problem, and then that
particular part of the cache gets replaced and then when re-populated it
is magically correct.
We had that in one case with a Linux NFS client, where a rename across
directories caused problems.
This was a networked filesystem on OS X, right? File caching is much more
"interesting" in networked filesystems than it is in normal private
on-disk ones.
> I've tarred one of the repositories that had it in a reproducible state
> so I can create a build and extract the tar and run against that to
> verify any patches anybody might have, but unfortunately at 7GB of
> company code and assets, I can't exactly share ;)
The thing to do is
- untar it on some trusted machine with a local disk and a known-good
filesystem.
IOW, not that networked samba share.
- verify that it really does happen on that machine, with that untarred
image. Because maybe it doesn't.
The hope is that you caught the corruption in the cache, and it
actually got written out to the tar-file. But if it _is_ a disk cache
(well, network cache) issue, maybe the IO required to tar everything up
was enough to flush it, and the tar-file actually _works_ because it
got repopulated correctly.
So that's why you should double-check that it really ends up being
corrupt after being untarred again.
- go back and test the original git repo on the network share, preferably
on another client. See if the error has gone away.
- If so, try to compare that known-corrupt filesystem with the original
one: and preferably do this on another machine over the network mount.
See if they differ. They obviously should *not* differ, since it's an
tar/untar of the same files, but ...
The fact that you seem to get a _lot_ of these errors really does make it
sound like something in your environment. It's actually really hard to get
git to corrupt anything. Especially objects that got packed. They've been
quiescent for a long time, they got repacked in a very simple way, they
are totally read-only.
But it is _not_ hard to corrupt network filesystems. It's downright
trivial with some of them, especially with some hardware (eg there's no
end-to-end checksumming except for the _extremely_ weak 16-bit IP csum,
and even that has been known to be disabled, or screwed up by ethernet
cards that do IP packet offloading and thus computing the csum not on the
data that tee user actually wrote, but the data that the card received,
which is not necessarily at all the same thing).
And while ethernet uses a stronger CRC, that one is not end-to-end, so
corruption on the card or in a switch in between easily defeats that too.
Just google for something like
"OS X" SMB "file corruption"
and you'll find quite a bit of hits. Not all that unusual.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 4:54 ` Linus Torvalds
@ 2009-01-07 7:41 ` R. Tyler Ballance
2009-01-07 8:16 ` Junio C Hamano
` (3 more replies)
0 siblings, 4 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 7:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 5231 bytes --]
On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote:
>
> On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> >
> > I'll back the patch out and redeploy, it's worth mentioning that a
> > coworker of mine just got the issue as well (on 1.6.1). He was able to
> > `git pull` and the error went away, but I doubt that it "magically fixed
> > itself"
>
> Quite frankly, that behaviour sounds like a disk _cache_ corruption issue.
> The fact that some corruption "comes and goes" and sometimes magically
> heals itself sounds very much like some disk cache problem, and then that
> particular part of the cache gets replaced and then when re-populated it
> is magically correct.
>
> We had that in one case with a Linux NFS client, where a rename across
> directories caused problems.
>
> This was a networked filesystem on OS X, right? File caching is much more
> "interesting" in networked filesystems than it is in normal private
> on-disk ones.
Not quite, what I meant was that some users (not all) who've experienced
this issue are using Samba to copy files over directly into the Git
repository. I was mentioning this in case somewhere between Finder,
Samba, ext3 and Git, some file system change events were pissing Git off
and causing it. I don't think this is the case as the coworker that I
mentioned earlier doesn't use Samba and neither do I (we both experience
the issue today, mine disappeared by upgrading to 1.6.1, his by `git
pull`).
>
> > I've tarred one of the repositories that had it in a reproducible state
> > so I can create a build and extract the tar and run against that to
> > verify any patches anybody might have, but unfortunately at 7GB of
> > company code and assets, I can't exactly share ;)
>
> The thing to do is
>
> - untar it on some trusted machine with a local disk and a known-good
> filesystem.
>
> IOW, not that networked samba share.
>
> - verify that it really does happen on that machine, with that untarred
> image. Because maybe it doesn't.
Unfortunately it doesn't, what I did notice was this when I did a `git
status` in the directory right after untarring:
tyler@grapefruit:~/jburgess_main> git status
#
# ---impressive amount of file names fly by---
# ----snip---
#
# Untracked files:
# (use "git add <file>..." to include in what will be
committed)
#
# artwork/
# bt/
# flash/
tyler@grapefruit:~/jburgess_main>
Basically, somehow Git thinks that *every* file in the repository is
deleted at this point. I went ahead and performed a `git reset --hard`
to see if the issue would manifest itself thereafter, but it did not.
I did try to do a git-fsck(1), and this is what I got:
tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
[1] 19381 segmentation fault /usr/local/bin/git fsck --full
tyler@grapefruit:~/jburgess_main>
>
> The hope is that you caught the corruption in the cache, and it
> actually got written out to the tar-file. But if it _is_ a disk cache
> (well, network cache) issue, maybe the IO required to tar everything up
> was enough to flush it, and the tar-file actually _works_ because it
> got repopulated correctly.
When I was working through this with Jan, one of the things that we did
was move the actual object file in .git/objects, they existed so maybe I
could look into those to check?
>
> So that's why you should double-check that it really ends up being
> corrupt after being untarred again.
>
> - go back and test the original git repo on the network share, preferably
> on another client. See if the error has gone away.
Unfortunately the repository is being used by the original developer I
tarred from with our 1.6.1 build, he hasn't reported any issues, but I
can't exactly steal it back (that's why I made the tar)
> The fact that you seem to get a _lot_ of these errors really does make
> it
> sound like something in your environment. It's actually really hard to get
> git to corrupt anything. Especially objects that got packed. They've been
> quiescent for a long time, they got repacked in a very simple way, they
> are totally read-only.
I checked with our operations team, and contrary to my suspicion (your
NFS comment piqued my curiosity), these disks that are actually on the
machines are not NFS mounts but rather local disk arrays.
--> is it NFSd? or all local storage
<== all local
<== df -h
<== mount
<== /dev/sda5 705G 247G 423G 37% /nail
--> hm, there goes that theory
<== git corruption?
--> yeah, looking into it
<== sucks
--> Linus had a theory about NFS/etc corruption of the disk
cache
<== when the company folds we can all blame you... and your
silly git games
<== (think positive, joel)
--> thanks
;)
Any thing else I can do to help debug this? :-/
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 7:41 ` R. Tyler Ballance
@ 2009-01-07 8:16 ` Junio C Hamano
2009-01-07 8:32 ` R. Tyler Ballance
2009-01-07 9:05 ` R. Tyler Ballance
` (2 subsequent siblings)
3 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2009-01-07 8:16 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML
"R. Tyler Ballance" <tyler@slide.com> writes:
> Unfortunately it doesn't, what I did notice was this when I did a `git
> status` in the directory right after untarring:
> tyler@grapefruit:~/jburgess_main> git status
> #
> # ---impressive amount of file names fly by---
> # ----snip---
> ...
> Basically, somehow Git thinks that *every* file in the repository is
> deleted at this point.
That makes me suspect that your .git/index file is corrupt.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 8:16 ` Junio C Hamano
@ 2009-01-07 8:32 ` R. Tyler Ballance
2009-01-07 9:42 ` Junio C Hamano
0 siblings, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 8:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 823 bytes --]
On Wed, 2009-01-07 at 00:16 -0800, Junio C Hamano wrote:
> "R. Tyler Ballance" <tyler@slide.com> writes:
>
> > Unfortunately it doesn't, what I did notice was this when I did a `git
> > status` in the directory right after untarring:
> > tyler@grapefruit:~/jburgess_main> git status
> > #
> > # ---impressive amount of file names fly by---
> > # ----snip---
> > ...
> > Basically, somehow Git thinks that *every* file in the repository is
> > deleted at this point.
>
> That makes me suspect that your .git/index file is corrupt.
Would this be tied to the corrupted pack file issue, or separate.
Either way, how could I verify your assumptions? (i'll be lurking in
#git for a while if you want to interactively help ;))
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 7:41 ` R. Tyler Ballance
2009-01-07 8:16 ` Junio C Hamano
@ 2009-01-07 9:05 ` R. Tyler Ballance
2009-01-07 15:31 ` Nicolas Pitre
2009-01-07 16:07 ` Linus Torvalds
3 siblings, 0 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 9:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 2154 bytes --]
On Tue, 2009-01-06 at 23:41 -0800, R. Tyler Ballance wrote:
> I did try to do a git-fsck(1), and this is what I got:
> tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
> [1] 19381 segmentation fault /usr/local/bin/git fsck --full
> tyler@grapefruit:~/jburgess_main>
> >
Disregard this comment, Jan reminded me in IRC of the issue I had with
this repository and the stack-size ulimit. A healthy 32M stack-size
allows for the command to complete:
tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
dangling blob 6e58a06cd0f027fce1fc8a923a8c81d6b55f1705
dangling blob 5e87e049f69ee06af1f4f92a3d4ddcd912f8535e
dangling blob acb8e0937cea3f4068c5c67f60d3b97952654fa1
dangling blob 8bd540d657027ab12228e0522de86a97d3f8a7a9
dangling blob 90d93096369df4b2151df3e289952297c5f390dd
dangling blob aaa4a1bf9e6d39991503b7908ff71605d6632fef
dangling blob b003e20f06de7d0a7e11a1047fbb39e2deb84899
dangling blob bf207363786f08e888a725cfb8b2fe4bea11ceab
dangling blob 642493a1bcf09f74551f0ce5b4c3ccc23acaf3c9
dangling blob ff80b45e48795d4544d0a5b2f18714123ab21746
dangling blob e3c0e67e23ec588acab733e2069fb09fa38a235d
dangling blob 62efa65bcbb6d94f26be9b91dc98d7a7f6fbf602
dangling blob a6a2c717c230fe26263b9ce002f3f82fdab6f727
dangling blob d32988875aa90ef728dc1af6b71c130f2c8e8b94
dangling blob c8aad9fc37a27872bae18af78b1623bfb8f9a9f7
dangling blob 4cdbe97301e333d89037dc9f4a8440dab9e62049
dangling blob 2819bbcda3b0efe828709f5b22624712cb9ebdae
dangling blob b156cba3ee89e1506d02fbb845e8dc0889ff4090
dangling blob ed658b8beca69ebf6e25ab1ed6217881e27d4e9a
dangling blob d581cb2fb1cf2b4d15f8f9b90b08a3ebc619414f
dangling blob a85c2e55d27c9fb1f5874f8bd81c65e597327b4f
dangling blob 548f3e70c06f9306d71f6b8d11ed24ade581fa31
dangling blob 7a1c8fa8bc59c4e8fb347426cc11fabf8b0d1639
tyler@grapefruit:~/jburgess_main>
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 8:32 ` R. Tyler Ballance
@ 2009-01-07 9:42 ` Junio C Hamano
0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2009-01-07 9:42 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Linus Torvalds, Nicolas Pitre, Jan Krüger, Git ML
"R. Tyler Ballance" <tyler@slide.com> writes:
> On Wed, 2009-01-07 at 00:16 -0800, Junio C Hamano wrote:
>> "R. Tyler Ballance" <tyler@slide.com> writes:
>>
>> > Unfortunately it doesn't, what I did notice was this when I did a `git
>> > status` in the directory right after untarring:
>> > tyler@grapefruit:~/jburgess_main> git status
>> > #
>> > # ---impressive amount of file names fly by---
>> > # ----snip---
>> > ...
>> > Basically, somehow Git thinks that *every* file in the repository is
>> > deleted at this point.
>>
>> That makes me suspect that your .git/index file is corrupt.
>
> Would this be tied to the corrupted pack file issue, or separate.
If you have perfectly good set of packs, if your index is corrupt you may
see "everything deleted", so in that sense it is independent.
As Linus's earlier conjecture was that this is related to some sort of
disk/cache corruption, I wouldn't be surprised if such a failure hit packs
and the index file indiscriminatingly. So in that sense they are
related.
I think "git ls-files" (before doing anything else, such as resetting, of
course) would report that the index is corrupt, if that is indeed the case.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 7:41 ` R. Tyler Ballance
2009-01-07 8:16 ` Junio C Hamano
2009-01-07 9:05 ` R. Tyler Ballance
@ 2009-01-07 15:31 ` Nicolas Pitre
2009-01-07 16:07 ` Linus Torvalds
3 siblings, 0 replies; 49+ messages in thread
From: Nicolas Pitre @ 2009-01-07 15:31 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Linus Torvalds, Jan Krüger, Git ML
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> On Tue, 2009-01-06 at 20:54 -0800, Linus Torvalds wrote:
> >
> > On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> > >
> > > I'll back the patch out and redeploy, it's worth mentioning that a
> > > coworker of mine just got the issue as well (on 1.6.1). He was able to
> > > `git pull` and the error went away, but I doubt that it "magically fixed
> > > itself"
> >
> > Quite frankly, that behaviour sounds like a disk _cache_ corruption issue.
> > The fact that some corruption "comes and goes" and sometimes magically
> > heals itself sounds very much like some disk cache problem, and then that
> > particular part of the cache gets replaced and then when re-populated it
> > is magically correct.
> >
> > We had that in one case with a Linux NFS client, where a rename across
> > directories caused problems.
> >
> > This was a networked filesystem on OS X, right? File caching is much more
> > "interesting" in networked filesystems than it is in normal private
> > on-disk ones.
>
> Not quite, what I meant was that some users (not all) who've experienced
> this issue are using Samba to copy files over directly into the Git
> repository. I was mentioning this in case somewhere between Finder,
> Samba, ext3 and Git, some file system change events were pissing Git off
> and causing it.
As long as those files are not within the .git directory that should be
fine.
> I don't think this is the case as the coworker that I
> mentioned earlier doesn't use Samba and neither do I (we both experience
> the issue today, mine disappeared by upgrading to 1.6.1, his by `git
> pull`).
Problem is that none of that "makes sense". If a real git corruption
was there, it wouldn't disappear without explicit action from your part.
What git v1.6.1 is able to do over earlier versions is to still function
properly if some corrupted objects have a redundant copy in the same
repository, but it wouldn't stop complaining about the existence of
corrupted data. Doing a 'git gc' or 'git repack -a' might get rid of
the corruption. And a 'git pull' might hide it if the received pack
during the pull operation happens to contain another copy of the object
that was corrupted before, but it wouldn't prevent 'git fsck --full'
from seeing it.
The fact that you cannot reproduce the corruption issues after
unarchiving a repository that was known to have problems right before it
was archived is really really strange. That does not rule out git bugs
of course, but at least this shows that no actual corruption on disk was
initially involved.
Again, I'd suggest you perform your git usage within a script session so
to capture the exact operation performed and error messages produced
when/if similar problems do occur again. Otherwise we're only running
after our tail.
Nicolas
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 7:41 ` R. Tyler Ballance
` (2 preceding siblings ...)
2009-01-07 15:31 ` Nicolas Pitre
@ 2009-01-07 16:07 ` Linus Torvalds
2009-01-07 16:08 ` Linus Torvalds
2009-01-07 22:55 ` R. Tyler Ballance
3 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-07 16:07 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Tue, 6 Jan 2009, R. Tyler Ballance wrote:
> >
> > The thing to do is
> >
> > - untar it on some trusted machine with a local disk and a known-good
> > filesystem.
> >
> > IOW, not that networked samba share.
> >
> > - verify that it really does happen on that machine, with that untarred
> > image. Because maybe it doesn't.
>
> Unfortunately it doesn't
Well, that's not necessarily "unfortunate". It does actually end up
showing that the objects themselves were apparently never really corrupt.
So there is no fundamental data structure corrupttion - because when you
copy the repository, it's all good agin!
In other words, that's not a worthless piece of information at all, and it
does tell us a lot, namely that the corruption was never real long-term
data corruption of the git object archive, but something local and
temporary. Again, we're really back to either:
- it could be some _temporary_ git corruption caused internally inside a
git process - ie a wild pointer, or perhaps a race condition (but we
don't really use threading in 1.6.0.4 unless you ask for it, and even
then just for pack-file generation)
- or it's the disk cache corruption, and the tar/untar ended up flushing
it.
And quite frankly, since the corruption seems to be site-specific, I
really do suspect the second case. Although it's possible, of course, that
it could be some compiler issue that makes _your_ binaries have issues
even when nobody else sees it.
> what I did notice was this when I did a `git
> status` in the directory right after untarring:
> tyler@grapefruit:~/jburgess_main> git status
> #
> # ---impressive amount of file names fly by---
> # ----snip---
> #
> # Untracked files:
> # (use "git add <file>..." to include in what will be
> committed)
> #
> # artwork/
> # bt/
> # flash/
Hmm. That's actually _normal_ under some circumstances. At least with
older git versions, or if your .git/index file couldn't be rewritten for
some reason - your existing index file contains all the old stat
information, and if git cannot (or, in the case of older git version, just
will not) refresh it automatically, it will show all the files as changed,
even if it's just the inode number that really changed.
A _normal_ git install should have auto-refreshed the index, though.
Unless the tar archive only contained the ".git" directory, and not the
checkout?
> tyler@grapefruit:~/jburgess_main>
>
> Basically, somehow Git thinks that *every* file in the repository is
> deleted at this point. I went ahead and performed a `git reset --hard`
> to see if the issue would manifest itself thereafter, but it did not.
That would be what I'd expect if you had only tarred up .git, although
then I wouldn't have expected those "Untracked files:". Hmm. Without being
able to look at the archive, I'm just guessing randomly.
> I did try to do a git-fsck(1), and this is what I got:
> tyler@grapefruit:~/jburgess_main> /usr/local/bin/git fsck --full
> [1] 19381 segmentation fault /usr/local/bin/git fsck --full
> tyler@grapefruit:~/jburgess_main>
.. and that's the unrelated fsck bug that got fixed later.
> > The hope is that you caught the corruption in the cache, and it
> > actually got written out to the tar-file. But if it _is_ a disk cache
> > (well, network cache) issue, maybe the IO required to tar everything up
> > was enough to flush it, and the tar-file actually _works_ because it
> > got repopulated correctly.
>
> When I was working through this with Jan, one of the things that we did
> was move the actual object file in .git/objects, they existed so maybe I
> could look into those to check?
Yes. If you have any bad loose objects, if you compare them to the good
objects with the same name, that's going to be interesting information.
The pattern of corruption can be very telling. For example, on Linux, a
disk cache corruption would usually be at 4kB block boundaries, because
that's the granularity of the cache. While a bit error would be obvious
etc etc.
> I checked with our operations team, and contrary to my suspicion (your
> NFS comment piqued my curiosity), these disks that are actually on the
> machines are not NFS mounts but rather local disk arrays.
Ok, that generally makes caching much simpler. What filesystem?
Is there anything else that you do that is site-specific and/or slightly
different? For example, a long time ago we had a bug related to CRLF
conversion which would cause a use-after-free problem, and that would
corrupt the data internally to git.
And dobody else saw it than this one person, and it was a total mystery to
everybody until we realized that he used this one feature that nobody else
was using. So as you're on OS X, I assume you don't have CRLF conversion,
but maybe you use some other feature that we support but nobody really
actually uses. Like keyword expansion or something?
Oh - that would also explain why you got all those entries in "git status"
that went away when you did a "git reset --hard": if you had some keyword
expansion (or CRLF) enabled in the original users "~/.gitconfig", that
checkout would have had expansion/CRLF/whatever conversion, but then when
you tarred/untarred it on another setup, the expansion would be seen as a
difference because it wasn't enabled.
Hmm?
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 16:07 ` Linus Torvalds
@ 2009-01-07 16:08 ` Linus Torvalds
2009-01-07 22:55 ` R. Tyler Ballance
1 sibling, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-07 16:08 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Wed, 7 Jan 2009, Linus Torvalds wrote:
>
> And dobody else saw it than this one person, and it was a total mystery to
> everybody until we realized that he used this one feature that nobody else
> was using. So as you're on OS X, I assume you don't have CRLF conversion,
> but maybe you use some other feature that we support but nobody really
> actually uses. Like keyword expansion or something?
>
> Oh - that would also explain why you got all those entries in "git status"
> that went away when you did a "git reset --hard": if you had some keyword
> expansion (or CRLF) enabled in the original users "~/.gitconfig", that
> checkout would have had expansion/CRLF/whatever conversion, but then when
> you tarred/untarred it on another setup, the expansion would be seen as a
> difference because it wasn't enabled.
Btw, if you untar it again, and just do a "git diff", that should show any
such effects. Rather than showing just that something changed, it should
show _how_ it changed.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 16:07 ` Linus Torvalds
2009-01-07 16:08 ` Linus Torvalds
@ 2009-01-07 22:55 ` R. Tyler Ballance
2009-01-07 23:29 ` Linus Torvalds
1 sibling, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-07 22:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 4576 bytes --]
On Wed, 2009-01-07 at 08:07 -0800, Linus Torvalds wrote:
> Well, that's not necessarily "unfortunate". It does actually end up
> showing that the objects themselves were apparently never really corrupt.
>
> So there is no fundamental data structure corrupttion - because when you
> copy the repository, it's all good agin!
> - it could be some _temporary_ git corruption caused internally inside a
> git process - ie a wild pointer, or perhaps a race condition (but we
> don't really use threading in 1.6.0.4 unless you ask for it, and even
> then just for pack-file generation)
I have a feeling it's something like this, one of our operations guys
did some research while I was looking at code and he came across this:
On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
git-merge is using too much RAM, and failing to malloc() but
NOT
> reporting it. This is all sorts of bad:
>
> A) using an unscalable amount of RAM
> B) failing to detect malloc() failure
> C) reporting file corruption instead
> I was able to reproduce this.
>
> limit ~1.5GB -> corrupt file
> limit ~3GB -> magically no longer corrupt.
>
> The false fail may be limited to git-merge, but git status also
> allocates the same amount of RAM.
>
> To temporarily work around this problem, issue this once you
log in to
> a dev box:
>
> tcsh:
> limit vmemoryuse 3000000
> bash:
> ulimit -v 3000000
>
> Be gentle.
> And quite frankly, since the corruption seems to be site-specific, I
> really do suspect the second case. Although it's possible, of course, that
> it could be some compiler issue that makes _your_ binaries have issues
> even when nobody else sees it.
I think you're correct insofar that our major site-specific alteration
has come up on the mailing list before (okay maybe two site-specific
things).
* Our Git repo is ~7.1GB
* ulimit -v is set to ~1.5G
I think I know how this could be failing and corrupting things (assuming
it's malloc(2)) related.
What I'm thinking is that in xmalloc() or one of the other x*)_
functions, the malloc(size) is failing because of the ulimits, and then
the potentially somewhere it's silently failing or maybe even
accidentally returning one of those "malloc(1)" pointers?
I've got two new tarred repositories from two developers the issue
happened to today, so I'm flush full of sample repositories to try stuff
on :)
>
> Hmm. That's actually _normal_ under some circumstances. At least with
> older git versions, or if your .git/index file couldn't be rewritten for
> some reason - your existing index file contains all the old stat
> information, and if git cannot (or, in the case of older git version, just
> will not) refresh it automatically, it will show all the files as changed,
> even if it's just the inode number that really changed.
>
> A _normal_ git install should have auto-refreshed the index, though.
> Unless the tar archive only contained the ".git" directory, and not the
> checkout?
I believe the issues I noticed when untarring the repo were a red
herring, I did the `git diff` after untarring and I noticed that only a
certain set of files where changed, I'm willing to go so far as to guess
that they were the files affected in the corrupted packs. Of the 32k
files in our repository, 98 were actually different after untarring
(according to git-diff(1))
> And dobody else saw it than this one person, and it was a total mystery to
> everybody until we realized that he used this one feature that nobody else
> was using. So as you're on OS X, I assume you don't have CRLF conversion,
> but maybe you use some other feature that we support but nobody really
> actually uses. Like keyword expansion or something?
The two new folks this happened to today had nothing "special" about
them other than the ulimit.
I've got the script(1) output of performing git-ls-files(1) and some
other commands that I tried, nothing they output was particular
informative or interesting, and I don't think it will help if this
really is a memory related issue, that said I'd be more than happy to
send it to a couple of you (Junio, Linus, Nico).
I'm *so* ready for this bug to die >=\
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 22:55 ` R. Tyler Ballance
@ 2009-01-07 23:29 ` Linus Torvalds
2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance
2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds
0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-07 23:29 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> > git process - ie a wild pointer, or perhaps a race condition (but we
> > don't really use threading in 1.6.0.4 unless you ask for it, and even
> > then just for pack-file generation)
>
> I have a feeling it's something like this, one of our operations guys
> did some research while I was looking at code and he came across this:
>
> On Wed, 2009-01-07 at 14:17 -0800, Ken Brownfield wrote:
> git-merge is using too much RAM, and failing to malloc() but
> NOT
> > reporting it. This is all sorts of bad:
> >
> > A) using an unscalable amount of RAM
> > B) failing to detect malloc() failure
> > C) reporting file corruption instead
Well, I dont' think that's exactly it. git internally doesn't really use
malloc at all, and uses xmalloc() instead which will die() if the malloc
fails. So there's almost certainly no "failing to detect failures"
Yes, there's a few places that don't use the wrapper, but they should be
safe (eg either they SIGSEGV, or they are like create_delta_index() and
just create a sub-optimal pack with a warning).
HOWEVER:
> > I was able to reproduce this.
> >
> > limit ~1.5GB -> corrupt file
> > limit ~3GB -> magically no longer corrupt.
That is interesting, although I also worry that there might be other
issues going on (ie since you've reported thigns magically fixing
themselves, maybe the ulimit tests just _happened_ to show that, even if
it wasn't the core reason).
BUT! This is definitely worth looking at.
For example, we do have some cases where we try to do "mmap()", and if it
fails, we try to free some memory and try again. In particular, in
xmmap(), if an mmap() fails - which may be due to running out of virtual
address space - we'll actually try to release some pack-file memory and
try again. Maybe there's a bug there - and it would be one that seldom
triggers for others.
> I think you're correct insofar that our major site-specific alteration
> has come up on the mailing list before (okay maybe two site-specific
> things).
> * Our Git repo is ~7.1GB
> * ulimit -v is set to ~1.5G
It is certainly possible. It's too bad that it's private, because it makes
it _much_ harder to try to pinpoint this.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 23:29 ` Linus Torvalds
@ 2009-01-08 0:28 ` R. Tyler Ballance
2009-01-08 0:48 ` Linus Torvalds
2009-01-08 2:52 ` Linus Torvalds
2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds
1 sibling, 2 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 0:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
[-- Attachment #1: Type: text/plain, Size: 2370 bytes --]
On Wed, 2009-01-07 at 15:29 -0800, Linus Torvalds wrote:
> It is certainly possible. It's too bad that it's private, because it makes
> it _much_ harder to try to pinpoint this.
My most esteemed colleague (Ken aka kb) who pointed out the memory issue
was on the right path (I think), and I have a reproduction case you can
try with your very own Linux kernel tree!
WOO!
I set ulimit -v really low (150M), and the operations I made got an
mmap(2) fatal error, but there is a sweet spot that I found, see the
transcript below. I basically chose an arbitrary revision from a couple
of weeks ago, and rolled the repository back to that point, then I tried
with iterations of ulimit -v 150, 250, 450, and then back down to 350.
tyler@grapefruit:~/source/git/linux-2.6> limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8MB
coredumpsize 0kB
memoryuse 2561MB
maxproc 24564
descriptors 1024
memorylocked 64kB
addressspace unlimited
maxfilelocks unlimited
sigpending 24564
msgqueue 819200
nice 0
rt_priority 0
tyler@grapefruit:~/source/git/linux-2.6> export
START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666
tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START
HEAD is now at 56d18e9 Merge branch 'upstream' of
git://ftp.linux-mips.org/pub/scm/upstream-linus
tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 *
1024" | bc -l`
tyler@grapefruit:~/source/git/linux-2.6> git pull
error: failed to read object
be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923
from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack
fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is
corrupted
tyler@grapefruit:~/source/git/linux-2.6>
I've tried this a couple of times, and it does seem to be reproducible,
let me know if you have any issues reproducing it locally and I'll try
to dig into it more with valgrind or something a bit more pin-pointing
than "ulimit -v && try, try again"
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-07 23:29 ` Linus Torvalds
2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance
@ 2009-01-08 0:37 ` Linus Torvalds
2009-01-08 0:49 ` R. Tyler Ballance
1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 0:37 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Wed, 7 Jan 2009, Linus Torvalds wrote:
>
> > > limit ~1.5GB -> corrupt file
> > > limit ~3GB -> magically no longer corrupt.
>
> That is interesting, although I also worry that there might be other
> issues going on (ie since you've reported thigns magically fixing
> themselves, maybe the ulimit tests just _happened_ to show that, even if
> it wasn't the core reason).
>
> BUT! This is definitely worth looking at.
>
> For example, we do have some cases where we try to do "mmap()", and if it
> fails, we try to free some memory and try again. In particular, in
> xmmap(), if an mmap() fails - which may be due to running out of virtual
> address space - we'll actually try to release some pack-file memory and
> try again. Maybe there's a bug there - and it would be one that seldom
> triggers for others.
Ho humm. We really do have some interesting things there.
Is this a 64-bit machine? I didn't think OS X did that, but if there is
some limited 64-bit support there, maybe "sizeof(void *)" is 8, then we
default the default git pack-window to a pretty healthy 1GB.
I could easily see that if you have a virtual memory size limit of 1.5GB,
and the pack window size is 1GB, we might have trouble. Because we could
only keep one such pack window in memory at a time.
I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we
really had issues with the window handling.
Anyway, _if_ your system has 64-bit pointers, then _maybe_ something the
default 1GB pack window causes problem.
If so, then adding a
[core]
packedgitwindowsize = 64M
might make a difference. It would certainly be very interesting to hear if
there's any impact.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance
@ 2009-01-08 0:48 ` Linus Torvalds
2009-01-08 0:57 ` R. Tyler Ballance
2009-01-08 2:21 ` James Pickens
2009-01-08 2:52 ` Linus Torvalds
1 sibling, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 0:48 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> My most esteemed colleague (Ken aka kb) who pointed out the memory issue
> was on the right path (I think), and I have a reproduction case you can
> try with your very own Linux kernel tree!
>
> WOO!
>
> I set ulimit -v really low (150M), and the operations I made got an
> mmap(2) fatal error, but there is a sweet spot that I found, see the
> transcript below.
This is indeed the packfile mapping. The sweet spot you found depends on
how big the biggest two pack-files are, I do believe.
And if you do that
[core]
packedgitwindowsize = 64M
I think you'll find that it works. Of course, with a _really_ low ulimit,
you'd need to make it even smaller, but at some point you start hitting
other problems than the pack-file limits, ie just the simple fact that git
wants and expects you to have a certain amount of memory available ;)
Can you cnfirm that your "reproducible" case starts working with that
addition to your ~/.gitconfig? If so, the solution is pretty simple: we
should just lower the default pack windowsize.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds
@ 2009-01-08 0:49 ` R. Tyler Ballance
2009-01-08 1:01 ` Linus Torvalds
0 siblings, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 0:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 2976 bytes --]
On Wed, 2009-01-07 at 16:37 -0800, Linus Torvalds wrote:
>
> On Wed, 7 Jan 2009, Linus Torvalds wrote:
> >
> > > > limit ~1.5GB -> corrupt file
> > > > limit ~3GB -> magically no longer corrupt.
> >
> > That is interesting, although I also worry that there might be other
> > issues going on (ie since you've reported thigns magically fixing
> > themselves, maybe the ulimit tests just _happened_ to show that, even if
> > it wasn't the core reason).
> >
> > BUT! This is definitely worth looking at.
> >
> > For example, we do have some cases where we try to do "mmap()", and if it
> > fails, we try to free some memory and try again. In particular, in
> > xmmap(), if an mmap() fails - which may be due to running out of virtual
> > address space - we'll actually try to release some pack-file memory and
> > try again. Maybe there's a bug there - and it would be one that seldom
> > triggers for others.
>
> Ho humm. We really do have some interesting things there.
Always enjoyable when these mail threads get this deep ;)
>
> Is this a 64-bit machine? I didn't think OS X did that, but if there is
> some limited 64-bit support there, maybe "sizeof(void *)" is 8, then we
> default the default git pack-window to a pretty healthy 1GB.
I was only mentioning OS X with regards to the Samba/NFS red herring,
the rest of our operations are on 64-bit Linux machines.
The machine I reproduced this on ("Public repo case!") is the following:
tyler@grapefruit:~> uname -a
Linux grapefruit.corp.slide.com 2.6.27.7-9-default #1 SMP
2008-12-04 18:10:04 +0100 x86_64 x86_64 x86_64 GNU/Linux
tyler@grapefruit:~> cat /etc/issue
Welcome to openSUSE 11.1 - Kernel \r (\l).
The machines we're experiencing this issue on "in the wild" are:
xdev3 (master)% uname -a
Linux xdev3 2.6.24-22-server #1 SMP Mon Nov 24 20:06:28 UTC 2008
x86_64 GNU/Linux
xdev3 (master)% cat /etc/issue
Ubuntu 8.04.1 \n \l
>
> I could easily see that if you have a virtual memory size limit of 1.5GB,
> and the pack window size is 1GB, we might have trouble. Because we could
> only keep one such pack window in memory at a time.
The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW
>
> I have _not_ looked at the code, though. I'd have expected a SIGSEGV if we
> really had issues with the window handling.
>
> Anyway, _if_ your system has 64-bit pointers, then _maybe_ something the
> default 1GB pack window causes problem.
>
> If so, then adding a
>
> [core]
> packedgitwindowsize = 64M
>
> might make a difference. It would certainly be very interesting to hear if
> there's any impact.
I can try this still if you'd like, but it doesn't seem like that'd be
the issue since we're already lowering the window size system-wide
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:48 ` Linus Torvalds
@ 2009-01-08 0:57 ` R. Tyler Ballance
2009-01-08 1:08 ` Linus Torvalds
2009-01-08 2:21 ` James Pickens
1 sibling, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 0:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]
On Wed, 2009-01-07 at 16:48 -0800, Linus Torvalds wrote:
>
> On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> >
> > My most esteemed colleague (Ken aka kb) who pointed out the memory issue
> > was on the right path (I think), and I have a reproduction case you can
> > try with your very own Linux kernel tree!
> >
> > WOO!
> >
> > I set ulimit -v really low (150M), and the operations I made got an
> > mmap(2) fatal error, but there is a sweet spot that I found, see the
> > transcript below.
>
> This is indeed the packfile mapping. The sweet spot you found depends on
> how big the biggest two pack-files are, I do believe.
>
> And if you do that
>
> [core]
> packedgitwindowsize = 64M
>
> I think you'll find that it works. Of course, with a _really_ low ulimit,
> you'd need to make it even smaller, but at some point you start hitting
> other problems than the pack-file limits, ie just the simple fact that git
> wants and expects you to have a certain amount of memory available ;)
>
> Can you cnfirm that your "reproducible" case starts working with that
> addition to your ~/.gitconfig? If so, the solution is pretty simple: we
> should just lower the default pack windowsize.
This certainly corrected the issue, is there some magic
packedgitwindowsize that i should be looking at my own repository (our
internal one) in order to prevent the issue from occurring?
Looking into .git/objects/pack, I think the two biggest pack files are
3.5G and 177MBG respectively :-!
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:49 ` R. Tyler Ballance
@ 2009-01-08 1:01 ` Linus Torvalds
2009-01-08 1:06 ` R. Tyler Ballance
0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 1:01 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML
On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> I was only mentioning OS X with regards to the Samba/NFS red herring,
> the rest of our operations are on 64-bit Linux machines.
Ahh, ok. Good.
> > I could easily see that if you have a virtual memory size limit of 1.5GB,
> > and the pack window size is 1GB, we might have trouble. Because we could
> > only keep one such pack window in memory at a time.
>
> The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW
Interesting. So you already had to lower it. However, now that you mention
it, and now that I search for your emails about it on the mailing list (I
don't normally read the mailing list except very occasionally), I see your
patch that does
#define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85
...
packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE);
which is actually very bad.
It's bad for several reasons:
- 85% of the virtual address space is actually pessimal.
You need space for AT LEAST two full-sized windows, so you need less
than 50%.
- the way that variable is used, it _has_ to be a multiple of the page
size. In fact, it needs to be a multiple of _twice_ the page size. So
just doing a random fraction of the rlimit is not correct.
Setting it in the .gitconfig does it right, though.
> > If so, then adding a
> >
> > [core]
> > packedgitwindowsize = 64M
> >
> > might make a difference. It would certainly be very interesting to hear if
> > there's any impact.
>
> I can try this still if you'd like, but it doesn't seem like that'd be
> the issue since we're already lowering the window size system-wide
Please do try, at least if your local git changes still match that patch I
found, because that patch generates problems.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 1:01 ` Linus Torvalds
@ 2009-01-08 1:06 ` R. Tyler Ballance
0 siblings, 0 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 1:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Nicolas Pitre, Jan Krüger, Git ML
[-- Attachment #1: Type: text/plain, Size: 2255 bytes --]
On Wed, 2009-01-07 at 17:01 -0800, Linus Torvalds wrote:
>
> On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> >
> > I was only mentioning OS X with regards to the Samba/NFS red herring,
> > the rest of our operations are on 64-bit Linux machines.
>
> Ahh, ok. Good.
>
> > > I could easily see that if you have a virtual memory size limit of 1.5GB,
> > > and the pack window size is 1GB, we might have trouble. Because we could
> > > only keep one such pack window in memory at a time.
> >
> > The DEFAULT_PACKED_GIT_WINDOW_SIZE in our local builds is 256M, FWIW
>
> Interesting. So you already had to lower it. However, now that you mention
> it, and now that I search for your emails about it on the mailing list (I
> don't normally read the mailing list except very occasionally), I see your
> patch that does
>
> #define DYNAMIC_WINDOW_SIZE_PERCENTAGE 0.85
> ...
> packed_git_window_size = (unsigned int)(as->rlim_cur * DYNAMIC_WINDOW_SIZE_PERCENTAGE);
>
> which is actually very bad.
>
> It's bad for several reasons:
>
> - 85% of the virtual address space is actually pessimal.
>
> You need space for AT LEAST two full-sized windows, so you need less
> than 50%.
>
> - the way that variable is used, it _has_ to be a multiple of the page
> size. In fact, it needs to be a multiple of _twice_ the page size. So
> just doing a random fraction of the rlimit is not correct.
This patch never made it into any of our Git builds because my flight
landed and it wasn't stable enough (and as you pointed out, it sucks ;))
>
> Setting it in the .gitconfig does it right, though.
>
> > > If so, then adding a
> > >
> > > [core]
> > > packedgitwindowsize = 64M
> > >
> > > might make a difference. It would certainly be very interesting to hear if
> > > there's any impact.
> >
> > I can try this still if you'd like, but it doesn't seem like that'd be
> > the issue since we're already lowering the window size system-wide
>
> Please do try, at least if your local git changes still match that patch I
> found, because that patch generates problems.
See my prior reply in "Public repo case!" sent at 4:57PST
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:57 ` R. Tyler Ballance
@ 2009-01-08 1:08 ` Linus Torvalds
2009-01-08 1:29 ` Linus Torvalds
0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 1:08 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> >
> > Can you cnfirm that your "reproducible" case starts working with that
> > addition to your ~/.gitconfig? If so, the solution is pretty simple: we
> > should just lower the default pack windowsize.
>
> This certainly corrected the issue, is there some magic
> packedgitwindowsize that i should be looking at my own repository (our
> internal one) in order to prevent the issue from occurring?
>
> Looking into .git/objects/pack, I think the two biggest pack files are
> 3.5G and 177MBG respectively :-!
So there's a few rules to packedgitwindowsize:
- we need to be able to have at least two windows open at a time, in
addition to all the "normal" memory git needs just for objects, of
course. And quite frankly, you'd be better off with a few more windows,
even if that obviously implies smaller windows.
- the window size really wants to be a round power-of-two number, and at
_least_ it wants to be a nice multiple of the 2*page size.
So if you have a virtual memory limit of 1.5GB, I'd hesitate to make the
pack window size less than 512M, and 256M is probably better. That way,
I'd expect you to be able to always have at least four windows open
(assuming a reasonably generous half a gigabyte for "other stuff").
And quite frankly, there's not a huge downside to making them smaller. At
"just" 32MB, you'll still fit plenty of data in one pack window, and while
it will cost you a few mmap/unmap's to switch windows around, most
operations simply will not likely ever notice. At least not under Linux,
where mmap/munmap is pretty cheap.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 1:08 ` Linus Torvalds
@ 2009-01-08 1:29 ` Linus Torvalds
2009-01-08 1:46 ` Shawn O. Pearce
0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 1:29 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, Linus Torvalds wrote:
>
> So there's a few rules to packedgitwindowsize:
>
> - we need to be able to have at least two windows open at a time, in
> addition to all the "normal" memory git needs just for objects, of
> course. And quite frankly, you'd be better off with a few more windows,
> even if that obviously implies smaller windows.
Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe
there are cases where we want more than two windows alive. And maybe there
aren't even that, and we can always make do with just one.
So I will _not_ guarantee that "at least two pack windows" is necessarily
the right answer. The windowing code was mostly other people doing it. I
think Shawn and Nico.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 1:29 ` Linus Torvalds
@ 2009-01-08 1:46 ` Shawn O. Pearce
0 siblings, 0 replies; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 1:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 7 Jan 2009, Linus Torvalds wrote:
> >
> > So there's a few rules to packedgitwindowsize:
> >
> > - we need to be able to have at least two windows open at a time, in
> > addition to all the "normal" memory git needs just for objects, of
> > course. And quite frankly, you'd be better off with a few more windows,
> > even if that obviously implies smaller windows.
>
> Btw, I'm not 100% certain of this. Somebody should double-check me. Maybe
> there are cases where we want more than two windows alive. And maybe there
> aren't even that, and we can always make do with just one.
>
> So I will _not_ guarantee that "at least two pack windows" is necessarily
> the right answer. The windowing code was mostly other people doing it. I
> think Shawn and Nico.
I was fairly certain we needed at least two windows open at once,
but reviewing the code in sha1_file.c I don't see a reason for that
restriction anymore.
I think it used to have to do with the delta reconstruction; to
unpack a delta we would read the delta header from one window,
but we may need base data from another. The delta unpack code
kept the delta window pinned in use, so we couldn't replace it
to access base data from elsewhere, hence we needed two windows.
Thinking about it now I don't recall how we handled the recusion
on a delta chain longer than 2. ;-)
But looking at the code we have long since refactored it so this
isn't an issue anymore. We release the window between reading
the delta header and reading the base, so the delta window can be
replaced if necessary. I think the "2 window minimum" is just a
performance suggestion, not a requirement.
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:48 ` Linus Torvalds
2009-01-08 0:57 ` R. Tyler Ballance
@ 2009-01-08 2:21 ` James Pickens
2009-01-08 2:43 ` Shawn O. Pearce
2009-01-08 2:52 ` Boyd Stephen Smith Jr.
1 sibling, 2 replies; 49+ messages in thread
From: James Pickens @ 2009-01-08 2:21 UTC (permalink / raw)
To: Git ML
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb,
Linus Torvalds
On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Can you cnfirm that your "reproducible" case starts working with that
> addition to your ~/.gitconfig? If so, the solution is pretty simple: we
> should just lower the default pack windowsize.
Umm... isn't that more of a workaround than a solution? I.e., if you lower
the default pack windowsize, couldn't the corruption still happen under the
right conditions?
James
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 2:21 ` James Pickens
@ 2009-01-08 2:43 ` Shawn O. Pearce
2009-01-08 5:40 ` Junio C Hamano
2009-01-08 2:52 ` Boyd Stephen Smith Jr.
1 sibling, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 2:43 UTC (permalink / raw)
To: James Pickens
Cc: Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb,
Linus Torvalds
James Pickens <jepicken@gmail.com> wrote:
> On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Can you cnfirm that your "reproducible" case starts working with that
> > addition to your ~/.gitconfig? If so, the solution is pretty simple: we
> > should just lower the default pack windowsize.
>
> Umm... isn't that more of a workaround than a solution? I.e., if you lower
> the default pack windowsize, couldn't the corruption still happen under the
> right conditions?
Uhm, yea. So I managed to reproduce it on a Linux system here.
Different object ids than R. Tyler's case, but I'm going to try
to debug it and see why we are getting these.
For those following along at home, Linus' 2.6 tree:
$ ulimit -v `echo '150 * 1024'|bc -l`
$ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666
HEAD is now at 56d18e9... Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
$ git merge 9e42d0cf5020aaf217433cad1a224745241d212a
Updating 56d18e9..9e42d0c
error: failed to read delta base object ef135b90084f3c54fccea4e273aeff029db2d873 at offset 48342508 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack
error: failed to read delta base object c4e828b71d96622bb258938d69aab9cec53d5cae at offset 128427683 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack
error: failed to read object 3cd5a6463cfd9306095bf6312a9b7ab09d4f2f5d at offset 128427777 from .git/objects/pack/pack-edb47354be787909e05c15bd1d9eb8b4684d2e4d.pack
fatal: object 3cd5a6463cfd9306095bf6312a9b7ab09d4f2f5d is corrupted
No, the repository is not corrupt. We f'd up our memory management
somewhere.
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance
2009-01-08 0:48 ` Linus Torvalds
@ 2009-01-08 2:52 ` Linus Torvalds
2009-01-08 3:01 ` Shawn O. Pearce
1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 2:52 UTC (permalink / raw)
To: R. Tyler Ballance; +Cc: Nicolas Pitre, Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
>
> tyler@grapefruit:~/source/git/linux-2.6> git pull
> error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923
> from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack
> fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted
Btw, this is an interesting error message, mostly because of what is
_not_ there.
In particular, it doesn't report any reason _why_ it failed to read the
object, which as far as I can tell can happen for only one reason:
unpack_compressed_entry() returns NULL, and that path is the only thing
that can do so without a message.
And it only does it if zlib fails.
Now, zlib can fail because the unpacking fails, but it can fail for other
cases too.
And the thing is, we don't check/report those kinds of failure cases very
well. Which really bit us here, because if we had checked the return value
of inflateInit(), we'd almost certainly would have gotten a nice big "you
ran out of memory" thing, and we wouldn't have been chasing this down as a
corruption issue.
We probably should wrap all the "inflateInit()" calls, and do something
like
static void xinflateInit(z_streamp strm)
{
const char *err;
switch (inflateInit(strm)) {
case Z_OK:
return;
case Z_MEM_ERROR:
err = "out of memory";
break;
case Z_VERSION_ERROR:
err = "wrong version";
break;
default:
err = "error";
}
die("inflateInit: %s (%s)", err,
strm->msg ? strm->msg : "no message");
}
or similar. That way we'd get good error reports when we run out of
memory, rather than consider it to be a corruption issue.
We could also try to make a few of these wrappers actually release some of
the memory (the way xmmap() does), but there are likely diminishing
returns. And the much more important issue is the proper error reporting:
if we had reported "out of memory", we'd not have spent so much time
chasing disk corruption etc.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 2:21 ` James Pickens
2009-01-08 2:43 ` Shawn O. Pearce
@ 2009-01-08 2:52 ` Boyd Stephen Smith Jr.
1 sibling, 0 replies; 49+ messages in thread
From: Boyd Stephen Smith Jr. @ 2009-01-08 2:52 UTC (permalink / raw)
To: James Pickens
Cc: Git ML, R. Tyler Ballance, Nicolas Pitre, Jan Krüger, kb,
Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 945 bytes --]
On Wednesday 2009 January 07 20:21:18 James Pickens wrote:
>On Wed, Jan 7, 2009, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> Can you cnfirm that your "reproducible" case starts working with that
>> addition to your ~/.gitconfig? If so, the solution is pretty simple: we
>> should just lower the default pack windowsize.
>
>Umm... isn't that more of a workaround than a solution? I.e., if you lower
>the default pack windowsize, couldn't the corruption still happen under the
>right conditions?
IMHO:
I agree, somewhat. I'm fine with a "die()" message when there's not enough
memory, but either corruption or just a spurious, but scary "<SHA> is
corrupt" messages should be fixed.
--
Boyd Stephen Smith Jr. ,= ,-_-. =.
bss@iguanasuicide.net ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/ \_/
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 2:52 ` Linus Torvalds
@ 2009-01-08 3:01 ` Shawn O. Pearce
2009-01-08 3:06 ` Linus Torvalds
0 siblings, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 3:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 7 Jan 2009, R. Tyler Ballance wrote:
> >
> > tyler@grapefruit:~/source/git/linux-2.6> git pull
> > error: failed to read object be1b87c70af69acfadb8a27a7a76dfb61de92643 at offset 1850923
> > from .git/objects/pack/pack-dbe154052997a05499eb6b4fd90b924da68e799a.pack
> > fatal: object be1b87c70af69acfadb8a27a7a76dfb61de92643 is corrupted
>
> Btw, this is an interesting error message, mostly because of what is
> _not_ there.
>
> In particular, it doesn't report any reason _why_ it failed to read the
> object, which as far as I can tell can happen for only one reason:
> unpack_compressed_entry() returns NULL, and that path is the only thing
> that can do so without a message.
>
> And it only does it if zlib fails.
Ok, well, in this case I've been able to reproduce a zlib inflate
failure on the base object in a 2 deep delta chain. We got back:
#define Z_STREAM_ERROR (-2)
this causes the buffer to be freed and NULL to come back out of
unpack_compressed_entry(), and then everything is corrupt...
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 3:01 ` Shawn O. Pearce
@ 2009-01-08 3:06 ` Linus Torvalds
2009-01-08 3:13 ` Shawn O. Pearce
0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 3:06 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, Shawn O. Pearce wrote:
>
> Ok, well, in this case I've been able to reproduce a zlib inflate
> failure on the base object in a 2 deep delta chain. We got back:
>
> #define Z_STREAM_ERROR (-2)
>
> this causes the buffer to be freed and NULL to come back out of
> unpack_compressed_entry(), and then everything is corrupt...
I bet you actually got an earlier error already from the inflateInit.
The Z_STREAM_ERROR probably comes from inflate() itself - and could very
easily be due to a allocation error in inflateInit leaving the stream data
incomplete.
Let me try wrapping that dang thing and send a patch.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 3:06 ` Linus Torvalds
@ 2009-01-08 3:13 ` Shawn O. Pearce
2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce
0 siblings, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 3:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The Z_STREAM_ERROR probably comes from inflate() itself - and could very
> easily be due to a allocation error in inflateInit leaving the stream data
> incomplete.
>
> Let me try wrapping that dang thing and send a patch.
Yup. I'm actually doing the same thing...
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 3:13 ` Shawn O. Pearce
@ 2009-01-08 3:16 ` Shawn O. Pearce
2009-01-08 3:54 ` Linus Torvalds
0 siblings, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 3:16 UTC (permalink / raw)
To: Linus Torvalds, Junio C Hamano
Cc: R. Tyler Ballance, Nicolas Pitre, Jan Krüger, Git ML, kb
If we are running low on virtual memory we should release pack
windows if zlib's inflateInit fails due to an out of memory error.
It may be that we are running under a low ulimit and are getting
tight on address space. Shedding unused windows may get us
sufficient working space to continue.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > The Z_STREAM_ERROR probably comes from inflate() itself - and could very
> > easily be due to a allocation error in inflateInit leaving the stream data
> > incomplete.
> >
> > Let me try wrapping that dang thing and send a patch.
>
> Yup. I'm actually doing the same thing...
builtin-apply.c | 2 +-
builtin-pack-objects.c | 2 +-
builtin-unpack-objects.c | 2 +-
cache.h | 1 +
http-push.c | 4 ++--
http-walker.c | 4 ++--
index-pack.c | 4 ++--
sha1_file.c | 8 ++++----
wrapper.c | 20 ++++++++++++++++++++
9 files changed, 34 insertions(+), 13 deletions(-)
diff --git a/builtin-apply.c b/builtin-apply.c
index af25ee9..cb2663e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1258,7 +1258,7 @@ static char *inflate_it(const void *data, unsigned long size,
stream.avail_in = size;
stream.next_out = out = xmalloc(inflated_size);
stream.avail_out = inflated_size;
- inflateInit(&stream);
+ xinflateInit(&stream);
st = inflate(&stream, Z_FINISH);
if ((st != Z_STREAM_END) || stream.total_out != inflated_size) {
free(out);
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index e851534..09576c6 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -195,7 +195,7 @@ static int check_pack_inflate(struct packed_git *p,
int st;
memset(&stream, 0, sizeof(stream));
- inflateInit(&stream);
+ xinflateInit(&stream);
do {
in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_in = in;
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 47ed610..cb9edac 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -99,7 +99,7 @@ static void *get_data(unsigned long size)
stream.avail_out = size;
stream.next_in = fill(1);
stream.avail_in = len;
- inflateInit(&stream);
+ xinflateInit(&stream);
for (;;) {
int ret = inflate(&stream, 0);
diff --git a/cache.h b/cache.h
index 231c06d..7d5c38d 100644
--- a/cache.h
+++ b/cache.h
@@ -17,6 +17,7 @@
#if defined(NO_DEFLATE_BOUND) || ZLIB_VERNUM < 0x1200
#define deflateBound(c,s) ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)
#endif
+extern void xinflateInit(z_stream *stream);
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
#define DTYPE(de) ((de)->d_type)
diff --git a/http-push.c b/http-push.c
index a4b7d08..906ca48 100644
--- a/http-push.c
+++ b/http-push.c
@@ -269,7 +269,7 @@ static void start_fetch_loose(struct transfer_request *request)
memset(&request->stream, 0, sizeof(request->stream));
- inflateInit(&request->stream);
+ xinflateInit(&request->stream);
git_SHA1_Init(&request->c);
@@ -310,7 +310,7 @@ static void start_fetch_loose(struct transfer_request *request)
file; also rewind to the beginning of the local file. */
if (prev_read == -1) {
memset(&request->stream, 0, sizeof(request->stream));
- inflateInit(&request->stream);
+ xinflateInit(&request->stream);
git_SHA1_Init(&request->c);
if (prev_posn>0) {
prev_posn = 0;
diff --git a/http-walker.c b/http-walker.c
index 7271c7d..6aa8486 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -142,7 +142,7 @@ static void start_object_request(struct walker *walker,
memset(&obj_req->stream, 0, sizeof(obj_req->stream));
- inflateInit(&obj_req->stream);
+ xinflateInit(&obj_req->stream);
git_SHA1_Init(&obj_req->c);
@@ -183,7 +183,7 @@ static void start_object_request(struct walker *walker,
file; also rewind to the beginning of the local file. */
if (prev_read == -1) {
memset(&obj_req->stream, 0, sizeof(obj_req->stream));
- inflateInit(&obj_req->stream);
+ xinflateInit(&obj_req->stream);
git_SHA1_Init(&obj_req->c);
if (prev_posn>0) {
prev_posn = 0;
diff --git a/index-pack.c b/index-pack.c
index 2931511..c6bfc12 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -275,7 +275,7 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size)
stream.avail_out = size;
stream.next_in = fill(1);
stream.avail_in = input_len;
- inflateInit(&stream);
+ xinflateInit(&stream);
for (;;) {
int ret = inflate(&stream, 0);
@@ -382,7 +382,7 @@ static void *get_data_from_pack(struct object_entry *obj)
stream.avail_out = obj->size;
stream.next_in = src;
stream.avail_in = len;
- inflateInit(&stream);
+ xinflateInit(&stream);
while ((st = inflate(&stream, Z_FINISH)) == Z_OK);
inflateEnd(&stream);
if (st != Z_STREAM_END || stream.total_out != obj->size)
diff --git a/sha1_file.c b/sha1_file.c
index 52d1ead..9aabae2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1196,7 +1196,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon
stream->avail_out = bufsiz;
if (legacy_loose_object(map)) {
- inflateInit(stream);
+ xinflateInit(stream);
return inflate(stream, 0);
}
@@ -1217,7 +1217,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon
/* Set up the stream for the rest.. */
stream->next_in = map;
stream->avail_in = mapsize;
- inflateInit(stream);
+ xinflateInit(stream);
/* And generate the fake traditional header */
stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
@@ -1348,7 +1348,7 @@ unsigned long get_size_from_delta(struct packed_git *p,
stream.next_out = delta_head;
stream.avail_out = sizeof(delta_head);
- inflateInit(&stream);
+ xinflateInit(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
@@ -1585,7 +1585,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
stream.next_out = buffer;
stream.avail_out = size;
- inflateInit(&stream);
+ xinflateInit(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
diff --git a/wrapper.c b/wrapper.c
index 93562f0..f255eef 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -196,3 +196,23 @@ int xmkstemp(char *template)
die("Unable to create temporary file: %s", strerror(errno));
return fd;
}
+
+void xinflateInit(z_stream *stream)
+{
+ switch (inflateInit(stream)) {
+ case Z_OK:
+ return;
+
+ case Z_MEM_ERROR:
+ release_pack_memory(128 * 1024, -1);
+ if (inflateInit(stream) == Z_OK)
+ return;
+ die("Out of memory? inflateInit failed");
+
+ case Z_VERSION_ERROR:
+ die("Wrong zlib version? inflateInit failed");
+
+ default:
+ die("Unknown inflateInit failure");
+ }
+}
--
1.6.1.141.gfe98e
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce
@ 2009-01-08 3:54 ` Linus Torvalds
2009-01-08 5:23 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 3:54 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger,
Git ML, kb
On Wed, 7 Jan 2009, Shawn O. Pearce wrote:
>
> If we are running low on virtual memory we should release pack
> windows if zlib's inflateInit fails due to an out of memory error.
> It may be that we are running under a low ulimit and are getting
> tight on address space. Shedding unused windows may get us
> sufficient working space to continue.
Let's do this (more complete) wrapping instead, ok?
This one _just_ wraps things, btw - it doesn't do the "retry on low memory
error" part, at least not yet. I think that's an independent issue from
the reporting.
Hmm?
Tyler - does this make the corruption errors go away, and be replaced by
hard failures with "out of memory" reporting?
This patch is potentially pretty noisy, on purpose. I didn't remove the
reporting from places that already do so - some of them have stricter
errors than this.
For example: Z_BUF_ERROR can be valid depending on circumstance, so the
wrapper doesn't complain about it, but the caller may not accept it.
Linus
---
builtin-apply.c | 5 ++-
builtin-pack-objects.c | 6 ++--
builtin-unpack-objects.c | 6 ++--
cache.h | 4 +++
http-push.c | 8 +++---
http-walker.c | 8 +++---
index-pack.c | 12 ++++----
sha1_file.c | 24 +++++++++---------
wrapper.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++
9 files changed, 99 insertions(+), 34 deletions(-)
diff --git a/builtin-apply.c b/builtin-apply.c
index 07244b0..ed02b6d 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1253,8 +1253,9 @@ static char *inflate_it(const void *data, unsigned long size,
stream.avail_in = size;
stream.next_out = out = xmalloc(inflated_size);
stream.avail_out = inflated_size;
- inflateInit(&stream);
- st = inflate(&stream, Z_FINISH);
+ git_inflate_init(&stream);
+ st = git_inflate(&stream, Z_FINISH);
+ git_inflate_end(&stream);
if ((st != Z_STREAM_END) || stream.total_out != inflated_size) {
free(out);
return NULL;
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index e851534..cb51916 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -195,16 +195,16 @@ static int check_pack_inflate(struct packed_git *p,
int st;
memset(&stream, 0, sizeof(stream));
- inflateInit(&stream);
+ git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, offset, &stream.avail_in);
stream.next_in = in;
stream.next_out = fakebuf;
stream.avail_out = sizeof(fakebuf);
- st = inflate(&stream, Z_FINISH);
+ st = git_inflate(&stream, Z_FINISH);
offset += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR);
- inflateEnd(&stream);
+ git_inflate_end(&stream);
return (st == Z_STREAM_END &&
stream.total_out == expect &&
stream.total_in == len) ? 0 : -1;
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index 47ed610..9a77323 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -99,10 +99,10 @@ static void *get_data(unsigned long size)
stream.avail_out = size;
stream.next_in = fill(1);
stream.avail_in = len;
- inflateInit(&stream);
+ git_inflate_init(&stream);
for (;;) {
- int ret = inflate(&stream, 0);
+ int ret = git_inflate(&stream, 0);
use(len - stream.avail_in);
if (stream.total_out == size && ret == Z_STREAM_END)
break;
@@ -118,7 +118,7 @@ static void *get_data(unsigned long size)
stream.next_in = fill(1);
stream.avail_in = len;
}
- inflateEnd(&stream);
+ git_inflate_end(&stream);
return buf;
}
diff --git a/cache.h b/cache.h
index 231c06d..49e54fb 100644
--- a/cache.h
+++ b/cache.h
@@ -18,6 +18,10 @@
#define deflateBound(c,s) ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)
#endif
+void git_inflate_init(z_streamp strm);
+void git_inflate_end(z_streamp strm);
+int git_inflate(z_streamp strm, int flush);
+
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
#define DTYPE(de) ((de)->d_type)
#else
diff --git a/http-push.c b/http-push.c
index 7c64609..809002b 100644
--- a/http-push.c
+++ b/http-push.c
@@ -208,7 +208,7 @@ static size_t fwrite_sha1_file(void *ptr, size_t eltsize, size_t nmemb,
do {
request->stream.next_out = expn;
request->stream.avail_out = sizeof(expn);
- request->zret = inflate(&request->stream, Z_SYNC_FLUSH);
+ request->zret = git_inflate(&request->stream, Z_SYNC_FLUSH);
git_SHA1_Update(&request->c, expn,
sizeof(expn) - request->stream.avail_out);
} while (request->stream.avail_in && request->zret == Z_OK);
@@ -268,7 +268,7 @@ static void start_fetch_loose(struct transfer_request *request)
memset(&request->stream, 0, sizeof(request->stream));
- inflateInit(&request->stream);
+ git_inflate_init(&request->stream);
git_SHA1_Init(&request->c);
@@ -309,7 +309,7 @@ static void start_fetch_loose(struct transfer_request *request)
file; also rewind to the beginning of the local file. */
if (prev_read == -1) {
memset(&request->stream, 0, sizeof(request->stream));
- inflateInit(&request->stream);
+ git_inflate_init(&request->stream);
git_SHA1_Init(&request->c);
if (prev_posn>0) {
prev_posn = 0;
@@ -741,7 +741,7 @@ static void finish_request(struct transfer_request *request)
if (request->http_code == 416)
fprintf(stderr, "Warning: requested range invalid; we may already have all the data.\n");
- inflateEnd(&request->stream);
+ git_inflate_end(&request->stream);
git_SHA1_Final(request->real_sha1, &request->c);
if (request->zret != Z_STREAM_END) {
unlink(request->tmpfile);
diff --git a/http-walker.c b/http-walker.c
index 7271c7d..0dbad3c 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -82,7 +82,7 @@ static size_t fwrite_sha1_file(void *ptr, size_t eltsize, size_t nmemb,
do {
obj_req->stream.next_out = expn;
obj_req->stream.avail_out = sizeof(expn);
- obj_req->zret = inflate(&obj_req->stream, Z_SYNC_FLUSH);
+ obj_req->zret = git_inflate(&obj_req->stream, Z_SYNC_FLUSH);
git_SHA1_Update(&obj_req->c, expn,
sizeof(expn) - obj_req->stream.avail_out);
} while (obj_req->stream.avail_in && obj_req->zret == Z_OK);
@@ -142,7 +142,7 @@ static void start_object_request(struct walker *walker,
memset(&obj_req->stream, 0, sizeof(obj_req->stream));
- inflateInit(&obj_req->stream);
+ git_inflate_init(&obj_req->stream);
git_SHA1_Init(&obj_req->c);
@@ -183,7 +183,7 @@ static void start_object_request(struct walker *walker,
file; also rewind to the beginning of the local file. */
if (prev_read == -1) {
memset(&obj_req->stream, 0, sizeof(obj_req->stream));
- inflateInit(&obj_req->stream);
+ git_inflate_init(&obj_req->stream);
git_SHA1_Init(&obj_req->c);
if (prev_posn>0) {
prev_posn = 0;
@@ -243,7 +243,7 @@ static void finish_object_request(struct object_request *obj_req)
return;
}
- inflateEnd(&obj_req->stream);
+ git_inflate_end(&obj_req->stream);
git_SHA1_Final(obj_req->real_sha1, &obj_req->c);
if (obj_req->zret != Z_STREAM_END) {
unlink(obj_req->tmpfile);
diff --git a/index-pack.c b/index-pack.c
index 60ed41a..c0a3d97 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -275,10 +275,10 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size)
stream.avail_out = size;
stream.next_in = fill(1);
stream.avail_in = input_len;
- inflateInit(&stream);
+ git_inflate_init(&stream);
for (;;) {
- int ret = inflate(&stream, 0);
+ int ret = git_inflate(&stream, 0);
use(input_len - stream.avail_in);
if (stream.total_out == size && ret == Z_STREAM_END)
break;
@@ -287,7 +287,7 @@ static void *unpack_entry_data(unsigned long offset, unsigned long size)
stream.next_in = fill(1);
stream.avail_in = input_len;
}
- inflateEnd(&stream);
+ git_inflate_end(&stream);
return buf;
}
@@ -382,9 +382,9 @@ static void *get_data_from_pack(struct object_entry *obj)
stream.avail_out = obj->size;
stream.next_in = src;
stream.avail_in = len;
- inflateInit(&stream);
- while ((st = inflate(&stream, Z_FINISH)) == Z_OK);
- inflateEnd(&stream);
+ git_inflate_init(&stream);
+ while ((st = git_inflate(&stream, Z_FINISH)) == Z_OK);
+ git_inflate_end(&stream);
if (st != Z_STREAM_END || stream.total_out != obj->size)
die("serious inflate inconsistency");
free(src);
diff --git a/sha1_file.c b/sha1_file.c
index 52d1ead..8600b04 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1196,8 +1196,8 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon
stream->avail_out = bufsiz;
if (legacy_loose_object(map)) {
- inflateInit(stream);
- return inflate(stream, 0);
+ git_inflate_init(stream);
+ return git_inflate(stream, 0);
}
@@ -1217,7 +1217,7 @@ static int unpack_sha1_header(z_stream *stream, unsigned char *map, unsigned lon
/* Set up the stream for the rest.. */
stream->next_in = map;
stream->avail_in = mapsize;
- inflateInit(stream);
+ git_inflate_init(stream);
/* And generate the fake traditional header */
stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
@@ -1254,11 +1254,11 @@ static void *unpack_sha1_rest(z_stream *stream, void *buffer, unsigned long size
stream->next_out = buf + bytes;
stream->avail_out = size - bytes;
while (status == Z_OK)
- status = inflate(stream, Z_FINISH);
+ status = git_inflate(stream, Z_FINISH);
}
buf[size] = 0;
if (status == Z_STREAM_END && !stream->avail_in) {
- inflateEnd(stream);
+ git_inflate_end(stream);
return buf;
}
@@ -1348,15 +1348,15 @@ unsigned long get_size_from_delta(struct packed_git *p,
stream.next_out = delta_head;
stream.avail_out = sizeof(delta_head);
- inflateInit(&stream);
+ git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
- st = inflate(&stream, Z_FINISH);
+ st = git_inflate(&stream, Z_FINISH);
curpos += stream.next_in - in;
} while ((st == Z_OK || st == Z_BUF_ERROR) &&
stream.total_out < sizeof(delta_head));
- inflateEnd(&stream);
+ git_inflate_end(&stream);
if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
error("delta data unpack-initial failed");
return 0;
@@ -1585,14 +1585,14 @@ static void *unpack_compressed_entry(struct packed_git *p,
stream.next_out = buffer;
stream.avail_out = size;
- inflateInit(&stream);
+ git_inflate_init(&stream);
do {
in = use_pack(p, w_curs, curpos, &stream.avail_in);
stream.next_in = in;
- st = inflate(&stream, Z_FINISH);
+ st = git_inflate(&stream, Z_FINISH);
curpos += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR);
- inflateEnd(&stream);
+ git_inflate_end(&stream);
if ((st != Z_STREAM_END) || stream.total_out != size) {
free(buffer);
return NULL;
@@ -2017,7 +2017,7 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size
status = error("unable to parse %s header", sha1_to_hex(sha1));
else if (sizep)
*sizep = size;
- inflateEnd(&stream);
+ git_inflate_end(&stream);
munmap(map, mapsize);
return status;
}
diff --git a/wrapper.c b/wrapper.c
index 93562f0..29afa96 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -196,3 +196,63 @@ int xmkstemp(char *template)
die("Unable to create temporary file: %s", strerror(errno));
return fd;
}
+
+/*
+ * zlib wrappers to make sure we don't silently miss errors
+ * at init time.
+ */
+void git_inflate_init(z_streamp strm)
+{
+ const char *err;
+
+ switch (inflateInit(strm)) {
+ case Z_OK:
+ return;
+
+ case Z_MEM_ERROR:
+ err = "out of memory";
+ break;
+ case Z_VERSION_ERROR:
+ err = "wrong version";
+ break;
+ default:
+ err = "error";
+ }
+ die("inflateInit: %s (%s)", err, strm->msg ? strm->msg : "no message");
+}
+
+void git_inflate_end(z_streamp strm)
+{
+ if (inflateEnd(strm) != Z_OK)
+ error("inflateEnd: %s", strm->msg ? strm->msg : "failed");
+}
+
+int git_inflate(z_streamp strm, int flush)
+{
+ int ret = inflate(strm, flush);
+ const char *err;
+
+ switch (ret) {
+ /* Out of memory is fatal. */
+ case Z_MEM_ERROR:
+ die("inflate: out of memory");
+
+ /* Data corruption errors: we may want to recover from them (fsck) */
+ case Z_NEED_DICT:
+ err = "needs dictionary"; break;
+ case Z_DATA_ERROR:
+ err = "data stream error"; break;
+ case Z_STREAM_ERROR:
+ err = "stream consistency error"; break;
+ default:
+ err = "unknown error"; break;
+
+ /* Z_BUF_ERROR: normal, needs a buffer output buffer */
+ case Z_BUF_ERROR:
+ case Z_OK:
+ case Z_STREAM_END:
+ return ret;
+ }
+ error("inflate: %s (%s)", err, strm->msg ? strm->msg : "no message");
+ return ret;
+}
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 3:54 ` Linus Torvalds
@ 2009-01-08 5:23 ` Junio C Hamano
2009-01-08 15:35 ` Linus Torvalds
2009-01-08 15:34 ` Shawn O. Pearce
2009-01-08 18:15 ` R. Tyler Ballance
2 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2009-01-08 5:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Shawn O. Pearce, R. Tyler Ballance, Nicolas Pitre,
Jan Krüger, Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> writes:
> +int git_inflate(z_streamp strm, int flush)
> +{
> +...
> + /* Z_BUF_ERROR: normal, needs a buffer output buffer */
> + case Z_BUF_ERROR:
Thanks, but "needs a buffer output buffer" made me scratch my head
somewhat.
... Z_BUF_ERROR if no progress is possible or if there was not enough
room in the output buffer when Z_FINISH is used. Note that Z_BUF_ERROR
is not fatal, and inflate() can be called again with more input and more
output space to continue decompressing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 2:43 ` Shawn O. Pearce
@ 2009-01-08 5:40 ` Junio C Hamano
2009-01-08 6:04 ` Shawn O. Pearce
0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2009-01-08 5:40 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: James Pickens, Git ML, R. Tyler Ballance, Nicolas Pitre,
Jan Krüger, kb, Linus Torvalds
"Shawn O. Pearce" <spearce@spearce.org> writes:
> For those following along at home, Linus' 2.6 tree:
>
> $ ulimit -v `echo '150 * 1024'|bc -l`
> $ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666
Hmm, without any "wrap zlib to die on error" patch, this step already
fails with:
$ git checkout 56d18e9932ebf4e8eca42d2ce509450e6c9c1666
fatal: Out of memory? mmap failed: Cannot allocate memory
I guess that is because our test repositories are packed differently.
I'll retry after repacking..
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: Public repro case! Re: [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file
2009-01-08 5:40 ` Junio C Hamano
@ 2009-01-08 6:04 ` Shawn O. Pearce
0 siblings, 0 replies; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 6:04 UTC (permalink / raw)
To: Junio C Hamano
Cc: James Pickens, Git ML, R. Tyler Ballance, Nicolas Pitre,
Jan Krüger, kb, Linus Torvalds
Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> > For those following along at home, Linus' 2.6 tree:
> >
> > $ ulimit -v `echo '150 * 1024'|bc -l`
> > $ git co 56d18e9932ebf4e8eca42d2ce509450e6c9c1666
>
> Hmm, without any "wrap zlib to die on error" patch, this step already
> fails with:
>
> $ git checkout 56d18e9932ebf4e8eca42d2ce509450e6c9c1666
> fatal: Out of memory? mmap failed: Cannot allocate memory
>
> I guess that is because our test repositories are packed differently.
> I'll retry after repacking..
Yup. I actually did something more like this to get the test
repository:
git clone git://android.git.kernel.org/kernel/common.git
git fetch git://kernel.org/pub/.../torvalds/linux-2.6.git master
The android kernel repository I had handy on my local system was
quite a bit away from Linus' so I wound up with two different but
sizable packs. I thought android was closer to upstream, but its
apparently not. I started from there because it was local and I
thought it would be a quick way to get a test environment, but
sadly it didn't even have the base 56d18e we were talking about.
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 3:54 ` Linus Torvalds
2009-01-08 5:23 ` Junio C Hamano
@ 2009-01-08 15:34 ` Shawn O. Pearce
2009-01-08 16:14 ` Linus Torvalds
2009-01-08 18:15 ` R. Tyler Ballance
2 siblings, 1 reply; 49+ messages in thread
From: Shawn O. Pearce @ 2009-01-08 15:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger,
Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Let's do this (more complete) wrapping instead, ok?
Ack.
> This one _just_ wraps things, btw - it doesn't do the "retry on low memory
> error" part, at least not yet. I think that's an independent issue from
> the reporting.
I still think we should try to reduce pack memory usage when we get
oom from zlib and retry the current operation once. We do it almost
everywhere else and it works relatively well.
We may also want to consider dropping (e.g. halving) the window
size and/or limit when we run out of memory. We'll run slower but
if the OS has denied us further resources it may be a ulimit thing
on a shared system, and we should try harder to work with what we
have available to us.
--
Shawn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 5:23 ` Junio C Hamano
@ 2009-01-08 15:35 ` Linus Torvalds
0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 15:35 UTC (permalink / raw)
To: Junio C Hamano
Cc: Shawn O. Pearce, R. Tyler Ballance, Nicolas Pitre,
Jan Krüger, Git ML, kb
On Wed, 7 Jan 2009, Junio C Hamano wrote:
>
> Thanks, but "needs a buffer output buffer" made me scratch my head
> somewhat.
That was just me editing it.
It was originally "needs an output buffer" and then I was supposed to edit
it to "needs a buffer" (because it can be either output or input, and in
the case of git it's usually actually the input that was partial).
And then I messed up, and it became that.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 15:34 ` Shawn O. Pearce
@ 2009-01-08 16:14 ` Linus Torvalds
0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 16:14 UTC (permalink / raw)
To: Shawn O. Pearce
Cc: Junio C Hamano, R. Tyler Ballance, Nicolas Pitre, Jan Krüger,
Git ML, kb
On Thu, 8 Jan 2009, Shawn O. Pearce wrote:
>
> I still think we should try to reduce pack memory usage when we get
> oom from zlib and retry the current operation once. We do it almost
> everywhere else and it works relatively well.
Oh, I agree.
It's just that I wanted to verify that people who see this problem
actually see the message, and that we confirm that it is due to this and
nothing else.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 3:54 ` Linus Torvalds
2009-01-08 5:23 ` Junio C Hamano
2009-01-08 15:34 ` Shawn O. Pearce
@ 2009-01-08 18:15 ` R. Tyler Ballance
2009-01-08 20:22 ` Linus Torvalds
2 siblings, 1 reply; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 18:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger,
Git ML, kb
[-- Attachment #1: Type: text/plain, Size: 2597 bytes --]
On Wed, 2009-01-07 at 19:54 -0800, Linus Torvalds wrote:
>
> On Wed, 7 Jan 2009, Shawn O. Pearce wrote:
> >
> > If we are running low on virtual memory we should release pack
> > windows if zlib's inflateInit fails due to an out of memory error.
> > It may be that we are running under a low ulimit and are getting
> > tight on address space. Shedding unused windows may get us
> > sufficient working space to continue.
>
> Let's do this (more complete) wrapping instead, ok?
>
> This one _just_ wraps things, btw - it doesn't do the "retry on low memory
> error" part, at least not yet. I think that's an independent issue from
> the reporting.
>
> Hmm?
>
> Tyler - does this make the corruption errors go away, and be replaced by
> hard failures with "out of memory" reporting?
Yeah, looks like it:
tyler@grapefruit:~/source/git/linux-2.6> export
START=56d18e9932ebf4e8eca42d2ce509450e6c9c1666
tyler@grapefruit:~/source/git/linux-2.6> git reset --hard
HEAD is now at 9e42d0c Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
tyler@grapefruit:~/source/git/linux-2.6> git reset --hard $START
HEAD is now at 56d18e9 Merge branch 'upstream' of
git://ftp.linux-mips.org/pub/scm/upstream-linus
tyler@grapefruit:~/source/git/linux-2.6> ulimit -v `echo "350 *
1024" | bc -l`
tyler@grapefruit:~/source/git/linux-2.6> limit
cputime unlimited
filesize unlimited
datasize unlimited
stacksize 8MB
coredumpsize 0kB
memoryuse 2561MB
maxproc 24564
descriptors 1024
memorylocked 64kB
addressspace 350MB
maxfilelocks unlimited
sigpending 24564
msgqueue 819200
nice 0
rt_priority 0
tyler@grapefruit:~/source/git/linux-2.6> git pull
Updating 56d18e9..9e42d0c
fatal: Out of memory? inflateInit failed
tyler@grapefruit:~/source/git/linux-2.6> which git
/home/tyler/bin/git
tyler@grapefruit:~/source/git/linux-2.6>
>
> This patch is potentially pretty noisy, on purpose. I didn't remove the
> reporting from places that already do so - some of them have stricter
> errors than this.
I'm assuming this patch is going to be reworked, if so, I'll back it out
of our internal 1.6.1 build and anxiously await The Real Deal(tm)
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 18:15 ` R. Tyler Ballance
@ 2009-01-08 20:22 ` Linus Torvalds
2009-01-08 20:37 ` R. Tyler Ballance
2009-01-09 1:43 ` Junio C Hamano
0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2009-01-08 20:22 UTC (permalink / raw)
To: R. Tyler Ballance
Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger,
Git ML, kb
On Thu, 8 Jan 2009, R. Tyler Ballance wrote:
> >
> > Tyler - does this make the corruption errors go away, and be replaced by
> > hard failures with "out of memory" reporting?
>
> Yeah, looks like it:
Well, I was hoping that you'd have a confirmation from your own huge repo,
but I do suspect it's all the same thing, so I guess this counts as
confirmation too.
> > This patch is potentially pretty noisy, on purpose. I didn't remove the
> > reporting from places that already do so - some of them have stricter
> > errors than this.
>
> I'm assuming this patch is going to be reworked, if so, I'll back it out
> of our internal 1.6.1 build and anxiously await The Real Deal(tm)
Oh, it shouldn't be any noisier under _normal_ load - it's more that
certain real corruption cases will now report the error twice. That said,
the new errors should actually be more informative than the old ones, so
even that isn't necessarily all bad.
Junio - I think we should apply this, and likely to the stable branch too.
Add the re-trying the inflateInit() after shrinking pack windows on top of
it.
Linus
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 20:22 ` Linus Torvalds
@ 2009-01-08 20:37 ` R. Tyler Ballance
2009-01-09 1:43 ` Junio C Hamano
1 sibling, 0 replies; 49+ messages in thread
From: R. Tyler Ballance @ 2009-01-08 20:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger,
Git ML, kb
[-- Attachment #1: Type: text/plain, Size: 2325 bytes --]
On Thu, 2009-01-08 at 12:22 -0800, Linus Torvalds wrote:
>
> On Thu, 8 Jan 2009, R. Tyler Ballance wrote:
> > >
> > > Tyler - does this make the corruption errors go away, and be replaced by
> > > hard failures with "out of memory" reporting?
> >
> > Yeah, looks like it:
>
> Well, I was hoping that you'd have a confirmation from your own huge repo,
> but I do suspect it's all the same thing, so I guess this counts as
> confirmation too.
I never got a real solid "consistent" reproduction case with our
repository, just a lot of users that experienced the issue. I think the
Linux repro case is a far better example, and yeah, it's sorta
confirmation (waiting for operations here to deploy the patched 1.6.1 to
dev machines).
>
> > > This patch is potentially pretty noisy, on purpose. I didn't remove the
> > > reporting from places that already do so - some of them have stricter
> > > errors than this.
> >
> > I'm assuming this patch is going to be reworked, if so, I'll back it out
> > of our internal 1.6.1 build and anxiously await The Real Deal(tm)
>
> Oh, it shouldn't be any noisier under _normal_ load - it's more that
> certain real corruption cases will now report the error twice. That said,
> the new errors should actually be more informative than the old ones, so
> even that isn't necessarily all bad.
>
> Junio - I think we should apply this, and likely to the stable branch too.
> Add the re-trying the inflateInit() after shrinking pack windows on top of
> it.
I really appreciate this guys, this is one of the longer threads I've
participated (spanning over a month) and I'm glad you guys were finally
able to track the issue down.
From now moving forward, I'll try to get a reproduction case with the
kernel tree or something equally big since I know it's frustrating to
play the game of telephone with a proprietary code base ("try this? what
does that do? okay, then this?").
Linus, I'll have a chance to look at your comments on my "variable
packed git window size" patch this weekend, and I'll follow-up in the
appropriate thread.
I'm relatively certain that after this witch hunt, I can get Slide to
cover a round of beers at LinuxWorld or the nearest GitTogether ;)
Cheers
--
-R. Tyler Ballance
Slide, Inc.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
2009-01-08 20:22 ` Linus Torvalds
2009-01-08 20:37 ` R. Tyler Ballance
@ 2009-01-09 1:43 ` Junio C Hamano
1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2009-01-09 1:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: R. Tyler Ballance, Shawn O. Pearce, Nicolas Pitre,
Jan Krüger, Git ML, kb
Linus Torvalds <torvalds@linux-foundation.org> writes:
> Junio - I think we should apply this, and likely to the stable branch too.
Yeah, I didn't lose the patch.
> Add the re-trying the inflateInit() after shrinking pack windows on top of
> it.
That too.
^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2009-01-09 1:45 UTC | newest]
Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-09 8:36 [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Jan Krüger
2008-12-09 9:02 ` R. Tyler Ballance
2008-12-09 16:24 ` Shawn O. Pearce
2009-01-06 22:52 ` R. Tyler Ballance
2009-01-07 1:25 ` Nicolas Pitre
2009-01-07 1:39 ` R. Tyler Ballance
2009-01-07 2:09 ` Nicolas Pitre
2009-01-07 2:47 ` R. Tyler Ballance
2009-01-07 3:21 ` Nicolas Pitre
2009-01-07 4:54 ` Linus Torvalds
2009-01-07 7:41 ` R. Tyler Ballance
2009-01-07 8:16 ` Junio C Hamano
2009-01-07 8:32 ` R. Tyler Ballance
2009-01-07 9:42 ` Junio C Hamano
2009-01-07 9:05 ` R. Tyler Ballance
2009-01-07 15:31 ` Nicolas Pitre
2009-01-07 16:07 ` Linus Torvalds
2009-01-07 16:08 ` Linus Torvalds
2009-01-07 22:55 ` R. Tyler Ballance
2009-01-07 23:29 ` Linus Torvalds
2009-01-08 0:28 ` Public repro case! " R. Tyler Ballance
2009-01-08 0:48 ` Linus Torvalds
2009-01-08 0:57 ` R. Tyler Ballance
2009-01-08 1:08 ` Linus Torvalds
2009-01-08 1:29 ` Linus Torvalds
2009-01-08 1:46 ` Shawn O. Pearce
2009-01-08 2:21 ` James Pickens
2009-01-08 2:43 ` Shawn O. Pearce
2009-01-08 5:40 ` Junio C Hamano
2009-01-08 6:04 ` Shawn O. Pearce
2009-01-08 2:52 ` Boyd Stephen Smith Jr.
2009-01-08 2:52 ` Linus Torvalds
2009-01-08 3:01 ` Shawn O. Pearce
2009-01-08 3:06 ` Linus Torvalds
2009-01-08 3:13 ` Shawn O. Pearce
2009-01-08 3:16 ` [PATCH] Wrap inflateInit to retry allocation after releasing pack memory Shawn O. Pearce
2009-01-08 3:54 ` Linus Torvalds
2009-01-08 5:23 ` Junio C Hamano
2009-01-08 15:35 ` Linus Torvalds
2009-01-08 15:34 ` Shawn O. Pearce
2009-01-08 16:14 ` Linus Torvalds
2009-01-08 18:15 ` R. Tyler Ballance
2009-01-08 20:22 ` Linus Torvalds
2009-01-08 20:37 ` R. Tyler Ballance
2009-01-09 1:43 ` Junio C Hamano
2009-01-08 0:37 ` [PATCH/RFC] Allow writing loose objects that are corrupted in a pack file Linus Torvalds
2009-01-08 0:49 ` R. Tyler Ballance
2009-01-08 1:01 ` Linus Torvalds
2009-01-08 1:06 ` R. Tyler Ballance
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).