* Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-14 11:51 UTC (permalink / raw)
To: git
Hi everybody,
I have a repo here that is 130G, but when I clone --mirror it, the result
is only 25G big. Because of the --mirror I don't think that I missed
any refs that keep objects only in the source repo.
I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
but it only shaved off 3G, and there are a lot of loose objects and old
pack files that I simply don't expect to be there after a repack.
Shouldn't 'git gc' (even without --aggressive) or a 'repack -fad' remove
those redundant objects and packs?
How can I clean this up? (An additional problem: I no longer have enough
disk space to run a repack.)
Andreas
--
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800
* Re: Big repo not shrinking on repack or gc?
From: Jeff King @ 2015-01-14 12:49 UTC (permalink / raw)
To: Andreas Krey; +Cc: git
On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:
> I have a repo here that is 130G, but when I clone --mirror it, the result
> is only 25G big. Because of the --mirror I don't think that I missed
> any refs that keep objects only in the source repo.
Perhaps some objects are mentioned by reflogs, but not by the refs? They
would not be transferred as part of a clone. Try:
  git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
  git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs
  comm -13 reachable reflogs |
    git cat-file --batch-check='%(objectsize:disk)' |
    perl -lne '$total += $_; END { print $total }'
That should print the size, in bytes, that reflog-only objects are using
on disk. You can use "git reflog expire --expire-unreachable=now --all"
to get rid of them (and then repack).
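A minimal sketch of that cleanup wrapped in a shell function. The name `expire_reflog_only_objects` is hypothetical; it assumes the repository path is the only argument and that there is enough free disk space to write the new pack. Expiring with `--expire-unreachable=now` discards the safety net the reflogs provide, so only do this once nothing in them is needed.

```shell
# Hypothetical helper (not part of git): expire reflog entries that are no
# longer reachable from their ref tips, then repack so that objects kept
# alive only by those entries are dropped from the packs.
expire_reflog_only_objects() {
  repo=$1
  (
    cd "$repo" || exit 1
    # Drop only the unreachable reflog entries, for every ref's reflog.
    git reflog expire --expire-unreachable=now --all &&
    # -a: write everything still reachable into one new pack.
    # -d: delete the now-redundant old packs (packs with a .keep file are
    #     left alone) and remove loose objects that are now packed.
    # Unreachable loose objects remain until "git prune" / "git gc".
    git repack -a -d -q
  )
}
```

Usage: `expire_reflog_only_objects /path/to/repo.git`, after checking the reflog-only size with the pipeline above.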
> I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
You don't need the "-f" here. Just "git repack -ad" should be enough
(and the "-f" probably makes it _way_ slower).
-Peff
* Re: Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-14 13:07 UTC (permalink / raw)
To: Jeff King; +Cc: git
On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
> On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:
>
> > I have a repo here that is 130G, but when I clone --mirror it, the result
> > is only 25G big. Because of the --mirror I don't think that I missed
> > any refs that keep objects only in the source repo.
>
> Perhaps some objects are mentioned by reflogs, but not by the refs? They
> would not be transferred as part of a clone. Try:
>
> git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
> git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs
Actually, the output of 'git rev-list --objects --reflog' is empty, and
there isn't even a reflog (or similar) directory. (This is a bare repo
inside Atlassian Stash.)
...
> > I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
>
> You don't need the "-f" here. Just "git repack -ad" should be enough
> (and the "-f" probably makes it _way_ slower).
Right, the -f is an old workaround for old JGit versions in another repo.
Apparently, part of the trick is --prune=all or similar on 'git gc',
to get rid of the loose objects faster. That got a copy of the repo
down to around 70G; still a long way to go.
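For reference, a sketch of that step, assuming the repository path is passed as the argument. Both function names are hypothetical; `git count-objects -v` reports the number of loose objects on its `count:` line. The git-gc documentation warns that `--prune=now` can corrupt a repository that another process is writing to concurrently, so run it only while nothing is pushing.

```shell
# Hypothetical helper: print the number of loose objects in a repository,
# taken from the "count:" line of "git count-objects -v".
loose_object_count() {
  (cd "$1" && git count-objects -v) | sed -n 's/^count: //p'
}

# Hypothetical helper: gc that prunes unreachable loose objects regardless
# of age, instead of waiting for the default two-week gc.pruneExpire.
gc_prune_now() {
  (cd "$1" && git gc --quiet --prune=now)
}
```

Comparing `loose_object_count` before and after `gc_prune_now` shows how much of the loose-object backlog was actually removed.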
Andreas
* Re: Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-14 14:39 UTC (permalink / raw)
To: Jeff King; +Cc: git
On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
...
> You don't need the "-f" here. Just "git repack -ad" should be enough
> (and the "-f" probably makes it _way_ slower).
Indeed, by a factor of four.
However, I expect a 'repack -ad' to remove all the old pack files:
everything in them is either referenced, and so goes into the new
pack, or dropped. There should therefore be a single pack file
afterwards.
This is not the case. :-( (Tested only with 1.8.2, due to a
lack of compilers for this box.)
Andreas
* Re: Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-14 16:00 UTC (permalink / raw)
To: Jeff King; +Cc: git
On Wed, 14 Jan 2015 15:39:46 +0000, Andreas Krey wrote:
...
> This is not the case. :-( (Done only with 1.8.2 due to
> lack of compilers for this box.)
Nor with current git (I copied the repo to another machine).
There is one new pack file of a plausible size (25G),
and 65G worth of old pack files.
Andreas
* Re: Big repo not shrinking on repack or gc?
From: Junio C Hamano @ 2015-01-14 17:24 UTC (permalink / raw)
To: Andreas Krey; +Cc: Jeff King, git
Andreas Krey <a.krey@gmx.de> writes:
> On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
> ...
>> You don't need the "-f" here. Just "git repack -ad" should be enough
>> (and the "-f" probably makes it _way_ slower).
>
> Indeed, factor four.
>
> However, my expectation is that a repack -ad will remove all the
> old pack files, as what is in there is either referenced and put
> into the new pack, or dropped => there should be a single pack file
> afterwards.
>
> This is not the case. :-( (Done only with 1.8.2 due to
> lack of compilers for this box.)
Guess in the dark: "ls -l .git/objects/pack"
Do you see any .keep files?
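Packs with a matching `.keep` file are skipped by `git repack -ad`, so they survive every repack. A sketch for sizing them up; the function names are hypothetical, and the argument is the git directory itself (`.git`, or the path of a bare repository).

```shell
# Hypothetical helper: for each packfile under <gitdir>/objects/pack, print
# "kept" or "-", its size in bytes, and its file name. "kept" means a
# matching .keep file exists, so "git repack -ad" will not touch it.
pack_report() {
  packdir=$1/objects/pack
  for pack in "$packdir"/*.pack; do
    [ -e "$pack" ] || continue            # glob matched nothing: no packs
    size=$(wc -c <"$pack" | tr -d ' ')     # portable byte count
    if [ -e "${pack%.pack}.keep" ]; then
      mark=kept
    else
      mark=-
    fi
    printf '%s %s %s\n' "$mark" "$size" "${pack##*/}"
  done
}

# Hypothetical helper: total number of bytes held in kept packs.
kept_bytes() {
  pack_report "$1" | awk '$1 == "kept" { total += $2 } END { print total + 0 }'
}
```

If `kept_bytes` accounts for most of the gap between the repository and a fresh `clone --mirror`, the .keep files are the explanation.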
* Re: Big repo not shrinking on repack or gc?
From: Bryan Turner @ 2015-01-15 1:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Andreas Krey, Jeff King, Git Users
On Thu, Jan 15, 2015 at 4:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Andreas Krey <a.krey@gmx.de> writes:
>
>> On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
>> ...
>>> You don't need the "-f" here. Just "git repack -ad" should be enough
>>> (and the "-f" probably makes it _way_ slower).
>>
>> Indeed, factor four.
>>
>> However, my expectation is that a repack -ad will remove all the
>> old pack files, as what is in there is either referenced and put
>> into the new pack, or dropped => there should be a single pack file
>> afterwards.
>>
>> This is not the case. :-( (Done only with 1.8.2 due to
>> lack of compilers for this box.)
>
> Guess in the dark: "ls -l .git/objects/pack"
> Do you see any .keep files?
I'm one of the Stash developers and just noticed this thread. If the
repository in question has been forked via Stash there likely _will_
be .keep files. Stash uses alternates for forks, so it's possible, by
deleting those kept packs and pruning objects (which you've already
done I see) that you will corrupt, or have already corrupted, some
number of the forks. (At the moment Stash packs "garbage" into a "dead
pack" which it flags with a .keep, to ensure forks don't lose access
to objects that once existed upstream that they still reference.)
* Re: Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-15 6:38 UTC (permalink / raw)
To: Bryan Turner; +Cc: Junio C Hamano, Jeff King, Git Users
On Thu, 15 Jan 2015 12:23:00 +0000, Bryan Turner wrote:
...
> > Guess in the dark: "ls -l .git/objects/pack"
> > Do you see any .keep files?
Lots of them. :-(
> I'm one of the Stash developers and just noticed this thread. If the
> repository in question has been forked via Stash there likely _will_
> be .keep files. Stash uses alternates for forks, so it's possible, by
> deleting those kept packs and pruning objects (which you've already
> done I see) that you will corrupt, or have already corrupted, some
> number of the forks.
There are a few forks in this stash instance, but the repository in
question is neither the source nor the destination of any.
So, git seems to be mostly out of the equation now (gc and repack
apparently doing what they are supposed to do), and the question
moves to 'how can stash let such a repo grow to that size'.
> (At the moment Stash packs "garbage" into a "dead
> pack" which it flags with a .keep, to ensure forks don't lose access
> to objects that once existed upstream that they still reference.)
Does it do so even when there is no actual fork? That would
explain a lot: every day we (force-)push new commits in there
(potentially big ones) that become garbage the next day and should
be cleaned up fairly quickly.
(We pull them into another, non-Stash repo for longer-term keeping;
these are backups of dev repos in the form of 'git stash' commits,
including untracked files.)
Andreas
* Re: Big repo not shrinking on repack or gc?
From: Bryan Turner @ 2015-01-15 7:05 UTC (permalink / raw)
To: Andreas Krey; +Cc: Junio C Hamano, Jeff King, Git Users
On Thu, Jan 15, 2015 at 5:38 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Thu, 15 Jan 2015 12:23:00 +0000, Bryan Turner wrote:
> ...
>> > Guess in the dark: "ls -l .git/objects/pack"
>> > Do you see any .keep files?
>
> Lots of. :-(
>
>> I'm one of the Stash developers and just noticed this thread. If the
>> repository in question has been forked via Stash there likely _will_
>> be .keep files. Stash uses alternates for forks, so it's possible, by
>> deleting those kept packs and pruning objects (which you've already
>> done I see) that you will corrupt, or have already corrupted, some
>> number of the forks.
>
> There are a few forks in this stash instance, but the repository in
> question is neither the source nor the destination of any.
>
> So, git seems to be mostly out of the equation now (gc and repack
> apparently doing what they are supposed to do), and the question
> moves to 'how can stash let such a repo grow to that size'.
>
>
>> (At the moment Stash packs "garbage" into a "dead
>> pack" which it flags with a .keep, to ensure forks don't lose access
>> to objects that once existed upstream that they still reference.)
>
> Does it do so in any case even if there is no actual fork? That would
> explain a lot - we are daily (force-)pushing new commit in there (and
> potentially big ones) that become garbage the next day, and should
> be cleaned up rather fast.
No, Stash will only do that in a repository which has been forked. In
any non-forked repository, Stash does not interact with garbage
collection in any way. Auto GC is left enabled, and all pruning
settings are left at their defaults. The default prune expiry is
two weeks, so if your development approach is rebase-heavy you may
need to adjust those settings.
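As a sketch of such an adjustment: the settings involved include `gc.pruneExpire` (default two weeks) and `gc.reflogExpireUnreachable` (default 30 days). The helper name and the `3.days.ago` values are illustrative assumptions, not recommendations from this thread. Bare repositories keep no reflogs by default, so there the prune expiry is the setting that matters.

```shell
# Hypothetical helper: make unreachable objects expire sooner in one
# repository. The "3.days.ago" values are illustrative, not recommendations.
shorten_gc_expiry() {
  repo=$1
  (
    cd "$repo" || exit 1
    # Unreachable loose objects older than this are pruned by "git gc"
    # (default: 2.weeks.ago).
    git config gc.pruneExpire 3.days.ago &&
    # Reflog entries not reachable from the current ref tip are expired
    # after this long (default: 30 days).
    git config gc.reflogExpireUnreachable 3.days.ago
  )
}
```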
What are the contents of some of those .keep files? If they're written
by Stash they contain a message saying so. ("GENERATED BY ATLASSIAN
STASH - DO NOT REMOVE")
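To check which `.keep` files carry that marker, a sketch like the following would do. The function name is hypothetical, the argument is the repository's git directory, and it matches on the marker text quoted above.

```shell
# Hypothetical helper: print the .keep files in <gitdir>/objects/pack whose
# contents include the Stash marker text.
stash_keep_files() {
  for keep in "$1"/objects/pack/*.keep; do
    [ -e "$keep" ] || continue            # glob matched nothing: no .keep files
    if grep -q 'GENERATED BY ATLASSIAN STASH' "$keep"; then
      printf '%s\n' "$keep"
    fi
  done
}
```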
>
> (We're pulling them into another non-stash repo for longer-term keeping -
> these are backups of dev repos in the form of git stash commits including
> untracked files.)
>
> Andreas
* Re: Big repo not shrinking on repack or gc?
From: Andreas Krey @ 2015-01-15 7:43 UTC (permalink / raw)
To: Bryan Turner; +Cc: Junio C Hamano, Jeff King, Git Users
On Thu, 15 Jan 2015 18:05:46 +0000, Bryan Turner wrote:
...
> No, Stash will only do that in a repository which has been forked. In
> any non-forked repository, Stash does not interact with garbage
> collection in any way. Auto GC is left enabled, and all pruning
> settings are left at their defaults. The default pruning interval is
> two weeks, so if your development approach is rebase-heavy you may
> need to adjust them.
>
> What are the contents of some of those .keep files? If they're written
> by Stash they contain a message saying so. ("GENERATED BY ATLASSIAN
> STASH - DO NOT REMOVE")
They do. So it seems it was forked once upon a time, but...
/opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects
...there is no trace of a fork still existing (the repo in question is 143).
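A slightly more targeted version of the `grep` above: a sketch that reports which repositories still borrow objects from a given one. The function name is hypothetical, and it assumes the `<root>/<id>/objects/info/alternates` layout visible in the listing, where each alternates line is a path to another repository's `objects` directory.

```shell
# Hypothetical helper: print the ids of repositories under <root> whose
# alternates file points at <root>/<id>/objects, i.e. on-disk forks that
# still borrow objects from repository <id>. Empty output: no such fork.
forks_of() {
  root=$1 id=$2
  for alt in "$root"/*/objects/info/alternates; do
    [ -e "$alt" ] || continue
    # Match a line ending in "/<id>/objects" (optionally with trailing "/").
    if grep -q "/$id/objects/*\$" "$alt"; then
      fork=${alt%/objects/info/alternates}
      printf '%s\n' "${fork##*/}"
    fi
  done
}
```

Usage: `forks_of /opt/apps/atlassian/stash-data/shared/data/repositories 143`; empty output matches the observation that no fork of repository 143 remains.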
Andreas
* Re: Big repo not shrinking on repack or gc?
From: Bryan Turner @ 2015-01-15 8:56 UTC (permalink / raw)
To: Andreas Krey; +Cc: Junio C Hamano, Jeff King, Git Users
On Thu, Jan 15, 2015 at 6:43 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Thu, 15 Jan 2015 18:05:46 +0000, Bryan Turner wrote:
> ...
>
> They do. So it seems it was forked once upon a time, but...
>
> /opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
> 158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
> 45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
> 93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects
>
> ...there is no trace of a fork still existing (the repo in question is 143).
Yes, the system doesn't currently detect when a repository becomes
un-forked because it's not a common use case.
At this point I think we should probably take this off-list. You can
either e-mail me directly (bturner at atlassian dot com), or, better
still, raise a ticket on support.atlassian.com. Either way I'll work
with you directly to un-fork the repository on disk and allow it to
clean itself up.
>
> Andreas
Thread overview: 11 messages
2015-01-14 11:51 Big repo not shrinking on repack or gc? Andreas Krey
2015-01-14 12:49 ` Jeff King
2015-01-14 13:07 ` Andreas Krey
2015-01-14 14:39 ` Andreas Krey
2015-01-14 16:00 ` Andreas Krey
2015-01-14 17:24 ` Junio C Hamano
2015-01-15 1:23 ` Bryan Turner
2015-01-15 6:38 ` Andreas Krey
2015-01-15 7:05 ` Bryan Turner
2015-01-15 7:43 ` Andreas Krey
2015-01-15 8:56 ` Bryan Turner