* Big repo not shrinking on repack or gc?
@ 2015-01-14 11:51 Andreas Krey
  2015-01-14 12:49 ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Krey @ 2015-01-14 11:51 UTC
  To: git

Hi everybody,

I have a repo here that is 130G, but when I clone --mirror it, the result
is only 25G big. Because of the --mirror I don't think that I missed
any refs that keep objects only in the source repo.

I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
but it only shaved off 3G, and there are a lot of loose objects and old
pack files that I simply don't expect to be there after a repack.

Shouldn't 'git gc' (even without --aggressive) or a 'repack -fad' remove
those redundant objects and packs? How to clean this up?

(Additional problem: I don't have enough space to run a repack anymore.)

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds <torvalds@*.org>
Date: Fri, 22 Jan 2010 07:29:21 -0800

* Re: Big repo not shrinking on repack or gc?
  2015-01-14 11:51 Big repo not shrinking on repack or gc? Andreas Krey
@ 2015-01-14 12:49 ` Jeff King
  2015-01-14 13:07   ` Andreas Krey
  2015-01-14 14:39   ` Andreas Krey
  0 siblings, 2 replies; 11+ messages in thread
From: Jeff King @ 2015-01-14 12:49 UTC
  To: Andreas Krey; +Cc: git

On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:

> I have a repo here that is 130G, but when I clone --mirror it, the result
> is only 25G big. Because of the --mirror I don't think that I missed
> any refs that keep objects only in the source repo.

Perhaps some objects are mentioned by reflogs, but not by the refs? They
would not be transferred as part of a clone. Try:

  git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
  git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs
  comm -13 reachable reflogs | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }'

That should print the size, in bytes, that reflog-only objects are using
on disk. You can use "git reflog expire --expire-unreachable=now --all"
to get rid of them (and then repack).

> I already tried 'git repack -fad' and 'git gc' to shrink the original repo,

You don't need the "-f" here. Just "git repack -ad" should be enough
(and the "-f" probably makes it _way_ slower).

-Peff

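The last two stages of the pipeline above can be tried without a repository. The following is a minimal sketch, assuming two already-sorted files of object ids; the function names only_in_reflogs and sum_sizes are made up for illustration, and awk stands in for the perl one-liner.

```shell
#!/bin/sh
# only_in_reflogs REACHABLE REFLOGS
# Print the lines that occur in the sorted file REFLOGS but not in the
# sorted file REACHABLE. "comm -13" suppresses column 1 (lines only in
# the first file) and column 3 (lines common to both files).
only_in_reflogs() {
    comm -13 "$1" "$2"
}

# sum_sizes
# Add up one integer per input line, as the perl one-liner does.
# printf "%.0f" keeps large byte counts out of exponent notation.
sum_sizes() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

In the real pipeline, "git cat-file --batch-check='%(objectsize:disk)'" sits between the two steps and turns each object id into its on-disk size.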
* Re: Big repo not shrinking on repack or gc?
  2015-01-14 12:49 ` Jeff King
@ 2015-01-14 13:07   ` Andreas Krey
  2015-01-14 14:39   ` Andreas Krey
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Krey @ 2015-01-14 13:07 UTC
  To: Jeff King; +Cc: git

On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
> On Wed, Jan 14, 2015 at 12:51:30PM +0100, Andreas Krey wrote:
>
> > I have a repo here that is 130G, but when I clone --mirror it, the result
> > is only 25G big. Because of the --mirror I don't think that I missed
> > any refs that keep objects only in the source repo.
>
> Perhaps some objects are mentioned by reflogs, but not by the refs? They
> would not be transferred as part of a clone. Try:
>
>   git rev-list --objects --all | cut -d' ' -f1 | sort >reachable
>   git rev-list --objects --reflog | cut -d' ' -f1 | sort >reflogs

Actually, the output of 'git rev-list --objects --reflog' is empty,
and there isn't even a reflog (or similar) directory. (This is a
bare repo inside Atlassian Stash.)

...
> > I already tried 'git repack -fad' and 'git gc' to shrink the original repo,
>
> You don't need the "-f" here. Just "git repack -ad" should be enough
> (and the "-f" probably makes it _way_ slower).

Right, the -f is an old workaround for old jgits in another repo.

Apparently, part of the trick is --prune=all or similar on 'git gc',
to get rid of the loose objects faster. That got a copy of the repo
down to around 70G - still a way to go.

Andreas

* Re: Big repo not shrinking on repack or gc?
  2015-01-14 12:49 ` Jeff King
  2015-01-14 13:07   ` Andreas Krey
@ 2015-01-14 14:39   ` Andreas Krey
  2015-01-14 16:00     ` Andreas Krey
  2015-01-14 17:24     ` Junio C Hamano
  1 sibling, 2 replies; 11+ messages in thread
From: Andreas Krey @ 2015-01-14 14:39 UTC
  To: Jeff King; +Cc: git

On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
...
> You don't need the "-f" here. Just "git repack -ad" should be enough
> (and the "-f" probably makes it _way_ slower).

Indeed, factor four.

However, my expectation is that a repack -ad will remove all the
old pack files, as what is in there is either referenced and put
into the new pack, or dropped => there should be a single pack file
afterwards.

This is not the case. :-( (Done only with 1.8.2 due to
lack of compilers for this box.)

Andreas

* Re: Big repo not shrinking on repack or gc?
  2015-01-14 14:39   ` Andreas Krey
@ 2015-01-14 16:00     ` Andreas Krey
  2015-01-14 17:24     ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Krey @ 2015-01-14 16:00 UTC
  To: Jeff King; +Cc: git

On Wed, 14 Jan 2015 15:39:46 +0000, Andreas Krey wrote:
...
> This is not the case. :-( (Done only with 1.8.2 due to
> lack of compilers for this box.)

Nor is it with current git (copied the repo to another machine).

There is one new pack file of a plausible size (25G),
and 65G worth of old packfiles.

Andreas

* Re: Big repo not shrinking on repack or gc?
  2015-01-14 14:39   ` Andreas Krey
  2015-01-14 16:00     ` Andreas Krey
@ 2015-01-14 17:24     ` Junio C Hamano
  2015-01-15  1:23       ` Bryan Turner
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2015-01-14 17:24 UTC
  To: Andreas Krey; +Cc: Jeff King, git

Andreas Krey <a.krey@gmx.de> writes:

> On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
> ...
>> You don't need the "-f" here. Just "git repack -ad" should be enough
>> (and the "-f" probably makes it _way_ slower).
>
> Indeed, factor four.
>
> However, my expectation is that a repack -ad will remove all the
> old pack files, as what is in there is either referenced and put
> into the new pack, or dropped => there should be a single pack file
> afterwards.
>
> This is not the case. :-( (Done only with 1.8.2 due to
> lack of compilers for this box.)

Guess in the dark: "ls -l .git/objects/pack"
Do you see any .keep files?

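A pack that has a matching .keep file is left alone by "git repack -ad", so it helps to see how much space such packs hold. Below is a minimal sketch that goes one step beyond "ls -l"; the function name kept_packs is hypothetical, and it assumes the usual pack-<id>.pack / pack-<id>.keep naming inside objects/pack.

```shell
#!/bin/sh
# kept_packs DIR
# For every pack in DIR that is pinned by a .keep file, print
#   <size in bytes> <pack file> <first line of the .keep file>
kept_packs() {
    pack_dir=$1
    for keep in "$pack_dir"/pack-*.keep; do
        [ -e "$keep" ] || continue      # glob matched nothing
        pack="${keep%.keep}.pack"
        [ -f "$pack" ] || continue      # stray .keep without a pack
        size=$(wc -c <"$pack" | tr -d ' ')
        reason=$(head -n 1 "$keep")
        printf '%s %s %s\n' "$size" "$pack" "$reason"
    done
}
```

For a bare repository it would be called as "kept_packs /path/to/repo.git/objects/pack". The third column shows whatever the tool that created the .keep file wrote into it, which may be nothing.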
* Re: Big repo not shrinking on repack or gc?
  2015-01-14 17:24     ` Junio C Hamano
@ 2015-01-15  1:23       ` Bryan Turner
  2015-01-15  6:38         ` Andreas Krey
  0 siblings, 1 reply; 11+ messages in thread
From: Bryan Turner @ 2015-01-15 1:23 UTC
  To: Junio C Hamano; +Cc: Andreas Krey, Jeff King, Git Users

On Thu, Jan 15, 2015 at 4:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Andreas Krey <a.krey@gmx.de> writes:
>
>> On Wed, 14 Jan 2015 07:49:36 +0000, Jeff King wrote:
>> ...
>>> You don't need the "-f" here. Just "git repack -ad" should be enough
>>> (and the "-f" probably makes it _way_ slower).
>>
>> Indeed, factor four.
>>
>> However, my expectation is that a repack -ad will remove all the
>> old pack files, as what is in there is either referenced and put
>> into the new pack, or dropped => there should be a single pack file
>> afterwards.
>>
>> This is not the case. :-( (Done only with 1.8.2 due to
>> lack of compilers for this box.)
>
> Guess in the dark: "ls -l .git/objects/pack"
> Do you see any .keep files?

I'm one of the Stash developers and just noticed this thread. If the
repository in question has been forked via Stash there likely _will_
be .keep files. Stash uses alternates for forks, so it's possible, by
deleting those kept packs and pruning objects (which you've already
done I see) that you will corrupt, or have already corrupted, some
number of the forks. (At the moment Stash packs "garbage" into a "dead
pack" which it flags with a .keep, to ensure forks don't lose access
to objects that once existed upstream that they still reference.)

* Re: Big repo not shrinking on repack or gc?
  2015-01-15  1:23       ` Bryan Turner
@ 2015-01-15  6:38         ` Andreas Krey
  2015-01-15  7:05           ` Bryan Turner
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Krey @ 2015-01-15 6:38 UTC
  To: Bryan Turner; +Cc: Junio C Hamano, Jeff King, Git Users

On Thu, 15 Jan 2015 12:23:00 +0000, Bryan Turner wrote:
...
> > Guess in the dark: "ls -l .git/objects/pack"
> > Do you see any .keep files?

Lots of them. :-(

> I'm one of the Stash developers and just noticed this thread. If the
> repository in question has been forked via Stash there likely _will_
> be .keep files. Stash uses alternates for forks, so it's possible, by
> deleting those kept packs and pruning objects (which you've already
> done I see) that you will corrupt, or have already corrupted, some
> number of the forks.

There are a few forks in this stash instance, but the repository in
question is neither the source nor the destination of any.

So, git seems to be mostly out of the equation now (gc and repack
apparently doing what they are supposed to do), and the question
moves to 'how can stash let such a repo grow to that size'.

> (At the moment Stash packs "garbage" into a "dead
> pack" which it flags with a .keep, to ensure forks don't lose access
> to objects that once existed upstream that they still reference.)

Does it do so in any case even if there is no actual fork? That would
explain a lot - we are daily (force-)pushing new commits in there (and
potentially big ones) that become garbage the next day, and should
be cleaned up rather fast.

(We're pulling them into another non-stash repo for longer-term keeping -
these are backups of dev repos in the form of git stash commits including
untracked files.)

Andreas

* Re: Big repo not shrinking on repack or gc?
  2015-01-15  6:38         ` Andreas Krey
@ 2015-01-15  7:05           ` Bryan Turner
  2015-01-15  7:43             ` Andreas Krey
  0 siblings, 1 reply; 11+ messages in thread
From: Bryan Turner @ 2015-01-15 7:05 UTC
  To: Andreas Krey; +Cc: Junio C Hamano, Jeff King, Git Users

On Thu, Jan 15, 2015 at 5:38 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Thu, 15 Jan 2015 12:23:00 +0000, Bryan Turner wrote:
> ...
>> > Guess in the dark: "ls -l .git/objects/pack"
>> > Do you see any .keep files?
>
> Lots of. :-(
>
>> I'm one of the Stash developers and just noticed this thread. If the
>> repository in question has been forked via Stash there likely _will_
>> be .keep files. Stash uses alternates for forks, so it's possible, by
>> deleting those kept packs and pruning objects (which you've already
>> done I see) that you will corrupt, or have already corrupted, some
>> number of the forks.
>
> There are a few forks in this stash instance, but the repository in
> question is neither the source nor the destination of any.
>
> So, git seems to be mostly out of the equation now (gc and repack
> apparently doing what they are supposed to do), and the question
> moves to 'how can stash let such a repo grow to that size'.
>
>> (At the moment Stash packs "garbage" into a "dead
>> pack" which it flags with a .keep, to ensure forks don't lose access
>> to objects that once existed upstream that they still reference.)
>
> Does it do so in any case even if there is no actual fork? That would
> explain a lot - we are daily (force-)pushing new commit in there (and
> potentially big ones) that become garbage the next day, and should
> be cleaned up rather fast.

No, Stash will only do that in a repository which has been forked. In
any non-forked repository, Stash does not interact with garbage
collection in any way. Auto GC is left enabled, and all pruning
settings are left at their defaults. The default pruning interval is
two weeks, so if your development approach is rebase-heavy you may
need to adjust them.

What are the contents of some of those .keep files? If they're written
by Stash they contain a message saying so. ("GENERATED BY ATLASSIAN
STASH - DO NOT REMOVE")

> (We're pulling them into another non-stash repo for longer-term keeping -
> these are backups of dev repos in the form of git stash commits including
> untracked files.)

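The two-week interval mentioned above corresponds to git's gc.pruneExpire setting, whose built-in default is 2.weeks.ago. Below is a minimal sketch for reading and changing it on a bare repository; the function names prune_expire and set_prune_expire are hypothetical, and the value 3.days.ago in the usage note is only an example, not a recommendation from this thread.

```shell
#!/bin/sh
# prune_expire REPO
# Print the effective gc.pruneExpire for the bare repository REPO.
# If the key is unset (or git config fails), print git's built-in default.
prune_expire() {
    git --git-dir="$1" config --get gc.pruneExpire || echo "2.weeks.ago"
}

# set_prune_expire REPO WHEN
# Store WHEN (for example "3.days.ago") as gc.pruneExpire in REPO's config.
set_prune_expire() {
    git --git-dir="$1" config gc.pruneExpire "$2"
}
```

For example, "set_prune_expire /path/to/repo.git 3.days.ago" would let a later "git gc" drop unreachable loose objects after three days instead of fourteen. Packs pinned by a .keep file are not affected by this setting.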
* Re: Big repo not shrinking on repack or gc?
  2015-01-15  7:05           ` Bryan Turner
@ 2015-01-15  7:43             ` Andreas Krey
  2015-01-15  8:56               ` Bryan Turner
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Krey @ 2015-01-15 7:43 UTC
  To: Bryan Turner; +Cc: Junio C Hamano, Jeff King, Git Users

On Thu, 15 Jan 2015 18:05:46 +0000, Bryan Turner wrote:
...
> No, Stash will only do that in a repository which has been forked. In
> any non-forked repository, Stash does not interact with garbage
> collection in any way. Auto GC is left enabled, and all pruning
> settings are left at their defaults. The default pruning interval is
> two weeks, so if your development approach is rebase-heavy you may
> need to adjust them.
>
> What are the contents of some of those .keep files? If they're written
> by Stash they contain a message saying so. ("GENERATED BY ATLASSIAN
> STASH - DO NOT REMOVE")

They do. So it seems it was forked once upon a time, but...

  /opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
  158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
  45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
  93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects

...there is no trace of a fork still existing (the repo in question is 143).

Andreas

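The grep above can be narrowed to a single repository id. This is a minimal sketch, assuming the layout shown in the message (one numbered directory per repository, each optionally carrying objects/info/alternates); the function name repos_borrowing_from is hypothetical.

```shell
#!/bin/sh
# repos_borrowing_from ROOT ID
# Print the directory name of every repository under ROOT whose
# objects/info/alternates file has a line ending in /ID/objects,
# i.e. every repository that borrows objects from repository ID.
repos_borrowing_from() {
    root=$1
    id=$2
    for alt in "$root"/*/objects/info/alternates; do
        [ -f "$alt" ] || continue       # glob matched nothing
        if grep -q "/$id/objects\$" "$alt"; then
            repo=${alt%/objects/info/alternates}
            printf '%s\n' "${repo##*/}"
        fi
    done
}
```

With the listing from the message, asking for id 20 would print 158, and asking for id 143 would print nothing, which matches the observation that no fork currently points at that repository.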
* Re: Big repo not shrinking on repack or gc?
  2015-01-15  7:43             ` Andreas Krey
@ 2015-01-15  8:56               ` Bryan Turner
  0 siblings, 0 replies; 11+ messages in thread
From: Bryan Turner @ 2015-01-15 8:56 UTC
  To: Andreas Krey; +Cc: Junio C Hamano, Jeff King, Git Users

On Thu, Jan 15, 2015 at 6:43 PM, Andreas Krey <a.krey@gmx.de> wrote:
> On Thu, 15 Jan 2015 18:05:46 +0000, Bryan Turner wrote:
> ...
>
> They do. So it seems it was forked once upon a time, but...
>
> /opt/apps/atlassian/stash-data/shared/data/repositories $ grep '' */objects/info/alternates
> 158/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/20/objects
> 45/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/33/objects
> 93/objects/info/alternates:/data/opt_apps/atlassian/stash-data/shared/data/repositories/91/objects
>
> ...there is no trace of a fork still existing (the repo in question is 143).

Yes, the system doesn't currently detect when a repository becomes
un-forked because it's not a common use case.

At this point I think we should probably take this off-list. You can
either e-mail me directly (bturner at atlassian dot com), or, better
still, raise a ticket on support.atlassian.com. Either way I'll work
with you directly to un-fork the repository on disk and allow it to
clean itself up.