* git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
@ 2008-04-23 15:41 Avery Pennarun
2008-04-23 17:00 ` Junio C Hamano
From: Avery Pennarun @ 2008-04-23 15:41 UTC
To: Johannes Sixt; +Cc: Git Mailing List
On 4/23/08, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Peter Karlsson schrieb:
> > [Not seeing any unreachable objects]
>
> > Jeff King:
> >> Did you remove refs/original/ ?
> >
> > That, and cloned the repository to a new location after the conversion,
> > and removing the references to "origin" there. It does seem that the
> > objects are still there, but I can't see them with "gitk --all".
>
> Did you clone locally? Then you must use the file:// protocol, otherwise
> everything is hard-linked from the origin.

This question has come up at least once a week since I subscribed to
the list. I can think of these solutions:

- Add a note to the git-gc and/or git-repack man page about how hidden
  refs can impact the cleanup.

- Add an option to make git-clone *not* hardlink stuff; its different
  behaviour for hardlinking vs. file:// seems to be very confusing.

- Make git-gc give a warning when there are some objects that are only
  referenced via the reflog or refs/original. (I suspect this would
  trigger too often though.)

- Give git-gc a "really, I'm serious" option that makes it ignore the
  reflog and refs/original.

Thoughts?

Avery
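
The hard-linking point raised above can be tried out in a throwaway directory. This is a minimal sketch, assuming a reasonably modern git (for `git -C`) and a POSIX shell; the directory names, the placeholder identity and the "leftover cruft" blob are invented for the demonstration and are not from the thread.

```shell
#!/bin/sh
# Sketch: compare a plain-path local clone with a file:// clone.
set -e
work=$(mktemp -d)
cd "$work"

git init -q src
cd src
git config user.email "editor@example.com"   # placeholder identity
git config user.name "Example Editor"
echo hello > file.txt
git add file.txt
git commit -q -m "initial"

# Write a blob that no ref, index entry or reflog points at.
junk=$(echo "leftover cruft" | git hash-object -w --stdin)
cd ..

# Plain path: git copies or hard-links the whole object directory.
git clone -q src clone-local
# file:// URL: objects travel through the regular transport.
git clone -q "file://$work/src" clone-transport

if git -C clone-local cat-file -e "$junk" 2>/dev/null
then local_has=yes; else local_has=no; fi
if git -C clone-transport cat-file -e "$junk" 2>/dev/null
then transport_has=yes; else transport_has=no; fi

echo "plain-path clone has the unreachable blob: $local_has"
echo "file:// clone has the unreachable blob:    $transport_has"
```

The plain-path clone takes everything under .git/objects along, unreachable objects included, which is why its size says little about what a remote user would download. The file:// clone should receive only objects reachable from the refs.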

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Junio C Hamano @ 2008-04-23 17:00 UTC
To: Avery Pennarun; +Cc: Johannes Sixt, Git Mailing List

"Avery Pennarun" <apenwarr@gmail.com> writes:

> This question has come up at least once a week since I subscribed to
> the list. I can think of these solutions:
>
> - Add a note to the git-gc and/or git-repack man page about how hidden
>   refs can impact the cleanup.
>
> - Add an option to make git-clone *not* hardlink stuff; its different
>   behaviour for hardlinking vs. file:// seems to be very confusing.
>
> - Make git-gc give a warning when there are some objects that are only
>   referenced via the reflog or refs/original. (I suspect this would
>   trigger too often though.)
>
> - Give git-gc a "really, I'm serious" option that makes it ignore the
>   reflog and refs/original.

- Teach people that leftover cruft is nothing to worry about.

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Avery Pennarun @ 2008-04-23 18:36 UTC
To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List

On 4/23/08, Junio C Hamano <gitster@pobox.com> wrote:
> "Avery Pennarun" <apenwarr@gmail.com> writes:
>
> > This question has come up at least once a week since I subscribed to
> > the list. I can think of these solutions:
> >
> > - Add a note to the git-gc and/or git-repack man page about how hidden
> >   refs can impact the cleanup.
> >
> > - Add an option to make git-clone *not* hardlink stuff; its different
> >   behaviour for hardlinking vs. file:// seems to be very confusing.
> >
> > - Make git-gc give a warning when there are some objects that are only
> >   referenced via the reflog or refs/original. (I suspect this would
> >   trigger too often though.)
> >
> > - Give git-gc a "really, I'm serious" option that makes it ignore the
> >   reflog and refs/original.
>
> - Teach people that leftover cruft is nothing to worry about.

I think any option that starts with "teach people" will not reduce FAQ
traffic to the list :) But maybe we could remind people of this
somewhere prominent. The git-filter-branch man page?

That said, I think I know why people are concerned about the cruft:
it's for the same reason I was when I first tried git-filter-branch to
get rid of some gigantic files after importing from svn, to cut the
size of a clone from >1GB to <100MB. It's impossible to see if I've
succeeded or not unless I make an actual clone, and even *then* I was
misled at first because making a local clone is clever and avoids
doing any work.

Avery

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Jeff King @ 2008-04-23 22:13 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List

On Wed, Apr 23, 2008 at 10:00:59AM -0700, Junio C Hamano wrote:

> - Teach people that leftover cruft is nothing to worry about.

But it _is_ something to worry about in some particular situations. For
run-of-the-mill rebasing, sure, ignore it. But this question usually
comes up because the user did something like:

  1. import from foreign SCM or other source

  2. realize massive, history-wide mistake; git filter-branch to fix up
     the changes

  3. wonder why git is using twice as much space as it needs to; with a
     repository in the hundreds or thousands of megs, this can get
     really annoying (either because of wasted space, or because some
     operations, like "git repack -a" actually do per-object work).

-Peff
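
The three-step scenario above, plus the cleanup the thread keeps circling around, can be sketched end to end. This assumes a git much newer than the 1.5.5 discussed here: `git gc --prune=now`, `git update-ref --stdin` and the `FILTER_BRANCH_SQUELCH_WARNING` variable are taken from current git, and `git filter-branch` must be installed. The file names and the placeholder identity are made up. It throws away the refs/original backup and every reflog entry, so it only makes sense once the rewritten history has been checked.

```shell
#!/bin/sh
# Sketch: why the big blob survives a plain gc, and what removes it.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "editor@example.com"   # placeholder identity
git config user.name "Example Editor"

# 1. "import" a history containing a file we later regret
echo keep > keep.txt
echo "pretend this is gigantic" > big.bin
git add keep.txt big.bin
git commit -q -m "import"
big=$(git rev-parse HEAD:big.bin)

# 2. rewrite history to drop the file
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
    -- --all >/dev/null 2>&1

# 3. a plain gc keeps the blob: refs/original/ and the reflogs
#    still reference the pre-rewrite commit
git gc -q
if git cat-file -e "$big" 2>/dev/null
then after_gc=present; else after_gc=gone; fi

# Drop the backup refs and reflog entries, then prune immediately.
git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$big" 2>/dev/null
then after_cleanup=present; else after_cleanup=gone; fi

echo "after plain gc: $after_gc"
echo "after cleanup:  $after_cleanup"
```

Cloning through file:// into a fresh directory, as suggested earlier in the thread, reaches the same result without destroying the backup refs in the original repository.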

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Jeff King @ 2008-04-24 1:28 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List

On Wed, Apr 23, 2008 at 06:13:16PM -0400, Jeff King wrote:

> > - Teach people that leftover cruft is nothing to worry about.
>
> But it _is_ something to worry about in some particular situations. For
> run-of-the-mill rebasing, sure, ignore it. But this question usually
> comes up because the user did something like:

OK, maybe I am wrong. Within a few hours of me posting this, somebody
starts a new thread with a toy example wondering why git-gc didn't clean
up an --amended commit.

I don't know the best way to teach people about this (short of using a
big stick, of course), but maybe something like this would help a
little:

-- >8 --
doc/git-gc: add a note about what is collected

It seems to be a FAQ that people try running git-gc, and then get
puzzled about why the size of their .git directory didn't change. This
note mentions the reasons why things might unexpectedly get kept.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-gc.txt |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index d424a4e..9a4b62e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -104,6 +104,21 @@ The optional configuration variable 'gc.pruneExpire' controls how old
 the unreferenced loose objects have to be before they are pruned. The
 default is "2 weeks ago".
 
+
+Notes
+-----
+
+git-gc tries very hard to be safe about the garbage it collects. In
+particular, it will keep not only objects referenced by your current set
+of branches and tags, but also objects referenced by the index, remote
+tracking branches, refs saved by linkgit:git-filter-branch[1] in
+refs/original/, or reflogs (which may references commits in branches
+that were later amended or rewound).
+
+If you are expecting some objects to be collected and it isn't, check
+all of those locations and decide whether it makes sense in your case to
+remove those references.
+
 See Also
 --------
 linkgit:git-prune[1]
-- 
1.5.5.1.143.ge2bb9
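
The toy case mentioned in this message (an --amended commit that git-gc does not remove) is easy to reproduce. The sketch below assumes a modern git with default settings (reflogs enabled for non-bare repositories); the repository name, the commit messages and the placeholder identity are invented.

```shell
#!/bin/sh
# Sketch: an amended-away commit survives gc because the reflog
# still references it.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "editor@example.com"   # placeholder identity
git config user.name "Example Editor"

echo one > file.txt
git add file.txt
git commit -q -m "first try"
old=$(git rev-parse HEAD)

# Replace the commit; nothing on the branch points at $old any more.
git commit -q --amend -m "second try"

git gc -q
if git cat-file -e "$old" 2>/dev/null
then old_after_gc=kept; else old_after_gc=pruned; fi

# Walk the HEAD reflog and look for the old commit there.
if git log -g --format=%H | grep -q "$old"
then in_reflog=yes; else in_reflog=no; fi

echo "old commit after gc: $old_after_gc"
echo "old commit listed in reflog: $in_reflog"
```

Once the reflog entry expires (or is expired by hand with git-reflog), a later gc is free to drop the old commit.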

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Avery Pennarun @ 2008-04-24 15:43 UTC
To: Jeff King; +Cc: Git Mailing List

On 4/23/08, Jeff King <peff@peff.net> wrote:
> +
> +Notes
> +-----
> +
> +git-gc tries very hard to be safe about the garbage it collects. In
> +particular, it will keep not only objects referenced by your current set
> +of branches and tags, but also objects referenced by the index, remote
> +tracking branches, refs saved by linkgit:git-filter-branch[1] in
> +refs/original/, or reflogs (which may references commits in branches
> +that were later amended or rewound).
> +
> +If you are expecting some objects to be collected and it isn't, check
> +all of those locations and decide whether it makes sense in your case to
> +remove those references.
> +

This information would have helped me quite a bit when I first
encountered this problem. It would be nice if it also showed up under
git-prune (since git-gc doesn't delete anything itself, if I
understand correctly). Also a link to some information about reflogs
(even just to "see also" git-reflog) would help, since I didn't hear
about reflogs at all until after I joined the mailing list.

Avery

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Jeff King @ 2008-04-24 16:14 UTC
To: Avery Pennarun; +Cc: Git Mailing List

On Thu, Apr 24, 2008 at 11:43:55AM -0400, Avery Pennarun wrote:

> > +If you are expecting some objects to be collected and it isn't, check
> > +all of those locations and decide whether it makes sense in your case to
> > +remove those references.
> > +
>
> This information would have helped me quite a bit when I first
> encountered this problem. It would be nice if it also showed up under
> git-prune (since git-gc doesn't delete anything itself, if I

Hmm, maybe it would make sense to put that note in git-prune, with a
note in git-gc to look at the prune page.

> understand correctly). Also a link to some information about reflogs
> (even just to "see also" git-reflog) would help, since I didn't hear
> about reflogs at all until after I joined the mailing list.

  $ grep -A6 See.Also Documentation/git-gc.txt
  See Also
  --------
  linkgit:git-prune[1]
  linkgit:git-reflog[1]
  linkgit:git-repack[1]
  linkgit:git-rerere[1]

But if the note were moved to git-prune, it would be natural to mention
git-reflog there. What do you think?

-Peff

* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
From: Avery Pennarun @ 2008-04-24 16:59 UTC
To: Jeff King; +Cc: Git Mailing List

On 4/24/08, Jeff King <peff@peff.net> wrote:
> Hmm, maybe it would make sense to put that note in git-prune, with a
> note in git-gc to look at the prune page.

Perhaps.

> But if the note were moved to git-prune, it would be natural to mention
> git-reflog there. What do you think?

I gather there's a movement in recent git versions (sorry, I only
tuned in recently) to encourage people to use git-gc instead of
git-prune in almost all cases. The reason I ever looked at git-prune
at all was that git-gc mentioned it in "See Also", and because
"git-prune" sounded more obviously like what I wanted than "git-gc"
when I looked at "man git".

Adding git-gc *and* git-reflog as See Also entries in git-prune would
make sense to me.

Avery

* [PATCH] Documentation: point git-prune users to git-gc
From: Jeff King @ 2008-04-29 20:45 UTC
To: Avery Pennarun; +Cc: Junio C Hamano, Git Mailing List

On Thu, Apr 24, 2008 at 12:59:34PM -0400, Avery Pennarun wrote:

> I gather there's a movement in recent git versions (sorry, I only
> tuned in recently) to encourage people to use git-gc instead of
> git-prune in almost all cases. The reason I ever looked at git-prune

Yes, I don't think there is any reason for most people to use git-prune
at all, unless they are trying specifically to prune and don't want the
other gc effects to happen.

Junio, please correct me if I'm wrong there.

> Adding git-gc *and* git-reflog as See Also entries in git-prune would
> make sense to me.

Agreed. Below is a patch that will hopefully clarify the situation.

-- >8 --
Documentation: point git-prune users to git-gc

Most users should be using git-gc instead of directly calling prune.
For those who really do want more information on pruning, let's point
them at git-fsck, which goes into slightly more detail on reachability.

And since we're pointing users there, let's make sure reflogs are
mentioned in git-fsck(1).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-fsck.txt  |    3 ++-
 Documentation/git-prune.txt |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index f16cb98..4cc26fb 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -22,7 +22,8 @@ OPTIONS
 	An object to treat as the head of an unreachability trace.
 +
 If no objects are given, git-fsck defaults to using the
-index file and all SHA1 references in .git/refs/* as heads.
+index file, all SHA1 references in .git/refs/*, and all reflogs (unless
+--no-reflogs is given) as heads.
 
 --unreachable::
 	Print out objects that exist but that aren't readable from any
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index f151cff..f92bb8c 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -13,6 +13,9 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
+NOTE: In most cases, users should run linkgit:git-gc[1], which calls
+git-prune. See the section "NOTES", below.
+
 This runs `git-fsck --unreachable` using all the refs
 available in `$GIT_DIR/refs`, optionally with additional set of
 objects specified on the command line, and prunes all
@@ -50,6 +53,23 @@ borrows from your repository via its
 $ git prune $(cd ../another && $(git-rev-parse --all))
 ------------
 
+Notes
+-----
+
+In most cases, users will not need to call git-prune directly, but
+should instead call linkgit:git-gc[1], which handles pruning along with
+many other housekeeping tasks.
+
+For a description of which objects are considered for pruning, see
+git-fsck's --unreachable option.
+
+See Also
+--------
+
+linkgit:git-fsck[1],
+linkgit:git-gc[1],
+linkgit:git-reflog[1]
+
 Author
 ------
 Written by Linus Torvalds <torvalds@osdl.org>
-- 
1.5.5.1.172.g4dce
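
The git-fsck wording in this patch (reflogs count as heads unless --no-reflogs is given) can be checked directly. A minimal sketch, assuming a modern git; the branch name `topic`, the commit messages and the placeholder identity are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: a commit reachable only through the HEAD reflog is reported
# as unreachable by fsck only when --no-reflogs is given.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "editor@example.com"   # placeholder identity
git config user.name "Example Editor"

echo one > file.txt
git add file.txt
git commit -q -m "A"

# Commit on a side branch, switch back, then delete the branch.
git checkout -q -b topic
echo two >> file.txt
git commit -q -a -m "B"
lost=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D topic

# grep -c prints 0 and exits non-zero on no match, hence "|| true".
with_reflogs=$(git fsck --unreachable 2>/dev/null | grep -c "$lost" || true)
without_reflogs=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$lost" || true)

echo "listed as unreachable (default):      $with_reflogs"
echo "listed as unreachable (--no-reflogs): $without_reflogs"
```

Running `git fsck --unreachable --no-reflogs` is therefore one way to see which objects are being kept alive by reflog entries alone.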

* Re: [PATCH] Documentation: point git-prune users to git-gc
From: Junio C Hamano @ 2008-04-29 22:05 UTC
To: Jeff King; +Cc: Avery Pennarun, Git Mailing List

Jeff King <peff@peff.net> writes:

> Yes, I don't think there is any reason for most people to use git-prune
> at all, unless they are trying specifically to prune and don't want the
> other gc effects to happen.
>
> Junio, please correct me if I'm wrong there.

Well, this is a hard statement to make corrections to. If A is defined to
be a subset of B, and A is generally useful, the only reason to do B is
when you want the effect of B without anything else. So your statement
cannot be incorrect.

However, in order to help people decide when to run B (or, if there ever
be a case where they might want to), there needs a discussion what other
things that _might_ be unwanted A does in addition to B.

For that reason,...

> diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
> index f151cff..f92bb8c 100644
> --- a/Documentation/git-prune.txt
> +++ b/Documentation/git-prune.txt
> @@ -13,6 +13,9 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>
> +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> +git-prune. See the section "NOTES", below.
> +

I think this note upfront is not helping readers very much (this is
git-prune documentation after all -- they are interested in the command
and not gc), but ...

>  This runs `git-fsck --unreachable` using all the refs
>  available in `$GIT_DIR/refs`, optionally with additional set of
>  objects specified on the command line, and prunes all
> @@ -50,6 +53,23 @@ borrows from your repository via its
>  $ git prune $(cd ../another && $(git-rev-parse --all))
>  ------------
>
> +Notes
> +-----
> +
> +In most cases, users will not need to call git-prune directly, but
> +should instead call linkgit:git-gc[1], which handles pruning along with
> +many other housekeeping tasks.

... this paragraph should be made a bit fatter by mentioning what "other
housekeeping tasks" are.

> +For a description of which objects are considered for pruning, see
> +git-fsck's --unreachable option.
> +
> +See Also
> +--------
> +
> +linkgit:git-fsck[1],
> +linkgit:git-gc[1],
> +linkgit:git-reflog[1]
> +
>  Author
>  ------
>  Written by Linus Torvalds <torvalds@osdl.org>
> --
> 1.5.5.1.172.g4dce

* Re: [PATCH] Documentation: point git-prune users to git-gc
From: Jeff King @ 2008-04-29 23:19 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Git Mailing List

On Tue, Apr 29, 2008 at 03:05:03PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > Yes, I don't think there is any reason for most people to use git-prune
> > at all, unless they are trying specifically to prune and don't want the
> > other gc effects to happen.
> >
> > Junio, please correct me if I'm wrong there.
>
> Well, this is a hard statement to make corrections to. If A is defined to
> be a subset of B, and A is generally useful, the only reason to do B is
> when you want the effect of B without anything else. So your statement
> cannot be incorrect.

Heh. Sorry, I got very sloppy with my wording...there was an
11-month-old child yelling in my ear. :) My meaning was: "people who
want to clean up their repo but don't know the right command stumble
upon git-prune. They probably should be using git-gc instead. People who
know that they want to prune presumably know enough to ignore the
warning note."

> However, in order to help people decide when to run B (or, if there ever
> be a case where they might want to), there needs a discussion what other
> things that _might_ be unwanted A does in addition to B.

Fair enough.

> > --- a/Documentation/git-prune.txt
> > +++ b/Documentation/git-prune.txt
> > @@ -13,6 +13,9 @@ SYNOPSIS
> >  DESCRIPTION
> >  -----------
> >
> > +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> > +git-prune. See the section "NOTES", below.
> > +
>
> I think this note upfront is not helping readers very much (this is
> git-prune documentation after all -- they are interested in the command
> and not gc), but ...

I'm not so sure that they are interested in the prune command. At first
I started with just a note near the end, but the point of this is
specifically to deal with users who "stumble" upon prune, either from
reading the command list (i.e., trying to match a command to the
objective they want to perform) or from pre-gc tutorials or emails which
mention it.

> > +Notes
> > +-----
> > +
> > +In most cases, users will not need to call git-prune directly, but
> > +should instead call linkgit:git-gc[1], which handles pruning along with
> > +many other housekeeping tasks.
>
> ... this paragraph should be made a bit fatter by mentioning what "other
> housekeeping tasks" are.

OK, I was trying to imply "go look at git-gc for those tasks" so they
didn't have to be repeated. Would you prefer it be spelled out
explicitly here, or is a more firm pointer OK?

-Peff

* Re: [PATCH] Documentation: point git-prune users to git-gc
From: Junio C Hamano @ 2008-04-30 1:01 UTC
To: Jeff King; +Cc: Avery Pennarun, Git Mailing List

Jeff King <peff@peff.net> writes:

> I'm not so sure that they are interested in the prune command. At first
> I started with just a note near the end, but the point of this is
> specifically to deal with users who "stumble" upon prune, either from
> reading the command list (i.e., trying to match a command to the
> objective they want to perform) or from pre-gc tutorials or emails which
> mention it.

Ah, you are right. People tend to stop reading when they _think_ they
heard enough even though they haven't. The note upfront is good, and I
suspect we would not have to reword the latter parts either then.

>> > +Notes
>> > +-----
>> > +
>> > +In most cases, users will not need to call git-prune directly, but
>> > +should instead call linkgit:git-gc[1], which handles pruning along with
>> > +many other housekeeping tasks.
>>
>> ... this paragraph should be made a bit fatter by mentioning what "other
>> housekeeping tasks" are.
>
> OK, I was trying to imply "go look at git-gc for those tasks" so they
> didn't have to be repeated. Would you prefer it be spelled out
> explicitly here, or is a more firm pointer OK?

Will apply as is. Thanks.