* git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
@ 2008-04-23 15:41 Avery Pennarun
2008-04-23 17:00 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-23 15:41 UTC
To: Johannes Sixt; +Cc: Git Mailing List
On 4/23/08, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Peter Karlsson schrieb:
> > [Not seeing any unreachable objects]
>
> > Jeff King:
> >> Did you remove refs/original/ ?
> >
> > That, and cloned the repository to a new location after the conversion,
> > and removing the references to "origin" there. It does seem that the
> > objects are still there, but I can't see them with "gitk --all".
>
> Did you clone locally? Then you must use the file:// protocol, otherwise
> everything is hard-linked from the origin.
This question has come up at least once a week since I subscribed to
the list. I can think of these solutions:

- Add a note to the git-gc and/or git-repack man page about how hidden
  refs can impact the cleanup.

- Add an option to make git-clone *not* hardlink stuff; its different
  behaviour for hardlinking vs. file:// seems to be very confusing.

- Make git-gc give a warning when there are some objects that are only
  referenced via the reflog or refs/original. (I suspect this would
  trigger too often though.)

- Give git-gc a "really, I'm serious" option that makes it ignore the
  reflog and refs/original.

Thoughts?
Avery
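
The "really, I'm serious" behaviour from the last bullet can be approximated by hand with existing commands. The following is a minimal sketch, not a built-in git-gc option: `purge_rewritten_history` is a hypothetical helper name, and it assumes a git new enough to accept `gc --prune=now`. It is destructive, so it should only be run on a repository that has a backup.

```shell
#!/bin/sh
# Sketch only: remove the "hidden" references that keep pre-rewrite
# history alive, then collect garbage. The old history cannot be
# recovered afterwards.
# purge_rewritten_history is a hypothetical helper, not a git command.
purge_rewritten_history() (
    cd "${1:-.}" || exit 1

    # Delete the backup refs that git-filter-branch saves under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done

    # Expire every reflog entry now instead of waiting for the default delay.
    git reflog expire --expire=now --all

    # Repack and prune unreachable objects regardless of their age.
    git gc --quiet --prune=now
)
```

Called as `purge_rewritten_history /path/to/repo`, it leaves only objects reachable from the remaining refs and the index.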
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-23 15:41 git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first Avery Pennarun
@ 2008-04-23 17:00 ` Junio C Hamano
2008-04-23 18:36 ` Avery Pennarun
2008-04-23 22:13 ` Jeff King
0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-04-23 17:00 UTC
To: Avery Pennarun; +Cc: Johannes Sixt, Git Mailing List
"Avery Pennarun" <apenwarr@gmail.com> writes:
> This question has come up at least once a week since I subscribed to
> the list. I can think of these solutions:
>
> - Add a note to the git-gc and/or git-repack man page about how hidden
> refs can impact the cleanup.
>
> - Add an option to make git-clone *not* hardlink stuff; its different
> behaviour for hardlinking vs. file:// seems to be very confusing.
>
> - Make git-gc give a warning when there are some objects that are only
> referenced via the reflog or refs/original. (I suspect this would
> trigger too often though.)
>
> - Give git-gc a "really, I'm serious" option that makes it ignore the
> reflog and refs/original.
- Teach people that leftover cruft is nothing to worry about.
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-23 17:00 ` Junio C Hamano
@ 2008-04-23 18:36 ` Avery Pennarun
2008-04-23 22:13 ` Jeff King
1 sibling, 0 replies; 12+ messages in thread
From: Avery Pennarun @ 2008-04-23 18:36 UTC
To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List
On 4/23/08, Junio C Hamano <gitster@pobox.com> wrote:
> "Avery Pennarun" <apenwarr@gmail.com> writes:
>
> > This question has come up at least once a week since I subscribed to
> > the list. I can think of these solutions:
> >
> > - Add a note to the git-gc and/or git-repack man page about how hidden
> > refs can impact the cleanup.
> >
> > - Add an option to make git-clone *not* hardlink stuff; its different
> > behaviour for hardlinking vs. file:// seems to be very confusing.
> >
> > - Make git-gc give a warning when there are some objects that are only
> > referenced via the reflog or refs/original. (I suspect this would
> > trigger too often though.)
> >
> > - Give git-gc a "really, I'm serious" option that makes it ignore the
> > reflog and refs/original.
>
> - Teach people that leftover cruft is nothing to worry about.
I think any option that starts with "teach people" will not reduce FAQ
traffic to the list :) But maybe we could remind people of this
somewhere prominent. The git-filter-branch man page?
That said, I think I know why people are concerned about the cruft:
it's for the same reason I was when I first tried git-filter-branch to
get rid of some gigantic files after importing from svn, to cut the
size of a clone from >1GB to <100MB. It's impossible to see if I've
succeeded or not unless I make an actual clone, and even *then* I was
misled at first because making a local clone is clever and avoids
doing any work.
Avery
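
One way to check the result without being misled by a hard-linked local clone is to clone through the file:// transport, which transfers only reachable objects, and measure the new object store. This is a rough sketch under that assumption; `reachable_size_kb` is a hypothetical helper name, and the probe clone is thrown away afterwards.

```shell
#!/bin/sh
# Sketch only: report, in KiB, how big a fresh clone of the repository
# would be. Going through file:// avoids the hard-link shortcut that a
# plain local-path clone takes.
# reachable_size_kb is a hypothetical helper, not a git command.
reachable_size_kb() (
    src=$(cd "$1" && pwd) || exit 1        # file:// needs an absolute path
    tmp=$(mktemp -d) || exit 1
    if git clone --quiet --bare "file://$src" "$tmp/probe.git"; then
        # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
        du -sk "$tmp/probe.git/objects" | cut -f1
    fi
    rm -rf "$tmp"
)
```

Comparing `reachable_size_kb .` with `du -sk .git/objects` gives a quick idea of how much of the local object store is leftover cruft.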
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-23 17:00 ` Junio C Hamano
2008-04-23 18:36 ` Avery Pennarun
@ 2008-04-23 22:13 ` Jeff King
2008-04-24 1:28 ` Jeff King
1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-23 22:13 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List
On Wed, Apr 23, 2008 at 10:00:59AM -0700, Junio C Hamano wrote:
> - Teach people that leftover cruft is nothing to worry about.
But it _is_ something to worry about in some particular situations. For
run-of-the-mill rebasing, sure, ignore it. But this question usually
comes up because the user did something like:
  1. import from foreign SCM or other source

  2. realize massive, history-wide mistake; git filter-branch
     to fix up the changes

  3. wonder why git is using twice as much space as it needs to; with
     a repository in the hundreds or thousands of megs, this can get
     really annoying (either because of wasted space, or because some
     operations, like "git repack -a" actually do per-object work).

-Peff
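
To put a number on the "twice as much space" in step 3, the object store can be measured before and after a cleanup. A small sketch, assuming the `size:` and `size-pack:` fields printed by `git count-objects -v` (both in KiB); `object_store_kb` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: total size of loose plus packed objects, in KiB, as
# reported by git count-objects -v.
# object_store_kb is a hypothetical helper, not a git command.
object_store_kb() (
    cd "${1:-.}" || exit 1
    git count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
         END { print total + 0 }'
)
```

Running it once after git filter-branch and again after the old references have been dropped and the repository repacked shows whether the rewrite actually reclaimed space.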
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-23 22:13 ` Jeff King
@ 2008-04-24 1:28 ` Jeff King
2008-04-24 15:43 ` Avery Pennarun
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-24 1:28 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List
On Wed, Apr 23, 2008 at 06:13:16PM -0400, Jeff King wrote:
> > - Teach people that leftover cruft is nothing to worry about.
>
> But it _is_ something to worry about in some particular situations. For
> run-of-the-mill rebasing, sure, ignore it. But this question usually
> comes up because the user did something like:
OK, maybe I am wrong. Within a few hours of my posting this, somebody
started a new thread with a toy example, wondering why git-gc didn't
clean up an --amended commit.
I don't know the best way to teach people about this (short of using a
big stick, of course), but maybe something like this would help a
little:
-- >8 --
doc/git-gc: add a note about what is collected
It seems to be a FAQ that people try running git-gc, and
then get puzzled about why the size of their .git directory
didn't change. This note mentions the reasons why things
might unexpectedly get kept.
Signed-off-by: Jeff King <peff@peff.net>
---
Documentation/git-gc.txt | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index d424a4e..9a4b62e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -104,6 +104,21 @@ The optional configuration variable 'gc.pruneExpire' controls how old
the unreferenced loose objects have to be before they are pruned. The
default is "2 weeks ago".
+
+Notes
+-----
+
+git-gc tries very hard to be safe about the garbage it collects. In
+particular, it will keep not only objects referenced by your current set
+of branches and tags, but also objects referenced by the index, remote
+tracking branches, refs saved by linkgit:git-filter-branch[1] in
+refs/original/, or reflogs (which may reference commits in branches
+that were later amended or rewound).
+
+If you are expecting some objects to be collected and they aren't, check
+all of those locations and decide whether it makes sense in your case to
+remove those references.
+
See Also
--------
linkgit:git-prune[1]
--
1.5.5.1.143.ge2bb9
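
The locations named in the note can be inspected directly. The sketch below only lists what is there and deletes nothing; `list_gc_roots` is a hypothetical helper name, and the section headings it prints are purely illustrative.

```shell
#!/bin/sh
# Sketch only: show the less obvious places that keep objects alive
# across git-gc. Nothing is modified.
# list_gc_roots is a hypothetical helper, not a git command.
list_gc_roots() (
    cd "${1:-.}" || exit 1

    echo "== refs saved by git-filter-branch =="
    git for-each-ref --format='%(refname)' refs/original/

    echo "== remote tracking branches =="
    git for-each-ref --format='%(refname)' refs/remotes/

    echo "== number of reflog entries =="
    git reflog show --all | wc -l
)
```

Anything listed here still pins its objects; whether removing it makes sense depends on whether the old history is still wanted.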
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-24 1:28 ` Jeff King
@ 2008-04-24 15:43 ` Avery Pennarun
2008-04-24 16:14 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-24 15:43 UTC
To: Jeff King; +Cc: Git Mailing List
On 4/23/08, Jeff King <peff@peff.net> wrote:
> +
> +Notes
> +-----
> +
> +git-gc tries very hard to be safe about the garbage it collects. In
> +particular, it will keep not only objects referenced by your current set
> +of branches and tags, but also objects referenced by the index, remote
> +tracking branches, refs saved by linkgit:git-filter-branch[1] in
> +refs/original/, or reflogs (which may reference commits in branches
> +that were later amended or rewound).
> +
> +If you are expecting some objects to be collected and they aren't, check
> +all of those locations and decide whether it makes sense in your case to
> +remove those references.
> +
This information would have helped me quite a bit when I first
encountered this problem. It would be nice if it also showed up under
git-prune (since git-gc doesn't delete anything itself, if I
understand correctly). Also a link to some information about reflogs
(even just to "see also" git-reflog) would help, since I didn't hear
about reflogs at all until after I joined the mailing list.
Avery
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-24 15:43 ` Avery Pennarun
@ 2008-04-24 16:14 ` Jeff King
2008-04-24 16:59 ` Avery Pennarun
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-24 16:14 UTC
To: Avery Pennarun; +Cc: Git Mailing List
On Thu, Apr 24, 2008 at 11:43:55AM -0400, Avery Pennarun wrote:
> > +If you are expecting some objects to be collected and they aren't, check
> > +all of those locations and decide whether it makes sense in your case to
> > +remove those references.
> > +
>
> This information would have helped me quite a bit when I first
> encountered this problem. It would be nice if it also showed up under
> git-prune (since git-gc doesn't delete anything itself, if I
Hmm, maybe it would make sense to put that note in git-prune, with a
note in git-gc to look at the prune page.
> understand correctly). Also a link to some information about reflogs
> (even just to "see also" git-reflog) would help, since I didn't hear
> about reflogs at all until after I joined the mailing list.
$ grep -A6 See.Also Documentation/git-gc.txt
See Also
--------
linkgit:git-prune[1]
linkgit:git-reflog[1]
linkgit:git-repack[1]
linkgit:git-rerere[1]
But if the note were moved to git-prune, it would be natural to mention
git-reflog there. What do you think?
-Peff
* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
2008-04-24 16:14 ` Jeff King
@ 2008-04-24 16:59 ` Avery Pennarun
2008-04-29 20:45 ` [PATCH] Documentation: point git-prune users to git-gc Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-24 16:59 UTC
To: Jeff King; +Cc: Git Mailing List
On 4/24/08, Jeff King <peff@peff.net> wrote:
> Hmm, maybe it would make sense to put that note in git-prune, with a
> note in git-gc to look at the prune page.
Perhaps.
> But if the note were moved to git-prune, it would be natural to mention
> git-reflog there. What do you think?
I gather there's a movement in recent git versions (sorry, I only
tuned in recently) to encourage people to use git-gc instead of
git-prune in almost all cases. The reason I ever looked at git-prune
at all was that git-gc mentioned it in "See Also", and because
"git-prune" sounded more obviously like what I wanted than "git-gc"
when I looked at "man git".
Adding git-gc *and* git-reflog as See Also entries in git-prune would
make sense to me.
Avery
* [PATCH] Documentation: point git-prune users to git-gc
2008-04-24 16:59 ` Avery Pennarun
@ 2008-04-29 20:45 ` Jeff King
2008-04-29 22:05 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-29 20:45 UTC
To: Avery Pennarun; +Cc: Junio C Hamano, Git Mailing List
On Thu, Apr 24, 2008 at 12:59:34PM -0400, Avery Pennarun wrote:
> I gather there's a movement in recent git versions (sorry, I only
> tuned in recently) to encourage people to use git-gc instead of
> git-prune in almost all cases. The reason I ever looked at git-prune
Yes, I don't think there is any reason for most people to use git-prune
at all, unless they are trying specifically to prune and don't want the
other gc effects to happen.
Junio, please correct me if I'm wrong there.
> Adding git-gc *and* git-reflog as See Also entries in git-prune would
> make sense to me.
Agreed. Below is a patch that will hopefully clarify the situation.
-- >8 --
Documentation: point git-prune users to git-gc
Most users should be using git-gc instead of directly
calling prune. For those who really do want more information
on pruning, let's point them at git-fsck, which goes into
slightly more detail on reachability.
And since we're pointing users there, let's make sure
reflogs are mentioned in git-fsck(1).
Signed-off-by: Jeff King <peff@peff.net>
---
Documentation/git-fsck.txt | 3 ++-
Documentation/git-prune.txt | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index f16cb98..4cc26fb 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -22,7 +22,8 @@ OPTIONS
An object to treat as the head of an unreachability trace.
+
If no objects are given, git-fsck defaults to using the
-index file and all SHA1 references in .git/refs/* as heads.
+index file, all SHA1 references in .git/refs/*, and all reflogs (unless
+--no-reflogs is given) as heads.
--unreachable::
Print out objects that exist but that aren't readable from any
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index f151cff..f92bb8c 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -13,6 +13,9 @@ SYNOPSIS
DESCRIPTION
-----------
+NOTE: In most cases, users should run linkgit:git-gc[1], which calls
+git-prune. See the section "NOTES", below.
+
This runs `git-fsck --unreachable` using all the refs
available in `$GIT_DIR/refs`, optionally with additional set of
objects specified on the command line, and prunes all
@@ -50,6 +53,23 @@ borrows from your repository via its
$ git prune $(cd ../another && $(git-rev-parse --all))
------------
+Notes
+-----
+
+In most cases, users will not need to call git-prune directly, but
+should instead call linkgit:git-gc[1], which handles pruning along with
+many other housekeeping tasks.
+
+For a description of which objects are considered for pruning, see
+git-fsck's --unreachable option.
+
+See Also
+--------
+
+linkgit:git-fsck[1],
+linkgit:git-gc[1],
+linkgit:git-reflog[1]
+
Author
------
Written by Linus Torvalds <torvalds@osdl.org>
--
1.5.5.1.172.g4dce
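
The effect of the reflog on reachability can be seen with the `--no-reflogs` option that the git-fsck hunk mentions. A minimal sketch: `unreachable_ignoring_reflogs` is a hypothetical helper name, and the parsing assumes fsck's usual `unreachable <type> <sha1>` output lines.

```shell
#!/bin/sh
# Sketch only: print the IDs of objects that no ref and no index entry
# reaches once reflogs are ignored. Objects that appear here but not in
# plain "git fsck --unreachable" output are kept alive by reflogs alone.
# unreachable_ignoring_reflogs is a hypothetical helper, not a git command.
unreachable_ignoring_reflogs() (
    cd "${1:-.}" || exit 1
    git fsck --unreachable --no-reflogs 2>/dev/null |
    awk '$1 == "unreachable" { print $3 }'
)
```

An amended or rewound commit normally shows up in this list until its reflog entries expire.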
* Re: [PATCH] Documentation: point git-prune users to git-gc
2008-04-29 20:45 ` [PATCH] Documentation: point git-prune users to git-gc Jeff King
@ 2008-04-29 22:05 ` Junio C Hamano
2008-04-29 23:19 ` Jeff King
0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2008-04-29 22:05 UTC
To: Jeff King; +Cc: Avery Pennarun, Git Mailing List
Jeff King <peff@peff.net> writes:
> Yes, I don't think there is any reason for most people to use git-prune
> at all, unless they are trying specifically to prune and don't want the
> other gc effects to happen.
>
> Junio, please correct me if I'm wrong there.
Well, this is a hard statement to make corrections to. If A is defined to
be a superset of B, and A is generally useful, the only reason to do B is
when you want the effect of B without anything else. So your statement
cannot be incorrect.

However, in order to help people decide when to run B (or whether there
is ever a case where they might want to), there needs to be a discussion
of what other things A does in addition to B that _might_ be unwanted.

For that reason,...
> diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
> index f151cff..f92bb8c 100644
> --- a/Documentation/git-prune.txt
> +++ b/Documentation/git-prune.txt
> @@ -13,6 +13,9 @@ SYNOPSIS
> DESCRIPTION
> -----------
>
> +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> +git-prune. See the section "NOTES", below.
> +
I think this note upfront is not helping readers very much (this is
git-prune documentation after all -- they are interested in the command
and not gc), but ...
> This runs `git-fsck --unreachable` using all the refs
> available in `$GIT_DIR/refs`, optionally with additional set of
> objects specified on the command line, and prunes all
> @@ -50,6 +53,23 @@ borrows from your repository via its
> $ git prune $(cd ../another && $(git-rev-parse --all))
> ------------
>
> +Notes
> +-----
> +
> +In most cases, users will not need to call git-prune directly, but
> +should instead call linkgit:git-gc[1], which handles pruning along with
> +many other housekeeping tasks.
... this paragraph should be made a bit fatter by mentioning what "other
housekeeping tasks" are.
> +For a description of which objects are considered for pruning, see
> +git-fsck's --unreachable option.
> +
> +See Also
> +--------
> +
> +linkgit:git-fsck[1],
> +linkgit:git-gc[1],
> +linkgit:git-reflog[1]
> +
> Author
> ------
> Written by Linus Torvalds <torvalds@osdl.org>
> --
> 1.5.5.1.172.g4dce
* Re: [PATCH] Documentation: point git-prune users to git-gc
2008-04-29 22:05 ` Junio C Hamano
@ 2008-04-29 23:19 ` Jeff King
2008-04-30 1:01 ` Junio C Hamano
0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-29 23:19 UTC
To: Junio C Hamano; +Cc: Avery Pennarun, Git Mailing List
On Tue, Apr 29, 2008 at 03:05:03PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Yes, I don't think there is any reason for most people to use git-prune
> > at all, unless they are trying specifically to prune and don't want the
> > other gc effects to happen.
> >
> > Junio, please correct me if I'm wrong there.
>
> Well, this is a hard statement to make corrections to. If A is defined to
> be a superset of B, and A is generally useful, the only reason to do B is
> when you want the effect of B without anything else. So your statement
> cannot be incorrect.
Heh. Sorry, I got very sloppy with my wording...there was an
11-month-old child yelling in my ear. :)
My meaning was: "people who want to clean up their repo but don't know
the right command stumble upon git-prune. They probably should be using
git-gc instead. People who know that they want to prune presumably know
enough to ignore the warning note."
> However, in order to help people decide when to run B (or whether there
> is ever a case where they might want to), there needs to be a discussion
> of what other things A does in addition to B that _might_ be unwanted.
Fair enough.
> > --- a/Documentation/git-prune.txt
> > +++ b/Documentation/git-prune.txt
> > @@ -13,6 +13,9 @@ SYNOPSIS
> > DESCRIPTION
> > -----------
> >
> > +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> > +git-prune. See the section "NOTES", below.
> > +
>
> I think this note upfront is not helping readers very much (this is
> git-prune documentation after all -- they are interested in the command
> and not gc), but ...
I'm not so sure that they are interested in the prune command. At first
I started with just a note near the end, but the point of this is
specifically to deal with users who "stumble" upon prune, either from
reading the command list (i.e., trying to match a command to the
objective they want to perform) or from pre-gc tutorials or emails which
mention it.
> > +Notes
> > +-----
> > +
> > +In most cases, users will not need to call git-prune directly, but
> > +should instead call linkgit:git-gc[1], which handles pruning along with
> > +many other housekeeping tasks.
>
> ... this paragraph should be made a bit fatter by mentioning what "other
> housekeeping tasks" are.
OK, I was trying to imply "go look at git-gc for those tasks" so they
didn't have to be repeated. Would you prefer it be spelled out
explicitly here, or is a more firm pointer OK?
-Peff
* Re: [PATCH] Documentation: point git-prune users to git-gc
2008-04-29 23:19 ` Jeff King
@ 2008-04-30 1:01 ` Junio C Hamano
0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-04-30 1:01 UTC
To: Jeff King; +Cc: Avery Pennarun, Git Mailing List
Jeff King <peff@peff.net> writes:
> I'm not so sure that they are interested in the prune command. At first
> I started with just a note near the end, but the point of this is
> specifically to deal with users who "stumble" upon prune, either from
> reading the command list (i.e., trying to match a command to the
> objective they want to perform) or from pre-gc tutorials or emails which
> mention it.
Ah, you are right. People tend to stop reading when they _think_ they
heard enough even though they haven't. The note upfront is good, and I
suspect we would not have to reword the latter parts either then.
>> > +Notes
>> > +-----
>> > +
>> > +In most cases, users will not need to call git-prune directly, but
>> > +should instead call linkgit:git-gc[1], which handles pruning along with
>> > +many other housekeeping tasks.
>>
>> ... this paragraph should be made a bit fatter by mentioning what "other
>> housekeeping tasks" are.
>
> OK, I was trying to imply "go look at git-gc for those tasks" so they
> didn't have to be repeated. Would you prefer it be spelled out
> explicitly here, or is a more firm pointer OK?
Will apply as is.
Thanks.