git.vger.kernel.org archive mirror
* git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
@ 2008-04-23 15:41 Avery Pennarun
  2008-04-23 17:00 ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-23 15:41 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Git Mailing List

On 4/23/08, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Peter Karlsson schrieb:
> > [Not seeing any unreachable objects]
>
> > Jeff King:
> >> Did you remove refs/original/ ?
>  >
>  > That, and cloned the repository to a new location after the conversion,
>  > and removing the references to "origin" there. It does seem that the
>  > objects are still there, but I can't see them with "gitk --all".
>
> Did you clone locally? Then you must use the file:// protocol, otherwise
>  everything is hard-linked from the origin.

This question has come up at least once a week since I subscribed to
the list.  I can think of these solutions:

- Add a note to the git-gc and/or git-repack man page about how hidden
refs can impact the cleanup.

- Add an option to make git-clone *not* hardlink stuff; its different
behaviour for hardlinking vs. file:// seems to be very confusing.

- Make git-gc give a warning when there are some objects that are only
referenced via the reflog or refs/original.  (I suspect this would
trigger too often though.)

- Give git-gc a "really, I'm serious" option that makes it ignore the
reflog and refs/original.

Thoughts?

Avery
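
The hard-link behaviour Johannes describes above can be seen with a small
experiment. This is only a sketch: it assumes a POSIX shell and a git new
enough to support "git -C", and the repository names (src, by-path, by-url)
and file names are invented for the demonstration. A clone from a plain path
copies or hard-links the whole object directory, unreachable objects
included, while a file:// clone goes through the normal transport and
receives only what the refs reach.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Source repository: one commit we keep, one we rewind away.
git init -q src
echo keep >src/keep.txt
git -C src add keep.txt
git -C src commit -q -m "keep"
echo junk >src/junk.txt
git -C src add junk.txt
git -C src commit -q -m "junk"
junk=$(git -C src rev-parse HEAD)
# After this reset the "junk" commit is referenced only by the reflog.
git -C src reset -q --hard HEAD~1

git clone -q src by-path                 # plain path: object files copied/hard-linked
git clone -q "file://$work/src" by-url   # file:// URL: objects sent over the transport

# Report whether the rewound commit exists in a given clone.
has() {
    if git -C "$1" cat-file -e "$junk" 2>/dev/null; then echo yes; else echo no; fi
}
in_path_clone=$(has by-path)
in_url_clone=$(has by-url)
echo "junk commit present: by-path=$in_path_clone by-url=$in_url_clone"
```

Note that "git clone --no-hardlinks" only turns the hard links into real
copies; the object directory is still copied as-is, so it does not filter
out unreachable objects the way a file:// clone does.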


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-23 15:41 git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first Avery Pennarun
@ 2008-04-23 17:00 ` Junio C Hamano
  2008-04-23 18:36   ` Avery Pennarun
  2008-04-23 22:13   ` Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-04-23 17:00 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Johannes Sixt, Git Mailing List

"Avery Pennarun" <apenwarr@gmail.com> writes:

> This question has come up at least once a week since I subscribed to
> the list.  I can think of these solutions:
>
> - Add a note to the git-gc and/or git-repack man page about how hidden
> refs can impact the cleanup.
>
> - Add an option to make git-clone *not* hardlink stuff; its different
> behaviour for hardlinking vs. file:// seems to be very confusing.
>
> - Make git-gc give a warning when there are some objects that are only
> referenced via the reflog or refs/original.  (I suspect this would
> trigger too often though.)
>
> - Give git-gc a "really, I'm serious" option that makes it ignore the
> reflog and refs/original.

- Teach people that leftover cruft is nothing to worry about.


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-23 17:00 ` Junio C Hamano
@ 2008-04-23 18:36   ` Avery Pennarun
  2008-04-23 22:13   ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Avery Pennarun @ 2008-04-23 18:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List

On 4/23/08, Junio C Hamano <gitster@pobox.com> wrote:
> "Avery Pennarun" <apenwarr@gmail.com> writes:
>
>  > This question has come up at least once a week since I subscribed to
>  > the list.  I can think of these solutions:
>  >
>  > - Add a note to the git-gc and/or git-repack man page about how hidden
>  > refs can impact the cleanup.
>  >
>  > - Add an option to make git-clone *not* hardlink stuff; its different
>  > behaviour for hardlinking vs. file:// seems to be very confusing.
>  >
>  > - Make git-gc give a warning when there are some objects that are only
>  > referenced via the reflog or refs/original.  (I suspect this would
>  > trigger too often though.)
>  >
>  > - Give git-gc a "really, I'm serious" option that makes it ignore the
>  > reflog and refs/original.
>
> - Teach people that leftover cruft is nothing to worry about.

I think any option that starts with "teach people" will not reduce FAQ
traffic to the list :)  But maybe we could remind people of this
somewhere prominent.  The git-filter-branch man page?

That said, I think I know why people are concerned about the cruft:
it's for the same reason I was when I first tried git-filter-branch to
get rid of some gigantic files after importing from svn, to cut the
size of a clone from >1GB to <100MB.  It's impossible to see if I've
succeeded or not unless I make an actual clone, and even *then* I was
misled at first because making a local clone is clever and avoids
doing any work.

Avery


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-23 17:00 ` Junio C Hamano
  2008-04-23 18:36   ` Avery Pennarun
@ 2008-04-23 22:13   ` Jeff King
  2008-04-24  1:28     ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-23 22:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List

On Wed, Apr 23, 2008 at 10:00:59AM -0700, Junio C Hamano wrote:

> - Teach people that leftover cruft is nothing to worry about.

But it _is_ something to worry about in some particular situations. For
run-of-the-mill rebasing, sure, ignore it. But this question usually
comes up because the user did something like:

  1. import from foreign SCM or other source
  2. realize massive, history-wide mistake; git filter-branch
     to fix up the changes
  3. wonder why git is using twice as much space as it needs to; with
     a repository in the hundreds or thousands of megs, this can get
     really annoying (either because of wasted space, or because some
     operations, like "git repack -a" actually do per-object work).

-Peff
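
For the import-then-filter-branch case in the list above, the cleanup that
usually gets suggested amounts to dropping the backup refs and the reflog
entries before collecting. The following is a sketch under assumptions, not
a recommendation from this thread: it runs in a scratch repository, the file
names are made up, and expiring every reflog entry throws away the safety
net that makes the rewrite reversible.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Newer git versions warn about filter-branch and pause; this silences that.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Step 1: "import" a history that contains a file we later regret.
echo small >small.txt
seq 1 50000 >big.bin
git add small.txt big.bin
git commit -q -m "import"
big=$(git rev-parse HEAD:big.bin)

# Step 2: rewrite history without the big file.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# Step 3: a plain gc keeps the blob, because refs/original/ and the
# reflogs still point at the pre-rewrite commit.
git gc -q --prune=now
if git cat-file -e "$big" 2>/dev/null; then after_plain_gc=present; else after_plain_gc=gone; fi

# Drop the backup refs and all reflog entries, then collect again.
git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$big" 2>/dev/null; then after_cleanup=present; else after_cleanup=gone; fi

echo "big blob after plain gc: $after_plain_gc; after cleanup: $after_cleanup"
```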


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-23 22:13   ` Jeff King
@ 2008-04-24  1:28     ` Jeff King
  2008-04-24 15:43       ` Avery Pennarun
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-24  1:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Avery Pennarun, Johannes Sixt, Git Mailing List

On Wed, Apr 23, 2008 at 06:13:16PM -0400, Jeff King wrote:

> > - Teach people that leftover cruft is nothing to worry about.
> 
> But it _is_ something to worry about in some particular situations. For
> run-of-the-mill rebasing, sure, ignore it. But this question usually
> comes up because the user did something like:

OK, maybe I am wrong. Within a few hours of me posting this, somebody
starts a new thread with a toy example wondering why git-gc didn't clean
up an --amended commit.

I don't know the best way to teach people about this (short of using a
big stick, of course), but maybe something like this would help a
little:

-- >8 --
doc/git-gc: add a note about what is collected

It seems to be a FAQ that people try running git-gc, and
then get puzzled about why the size of their .git directory
didn't change. This note mentions the reasons why things
might unexpectedly get kept.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-gc.txt |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index d424a4e..9a4b62e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -104,6 +104,21 @@ The optional configuration variable 'gc.pruneExpire' controls how old
 the unreferenced loose objects have to be before they are pruned.  The
 default is "2 weeks ago".
 
+
+Notes
+-----
+
+git-gc tries very hard to be safe about the garbage it collects. In
+particular, it will keep not only objects referenced by your current set
+of branches and tags, but also objects referenced by the index, remote
+tracking branches, refs saved by linkgit:git-filter-branch[1] in
+refs/original/, or reflogs (which may reference commits in branches
+that were later amended or rewound).
+
+If you are expecting some objects to be collected and they aren't, check
+all of those locations and decide whether it makes sense in your case to
+remove those references.
+
 See Also
 --------
 linkgit:git-prune[1]
-- 
1.5.5.1.143.ge2bb9


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-24  1:28     ` Jeff King
@ 2008-04-24 15:43       ` Avery Pennarun
  2008-04-24 16:14         ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-24 15:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 4/23/08, Jeff King <peff@peff.net> wrote:
>  +
>  +Notes
>  +-----
>  +
>  +git-gc tries very hard to be safe about the garbage it collects. In
>  +particular, it will keep not only objects referenced by your current set
>  +of branches and tags, but also objects referenced by the index, remote
>  +tracking branches, refs saved by linkgit:git-filter-branch[1] in
>  +refs/original/, or reflogs (which may reference commits in branches
>  +that were later amended or rewound).
>  +
>  +If you are expecting some objects to be collected and they aren't, check
>  +all of those locations and decide whether it makes sense in your case to
>  +remove those references.
>  +

This information would have helped me quite a bit when I first
encountered this problem.  It would be nice if it also showed up under
git-prune (since git-gc doesn't delete anything itself, if I
understand correctly).  Also a link to some information about reflogs
(even just to "see also" git-reflog) would help, since I didn't hear
about reflogs at all until after I joined the mailing list.

Avery


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-24 15:43       ` Avery Pennarun
@ 2008-04-24 16:14         ` Jeff King
  2008-04-24 16:59           ` Avery Pennarun
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-24 16:14 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Git Mailing List

On Thu, Apr 24, 2008 at 11:43:55AM -0400, Avery Pennarun wrote:

> >  +If you are expecting some objects to be collected and they aren't, check
> >  +all of those locations and decide whether it makes sense in your case to
> >  +remove those references.
> >  +
> 
> This information would have helped me quite a bit when I first
> encountered this problem.  It would be nice if it also showed up under
> git-prune (since git-gc doesn't delete anything itself, if I

Hmm, maybe it would make sense to put that note in git-prune, with a
note in git-gc to look at the prune page.

> understand correctly).  Also a link to some information about reflogs
> (even just to "see also" git-reflog) would help, since I didn't hear
> about reflogs at all until after I joined the mailing list.

$ grep -A6 See.Also Documentation/git-gc.txt
See Also
--------
linkgit:git-prune[1]
linkgit:git-reflog[1]
linkgit:git-repack[1]
linkgit:git-rerere[1]

But if the note were moved to git-prune, it would be natural to mention
git-reflog there. What do you think?

-Peff


* Re: git-gc doesn't clean up leftover objects after git-filter-branch unless you clone first
  2008-04-24 16:14         ` Jeff King
@ 2008-04-24 16:59           ` Avery Pennarun
  2008-04-29 20:45             ` [PATCH] Documentation: point git-prune users to git-gc Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Avery Pennarun @ 2008-04-24 16:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 4/24/08, Jeff King <peff@peff.net> wrote:
> Hmm, maybe it would make sense to put that note in git-prune, with a
>  note in git-gc to look at the prune page.

Perhaps.

>  But if the note were moved to git-prune, it would be natural to mention
>  git-reflog there. What do you think?

I gather there's a movement in recent git versions (sorry, I only
tuned in recently) to encourage people to use git-gc instead of
git-prune in almost all cases.  The reason I ever looked at git-prune
at all was that git-gc mentioned it in "See Also", and because
"git-prune" sounded more obviously like what I wanted than "git-gc"
when I looked at "man git".

Adding git-gc *and* git-reflog as See Also entries in git-prune would
make sense to me.

Avery


* [PATCH] Documentation: point git-prune users to git-gc
  2008-04-24 16:59           ` Avery Pennarun
@ 2008-04-29 20:45             ` Jeff King
  2008-04-29 22:05               ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-29 20:45 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Junio C Hamano, Git Mailing List

On Thu, Apr 24, 2008 at 12:59:34PM -0400, Avery Pennarun wrote:

> I gather there's a movement in recent git versions (sorry, I only
> tuned in recently) to encourage people to use git-gc instead of
> git-prune in almost all cases.  The reason I ever looked at git-prune

Yes, I don't think there is any reason for most people to use git-prune
at all, unless they are trying specifically to prune and don't want the
other gc effects to happen.

Junio, please correct me if I'm wrong there.

> Adding git-gc *and* git-reflog as See Also entries in git-prune would
> make sense to me.

Agreed. Below is a patch that will hopefully clarify the situation.

-- >8 --
Documentation: point git-prune users to git-gc

Most users should be using git-gc instead of directly
calling prune. For those who really do want more information
on pruning, let's point them at git-fsck, which goes into
slightly more detail on reachability.

And since we're pointing users there, let's make sure
reflogs are mentioned in git-fsck(1).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-fsck.txt  |    3 ++-
 Documentation/git-prune.txt |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index f16cb98..4cc26fb 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -22,7 +22,8 @@ OPTIONS
 	An object to treat as the head of an unreachability trace.
 +
 If no objects are given, git-fsck defaults to using the
-index file and all SHA1 references in .git/refs/* as heads.
+index file, all SHA1 references in .git/refs/*, and all reflogs (unless
+--no-reflogs is given) as heads.
 
 --unreachable::
 	Print out objects that exist but that aren't readable from any
diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
index f151cff..f92bb8c 100644
--- a/Documentation/git-prune.txt
+++ b/Documentation/git-prune.txt
@@ -13,6 +13,9 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
+NOTE: In most cases, users should run linkgit:git-gc[1], which calls
+git-prune. See the section "NOTES", below.
+
 This runs `git-fsck --unreachable` using all the refs
 available in `$GIT_DIR/refs`, optionally with additional set of
 objects specified on the command line, and prunes all
@@ -50,6 +53,23 @@ borrows from your repository via its
 $ git prune $(cd ../another && $(git-rev-parse --all))
 ------------
 
+Notes
+-----
+
+In most cases, users will not need to call git-prune directly, but
+should instead call linkgit:git-gc[1], which handles pruning along with
+many other housekeeping tasks.
+
+For a description of which objects are considered for pruning, see
+git-fsck's --unreachable option.
+
+See Also
+--------
+
+linkgit:git-fsck[1],
+linkgit:git-gc[1],
+linkgit:git-reflog[1]
+
 Author
 ------
 Written by Linus Torvalds <torvalds@osdl.org>
-- 
1.5.5.1.172.g4dce
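
The reflog wording added to git-fsck(1) in the patch above can be checked by
hand. This is a sketch, assuming a scratch repository and invented file
contents: after an --amend, the old commit is reported as unreachable only
when reflogs are left out of the starting set with --no-reflogs.

```shell
#!/bin/sh
set -e
# Throwaway identity so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "first draft" >file.txt
git add file.txt
git commit -q -m "first draft"
old=$(git rev-parse HEAD)

# Amending replaces the commit; the old one survives only in the reflogs.
echo "final" >file.txt
git commit -q -a --amend -m "final"

# Count how often the old commit shows up in each report.
# grep -c exits non-zero on zero matches, hence the "|| true".
with_reflogs=$(git fsck --unreachable 2>/dev/null | grep -c "$old" || true)
without_reflogs=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$old" || true)

echo "old commit listed as unreachable: default=$with_reflogs --no-reflogs=$without_reflogs"
```

The default run treats reflog entries as heads, so the amended-away commit
is not listed; with --no-reflogs it shows up, which is a quick way to see
what a reflog alone is keeping alive.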


* Re: [PATCH] Documentation: point git-prune users to git-gc
  2008-04-29 20:45             ` [PATCH] Documentation: point git-prune users to git-gc Jeff King
@ 2008-04-29 22:05               ` Junio C Hamano
  2008-04-29 23:19                 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2008-04-29 22:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Git Mailing List

Jeff King <peff@peff.net> writes:

> Yes, I don't think there is any reason for most people to use git-prune
> at all, unless they are trying specifically to prune and don't want the
> other gc effects to happen.
>
> Junio, please correct me if I'm wrong there.

Well, this is a hard statement to make corrections to.  If B is defined to
be a subset of A, and A is generally useful, the only reason to do B is
when you want the effect of B without anything else.  So your statement
cannot be incorrect.

However, in order to help people decide when to run B (or whether there
will ever be a case where they might want to), there needs to be a
discussion of what other things A does in addition to B that _might_ be
unwanted.

For that reason,...

> diff --git a/Documentation/git-prune.txt b/Documentation/git-prune.txt
> index f151cff..f92bb8c 100644
> --- a/Documentation/git-prune.txt
> +++ b/Documentation/git-prune.txt
> @@ -13,6 +13,9 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>  
> +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> +git-prune. See the section "NOTES", below.
> +

I think this note upfront is not helping readers very much (this is
git-prune documentation after all -- they are interested in the command
and not gc), but ...

>  This runs `git-fsck --unreachable` using all the refs
>  available in `$GIT_DIR/refs`, optionally with additional set of
>  objects specified on the command line, and prunes all
> @@ -50,6 +53,23 @@ borrows from your repository via its
>  $ git prune $(cd ../another && $(git-rev-parse --all))
>  ------------
>  
> +Notes
> +-----
> +
> +In most cases, users will not need to call git-prune directly, but
> +should instead call linkgit:git-gc[1], which handles pruning along with
> +many other housekeeping tasks.

... this paragraph should be made a bit fatter by mentioning what "other
housekeeping tasks" are.

> +For a description of which objects are considered for pruning, see
> +git-fsck's --unreachable option.
> +
> +See Also
> +--------
> +
> +linkgit:git-fsck[1],
> +linkgit:git-gc[1],
> +linkgit:git-reflog[1]
> +
>  Author
>  ------
>  Written by Linus Torvalds <torvalds@osdl.org>
> -- 
> 1.5.5.1.172.g4dce


* Re: [PATCH] Documentation: point git-prune users to git-gc
  2008-04-29 22:05               ` Junio C Hamano
@ 2008-04-29 23:19                 ` Jeff King
  2008-04-30  1:01                   ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2008-04-29 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Avery Pennarun, Git Mailing List

On Tue, Apr 29, 2008 at 03:05:03PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Yes, I don't think there is any reason for most people to use git-prune
> > at all, unless they are trying specifically to prune and don't want the
> > other gc effects to happen.
> >
> > Junio, please correct me if I'm wrong there.
> 
> Well, this is a hard statement to make corrections to.  If B is defined to
> be a subset of A, and A is generally useful, the only reason to do B is
> when you want the effect of B without anything else.  So your statement
> cannot be incorrect.

Heh. Sorry, I got very sloppy with my wording...there was an
11-month-old child yelling in my ear. :)

My meaning was: "people who want to clean up their repo but don't know
the right command stumble upon git-prune. They probably should be using
git-gc instead. People who know that they want to prune presumably know
enough to ignore the warning note."

> However, in order to help people decide when to run B (or whether there
> will ever be a case where they might want to), there needs to be a
> discussion of what other things A does in addition to B that _might_ be
> unwanted.

Fair enough.

> > --- a/Documentation/git-prune.txt
> > +++ b/Documentation/git-prune.txt
> > @@ -13,6 +13,9 @@ SYNOPSIS
> >  DESCRIPTION
> >  -----------
> >  
> > +NOTE: In most cases, users should run linkgit:git-gc[1], which calls
> > +git-prune. See the section "NOTES", below.
> > +
> 
> I think this note upfront is not helping readers very much (this is
> git-prune documentation after all -- they are interested in the command
> and not gc), but ...

I'm not so sure that they are interested in the prune command. At first
I started with just a note near the end, but the point of this is
specifically to deal with users who "stumble" upon prune, either from
reading the command list (i.e., trying to match a command to the
objective they want to perform) or from pre-gc tutorials or emails which
mention it.

> > +Notes
> > +-----
> > +
> > +In most cases, users will not need to call git-prune directly, but
> > +should instead call linkgit:git-gc[1], which handles pruning along with
> > +many other housekeeping tasks.
> 
> ... this paragraph should be made a bit fatter by mentioning what "other
> housekeeping tasks" are.

OK, I was trying to imply "go look at git-gc for those tasks" so they
didn't have to be repeated. Would you prefer it be spelled out
explicitly here, or is a more firm pointer OK?

-Peff


* Re: [PATCH] Documentation: point git-prune users to git-gc
  2008-04-29 23:19                 ` Jeff King
@ 2008-04-30  1:01                   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-04-30  1:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Avery Pennarun, Git Mailing List

Jeff King <peff@peff.net> writes:

> I'm not so sure that they are interested in the prune command. At first
> I started with just a note near the end, but the point of this is
> specifically to deal with users who "stumble" upon prune, either from
> reading the command list (i.e., trying to match a command to the
> objective they want to perform) or from pre-gc tutorials or emails which
> mention it.

Ah, you are right.  People tend to stop reading when they _think_ they
heard enough even though they haven't.  The note upfront is good, and I
suspect we would not have to reword the latter parts either then.

>> > +Notes
>> > +-----
>> > +
>> > +In most cases, users will not need to call git-prune directly, but
>> > +should instead call linkgit:git-gc[1], which handles pruning along with
>> > +many other housekeeping tasks.
>> 
>> ... this paragraph should be made a bit fatter by mentioning what "other
>> housekeeping tasks" are.
>
> OK, I was trying to imply "go look at git-gc for those tasks" so they
> didn't have to be repeated. Would you prefer it be spelled out
> explicitly here, or is a more firm pointer OK?

Will apply as is.

Thanks.

